tags:
- Apple Neural Engine
- DeepHermes
---

## Model Quality Benchmarks

### FP16 Scaling for ANE Compatibility

Gemma3 4B QAT models produce activations that exceed the FP16 range (±65,504) during inference. We apply **weight scaling (α=0.1875)** to prevent overflow, illustrated in the sketch after this list:

- Embedding weights are scaled by α=0.1875 (3/16)
- LM head logits are divided by α to restore the original scale
- Zero runtime overhead: the transformation is applied at conversion time
- 100% token match with the BF16 reference
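Here is a minimal PyTorch sketch of the idea, assuming a Hugging Face-style model; `apply_fp16_scaling` and the constant handling are illustrative, not ANEMLL's actual conversion code:

```python
import torch

ALPHA = 0.1875  # 3/16; keeps peak activations inside the FP16 range (±65,504)

def apply_fp16_scaling(model, alpha=ALPHA):
    """Fold the scale factor into the embedding weights at conversion time.

    Because alpha is baked into the weights, the forward pass itself is
    unchanged (zero runtime overhead); only the final logits must be
    divided by alpha to restore their original scale.
    """
    with torch.no_grad():
        model.get_input_embeddings().weight.mul_(alpha)
    return model

# At inference time, after the scaled model produces logits:
#   logits = scaled_logits / ALPHA
```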
### Quantization Results

| Configuration | KL Divergence | Correlation | Match Rate | Notes |
|--------------|---------------|-------------|------------|-------|
| FP16 baseline (no LUT) | 0.0006 | 0.995 | 99.86% | Best quality |
| **FFN LUT4,4 + LM LUT6,4** | **0.196** | **0.959** | **90%** | ***This model*** |
| FFN LUT4,8 only | 0.284 | 0.971 | 87% | Larger size |
| FFN LUT4,8 + LM LUT6,4 | 0.279 | 0.970 | 86% | - |
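The LUT4/LUT6 prefixes denote 4-bit and 6-bit lookup-table (palettization) weight compression. As a rough illustration of how such palettization can be applied to a converted Core ML model with coremltools (the file names and settings here are hypothetical, not ANEMLL's pipeline):

```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

# Hypothetical path to one converted FFN chunk
mlmodel = ct.models.MLModel("ffn_chunk.mlpackage")

# 4-bit k-means lookup table applied to all eligible ops
config = OptimizationConfig(
    global_config=OpPalettizerConfig(mode="kmeans", nbits=4)
)
compressed = palettize_weights(mlmodel, config)
compressed.save("ffn_chunk_lut4.mlpackage")
```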
### Metric Guidelines

| Metric | Healthy | Concerning |
|--------|---------|------------|
| KL Divergence | < 0.3 | > 0.5 |
| Correlation | > 0.95 | < 0.90 |
| Match Rate | > 85% | < 75% |
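These metrics can be computed from per-token logits of the reference and quantized models. A minimal NumPy sketch (a hypothetical helper, not ANEMLL's evaluation tooling):

```python
import numpy as np

def quality_metrics(ref_logits, quant_logits):
    """Compare per-token logits from the reference and quantized models.

    Both arguments: float arrays of shape (num_tokens, vocab_size).
    Returns (kl_divergence, correlation, match_rate) as used above.
    """
    def softmax(x):
        # Subtract the row max for numerical stability
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    p = softmax(ref_logits.astype(np.float64))    # reference distribution P
    q = softmax(quant_logits.astype(np.float64))  # quantized distribution Q

    # Mean KL(P || Q) over tokens; epsilon guards against log(0)
    kl = float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

    # Pearson correlation between the flattened logit tensors
    corr = float(np.corrcoef(ref_logits.ravel(), quant_logits.ravel())[0, 1])

    # Fraction of positions where the greedy (argmax) token agrees
    match = float(np.mean(ref_logits.argmax(axis=-1) == quant_logits.argmax(axis=-1)))

    return kl, corr, match
```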
### Reference

- HF Model: `google/gemma-3-4b-it-qat-int4-unquantized`
- Scaling: α=0.1875 (FP16 overflow prevention)
- Context: 4096 tokens
- Sliding Window: 1024
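The α value exists purely to keep activations inside FP16's representable range; a quick NumPy check with an illustrative activation magnitude:

```python
import numpy as np

ALPHA = 0.1875  # 3/16

print(np.finfo(np.float16).max)  # 65504.0 -> FP16 ceiling
x = np.float32(80_000.0)         # illustrative activation beyond FP16 range
print(np.float16(x))             # inf     -> overflow without scaling
print(np.float16(x * ALPHA))     # 15000.0 -> representable after scaling
```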
# ANEMLL

**ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).