anemll committed on
Commit 1376429 · verified · 1 Parent(s): 0449180

Update README.md


removed 4B version notes

Files changed (1)
  1. README.md +0 -43
README.md CHANGED
@@ -15,49 +15,6 @@ tags:
 
 ## Apple Neural Engine Optimized
 
- This model is converted from Google's Gemma3 4B QAT (Quantization-Aware Training) model for native execution on the Apple Neural Engine (ANE).
-
- ### FP16 Scaling
-
- ANE requires FP16 precision, but Gemma3's BF16-trained weights produce intermediate activations that overflow FP16's ±65,504 range, causing NaN/inf failures. We solve this with **weight scaling (α=0.1875)**:
-
- Embeddings pre-scaled by 0.1875 at conversion time
- LM head compensates with inverse scaling
- Zero runtime overhead
- Preserves 100% token match with the original BF16 model
-
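The scaling described above folds entirely into the weights: α into the embedding table and 1/α into the LM head, which is why it costs nothing at runtime while leaving the final logits unchanged. Below is a minimal PyTorch sketch of that fold; the module paths (`model.model.embed_tokens`, `model.lm_head`) follow common Hugging Face layouts and are assumptions, not ANEMLL's actual converter code.

```python
import torch

ALPHA = 0.1875  # scaling factor from the notes above

def apply_fp16_scaling(model, alpha: float = ALPHA):
    """Fold alpha into the embeddings and 1/alpha into the LM head.

    Hypothetical sketch: the attribute paths below follow common
    Hugging Face layouts, not ANEMLL's actual conversion pipeline.
    """
    with torch.no_grad():
        # Pre-scale embeddings once at conversion time so downstream
        # FP16 activations stay inside the +/-65,504 representable range.
        model.model.embed_tokens.weight.mul_(alpha)
        # Inverse-scale the LM head so the final logits are unchanged.
        # NOTE: if the embedding and LM-head weights are tied, untie
        # them first; otherwise the two in-place ops cancel out.
        model.lm_head.weight.div_(alpha)
    return model
```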
- ### Quantization
-
- Additional LUT (Lookup Table) quantization is applied to reduce model size:
- **FFN layers**: 4-bit LUT with per-channel group size 4
- **LM head**: 6-bit LUT with per-channel group size 4
-
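As a rough illustration of what LUT quantization does: each group of channels shares a small lookup table (16 entries for 4-bit, 64 for 6-bit) and every weight is snapped to its nearest entry. The NumPy toy below uses a uniform grid where production tools (e.g. coremltools palettization) fit the table with k-means, and the grouping axis is an assumption.

```python
import numpy as np

def lut_quantize(w: np.ndarray, n_bits: int = 4, group_size: int = 4) -> np.ndarray:
    """Toy LUT (palettization) pass: each group of `group_size` rows
    shares one 2**n_bits-entry table. A uniform grid stands in for the
    k-means codebook that production tools would fit."""
    n_levels = 2 ** n_bits
    out = np.empty_like(w)
    for start in range(0, w.shape[0], group_size):
        block = w[start:start + group_size]
        lut = np.linspace(block.min(), block.max(), n_levels)  # 16 entries for 4-bit
        idx = np.abs(block[..., None] - lut).argmin(axis=-1)   # nearest-entry index
        out[start:start + group_size] = lut[idx]
    return out

# Example: a 4-bit LUT leaves at most 16 distinct values per group of 4 rows.
w = np.random.randn(8, 64).astype(np.float32)
print(len(np.unique(lut_quantize(w)[:4])))  # <= 16
```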
- ### Quantization Results
-
- | Configuration | KL Divergence | Correlation | Match Rate | Notes |
- |--------------|---------------|-------------|------------|-------|
- | FP16 baseline (no LUT) | 0.0006 | 0.995 | 99.86% | Best quality |
- | **FFN LUT4,4 + LM LUT6,4** | **0.196** | **0.959** | **90%** | ***This model*** |
- | FFN LUT4,8 only | 0.284 | 0.971 | 87% | Larger size |
- | FFN LUT4,8 + LM LUT6,4 | 0.279 | 0.970 | 86% | - |
-
- ### Metric Guidelines
-
- | Metric | Healthy | Concerning |
- |--------|---------|------------|
- | KL Divergence | < 0.3 | > 0.5 |
- | Correlation | > 0.95 | < 0.90 |
- | Match Rate | > 85% | < 75% |
-
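The tables above can be reproduced, under conventional definitions (an assumption; the notes do not spell them out), as mean per-token KL divergence between reference and quantized logits, Pearson correlation of the raw logits, and top-1 token match rate. A PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def compare_logits(ref: torch.Tensor, test: torch.Tensor):
    """Compare [n_tokens, vocab] logits from the BF16 reference (ref)
    and the quantized model (test) with the three report metrics."""
    # Mean per-token KL(reference || quantized) over the vocabulary.
    kl = F.kl_div(F.log_softmax(test, dim=-1),
                  F.log_softmax(ref, dim=-1),
                  log_target=True, reduction="batchmean").item()
    # Pearson correlation of the raw logit values.
    corr = torch.corrcoef(torch.stack([ref.flatten(), test.flatten()]))[0, 1].item()
    # Fraction of positions where the top-1 token agrees.
    match = (ref.argmax(dim=-1) == test.argmax(dim=-1)).float().mean().item()
    return kl, corr, match
```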
- ### Reference
-
- HF Model: `google/gemma-3-4b-it-qat-int4-unquantized`
- Scaling: α=0.1875 (FP16 overflow prevention)
- Context: 4096 tokens
- Sliding Window: 1024
-
-
-
 **ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).
 
 The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.
 