## Apple Neural Engine Optimized

This model is converted from Google's Gemma3 4B QAT (Quantization-Aware Training) checkpoint for native execution on the Apple Neural Engine (ANE).

### FP16 Scaling

ANE requires FP16 precision, but Gemma3's BF16-trained weights produce intermediate activations that overflow FP16's ±65,504 range, causing NaN/inf failures. We solve this with **weight scaling (α=0.1875)**:
- Embeddings pre-scaled by 0.1875 at conversion time
- LM head compensates with inverse scaling
- Zero runtime overhead
- Preserves 100% token match with original BF16 model
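The scheme above can be sketched numerically. The shapes and the bare embedding-to-head path below are illustrative assumptions, not ANEMLL's actual conversion code:

```python
import numpy as np

ALPHA = 0.1875  # scaling factor from the section above

# Motivation: BF16 reaches values far beyond FP16's maximum of 65,504.
big = np.float32(70000.0)
assert np.isinf(np.float16(big))             # overflows FP16
assert np.isfinite(np.float16(big * ALPHA))  # scaled value fits

# Toy weights (shapes are illustrative).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4)).astype(np.float32)      # embedding table
lm_head = rng.normal(size=(4, 8)).astype(np.float32)  # output projection

# Conversion time: pre-scale the embeddings, fold the inverse into the LM head.
emb_fp16 = (emb * ALPHA).astype(np.float16)
head_fp16 = (lm_head / ALPHA).astype(np.float16)

# Runtime: no extra ops -- the scale cancels along the embedding->head path.
logits_ref = emb[3] @ lm_head
logits_fp16 = emb_fp16[3].astype(np.float32) @ head_fp16.astype(np.float32)
assert np.allclose(logits_ref, logits_fp16, rtol=2e-2, atol=2e-2)
```

Because the inverse factor is folded into the stored LM-head weights at conversion time, no scaling multiply runs at inference, which is why the overhead is zero.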

### Quantization

Additional LUT (Lookup Table) quantization is applied to reduce model size:
- **FFN layers**: 4-bit LUT with per-channel group size 4
- **LM head**: 6-bit LUT with per-channel group size 4
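As an illustration of the LUT idea, here is a toy per-group lookup-table quantizer. The uniform level placement and NumPy shapes are simplifying assumptions for brevity; production palettization tools typically fit the table with k-means rather than evenly spaced levels:

```python
import numpy as np

def lut_quantize(w: np.ndarray, bits: int = 4, group: int = 4):
    """Toy LUT quantizer: one 2**bits-entry table per group of output channels.

    Uniform level placement is a simplification of real LUT/palettization,
    which usually fits the table (e.g. with k-means).
    """
    levels = 2 ** bits
    deq = np.empty_like(w, dtype=np.float32)
    idx = np.empty(w.shape, dtype=np.uint8)  # stored indices: `bits` per weight
    tables = []
    for g in range(0, w.shape[0], group):
        block = w[g:g + group]
        lut = np.linspace(block.min(), block.max(), levels, dtype=np.float32)
        i = np.abs(block[..., None] - lut).argmin(axis=-1)  # nearest entry
        idx[g:g + group] = i
        deq[g:g + group] = lut[i]
        tables.append(lut)
    return deq, idx, tables

w = np.random.default_rng(1).normal(size=(8, 6)).astype(np.float32)
deq4, idx4, tabs4 = lut_quantize(w, bits=4, group=4)  # FFN-style: 4-bit LUT
deq6, idx6, tabs6 = lut_quantize(w, bits=6, group=4)  # LM-head-style: 6-bit LUT
```

A 4-bit table has only 16 entries per group while a 6-bit table has 64, which is consistent with spending more bits on the quality-sensitive LM head.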

### Quantization Results

| Configuration | KL Divergence | Correlation | Match Rate | Notes |
|---------------|---------------|-------------|------------|-------|
| FP16 baseline (no LUT) | 0.0006 | 0.995 | 99.86% | Best quality |
| **FFN LUT4,4 + LM LUT6,4** | **0.196** | **0.959** | **90%** | ***This model*** |
| FFN LUT4,8 only | 0.284 | 0.971 | 87% | Larger size |
| FFN LUT4,8 + LM LUT6,4 | 0.279 | 0.970 | 86% | - |

### Metric Guidelines

| Metric | Healthy | Concerning |
|--------|---------|------------|
| KL Divergence | < 0.3 | > 0.5 |
| Correlation | > 0.95 | < 0.90 |
| Match Rate | > 85% | < 75% |
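For concreteness, metrics of this kind can be computed along the following lines. The exact definitions behind the numbers above (per-token KL of reference vs. quantized next-token distributions, Pearson correlation of logits, top-1 match rate) are my assumptions, not ANEMLL's published evaluation code:

```python
import numpy as np

def eval_metrics(ref_logits: np.ndarray, test_logits: np.ndarray):
    """Compare two [tokens, vocab] logit arrays.

    Returns (mean KL divergence, Pearson correlation, top-1 match rate).
    Definitions are assumed and may differ from the project's evaluation.
    """
    def softmax(x):
        z = x - x.max(axis=-1, keepdims=True)  # stabilize exp
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    p, q = softmax(ref_logits), softmax(test_logits)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    corr = np.corrcoef(ref_logits.ravel(), test_logits.ravel())[0, 1]
    match = (ref_logits.argmax(-1) == test_logits.argmax(-1)).mean()
    return kl, corr, match

rng = np.random.default_rng(2)
ref = rng.normal(size=(5, 10))
# Identical logits give the ideal scores; a perturbation degrades them.
kl0, corr0, match0 = eval_metrics(ref, ref.copy())
kl1, corr1, match1 = eval_metrics(ref, ref + 0.01 * rng.normal(size=ref.shape))
```

A model whose metrics drift into the "Concerning" column is likely producing visibly different generations from the reference.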

### Reference

- HF Model: `google/gemma-3-4b-it-qat-int4-unquantized`
- Scaling: α=0.1875 (FP16 overflow prevention)
- Context: 4096 tokens
- Sliding Window: 1024
**ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).
The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.