---
language: en
license: apache-2.0
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
datasets:
- roneneldan/TinyStories
arxiv: 2602.07374
---

# TernaryLM-132M

TernaryLM-132M is a 132M-parameter Transformer language model trained natively with ternary weights {-1, 0, +1}.

Unlike post-training quantization methods, which quantize a full-precision model after the fact, this model learns its quantized representations during training.

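This card does not spell out the quantizer itself, so the following is a minimal sketch, assuming absmean ternary quantization in the style of BitNet b1.58: each weight is scaled by the tensor's mean absolute value, then rounded to the nearest value in {-1, 0, +1}.

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization sketch (BitNet b1.58 style).

    Scales weights by their mean absolute value, then rounds each to the
    nearest code in {-1, 0, +1}. The dequantized weight is code * scale.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


codes, scale = ternary_quantize([0.9, -0.05, 0.4, -1.2])
# codes == [1, 0, 1, -1]; scale ≈ 0.6375
```

In native ternary training, a straight-through estimator typically passes gradients through the non-differentiable rounding step, which is what lets a model learn ternary representations end to end rather than having them imposed afterwards.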
## Architecture

- Parameters: 132M
- Layers: 12
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Quantization: native ternary training

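A back-of-the-envelope count shows how these hyperparameters roughly add up to 132M. For a standard GPT-style block (4·d² attention projections plus a 4×-wide MLP contributing 8·d²), ignoring biases and LayerNorms, and with a hypothetical ~60k-token vocabulary (the card does not state the vocabulary size):

```python
def approx_param_count(n_layers, d_model, vocab_size, mlp_ratio=4):
    # Per block: Q, K, V, and output projections (4 * d^2) plus a
    # mlp_ratio-wide MLP (2 * mlp_ratio * d^2); biases/LayerNorm ignored.
    per_block = 4 * d_model**2 + 2 * mlp_ratio * d_model**2
    embeddings = vocab_size * d_model  # assumed tied with the output head
    return n_layers * per_block + embeddings


# With 12 layers, d_model=768, and an assumed 60k vocabulary:
total = approx_param_count(n_layers=12, d_model=768, vocab_size=60000)
# total == 131_014_656, i.e. close to the stated 132M
```

The vocabulary size and tying assumption here are illustrative, not taken from the card; the point is that the transformer blocks alone contribute about 85M parameters, with embeddings making up most of the remainder.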
## Training

- Dataset: TinyStories (~60k stories)
- Optimizer: AdamW (betas = (0.9, 0.98))
- Learning rate: 3e-4
- Scheduler: OneCycleLR
- Epochs: 15
- Hardware: multi-GPU T4 setup (Kaggle)

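OneCycleLR warms the learning rate up to the 3e-4 peak and then anneals it to far below the starting value. The card only names the scheduler, so the sketch below assumes PyTorch's defaults (`pct_start=0.3`, cosine annealing, `div_factor=25`, `final_div_factor=1e4`) to show the shape of the schedule:

```python
import math


def one_cycle_lr(step, total_steps, max_lr=3e-4, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Simplified one-cycle schedule shape (cosine anneal, PyTorch-style
    defaults assumed): warm up from max_lr/div_factor to max_lr, then
    anneal down to max_lr/div_factor/final_div_factor."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = int(pct_start * total_steps)
    if step < warmup_steps:
        t = step / max(1, warmup_steps)
        return initial_lr + (max_lr - initial_lr) * 0.5 * (1 - math.cos(math.pi * t))
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * t))
```

In practice this would be `torch.optim.lr_scheduler.OneCycleLR` wrapped around the AdamW optimizer; the pure-Python version above just makes the warmup/anneal arithmetic explicit.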
## Intended Use

Research on:

- Efficient Transformers
- Quantization-aware training
- Edge deployment

## Limitations

- Not instruction-tuned
- Limited dataset scale
- Research prototype

## Citation

```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374}
}
```