
Nanochat Moroccan Base 702M

A 702M-parameter nanochat base model pretrained for Moroccan Darija.

This is a base model, not an instruction-tuned assistant.

Model

  • Parameters: 701,893,188
  • Depth: 18
  • Sequence length: 2048
  • Embedding dim: 1152
  • Attention heads: 9
  • KV heads: 9
  • Window pattern: SSSL
  • Tokenizer vocab size: 32,768
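As a quick sanity check, the hyperparameters above are internally consistent — the embedding dimension splits evenly across the attention heads. A minimal sketch in Python (the `ModelConfig` name is just for illustration, not the nanochat class name):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Hyperparameters as listed on this card
    depth: int = 18
    seq_len: int = 2048
    n_embd: int = 1152
    n_head: int = 9
    n_kv_head: int = 9
    vocab_size: int = 32_768

    @property
    def head_dim(self) -> int:
        # Embedding dim must divide evenly across attention heads
        assert self.n_embd % self.n_head == 0
        return self.n_embd // self.n_head

cfg = ModelConfig()
print(cfg.head_dim)                  # 128
print(cfg.vocab_size * cfg.n_embd)   # token-embedding parameters: 37748736
```

With 9 KV heads equal to 9 query heads, this is plain multi-head attention rather than grouped-query attention.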

Training Data

Pretrained on Lyte/darija-pretraining-corpus with these subsets:

  • arabic_raw
  • bilingual
  • pure

The goal was Moroccan Darija pretraining, not English benchmark chasing.

Checkpoint Format

This repository stores the checkpoint in the original nanochat format, not a converted Transformers layout.

Files:

  • model_003248.pt
  • meta_003248.json
  • tokenizer/tokenizer.pkl
  • tokenizer/token_bytes.pt
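The metadata file is plain JSON and can be inspected with the standard library. A minimal sketch, using a stand-in file — the field names below are hypothetical, so check the real meta_003248.json to see what nanochat actually records:

```python
import json
import os
import tempfile

# Stand-in for meta_003248.json; only the step number (3248, from the
# checkpoint filenames) is taken from this card, the rest is hypothetical.
fake_meta = {"step": 3248, "model_config": {"n_layer": 18, "n_embd": 1152}}

path = os.path.join(tempfile.mkdtemp(), "meta_003248.json")
with open(path, "w") as f:
    json.dump(fake_meta, f)

# Inspect the metadata the same way you would the real file
with open(path) as f:
    meta = json.load(f)
print(meta["step"])  # 3248
```

The model weights themselves (model_003248.pt) are a PyTorch checkpoint and load with `torch.load`; the tokenizer ships as a pickle plus a token-bytes tensor rather than a Hugging Face tokenizer config.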

Final Training Metrics

  • Total training time: 51.83 minutes
  • Final validation BPB: 0.744182
  • Best validation BPB: 0.743422
  • Base eval train BPB: 0.598625
  • Base eval val BPB: 0.742597
  • CORE metric: 0.0593
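Bits-per-byte can be converted to nats-per-byte cross-entropy or per-byte perplexity if you want to compare against loss curves. A small sketch using the final validation figure from this card:

```python
import math

bpb = 0.744182  # final validation bits-per-byte from this card

nats_per_byte = bpb * math.log(2)  # cross-entropy in nats per byte
ppl_per_byte = 2 ** bpb            # per-byte perplexity

print(f"{nats_per_byte:.4f} nats/byte")           # ~0.5158
print(f"{ppl_per_byte:.4f} per-byte perplexity")  # ~1.6750
```

Note that BPB is normalized per byte, not per token, which makes it comparable across tokenizers — useful here since the vocabulary (32,768) is smaller than most English-oriented baselines.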

Base Eval

| Metric        | Score |
|---------------|-------|
| HellaSwag     | 29.56 |
| ARC Easy      | 29.17 |
| ARC Challenge | 21.33 |
| ARC Average   | 25.25 |
| PIQA          | 53.65 |
| CommonsenseQA | 33.25 |
| Winogrande    | 48.70 |
| OpenBookQA    | 22.40 |
| BoolQ         | 56.02 |
| COPA          | 59.00 |
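The ARC Average row is simply the mean of the two ARC splits, which is easy to verify:

```python
arc_easy, arc_challenge = 29.17, 21.33  # scores from the table above
arc_avg = (arc_easy + arc_challenge) / 2
print(round(arc_avg, 2))  # 25.25
```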

Benchmark Context

For rough context only, here is a comparison with a few small English-oriented base models that were trained for broad general benchmarks. Nanochat Moroccan Base 0.7B was trained for Moroccan Darija, so read this table as reference, not as a direct leaderboard claim.

| Metric | Nanochat-Moroccan-Base-0.7B | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|---|
| HellaSwag | 29.6 | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 25.3 | 53.0 | 45.4 | 50.1 |
| PIQA | 53.7 | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | - | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 33.3 | 38.0 | 31.6 | 35.3 |
| TriviaQA | - | 16.9 | 4.3 | 9.1 |
| Winogrande | 48.7 | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 22.4 | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | - | 3.2 | 33.4 | 1.6 |

Note: this model was not built to score well on English benchmarks; its target was Moroccan Darija base pretraining.

Limitations

  • This is a base pretrained model, not an instruction-tuned assistant.
  • It can generate inaccurate, repetitive, biased, or unsafe text.
  • English benchmark scores are secondary and should not be read as the goal of this model.
  • Real conversational quality should be judged after SFT.

Disclaimer

This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.

Credits

Built on top of karpathy/nanochat.

Training adaptation, dataset work, and release by Lyte.

Pretraining data: Lyte/darija-pretraining-corpus

Citation

If you use this model, please cite:

@misc{nanochat-moroccan-base-0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Base 0.7B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Base-0.7B}}
}