gpt2small-en-it-nanochat-lr2e4-batchmaxpossible-bs7-step9000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run stable-config-recipe-v3-gpt2small-lr2e4-batchmaxpossible-bs7.

What this is

model family: GPT-2-small-like decoder-only LM
parameters: ~136M
languages: English + Italian
context length: 2500 tokens
selected checkpoint: step_9000.pt
selection reason: lowest validation loss among the saved checkpoints, as recorded in best_validation.json

Best validation

step: 9000
validation loss: 4.0797094479
validation perplexity: 59.1282875069
validation batches: 128
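
The two headline numbers above are related by perplexity = exp(loss). The check below is a minimal sketch that assumes the validation loss is a mean per-token cross-entropy in nats; the reported values are consistent with that reading. The function name perplexity_from_loss is illustrative and not part of the NanoChat stack.

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


# Values copied from the "Best validation" section of this card.
reported_loss = 4.0797094479
reported_ppl = 59.1282875069

computed = perplexity_from_loss(reported_loss)
print(f"computed perplexity: {computed:.6f}")
print("matches reported value:", math.isclose(computed, reported_ppl, rel_tol=1e-6))
```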

Important caveat

A later checkpoint, step_10000.pt, exists, but its validation loss is higher than that of step_9000.pt. This release therefore publishes step_9000.pt rather than the latest saved checkpoint.
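
The selection rule described here (publish the saved checkpoint with the lowest validation loss, not the most recent one) can be sketched as follows. The record layout with "step" and "val_loss" keys is an assumption for illustration; the actual schema of best_validation.json and eval_metrics.jsonl is not documented in this card. The step-10000 loss used below is a placeholder, not the recorded value.

```python
from typing import Iterable, Mapping


def select_best_checkpoint(records: Iterable[Mapping]) -> Mapping:
    """Return the record with the lowest validation loss.

    Ties are broken in favour of the earlier step.
    """
    return min(records, key=lambda r: (r["val_loss"], r["step"]))


# Hypothetical records: the step-9000 loss is the value reported in this card;
# the step-10000 loss is a PLACEHOLDER chosen only to be larger.
records = [
    {"step": 9000, "val_loss": 4.0797094479},
    {"step": 10000, "val_loss": 4.10},  # placeholder, not the recorded number
]

best = select_best_checkpoint(records)
print(f"selected checkpoint: step_{best['step']}.pt")
```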

Training/data provenance

training config: training_config.yaml
tokenizer: tokenizer.json + tokenizer_meta.json
packed dataset root used by the run: /mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced
tokenizer root used by the run: /mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch

Included files

step_9000.pt
step_9000.safetensors
step_9000.safetensors.json
training_config.yaml
tokenizer.json
tokenizer_meta.json
best_validation.json
eval_summary.json
probe_step9000_summary.json
full run telemetry snapshots: eval_metrics.jsonl, metrics.jsonl, probe_generations.jsonl
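
To inspect step_9000.safetensors without the NanoChat stack or any third-party library, the file header can be read directly. A safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps each tensor name to its dtype, shape and data offsets. The sketch below uses only the Python standard library; the helper names are illustrative. The resulting count may differ from the ~136M figure if the run ties or shares weights, since shared tensors may be stored only once.

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        return json.loads(f.read(header_len).decode("utf-8"))


def count_parameters(header):
    """Sum element counts over all tensors, skipping the optional metadata entry."""
    return sum(
        prod(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    )


# Runs only if the checkpoint file is present in the working directory.
checkpoint = Path("step_9000.safetensors")
if checkpoint.exists():
    header = read_safetensors_header(checkpoint)
    print(f"tensors stored: {sum(1 for k in header if k != '__metadata__')}")
    print(f"parameters stored: {count_parameters(header):,}")
```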

Probe reading at step 9000

EN factual prompt "The capital of Italy is" -> "Rome": weak (rank=248)
EN simple continuation "A small language model should" -> "be": strong (rank=1)
IT factual prompt "La capitale d'Italia è" -> "Roma": weak (rank=1103)
IT simple continuation "Un piccolo modello linguistico dovrebbe" -> "essere": strong (rank=1)

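The model predicts simple continuations correctly in both languages (rank 1) but ranks the correct answer to factual prompts far from the top. This checkpoint is therefore useful as a real intermediate bilingual pretraining artifact, but it is not a polished factual model.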
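
For readers interpreting the rank values, one common definition is the 1-based position of the expected token when all vocabulary entries are sorted by next-token score, so rank=1 means the expected token is the model's top prediction. The sketch below assumes that definition; the probe script used by this run may compute rank differently. The function name and the toy scores are illustrative only.

```python
from typing import Sequence


def target_rank(logits: Sequence[float], target_id: int) -> int:
    """1-based rank of the target token: 1 + number of tokens scored strictly higher."""
    target_score = logits[target_id]
    return 1 + sum(1 for score in logits if score > target_score)


# Toy 5-token vocabulary with made-up scores, for illustration only.
toy_logits = [0.2, 3.1, 1.7, -0.5, 2.4]

print(target_rank(toy_logits, 1))  # → 1 (token 1 has the highest score)
print(target_rank(toy_logits, 2))  # → 3 (two tokens score higher than 1.7)
```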

Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.

Limitations

factual recall is still weak
generations can become repetitive
the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
dataset redistribution for the full training corpus may have separate licensing constraints; this repo contains model artifacts, not the raw/prepared training corpus
