gpt2small-en-it-nanochat-lr2e4-batchmaxpossible-bs7-step9000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run stable-config-recipe-v3-gpt2small-lr2e4-batchmaxpossible-bs7.

What this is

  • model family: GPT-2-small-like decoder-only LM
  • parameters: ~136M
  • languages: English + Italian
  • context length: 2500 tokens
  • selected checkpoint: step_9000.pt
  • selection reason: lowest recorded validation loss among saved checkpoints in best_validation.json

Best validation

  • step: 9000
  • validation loss: 4.0797094479
  • validation perplexity: 59.1282875069
  • validation batches: 128
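
The two headline numbers above are related by a simple identity: perplexity is the exponential of the loss. The sketch below checks that relationship, assuming the reported validation loss is a mean per-token cross-entropy in nats (the usual convention, but not stated explicitly in this card).

```python
import math

# Validation loss reported above, assumed to be mean cross-entropy in nats per token.
val_loss = 4.0797094479

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(val_loss)

print(f"perplexity = {perplexity:.4f}")
```

The printed value should agree closely with the validation perplexity listed above; a large mismatch would suggest the loss uses a different base or reduction.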

Important caveat

A later checkpoint, step_10000.pt, exists, but its validation loss is higher than that of step_9000.pt. This release therefore intentionally publishes step_9000.pt rather than the latest saved checkpoint.
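
The selection rule (lowest recorded validation loss, not latest step) can be sketched as follows. This is an illustration only: the field names `step` and `val_loss` are assumptions, and the actual schema of eval_metrics.jsonl and best_validation.json is not documented in this card.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_checkpoint(records):
    """Return the record with the lowest validation loss.

    The keys "step" and "val_loss" are hypothetical names chosen for this
    sketch; adjust them to whatever the telemetry files actually contain.
    """
    return min(records, key=lambda r: r["val_loss"])
```

With matching field names, `best_checkpoint(load_jsonl("eval_metrics.jsonl"))` would return the best evaluated step. Evaluations may also be logged at steps where no checkpoint was saved, so the records may need filtering to saved steps first.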

Training/data provenance

  • training config: training_config.yaml
  • tokenizer: tokenizer.json + tokenizer_meta.json
  • packed dataset root used by the run: /mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced
  • tokenizer root used by the run: /mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch

Included files

  • step_9000.pt
  • step_9000.safetensors
  • step_9000.safetensors.json
  • training_config.yaml
  • tokenizer.json
  • tokenizer_meta.json
  • best_validation.json
  • eval_summary.json
  • probe_step9000_summary.json
  • full run telemetry snapshots: eval_metrics.jsonl, metrics.jsonl, probe_generations.jsonl

Probe reading at step 9000

  • EN factual prompt "The capital of Italy is" -> "Rome": weak (rank=248)
  • EN simple continuation "A small language model should" -> "be": strong (rank=1)
  • IT factual prompt "La capitale d'Italia è" -> "Roma": weak (rank=1103)
  • IT simple continuation "Un piccolo modello linguistico dovrebbe" -> "essere": strong (rank=1)

So this checkpoint is useful as a real intermediate bilingual pretraining artifact, but it is not a polished factual model.
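
For readers unfamiliar with the rank figures in the probe list, the sketch below shows one common way to compute such a rank from next-token logits. It assumes rank means the 1-based position of the target token when all vocabulary tokens are sorted by score; the probe script's exact definition is not given in this card.

```python
def target_rank(logits, target_id):
    """Return the 1-based rank of target_id among next-token logits.

    A rank of 1 means the target is the model's top prediction.
    """
    target_logit = logits[target_id]
    # Count the tokens scored strictly higher than the target.
    return 1 + sum(1 for x in logits if x > target_logit)
```

Under this definition, rank=1 for "be" and "essere" means the expected token was the top prediction, while rank=248 and rank=1103 mean hundreds of other tokens in the vocabulary scored higher than the correct answer.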

Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.
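
Because the model needs the custom stack to run, a quick sanity check that needs no model code is to inspect the safetensors file directly. The sketch below uses only the standard library and relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names are hypothetical helpers written for this example, not part of NanoChat.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(info["shape"]) for info in header.values())
```

Calling `count_parameters(read_safetensors_header("step_9000.safetensors"))` should give a total in the neighborhood of the ~136M figure quoted above. The count can differ if the export stores tied weights once or includes non-parameter buffers.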

Limitations

  • factual recall is still weak
  • generations can become repetitive
  • the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
  • dataset redistribution for the full training corpus may have separate licensing constraints; this repo contains model artifacts, not the raw/prepared training corpus