TinyStories GPT-2 (2L, 8k vocab)

This model was trained with the tiny-lm repository on the TinyStories dataset (see paper: https://arxiv.org/abs/2305.07759).

Architecture

  • GPT-2 style decoder-only transformer
  • Layers: 2
  • Vocab size: 8192
  • Context length: 1024
  • d_model: 768
  • n_heads: 4
  • d_ff: 3072
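As a rough sanity check on these hyperparameters, a back-of-envelope parameter count can be sketched under standard GPT-2 assumptions (learned positional embeddings, biased linear layers, tied input/output embeddings). The checkpoint reports 23.3M parameters, so the actual implementation evidently differs from these assumptions in some details (e.g. an untied output head or extra tensors); treat this as an estimate only.

```python
# Back-of-envelope GPT-2 parameter count for this config. Assumes token +
# learned positional embeddings, pre-LN blocks with biased linears, and a
# tied output head; the real checkpoint (23.3M params) deviates somewhat.
V, L, d, ctx, d_ff = 8192, 2, 768, 1024, 3072

emb = V * d + ctx * d                      # token + positional embeddings
attn = d * 3 * d + 3 * d + d * d + d      # fused QKV + output projection
ffn = d * d_ff + d_ff + d_ff * d + d      # two FFN linears with biases
ln = 2 * (2 * d)                          # two LayerNorms (scale + bias)
per_layer = attn + ffn + ln

total_tied = emb + L * per_layer + 2 * d  # + final LayerNorm
print(f"{total_tied / 1e6:.1f}M")         # ≈ 21.3M with a tied head
```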

Files

  • model.safetensors: model weights (SafeTensors)
  • model_config.yaml: tiny-lm model config
  • tokenizer.pkl: tiktoken encoding
  • tokenizer_config.yaml: tokenizer settings (BOS/EOS)
  • checkpoint.ckpt: original Lightning checkpoint

Usage

This is a tiny-lm model and is not compatible with Hugging Face Transformers. Load it with tiny-lm:

import pickle
import torch
from safetensors.torch import load_file as load_safetensors
from tiny_lm.model.architectures.gpt2 import GPT2
from tiny_lm.model.config import GPT2Config

config = GPT2Config.from_yaml("model_config.yaml")
model = GPT2(
    vocab_size=config.vocab_size,
    d_model=config.d_model,
    n_layers=config.n_layers,
    n_heads=config.n_heads,
    d_ff=config.d_ff,
    context_length=config.context_length,
    emb_dropout=0.0,  # dropout disabled for inference
    attn_dropout=0.0,
    resid_dropout=0.0,
    ffn_dropout=0.0,
)

state = load_safetensors("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
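With the model and tokenizer loaded, text generation can be sketched as a simple greedy loop. The forward signature assumed below (token ids in, logits of shape [batch, seq, vocab] out) is an assumption about tiny-lm's GPT2, not a documented API; adjust to the actual interface if it differs.

```python
# Greedy-decoding sketch. Assumes `model` maps a LongTensor of token ids
# [batch, seq] to logits [batch, seq, vocab]; verify against tiny-lm's API.
import torch

@torch.no_grad()
def greedy_generate(model, token_ids, max_new_tokens=50,
                    context_length=1024, eos_id=None):
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        # Crop to the context window and run a forward pass
        x = torch.tensor([ids[-context_length:]], dtype=torch.long)
        logits = model(x)
        # Pick the most likely next token from the last position
        next_id = int(torch.argmax(logits[0, -1]))
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

Usage with the loaded model and the tiktoken encoding (`encode`/`decode` are the standard tiktoken Encoding methods):

    prompt_ids = tokenizer.encode("Once upon a time")
    out_ids = greedy_generate(model, prompt_ids)
    print(tokenizer.decode(out_ids))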
Model size

  • Parameters: 23.3M
  • Tensor types: BF16, BOOL

Dataset and paper

  • Model: ferjorosa/tiny-lm-tinystories-8k-gpt2-2l
  • Dataset: TinyStories
  • Paper: https://arxiv.org/abs/2305.07759