TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper: [2305.07759](https://arxiv.org/abs/2305.07759)
This model was trained with the tiny-lm repository on the TinyStories dataset (see paper: https://arxiv.org/abs/2305.07759).
Files:

- `model.safetensors`: model weights (SafeTensors)
- `model_config.yaml`: tiny-lm model config
- `tokenizer.pkl`: tiktoken encoding
- `tokenizer_config.yaml`: tokenizer settings (BOS/EOS)
- `checkpoint.ckpt`: original Lightning checkpoint

This is a tiny-lm model (not Transformers-compatible). Load it with tiny-lm:
```python
import pickle

import torch
from safetensors.torch import load_file as load_safetensors
from tiny_lm.model.architectures.gpt2 import GPT2
from tiny_lm.model.config import GPT2Config

# Rebuild the architecture from the saved config (dropout disabled for inference).
config = GPT2Config.from_yaml("model_config.yaml")
model = GPT2(
    vocab_size=config.vocab_size,
    d_model=config.d_model,
    n_layers=config.n_layers,
    n_heads=config.n_heads,
    d_ff=config.d_ff,
    context_length=config.context_length,
    emb_dropout=0.0,
    attn_dropout=0.0,
    resid_dropout=0.0,
    ffn_dropout=0.0,
)

# Load the trained weights and switch to inference mode.
state = load_safetensors("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

# The tokenizer is a pickled tiktoken encoding.
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
```
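Once loaded, the model can be used for text generation. The helper below is a minimal greedy-decoding sketch, not part of tiny-lm itself: it assumes the model's forward pass takes a `(batch, seq)` tensor of token ids and returns `(batch, seq, vocab)` logits, which is the usual GPT-2 convention.

```python
import torch

# Hypothetical helper (not a tiny-lm API): greedy decoding with a sliding
# window so the input never exceeds the model's context length.
@torch.no_grad()
def greedy_generate(model, token_ids, max_new_tokens, context_length, eos_id=None):
    ids = torch.tensor([token_ids], dtype=torch.long)
    for _ in range(max_new_tokens):
        # Feed at most the last `context_length` tokens.
        logits = model(ids[:, -context_length:])
        # Pick the highest-probability token at the final position.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids[0].tolist()
```

With the objects from the snippet above, usage would look like `tokenizer.decode(greedy_generate(model, tokenizer.encode("Once upon a time"), 64, config.context_length))`, since a tiktoken encoding exposes `encode`/`decode`. For less repetitive stories, temperature or top-k sampling can be swapped in place of the `argmax`.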