TinyStories GPT-2 (2L, 8k vocab)

This model was trained with the tiny-lm repository on the TinyStories dataset (see paper: https://arxiv.org/abs/2305.07759).

Architecture

  • GPT-2 style decoder-only transformer
  • Layers: 2
  • Vocab size: 8192
  • Context length: 1024
  • d_model: 768
  • n_heads: 4
  • d_ff: 3072
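As a rough sanity check on these hyperparameters, a back-of-envelope parameter count can be sketched under standard GPT-2 assumptions (learned positional embeddings, biased linear layers, tied input/output embeddings). The checkpoint reports 23.3M parameters, so the actual implementation evidently differs from these assumptions in some details (e.g. an untied output head or extra tensors); treat this as an estimate only.

```python
# Back-of-envelope GPT-2 parameter count for this config. Assumes token +
# learned positional embeddings, pre-LN blocks with biased linears, and a
# tied output head; the real checkpoint (23.3M params) deviates somewhat.
V, L, d, ctx, d_ff = 8192, 2, 768, 1024, 3072

emb = V * d + ctx * d                      # token + positional embeddings
attn = d * 3 * d + 3 * d + d * d + d      # fused QKV + output projection
ffn = d * d_ff + d_ff + d_ff * d + d      # two FFN linears with biases
ln = 2 * (2 * d)                          # two LayerNorms (scale + bias)
per_layer = attn + ffn + ln

total_tied = emb + L * per_layer + 2 * d  # + final LayerNorm
print(f"{total_tied / 1e6:.1f}M")         # ≈ 21.3M with a tied head
```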

Files

  • model.safetensors: model weights (SafeTensors)
  • model_config.yaml: tiny-lm model config
  • tokenizer.pkl: tiktoken encoding
  • tokenizer_config.yaml: tokenizer settings (BOS/EOS)
  • checkpoint.ckpt: original Lightning checkpoint

Usage

This is a tiny-lm model and is not compatible with Hugging Face Transformers. Load it with tiny-lm:

import pickle
import torch
from safetensors.torch import load_file as load_safetensors
from tiny_lm.model.architectures.gpt2 import GPT2
from tiny_lm.model.config import GPT2Config

config = GPT2Config.from_yaml("model_config.yaml")
model = GPT2(
    vocab_size=config.vocab_size,
    d_model=config.d_model,
    n_layers=config.n_layers,
    n_heads=config.n_heads,
    d_ff=config.d_ff,
    context_length=config.context_length,
    emb_dropout=0.0,  # dropout disabled for inference
    attn_dropout=0.0,
    resid_dropout=0.0,
    ffn_dropout=0.0,
)

state = load_safetensors("model.safetensors")
model.load_state_dict(state, strict=True)
model.eval()

with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
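With the model and tokenizer loaded, text generation can be sketched as a simple greedy loop. The forward signature assumed below (token ids in, logits of shape [batch, seq, vocab] out) is an assumption about tiny-lm's GPT2, not a documented API; adjust to the actual interface if it differs.

```python
# Greedy-decoding sketch. Assumes `model` maps a LongTensor of token ids
# [batch, seq] to logits [batch, seq, vocab]; verify against tiny-lm's API.
import torch

@torch.no_grad()
def greedy_generate(model, token_ids, max_new_tokens=50,
                    context_length=1024, eos_id=None):
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        # Crop to the context window and run a forward pass
        x = torch.tensor([ids[-context_length:]], dtype=torch.long)
        logits = model(x)
        # Pick the most likely next token from the last position
        next_id = int(torch.argmax(logits[0, -1]))
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

Usage with the loaded model and the tiktoken encoding (`encode`/`decode` are the standard tiktoken Encoding methods):

    prompt_ids = tokenizer.encode("Once upon a time")
    out_ids = greedy_generate(model, prompt_ids)
    print(tokenizer.decode(out_ids))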
Model size

  • Parameters: 23.3M
  • Tensor types: BF16, BOOL

Dataset and paper

  • Model: ferjorosa/tiny-lm-tinystories-8k-gpt2-2l
  • Dataset: TinyStories
  • Paper: https://arxiv.org/abs/2305.07759