GPT Language Model

A 124M-parameter GPT model trained from scratch in PyTorch.

This project contains:

  • custom multi-head self-attention
  • transformer blocks
  • causal masking
  • autoregressive text generation
  • mixed precision training
  • top-k / top-p sampling
  • safetensors model weights

The model was trained on a subset of FineWeb-Edu using a GPT-2 tokenizer.


Architecture

Model configuration:

{
    "vocab_size": 50257,
    "context_length": 256,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False
}

Approximate parameter count:

  • ~124M parameters
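
The figure above can be checked by hand from the configuration. The sketch below assumes the standard GPT-2 layout, which model.py is not shown to confirm: a 4x MLP expansion, biases on the attention output projection and MLP layers, LayerNorm with scale and shift, and an output head that shares its weights with the token embedding. The function name count_gpt_params is hypothetical.

```python
def count_gpt_params(cfg, tie_weights=True):
    """Estimate the parameter count of a GPT-2 style model from its config."""
    d = cfg["emb_dim"]
    vocab = cfg["vocab_size"]
    ctx = cfg["context_length"]

    tok_emb = vocab * d            # token embedding table
    pos_emb = ctx * d              # learned positional embeddings

    # Per-block parameters (assumed layout, see the lead-in).
    qkv = 3 * d * d + (3 * d if cfg["qkv_bias"] else 0)
    attn_out = d * d + d                         # output projection + bias
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # up and down projections
    norms = 2 * (2 * d)                          # two LayerNorms, scale + shift
    block = qkv + attn_out + mlp + norms

    final_norm = 2 * d
    head = 0 if tie_weights else d * vocab       # untied head adds vocab * d

    return tok_emb + pos_emb + cfg["n_layers"] * block + final_norm + head


cfg = {
    "vocab_size": 50257,
    "context_length": 256,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

print(count_gpt_params(cfg, tie_weights=True))   # 123822336
print(count_gpt_params(cfg, tie_weights=False))  # 162419712
```

Under these assumptions the tied count is about 124M, while an untied output head would bring the total to about 162M.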

Architecture components:

  • token embeddings
  • positional embeddings
  • masked multi-head self-attention
  • feed-forward MLP blocks
  • pre-layer normalization
  • residual connections
  • causal language modeling head
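
The components listed above fit together roughly as in the following sketch of a single pre-layer-norm block with causally masked multi-head attention. It is not the code from model.py: the class names CausalSelfAttention and TransformerBlock, the fused QKV projection, the GELU activation, and the 4x MLP width are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention where each position only sees earlier ones."""

    def __init__(self, emb_dim, n_heads, context_length, drop_rate, qkv_bias=False):
        super().__init__()
        assert emb_dim % n_heads == 0, "emb_dim must be divisible by n_heads"
        self.n_heads = n_heads
        self.head_dim = emb_dim // n_heads
        self.qkv = nn.Linear(emb_dim, 3 * emb_dim, bias=qkv_bias)
        self.proj = nn.Linear(emb_dim, emb_dim)
        self.dropout = nn.Dropout(drop_rate)
        # True above the diagonal marks the "future" positions to hide.
        mask = torch.triu(
            torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1
        )
        self.register_buffer("mask", mask, persistent=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, t, d) -> (b, n_heads, t, head_dim)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))

        out = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


class TransformerBlock(nn.Module):
    """Pre-LN block: normalize, apply sublayer, then add the residual."""

    def __init__(self, cfg):
        super().__init__()
        d = cfg["emb_dim"]
        self.norm1 = nn.LayerNorm(d)
        self.attn = CausalSelfAttention(
            d, cfg["n_heads"], cfg["context_length"], cfg["drop_rate"], cfg["qkv_bias"]
        )
        self.norm2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.drop = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        x = x + self.drop(self.attn(self.norm1(x)))
        x = x + self.drop(self.mlp(self.norm2(x)))
        return x
```

Because the mask hides future positions, changing the last token of an input should leave the outputs at all earlier positions unchanged, which is a quick way to sanity-check any causal attention implementation.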

Training

Training setup:

  • PyTorch
  • AdamW optimizer
  • Automatic Mixed Precision (AMP)
  • Gradient clipping
  • Top-k / Top-p text generation
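
As a rough illustration of how AdamW, AMP, and gradient clipping combine in one optimization step, here is a minimal sketch. It is not the project's training loop: the function names train_step and make_scaler, the float16 autocast dtype, and the max_norm default are assumptions. Newer PyTorch releases prefer torch.amp.GradScaler over torch.cuda.amp.GradScaler, which may emit a deprecation warning.

```python
import torch
import torch.nn.functional as F


def make_scaler(device):
    """Loss scaler for float16 training; a no-op when not running on CUDA."""
    return torch.cuda.amp.GradScaler(enabled=(device == "cuda"))


def train_step(model, inputs, targets, optimizer, scaler, device, max_norm=1.0):
    """One step: autocast forward, scaled backward, clip, optimizer update."""
    use_amp = device == "cuda"
    amp_dtype = torch.float16 if use_amp else torch.bfloat16

    model.train()
    optimizer.zero_grad(set_to_none=True)

    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
        logits = model(inputs)  # (batch, seq, vocab)
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

    scaler.scale(loss).backward()
    # Unscale first so the clipping threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

The optimizer would be created once outside the loop, for example with torch.optim.AdamW(model.parameters(), lr=...); the learning rate and weight decay used for this model are not stated here.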

Hardware used:

  • RTX 3060 Ti 8GB

Dataset:

  • FineWeb-Edu subset (100M tokens)

Tokenizer:

  • GPT-2 tokenizer
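
The data pipeline is not described in detail. A common approach for causal language modeling, sketched here as an assumption rather than the method actually used, is to cut the tokenized stream into fixed-length windows whose targets are the inputs shifted by one token. The function name make_windows and the stride parameter are hypothetical.

```python
def make_windows(token_ids, context_length, stride=None):
    """Split a token stream into (inputs, targets) pairs for next-token prediction."""
    stride = stride or context_length  # non-overlapping windows by default
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start : start + context_length]
        # Targets are the same window shifted one position to the right.
        targets = token_ids[start + 1 : start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Toy example with a context length of 4 instead of 256.
for inputs, targets in make_windows(list(range(10)), context_length=4):
    print(inputs, targets)
# [0, 1, 2, 3] [1, 2, 3, 4]
# [4, 5, 6, 7] [5, 6, 7, 8]
```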

Training Progress

The graph below shows train/validation loss progression during training on FineWeb-Edu.

[Figure: Training Loss Curve]

Installation

Install dependencies:

pip install torch transformers safetensors

Loading the Model

import json
import torch

from safetensors.torch import load_file
from transformers import AutoTokenizer

from model import GPTModel

# load config
with open("config.json") as f:
    cfg = json.load(f)

# create model
model = GPTModel(cfg)

# load weights
state_dict = load_file("model.safetensors")

model.load_state_dict(state_dict)

model.eval()

# load the GPT-2 tokenizer files from the current directory
tokenizer = AutoTokenizer.from_pretrained(".")

Text Generation Example

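from model import generate_and_print_sample

# Keep the model and the inputs on the same device; fall back to CPU without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
print(generate_and_print_sample(model, tokenizer, device, "The world is big"))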
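
The internals of generate_and_print_sample are not shown here. For readers curious how top-k and top-p (nucleus) filtering are typically applied to a single step of autoregressive generation, the following is a minimal sketch; the function name sample_next_token and its parameters are hypothetical and may differ from what model.py does.

```python
import torch


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample one token id from a 1-D logits tensor of shape (vocab_size,)."""
    logits = logits / temperature

    if top_k is not None:
        # Keep only the k highest-scoring tokens.
        kth_value = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))

    if top_p is not None:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cumulative = torch.cumsum(probs, dim=-1)
        # Drop a token once the mass *before* it already exceeds top_p,
        # so the most likely token is always kept.
        remove = (cumulative - probs) > top_p
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        # Put the filtered logits back in their original vocabulary order.
        logits = torch.full_like(logits, float("-inf")).scatter(
            0, sorted_idx, sorted_logits
        )

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

In a generation loop this would be called on the logits of the last position, with the sampled id appended to the context before the next forward pass.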

Sample Generations

Example generation from early-stage training:

"The world is big. If you are a scientist, you can’t have to worry about how much the science is going on in your area. What I want is that there is no way to know when it comes to the idea of what the world is doing with our society? We need to understand that this means that we need to be able to take action against the problem and find out which ones will be exposed to the issue, rather than where it has been done. If you have any questions or comments, you might not have heard from your doctor. You may have heard from your doctor for more information about the topic. The best way to do so is to use a lot of information. For example, if you don’t like a doctor, you should be able to tell you how much you will have at home and why you would like to talk to someone else who has never visited them. In order to make sure that they are safe, you can also get the right answer. What kind of questions will I like to ask? Please refer to the following link: - What kind of questions will I need to help me determine if my child has had cancer? - How will I respond to treatment? Will my child receive the same chemotherapy?"

The model currently demonstrates:

  • Coherent paragraph generation
  • Long-form text continuation
  • Scientific and educational writing style
  • Basic topic consistency across multiple sentences
  • Emergent reasoning and abstraction patterns
  • Generation of novel names and phrases
  • Structured article-like prose
  • Stable grammar and syntax generation

Current limitations:

  • Factual inaccuracies
  • Semantic repetition
  • Weak instruction following
  • Limited reasoning depth
  • Hallucinated entities and concepts

Files

model.py              # GPT architecture
model.safetensors     # trained weights
config.json           # model configuration
tokenizer files       # GPT-2 tokenizer assets
README.md             # project documentation

Notes

This is a custom PyTorch implementation and is not directly compatible with Hugging Face AutoModelForCausalLM.

Users should load the model using the provided model.py architecture.


License

MIT License.
