# chess-v13-macher
Chess-playing language model for the Chess 1M Challenge, with 995,466 parameters (under the 1M limit).
## Approach
The model reuses the example solution's GPT-2 architecture unchanged. The key contribution is a square-pair tokenization that compresses the vocabulary from ~1200 tokens down to 73, freeing parameter budget for a deeper network.
Each move is encoded as [from_square, to_square, separator] (with an extra promotion token when needed), stripping piece identity and annotations (captures, checks, castling markers), all of which are implicit in the move sequence. The vocabulary is:
- 64 square tokens (a1-h8)
- 4 promotion pieces (q, r, b, n)
- 1 move separator (newline)
- 4 special tokens (PAD, BOS, EOS, UNK)
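The vocabulary above can be sketched as a minimal encoder. This is an illustrative reconstruction, not the repository's `tokenizer.py`; the token names and ordering are assumptions.

```python
# Hypothetical sketch of the square-pair tokenizer (names and id order are illustrative).
SQUARES = [f + r for r in "12345678" for f in "abcdefgh"]   # a1 .. h8 (64 tokens)
PROMOS = ["q", "r", "b", "n"]                               # promotion pieces
SEP = "\n"                                                  # move separator
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]             # 4 special tokens
VOCAB = SPECIALS + SQUARES + PROMOS + [SEP]                 # 4 + 64 + 4 + 1 = 73
TOK = {t: i for i, t in enumerate(VOCAB)}

def encode_move(uci: str) -> list[int]:
    """Encode one UCI move, e.g. 'e2e4' or 'e7e8q', as token ids."""
    ids = [TOK[uci[0:2]], TOK[uci[2:4]]]   # from-square, to-square
    if len(uci) == 5:                      # optional promotion piece
        ids.append(TOK[uci[4]])
    ids.append(TOK[SEP])                   # move separator (newline)
    return ids
```

A plain move thus costs 3 tokens and a promotion 4, so the 180-token context holds roughly 60 plies.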
This tiny vocabulary allows investing ~97% of the parameter budget into transformer layers:
| | Example solution | This model |
|---|---|---|
| vocab_size | 1200 | 73 |
| n_layer | 6 | 9 |
| n_embd | 128 | 112 |
| n_inner | 384 | 250 |
| n_ctx | 256 | 180 |
| Embedding params | ~186K | ~28K |
| Layer params | ~720K | ~968K |
| Total | ~906K | ~996K |
## Results
### Official evaluation (full games, deterministic)
| Metric | Score |
|---|---|
| Legal rate (1st try) | 100% (520/520) |
| Legal rate (retry) | 100% (520/520) |
All 20 games end in draws by repetition with perfect legality.
### Extended evaluation
| Mode | 1st try | With retry |
|---|---|---|
| Diverse positions (191) | 86.4% | 95.3% |
| Both colors (White) | 91.3% | 95.5% |
| Both colors (Black) | 89.3% | 95.5% |
For comparison, OussamaleZ (#1 on the leaderboard, also 100% on official eval) scores 89.0% on diverse positions and 78.6%/81.6% as White/Black on the extended evaluation.
### Move separator experiment
Training with identical hyperparameters but different separator tokens (newline vs space) yields surprisingly different behavior:
| | newline | space |
|---|---|---|
| Full games (1st try) | 100% | 80.9% |
| Full games (retry) | 100% | 93.1% |
| Diverse positions (retry) | 90.6% | 93.7% |
The newline model plays conservatively (all official games drawn), while the space model generalizes better to diverse positions but produces illegal moves in most full games. This model uses the newline separator.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("macher/chess-v13-macher", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("macher/chess-v13-macher", trust_remote_code=True)
```
## Training
- Dataset: dlouapre/lichess_2025-01_1M (1M Lichess games)
- Epochs: 20
- Learning rate: 3e-4 (cosine schedule, 5% warmup)
- Batch size: 64
- Optimizer: AdamW (weight decay 0.01)
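The stated schedule (peak 3e-4, cosine decay, 5% linear warmup) can be written out explicitly. The function below is a generic sketch of that schedule, not the training script itself; the step counts passed in are illustrative.

```python
import math

def lr_at(step: int, total_steps: int,
          peak: float = 3e-4, warmup_frac: float = 0.05) -> float:
    """Cosine learning-rate schedule with linear warmup over the first 5% of steps."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        return peak * step / warmup            # linear ramp from 0 to peak
    progress = (step - warmup) / (total_steps - warmup)
    return peak * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay to 0
```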
## Files
- `model.py` – Architecture (detailed docstring with experiment notes)
- `tokenizer.py` – Square-pair tokenizer
- `eval_extended.py` – Extended evaluation script (diverse positions, full games, both colors)
- `eval_extended_v13.json` – Evaluation results