chess-v13-macher

Chess-playing language model for the Chess 1M Challenge, with 995,466 parameters (under the 1M limit).

Approach

The model reuses the example solution's GPT-2 architecture unchanged. The key contribution is a square-pair tokenization that compresses the vocabulary from ~1200 tokens down to 73, freeing parameter budget for a deeper network.

Each move is encoded as [from_square, to_square, separator] (with a promotion-piece token appended when applicable), stripping piece identity and annotations (captures, checks, castling markers), which are implicit in the move sequence. The vocabulary is:

  • 64 square tokens (a1-h8)
  • 4 promotion pieces (q, r, b, n)
  • 1 move separator (newline)
  • 4 special tokens (PAD, BOS, EOS, UNK)
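The scheme can be sketched as follows. This is a hypothetical reimplementation for illustration; the repo's tokenizer.py may order tokens and name specials differently:

```python
# Square-pair tokenizer sketch (73-token vocabulary, as described above).
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]            # 4 special tokens
SQUARES = [f + r for r in "12345678" for f in "abcdefgh"]  # 64 squares, a1..h8
PROMOS = list("qrbn")                                      # 4 promotion pieces
SEP = "\n"                                                 # 1 move separator

VOCAB = SPECIALS + SQUARES + PROMOS + [SEP]                # 73 tokens total
TOKEN_TO_ID = {t: i for i, t in enumerate(VOCAB)}

def encode_uci(move: str) -> list[int]:
    """Encode one UCI move, e.g. 'e2e4' or 'e7e8q', as token ids."""
    ids = [TOKEN_TO_ID[move[:2]], TOKEN_TO_ID[move[2:4]]]
    if len(move) == 5:                                     # promotion, e.g. e7e8q
        ids.append(TOKEN_TO_ID[move[4]])
    ids.append(TOKEN_TO_ID[SEP])
    return ids
```

A plain move thus costs three tokens and a promotion four, regardless of piece or capture.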

This tiny vocabulary allows investing ~97% of the parameter budget into transformer layers:

                     Example solution   This model
  vocab_size         1200               73
  n_layer            6                  9
  n_embd             128                112
  n_inner            384                250
  n_ctx              256                180
  Embedding params   ~186K              ~28K
  Layer params       ~720K              ~968K
  Total              ~906K              ~996K
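The totals can be checked with a back-of-envelope GPT-2 parameter count (assuming the standard architecture with the LM head tied to the token embedding, so it adds no parameters):

```python
# Back-of-envelope parameter count for the chosen config.
vocab, n_ctx, d, inner, layers = 73, 180, 112, 250, 9

emb = vocab * d + n_ctx * d                # token + position embeddings (~28K)
per_layer = (
    d * 3 * d + 3 * d                      # attention QKV projection
    + d * d + d                            # attention output projection
    + d * inner + inner                    # MLP up-projection
    + inner * d + d                        # MLP down-projection
    + 4 * d                                # two LayerNorms (weight + bias)
)
total = emb + layers * per_layer + 2 * d   # plus final LayerNorm
print(total)  # prints 995466
```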

Results

Official evaluation (full games, deterministic)

  Metric                 Score
  Legal rate (1st try)   100% (520/520)
  Legal rate (retry)     100% (520/520)

All 20 games end in draws by repetition with perfect legality.

Extended evaluation

  Mode                        1st try   With retry
  Diverse positions (191)     86.4%     95.3%
  Both colors, as White       91.3%     95.5%
  Both colors, as Black       89.3%     95.5%

For comparison, OussamaleZ (#1 on the leaderboard, also 100% on official eval) scores 89.0% on diverse positions and 78.6%/81.6% as White/Black on the extended evaluation.
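One way such a retry loop can work is sketched below. This is a hypothetical illustration, not the logic of eval_extended.py, and the `generate`/`legal_moves` parameters are stand-ins for the model call and a chess library's legal-move list:

```python
import random

def pick_move(generate, legal_moves, retries=1):
    """Sample a move; on an illegal prediction, resample up to
    `retries` times, then fall back to a random legal move."""
    for _ in range(1 + retries):
        move = generate()
        if move in legal_moves:
            return move, True               # model produced a legal move
    return random.choice(legal_moves), False  # forced fallback
```

Under this scheme, "1st try" corresponds to retries=0 and "with retry" to retries=1.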

Move separator experiment

Training with identical hyperparameters but different separator tokens (newline vs space) yields surprisingly different behavior:

                              newline   space
  Full games (1st try)        100%      80.9%
  Full games (retry)          100%      93.1%
  Diverse positions (retry)   90.6%     93.7%

The newline model plays conservatively (all draws), while the space model generalizes better to diverse positions but loses most full games. This model uses newline.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("macher/chess-v13-macher", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("macher/chess-v13-macher", trust_remote_code=True)

Training

  • Dataset: dlouapre/lichess_2025-01_1M (1M Lichess games)
  • Epochs: 20
  • Learning rate: 3e-4 (cosine schedule, 5% warmup)
  • Batch size: 64
  • Optimizer: AdamW (weight decay 0.01)
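The learning-rate schedule above can be written out explicitly. This is an illustrative reimplementation; training presumably used a standard scheduler from the training framework:

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_frac=0.05):
    """Linear warmup over the first 5% of steps, then cosine decay to 0."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```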

Files

  • model.py — Architecture (detailed docstring with experiment notes)
  • tokenizer.py โ€” Square-pair tokenizer
  • eval_extended.py โ€” Extended evaluation script (diverse positions, full games, both colors)
  • eval_extended_v13.json โ€” Evaluation results

Submitted by

macher
