Open SLM Leaderboard

A leaderboard for language models with fewer than 150M parameters, evaluated with the LM-eval harness or with a custom benchmark script, Arithmark-2.0.

Leaderboard

Zero-shot evaluation. Higher is better for all columns. Click any header to sort.

# | Model | Params | Avg | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2

Scores

Top scores for the active size and benchmark filters.


Top Avg Scores

Efficiency

Average score versus parameter count (log scale). The shaded zone marks the region above the regression line.

Avg Score vs Log Parameters

Org Leaderboard

Organizations are ranked by how many standard deviations their models sit above or below the score-vs-size fit line, averaged across each organization's models.

# | Organization | Models | Fit Std Devs | Mean Avg | Best Model | vs Fit
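One possible reading of the "Fit Std Devs" column is sketched below. It assumes each model's residual (average score minus the fit-line value) is divided by the population standard deviation of all residuals, and the resulting values are averaged per organization. The exact normalization used by the leaderboard is not stated here, so treat this as an assumption. The organization names, residual values, and the `org_fit_std_devs` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (organization, model, residual) rows for illustration only.
# residual = avg score minus the value of the score-vs-size fit line.
residual_rows = [
    ("org-x", "model-a", 1.5),
    ("org-x", "model-b", 0.5),
    ("org-y", "model-c", -2.0),
    ("org-z", "model-d", 0.0),
]

def org_fit_std_devs(residual_rows):
    """Mean standardized residual per organization (assumed definition)."""
    sigma = pstdev([r for _, _, r in residual_rows])
    if sigma == 0:
        raise ValueError("all residuals are identical; cannot standardize")
    by_org = defaultdict(list)
    for org, _, r in residual_rows:
        by_org[org].append(r / sigma)
    return {org: mean(zs) for org, zs in by_org.items()}

scores = org_fit_std_devs(residual_rows)
# Rank organizations from furthest above the fit line to furthest below.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Averaging per organization means a single strong model does not dominate an organization that also has several weaker ones.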

Add your model

Open a PR on this Space with your model's results for the benchmarks listed above. Our team will independently verify the results before merging your PR. Your model must be open weights to qualify.