Open SLM Leaderboard

A leaderboard for language models with fewer than 150M parameters, evaluated with the LM-eval harness or with a custom benchmark script, available here: Arithmark-2.0.

Leaderboard

Zero-shot evaluation. Higher is better for all columns. Click any header to sort.

#	Model	Params	Avg	HellaSwag	ARC-Easy	ARC-Challenge	PIQA	ArithMark-2

Scores

Top scores for the active size and benchmark filters.

Top Avg Scores

Efficiency

Score vs parameter count (log scale). The shaded zone is the region above the regression line.

Avg Score vs Log Parameters
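As a rough illustration of the efficiency view, the sketch below fits an ordinary least-squares line to average score against log10 of parameter count and lists the models that land above it. The model names, parameter counts, and scores are made up for the example, and the use of a base-10 log with plain least squares is an assumption; the leaderboard's actual fitting code is not shown on this page.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical entries: name -> (parameter count, average score).
models = {
    "tiny-a": (10_000_000, 30.0),
    "tiny-b": (50_000_000, 36.0),
    "tiny-c": (100_000_000, 35.0),
    "tiny-d": (140_000_000, 40.0),
}

# Assumption: the x-axis is log10 of the parameter count.
log_params = [math.log10(p) for p, _ in models.values()]
scores = [s for _, s in models.values()]
slope, intercept = fit_line(log_params, scores)

# Residual = actual score minus the score the fit line predicts for that size.
residuals = {
    name: score - (slope * math.log10(params) + intercept)
    for name, (params, score) in models.items()
}

# Models with a positive residual sit in the shaded zone above the line.
above_line = sorted(name for name, r in residuals.items() if r > 0)
print(above_line)
```

A positive residual means a model scores better than the fit predicts for its size, which is what the shaded zone highlights.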

Org Leaderboard

Average number of standard deviations by which each organization's models score above or below the score-vs-size fit line.

#	Organization	Models	Fit Std Devs	Mean Avg	Best Model vs Fit
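One plausible way to compute a "Fit Std Devs" value is sketched here: fit score against log10 parameters, divide each model's residual by the standard deviation of all residuals, then average per organization. The organization names and data are hypothetical, and the choice of population standard deviation and of a base-10 log are assumptions rather than the leaderboard's documented method.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def org_fit_std_devs(rows):
    """rows: (org, params, avg_score) tuples -> {org: mean standardized residual}."""
    xs = [math.log10(params) for _, params, _ in rows]
    ys = [score for _, _, score in rows]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Assumption: residuals are scaled by their population standard deviation.
    sigma = pstdev(residuals)
    by_org = defaultdict(list)
    for (org, _, _), r in zip(rows, residuals):
        # A perfect fit has sigma == 0; report 0.0 instead of dividing by zero.
        by_org[org].append(r / sigma if sigma > 0 else 0.0)
    return {org: fmean(zs) for org, zs in by_org.items()}

# Hypothetical submissions from two made-up organizations.
rows = [
    ("org-x", 10_000_000, 30.0),
    ("org-y", 50_000_000, 36.0),
    ("org-x", 100_000_000, 35.0),
    ("org-y", 140_000_000, 40.0),
]
org_scores = org_fit_std_devs(rows)
print(org_scores)
```

Under these assumptions, a positive value means an organization's models tend to beat the size-adjusted expectation, and a negative value means they tend to fall short of it.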

Add your model

Open a PR on this Space with your model's results for the given benchmarks. Our team will independently verify the results and then merge your PR. Your model must be open weights to qualify.