A leaderboard for sub-150M-parameter language models, evaluated with the LM Evaluation Harness or with a custom benchmark script, Arithmark-2.0.
All benchmarks are evaluated zero-shot, and higher is better in every column.
| # | Model | Params | Avg | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2 |
|---|---|---|---|---|---|---|---|---|
Top scores for the active size and benchmark filters.
Figure: score vs. parameter count (log scale). The shaded region marks scores above the regression line.
Organizations ranked by how many standard deviations, on average, their models score above or below the score-vs-size fit line.
| # | Organization | Models | Fit Std Devs | Mean Avg | Best Model vs Fit |
|---|---|---|---|---|---|
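The fit-line statistics above could be computed roughly as follows. This is a minimal sketch, not the leaderboard's own code. It assumes three things. First, the regression is an ordinary least-squares fit of the Avg score on log10(parameter count). Second, one "standard deviation" is the population standard deviation of the residuals around that line. Third, "Best Model vs Fit" is an organization's highest per-model residual in those units. A model with a positive value falls in the shaded region of the plot. The function names, model names, and numbers are hypothetical and for illustration only.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_std_devs(models):
    """Map each model name to its distance from the fit line, in residual std devs.

    models: list of (name, organization, params, avg_score) tuples.
    Assumption: x = log10(params), y = Avg score, sigma = population std dev of residuals.
    """
    xs = [math.log10(params) for _, _, params, _ in models]
    ys = [score for _, _, _, score in models]
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)  # spread of the points around the line
    return {m[0]: r / sigma for m, r in zip(models, residuals)}


def org_table(models):
    """Aggregate per-model fit distances into organization rows, best first."""
    z = fit_std_devs(models)
    by_org = defaultdict(list)
    for name, org, _, score in models:
        by_org[org].append((name, score))
    rows = []
    for org, entries in by_org.items():
        zs = [z[name] for name, _ in entries]
        rows.append({
            "organization": org,
            "models": len(entries),
            "fit_std_devs": mean(zs),                 # average distance from the fit line
            "mean_avg": mean(s for _, s in entries),  # mean of the models' Avg scores
            "best_vs_fit": max(zs),                   # assumed meaning of "Best Model vs Fit"
        })
    rows.sort(key=lambda r: r["fit_std_devs"], reverse=True)
    return rows


# Hypothetical toy entries (made-up numbers, not leaderboard results).
MODELS = [
    ("org-a/tiny-30m", "org-a", 30e6, 36.0),
    ("org-a/tiny-120m", "org-a", 120e6, 42.0),
    ("org-b/small-60m", "org-b", 60e6, 36.5),
    ("org-b/small-140m", "org-b", 140e6, 39.0),
]

if __name__ == "__main__":
    for rank, row in enumerate(org_table(MODELS), start=1):
        print(rank, row["organization"], row["models"],
              f"{row['fit_std_devs']:+.2f}", f"{row['mean_avg']:.2f}",
              f"{row['best_vs_fit']:+.2f}")
```

The fit uses log10 of the parameter count so that the line matches the log-scale plot. Because least-squares residuals sum to zero, organizations above the line are balanced by organizations below it.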
To submit a model, open a PR on this Space with its results on the benchmarks above. Our team will verify the results independently before merging the PR. Only open-weights models qualify.