Add community evaluation results for AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, TERMINAL-BENCH-2.0

#3
by nielsr HF Staff - opened

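This PR adds community-provided evaluation results for the following benchmarks: AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, and TERMINAL-BENCH-2.0.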

These results were extracted from the model card. This is based on the new evaluation results feature.

Note: This is an automated PR. Please review the evaluation results before merging.

Nice feature. Long-standing need.
I wonder why HLE dropped vs 3.5

