Spaces: open-llm-leaderboard/open_llm_leaderboard


Open-source tool to add cost, latency & hallucination dimensions to your LLM evaluation

#1164

by vigneshwar234 - opened 2 days ago

Hey everyone 👋

Love this leaderboard; it's the gold standard for accuracy rankings. But in production, accuracy alone doesn't tell the full story.

I built an open-source LLM Evaluation Framework that adds the missing dimensions:

💰 Cost per 1K tokens: pricing based on actual token counts across 15+ models
⚡ Latency: p50/p95/p99 percentiles, not just averages
🔍 Hallucination Rate: linguistic-signal analysis that runs locally at zero extra cost
🧠 Reasoning Quality: chain-of-thought depth scoring
🎯 Accuracy: 4-strategy cascade scorer (exact, normalized, multiple-choice, fuzzy)
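The post doesn't say how the framework computes its cost and latency numbers. As a rough illustration of the ideas in the first two bullets, here is a minimal Python sketch, assuming nearest-rank percentiles and separate per-1K-token prices for input and output tokens. The function names and the example prices are hypothetical placeholders, not the package's API or any provider's actual rates.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Report p50/p95/p99 next to the mean so tail latency isn't hidden."""
    summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
    summary["mean"] = sum(latencies_ms) / len(latencies_ms)
    return summary


def cost_per_1k_tokens(prompt_tokens: int, completion_tokens: int,
                       input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Blended USD cost per 1,000 tokens, pricing input and output tokens separately."""
    total_tokens = prompt_tokens + completion_tokens
    if total_tokens == 0:
        raise ValueError("no tokens to price")
    total_cost = (prompt_tokens * input_price_per_1k
                  + completion_tokens * output_price_per_1k) / 1000
    return total_cost / total_tokens * 1000


if __name__ == "__main__":
    # Made-up latencies for 10 requests, with one slow outlier.
    latencies = [120, 80, 95, 300, 110, 90, 100, 105, 85, 2000]
    print(latency_summary(latencies))
    # Placeholder prices (USD per 1K tokens), not real provider pricing.
    print(cost_per_1k_tokens(1200, 300, 0.00015, 0.0006))
```

With the made-up sample above, a single 2,000 ms outlier pulls the mean to 308.5 ms while the median stays at 100 ms, which is the reason to report percentiles rather than averages alone.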
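The post also doesn't describe how the four accuracy strategies are combined. One plausible reading of "cascade" is to try the strictest check first and fall back to looser ones only when it fails. The sketch below assumes that order (exact, normalized, multiple-choice letter extraction, then fuzzy matching via Python's `difflib.SequenceMatcher`) and an arbitrary 0.85 fuzzy threshold; `cascade_score` and its parameters are hypothetical, not the framework's API.

```python
import re
import string
from difflib import SequenceMatcher
from typing import Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_choice(text: str, choices: str) -> Optional[str]:
    """Crude heuristic: first standalone occurrence of an option letter (case-sensitive)."""
    pattern = r"\b([" + "".join(map(re.escape, choices)) + r"])\b"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def cascade_score(prediction: str, reference: str,
                  choices: Optional[str] = None,
                  fuzzy_threshold: float = 0.85) -> Tuple[float, str]:
    """Return (score, strategy) from the first strategy that can decide."""
    # 1. Exact match, ignoring surrounding whitespace.
    if prediction.strip() == reference.strip():
        return 1.0, "exact"
    # 2. Normalized match.
    pred_norm, ref_norm = normalize(prediction), normalize(reference)
    if pred_norm == ref_norm:
        return 1.0, "normalized"
    # 3. Multiple choice: compare the extracted option letter with the reference letter.
    if choices:
        letter = extract_choice(prediction, choices)
        if letter is not None:
            return (1.0 if letter == reference.strip().upper() else 0.0), "mc"
    # 4. Fuzzy: character-level similarity ratio on normalized text vs. a threshold.
    ratio = SequenceMatcher(None, pred_norm, ref_norm).ratio()
    return (1.0 if ratio >= fuzzy_threshold else 0.0), "fuzzy"
```

Putting fuzzy matching last keeps a near-miss string from overriding a clear multiple-choice decision. The letter heuristic still misfires on text like "A good guess is C" (it picks "A"), so a production scorer would need a stricter answer-extraction pattern.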

One CLI command. Any LiteLLM-compatible model. Full benchmark report.

```bash
pip install llm-evaluation-framework
llm-eval compare --models gpt-4o-mini --models gemini/gemini-1.5-flash --benchmark mmlu --samples 100
```

GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

71 tests, 82% coverage, full CI/CD. Would love feedback from this community!

alozowski changed discussion status to closed about 9 hours ago
