malhajar committed
Commit 6807637 · verified · 1 Parent(s): d58e6ce

Upgrade Transformers to ≥ 4.51: stop false trust_remote_code flags for new models such as Qwen-3, Phi-3, TinyLlama-2, etc.


The leaderboard backend is pinned to Transformers 4.48.0 (`backend/pyproject.toml`).
That version predates several architectures now common in the community—Qwen-3, Phi-3, and others.
When the evaluator encounters a model whose `config.json` contains an unknown `"model_type"` (e.g. `"qwen3"` or `"phi3"`), it falls back to “custom-code” mode and demands `trust_remote_code=True`, causing perfectly clean submissions (for example `legmlai/legml-v1.0-instruct`) to be rejected by the safety gate.
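The failure mode can be sketched as follows. This is a minimal illustration of the fallback behaviour, not the leaderboard's actual code; the registry contents and the function name are hypothetical:

```python
# Hypothetical sketch of the safety gate's fallback behaviour.
# Under an old Transformers pin, the registry of known architectures is
# frozen at release time, so newer model types look like custom code.
KNOWN_MODEL_TYPES = {"llama", "mistral", "qwen2", "phi"}  # illustrative subset

def needs_trust_remote_code(model_type: str) -> bool:
    """Return True when the loader would demand trust_remote_code=True."""
    return model_type not in KNOWN_MODEL_TYPES

print(needs_trust_remote_code("qwen3"))  # True: rejected under the 4.48 pin
print(needs_trust_remote_code("llama"))  # False: loads normally
```

After the upgrade, `"qwen3"` and `"phi3"` land in the known set, so the gate never fires for them.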

Upgrading to Transformers ≥ 4.51.0—where Qwen-3, Phi-3, and friends became first-class citizens—eliminates these false positives while keeping the leaderboard’s *no remote-code* guarantee intact. The latest patch, 4.53.2 (11 Jul 2025), is a drop-in replacement.

### Proposed change

```toml
# backend/pyproject.toml
transformers = ">=4.51,<4.54" # or pin to 4.53.2 for reproducibility
```

Then run `poetry update transformers` to refresh the lock file.
If touching Poetry is inconvenient, appending `RUN pip install --no-cache-dir "transformers>=4.53.2"` to the Dockerfile achieves the same effect, but adjusting the dependency pin is cleaner and more future-proof.
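To sanity-check that a resolved version actually falls inside the proposed range, a small helper like the following works. It is an illustrative sketch, not part of the backend, and only handles plain `X.Y.Z` strings:

```python
def parse(v: str) -> tuple:
    """Turn a simple X.Y.Z version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies(installed: str, lower: str = "4.51.0", upper: str = "4.54.0") -> bool:
    """Half-open range [lower, upper), mirroring the ">=4.51,<4.54" constraint."""
    return parse(lower) <= parse(installed) < parse(upper)

print(satisfies("4.53.2"))  # True: within the proposed range
print(satisfies("4.48.0"))  # False: the current pin predates Qwen-3 support
```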

### Why merge this now?

* **Unblocks modern checkpoints** – models such as Qwen-3, Phi-3, TinyLlama-2, Zephyr-β, etc. evaluate without spurious remote-code errors.
* **Zero behavioural change** – only the loader version bumps; existing scores remain valid.
* **Less tech-debt** – keeps us within two minors of upstream, reducing future scramble fixes.

**Please merge as a permanent fix** so contributors can submit today’s and tomorrow’s architectures without hitting avoidable security gates.

Files changed (1)

  1. backend/pyproject.toml +1 -1
backend/pyproject.toml

```diff
@@ -14,7 +14,7 @@ datasets = "^3.2.0"
 pyarrow = "^18.1.0"
 python-multipart = "^0.0.20"
 huggingface-hub = "^0.27.1"
-transformers = "4.48.0"
+transformers = "^4.53.0"
 safetensors = "^0.4.5"
 aiofiles = "^24.1.0"
 fastapi-cache2 = "^0.2.1"
```