Upgrade Transformers to ≥ 4.51 — stop false trust_remote_code flags for newly added architectures such as Qwen-3
The leaderboard backend is pinned to Transformers 4.48.0 (backend/pyproject.toml).
That version predates several architectures now common in the community, most notably Qwen-3.
When the evaluator encounters a model whose config.json contains a "model_type" unknown to the pinned Transformers version (e.g. "qwen3"), it falls back to "custom-code" mode and demands trust_remote_code=True, so perfectly clean submissions (for example legmlai/legml-v1.0-instruct) are rejected by the safety gate.
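To make the failure mode concrete, here is a minimal sketch of that decision, assuming the gate keys off config.json's "model_type" field. The function name and the architecture sets are illustrative, not the leaderboard's actual code:

```python
# Abbreviated stand-in for the set of architectures registered in
# Transformers 4.48.0 (the real registry is much larger).
KNOWN_MODEL_TYPES_4_48 = {"llama", "mistral", "qwen2", "phi", "gemma"}

def requires_remote_code(config: dict, known_types: set) -> bool:
    """Return True when the loader would fall back to custom-code mode,
    i.e. when the checkpoint's model_type is not natively registered."""
    return config.get("model_type") not in known_types

# A Qwen-3 checkpoint: its model_type is unknown to 4.48, so the
# safety gate demands trust_remote_code=True and rejects it.
qwen3_config = {"model_type": "qwen3"}
print(requires_remote_code(qwen3_config, KNOWN_MODEL_TYPES_4_48))   # True

# After the upgrade, "qwen3" is registered and the same submission passes.
KNOWN_MODEL_TYPES_4_51 = KNOWN_MODEL_TYPES_4_48 | {"qwen3"}
print(requires_remote_code(qwen3_config, KNOWN_MODEL_TYPES_4_51))   # False
```

The point is that nothing about the submission itself is unsafe; the rejection is purely a function of which architectures the pinned loader knows about.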
Upgrading to Transformers ≥ 4.51.0, where Qwen-3 became a first-class citizen, eliminates these false positives while keeping the leaderboard's no-remote-code guarantee intact. The latest patch, 4.53.2 (11 Jul 2025), is a drop-in replacement.
Proposed change
# backend/pyproject.toml
transformers = ">=4.51,<4.54" # or pin to 4.53.2 for reproducibility
Then run poetry update transformers (which refreshes the lock file).
If touching Poetry is inconvenient, appending RUN pip install --no-cache-dir "transformers>=4.53.2" to the Dockerfile achieves the same effect, but adjusting the dependency pin is cleaner and future-proof.
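Whichever route is taken, a quick sanity check confirms the environment picked up a new-enough version. This is a dependency-free sketch that assumes transformers.__version__ is a plain X.Y.Z string (the hard-coded value below stands in for the real import):

```python
# Minimal version gate: compare the installed version against the proposed
# lower bound without pulling in extra dependencies such as `packaging`.

def version_tuple(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable int tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

# In the backend this would be:
#   import transformers
#   installed = transformers.__version__
installed = "4.53.2"  # placeholder for the real runtime value

assert version_tuple(installed) >= version_tuple("4.51.0"), (
    f"Transformers {installed} is older than the 4.51.0 minimum"
)
```

Dropping this into the backend's startup path (or a CI step) would catch a regression of the pin before any submission hits the safety gate.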
Why merge this now?
- Unblocks modern checkpoints – models built on recently added architectures such as Qwen-3 evaluate without spurious remote-code errors.
- Zero behavioural change – only the loader version bumps; existing scores remain valid.
- Less tech-debt – keeps us within two minor versions of upstream, reducing the risk of rushed upgrades later.
Please merge as a permanent fix so contributors can submit today’s and tomorrow’s architectures without hitting avoidable security gates.