Upgrade Transformers ≥ 4.51 — stop false trust_remote_code flags for new models such as Qwen-3, Phi-3, TinyLlama-2, etc.

#9

The leaderboard backend is pinned to Transformers 4.48.0 (backend/pyproject.toml).
That version predates several architectures now common in the community—Qwen-3, Phi-3, and others.
When the evaluator encounters a model whose config.json contains a "model_type" unknown to that version (e.g. "qwen3" or "phi3"), it falls back to "custom-code" mode and demands trust_remote_code=True. As a result, perfectly clean submissions (for example legmlai/legml-v1.0-instruct) are rejected by the safety gate.
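A simplified sketch of that failure mode (the names below are hypothetical; the real evaluator delegates this check to Transformers' internal config registry):

```python
# Excerpt of architectures the pinned transformers 4.48.0 registry knows about
# (illustrative subset, not the full mapping).
KNOWN_MODEL_TYPES = {"llama", "mistral", "qwen2", "phi"}

def requires_remote_code(model_type: str) -> bool:
    """An unregistered model_type forces the trust_remote_code=True fallback."""
    return model_type not in KNOWN_MODEL_TYPES

# "qwen3" is absent from the 4.48.0 registry, so the gate rejects the model ...
assert requires_remote_code("qwen3")
# ... even though the closely related "qwen2" loads fine under the same pin.
assert not requires_remote_code("qwen2")
```

The point is that nothing about the checkpoint itself is unsafe; the rejection is purely a function of which registry ships with the pinned library version.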

Upgrading to Transformers ≥ 4.51.0—where Qwen-3, Phi-3, and friends became first-class citizens—eliminates these false positives while keeping the leaderboard’s no remote-code guarantee intact. The latest patch, 4.53.2 (11 Jul 2025), is a drop-in replacement.

Proposed change

```toml
# backend/pyproject.toml
transformers = ">=4.51,<4.54"   # or pin to 4.53.2 for reproducibility
```

Then run `poetry update transformers` to refresh the lock file.
If touching Poetry is inconvenient, appending `RUN pip install --no-cache-dir "transformers>=4.53.2"` to the Dockerfile achieves the same effect, but adjusting the dependency pin is cleaner and more future-proof.
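Either route can be sanity-checked in CI with a small version-floor assertion. A minimal sketch, using a naive parser that assumes plain `X.Y.Z` release strings (no pre-release tags):

```python
def parse(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# The proposed floor clears; the current pin does not.
assert parse("4.53.2") >= parse("4.51.0")
assert parse("4.48.0") < parse("4.51.0")

# Against the real install you would check, e.g.:
# import transformers
# assert parse(transformers.__version__) >= parse("4.51.0")
```

For anything beyond simple release strings, `packaging.version.Version` is the robust choice, but the tuple comparison is enough to catch an accidental downgrade.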

Why merge this now?

  • Unblocks modern checkpoints – models such as Qwen-3, Phi-3, TinyLlama-2, Zephyr-β, etc. evaluate without spurious remote-code errors.
  • Zero behavioural change – only the loader version bumps; existing scores remain valid.
  • Less tech debt – keeps us within two minor versions of upstream, reducing the need for rushed catch-up upgrades later.

Please merge as a permanent fix so contributors can submit today’s and tomorrow’s architectures without hitting avoidable security gates.
