Best non-thinking model Qwen ever released
#7 · opened by BigBlueWhale
Topic: Qwen3-VL-32B: How to fix a model and ruin a miracle at the same time
We need to talk about what happened to the 32B line. The original Qwen3-32B (April 2025) was a miracle of stability and generalization—easily the #1 open-source model for reliability.
The new VL report reveals a tragic trade-off:
- The Good (Instruct): They finally fixed the broken Instruct baseline. The original text-only Instruct model was a disaster on complex prompts (Arena-Hard: 37.4), but the VL training resurrected it to a respectable 64.7.
- The Bad (Thinking): Conversely, they suffocated the "Thinking" variant. The original text model was a creative powerhouse, but the VL Thinking variant regressed across the board compared to its text predecessor:
  - LiveBench: dropped from 76.8 to 74.7.
  - Creative Writing v3: dropped from 84.4 to 83.3.
  - Math (AIME-25): dropped from 85.0 to 83.7.
The Culprit?
It looks like data pollution. The report leans heavily on synthetic data generation using the 30B-A3B pipeline. There is nothing worse than polluting a dense masterpiece with inferior MoE synthetic sludge. They seemingly sacrificed the 32B's dense "soul" to force-fit multimodal alignment, and the degradation in reasoning stability proves it.
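(For anyone who hasn't watched this pattern up close: the complaint is essentially about sequence-level distillation, where the MoE teacher generates synthetic traces and the dense student is fine-tuned on them. Below is a minimal, hypothetical sketch of that flow; the model IDs, prompt, and hyperparameters are my own illustrative assumptions, not the actual Qwen pipeline.)

```python
# Hypothetical sketch of sequence-level distillation: an MoE "teacher"
# generates synthetic responses, and a dense "student" is fine-tuned on them.
# Model IDs, prompt, and hyperparameters are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "Qwen/Qwen3-30B-A3B"   # assumed MoE teacher role
student_id = "Qwen/Qwen3-32B"       # assumed dense student role

tok = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(
    teacher_id, torch_dtype="auto", device_map="auto"
)

# 1) Teacher generates synthetic "thinking" traces for a prompt pool.
prompts = ["Prove that the sum of two even integers is even."]  # placeholder
synthetic = []
for p in prompts:
    inputs = tok(p, return_tensors="pt").to(teacher.device)
    out = teacher.generate(
        **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
    )
    synthetic.append(tok.decode(out[0], skip_special_tokens=True))

# 2) Student is fine-tuned with plain next-token cross-entropy on those traces.
student_tok = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(
    student_id, torch_dtype="auto", device_map="auto"
)
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)

for text in synthetic:
    batch = student_tok(
        text, return_tensors="pt", truncation=True, max_length=2048
    ).to(student.device)
    loss = student(**batch, labels=batch["input_ids"]).loss  # standard LM loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

Even in this toy form, the student's gradient signal is shaped entirely by whatever the teacher emits, which is exactly the "pollution" risk being described.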
Great job fixing the Instruct model, but please stop distilling 30B-A3B output into the 32B Thinking weights! 😠