---
pipeline_tag: image-text-to-text
base_model:
- Qwen/Qwen3.5-4B
license: apache-2.0
library_name: transformers
tags:
- AxionML
- ModelOpt
- Qwen3.5
- quantized
- NVFP4
- nvfp4
- sglang
---

# AxionML Qwen3.5-4B-NVFP4

> Developed by [AxionML](https://huggingface.co/AxionML) for open-source serving and deployment use cases. Part of AxionML's effort to provide ready-to-serve quantized models for the community.

This is an NVFP4-quantized version of [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) (4B parameters), quantized using [NVIDIA TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer). Weights and activations of linear layers are quantized to FP4, reducing disk size and GPU memory by ~4x compared to BF16.

**About NVFP4 quantization:** NVFP4 on Blackwell couples a compact E2M1 FP4 codebook with blockwise FP8 (E4M3) scaling over 16-element micro-blocks, so that 4-bit stored values remain numerically useful for neural-network computation. The E2M1 codebook provides a small, nonuniform set of representable magnitudes up to ±6 and relies on saturating behavior rather than IEEE NaN/Inf encodings to maximize usable range per bit. Using an FP8 block scale (rather than power-of-two-only E8M0) enables fractional scales and error-minimizing scale-selection strategies, such as a dual-pass evaluation comparing "map max to 6" against "map max to 4 with clipping." On Blackwell Tensor Cores, native FP4 multipliers exploit the simplicity of E2M1 to reduce multiplier area, while higher-precision FP32 accumulation protects dot-product accuracy.

> **Ready for commercial and non-commercial use under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).**

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance.
Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

## Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

- **Unified Vision-Language Foundation**: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
- **Efficient Hybrid Architecture**: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
- **Scalable RL Generalization**: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
- **Global Linguistic Coverage**: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
- **Next-Generation Training Infrastructure**: Near-100% multimodal training efficiency compared to text-only training, plus asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?id=qwen3.5).
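To make the blockwise NVFP4 recipe described in the quantization note above concrete, here is a minimal NumPy fake-quantization sketch using the "map max to 6" scale choice. Function names are ours and purely illustrative; the actual Model Optimizer kernels pack two 4-bit codes per byte and store E4M3 scales, none of which is modeled here.

```python
import numpy as np

# E2M1 (FP4) representable magnitudes: 1 sign, 2 exponent, 1 mantissa bit.
# There are no NaN/Inf encodings; out-of-range values saturate at +/-6.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_nvfp4(x, block=16):
    """Fake-quantize a flat array with one scale per 16-element micro-block.

    Illustrative only: scales are kept in float32 here, whereas NVFP4
    stores them in FP8 (E4M3).
    """
    x = np.asarray(x, dtype=np.float32)
    assert x.size % block == 0, "pad to a multiple of the block size first"
    blocks = x.reshape(-1, block)
    # "Map max to 6": the block amax lands on the largest E2M1 magnitude.
    # (A dual-pass variant also tries "map max to 4 with clipping" and keeps
    # whichever scale yields lower quantization error.)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / 6.0, 1.0)  # fractional FP8-style scale
    scaled = blocks / scale
    # Round each magnitude to the nearest E2M1 grid point, then restore sign.
    mag = E2M1_GRID[np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)]
    codes = np.sign(scaled) * mag
    return codes, scale

def dequantize_nvfp4(codes, scale):
    """Reconstruct float values from grid codes and per-block scales."""
    return (codes * scale).reshape(-1)
```

Storage-wise, this layout costs 4 bits per weight plus one 8-bit scale per 16 weights, i.e. 4.5 bits per weight versus 16 for BF16 (16 / 4.5 ≈ 3.6x smaller for the quantized linear layers), consistent with the rough ~4x reduction quoted above.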
## Model Overview

- Type: Causal Language Model with Vision Encoder
- Training Stage: Pre-training & Post-training
- Language Model:
  - Number of Parameters: 4B
  - Hidden Dimension: 2560
  - Token Embedding: 248320 (padded)
  - Number of Layers: 32
  - Hidden Layout: 8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
  - Gated DeltaNet:
    - Number of Linear Attention Heads: 32 for V and 16 for QK
    - Head Dimension: 128
  - Gated Attention:
    - Number of Attention Heads: 16 for Q and 4 for KV
    - Head Dimension: 256
    - Rotary Position Embedding Dimension: 64
  - Feed Forward Network:
    - Intermediate Dimension: 9216
  - LM Output: 248320 (tied to token embedding)
  - MTP: trained with multiple steps
- Context Length: 262,144 natively, extensible up to 1,010,000 tokens

## Benchmark Results

### Language
| Benchmark | Qwen3.5-4B | Qwen3.5-4B-NVFP4 |
|---|---|---|
| **Knowledge & STEM** | | |
| MMLU-Pro | 79.1 | 77.9 |
| MMLU-Redux | 88.8 | 87.0 |
| C-Eval | 85.1 | 83.0 |
| SuperGPQA | 52.9 | 52.4 |
| GPQA Diamond | 76.2 | 74.1 |
| **Instruction Following** | | |
| IFEval | 89.8 | 87.5 |
| IFBench | 59.2 | 58.1 |
| MultiChallenge | 49.0 | 47.8 |
| **Long Context** | | |
| AA-LCR | 57.0 | 56.2 |
| LongBench v2 | 50.0 | 49.0 |
| **Reasoning & Coding** | | |
| HMMT Feb 25 | 74.0 | 72.7 |
| HMMT Nov 25 | 76.8 | 75.4 |
| LiveCodeBench v6 | 55.8 | 55.1 |
| OJBench | 24.1 | 23.7 |
| **General Agent** | | |
| BFCL-V4 | 50.3 | 49.4 |
| TAU2-Bench | 79.9 | 78.7 |
| VITA-Bench | 22.0 | 21.7 |
| DeepPlanning | 17.6 | 17.4 |
| **Multilingualism** | | |
| MMMLU | 76.1 | 74.8 |
| MMLU-ProX | 71.5 | 70.4 |
| NOVA-63 | 54.3 | 53.3 |
| INCLUDE | 71.0 | 69.6 |
| Global PIQA | 78.9 | 77.5 |
| PolyMATH | 51.1 | 49.8 |
| WMT24++ | 66.6 | 64.1 |
| MAXIFE | 78.0 | 75.2 |
* TAU2-Bench: we follow the official setup except for the airline domain, where all models are evaluated with the fixes proposed in the Claude Opus 4.5 system card.
* MMLU-ProX: we report the average accuracy across 29 languages.
* WMT24++: a harder subset of WMT24 after difficulty labeling and rebalancing; we report the average score across 55 languages using XCOMET-XXL.
* MAXIFE: we report the accuracy on English + multilingual original prompts (23 settings in total).
### Multimodal

| Benchmark | Qwen3.5-4B | Qwen3.5-4B-NVFP4 |
|---|---|---|
| **STEM and Puzzle** | | |
| MMMU | 77.6 | 76.1 |
| MMMU-Pro | 66.3 | 65.1 |
| MathVision | 74.6 | 73.0 |
| MathVista (mini) | 85.1 | 82.8 |
| We-Math | 75.4 | 72.6 |
| DynaMath | 83.3 | 80.3 |
| ZEROBench | 3.0 | 2.9 |
| ZEROBench_sub | 26.3 | 26.0 |
| VlmsAreBlind | 92.6 | 90.6 |
| BabyVision | 16.0/19.1 | 16.0/19.1 |
| **General VQA** | | |
| RealWorldQA | 79.5 | 77.1 |
| MMStar | 78.3 | 77.4 |
| MMBenchEN-DEV-v1.1 | 89.4 | 87.0 |
| SimpleVQA | 43.4 | 42.2 |
| HallusionBench | 65.0 | 63.5 |
| **Text Recognition and Document Understanding** | | |
| OmniDocBench1.5 | 86.2 | 85.1 |
| CharXiv(RQ) | 70.8 | 69.5 |
| MMLongBench-Doc | 54.2 | 52.9 |
| CC-OCR | 76.7 | 74.6 |
| AI2D_TEST | 89.6 | 88.2 |
| OCRBench | 85.0 | 82.0 |
| **Spatial Intelligence** | | |
| ERQA | 54.0 | 52.4 |
| CountBench | 96.3 | 94.9 |
| RefCOCO(avg) | 88.1 | 85.9 |
| EmbSpatialBench | 81.3 | 78.9 |
| RefSpatialBench | 54.6 | 53.1 |
| LingoQA | 74.4 | 72.2 |
| Hypersim | 12.5 | 12.2 |
| nuScenes | 9.9 | 9.6 |
| **Video Understanding** | | |
| VideoMME(w sub.) | 83.5 | 81.1 |
| VideoMME(w/o sub.) | 76.9 | 75.7 |
| VideoMMMU | 74.1 | 73.0 |
| MLVU | 82.8 | 81.7 |
| MVBench | 71.2 | 69.6 |
| LVBench | 66.4 | 64.6 |
| MMVU | 64.9 | 63.7 |
| **Visual Agent** | | |
| ScreenSpot Pro | 60.3 | 59.3 |
| OSWorld-Verified | 35.6 | 34.9 |
| AndroidWorld | 58.6 | 56.5 |
| **Tool Calling** | | |
| TIR-Bench | 38.9/29.9 | 38.9/29.9 |
| V* | 84.3/86.4 | 84.3/86.4 |
| **Medical VQA** | | |
| SLAKE | 76.1 | 75.1 |
| PMC-VQA | 55.5 | 54.4 |
| MedXpertQA-MM | 42.9 | 41.9 |
* MathVision: our model’s score is evaluated using a fixed prompt, e.g., “Please reason step by step, and put your final answer within \boxed{}.” For other models, we report the higher score between runs with and without the \boxed{} formatting.
* BabyVision, TIR-Bench, and V*: scores are reported as "with CI / without CI".
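As a closing note on the architecture, the hybrid layout listed in Model Overview (8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))) can be expanded into an explicit per-layer schedule. A small sketch, with block-type names of our own choosing rather than actual config keys:

```python
def qwen35_layer_schedule(num_groups: int = 8, deltanet_per_group: int = 3):
    """Expand the repeating hybrid pattern into a flat list of block types.

    Illustrative only; the names "gated_deltanet" / "gated_attention" are
    ours. Each entry is the token-mixing block of one layer, and every
    layer is followed by an FFN.
    """
    schedule = []
    for _ in range(num_groups):
        # Three linear-attention (Gated DeltaNet) layers...
        schedule.extend(["gated_deltanet"] * deltanet_per_group)
        # ...then one full Gated Attention layer closes each group.
        schedule.append("gated_attention")
    return schedule

layers = qwen35_layer_schedule()
# 32 layers in total: 24 Gated DeltaNet blocks and 8 Gated Attention blocks.
```

This matches the stated 32-layer depth: only every fourth layer uses full gated attention, which is what keeps long-context inference cost low.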