Qwen3-VL-4B-Thinking
Run Qwen3-VL-4B-Thinking optimized for CPU/GPU with NexaSDK.
Quickstart
Install NexaSDK
Run the model locally with one line of code:
nexa infer NexaAI/Qwen3-VL-4B-Thinking-GGUF
Model Description
Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.
Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.
Features
- Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
- Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
- High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- Multilingual Support: Understands and responds in multiple languages.
- Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.
Use Cases
- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning
Inputs and Outputs
Input:
- Text, images, or combined multimodal prompts (e.g., image + question)
Output:
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences
License
Check the official Qwen license for terms of use and redistribution.
- Downloads last month
- 6,810
4-bit
6-bit
8-bit
16-bit
Model tree for NexaAI/Qwen3-VL-4B-Thinking-GGUF
Base model
Qwen/Qwen3-VL-4B-Thinking