Qwen3-VL-4B-Thinking

Run Qwen3-VL-4B-Thinking locally on CPU or GPU, optimized with NexaSDK.

Quickstart

  1. Install NexaSDK

  2. Run the model locally with one line of code:

    nexa infer NexaAI/Qwen3-VL-4B-Thinking-GGUF
    

Model Description

Qwen3-VL-4B-Thinking is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the Qwen3-VL (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.

Compared to the Instruct variant, the Thinking model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

Features

  • Vision-Language Understanding: Processes images, text, and videos for joint reasoning tasks.
  • Structured Thinking Mode: Generates intermediate reasoning traces for better transparency and interpretability.
  • High Accuracy on Visual QA: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
  • Multilingual Support: Understands and responds in multiple languages.
  • Optimized for Efficiency: Delivers strong performance at 4B scale for on-device or edge deployment.

Use Cases

  • Multimodal reasoning and visual question answering
  • Scientific and analytical reasoning tasks involving charts, tables, and documents
  • Step-by-step visual explanation or tutoring
  • Research on interpretability and chain-of-thought modeling
  • Integration into agent systems that require structured reasoning

Inputs and Outputs

Input:

  • Text, images, or combined multimodal prompts (e.g., image + question)

Output:

  • Generated text, reasoning traces, or structured responses
  • May include explicit thought steps or structured JSON reasoning sequences
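Qwen thinking-mode models conventionally emit their intermediate reasoning inside `<think>...</think>` tags ahead of the final answer. A minimal sketch, assuming that tag convention, for separating the reasoning trace from the answer:

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split model output into (reasoning_trace, final_answer).

    Assumes the reasoning is wrapped in <think>...</think> tags,
    as is conventional for Qwen thinking-mode models.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        # No explicit trace found; treat everything as the answer.
        return "", output.strip()
    trace = match.group(1).strip()
    answer = output[match.end():].strip()
    return trace, answer

raw = "<think>The chart shows revenue rising each quarter.</think>Revenue grew steadily."
trace, answer = split_thinking(raw)
print(trace)   # the intermediate reasoning
print(answer)  # the final answer
```

This keeps the trace available for interpretability work while letting downstream code consume only the final answer.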

License

Check the official Qwen license for terms of use and redistribution.

Model Details

  • Downloads last month: 6,810
  • Format: GGUF
  • Model size: 4B params
  • Architecture: Qwen3-VL-4B-Thinking
  • Available quantizations: 4-bit, 6-bit, 8-bit, 16-bit
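As a rough guide to choosing a quantization level, weight size scales with bits per parameter (params × bits ⁄ 8 bytes). The sketch below is a back-of-the-envelope lower bound only: real GGUF files run somewhat larger because some tensors stay at higher precision and file metadata adds overhead.

```python
def approx_weight_gb(n_params: float, bits: int) -> float:
    """Approximate quantized weight size in GB: params * bits / 8 bytes.

    A rough lower bound -- actual GGUF files are somewhat larger.
    """
    return n_params * bits / 8 / 1e9

# Estimates for a 4B-parameter model at each listed quantization.
for bits in (4, 6, 8, 16):
    print(f"{bits:>2}-bit: ~{approx_weight_gb(4e9, bits):.1f} GB")
```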
