---
base_model:
- Qwen/Qwen3-4B
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
# R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
[[📄 Arxiv Paper](https://arxiv.org/pdf/2508.21113)] [[🤗 Hugging Face](https://huggingface.co/YannQi/R-4B)] [[🤖 ModelScope](https://huggingface.co/YannQi/R-4B)] [[💻 Code](https://github.com/yannqi/R-4B)]
<div align="center">
<img src="asset/logo_R_4B.png" alt="logo" width="38" />
</div>
<div align="center">
<img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
</div>
## Introduction
In this repo, we present **R-4B**, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.
The development of R-4B follows a two-stage training paradigm:
(1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and
(2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.
## Key Features
- **Think Smart, Act Fast: Adaptive & Controllable Thinking!**
  The model offers three-mode control over the response process.
  - **Auto-thinking Mode:** Auto-thinking works across general topics, from simple Q&A to complex scientific analysis, saving time and computation by thinking only when it matters.
  - **Manual Control:** Explicitly command the model to use its `thinking` or `non-thinking` capability, letting you choose the behavior for every task.
- **Strong Performance, Open for Everyone!**
  Our model is **fully open-source** and achieves **state-of-the-art performance** among models of comparable size.
## News
- **[2025.08.20]** **vLLM Support is Here!** Our R-4B model is now fully compatible with [vLLM](https://github.com/vllm-project/vllm) for high-performance inference.
- **[2025.08.18]** **Top Rank Achieved!** We are thrilled to announce that R-4B is now ranked #1 among all open-source models on the [OpenCompass Multi-modal Reasoning Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning/?m=REALTIME)!
- **[2025.08.11]** **Rank #1!** R-4B ranks first among models under 20B parameters on the [OpenCompass Multi-modal Academic Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME)!
- **[2025.08.05]** **R-4B is Released!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/YannQi/R-4B).
## Quickstart
Below, we provide simple examples showing how to use R-4B with 🤗 Transformers.
### Using 🤗 Transformers to Chat
> [!NOTE]
> Users can control the model's response with the `thinking_mode` argument, which selects one of three modes: `thinking_mode="auto"` for auto-thinking, `thinking_mode="long"` for thinking, and `thinking_mode="short"` for non-thinking.
> The default is auto-thinking.
```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_path = "YannQi/R-4B"

# Load model
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    trust_remote_code=True,
).to("cuda")

# Load processor
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Define conversation messages
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "http://images.cocodataset.org/val2017/000000039769.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Apply chat template with the desired thinking mode
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode="auto",
)

# Load image
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Process inputs
inputs = processor(
    images=image,
    text=text,
    return_tensors="pt",
).to("cuda")

# Generate output
generated_ids = model.generate(**inputs, max_new_tokens=16384)
output_ids = generated_ids[0][len(inputs.input_ids[0]):]

# Decode output
output_text = processor.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

# Print result
print("Auto-Thinking Output:", output_text)
```
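When the model does decide to think, Qwen3-style models conventionally wrap the reasoning trace in `<think>…</think>` tags before the final answer. R-4B is built on Qwen/Qwen3-4B, so the same convention is a reasonable assumption, but verify it against R-4B's chat template; the helper below is a hedged sketch under that assumption, not part of the R-4B API.

```python
import re


def split_thinking(output_text: str) -> tuple[str, str]:
    """Separate a Qwen3-style reasoning trace from the final answer.

    Assumes the model wraps its reasoning in <think>...</think> tags;
    verify this against R-4B's actual chat template before relying on it.
    """
    match = re.search(r"<think>(.*?)</think>", output_text, flags=re.DOTALL)
    if match is None:
        # Non-thinking response: no reasoning trace present.
        return "", output_text.strip()
    reasoning = match.group(1).strip()
    answer = output_text[match.end():].strip()
    return reasoning, answer


# Example with a mocked thinking-mode output:
reasoning, answer = split_thinking(
    "<think>Two cats are lying on a couch.</think>The image shows two cats."
)
```

In auto-thinking mode this lets you log or display the reasoning separately while returning only the final answer to the user.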
### Using vLLM for Fast Deployment and Inference
We recommend vLLM for fast, high-throughput R-4B deployment and inference.
#### Install
R-4B currently requires the latest vLLM. Please install it from source:
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
```
#### Online Serving
> [!TIP]
> The `thinking_mode` switch is also available through the OpenAI-compatible API served by [vLLM](https://github.com/vllm-project/vllm).
> The default is `auto-thinking`.
- Serve
```bash
vllm serve \
yannqi/R-4B \
--served-model-name r4b \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.8 \
--host 0.0.0.0 \
--port 8000 \
--trust-remote-code
```
- OpenAI Chat Completions Client
```python
from openai import OpenAI

# Set the OpenAI API key and base URL to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Image supplied by URL
image_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
                },
            },
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

chat_response = client.chat.completions.create(
    model="r4b",
    messages=image_messages,
    max_tokens=16384,
    extra_body={
        "chat_template_kwargs": {"thinking_mode": "auto"},
    },
)
print("Chat response:", chat_response)
```
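The standard OpenAI vision format also accepts images as base64 data URLs in the `image_url` field, which is useful for local files the server cannot fetch over HTTP. A minimal sketch of the encoding step; the helper name and the commented file path are illustrative, not part of the R-4B or vLLM API:

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL for the image_url field."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"


# Usage with a local file (path is hypothetical):
# with open("example.jpg", "rb") as f:
#     url = to_data_url(f.read())
# ...then pass {"type": "image_url", "image_url": {"url": url}} in the
# message content, exactly as with a regular HTTP URL.
```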
## Experimental Results
<div align="center">
<img src="asset/performance.png" width="100%" alt="R-4B Performance">
</div>
1. R-4B delivers state-of-the-art perceptual ability among models of comparable size and remains competitive with larger models.
2. On evaluation sets that require complex logical reasoning and mathematical problem-solving, such as WeMath, MathVerse, and LogicVista, R-4B shows a strong performance curve, highlighting its adaptive thinking capacity for logical deduction and complex quantitative problems.
## Citation
```bibtex
@misc{yang2025r4bincentivizinggeneralpurposeautothinking,
title={R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning},
author={Qi Yang and Bolin Ni and Shiming Xiang and Han Hu and Houwen Peng and Jie Jiang},
year={2025},
eprint={2508.21113},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.21113},
}
```
## Acknowledgements
R-4B is developed based on the codebases of the following projects: [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding work. |