YannQi/R-4B

@@ -6,9 +6,14 @@ base_model:
 - Qwen/Qwen3-4B
 pipeline_tag: visual-question-answering
 ---
-# R-4B
-[[📚 Arxiv Paper (Coming soon)](https://huggingface.co/YannQi/R-4B))] [[🤗 Hugging Face](https://huggingface.co/YannQi/R-4B)]  [[🤖️ ModelScope](https://huggingface.co/YannQi/R-4B)] [[💻 Code](https://github.com/yannqi/R-4B)]
 <div align="center">
   <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
@@ -28,12 +33,6 @@ R-4B achieves state-of-the-art performance among models of its scale. In evaluat
 Below, we provide simple examples to show how to use R-4B with 🤗 Transformers.
-<!-- The code of R-4B has been in the latest Hugging face transformers and we advise you to build from source with command: （Coming Soon!）
-```
-pip install git+https://github.com/huggingface/transformers accelerate
-``` -->
 ### Using 🤗 Transformers to Chat
 > [!NOTE]
@@ -104,6 +103,183 @@ print("Auto Thinking Output:", output_text_auto_thinking)
 </details>
 ## 📈 Experimental Results
 <div align="center">

 - Qwen/Qwen3-4B
 pipeline_tag: visual-question-answering
 ---
+# R-4B: Incentivizing General-Purpose Auto-Thinking Capabilities in MLLMs via Bi-Mode Integration
+[[📚 Arxiv Paper (Coming soon)](https://huggingface.co/YannQi/R-4B)] [[🤗 Hugging Face](https://huggingface.co/YannQi/R-4B)]  [[🤖️ ModelScope](https://huggingface.co/YannQi/R-4B)] [[💻 Code](https://github.com/yannqi/R-4B)]
+<div align="center">
+<img src="asset/logo_R_4B.png" alt="logo" width="38" />
+</div>
 <div align="center">
   <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
 Below, we provide simple examples to show how to use R-4B with 🤗 Transformers.
 ### Using 🤗 Transformers to Chat
 > [!NOTE]
 </details>
+### Using vLLM for Fast R-4B Deployment and Inference
+- We recommend using vLLM for fast R-4B deployment and inference.
+#### Install
+R-4B requires a custom build of vLLM. Please install it from a local clone of the source:
+```bash
+git clone https://github.com/yannqi/vllm.git
+cd vllm
+VLLM_USE_PRECOMPILED=1 uv pip install --editable .
+```
+#### Offline Inference
+```python
+from transformers import AutoProcessor
+from vllm import LLM, SamplingParams
+from PIL import Image
+import requests
+from io import BytesIO
+def load_image(image_path):
+    """Load image from URL or local path"""
+    if image_path.startswith(('http://', 'https://')):
+        response = requests.get(image_path, timeout=10)
+        response.raise_for_status()
+        image = Image.open(BytesIO(response.content))
+    else:
+        image = Image.open(image_path)
+    # Convert RGBA to RGB if needed
+    if image.mode == "RGBA":
+        background = Image.new('RGB', image.size, (255, 255, 255))
+        background.paste(image, mask=image.split()[-1])
+        image = background
+    return image.convert("RGB")
+def main():
+    model_path = "YannQi/R-4B"
+    llm = LLM(
+        model=model_path,
+        limit_mm_per_prompt={"image": 5},
+        trust_remote_code=True,
+        tensor_parallel_size=1,
+        gpu_memory_utilization=0.8,
+    )
+    sampling_params = SamplingParams(
+        temperature=0.8,
+        max_tokens=16384,
+    )
+    image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+    image = load_image(image_url)
+    text = "Describe this image."
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {"type": "image", "image": image},
+                {"type": "text", "text": text},
+            ],
+        },
+    ]
+    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+    prompt = processor.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True,
+    )
+    mm_data = {"image": image}
+    llm_inputs = {
+        "prompt": prompt,
+        "multi_modal_data": mm_data,
+    }
+    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
+    generated_text = outputs[0].outputs[0].text
+    print(generated_text)
+if __name__ == '__main__':
+    main()
+```
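The `limit_mm_per_prompt={"image": 5}` setting above allows up to five images per prompt. The following is a minimal sketch of how a multi-image request could be assembled. It assumes the R-4B chat template accepts several `{"type": "image"}` entries in one user message, which this README does not confirm, and it relies on vLLM accepting a list of images under `multi_modal_data["image"]`. The helper name `build_multi_image_request` is hypothetical.

```python
from typing import Any

MAX_IMAGES_PER_PROMPT = 5  # mirrors limit_mm_per_prompt={"image": 5} above


def build_multi_image_request(
    images: list[Any], text: str, limit: int = MAX_IMAGES_PER_PROMPT
) -> tuple[list[dict], dict]:
    """Return (messages, mm_data) for one user turn with several images.

    `images` would be PIL images in practice; they are treated as opaque here.
    """
    if not images:
        raise ValueError("at least one image is required")
    if len(images) > limit:
        raise ValueError(f"got {len(images)} images, limit is {limit}")
    # One image entry per picture, followed by the text instruction.
    content = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": text})
    messages = [{"role": "user", "content": content}]
    # A single image is passed bare (as in the example above), several as a list.
    mm_data = {"image": images[0] if len(images) == 1 else list(images)}
    return messages, mm_data
```

Under these assumptions, `messages` would go to `processor.apply_chat_template(...)` and `mm_data` would replace the `mm_data` dictionary in `llm_inputs`. Checking the image count up front gives a clearer error than letting the engine reject the request.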
+#### Online Serving
+- Serve
+```bash
+vllm serve \
+    YannQi/R-4B \
+    --served-model-name rvl \
+    --tensor-parallel-size 8 \
+    --gpu-memory-utilization 0.8 \
+    --host 0.0.0.0 \
+    --port 8000 \
+    --trust-remote-code
+```
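Loading the model can take a while, so a client may want to wait until the server reports the served model before sending requests. The sketch below polls the OpenAI-compatible `/v1/models` endpoint and looks for the name given by `--served-model-name`. The function names, the 2-second polling interval, and the default timeout are illustrative assumptions, not part of R-4B or vLLM.

```python
import json
import time
import urllib.error
import urllib.request


def served_model_names(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def wait_until_ready(base_url: str, model: str, timeout_s: float = 300.0) -> bool:
    """Poll {base_url}/models until `model` is listed or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if model in served_model_names(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the body was not JSON; retry
        time.sleep(2)
    return False


# Example (requires the server started with the command above):
# wait_until_ready("http://localhost:8000/v1", "rvl")
```

The function returns `False` instead of raising when the deadline passes, so the caller can decide whether to retry or abort.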
+- OpenAI Chat Completion Client
+```python
+import base64
+from openai import OpenAI
+# Set OpenAI's API key and API base to use vLLM's API server.
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+# Option 1: pass the image as a URL
+image_messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
+                },
+            },
+            {"type": "text", "text": "Describe this image."},
+        ],
+    },
+]
+chat_response = client.chat.completions.create(
+    model="rvl",
+    messages=image_messages,
+)
+print("Chat response:", chat_response)
+# Option 2: pass a local image as a base64-encoded data URL
+image_path = "/path/to/local/image.png"
+with open(image_path, "rb") as f:
+    encoded_image = base64.b64encode(f.read())
+encoded_image_text = encoded_image.decode("utf-8")
+image_messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": f"data:image;base64,{encoded_image_text}"
+                },
+            },
+            {"type": "text", "text": "Describe this image."},
+        ],
+    },
+]
+chat_response = client.chat.completions.create(
+    model="rvl",
+    messages=image_messages,
+)
+print("Chat response:", chat_response)
+```
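The base64 example above builds the data URL by hand. As a hedged sketch, the two helpers below wrap that step and add an explicit MIME type guessed from the file extension. The names `image_to_data_url` and `user_message` are hypothetical, and the `image/png` fallback is an assumption chosen to match the `.png` path used above.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str, default_mime: str = "image/png") -> str:
    """Encode a local image file as a data URL with an explicit MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime  # fall back when the extension is unknown
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def user_message(image_url: str, text: str) -> dict:
    """Build one user turn in the same shape as `image_messages` above."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": text},
        ],
    }
```

With these helpers, a request could be sent as `client.chat.completions.create(model="rvl", messages=[user_message(image_to_data_url(image_path), "Describe this image.")])`. To print only the generated text rather than the whole response object, read `chat_response.choices[0].message.content`.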
 ## 📈 Experimental Results
 <div align="center">