YannQi committed · verified
Commit e52fb35 · 1 Parent(s): 187b607

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +184 -8
README.md CHANGED
@@ -6,9 +6,14 @@ base_model:
6
  - Qwen/Qwen3-4B
7
  pipeline_tag: visual-question-answering
8
  ---
9
- # R-4B
10
 
11
- [[πŸ“š Arxiv Paper (Coming soon)](https://huggingface.co/YannQi/R-4B))] [[πŸ€— Hugging Face](https://huggingface.co/YannQi/R-4B)] [[πŸ€–οΈ ModelScope](https://huggingface.co/YannQi/R-4B)] [[πŸ’» Code](https://github.com/yannqi/R-4B)]
12
 
13
  <div align="center">
14
  <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
@@ -28,12 +33,6 @@ R-4B achieves state-of-the-art performance among models of its scale. In evaluat
28
 
29
  Below, we provide simple examples to show how to use R-4B with πŸ€— Transformers.
30
 
31
- <!-- The code of R-4B has been in the latest Hugging face transformers and we advise you to build from source with command: (Coming Soon!οΌ‰
32
-
33
- ```
34
- pip install git+https://github.com/huggingface/transformers accelerate
35
- ``` -->
36
-
37
  ### Using πŸ€— Transformers to Chat
38
 
39
  > [!NOTE]
@@ -104,6 +103,183 @@ print("Auto Thinking Output:", output_text_auto_thinking)
104
 
105
  </details>
106
 
107
  ## πŸ“ˆ Experimental Results
108
 
109
  <div align="center">
 
6
  - Qwen/Qwen3-4B
7
  pipeline_tag: visual-question-answering
8
  ---
 
9
 
10
+ # R-4B: Incentivizing General-Purpose Auto-Thinking Capabilities in MLLMs via Bi-Mode Integration
11
+
12
+ [[πŸ“š Arxiv Paper (Coming soon)](https://huggingface.co/YannQi/R-4B)] [[πŸ€— Hugging Face](https://huggingface.co/YannQi/R-4B)] [[πŸ€–οΈ ModelScope](https://huggingface.co/YannQi/R-4B)] [[πŸ’» Code](https://github.com/yannqi/R-4B)]
13
+
14
+ <div align="center">
15
+ <img src="asset/logo_R_4B.png" alt="logo" width="38" />
16
+ </div>
17
 
18
  <div align="center">
19
  <img src="asset/R-4B.png" width="100%" alt="R-4B Performance">
 
33
 
34
  Below, we provide simple examples to show how to use R-4B with πŸ€— Transformers.
35
 
36
  ### Using πŸ€— Transformers to Chat
37
 
38
  > [!NOTE]
 
103
 
104
  </details>
105
 
+ ### Using vLLM for Fast R-4B Deployment and Inference
+
+ We recommend using vLLM for fast R-4B deployment and inference.
+
+ #### Install
+
+ R-4B currently requires a custom vLLM build. Please install it from source:
+
+ ```bash
+ git clone https://github.com/yannqi/vllm.git
+ cd vllm
+ VLLM_USE_PRECOMPILED=1 uv pip install --editable .
+ ```
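+
+ To confirm that the editable build is the one Python actually imports, a quick sanity check (not part of the original setup steps) is:
+
+ ```bash
+ python -c "import vllm; print(vllm.__version__, vllm.__file__)"
+ ```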
+
+ #### Offline Inference
+
+ ```python
+ from io import BytesIO
+
+ import requests
+ from PIL import Image
+ from transformers import AutoProcessor
+ from vllm import LLM, SamplingParams
+
+
+ def load_image(image_path):
+     """Load an image from a URL or a local path."""
+     if image_path.startswith(('http://', 'https://')):
+         response = requests.get(image_path, timeout=10)
+         response.raise_for_status()
+         image = Image.open(BytesIO(response.content))
+     else:
+         image = Image.open(image_path)
+
+     # Convert RGBA to RGB if needed
+     if image.mode == "RGBA":
+         background = Image.new('RGB', image.size, (255, 255, 255))
+         background.paste(image, mask=image.split()[-1])
+         image = background
+
+     return image.convert("RGB")
+
+
+ def main():
+     model_path = "YannQi/R-4B"
+
+     # Initialize the vLLM engine.
+     llm = LLM(
+         model=model_path,
+         limit_mm_per_prompt={"image": 5},
+         trust_remote_code=True,
+         tensor_parallel_size=1,
+         gpu_memory_utilization=0.8,
+     )
+
+     sampling_params = SamplingParams(
+         temperature=0.8,
+         max_tokens=16384,
+     )
+
+     image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+     image = load_image(image_url)
+     text = "Describe this image."
+
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image", "image": image},
+                 {"type": "text", "text": text},
+             ],
+         },
+     ]
+
+     # Build the prompt string with the model's chat template.
+     processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+     prompt = processor.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+
+     mm_data = {"image": image}
+     llm_inputs = {
+         "prompt": prompt,
+         "multi_modal_data": mm_data,
+     }
+
+     outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
+     generated_text = outputs[0].outputs[0].text
+
+     print(generated_text)
+
+
+ if __name__ == '__main__':
+     main()
+ ```
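+
+ Since `llm.generate` accepts a list of inputs, several image-text requests can be batched in a single call. Below is a minimal sketch, assuming the `llm`, `processor`, `sampling_params`, and `load_image` objects from the script above are in scope (e.g., in an interactive session); the repeated URL is only a placeholder for your own images:
+
+ ```python
+ image_urls = [
+     "http://images.cocodataset.org/val2017/000000039769.jpg",
+     "http://images.cocodataset.org/val2017/000000039769.jpg",  # placeholder: swap in your own image
+ ]
+
+ batch_inputs = []
+ for url in image_urls:
+     img = load_image(url)
+     msgs = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image", "image": img},
+                 {"type": "text", "text": "Describe this image."},
+             ],
+         },
+     ]
+     prompt = processor.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
+     batch_inputs.append({"prompt": prompt, "multi_modal_data": {"image": img}})
+
+ # A single generate() call schedules all requests together.
+ outputs = llm.generate(batch_inputs, sampling_params=sampling_params)
+ for out in outputs:
+     print(out.outputs[0].text)
+ ```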
+
+ #### Online Serving
+
+ - Serve:
+
+ ```bash
+ vllm serve \
+     YannQi/R-4B \
+     --served-model-name rvl \
+     --tensor-parallel-size 8 \
+     --gpu-memory-utilization 0.8 \
+     --host 0.0.0.0 \
+     --port 8000 \
+     --trust-remote-code
+ ```
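+
+ - (Optional) Verify the server is up. vLLM exposes the standard OpenAI-compatible model list endpoint, which should report the served model name `rvl`:
+
+ ```bash
+ curl http://localhost:8000/v1/models
+ ```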
+
+ - OpenAI Chat Completion Client:
+
+ ```python
+ import base64
+
+ from openai import OpenAI
+
+ # Set OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ # Image passed as a URL.
+ image_messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image_url",
+                 "image_url": {
+                     "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
+                 },
+             },
+             {"type": "text", "text": "Describe this image."},
+         ],
+     },
+ ]
+
+ chat_response = client.chat.completions.create(
+     model="rvl",
+     messages=image_messages,
+ )
+ print("Chat response:", chat_response)
+
+ # Image passed as base64-encoded data.
+ image_path = "/path/to/local/image.png"
+ with open(image_path, "rb") as f:
+     encoded_image = base64.b64encode(f.read())
+ encoded_image_text = encoded_image.decode("utf-8")
+ image_messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image_url",
+                 "image_url": {
+                     "url": f"data:image/png;base64,{encoded_image_text}"
+                 },
+             },
+             {"type": "text", "text": "Describe this image."},
+         ],
+     },
+ ]
+
+ chat_response = client.chat.completions.create(
+     model="rvl",
+     messages=image_messages,
+ )
+ print("Chat response:", chat_response)
+ ```
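+
+ The OpenAI-compatible server also supports streaming. A minimal sketch reusing the `client` and `image_messages` from the snippet above (`stream=True` is the standard OpenAI client flag for incremental responses):
+
+ ```python
+ # Stream the reply token by token instead of waiting for the full completion.
+ stream = client.chat.completions.create(
+     model="rvl",
+     messages=image_messages,
+     stream=True,
+ )
+ for chunk in stream:
+     if chunk.choices and chunk.choices[0].delta.content:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+ print()
+ ```
+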
283
  ## πŸ“ˆ Experimental Results
284
 
285
  <div align="center">