yuhangzang committed · verified · Commit 2e39016 · 1 Parent(s): a5db8ff

Update README.md

Files changed (1):
  1. README.md +15 -15
README.md CHANGED
@@ -7,6 +7,11 @@ library_name: transformers
 tags:
 - multimodal
 - image caption
+ - captioning
+ datasets:
+ - internlm/CapRL-2M
+ base_model:
+ - OpenGVLab/InternVL3_5-8B
 ---


@@ -20,21 +25,17 @@ tags:

 ## 📢 News
 We are working on even stronger base models and upgrading our training recipe — stay tuned!
- - 🔥 [10/15/2025] The total downloads of the CapRL-related model and dataset reached 6,000 within just 20 days!
- - 🚀 [10/15/2025] We are excited to announce the release of **CapRL-InternVL3.5-8B**, whose image captioning capability outperforms Qwen2.5-VL-72B!
- - 🚀 [10/15/2025] We release QA curation code.
- - 🚀 [09/25/2025] We release **CapRL** repository, model, evaluation code and dataset.
-
- Based on the same recipe as CapRL-3B, we used InternVL3.5-8B as the policy model and obtained **CapRL-InternVL3.5-8B** through CapRL.
-
-
- CapRL-3B-GGUF is static quants version, and CapRL-3B-i1-GGUF is weighted/imatrix quants version. Thanks for their contribution!
+ - 🔥 [10/15/2025] The total downloads of the CapRL-related [models and dataset](https://huggingface.co/collections/long-xing1/caprl-68d64ac32ded31596c36e189) reached 6,000 within just 20 days!
+ - 🚀 [10/15/2025] We are excited to announce the release of **[CapRL-InternVL3.5-8B](https://huggingface.co/internlm/CapRL-InternVL3.5-8B)**, whose image captioning capability outperforms Qwen2.5-VL-72B!
+ - 🚀 [10/15/2025] Thanks to [mradermacher](https://huggingface.co/mradermacher) for the contribution! [CapRL-3B-GGUF](https://huggingface.co/mradermacher/CapRL-3B-GGUF) is the static quants version, and [CapRL-3B-i1-GGUF](https://huggingface.co/mradermacher/CapRL-3B-i1-GGUF) is the weighted/imatrix quants version.
+ - 🚀 [10/15/2025] We release the [QA curation code](https://github.com/InternLM/CapRL).
+ - 🚀 [09/25/2025] We release the **CapRL** repository, the [CapRL-3B model](https://huggingface.co/internlm/CapRL-3B), [evaluation code](https://github.com/InternLM/CapRL) and [dataset](https://huggingface.co/datasets/internlm/CapRL-2M).


 ## Introduction
- We are excited to introduce CapRL-3B, a lightweight 3B image captioner that achieves perception capabilities comparable to Qwen2.5-VL-72B.
+ Based on the same recipe as [CapRL-3B](https://huggingface.co/internlm/CapRL-3B), we used [InternVL3.5-8B](https://huggingface.co/OpenGVLab/InternVL3_5-8B) as the policy model and obtained **[CapRL-InternVL3.5-8B](https://huggingface.co/yuhangzang/CapRL-InternVL3.5-8B)** through CapRL.

- This is the first study of applying Reinforcement Learning with Verifiable Rewards for the
+ CapRL is the first study of applying Reinforcement Learning with Verifiable Rewards for the
 open-ended and subjective image captioning task. Unlike traditional Supervised Fine-Tuning, which
 can lead to models memorizing a limited set of annotated captions, our method allows the model to
 explore and generate a broader range of creative and general descriptions.
@@ -43,8 +44,8 @@ stage uses LVLMs to generate rich and accurate captions. Subsequently, the secon
 caption quality by using a vision-only LLM to perform the QA task. We also created a specific QA
 curation pipeline to ensure the quality of the questions and answers used for the second stage.

- By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully
- filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
+ By employing the CapRL training framework, initializing with the [InternVL3.5-8B](https://huggingface.co/OpenGVLab/InternVL3_5-8B) model, and using a carefully
+ filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-InternVL3.5-8B.

 <p align="center">
 <img src="./assets/teaser.png" width="750"/>
@@ -59,12 +60,11 @@ filtered 75K QA dataset as the training set, we obtained a highly capable captio
 * **Detailed description for natural images**: The outputs of CapRL-3B can perfectly cover all valid visual information while containing fewer hallucinations.

 ## Usage
- If you want to use **CapRL-3B** for captioning, you can directly follow the exact same inference approach as in [Qwen2.5-VL-series](https://github.com/QwenLM/Qwen3-VL/tree/d2240f11656bfe404b9ba56db4e51cd09f522ff1).
+ If you want to use **CapRL-InternVL3.5-8B** for captioning, you can directly follow the exact same inference approach as in [InternVL-3.5-series](https://huggingface.co/collections/internlm/internvl35-68ab285d4a1f0871ddcb75b2).

 We recommend using **vLLM** to speed up inference.


-
 ### Start an OpenAI API Service

 Run the command below to start an OpenAI-compatible API service:
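
The serve command itself falls outside this excerpt of the diff. Purely as an illustration (not part of the committed README), the sketch below shows one way to launch a vLLM OpenAI-compatible server for this model and request a caption with the standard `openai` Python client; the served model name, port, prompt, and generation settings are assumptions rather than values taken from the original.

```python
# Illustrative sketch only; not taken from the CapRL README.
# Assumed server launch (run in a shell first), for example:
#   vllm serve internlm/CapRL-InternVL3.5-8B --port 8000 --trust-remote-code
# The model name, port, prompt, and sampling settings below are assumptions.
import base64

from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local image as a data URL so it can be embedded in the chat request.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="internlm/CapRL-InternVL3.5-8B",  # must match the name the server was started with
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
                {"type": "text", "text": "Describe this image in detail."},
            ],
        }
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```

By default vLLM registers the model under the path or repo ID passed to `vllm serve`, so the `model` field in the request should match whatever name the server was launched with.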