Update README.md
README.md CHANGED

@@ -1,12 +1,19 @@
 ---
-
+base_model:
+- Qwen/Qwen2-VL-2B-Instruct
 datasets:
 - rp-yu/VPT_Datasets
 language:
 - en
+library_name: transformers
+license: apache-2.0
 metrics:
 - accuracy
-
-
-
-
+pipeline_tag: image-text-to-text
+---
+
+# Introducing Visual Perception Token into Multimodal Large Language Model
+
+This repository contains models based on the paper [Introducing Visual Perception Token into Multimodal Large Language Model](https://arxiv.org/abs/2502.17425). These models utilize Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).
+
+Code: https://github.com/yu-rp/VisualPerceptionToken
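The metadata added in this commit declares Qwen/Qwen2-VL-2B-Instruct as the base model, transformers as the library, and image-text-to-text as the pipeline tag. A minimal inference sketch under those assumptions follows; the repo id and image path are placeholders not confirmed by this commit, and the checkpoint is assumed to load through the standard Qwen2-VL path in transformers.

```python
# Minimal usage sketch, assuming this checkpoint loads via the standard
# Qwen2-VL classes in transformers (base_model: Qwen/Qwen2-VL-2B-Instruct,
# pipeline_tag: image-text-to-text).
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "rp-yu/Qwen2-VL-2B-VPT"  # placeholder; substitute the actual model id
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt containing one image and one question.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens (after the prompt).
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

Slicing off the prompt length before decoding keeps the printed output to the model's answer rather than echoing the chat template.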