FolSpark committed 9f92518 (verified; parent 9ad3593): Create README.md

---
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
- openai/clip-vit-large-patch14
- stabilityai/stable-diffusion-2-1
tags:
- Unified-models
license: apache-2.0
---

GitHub: https://github.com/FolSpark/DreamLLM-Qwen2.5

A collection of self-trained DreamLLM models built on Qwen2.5-7B-Instruct.

# Model performance

The DPO variant is trained on preference data from the [MM-RLHF](https://huggingface.co/datasets/yifanzhang114/MM-RLHF) dataset.

An asterisk (\*) after a model name indicates that only the LLaVA-1.5 comprehension data was used in that model's third training stage.

Results for Vicuna-CLIP-SD2.1 and Vicuna-CLIP-SD2.1\* are taken from the [DreamLLM paper](https://openreview.net/forum?id=y01KGvd9Bw).
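For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective computed from sequence log-probabilities. It is an illustration only, not the training code used for the -DPO model: the function names `softplus`, `dpo_loss` and `mean_dpo_loss` and the default `beta=0.1` are hypothetical, and the hyperparameters actually used with MM-RLHF are not stated here.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; the value used for the -DPO model is not stated
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is identical to softplus(-logits)
    return softplus(-logits)


def mean_dpo_loss(batch: Sequence[Sequence[float]], beta: float = 0.1) -> float:
    """Average dpo_loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*pair, beta=beta) for pair in batch) / len(batch)
```

When the policy and reference log-probabilities are identical, every margin is zero and the loss equals log 2 (about 0.693); raising the chosen response's log-probability relative to the reference pushes the loss below that value.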

## Multimodal Comprehension Assessment

Captioning is measured on COCO and I2Paragraph, VQA on VQAv2, OKVQA, VizWiz and TextVQA, and overall comprehension on MM-Vet. Bold numbers mark the best result in each column.

| Method | COCO | I2Paragraph | VQAv2 | OKVQA | VizWiz | TextVQA | MM-Vet |
|---|---|---|---|---|---|---|---|
| [**Qwen-InternViT-SD3.5**](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-InternViT-SD3.5) | 106.4 | 10.7 | 73.9 | **54.2** | 49.1 | 54.8 | 44.0 |
| [**Qwen-InternViT-SD3.5\***](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-InternViT-SD3.5-CompreOnly) | 102.1 | 10.9 | 73.0 | 53.6 | 48.6 | 55.2 | **45.7** |
| [**Qwen-InternViT-SD3.5-DPO**](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-InternViT-SD3.5-DPO) | 64.6 | 11.6 | **74.2** | 50.9 | 48.9 | **55.8** | 44.7 |
| | | | | | | | |
| [Qwen-CLIP-SD3.5](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-CLIP-SD3.5) | 99.9 | 9.7 | 72.9 | 52.3 | 49.0 | 44.0 | 39.8 |
| [Qwen-CLIP-SD3.5\*](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-CLIP-SD3.5-CompreOnly) | 99.1 | 10.2 | 72.7 | 51.1 | 49.1 | 43.9 | 42.1 |
| | | | | | | | |
| [Qwen-CLIP-SD2.1](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-CLIP-SD2.1) | 82.8 | 9.1 | 72.5 | 52.4 | 49.4 | 43.6 | 42.0 |
| [Qwen-CLIP-SD2.1\*](https://huggingface.co/FolSpark/DreamLLM-Qwen2.5-CLIP-SD2.1-CompreOnly) | 97.3 | 10.8 | 72.4 | 50.4 | **49.9** | 43.2 | 39.0 |
| | | | | | | | |
| Vicuna-CLIP-SD2.1 | **115.4** | **17.4** | 56.6 | 44.3 | 45.8 | 34.9 | 35.9 |
| Vicuna-CLIP-SD2.1\* | 103.7 | 8.4 | 72.9 | 52.2 | 49.3 | 41.8 | 36.6 |

## Image Generation Evaluation

| Method | MS-COCO FID ↓ |
|---|---|
| Qwen-InternViT-SD3.5-Stage1 | 11.72 |
| Qwen-InternViT-SD3.5 | **11.11** |
| Qwen-InternViT-SD3.5-DPO | 11.33 |
| | |
| Qwen-CLIP-SD3.5-Stage1 | 11.72 |
| Qwen-CLIP-SD3.5 | 11.61 |
| | |
| Qwen-CLIP-SD2.1-Stage1 | 13.94 |
| Qwen-CLIP-SD2.1 | 12.26 |
| | |
| Vicuna-CLIP-SD2.1-Stage1 | 8.76 (+~2) |
| Vicuna-CLIP-SD2.1 | 8.46 (+~2) |
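
In the original DreamLLM paper, Vicuna-CLIP-SD2.1-Stage1 and Vicuna-CLIP-SD2.1 were each run 8 times, and for each prompt the best of the 8 generated images was kept. All of my models were tested only once, so the Vicuna scores are estimated to be about 2 to 3 points better than a comparable single-run result would be; this is what the "(+~2)" annotation in the table indicates.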
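The gap comes from the selection protocol: keeping the highest-scoring of several samples per prompt never gives a lower per-image score than the first sample alone. The sketch below illustrates this with a hypothetical `best_of_n` helper. `generate` and `score` are placeholders for a text-to-image model and whatever per-image criterion is used for selection (the criterion used in the DreamLLM paper is not stated here), and the toy demo replaces both with seeded random numbers.

```python
import random
from typing import Callable, List, Sequence, TypeVar

ImageT = TypeVar("ImageT")


def best_of_n(
    prompts: Sequence[str],
    generate: Callable[[str, int], ImageT],
    score: Callable[[str, ImageT], float],
    n: int = 8,
) -> List[ImageT]:
    """For each prompt, draw n samples (seeds 0..n-1) and keep the highest-scoring one.

    n=1 corresponds to a single evaluation run; n=8 mirrors the best-of-8
    protocol described above. `generate` and `score` are hypothetical hooks.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    selected: List[ImageT] = []
    for prompt in prompts:
        candidates = [generate(prompt, seed) for seed in range(n)]
        selected.append(max(candidates, key=lambda img: score(prompt, img)))
    return selected


if __name__ == "__main__":
    # Toy stand-ins: an "image" is a quality value drawn deterministically per (prompt, seed).
    def toy_generate(prompt: str, seed: int) -> float:
        return random.Random(f"{prompt}-{seed}").gauss(0.0, 1.0)

    def toy_score(prompt: str, image: float) -> float:
        return image

    prompts = [f"caption {i}" for i in range(1000)]
    for n in (1, 8):
        picked = best_of_n(prompts, toy_generate, toy_score, n=n)
        print(f"n={n}: mean score {sum(picked) / len(picked):.3f}")
```

Because seed 0 is always among the eight candidates, the best-of-8 pick is never worse than the single-run pick for the same prompt, which is why best-of-8 and single-run numbers are not directly comparable.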