PengxiangLi committed on
Commit c3e7f66 · verified · 1 Parent(s): d0a6d5e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -20,13 +20,13 @@ base_model:
 </div>
 
 This repository contains the **LoRA checkpoint** for **SPORT**, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization.
-We finetuned **Qwen2-VL-7B** using **LoRA adapters** and **Direct Preference Optimization (DPO)**, making the model more effective at reasoning about multimodal tasks and aligning with preference signals.
+We finetuned **Qwen2-VL-7B-Instruct** using **LoRA adapters** and **Direct Preference Optimization (DPO)**, making the model more effective at reasoning about multimodal tasks and aligning with preference signals.
 
 ---
 
 ## 📋 Key Features
 
-* **LoRA Fine-tuning**: Lightweight finetuning on top of Qwen2-VL-7B for efficient adaptation.
+* **LoRA Fine-tuning**: Lightweight finetuning on top of Qwen2-VL-7B-Instruct for efficient adaptation.
 * **DPO Training**: Preference-based optimization for stronger alignment without human annotations.
 * **Task Synthesis**: Multimodal task generation via LLMs for broad coverage.
 * **Step Exploration**: Multiple candidate actions sampled per decision point.
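The README says the checkpoint consists of LoRA adapters on top of Qwen2-VL-7B-Instruct. The sketch below shows what that means mechanically. It is a minimal pure-Python version of the standard LoRA forward pass, h = Wx + (alpha / r) · B(Ax). The frozen base weight W stays untouched, and only the low-rank factors A and B are trained. The shapes, rank `r`, and `alpha` are illustrative assumptions, not this checkpoint's actual adapter configuration. The helper names `matvec` and `lora_forward` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Vector:
    """Standard LoRA forward pass (illustrative sketch, not SPORT's code).

    w: frozen base weight, shape (d_out x d_in)
    a: trainable down-projection, shape (r x d_in)
    b: trainable up-projection, shape (d_out x r)
    The low-rank update is scaled by alpha / r.
    """
    r = len(a)                      # adapter rank
    scale = alpha / r
    base = matvec(w, x)             # frozen path: W x
    delta = matvec(b, matvec(a, x)) # adapter path: B (A x)
    return [h + scale * d for h, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[1.0, 0.0]]                # rank r = 1
    B_zero = [[0.0], [0.0]]         # B is commonly zero-initialized
    print(lora_forward([1.0, 1.0], W, A, B_zero, alpha=2.0))  # same as W x
    print(lora_forward([1.0, 1.0], W, A, [[1.0], [2.0]], alpha=2.0))
```

Because B starts at zero in the usual LoRA initialization, the adapted model initially behaves exactly like the base model. Training then moves only the small A and B matrices, which is why the published checkpoint can be a lightweight adapter rather than a full copy of the 7B weights.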
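The README also names Direct Preference Optimization (DPO) as the training objective. As a rough illustration, the sketch below computes the standard per-pair DPO loss. It takes sequence log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the trainable policy and a frozen reference model. This is the generic DPO formula, not code from the SPORT repository. The default `beta = 0.1` is an assumed value, and the function names are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; the checkpoint's value is not stated
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    # Policy shifts probability toward the chosen response: the loss drops.
    print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The loss needs only log-probabilities from the two models and a preference label for each pair. So preference pairs built from sampled candidate actions can drive training directly, with no separately trained reward model.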