</div>
This repository contains the **LoRA checkpoint** for **SPORT**, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization.
We finetuned **Qwen2-VL-7B-Instruct** using **LoRA adapters** and **Direct Preference Optimization (DPO)**, making the model more effective at reasoning about multimodal tasks and aligning with preference signals.
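For intuition, the DPO objective can be sketched for a single preference pair. This is an illustrative minimal version, not the repository's training code, and the `beta` default is an assumption: it takes sequence log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    pi_* are log-probs under the policy being trained; ref_* are log-probs
    under the frozen reference model. beta scales the implicit reward margin.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy favors the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At equality (policy == reference) the loss is exactly log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Minimizing this loss pushes the policy to assign relatively higher likelihood to preferred responses without training an explicit reward model.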
---
## 📋 Key Features
* **LoRA Fine-tuning**: Lightweight finetuning on top of Qwen2-VL-7B-Instruct for efficient adaptation.
* **DPO Training**: Preference-based optimization for stronger alignment without human annotations.
* **Task Synthesis**: Multimodal task generation via LLMs for broad coverage.
* **Step Exploration**: Multiple candidate actions sampled per decision point.
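A typical way to load a LoRA checkpoint like this one is to attach the adapter to the base model with `peft`. This is a hedged sketch, not the repository's own loading code: the adapter path below is a placeholder, and it assumes recent `transformers` and `peft` installs.

```python
# Illustrative only: "path/or/repo-id/of/this-adapter" is a placeholder,
# not the actual repository id.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the base model, then wrap it with the LoRA adapter weights.
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/or/repo-id/of/this-adapter")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
```

After loading, inference follows the usual Qwen2-VL chat flow via the processor; `model.merge_and_unload()` can fold the adapter into the base weights if a standalone model is preferred.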