</div>
This repository contains the **LoRA checkpoint** for **SPORT**, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization.
We finetuned **Qwen2-VL-7B-Instruct** using **LoRA adapters** and **Direct Preference Optimization (DPO)**, making the model more effective at reasoning about multimodal tasks and aligning with preference signals.
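For intuition, the DPO objective can be sketched for a single preference pair. This is an illustrative minimal version, not the repository's training code, and the `beta` default is an assumption: it takes sequence log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    pi_* are log-probs under the policy being trained; ref_* are log-probs
    under the frozen reference model. beta scales the implicit reward margin.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy favors the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At equality (policy == reference) the loss is exactly log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Minimizing this loss pushes the policy to assign relatively higher likelihood to preferred responses without training an explicit reward model.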
---
## 📋 Key Features
* **LoRA Fine-tuning**: Lightweight finetuning on top of Qwen2-VL-7B-Instruct for efficient adaptation.
* **DPO Training**: Preference-based optimization for stronger alignment without human annotations.
* **Task Synthesis**: Multimodal task generation via LLMs for broad coverage.
* **Step Exploration**: Multiple candidate actions sampled per decision point.
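A typical way to load a LoRA checkpoint like this one is to attach the adapter to the base model with `peft`. This is a hedged sketch, not the repository's own loading code: the adapter path below is a placeholder, and it assumes recent `transformers` and `peft` installs.

```python
# Illustrative only: "path/or/repo-id/of/this-adapter" is a placeholder,
# not the actual repository id.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the base model, then wrap it with the LoRA adapter weights.
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/or/repo-id/of/this-adapter")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
```

After loading, inference follows the usual Qwen2-VL chat flow via the processor; `model.merge_and_unload()` can fold the adapter into the base weights if a standalone model is preferred.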