Create README.md
README.md
ADDED
---
language:
- en
base_model:
- mair-lab/sft-simple
---

# EARL - RL Fine-tuned (S + C) (8B)

**Model Name:** `mair-lab/sft-simple.rl-simple-n-complex`
**Model Size:** 8B parameters
**Base Checkpoint:** [`mair-lab/sft-simple`](https://huggingface.co/mair-lab/sft-simple)
**Training Method:** Supervised Fine-Tuning (SFT) on Simple Edits → Reinforcement Learning (RL) on Simple + Complex Edits
**Datasets:** Simple Edit (S), Complex Edit (C)

This model is part of the EARL benchmark study:
📄 [EARL: The Promise of RL for Autoregressive Image Editing](https://arxiv.org/abs/2508.01119)

## Model Summary

This RL fine-tuned model builds on the SFT-simple checkpoint, using reinforcement learning to improve performance on both simple and complex edit tasks. It’s optimized using a human-aligned reward function across diverse editing instructions.

➡️ **Inference instructions:** [GitHub Repo](https://github.com/saba96/EARL?tab=readme-ov-file)
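
Before following the inference instructions, the checkpoint files need to be available locally. The following is a minimal sketch, not the authors' documented workflow: it assumes the weights are hosted on the Hugging Face Hub under the repo ID given in the **Model Name** field, and that the `huggingface_hub` package is installed. The local directory `checkpoints/earl` and both function names are hypothetical choices for illustration.

```python
from pathlib import Path

# Repo ID copied from the "Model Name" field of this model card.
REPO_ID = "mair-lab/sft-simple.rl-simple-n-complex"


def repo_url() -> str:
    """Return the Hub page URL, following the same pattern as the base checkpoint link."""
    return f"https://huggingface.co/{REPO_ID}"


def download_checkpoint(local_dir: str = "checkpoints/earl") -> Path:
    """Download every file in the model repo and return the local directory.

    `local_dir` is a hypothetical location; pick any writable path.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoint()` fetches the files once; the returned path can then be passed to whatever loading code the GitHub repository prescribes.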
| 24 |
+
|
| 25 |
+
## Full Benchmark Results
|
| 26 |
+
|
| 27 |
+
| Model | Base Model | OmniEdit | EmuEdit | AURORA | MB | VisMin | I2EBench | **AVG** |
|
| 28 |
+
|---------------------------|------------|----------|---------|--------|------|--------|----------|---------|
|
| 29 |
+
| Magicbrush | SD v1.5 | 3.43 | 3.28 | 3.01 | 3.64 | 3.48 | 3.06 | 3.32 |
|
| 30 |
+
| InstructPix2Pix | SD v1.5 | 3.97 | 3.24 | 3.05 | 3.12 | 2.94 | 3.23 | 3.26 |
|
| 31 |
+
| Aurora | SD v1.5 | 4.50 | 4.40 | 4.12 | 4.62 | 3.82 | 3.58 | 4.17 |
|
| 32 |
+
| Omnigen* | - | 5.68 | 5.00 | 4.10 | 4.68 | 4.09 | 4.68 | 4.70 |
|
| 33 |
+
| **SFT (S)** | Emu3 | 5.73 | 3.66 | 3.58 | 3.19 | 3.57 | 3.59 | 3.88 |
|
| 34 |
+
| **EARL SFT (S) → RL (S+C)** | SFT (S) | **6.39** | 4.47 | **4.27** | 4.52 | 4.93 | 4.19 | **4.80** |

> 🚀 **Highlight:** Our RL model achieves the best average score in the table, **4.80 AVG**, ahead of all supervised and diffusion baselines, with the top scores on OmniEdit, AURORA, and VisMin.
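
The AVG column can be sanity-checked against the per-benchmark scores. The sketch below assumes AVG is the unweighted mean of the six benchmark columns; the model card does not state this explicitly. The scores are copied from the table, and the variable and function names are illustrative.

```python
# Scores copied from the benchmark table, in column order:
# OmniEdit, EmuEdit, AURORA, MB, VisMin, I2EBench.
SCORES = {
    "MagicBrush": [3.43, 3.28, 3.01, 3.64, 3.48, 3.06],
    "InstructPix2Pix": [3.97, 3.24, 3.05, 3.12, 2.94, 3.23],
    "Aurora": [4.50, 4.40, 4.12, 4.62, 3.82, 3.58],
    "Omnigen*": [5.68, 5.00, 4.10, 4.68, 4.09, 4.68],
    "SFT (S)": [5.73, 3.66, 3.58, 3.19, 3.57, 3.59],
    "EARL SFT (S) -> RL (S+C)": [6.39, 4.47, 4.27, 4.52, 4.93, 4.19],
}

# AVG column as reported in the table.
REPORTED_AVG = {
    "MagicBrush": 3.32,
    "InstructPix2Pix": 3.26,
    "Aurora": 4.17,
    "Omnigen*": 4.70,
    "SFT (S)": 3.88,
    "EARL SFT (S) -> RL (S+C)": 4.80,
}


def mean(values):
    """Unweighted arithmetic mean (assumed definition of AVG)."""
    return sum(values) / len(values)


for name, scores in SCORES.items():
    recomputed = mean(scores)
    diff = abs(recomputed - REPORTED_AVG[name])
    print(f"{name:26s} recomputed={recomputed:.3f} reported={REPORTED_AVG[name]:.2f} diff={diff:.3f}")
```

Under that assumption, every recomputed mean lands within 0.01 of the reported value; the small residuals are consistent with the per-benchmark scores having been rounded before publication.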

## Use Cases

- Simple edits: object, attribute, style, and environment changes
- Complex edits: counting, spatial relation, and action changes
- Instruction-following visual transformations