This is the AWQ model of UI-TARS-1.5-7B, built using AutoAWQ on A100(80G), works with vllm and lmdeploy.
Model Description
UI-TARS-1.5-7B is an open-source multimodal agent model released by ByteDance. It achieves state-of-the-art results across a variety of standard benchmarks, demonstrating strong reasoning capabilities and notable improvements over prior models.
Code: https://github.com/bytedance/UI-TARS
Application: https://github.com/bytedance/UI-TARS-desktop
Grounding Capability Evaluation
| Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|
| ScreensSpot-V2 | 94.2 | 87.9 | 87.6 | 91.6 |
| ScreenSpotPro | 61.6 | 23.4 | 27.7 | 43.6 |
Model Scale Comparison
This table compares performance across different model scales of UI-TARS on the OSworld benchmark.
| Benchmark Type | Benchmark | UI-TARS-72B-DPO | UI-TARS-1.5-7B | UI-TARS-1.5 |
|---|---|---|---|---|
| Computer Use | OSWorld | 24.6 | 27.5 | 42.5 |
| GUI Grounding | ScreenSpotPro | 38.1 | 49.6 | 61.6 |
The released UI-TARS-1.5-7B focuses primarily on enhancing general computer use capabilities and is not specifically optimized for game-based scenarios, where the UI-TARS-1.5 still holds a significant advantage.
- Downloads last month
- 178
Model tree for flin775/UI-TARS-1.5-7B-AWQ
Base model
ByteDance-Seed/UI-TARS-1.5-7B