This is the AWQ-quantized version of UI-TARS-1.5-7B, built with AutoAWQ on an A100 (80 GB). It works with vLLM and LMDeploy.
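As a minimal sketch, the checkpoint can be loaded with vLLM's offline API as shown below. The repository id, quantization flag, and context length here are assumptions to adjust for your setup, and real GUI-agent use would pass screenshots through vLLM's multimodal chat interface rather than a plain text prompt.

```python
# Minimal sketch: loading this AWQ checkpoint with vLLM's offline API.
# Assumptions: vLLM is installed with AWQ support, and the repo id below
# matches the model you downloaded; tune max_model_len to your GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="flin775/UI-TARS-1.5-7B-AWQ",
    quantization="awq",        # load the AWQ-quantized weights
    trust_remote_code=True,
    max_model_len=8192,
)

# Text-only smoke test; GUI-agent use would supply screenshots via vLLM's
# multimodal chat API instead of a plain string prompt.
params = SamplingParams(temperature=0.0, max_tokens=128)
out = llm.generate(["List the steps to open the system Settings."], params)
print(out[0].outputs[0].text)
```

LMDeploy exposes a comparable `pipeline()` entry point that should load the same checkpoint.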

Model Description

UI-TARS-1.5-7B is an open-source multimodal agent model released by ByteDance. It achieves state-of-the-art results across a variety of standard benchmarks, demonstrating strong reasoning capabilities and notable improvements over prior models.

Code: https://github.com/bytedance/UI-TARS

Application: https://github.com/bytedance/UI-TARS-desktop

Grounding Capability Evaluation

| Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|
| ScreenSpot-V2 | 94.2 | 87.9 | 87.6 | 91.6 |
| ScreenSpotPro | 61.6 | 23.4 | 27.7 | 43.6 |

Model Scale Comparison

This table compares performance across different UI-TARS model scales on the OSWorld and ScreenSpotPro benchmarks.

| Benchmark Type | Benchmark | UI-TARS-72B-DPO | UI-TARS-1.5-7B | UI-TARS-1.5 |
|---|---|---|---|---|
| Computer Use | OSWorld | 24.6 | 27.5 | 42.5 |
| GUI Grounding | ScreenSpotPro | 38.1 | 49.6 | 61.6 |

The released UI-TARS-1.5-7B focuses primarily on enhancing general computer use capabilities and is not specifically optimized for game-based scenarios, where UI-TARS-1.5 still holds a significant advantage.
