ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published 25 days ago
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published 4 days ago
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published 5 days ago
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth? Paper • 2507.19132 • Published Jul 25
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25
LeVo: High-Quality Song Generation with Multi-Preference Alignment Paper • 2506.07520 • Published Jun 9