Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents Paper • 2405.14573 • Published May 23, 2024