Abstract
A survey of graphical user interface agents powered by large foundation models, detailing benchmarks, metrics, architectures, training methods, and future challenges in automating human-computer interaction.
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.