- MLGym: A New Framework and Benchmark for Advancing AI Research Agents (Paper • 2502.14499)
- Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark (Paper • 2501.05444)
- Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models (Paper • 2502.14191)
- CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models (Paper • 2502.16614)

Henry Hengyuan Zhao (hhenryz)

AI & ML interests: Multimodal Reasoning, Human-AI Interaction, GUI Automation

Recent Activity
- Updated a collection, "Personal Interest" (2 days ago)
- Upvoted a paper, "From Charts to Code: A Hierarchical Benchmark for Multimodal Models" (2 days ago)
- Commented on a paper, "From Charts to Code: A Hierarchical Benchmark for Multimodal Models" (2 days ago)