Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs Paper • 2510.18245 • Published 13 days ago • 6
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning Paper • 2506.04723 • Published Jun 5 • 1
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild Paper • 2510.14240 • Published 18 days ago • 11
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms Paper • 2510.13913 • Published 19 days ago • 3
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Paper • 2505.00358 • Published May 1 • 26
Shrinking the Generation-Verification Gap with Weak Verifiers Paper • 2506.18203 • Published Jun 22 • 1
Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation Paper • 2506.10403 • Published Jun 12 • 1
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators Paper • 2407.11004 • Published Jun 25, 2024
ScriptoriumWS: A Code Generation Assistant for Weak Supervision Paper • 2502.12366 • Published Feb 17
Evaluating Sample Utility for Data Selection by Mimicking Model Weights Paper • 2501.06708 • Published Jan 12 • 5
Multimodal Data Curation via Object Detection and Filter Ensembles Paper • 2401.12225 • Published Jan 5, 2024
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows Paper • 2506.03332 • Published Jun 3 • 2
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios Paper • 2505.21954 • Published May 28 • 1
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models Paper • 2406.14852 • Published Jun 21, 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Paper • 2410.10818 • Published Oct 14, 2024 • 17