- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2505.19147
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 300
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 300
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 54
- Seedream 3.0 Technical Report
  Paper • 2504.11346 • Published • 70
- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 9
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 84
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 274
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 185
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 138
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
  Paper • 2505.18125 • Published • 112
- On-Policy RL with Optimal Reward Baseline
  Paper • 2505.23585 • Published • 14
- Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
  Paper • 2505.23604 • Published • 23
- Are Reasoning Models More Prone to Hallucination?
  Paper • 2505.23646 • Published • 24
- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 110
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 138
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 158
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59
- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
  Paper • 2411.04952 • Published • 30
- Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
  Paper • 2411.05005 • Published • 13
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
  Paper • 2411.04075 • Published • 17
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19