- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 300
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 300
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 54
- Seedream 3.0 Technical Report
  Paper • 2504.11346 • Published • 70
Collections including paper arxiv:2501.09732

- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- PaSa: An LLM Agent for Comprehensive Academic Paper Search
  Paper • 2501.10120 • Published • 52
- Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
  Paper • 2501.09775 • Published • 33
- ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
  Paper • 2501.10132 • Published • 22

- MangaNinja: Line Art Colorization with Precise Reference Following
  Paper • 2501.08332 • Published • 60
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71
- FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
  Paper • 2502.20126 • Published • 20
- Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
  Paper • 2502.16944 • Published • 10

- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 88
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Progressive Multimodal Reasoning via Active Retrieval
  Paper • 2412.14835 • Published • 73
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71

- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 110
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 138
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 158
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59

- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 63
- Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
  Paper • 2501.09755 • Published • 36
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71
- MINIMA: Modality Invariant Image Matching
  Paper • 2412.19412 • Published • 4

- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
  Paper • 2501.01257 • Published • 52
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
  Paper • 2501.01423 • Published • 43
- REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
  Paper • 2411.13552 • Published