- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

Collections including paper arxiv:2501.03262

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 306
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 209

- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 84
- When an LLM is apprehensive about its answers -- and when its uncertainty is justified
  Paper • 2503.01688 • Published • 21
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches
  Paper • 2503.00808 • Published • 56
- Chain of Draft: Thinking Faster by Writing Less
  Paper • 2502.18600 • Published • 49

- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 102
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 63
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72

- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 102
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 101
- BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
  Paper • 2510.18927 • Published • 82

- Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
  Paper • 2504.12626 • Published • 51
- Qwen3 Technical Report
  Paper • 2505.09388 • Published • 308
- Qwen-Image Technical Report
  Paper • 2508.02324 • Published • 259
- DINOv3
  Paper • 2508.10104 • Published • 274

- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 102
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 72
- The Differences Between Direct Alignment Algorithms are a Blur
  Paper • 2502.01237 • Published • 113
- Process Reinforcement through Implicit Rewards
  Paper • 2502.01456 • Published • 61

- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 88
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 47
- Deliberation in Latent Space via Differentiable Cache Augmentation
  Paper • 2412.17747 • Published • 32
- Outcome-Refining Process Supervision for Code Generation
  Paper • 2412.15118 • Published • 19