How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models (arXiv:2509.19371, published Sep 19, 2025)
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free (arXiv:2505.06708, published May 10, 2025)
Selective Attention: Enhancing Transformer through Principled Context Control (arXiv:2411.12892, published Nov 19, 2024)
A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827, published Sep 10, 2025)
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning (arXiv:2504.12216, published Apr 16, 2025)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning (arXiv:2508.08221, published Aug 11, 2025)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313, published Jan 14, 2025)