How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models (arXiv:2509.19371, published Sep 19, 2025)
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free (arXiv:2505.06708, published May 10, 2025)
Selective Attention: Enhancing Transformer through Principled Context Control (arXiv:2411.12892, published Nov 19, 2024)
A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827, published Sep 10, 2025)
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning (arXiv:2504.12216, published Apr 16, 2025)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning (arXiv:2508.08221, published Aug 11, 2025)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313, published Jan 14, 2025)