Research papers about Muon from MIT

bird-of-paradise · January 10, 2026, 8:42pm

Hi there,

While digging into the paper “Muon is scalable for LLM training”, I found a few recent papers that analyze this optimizer from a norm-based perspective.

There is also a course on Muon by Laker Newhouse:

For those of you who are interested in doing research with Muon, I hope these theoretical proofs can provide some insight into which direction to take.

I’m going to dig into those papers too!

Happy researching!
