Hi there,
While digging into the paper “Muon is scalable for LLM training”, I found a few recent papers that study this optimizer from a norm perspective,
and a course on Muon by Laker Newhouse:
For those of you interested in doing research with Muon, I hope these theoretical proofs can provide some insight into which direction to take.
I’m going to dig into those papers too!
Happy researching!
- Jen