Research papers about Muon from MIT

Hi there, :waving_hand:

While digging into the paper “Muon is scalable for LLM training”, I found a few recent papers that look at this optimizer from a norm perspective.

There is also a course on Muon by Laker Newhouse:

For those of you who are interested in doing research with Muon, I hope these theoretical proofs can provide some insight into which direction to take.

I’m going to dig into those papers too!

Happy researching! :nerd_face:

  • Jen