onekq 
posted an update 23 days ago
The reaction to the QAT post was beyond my expectations, so here is my optimizer post as promised. I found, however, that I had a lot of explaining to do about optimizers themselves, so this post is really a historical recount. The post on the Muon optimizer (used by Kimi), coming very soon, can only pick up after this one.

https://huggingface.co/blog/onekq/adam-optimizer

If you already know the Adam(W) optimizer, you can skip this one, and sorry for the wait. Otherwise, it should be a useful read.