AI & ML interests

Helping AI to become AGI

Recent Activity

KingNish updated a dataset about 1 month ago: HelpingAI/Dhanishtha-2.0-SUPERTHINKER
Abhaykoul updated a dataset about 1 month ago: HelpingAI/KS-WIKI
Abhaykoul published a dataset about 1 month ago: HelpingAI/KS-WIKI

KingNish posted an update about 1 month ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
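
To make the split concrete, here is a minimal sketch of the hybrid setup in PyTorch. The `Muon` import, the `build_hybrid_optimizers` helper, and the learning rates are illustrative placeholders rather than the exact code or hyperparameters from these runs; any Muon implementation exposing the standard `torch.optim` interface should slot in.

```python
# Minimal sketch: route 2D weight matrices to Muon and 1D tensors
# (biases, norms) to AdamW, then step both optimizers each iteration.
import torch
from torch import nn
from muon import Muon  # hypothetical import; substitute your Muon implementation

def build_hybrid_optimizers(model: nn.Module, muon_lr: float = 2e-4, adamw_lr: float = 2e-5):
    muon_params, adamw_params = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        # 2D+ parameters -> Muon, 1D parameters -> AdamW
        (muon_params if p.ndim >= 2 else adamw_params).append(p)
    muon_opt = Muon(muon_params, lr=muon_lr)  # constructor args vary by implementation
    adamw_opt = torch.optim.AdamW(adamw_params, lr=adamw_lr, weight_decay=0.01)
    return muon_opt, adamw_opt

# Training-loop usage:
#   loss.backward()
#   muon_opt.step(); adamw_opt.step()
#   muon_opt.zero_grad(); adamw_opt.zero_grad()
```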

Takeaway: for small-scale fine-tuning, the hybrid Muon + AdamW setup is the practical, reliable choice.

Next Step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
KingNish posted an update about 1 month ago
KingNish updated a Space 2 months ago
KingNish published a Space 2 months ago
Abhaykoul posted an update 4 months ago
🚀 Ever dreamed of training your own Large Language Model from scratch? What if I told you it doesn't require a supercomputer or a PhD in ML? 🤯

Introducing LLM Trainer - the educational framework that makes LLM training accessible to EVERYONE! Whether you're on a CPU-only laptop or scaling to distributed GPUs, we've got you covered. 💻➡️🖥️

Why LLM Trainer? Because existing tools are either too simplistic (hiding the magic) or too complex (requiring expert knowledge). We bridge the gap with:

🎓 Educational transparency - every component built from scratch with clear code
💻 CPU-first approach - start training immediately, no GPU needed
🔧 Full customization - modify anything you want
📈 Seamless scaling - from laptop to cluster without code changes
🤝 HuggingFace integration - works with existing models & tokenizers (see the sketch after this list)

Key highlights:
✅ Built-in tokenizers (BPE, WordPiece, HF wrappers)
✅ Complete Transformer implementation from scratch (see the toy block after this list)
✅ Optimized for CPU training
✅ Advanced features: mixed precision, gradient checkpointing, multiple generation strategies
✅ Comprehensive monitoring & metrics
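
To give a flavour of the "from scratch" components, here is a toy decoder block in plain PyTorch with optional gradient checkpointing. This is not the LLM Trainer source code, just a generic sketch of the ideas listed above; names and sizes are arbitrary.

```python
# Toy decoder block: pre-norm self-attention + MLP with residuals, plus
# gradient checkpointing to trade compute for memory.
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        # Causal mask: each position attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x

block = DecoderBlock()
x = torch.randn(2, 16, 256, requires_grad=True)  # (batch, seq_len, d_model), CPU is fine
out = checkpoint(block, x, use_reentrant=False)  # recompute activations during backward
print(out.shape)                                 # torch.Size([2, 16, 256])
```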

Perfect for:
- Students learning transformers
- Researchers prototyping new ideas
- Developers building domain-specific models

Ready to train your first LLM? It's easier than you think!

🔗 Check it out: https://github.com/HelpingAI/llm-trainer
📚 Docs: Getting Started Guide
💬 Join the community: GitHub Discussions

#AI #MachineLearning #LLM #DeepLearning #OpenSource #Python #HuggingFace #NLP

Special thanks to the HuggingFace and PyTorch teams for the amazing ecosystem! 🙏

remove dupe (#2, opened 5 months ago by Vortexjr)
Update README.md (#1, opened 5 months ago by Vortexjr)