A Closer Look into Mixture-of-Experts in Large Language Models (arXiv:2406.18219, published Jun 26, 2024)
Mixture of Nested Experts: Adaptive Processing of Visual Tokens (arXiv:2407.19985, published Jul 29, 2024)
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free (arXiv:2410.10814, published Oct 14, 2024)
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (arXiv:2510.19338, published Oct 2025)