- Esper is our full-stack, full-cycle coding, DevOps, and architecture specialist!
- Our newest, best DeepSeek technical datasets emphasize more challenging queries and tough real-world coding tasks across a variety of programming languages and development paradigms:
  - Titanium 3 for coding and reasoning in DevOps and architecture: sequelbox/Titanium3-DeepSeek-V3.1-Terminus
  - Tachibana 3 for high-difficulty code production in a variety of topics and programming languages:
    - sequelbox/Tachibana3-Part1-DeepSeek-V3.1-Terminus
    - sequelbox/Tachibana3-Part2-DeepSeek-V3.2
  - Mitakihara for MLOps, AI building, use, expertise, and research: sequelbox/Mitakihara-DeepSeek-R1-0528
Hey, amazing, awesome people of the beautiful internet 😍🥰
Distillation has been (from my point of view) a main driving factor behind the success of #LLMs - like distilling the knowledge of an amazing big model (say #DeepSeekv3, or #GeminiAI) into yours.
You have probably done it by minimising a KL divergence, and it somehow worked.
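Just to make that concrete: the vanilla recipe minimises the KL divergence between the teacher's and the student's (temperature-softened) token distributions. Here's a minimal PyTorch-style sketch of that loss; the temperature value and the commented training-step names are just placeholders:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Vanilla KD loss: KL(teacher || student) over temperature-softened token distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL, scaled by T^2 as in the classic Hinton et al. formulation
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Hypothetical training step: the teacher is frozen, the student learns to match it
# teacher_logits = teacher_model(input_ids).logits.detach()
# student_logits = student_model(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward()
```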
Well, not that well, right?
1️⃣ Your model tends to memorise!
2️⃣ Your model might get the right answer, but its reasoning might be flawed.
To fix those problems, we rethink distillation and propose a new approach! A method based on constrained RL that comes with nice theoretical guarantees and excellent performance!
One of the hardest challenges in AI safety is finding the right balance: how do we protect people from harm without undermining their agency? This tension is especially visible in conversational systems, where safeguards can sometimes feel more paternalistic than supportive.
In my latest piece for Hugging Face, I argue that open source and community-driven approaches offer a promising (though not exclusive) way forward.
✨ Transparency can turn safety mechanisms into learning opportunities.
✨ Collaboration with diverse communities makes safeguards more relevant across contexts.
✨ Iteration in the open lets protections evolve rather than freeze into rigid, one-size-fits-all rules.
Of course, this isn’t a silver bullet. Top-down safety measures will still be necessary in some cases. But if we only rely on corporate control, we risk building systems that are safe at the expense of trust and autonomy.
Low-Rank Adaptation (LoRA) is the go-to method for efficient fine-tuning: instead of retraining the full model, it adds small trainable low-rank matrices on top of the frozen weights (see the minimal sketch below). The field isn’t standing still – new LoRA variants push the limits of efficiency, generalization, and personalization. So we’re sharing 10 of the latest LoRA approaches you should know about:
4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397)
Applies the LoRA adapter only after it is invoked, letting the model reuse the base model’s KV cache instead of recomputing the full turn’s KV cache. Efficient in multi-turn conversations.
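To keep the base mechanism in mind while reading through the variants, here is a rough sketch of a vanilla LoRA layer in PyTorch; the rank, scaling, and the q_proj wrapping example are illustrative placeholders rather than any particular library’s implementation:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.lora_A = nn.Linear(base.in_features, r, bias=False)   # down-projection to rank r
        self.lora_B = nn.Linear(r, base.out_features, bias=False)  # up-projection back to the output dim
        nn.init.zeros_(self.lora_B.weight)  # zero init so training starts exactly at the base model
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

# Hypothetical usage: wrap one attention projection of a transformer block
# block.q_proj = LoRALinear(block.q_proj, r=8, alpha=16.0)
```

Every variant in the list builds on that same design point: the base weights stay frozen, and only the small A/B matrices are trained.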