MLEB is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. https://huggingface.co/blog/isaacus/introducing-mleb
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring. Paper • 2501.02045 • Published Jan 3
Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence 🍅 🧬 monsoon-nlp/tomatotomato-gLM2-150M-v0.1
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez! v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025. Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!