What Layers When: Learning to Skip Compute in LLMs with Residual Gates • Paper • 2510.13876 • Published Oct 13 • 11 upvotes
Hybrid Linear Attention Research Collection • All 1.3B & 340M hybrid linear-attention experiments • 62 items • Updated Sep 11 • 12 upvotes
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance • Article • May 21 • 38 upvotes
Common Models Collection • The first generation of models pretrained on Common Corpus • 5 items • Updated Dec 5, 2024 • 41 upvotes
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients • Paper • 2406.17660 • Published Jun 25, 2024 • 5 upvotes
Efficient Continual Pre-training by Mitigating the Stability Gap • Paper • 2406.14833 • Published Jun 21, 2024 • 20 upvotes
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity • Paper • 2401.17072 • Published Jan 30, 2024 • 25 upvotes
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs • Paper • 2309.09582 • Published Sep 18, 2023 • 4 upvotes