LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning Paper • 2510.14211 • Published 18 days ago • 6
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models Paper • 2509.17428 • Published Sep 22 • 9
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7, 2024 • 4
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 20 • 16
AUTOACT: Automatic Agent Learning from Scratch via Self-Planning Paper • 2401.05268 • Published Jan 10, 2024 • 4
The Ultra-Scale Playbook 🌌 Space • 3.4k likes • The ultimate guide to training LLMs on large GPU clusters
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks Paper • 2402.09025 • Published Feb 14, 2024 • 9