LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1 • 108
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Paper • 2509.22220 • Published Sep 26 • 64
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19 • 127
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML Paper • 2509.06806 • Published Sep 8 • 63
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 47
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published Jan 9 • 41
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following Paper • 2501.08187 • Published Jan 14 • 27
Enhancing Automated Interpretability with Output-Centric Feature Descriptions Paper • 2501.08319 • Published Jan 14 • 11