- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference (arXiv:2412.13663)
- arXiv:2412.15115
- Are Your LLMs Capable of Stable Reasoning? (arXiv:2412.13147)
- Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871)
- Apollo: An Exploration of Video Understanding in Large Multimodal Models (arXiv:2412.10360)
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling (arXiv:2412.05271)
- Enhancing Human-Like Responses in Large Language Models (arXiv:2501.05032)
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (arXiv:2502.06703)