-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 188 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 36 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 63 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 268
Collections
Discover the best community collections!
Collections including paper arxiv:2504.13161
-
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 75 -
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper • 2504.11393 • Published • 18 -
Efficient Process Reward Model Training via Active Learning
Paper • 2504.10559 • Published • 13 -
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93
-
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93 -
Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks
Paper • 2402.11984 • Published -
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
Paper • 2503.06121 • Published • 5 -
Timer: Transformers for Time Series Analysis at Scale
Paper • 2402.02368 • Published • 1
-
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 30 -
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Paper • 2411.05005 • Published • 13 -
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Paper • 2411.04075 • Published • 17 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Paper • 2408.07089 • Published • 14 -
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper • 2409.16191 • Published • 42 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Self-Boosting Large Language Models with Synthetic Preference Data
Paper • 2410.06961 • Published • 16
-
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper • 2404.01294 • Published • 17 -
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Paper • 2406.08707 • Published • 17 -
DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 54
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 188 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 36 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 63 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 268
-
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 75 -
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper • 2504.11393 • Published • 18 -
Efficient Process Reward Model Training via Active Learning
Paper • 2504.10559 • Published • 13 -
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93
-
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper • 2504.13161 • Published • 93 -
Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks
Paper • 2402.11984 • Published -
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
Paper • 2503.06121 • Published • 5 -
Timer: Transformers for Time Series Analysis at Scale
Paper • 2402.02368 • Published • 1
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 30 -
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Paper • 2411.05005 • Published • 13 -
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Paper • 2411.04075 • Published • 17 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19
-
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Paper • 2408.07089 • Published • 14 -
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper • 2409.16191 • Published • 42 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Self-Boosting Large Language Models with Synthetic Preference Data
Paper • 2410.06961 • Published • 16
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31
-
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper • 2404.01294 • Published • 17 -
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Paper • 2406.08707 • Published • 17 -
DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 54