- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

Collections
Collections including paper arxiv:2406.11794
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 54
- mlfoundations/dclm-baseline-1.0
  Preview • Updated • 768k • 240
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
  Paper • 2410.10792 • Published • 31
- Memory-Efficient LLM Training with Online Subspace Descent
  Paper • 2408.12857 • Published • 16

- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 54
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
  Paper • 2410.02749 • Published • 13
- Fewer Truncations Improve Language Modeling
  Paper • 2404.10830 • Published • 3
- How to Train Long-Context Language Models (Effectively)
  Paper • 2410.02660 • Published • 2

- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 21
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 17
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 16
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 31

- Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
  Paper • 2509.05739 • Published • 2
- Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
  Paper • 2509.03059 • Published • 24
- Universal Deep Research: Bring Your Own Model and Strategy
  Paper • 2509.00244 • Published • 13
- <think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
  Paper • 2509.08358 • Published • 13

- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 150
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 4
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 207

- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 20
- The Prompt Report: A Systematic Survey of Prompting Techniques
  Paper • 2406.06608 • Published • 67
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 48
- Transformers meet Neural Algorithmic Reasoners
  Paper • 2406.09308 • Published • 44

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 22
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 97
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 54
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 134