
Collections

Discover the best community collections!

Collections including paper arxiv:2406.11794

A collection of arXiv papers from Chip Huyen's AI Engineering, organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 23
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

IFML Affiliated Research Y5

Explore our research on Hugging Face! IFML is the National AI Institute for Foundations of Machine Learning.

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 54
mlfoundations/dclm-baseline-1.0

Preview • Updated Jul 22, 2024 • 768k • 240
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 31
Memory-Efficient LLM Training with Online Subspace Descent

Paper • 2408.12857 • Published Aug 23, 2024 • 16

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 54
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Paper • 2410.02749 • Published Oct 3, 2024 • 13
Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16, 2024 • 3
How to Train Long-Context Language Models (Effectively)

Paper • 2410.02660 • Published Oct 3, 2024 • 2

Training-related

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 54
griffin/chain_of_density

Viewer • Updated Sep 8, 2023 • 1.1k • 46 • 71
HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 7.1k • 330

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 16
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3 • 24
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29 • 13
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10 • 13

Llammy3.2-3B-GUFF

prithivMLmods/Llama-Sentient-3.2-3B-Instruct

Text Generation • Updated Dec 10, 2024 • 2 • 9
bartendr604/Llama.Diffusion.Flix

Updated Apr 12 • 1
FLUX Unlimited 🔥

Running • 1.39k • Use the FLUX model as much as you want.
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23 • 13.5k • 91

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 150
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 4
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 207

Relevant-Papers-Midterm

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19, 2024 • 20
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6, 2024 • 67
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7, 2024 • 48
Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13, 2024 • 44

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 22
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 97
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 54
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134

