Collections
Discover the best community collections!
Collections including paper arxiv:2406.08464

- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 52
- CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
  Paper • 2105.12655 • Published
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 148

- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 71
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 104
- argilla/magpie-ultra-v1.0
  Viewer • Updated • 3.22M • 314 • 49
- simplescaling/s1K-1.1
  Viewer • Updated • 1k • 2.14k • 138
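The two dataset entries in the collection above (argilla/magpie-ultra-v1.0 and simplescaling/s1K-1.1) can be pulled straight from the Hugging Face Hub. Below is a minimal sketch using the `datasets` library; the default configuration and a standard "train" split are assumptions, not details stated in this listing.

```python
# Minimal sketch: loading the dataset entries from this collection via the
# `datasets` library. Assumes each repository exposes a default configuration
# with a "train" split.
from datasets import load_dataset

# argilla/magpie-ultra-v1.0: Magpie-style synthetic alignment data.
magpie_ultra = load_dataset("argilla/magpie-ultra-v1.0", split="train")

# simplescaling/s1K-1.1: the 1k-row dataset shown on the collection card.
s1k = load_dataset("simplescaling/s1K-1.1", split="train")

print(magpie_ultra)  # inspect features and row count
print(s1k[0])        # peek at the first example
```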

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 146
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  Paper • 2305.07759 • Published • 36
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 104

- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 31
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 62
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 71
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 50

- AgentInstruct: Toward Generative Teaching with Agentic Flows
  Paper • 2407.03502 • Published • 50
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 71
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 257
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 31

- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 40
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 65
- Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
  Paper • 2406.14544 • Published • 35
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 95