Extract Text and Knowledge from Images with Open Vision Language Models By dvilasuero • about 3 hours ago • 3
How to Choose the Best Open Source LLM for Your Project in 2025 By dvilasuero • Sep 9 • 72
Introducing AI Sheets: a tool to work with datasets using open AI models! Aug 8 • 100
Vibe coding for data science: how to label a dataset with Kimi K2 By dvilasuero • Jul 22 • 21
LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment By dvilasuero • Jul 8 • 19
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages By davanstrien and 5 others • Jul 8 • 31
Fine-tune ModernBERT for RAG with Synthetic Data By sdiazlor and 2 others • Jan 20 • 42
FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 21
Introducing the Synthetic Data Generator - Build Datasets with Natural Language Dec 16, 2024 • 144
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community Dec 9, 2024 • 68
Let’s make a generation of amazing image generation models By burtenshaw and 4 others • Nov 26, 2024 • 33
Argilla 2.4: Easily Build Fine-Tuning and Evaluation datasets on the Hub — No Code Required Nov 4, 2024 • 45
How to build a custom text classifier without days of human labeling By sdiazlor and 4 others • Oct 17, 2024 • 55
How to optimize your data labelling project with custom interfaces By burtenshaw and 9 others • Oct 16, 2024 • 20
🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30, 2024 • 38
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Jul 23, 2024 • 238