Nouamane Tazi PRO

nouamanetazi

https://nouamanetazi.github.io

AI & ML interests

Scale it 'til you make it

Recent Activity

new activity 2 days ago

HuggingFaceTB/smol-training-playbook: Troubleshooting Interconnect: Share Your Experience

posted an update 2 days ago

After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at 60% efficiency, the problem isn't your model. It's most probably a misuse of the hardware. 🛠️

Questions that seemed simple but had no clear answers: Why is MoE training slower than dense-model training? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?

That's why we built The Smol Training Playbook 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the infrastructure layer that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: HBM3 hitting 3 TB/s, NVLink 4.0 reaching 786 GB/s, PCIe Gen4 at 14.2 GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 480 GB/s on a single node to 320-350 GB/s across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

The Smol Training Playbook: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the Hugging Face team
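To put the all-reduce numbers above in perspective, here is a minimal Python sketch of the arithmetic. It is not taken from the playbook. The `relative_drop` helper is a hypothetical name, and the bus-bandwidth formula follows the convention used by the nccl-tests benchmarks; the post does not say which bandwidth definition its figures use, so treat that part as an assumption.

```python
def allreduce_bus_bandwidth(bytes_per_rank: float, seconds: float, world_size: int) -> float:
    """Bus bandwidth in GB/s for an all-reduce, following the nccl-tests convention.

    Assumption: algbw = size / time, and busbw = algbw * 2 * (n - 1) / n.
    """
    algbw = bytes_per_rank / seconds / 1e9
    return algbw * 2 * (world_size - 1) / world_size


def relative_drop(baseline: float, measured: float) -> float:
    """Fraction of the baseline bandwidth lost at the measured value (hypothetical helper)."""
    return (baseline - measured) / baseline


# Figures quoted in the post, in GB/s.
single_node = 480.0
sixteen_nodes = (320.0, 350.0)

# Relative loss for the low and high end of the 16-node range.
drops = [relative_drop(single_node, m) for m in sixteen_nodes]

for measured, drop in zip(sixteen_nodes, drops):
    print(f"{measured:.0f} GB/s is {drop:.1%} below the single-node figure")
```

Under these assumptions, going from one node to 16 nodes costs roughly 27% to 33% of the single-node all-reduce bandwidth.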

liked a Space 2 days ago

HuggingFaceTB/smol-playbook-toc

liked 2 Spaces 2 days ago

Smol Training Playbook - Table of Contents

The Smol Training Playbook: The Secrets to Building World-Class LLMs

liked 2 models 13 days ago

hassan-IA/amalaz-ha2en-micro

76.4M • Updated 17 days ago • 39 • 3

hassan-IA/amalaz-en2ha-micro

76.4M • Updated 17 days ago • 68 • 3

liked a Space about 1 month ago

Bringing paper to life: A modern template for scientific writing

Generate a scientific paper template

liked 2 datasets about 2 months ago

atlasia/AtlasOCRBench

Viewer • Updated Sep 16 • 251 • 58 • 2

atlasia/atlasOCR-data

Viewer • Updated Sep 16 • 30.3k • 581 • 3

liked a Space about 2 months ago

AtlasOCR Demo

Test AtlasOCR on your darija/arabic documents.

liked a model about 2 months ago

atlasia/AtlasOCR

Updated Sep 16 • 6

liked a Space 2 months ago

Pipeline Parallelism Schedule Visualizer

Visualize pipeline parallelism schedules

liked a Space 3 months ago

Dots OCR

Extract and visualize layout from PDFs or images

liked a model 3 months ago

openai/gpt-oss-120b

Text Generation • 120B • Updated Aug 26 • 3.7M • 4.08k

liked a Space 3 months ago

README

liked a Space 4 months ago

Flux Moroccan Ghibli style

A Text-to-Image Space For Generating Moroccan Ghibli Style

liked a model 4 months ago

HuggingFaceTB/SmolLM3-3B-Base

Text Generation • 3B • Updated Aug 14 • 17.4k • 132

liked a Space 4 months ago

IqraEval Shared Task @ ArabicNLP 2025

Description Shared Task

liked 3 Spaces 5 months ago

Stable Virtual Camera

Generate 3D video from input images

Trackio 1234

Visualize project metrics with real-time updates

Worldwide Map

Display hackathon locations on a map

liked a model 5 months ago

UBC-NLP/NileChat-3B

Text Generation • 3B • Updated Jun 16 • 3.15k • 20