- TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  Text Generation • 7B • Updated • 33.6k • 607
- TheBloke/OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF
  7B • Updated • 512 • 52
- TinyLlama/TinyLlama-1.1B-Chat-v0.6
  Text Generation • 1B • Updated • 4.36k • 107
- openai/whisper-large-v3
  Automatic Speech Recognition • 2B • Updated • 5.06M • 5.18k
Hilko (hilkob)
AI & ML interests: None yet
Recent Activity
reacted to hesamation's post with ❤️ 4 days ago
this is big... 50 AI researchers from ByteDance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc).
key highlights:
> small LLMs can beat proprietary giants
RL (specifically RLVR, reinforcement learning with verifiable rewards) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.
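to make the "verified problems" part concrete, here's a rough sketch of what a verifiable reward for code looks like (my own illustration, not the paper's code): run the generated solution against unit tests and reward only a clean pass.

```python
# rough sketch of a "verifiable reward" for code RL (RLVR-style).
# the helper names and the tiny add() example are illustrative, not from the paper.
import subprocess
import sys
import tempfile
import textwrap

def verifiable_reward(solution_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Run the model's solution against unit tests; reward 1.0 only if everything passes."""
    program = textwrap.dedent(solution_code) + "\n\n" + textwrap.dedent(test_code)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

solution = """
def add(a, b):
    return a + b
"""
tests = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""
print(verifiable_reward(solution, tests))  # 1.0 -> a positive, automatically checkable RL signal
```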
> models have a hard time learning Python.
mixing programming languages during pre-training is good, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create high positive synergy. mixing Python heavily into the training data for statically typed languages can actually hurt because of Python's dynamic typing.
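a tiny illustration (mine, not the paper's) of what that dynamic-typing difference means in practice:

```python
# illustration only (not from the paper): in Python the same name can hold
# different types at runtime, so the type signal in the training data is weak.
def total(items):
    # nothing pins down what 'items' contains: ints, floats, Decimals, ...
    result = 0
    for x in items:
        result = result + x
    return result

value = total([1, 2, 3])    # int
value = total([1.5, 2.5])   # now a float; the same variable silently changes type
# a statically typed equivalent (e.g. Java's `int total(List<Integer> items)`)
# spells the types out, an explicit, checkable signal the model can learn from.
```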
> not all languages are equal (coding scaling laws)
the amount of data required to specialize a model on a language depends drastically on the language. the paper argues that languages like C# and Java are easier to learn (less training data required), while languages like Python and JavaScript are actually trickier to learn, ironically (you see AI used most for these languages :)
> MoE vs Dense (ability vs stability)
MoE models offer higher capacity, but are much more fragile during SFT than dense models. hyperparams in training have a more drastic effect in MoE models, while dense models are more stable. MoE models also require constant learning rate schedules to avoid routing instability.
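for anyone wondering what a "constant learning rate schedule" means here, a rough sketch in plain PyTorch (placeholder values, not the paper's recipe):

```python
# rough sketch (plain PyTorch, placeholder values, not the paper's settings):
# "constant schedule" just means the LR never decays during SFT, so the MoE
# router sees the same effective step size for the whole run.
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for the actual (MoE) model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# dense SFT often decays the LR instead, e.g.:
#   scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
#   ... and calls scheduler.step() every step

for step in range(1_000):  # MoE SFT: just never decay
    optimizer.zero_grad()
    loss = model(torch.randn(8, 1024)).pow(2).mean()
    loss.backward()
    optimizer.step()
    # no scheduler.step(): lr stays at 1e-5 throughout
```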
> code models are "insecure" by default (duh)
training on public repos makes models learn years of accumulated insecure coding patterns. safety fine-tuning often does little for code: a model might refuse to write a hate-speech email but will happily generate a SQL-injection-vulnerable function because it "works."
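the classic example of that pattern (sqlite3 used purely for illustration):

```python
# classic SQL-injection example (sqlite3 used only for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

def find_user_insecure(name: str):
    # "works" on normal input, but string formatting lets attacker-controlled
    # input rewrite the query: name = "x' OR '1'='1" returns every row.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # parameterized query: the driver treats 'name' strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_insecure("x' OR '1'='1"))  # leaks all users
print(find_user_safe("x' OR '1'='1"))      # []
```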
read the full paper:
https://huggingface.co/papers/2511.18538
updated a collection almost 2 years ago: llmac
Organizations: None yet