- TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  Text Generation • 7B • Updated • 33.6k • 607
- TheBloke/OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF
  7B • Updated • 512 • 52
- TinyLlama/TinyLlama-1.1B-Chat-v0.6
  Text Generation • 1B • Updated • 4.36k • 107
- openai/whisper-large-v3
  Automatic Speech Recognition • 2B • Updated • 5.06M • 5.18k
Hilko (hilkob)
AI & ML interests: None yet
Recent Activity
reacted to hesamation's post with ❤️ 4 days ago
this is big... 50 AI researchers from ByteDance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc).
key highlights:
> small LLMs can beat proprietary giants
RL (specifically RLVR, reinforcement learning with verifiable rewards) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.
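to make the "verified problems" part concrete, here's a rough sketch of what a verifiable reward for code looks like (my own illustration, not the paper's code): run the generated solution against unit tests and reward only a clean pass.

```python
# rough sketch of a "verifiable reward" for code RL (RLVR-style).
# the helper names and the tiny add() example are illustrative, not from the paper.
import subprocess
import sys
import tempfile
import textwrap

def verifiable_reward(solution_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Run the model's solution against unit tests; reward 1.0 only if everything passes."""
    program = textwrap.dedent(solution_code) + "\n\n" + textwrap.dedent(test_code)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

solution = """
def add(a, b):
    return a + b
"""
tests = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""
print(verifiable_reward(solution, tests))  # 1.0 -> a positive, automatically checkable RL signal
```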
> models have a hard time learning Python.
mixing programming languages during pre-training is good, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create high positive synergy. mixing Python heavily into the training data for statically typed languages can actually hurt because of Python's dynamic typing.
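a tiny illustration (mine, not the paper's) of what that dynamic-typing difference means in practice:

```python
# illustration only (not from the paper): in Python the same name can hold
# different types at runtime, so the type signal in the training data is weak.
def total(items):
    # nothing pins down what 'items' contains: ints, floats, Decimals, ...
    result = 0
    for x in items:
        result = result + x
    return result

value = total([1, 2, 3])    # int
value = total([1.5, 2.5])   # now a float; the same variable silently changes type
# a statically typed equivalent (e.g. Java's `int total(List<Integer> items)`)
# spells the types out, an explicit, checkable signal the model can learn from.
```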
> not all languages are equal (coding scaling laws)
the amount of data required to specialize a model on a language depends drastically on the language. the paper argues that languages like C# and Java are easier to learn (less training data required), while languages like Python and JavaScript are actually trickier to learn, ironically (you see AI used most for these languages :)
> MoE vs Dense (ability vs stability)
MoE models offer higher capacity, but are much more fragile during SFT than dense models. hyperparams in training have a more drastic effect in MoE models, while dense models are more stable. MoE models also require constant learning rate schedules to avoid routing instability.
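for anyone wondering what a "constant learning rate schedule" means here, a rough sketch in plain PyTorch (placeholder values, not the paper's recipe):

```python
# rough sketch (plain PyTorch, placeholder values, not the paper's settings):
# "constant schedule" just means the LR never decays during SFT, so the MoE
# router sees the same effective step size for the whole run.
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for the actual (MoE) model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# dense SFT often decays the LR instead, e.g.:
#   scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
#   ... and calls scheduler.step() every step

for step in range(1_000):  # MoE SFT: just never decay
    optimizer.zero_grad()
    loss = model(torch.randn(8, 1024)).pow(2).mean()
    loss.backward()
    optimizer.step()
    # no scheduler.step(): lr stays at 1e-5 throughout
```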
> code models are "insecure" by default (duh)
training on public repos makes models learn years of accumulated insecure coding patterns. safety fine-tuning often does little for code: a model might refuse to write a hate-speech email but will happily generate a SQL-injection-vulnerable function because it "works."
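the classic example of that pattern (sqlite3 used purely for illustration):

```python
# classic SQL-injection example (sqlite3 used only for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

def find_user_insecure(name: str):
    # "works" on normal input, but string formatting lets attacker-controlled
    # input rewrite the query: name = "x' OR '1'='1" returns every row.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # parameterized query: the driver treats 'name' strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_insecure("x' OR '1'='1"))  # leaks all users
print(find_user_safe("x' OR '1'='1"))      # []
```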
read the full paper:
https://huggingface.co/papers/2511.18538
updated a collection almost 2 years ago: llmac
Organizations: None yet