Article: 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do • 18 days ago
Article: MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning • 19 days ago
Article: Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework • 20 days ago
FINAL Bench Collection World's First Functional Metacognition Benchmark. "Not how much AI knows — but whether it knows what it doesn't know, and can fix it." • 2 items • Updated Feb 21 • 4