BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published 27 days ago • 15
ParlaSpeech Collection Speech + text dataset collection based on the ParlaMint data. Paper describing the construction process: https://www.arxiv.org/abs/2409.15397. • 4 items • Updated Oct 11, 2024 • 1
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25 • 29
Granary: Speech Recognition and Translation Dataset in 25 European Languages Paper • 2505.13404 • Published May 19 • 2
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27 • 20
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28 • 63
How to build a custom text classifier without days of human labeling Article • By sdiazlor and 4 others • Oct 17, 2024 • 55
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 6 days ago • 155
Bielik-11B-v2.2 Collection A collection of models based on Bielik-11B-v2.2: instruct and quantized versions. • 17 items • Updated Jun 6 • 28
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30, 2024 • 14