HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published 24 days ago
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology Paper • 2503.00096 • Published Feb 28