Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions Paper • 2505.00675 • Published May 1 • 3
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph Paper • 2311.09174 • Published Nov 15, 2023
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation Paper • 2402.10646 • Published Feb 16, 2024
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3 • 24
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8 • 28
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 5 days ago • 5
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 5 days ago • 5
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Paper • 2310.09044 • Published Oct 13, 2023
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15 • 54