MLLM Reasoning, Rewarding, and Understanding - a shuoxing Collection

shuoxing 's Collections

MLLM Reasoning, Rewarding, and Understanding

MLLM Reasoning, Rewarding, and Understanding

updated 16 days ago

Papers on the reasoning, rewarding, and understanding of the MLLMs and LLMs

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 277
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2, 2025 • 52
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Paper • 2506.02397 • Published Jun 3, 2025 • 35
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 143
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30, 2025 • 80
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

Paper • 2505.23359 • Published May 29, 2025 • 38
Fractured Chain-of-Thought Reasoning

Paper • 2505.12992 • Published May 19, 2025 • 23
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4, 2025 • 80
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Paper • 2506.05349 • Published Jun 5, 2025 • 24
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Paper • 2506.05331 • Published Jun 5, 2025 • 13
How much do language models memorize?

Paper • 2505.24832 • Published May 30, 2025 • 4
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 12
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

Paper • 2505.18291 • Published May 23, 2025 • 2
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22, 2025 • 12
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 77
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17, 2025 • 23
Memp: Exploring Agent Procedural Memory

Paper • 2508.06433 • Published Aug 8, 2025 • 35
MIRIX: Multi-Agent Memory System for LLM-Based Agents

Paper • 2507.07957 • Published Jul 10, 2025 • 79
Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Paper • 2501.13956 • Published Jan 20, 2025 • 8
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 195
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16, 2025 • 47
Memory in the Age of AI Agents

Paper • 2512.13564 • Published 21 days ago • 130
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Paper • 2512.16649 • Published 18 days ago • 24
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper • 2512.07461 • Published 28 days ago • 75
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 28 days ago • 36
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Paper • 2512.05033 • Published Dec 4, 2025 • 15