GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 56
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 40
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Paper • 2507.06181 • Published • 43
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
Paper • 2507.03336 • Published • 5
GTA1: GUI Test-time Scaling Agent
Paper • 2507.05791 • Published • 26
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Paper • 2507.07136 • Published • 37
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
Paper • 2507.08801 • Published • 30
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Paper • 2507.10524 • Published • 70
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
Paper • 2507.12415 • Published • 42
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
Paper • 2507.09075 • Published • 15
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
Paper • 2507.10541 • Published • 29
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 18
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper • 2507.08616 • Published • 13
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
Paper • 2507.11407 • Published • 57
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Paper • 2507.13332 • Published • 48
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
Paper • 2507.12142 • Published • 36
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper • 2507.12720 • Published • 9
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper • 2507.13158 • Published • 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
Paper • 2507.08422 • Published • 36
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper • 2507.15061 • Published • 59
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
Paper • 2507.11061 • Published • 37
Gaussian Splatting with Discretized SDF for Relightable Assets
Paper • 2507.15629 • Published • 23
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper • 2507.16784 • Published • 120
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
Paper • 2507.17745 • Published • 33
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 64
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Paper • 2507.22827 • Published • 98
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
Paper • 2508.00819 • Published • 62
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
Paper • 2508.00414 • Published • 91
Qwen-Image Technical Report
Paper • 2508.02324 • Published • 257
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Paper • 2508.03686 • Published • 36
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 236
Efficient Agents: Building Effective Agents While Reducing Cost
Paper • 2508.02694 • Published • 85
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 70
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
Paper • 2508.02091 • Published • 13
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
Paper • 2508.05731 • Published • 25
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Paper • 2508.07629 • Published • 41
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper • 2508.05547 • Published • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 186
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Paper • 2508.06600 • Published • 39
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 28
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper • 2508.08221 • Published • 47
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper • 2508.09736 • Published • 56
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
Paper • 2508.05305 • Published • 46
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper • 2508.09968 • Published • 15
A Survey on Diffusion Language Models
Paper • 2508.10875 • Published • 34
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Paper • 2508.14896 • Published • 22
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
Paper • 2508.10395 • Published • 42
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 36
Deep Think with Confidence
Paper • 2508.15260 • Published • 87
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Paper • 2508.14811 • Published • 40
UQ: Assessing Language Models on Unsolved Questions
Paper • 2508.17580 • Published • 15
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Paper • 2508.17472 • Published • 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Paper • 2508.18773 • Published • 15
Autoregressive Universal Video Segmentation Model
Paper • 2508.19242 • Published • 28
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 30
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Paper • 2508.19652 • Published • 84
SpotEdit: Evaluating Visually-Guided Image Editing Methods
Paper • 2508.18159 • Published • 3
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
Paper • 2508.15804 • Published • 15
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published • 80
Mixture of Contexts for Long Video Generation
Paper • 2508.21058 • Published • 34
VibeVoice Technical Report
Paper • 2508.19205 • Published • 123
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
Paper • 2508.20096 • Published • 36
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Paper • 2508.16072 • Published • 4
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 75
UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper • 2508.21767 • Published • 12
Efficient Code Embeddings from Code Generation Models
Paper • 2508.21290 • Published • 18
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
Paper • 2508.17677 • Published • 14
CLIPSym: Delving into Symmetry Detection with CLIP
Paper • 2508.14197 • Published • 8
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper • 2509.02544 • Published • 121
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
Paper • 2509.00428 • Published • 17
Symbolic Graphics Programming with Large Language Models
Paper • 2509.05208 • Published • 45
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Paper • 2509.12203 • Published • 19
Locality in Image Diffusion Models Emerges from Data Statistics
Paper • 2509.09672 • Published • 12
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Paper • 2509.09926 • Published • 13
Single-stream Policy Optimization
Paper • 2509.13232 • Published • 33
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper • 2509.10441 • Published • 30
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
Paper • 2509.13309 • Published • 66
Towards General Agentic Intelligence via Environment Scaling
Paper • 2509.13311 • Published • 69
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Paper • 2509.10687 • Published • 6
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Paper • 2509.15212 • Published • 21
AToken: A Unified Tokenizer for Vision
Paper • 2509.14476 • Published • 36
Qwen3-Omni Technical Report
Paper • 2509.17765 • Published • 131
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
Paper • 2509.21268 • Published • 100
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 121
Fine-tuning Done Right in Model Editing
Paper • 2509.22072 • Published • 27
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 133
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Paper • 2509.22601 • Published • 29
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Paper • 2509.26628 • Published • 13
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
Paper • 2509.25848 • Published • 77
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Paper • 2509.25849 • Published • 46
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
Paper • 2509.22651 • Published • 22
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Paper • 2509.22414 • Published • 21
LongCodeZip: Compress Long Context for Code Language Models
Paper • 2510.00446 • Published • 106
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper • 2509.21880 • Published • 44
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 45
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Paper • 2510.03561 • Published • 23
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
Paper • 2509.25771 • Published • 10
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 22
Efficient Intent Detection with Dual Sentence Encoders
Paper • 2003.04807 • Published • 2
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 164
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Paper • 2510.09201 • Published • 46
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 25