oeohomos's Collections
			 
		
			
		Inbox
		
	updated
			
 
				
				
	
	
	
			
			RuCCoD: Towards Automated ICD Coding in Russian
		
			Paper
			
•
			2502.21263
			
•
			Published
				
			•
				
				132
			
 
	
	 
	
	
	
			
			Unified Reward Model for Multimodal Understanding and Generation
		
			Paper
			
•
			2503.05236
			
•
			Published
				
			•
				
				123
			
 
	
	 
	
	
	
			
			Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
  Cognitive-Inspired Sketching
		
			Paper
			
•
			2503.05179
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			R1-Searcher: Incentivizing the Search Capability in LLMs via
  Reinforcement Learning
		
			Paper
			
•
			2503.05592
			
•
			Published
				
			•
				
				27
			
 
	
	 
	
	
	
			
			Forgetting Transformer: Softmax Attention with a Forget Gate
		
			Paper
			
•
			2503.02130
			
•
			Published
				
			•
				
				32
			
 
	
	 
	
	
	
			
			SafeArena: Evaluating the Safety of Autonomous Web Agents
		
			Paper
			
•
			2503.04957
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play
  Context Control
		
			Paper
			
•
			2503.05639
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
  Reinforcing Learning
		
			Paper
			
•
			2503.05379
			
•
			Published
				
			•
				
				38
			
 
	
	 
	
	
	
			
			Learning from Failures in Multi-Attempt Reinforcement Learning
		
			Paper
			
•
			2503.04808
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
		
			Paper
			
•
			2503.04872
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
  via Diffusion Models
		
			Paper
			
•
			2503.05638
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation
  for Everyday Household Activities
		
			Paper
			
•
			2503.05652
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			ProReflow: Progressive Reflow with Decomposed Velocity
		
			Paper
			
•
			2503.04824
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			An Empirical Study on Eliciting and Improving R1-like Reasoning Models
		
			Paper
			
•
			2503.04548
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
		
			Paper
			
•
			2503.05447
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			LONGCODEU: Benchmarking Long-Context Language Models on Long Code
  Understanding
		
			Paper
			
•
			2503.04359
			
•
			Published
				
			•
				
				6
			
 
	
	 
	
	
	
			
			SAGE: A Framework of Precise Retrieval for RAG
		
			Paper
			
•
			2503.01713
			
•
			Published
				
			•
				
				7
			
 
	
	 
	
	
	
			
			EAGLE-3: Scaling up Inference Acceleration of Large Language Models via
  Training-Time Test
		
			Paper
			
•
			2503.01840
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
			
			Know You First and Be You Better: Modeling Human-Like User Simulators
  via Implicit Profiles
		
			Paper
			
•
			2502.18968
			
•
			Published
				
			•
				
				3
			
 
	
	 
	
	
	
			
			LoRACode: LoRA Adapters for Code Embeddings
		
			Paper
			
•
			2503.05315
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
		
			Paper
			
•
			2503.04504
			
•
			Published
				
			•
				
				4
			
 
	
	 
	
	
	
			
			YuE: Scaling Open Foundation Models for Long-Form Music Generation
		
			Paper
			
•
			2503.08638
			
•
			Published
				
			•
				
				70
			
 
	
	 
	
	
	
			
			SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by
  Imitating Human Annotator Trajectories
		
			Paper
			
•
			2503.08625
			
•
			Published
				
			•
				
				27
			
 
	
	 
	
	
	
			
			Seedream 2.0: A Native Chinese-English Bilingual Image Generation
  Foundation Model
		
			Paper
			
•
			2503.07703
			
•
			Published
				
			•
				
				37
			
 
	
	 
	
	
	
			
			Block Diffusion: Interpolating Between Autoregressive and Diffusion
  Language Models
		
			Paper
			
•
			2503.09573
			
•
			Published
				
			•
				
				73
			
 
	
	 
	
	
	
			
			Multimodal Language Modeling for High-Accuracy Single Cell
  Transcriptomics Analysis and Generation
		
			Paper
			
•
			2503.09427
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
			
			Video Action Differencing
		
			Paper
			
•
			2503.07860
			
•
			Published
				
			•
				
				33
			
 
	
	 
	
	
	
			
			LightGen: Efficient Image Generation through Knowledge Distillation and
  Direct Preference Optimization
		
			Paper
			
•
			2503.08619
			
•
			Published
				
			•
				
				20
			
 
	
	 
	
	
	
			
			OmniMamba: Efficient and Unified Multimodal Understanding and Generation
  via State Space Models
		
			Paper
			
•
			2503.08686
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			Exploiting Instruction-Following Retrievers for Malicious Information
  Retrieval
		
			Paper
			
•
			2503.08644
			
•
			Published
				
			•
				
				16
			
 
	
	 
	
	
	
			
			Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution
  Autonomous Driving VQA from Peru
		
			Paper
			
•
			2503.07587
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			"Principal Components" Enable A New Language of Images
		
			Paper
			
•
			2503.08685
			
•
			Published
				
			•
				
				12
			
 
	
	 
	
	
	
			
			^RFLAV: Rolling Flow matching for infinite Audio Video generation
		
			Paper
			
•
			2503.08307
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			BiasEdit: Debiasing Stereotyped Language Models via Model Editing
		
			Paper
			
•
			2503.08588
			
•
			Published
				
			•
				
				7
			
 
	
	 
	
	
	
			
			AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion
  Models
		
			Paper
			
•
			2503.08417
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			AI-native Memory 2.0: Second Me
		
			Paper
			
•
			2503.08102
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			Benchmarking AI Models in Software Engineering: A Review, Search Tool,
  and Enhancement Protocol
		
			Paper
			
•
			2503.05860
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			LocAgent: Graph-Guided LLM Agents for Code Localization
		
			Paper
			
•
			2503.09089
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
		
			Paper
			
•
			2503.08684
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
		
			Paper
			
•
			2503.08507
			
•
			Published
				
			•
				
				7
			
 
	
	 
	
	
	
			
			More Documents, Same Length: Isolating the Challenge of Multiple
  Documents in RAG
		
			Paper
			
•
			2503.04388
			
•
			Published
				
			•
				
				17
			
 
	
	 
	
	
	
			
			Quantizing Large Language Models for Code Generation: A Differentiated
  Replication
		
			Paper
			
•
			2503.07103
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			Cost-Optimal Grouped-Query Attention for Long-Context LLMs
		
			Paper
			
•
			2503.09579
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
			
			Self-Taught Self-Correction for Small Language Models
		
			Paper
			
•
			2503.08681
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			Multi Agent based Medical Assistant for Edge Devices
		
			Paper
			
•
			2503.05397
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented
  Generation System
		
			Paper
			
•
			2503.09600
			
•
			Published
				
			•
				
				4
			
 
	
	 
	
	
	
			
			PhysicsGen: Can Generative Models Learn from Images to Predict Complex
  Physical Relations?
		
			Paper
			
•
			2503.05333
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			Technologies on Effectiveness and Efficiency: A Survey of State Spaces
  Models
		
			Paper
			
•
			2503.11224
			
•
			Published
				
			•
				
				28
			
 
	
	 
	
	
	
			
			API Agents vs. GUI Agents: Divergence and Convergence
		
			Paper
			
•
			2503.11069
			
•
			Published
				
			•
				
				37
			
 
	
	 
	
	
	
			
			Group-robust Machine Unlearning
		
			Paper
			
•
			2503.09330
			
•
			Published
				
			•
				
				1
			
 
	
	 
	
	
	
			
			Personalize Anything for Free with Diffusion Transformer
		
			Paper
			
•
			2503.12590
			
•
			Published
				
			•
				
				44
			
 
	
	 
	
	
	
			
			Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
		
			Paper
			
•
			2503.12605
			
•
			Published
				
			•
				
				35
			
 
	
	 
	
	
	
			
			Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
		
			Paper
			
•
			2503.13070
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			RWKV-7 "Goose" with Expressive Dynamic State Evolution
		
			Paper
			
•
			2503.14456
			
•
			Published
				
			•
				
				153
			
 
	
	 
	
	
	
			
			Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion
  Transformers via In-Context Reflection
		
			Paper
			
•
			2503.12271
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			Pensez: Less Data, Better Reasoning -- Rethinking French LLM
		
			Paper
			
•
			2503.13661
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
			
			PyGDA: A Python Library for Graph Domain Adaptation
		
			Paper
			
•
			2503.10284
			
•
			Published
				
			•
				
				4
			
 
	
	 
	
	
	
			
			CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous
  Driving
		
			Paper
			
•
			2503.08683
			
•
			Published
				
			•
				
				2
			
 
	
	 
	
	
	
			
			DAPO: An Open-Source LLM Reinforcement Learning System at Scale
		
			Paper
			
•
			2503.14476
			
•
			Published
				
			•
				
				141
			
 
	
	 
	
	
	
			
			STEVE: A Step Verification Pipeline for Computer-use Agent Training
		
			Paper
			
•
			2503.12532
			
•
			Published
				
			•
				
				17
			
 
	
	 
	
	
	
			
			GKG-LLM: A Unified Framework for Generalized Knowledge Graph
  Construction
		
			Paper
			
•
			2503.11227
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning
  Tasks
		
			Paper
			
•
			2503.15478
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			ELTEX: A Framework for Domain-Driven Synthetic Data Generation
		
			Paper
			
•
			2503.15055
			
•
			Published
				
			•
				
				6
			
 
	
	 
	
	
	
			
			Survey on Evaluation of LLM-based Agents
		
			Paper
			
•
			2503.16416
			
•
			Published
				
			•
				
				95
			
 
	
	 
	
	
	
			
			JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play
  Visual Games with Keyboards and Mouse
		
			Paper
			
•
			2503.16365
			
•
			Published
				
			•
				
				40
			
 
	
	 
	
	
	
			
			Reinforcement Learning for Reasoning in Small LLMs: What Works and What
  Doesn't
		
			Paper
			
•
			2503.16219
			
•
			Published
				
			•
				
				52
			
 
	
	 
	
	
	
			
			Why Do Multi-Agent LLM Systems Fail?
		
			Paper
			
•
			2503.13657
			
•
			Published
				
			•
				
				47
			
 
	
	 
	
	
	
			
			MAPS: A Multi-Agent Framework Based on Big Seven Personality and
  Socratic Guidance for Multimodal Scientific Problem Solving
		
			Paper
			
•
			2503.16905
			
•
			Published
				
			•
				
				54
			
 
	
	 
	
	
	
			
			MARS: A Multi-Agent Framework Incorporating Socratic Guidance for
  Automated Prompt Optimization
		
			Paper
			
•
			2503.16874
			
•
			Published
				
			•
				
				44
			
 
	
	 
	
	
	
			
			Can Large Vision Language Models Read Maps Like a Human?
		
			Paper
			
•
			2503.14607
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			A Comprehensive Survey on Long Context Language Modeling
		
			Paper
			
•
			2503.17407
			
•
			Published
				
			•
				
				49
			
 
	
	 
	
	
	
			
			UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
  Learning
		
			Paper
			
•
			2503.21620
			
•
			Published
				
			•
				
				62
			
 
	
	 
	
	
	
			
			Large Language Model Agent: A Survey on Methodology, Applications and
  Challenges
		
			Paper
			
•
			2503.21460
			
•
			Published
				
			•
				
				83
			
 
	
	 
	
	
	
			
			ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large
  Reasoning Models with Iterative Retrieval Augmented Generation
		
			Paper
			
•
			2503.21729
			
•
			Published
				
			•
				
				29
			
 
	
	 
	
	
	
			
			Exploring the Evolution of Physics Cognition in Video Generation: A
  Survey
		
			Paper
			
•
			2503.21765
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			Think Before Recommend: Unleashing the Latent Reasoning Power for
  Sequential Recommendation
		
			Paper
			
•
			2503.22675
			
•
			Published
				
			•
				
				36
			
 
	
	 
	
	
	
			
			Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards
  for Reasoning-Enhanced Text-to-SQL
		
			Paper
			
•
			2503.23157
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			Advances and Challenges in Foundation Agents: From Brain-Inspired
  Intelligence to Evolutionary, Collaborative, and Safe Systems
		
			Paper
			
•
			2504.01990
			
•
			Published
				
			•
				
				300
			
 
	
	 
	
	
	
			
			Rethinking RL Scaling for Vision Language Models: A Transparent,
  From-Scratch Framework and Comprehensive Evaluation Scheme
		
			Paper
			
•
			2504.02587
			
•
			Published
				
			•
				
				32
			
 
	
	 
	
	
	
			
			One-Minute Video Generation with Test-Time Training
		
			Paper
			
•
			2504.05298
			
•
			Published
				
			•
				
				110
			
 
	
	 
	
	
	
			
			SmolVLM: Redefining small and efficient multimodal models
		
			Paper
			
•
			2504.05299
			
•
			Published
				
			•
				
				200
			
 
	
	 
	
	
	
			
			DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
		
			Paper
			
•
			2504.07128
			
•
			Published
				
			•
				
				86
			
 
	
	 
	
	
	
			
			MM-IFEngine: Towards Multimodal Instruction Following
		
			Paper
			
•
			2504.07957
			
•
			Published
				
			•
				
				35
			
 
	
	 
	
	
	
			
			SQL-R1: Training Natural Language to SQL Reasoning Model By
  Reinforcement Learning
		
			Paper
			
•
			2504.08600
			
•
			Published
				
			•
				
				31
			
 
	
	 
	
	
	
			
			WORLDMEM: Long-term Consistent World Simulation with Memory
		
			Paper
			
•
			2504.12369
			
•
			Published
				
			•
				
				34
			
 
	
	 
	
	
	
			
			VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference
  Optimization for Large Video Models
		
			Paper
			
•
			2504.13122
			
•
			Published
				
			•
				
				20
			
 
	
	 
	
	
	
			
			ToolRL: Reward is All Tool Learning Needs
		
			Paper
			
•
			2504.13958
			
•
			Published
				
			•
				
				48
			
 
	
	 
	
	
	
			
			UFO2: The Desktop AgentOS
		
			Paper
			
•
			2504.14603
			
•
			Published
				
			•
				
				29
			
 
	
	 
	
	
	
			
			OTC: Optimal Tool Calls via Reinforcement Learning
		
			Paper
			
•
			2504.14870
			
•
			Published
				
			•
				
				35
			
 
	
	 
	
	
	
			
			The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
		
			Paper
			
•
			2504.15521
			
•
			Published
				
			•
				
				64
			
 
	
	 
	
	
	
			
			Describe Anything: Detailed Localized Image and Video Captioning
		
			Paper
			
•
			2504.16072
			
•
			Published
				
			•
				
				63
			
 
	
	 
	
	
	
			
			MR. Video: "MapReduce" is the Principle for Long Video Understanding
		
			Paper
			
•
			2504.16082
			
•
			Published
				
			•
				
				5
			
 
	
	 
	
	
	
			
			LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
		
			Paper
			
•
			2504.16030
			
•
			Published
				
			•
				
				37
			
 
	
	 
	
	
	
			
			Paper2Code: Automating Code Generation from Scientific Papers in Machine
  Learning
		
			Paper
			
•
			2504.17192
			
•
			Published
				
			•
				
				120
			
 
	
	 
	
	
	
			
			Perception, Reason, Think, and Plan: A Survey on Large Multimodal
  Reasoning Models
		
			Paper
			
•
			2505.04921
			
•
			Published
				
			•
				
				185
			
 
	
	 
	
	
	
			
			Flow-GRPO: Training Flow Matching Models via Online RL
		
			Paper
			
•
			2505.05470
			
•
			Published
				
			•
				
				85
			
 
	
	 
	
	
	
			
			Vision-Language-Action Models: Concepts, Progress, Applications and
  Challenges
		
			Paper
			
•
			2505.04769
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			A Survey of Context Engineering for Large Language Models
		
			Paper
			
•
			2507.13334
			
•
			Published
				
			•
				
				258
			
 
	
	 
	
	
	
			
			Understanding Tool-Integrated Reasoning
		
			Paper
			
•
			2508.19201
			
•
			Published
				
			•
				
				32
			
 
	
	 
	
	
	
			
			Agentic Context Engineering: Evolving Contexts for Self-Improving
  Language Models
		
			Paper
			
•
			2510.04618
			
•
			Published
				
			•
				
				113
			
 
	
	 
	
	
	
			
			Robot Learning: A Tutorial
		
			Paper
			
•
			2510.12403
			
•
			Published
				
			•
				
				99