Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published about 16 hours ago • 76
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 15 days ago • 150
GVE Collection Towards General Video Embeddings: Models and Benchmarks • 4 items • Updated Nov 3 • 19
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum Paper • 2510.27571 • Published Oct 31 • 17
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker Paper • 2510.22733 • Published Oct 26 • 31
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation Paper • 2510.23581 • Published Oct 27 • 41
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting Paper • 2510.21817 • Published Oct 21 • 41
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27 • 26
RobotArena infty: Scalable Robot Benchmarking via Real-to-Sim Translation Paper • 2510.23571 • Published Oct 27 • 8
LimRank: Less is More for Reasoning-Intensive Information Reranking Paper • 2510.23544 • Published Oct 27 • 8
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published Oct 27 • 22
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction Paper • 2510.22706 • Published Oct 26 • 39