RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling Paper • 2510.18917 • Published 6 days ago • 4
The Unanticipated Asymmetry Between Perceptual Optimization and Assessment Paper • 2509.20878 • Published Sep 25 • 3
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space Paper • 2508.00413 • Published Aug 1 • 5
Dedelayed: Deleting remote inference delay via on-device correction Paper • 2510.13714 • Published 12 days ago • 1
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model Paper • 2408.17175 • Published Aug 30, 2024 • 6
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling Paper • 2510.09212 • Published 17 days ago • 12
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction Paper • 2510.04759 • Published 21 days ago • 9
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models Paper • 2510.09008 • Published 17 days ago • 14
UniFusion: Vision-Language Model as Unified Encoder in Image Generation Paper • 2510.12789 • Published 12 days ago • 16
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published 12 days ago • 35
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published 16 days ago • 48
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published 15 days ago • 45
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published 13 days ago • 106