LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 54
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training Paper • 2502.03128 • Published Feb 5, 2025 • 2
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement Paper • 2501.15417 • Published Jan 26, 2025 • 1
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling Paper • 2508.16790 • Published Aug 22, 2025 • 10
Vevo2: Bridging Controllable Speech and Singing Voice Generation via Unified Prosody Learning Paper • 2508.16332 • Published Aug 22, 2025