ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces Paper • 2509.18084 • Published Sep 22 • 13
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10 • 126
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Paper • 2507.14111 • Published Jul 18 • 23
Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content Paper • 2506.20331 • Published Jun 25 • 5
Optimizing Multilingual Text-To-Speech with Accents & Emotions Paper • 2506.16310 • Published Jun 19 • 24 • 9
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 101 • 4
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 35 • 4
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks Paper • 2210.14712 • Published Oct 26, 2022