microsoft/VibeVoice-Realtime-0.5B • Text-to-Speech • 1B • Updated about 2 hours ago • 106k • 766
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer • Paper • 2511.22699 • Published 15 days ago • 189
RynnVLA-002: A Unified Vision-Language-Action and World Model • Paper • 2511.17502 • Published 21 days ago • 24
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity • Paper • 2510.23603 • Published Oct 27 • 22