Kevin Zhang

Kevin-thu

https://kevin-thu.github.io/homepage

AI & ML interests

Computer Vision, Generation Models, Neural Rendering

Recent Activity

updated a model about 4 hours ago

Kevin-thu/StoryMem

upvoted a paper about 17 hours ago

Kling-Omni Technical Report

authored a paper 2 days ago

StoryMem: Multi-shot Long Video Storytelling with Memory

Organizations

None yet

upvoted a paper about 17 hours ago

Kling-Omni Technical Report

Paper • 2512.16776 • Published 10 days ago • 160

upvoted a paper 5 days ago

StoryMem: Multi-shot Long Video Storytelling with Memory

Paper • 2512.19539 • Published 6 days ago • 16

upvoted a paper about 1 month ago

In-Video Instructions: Visual Signals as Generative Control

Paper • 2511.19401 • Published Nov 24 • 30

upvoted 5 papers 2 months ago

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9 • 125

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165

upvoted 2 papers 4 months ago

3D and 4D World Modeling: A Survey

Paper • 2509.07996 • Published Sep 4 • 58

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

Paper • 2508.10893 • Published Aug 14 • 31

upvoted 3 papers 5 months ago

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

Paper • 2507.21809 • Published Jul 29 • 136

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Paper • 2507.22058 • Published Jul 29 • 39

Epona: Autoregressive Diffusion World Model for Autonomous Driving

Paper • 2506.24113 • Published Jun 30 • 1

upvoted a paper over 1 year ago

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12, 2024 • 39

upvoted 2 papers about 2 years ago

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Paper • 2312.07409 • Published Dec 12, 2023 • 23

CogAgent: A Visual Language Model for GUI Agents

Paper • 2312.08914 • Published Dec 14, 2023 • 31
