Xinchen Zhang

comin

https://cominclip.github.io/

Cominclip

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

updated a model 6 days ago

comin/OmniVerifier-7B

new activity 12 days ago

comin/ViVerBench:Enhance ViVerBench dataset card: Add metadata, links, and sample usage

Organizations

None yet

upvoted a paper 2 days ago

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

Paper • 2510.19871 • Published 7 days ago • 28

upvoted a paper 14 days ago

Generative Universal Verifier as Multimodal Meta-Reasoner

Paper • 2510.13804 • Published 14 days ago • 24

upvoted a paper about 1 month ago

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 177

upvoted a paper about 2 months ago

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8 • 40

upvoted a paper 3 months ago

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17 • 41

upvoted 2 papers 4 months ago

SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation

Paper • 2507.09862 • Published Jul 14 • 49

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 157

upvoted a paper 5 months ago

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96

upvoted a paper 6 months ago

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 151

upvoted 2 papers 8 months ago

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Paper • 2502.12146 • Published Feb 17 • 16

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Paper • 2502.12148 • Published Feb 17 • 17

upvoted 2 papers 9 months ago

Improving Video Generation with Human Feedback

Paper • 2501.13918 • Published Jan 23 • 52

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

Paper • 2501.10687 • Published Jan 18 • 14

upvoted an article 10 months ago

Explaining the SDXL latent space

Article • May 20, 2024 • 52

upvoted a paper 11 months ago

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 18

upvoted 3 papers about 1 year ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 98

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 43

A Survey on the Honesty of Large Language Models

Paper • 2409.18786 • Published Sep 27, 2024 • 32

upvoted a paper over 1 year ago

RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models

Paper • 2402.12908 • Published Feb 20, 2024 • 10
