Wan2.2-TI2V-5B-Turbo-Diffusers

This repo is the Diffusers version of quanhaol/Wan2.2-TI2V-5B-Turbo.


Wan2.2-TI2V-5B-Turbo applies efficient step distillation and CFG distillation to Wan2.2-TI2V-5B.

Leveraging the Self-Forcing framework, the TI2V-5B model is trained for 4-step inference. The resulting model generates 121-frame videos at 24 FPS with a resolution of 1280×704 in just 4 steps, with no need for classifier-free guidance.

To the best of our knowledge, Wan2.2-TI2V-5B-Turbo is the first open-source release of a distilled I2V version of Wan2.2-TI2V-5B.

🔥Video Demos

🐍 Installation

pip install -U diffusers
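
The examples below also assume PyTorch with CUDA support and the usual Diffusers companions for the Wan text encoder (transformers, ftfy); depending on your diffusers version, video export may additionally need imageio and imageio-ffmpeg. This extra install line is an assumption about a typical environment, so adjust it to yours:

pip install torch transformers accelerate ftfy imageio imageio-ffmpeg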

🚀Quick Start

Text To Video

import torch
from diffusers import WanPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

device = "cuda"
pipe = WanPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

width = 1280
height = 704
num_frames = 121
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        guidance_scale=1.0,        # CFG is distilled away, so no guidance is needed
        num_inference_steps=4,     # 4-step distilled inference
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
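
If the full pipeline does not fit in GPU memory, Diffusers' model CPU offloading can replace the .to(device) call above; the sketch below shows the same 4-step run with offloading enabled (the prompt and output path here are illustrative, and the memory savings depend on your hardware):

import torch
from diffusers import WanPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
pipe.enable_model_cpu_offload()  # each component is moved to the GPU only while it runs

video = pipe(
    prompt="A corgi running along a beach at sunset.",
    guidance_scale=1.0,
    num_inference_steps=4,
    width=1280,
    height=704,
    num_frames=121,
).frames[0]

export_to_video(video, "video_offload.mp4", fps=24)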

Image To Video

import torch
import numpy as np
from diffusers import UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

device = "cuda"
pipe = WanImageToVideoPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

max_area = 1280 * 704
# Output dimensions must be multiples of the VAE spatial stride times the transformer patch size.
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
image = load_image("https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo/blob/main/examples/images/cat.JPG?raw=true").convert("RGB")

# Keep the aspect ratio while targeting roughly max_area pixels, snapped down to mod_value.
aspect_ratio = image.width / image.height
width = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
height = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
num_frames = 121

with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        image=image,
        guidance_scale=1.0,        # CFG is distilled away, so no guidance is needed
        num_inference_steps=4,     # 4-step distilled inference
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
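
The width/height arithmetic above targets roughly max_area pixels while preserving the aspect ratio and rounding both sides down to a multiple of mod_value, as the pipeline requires. Below is a small helper that factors this out; the function name is ours, not part of the Diffusers API, and the mod_value of 32 in the usage comment assumes the default TI2V-5B configuration (VAE stride 16 × patch size 2):

import numpy as np

def snap_resolution(image_width: int, image_height: int, max_area: int, mod_value: int) -> tuple[int, int]:
    """Return (width, height) close to max_area pixels, preserving aspect ratio
    and rounding both sides down to a multiple of mod_value."""
    aspect_ratio = image_width / image_height
    width = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    height = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return int(width), int(height)

# e.g. a 4032x3024 photo with max_area = 1280 * 704 and mod_value = 32 -> (1088, 800)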