Wan2.2-TI2V-5B-Turbo-Diffusers
This repo is the Diffusers-compatible version of quanhaol/Wan2.2-TI2V-5B-Turbo.
Wan2.2-TI2V-5B-Turbo is a step-distilled and CFG-distilled variant of Wan2.2-TI2V-5B for efficient inference.
Built on the Self-Forcing framework, it is trained for 4-step generation: the model produces 121-frame, 1280×704 videos at 24 FPS in just 4 denoising steps and requires no classifier-free guidance (CFG).
To the best of our knowledge, Wan2.2-TI2V-5B-Turbo is the first open-source distilled image-to-video (I2V) release of Wan2.2-TI2V-5B.
🔥 Video Demos
🐍 Installation
pip install -U diffusers
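Depending on your environment, the examples below may also need a few companion packages: transformers and ftfy for the text encoder and prompt cleaning, accelerate for offloading, and (in recent diffusers versions) imageio/imageio-ffmpeg for export_to_video. Treat the line below as an optional, environment-dependent addition rather than a strict requirement.
pip install transformers accelerate ftfy imageio imageio-ffmpeg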
🚀 Quick Start
Text To Video
import torch
from diffusers import WanPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

device = "cuda"

# Load the distilled pipeline and switch to UniPC with the recommended flow shift.
pipe = WanPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

width = 1280
height = 704
num_frames = 121
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

# 4 distilled steps, no CFG (guidance_scale=1.0).
with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        guidance_scale=1.0,
        num_inference_steps=4,
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
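If the full pipeline does not fit in GPU memory, diffusers' standard model CPU offloading can be used instead of moving the whole pipeline to the GPU. The snippet below is a minimal sketch of that variant (it requires accelerate); the actual VRAM savings and speed trade-off depend on your hardware.

import torch
from diffusers import WanPipeline, UniPCMultistepScheduler

pipe = WanPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
# Keep sub-models on CPU and move each one to the GPU only while it runs.
# Do not also call .to("cuda") on the pipeline when using offloading.
pipe.enable_model_cpu_offload()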
Image To Video
import torch
import numpy as np
from diffusers import UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
device = "cuda"
pipe = WanImageToVideoPipeline.from_pretrained("yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
# Target roughly 1280x704 pixels while preserving the input aspect ratio;
# width and height must be multiples of the VAE/patch spatial granularity.
max_area = 1280 * 704
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
image = load_image("https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo/blob/main/examples/images/cat.JPG?raw=true").convert("RGB")
aspect_ratio = image.width / image.height
width = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
height = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
num_frames = 121
with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        image=image,
        guidance_scale=1.0,
        num_inference_steps=4,
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]
export_to_video(video, "video.mp4", fps=24)
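The aspect-ratio-preserving resize above can be factored into a small helper if you feed the pipeline arbitrary input images. This is a sketch of that refactoring, assuming the same pipe object as in the example; fit_image_to_pipeline is a hypothetical helper name, not part of diffusers.

from PIL import Image

def fit_image_to_pipeline(image: Image.Image, pipe, max_area: int = 1280 * 704) -> Image.Image:
    """Resize an image so its area is roughly max_area and both sides are valid for the model."""
    # Width and height must be multiples of the VAE spatial scale times the patch size.
    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
    aspect_ratio = image.width / image.height
    width = round((max_area * aspect_ratio) ** 0.5) // mod_value * mod_value
    height = round((max_area / aspect_ratio) ** 0.5) // mod_value * mod_value
    return image.resize((width, height))

# Usage: image = fit_image_to_pipeline(load_image("my_photo.png"), pipe)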
Base model: Wan-AI/Wan2.2-TI2V-5B