Diffusers · Safetensors · WanDMDPipeline
BrianChen1129 committed · verified
Commit f3def88 · 1 Parent(s): e72280f

Update README.md

Files changed (1)
  1. README.md +20 -6
README.md CHANGED
@@ -22,13 +22,27 @@ license: apache-2.0
  ## Model Overview
- - This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and [VSA](https://arxiv.org/pdf/2505.13389), based on [Wan-AI/Wan2.1-T2V-14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers).
- - It supports 3-step inference and achieves up to 50x speed up.、
- - Supports generating videos with **61×448×832** resolution.
- - Both [finetuning](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan_VSA.sh) and [inference](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/inference/v1_inference_wan_dmd.sh) scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository.
- - Try it out on **FastVideo** — we support a wide range of GPUs from **H100** to **4090**, and even support **Mac** users!
- - We use [FastVideo 480P Synthetic Wan dataset](https://huggingface.co/datasets/FastVideo/Wan-Syn_77x448x832_600k) for training.
+ ## Introduction
+
+ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and [VSA](https://arxiv.org/pdf/2505.13389), based on [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers). It supports efficient 3-step inference and generates high-quality videos at **61×448×832** resolution. We adopt the [FastVideo 480P Synthetic Wan dataset](https://huggingface.co/datasets/FastVideo/Wan-Syn_77x448x832_600k), consisting of 600k synthetic latents.
+
+ ---
+
  ## Model Overview
+
+ - Supports 3-step inference, achieving up to a **50x speedup** on a single **H100** GPU.
+ - Generates videos at **61×448×832** resolution.
+ - Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
+   - [Finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan_VSA.sh)
+   - [Inference script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/inference/v1_inference_wan_dmd.sh)
+ - Try it out on **FastVideo** (a minimal usage sketch follows this list): we support a wide range of GPUs from **H100** to **4090**, and also support **Mac** users!
+
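+ A minimal usage sketch with FastVideo's Python `VideoGenerator` API (the repo id and generation parameters here are illustrative assumptions, not a verified invocation; the inference script linked above is the authoritative entry point):
+
+ ```python
+ from fastvideo import VideoGenerator
+
+ # Illustrative repo id (assumption): substitute this model's actual Hugging Face id.
+ generator = VideoGenerator.from_pretrained("<this-model-hf-id>", num_gpus=1)
+
+ # 61 frames at 448x832 matches the resolution reported in this card; the
+ # keyword names below follow FastVideo's quickstart style and are assumptions.
+ generator.generate_video(
+     "A curious raccoon explores a neon-lit alley at night.",
+     num_frames=61,
+     height=448,
+     width=832,
+     num_inference_steps=3,  # DMD-distilled model: 3 denoising steps
+     output_path="outputs/",
+ )
+ ```
+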
+ ### Training Infrastructure
+
+ Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a global batch size of `64`.
+ We enabled `gradient checkpointing` and used a learning rate of `1e-5`.
+ VSA attention sparsity was set to **0.9**, and training ran for **3000 steps (~52 hours)**.
+ The detailed training example script is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v_14B_480P.slurm).
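+
+ As a sanity check on these numbers, a back-of-the-envelope sketch (assuming one latent per GPU per step and no gradient accumulation):
+
+ ```python
+ # Derived only from the figures quoted above; the accumulation setting is an assumption.
+ num_gpus = 64
+ global_batch_size = 64
+ train_steps = 3000
+ dataset_size = 600_000  # FastVideo 480P Synthetic Wan latents
+
+ per_gpu_batch = global_batch_size // num_gpus   # 1 latent per GPU per step
+ samples_seen = global_batch_size * train_steps  # 192,000 latents sampled
+ epochs = samples_seen / dataset_size            # ~0.32, i.e. less than one pass
+ print(per_gpu_batch, samples_seen, round(epochs, 2))
+ ```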