Skywork/SkyReels-V2-I2V-14B-540P · Image-to-Video · Safetensors · i2v

Add library_name and link to paper on HF Hub

by nielsr (HF Staff), opened Apr 21

base: refs/heads/main ← from: refs/pr/1

Files changed (1): README.md +7 -84
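The main metadata change in this PR is the `library_name: transformers` key in the README's YAML front matter. The Hub reads that front matter as model-card metadata. The sketch below shows how flat `key: value` front matter like this can be read. It is a minimal sketch: it assumes only one-line scalar values, and a real YAML parser (for example PyYAML, or `huggingface_hub.ModelCard`) handles the full syntax. The sample text is rebuilt from the front-matter lines visible in this diff. It also assumes the file opens with `---` followed by `license: other`, as the first hunk header suggests. The function name `parse_front_matter` is hypothetical.

```python
def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a README's leading YAML front matter.

    Hypothetical helper, sketch only: assumes one-line scalar values with no
    nesting, lists, or quoting.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the file does not start with a front-matter block
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: end of the front matter
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta


# Front matter after this PR, rebuilt from the lines visible in the diff
# (the opening "---" and "license: other" lines are assumed).
new_readme = """---
license: other
license_name: skywork-license
license_link: LICENSE
pipeline_tag: image-to-video
library_name: transformers
---
<p align="center">
"""

meta = parse_front_matter(new_readme)
print(meta["pipeline_tag"], meta["library_name"])  # image-to-video transformers
```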

@@ -3,7 +3,9 @@ license: other
 license_name: skywork-license
 license_link: LICENSE
 pipeline_tag: image-to-video
 ---
 <p align="center">
   <img src="assets/logo2.png" alt="SkyReels Logo" width="50%">
 </p>
@@ -11,7 +13,7 @@ pipeline_tag: image-to-video
 <h1 align="center">SkyReels V2: Infinite-Length Film Generative Model</h1>
 <p align="center">
-📑 <a href="https://arxiv.org/pdf/2504.13074">Technical Report</a> · 👋 <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_skyreels_v2" target="_blank">Playground</a> · 💬 <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a> · 🤗 <a href="https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9" target="_blank">Hugging Face</a> · 🤖 <a href="https://www.modelscope.cn/collections/SkyReels-V2-f665650130b144" target="_blank">ModelScope</a> · 🌐 <a href="https://github.com/SkyworkAI/SkyReels-V2" target="_blank">GitHub</a>
 </p>
 ---
@@ -44,7 +46,7 @@ The demos above showcase 30-second videos generated using our SkyReels-V2 Diffus
 ## 📑 TODO List
-- [x] <a href="https://arxiv.org/pdf/2504.13074">Technical Report</a>
 - [x] Checkpoints of the 14B and 1.3B Models Series
 - [x] Single-GPU & Multi-GPU Inference Code
 - [x] <a href="https://huggingface.co/Skywork/SkyCaptioner-V1">SkyCaptioner-V1</a>: A Video Captioning Model
@@ -274,7 +276,8 @@ torchrun --nproc_per_node=2 generate_video_df.py \
   --base_num_frames 97 \
   --num_frames 257 \
   --overlap_history 17 \
-  --prompt "A serene lake surrounded by towering mountains, with a few swans gracefully gliding across the water and sunlight dancing on the surface." \
   --use_usp \
   --offload \
   --seed 42
@@ -604,84 +607,4 @@ The evaluation demonstrates that our model achieves significant advancements in
       <td>3.18</td>
       <td>2.93</td>
     </tr>
-    <tr>
-      <td>SkyReels-V2-I2V</td>
-      <td>3.29</td>
-      <td>3.42</td>
-      <td>3.18</td>
-      <td>3.56</td>
-      <td>3.01</td>
-    </tr>
-  </tbody>
-</table>
-</p>
-Our results demonstrate that both **SkyReels-V2-I2V (3.29)** and **SkyReels-V2-DF (3.24)** achieve state-of-the-art performance among open-source models, significantly outperforming HunyuanVideo-13B (2.84) and Wan2.1-14B (2.85) across all quality dimensions. With an average score of 3.29, SkyReels-V2-I2V demonstrates comparable performance to proprietary models Kling-1.6 (3.4) and Runway-Gen4 (3.39).
-#### VBench
-To objectively compare SkyReels-V2 Model against other leading open-source Text-To-Video models, we conduct comprehensive evaluations using the public benchmark <a href="https://github.com/Vchitect/VBench">V-Bench</a>. Our evaluation specifically leverages the benchmark’s longer version prompt. For fair comparison with baseline models, we strictly follow their recommended setting for inference.
-<p align="center">
-<table align="center">
-  <thead>
-    <tr>
-      <th>Model</th>
-      <th>Total Score</th>
-      <th>Quality Score</th>
-      <th>Semantic Score</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td><a href="https://github.com/hpcaitech/Open-Sora">OpenSora 2.0</a></td>
-      <td>81.5 %</td>
-      <td>82.1 %</td>
-      <td>78.2 %</td>
-    </tr>
-    <tr>
-      <td><a href="https://github.com/THUDM/CogVideo">CogVideoX1.5-5B</a></td>
-      <td>80.3 %</td>
-      <td>80.9 %</td>
-      <td>77.9 %</td>
-    </tr>
-    <tr>
-      <td><a href="https://github.com/Tencent/HunyuanVideo">HunyuanVideo-13B</a></td>
-      <td>82.7 %</td>
-      <td>84.4 %</td>
-      <td>76.2 %</td>
-    </tr>
-    <tr>
-      <td><a href="https://github.com/Wan-Video/Wan2.1">Wan2.1-14B</a></td>
-      <td>83.7 %</td>
-      <td>84.2 %</td>
-      <td><strong>81.4 %</strong></td>
-    </tr>
-    <tr>
-      <td>SkyReels-V2</td>
-      <td><strong>83.9 %</strong></td>
-      <td><strong>84.7 %</strong></td>
-      <td>80.8 %</td>
-    </tr>
-  </tbody>
-</table>
-</p>
-The VBench results demonstrate that SkyReels-V2 outperforms all compared models including HunyuanVideo-13B and Wan2.1-14B, With the highest **total score (83.9%)** and **quality score (84.7%)**. In this evaluation, the semantic score is slightly lower than Wan2.1-14B, while we outperform Wan2.1-14B in human evaluations, with the primary gap attributed to V-Bench’s insufficient evaluation of shot-scenario semantic adherence.
-## Acknowledgements
-We would like to thank the contributors of <a href="https://github.com/Wan-Video/Wan2.1">Wan 2.1</a>, <a href="https://github.com/xdit-project/xDiT">XDit</a> and <a href="https://qwenlm.github.io/blog/qwen2.5/">Qwen 2.5</a> repositories, for their open research and contributions.
-## Citation
-```bibtex
-@misc{chen2025skyreelsv2infinitelengthfilmgenerative,
-      title={SkyReels-V2: Infinite-length Film Generative Model},
-      author={Guibin Chen and Dixuan Lin and Jiangping Yang and Chunze Lin and Juncheng Zhu and Mingyuan Fan and Hao Zhang and Sheng Chen and Zheng Chen and Chengchen Ma and Weiming Xiong and Wei Wang and Nuo Pang and Kang Kang and Zhiheng Xu and Yuzhe Jin and Yupeng Liang and Yubing Song and Peng Zhao and Boyuan Xu and Di Qiu and Debang Li and Zhengcong Fei and Yang Li and Yahui Zhou},
-      year={2025},
-      eprint={2504.13074},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV},
-      url={https://arxiv.org/abs/2504.13074},
-}
-```

 license_name: skywork-license
 license_link: LICENSE
 pipeline_tag: image-to-video
+library_name: transformers
 ---
 <p align="center">
   <img src="assets/logo2.png" alt="SkyReels Logo" width="50%">
 </p>
 <h1 align="center">SkyReels V2: Infinite-Length Film Generative Model</h1>
 <p align="center">
+📑 <a href="https://huggingface.co/papers/2504.13074">Technical Report</a> · 👋 <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_skyreels_v2" target="_blank">Playground</a> · 💬 <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a> · 🤗 <a href="https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9" target="_blank">Hugging Face</a> · 🤖 <a href="https://www.modelscope.cn/collections/SkyReels-V2-f665650130b144" target="_blank">ModelScope</a> · 🌐 <a href="https://github.com/SkyworkAI/SkyReels-V2" target="_blank">GitHub</a>
 </p>
 ---
 ## 📑 TODO List
+- [x] <a href="https://huggingface.co/papers/2504.13074">Technical Report</a>
 - [x] Checkpoints of the 14B and 1.3B Models Series
 - [x] Single-GPU & Multi-GPU Inference Code
 - [x] <a href="https://huggingface.co/Skywork/SkyCaptioner-V1">SkyCaptioner-V1</a>: A Video Captioning Model
   --base_num_frames 97 \
   --num_frames 257 \
   --overlap_history 17 \
+  --prompt "A graceful white swan with a curved neck and delicate feathers swimming in a serene lake at dawn, its reflection perfectly mirrored in the still water as mist rises from the surface, with the swan occasionally dipping its head into the water to feed." \
+  --addnoise_condition 20 \
   --use_usp \
   --offload \
   --seed 42
       <td>3.18</td>
       <td>2.93</td>
     </tr>
+    <tr>
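The `torchrun ... generate_video_df.py` command in this diff requests `--num_frames 257` with `--base_num_frames 97` and `--overlap_history 17`. The sketch below estimates how many base-length windows such a run needs. It uses a simplified model, which is an assumption and not the script's confirmed logic: the first window yields `base_num_frames` frames, and every later window re-uses the last `overlap_history` frames as conditioning and adds `base_num_frames - overlap_history` new frames. The actual script may work in latent frames and round differently. The function name `num_windows` is hypothetical.

```python
import math


def num_windows(num_frames: int, base_num_frames: int, overlap_history: int) -> int:
    """Estimate how many base-length windows cover `num_frames` frames.

    Hypothetical helper under a simplified model (assumption): the first window
    yields `base_num_frames` frames; each later window re-uses the last
    `overlap_history` frames as conditioning and yields
    `base_num_frames - overlap_history` new frames.
    """
    if num_frames <= base_num_frames:
        return 1  # a single window is enough
    stride = base_num_frames - overlap_history  # new frames per extra window
    if stride <= 0:
        raise ValueError("overlap_history must be smaller than base_num_frames")
    return 1 + math.ceil((num_frames - base_num_frames) / stride)


# Values from the torchrun command in this diff: 97 + 2 * (97 - 17) = 257.
print(num_windows(num_frames=257, base_num_frames=97, overlap_history=17))  # 3
```

Under this model the three values fit exactly, with one 97-frame window plus two windows of 80 new frames each. This is why a larger `overlap_history` increases the window count for the same target length.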