Add library_name and link to paper on HF Hub

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +7 -84
README.md CHANGED
@@ -3,7 +3,9 @@ license: other
  license_name: skywork-license
  license_link: LICENSE
  pipeline_tag: image-to-video
+ library_name: transformers
  ---
+
  <p align="center">
  <img src="assets/logo2.png" alt="SkyReels Logo" width="50%">
  </p>
@@ -11,7 +13,7 @@ pipeline_tag: image-to-video
  <h1 align="center">SkyReels V2: Infinite-Length Film Generative Model</h1>
 
  <p align="center">
- 📑 <a href="https://arxiv.org/pdf/2504.13074">Technical Report</a> · 👋 <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_skyreels_v2" target="_blank">Playground</a> · 💬 <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a> · 🤗 <a href="https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9" target="_blank">Hugging Face</a> · 🤖 <a href="https://www.modelscope.cn/collections/SkyReels-V2-f665650130b144" target="_blank">ModelScope</a> · 🌐 <a href="https://github.com/SkyworkAI/SkyReels-V2" target="_blank">GitHub</a>
+ 📑 <a href="https://huggingface.co/papers/2504.13074">Technical Report</a> · 👋 <a href="https://www.skyreels.ai/home?utm_campaign=huggingface_skyreels_v2" target="_blank">Playground</a> · 💬 <a href="https://discord.gg/PwM6NYtccQ" target="_blank">Discord</a> · 🤗 <a href="https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9" target="_blank">Hugging Face</a> · 🤖 <a href="https://www.modelscope.cn/collections/SkyReels-V2-f665650130b144" target="_blank">ModelScope</a> · 🌐 <a href="https://github.com/SkyworkAI/SkyReels-V2" target="_blank">GitHub</a>
  </p>
 
  ---
@@ -44,7 +46,7 @@ The demos above showcase 30-second videos generated using our SkyReels-V2 Diffus
 
  ## 📑 TODO List
 
- - [x] <a href="https://arxiv.org/pdf/2504.13074">Technical Report</a>
+ - [x] <a href="https://huggingface.co/papers/2504.13074">Technical Report</a>
  - [x] Checkpoints of the 14B and 1.3B Models Series
  - [x] Single-GPU & Multi-GPU Inference Code
  - [x] <a href="https://huggingface.co/Skywork/SkyCaptioner-V1">SkyCaptioner-V1</a>: A Video Captioning Model
@@ -274,7 +276,8 @@ torchrun --nproc_per_node=2 generate_video_df.py \
  --base_num_frames 97 \
  --num_frames 257 \
  --overlap_history 17 \
- --prompt "A serene lake surrounded by towering mountains, with a few swans gracefully gliding across the water and sunlight dancing on the surface." \
+ --prompt "A graceful white swan with a curved neck and delicate feathers swimming in a serene lake at dawn, its reflection perfectly mirrored in the still water as mist rises from the surface, with the swan occasionally dipping its head into the water to feed." \
+ --addnoise_condition 20 \
  --use_usp \
  --offload \
  --seed 42
@@ -604,84 +607,4 @@ The evaluation demonstrates that our model achieves significant advancements in
  <td>3.18</td>
  <td>2.93</td>
  </tr>
- <tr>
- <td>SkyReels-V2-I2V</td>
- <td>3.29</td>
- <td>3.42</td>
- <td>3.18</td>
- <td>3.56</td>
- <td>3.01</td>
- </tr>
- </tbody>
- </table>
- </p>
-
- Our results demonstrate that both **SkyReels-V2-I2V (3.29)** and **SkyReels-V2-DF (3.24)** achieve state-of-the-art performance among open-source models, significantly outperforming HunyuanVideo-13B (2.84) and Wan2.1-14B (2.85) across all quality dimensions. With an average score of 3.29, SkyReels-V2-I2V demonstrates comparable performance to proprietary models Kling-1.6 (3.4) and Runway-Gen4 (3.39).
-
-
- #### VBench
- To objectively compare SkyReels-V2 Model against other leading open-source Text-To-Video models, we conduct comprehensive evaluations using the public benchmark <a href="https://github.com/Vchitect/VBench">V-Bench</a>. Our evaluation specifically leverages the benchmark’s longer version prompt. For fair comparison with baseline models, we strictly follow their recommended setting for inference.
-
- <p align="center">
- <table align="center">
- <thead>
- <tr>
- <th>Model</th>
- <th>Total Score</th>
- <th>Quality Score</th>
- <th>Semantic Score</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><a href="https://github.com/hpcaitech/Open-Sora">OpenSora 2.0</a></td>
- <td>81.5 %</td>
- <td>82.1 %</td>
- <td>78.2 %</td>
- </tr>
- <tr>
- <td><a href="https://github.com/THUDM/CogVideo">CogVideoX1.5-5B</a></td>
- <td>80.3 %</td>
- <td>80.9 %</td>
- <td>77.9 %</td>
- </tr>
- <tr>
- <td><a href="https://github.com/Tencent/HunyuanVideo">HunyuanVideo-13B</a></td>
- <td>82.7 %</td>
- <td>84.4 %</td>
- <td>76.2 %</td>
- </tr>
- <tr>
- <td><a href="https://github.com/Wan-Video/Wan2.1">Wan2.1-14B</a></td>
- <td>83.7 %</td>
- <td>84.2 %</td>
- <td><strong>81.4 %</strong></td>
- </tr>
- <tr>
- <td>SkyReels-V2</td>
- <td><strong>83.9 %</strong></td>
- <td><strong>84.7 %</strong></td>
- <td>80.8 %</td>
- </tr>
- </tbody>
- </table>
- </p>
-
- The VBench results demonstrate that SkyReels-V2 outperforms all compared models including HunyuanVideo-13B and Wan2.1-14B, With the highest **total score (83.9%)** and **quality score (84.7%)**. In this evaluation, the semantic score is slightly lower than Wan2.1-14B, while we outperform Wan2.1-14B in human evaluations, with the primary gap attributed to V-Bench’s insufficient evaluation of shot-scenario semantic adherence.
-
- ## Acknowledgements
- We would like to thank the contributors of <a href="https://github.com/Wan-Video/Wan2.1">Wan 2.1</a>, <a href="https://github.com/xdit-project/xDiT">XDit</a> and <a href="https://qwenlm.github.io/blog/qwen2.5/">Qwen 2.5</a> repositories, for their open research and contributions.
-
- ## Citation
-
- ```bibtex
- @misc{chen2025skyreelsv2infinitelengthfilmgenerative,
- title={SkyReels-V2: Infinite-length Film Generative Model},
- author={Guibin Chen and Dixuan Lin and Jiangping Yang and Chunze Lin and Juncheng Zhu and Mingyuan Fan and Hao Zhang and Sheng Chen and Zheng Chen and Chengchen Ma and Weiming Xiong and Wei Wang and Nuo Pang and Kang Kang and Zhiheng Xu and Yuzhe Jin and Yupeng Liang and Yubing Song and Peng Zhao and Boyuan Xu and Di Qiu and Debang Li and Zhengcong Fei and Yang Li and Yahui Zhou},
- year={2025},
- eprint={2504.13074},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2504.13074},
- }
- ```
+ <tr>