Improve model card with pipeline tag, library name, and license clarification
#4
by nielsr (HF Staff) · opened

README.md CHANGED
@@ -1,13 +1,14 @@
---
-license: other
-license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
language:
-
tags:
-
-
-
inference: false
---

# CogVideoX1.5-5B
@@ -23,224 +24,104 @@ inference: false
<a href="https://arxiv.org/pdf/2408.06072">📜 arxiv </a>
</p>
<p align="center">
-📍 Visit <a href="https://chatglm.cn/video?fr=
</p>

## Model Introduction

-CogVideoX is an open-source video generation model similar to [QingYing](https://chatglm.cn/video?fr=osm_cogvideo).
-Below is a table listing information on the video generation models available in this generation:
-

<table style="border-collapse: collapse; width: 100%;">
<tr>
<th style="text-align: center;">Model Name</th>
-<th style="text-align: center;">CogVideoX1.5-5B (
-<th style="text-align: center;">CogVideoX1.5-5B-I2V</th>
</tr>
<tr>
<td style="text-align: center;">Video Resolution</td>
<td colspan="1" style="text-align: center;">1360 * 768</td>
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 ≤ Max(W, H) ≤ 1360 <br> Max(W, H) % 16 = 0 </td>
-</tr>
<tr>
<td style="text-align: center;">Inference Precision</td>
-<td colspan="2" style="text-align: center;"><b>BF16 (
</tr>
<tr>
-<td style="text-align: center;">Single GPU
-<td colspan="2"
</tr>
<tr>
-<td style="text-align: center;">Multi-GPU
-<td colspan="2" style="text-align: center;"><b>BF16: 24GB* </b><br></td>
</tr>
<tr>
-<td style="text-align: center;">Inference Speed<br>(Step = 50, BF16)</td>
<td colspan="2" style="text-align: center;">Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video)</td>
</tr>
<tr>
<td style="text-align: center;">Prompt Language</td>
<td colspan="5" style="text-align: center;">English*</td>
</tr>
<tr>
-<td style="text-align: center;">
<td colspan="2" style="text-align: center;">224 Tokens</td>
</tr>
<tr>
<td style="text-align: center;">Video Length</td>
-<td colspan="2" style="text-align: center;">5 or 10 seconds</td>
</tr>
<tr>
-
-
</tr>
</table>

-**
-
-+ Testing with the `diffusers` library was done with all optimizations included in the library enabled. This scheme has
-not been tested on devices other than NVIDIA A100/H100, but it should generally work with all NVIDIA Ampere
-architecture or newer devices. Disabling the optimizations can triple VRAM usage but increase speed by 3-4 times. You
-can selectively disable certain optimizations, including the following (see the sketch after this list):
-
-```
-pipe.enable_sequential_cpu_offload()
-pipe.vae.enable_slicing()
-pipe.vae.enable_tiling()
-```
-
-+ In multi-GPU inference, the `enable_sequential_cpu_offload()` optimization needs to be disabled.
-+ Using an INT8 model lowers the VRAM requirement so that lower-memory GPUs can run inference, with minimal loss in
-video quality but a significant reduction in speed.
-+ [PytorchAO](https://github.com/pytorch/ao) and [Optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be
-used to quantize the text encoder, Transformer, and VAE modules, reducing CogVideoX's memory requirements and making it
-feasible to run the model on smaller-VRAM GPUs. TorchAO quantization is fully compatible with `torch.compile`,
-which significantly improves inference speed. `FP8` precision requires an NVIDIA H100 or newer GPU and a source
-installation of `torch`, `torchao`, `diffusers`, and `accelerate`. Using `CUDA 12.4` is recommended.
-+ Inference speed testing also used the above VRAM optimizations; without them, speed increases by about
-10%. Only the `diffusers` versions of the models support quantization.
-+ The models support English input only; prompts in other languages should be translated into English, for example
-with a large language model, while refining the prompt.
-
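Taken together, the calls above form the low-memory configuration; leaving them out keeps everything on the GPU and runs faster. A minimal sketch of the trade-off (an illustrative addition, not code from the original card; the `pipe.to("cuda")` alternative is an assumption based on standard `diffusers` usage):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16)

# Low-VRAM setup (the calls discussed above): stream weights from CPU and chunk the VAE.
# Smallest memory footprint, slowest inference; drop the offload call for multi-GPU runs.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

# High-VRAM setup (assumption, standard diffusers usage): skip the three calls above and keep
# the whole pipeline on one GPU, accepting roughly 3x the VRAM for a 3-4x speedup.
# pipe.to("cuda")
```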
-**Note**
-
-+ Use [SAT](https://github.com/THUDM/SwissArmyTransformer) for inference and fine-tuning of SAT-version models. Check our
-GitHub for more details.
-
-## Getting Started Quickly 🤗
-
-This model supports deployment using the Hugging Face diffusers library. You can follow the steps below to get started.
-
-**We recommend that you visit our [GitHub](https://github.com/THUDM/CogVideo) to check out prompt optimization and
-conversion to get a better experience.**
-
-1. Install the required dependencies
-
-```shell
-# diffusers (from source)
-# transformers>=4.46.2
-# accelerate>=1.1.1
-# imageio-ffmpeg>=0.5.1
-pip install git+https://github.com/huggingface/diffusers
-pip install --upgrade transformers accelerate diffusers imageio-ffmpeg
-```
-
-2. Run the code
-
-```python
-import torch
-from diffusers import CogVideoXPipeline
-from diffusers.utils import export_to_video
-
-prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
-
-pipe = CogVideoXPipeline.from_pretrained(
-    "THUDM/CogVideoX1.5-5B",
-    torch_dtype=torch.bfloat16
-)
-
-pipe.enable_sequential_cpu_offload()
-pipe.vae.enable_tiling()
-pipe.vae.enable_slicing()
-
-video = pipe(
-    prompt=prompt,
-    num_videos_per_prompt=1,
-    num_inference_steps=50,
-    num_frames=81,
-    guidance_scale=6,
-    generator=torch.Generator(device="cuda").manual_seed(42),
-).frames[0]
-
-export_to_video(video, "output.mp4", fps=8)
-```
-
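A side note on the numbers in this example (not part of the original card): the clip length follows from `num_frames` and the 16 frames-per-second generation rate listed for CogVideoX1.5 in the updated table, while the `fps` passed to `export_to_video` only sets the playback speed of the saved file.

```python
# Frame count for CogVideoX1.5 follows 16*N + 1 with N <= 10; the example uses the default 81.
num_frames = 81
model_fps = 16                      # generation frame rate from the model table
clip_seconds = (num_frames - 1) / model_fps
print(clip_seconds)                 # 5.0 -> the "5-second video" quoted in the speed row
# export_to_video(video, "output.mp4", fps=16) would play the clip back in real time;
# fps=8, as used above, plays the same frames at half speed.
```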
-## Quantized Inference
-
-[PytorchAO](https://github.com/pytorch/ao) and [Optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be
-used to quantize the text encoder, transformer, and VAE modules to reduce CogVideoX's memory requirements. This allows
-the model to run on a free T4 Colab or on GPUs with lower VRAM! Also, note that TorchAO quantization is fully compatible
-with `torch.compile`, which can significantly accelerate inference.
-
-```python
-# To get started, PytorchAO needs to be installed from GitHub source, together with PyTorch Nightly.
-# Source and nightly installation is only required until the next release.
-
-import torch
-from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
-from diffusers.utils import export_to_video
-from transformers import T5EncoderModel
-from torchao.quantization import quantize_, int8_weight_only
-
-quantization = int8_weight_only
-
-text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX1.5-5B", subfolder="text_encoder",
-                                              torch_dtype=torch.bfloat16)
-quantize_(text_encoder, quantization())
-
-transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX1.5-5B", subfolder="transformer",
-                                                          torch_dtype=torch.bfloat16)
-quantize_(transformer, quantization())
-
-vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX1.5-5B", subfolder="vae", torch_dtype=torch.bfloat16)
-quantize_(vae, quantization())
-
-# Create the text-to-video pipeline from the quantized components and run inference
-pipe = CogVideoXPipeline.from_pretrained(
-    "THUDM/CogVideoX1.5-5B",
-    text_encoder=text_encoder,
-    transformer=transformer,
-    vae=vae,
-    torch_dtype=torch.bfloat16,
-)
-
-pipe.enable_model_cpu_offload()
-pipe.vae.enable_tiling()
-pipe.vae.enable_slicing()
-
-prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
-video = pipe(
-    prompt=prompt,
-    num_videos_per_prompt=1,
-    num_inference_steps=50,
-    num_frames=81,
-    guidance_scale=6,
-    generator=torch.Generator(device="cuda").manual_seed(42),
-).frames[0]
-
-export_to_video(video, "output.mp4", fps=8)
-```
-
-Additionally, these models can be serialized and stored using PytorchAO in quantized data types to save disk space. You
-can find examples and benchmarks at the following links:
-
-- [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
-- [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
-
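A minimal sketch of that serialization idea, assuming a pickle-based `save_pretrained(..., safe_serialization=False)` / `from_pretrained(..., use_safetensors=False)` round-trip (an assumption about how torchao-quantized modules are persisted; the linked gists remain the reference for exact usage and benchmarks):

```python
import torch
from diffusers import CogVideoXTransformer3DModel
from torchao.quantization import quantize_, int8_weight_only

# Quantize the transformer to int8 weights, as in the example above.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX1.5-5B", subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize_(transformer, int8_weight_only())

# Assumption: torchao tensor subclasses are not safetensors-compatible, so fall back to
# pickle-based serialization when writing the quantized weights to disk.
transformer.save_pretrained("cogvideox1.5-5b-transformer-int8", safe_serialization=False)

# Reload later without re-quantizing (also an assumption; see the torchao gist for details).
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "cogvideox1.5-5b-transformer-int8", torch_dtype=torch.bfloat16, use_safetensors=False
)
```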
-## Further Exploration
-
-Feel free to visit our [GitHub](https://github.com/THUDM/CogVideo), where you'll find:
-
-1. More detailed technical explanations and code.
-2. Optimized prompt examples and conversions.
-3. Detailed code for model inference and fine-tuning.
-4. Project update logs and more interactive opportunities.
-5. CogVideoX toolchain to help you better use the model.
-6. INT8 model inference code.
-
-## Model License
-
-This model is released under the [CogVideoX LICENSE](LICENSE).
-
-## Citation
-
-```
-@article{yang2024cogvideox,
-  title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
-  author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
-  journal={arXiv preprint arXiv:2408.06072},
-  year={2024}
-}
-```
-
---
language:
+- en
+license: apache-2.0
+pipeline_tag: text-to-video
tags:
+- video-generation
+- thudm
+- image-to-video
inference: false
+library_name: diffusers
---

# CogVideoX1.5-5B

<a href="https://arxiv.org/pdf/2408.06072">📜 arxiv </a>
</p>
<p align="center">
+📍 Visit <a href="https://chatglm.cn/video?lang=en?fr=osm_cogvideo">QingYing</a> and <a href="https://open.bigmodel.cn/?utm_campaign=open&_channel_track_key=OWTVNma9">API Platform</a> to experience larger-scale commercial video generation models.
</p>

## Model Introduction

+CogVideoX is an open-source video generation model similar to [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.

<table style="border-collapse: collapse; width: 100%;">
<tr>
<th style="text-align: center;">Model Name</th>
+<th style="text-align: center;">CogVideoX1.5-5B (Latest)</th>
+<th style="text-align: center;">CogVideoX1.5-5B-I2V (Latest)</th>
+<th style="text-align: center;">CogVideoX-2B</th>
+<th style="text-align: center;">CogVideoX-5B</th>
+<th style="text-align: center;">CogVideoX-5B-I2V</th>
+</tr>
+<tr>
+<td style="text-align: center;">Release Date</td>
+<th style="text-align: center;">November 8, 2024</th>
+<th style="text-align: center;">November 8, 2024</th>
+<th style="text-align: center;">August 6, 2024</th>
+<th style="text-align: center;">August 27, 2024</th>
+<th style="text-align: center;">September 19, 2024</th>
</tr>
<tr>
<td style="text-align: center;">Video Resolution</td>
<td colspan="1" style="text-align: center;">1360 * 768</td>
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 ≤ Max(W, H) ≤ 1360 <br> Max(W, H) % 16 = 0 </td>
+<td colspan="3" style="text-align: center;">720 * 480</td>
+</tr>
+<tr>
+<td style="text-align: center;">Number of Frames</td>
+<td colspan="2" style="text-align: center;">Should be <b>16N + 1</b> where N <= 10 (default 81)</td>
+<td colspan="3" style="text-align: center;">Should be <b>8N + 1</b> where N <= 6 (default 49)</td>
+</tr>
<tr>
<td style="text-align: center;">Inference Precision</td>
+<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
+<td style="text-align: center;"><b>FP16* (Recommended)</b>, BF16, FP32, FP8*, INT8, Not supported: INT4</td>
+<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
</tr>
<tr>
+<td style="text-align: center;">Single GPU Memory Usage<br></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 76GB <br><b>diffusers BF16: from 10GB*</b><br><b>diffusers INT8 (torchao): from 7GB*</b></td>
+<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB <br><b>diffusers FP16: 4GB minimum*</b><br><b>diffusers INT8 (torchao): 3.6GB minimum*</b></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB <br><b>diffusers BF16: 5GB minimum*</b><br><b>diffusers INT8 (torchao): 4.4GB minimum*</b></td>
</tr>
<tr>
+<td style="text-align: center;">Multi-GPU Memory Usage</td>
+<td colspan="2" style="text-align: center;"><b>BF16: 24GB* using diffusers</b><br></td>
+<td style="text-align: center;"><b>FP16: 10GB* using diffusers</b><br></td>
+<td colspan="2" style="text-align: center;"><b>BF16: 15GB* using diffusers</b><br></td>
</tr>
<tr>
+<td style="text-align: center;">Inference Speed<br>(Step = 50, FP/BF16)</td>
<td colspan="2" style="text-align: center;">Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video)</td>
+<td style="text-align: center;">Single A100: ~90 seconds<br>Single H100: ~45 seconds</td>
+<td colspan="2" style="text-align: center;">Single A100: ~180 seconds<br>Single H100: ~90 seconds</td>
</tr>
<tr>
<td style="text-align: center;">Prompt Language</td>
<td colspan="5" style="text-align: center;">English*</td>
</tr>
<tr>
+<td style="text-align: center;">Prompt Token Limit</td>
<td colspan="2" style="text-align: center;">224 Tokens</td>
+<td colspan="3" style="text-align: center;">226 Tokens</td>
</tr>
<tr>
<td style="text-align: center;">Video Length</td>
+<td colspan="2" style="text-align: center;">5 seconds or 10 seconds</td>
+<td colspan="3" style="text-align: center;">6 seconds</td>
</tr>
<tr>
+<td style="text-align: center;">Frame Rate</td>
+<td colspan="2" style="text-align: center;">16 frames / second</td>
+<td colspan="3" style="text-align: center;">8 frames / second</td>
+</tr>
+<tr>
+<td style="text-align: center;">Position Encoding</td>
+<td colspan="2" style="text-align: center;">3d_rope_pos_embed</td>
+<td style="text-align: center;">3d_sincos_pos_embed</td>
+<td style="text-align: center;">3d_rope_pos_embed</td>
+<td style="text-align: center;">3d_rope_pos_embed + learnable_pos_embed</td>
+</tr>
+<tr>
+<td style="text-align: center;">Download Link (Diffusers)</td>
+<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5B">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5B">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5B">🟣 WiseModel</a></td>
+<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5B-I2V">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5B-I2V">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5B-I2V">🟣 WiseModel</a></td>
+<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-2b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-2b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b">🟣 WiseModel</a></td>
+<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b">🟣 WiseModel</a></td>
+<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b-I2V">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b-I2V">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b-I2V">🟣 WiseModel</a></td>
+</tr>
+<tr>
+<td style="text-align: center;">Download Link (SAT)</td>
+<td colspan="2" style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5b-SAT">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🟣 WiseModel</a></td>
+<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
</tr>
</table>

+**(rest of the content remains the same as the original)**
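Given the resolution and frame-count rules in the table above, a small validation helper can catch invalid settings early (a sketch written for this card, not code from the repository; the function name and the `image_to_video` flag are hypothetical, while the checks mirror the CogVideoX1.5 columns of the table):

```python
def check_cogvideox15_settings(width: int, height: int, num_frames: int, image_to_video: bool = False) -> None:
    """Check generation settings against the CogVideoX1.5 rows of the table above."""
    # Frame count (both CogVideoX1.5 models): 16N + 1 with N <= 10, default 81.
    n, rem = divmod(num_frames - 1, 16)
    assert rem == 0 and n <= 10, "num_frames must be 16N + 1 with N <= 10"
    lo, hi = sorted((width, height))
    if image_to_video:
        # CogVideoX1.5-5B-I2V: Min(W, H) = 768, 768 <= Max(W, H) <= 1360, Max(W, H) % 16 == 0.
        assert lo == 768 and 768 <= hi <= 1360 and hi % 16 == 0, "invalid image-to-video resolution"
    else:
        # CogVideoX1.5-5B (this card): fixed 1360 * 768.
        assert (lo, hi) == (768, 1360), "text-to-video resolution is fixed at 1360 * 768"


check_cogvideox15_settings(1360, 768, 81)  # the defaults used in the examples above
```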