Update generation mode descriptions: clarify GPU budget is up to limit, add 10s video generation note for 720s+40 steps

README.md
CHANGED

@@ -212,17 +212,20 @@ python app.py
 #### Generation Modes
 The Gradio demo provides two generation modes:

-- **Fast Mode (240s GPU […]
-  - Fixed […]
+- **Fast Mode (up to 240s GPU budget)**:
+  - Fixed 12 denoising steps for quick generation
   - Suitable for single-person videos or quick previews
   - Lower GPU usage quota consumption
+  - The 240s is the maximum GPU allocation time (budget), not the actual generation time

-- **Quality Mode (720s GPU […]
+- **Quality Mode (up to 720s GPU budget)**:
   - Custom denoising steps (adjustable via "Diffusion steps" slider)
   - Recommended for multi-person videos that require higher quality
   - Longer generation time but better quality output
+  - The 720s is the maximum GPU allocation time (budget), not the actual generation time
+  - With 40 denoising steps, approximately 10 seconds of video can be generated

-**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed […]
+**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (up to 720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 12 steps for faster iteration. Note that the GPU duration values (240s/720s) represent the maximum budget allocated, not the actual generation time.
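
The GPU budgets quoted above correspond to Hugging Face ZeroGPU's `@spaces.GPU(duration=...)` decorator, which caps how long a single request may hold the GPU; the app.py hunks below show the fast path. As a minimal sketch of how both modes could be declared; the quality-mode wrapper and its name are assumptions, since this commit only shows its comment:

```python
import spaces  # Hugging Face ZeroGPU package providing the @spaces.GPU decorator

# Fast mode: reserve at most 240 s of GPU time and pin denoising to 12 steps.
@spaces.GPU(duration=240)
def gpu_wrapped_generate_video_fast(*args, **kwargs):
    kwargs['fixed_steps'] = 12  # override whatever the "Diffusion steps" slider says
    return gpu_wrapped_generate_video_worker(*args, **kwargs)  # defined elsewhere in app.py

# Quality mode (assumed counterpart): reserve at most 720 s and let the
# slider-selected step count pass through unchanged.
@spaces.GPU(duration=720)
def gpu_wrapped_generate_video_quality(*args, **kwargs):
    return gpu_wrapped_generate_video_worker(*args, **kwargs)
```

In either case, `duration` is only a ceiling on the allocation, which is the "budget, not the actual generation time" point the README now makes.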

app.py
CHANGED

@@ -291,7 +291,7 @@ def _parse_args():
 parser.add_argument(
     "--det_thresh",
     type=float,
-    default=0. […]
+    default=0.12,
     help="Threshold for InsightFace face detection.")
 parser.add_argument(
     "--mode",
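
For context, `--det_thresh` is the confidence threshold that app.py passes to InsightFace's face detector; this commit only changes its default value. A minimal sketch of where such a threshold typically lands, assuming the standard `FaceAnalysis` API (the model pack name and the surrounding wiring are assumptions, not taken from app.py):

```python
from insightface.app import FaceAnalysis

face_app = FaceAnalysis(name="buffalo_l")  # model pack name is an assumption
face_app.prepare(
    ctx_id=0,             # first GPU
    det_thresh=0.12,      # the new default from this commit
    det_size=(640, 640),
)
faces = face_app.get(image_bgr)  # image_bgr: a BGR numpy array, e.g. from cv2.imread
```

A threshold of 0.12 is far more permissive than InsightFace's library default of 0.5, so low-confidence detections of small or partially occluded faces are kept rather than dropped.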

@@ -606,11 +606,11 @@ def run_graio_demo(args):
 # Reference: https://huggingface.co/spaces/KlingTeam/LivePortrait/blob/main/app.py
 # The @spaces.GPU decorator handles GPU initialization automatically; no manual initialization is needed

-# Fast generation mode: 240 seconds, fixed […]
+# Fast generation mode: 240 seconds, fixed 12 denoising steps
 @spaces.GPU(duration=240)
 def gpu_wrapped_generate_video_fast(*args, **kwargs):
-    # Always use […]
-    kwargs['fixed_steps'] = […]
+    # Always use 12 denoising steps, passed via keyword argument
+    kwargs['fixed_steps'] = 12
     return gpu_wrapped_generate_video_worker(*args, **kwargs)

 # High-quality generation mode: 720 seconds, user-selected denoising steps
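
The override above only matters if the worker lets `fixed_steps` take precedence over the slider-driven step count. The worker itself is not part of this commit, so the snippet below is a hypothetical illustration of that precedence rule only; `_resolve_steps` and its parameters are invented names, not code from app.py.

```python
from typing import Optional

# Hypothetical helper: a fixed_steps override from the fast wrapper wins over
# the "Diffusion steps" slider value; otherwise the slider value is used.
def _resolve_steps(sd_steps: int, fixed_steps: Optional[int] = None) -> int:
    return fixed_steps if fixed_steps is not None else sd_steps

assert _resolve_steps(sd_steps=40) == 40                   # quality mode: slider respected
assert _resolve_steps(sd_steps=40, fixed_steps=12) == 12   # fast mode: pinned to 12
```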

@@ -758,7 +758,7 @@ def run_graio_demo(args):

 with gr.Row():
     run_i2v_button_fast = gr.Button(
-        "Generate Video (Fast - 240s, […]
+        "Generate Video (Fast - 240s, 12 steps)",
         variant="secondary",
         scale=1
     )

@@ -769,10 +769,10 @@ def run_graio_demo(args):
 )
 gr.Markdown("""
 **Generation Modes:**
-- **Fast Mode (240s)**: Fixed […]
-- **Quality Mode (720s)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality
+- **Fast Mode (up to 240s GPU budget)**: Fixed 12 denoising steps for quick generation. Suitable for single-person videos or quick previews. The 240s is the maximum GPU allocation time, not the actual generation time.
+- **Quality Mode (up to 720s GPU budget)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality. The 720s is the maximum GPU allocation time, not the actual generation time. With 40 denoising steps, approximately 10 seconds of video can be generated.

-*Note: Multi-person videos generally require longer duration and more Usage Quota for better quality.*
+*Note: The GPU duration (240s/720s) represents the maximum budget allocated, not the actual generation time. Multi-person videos generally require longer duration and more Usage Quota for better quality.*
 """)

 with gr.Column(scale=2):

@@ -807,7 +807,7 @@ def run_graio_demo(args):
 )


-# Fast generation button: 240 seconds, fixed […]
+# Fast generation button: 240 seconds, fixed 12 steps
 run_i2v_button_fast.click(
     fn=gpu_wrapped_generate_video_fast,
     inputs=[img2vid_image, img2vid_prompt, n_prompt, img2vid_audio_1, img2vid_audio_2, img2vid_audio_3, sd_steps, seed, guide_scale, person_num_selector, audio_mode_selector],