C4G-HKUST committed on
Commit 864d14b · 1 Parent(s): 37e0f4b

Update generation mode descriptions: clarify GPU budget is up to limit, add 10s video generation note for 720s+40 steps

Files changed (2)
  1. README.md +7 -4
  2. app.py +9 -9
README.md CHANGED
@@ -212,17 +212,20 @@ python app.py
 #### Generation Modes
 The Gradio demo provides two generation modes:
 
-- **Fast Mode (240s GPU duration)**:
-  - Fixed 15 denoising steps for quick generation
+- **Fast Mode (up to 240s GPU budget)**:
+  - Fixed 12 denoising steps for quick generation
   - Suitable for single-person videos or quick previews
   - Lower GPU usage quota consumption
+  - The 240s is the maximum GPU allocation time (budget), not the actual generation time
 
-- **Quality Mode (720s GPU duration)**:
+- **Quality Mode (up to 720s GPU budget)**:
   - Custom denoising steps (adjustable via "Diffusion steps" slider)
   - Recommended for multi-person videos that require higher quality
   - Longer generation time but better quality output
+  - The 720s is the maximum GPU allocation time (budget), not the actual generation time
+  - With 40 denoising steps, approximately 10 seconds of video can be generated
 
-**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 15 steps for faster iteration.
+**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (up to 720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 12 steps for faster iteration. Note that the GPU duration values (240s/720s) represent the maximum budget allocated, not the actual generation time.
 
 
 
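The two modes described in the README change can be summarised as a small lookup: each mode has a GPU budget (an upper bound, not a runtime) and either a fixed step count or none. The sketch below is only an illustration of that rule. `GenerationMode`, `resolve_steps`, and `within_budget` are hypothetical names that do not appear in `app.py`, and the 95-second elapsed time in the usage lines is an arbitrary example value, not a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationMode:
    """Hypothetical container for one mode; not a type defined in app.py."""
    name: str
    gpu_budget_s: int            # upper bound on GPU allocation, not expected runtime
    fixed_steps: Optional[int]   # None means "use the Diffusion steps slider"


# Budgets and step counts as stated in the updated README text.
FAST = GenerationMode("fast", gpu_budget_s=240, fixed_steps=12)
QUALITY = GenerationMode("quality", gpu_budget_s=720, fixed_steps=None)


def resolve_steps(mode: GenerationMode, slider_steps: int) -> int:
    """Return the number of denoising steps a run in this mode would use."""
    if mode.fixed_steps is not None:
        return mode.fixed_steps   # Fast Mode ignores the slider
    return slider_steps           # Quality Mode uses the slider value


def within_budget(mode: GenerationMode, elapsed_s: float) -> bool:
    """True if a run that took elapsed_s seconds stayed under the budget cap."""
    return elapsed_s <= mode.gpu_budget_s


if __name__ == "__main__":
    print(resolve_steps(FAST, slider_steps=40))      # slider is overridden
    print(resolve_steps(QUALITY, slider_steps=40))   # slider is respected
    print(within_budget(FAST, elapsed_s=95.0))       # illustrative elapsed time
```

Keeping the budget separate from the step count mirrors the point of this commit: the 240s/720s figures cap the allocation and say nothing about how long a given run actually takes.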
app.py CHANGED
@@ -291,7 +291,7 @@ def _parse_args():
     parser.add_argument(
         "--det_thresh",
         type=float,
-        default=0.15,
+        default=0.12,
         help="Threshold for InsightFace face detection.")
     parser.add_argument(
         "--mode",
@@ -606,11 +606,11 @@ def run_graio_demo(args):
     # Reference: https://huggingface.co/spaces/KlingTeam/LivePortrait/blob/main/app.py
     # The @spaces.GPU decorator handles GPU initialization automatically; no manual initialization is needed
 
-    # Fast generation mode: 240 seconds, fixed 15 denoising steps
+    # Fast generation mode: 240 seconds, fixed 12 denoising steps
     @spaces.GPU(duration=240)
     def gpu_wrapped_generate_video_fast(*args, **kwargs):
-        # Always use 15 denoising steps, passed as a keyword argument
-        kwargs['fixed_steps'] = 15
+        # Always use 12 denoising steps, passed as a keyword argument
+        kwargs['fixed_steps'] = 12
         return gpu_wrapped_generate_video_worker(*args, **kwargs)
 
     # High-quality generation mode: 720 seconds, user-selected denoising steps
@@ -758,7 +758,7 @@ def run_graio_demo(args):
 
     with gr.Row():
         run_i2v_button_fast = gr.Button(
-            "Generate Video (Fast - 240s, 15 steps)",
+            "Generate Video (Fast - 240s, 12 steps)",
             variant="secondary",
             scale=1
         )
@@ -769,10 +769,10 @@ def run_graio_demo(args):
     )
     gr.Markdown("""
     **Generation Modes:**
-    - **Fast Mode (240s)**: Fixed 15 denoising steps for quick generation. Suitable for single-person videos or quick previews.
-    - **Quality Mode (720s)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality and longer generation time.
+    - **Fast Mode (up to 240s GPU budget)**: Fixed 12 denoising steps for quick generation. Suitable for single-person videos or quick previews. The 240s is the maximum GPU allocation time, not the actual generation time.
+    - **Quality Mode (up to 720s GPU budget)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality. The 720s is the maximum GPU allocation time, not the actual generation time. With 40 denoising steps, approximately 10 seconds of video can be generated.
 
-    *Note: Multi-person videos generally require longer duration and more Usage Quota for better quality.*
+    *Note: The GPU duration (240s/720s) represents the maximum budget allocated, not the actual generation time. Multi-person videos generally require longer duration and more Usage Quota for better quality.*
     """)
 
     with gr.Column(scale=2):
@@ -807,7 +807,7 @@ def run_graio_demo(args):
     )
 
 
-    # Fast generation button: 240 seconds, fixed 15 steps
+    # Fast generation button: 240 seconds, fixed 12 steps
     run_i2v_button_fast.click(
         fn=gpu_wrapped_generate_video_fast,
         inputs=[img2vid_image, img2vid_prompt, n_prompt, img2vid_audio_1, img2vid_audio_2, img2vid_audio_3, sd_steps, seed, guide_scale, person_num_selector, audio_mode_selector],