MultiPerson


C4G-HKUST committed 15 days ago

Commit 864d14b

1 Parent(s): 37e0f4b

Update generation mode descriptions: clarify GPU budget is up to limit, add 10s video generation note for 720s+40 steps

Files changed (2)

README.md +7 -4
app.py +9 -9

README.md CHANGED

@@ -212,17 +212,20 @@ python app.py
 #### Generation Modes
 The Gradio demo provides two generation modes:
-- **Fast Mode (240s GPU duration)**:
-  - Fixed 15 denoising steps for quick generation
   - Suitable for single-person videos or quick previews
   - Lower GPU usage quota consumption
-- **Quality Mode (720s GPU duration)**:
   - Custom denoising steps (adjustable via "Diffusion steps" slider)
   - Recommended for multi-person videos that require higher quality
   - Longer generation time but better quality output
-**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 15 steps for faster iteration.

 #### Generation Modes
 The Gradio demo provides two generation modes:
+- **Fast Mode (up to 240s GPU budget)**:
+  - Fixed 12 denoising steps for quick generation
   - Suitable for single-person videos or quick previews
   - Lower GPU usage quota consumption
+  - The 240s is the maximum GPU allocation time (budget), not the actual generation time
+- **Quality Mode (up to 720s GPU budget)**:
   - Custom denoising steps (adjustable via "Diffusion steps" slider)
   - Recommended for multi-person videos that require higher quality
   - Longer generation time but better quality output
+  - The 720s is the maximum GPU allocation time (budget), not the actual generation time
+  - With 40 denoising steps, approximately 10 seconds of video can be generated
+**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (up to 720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 12 steps for faster iteration. Note that the GPU duration values (240s/720s) represent the maximum budget allocated, not the actual generation time.
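
The two modes described in the README hunk above can be summarized in a small sketch. This is an illustration only, not code from the repository: `GenerationMode`, `FAST`, `QUALITY`, and `resolve_steps` are hypothetical names, and the only values taken from the source are the 240s/720s budgets and the fixed 12 steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationMode:
    """Hypothetical container for one generation mode (not from the repo)."""
    name: str
    gpu_budget_s: int           # maximum GPU allocation, not the expected runtime
    fixed_steps: Optional[int]  # None means the "Diffusion steps" slider decides


# Values below mirror the updated README text.
FAST = GenerationMode("fast", gpu_budget_s=240, fixed_steps=12)
QUALITY = GenerationMode("quality", gpu_budget_s=720, fixed_steps=None)


def resolve_steps(mode: GenerationMode, slider_steps: int) -> int:
    """Return the denoising step count a mode would use."""
    if mode.fixed_steps is not None:
        return mode.fixed_steps  # Fast Mode ignores the slider
    return slider_steps          # Quality Mode uses the slider value
```

With this sketch, `resolve_steps(FAST, 40)` yields the fixed 12 steps, while `resolve_steps(QUALITY, 40)` passes the slider value through unchanged.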

app.py CHANGED

@@ -291,7 +291,7 @@ def _parse_args():
     parser.add_argument(
         "--det_thresh",
         type=float,
-        default=0.15,
         help="Threshold for InsightFace face detection.")
     parser.add_argument(
         "--mode",
@@ -606,11 +606,11 @@ def run_graio_demo(args):
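     # Reference: https://huggingface.co/spaces/KlingTeam/LivePortrait/blob/main/app.py
     # The @spaces.GPU decorator handles GPU initialization automatically; no manual initialization is needed
-    # Fast generation mode: 240 seconds, fixed 15 denoising steps
     @spaces.GPU(duration=240)
     def gpu_wrapped_generate_video_fast(*args, **kwargs):
-        # Always use 15 denoising steps, passed via keyword argument
-        kwargs['fixed_steps'] = 15
         return gpu_wrapped_generate_video_worker(*args, **kwargs)
     # High-quality generation mode: 720 seconds, user-selected denoising steps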
@@ -758,7 +758,7 @@ def run_graio_demo(args):
                 with gr.Row():
                     run_i2v_button_fast = gr.Button(
-                        "Generate Video (Fast - 240s, 15 steps)",
                         variant="secondary",
                         scale=1
                     )
@@ -769,10 +769,10 @@ def run_graio_demo(args):
                     )
                 gr.Markdown("""
                 **Generation Modes:**
-                - **Fast Mode (240s)**: Fixed 15 denoising steps for quick generation. Suitable for single-person videos or quick previews.
-                - **Quality Mode (720s)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality and longer generation time.
-                *Note: Multi-person videos generally require longer duration and more Usage Quota for better quality.*
                 """)
             with gr.Column(scale=2):
@@ -807,7 +807,7 @@ def run_graio_demo(args):
                 )
-        # Fast generation button: 240 seconds, fixed 15 steps
         run_i2v_button_fast.click(
             fn=gpu_wrapped_generate_video_fast,
             inputs=[img2vid_image, img2vid_prompt, n_prompt, img2vid_audio_1, img2vid_audio_2, img2vid_audio_3, sd_steps, seed, guide_scale, person_num_selector, audio_mode_selector],

     parser.add_argument(
         "--det_thresh",
         type=float,
+        default=0.12,
         help="Threshold for InsightFace face detection.")
     parser.add_argument(
         "--mode",
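     # Reference: https://huggingface.co/spaces/KlingTeam/LivePortrait/blob/main/app.py
     # The @spaces.GPU decorator handles GPU initialization automatically; no manual initialization is needed
+    # Fast generation mode: 240 seconds, fixed 12 denoising steps
     @spaces.GPU(duration=240)
     def gpu_wrapped_generate_video_fast(*args, **kwargs):
+        # Always use 12 denoising steps, passed via keyword argument
+        kwargs['fixed_steps'] = 12
         return gpu_wrapped_generate_video_worker(*args, **kwargs)
     # High-quality generation mode: 720 seconds, user-selected denoising steps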
                 with gr.Row():
                     run_i2v_button_fast = gr.Button(
+                        "Generate Video (Fast - 240s, 12 steps)",
                         variant="secondary",
                         scale=1
                     )
                     )
                 gr.Markdown("""
                 **Generation Modes:**
+                - **Fast Mode (up to 240s GPU budget)**: Fixed 12 denoising steps for quick generation. Suitable for single-person videos or quick previews. The 240s is the maximum GPU allocation time, not the actual generation time.
+                - **Quality Mode (up to 720s GPU budget)**: Custom denoising steps (adjustable via "Diffusion steps" slider). Recommended for multi-person videos that require higher quality. The 720s is the maximum GPU allocation time, not the actual generation time. With 40 denoising steps, approximately 10 seconds of video can be generated.
+                *Note: The GPU duration (240s/720s) represents the maximum budget allocated, not the actual generation time. Multi-person videos generally require longer duration and more Usage Quota for better quality.*
                 """)
             with gr.Column(scale=2):
                 )
+        # Fast generation button: 240 seconds, fixed 12 steps
         run_i2v_button_fast.click(
             fn=gpu_wrapped_generate_video_fast,
             inputs=[img2vid_image, img2vid_prompt, n_prompt, img2vid_audio_1, img2vid_audio_2, img2vid_audio_3, sd_steps, seed, guide_scale, person_num_selector, audio_mode_selector],
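
The fast-mode wrapper in the app.py hunks above forces a step count by injecting `fixed_steps` into `kwargs` before delegating to a shared worker. The sketch below shows one way that pattern could work end to end. It makes several assumptions: `gpu` is a no-op stand-in for `spaces.GPU` so the example runs outside a Space, `generate_video_worker` is a hypothetical worker whose handling of `fixed_steps` is a guess (the real worker is not shown in the diff), and `sd_steps` is passed by keyword here for brevity, whereas the app passes it positionally through the Gradio `inputs` list.

```python
import functools


def gpu(duration):
    """Stand-in for spaces.GPU: records the budget, allocates nothing."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.gpu_budget_s = duration  # upper bound, not actual runtime
        return wrapper
    return decorator


def generate_video_worker(*args, sd_steps=40, fixed_steps=None, **kwargs):
    """Hypothetical worker: fixed_steps, when set, overrides the slider value."""
    steps = fixed_steps if fixed_steps is not None else sd_steps
    return {"steps": steps}  # placeholder result instead of a video


@gpu(duration=240)
def generate_video_fast(*args, **kwargs):
    kwargs["fixed_steps"] = 12  # same override as in the diff
    return generate_video_worker(*args, **kwargs)


@gpu(duration=720)
def generate_video_quality(*args, **kwargs):
    return generate_video_worker(*args, **kwargs)  # slider value is used
```

Keeping a single worker and varying only the decorator budget and the injected keyword means the two buttons can share the same `inputs` list, which matches how the fast button is wired in the diff.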