GRPO Counting Model
A Stable Diffusion 3.5-M model fine-tuned with GRPO (Generative Reinforcement Policy Optimization) method, specifically designed for generating images with precise object counting control.
Model Description
- Base Model: Stable Diffusion 3.5-M
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Data: COCO80 dataset
- Key Feature: Precise control over object quantities in generated images
- Supported Range: 1-10 objects
Model Variants
This repository contains four variants of the model, each trained with different strategies:
Strict First (
strict_first/)- Uses strict reward function
- Timestep selection: First 50 steps
- Best for: Most accurate object counting
Relative First (
relative_first/)- Uses relative reward function
- Timestep selection: First 50 steps
- Best for: Balance between accuracy and image quality
Strict Random (
strict_random/)- Uses strict reward function
- Timestep selection: Random steps
- Best for: Diverse image generation with accurate counting
Relative Random (
relative_random/)- Uses relative reward function
- Timestep selection: Random steps
- Best for: Maximum diversity in generation
Model Usage
1. Download the Model
from huggingface_hub import snapshot_download
# Download the model locally (replace variant with one of: strict_first, relative_first, strict_random, relative_random)
variant = "strict_first" # Choose the variant you want to use
model_path = snapshot_download(
repo_id="MiaTiancai/grpo-counting-model",
local_dir=f"./grpo_counting_model_{variant}", # specify your local path
subfolder=variant # Specify which variant to download
)
2. Model Inference
For inference, please refer to the Flow-GRPO repository. The repository contains all necessary code and instructions for running inference with this model.
Usage Tips
Prompt Format:
- Always include a specific number (1-10) in your prompt
- Use clear object descriptions
- Examples:
"3 cats sitting on a couch","5 red balloons floating in the sky"
Best Practices:
- Keep numbers within the supported range (1-10)
- Use simple and clear scene descriptions
- Avoid overly complex compositions
- Choose the appropriate model variant based on your needs:
- For highest counting accuracy: use
strict_first - For best image quality: use
relative_random - For balanced results: use
relative_first
- For highest counting accuracy: use
Limitations
- Optimal performance for scenes with 1-10 objects
- May have reduced effectiveness with complex scenes
- Results depend on prompt quality and clarity
- Each variant has its own strengths and trade-offs
License
MIT License
- Downloads last month
- -