How to run inference with a model loaded on both CPU and GPU with device_map="balanced"

#9
by sadimanna - opened
Hugging Face Discord Community org
sadimanna changed discussion title from How to run inference with a model loaded with device_map="balanced" to How to run inference with a model loaded with device_map="balanced" on CPU and GPU
sadimanna changed discussion title from How to run inference with a model loaded with device_map="balanced" on CPU and GPU to How to run inference with a model loaded on both CPU and GPU with device_map="balanced"
Hugging Face Discord Community org

The error comes from a device mismatch: some parts of your pipeline are on the GPU (cuda:0), while others (like the text_encoder) are still on the CPU. When tensors from different devices interact (e.g., a matrix multiplication between a CPU tensor and a GPU tensor), PyTorch throws this error.

You used device_map="balanced". That tells accelerate/diffusers to automatically spread submodules across the CPU and GPU so the model fits in memory.

In your case:

```python
pipeline.hf_device_map
# {'text_encoder': 'cpu', 'vae': 0}
```

→ The text encoder is on CPU, while the VAE is on GPU.

During inference, the text encoder outputs CPU tensors, but the rest of the pipeline expects GPU tensors → mismatch.
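A minimal sketch of that mismatch with bare tensors (the shapes and variable names here are illustrative stand-ins, not the real pipeline objects):

```python
import torch

# Pick whatever accelerator is visible; on a GPU machine this is "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"

text_embeds = torch.randn(1, 77, 768)               # stays on CPU, like the CPU text_encoder output
unet_weight = torch.randn(768, 320, device=device)  # lives on the GPU when one is present

# With device == "cuda", `text_embeds @ unet_weight` would raise the familiar
# RuntimeError ("Expected all tensors to be on the same device ...").
# Moving the CPU tensor over first avoids it:
text_embeds = text_embeds.to(device)
out = text_embeds @ unet_weight
print(out.shape, out.device)
```

The same idea applies inside the pipeline: whatever the CPU-resident text encoder produces has to be `.to()`-moved onto the GPU before the GPU-resident components consume it.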

I would suggest you use device_map="auto"; it will smartly offload part of the model to the CPU. That's the easiest fix. There are other fixes, but they are more complicated; let me know if you want to hear about them.

Hugging Face Discord Community org

@Parveshiiii

Using

device_map = "auto" 

gave me an error.

Hugging Face Discord Community org

what was the error?

Hugging Face Discord Community org

what is the hardware you are using?

Hugging Face Discord Community org
•
edited Sep 24

@Parveshiiii, I am trying to run it on Google Colab with a T4 GPU, and it looks like

device_map="auto"

is not supported in the diffusers library.

I am getting the error message

```
NotImplementedError: auto not supported. Supported strategies are: balanced, cuda
```

In diffusers/src/diffusers/pipelines/pipeline_utils.py:

```python
SUPPORTED_DEVICE_MAP = ["balanced"] + [get_device()]
```
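Given that line, only "balanced" or the literal accelerator name ("cuda" on the T4) will pass validation. A simplified sketch of the gate, where `get_device` is an assumed stand-in for the helper diffusers calls:

```python
def get_device():
    # Assumed stand-in: report "cuda" when a GPU is visible, else "cpu".
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

# Mirrors the quoted line: "auto" is simply never in the list.
SUPPORTED_DEVICE_MAP = ["balanced"] + [get_device()]

def validate_device_map(device_map):
    if device_map not in SUPPORTED_DEVICE_MAP:
        raise NotImplementedError(
            f"{device_map} not supported. Supported strategies are: "
            + ", ".join(SUPPORTED_DEVICE_MAP)
        )

validate_device_map("balanced")  # passes
try:
    validate_device_map("auto")  # raises, reproducing the error above
except NotImplementedError as e:
    print(e)
```

So for diffusers pipelines, device_map="auto" is an accelerate/transformers convention that is not accepted; device_map="balanced" (as in the original question) or the device name itself are the strategies that get through.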