		How to run inference with a model loaded on both CPU and GPU with device_map="balanced"
The error comes from a device mismatch. Some parts of your pipeline are on the GPU (cuda:0), while others (like the text_encoder) are still on the CPU. When tensors from different devices interact (e.g., a matrix multiplication between a CPU and a GPU tensor), PyTorch throws this error.
You used device_map="balanced". That tells accelerate/diffusers to automatically spread submodules across CPU and GPU to fit memory.
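For context, this is roughly how such a split comes about (a minimal sketch; the model id is a placeholder):

```python
import torch
from diffusers import DiffusionPipeline

# "balanced" lets accelerate decide per-component placement, so some
# submodules may land on the CPU when GPU memory is tight.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # placeholder model id
    torch_dtype=torch.float16,
    device_map="balanced",
)
```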
In your case:
```python
>>> pipeline.hf_device_map
{'text_encoder': 'cpu', 'vae': 0}
```
→ The text encoder is on the CPU, while the VAE is on the GPU.
During inference, the text encoder outputs CPU tensors, but the rest of the pipeline expects GPU tensors → mismatch.
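The same class of error can be reproduced with two plain tensors (a minimal sketch; requires a CUDA device):

```python
import torch

cpu_tensor = torch.randn(2, 4)                  # lives on the CPU
gpu_tensor = torch.randn(4, 3, device="cuda")   # lives on cuda:0

# Mixing devices in one op raises:
# RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cpu!
cpu_tensor @ gpu_tensor
```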
I would suggest using device_map="auto": it will smartly offload part of the model to the CPU, and it is the easiest fix. There are other fixes, but they are more complicated (one is sketched below); let me know if you want to know more.
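One of those other fixes, as a sketch: skip device_map entirely and use diffusers' model CPU offload, which moves each component to the GPU only while it runs, so tensors never mix devices (the model id is a placeholder; requires accelerate):

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # placeholder model id
    torch_dtype=torch.float16,
)

# Each component is moved to the GPU right before it is used and
# offloaded back to the CPU afterwards, keeping peak VRAM low.
pipeline.enable_model_cpu_offload()

image = pipeline("an astronaut riding a horse").images[0]
```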
What was the error?
What is the hardware you are using?
@Parveshiiii, I am trying to run it on Google Colab with a T4 GPU, and it looks like device_map="auto" is not supported in the diffusers library. I am getting the error message:

```
NotImplementedError: auto not supported. Supported strategies are: balanced, cuda
```
In diffusers/src/diffusers/pipelines/pipeline_utils.py:

```python
SUPPORTED_DEVICE_MAP = ["balanced"] + [get_device()]
```
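So on a CUDA machine the accepted values resolve to "balanced" and "cuda". A call using one of the supported strategies would look like this (a minimal sketch; the model id is a placeholder):

```python
import torch
from diffusers import DiffusionPipeline

# "balanced" splits components across devices; "cuda" places the
# whole pipeline on the GPU (assuming it fits in the T4's memory).
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # placeholder model id
    torch_dtype=torch.float16,
    device_map="cuda",
)
```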

