Stable Diffusion 3.5 Large TensorRT

Introduction

This repository hosts the TensorRT-optimized version of Stable Diffusion 3.5 Large, developed in collaboration between Stability AI and NVIDIA. This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that offers improved image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

Model Details

Model Description

This repository holds the ONNX exports of the T5, MMDiT, and VAE models in BF16 precision, plus an export of the MMDiT model in FP8 precision. The MMDiT transformer was quantized to FP8 using NVIDIA/TensorRT-Model-Optimizer.

Performance using TensorRT 10.13

Timings for 30 steps at 1024x1024

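| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 13.83 ms | 5.66 ms | 8.55 ms | 7945 ms    | 97.17 ms    | 8101.83 ms |
| H100        | FP8       | 16.80 ms | 6.91 ms | 8.56 ms | 5604.97 ms | 36.91 ms    | 5708.69 ms |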
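
To make the comparison between the two H100 rows easier to read, the short Python sketch below derives the FP8-over-BF16 speedup and the share of total time spent in the MMDiT denoising loop. The numbers are copied from the timings above; the dictionary layout and the helper names `speedup` and `mmdit_share` are illustrative assumptions, not part of any NVIDIA or Stability AI tooling.

```python
# Timings in milliseconds, copied from the H100 BF16 and FP8 rows above.
timings_ms = {
    "BF16": {"CLIP-G": 13.83, "CLIP-L": 5.66, "T5": 8.55,
             "MMDiT x 30": 7945.0, "VAE Decoder": 97.17, "Total": 8101.83},
    "FP8": {"CLIP-G": 16.80, "CLIP-L": 6.91, "T5": 8.56,
            "MMDiT x 30": 5604.97, "VAE Decoder": 36.91, "Total": 5708.69},
}


def speedup(column: str) -> float:
    """Ratio of the BF16 time to the FP8 time for one column (hypothetical helper)."""
    return timings_ms["BF16"][column] / timings_ms["FP8"][column]


def mmdit_share(precision: str) -> float:
    """Fraction of the reported total spent in the 30 MMDiT steps (hypothetical helper)."""
    row = timings_ms[precision]
    return row["MMDiT x 30"] / row["Total"]


for column in ("MMDiT x 30", "Total"):
    print(f"{column}: FP8 is {speedup(column):.2f}x faster than BF16")

for precision in timings_ms:
    per_step = timings_ms[precision]["MMDiT x 30"] / 30
    print(f"{precision}: {per_step:.1f} ms per MMDiT step, "
          f"{mmdit_share(precision):.1%} of the total")
```

With these figures, FP8 comes out at about 1.42x faster than BF16 for both the MMDiT loop and the end-to-end total, and the MMDiT loop accounts for roughly 98% of the total in either precision, which is why quantizing only the transformer moves the overall number.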

Usage Example

  1. Follow the setup instructions for launching a TensorRT NGC container:

    git clone https://github.com/NVIDIA/TensorRT.git
    cd TensorRT
    git checkout release/sd35
    docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash

  2. Install libraries and requirements:

    cd demo/Diffusion
    source setup.sh

  3. Generate a Hugging Face user access token. To download the Stable Diffusion 3.5 model checkpoints, first request access on the Stable Diffusion 3.5 Large page. Then obtain a read access token for the Hugging Face Hub and export it as shown below:

    export HF_TOKEN=<your access token>

  4. Run TensorRT-optimized inference:
  • Stable Diffusion 3.5 Large in BF16 precision

    python3 demo_txt2img_sd35.py \
      "A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
      --version=3.5-large \
      --bf16 \
      --download-onnx-models \
      --denoising-steps=30 \
      --guidance-scale 3.5 \
      --build-static-batch \
      --use-cuda-graph \
      --hf-token=$HF_TOKEN
    
  • Stable Diffusion 3.5 Large using FP8 quantization

    python3 demo_txt2img_sd35.py \
      "A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
      --version=3.5-large \
      --fp8 \
      --denoising-steps=30 \
      --guidance-scale 3.5 \
      --download-onnx-models \
      --build-static-batch \
      --use-cuda-graph \
      --hf-token=$HF_TOKEN \
      --onnx-dir onnx_fp8 \
      --engine-dir engine_fp8
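
The two invocations above differ only in the precision flag and in the separate ONNX and engine directories used for FP8. The sketch below makes that difference explicit by assembling the argument list in Python. It uses only the flags shown in the commands above; the helper names `build_command` and `require_token` are hypothetical, and whether the demo script accepts other flag combinations is not confirmed here.

```python
import os
import shlex


def require_token() -> str:
    """Return HF_TOKEN from the environment, or stop with a clear message."""
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise SystemExit("HF_TOKEN is not set; export it before running the demo.")
    return token


def build_command(prompt: str, precision: str, hf_token: str) -> list[str]:
    """Assemble the demo_txt2img_sd35.py arguments for 'bf16' or 'fp8'."""
    if precision not in ("bf16", "fp8"):
        raise ValueError(f"unsupported precision: {precision!r}")
    cmd = [
        "python3", "demo_txt2img_sd35.py", prompt,
        "--version=3.5-large",
        f"--{precision}",
        "--download-onnx-models",
        "--denoising-steps=30",
        "--guidance-scale", "3.5",
        "--build-static-batch",
        "--use-cuda-graph",
        f"--hf-token={hf_token}",
    ]
    if precision == "fp8":
        # The FP8 example keeps its ONNX files and engines in separate directories.
        cmd += ["--onnx-dir", "onnx_fp8", "--engine-dir", "engine_fp8"]
    return cmd


# Preview only: a placeholder stands in for the real token so it is never printed.
preview = build_command("a short test prompt", "fp8", "<your access token>")
print(shlex.join(preview))
```

To launch the demo rather than preview it, one option is to call `subprocess.run(build_command(prompt, "fp8", require_token()), check=True)` from the `demo/Diffusion` directory inside the container.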
    