
Unlock Faster Image Generation in Stable Diffusion Web UI with NVIDIA TensorRT

Stable Diffusion is an open-source generative AI model that enables users to generate images from simple text descriptions. Gaining traction among developers, it has powered popular applications like Wombo and Lensa.

End users typically access the model through distributions that package it together with a user interface and a set of tools. The most popular distribution is the Automatic 1111 Stable Diffusion Web UI. This post explains how NVIDIA TensorRT can double a model's image-generation performance, using the Automatic 1111 Stable Diffusion Web UI as an example.

Efficient generative AI requires GPUs

Stable Diffusion is a deep learning model that uses diffusion processes to generate images based on input text and images. While it can be a useful tool to enhance creator workflows, the model is computationally intensive. Generating a single batch of four images takes minutes on nonspecialized hardware like CPUs, which breaks workflows and can be a barrier for many developers.

Without dedicated hardware, AI features are slow because CPUs are not inherently designed for the highly parallel operations demanded by neural networks, and are instead optimized for general-purpose tasks. Stable Diffusion exemplifies why GPUs are necessary to run AI efficiently. 

NVIDIA TensorRT accelerates performance

GeForce RTX GPUs excel at the parallelized work required to run generative AI models. They are also equipped with dedicated hardware called Tensor Cores that accelerate matrix operations for AI use cases. The best way to enable these optimizations is with the NVIDIA TensorRT SDK, a high-performance deep learning inference optimizer.

TensorRT provides layer fusion, precision calibration, kernel auto-tuning, and other capabilities that significantly boost the efficiency and speed of deep learning models. This makes it indispensable for real-time applications and resource-intensive tasks like Stable Diffusion.
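To give an intuition for why layer fusion helps, the toy sketch below contrasts two separate elementwise "layers" with a single fused one. This is purely illustrative: TensorRT fuses GPU kernels to cut memory traffic and launch overhead, not Python loops, but the idea is the same: one pass over the data instead of two, and no intermediate buffer.

```python
# Conceptual illustration of layer fusion (not TensorRT internals).

def unfused(x, scale, bias):
    # Two separate "layers": each traverses a full buffer,
    # and an intermediate result is materialized in between.
    scaled = [v * scale for v in x]        # layer 1: elementwise scale
    return [v + bias for v in scaled]      # layer 2: elementwise bias

def fused(x, scale, bias):
    # One fused pass: identical math, a single traversal,
    # no intermediate buffer.
    return [v * scale + bias for v in x]

data = [1.0, 2.0, 3.0]
assert unfused(data, 2.0, 0.5) == fused(data, 2.0, 0.5)
```

The fused form computes the same result while touching memory half as often, which is where much of the speedup in real fused kernels comes from.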

TensorRT substantially accelerates performance. In the case of Stable Diffusion Web UI image generation, it doubled the number of image generations per minute, compared to the most accelerated method previously used (PyTorch xFormers).

Comparison of images generated per minute on the Apple M2 Ultra and the GeForce RTX 4090 (with both PyTorch xFormers and TensorRT acceleration).

Figure 1. NVIDIA TensorRT acceleration doubles the number of image generations per minute

Image generation: Stable Diffusion 1.5, 512 x 512, batch size 1, Stable Diffusion Web UI from Automatic 1111 (for NVIDIA) and Mochi (for Apple)
Hardware: GeForce RTX 4090 with Intel i9 12900K; Apple M2 Ultra with 76 cores

Implementing TensorRT in a Stable Diffusion pipeline

NVIDIA has published a TensorRT demo of a Stable Diffusion pipeline that provides developers with a reference implementation showing how to prepare diffusion models and accelerate them using TensorRT. This is the starting point if you’re interested in turbocharging your diffusion pipeline and bringing lightning-fast inference to your applications.
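As a rough sketch of what such a preparation step involves, the snippet below outlines compiling an ONNX export of a diffusion model into a TensorRT engine with FP16 enabled and a dynamic-shape optimization profile. The input name `"sample"` and the latent shape ranges are illustrative assumptions, not taken from the Web UI extension; see the NVIDIA demo for the actual implementation. The `tensorrt` import is done lazily inside the function because it requires an NVIDIA GPU, driver, and the `tensorrt` package.

```python
def build_engine(onnx_path: str, engine_path: str) -> None:
    """Sketch: compile an ONNX model into a serialized TensorRT engine.

    Assumes an NVIDIA GPU and the `tensorrt` package; the input name
    and shape ranges are illustrative, not the extension's real values.
    """
    import tensorrt as trt  # lazy import: needs NVIDIA driver/runtime

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Parse the ONNX export of the model (e.g., the UNet).
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # reduced-precision inference

    # Dynamic shapes: one engine covers a range of latent resolutions.
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "sample",  # assumed latent input name (hypothetical)
        min=(1, 4, 32, 32), opt=(1, 4, 64, 64), max=(1, 4, 96, 96),
    )
    config.add_optimization_profile(profile)

    # Ahead-of-time compilation: this is the slow step that the
    # Web UI extension caches.
    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)
```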

Building on this foundation, the TensorRT pipeline was then applied to a project commonly used by Stable Diffusion developers. Implementing TensorRT into the Stable Diffusion Web UI further democratizes generative AI and provides broad, easy access.

Screenshot of the Stable Diffusion Web UI with generated images.
Figure 2. Images generated in the Stable Diffusion Web UI 

This journey began with the introduction of a TensorRT Python package for Windows, which significantly simplified the installation process. Even those with minimal technical knowledge can easily install and start using TensorRT.

Once installed, the extension provides an intuitive user interface that triggers the ahead-of-time compilation required to build TensorRT engines. A caching mechanism drastically reduces compile times. These simplifications free users to focus on core tasks. The integration is flexible: dynamic shapes enable users to render different resolutions with minimal impact on performance. This implementation provides a useful tool for developers. Leverage this plug-in to enhance your own Stable Diffusion pipelines.
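The caching idea can be sketched in plain Python. The key scheme below is hypothetical (the extension's actual code and key fields may differ): compiled engines are stored keyed by the model checkpoint and shape profile, so the expensive ahead-of-time build runs at most once per combination.

```python
import hashlib

# Hypothetical engine cache keyed by checkpoint + shape profile.
# Illustrates the caching idea only; not the extension's actual code.
_cache: dict[str, bytes] = {}

def cache_key(checkpoint: str, min_res: int, max_res: int,
              batch: int) -> str:
    raw = f"{checkpoint}:{min_res}:{max_res}:{batch}"
    return hashlib.sha256(raw.encode()).hexdigest()

def get_engine(checkpoint: str, min_res: int, max_res: int,
               batch: int, compile_fn) -> bytes:
    key = cache_key(checkpoint, min_res, max_res, batch)
    if key not in _cache:            # miss: run the slow AOT build
        _cache[key] = compile_fn()
    return _cache[key]               # hit: skip recompilation

# Usage: the expensive build runs once; later calls reuse the engine.
calls = []
fake_compile = lambda: calls.append(1) or b"engine-bytes"
get_engine("sd15.ckpt", 512, 768, 1, fake_compile)
get_engine("sd15.ckpt", 512, 768, 1, fake_compile)
assert len(calls) == 1
```

A different checkpoint or resolution range produces a different key, which is why changing models or rendering outside the compiled shape profile triggers a fresh compile.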

Get started with TensorRT

To download the Stable Diffusion Web UI TensorRT extension, visit NVIDIA/Stable-Diffusion-WebUI-TensorRT on GitHub. Check out NVIDIA/TensorRT for a demo showcasing the acceleration of a Stable Diffusion pipeline. For more details about the Automatic 1111 TensorRT extension, see TensorRT Extension for Stable Diffusion Web UI.

For broader guidance on how to integrate TensorRT into your applications, see Getting Started with NVIDIA AI for Your Applications. Learn how to profile your pipeline to pinpoint where optimization is critical and where minor changes can have a big impact. Accelerate your AI pipeline by choosing a machine learning framework, and discover SDKs for video, graphic design, photography, and audio.
