Understanding Diffusion Models: An Essential Guide for AEC Professionals

A GIF showing the creation of a building image with diffusion models.

Generative AI, the ability of algorithms to process various types of inputs—such as text, images, audio, video, and code—and generate new content, is advancing at an unprecedented rate. While this technology is making significant strides across multiple industries, one sector that stands to benefit immensely is the Architecture, Engineering, and Construction (AEC) industry. 

AEC firms have historically struggled with fragmented data systems. This causes vital information to be isolated within various departments or project phases, resulting in inefficiencies, misinterpretations, and increased project costs. With the advent of generative AI, the AEC industry is on the brink of transformation.

This cutting-edge technology has the potential to revolutionize the AEC industry by integrating data, automating design tasks, and enhancing collaboration, resulting in more efficient, innovative, and sustainable projects.

Diffusion models: a key component of generative AI in AEC

Since the introduction of generative AI, large language models (LLMs) like GPT-4 have been at the forefront, renowned for their versatility in natural language processing, machine translation, and content creation. Alongside these, image generators such as OpenAI’s DALL-E, Google’s Imagen, Midjourney, and Stability AI’s Stable Diffusion are changing the way architects, engineers, and construction professionals visualize and design projects, enabling rapid prototyping, enhanced creativity, and more efficient workflows.

A futuristic 2 story building nestled in the wilderness.
Figure 1. Diffusion model turning noise into a picture of a futuristic building

At their core, diffusion models possess a distinctive capability: they learn to generate high-quality data from prompts by progressively adding noise to training data and then learning to remove it.

Diffusion models are trained by adding noise to millions of images over many iterations and rewarding the model when it recreates the original image in the reverse process. Once trained, the model is ready for inference, where a user can generate realistic data such as images, text, video, audio, or 3D models.
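The training idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how any particular production model is trained: it assumes a DDPM-style linear noise schedule and a noise-prediction loss, and the names make_schedule, add_noise, training_loss, and untrained_model are hypothetical.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)  # share of the original signal left after each step
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump directly to step t of the noising process."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

def training_loss(model, x0, t, alpha_bars, rng):
    """Mean squared error between the noise that was added and the model's guess."""
    x_t, noise = add_noise(x0, t, alpha_bars, rng)
    predicted = model(x_t, t)
    return float(np.mean((predicted - noise) ** 2))

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
image = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for one training image

def untrained_model(x_t, t):
    # Placeholder for a neural network: always guesses "no noise".
    return np.zeros_like(x_t)

loss = training_loss(untrained_model, image, t=500, alpha_bars=alpha_bars, rng=rng)
print(f"loss of the untrained placeholder at step 500: {loss:.3f}")
```

In a real training run, the placeholder would be a neural network whose weights are updated to push this loss down across many images and many randomly chosen steps.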

Why noise? It helps diffusion models mimic random changes, understand the data, prevent overfitting, and ensure smooth transformations. Imagine you have a sketch of a building design. You start adding random noise to it, making it look more and more like a messy scribble. This is the forward process. The reverse process is like cleaning up that messy scribble step by step until you get back to a detailed and clear architectural rendering. 

The model learns how to do this cleaning process so well that it can start with random noise and end up generating a completely new, realistic building design. With this innovative approach, diffusion models can produce remarkably accurate and detailed outputs, making them a powerful tool.
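The reverse, "cleaning" process can be sketched as a loop that starts from pure noise and removes a little predicted noise at every step. The sketch below assumes DDPM-style sampling with a linear schedule. To keep it runnable without a trained network, it uses a hypothetical oracle that knows the exact noise for a data set containing a single 2x2 "design"; a real model would replace that oracle with a learned denoiser.

```python
import numpy as np

def reverse_process(predict_noise, shape, betas, rng):
    """Start from random noise and denoise one step at a time."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # pure noise, the "messy scribble"
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        # Subtract this step's share of the predicted noise and rescale.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Every step except the last re-injects a small amount of fresh noise.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)
target = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy stand-in for a clean image

def oracle(x_t, t):
    # Hypothetical stand-in for a trained network: the exact noise
    # when the training set consists of this one image.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

sample = reverse_process(oracle, target.shape, betas, np.random.default_rng(0))
print(np.round(sample, 3))
```

Because the oracle only knows one image, the loop can only recover that image. A network trained on millions of images generalizes instead, which is what lets it produce new designs.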

A sequence of images showing a futuristic home coming into focus. 
Figure 2. A sequence of images showing how diffusion models are trained to create new designs

Diffusion models have a reputation for being difficult to control due to the way they learn, interpret, and produce visuals. However, ControlNets, a group of neural networks trained on specific tasks, can enhance the base model’s capabilities. Architects can exert precise structural and visual control over the generation process by providing references.

Images representing how ControlNets provide structural and visual control over the image generation process
Figure 3. ControlNet converts an architectural sketch into a detailed render

For example, Sketch ControlNet can transform an architectural drawing into a fully realized render.

Multiple ControlNets can be combined for additional control. For instance, a Sketch ControlNet can be paired with an adapter that incorporates a reference image to apply specific colors and styles to the design.
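One common way to combine several control signals is to let each one contribute a weighted correction to the base model's internal features. The toy sketch below illustrates only that weighting idea with small arrays. It assumes additive residuals, which is not how every adapter works, and combine_controls, sketch_residual, and style_residual are hypothetical names rather than part of any library.

```python
import numpy as np

def combine_controls(base_features, control_residuals, scales):
    """Add each control signal's residual to the base features, weighted by its scale."""
    if len(control_residuals) != len(scales):
        raise ValueError("need exactly one scale per control residual")
    combined = base_features.copy()  # leave the caller's array untouched
    for residual, scale in zip(control_residuals, scales):
        combined += scale * residual
    return combined

base = np.zeros((4, 4))                # toy feature map from the base model
sketch_residual = np.eye(4)            # hypothetical structure hint from a line drawing
style_residual = np.full((4, 4), 0.5)  # hypothetical color and style hint from a reference image

features = combine_controls(base, [sketch_residual, style_residual], scales=[1.0, 0.4])
print(features)
```

Raising or lowering a scale changes how strongly that reference steers the result, which is the kind of dial a designer would adjust when balancing structure against style.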

Images representing how multiple ControlNets can be combined together for additional control.
Figure 4. A sequence of images showing multiple ControlNets combined for precise image generation

ControlNets are highly effective as they can process various types of information, empowering architects and designers with new ways to manage their designs and communicate ideas with clients.

Leveraging NVIDIA accelerated compute capabilities further enhances the performance of diffusion models. NVIDIA-optimized models, such as SDXL Turbo and LCM-LoRA, offer state-of-the-art performance with real-time image generation capabilities. These models significantly improve inference speed and reduce latency, enabling the production of up to four images per second and drastically reducing the time required for high-resolution image generation.
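Much of that speed comes from few-step sampling: a distilled model such as SDXL Turbo is designed to produce an image in as little as one denoising step. The sketch below shows what single-step generation might look like. It assumes the open-source Hugging Face diffusers library and the publicly released stabilityai/sdxl-turbo checkpoint rather than an NVIDIA-optimized build, and it needs a CUDA-capable GPU. The helper names turbo_settings and generate are hypothetical.

```python
def turbo_settings(prompt):
    """Call arguments for single-step generation (guidance is switched off for this model)."""
    return {"prompt": prompt, "num_inference_steps": 1, "guidance_scale": 0.0}

def generate(prompt, device="cuda"):
    """Load the pipeline and render one image; requires torch, diffusers, and a GPU."""
    # Heavy imports stay inside the function so the file loads without a GPU stack.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to(device)
    return pipe(**turbo_settings(prompt)).images[0]

if __name__ == "__main__":
    image = generate("a futuristic two-story building nestled in the wilderness")
    image.save("concept.png")
```

Actual throughput depends on the GPU, the resolution, and any further optimization applied to the model.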

Diffusion models offer several specific benefits to the AEC sector, enhancing various aspects of design, visualization, and project management:

High-quality visualizations

Diffusion models can generate photorealistic images and videos from simple sketches, textual descriptions, or a combination. This capability is invaluable for creating detailed architectural renderings and visualizations, helping decision-makers understand and visualize proposed projects. 

Daylighting and energy efficiency

Diffusion models can generate daylighting maps and analyze the impact of natural light on building designs. This helps optimize window placements and other design elements to enhance indoor daylighting and energy efficiency, ensuring that buildings are comfortable and sustainable. 

Rapid prototyping

By automating the generation of design alternatives and visualizations, including variations in materials or object positioning, diffusion models can significantly speed up the design process. Architects and engineers can explore more design options faster, leading to more innovative and optimized solutions.

Cost savings and process optimization

Diffusion models enable the customization of BIM (Building Information Modeling) policies to suit the needs of specific regions and projects. Directing resources to the areas of greatest need improves resource allocation. This flexibility makes sure that policies are tailored to the unique requirements of different regions and projects, leading to reduced project costs and improved overall efficiency.

Use, customize, or build your diffusion models

Organizations can leverage diffusion models in multiple ways. They can use pretrained models as-is, customize them for specific needs, or build new models from scratch and harness their full potential by tailoring them to a user’s unique requirements. 

Pretrained models are deployable immediately, reducing the time to market and minimizing initial investment. Customizing pretrained models enables the integration of domain-specific data, improving accuracy and relevance for particular applications. Developing models from scratch, although resource-intensive, enables the creation of highly specialized solutions that can address unique challenges and provide a competitive edge. 

Consider diffusion models in the AEC industry like architecting a house. Using pretrained models is similar to using standard prefabricated homes—they’re ready to use, saving time and initial costs. Customizing pretrained models is like modifying standard off-the-shelf house plans to fit specific requirements, making sure the design meets particular needs and preferences. Building models from scratch is similar to creating entirely new blueprints from the ground up. This approach offers the most flexibility and customization but requires significant expertise, time, and resources. 

Each method has advantages and disadvantages, enabling organizations to select the most suitable approach according to their project objectives and available resources.

Pretrained models for quick deployment

For many organizations, the quickest way to benefit from diffusion models is to use pretrained models. Available through the NVIDIA API catalog, these models are optimized for high performance and can be deployed directly into applications.

NVIDIA NIM offers a streamlined and efficient way for organizations to deploy diffusion models, enabling the generation of high-resolution, realistic images from text prompts. With prebuilt containers, organizations can quickly set up and run diffusion models on NVIDIA accelerated infrastructure (available from NVIDIA workstations, data centers, cloud services partners, and private on-prem servers). 

This approach simplifies the deployment process and maximizes performance, enabling businesses to focus on building innovative generative AI workflows without the complexities of model development and optimization. 

Developers can experience and experiment with NVIDIA-hosted NIMs at no charge.

Members of the NVIDIA Developer Program can access NIM for free for research, development, and testing on their preferred infrastructure.

Enterprises can deploy AI applications in production with NIM through the NVIDIA AI Enterprise software platform.

Customizing diffusion models

Customizing diffusion models can improve their relevance, accuracy, and performance for AEC organizations. It also enables organizations to incorporate their own knowledge and industry-specific terms and to address specific challenges.

Fine-tuning involves taking a pretrained model and adjusting its parameters using a smaller, domain-specific dataset to better align with the specific needs and nuances of the organization. This tailored approach improves the quality and utility of the generated content and offers scalability and flexibility. Organizations can adapt the models as their needs evolve. 

For firms wanting a user-friendly path to start customizing diffusion models, NVIDIA AI Workbench offers a streamlined environment that lets data scientists and developers get up and running quickly with generative AI. With AI Workbench, users can get started with preconfigured projects that are adaptable to different data and use cases. It’s ideal for quick, iterative development and local testing.

Example projects, such as fine-tuning diffusion models, can be modified to support things like generating architectural renderings. Furthermore, this flexibility extends to supported infrastructure. Users can start locally on NVIDIA RTX-powered AI Workstations and scale to virtually anywhere, whether data center or cloud, in just a few clicks. For more details on how to customize diffusion models, explore the GitHub project.

Another lightweight training technique used for fine-tuning diffusion models is Low-Rank Adaptation (LoRA). LoRA models are ideal for architectural firms due to their small size. They can be managed and trained on local workstations without extensive cloud resources.
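The small size comes from how LoRA works: the pretrained weights stay frozen, and only a pair of thin matrices is trained to represent the change. The NumPy sketch below illustrates the idea on a single linear layer. The layer size, rank, and scaling values are illustrative assumptions, and the LoRALinear class is a hypothetical helper, not a library API.

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weights
        self.A = rng.standard_normal((rank, in_dim)) * 0.01   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable, zero so the layer starts unchanged
        self.scale = alpha / rank

    def __call__(self, x):
        # Original output plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))  # stand-in for one pretrained weight matrix
layer = LoRALinear(W, rank=8, alpha=16, rng=rng)
x = rng.standard_normal((2, 1024))

share = layer.trainable_params() / W.size
print(f"trainable share of this layer: {share:.2%}")
```

With these illustrative sizes, the adapter holds 16,384 trainable values against 1,048,576 frozen ones, which is why LoRA files stay small enough to train and store on a workstation.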

Check out how you can seamlessly deploy and scale multiple LoRA adapters with NVIDIA NIM.

For advanced customization and high-performance training, NVIDIA NeMo offers a comprehensive, scalable, and cloud-native platform. NeMo offers a choice of customization techniques and is optimized for at-scale inference of diffusion models, with multi-GPU and multi-node configurations. 

The DRaFT+ algorithm, integrated into the NeMo framework, enhances the fine-tuning of diffusion models and makes sure that the model produces diverse and high-quality outputs aligned with specific project requirements. For more technical details and to access the DRaFT+ algorithm, visit the NeMo-Aligner library on GitHub.

NVIDIA Launchpad provides a free hands-on lab environment where AEC professionals can learn to fine-tune diffusion models with custom images and optimize them for specific tasks, such as generating high-quality architectural renderings or visualizing construction projects.

Building diffusion models that match your style

Now that we’ve covered pretrained and customized models, let’s look at building diffusion models from scratch. Investing in custom diffusion models allows AEC organizations to harness the full potential of AI, leading to more efficient, accurate, and innovative project outcomes.

For instance, an architectural firm might build its own diffusion model to generate design concepts that align with its specific architectural style and client preferences, while a construction company could develop a model to optimize resource allocation and project scheduling.

One example of this approach is the work of Heatherwick Studio, a design firm based in London. They’ve been using AI in their design process. The studio is known for its innovative projects around the world, including Google’s headquarters in London and California, Africa’s first museum of contemporary African art in Cape Town, and a new district in Tokyo. Heatherwick Studio has been developing tools that use their data to streamline design processes, rendering, and data access.

“At the studio, we not only believe in the transformational power of AI to improve the industry but are actively developing and deploying in-house custom diffusion models in our everyday work,” said Pablo Zamorano, head of Geometry and Computational Design at Heatherwick Studio.

“We have developed a web-based tool that enables quick design provocations, fast rendering, and image editing as well as a tool that allows for tailored knowledge search from within our BIM tools. These tools empower the work of our designers and visualizers and are now well established.”

Image illustration of converting an architectural sketch into a detailed render.
Figure 5. Image courtesy of Heatherwick Studio, showing a 2D diagram used to generate design options based on their custom models

Creating custom diffusion models with NVIDIA

NeMo is a powerful framework that provides components for building and training custom diffusion models on-premises, across all leading cloud service providers, or in NVIDIA DGX Cloud. It includes a suite of customization techniques from prompt learning to parameter-efficient fine-tuning (PEFT), making it ideal for AEC customers who need to generate high-quality architectural renderings and optimize construction visualizations efficiently.

Alternatively, NVIDIA Picasso is an AI foundry leveraged by asset marketplace companies to build and deploy cutting-edge generative AI models with APIs for commercially safe visual content. 

Built on Picasso, generative AI services by Getty Images (for image generation) and Shutterstock (for 3D generation) create commercially safe visual media from text or images. AEC organizations can fine-tune their choice of Picasso-powered models to create custom diffusion models that generate images from text prompts or sketches in different styles. Picasso supports end-to-end AI model development, from data preparation and model training to model fine-tuning and deployment, making it an ideal solution for developing custom generative AI services.

Responsible innovation with diffusion models

Using AI models involves several critical steps, including data collection, preprocessing, algorithm selection, training, and evaluation. Each of these steps requires careful consideration to make sure the model performs well and meets the specific needs of the project. 

However, it’s equally important to integrate responsible AI practices throughout this process. Generative AI models, despite their impressive capabilities, are susceptible to biases, security vulnerabilities, and unintended consequences. Without proper safeguards, these models can produce outputs that reinforce harmful stereotypes, discriminate against certain demographics, or contain security flaws. 

Additionally, protecting the security of diffusion models is crucial for generative AI-powered applications. NVIDIA introduced accelerated Confidential Computing, a groundbreaking security feature that mitigates threats while providing access to the unprecedented acceleration of NVIDIA H100 Tensor Core GPUs for AI workloads. This feature makes sure that sensitive data remains secure and protected, even during processing.

Get started

Generative AI, particularly diffusion models, is revolutionizing the AEC industry by enabling the creation of photorealistic renderings and innovative designs from simple sketches or textual descriptions.

To get started, AEC firms should prioritize data collection and management, identify processes that can benefit from automation, and adopt a phased approach to implementation. The NVIDIA training program helps organizations train their workforce on the latest technology and bridge the skills gap by offering comprehensive technical hands-on workshops and courses.
