Today, brands and their creative agencies are under enormous pressure to create and deliver high-quality, accurate product images at scale, from campaign key visuals to packshots for e-commerce. Audience-targeted content, such as personalized and localized visual variations, adds further layers of complexity to production.
Production costs, short timelines, limited resources, and the need to maintain brand identity are recurring hurdles that prevent marketing teams from creating more assets and more targeted content for their audience segments.
For example, an espresso machine manufacturer might want to target a wide range of audiences for an upcoming product launch, from young professionals living in a city to older generations enjoying retirement in the countryside. Historically, this would require multiple workstreams, locations, teams, and review cycles to execute, which is often not feasible, limiting the content available to marketing teams for targeting.
To generate high-quality, brand-accurate content at scale for wide-ranging audience segments, creative teams can now harness generative AI workflows. Integrating generative AI into tools and applications used for brand-accurate visual asset generation and content production can unlock new possibilities and efficiencies for the content supply chain.
Many developers are already working to make this a reality.
In this post, we introduce the NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI, explain how it works and what you can use it for, and share how a few industry leaders are thinking about the development of this field.
NVIDIA Omniverse Blueprints are reference workflows that enable you to easily implement and build 3D, simulation, and digital twin applications.
Model conditioning to unlock generative AI for scalable and controlled asset creation
Integrating generative AI into a workflow to create precise, on-brand images is problematic without control over the visual representation of the product. Specific geometry, colors, logos, and brand guidelines can be misinterpreted or lost without proper conditioning.
Model conditioning means providing a model with specific information or rules to help it make better predictions or decisions based on what you want it to do. To condition an LLM, you provide text-based instructions, examples, context, or previous conversation history. For image generators, you can provide text or a sample image.
Text and image inputs alone, however, provide only limited control over the AI model. This is why 3D conditioning is required.
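To make the idea concrete, here is a minimal sketch of conditioning an image generator with both a text prompt and a sample image, using the open-source Hugging Face diffusers library as a stand-in for any diffusion model. The model ID, file names, and strength value are illustrative assumptions, not part of the blueprint:

```python
# A minimal sketch of text- plus image-conditioning a diffusion model,
# using diffusers as a stand-in. File names are hypothetical.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

reference = Image.open("espresso_machine_render.png")  # hypothetical render

result = pipe(
    prompt="espresso machine on a marble counter, soft morning light",
    image=reference,   # the image condition constrains layout and composition
    strength=0.5,      # lower strength preserves more of the reference image
).images[0]
result.save("conditioned_output.png")
```

Even with both conditions, the model may still drift on fine details like logos and exact geometry, which motivates the 3D approach below.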
Setting the stage in 3D gives artists full creative control over the generated visuals. An easy-to-use UI for end-user interaction lets non-technical teams iterate and create content in a controlled, conditioned framework, while keeping branded assets untouched by the AI.
This Omniverse Blueprint takes a multimodal, hybrid approach: 3D for the hero asset and simple environment geometry, combined with 2D render passes for rapid inpainting to complete the controlled scene. Masking maintains the integrity of the product digital twin, and you can frame the shot by adjusting the camera angle and zoom in a 3D viewport.
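Because the mask defines exactly which pixels belong to the hero asset, protecting the product is a straightforward composite. The following sketch (with hypothetical file names) shows the pattern:

```python
# Sketch: composite the original render back over the AI-generated frame
# through the hero mask, so the product pixels are untouched by the AI.
from PIL import Image

render = Image.open("hero_render.png").convert("RGB")         # 3D render pass
generated = Image.open("inpainted_scene.png").convert("RGB")  # AI output
mask = Image.open("hero_mask.png").convert("L")               # white = hero asset

# Image.composite takes pixels from the first image where the mask is white.
final = Image.composite(render, generated, mask)
final.save("final_frame.png")
```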
Building a 3D-conditioned workflow for precise visual generative AI involves a handful of key components:
- On-brand hero asset: A finalized asset, built by an artist and typically approved by a brand manager and art director. For this example, we provided a simple espresso machine.
- A simple, untextured 3D scene: Provided by a 3D artist, used for staging the hero asset and controlling layout and composition.
- Custom application: Built with the Kit App Template based on Kit 106.2.
- Generative AI microservices and Kit extensions: Add generative AI functionality to your custom application. In this case, a diffusion model handles inpainting (see the sketch after this list).
- Solution testing: Verifies the functionality and performance of your integrated workflow.
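As a rough illustration of the microservice integration, the sketch below posts a render pass and hero mask to a diffusion inpainting service. The endpoint URL, payload fields, and response shape are assumptions for illustration; the blueprint's workflow guide documents the actual NIM interface:

```python
# A hedged sketch of calling a diffusion inpainting microservice over REST.
# Endpoint, payload fields, and response shape are assumptions.
import base64
import os
import requests

def encode(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

INVOKE_URL = "https://ai.api.nvidia.com/v1/example/inpainting"  # hypothetical
headers = {
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}
payload = {
    "prompt": "sunlit Scandinavian kitchen, shallow depth of field",
    "image": encode("viewport_render.png"),  # 2D render pass from the app
    "mask": encode("hero_mask.png"),         # protects the hero asset
}

response = requests.post(INVOKE_URL, headers=headers, json=payload, timeout=120)
response.raise_for_status()
print(response.json())  # response structure depends on the deployed service
```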
For this workflow, we specifically explored microservices that enable you to use generative AI while also taking advantage of OpenUSD for 3D application and workflow development.
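For example, staging the hero asset with OpenUSD is non-destructive: the approved file is referenced into the scene rather than edited. A minimal sketch with the pxr Python API (asset paths are hypothetical):

```python
# Sketch: reference the approved hero asset into a simple staging scene
# with OpenUSD. The reference leaves the approved file unmodified.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("staging_scene.usda")
UsdGeom.Xform.Define(stage, "/World")

hero = UsdGeom.Xform.Define(stage, "/World/EspressoMachine")
hero.GetPrim().GetReferences().AddReference("assets/espresso_machine.usd")

# Place the hero asset and define a camera to frame the shot in a viewport.
UsdGeom.XformCommonAPI(hero).SetTranslate((0.0, 0.0, 0.0))
UsdGeom.Camera.Define(stage, "/World/ShotCamera")

stage.GetRootLayer().Save()
```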
Omniverse Blueprints are designed to be extensible and customizable. Here are some additional components that you can introduce to the workflow:
- Large multimodal models (LMMs) + ComfyUI: Fast generative text-to-image models that can synthesize photorealistic images from a text prompt.
- Edify 360 NIM: An early-access preview of Shutterstock’s generative 3D service for 360° High Dynamic Range Image (HDRI) generation, trained on NVIDIA Edify using Shutterstock’s licensed creative libraries.
- Edify 3D NIM: Shutterstock’s generative 3D service for 3D asset generation, used here for additional 3D objects for scene dressing. Also trained on NVIDIA Edify using Shutterstock’s licensed creative libraries.
- USD Code: A language model that answers OpenUSD knowledge queries and generates USD Python code.
- USD Search: An AI-powered search for OpenUSD data, 3D models, images, and assets using text- or image-based inputs (sketched after this list).
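As a hedged example of how such a microservice might be called, the sketch below issues a text query to USD Search over REST. The endpoint, payload fields, and response shape are assumptions; refer to the USD Search API documentation for the actual interface:

```python
# A hedged sketch of a text-based USD Search query over REST.
# Endpoint, payload fields, and response shape are assumptions.
import os
import requests

SEARCH_URL = "https://ai.api.nvidia.com/v1/omniverse/usdsearch"  # hypothetical
headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}

resp = requests.post(
    SEARCH_URL,
    headers=headers,
    json={"query": "ceramic espresso cup", "limit": 5},  # assumed fields
    timeout=60,
)
resp.raise_for_status()
for asset in resp.json().get("results", []):  # assumed response shape
    print(asset.get("url"))
```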
By the end of the workflow guide, you will be able to develop your own custom, AI-enabled app to accelerate your creative and marketing teams. All microservices are currently available as previews on build.nvidia.com, where you can make API calls for evaluation.
Marketing ecosystem builds with NVIDIA Omniverse Blueprints
Developers at independent software vendors (ISVs) and production services agencies are building the next generation of content creation solutions, infused with controllable generative AI, built on OpenUSD.
For example, Accenture Song, GRIP, Monks, WPP, and Collective World are adopting Omniverse Blueprints to accelerate development.
Developing a scalable AI solution for on-brand asset creation
This blueprint provides an example architecture for building controllable generative AI applications. You or your clients can use the resulting app for:
- Multimodal AI-generated final-frame campaign assets
- Rapid concepting and ideation for key visuals
- Batch processing of prompt inputs, generating potentially hundreds of visual outputs from predefined text prompts fed from a database (see the sketch after this list)
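A batch run can be as simple as iterating over stored prompts and saving each result. In this sketch, generate_image is a hypothetical stand-in for the inpainting call shown earlier, and the database schema is assumed:

```python
# Sketch: batch-generate variations from prompts stored in a database.
# generate_image() and the table/column names are hypothetical.
import sqlite3

def generate_image(prompt: str) -> bytes:
    raise NotImplementedError  # call the inpainting microservice here

conn = sqlite3.connect("campaign_prompts.db")
rows = conn.execute("SELECT id, prompt FROM prompts WHERE market = 'EMEA'")
for prompt_id, prompt in rows:
    with open(f"output/variant_{prompt_id}.png", "wb") as f:
        f.write(generate_image(prompt))
conn.close()
```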
By implementing this blueprint, you or your client get the following benefits:
- Accelerated time to market: Significantly decrease the time it takes to create high-resolution branded assets, so products can reach market faster.
- Low-effort localization: Create localized imagery almost instantly, helping brands address cultural trends or requirements in different markets.
- Increased productivity: Easy-to-use tools built on 3D data lower the technical barrier traditionally associated with high-fidelity asset creation.
Get started
In this post, we introduced the NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI and showed you ways to benefit from building generative AI applications for brand-accurate visual asset generation and content production.
For more information, see the following resources:
- 3D conditioning for precise visual generative AI blueprint with the interactive demo in the NVIDIA API catalog
- GA release of USD Search API, including a downloadable Helm chart for self-deployment to interface with your own data on your own infrastructure
- Reference architecture sample workflow with a step-by-step guide to implementing the blueprint
- /NVIDIA-Omniverse-blueprints/3d-conditioning GitHub repo, including a workflow guide