AI development has become a core part of modern software engineering, and NVIDIA is committed to finding ways to bring optimized accelerated computing to every developer who wants to start experimenting with AI.
To that end, we’ve been working on making the accelerated computing stack more accessible with NVIDIA Launchables: preconfigured GPU computing environments that enable you to deploy reference workflows and start building immediately, with the required compute provided.
What are NVIDIA Launchables?
NVIDIA Launchables are one-click deployable GPU development environments with predefined configurations that help you get up and running with a workflow. They function as templates that contain all the components necessary for a given task:
- NVIDIA GPUs
- Python
- CUDA
- Docker containers
- Development frameworks, including NVIDIA NIM, NVIDIA NeMo, and NVIDIA Omniverse
- SDKs
- Dependencies
- Environment configurations
They can also contain GitHub repos or Jupyter notebooks that are automatically set up and mounted in a GPU instance.
For teams collaborating on projects or individual developers working across multiple environments, Launchables ensure consistent, reproducible setups without manual configuration overhead:
- On-demand access to NVIDIA GPUs: Start evaluating a reference workflow even without a GPU of your own by spinning up an environment with the preset configuration, getting to value faster.
- Community: Configure an environment for others to easily deploy. This is useful for sharing demos, demonstrating training and inference pipelines, and teaching with reference code examples. Creators receive metrics on how often a Launchable is viewed and deployed.
Launchable examples
Here are a couple of scenarios where Launchables come in handy:
- Setting up Megatron-LM for GPU-optimized training
- Running NVIDIA AI Blueprint for multimodal PDF data extraction
- Deploying Llama3-8B for inference with NVIDIA TensorRT-LLM
Setting up Megatron-LM for GPU-optimized training
Before tinkering with parallelism techniques like tensor or pipeline parallelism, you need PyTorch, CUDA, and a beefy GPU setup in place for a reasonable training pipeline.
With the Megatron-LM Launchable, you get access to an 8x H100 GPU node environment from a cloud partner that comes with PyTorch, CUDA, and Megatron-LM already set up. You can immediately adjust parameters such as `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` to determine which parallelism technique is most suitable for your specific model size and pretraining requirements.
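As a sketch of what that experimentation can look like inside the environment, here is an abridged launch command, assuming the standard Megatron-LM repository and an already-preprocessed GPT dataset; the paths, model dimensions, and hyperparameters below are placeholders rather than the Launchable's defaults, and real runs need additional arguments:

```bash
# Abridged Megatron-LM pretraining launch on one 8-GPU node (sketch only).
# With tensor parallel = 2 and pipeline parallel = 2, each model replica
# spans 4 GPUs, leaving 2 data-parallel replicas across the 8 GPUs.
# Dataset, vocab, and merge-file paths are placeholders.
torchrun --nproc_per_node 8 pretrain_gpt.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 2 \
    --num-layers 24 \
    --hidden-size 2048 \
    --num-attention-heads 16 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --micro-batch-size 4 \
    --global-batch-size 64 \
    --train-iters 1000 \
    --lr 3.0e-4 \
    --data-path my_dataset_text_document \
    --vocab-file gpt2-vocab.json \
    --merge-file gpt2-merges.txt
```

Rerunning with different values for the two parallelism flags (keeping their product no larger than the GPU count) is a quick way to compare strategies for your model size.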
Running NVIDIA AI Blueprint for multimodal PDF data extraction
Unstructured PDF sources often contain text, tables, charts, and images that must be extracted before you can run retrieval-augmented generation (RAG) and other downstream generative AI use cases.
The pdf-ingest-blueprint Launchable comes with a Jupyter notebook that sets up a PDF data extraction pipeline for enterprise partners. With the NVIDIA-Ingest microservice and various NIM microservices deployed through the Launchable, you can set up a production-grade pipeline to parallelize document splitting and test retrieval on massive corpora of PDF data.
Deploying Llama3-8B for inference with NVIDIA TensorRT-LLM
The Run Llama3 Inference with TRT-LLM Launchable comes with a Jupyter notebook guide that also serves as documentation. It demonstrates how to deploy Llama3 with TensorRT-LLM for low-latency inference: converting the model checkpoint into an intermediate representation, building the runtime engine from a build configuration (which enables optimization plugins for attention mechanisms with `--gpt_attention_plugin` and matrix multiplication operations with `--gemm_plugin`), and deploying the TensorRT engine to run inference on input tokens.
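As a rough sketch of those steps, assuming the TensorRT-LLM `examples/llama` directory layout and a locally downloaded Llama 3 8B checkpoint (all paths are placeholders, not the notebook's exact contents):

```bash
# Convert a Hugging Face Llama 3 8B checkpoint to the TensorRT-LLM
# checkpoint format, then build an engine with the attention and GEMM
# plugins enabled. Paths are illustrative placeholders.
python convert_checkpoint.py \
    --model_dir ./Meta-Llama-3-8B \
    --output_dir ./tllm_checkpoint \
    --dtype float16

trtllm-build \
    --checkpoint_dir ./tllm_checkpoint \
    --output_dir ./llama3_engine \
    --gpt_attention_plugin float16 \
    --gemm_plugin float16

# Run inference on the built engine with a sample prompt.
python run.py \
    --engine_dir ./llama3_engine \
    --tokenizer_dir ./Meta-Llama-3-8B \
    --input_text "What is accelerated computing?"
```

The notebook bundled with the Launchable remains the authoritative version of these commands for the provisioned instance.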
Launchable benefits
After collecting feedback from early users, we’ve identified the core technical capabilities that have developers excited about using Launchables for reproducible workflows:
- True one-click deployment
- Environment reproducibility
- Flexible configuration options
- Built for collaboration
True one-click deployment
Development environment setup typically involves hours of debugging dependencies, configuring GPU drivers, and testing framework compatibility.
Launchables reduce this to a one-click deployment process by providing preconfigured environments with frameworks, CUDA versions, and hardware configurations. This means that you can start writing code immediately instead of wrestling with infrastructure.
Environment reproducibility
Environment inconsistency remains a major source of debugging overhead in AI development teams.
Launchables solve this by packaging your entire development stack, from CUDA drivers to framework versions, into a versioned, reproducible configuration. When you share a Launchable URL, you’re guaranteeing that anyone who deploys it gets an identical development environment, eliminating “works on my machine” scenarios.
Flexible configuration options
Different AI workloads require different hardware and software configurations.
Launchables support this through granular environment customization:
- Select specific NVIDIA GPUs (from T4 to H100) based on your VRAM requirements.
- Define container configurations with precise Python and CUDA version requirements.
- Include specific GitHub repositories or Jupyter notebooks to be automatically mounted in your GPU instance.
Built for collaboration
Launchables streamline collaboration by enabling anyone to share a complete development environment through a single URL. Whether you’re an open source maintainer, an instructor, or a teammate sharing an internal project, you can track deployment metrics to understand how others are using your environment.
This is also particularly valuable for ensuring reproducibility in research settings and maintaining consistent training environments across distributed teams.
Creating a Launchable
Creating a Launchable is straightforward:
- Choose your compute: Select from a range of NVIDIA GPUs and customize your compute resources.
- Configure your environment: Pick a VM or container configuration with specific Python and CUDA versions.
- Add your code: Connect your Jupyter notebooks or GitHub repositories to be added to your end GPU environment.
- Share and deploy: Generate a shareable link that others can use to instantly deploy the same environment.
After you create a Launchable, you get the following:
- A shareable URL: Share it with others directly or through an asset like a YouTube video or blog post so that anyone can visit the Launchable, or save it in your notes to return to a preconfigured setup later.
- Markdown code for a badge: Embed a one-click deployment badge in your GitHub README, a Jupyter notebook, and so on; see the example below.
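The badge follows the familiar markdown badge pattern; the image and deployment URLs below are placeholders, as the real ones are generated for your specific Launchable:

```markdown
<!-- Placeholder URLs: the real badge image and deploy link are
     generated when you create the Launchable. -->
[![Deploy with NVIDIA Launchable](https://example.com/launchable-badge.svg)](https://example.com/deploy/your-launchable-id)
```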
As you share the URL with others or save it for your own reproducible setup, you can view metrics on how many times your Launchable has been viewed and deployed.
Get started with one-click deployments today
Launchables drastically reduce the traditional friction of sharing and reproducing GPU development environments by letting you package, version, and instantly deploy exact configurations. Teams spend less time on infrastructure setup and more time building AI applications.
We are actively expanding readily available Launchables on build.nvidia.com as new NIM microservices and other NVIDIA software, SDKs, and libraries are released. Explore them today!