New Video: What Runs ChatGPT?

Some years ago, Jensen Huang, founder and CEO of NVIDIA, hand-delivered the world’s first NVIDIA DGX AI system to OpenAI. Fast forward to the present, and OpenAI’s ChatGPT has taken the world by storm. It has highlighted the benefits and capabilities of artificial intelligence (AI) and shown how AI can be applied in every industry and in businesses of every size, from small companies to large enterprises.

Now, have you ever stopped to think about the technologies and infrastructure that it takes to host and support ChatGPT?

In this video, Mark Russinovich, Microsoft Azure CTO, explains the technology stack behind Microsoft’s purpose-built AI supercomputer infrastructure. NVIDIA and Microsoft Azure developed it in collaboration with OpenAI to host ChatGPT and other large language models (LLMs) at any scale.

Key takeaways

A data parallelism approach with NVIDIA H100 Tensor Core GPUs resulted in 30x higher inference performance and 4x higher training performance than the previous GPU generation.
To meet the higher processing demands of LLMs, VMs were scaled with NVIDIA Quantum-2 InfiniBand networking.
Server failures and network flaps are inevitable with large-scale training. Microsoft’s Project Forge introduced transparent checkpointing to quickly resume jobs and maintain high levels of utilization globally.
Low-rank adaptation (LoRA) fine-tuning reduces GPU usage and checkpoint size when fine-tuning models with billions of parameters at scale.
Industry pioneers such as Wayve are leveraging AI supercomputer infrastructure for compute-intensive workloads.
Upcoming support for confidential computing with NVIDIA H100 GPUs on Azure will help secure sensitive data and protect valuable AI models in use, enabling secure multi-party collaboration use cases for AI.
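As a rough illustration of why LoRA reduces GPU usage and checkpoint size, the Python sketch below counts trainable parameters for a single weight matrix and applies a low-rank update. It follows the formulation from the original LoRA paper. The frozen weight W is adapted as W + (alpha / r) * B A, and only the small matrices A and B are trained, with B initialized to zero. The helper names, the 4096 x 4096 layer size, and rank 8 are hypothetical values chosen for the arithmetic. They are not figures from the video.

```python
# Minimal LoRA sketch (illustrative only; not the setup described in the video).
# A frozen weight W (d_out x d_in) is adapted as W + (alpha / r) * (B @ A),
# where only A (r x d_in) and B (d_out x r) receive gradients.


def lora_param_counts(d_out: int, d_in: int, r: int):
    """Return (full fine-tuning params, LoRA trainable params) for one matrix."""
    full = d_out * d_in          # every entry of W would be trained
    lora = r * (d_out + d_in)    # entries of B (d_out x r) plus A (r x d_in)
    return full, lora


def matmul(x, y):
    """Plain-Python matrix multiply for small lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merged_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A); W itself is never modified."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Hypothetical layer size and rank, chosen only to show the ratio.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256

# With B initialized to zero, the adapted layer starts out identical to W.
W = [[1, 0], [0, 1]]
A = [[1, 2]]        # r = 1, d_in = 2
B_init = [[0], [0]]  # d_out = 2, r = 1
print(merged_weight(W, A, B_init, alpha=1, r=1))
```

Under these assumed sizes, the adapter for one 4096 x 4096 matrix has 256x fewer trainable parameters. Gradients, optimizer state, and the saved adapter checkpoint scale with that smaller count, while the frozen base weights can be shared across fine-tuned variants.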

Video 1. What runs ChatGPT? Inside Microsoft’s AI supercomputer | Featuring Mark Russinovich

Summary

When training AI models with hundreds of billions of parameters, an efficient data center infrastructure is key: from increasing throughput and minimizing server failures to leveraging multi-GPU clusters for compute-intensive workloads.

For more information about optimizing your data center infrastructure to reliably deploy large models at scale, see the following resources:

NVIDIA AI platform: Make AI development easier with full-stack innovation, from computing and software to AI models and services.
Modern Data Centers: Find out how IT leaders are scaling and managing data centers to readily adopt NVIDIA AI.
H100 Tensor Core GPU: Speed up LLMs by up to 30x over the previous generation through a combination of technology innovations.
NVIDIA NeMo: Enable your enterprise to build, customize, and deploy LLMs to power generative AI applications.
NVIDIA Quantum InfiniBand Platform