Data Center / Cloud

New Video: What Runs ChatGPT?

Decorative picture of connected servers.

Some years ago, Jensen Huang, founder and CEO of NVIDIA, hand-delivered the world’s first NVIDIA DGX AI system to OpenAI. Fast forward to the present and OpenAI’s ChatGPT has taken the world by storm, highlighting the benefits and capabilities of artificial intelligence (AI) and how it can be applied in every industry and business, small or enterprise.

Now, have you ever stopped to think about the technologies and infrastructure that it takes to host and support ChatGPT?

In this video, Mark Russinovich, Microsoft Azure CTO, explains the technology stack behind their purpose-built AI supercomputer infrastructure. It was developed by NVIDIA and Microsoft Azure, in collaboration with OpenAI, to host ChatGPT and other large language models (LLMs) at any scale.

Key takeaways

  • A data parallelism approach resulted in 30x higher performance in inferencing and 4x higher for model training with NVIDIA H100 Tensor Core GPUs.
  • To meet the higher processing demands of LLMs, VMs were scaled with NVIDIA Quantum-2 InfiniBand networking
  • Server failures and network flaps are inevitable with large-scale training. Microsoft’s Project Forge introduced transparent checkpointing to quickly resume jobs and maintain high levels of utilization globally
  • Low-rank adaptive (LoRA) fine-tuning decreases GPU usage and checkpoint size when handling billion-parameter models at an increased scale.
  • Industry pioneers such as Wayve are leveraging AI supercomputer infrastructure for compute-intensive workloads.
  • Upcoming support for confidential computing with NVIDIA H100 GPUs on Azure will help secure sensitive data and protect valuable AI models in use, enabling secure multi-party collaboration use cases for AI.
Video 1. What runs ChatGPT? Inside Microsoft’s AI supercomputer | Featuring Mark Russinovich


When training AI models with hundreds of billions of parameters, an efficient data center infrastructure is key: from increasing throughput and minimizing server failures to leveraging multi-GPU clusters for compute-intensive workloads.

For more information about optimizing your data center infrastructure to reliably deploy large models at scale, see the following resources:

    Discuss (0)