
NVIDIA-Certified Next-Generation Computing Platforms for AI, Video, and Data Analytics Performance


The business applications of GPU-accelerated computing are set to expand greatly in the coming years. One of the fastest-growing trends is the use of generative AI for creating human-like text and all types of images. 

Driving the explosion of market interest in generative AI are technologies such as transformer models that bring AI to everyday applications, from conversational text to protein structure generation. Visualization and 3D computing are also rapidly gaining interest, particularly in the areas of industrial simulation and collaboration. 

GPUs are poised to become a significant driver of efficiency and cost savings for data analytics, business intelligence, and machine learning by accelerating core applications such as Apache Spark. Finally, AI inference deployments at the edge represent one of the fastest-growing areas for enterprises, driven by the expansion of smart spaces and industrial automation. 

A new generation of computing technologies designed to address these increasingly complex compute demands is emerging. This includes new GPU architectures from NVIDIA, as well as new CPUs from AMD, Intel, and NVIDIA. 

Global system manufacturers have created new systems that bring these together into powerful computing platforms designed to address a full range of accelerated computing workloads. These systems are NVIDIA-Certified to ensure the best performance, reliability, and scale for enterprise solutions, and are available for purchase today. Visit the Qualified System Catalog to learn more. This post describes some of these new technologies and discusses how enterprises can best take advantage of them.

Accelerate generative AI and large language models

Optimized for training and inference of large language models, NVIDIA HGX H100 servers deliver up to 4x faster AI training and up to 30x faster AI inference than servers based on the previous-generation NVIDIA A100 Tensor Core GPU.* The latest servers, which also include the new generation of CPUs, offer the highest performance for AI and HPC, as detailed below.

  • 4-way H100 GPUs with 268 TFLOPS FP64
  • 8-way H100 GPUs with 31,664 TFLOPS FP8 (with sparsity)
  • 3.6 TFLOPS FP16 with NVIDIA SHARP in-network compute
  • Fourth-generation NVLink with 3x faster all-reduce communications
  • PCIe Gen5 end-to-end for higher data transfer rates from CPU to GPU to network
  • 3.35 TB/s memory bandwidth per GPU
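The aggregate throughput figures above are per-configuration totals. As a quick sanity check, the minimal Python sketch below divides each total by its GPU count to recover the implied per-GPU peak. It assumes linear scaling across GPUs; that the FP64 total counts FP64 Tensor Core throughput and the FP8 total includes structured sparsity is an assumption about how the totals were computed, not something stated in this post.

```python
# Aggregate peak figures quoted for HGX H100 configurations (from the list above).
AGGREGATE_TFLOPS = {
    "FP64": {"gpus": 4, "tflops": 268},
    "FP8": {"gpus": 8, "tflops": 31_664},
}


def per_gpu_tflops(total_tflops: float, num_gpus: int) -> float:
    """Return the per-GPU peak implied by an aggregate figure, assuming linear scaling."""
    return total_tflops / num_gpus


if __name__ == "__main__":
    for precision, cfg in AGGREGATE_TFLOPS.items():
        per_gpu = per_gpu_tflops(cfg["tflops"], cfg["gpus"])
        print(f"{precision}: {cfg['tflops']:,} TFLOPS / {cfg['gpus']} GPUs "
              f"= {per_gpu:,.0f} TFLOPS per GPU")
```

The division yields 67 TFLOPS FP64 and 3,958 TFLOPS FP8 per GPU, which is a useful cross-check when comparing these totals against single-GPU datasheet numbers.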

*Configuration: HGX A100 cluster: HDR IB network. HGX H100 cluster: NDR IB network, GPT-3 16 B 512 (batch 256), GPT-3 16 K (batch 512). All performance numbers are from the NVIDIA H100 GPU Architecture whitepaper.

Figure 1. The NVIDIA HGX H100 significantly outperforms the NVIDIA HGX A100 in real-time inference and training throughput in different configurations (left: 4x higher training performance for H100 vs. A100 on GPT-3 175B; right: 30x higher inference performance for H100 vs. A100 on Megatron 530B)

At the NVIDIA GTC 2023 keynote, NVIDIA announced the NVIDIA H100 NVL, a PCIe-based H100 product that pairs two GPUs over NVLink and provides 94 GB of HBM3 memory per GPU. It is ideally suited for large language models and delivers 12x the performance of NVIDIA HGX A100 for GPT-3. 

The NVIDIA H100 PCIe GPU configuration includes a subscription to the NVIDIA AI Enterprise software suite to streamline the development and deployment of production AI workloads. It provides all the capabilities of NVIDIA H100 GPUs in just 350 watts of thermal design power (TDP). This configuration can optionally use the NVLink bridge to connect two GPUs at 600 GB/s of bandwidth, nearly 5x that of PCIe Gen5. 
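The "nearly 5x" comparison can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not an NVIDIA methodology: it assumes an x16 PCIe Gen5 link at 32 GT/s per lane with 128b/130b encoding, counts both directions (treating the 600 GB/s NVLink figure as a bidirectional total), and ignores protocol overhead.

```python
# Sketch: compare the H100 PCIe NVLink bridge figure with a PCIe Gen5 x16 link.
# Assumptions (not from the text): x16 link, 128b/130b line encoding, both
# directions counted, protocol overhead (packet headers, flow control) ignored.

NVLINK_BRIDGE_GB_PER_S = 600  # from the text; treated here as a bidirectional total
PCIE_GEN5_GT_PER_S = 32  # per-lane transfer rate of PCIe Gen5
LANES = 16
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b encoding (PCIe Gen3 and later)


def pcie_bandwidth_gb_per_s(gt_per_s: float, lanes: int, efficiency: float,
                            bidirectional: bool = True) -> float:
    """Approximate PCIe link bandwidth in GB/s after line encoding."""
    per_direction = gt_per_s * lanes * efficiency / 8  # Gb/s -> GB/s
    return 2 * per_direction if bidirectional else per_direction


if __name__ == "__main__":
    pcie = pcie_bandwidth_gb_per_s(PCIE_GEN5_GT_PER_S, LANES, ENCODING_EFFICIENCY)
    ratio = NVLINK_BRIDGE_GB_PER_S / pcie
    print(f"PCIe Gen5 x16: ~{pcie:.0f} GB/s bidirectional")
    print(f"NVLink bridge / PCIe Gen5 x16: ~{ratio:.1f}x")
```

Under these assumptions the PCIe link works out to roughly 126 GB/s and the ratio to about 4.8x; using the rounded nominal figure of 128 GB/s instead gives about 4.7x. Either way the result is consistent with "nearly 5x".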

Well suited to mainstream accelerated servers that fit into standard racks and draw lower power per server, the NVIDIA H100 PCIe GPU provides great performance for applications that scale from one to four GPUs at a time, including AI inference and HPC applications. 

NVIDIA partners are shipping NVIDIA-Certified servers with H100 PCIe today. Visit the Qualified System Catalog to learn more. Systems from other partners with both NVIDIA H100 PCIe and NVIDIA HGX H100 are expected to be NVIDIA-Certified later this year. Taken together, these new platforms enable enterprises to run the latest AI and HPC applications with even better performance and greater scale.   

Energy-efficient performance for AI video and inference 

The NVIDIA Ada Lovelace L4 Tensor Core GPU delivers universal acceleration and energy efficiency for video, AI, virtual workstations, and graphics applications in the enterprise, in the cloud, and at the edge. And with the NVIDIA AI platform and full-stack approach, the L4 GPU is optimized for video and inference at scale for a broad range of AI applications to deliver the best in personalized experiences. To learn more, see Supercharging AI Video and AI Inference Performance with NVIDIA L4 GPUs.

The L4 GPU is the most efficient NVIDIA accelerator for mainstream use. Servers equipped with the L4 GPU deliver up to 120x higher AI video performance than CPU-based solutions, along with 2.7x more generative AI performance and over 4x more graphics performance than the previous generation. The NVIDIA L4 GPU is versatile, with an energy-efficient, single-slot, low-profile form factor that makes it ideal for edge, cloud, and enterprise deployments.

Figure 2. NVIDIA L4 GPU boosts video and AI performance over the NVIDIA T4 Tensor Core GPU (measured performance: 8x L4 vs. a 2S Intel 8380 CPU server on an end-to-end video pipeline with CV-CUDA pre- and post-processing, decode, inference (SegFormer), and encode using TensorRT 8.6, compared with a CPU-only pipeline using OpenCV 4.7; L4 vs. T4 image generation performance, 512x512 Stable Diffusion, FP16)

For edge use cases, the NVIDIA L4 GPU benefits from video acceleration with hardware decoders and encoders, plus AI acceleration with Tensor Cores. These capabilities are valuable in edge video analysis applications for smart cities, factory quality assurance, and retail marketing in smart spaces. The L4 GPU is also uniquely designed to address the requirements of AI in HPC edge sensor processing applications, and its graphics and video performance supercharge visualization for scientific applications running on instruments at the edge.

The NVIDIA L4 GPU is available in NVIDIA-Certified Systems from NVIDIA partners, including Advantech, ASUS, Atos, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, QCT, and Supermicro in over 100 unique server models.   

Next-generation CPUs

Advances in CPU technologies complement the new NVIDIA GPUs. The newest generation of CPUs includes the 4th Gen Intel Xeon Scalable processors, also known as Sapphire Rapids, as well as the 4th Generation AMD EPYC processors, also known as Genoa. These latest architectures enable enterprises to run the latest AI applications with even better performance and greater scale, thanks to faster data transfer across the system bus with PCIe Gen5 and higher main memory bandwidth with DDR5. 
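To put higher main memory bandwidth in concrete terms, theoretical peak DRAM bandwidth per socket is the transfer rate times the 8-byte channel width times the number of channels. The sketch below assumes DDR5-4800 with 8 memory channels per socket for 4th Gen Intel Xeon Scalable and 12 for 4th Gen AMD EPYC; these figures are not from this post, vary by SKU and memory population, and sustained bandwidth in practice is lower than these peaks.

```python
# Theoretical peak DRAM bandwidth per socket = transfer rate x bus width x channels.
# Assumptions (not from the text): DDR5-4800, 64-bit (8-byte) data path per
# channel, ECC bits not counted, all channels populated.

BYTES_PER_TRANSFER = 8  # 64-bit data path per DDR5 channel (two 32-bit subchannels)


def peak_dram_bandwidth_gb_per_s(mega_transfers_per_s: int, channels: int) -> float:
    """Return theoretical peak memory bandwidth in GB/s (decimal) for one socket."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


# Assumed channel counts per socket; verify against the specific SKU and platform.
PLATFORMS = {
    "4th Gen Intel Xeon Scalable (Sapphire Rapids)": 8,
    "4th Gen AMD EPYC (Genoa)": 12,
}

if __name__ == "__main__":
    for name, channels in PLATFORMS.items():
        bw = peak_dram_bandwidth_gb_per_s(4800, channels)
        print(f"{name}: {channels} x DDR5-4800 -> {bw:.1f} GB/s peak per socket")
```

Under these assumptions the peaks come to 307.2 GB/s and 460.8 GB/s per socket, respectively.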

The NVIDIA Grace Hopper Superchip, which combines an Arm-based NVIDIA Grace CPU with an NVIDIA Hopper GPU, delivers excellent performance and energy efficiency. Built for giant-scale AI and HPC, the Grace Hopper Superchip uses NVLink-C2C to provide a coherent CPU plus GPU memory model for accelerated AI.

NVIDIA-Certified Systems for accelerated computing

As each new generation of technology brings added sophistication, the need for prevalidated solutions to streamline acquisition is greater than ever. The NVIDIA-Certified Systems program was created specifically to answer this need. 

NVIDIA-Certified Systems bring together NVIDIA GPUs and NVIDIA high-speed, secure networking in systems from leading NVIDIA partners, in configurations validated for optimal performance, reliability, and scale across a diverse range of workloads. 

The tests are based on real-world data and represent the latest GPU-accelerated applications, including deep learning training with PyTorch and TensorFlow, HPC, data analytics with Apache Spark, and 3D computing with NVIDIA Omniverse.

The certification is built entirely on a container-based test suite using Kubernetes for orchestration, ensuring that any certified system can be seamlessly integrated into modern cloud native management frameworks.
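The certification suite itself is not described in detail here, but the general pattern of running a containerized GPU test under Kubernetes looks like the following. This minimal Python sketch builds a standard batch/v1 Job manifest that requests GPUs through the nvidia.com/gpu extended resource exposed by the NVIDIA device plugin. The job name, image, and command are hypothetical placeholders and are not part of the NVIDIA-Certified test suite.

```python
import json


def gpu_test_job(name: str, image: str, command: list[str], gpus: int = 1) -> dict:
    """Build a Kubernetes batch/v1 Job manifest that runs one containerized test on GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"suite": "gpu-validation"}},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed test silently
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # GPUs are requested via the extended resource
                            # advertised by the NVIDIA device plugin.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    job = gpu_test_job("smoke-test", "registry.example.com/hypothetical-gpu-test:1.0",
                       ["nvidia-smi"])
    print(json.dumps(job, indent=2))
```

Because kubectl accepts JSON as well as YAML manifests, the printed output could be saved to a file and submitted with kubectl apply -f job.json on a cluster where the NVIDIA device plugin is installed.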

It is important to understand the difference between qualification and NVIDIA certification. A qualified system has undergone thermal, mechanical, power, and signal integrity tests to ensure a particular NVIDIA GPU is fully functional in that server model. A certified system has passed a set of tests to validate its performance for a wide range of workload categories, as well as for networking, security, and management features. These capabilities become critical for any enterprise computing solution. 

If you want to ensure that the system is both supported and optimally designed and configured, choose a certified system. 

Enterprise-ready next-generation computing platforms

NVIDIA-Certified Systems from global manufacturers with the new generation of GPU and CPU technologies are available today. Visit the Qualified System Catalog to see what models are available from your preferred vendor. 
