NVIDIA Grace CPU Integrates with the Arm Software Ecosystem

The NVIDIA Grace CPU is transforming data center design by offering a new level of power-efficient performance. Built specifically for data center scale, the Grace CPU is designed to handle demanding workloads while consuming less power.

NVIDIA believes in the benefit of leveraging GPUs to accelerate every workload. However, not every workload is GPU-accelerated. This is especially true of workloads with complex, branchy code, such as graph analytics, which is common in use cases like fraud detection, operational optimization, and social network analysis.

As data centers face increasing power constraints, it’s crucial to accelerate as many workloads as possible and run the rest on the most efficient compute available. The Grace CPU is optimized for both accelerated and CPU-only tasks, delivering up to 2x the performance of conventional CPUs at the same power.

The Grace CPU features 72 high-performance and energy-efficient Arm Neoverse V2 cores, connected by the NVIDIA Scalable Coherency Fabric (SCF). This high-bandwidth fabric ensures smooth data flow between CPU cores, cache, memory, and system I/O, providing up to 3.2 TB/s of bisection bandwidth—double that of traditional CPUs.

The Grace CPU also uses high-speed LPDDR5X memory with server-class reliability, delivering up to 500 GB/s of memory bandwidth while consuming just one-fifth the energy of traditional DDR memory.
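To put the peak memory figure in perspective, here is a back-of-the-envelope sketch. It divides the quoted 500 GB/s evenly across the 72 cores and computes a lower bound on the time for a STREAM-style triad (`a[i] = b[i] + s * c[i]`) over double-precision arrays. It assumes the peak is fully achieved and that every byte crosses the memory interface exactly once. Sustained bandwidth on a real system is lower, so treat the results as optimistic bounds. The function names are illustrative.

```python
# Back-of-the-envelope arithmetic using the peak figures quoted above.
# Assumption: the peak is fully achieved and shared evenly by all cores;
# sustained bandwidth on a real system is lower.

GRACE_CORES = 72
MEM_BW_GBPS = 500.0  # peak LPDDR5X memory bandwidth, GB/s


def per_core_share(total_gbps: float, cores: int = GRACE_CORES) -> float:
    """Even split of an aggregate bandwidth across cores, in GB/s."""
    return total_gbps / cores


def triad_time_lower_bound(n_elements: int, bw_gbps: float = MEM_BW_GBPS) -> float:
    """Minimum seconds for a[i] = b[i] + s * c[i] over float64 arrays.

    Counts two 8-byte reads and one 8-byte write per element and ignores
    caches and write-allocate traffic, so the result is an optimistic bound.
    """
    bytes_moved = 3 * 8 * n_elements
    return bytes_moved / (bw_gbps * 1e9)


if __name__ == "__main__":
    print(f"memory bandwidth per core: {per_core_share(MEM_BW_GBPS):.2f} GB/s")
    print(f"triad over 1e9 doubles: >= {triad_time_lower_bound(10**9) * 1e3:.1f} ms")
```

With these constants, the script reports about 6.94 GB/s per core and a lower bound of 48.0 ms for a billion-element triad.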

In this post, we share how the Grace CPU builds on the existing Arm ecosystem while taking advantage of the vast array of NVIDIA software and tools.

Standard software infrastructure

The Grace CPU was designed to be a balanced, general-purpose CPU that works just like any other CPU. The workflow for getting software to run on the Grace CPU is the same one that you’d use on an x86 CPU. Standard Linux distributions (Ubuntu, RHEL, SLES, and so on) and multi-platform, open-source compilers (GCC, LLVM, and so on) all support the Grace CPU.
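In install scripts that target both x86 and Grace systems, the main difference is often just naming. The sketch below assumes a Linux host. It normalizes the machine name from Python’s `platform.machine()` (which reports `aarch64` on 64-bit Arm Linux) into the architecture names used by Debian/Ubuntu packages (`amd64`, `arm64`) and by RPM-based distributions such as RHEL and SLES (`x86_64`, `aarch64`). The function and table names are illustrative, not part of any distribution’s tooling.

```python
import platform
from typing import Optional

# Canonical machine name -> (dpkg architecture, RPM architecture).
# Debian/Ubuntu (dpkg) and RHEL/SLES (RPM) name the same ISA differently.
_ARCH_NAMES = {
    "x86_64": ("amd64", "x86_64"),
    "aarch64": ("arm64", "aarch64"),
}

# Aliases that some tools report for the same architectures.
_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def package_arch(family: str, machine: Optional[str] = None) -> str:
    """Return the package architecture name for a 'deb' or 'rpm' distro."""
    m = (machine or platform.machine()).lower()
    m = _ALIASES.get(m, m)
    if m not in _ARCH_NAMES:
        raise ValueError(f"unsupported machine: {m}")
    deb_name, rpm_name = _ARCH_NAMES[m]
    if family == "deb":
        return deb_name
    if family == "rpm":
        return rpm_name
    raise ValueError(f"unknown package family: {family}")
```

On a Grace system running Ubuntu, `package_arch("deb")` would return `arm64`; on RHEL or SLES, `package_arch("rpm")` would return `aarch64`.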

Most open-source software today already supports Arm and therefore runs on the Grace CPU. Similarly, any software optimization or porting done for the Grace CPU also works on the rest of the Arm Neoverse software ecosystem.

NVIDIA continues to work with developers and partners in the Arm ecosystem and is committed to ensuring that open-source compilers, libraries, frameworks, tools, and applications fully leverage Arm Neoverse-based CPUs, like the Grace CPU.

Many cloud-native and commercial ISV applications already provide optimized executables for Arm. The Arm Developer Hub provides a showcase of selected software packages for AI, cloud, data center, 5G, networking, and edge. This hub also provides guidance on how to migrate applications to Arm.

This ecosystem is enabled by Arm standards, such as the Arm Server Base System Architecture (SBSA) and the Base Boot Requirements (BBR) of the Arm SystemReady Certification Program.

NVIDIA software supports the Arm ecosystem

Arm has invested in its software ecosystem for decades, so you can innovate knowing that software not only works but is optimized for Arm. The NVIDIA software ecosystem also builds on decades of work in accelerated computing and is now optimized for Arm:

The NVIDIA HPC SDK and every CUDA component have Arm-native installers and containers.
The NVIDIA container ecosystem of NVIDIA NIM microservices and NGC provides deep learning, machine learning, and HPC containers optimized for Arm. NVIDIA NIM enhances inference performance, enabling high-throughput and low-latency AI at scale.
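Arm-native containers are typically published as multi-architecture images. An OCI image index lists one manifest per platform, and the container runtime pulls the one whose `platform.os` and `platform.architecture` match the host (`linux` and `arm64` on Grace). The sketch below mimics that selection on a parsed index. The digests are placeholders, and real runtimes also consider fields such as `variant` that this sketch ignores.

```python
import json
from typing import Optional

# Linux machine names -> the architecture names used in OCI platform fields.
_OCI_ARCH = {"x86_64": "amd64", "aarch64": "arm64"}


def select_manifest(index: dict, machine: str, os_name: str = "linux") -> Optional[dict]:
    """Return the manifest descriptor that matches the host platform, if any."""
    arch = _OCI_ARCH.get(machine, machine)
    for desc in index.get("manifests", []):
        plat = desc.get("platform", {})
        if plat.get("os") == os_name and plat.get("architecture") == arch:
            return desc
    return None


# A trimmed image index with placeholder digests (not real images).
EXAMPLE_INDEX = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:aaaa", "size": 1000,
     "platform": {"os": "linux", "architecture": "amd64"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:bbbb", "size": 1000,
     "platform": {"os": "linux", "architecture": "arm64"}}
  ]
}
""")
```

Here `select_manifest(EXAMPLE_INDEX, "aarch64")["digest"]` returns `"sha256:bbbb"`, the Arm build, while the same image reference resolves to the `amd64` manifest on an x86 host.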

NVIDIA is also expanding its software ecosystem for Arm CPUs with NVIDIA Performance Libraries (NVPL), a suite of high-performance math libraries for Arm CPUs. These libraries implement standard APIs, so moving from x86 math libraries can be as simple as a drop-in replacement at the linking stage.
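A drop-in replacement works because the application only depends on the routine’s standard contract. For example, BLAS DGEMM computes C ← αAB + βC, and which implementation runs is decided by the library on the link line. When swapping libraries, a slow reference of that contract is handy for checking results. The sketch below is such a reference for the non-transposed case, in plain Python with row-major lists. The function names are illustrative and not part of NVPL or any BLAS API.

```python
from typing import List

Matrix = List[List[float]]


def dgemm_reference(alpha: float, a: Matrix, b: Matrix,
                    beta: float, c: Matrix) -> Matrix:
    """Return alpha * (A @ B) + beta * C, the operation DGEMM performs when
    neither A nor B is transposed. A is m x k, B is k x n, C is m x n.

    Follows the reference-BLAS rule that C is not read when beta == 0.
    Slow by design: use it only to check results from an optimized library.
    """
    m, k, n = len(a), len(b), len(b[0])
    if (any(len(row) != k for row in a) or any(len(row) != n for row in b)
            or len(c) != m or any(len(row) != n for row in c)):
        raise ValueError("incompatible matrix shapes")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            scaled_c = beta * c[i][j] if beta != 0.0 else 0.0
            row.append(alpha * acc + scaled_c)
        result.append(row)
    return result


def max_abs_diff(x: Matrix, y: Matrix) -> float:
    """Largest elementwise difference between two same-shaped matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))
```

Comparing `max_abs_diff` between this reference and the optimized library’s output, against a tolerance suited to the problem size, gives a quick sanity check after relinking.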

Similarly, math libraries such as Arm Performance Libraries (ArmPL) are tuned to maximize performance on the Grace CPU as well as on other Arm CPUs. For example, Arm has shared how ArmPL Sparse can be used in much the same way as on x86. Because ArmPL APIs are similar to those of the x86 math libraries, developing a wrapper may require only a few API changes in the code.
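ArmPL Sparse’s actual functions are not reproduced here. Instead, this sketch shows the wrapper pattern the paragraph describes. Application code calls a single CSR sparse matrix-vector product (SpMV) entry point, and library-specific calls stay behind a small registry. The `register_backend` and `spmv` names are hypothetical. The only backend included is a pure-Python reference; an x86 or ArmPL backend would be registered the same way.

```python
from typing import Callable, Dict, List, Sequence

# A backend computes y = A @ x for A stored in CSR form.
SpmvFn = Callable[[Sequence[float], Sequence[int], Sequence[int], Sequence[float]],
                  List[float]]

_BACKENDS: Dict[str, SpmvFn] = {}


def register_backend(name: str, fn: SpmvFn) -> None:
    """Hypothetical hook: each library-specific SpMV is registered by name."""
    _BACKENDS[name] = fn


def csr_spmv_reference(values: Sequence[float], col_idx: Sequence[int],
                       row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Pure-Python CSR SpMV: y[i] = sum of values[k] * x[col_idx[k]] in row i."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


register_backend("reference", csr_spmv_reference)


def spmv(values: Sequence[float], col_idx: Sequence[int], row_ptr: Sequence[int],
         x: Sequence[float], backend: str = "reference") -> List[float]:
    """Single entry point for application code; backends stay swappable."""
    return _BACKENDS[backend](values, col_idx, row_ptr, x)
```

Porting then means writing one new backend function and changing the `backend` argument, rather than touching every call site.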

NVIDIA actively participates in open-source software communities such as those for the GCC and LLVM compilers. If you don’t want to wait for the next upstream release and want to build code that performs optimally on the Grace CPU, the latest optimizations are also available through the Clang distribution.
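To tune code for the Neoverse V2 cores in the Grace CPU, GCC and Clang accept `-mcpu=neoverse-v2` on AArch64. Older compilers reject this CPU name, so check that your compiler version supports it. The sketch below assembles a compile command per host architecture. The `-O3` level, the x86 `-march=native` fallback, and the file name `kernel.c` are illustrative choices, not recommendations from the tuning guide.

```python
import platform
import shlex
from typing import List, Optional

# Per-architecture tuning flags. -mcpu=neoverse-v2 targets the Neoverse V2
# cores in Grace; it needs a GCC or Clang release that knows this CPU name.
_TUNING = {
    "aarch64": ["-mcpu=neoverse-v2"],
    "x86_64": ["-march=native"],
}


def compile_command(source: str, output: str, compiler: str = "gcc",
                    machine: Optional[str] = None) -> List[str]:
    """Build an argv list for compiling one C file with arch-specific tuning."""
    arch = (machine or platform.machine()).lower()
    flags = _TUNING.get(arch, [])  # unknown arch: fall back to generic code
    return [compiler, "-O3", *flags, source, "-o", output]


if __name__ == "__main__":
    print(shlex.join(compile_command("kernel.c", "kernel")))
```

On a Grace host, this prints `gcc -O3 -mcpu=neoverse-v2 kernel.c -o kernel`. The source file itself is unchanged between architectures.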

Seamlessly moving your software to Arm

The Arm software ecosystem is large and growing, with hundreds of open-source projects and commercial ISVs already supporting the Arm architecture. If your application is not yet supported, you may only need to recompile the source code. A variety of tools can help you do so:

NVIDIA LaunchPad has a module on porting to Arm that you can try for yourself.
For vector intrinsics such as AVX, there are tools that simplify conversion. SIMD Everywhere and the GTC session “A Demonstration of AI and HPC Applications for NVIDIA Grace CPU” walk through several examples.
The Arm Neoverse Migration Overview has detailed training.
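Many simple SSE single-precision intrinsics have a direct NEON counterpart on AArch64. This is the kind of mapping that SIMD Everywhere automates far more thoroughly. As a rough first pass, the sketch below scans C source for `_mm_*` calls and suggests NEON equivalents from a small, illustrative table. Vector types also change (`__m128` becomes `float32x4_t`), and some pairs differ in edge cases such as NaN handling in `_mm_max_ps` versus `vmaxq_f32`. Treat each hit as a review item rather than a safe textual substitution.

```python
import re
from typing import Dict, List, Tuple

# SSE -> NEON (AArch64) counterparts for 4 x float32 vectors. Argument order
# matches for these pairs, but edge cases (e.g. NaN in max) can differ.
SSE_TO_NEON: Dict[str, str] = {
    "_mm_loadu_ps": "vld1q_f32",
    "_mm_storeu_ps": "vst1q_f32",
    "_mm_set1_ps": "vdupq_n_f32",
    "_mm_add_ps": "vaddq_f32",
    "_mm_sub_ps": "vsubq_f32",
    "_mm_mul_ps": "vmulq_f32",
    "_mm_div_ps": "vdivq_f32",
    "_mm_max_ps": "vmaxq_f32",
}

# Matches an SSE intrinsic name followed by an opening parenthesis.
_INTRINSIC = re.compile(r"\b(_mm_[a-z0-9_]+)\s*\(")


def scan(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, SSE intrinsic, suggested NEON intrinsic or '?')."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in _INTRINSIC.findall(line):
            hits.append((lineno, name, SSE_TO_NEON.get(name, "?")))
    return hits
```

Intrinsics outside the table are reported as `?` and need a manual decision, or a library such as SIMD Everywhere, during the port.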

For more information about application porting and optimization, see the NVIDIA Grace Performance Tuning Guide. It includes instructions for setting up and optimizing performance on the Grace CPU. It also provides high-level developer guidance on Arm SIMD programming, the Arm memory model, and other details. Use this guide to help you realize the best possible performance for your particular NVIDIA Grace system.

Summary

The NVIDIA Grace CPU is designed for the modern data center, with 72 high-performance Arm Neoverse V2 cores, the NVIDIA-designed high-bandwidth SCF to maximize performance, and high-bandwidth, low-power memory. It can deliver up to 2x the performance of leading traditional x86 CPUs in the same power envelope.

The NVIDIA Grace CPU is a standards-based Arm SBSA design that works just like any other CPU and is fully compatible with the broad Arm software ecosystem.

For more information about software and system setup, see NVIDIA Grace CPU.