Enabling the World’s First GPU-Accelerated 5G Open RAN for NTT DOCOMO with NVIDIA Aerial

NVIDIA, working with Fujitsu and Wind River, has enabled NTT DOCOMO to launch the first GPU-accelerated commercial Open RAN 5G service in its network in Japan. This makes DOCOMO the first telco in the world to deploy a GPU-accelerated commercial 5G network.

The announcement is a major milestone as the telecom industry strives to address the multi-billion-dollar problem of driving improvements in performance, total cost of ownership (TCO), and energy efficiency. The solution unlocks the flexibility, scalability, and supply chain diversity promise of Open RAN.

DOCOMO and its partners confirmed that the solution is based on Fujitsu’s virtualized centralized unit (vCU) and virtualized distributed unit (vDU), the NVIDIA Aerial platform, and Wind River’s distributed cloud platform.

The 5G Open RAN solution is the first 5G vRAN for telecom commercial deployment using the NVIDIA Aerial platform. The platform brings together the NVIDIA Aerial vRAN stack for 5G, AI frameworks, accelerated compute infrastructure, and long-term software support and maintenance. It delivers innovative and transformational new capabilities for telco operators.

TCO and energy efficiency benefits

Developed with vendors Fujitsu and Wind River, the new deployment uses the NVIDIA Aerial platform to deliver high performance, high cell density, and flexibility to DOCOMO’s 5G network, resulting in better network utilization, lower TCO, and reduced power consumption.

DOCOMO notes that, when compared to its standard network based on a proprietary solution, this new solution reduces TCO by up to 30%, improves network design utilization by up to 50%, and reduces power consumption at base stations by up to 50%.

As of July 2023, DOCOMO serves over 22M 5G subscribers, from 20K+ base stations, with 5G coverage in over 815 cities. It uses 29 types of radio units (RUs) from four vendors and eight types of CUs from three vendors. The introduction of vRAN is expected to expand the capacity and coverage of the 5G network.

While the ability to mix and match products from different vendors promises improvements in flexibility and scalability from Open RAN networks, it poses two major concerns for operators:

First, it makes it challenging to bring out the best performance from the different vendor products.
Second, there are often technical issues that are only found at interoperability testing, which any operator deploying Open RAN must deal with.

NVIDIA, Fujitsu, Wind River 5G Open RAN partnership

DOCOMO launched OREX as an Open RAN service brand to address the challenges facing Open RAN. After the project was launched in February 2021, Fujitsu, NVIDIA, and Wind River worked together under OREX to develop the 5G vRAN solution, which is based on Fujitsu’s vDU and vCU.

The solution uses the following components:

Commercial-off-the-shelf (COTS) servers
Wind River distributed cloud platform
Fujitsu’s 5G vRAN software
NVIDIA Aerial vRAN stack
NVIDIA Converged Accelerator

This is the first vendor consortium to deliver a commercial live 5G vRAN that meets NTT DOCOMO’s performance and interoperability requirements.

The NVIDIA Aerial platform includes NVIDIA Aerial vRAN stack software for the physical (PHY) layer 1 (L1), and NVIDIA Converged Accelerator with its combined data processing unit (DPU) and GPU for hardware acceleration of the computationally intense L1 workload. This makes it the world’s first DPU and GPU-accelerated (that is, NVIDIA-accelerated) 5G Open RAN to be deployed commercially to deliver a scalable, flexible, and cost-efficient network.

Diagram shows the software stack of the Fujitsu DU, NVIDIA Aerial L1, and Wind River Studio Cloud platform. The hardware stack includes pictures of a COTS server and NVIDIA Converged Accelerator next to the GPU, DPU, and CPU boxes. — *Figure 1. NVIDIA-accelerated 5G vRAN stack deployed by NTT DOCOMO*

NVIDIA Aerial platform: Building blocks for wireless innovation

NVIDIA is driving innovation in the telecom industry with a portfolio of wireless frameworks, AI frameworks, and accelerated computing infrastructure. This enables the development of high-performance, fully software-defined, and AI-native networks with cloud economics (Figure 2).

Diagram lists benefits for high-performance, AI-native, fully software-defined 5G networks with cloud economics: faster and flexible deployment, ease of management, open platform, highest spectral efficiency, and more. — *Figure 2. RAN innovation benefits with the NVIDIA full-stack 5G vRAN*

The accelerated computing infrastructure is made up of a combination of CPU, DPU, and GPU, together with a range of NVIDIA-certified COTS hardware servers.

NVIDIA Aerial is the platform with software, hardware, and support for delivering innovation in the wireless market segment. It brings together the NVIDIA Aerial vRAN stack for 5G, AI frameworks, other wireless frameworks, an accelerated compute infrastructure, and long-term software support.

This combination of industry-shaping hardware, software, and the long-term support and maintenance typical for a commercial-grade software stack enables new performance thresholds for 5G networks.

The key components of the platform are as follows:

Software: NVIDIA Aerial vRAN stack
Hardware: NVIDIA Accelerated Computing
Carrier-grade support

Software: NVIDIA Aerial vRAN stack

The NVIDIA Aerial vRAN stack is an application framework for building high-performance, 100% software-defined, cloud-native 5G vRAN with the O-RAN 7.2x split. It is highly flexible and scalable and can deliver high performance for the L1.

NVIDIA Aerial has adopted a GPU-centric approach and relies on two notable subcomponents:

NVIDIA cuBB SDK (CUDA baseband)
NVIDIA DOCA GPUNetIO

cuBB provides GPU-accelerated 5G L1 processing. It delivers high throughput and efficiency by keeping all PHY layer processing within the high-performance GPU memory. The cuBB SDK also includes the 5G L1 high-PHY acceleration library cuPHY, which is optimized for NVIDIA GPUs. cuPHY offers unparalleled scalability by using the GPU’s massive computing capability and a high degree of parallelism.

NVIDIA DOCA GPUNetIO improves the performance of inline hardware acceleration. It provides optimized I/O and packet processing by exchanging packets directly between GPU memory and the DPU using direct memory access (DMA) technology.

The full stack for NVIDIA Aerial 5G vRAN showing the cuBB SDK, framework libraries, and the toolkit and drivers. — *Figure 3. NVIDIA Aerial vRAN stack for complete L1 PHY acceleration*

Hardware: NVIDIA Accelerated Computing

This is the processing engine of the vRAN and is made up of CPUs, DPUs, and GPUs, together with COTS hardware servers.

The performance of the NVIDIA Aerial vRAN stack, especially the computationally intensive L1, depends on the hardware it is deployed on. Table 1 compares the two hardware options that NVIDIA offers for 5G network deployments:

x86 and NVIDIA Converged Accelerator (used in the NTT DOCOMO announcement)
NVIDIA Grace Hopper and NVIDIA BlueField DPU

x86 and NVIDIA Converged Accelerator

This option combines the performance of NVIDIA DPUs and GPUs, together with an x86 CPU server, to deliver maximum performance for the 5G vRAN. This is also the hardware acceleration option in the current DOCOMO deployment.

The integration of the GPU and DPU brings all fronthaul enhanced common public radio interface (eCPRI) data traffic into the GPU without the CPU in the datapath. This is a full inline L1 offload: the solution achieves high performance by avoiding the back-and-forth transfer of eCPRI data between the CPU and the hardware accelerator across the host PCIe interface, which alternative systems require.

After the data is in the GPU, it benefits from the massive parallelism of the GPU architecture to optimize the processing capacity of the base station system. This brings improved RU capacity and processing power, provides a high-quality communications environment, and can handle high-load data processing along with future improvements in antenna technologies.
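The difference between the inline path described above and a lookaside design can be made concrete by counting how often fronthaul data crosses the host PCIe interface. The following Python sketch is only a toy model. The `Hop` type, the two hop lists, and the `host_pcie_crossings` helper are hypothetical names introduced for illustration, and the lookaside path is an assumed generic design rather than a description of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One data movement step for fronthaul (eCPRI) traffic."""
    src: str
    dst: str
    crosses_host_pcie: bool  # True if the step uses the host PCIe interface


# Assumed generic lookaside design: data lands in host memory first,
# is copied to the accelerator, and results are copied back to the host.
LOOKASIDE = [
    Hop("NIC", "CPU memory", True),
    Hop("CPU memory", "accelerator", True),
    Hop("accelerator", "CPU memory", True),
]

# Inline design as described in the text: the DPU hands eCPRI traffic
# directly to GPU memory, with no CPU in the datapath. Treating this hop
# as not crossing the host PCIe interface is an assumption of the model.
INLINE = [
    Hop("DPU", "GPU memory", False),
]


def host_pcie_crossings(path):
    """Count the hops in a path that cross the host PCIe interface."""
    return sum(1 for hop in path if hop.crosses_host_pcie)


print("lookaside crossings:", host_pcie_crossings(LOOKASIDE))
print("inline crossings:", host_pcie_crossings(INLINE))
```

Under these assumptions, the lookaside path crosses the host PCIe interface three times for each batch of fronthaul data, while the inline path does not cross it at all. The model covers only eCPRI fronthaul data and says nothing about the traffic between L1 and the higher layers.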

NVIDIA Grace Hopper and NVIDIA BlueField DPU

The NVIDIA Grace Hopper Superchip brings together the NVIDIA Grace CPU, which is based on the Arm architecture, and the high-performance NVIDIA Hopper GPU. The BlueField DPU provides full inline L1 offload, with the same performance benefit, in a similar way to NVIDIA Converged Accelerator.

However, the biggest boost to performance comes from the integration of the CPU and GPU architectures using NVIDIA NVLink-C2C to deliver a CPU+GPU coherent memory model for accelerated workloads such as 5G vRAN.

NVIDIA NVLink-C2C is the NVIDIA memory-coherent, high-bandwidth, and low-latency interconnect. It delivers up to 900 GB/s total bandwidth: 7x higher bandwidth than the x16 PCIe Gen5 lanes commonly used in accelerated systems.
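The 7x figure can be roughly reproduced from the published PCIe Gen5 signaling rate. The sketch below is a back-of-the-envelope check, not an official derivation. It assumes that both the 900 GB/s figure and the PCIe figure are totals across both directions, and it ignores PCIe protocol overhead beyond the 128b/130b line encoding. The constant and function names are chosen for illustration.

```python
PCIE_GEN5_GT_PER_S = 32.0        # raw transfer rate per lane, per direction
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by PCIe Gen5
LANES = 16                       # x16 link
NVLINK_C2C_GB_PER_S = 900.0      # total bandwidth quoted in the text


def pcie_gen5_x16_total_gb_per_s() -> float:
    """Approximate total (both directions) bandwidth of a PCIe Gen5 x16 link."""
    per_lane_one_way = PCIE_GEN5_GT_PER_S * ENCODING_EFFICIENCY / 8  # GB/s
    return per_lane_one_way * LANES * 2  # assumption: count both directions


pcie_total = pcie_gen5_x16_total_gb_per_s()
ratio = NVLINK_C2C_GB_PER_S / pcie_total
print(f"PCIe Gen5 x16 total: {pcie_total:.0f} GB/s")
print(f"NVLink-C2C vs PCIe Gen5 x16: {ratio:.1f}x")
```

With these assumptions, the PCIe Gen5 x16 link comes out at about 126 GB/s in total, so 900 GB/s is roughly 7.1 times higher, in line with the 7x stated above.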

With the NVLink-C2C memory coherency, both CPU and GPU threads can concurrently and transparently access both CPU and GPU resident memory, enabling the RAN to optimize how it handles eCPRI data across CPU and GPU.

| Metric | x86 + NVIDIA Converged Accelerator (refer to AX800) | Grace Hopper + BlueField-3 (refer to GH200) |
| --- | --- | --- |
| Configuration* | Up to 20 peak cells of 4T4R, equivalent to 36 Gbps per 2U server | Up to 40 peak cells of 4T4R, equivalent to 72 Gbps per 2U server |
| 5G performance* | 3.2x (36 Gbps) | 6.4x (72 Gbps) |
| LLM* (Llama 65B) | 76x (tokens/sec) | 284x (tokens/sec) |
| 5G power efficiency* | 1.3x (34 W/Gbps) | 3.3x (13 W/Gbps) |
| CPU-GPU bandwidth* | 1x (PCIe) | 7x (C2C) |

* Relative performance is estimated vs. the x86 5G SKU for 100 MHz, 4T4R, 4DL/2UL; PCIe Gen5, 2U server.

Table 1. Comparative performance of NVIDIA Converged Accelerator vs. NVIDIA Grace Hopper for 5G vRAN. Footnote: DOCOMO’s current deployment uses the x86 + NVIDIA Converged Accelerator option.
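The figures in Table 1 can be cross-checked with a few lines of arithmetic. The Python sketch below copies the throughput, cell count, relative performance, and power efficiency values from the table and derives the per-cell throughput and the baseline implied by each relative figure. The dictionary layout and the `derived` helper are hypothetical, and the derived baselines are inferences from the table, not numbers published with it.

```python
# Values copied from Table 1; the key names are chosen for illustration.
TABLE_1 = {
    "x86 + Converged Accelerator": {
        "cells": 20, "gbps": 36.0, "speedup": 3.2,
        "w_per_gbps": 34.0, "efficiency_gain": 1.3,
    },
    "Grace Hopper + BlueField-3": {
        "cells": 40, "gbps": 72.0, "speedup": 6.4,
        "w_per_gbps": 13.0, "efficiency_gain": 3.3,
    },
}


def derived(row):
    """Derive per-cell throughput and the baselines implied by the ratios."""
    return {
        # Peak throughput divided by the number of peak cells.
        "gbps_per_cell": row["gbps"] / row["cells"],
        # Baseline throughput implied by the relative 5G performance figure.
        "implied_baseline_gbps": row["gbps"] / row["speedup"],
        # Baseline W/Gbps implied by the relative power efficiency figure.
        "implied_baseline_w_per_gbps": row["w_per_gbps"] * row["efficiency_gain"],
    }


for name, row in TABLE_1.items():
    d = derived(row)
    print(
        f"{name}: {d['gbps_per_cell']:.2f} Gbps/cell, "
        f"baseline {d['implied_baseline_gbps']:.2f} Gbps, "
        f"baseline {d['implied_baseline_w_per_gbps']:.1f} W/Gbps"
    )
```

Both options work out to 1.8 Gbps per peak cell and imply the same baseline of 11.25 Gbps. The implied baseline power efficiency differs slightly between the two columns (about 44 and 43 W/Gbps), which is consistent with the relative figures being rounded to one decimal place.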

Carrier-grade support

The NVIDIA Aerial platform provides a full-stack, carrier-grade, hardened, and mature 5G solution with 10 years of long-life support and maintenance for telecommunications operators. This level of carrier-grade support provides assurances of reliability and resilience for telcos for field or commercial deployment using the platform.

OREX: Building out from Japan and beyond

The commercial deployment of a 5G Open RAN network by NTT DOCOMO, using the NVIDIA 5G platform, is a major milestone for the telecommunications industry. It showcases the capabilities of GPU-based acceleration for computationally intensive L1 PHY processing.

This new network comes with improved performance, flexibility, and scalability, plus higher cell density, significant improvements in energy efficiency, and a reduction in TCO. The delivery of this network paves the way for widespread adoption of GPU-based acceleration in cellular RANs.

DOCOMO and its partners in OREX are working together to promote a multi-vendor, Open RAN–compliant 5G vRAN to the global operator community. The commercial deployment in Japan is in alignment with the vision of OREX, enabling its members to validate their solutions commercially and then promote them to other operators globally.

NVIDIA continues to work with DOCOMO and other partners to support operators around the world to deploy high-performance, energy-efficient, software-defined, commercial 5G vRAN.