Improving 5G Performance Using OvS Over ASAP² with AMD EPYC 7002 and NVIDIA Mellanox SmartNICs

Over the last five years, compute and storage technologies have achieved substantial performance increases. At the same time, they have been hampered by PCI Express Gen3 (PCIe Gen3) bandwidth limitations.

AMD is the first x86 processor company to release support for the fourth-generation PCIe bus (PCIe Gen4), with the AMD EPYC 7002 Series processor. This is the second-generation AMD EPYC processor, but the first x86 data center processor with PCIe Gen4 support. It delivers substantial system performance improvements by doubling the bandwidth available to storage, networking, and other peripherals compared to CPUs that only support PCIe Gen3. AMD EPYC 7002 Series processors also offer more PCIe lanes and support for more DRAM capacity, allowing them to provide the industry’s highest PCIe bandwidth and memory capacity.
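The doubling can be checked from the published PCIe signaling rates. The following is a minimal sketch, assuming 8 GT/s per lane for Gen3, 16 GT/s per lane for Gen4, and 128b/130b encoding for both. The function name `pcie_gbps` is illustrative, and the figures ignore packet-level protocol overhead.

```python
# Per-direction PCIe bandwidth from the published signaling rates.
# Gen3 and Gen4 both use 128b/130b encoding; Gen4 doubles the transfer rate.
GT_PER_S = {"gen3": 8.0, "gen4": 16.0}  # gigatransfers per second, per lane
ENCODING = 128 / 130                     # 128b/130b line-code efficiency


def pcie_gbps(gen: str, lanes: int) -> float:
    """Raw bandwidth in Gb/s, one direction, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes


for gen in ("gen3", "gen4"):
    gbps = pcie_gbps(gen, 16)
    print(f"{gen} x16: {gbps:.1f} Gb/s ({gbps / 8:.2f} GB/s)")
```

Under these assumptions, a Gen3 x16 slot tops out at roughly 126 Gb/s, which is below what a 200 Gb/s adapter needs, while a Gen4 x16 slot offers roughly 252 Gb/s.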

New AMD EPYC 7002 Series processor

The new AMD EPYC 7002 Series processor delivers advanced processing capabilities, unleashing large performance gains for a wide variety of workloads and addressing new data center challenges. It offers up to 64 multithreaded cores per chip, for a total of 128 processing cores in a dual-socket server. In a single-socket configuration, it delivers dual-socket performance and I/O without the dual-socket price tag. AMD is also the first to bring to market an x86 data center processor based on 7nm process technology. With double the core density and optimizations that improve instructions per cycle, the result is 4x the floating-point performance of first-generation AMD EPYC. The 7nm process technology also brings energy efficiency, so the second-generation AMD EPYC can provide the same performance at half the power consumption. This claim is based on AMD internal testing from June 8, 2018 of a same-architecture product ported from 14 nm to 7 nm technology with similar implementation flow and methodology, using performance from SGEMM. That is amazing!

Alongside its high core count, there is an extra pair of memory channels, allowing the AMD EPYC 7002 Series processors to take advantage of up to 4 TB of RAM for a single-socket server and 8 TB for a dual-socket server with 256-GB DIMMs. For companies looking to host multi-tenant workloads, the option of adding more DRAM means more tenants can be added per server, which translates to a substantial increase in revenue streams.
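The 4 TB and 8 TB figures follow from simple multiplication. This sketch assumes eight DDR4 memory channels per socket and two DIMMs per channel. Neither number is stated in this post, so treat them as assumptions. Only the 256-GB DIMM size comes from the text.

```python
CHANNELS_PER_SOCKET = 8  # assumption: eight memory channels per socket
DIMMS_PER_CHANNEL = 2    # assumption: two DIMMs populated per channel
DIMM_GB = 256            # DIMM size quoted in the text


def max_memory_tb(sockets: int) -> float:
    """Maximum RAM in TB for a server with the given number of sockets."""
    total_gb = sockets * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL * DIMM_GB
    return total_gb / 1024


for sockets in (1, 2):
    print(f"{sockets} socket(s): {max_memory_tb(sockets):.0f} TB")
```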

ConnectX adapters

NVIDIA Mellanox ConnectX offers 200 Gb/s InfiniBand (HDR) and Ethernet connectivity, with sub-600 nanosecond latency and up to 200 million messages per second. ConnectX SmartNICs and BlueField I/O processing units (IPU) are the world’s first PCIe Gen4 smart adapters.

The ConnectX smart adapter solutions are optimized to provide breakthrough performance and scalability with the new AMD EPYC 7002 Series processor for the most demanding compute and storage infrastructures. By using more of the faster PCI Express 4.0 lanes, ConnectX 100 and 200 gigabit per second adapters can achieve full I/O throughput with direct connectivity to 24 NVMe storage drives in a single system.
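A rough lane budget shows why 24 directly attached NVMe drives are plausible without a PCIe switch. This is a sketch under stated assumptions: 128 PCIe Gen4 lanes on a single-socket platform, a x4 link per NVMe drive, and a x16 slot per adapter. None of these figures appear in this post, and real platforms typically reserve some lanes for other devices.

```python
TOTAL_LANES = 128    # assumption: PCIe Gen4 lanes on a single-socket platform
LANES_PER_NVME = 4   # assumption: each NVMe drive uses a x4 link
LANES_PER_NIC = 16   # assumption: each adapter occupies a x16 slot


def lanes_left(nvme_drives: int, nics: int) -> int:
    """Lanes remaining after attaching drives and adapters directly."""
    used = nvme_drives * LANES_PER_NVME + nics * LANES_PER_NIC
    return TOTAL_LANES - used


print("Lanes left with 24 drives and 2 adapters:", lanes_left(24, 2))
```

With these assumptions, 24 drives consume 96 lanes and two x16 adapters consume the remaining 32.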

The combination of NVIDIA Mellanox adapters with PCIe Gen4 support and the second-generation AMD EPYC processor is ideal for advanced server and storage solutions, providing high-performance computing, artificial intelligence, cloud, and enterprise data centers with the high data bandwidth they need for the most compute- and storage-demanding applications. By leveraging the PCIe Gen4 support in both second-generation AMD EPYC processors and ConnectX adapters, mutual customers can maximize data center ROI.

Kernel bypass technology

Network and storage processing are CPU-intensive operations. However, the CPU must handle not only these data movement and processing tasks but also the application workloads themselves. ConnectX adapters use offloads and accelerators such as Accelerated Switching and Packet Processing (ASAP²), remote direct memory access (RDMA), and overlay network encap/decap (for example, VXLAN) to relieve the CPU of I/O tasks and enable the industry’s lowest network latency. This allows for more efficient data movement for the network, storage devices, and application workloads, resulting in lower application latency and leaving more CPU cycles available to accelerate applications and processes.
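As an illustration of what turning on OvS hardware offload typically involves on a Linux host, the sketch below only builds the commonly documented commands; it does not run them. The PCI address `0000:03:00.0` and interface name `enp3s0f0` are hypothetical placeholders, the helper `asap2_offload_commands` is not part of any library, and this post does not describe the configuration used in testing. Steps such as creating SR-IOV virtual functions and restarting the Open vSwitch service are omitted.

```python
import shlex


def asap2_offload_commands(pci_addr: str, uplink: str) -> list[list[str]]:
    """Build (but do not run) commands that enable OvS hardware offload."""
    return [
        # Put the NIC's embedded switch into switchdev mode.
        ["devlink", "dev", "eswitch", "set", f"pci/{pci_addr}", "mode", "switchdev"],
        # Allow TC flower rules to be offloaded to the NIC.
        ["ethtool", "-K", uplink, "hw-tc-offload", "on"],
        # Tell Open vSwitch to push datapath flows to hardware.
        ["ovs-vsctl", "set", "Open_vSwitch", ".", "other_config:hw-offload=true"],
    ]


# Hypothetical device identifiers, for illustration only.
for cmd in asap2_offload_commands("0000:03:00.0", "enp3s0f0"):
    print(shlex.join(cmd))
```

Keeping the commands as data makes them easy to review before passing each one to `subprocess.run` with root privileges.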

Impact on compute and storage

The improved PCIe Gen4 bandwidth and added PCIe lane count translate directly into help with the growing need for more compute processing and storage bandwidth. Most of the bandwidth need is in the PCIe bus as a path to local and networked storage and network links to other servers. The added memory is a bonus for storage solutions where a large memory cache is needed, and up to 4 TB of memory for a single socket leaves a lot of headroom for future workloads.

Where do we see AMD EPYC 7002 Series processors fitting in initially? There are many use cases. The first might be single-socket Windows Storage Spaces Direct solutions. These are typically 1U and 2U platforms that support a multi-node, hyperconverged infrastructure (HCI) deployment. Building them with the second-generation AMD EPYC processors allows more dedicated NVMe PCIe lanes without the need for a PCIe NVMe switch. That means more NVMe SSDs with higher storage throughput and IOPS available for workloads running on these platforms.

In a hyperconverged solution, you could configure the system to favor higher CPU clock speed over core count, because most common virtual machine workloads use 2-4 virtual CPUs each. With 16 cores and 1 TB of RAM, the AMD EPYC 7002 Series processor provides a solution that raises core density without the added cost of a dual-socket setup.

Again leading the charge to adopt new technology, the cloud computing market is already taking advantage of the massive compute capacity of AMD EPYC 7002 Series processors. Microsoft Azure is already offering its customers industry-leading compute performance for all workloads. After being the first global cloud provider to announce the deployment of AMD EPYC 7001 Series processor–based Azure VMs in 2017, Microsoft has been working together with AMD and NVIDIA Mellanox to continue bringing the latest computing innovation to enterprises of all sizes and shapes. Azure VMs provide more customer choice for meeting a broad range of requirements on general-purpose workloads using the new AMD EPYC 7002 processor and NVIDIA Mellanox SmartNICs.

Impact on 5G, NFV, and edge cloud

For telecommunication carriers and multi-service operators looking to deploy virtualized telco cloud infrastructure to support 3GPP 5G CUPS, network function virtualization (NFV), and multi-access edge computing (MEC) workloads, pairing the highest-capacity, economical compute with the fastest, most efficient network means the highest performance at the lowest cost for service provider applications. Given the pressure on the service provider industry to reduce CapEx and OpEx, the combination of AMD EPYC 7002 Series processors and NVIDIA Mellanox SmartNICs quickly translates to the highest ROI and the fastest time to average revenue per user (ARPU).

Performance testing

We decided to put an AMD EPYC 7002 Series processor–based server with NVIDIA Mellanox ConnectX-5 PCIe Gen4 SmartNICs to the test in both virtualized and bare metal OpenStack cloud environments. The amazing performance results of our telco benchmark testing are summarized in this post.

AMD EPYC 7002 Series processor with ConnectX NICs delivers 197 million packets per second and near-line rate on bare metal servers.
Figure 1. Virtualized telco cloud testing: Bare metal server test for AMD EPYC 7002 Series processor–based server with ConnectX-5 PCIe Gen4 100G adapters (frame size).

In bare metal server testing, we saw over 197 million packets per second (Mpps) at 64-byte frames and over 93 Gbps or just over 97% of line rate. While running at 1518-byte frames and using dual ports of a ConnectX-5 with PCIe Gen4 connectivity to an AMD EPYC 7002 Series processor with 16 cores, there was still ample room left for application processing, with three-fourths of the cores unused and available.

Theoretically, with just a single-socket AMD EPYC 7002 Series processor 64-core system that supports four PCIe Gen4 slots, using ConnectX-5 SmartNICs, you could achieve a 600 Mpps packet rate or 400 Gbps aggregate throughput on a single CPU server. That really is performance!
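The 600 Mpps figure is close to the Ethernet line-rate limit for four 100 Gb/s links at the minimum frame size. The sketch below assumes one 100 Gb/s port per slot and the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap). The function name `line_rate_mpps` is illustrative.

```python
ETH_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second, in millions, at a given frame size."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


per_port = line_rate_mpps(100, 64)  # one 100 Gb/s port, 64-byte frames
slots = 4                           # four PCIe Gen4 slots, from the text
print(f"Per port: {per_port:.1f} Mpps")
print(f"{slots} ports: {slots * per_port:.0f} Mpps, {slots * 100} Gb/s")
```

Each 100 Gb/s port can carry about 148.8 Mpps at 64 bytes, so four ports give roughly 595 Mpps and 400 Gb/s in aggregate.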

AMD EPYC 7002 Series processor and ConnectX-5 using ASAP² delivers up to 10X better performance than when using DPDK.
Figure 2. Frame rate in Mpps for the AMD EPYC 7002 Series processor–based server with ConnectX-5 PCIe Gen4 100G adapters and ASAP² compared to OvS-DPDK.
AMD EPYC 7002 Series processor and ConnectX-5 with ASAP² delivers up to 2.5X better performance than OvS-DPDK.
Figure 3. Throughput as % of line rate for the AMD EPYC 7002 Series processor–based server with ConnectX-5 PCIe Gen4 100G adapters.

In a virtualized server environment, when you compare the ASAP² OvS hardware offload to OvS-DPDK testing with multi-tenant UDP traffic, ASAP² achieved 67 Mpps at a 114-byte frame size and 87.84% of the line rate at a 1518-byte frame size, all without any CPU cores required for the network load (that is, UDP VXLAN packet processing).

With OvS-DPDK for multi-tenant UDP traffic, we achieved only 6.6 Mpps for 114-byte frames, and just 33.2 Gb/s, or 33.2% of the line rate, for 1518-byte frames, while still consuming 12 CPU cores for packet processing. With ASAP², we achieved up to 10X (1000%) the packet rate and 2.5X (250%) the throughput of OvS-DPDK for overlay UDP traffic, without consuming any CPU cores.
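The headline ratios can be recomputed from the measurements quoted above. This is a minimal sketch; the dictionary keys are illustrative labels, and the input numbers are the only values taken from the text.

```python
# Measurements quoted in the text for multi-tenant UDP (VXLAN) traffic.
asap2 = {"mpps_at_114B": 67.0, "line_rate_pct_at_1518B": 87.84}
ovs_dpdk = {"mpps_at_114B": 6.6, "line_rate_pct_at_1518B": 33.2}

# Ratio of ASAP2 to OvS-DPDK for each metric.
ratios = {metric: asap2[metric] / ovs_dpdk[metric] for metric in asap2}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")

# CPU cores returned to applications when packet processing is offloaded.
cores_freed = 12 - 0
print("Cores freed:", cores_freed)
```

The computed ratios come out slightly above 10x for packet rate and about 2.6x for throughput, consistent with the rounded figures in the text.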

Without ASAP² technology, the massive compute capacity available in the AMD EPYC 7002 Series processors could remain untapped because the network cannot deliver traffic fast enough. Indeed, this proves the well-known adage that faster compute needs faster networks. NVIDIA Mellanox SmartNICs achieve great performance with AMD EPYC 7002 Series processors.

AMD EPYC 7002 Series processor and ConnectX-5 with DPDK delivers up to 25.8 Mpps for 64-byte packets.
Figure 4. Line rate and Mpps for the AMD EPYC 7002 Series processor–based server with ConnectX-5 PCIe Gen4 100G adapters and 12 cores.

In the final test case, we tested OvS performance for UDP-only traffic with all 12 CPU cores dedicated to OvS running over DPDK. Figure 4 shows the performance results for various packet sizes and the percentage of line rate achieved under the test methodology. At a 64-byte frame size, OvS was able to achieve 25.8 Mpps. This is amazing performance!

Summary

With the release of the industry’s first PCIe Gen4-capable x86 CPU, the AMD EPYC 7002 Series processor, AMD has revolutionized the computing industry, making massive compute capacity available for all kinds of workloads. The collaboration between NVIDIA Mellanox and AMD has been at the heart of this sea change.

Together with AMD EPYC 7002 Series processors, NVIDIA Mellanox SmartNICs are enabling smarter, better, and faster networking without compromising the efficiency of modern cloud-native data centers. Beyond the phenomenal benchmarking performance already demonstrated for HPC, storage, and cloud computing workloads, NVIDIA Mellanox has now also validated the great performance gained from the combination of AMD EPYC 7002 Series processors and ConnectX network adapters for telecommunications and service provider use cases.