ChatGPT, Stable Diffusion, DALL-E, and similar applications have awakened the world to generative AI. ChatGPT is the fastest-growing application in history. The ease of use and impressive capabilities have attracted over a hundred million users in just a few months.
Generative AI has created a sense of urgency for companies to reimagine their products and business models. As NVIDIA CEO Jensen Huang said in his GTC 2023 keynote, the iPhone moment of AI has arrived. NVIDIA accelerated computing is helping the world’s enterprises put AI to work by solving problems that are beyond the capacity of conventional computers.
NVIDIA BlueField DPUs power accelerated computing
The NVIDIA accelerated computing technology stack enables every industry to tap into the power of AI, delivering the performance, scale, and efficiency levels needed for running the next wave of applications.
Accelerated computing runs primarily on three foundational elements:
- CPUs that are used for serial processing and running hyperthreaded applications.
- GPUs that excel at parallel processing and are optimized for accelerating modern workloads.
- DPUs that are ideal for infrastructure computing tasks and are used to offload, accelerate, and isolate data center networking, storage, security, and manageability workloads.
In a modern software-defined data center, the software that runs virtualization, networking, storage, and security (the data center OS) can consume nearly half of the available CPU cores and their associated power. Data centers must accelerate every workload to reclaim that power and free CPUs for revenue-generating workloads.
NVIDIA BlueField data processing units (DPUs) offload and accelerate the data center OS and infrastructure software.
NVIDIA is integrating BlueField-3 across its data center computing systems, including the third generation of NVIDIA OVX systems for creating and operating NVIDIA Omniverse applications at data center scale. For more information, see Omniverse at Scale: NVIDIA Announces Third-Generation OVX Computing Systems to Power Industrial Metaverse Applications.
Leading enterprises using DPUs
At GTC 2023, Oracle Cloud Infrastructure (OCI) announced its plan to integrate NVIDIA BlueField-3 DPUs into its networking stack. The goal is to optimize data center performance by offloading networking and security tasks from the CPU to the DPU, resulting in faster and more efficient processing.
By leveraging the power of BlueField-3 DPUs, OCI is poised to enhance its infrastructure and provide customers with a seamless cloud experience. For more details, see Oracle Cloud Infrastructure Chooses NVIDIA BlueField Data Center Acceleration Platform.
In addition, more than two dozen ecosystem partners, including Check Point, Cisco, DDN, Dell EMC, Juniper, Palo Alto Networks, Red Hat, and VMware, use BlueField data center acceleration technology to run their software platforms more efficiently.
NVIDIA BlueField-3 platform overview
NVIDIA BlueField-3, with 22 billion transistors, is the third-generation NVIDIA DPU. It is a system-on-a-chip (SoC) device that delivers Ethernet and InfiniBand connectivity at up to 400 Gbps. Supporting up to four distinct MAC addresses, BlueField-3 can offer various port configurations, from a single port running at 400 Gbps (four lanes of 112G PAM4) to four ports running at 25, 50, or 100 Gbps.
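To make that range of port configurations concrete, the minimal Python sketch below checks whether a proposed set of ports fits on the four lanes. It assumes a simplified model that is not NVIDIA's published configuration matrix: each lane carries up to 100 Gbps, a 25, 50, or 100 Gbps port occupies one lane, a 200 Gbps port bonds two lanes, and a 400 Gbps port bonds all four. The names `lanes_needed` and `fits_lanes` are hypothetical.

```python
import math

# Simplified model (assumption): four SerDes lanes, each carrying up to
# 100 Gbps of Ethernet/InfiniBand traffic using 112G PAM4 signaling.
LANES = 4
LANE_GBPS = 100
MAX_PORTS = 4
SUPPORTED_SPEEDS_GBPS = {25, 50, 100, 200, 400}


def lanes_needed(speed_gbps: int) -> int:
    """Number of lanes one port at `speed_gbps` occupies in this model."""
    if speed_gbps not in SUPPORTED_SPEEDS_GBPS:
        raise ValueError(f"unsupported port speed: {speed_gbps} Gbps")
    # Sub-100G ports still occupy a whole lane; faster ports bond lanes.
    return max(1, math.ceil(speed_gbps / LANE_GBPS))


def fits_lanes(port_speeds_gbps: list[int]) -> bool:
    """True if the proposed ports fit within the lane and port-count limits."""
    if not 1 <= len(port_speeds_gbps) <= MAX_PORTS:
        return False
    return sum(lanes_needed(s) for s in port_speeds_gbps) <= LANES


if __name__ == "__main__":
    for config in ([400], [200, 200], [100, 100, 100, 100], [25, 50, 100], [400, 100]):
        print(config, "fits" if fits_lanes(config) else "does not fit")
```

Under this model, `[400, 100]` is rejected because it would need five lanes, while dual 200 Gbps ports and four 100 Gbps ports both fit.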
BlueField-3 has 2x the network bandwidth, 4x the compute power, and almost 5x the memory bandwidth compared to the previous generation—all while delivering full backward compatibility through the NVIDIA DOCA software framework.
These key advancements enable BlueField-3 to run workloads up to 8x faster while reducing TCO and improving data center energy efficiency. For example, BlueField-3 offloads HPC/AI MPI collective operations from the CPU, delivering a speedup of nearly 20%, which translates to $18 million in cost savings for large-scale supercomputers. For more information, see Take the Green Train: NVIDIA BlueField DPUs Drive Data Center Efficiency.
The BlueField-3 DPU consists of three major blocks:
- Networking: The latest-generation NVIDIA ConnectX-7 SmartNIC, with integrated networking and security hardware accelerators.
- Programmable compute: A powerful cluster of 16 Arm Cortex-A78 (Armv8.2) cores with a fully coherent, low-latency mesh interconnect, optimized for control-plane applications. Data-plane programmability is achieved through the accelerated pipeline and a new programmable Data Path Accelerator (DPA). The DPA is an I/O and packet processor consisting of 16 hyperthreaded cores, purpose-built for I/O-intensive, low-compute tasks such as device emulation, congestion control, and custom protocols.
- Memory: Dual 64-bit DDR5-5600 memory interfaces (80 GB/s of bandwidth) and an integrated 32-lane PCIe Gen 5.0 switch. The PCIe interface can be bifurcated and operated either as server-hosted (endpoint) or self-hosted (root complex), for example to manage a GPU or directly attached SSDs.
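As a rough check on what the integrated 32-lane PCIe Gen 5.0 switch can move, the Python sketch below computes raw per-direction link bandwidth from the Gen 5.0 signaling rate (32 GT/s per lane) and its 128b/130b line encoding. It ignores packet and protocol overhead (TLP headers, flow control), so real throughput is lower, and the bifurcation splits shown (two x16 links, or sixteen x2 links such as one might use for 16 SSDs) are illustrative rather than a list of configurations BlueField-3 is documented to support.

```python
# PCIe Gen 5.0 signaling: 32 GT/s per lane with 128b/130b line encoding.
GEN5_GT_PER_S = 32
ENCODING_EFFICIENCY = 128 / 130


def link_gbytes_per_s(lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s, before TLP/DLLP protocol overhead."""
    gbits_per_lane = GEN5_GT_PER_S * ENCODING_EFFICIENCY  # usable Gb/s per lane
    return lanes * gbits_per_lane / 8  # bits -> bytes


def bifurcate(total_lanes: int, link_width: int) -> list[float]:
    """Split `total_lanes` into equal links of `link_width` lanes each."""
    if link_width <= 0 or total_lanes % link_width:
        raise ValueError("total lanes must divide evenly into links")
    return [link_gbytes_per_s(link_width)] * (total_lanes // link_width)


if __name__ == "__main__":
    print(f"x32: {link_gbytes_per_s(32):.1f} GB/s per direction")
    for width in (16, 2):
        links = bifurcate(32, width)
        print(f"{len(links)} x{width} links at {links[0]:.1f} GB/s each")
```

With these assumptions the full x32 interface comes to roughly 126 GB/s per direction, and each x2 link to about 7.9 GB/s.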
Acting as a “server in front of a server,” BlueField-3 is the only DPU platform with an integrated ASPEED AST2600 baseboard management controller (BMC). The BlueField BMC is a dedicated processor that monitors the physical state of the DPU board and enables the system administrator to manage the platform through an independent connection. This enhances system security, reliability, availability, and serviceability.
The DPU BMC is a trusted entity with its own external root of trust to ensure that its firmware is secure. It enables provisioning and managing the BlueField DPU over a separate, out-of-band management network, using standard interfaces and the Redfish protocol to manage the full lifecycle of the DPU.
Some of the BMC functionalities include:
- Console interface access to the BlueField DPU
- Setting BlueField UEFI configuration
- Monitoring the BlueField DPU and its resources
- Updating and recovering the BlueField DPU firmware
- Reset control (even when BlueField OS is halted)
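Because the BMC speaks Redfish, operations such as those listed above map onto standard DMTF Redfish resources. The minimal Python sketch below, using only the standard library, extracts member URIs from a Redfish collection and builds (without sending) a `ComputerSystem.Reset` action request. The BMC address, credentials, and the system ID `Bluefield` are hypothetical placeholders; check the BlueField BMC documentation for the resource IDs and reset types it actually supports.

```python
import base64
import json
import urllib.request

# Hypothetical BMC endpoint and credentials; replace with real values.
BMC_URL = "https://192.0.2.10"
USER, PASSWORD = "admin", "changeme"

# Subset of ResetType values defined by the DMTF Redfish schema.
RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "GracefulRestart", "ForceRestart"}


def member_uris(collection: dict) -> list[str]:
    """Extract member URIs from a Redfish collection (e.g. /redfish/v1/Systems)."""
    return [m["@odata.id"] for m in collection.get("Members", [])]


def reset_request(system_uri: str, reset_type: str = "GracefulRestart") -> urllib.request.Request:
    """Build (but do not send) a ComputerSystem.Reset action request."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type}")
    body = json.dumps({"ResetType": reset_type}).encode()
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        f"{BMC_URL}{system_uri}/Actions/ComputerSystem.Reset",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    # Example data shaped like a Redfish /redfish/v1/Systems response.
    systems = {"Members": [{"@odata.id": "/redfish/v1/Systems/Bluefield"}]}
    req = reset_request(member_uris(systems)[0])
    print(req.get_method(), req.full_url)
    # Sending it requires a reachable BMC: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the URI and payload logic testable without hardware.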
BlueField-3 comprehensive portfolio
NVIDIA offers a broad range of BlueField-3 platforms designed to meet the unique computing, memory, and performance needs of various industries and use cases. This enables customers to choose the right BlueField-3 product that matches their specific requirements, while enjoying advanced features and cutting-edge performance.
Target markets and flagship platforms
BlueField-3 DPUs are used across several key flagship platforms and target markets, as detailed below.
Hyperscale HPC / AI
HPC and AI workloads are the first to embrace network speeds of 400 Gbps (NDR InfiniBand and 400 GbE), because HPC is all about maximum performance and immense scale. BlueField extends NVIDIA in-network computing capabilities by using its Arm cores to offload elements of the Message Passing Interface (MPI) library from the host CPU and to implement nonblocking collective operations. This lets the host CPU keep computing while communication progresses, maximizing the overlap between computation and communication.
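The benefit of offloading collective progress can be illustrated with a simple timing model. This is a back-of-the-envelope Python sketch with made-up inputs, not a measurement: it assumes that without offload the host CPU must drive the collective itself, so communication and computation effectively serialize, while with offload the DPU progresses the collective and only the longer of the two phases is visible.

```python
def step_time(compute: float, comm: float, offloaded: bool) -> float:
    """Wall-clock time of one compute + collective step in a toy model.

    Times may be in any consistent unit. Without offload, the host CPU is
    assumed to spend its own cycles progressing the collective, so the two
    phases run back to back. With offload, the DPU progresses the collective
    while the host computes, so only the longer phase is visible.
    """
    if compute < 0 or comm < 0:
        raise ValueError("times must be non-negative")
    return max(compute, comm) if offloaded else compute + comm


def overlap_fraction(compute: float, comm: float) -> float:
    """Fraction of communication time hidden behind computation when offloaded."""
    if comm == 0:
        return 1.0
    hidden = step_time(compute, comm, offloaded=False) - step_time(compute, comm, offloaded=True)
    return hidden / comm


if __name__ == "__main__":
    compute, comm = 8.0, 2.0  # hypothetical per-iteration times in milliseconds
    print(step_time(compute, comm, offloaded=False))  # 10.0
    print(step_time(compute, comm, offloaded=True))   # 8.0
    print(overlap_fraction(compute, comm))            # 1.0
```

When communication takes longer than computation (for example 1.0 vs 4.0), only part of it can be hidden, and the overlap fraction drops to 0.25.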
B3240: Boasts the performance and network capabilities to address the most challenging hyperscale HPC/AI needs. This BlueField-3 platform powers systems such as NVIDIA DGX H100 for scientific research and generative AI workloads. It provides dual 400 Gbps NDR connectivity, a 32 GB DDR5 memory subsystem, and Arm cores running at 2.3 GHz.
B3140H: Features a half-height, half-length (HHHL) form factor, making it compatible with most enterprise servers. It offers a single 400 Gbps port and 16 GB of DDR5 memory while operating within a low power envelope, making it an ideal choice for HPC/AI environments that require scalable performance under tight space or power constraints.
The rapid growth of the cloud industry requires cloud providers to continually innovate and tailor their service offerings to meet customer demand. Modern cloud platforms use hypervisor-based virtualization to maximize the number of virtual instances allocated to tenants at both the compute and data center levels. BlueField-3, which supports up to 4,096 virtual functions, enables cloud providers to host 4-8x more virtual instances on a cloud compute platform than the previous generation.
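Virtual functions are exposed through standard PCIe SR-IOV, and on Linux a physical function's VF count is normally controlled through the `sriov_totalvfs` and `sriov_numvfs` sysfs attributes. The minimal Python sketch below follows that generic kernel mechanism, not any BlueField-specific tool. The interface name `eth0` is a hypothetical placeholder, the `sysfs_root` parameter is an addition so the logic can run against a scratch directory, and the real VF limit depends on device and firmware configuration.

```python
import tempfile
from pathlib import Path


def set_num_vfs(ifname: str, num_vfs: int, sysfs_root: str = "/sys") -> None:
    """Request `num_vfs` SR-IOV virtual functions on network interface `ifname`.

    Uses the generic Linux attributes under <sysfs_root>/class/net/<ifname>/device/.
    Writing to the real /sys requires root privileges.
    """
    dev = Path(sysfs_root) / "class" / "net" / ifname / "device"
    total = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{ifname} supports 0..{total} VFs, got {num_vfs}")
    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current == num_vfs:
        return  # already configured
    # The kernel refuses to change one non-zero VF count to another directly
    # (it returns EBUSY), so disable VFs first.
    if current != 0 and num_vfs != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))


if __name__ == "__main__":
    # Simulate a sysfs tree so the sketch runs without hardware or root.
    with tempfile.TemporaryDirectory() as root:
        dev = Path(root, "class", "net", "eth0", "device")
        dev.mkdir(parents=True)
        (dev / "sriov_totalvfs").write_text("4096\n")
        (dev / "sriov_numvfs").write_text("8\n")
        set_num_vfs("eth0", 64, sysfs_root=root)
        print((dev / "sriov_numvfs").read_text())  # 64
```

Validating against `sriov_totalvfs` before writing gives a clear error instead of relying on the kernel to reject an out-of-range count.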
B3220: With dual 200 Gbps ports, a 32 GB DDR5 memory subsystem, and Arm cores running at 2.3 GHz, the B3220 has the performance and network capabilities to serve the most challenging cloud needs. This is why hyperscaler Oracle Cloud Infrastructure (OCI) has added BlueField-3 to its networking stack, aiming to provide state-of-the-art, sustainable cloud infrastructure with extreme performance. The B3220 platform also powers NVIDIA OVX 3.0 systems, enabling higher performance, zero-trust security, and limitless scaling of industrial metaverse applications in the cloud.
B3210: At 100 Gbps, the B3210 is the best fit for the demands of leading enterprise data centers. It is the target DPU for running the VMware vSphere enterprise workload platform, improving performance, efficiency, and security for thousands of companies.
B3220SH: The B3220SH self-hosted platform is optimized for NVMe storage systems, with integrated accelerators for NVMe-oF, NVMe/TCP, and data-at-rest encryption. The B3220SH can host up to 16 SSDs using its x32 PCIe Gen 5.0 interface.
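The portfolio above can be captured as a small lookup table. This Python sketch records only the specifications stated in this post (anything the post does not give is left as `None`) and filters models against minimum requirements. It is an illustration, not a substitute for the BlueField-3 datasheet, and the `Dpu` fields and the `select` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dpu:
    model: str
    port_gbps: Optional[int] = None   # per-port speed stated in the post
    ports: Optional[int] = None       # None where the post does not say
    memory_gb: Optional[int] = None
    hhhl: Optional[bool] = None       # half-height, half-length form factor
    self_hosted: Optional[bool] = None


# Only the figures stated in this post; unknown fields stay None.
PORTFOLIO = [
    Dpu("B3240", port_gbps=400, ports=2, memory_gb=32),
    Dpu("B3140H", port_gbps=400, ports=1, memory_gb=16, hhhl=True),
    Dpu("B3220", port_gbps=200, ports=2, memory_gb=32),
    Dpu("B3210", port_gbps=100),
    Dpu("B3220SH", self_hosted=True),
]


def matches(d: Dpu, min_port_gbps: int = 0, min_memory_gb: int = 0,
            need_hhhl: bool = False, need_self_hosted: bool = False) -> bool:
    """True only if every requested requirement is known to be met."""
    if min_port_gbps and (d.port_gbps is None or d.port_gbps < min_port_gbps):
        return False
    if min_memory_gb and (d.memory_gb is None or d.memory_gb < min_memory_gb):
        return False
    if need_hhhl and d.hhhl is not True:
        return False
    if need_self_hosted and d.self_hosted is not True:
        return False
    return True


def select(**requirements) -> list[str]:
    """Names of portfolio models that satisfy the given requirements."""
    return [d.model for d in PORTFOLIO if matches(d, **requirements)]


if __name__ == "__main__":
    print(select(min_port_gbps=400))                    # ['B3240', 'B3140H']
    print(select(min_port_gbps=200, min_memory_gb=32))  # ['B3240', 'B3220']
    print(select(need_hhhl=True))                       # ['B3140H']
```

Treating unknown values as "not met" keeps the filter conservative: a model is only returned when the post actually states that it satisfies the requirement.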
Industry-leading network performance
BlueField-3 offers significant performance improvements over its predecessor, making it an ideal solution for data-intensive AI workloads that require high-performance networking. Figure 3 shows benchmark results that attest to the leading BlueField-3 network performance.
Power your applications with NVIDIA BlueField-3 DPUs
Modern workloads such as generative AI, data science, and metaverse applications are booming, at a time when cloud dominates enterprise IT. To address the skyrocketing demand for AI, cloud builders are turning to NVIDIA accelerated computing—primarily GPUs and DPUs.
NVIDIA BlueField-3 DPUs, powered by NVIDIA DOCA software, transform traditional computing environments into efficient, high-performance, secure, and sustainable data centers, enabling the delivery of the next wave of applications. For more information, check out the BlueField-3 datasheet and Networking Resources.