Choosing NVIDIA Spectrum for Microsoft Azure SONiC

Everyone agrees that open solutions are the best solutions but, there are few truly open operating systems for Ethernet switches. At NVIDIA, we embraced open source for our Ethernet switches. Besides supporting SONiC, we have contributed many innovations to open-source community projects.

This post was originally published on the Mellanox blog in June 2018 but has been updated.

Microsoft runs one of the largest clouds in the world with Azure. In building and deploying Azure, they have gained a lot of insight into managing a global, high-performance, highly available, and secure network.

The network operating system (NOS) Microsoft uses for Azure, SONiC (Software for Open Networking in the Cloud), is built on open source. Their experience with hundreds of data centers and tens of thousands of switches has educated them about what is required:

  • Use best-of-breed switching hardware.
  • Ensure that deploying new features won’t affect end users.
  • Updates must be released securely and reliably across the globe within hours.
  • Use cloud-scale deep telemetry and automation for failure mitigation.
  • Enable software-defined networking to quickly provision and manage hardware elements in the network through a unified structure to eliminate duplication and reduce failures.

SONiC, a breakthrough for network switch operations and management, addresses these requirements. Microsoft open-sourced this innovation to the community, making it available on their SONiC GitHub repository.

SONiC is a uniquely extensible platform with a large and growing ecosystem of hardware and software partners that offers multiple switching platforms and various software components.

SONiC system’s architecture comprises multiple modules that interact with each other through a centralized and scalable infrastructure. This infrastructure relies on a Redis-database engine which allows data persistence, replication, and multi-process communication among all SONiC subsystems.

The Redis-engine infrastructure relies on a messaging paradigm of publisher/subscriber so that applications can subscribe only to the data views that they require, avoiding implementation details irrelevant to their functionality.

Diagram shows the configuration and management tools plus network applications working on the SONic base.
Figure 1. SONiC architecture

For more information about the SONiC architecture, see Architecture in the SONiC wiki.

NVIDIA Spectrum switches support a variety of Layer 2 and Layer 3 networking connectivity and management features. Table 1 shows the features that SONiC currently supports.

Table 1. Currently supported features

Why should you use NVIDIA Spectrum Switch with SONiC?

When choosing a switch to run SONiC on top, you should look at two main factors:

  • Is the switch vendor capable of supporting your deployment, ASIC, Switch Abstraction Interface (SAI), and software-wise?
  • What are the capabilities of the ASIC running underneath?

NVIDIA Spectrum ASIC-based switches

The NVIDIA Open Ethernet Switch portfolio is entirely based on the Spectrum ASIC, providing the lowest latency for 25G/100G in the market, zero packet loss, and a fully shared buffer. It is the ideal combination for cloud networking demands.

SONiC works with the Spectrum ASICs through their unique driver solutions. SONiC uses SAI, an open-source driver solution co-invented by NVIDIA. This open capability of Spectrum also means that any Linux distribution can run on a Spectrum switch.

NVIDIA is the only switch silicon vendor that has contributed their ASIC driver directly to the Linux kernel, enabling support for a mix of SONiC and any standard Linux distributions, like Red Hat or Ubuntu, to run directly on the switch.

Image shows multiple company logos  under sections labeled Application & Management Tools, SONiC, and SAI.
Figure 2. The SONiC development community

NVIDIA is the only company participating in all levels of the SONiC development community. We are one of the first companies to develop and adopt SAI. SONiC fully supports all Spectrum family switches and can be deployed on any switch in our Ethernet portfolio. We are also a major and active contributor to the SONiC OS feature set.

Pictures of SN2700, SN2410, and SN2100 switches.
Figure 3. NVIDIA switches

All NVIDIA networking platforms support port splitting through the SONiC OS, the only platforms that currently support this feature. Spectrum switches also deliver exceptional network performance compared to a commodity silicon-based switch using real-life mixed frame size, “noisy neighbor,” and microburst absorption scenarios.

For more information about the fundamental differences between NVIDIA Spectrum and Broadcom Tomahawk-based switches, and our unmatched ASIC performance, see Tolly Performance Evaluation: NVIDIA Spectrum-3 Ethernet Switch.

NVIDIA Spectrum switch systems are an ideal spine and top-of-rack solution, allowing flexibility, with port speeds ranging from 10 Gb/s to 100 Gb/s per port, and port density that enables full rack connectivity to every server at any speed. These ONIE-based switch platforms support multiple operating systems, including SONiC and leverage the advantages of Open Network disaggregation and the NVIDIA Spectrum ASIC capabilities.

Spectrum adaptive routing technology supports various network topologies. For typical topologies such as CLOS (or leaf/spine), the distance of the multiple paths to a given destination is the same. Therefore, the switch transmits the packets through the least congested port.

In other topologies where distances vary between paths, the switch prefers to send the traffic over the shortest path. If congestion occurs on the shortest path, then the least-congested alternative paths are selected. You can build a high-performing CLOS data center using the NVIDIA switches as your building blocks.

Similarly, Border Gateway Protocol (BGP) is a routing protocol responsible for looking at all the available paths that data could travel and picking the best route. BGP enables communication to happen quickly and efficiently.

Diagram shows 32 switches linked with pods by eBGP. Layer 3 ECMP, all links active/active, with very small fault domains.
Figure 4. Typical leaf-spine pod design with BGP as the routing protocol

Spectrum switches enable PODs. A POD is a network, storage, and compute unit that works together to deliver networking services. A POD is a repeatable design pattern that provides scalable and easier-to-manage data centers.

Diagram shows switches linking to multiple clusters and pods.
Figure 5. Scaling to multiple PODs

Finally, the Spectrum family enables a set of advanced network functions that future-proof the switch with the flexibility to handle evolving networking technologies. This includes new protocols that may be developed in the future, enabling custom applications, advanced telemetry, and new tunneling/overlay capabilities. Spectrum combines a programmable, flexible, and massively parallel packet processing pipeline with a fully shared and stateful forwarding database. Spectrum also features What Just Happened (WJH), the world’s most useful switch telemetry technology.

For more information, see the following resources:

Discuss (0)