Why There Is No Ideal Data Center Network Design

Network administrators have a hard job. They are responsible for ensuring connectivity for all users, servers, and applications on their networks. They are often tasked with building a network design before getting application requirements, making a challenging project even more difficult. In these scenarios, it’s logical for networking admins to try to find an ideal network design they can use with any set of applications.

However, there is no one-size-fits-all network solution that will work every time, and every design has benefits and drawbacks. In this post, we analyze three different network types that could be perceived as ideal. Then, we describe where each falls short, based on real-world factors.

The candidates are:

Pure layer 3
Layer 2 only
Overlay with VXLAN and EVPN

Ready? Let’s get started.

Pure layer 3 design

Many forward-thinking architects think pure layer 3 (L3) is the ideal design due to its simplicity and reliance on only one protocol stack. All traffic is routed and balanced at the L3 level using equal-cost multipath (ECMP), and endpoint redundancy is achieved natively through anycast addressing. It’s simple and elegant.
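To make the ECMP idea concrete, here is a minimal sketch of hash-based next-hop selection. It assumes a leaf switch with four equal-cost uplinks; the spine names, the sample flows, and the use of SHA-256 as the hash are illustrative assumptions, not how any particular switch implements it.

```python
import hashlib
from collections import Counter

# Hypothetical spine switches reachable at equal cost from one leaf.
SPINES = ["spine1", "spine2", "spine3", "spine4"]

def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop by hashing the flow's 5-tuple.

    Hashing keeps every packet of a flow on the same path (no reordering)
    while spreading different flows across all available paths.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

def spread(flows, next_hops):
    """Count how many flows land on each next hop."""
    return Counter(ecmp_next_hop(flow, next_hops) for flow in flows)

# Illustrative flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 443) for i in range(1000)]
distribution = spread(flows, SPINES)
```

Printing `distribution` shows how the 1,000 sample flows are shared among the uplinks. Real hardware uses much cheaper hash functions, but the principle is the same.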

Many large web-scale IT companies choose it for its excellent operational efficiency. These companies also have robust control over their application environment, so they can design applications that work within this design.

Applications that rely on network overlays or pure routing are optimized for the L3 architecture. Whether using a container-based solution that leverages routing as its mechanism to provide access to the environment, or a Container Network Interface to encapsulate the container-to-container communication, these solutions work great on this architecture.

The advent of SmartNICs and DPUs makes L3 more user-friendly by providing host-based solutions to offload resource-intensive tasks such as storing routing tables, performing packet encapsulation, and doing NAT.

The biggest drawback with L3 is that it doesn’t allow layer 2 (L2) adjacency to extend beyond a single host. Over time, most enterprises must introduce an application that requires L2 adjacency, either within or between racks. Historically, developers have not reliably written their applications to handle clustering using L3 capabilities. Instead of using DNS or other L3 discovery processes, many legacy applications use L2 broadcast domains to discover and detect nodes to join the cluster. A pure L3 solution struggles to service software that requires such an environment because each L2 domain is limited to one node or one server.
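The discovery problem can be shown with a toy model. This sketch assumes a three-node cluster whose members find each other by broadcast; the node names and domain labels are hypothetical, and the model ignores everything except which nodes share an L2 domain.

```python
def broadcast_discovery(node, l2_domain_of):
    """Return the peers a broadcast from `node` can reach.

    A broadcast frame never crosses a router, so only nodes in the
    sender's own L2 domain can answer it.
    """
    domain = l2_domain_of[node]
    return sorted(
        peer
        for peer, peer_domain in l2_domain_of.items()
        if peer_domain == domain and peer != node
    )

# L2 adjacency available: all cluster nodes share one VLAN.
shared_vlan = {"db1": "vlan10", "db2": "vlan10", "db3": "vlan10"}

# Pure L3: every server sits in its own L2 domain behind a routed port.
pure_l3 = {"db1": "host-db1", "db2": "host-db2", "db3": "host-db3"}

print(broadcast_discovery("db1", shared_vlan))  # ['db2', 'db3']
print(broadcast_discovery("db1", pure_l3))      # []
```

In the pure L3 case the broadcast reaches no one, so a cluster that depends on it never forms unless the application is changed to use DNS or another L3 discovery mechanism.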

Layer 2 only design

The L2-only solution is the opposite of pure L3. It primarily leverages VLANs to segregate connectivity and relies on legacy features like MLAG and spanning tree protocol (STP) to provide a distributed solution. The L2-only solution still has a place in network environments, typically in simple, static environments that don’t require scale.

People are comfortable with L2 because it uses tried-and-true technologies that most network engineers already know. Its protocol stack is simple, with all forwarding decisions based on only the first two layers of the OSI model. Also, most low-cost network devices on the market support these feature sets.

However, L2 has gaps in scale and performance. Relying on STP across three tiers to prevent loops leads to inefficient use of redundant paths, because STP blocks them until a failure occurs. To work around this limitation, you can try deploying back-to-back MLAG. However, MLAG is not as efficient as a pure layer 3 solution at handling device failures and synchronizing control planes. L2 networks also flood broadcast and multicast traffic across the entire broadcast domain, which limits how far they can scale. These are just a few limitations that create a hidden cost of ownership around deploying an L2-only design.
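A small sketch can show why STP wastes redundant links. It assumes a hypothetical three-tier topology (two core, two aggregation, two access switches) and approximates STP with a breadth-first tree from the root bridge. Real STP elects the root and chooses ports using bridge IDs and path costs, so treat this as a simplification that only counts links.

```python
from collections import deque

# Hypothetical three-tier topology with fully redundant uplinks.
LINKS = [
    ("core1", "agg1"), ("core1", "agg2"),
    ("core2", "agg1"), ("core2", "agg2"),
    ("agg1", "acc1"), ("agg1", "acc2"),
    ("agg2", "acc1"), ("agg2", "acc2"),
]

def spanning_tree(links, root):
    """Return the links left forwarding after building a loop-free tree.

    Simplification: a breadth-first tree from the root stands in for the
    tree STP would converge on when all links have equal cost.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((node, peer)))
                queue.append(peer)
    return forwarding

forwarding = spanning_tree(LINKS, root="core1")
blocked = len(LINKS) - len(forwarding)
print(f"{len(forwarding)} links forwarding, {blocked} blocked")
# 5 links forwarding, 3 blocked
```

A loop-free tree over six switches can use only five links, so three of the eight links carry no traffic until something fails. A routed design with equal-cost multipath could use all eight at once.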

Overlay design: VXLAN and EVPN

The most common design in the enterprise data center uses VXLAN as the transport layer encapsulation technology, with EVPN as the control plane technology. This architecture provides the greatest flexibility, with all the benefits of a pure layer 3 solution, and gives the network administrator the adaptability to support applications that require L2 to function.
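For readers who have not looked inside the encapsulation, here is a minimal sketch of the 8-byte VXLAN header as laid out in RFC 7348: one flags byte with the "VNI valid" bit set, 24 reserved bits, a 24-bit VXLAN network identifier (VNI), and 8 more reserved bits. The function names and the sample VNI are illustrative assumptions.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid

# Outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8), no outer VLAN tag.
VXLAN_OVERHEAD_IPV4 = 14 + 20 + 8 + 8

def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def unpack_vni(header):
    """Extract the VNI from a VXLAN header, checking the valid flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8

header = pack_vxlan_header(10100)  # 10100 is an arbitrary example VNI
print(header.hex())                # 0800000000277400
print(unpack_vni(header))          # 10100
```

The 24-bit VNI allows roughly 16 million segments, compared with the 12-bit VLAN ID used in an L2-only design. The cost is about 50 bytes of extra headers per packet over an IPv4 underlay, which the underlay MTU has to accommodate.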

It provides the benefits of L2 adjacency without introducing inefficient protocols such as STP and MLAG. Leveraging EVPN as the L2 control plane and multihoming as the optimal alternative to MLAG, overlay solutions solve many inefficiencies with L2.

A one-size-fits-all solution like VXLAN and EVPN could be thought of as ideal, but even this has drawbacks. Its detractors point to the multiple layered protocols required to make it operate. The solution builds on a BGP-enabled underlay with EVPN configured between the tunnel endpoints. VXLAN tunnels are configured on top of the underlay with varying levels of complexity depending on tenancy requirements. This may include integrating with VRFs, introducing L3 VNIs for intersubnet communication, and relying on border leafs for intertenant communication through VRF route leaking. Combining all these technologies can create a level of complexity that makes troubleshooting and operations difficult.
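One way to picture the troubleshooting burden is as a dependency chain that has to be checked from the bottom up. The sketch below is a simplified, hypothetical model: the layer names and the health dictionary are assumptions for illustration and do not correspond to any vendor's tooling.

```python
# Layers in dependency order, lowest first (simplified, hypothetical model).
OVERLAY_STACK = [
    "underlay BGP session",
    "EVPN address family",
    "VXLAN tunnel",
    "VRF and L3 VNI mapping",
]

def first_broken_layer(health):
    """Return the lowest layer reported unhealthy, or None if all are up.

    Each layer depends on the ones below it, so a fault low in the stack
    surfaces as symptoms in every layer above.
    """
    for layer in OVERLAY_STACK:
        if not health.get(layer, False):
            return layer
    return None

# Example: tenant traffic fails, but the root cause sits two layers down.
health = {
    "underlay BGP session": True,
    "EVPN address family": False,
    "VXLAN tunnel": False,
    "VRF and L3 VNI mapping": False,
}
print(first_broken_layer(health))  # EVPN address family
```

In a pure L3 or L2-only design this list would have one or two entries. Here an operator has to reason about four interacting layers before finding the actual fault.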

Conclusion

Everything has tradeoffs, whether you’re accepting operational complexity in exchange for flexibility or constraining your applications in exchange for network simplicity. The upside of accepting that there is no perfect network design is that you are now free to pick and choose the architectures and workflows that best fit your network. Work with your application and infrastructure teams to identify the server requirements, optimize your workflows, and select the best solutions for your applications’ needs.

Learn about NVIDIA Networking Solutions.