Is EVPN magic? As Arthur C. Clarke said, any sufficiently advanced technology is indistinguishable from magic. On that premise, moving from a traditional layer 2 environment to VXLAN driven by EVPN has much of that same hocus-pocus feeling.
To demystify the sorcery, I aim to help users new to EVPN understand how EVPN works and how its control plane converges. In this post, I start with the basic layer 2 (L2) building blocks, then work my way up to layer 3 (L3) connectivity and the control plane.
I use the reference topology as the cable plan and as the foundation for understanding the traffic flow. The infrastructure is a symmetric-mode EVPN environment with distributed gateways. All configurations are standardized using production-ready automation and are available in the publicly available cumulus_ansible_modules GitLab repo.
To follow along, build your own Cumulus in the Cloud environment, then clone the repo and deploy the following playbook:
~$ git clone https://gitlab.com/cumulus-consulting/goldenturtle/cumulus_ansible_modules.git
Cloning into 'cumulus_ansible_modules'...
remote: Enumerating objects: 822, done.
remote: Counting objects: 100% (822/822), done.
remote: Compressing objects: 100% (374/374), done.
remote: Total 4777 (delta 416), reused 714 (delta 340), pack-reused 3955
Receiving objects: 100% (4777/4777), 4.64 MiB | 22.64 MiB/s, done.
Resolving deltas: 100% (2121/2121), done.
~$
~$ cd cumulus_ansible_modules/
~/cumulus_ansible_modules$ ansible-playbook -i inventories/evpn_symmetric/host playbooks/deploy.yml
EVPN message types
Like any good protocol, EVPN has a robust process for exchanging information with its peers: message types. If you already know OSPF and its link-state advertisement (LSA) types, EVPN message types are a similar idea: each type carries a different kind of information about the EVPN traffic flow.
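The core EVPN specifications define five message types, more formally called route types, numbered 1 through 5, and later extensions add more. In this post, I focus on the most widely used one, type 2, which comes in two forms: MAC-only and MAC/IP.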
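For reference, the numbers map to names defined in the EVPN standards: types 1 through 4 come from RFC 7432 and type 5 from RFC 9136. The following minimal Python sketch lists them and reads the type number from the bracketed prefix format that FRR prints in net show bgp l2vpn evpn route output, which appears later in this post. The route_type_of helper is a hypothetical illustration, not part of Cumulus Linux or FRR, and the type 5 prefix in the example is illustrative.

```python
import re
from enum import IntEnum


class EvpnRouteType(IntEnum):
    """Core EVPN route types: 1-4 from RFC 7432, 5 from RFC 9136."""
    ETHERNET_AUTO_DISCOVERY = 1
    MAC_IP_ADVERTISEMENT = 2
    INCLUSIVE_MULTICAST_ETHERNET_TAG = 3
    ETHERNET_SEGMENT = 4
    IP_PREFIX = 5


def route_type_of(prefix: str) -> EvpnRouteType:
    """Read the route type from the first bracketed field of an FRR-style
    EVPN prefix, for example "[2]:[0]:[48]:[44:38:39:00:00:32]"."""
    match = re.match(r"\[(\d+)\]:", prefix)
    if match is None:
        raise ValueError(f"not an EVPN prefix: {prefix!r}")
    # EvpnRouteType() raises ValueError for numbers outside 1-5
    # (types added by later extensions aren't modeled here).
    return EvpnRouteType(int(match.group(1)))


for prefix in ("[2]:[0]:[48]:[44:38:39:00:00:32]", "[5]:[0]:[24]:[10.1.30.0]"):
    print(prefix, "->", route_type_of(prefix).name)
```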
Digging into EVPN message types: Type 2
The easiest EVPN messages to understand are type 2. As mentioned earlier, type 2 routes contain MAC and MAC/IP mappings. To see a type 2 entry at work, start by verifying basic connectivity from leaf01 to server01.
First, look at the bridge table to make sure that the server's MAC address is mapped to the correct port on the switch.
Get the Server01 MAC address:
cumulus@server01:~$ ip address show
...
5: uplink: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 44:38:39:00:00:32 brd ff:ff:ff:ff:ff:ff
    inet 10.1.10.101/24 scope global uplink
       valid_lft forever preferred_lft forever
    inet6 fe80::4638:39ff:fe00:32/64 scope link
       valid_lft forever preferred_lft forever
Look at leaf01’s bridge table to make sure the MAC address is mapped to the port that you expect, and cross-reference it with LLDP:
cumulus@leaf01:mgmt:~$ net show bridge macs

VLAN      Master  Interface  MAC                TunnelDest  State      Flags               LastSeen
--------  ------  ---------  -----------------  ----------  ---------  ------------------  --------
...
10        bridge  bond1      44:38:39:00:00:32                                             <1 sec

cumulus@leaf01:mgmt:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost           RemotePort
---------  -----  ----------  -------------------  -----------------
eth0       1G     Mgmt        oob-mgmt-switch      swp10
swp1       1G     BondMember  server01.simulation  44:38:39:00:00:32
swp2       1G     BondMember  server02             44:38:39:00:00:34
swp3       1G     BondMember  server03             44:38:39:00:00:36
swp49      1G     BondMember  leaf02               swp49
swp50      1G     BondMember  leaf02               swp50
swp51      1G     Default     spine01              swp1
swp52      1G     Default     spine02              swp1
swp53      1G     Default     spine03              swp1
swp54      1G     Default     spine04              swp1

Checking the ARP table, you can validate that the MAC and IP addresses are mapped correctly.

cumulus@leaf01:mgmt:~$ net show neighbor
Neighbor                   MAC                Interface      AF    STATE
-------------------------  -----------------  -------------  ----  ---------
...
10.1.10.101                44:38:39:00:00:32  vlan10         IPv4  REACHABLE
...
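The cross-check you just did by eye can be sketched in Python: resolve the server's IP address to a MAC address through the neighbor (ARP) table, then find the VLAN and port for that MAC in the bridge table. The sample rows are hand-copied from the outputs above and assume server01's MAC is learned on bond1 in VLAN 10. The Neighbor and BridgeMac types and the locate_host helper are hypothetical, not part of the net command set or any Cumulus library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Neighbor:
    """One row of 'net show neighbor' (the ARP/ND table)."""
    ip: str
    mac: str
    interface: str  # SVI the entry was learned on, e.g. vlan10


@dataclass(frozen=True)
class BridgeMac:
    """One row of 'net show bridge macs'."""
    vlan: int
    interface: str  # port or bond the MAC was learned on, e.g. bond1
    mac: str


def locate_host(ip, neighbors, bridge_macs) -> Optional[Tuple[str, int, str]]:
    """Follow IP -> MAC in the neighbor table, then MAC -> (VLAN, port) in
    the bridge table. Returns (mac, vlan, port), or None if a lookup fails."""
    mac = next((n.mac for n in neighbors if n.ip == ip), None)
    if mac is None:
        return None
    entry = next((b for b in bridge_macs if b.mac.lower() == mac.lower()), None)
    if entry is None:
        return None
    return mac, entry.vlan, entry.interface


# Sample rows copied by hand from the outputs above (assumption: server01's
# MAC is learned on bond1 in VLAN 10).
neighbors = [Neighbor("10.1.10.101", "44:38:39:00:00:32", "vlan10")]
bridge_macs = [BridgeMac(10, "bond1", "44:38:39:00:00:32")]

print(locate_host("10.1.10.101", neighbors, bridge_macs))
# ('44:38:39:00:00:32', 10, 'bond1')
```

Returning None instead of raising keeps the helper usable in a loop over many hosts, where a missing entry is a finding to report rather than an error.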
Now that you’ve checked the basics, start looking at how this gets pulled into EVPN. Validate the local VNIs that are configured:
cumulus@leaf01:mgmt:~$ net show evpn vni
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
20         L2   vni20                 9        2        1               RED
30         L2   vni30                 10       2        1               BLUE
10         L2   vni10                 11       4        1               RED
4001       L3   vniRED                2        2        n/a             RED
4002       L3   vniBLUE               1        1        n/a             BLUE
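If you'd rather check this table from a script than by eye, here's a minimal Python sketch that parses the rows and groups the L2 VNIs by tenant VRF. It assumes the plain-text column layout shown above (seven whitespace-separated fields per data row), which may differ between releases; parse_vni_table and VniRow are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VniRow:
    vni: int
    kind: str          # "L2" or "L3"
    vxlan_if: str
    macs: int
    arps: int
    remote_vteps: str  # a count for L2 VNIs, "n/a" for L3 VNIs
    tenant_vrf: str


def parse_vni_table(text):
    """Parse the data rows of 'net show evpn vni' output.

    Header lines are skipped because they don't split into seven fields
    starting with a number.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        vni, kind, vxlan_if, macs, arps, remote, vrf = fields
        rows.append(VniRow(int(vni), kind, vxlan_if, int(macs), int(arps), remote, vrf))
    return rows


SAMPLE = """\
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
20         L2   vni20                 9        2        1               RED
30         L2   vni30                 10       2        1               BLUE
10         L2   vni10                 11       4        1               RED
4001       L3   vniRED                2        2        n/a             RED
4002       L3   vniBLUE               1        1        n/a             BLUE
"""

rows = parse_vni_table(SAMPLE)
for vrf in sorted({r.tenant_vrf for r in rows}):
    l2 = sorted(r.vni for r in rows if r.kind == "L2" and r.tenant_vrf == vrf)
    l3 = [r.vni for r in rows if r.kind == "L3" and r.tenant_vrf == vrf]
    print(vrf, "L2 VNIs:", l2, "L3 VNI:", l3)
```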
Because the bridge MAC table showed server01 in VLAN 10, which maps to VNI 10, next check whether the IP neighbor entries are being pulled into the EVPN ARP cache for that VNI. This cache holds the MAC/IP information that leaf01 exchanges with the other EVPN speakers in the environment.
cumulus@leaf01:mgmt:~$ net show evpn arp-cache vni 10
Number of ARPs (local and remote) known for this VNI: 4
Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor        Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
...
10.1.10.101     local        active   44:38:39:00:00:32                                0/0
10.1.10.104     remote       active   44:38:39:00:00:3e 10.0.1.34
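The local/remote split in this cache is what the control plane acts on: local entries are the neighbors leaf01 advertises as type 2 MAC/IP routes, and remote entries were learned from another VTEP. Here's a minimal Python sketch that separates the two, assuming the column layout shown above. Because the Flags and Seq columns can be empty or populated, it finds the MAC and the VTEP address by their format rather than by column position. The classify_arp_cache helper is hypothetical.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$", re.IGNORECASE)


def _is_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


def classify_arp_cache(text):
    """Split 'net show evpn arp-cache vni N' rows into local and remote entries.

    Returns (local, remote):
      local:  {ip: mac} for neighbors learned on this leaf
      remote: {ip: (mac, vtep)} for neighbors learned from another VTEP
    """
    local, remote = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] not in ("local", "remote"):
            continue  # banner, header, or "..." line
        ip, kind = fields[0], fields[1]
        mac_pos = next((i for i, f in enumerate(fields) if MAC_RE.match(f)), None)
        if mac_pos is None:
            continue
        mac = fields[mac_pos]
        if kind == "local":
            local[ip] = mac
        else:
            # The remote VTEP is the first IPv4 address after the MAC column.
            vtep = next((f for f in fields[mac_pos + 1:] if _is_ipv4(f)), None)
            remote[ip] = (mac, vtep)
    return local, remote


SAMPLE = """\
Number of ARPs (local and remote) known for this VNI: 4
Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor        Type   Flags State    MAC               Remote ES/VTEP                 Seq #'s
10.1.10.101     local        active   44:38:39:00:00:32                                0/0
10.1.10.104     remote       active   44:38:39:00:00:3e 10.0.1.34
"""

local, remote = classify_arp_cache(SAMPLE)
print("advertised by this leaf:", local)
print("learned from remote VTEPs:", remote)
```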
Here’s what you’ve got so far. L2 connectivity works: the bridge table and the neighbor table are both populated locally on leaf01. You also verified, through the EVPN ARP cache, that the MAC and IP information is being pulled into EVPN.
Using this information, you can check the RD and RT mapping so that you can learn more about the full VNI advertisement.
An RD is a route distinguisher. It’s used to disambiguate EVPN routes in different VNIs, as they may have the same MAC or IP address.
The RTs are route targets. They describe the VPN membership for the route, specifically which VNIs and VRFs export and import the different routes in the infrastructure.
cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
Number of L2 VNIs: 3
Number of L3 VNIs: 2
Flags: * - Kernel
  VNI        Type RD                    Import RT                 Export RT                 Tenant VRF
* 20         L2   10.10.10.1:2          65101:20                  65101:20                  RED
* 30         L2   10.10.10.1:4          65101:30                  65101:30                  BLUE
* 10         L2   10.10.10.1:3          65101:10                  65101:10                  RED
* 4001       L3   10.10.10.1:5          65101:4001                65101:4001                RED
* 4002       L3   10.10.10.1:6          65101:4002                65101:4002                BLUE
The local L2 VNI 10 has RD 10.10.10.1:3, which is leaf01's router ID (10.10.10.1) followed by a locally assigned number. Each VNI on leaf01 gets its own RD, so the RD identifies all the routes that leaf01 originates for that VNI. When looking elsewhere in the fabric, you can use it to see every VNI 10 route advertised by leaf01:
cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn route rd 10.10.10.1:3
EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 10.10.10.1:3:UNK prefix
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
  Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
  Local
    10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:41:45 2021

BGP routing table entry for 10.10.10.1:3:UNK prefix
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
  Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
  Local
    10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:44:38 2021
....
Displayed 8 prefixes (8 paths) with this RD
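The division of labor between RDs and RTs can be modeled in a few lines of Python. This minimal sketch uses the RD and RT values from the output above and assumes exact RT string matching; the second route (the same MAC advertised in VNI 20) is hypothetical, and make_rd and imports are illustrative names, not FRR functions. It shows that the RD only keeps otherwise identical prefixes distinct, while the RTs alone decide which VNIs and VRFs import a route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    rd: str                   # <router ID>:<locally assigned number>
    prefix: str               # FRR display form of the route
    route_targets: frozenset  # RTs carried as extended communities


def make_rd(router_id: str, local_number: int) -> str:
    """Build an RD in the <router ID>:<number> form shown in the output above."""
    return f"{router_id}:{local_number}"


def imports(route: EvpnRoute, import_rts: set) -> bool:
    """Import decision: True if any RT on the route is in the import list.

    Exact string matching only. This sketch doesn't model how FRR handles
    auto-derived RTs whose ASN differs between switches.
    """
    return not route.route_targets.isdisjoint(import_rts)


MAC = "[2]:[0]:[48]:[44:38:39:00:00:32]"
# Route from the output above: RTs for L2 VNI 10 and L3 VNI 4001.
vni10_route = EvpnRoute(make_rd("10.10.10.1", 3), MAC, frozenset({"65101:10", "65101:4001"}))
# Hypothetical: the same MAC advertised in VNI 20, which uses RD 10.10.10.1:2.
vni20_route = EvpnRoute(make_rd("10.10.10.1", 2), MAC, frozenset({"65101:20", "65101:4001"}))

# Keyed by prefix alone, the two routes would collide; the RD keeps them apart.
print(len({r.prefix for r in (vni10_route, vni20_route)}))  # 1
table = {(r.rd, r.prefix): r for r in (vni10_route, vni20_route)}
print(len(table))                            # 2

print(imports(vni10_route, {"65101:10"}))    # True: VNI 10 imports it
print(imports(vni10_route, {"65101:20"}))    # False: VNI 20 doesn't
print(imports(vni10_route, {"65101:4001"}))  # True: VRF RED's L3 VNI imports it
```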
Here’s an important piece of information. A type 2 route can take two different forms, and in this case leaf01 is advertising both:
- Type 2 MAC Route: It includes only the 48-bit MAC address (the [48] in the prefix is the MAC length in bits). This entry is pulled in directly from the bridge table and carries only L2 information. Anytime a MAC address is learned locally in the bridge table, that MAC address is pulled into EVPN as a type 2 MAC route.
- Type 2 MAC/IP Route: These entries are pulled into EVPN from the ARP table. Reading this entry, the first section is the MAC address and the second section maps the IP address and its length. Because each entry describes a single neighbor, the IP length is always the full address length (/32 for IPv4, /128 for IPv6), so these routes are always host routes.
BGP routing table entry for 10.10.10.1:3:UNK prefix
...
  Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
...
BGP routing table entry for 10.10.10.1:3:UNK prefix
...
  Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
...
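The two forms are easy to tell apart programmatically. Here's a minimal Python sketch, assuming the bracketed display format that FRR prints in the output above: [2]:[EthTag]:[MAClen]:[MAC], with the MAC/IP form appending [IPlen]:[IP]. The parse_type2 function and Type2Route class are hypothetical helpers. The MAC length check reflects that the field counts bits, and the IP length doubles as a host-route test.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Optional

# FRR display form: [2]:[EthTag]:[MAClen]:[MAC], optionally followed by :[IPlen]:[IP]
TYPE2_RE = re.compile(
    r"^\[2\]:\[(?P<eth_tag>\d+)\]:\[(?P<mac_len>\d+)\]:\[(?P<mac>[0-9a-fA-F:]{17})\]"
    r"(?::\[(?P<ip_len>\d+)\]:\[(?P<ip>[^\]]+)\])?$"
)


@dataclass(frozen=True)
class Type2Route:
    eth_tag: int
    mac: str
    ip: Optional[str]      # None for a MAC-only route
    ip_len: Optional[int]  # None for a MAC-only route

    @property
    def is_host_route(self) -> bool:
        """True if the IP length is the full address length (/32 or /128)."""
        if self.ip is None:
            return False
        return self.ip_len == ipaddress.ip_address(self.ip).max_prefixlen


def parse_type2(prefix: str) -> Type2Route:
    m = TYPE2_RE.match(prefix)
    if m is None:
        raise ValueError(f"not a type 2 prefix: {prefix!r}")
    if int(m["mac_len"]) != 48:  # the MAC length field counts bits
        raise ValueError(f"unexpected MAC length: {m['mac_len']}")
    ip_len = int(m["ip_len"]) if m["ip_len"] is not None else None
    return Type2Route(int(m["eth_tag"]), m["mac"].lower(), m["ip"], ip_len)


mac_only = parse_type2("[2]:[0]:[48]:[44:38:39:00:00:32]")
mac_ip = parse_type2("[2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101]")
print(mac_only.ip, mac_only.is_host_route)             # None False
print(mac_ip.ip, mac_ip.ip_len, mac_ip.is_host_route)  # 10.1.10.101 32 True
```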
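Using this information, you can validate that type 2 MAC/IP routes end up in the tenant VRF's routing table as pure L3 /32 host routes that point to the L3 VNI. The following output shows this from leaf01's side, where the remote host 10.1.10.104, advertised by the remote VTEP 10.0.1.34, appears in VRF RED: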
cumulus@leaf01:mgmt:~$ net show route vrf RED
show ip route vrf RED
======================
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF RED:
K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 00:18:17
C * 10.1.10.0/24 [0/1024] is directly connected, vlan10-v0, 00:18:17
C>* 10.1.10.0/24 is directly connected, vlan10, 00:18:17
B>* 10.1.10.104/32 [20/0] via 10.0.1.34, vlan4001 onlink, weight 1, 00:18:05
C * 10.1.20.0/24 [0/1024] is directly connected, vlan20-v0, 00:18:17
C>* 10.1.20.0/24 is directly connected, vlan20, 00:18:17
B>* 10.1.30.0/24 [20/0] via 10.0.1.255, vlan4001 onlink, weight 1, 00:18:04
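Spend some time dissecting this output. The neighbor entry that the remote leaf learned for 10.1.10.104 has arrived on leaf01 as a BGP /32 host route in VRF RED. Its next hop is the remote VTEP address, 10.0.1.34, reached through vlan4001, the SVI of L3 VNI 4001. Server01's neighbor entry on leaf01 makes the same trip in the other direction: it reaches leaf03 as a /32 host route whose next hop is leaf01's VTEP address, 10.0.1.12, through the L3 VNI.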
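Why does a /32 matter when leaf01 already has 10.1.10.0/24 as a connected route? Route lookup uses longest-prefix match, so when leaf01 routes a packet in VRF RED toward 10.1.10.104, the /32 host route through vlan4001 wins over the connected /24. Here's a minimal Python sketch of that lookup over the selected (>) routes from the table above. It skips the distance and metric tie-breaking a real RIB performs, and the lookup helper and the 10.1.10.50 destination are illustrative.

```python
import ipaddress

# Selected (>) routes from leaf01's VRF RED table above:
# (prefix, next hop or None if connected, outgoing interface).
VRF_RED = [
    ("0.0.0.0/0", None, "unreachable"),
    ("10.1.10.0/24", None, "vlan10"),
    ("10.1.10.104/32", "10.0.1.34", "vlan4001"),
    ("10.1.20.0/24", None, "vlan20"),
    ("10.1.30.0/24", "10.0.1.255", "vlan4001"),
]


def lookup(destination, routes):
    """Longest-prefix match: among routes containing the destination,
    return the one with the longest prefix length."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), nexthop, ifname)
        for prefix, nexthop, ifname in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    net, nexthop, ifname = max(candidates, key=lambda c: c[0].prefixlen)
    return str(net), nexthop, ifname


print(lookup("10.1.10.104", VRF_RED))  # ('10.1.10.104/32', '10.0.1.34', 'vlan4001')
print(lookup("10.1.10.50", VRF_RED))   # ('10.1.10.0/24', None, 'vlan10')
```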
To validate that the L2 VNI is successfully tied to the L3 VNI, examine the L3 VNI:
cumulus@leaf01:mgmt:~$ net show evpn vni 4001
VNI: 4001
  Type: L3
  Tenant VRF: RED
  Local Vtep Ip: 10.0.1.12
  Vxlan-Intf: vniRED
  SVI-If: vlan4001
  State: Up
  VNI Filter: none
  System MAC: 44:38:39:be:ef:aa
  Router MAC: 44:38:39:be:ef:aa
  L2 VNIs: 10 20
In this output, L3 VNI 4001 is mapped to VRF RED, which matches the Tenant VRF column in the earlier net show evpn vni output. The L2 VNIs line also shows that VNIs 10 and 20, both in VRF RED, are tied to L3 VNI 4001, whose SVI is vlan4001. All the outputs that you’re seeing line up to indicate that you have a fully working EVPN type 2 VXLAN infrastructure.
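The chain you just followed, from L2 VNI to tenant VRF to L3 VNI, can be written down as a small Python check. The dictionaries are hand-copied from the net show evpn vni and net show evpn vni 4001 outputs; l3_vni_for and l3_vni_consistent are hypothetical names for illustration.

```python
# Hand-copied from 'net show evpn vni': L2 VNI -> tenant VRF, and L3 VNI per VRF.
L2_VNI_VRF = {10: "RED", 20: "RED", 30: "BLUE"}
L3_VNI_BY_VRF = {"RED": 4001, "BLUE": 4002}

# Hand-copied from 'net show evpn vni 4001'.
REPORTED = {"l3_vni": 4001, "vrf": "RED", "l2_vnis": [10, 20]}


def l3_vni_for(l2_vni, l2_vni_vrf, l3_vni_by_vrf):
    """Map an L2 VNI to the L3 VNI of the tenant VRF it belongs to."""
    return l3_vni_by_vrf[l2_vni_vrf[l2_vni]]


def l3_vni_consistent(vrf, reported_l2_vnis, l2_vni_vrf):
    """True if the 'L2 VNIs' list for an L3 VNI equals the set of L2 VNIs
    whose tenant VRF is that L3 VNI's VRF."""
    expected = {vni for vni, v in l2_vni_vrf.items() if v == vrf}
    return expected == set(reported_l2_vnis)


print(l3_vni_for(10, L2_VNI_VRF, L3_VNI_BY_VRF))  # 4001
print(l3_vni_consistent(REPORTED["vrf"], REPORTED["l2_vnis"], L2_VNI_VRF))  # True
```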
Summary
There you have it. From start to finish, you saw how EVPN works for type 2 routes. Specifically, I discussed the different EVPN message types and how the control plane converges in an L2 extension environment. It’s not witchcraft, just good technology.
To continue demystifying the EVPN control plane and to tackle the traffic flows around type 5 messages and VXLAN routing, see [LINK]. If you haven’t already, I highly recommend trying this out for yourself with NVIDIA Cumulus in the Cloud. If you’d like to take a deeper dive, we’ve put together a hub of EVPN content, from whitepapers to videos.