
Looking Behind the Curtain of EVPN Traffic Flows

Is EVPN magic? As Arthur C. Clarke said, any sufficiently advanced technology is indistinguishable from magic. On that premise, moving from a traditional layer 2 environment to VXLAN driven by EVPN has much of that same hocus-pocus feeling.

To help demystify the sorcery, I aim to help users new to EVPN understand how EVPN works and how the control plane converges. In this post, I focus on basic layer 2 (L2) building blocks then work my way up to layer 3 (L3) connectivity and the control plane.

I use the reference topology as the cable plan and as the foundation for understanding the traffic flow. The infrastructure is a symmetric-mode EVPN environment using distributed gateways. All configurations are standardized using production-ready automation and are available in the public cumulus_ansible_modules GitLab repo.

To follow along, build your own Cumulus in the Cloud and deploy the following playbook:

~$ git clone https://gitlab.com/cumulus-consulting/goldenturtle/cumulus_ansible_modules.git
  
 Cloning into 'cumulus_ansible_modules'...
 remote: Enumerating objects: 822, done.
 remote: Counting objects: 100% (822/822), done.
 remote: Compressing objects: 100% (374/374), done.
 remote: Total 4777 (delta 416), reused 714 (delta 340), pack-reused 3955
 Receiving objects: 100% (4777/4777), 4.64 MiB | 22.64 MiB/s, done.
 Resolving deltas: 100% (2121/2121), done.
  
 ~$
 ~$ cd cumulus_ansible_modules/
 ~/cumulus_ansible_modules$ ansible-playbook -i inventories/evpn_symmetric/host playbooks/deploy.yml 

EVPN message types

Like any good protocol, EVPN has a robust process for exchanging information with its peers: message types, also known as route types. If you already know OSPF and its LSA messages, you can think of EVPN message types as similar. Each EVPN message type carries a different kind of information about the EVPN traffic flow.

There are about five different message types. In this post, I focus on the most common one, type 2, which comes in two forms: type 2 MAC and type 2 MAC/IP.
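
As a quick reference, the five route types can be written down as a small lookup table. The names below follow the EVPN RFCs (RFC 7432 for types 1 to 4, RFC 9136 for type 5). The Python enum and the `describe` helper are purely illustrative and are not part of any Cumulus tooling.

```python
from enum import IntEnum


class EvpnRouteType(IntEnum):
    """EVPN BGP route types (names per RFC 7432 and RFC 9136)."""
    ETHERNET_AUTO_DISCOVERY = 1
    MAC_IP_ADVERTISEMENT = 2
    INCLUSIVE_MULTICAST_ETHERNET_TAG = 3
    ETHERNET_SEGMENT = 4
    IP_PREFIX = 5


def describe(route_type: int) -> str:
    """Return a readable label such as 'type 2 (mac ip advertisement)'."""
    member = EvpnRouteType(route_type)
    name = member.name.replace("_", " ").lower()
    return f"type {int(member)} ({name})"


if __name__ == "__main__":
    for member in EvpnRouteType:
        print(describe(member))
```

Type 2 is the one this post follows from the bridge table into BGP.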

Digging into EVPN message types: Type 2

The easiest EVPN messages to understand are type 2. As mentioned earlier, type 2 routes contain MAC and MAC/IP mappings. To start off, inspect a type 2 entry at work. To do that, verify basic connectivity from leaf01 to server01.

First, look at the bridge table to make sure that the MAC address of the server is mapped to the correct port on the switch.

Get the Server01 MAC address:

cumulus@server01:~$ ip address show
 ...
 5: uplink: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
      link/ether 44:38:39:00:00:32 brd ff:ff:ff:ff:ff:ff
      inet 10.1.10.101/24 scope global uplink
      valid_lft forever preferred_lft forever
      inet6 fe80::4638:39ff:fe00:32/64 scope link
      valid_lft forever preferred_lft forever 

Look at Leaf01’s bridge table to make sure the MAC address is mapped to the port that you expect. Cross reference it with LLDP:

  
 cumulus@leaf01:mgmt:~$ net show bridge macs
  
 VLAN       Master  Interface  MAC                TunnelDest  State     Flags            LastSeen
 --------  ------  ---------  -----------------  ----------  ---------  ------------------  --------
 ...
 10         bridge  bond1   44:38:39:00:00:32                                           <1 sec
  
  
 cumulus@leaf01:mgmt:~$ net show lldp
  
 LocalPort  Speed  Mode     RemoteHost            RemotePort
 ---------  -----  ----------  -------------------  -----------------
 eth0       1G   Mgmt       oob-mgmt-switch       swp10
 swp1       1G   BondMember  server01.simulation  44:38:39:00:00:32
 swp2       1G   BondMember  server02             44:38:39:00:00:34
 swp3       1G   BondMember  server03             44:38:39:00:00:36
 swp49      1G   BondMember  leaf02               swp49
 swp50      1G   BondMember  leaf02               swp50
 swp51      1G   Default    spine01               swp1
 swp52      1G   Default    spine02               swp1
 swp53      1G   Default    spine03               swp1
 swp54      1G   Default    spine04               swp1

Check the ARP table to validate that the MAC and IP addresses are mapped correctly:

 cumulus@leaf01:mgmt:~$ net show neighbor
 Neighbor                   MAC             Interface       AF   STATE
 -------------------------  -----------------  -------------  ----  ---------
 ...
 10.1.10.101                44:38:39:00:00:32  vlan10        IPv4  REACHABLE
 ... 
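
The checks above come down to comparing the same MAC address across several tables. Here is a minimal Python sketch of that comparison. The values are copied by hand from the outputs in this post, and the helper names (`normalize_mac`, `consistent`) are hypothetical, not part of NCLU or any Cumulus tool.

```python
import re
from typing import Dict


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and strip separators so different notations compare equal."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits


def consistent(observations: Dict[str, str]) -> bool:
    """True when every table reports the same MAC address."""
    return len({normalize_mac(m) for m in observations.values()}) == 1


# Values copied by hand from the command outputs in this post.
server01 = {
    "server01 uplink (ip address show)": "44:38:39:00:00:32",
    "leaf01 LLDP remote port on swp1": "44:38:39:00:00:32",
    "leaf01 neighbor entry for 10.1.10.101": "44:38:39:00:00:32",
}

if __name__ == "__main__":
    print("consistent" if consistent(server01) else "mismatch")
```

The bridge table entry for bond1 can be added to the dictionary and checked the same way.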

Now that you’ve checked the basics, start looking at how this gets pulled into EVPN. Validate the local VNIs that are configured:

cumulus@leaf01:mgmt:~$ net show evpn vni
 VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
 20         L2   vni20                 9     2     1               RED
 30         L2   vni30                 10    2     1               BLUE
 10         L2   vni10                 11    4     1               RED
 4001       L3   vniRED                2     2     n/a             RED
 4002       L3   vniBLUE               1     1     n/a             BLUE 

Because you validated that server01 is mapped to VLAN 10 in the bridge MAC table, you can now check whether the IP neighbor entries are being pulled into the EVPN ARP cache. This cache holds the information that is exchanged with the other EVPN speakers in the environment.

cumulus@leaf01:mgmt:~$ net show evpn arp-cache vni 10
 Number of ARPs (local and remote) known for this VNI: 4
 Flags: I=local-inactive, P=peer-active, X=peer-proxy
 Neighbor              Type   Flags State    MAC             Remote ES/VTEP              Seq #'s
 ...
 10.1.10.101           local      active   44:38:39:00:00:32                             0/0
 10.1.10.104           remote     active   44:38:39:00:00:3e 10.0.1.34

Here’s what you’ve got so far. L2 connectivity works correctly, as the L2 bridge table and the L3 neighbor table are populated locally on leaf01. You also verified that the MAC and IP information is being pulled into EVPN through the EVPN ARP cache.

Using this information, you can check the RD and RT mapping so that you can learn more about the full VNI advertisement.

An RD is a route distinguisher. It’s used to disambiguate EVPN routes in different VNIs, as they may have the same MAC or IP address.

The RTs are route targets. They are used to describe the VPN membership for the route, specifically which VRFs are exporting and importing the different routes in the infrastructure.
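
To make the RD's role concrete, here is a small Python sketch. It assumes a toy route table keyed by (RD, prefix). The second RD, `10.10.10.3:3`, is an invented example standing in for another VTEP and does not come from the outputs in this post. The names `RouteDistinguisher` and `parse_rd` are also hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class RouteDistinguisher(NamedTuple):
    router_id: str   # administrator field, here an IPv4 router ID
    number: int      # locally assigned number


def parse_rd(text: str) -> RouteDistinguisher:
    """Parse an 'IP:number' RD such as '10.10.10.1:3'."""
    router_id, number = text.rsplit(":", 1)
    return RouteDistinguisher(router_id, int(number))


# Toy table: the same MAC prefix can exist once per RD without colliding.
table: Dict[Tuple[RouteDistinguisher, str], str] = {}
mac_prefix = "[2]:[0]:[48]:[44:38:39:00:00:32]"
table[(parse_rd("10.10.10.1:3"), mac_prefix)] = "advertised by leaf01"
table[(parse_rd("10.10.10.3:3"), mac_prefix)] = "hypothetical second VTEP"

if __name__ == "__main__":
    for (rd, prefix), note in table.items():
        print(f"{rd.router_id}:{rd.number} {prefix} -> {note}")
```

Without the RD in the key, the second entry would overwrite the first. That is the ambiguity the RD removes.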

 cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn vni
 Advertise Gateway Macip: Disabled
 Advertise SVI Macip: Disabled
 Advertise All VNI flag: Enabled
 BUM flooding: Head-end replication
 Number of L2 VNIs: 3
 Number of L3 VNIs: 2
 Flags: * - Kernel
   VNI      Type RD                    Import RT                  Export RT                  Tenant VRF
 * 20       L2   10.10.10.1:2          65101:20                   65101:20              RED
 * 30       L2   10.10.10.1:4          65101:30                   65101:30              BLUE
 * 10       L2   10.10.10.1:3          65101:10                   65101:10              RED
 * 4001     L3   10.10.10.1:5          65101:4001                 65101:4001            RED
 * 4002     L3   10.10.10.1:6          65101:4002                 65101:4002            BLUE 

The local L2 VNI 10 has RD 10.10.10.1:3. The RD is essentially an identifier for all the routes that this node advertises for that VNI, so when looking elsewhere in the fabric, you can use it to see all the VNI 10 routes advertised by leaf01.

 cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn route rd 10.10.10.1:3
 EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
 EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]
 EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
 EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
 EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]         
                
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 Paths: (1 available, best #1)
   Advertised to non peer-group peers:
   leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
   Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
   Local
      10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:41:45 2021
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 Paths: (1 available, best #1)
   Advertised to non peer-group peers:
   leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
   Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
   Local
      10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:44:38 2021
  
 ....
  
 Displayed 8 prefixes (8 paths) with this RD 
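
The `Extended Community` line packs several attributes into one string. The following Python sketch splits it apart. The function name is hypothetical, and the parsing assumes the space-separated `KEY:value` layout shown in the output above.

```python
from collections import defaultdict
from typing import Dict, List


def parse_ext_communities(line: str) -> Dict[str, List[str]]:
    """Group 'KEY:value' tokens by key, keeping repeated keys such as RT."""
    parsed: Dict[str, List[str]] = defaultdict(list)
    for token in line.split():
        # Split on the first colon only, so values may contain colons.
        key, _, value = token.partition(":")
        parsed[key].append(value)
    return dict(parsed)


communities = parse_ext_communities(
    "ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa"
)

if __name__ == "__main__":
    print(communities["RT"])
    print(communities["Rmac"])
```

The route carries two route targets, one matching L2 VNI 10 and one matching L3 VNI 4001. `ET:8` is the encapsulation type for VXLAN, and `Rmac` is the router MAC that a remote VTEP uses when it routes traffic over the L3 VNI.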

Here’s an important piece of information. There are two different forms that a type 2 route can take. In this case, you’re sending each of the two types:

  • Type 2 MAC route: It only includes a 48-bit MAC entry. This entry is pulled in directly from the bridge table and only has L2 information in it. Anytime a MAC address is learned in the bridge table, that MAC address is pulled into EVPN as a type 2 MAC route.

  • Type 2 MAC/IP route: These entries are pulled into EVPN from the ARP table. Reading this entry, the first section contains the MAC address and the second contains the IP address and its mask. The mask is a /32: because these entries come from the ARP table, they are always advertised as host routes.

 BGP routing table entry for 10.10.10.1:3:UNK prefix
 ...
   Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
 ...
  
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 ...
   Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
 ... 
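
To see the difference between the two forms programmatically, here is a minimal Python sketch that parses the bracketed prefix strings shown above. The `Type2Route` class and `parse_type2` function are hypothetical names, and the parser assumes the `[2]:[EthTag]:[MAClen]:[MAC]` layout with an optional `:[IPlen]:[IP]` tail, as in these outputs.

```python
import ipaddress
import re
from typing import NamedTuple, Optional


class Type2Route(NamedTuple):
    eth_tag: int
    mac_len: int            # MAC length in bits
    mac: str
    ip_len: Optional[int]   # IP length in bits, None for a MAC-only route
    ip: Optional[str]

    @property
    def kind(self) -> str:
        return "MAC" if self.ip is None else "MAC/IP"


def parse_type2(prefix: str) -> Type2Route:
    """Parse a type 2 prefix string into its bracketed fields."""
    fields = re.findall(r"\[([^\]]*)\]", prefix)
    if len(fields) not in (4, 6) or fields[0] != "2":
        raise ValueError(f"not a type 2 prefix: {prefix!r}")
    eth_tag, mac_len, mac = int(fields[1]), int(fields[2]), fields[3]
    if len(fields) == 4:
        return Type2Route(eth_tag, mac_len, mac, None, None)
    ip = str(ipaddress.ip_address(fields[5]))  # raises ValueError if invalid
    return Type2Route(eth_tag, mac_len, mac, int(fields[4]), ip)


if __name__ == "__main__":
    for text in (
        "[2]:[0]:[48]:[44:38:39:00:00:32]",
        "[2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101]",
    ):
        route = parse_type2(text)
        print(route.kind, route.mac, route.ip)
```

Both routes share the same MAC. Only the MAC/IP form carries the extra length and address fields.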

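Using this information, you can validate that a type 2 MAC/IP route ends up in a remote leaf's routing table as a pure L3 /32 host route pointing out to the L3 VNI. The following output is taken from leaf01, so it shows the same mechanism in the opposite direction: the remote host 10.1.10.104 is installed as a /32 BGP route through vlan4001, the SVI of L3 VNI 4001. The host route for server01 (10.1.10.101/32) should appear the same way on leaf03.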

 cumulus@leaf01:mgmt:~$ net show route vrf RED
 show ip route vrf RED
 ======================
 Codes: K - kernel route, C - connected, S - static, R - RIP,
      O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
      T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
      F - PBR, f - OpenFabric,
      > - selected route, * - FIB route, q - queued, r - rejected, b - backup
      t - trapped, o - offload failure
  
 VRF RED:
 K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 00:18:17
 C * 10.1.10.0/24 [0/1024] is directly connected, vlan10-v0, 00:18:17
 C>* 10.1.10.0/24 is directly connected, vlan10, 00:18:17
 B>* 10.1.10.104/32 [20/0] via 10.0.1.34, vlan4001 onlink, weight 1, 00:18:05
 C * 10.1.20.0/24 [0/1024] is directly connected, vlan20-v0, 00:18:17
 C>* 10.1.20.0/24 is directly connected, vlan20, 00:18:17
 B>* 10.1.30.0/24 [20/0] via 10.0.1.255, vlan4001 onlink, weight 1, 00:18:04 

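Spend some time dissecting this output. The neighbor entry for 10.1.10.104, learned behind the remote VTEP 10.0.1.34, has made it all the way to leaf01 as a /32 host route whose next hop is that VTEP, reached through the L3 VNI (vlan4001). In the same way, the neighbor entry on leaf01 for server01 should reach leaf03 as a /32 host route with leaf01's VTEP as the next hop.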
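
The reason the /32 matters is longest-prefix matching: a routed packet for 10.1.10.104 matches both the connected /24 and the BGP /32, and the more specific route wins. Here is a simplified Python sketch using the prefixes from the VRF RED table above. The `lookup` helper is hypothetical and ignores everything a real FIB also considers, such as administrative distance and route selection flags.

```python
import ipaddress
from typing import List, Optional, Tuple

# (prefix, description) pairs copied by hand from the VRF RED table above.
RED_ROUTES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "unreachable"),
    ("10.1.10.0/24", "connected, vlan10"),
    ("10.1.10.104/32", "BGP via 10.0.1.34, vlan4001"),
    ("10.1.20.0/24", "connected, vlan20"),
    ("10.1.30.0/24", "BGP via 10.0.1.255, vlan4001"),
]


def lookup(destination: str,
           routes: List[Tuple[str, str]] = RED_ROUTES) -> Optional[Tuple[str, str]]:
    """Return the most specific route that covers the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (prefix, desc) for prefix, desc in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    return max(matches,
               key=lambda m: ipaddress.ip_network(m[0]).prefixlen,
               default=None)


if __name__ == "__main__":
    print(lookup("10.1.10.104"))  # remote host: the /32 through the L3 VNI
    print(lookup("10.1.10.101"))  # local host: the connected /24
```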

To validate that the L2 VNI and the L3 VNI are connected correctly, examine the L3 VNI:

 cumulus@leaf01:mgmt:~$ net show evpn vni 4001
 VNI: 4001
   Type: L3
   Tenant VRF: RED
   Local Vtep Ip: 10.0.1.12
   Vxlan-Intf: vniRED
   SVI-If: vlan4001
   State: Up
   VNI Filter: none
   System MAC: 44:38:39:be:ef:aa
   Router MAC: 44:38:39:be:ef:aa
   L2 VNIs: 10 20          

In this output, L3 VNI 4001 is mapped to VRF RED, which matches the Tenant VRF column in the earlier net show evpn vni output. You can also see that L2 VNI 10 is tied to L3 VNI 4001 (VRF RED), whose SVI is vlan4001. All the outputs line up to indicate that you have a fully working EVPN type 2 VXLAN infrastructure.
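
The relationship between L2 VNIs, tenant VRFs, and L3 VNIs can be checked mechanically. This is a minimal Python sketch with the mappings copied by hand from the `net show evpn vni` outputs in this post. The dictionary and function names are hypothetical.

```python
# Values copied by hand from the `net show evpn vni` outputs in this post.
L2_VNI_TO_VRF = {10: "RED", 20: "RED", 30: "BLUE"}
L3_VNI_TO_VRF = {4001: "RED", 4002: "BLUE"}


def l3_vni_for(l2_vni: int) -> int:
    """Resolve an L2 VNI to the L3 VNI of its tenant VRF."""
    vrf = L2_VNI_TO_VRF[l2_vni]
    candidates = [vni for vni, name in L3_VNI_TO_VRF.items() if name == vrf]
    if len(candidates) != 1:
        raise LookupError(f"VRF {vrf} should have exactly one L3 VNI")
    return candidates[0]


if __name__ == "__main__":
    for l2_vni in sorted(L2_VNI_TO_VRF):
        print(f"L2 VNI {l2_vni} -> L3 VNI {l3_vni_for(l2_vni)}")
```

The L2 VNIs that resolve to 4001 are 10 and 20, which is what the `L2 VNIs: 10 20` line in the output reports.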

Summary

There you have it. From start to finish, you saw how EVPN works for Type 2–based routes. Specifically, I discussed the different EVPN message types and how control planes converge in an L2 extension environment. It’s not witchcraft, just good technology.

To continue demystifying the EVPN control plane with the traffic flows around Type 5 messages and VXLAN routing, see [LINK]. If you haven’t already, I highly recommend trying this out for yourself with NVIDIA Cumulus in the Cloud. If you’d like to take a deeper dive, we’ve put together a hub of EVPN content, from whitepapers to videos.
