E-LAN (VPLS)

E-LAN is a simulated MP2MP (multipoint-to-multipoint) service. The service provider infrastructure acts as one large layer 2 switch, providing a simulated broadcast domain.

Why is it MP2MP and not P2MP? Because no single port is a root: every port can send to every other port (“any to any”). If there were a single root, or hub, that could send to all spokes, that would be P2MP.

The technology we will focus on which enables this is called VPLS (Virtual Private LAN Service).

In many ways, VPLS is very similar to AToM, in that it is an L2VPN service that uses two labels: a transport label, which represents the egress PE, and a service label, which identifies the VC.

Another similarity to AToM is that VPLS uses targeted LDP. But because there are now many egress PEs, not just one (as with xconnect), each PE forms a targeted LDP session with every other PE. This means there is a full mesh of LDP sessions between all PEs participating in the VPLS. (VPLS can also use BGP for autodiscovery and service label signaling. We will explore this in a future article.)
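Once the VFI is configured (see the lab below), you can confirm this full mesh with show mpls ldp neighbor on each PE. Here is a sketch of what to check, assuming the loopbacks used later in this lab (output omitted):

PE1#show mpls ldp neighbor
! Besides the link LDP session to P4, expect sessions to 2.2.2.2:0 and
! 3.3.3.3:0, each listing a targeted hello as its discovery source.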

The main difference between VPLS and AToM, or “xconnect,” is that the routers must perform bridging functions to simulate a LAN. This involves:

  • Learning MAC addresses, and aging them out based on a timer

  • Forwarding frames based on the destination MAC address

  • Flooding BUM traffic (broadcast, unknown unicast, and multicast)

  • Loop prevention

With AToM, the router simply transports every frame it receives to the remote egress PE. No bridging/switching functions are needed.

So how does the router know all the other PEs in the VPLS domain in order to form targeted LDP sessions with them? There are two ways: manual configuration of every neighbor PE, or autodiscovery of neighbor PEs using MP-BGP. In this article we will use manual configuration for simplicity in order to focus on the VPLS technology itself, and then add autodiscovery via MP-BGP later.

Lab

Here is our topology:

This service should appear as a giant L2 switch to the customer:

The service provider routers are pre-configured with loopback IPs, LDP, and OSPF.

Here is the VPLS configuration:

PE1

interface GigabitEthernet2
 service instance 100 ethernet
  encapsulation default 
!
l2vpn vfi context CUSTOMER_VPLS 
 vpn id 100
 member 2.2.2.2 encapsulation mpls
 member 3.3.3.3 encapsulation mpls
!
bridge-domain 1 
 member GigabitEthernet2 service-instance 100
 member vfi CUSTOMER_VPLS

PE2

interface GigabitEthernet2
 service instance 100 ethernet
  encapsulation default
!
l2vpn vfi context CUSTOMER_VPLS 
 vpn id 100
 member 1.1.1.1 encapsulation mpls
 member 3.3.3.3 encapsulation mpls
!
bridge-domain 1 
 member GigabitEthernet2 service-instance 100
 member vfi CUSTOMER_VPLS

PE3

interface GigabitEthernet2
 service instance 100 ethernet
  encapsulation default
!
l2vpn vfi context CUSTOMER_VPLS 
 vpn id 100
 member 1.1.1.1 encapsulation mpls
 member 2.2.2.2 encapsulation mpls
!
bridge-domain 2
 member GigabitEthernet2 service-instance 100
 member vfi CUSTOMER_VPLS

vfi stands for virtual forwarding instance.

The bridge-domain is needed to group together physical ports (via their service instances) and the virtual circuits defined in the referenced VFI. The bridge-domain provides the switching functionality, allowing the PE to learn MAC addresses.
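To make the roles of the three pieces explicit, here is PE1’s configuration again, annotated (the trailing comments are notes for this article, not commands):

interface GigabitEthernet2
 service instance 100 ethernet       ! EFP: classifies customer frames on the AC
  encapsulation default              ! match any frame, tagged or untagged
!
l2vpn vfi context CUSTOMER_VPLS      ! VFI: the set of VCs (pseudowires) to the other PEs
 vpn id 100                          ! must match on every PE in the VPLS domain
 member 2.2.2.2 encapsulation mpls   ! VC to PE2
 member 3.3.3.3 encapsulation mpls   ! VC to PE3
!
bridge-domain 1                      ! ties the EFP and the VFI into one MAC-learning domain
 member GigabitEthernet2 service-instance 100
 member vfi CUSTOMER_VPLS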

You may have noticed that on PE3 I used bridge-domain 2 instead of bridge-domain 1. This is to demonstrate that the bridge-domain number is locally significant. It is not a VLAN ID.

To expand on this concept, imagine this switch configuration:

switch#show run | sec vlan 100
vlan 100
 name my-vlan
int gi0/0
 switchport access vlan 100
int gi0/1
 switchport access vlan 100
int gi0/48
 switchport mode trunk
 switchport trunk allowed vlan add 100

On the switch, “vlan 100” creates the VLAN, and the ports gi0/0 and gi0/1 belong to it. If frames from gi0/0 or gi0/1 are transported over the trunk on gi0/48, an 802.1q tag of 100 is added. So the number 100 both defines a domain of bridging and defines the specific tag to insert on traffic sent over a trunk port.

However, on the router running VPLS, “bridge-domain 2” does not define a tag to impose on frames. The number 2 simply lets you create multiple, separate bridge-domains and tell them apart.

Some people refer to this concept as the separation of the VLAN tag from the flooding domain. The tag on frames from the CE can be anything: double tagged, single tagged, or not tagged at all. You can pop tags, push tags, and translate tags, and all of this is done on the service instance. The flooding domain is defined by the bridge-domain and has nothing to do with whatever tag the frames happen to carry.
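As a hypothetical illustration (not part of this lab; the service instance number, VLAN ID, and bridge-domain number below are made up), a service instance could match customer frames tagged with VLAN 200, pop the tag on ingress (and push it back on egress), and place the now-untagged frames into bridge-domain 5:

! hypothetical example - numbers chosen for illustration only
interface GigabitEthernet2
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
!
bridge-domain 5
 member GigabitEthernet2 service-instance 200

The tag on the wire (200) and the bridge-domain number (5) have nothing to do with each other; the bridge-domain is purely a local flooding domain.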

Moving on, the service instance number is also locally significant, and does not need to match (even though it does in my example).

The only thing that needs to be consistent on all PEs in the VPLS domain is the vpn id. This is similar to the pw-id or xconnect ID that we saw with the pseudowire service.

Flooding Behaviour

With the pseudowire/xconnect service, there was no consideration for flooding behaviour. No matter what the layer 2 payload was, the PE simply encapsulated the frame and transported it to the other end.

However, with VPLS, the service provider network has to act as a switch: BUM traffic must be flooded without causing loops, and unicast traffic must be delivered to the correct PE without unnecessary flooding.

If the core routers were switches, the topology would look like this:

You might wonder: why is each PE directly connected to the others in this diagram, when the first diagram had P4 in the middle with each PE connected to it?

Think of the IP/MPLS network as an underlay, with the VCs forming an overlay.

By manually specifying each neighbor PE in the l2vpn vfi definition, we create a pseudowire, or VC (virtual circuit), to each neighbor. If PE1 knows that MAC X is behind PE2, it should encapsulate the traffic with an MPLS header for PE2’s loopback. If PE1 knows that MAC Y is behind PE3, it should encapsulate the traffic with a header for PE3’s loopback. In this way, you should think of the overlay topology as the diagram above, with every PE directly connected to every other PE.

So if the PE routers were switches, how would they prevent broadcast storms? You would likely have STP running, with one port in the blocking state, like this:

But it would be pretty ugly to run STP over MPLS pseudowires, right? Instead there is a split-horizon rule, which states the following: if a frame (BUM or otherwise) is received on a VC, the router must not forward it out any other VC.

Here is an example. The dotted lines are VCs (virtual circuits) and the solid lines are ACs (attachment circuits).

  • CE1 originates a broadcast frame

  • PE1 receives it on an AC (attachment circuit) and therefore floods it out all other ports in the bridge-domain, including all VCs

  • PE2 and PE3 receive the frame on a VC, so they flood it only out their ACs. They cannot forward a copy out any other VC.

  • As part of this process, PE2 and PE3 learn the MAC address aaaa.aaaa.0001 via the pseudowire to PE1, if they have not learned the MAC address already.

This split-horizon behaviour is on by default. To turn it off you would configure the l2vpn vfi like this:

PE1#
l2vpn vfi context CUSTOMER_VPLS 
 vpn id 100
 member 3.3.3.3 encapsulation mpls no-split-horizon
 member 2.2.2.2 encapsulation mpls no-split-horizon

This is actually a technique for creating a P2MP service, known as E-Tree. We will get to this later, but for now, know that you want split-horizon turned on when using VPLS, and it is on by default.

VC setup

Let’s examine pcaps of the LDP mapping messages to see how each PE learns the label that each neighbor PE has assigned for its VC in this VPLS service.

R1’s mapping message:

R2’s mapping message:

R3’s mapping message:

To my eye, there is no difference between these mapping messages and the mapping messages for the xconnects. This means that the control plane is exactly the same for xconnects as for VPLS; essentially, the control plane for VPLS is a full mesh of pseudowires. The difference is in the data plane: how the router inspects the layer 2 traffic and makes a forwarding/switching decision.

Verification

The verbose show command that we used for xconnects also works well for VPLS. It shows the label stack, packet statistics, LDP status, and more.

PE1#show mpls l2transport vc detail
Local interface: VFI CUSTOMER_VPLS vfi up
  Interworking type is Ethernet
  Destination address: 2.2.2.2, VC ID: 100, VC status: up
    Output interface: Gi1, imposed label stack {17 21}
    Preferred path: not configured  
    Default path: active
    Next hop: 10.1.4.4
  Create time: 01:22:02, last status change time: 00:11:40
    Last label FSM state change time: 00:12:43
  Signaling protocol: LDP, peer 2.2.2.2:0 up
    Targeted Hello: 1.1.1.1(LDP Id) -> 2.2.2.2, LDP is UP
    Graceful restart: not configured and not enabled
    Non stop routing: not configured and not enabled
    Status TLV support (local/remote)   : enabled/supported
      LDP route watch                   : enabled
      Label/status state machine        : established, LruRru
      Last local dataplane   status rcvd: No fault
      Last BFD dataplane     status rcvd: Not sent
      Last BFD peer monitor  status rcvd: No fault
      Last local AC  circuit status rcvd: No fault
      Last local AC  circuit status sent: No fault
      Last local PW i/f circ status rcvd: No fault
      Last local LDP TLV     status sent: No fault
      Last remote LDP TLV    status rcvd: No fault
      Last remote LDP ADJ    status rcvd: No fault
    MPLS VC labels: local 21, remote 21 
    Group ID: local n/a, remote 0
    MTU: local 1500, remote 1500
    Remote interface description: 
  Sequencing: receive disabled, send disabled
  Control Word: On (configured: autosense)
  SSO Descriptor: 2.2.2.2/100, local label: 21
  Dataplane:
    SSM segment/switch IDs: 8200/8194 (used), PWID: 1
  VC statistics:
    transit packet totals: receive 84, send 91
    transit byte totals:   receive 9763, send 10947
    transit packet drops:  receive 79, seq error 0, send 0

Local interface: VFI CUSTOMER_VPLS vfi up
  Interworking type is Ethernet
  Destination address: 3.3.3.3, VC ID: 100, VC status: up
    Output interface: Gi1, imposed label stack {18 21}
    Preferred path: not configured  
    Default path: active
    Next hop: 10.1.4.4
  Create time: 01:01:18, last status change time: 00:12:38
    Last label FSM state change time: 00:12:38
  Signaling protocol: LDP, peer 3.3.3.3:0 up
    Targeted Hello: 1.1.1.1(LDP Id) -> 3.3.3.3, LDP is UP
    Graceful restart: not configured and not enabled
    Non stop routing: not configured and not enabled
    Status TLV support (local/remote)   : enabled/supported
      LDP route watch                   : enabled
      Label/status state machine        : established, LruRru
      Last local dataplane   status rcvd: No fault
      Last BFD dataplane     status rcvd: Not sent
      Last BFD peer monitor  status rcvd: No fault
      Last local AC  circuit status rcvd: No fault
      Last local AC  circuit status sent: No fault
      Last local PW i/f circ status rcvd: No fault
      Last local LDP TLV     status sent: No fault
      Last remote LDP TLV    status rcvd: No fault
      Last remote LDP ADJ    status rcvd: No fault
    MPLS VC labels: local 22, remote 21 
    Group ID: local n/a, remote 0
    MTU: local 1500, remote 1500
    Remote interface description: 
  Sequencing: receive disabled, send disabled
  Control Word: On (configured: autosense)
  SSO Descriptor: 3.3.3.3/100, local label: 22
  Dataplane:
    SSM segment/switch IDs: 16393/4101 (used), PWID: 2
  VC statistics:
    transit packet totals: receive 2, send 84
    transit byte totals:   receive 198, send 10075
    transit packet drops:  receive 2, seq error 0, send 0

Let’s prove that the MTU must match on all PEs in the VPLS domain. With an xconnect, the MTU of the attachment interface itself is signaled. But with VPLS, a PE might have multiple ACs belonging to the same VPLS domain, so where does the signaled MTU come from?

The answer is that the MTU is configured under the VFI, and it defaults to 1500.

Let’s change it to 1600 on PE3 to see what happens.
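The change itself is a single command under the VFI context (the same syntax we will use to revert it shortly):

PE3(config)#l2vpn vfi context CUSTOMER_VPLS
PE3(config-vfi)#mtu 1600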

PE3#show run | sec l2vpn
l2vpn vfi context CUSTOMER_VPLS 
 vpn id 100
 mtu 1600
 member 2.2.2.2 encapsulation mpls
 member 1.1.1.1 encapsulation mpls

Now on PE1, we see the VC as down.

PE1#show mpls l2transport vc 

Local intf     Local circuit              Dest address    VC ID      Status
-------------  -------------------------- --------------- ---------- ----------
VFI CUSTOMER_VPLS  \
               vfi                        2.2.2.2         100        UP        
VFI CUSTOMER_VPLS  \
               vfi                        3.3.3.3         100        DOWN

PE1#show mpls l2transport vc destination 3.3.3.3 detail 
Local interface: VFI CUSTOMER_VPLS vfi up
  Interworking type is Ethernet
  Destination address: 3.3.3.3, VC ID: 100, VC status: down
    Last error: Pseudowire MTU mismatch with peer
<snip>

Let’s put the MTU back to its default of 1500.

PE3(config)#l2vpn vfi context CUSTOMER_VPLS
PE3(config-vfi)#mtu 1500

CE Configuration

Let’s configure all CEs in the same subnet and enable OSPF. Remember to leave the OSPF network type as the default (broadcast); because the E-LAN service behaves like a real multi-access segment, DR/BDR election works just as it would on a physical LAN.

CE1#
interface GigabitEthernet0/0
 ip address 10.1.1.1 255.255.255.0
!
int lo0
 ip address 1.1.1.1 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

CE2#
interface GigabitEthernet0/0
 ip address 10.1.1.2 255.255.255.0
!
int lo0
 ip address 2.2.2.2 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

CE3#
interface GigabitEthernet0/0
 ip address 10.1.1.3 255.255.255.0
!
int lo0
 ip address 3.3.3.3 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

All routers see each other as OSPF neighbors:

CE1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   FULL/DROTHER    00:00:37    10.1.1.2        GigabitEthernet0/0
3.3.3.3           1   FULL/DR         00:00:36    10.1.1.3        GigabitEthernet0/0

“Routers as switches”

Now that our CEs are functional, let’s explore how the PE routers perform switching functions.

First we explore the CAM table on PE1:

PE1#show bridge-domain 
Bridge-domain 1 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
Maximum address limit: 65536
    GigabitEthernet2 service instance 100
    vfi CUSTOMER_VPLS neighbor 2.2.2.2 100
    vfi CUSTOMER_VPLS neighbor 3.3.3.3 100
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   5254.000F.1800 forward dynamic   296  GigabitEthernet2.EFP100
   0   5254.0013.4581 forward dynamic   295  CUSTOMER_VPLS.404012
   0   5254.0016.7133 forward dynamic   296  CUSTOMER_VPLS.404011

Here we see that three MAC addresses have been learned. As you may have noticed, the outgoing port for a MAC address can be either a physical port or a pseudoport.

The first MAC address is known via Gi2 on EFP100, which is service instance 100. (EFP stands for Ethernet Flow Point, which is what a service instance creates.)

The two other MAC addresses are known via the VFI named CUSTOMER_VPLS. But what are 404012 and 404011? These must be the two VCs between PE1 ↔ PE2 and PE1 ↔ PE3. But which is which?

Using the somewhat obtuse command below, we can see which pseudoport belongs to which neighbor:

PE1#show platform software ethernet fp active vfi 
Total number of VFI neighbors: 2

VFI-name                BD    Peer-IP-Address  VC-ID       ShGrp   
-----------------------------------------------------------------
CUSTOMER_VPLS.404011    1     2.2.2.2          100         1       
CUSTOMER_VPLS.404012    1     3.3.3.3          100         1

From this information, we can see that the MAC ending in 4581 is known via 3.3.3.3 (PE3), and the MAC ending in 7133 is known via 2.2.2.2 (PE2).

Fixing MTU

The customer sends you a ticket stating that they cannot send packets larger than 1474 bytes. They include this output:

CE1#ping 2.2.2.2 source 1.1.1.1 size 1474 df-bit 
Type escape sequence to abort.
Sending 5, 1474-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/8/14 ms

CE1#ping 2.2.2.2 source 1.1.1.1 size 1475 df-bit 
Type escape sequence to abort.
Sending 5, 1475-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

Everything in the lab is at the default MTU of 1500. Why can the customer only send packets up to 1474 bytes?

This is because 1474 + 14 (Ethernet header) + 4 (control word) + 4 (MPLS transport label) + 4 (MPLS service label) = 1500

Let’s set the MTU in the MPLS core to the minimum value that allows the customer to send 1500-byte packets, but nothing larger. We will leave the control word on, unlike last time with the xconnect, where we turned it off.

What value should we use on our core interfaces?

1500 + 14 (Ethernet header) + 4 (control word) + 4 (transport label) + 4 (service label) = 1526

PE1(config)#int gi1
PE1(config-if)#mtu 1526

PE2(config)#int gi1
PE2(config-if)#mtu 1526

PE3(config)#int gi1
PE3(config-if)#mtu 1526

P4(config)#int range gi1-3
P4(config-if-range)#mtu 1526

Now 1500-byte pings work:

CE1#ping 2.2.2.2 source 1.1.1.1 size 1500 df-bit 
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 3/5/9 ms

CE1#ping 2.2.2.2 source 1.1.1.1 size 1501 df-bit 
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

Further Reading

https://www.cisco.com/c/en/us/td/docs/routers/asr920/configuration/guide/mpls/17-1-1/b-mp-l2-vpns-xe-17-1-asr920/b-mp-l2-vpns-xe-17-1-asr920_chapter_011.html

Luc De Ghein, MPLS Fundamentals, Ch. 11 VPLS
