E-LAN (VPLS)
E-LAN is a simulated MP2MP (multipoint-to-multipoint) service. The service provider infrastructure acts as one large layer 2 switch, providing a simulated broadcast domain.
Why is it MP2MP and not P2MP? Because no single port is a root: every port can send to every other port ("any to any"). If there were a single root, or hub, that could send to all spokes, that would be P2MP.
The technology we will focus on which enables this is called VPLS (Virtual Private LAN Service).
In many ways, VPLS is very similar to AToM, in that it is an L2VPN service which uses two labels: a transport label which represents the egress PE, and a service label which represents the VC ID.
Another similarity to AToM is that VPLS uses targeted LDP. But because there are now many egress PEs, and not just one (as with xconnect), each PE forms a targeted LDP session with every other PE. This means there is a full mesh of LDP neighborships between all PEs participating in the VPLS. (VPLS can also use BGP for autodiscovery and service label signaling. We will explore this in a future article.)
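Once the VFI members are configured (shown in the lab below), you can confirm the full mesh of targeted sessions on any PE with the standard LDP show commands. These are just illustrative commands to run, not output from this lab:

PE1#show mpls ldp discovery
PE1#show mpls ldp neighbor

The discovery output lists the targeted hellos sent to each remote PE, and the neighbor output should show one session per remote PE in the VPLS.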
The main difference between VPLS and AToM, or "xconnect," is that the routers must perform bridging functions to simulate a LAN. This involves:
Learning MAC addresses, and aging them out based on a timer
Forwarding frames based on the destination MAC address
Flooding BUM traffic (broadcast, unknown unicast, and multicast)
Loop prevention
With AToM, the router simply transports every frame it receives to the remote egress PE. No bridging/switching functions are needed.
So how does the router know all the other PEs in the VPLS domain in order to form targeted LDP sessions with them? There are two ways: manual configuration of every neighbor PE, or autodiscovery of neighbor PEs using MP-BGP. In this article we will use manual configuration for simplicity in order to focus on the VPLS technology itself, and then add autodiscovery via MP-BGP later.
Lab
Here is our topology:
This service should appear as a giant L2 switch to the customer:
The service provider routers are pre-configured with loopback IPs, LDP, and OSPF.
Here is the VPLS configuration:
PE1
interface GigabitEthernet2
service instance 100 ethernet
encapsulation default
!
l2vpn vfi context CUSTOMER_VPLS
vpn id 100
member 2.2.2.2 encapsulation mpls
member 3.3.3.3 encapsulation mpls
!
bridge-domain 1
member GigabitEthernet2 service-instance 100
member vfi CUSTOMER_VPLS
PE2
interface GigabitEthernet2
service instance 100 ethernet
encapsulation default
!
l2vpn vfi context CUSTOMER_VPLS
vpn id 100
member 1.1.1.1 encapsulation mpls
member 3.3.3.3 encapsulation mpls
!
bridge-domain 1
member GigabitEthernet2 service-instance 100
member vfi CUSTOMER_VPLS
PE3
interface GigabitEthernet2
service instance 100 ethernet
encapsulation default
!
l2vpn vfi context CUSTOMER_VPLS
vpn id 100
member 1.1.1.1 encapsulation mpls
member 2.2.2.2 encapsulation mpls
!
bridge-domain 2
member GigabitEthernet2 service-instance 100
member vfi CUSTOMER_VPLS
VFI stands for Virtual Forwarding Instance.
The bridge-domain is needed to group together physical ports and virtual circuits (the VCs being defined in the referenced VFI). The bridge-domain provides the switching functionality, allowing the PE to learn MAC addresses.
You may have noticed that on PE3 I used bridge-domain 2 instead of bridge-domain 1. This is to demonstrate that the bridge-domain number is locally significant. It is not a VLAN ID.
To expand on this concept, imagine this switch configuration:
switch#show run | sec vlan 100
vlan 100
name my-vlan
int gi0/0
switchport access vlan 100
int gi0/1
switchport access vlan 100
int gi0/48
switchport mode trunk
switchport trunk allowed vlan add 100
On the switch, "vlan 100" creates the VLAN, and ports gi0/0 and gi0/1 belong to it. If frames from gi0/0 or gi0/1 are transported over the trunk on gi0/48, an 802.1Q tag of 100 is added. So the number 100 both defines a domain of bridging and specifies the tag to insert on traffic sent over a trunk port.
However, on the router running VPLS, "bridge-domain 2" does not define a tag to impose on frames. The number 2 is simply used to create multiple, separate bridge-domains and differentiate them from each other.
Some people refer to this concept as a separation of the VLAN tag and the flooding domain. The tag on frames from the CE can be anything: double tagged, single tagged, or not tagged at all. You can pop tags, push tags, and translate tags. All of this is done on the service instance. The flooding domain is defined by the bridge-domain and has nothing to do with what tag the frames happen to have.
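For example, a hypothetical service instance (not part of this lab; the instance number and tag value below are made up for illustration) could match customer frames tagged with VLAN 200 and pop the tag on ingress, while still being placed into the same flooding domain:

interface GigabitEthernet2
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
!
bridge-domain 1
 member GigabitEthernet2 service-instance 200

The tag handling (encapsulation and rewrite) lives entirely on the service instance; the bridge-domain only defines which ports and VCs flood to each other.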
Moving on, the service instance number is also locally significant, and does not need to match (even though it does in my example).
The only thing that needs to be consistent on all PEs in the VPLS domain is the vpn id. This is similar to the pw-id, or xconnect ID, that we saw with the pseudowire service.
Flooding Behaviour
With the pseudowire/xconnect service, there was no consideration for flooding behaviour. No matter what the layer 2 payload was, the PE simply encapsulated the frame and transported it to the other end.
However with VPLS, the service provider network has to act as a switch. BUM traffic must be flooded without causing loops, and unicast traffic must be delivered to the correct PE without flooding.
If the core routers were switches, the topology would look like this:
You might wonder: why is each PE directly connected to the others in this diagram, when the first diagram had P4 in the middle and each PE connected to P4?
Think of the IP/MPLS network as an underlay, with the VCs forming an overlay.
By manually specifying each neighbor PE in the l2vpn vfi definition, the router creates a pseudowire, or VC (virtual circuit), to each neighbor. If PE1 knows that MAC X is behind PE2, it should encapsulate the traffic with an MPLS header for PE2's loopback. If PE1 knows that MAC Y is behind PE3, it should encapsulate the traffic with a header for PE3's loopback. In this way, you should think of the overlay topology as the diagram above, with each PE directly connected to every other PE.
So if the PE routers were switches, how would they prevent broadcast storms? You would likely have STP running, and one port would be in the Blocking state, like this:
But it would be pretty ugly to run STP over MPLS pseudowires, right? Instead there is a split-horizon rule, which states the following: If a BUM frame is received on a VC, then the router must not forward the frame out any other VC.
Here is an example. The dotted lines are VCs (virtual circuits) and the solid lines are ACs (attachment circuits).
CE1 originates a broadcast frame
PE1 receives it on an AC and therefore floods it out all other ports, including all VCs
PE2 and PE3 receive the frame on a VC, so they forward it out their ACs only. They cannot forward a copy out any other VC.
As part of this process, PE2 and PE3 learn the MAC address aaaa.aaaa.0001 via the pseudowire to PE1, if they have not learned the MAC address already.
This split-horizon behaviour is on by default. To turn it off you would configure the l2vpn vfi like this:
PE1#
l2vpn vfi context CUSTOMER_VPLS
vpn id 100
member 3.3.3.3 encapsulation mpls no-split-horizon
member 2.2.2.2 encapsulation mpls no-split-horizon
This is actually a technique for creating a P2MP service, known as E-Tree. We will get to this later, but for now, know that you want split-horizon turned on when using VPLS, and it is on by default.
VC setup
Let's examine pcaps of the LDP mapping messages to see how each PE learns the label to use for each VC in this VPLS service.
PE1's mapping message:
PE2's mapping message:
PE3's mapping message:
To my eye, there is no difference between these mapping messages and the mapping messages for the xconnects. This means that the control plane is exactly the same for xconnects as for VPLS: essentially, the control plane for VPLS is a full mesh of pseudowires. The difference is the data plane, and how the router inspects the layer 2 traffic and makes a forwarding/switching decision.
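If you don't have captures handy, the signaled labels can also be checked directly on the router. This is the standard pseudowire binding command (illustrative; output not shown here):

PE1#show mpls l2transport binding

It lists the local and remote VC labels for each neighbor and VC ID.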
Verification
The verbose show command that we used for xconnects also works well for VPLS. It shows the label stack, packet statistics, LDP status, and more.
PE1#show mpls l2transport vc detail
Local interface: VFI CUSTOMER_VPLS vfi up
Interworking type is Ethernet
Destination address: 2.2.2.2, VC ID: 100, VC status: up
Output interface: Gi1, imposed label stack {17 21}
Preferred path: not configured
Default path: active
Next hop: 10.1.4.4
Create time: 01:22:02, last status change time: 00:11:40
Last label FSM state change time: 00:12:43
Signaling protocol: LDP, peer 2.2.2.2:0 up
Targeted Hello: 1.1.1.1(LDP Id) -> 2.2.2.2, LDP is UP
Graceful restart: not configured and not enabled
Non stop routing: not configured and not enabled
Status TLV support (local/remote) : enabled/supported
LDP route watch : enabled
Label/status state machine : established, LruRru
Last local dataplane status rcvd: No fault
Last BFD dataplane status rcvd: Not sent
Last BFD peer monitor status rcvd: No fault
Last local AC circuit status rcvd: No fault
Last local AC circuit status sent: No fault
Last local PW i/f circ status rcvd: No fault
Last local LDP TLV status sent: No fault
Last remote LDP TLV status rcvd: No fault
Last remote LDP ADJ status rcvd: No fault
MPLS VC labels: local 21, remote 21
Group ID: local n/a, remote 0
MTU: local 1500, remote 1500
Remote interface description:
Sequencing: receive disabled, send disabled
Control Word: On (configured: autosense)
SSO Descriptor: 2.2.2.2/100, local label: 21
Dataplane:
SSM segment/switch IDs: 8200/8194 (used), PWID: 1
VC statistics:
transit packet totals: receive 84, send 91
transit byte totals: receive 9763, send 10947
transit packet drops: receive 79, seq error 0, send 0
Local interface: VFI CUSTOMER_VPLS vfi up
Interworking type is Ethernet
Destination address: 3.3.3.3, VC ID: 100, VC status: up
Output interface: Gi1, imposed label stack {18 21}
Preferred path: not configured
Default path: active
Next hop: 10.1.4.4
Create time: 01:01:18, last status change time: 00:12:38
Last label FSM state change time: 00:12:38
Signaling protocol: LDP, peer 3.3.3.3:0 up
Targeted Hello: 1.1.1.1(LDP Id) -> 3.3.3.3, LDP is UP
Graceful restart: not configured and not enabled
Non stop routing: not configured and not enabled
Status TLV support (local/remote) : enabled/supported
LDP route watch : enabled
Label/status state machine : established, LruRru
Last local dataplane status rcvd: No fault
Last BFD dataplane status rcvd: Not sent
Last BFD peer monitor status rcvd: No fault
Last local AC circuit status rcvd: No fault
Last local AC circuit status sent: No fault
Last local PW i/f circ status rcvd: No fault
Last local LDP TLV status sent: No fault
Last remote LDP TLV status rcvd: No fault
Last remote LDP ADJ status rcvd: No fault
MPLS VC labels: local 22, remote 21
Group ID: local n/a, remote 0
MTU: local 1500, remote 1500
Remote interface description:
Sequencing: receive disabled, send disabled
Control Word: On (configured: autosense)
SSO Descriptor: 3.3.3.3/100, local label: 22
Dataplane:
SSM segment/switch IDs: 16393/4101 (used), PWID: 2
VC statistics:
transit packet totals: receive 2, send 84
transit byte totals: receive 198, send 10075
transit packet drops: receive 2, seq error 0, send 0
Let's prove that the signaled MTU must match on all PEs. With an xconnect, the MTU of the AC interface itself is signaled. But with VPLS, you might have multiple ACs on the PE that belong to the same VPLS domain. So where is the MTU configured?
The answer is that the MTU is configured under the VFI, and it defaults to 1500.
Let’s change it to 1600 on PE3 to see what happens.
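The change is made under the VFI context, mirroring the commands we will use later to set it back:

PE3(config)#l2vpn vfi context CUSTOMER_VPLS
PE3(config-vfi)#mtu 1600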
PE3#show run | sec l2vpn
l2vpn vfi context CUSTOMER_VPLS
vpn id 100
mtu 1600
member 2.2.2.2 encapsulation mpls
member 1.1.1.1 encapsulation mpls
Now on PE1, we see the VC to PE3 as down.
PE1#show mpls l2transport vc
Local intf Local circuit Dest address VC ID Status
------------- -------------------------- --------------- ---------- ----------
VFI CUSTOMER_VPLS \
vfi 2.2.2.2 100 UP
VFI CUSTOMER_VPLS \
vfi 3.3.3.3 100 DOWN
PE1#show mpls l2transport vc destination 3.3.3.3 detail
Local interface: VFI CUSTOMER_VPLS vfi up
Interworking type is Ethernet
Destination address: 3.3.3.3, VC ID: 100, VC status: down
Last error: Pseudowire MTU mismatch with peer
<snip>
Let’s put the MTU back to its default of 1500.
PE3(config)#l2vpn vfi context CUSTOMER_VPLS
PE3(config-vfi)#mtu 1500
CE Configuration
Let’s configure all CEs in the same subnet and enable OSPF. Remember to leave the OSPF network type as the default (broadcast).
CE1#
interface GigabitEthernet0/0
ip address 10.1.1.1 255.255.255.0
!
int lo0
ip address 1.1.1.1 255.255.255.255
!
router ospf 1
network 0.0.0.0 255.255.255.255 area 0
CE2#
interface GigabitEthernet0/0
ip address 10.1.1.2 255.255.255.0
!
int lo0
ip address 2.2.2.2 255.255.255.255
!
router ospf 1
network 0.0.0.0 255.255.255.255 area 0
CE3#
interface GigabitEthernet0/0
ip address 10.1.1.3 255.255.255.0
!
int lo0
ip address 3.3.3.3 255.255.255.255
!
router ospf 1
network 0.0.0.0 255.255.255.255 area 0
All CE routers see each other as OSPF neighbors:
CE1#show ip ospf nei
Neighbor ID Pri State Dead Time Address Interface
2.2.2.2 1 FULL/DROTHER 00:00:37 10.1.1.2 GigabitEthernet0/0
3.3.3.3 1 FULL/DR 00:00:36 10.1.1.3 GigabitEthernet0/0
“Routers as switches”
Now that our CEs are functional, let's explore how the PE routers perform switching functions.
First, let's look at the CAM table on PE1:
PE1#show bridge-domain
Bridge-domain 1 (3 ports in all)
State: UP Mac learning: Enabled
Aging-Timer: 300 second(s)
Maximum address limit: 65536
GigabitEthernet2 service instance 100
vfi CUSTOMER_VPLS neighbor 2.2.2.2 100
vfi CUSTOMER_VPLS neighbor 3.3.3.3 100
AED MAC address Policy Tag Age Pseudoport
0 5254.000F.1800 forward dynamic 296 GigabitEthernet2.EFP100
0 5254.0013.4581 forward dynamic 295 CUSTOMER_VPLS.404012
0 5254.0016.7133 forward dynamic 296 CUSTOMER_VPLS.404011
Here we see three MAC addresses have been learned. As you may have noticed, the outgoing port for a MAC address can either be a physical port or a pseudoport.
The first MAC address is known via Gi2 on EFP100, which is service instance 100. (EFP = Ethernet Flow Point, which is what is created by the service instance).
The two other MAC addresses are known via the VFI named CUSTOMER_VPLS. But what are 404012 and 404011? These must be the two VCs between PE1 ↔ PE2 and PE1 ↔ PE3. But which is which?
Using the somewhat obscure command below, we can see which pseudoport belongs to which neighbor:
PE1#show platform software ethernet fp active vfi
Total number of VFI neighbors: 2
VFI-name BD Peer-IP-Address VC-ID ShGrp
-----------------------------------------------------------------
CUSTOMER_VPLS.404011 1 2.2.2.2 100 1
CUSTOMER_VPLS.404012 1 3.3.3.3 100 1
From this information, we can see that the MAC ending in 4581 is known via 3.3.3.3 (PE3), and the MAC ending in 7133 is known via 2.2.2.2 (PE2).
Fixing MTU
The customer sends you a ticket stating that they cannot send packets larger than 1474 bytes. They include this output:
CE1#ping 2.2.2.2 source 1.1.1.1 size 1474 df-bit
Type escape sequence to abort.
Sending 5, 1474-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/8/14 ms
CE1#ping 2.2.2.2 source 1.1.1.1 size 1475 df-bit
Type escape sequence to abort.
Sending 5, 1475-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
Everything in the lab is at the default MTU of 1500. Why can the customer only send packets up to 1474 bytes?
This is because 1474 + 14 (ethernet header) + 4 (control word) + 4 (MPLS top label) + 4 (MPLS bottom label) = 1500
Let's set the MTU in the MPLS core to the minimum value that allows the customer to send 1500-byte packets, but nothing larger. This time let's leave the control word on, unlike the xconnect lab, where we turned it off.
What value should we use on our core interfaces?
1500 + 14 (ethernet) + 4 (control word) + 4 (transport label) + 4 (service label) = 1526
PE1(config)#int gi1
PE1(config-if)#mtu 1526
PE2(config)#int gi1
PE2(config-if)#mtu 1526
PE3(config)#int gi1
PE3(config-if)#mtu 1526
P4(config)#int range gi1-3
P4(config-if-range)#mtu 1526
Now 1500 byte size pings work:
CE1#ping 2.2.2.2 source 1.1.1.1 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 3/5/9 ms
CE1#ping 2.2.2.2 source 1.1.1.1 size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
Further Reading
Luc De Ghein, MPLS Fundamentals, Ch. 11 VPLS