MPLS-TE Fast Reroute (FRR)
Last updated
Fast Reroute is a feature in which a backup path is pre-calculated to protect either a link or node from failure. If the link or node goes down, the router immediately switches to the pre-calculated backup path, without having to wait for the IGP to converge and calculate a new best path. The figure you will usually see for failover time is sub-50 msec: Fast Reroute guarantees under 50 msec of downtime, while IGP reconvergence alone cannot guarantee this. The backup path is only used temporarily until the IGP calculates a new best path. Once the new best path is calculated, the tunnel switches over to it.
To examine this feature we’ll re-use our lab from the previous series:
First we’ll remove all configured tunnels on R1 and XR5. We’ll then set up a bidirectional TE tunnel between R1 and XR5 that uses the IGP shortest path of R1-R2-XR3-XR5. The tunnel will require 50M of bandwidth. For practice, try to set up the TE tunnel yourself without looking at the config below.
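As a reference, here is a minimal sketch of one way to build the two tunnels. The loopback/RID addresses (1.1.1.1 for R1, 5.5.5.5 for XR5) and the tunnel numbers are assumptions from this lab series; since the 50M requirement fits on the IGP shortest path, a dynamic path-option is enough. Note that both platforms take the bandwidth in kbps, so 50000 = 50 Mbps:

```
! R1 (IOS-XE)
interface Tunnel1
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 5.5.5.5
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng bandwidth 50000
 tunnel mpls traffic-eng path-option 10 dynamic

! XR5 (IOS-XR)
interface tunnel-te1
 ipv4 unnumbered Loopback0
 destination 1.1.1.1
 autoroute announce
 signalled-bandwidth 50000
 path-option 10 dynamic
```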
To verify the tunnel is up and the LSP works, let’s try a new method. Let’s enable MPLS OAM and do an mpls traceroute. MPLS OAM is already enabled on IOS-XE, but we’ll need to turn it on for XR3 and XR5. (If you would like to learn more about MPLS OAM be sure to check out the associated article on this website).
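On the XR boxes, MPLS OAM is enabled with a single global config command, and the traceroute can then be run from R1 (the tunnel number is an assumption):

```
! XR3 and XR5 (IOS-XR)
mpls oam

! R1 (IOS-XE) - trace the TE LSP itself
traceroute mpls traffic-eng tunnel 1
```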
As a comparison, let’s remove the autoroute announce and try the mpls traceroute again.
The request was not even sent out the interface, as seen by the return code Q. The 0th hop has no label, because the path in the CEF table to 5.5.5.5/32 is no longer through the tunnel interface; it is the IP forwarding path from the IGP.
Add autoroute announce back to the tun1 interface and we’ll continue on.
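For reference (tunnel numbers assumed as before):

```
! R1 (IOS-XE)
interface Tunnel1
 tunnel mpls traffic-eng autoroute announce

! XR5 (IOS-XR)
interface tunnel-te1
 autoroute announce
```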
To appreciate FRR we first need to evaluate what happens when a link failure occurs and we do not have FRR turned on.
Imagine that Gi3 on R2 goes down. The interface needs to detect the failure, and then inform the IGP of the link failure. ISIS generates a new LSP for R2.00-00 that is flooded and indicates its interface facing XR3 is down. All routers then have to run SPF and calculate new best paths in response to this failure.
What happens to the TE tunnel? R2 sends a PathError message to R1 and to XR5, for each of the tunnels that they are the headend for. Each headend then tears down its path and must calculate a new one. The new path is set up with a PATH message.
Let’s test this out by initiating an extended mpls ping from R1 to XR5, and shutting down the Gi3 interface on R2 while this ping is running.
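Something along these lines works on IOS-XE (the repeat count is arbitrary; it just needs to keep the ping running while you shut the interface):

```
R1# ping mpls ipv4 5.5.5.5/32 repeat 1000
```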
In my lab I lose a single ping during this.
A pcap taken on the link connecting R1 and R2 shows the PATH ERROR from R2, the PATH TEAR from R1, and the new PATH message from R1 with the newly calculated path.
Here are the details of the PATH ERROR message that R2 sends to R1:
Let’s now configure FRR, which will protect the Gi3 link on R2. FRR involves creating a second, backup TE tunnel. If the protected link goes down, traffic will be immediately steered along the backup path. The node, R2 in this case, will still send a PATH ERROR message, but it will indicate in this message that traffic is still flowing thanks to the backup tunnel. This gives the headend ample time to set up the new path and cut over to it.
All traffic is steered along this backup tunnel, even the RSVP messages themselves. If the backup tunnel stays up long enough, the primary tunnel will be refreshed and the RSVP messages will be tunneled through the backup tunnel. This can happen if the primary tunnel cannot find an alternate path after the link failure occurs. (Perhaps there is only a strict explicit-path which uses the failed link.)
The local node where you configure the backup tunnel is called the PLR (point of local repair). We will configure the backup tunnel on R2 because it is the local point where it will “repair” the link failure by routing around it. The PLR is the headend for the backup tunnel.
The backup tunnel destination will be XR3, as it is the nexthop along the main path. This is called the MP (merge point) because this is where the backup LSP merges back onto the main LSP. So R2 will setup a backup tunnel that takes the path R2-R4-XR3.
To do this we create a TE tunnel as usual, and simply add the command mpls traffic-eng backup-path <tunnel> to the interface of the link that we are protecting.
Notice that we do not use autoroute announce, as we don’t want to forward IP traffic over this LSP. We simply want to backup any tunnels that use Gi3.
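A sketch of the R2 side, assuming XR3’s RID is 3.3.3.3 and using an explicit path that excludes the address on the protected R2-XR3 link (the addresses and the path name are assumptions):

```
! R2 (IOS-XE) - backup tunnel to the NHOP (XR3) via R4
ip explicit-path name AVOID_GI3
 exclude-address 10.2.3.3           ! XR3's address on the protected link (assumed)
!
interface Tunnel100
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 3.3.3.3         ! XR3's RID (assumed)
 tunnel mpls traffic-eng path-option 10 explicit name AVOID_GI3
!
interface GigabitEthernet3
 mpls traffic-eng backup-path Tunnel100
```

Note there is no autoroute announce on Tunnel100; its only job is to back up tunnels that traverse Gi3.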
We also need to setup a tunnel in the reverse direction, from XR3 to R2.
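The XR3 side is similar; on IOS-XR the backup-path association lives under the mpls traffic-eng interface configuration (interface names and addresses are assumptions):

```
! XR3 (IOS-XR) - backup tunnel to R2 via R4
explicit-path name AVOID_LINK_TO_R2
 index 10 exclude-address ipv4 unicast 10.2.3.2   ! R2's address on the protected link (assumed)
!
interface tunnel-te100
 ipv4 unnumbered Loopback0
 destination 2.2.2.2                              ! R2's RID (assumed)
 path-option 10 explicit name AVOID_LINK_TO_R2
!
mpls traffic-eng
 interface GigabitEthernet0/0/0/0
  backup-path tunnel-te 100
```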
Additionally, on the main tunnels which are going to be protected, we have to enable fast-reroute. FRR protection is disabled by default. By enabling FRR protection, this will signal to other midpoint LSRs in the path to back this tunnel up, if they have a backup tunnel configured.
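On the headends this is a single command per tunnel (tunnel numbers assumed):

```
! R1 (IOS-XE)
interface Tunnel1
 tunnel mpls traffic-eng fast-reroute

! XR5 (IOS-XR)
interface tunnel-te1
 fast-reroute
```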
On R2 we can confirm that the tunnel for which R1 is the headend is “ready” to be protected by tun100:
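The command for this is shown below; the exact output varies by release, but the protected LSP should be listed with a status of “ready” against tun100:

```
R2# show mpls traffic-eng fast-reroute database
```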
On R2, shut down Gi3 and then immediately run the show command again. You should see the following output:
Within just a second, the original path will reconverge and the backup path will no longer be needed, as Gi3 is down and there is nothing to protect any longer:
Can you guess what traffic that takes the backup path looks like? How many labels are on packets sent from R2 to R4? And from R4 to XR3?
We’ll have to re-simulate the FRR again. Bring up Gi3 on R2 and then reoptimize the tunnel on R1.
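The reoptimization can be forced from exec mode on R1, rather than waiting for the periodic reoptimization timer:

```
R2(config)# interface GigabitEthernet3
R2(config-if)# no shutdown

R1# mpls traffic-eng reoptimize
```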
You’ll know it worked if you see the backup tunnel “ready” to protect the R1 tunnel again:
Now I will take a pcap on the link between R2-R4 and R4-XR3, while running an extended ping from R1 to XR5, then shutting down Gi3.
The first ping that R4 sees from R2 has two labels, 18/24004
18 is the label that R4 advertised for the backup tunnel that R2 created. 24004 is the label that XR3 advertised to R2 for the primary tunnel that is currently being backed up. If R2 were still able to deliver the traffic over the Gi3 link, the packet would carry only the label 24004.
R4 pops the top label off, and leaves XR3 with 24004, which it will pop and deliver to XR5. To XR3, it is as if the primary tunnel was still up but just got routed along a different path. 24004 is the label for the primary tunnel.
Notice in the above pcap that there is a PATH message at frame 794. After R1 sets up the new path, traffic has a single label. The R1 tunnel has a brand new path of R1-R2-R4-XR3-XR5, and R4 simply sees the label that it advertised to R2 for this tunnel. The backup protection is no longer required.
After the new path is fully set up, R4 will only see label 20. I had to run a new ping for this, as the router waits a while to ensure the new path is fully established before switching off the backup path and onto the new path.
We lost 0 pings this time with FRR enabled. As you can see, about 4 pings were protected by the backup path before ISIS and RSVP fully converged and set up the new path.
In the previous link-protection example, we created a NHOP (next-hop) backup tunnel. The next-hop of the regular tunnel was the destination for the backup tunnel at the PLR. If the link went down, the PLR router needed to get the traffic to the NHOP.
In node protection FRR, we are protecting against a node going down completely. So now we cannot build a tunnel to the NHOP, as that router will be down. Instead we build a tunnel to the NNHOP (next next-hop), which is the next-hop’s next-hop. Confusing, isn’t it? An example will surely help.
In order to better demonstrate this feature, we’ll add a node, R6, in between XR3 and XR5, and add a link between R4 and R6.
The tunnel will take the path R1-R2-XR3-R6-XR5 under normal conditions. We will protect XR3 from node failure. If it fails we will use the path R1-R2-R4-R6-XR5 to route around XR3.
Backup path which protects XR3 from failure.
XR3 is the node we are protecting. R2 is the PLR since its next-hop is XR3. The NNHOP would be XR3’s next-hop, which is R6. So the backup tunnel takes the path R2-R4-R6.
Let’s configure the topology with the new node, R6:
First we’ll ensure that the tunnel on R1 is taking the intended path.
There are two ECMP routes to 5.5.5.5 (R1-R2-XR3-R6-XR5 and R1-R2-R4-R6-XR5). Let’s set R2’s Gi3 TE metric to 5 so that it is preferred over Gi2.
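The TE metric is set per interface with the administrative-weight command:

```
R2(config)# interface GigabitEthernet3
R2(config-if)# mpls traffic-eng administrative-weight 5
```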
Next on the PLR, which is R2, we’ll create a node-protecting tunnel which will simply exclude XR3’s RID.
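A sketch, assuming XR3’s RID is 3.3.3.3 and R6’s is 6.6.6.6 (both addresses and the path name are assumptions):

```
! R2 (IOS-XE) - NNHOP backup tunnel to R6 that avoids the XR3 node entirely
ip explicit-path name AVOID_XR3
 exclude-address 3.3.3.3            ! XR3's RID (assumed)
!
interface Tunnel200
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 6.6.6.6         ! R6's RID (assumed)
 tunnel mpls traffic-eng path-option 10 explicit name AVOID_XR3
!
interface GigabitEthernet3
 mpls traffic-eng backup-path Tunnel200
```

Excluding the RID excludes every link attached to XR3, which is what forces the R2-R4-R6 path.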
Notice that the backup tunnel is configured in the same manner as the link-protecting backup tunnel. The only difference is that the node-protecting tunnel excludes a node instead of an interface, and the node-protecting tunnel has a destination of the NNHOP instead of the NHOP. The same link failure of Gi3 will trigger the backup tunnel.
The tricky aspect of the node protecting tunnel is that R2 must learn R6’s label for the main (protected) tunnel and push that label onto the packet. In addition it pushes the backup tunnel label that the backup tunnel’s NHOP advertised, which in this case is R4.
R6’s label for the primary (R1) tunnel is 18. The instruction is to pop the label and deliver to XR5:
R4’s label for the backup tunnel 200 is 20. The instruction is to pop the label and deliver to R6:
So if the Gi3 interface on R2 goes down, R2 should swap and then impose a label. We should see packets with a label stack of 20/18 on the link connecting R2 to R4. Let’s run an extended ping from R1 to XR5, shut down Gi3, and capture traffic on Gi2.
Indeed this is the label stack that we see:
The question is, how did R2 learn R6’s label for the main tunnel without something like targeted LDP? The answer is that when tun1 on R1 was configured for fast-reroute, R1 set the flag Label Recording: Desired in the RSVP PATH message. This tells every hop to record its label in the RESV message. Normally only the NHOP’s label is visible in the RESV message.
The RRO (record route object) in the RESV message then shows each hop’s reserved label for this tunnel:
Using this, R2 can see that R6’s label is 18. Also note the phrase “The label will be understood if received on any interface.” This means that R6 will understand label 18 even if it is not received on the interface connecting to XR3. Pretty neat!
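One place to inspect the recorded labels on R2 is the detailed RSVP reservation state for the session, where the RRO with per-hop labels appears:

```
R2# show ip rsvp reservation detail
```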
Now you’ve seen FRR for MPLS-TE in action. Link-protection uses a tunnel built to the NHOP that avoids the link you are protecting. Node-protection uses a tunnel built to the NNHOP that avoids the node you are protecting. In both cases, link failure triggers the activation of the backup tunnel.
As you can see, building backup tunnels like this is quite a manual process. Cisco does offer an auto-tunnel backup feature that builds NHOP and NNHOP backup tunnels automatically, and back in the day there was also standalone software that would discover your IGP topology and pre-build these backup tunnels so that you did not have to.
These days you are much better off going with Segment Routing. With SR you can create loop-free alternate paths with 100% coverage of your network using a single command. Additionally, SR does not keep state in the network: with MPLS-TE FRR we had to keep state for every single backup tunnel, whereas with SR the PLR simply pushes a label stack to steer traffic around a failed link or node. Similar to how R2 knew all the routers’ labels for R1’s tun1 tunnel, in SR every node knows every other node’s label via the IGP!