TI-LFA Pt. 1 (Theory)

To understand and appreciate TI-LFA, you first need to understand the fast reroute mechanisms that came before it. First there was LFA, which offers a loop-free alternate route, but only when a directly connected neighbor on the alternate path won’t loop traffic back to the PLR (point of local repair). rLFA (Remote LFA) came next. It uses targeted LDP to learn the labels of PQ nodes that are not directly connected, which extends coverage to cases that LFA cannot protect. This first section will briefly explore the theory behind these mechanisms, and then we will take a look at TI-LFA.

Before we start, I want to mention that Nick Russo already has a series on this that is much better than anything I could produce. I highly recommend reading his series, which I will link below. This series you are reading now will be much more condensed.

Nick’s Series on LFA, rLFA, and TI-LFA:

Part 1: https://www.allhandsontech.com/it-ops/cisco/ip-loop-free-alternatives-withospfv2/

Part 2: https://www.allhandsontech.com/it-ops/cisco/exploring-downstream-and-node-protecting-ip-lfas-ospfv2/

Part 3: https://www.allhandsontech.com/it-ops/cisco/examining-broadcast-disjointedness-and-ip-lfa-coverage-with-ospfv2/

Part 4: https://www.allhandsontech.com/it-ops/cisco/improving-ip-lfa-coverage-using-remote-lfa-with-mpls-and-ospfv2/

Part 5: https://www.allhandsontech.com/it-ops/cisco/maximizing-ip-lfa-coverage-using-topology-independent-lfa-and-segment-routing/

Fast Reroute

LFA, rLFA, and TI-LFA are all fast-reroute technologies. The purpose of fast-reroute is to reduce downtime when a failure (link or node) occurs in the network. The idea behind all of these technologies is that a router pre-programs a backup path that it can guarantee will be loop-free upon a given failure. This backup path is programmed into the FIB. The router immediately begins using this path upon failure while the IGP re-converges. The backup path needs to be loop-free so that other routers will forward the traffic properly before their own IGP has converged. This typically keeps traffic loss under 50 ms.

If you haven’t explored FRR with MPLS-TE then I suggest you read that article first. This series of articles re-uses terms such as PLR and NHOP. LFA/rLFA/TI-LFA are all fast reroute technologies that work in a similar manner to MPLS-TE FRR.

The difference between FRR with LFA/rLFA/TI-LFA and FRR with MPLS-TE is that MPLS-TE is circuit-based. Its fast reroute protects an entire link: the backup must avoid that link but still get back to the NHOP in order to stitch the circuit back together. The resulting path may be suboptimal from the perspective that the traffic ultimately just needs to reach the egress PE. Instead, it avoids the failure and then returns to the NHOP of the original TE tunnel.

LFA/rLFA/TI-LFA are IPFRR (IP Fast Reroute) technologies. These are IP prefix-based and not circuit-based like MPLS-TE. IPFRR installs backup routes per-prefix, which gives you ECMP ability and optimal forwarding, which MPLS-TE lacks.

Another advantage of IPFRR over MPLS-TE FRR is that MPLS-TE FRR requires state in the network, while IPFRR does not. This is the same advantage that SR has over RSVP-TE in general.

LFA

A repair path is an LFA if a directly connected neighbor’s shortest path to the destination does not use the protected link or node under normal conditions (when all links are up).

The formula that represents this is:

Dist(N,D) < Dist(N, PLR) + Dist(PLR, D)

"Neighbor's distance to D" < "Neighbor's distance to me" + "My distance to D"
  • N is the neighbor

  • D is the destination

  • PLR is the local node

This guarantees that the repair path is loop-free because the neighbor’s distance to the destination is less than the cost of the neighbor’s path back through the local node. The neighbor’s shortest path therefore does not pass through the local node, so the packet won’t loop back to the local node when the protected link fails. Quite literally the path is a loop-free alternate (LFA) path to the destination.

If the inequality does not hold, then when the link fails, but before the neighbor “hears” of the failure via the IGP, the neighbor may loop traffic back to the PLR. When the IGP converges on the neighbor, the loop stops. This looping of packets during the period before the IGP converges is called micro-looping.

Example

R1’s distance to R6 is 30. If the R1-R2 link in red goes down, R1 needs to use R3 as a backup path. R3’s distance to R6 is 35. R1’s distance to R3 is 10. Let’s plug these numbers into the formula:

Dist(N,D) < Dist(N, PLR) + Dist(PLR, D)
    35    <      10      +      30
    35    <      40

The inequality is valid, therefore R1 can use R3 as an LFA. Intuitively this should make sense. If R3 receives a packet destined for R6, it will send it to R5 regardless of whether the R1-R2 link is up or not. Therefore it is safe for R1 to use R3 for the backup path.
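The check above is small enough to express directly in code. The following is a minimal Python sketch; the function name `is_lfa` and its parameter names are made up for illustration and do not correspond to any router implementation.

```python
def is_lfa(dist_n_d: int, dist_n_plr: int, dist_plr_d: int) -> bool:
    """Return True when neighbor N satisfies the basic LFA inequality.

    The comparison is strict: on a tie, N has an equal-cost path back
    through the PLR, so it could still loop traffic.
    """
    return dist_n_d < dist_n_plr + dist_plr_d


# Values from the example: Dist(R3,R6)=35, Dist(R3,R1)=10, Dist(R1,R6)=30
r3_is_lfa = is_lfa(dist_n_d=35, dist_n_plr=10, dist_plr_d=30)
print(r3_is_lfa)  # True, since 35 < 40
```

Because the inequality is strict, a tie such as `is_lfa(40, 10, 30)` returns `False`.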

What happens if we increase the cost of the R3-R5 link to 25?

R1 can no longer use R3 as an LFA, as R3 now has two ECMP routes to R6: 40 via R5, and 40 via R1. The inequality is no longer valid:

Dist(N,D) < Dist(N, PLR) + Dist(PLR, D)
    40    <      10      +      30
    40    <      40

In this case there is no LFA. But what if we could tunnel the traffic directly to R5 and “bypass” R3’s IP forwarding decision? If R5 gets the traffic it should deliver it safely to R6, right? This is where rLFA (Remote LFA) comes in.

Remote LFA

Remote LFA extends LFA functionality past directly connected neighbors. This allows a node to direct backup traffic directly to another router that is not directly connected.

To determine whether a node can be a Remote LFA, we need to understand the concept of P-space and Q-space. P-space is the set of nodes that the PLR can reach along its shortest path without taking the protected link/node. Q-space is the set of nodes whose shortest path to the destination does not take the protected link/node.

It may be helpful to realize that neither P nodes nor Q nodes may traverse the protected link for the given destination. For P nodes, from their perspective, the destination is the local router itself (R1). For Q nodes, the destination is the actual destination, the router that owns the prefix we are protecting (R6).

Example

In our topology, R1 has a best path to R3 and R5 without using the protected link. This is the P-space. R5, R4, and R2 have best paths to R6 (the destination) also without using the protected link. This is the Q-space.

R5 is in both the P-space and the Q-space, making it a PQ node. Therefore if R1 can tunnel traffic to R5 directly, we know that R5 will not loop the traffic back towards R1. R5 is a Remote LFA.
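To make the P-space and Q-space idea concrete, here is a rough Python sketch that computes both sets with Dijkstra. The link costs are an assumption: they are reconstructed from the distances quoted in this article (R3-R5 at 25, R5-R6 at 15), and the 10/10/10 split along R1-R2-R4-R6 is a guess that merely sums to the stated 30. Symmetric link costs are also assumed, and all helper names are invented for this example.

```python
import heapq

# Assumed link costs, reconstructed from the distances quoted in the text.
LINKS = {
    ("R1", "R2"): 10, ("R2", "R4"): 10, ("R4", "R6"): 10,
    ("R1", "R3"): 10, ("R3", "R5"): 25, ("R5", "R6"): 15,
}
PLR, DEST, PROTECTED = "R1", "R6", ("R1", "R2")


def build_graph(links):
    """Build an undirected adjacency map (symmetric costs assumed)."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def some_shortest_path_uses_link(dist, src, dst, link, cost):
    """True if any equal-cost shortest path src -> dst crosses the link."""
    a, b = link
    best = dist[src][dst]
    return (dist[src][a] + cost + dist[b][dst] == best
            or dist[src][b] + cost + dist[a][dst] == best)


graph = build_graph(LINKS)
dist = {node: dijkstra(graph, node) for node in graph}
cost = LINKS[PROTECTED]

# P-space: nodes the PLR reaches without any shortest path over the link.
p_space = {n for n in graph if n != PLR
           and not some_shortest_path_uses_link(dist, PLR, n, PROTECTED, cost)}
# Q-space: nodes that reach the destination without any shortest path over the link.
q_space = {n for n in graph if n != DEST
           and not some_shortest_path_uses_link(dist, n, DEST, PROTECTED, cost)}

print(sorted(p_space))            # ['R3', 'R5']
print(sorted(q_space))            # ['R2', 'R4', 'R5']
print(sorted(p_space & q_space))  # ['R5']
```

Note that a node is excluded as soon as any of its equal-cost shortest paths crosses the protected link, which is why R3 (with its ECMP route via R1) does not appear in the Q-space.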

We now use the same inequality as before, but R5 is our neighbor instead of R3. This is because we can tunnel traffic directly to R5.

Dist(N,D) < Dist(N, PLR) + Dist(PLR, D)
    15    <      35      +      30
    15    <      65

But how do we direct traffic from R1 directly to R5 during a link failure event? R1 and R5 become targeted LDP neighbors. R1 learns R5’s LDP bindings and vice-versa. R1 imposes two labels onto traffic and sends it to R3: { R3’s advertised LDP label for R5, R5’s LDP label for R6 }. This means that R3 will pop the top label and deliver the packet to R5, leaving R5 with its own LDP label for R6 which it will also pop and forward to R6. R1 learns R5’s LDP label for R6 via the targeted LDP session.
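The order of the two labels is easy to get backwards, so here is a tiny Python sketch of the stack as it moves hop by hop. The label values are hypothetical and chosen only for illustration. The sketch also assumes penultimate hop popping, as described above.

```python
# Hypothetical label values, chosen only for illustration.
R3_LABEL_FOR_R5 = 24005  # label R3 advertised to R1 for R5
R5_LABEL_FOR_R6 = 24106  # label R5 advertised for R6, learned via targeted LDP

# The first element is the top of the stack.
stack = [R3_LABEL_FOR_R5, R5_LABEL_FOR_R6]
trace = [("R1 -> R3", list(stack))]

# R3 is the penultimate hop toward R5, so it pops the top label.
stack.pop(0)
trace.append(("R3 -> R5", list(stack)))

# R5 is the penultimate hop toward R6, so it pops its own label for R6.
stack.pop(0)
trace.append(("R5 -> R6", list(stack)))

for hop, labels in trace:
    print(hop, labels)
```

The outer label only gets the packet to R5. The inner label is what tells R5 which destination the packet is for.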

The problem with LFA and rLFA

What happens if we increase the cost of the R5 - R6 link to 100? Now R5 is no longer a Q node, as its shortest path to R6 is the “other way” around the ring (R5-R3-R1-R2-R4-R6) with a cost of 65.

We no longer have a PQ node. We can say that LFA and rLFA are topology dependent, because even though a backup path does exist (R1 can use R3-R5-R6 in the case of R1-R2 link failure), whether we can use the nodes along the alternate path depends on the topology.

This is where TI-LFA comes in. TI-LFA stands for Topology-Independent LFA. No matter what the topology looks like, TI-LFA can guarantee 100% coverage, as long as some alternate path exists, because of the use of adjacency-SIDs. TI-LFA requires SR, while LFA and rLFA do not.

TI-LFA

SR’s ability to do source-based routing is what makes TI-LFA so effective. With SR, a given node can steer traffic to any other node in the network by using a stack of labels that represent other nodes (prefix-SIDs) or links (adjacency-SIDs).

With TI-LFA we no longer have to take a neighbor’s IGP distance to the destination into account. This doesn’t matter any longer, as we can use adjacency-SIDs to force traffic out a link, bypassing the IP routing process completely.

Because of this, TI-LFA can use the post-convergence path as the backup path, which is not the case in regular LFA. In regular LFA, the backup path is not always the same path that the IGP selects once it reconverges after the link failure. When TI-LFA uses the post-convergence path, this means that while the IGP is reconverging, the backup path is already taking the exact path that the IGP will select! When you consider that this is done with zero state in the network, TI-LFA is really a huge improvement over LFA and rLFA.

To calculate a backup path, TI-LFA takes the current outgoing interface or next-hop node and removes it from the SPF tree. Then it runs SPF to find the best path when that interface/node goes down. It encodes this path as a list of segments and programs it into the FIB. This is done on a per-prefix basis. The number of segments is kept to the minimum required so that the MPLS label stack does not get too large. For example, if a single prefix-SID can represent hop #3, and hops 1 and 2 have a shortest path to hop 3 without traversing the protected link/node, then there is no point in using a label stack of {hop1, hop2, hop3} when {hop3} suffices. This optimization happens automatically on the router.

Example

TI-LFA will remove the R1-R2 link and calculate the best path, which is R1-R3-R5-R6 with a cost of 135. In fact this is the only available path. TI-LFA calculates the SID list as { prefix-SID for R5, adjacency-SID for R5-R6 link advertised by R5}. R3 will pop the prefix-SID representing R5 and forward it out the R3-R5 link. R5 will pop the adjacency-SID and forward the traffic out the R5-R6 link. The adjacency-SID is used to force traffic out the R5-R6 interface even though it is a longer path in terms of IGP cost.
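The "remove the link and rerun SPF" step can be sketched in a few lines of Python. As before, the link costs are an assumption reconstructed from the distances quoted in this article (with R5-R6 raised to 100), and `shortest_path` is a hypothetical helper, not a router API. The sketch only finds the post-convergence path and its cost. It does not derive the SID list, and it assumes the destination stays reachable after the link is removed.

```python
import heapq

# Assumed link costs, reconstructed from the text, with R5-R6 raised to 100.
LINKS = {
    ("R1", "R2"): 10, ("R2", "R4"): 10, ("R4", "R6"): 10,
    ("R1", "R3"): 10, ("R3", "R5"): 25, ("R5", "R6"): 100,
}
PLR, DEST, PROTECTED = "R1", "R6", ("R1", "R2")


def shortest_path(links, source, target, excluded=None):
    """Dijkstra returning (cost, path), optionally ignoring one link."""
    graph = {}
    for (a, b), cost in links.items():
        if excluded and {a, b} == set(excluded):
            continue  # pretend the protected link has failed
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    # Walk the predecessor chain back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


primary = shortest_path(LINKS, PLR, DEST)
backup = shortest_path(LINKS, PLR, DEST, excluded=PROTECTED)
print(primary)  # (30, ['R1', 'R2', 'R4', 'R6'])
print(backup)   # (135, ['R1', 'R3', 'R5', 'R6'])
```

The backup result is the same path the IGP would pick after reconverging without the R1-R2 link, which is what lets TI-LFA pre-install the post-convergence path.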

What is great about TI-LFA is that you can run both LDP and SR with sr-prefer not configured, and TI-LFA can still provide fast reroute protection. If a link goes down, TI-LFA can step in and impose SR labels in order to steer around the failure. 100% coverage is only achieved if all routers in the topology are running SR. If a PQ router does not run SR, for example, this cannot work, as the PLR would need to somehow learn the PQ router’s LDP label for the destination. In addition to protecting LDP traffic, TI-LFA can also protect IP (unlabeled) traffic in the same manner.

Further Reading

Nick Russo’s blog posts mentioned above

YouTube video on TI-LFA: https://www.youtube.com/watch?v=WEPiq4drHXw&ab_channel=BarbaraAnne
