LDP/IGP Sync

LDP/IGP Sync may seem complex, but I hope by the end of the article you find this feature intuitive and easy to understand.

In an MPLS environment, the IGP is generally needed to advertise and provide reachability to the /32 loopbacks of PEs in the network. These /32s are the endpoints that identify ingress and egress routers for VPN services.

LDP is needed to advertise labels for these /32 loopbacks to directly connected neighbors. If the IGP is up and a router has a route to the /32 of a PE, but LDP is not up, the router may IP-route the traffic instead of label switching it. This can break VPNs: the packet becomes unlabeled, and an LSR in the middle has no idea that the traffic actually belongs to an L2VPN or L3VPN.

The workaround to prevent this issue is to have the IGP (OSPF or ISIS) wait for LDP to be up. This is called LDP Sync - the IGP synchronizes with LDP. But how does it do this? To achieve synchronization, the IGP simply advertises the link cost as the maximum metric until the LDP session is up. This discourages the use of this link for labeled traffic while the LDP session is in the process of forming. If there is any other path to the destination, that path is going to be used, which prevents the labels from getting popped off. Once LDP is up on the interface, the IGP metric is advertised with its normal value again. This ensures that labeled traffic takes alternative paths until LDP is actually up on the interface, so that the traffic can be label switched end-to-end.

What if LDP comes up before the IGP neighborship does? This isn’t a concern, as the router will use the best IGP path to each /32 prefix. If the IGP neighborship is not up, the router has the label bindings from its neighbor, but no best paths through that neighbor. So the traffic will use alternative routes until the IGP comes up. There is no synchronization needed in this case. LDP sync is only used to make the IGP wait for LDP - not the other way around.

The Issue Which LDP/IGP Sync Solves

We’ll use the following topology to explain the theory behind LDP/IGP sync.

All IGP link costs are 1. PE1 and PE2 are the PEs for an L3VPN service that CE1 and CE2 participate in. Under normal circumstances, traffic from CE1 to CE2 is label switched along the path PE1-P3-P4-PE2. At P3 and P4 the traffic is double labeled, with the top label being the transport label identifying PE2’s loopback and the bottom label being the L3VPN label, which is only significant to PE2.

Let’s imagine there is some issue with LDP between P3 and P4. The IGP neighborship is still up but LDP is down. P3 has a labeled path for PE2’s loopback via P5, however the IGP shortest path is via P4 unlabeled. Upon receiving traffic from PE1, P3 will pop both MPLS labels and send the packet to P4. P4 will try to route the traffic in its global routing table, breaking the L3VPN. We will explore this scenario in the lab later in the article.

The outage caused by this situation could be avoided if we could just force P3 to send the traffic via P5. The labels would stay intact and the L3VPN service would stay up. If we had LDP sync turned on, the IGP would advertise the P3-P4 link with the max metric, making the P3-P5-P4 path better. Only once the LDP session comes up again will the metric for the P3-P4 link be advertised as 1.

Lab (OSPF)

Here are the startup configs, using the diagram above.

Pings between CE1 and CE2 should succeed and take the path PE1-P3-P4-PE2.

To expose the issue when LDP sync is not configured, we’ll bring down LDP between P3 and P4 but leave the IGP up. To accomplish this, we’ll implement LDP authentication on routers P3, P4, and P5, but “forget” to set the password for neighbor 3.3.3.3 on P4.
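As a sketch of what this looks like (the password and the P4/P5 router IDs 4.4.4.4 and 5.5.5.5 are placeholders, not from the original configs), the IOS-XE LDP authentication commands would be roughly:

```
! P3: password configured for both LDP neighbors
mpls ldp neighbor 4.4.4.4 password LABPASS
mpls ldp neighbor 5.5.5.5 password LABPASS
!
! P4: password set for P5 but deliberately "forgotten" for P3 (3.3.3.3)
mpls ldp neighbor 5.5.5.5 password LABPASS
```

With mismatched authentication, the TCP session between P3 and P4 cannot establish, so the LDP session stays down while the OSPF adjacency remains up.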

After you clear LDP sessions, you should see no LDP session between P3 and P4:

P3’s shortest path to 2.2.2.2 is still via P4 though. Remember the OSPF neighborship is still up between P3 and P4, but P3 is not learning P4’s label binding for the 2.2.2.2 prefix. Therefore P3 will forward labeled traffic destined to 2.2.2.2 unlabeled to P4.

Traffic between the CEs no longer works:

Let’s enable LDP sync and see if that fixes the issue. LDP sync is configured under the IGP configuration.
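On IOS-XE, enabling LDP sync for OSPF is a single command under the OSPF process (the process ID 1 is an assumption for this lab):

```
router ospf 1
 mpls ldp sync
```

This enables sync on all OSPF interfaces in the process; it can be disabled per-interface with no mpls ldp igp sync if needed.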

P3’s next-hop for 2.2.2.2/32 is now P5:

Through P3’s LDP session with P5, it has learned P5’s label binding for 2.2.2.2. Traffic between the CEs is working again:

How does this work? P3 and P4 know that they should be forming an LDP session on the link between themselves, since there is an OSPF neighborship up. However, since the LDP session is down, they each advertise the link with the max metric, which is 65535 in OSPF. This means that the associated link is always going to be the least preferred path. This is roughly similar to the STP uplinkfast feature, in which a switch’s bridge priority changes from 32768 to 49152 and port costs are increased by 3000. The idea with LDP sync is to make the link undesirable. The idea with uplinkfast is to make the switch undesirable as a candidate root bridge.

Let’s bring the LDP session up between P3 and P4 by configuring the correct password on P4:
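On IOS-XE, fixing P4 is one line (again assuming the placeholder password from earlier):

```
! P4
mpls ldp neighbor 3.3.3.3 password LABPASS
```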

P3 now has a next-hop of P4 for 2.2.2.2/32:

The interface metric is back to the normal metric value of 1:

While the LDP session was coming up, I ran an extended ping between CE1 and CE2 and did not lose a single ping:

Lab (ISIS)

For completeness, we’ll tear down OSPF, implement ISIS, and test LDP sync again.

First we will replace OSPF with ISIS:
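A minimal sketch of the swap on an IOS-XE P router (the NET, area, and interface name are assumptions, not the lab’s actual values):

```
no router ospf 1
!
router isis
 net 49.0001.0000.0000.0003.00
 is-type level-2-only
!
interface GigabitEthernet2
 ip router isis
```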

CE1 should be able to ping CE2 again:

We’ll break the LDP session again by removing the neighbor 3.3.3.3 password on P4.

Our L3VPN is broken again (feel free to test this yourself). Let’s turn on LDP sync under the ISIS config on P3, P4 and P5.
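On IOS-XE, LDP sync for ISIS is again a single command under the routing process:

```
router isis
 mpls ldp sync
```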

Before we configure P4, let’s stop here and consider what the state of our L3VPN is right now. Do you think it is fully broken, half broken, or fixed?

P3 has adjusted its Gi2 metric to the max metric (which is 63 since we did not enable wide-metrics).

However, P4 does not have LDP sync turned on yet, so its Gi0/0/0/1 interface is still at the default metric (10), even though LDP is not up on the interface.

This means that traffic will be one-way only. Traffic from CE1 to CE2 works, but traffic from CE2 to CE1 is dropped at P4.

Here is a pcap at the PE2-CE2 link. CE2 is replying to the ICMP request:

At the PE1-P3 link, only the ICMP request from CE1 is seen:

Let’s add LDP sync under isis in P4 now. On IOS-XR for ISIS, ldp sync is configured per-interface under the interface address-family:
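Based on the description above, P4’s IOS-XR configuration would look roughly like this (the process tag CORE is an assumption):

```
router isis CORE
 interface GigabitEthernet0/0/0/1
  address-family ipv4 unicast
   mpls ldp sync
```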

P4 now advertises its link to P3 as metric 63:

Traffic between CE1 and CE2 is working again, and taking the P3-P5-P4 path:

Pop quiz! If we change the ISIS metrics to wide-metrics, do you know what the metric will be on the P3-P4 link?

Let’s try it to find out:

Actually, first, one more pop quiz. What happens when two neighbors have differing metric styles? Will the neighborship go down? If not, will there still be reachability?

First, I configure PE1 for wide metrics, but not P3. The neighborship is still up, but P3 does not know how to reach 1.1.1.1/32. The handy command show isis topology displays ** for PE1’s metric, which tells us there is a metric-style mismatch.
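Enabling wide metrics is one command under the ISIS process. On IOS-XE it sits directly under the process; on IOS-XR it goes under the address-family (process tag is an assumption):

```
! IOS-XE
router isis
 metric-style wide
!
! IOS-XR
router isis CORE
 address-family ipv4 unicast
  metric-style wide
```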

After configuring metric-style wide on all routers, we can see that the P3-P4 link has a metric of 16777214! This is 2^24 - 2. As a note, narrow metrics (the default) are 6 bits, and the max value is 2^6 - 1 = 63. It is interesting that the max wide metric is not 16777215. The likely reason is that the ISIS wide-metric extensions reserve the maximum link metric (2^24 - 1) to mean the link should be excluded from the normal SPF computation entirely, so 16777214 is the largest value that still leaves the link usable as a path of last resort.

Cisco will allow you to use 16777215, but warns you that this is for compatibility with older versions:

LDP IGP Sync Holddown Timer

By default, the IGP will wait indefinitely for LDP to come up. The idea is that the IGP will not even form an adjacency on the link if LDP is not up. (In my own labbing, I can only reproduce this when the routers have never established an IGP adjacency in the first place. If they have already established an IGP adjacency and I then break LDP, I can’t get them to refuse to re-establish the IGP adjacency. See the article on “Troubleshooting OSPF Adjacencies” here for more information on this.)

On IOS-XE you can configure a holddown timer. If this timer expires and LDP is still not up, the IGP will form an adjacency anyway and advertise the link as the max metric. I could not find a way to implement this in IOS-XR.

This would configure the holddown time for 10 seconds:
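On IOS-XE this is a global command and the value is taken in milliseconds, so 10 seconds would be (syntax from memory; verify on your platform):

```
mpls ldp igp sync holddown 10000
```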

Sync Delay

On both IOS-XE and IOS-XR, you can delay IGP metric adjustment after LDP establishes using the sync delay timer. On IOS-XE you can configure a time of 5-60 seconds. On IOS-XR the timer range is 5-300 seconds.

We’ll configure a timer on both P3 and P4, then add the LDP authentication to P4 again. LDP will wait for the specified delay interval before notifying the IGP that it is up. You may want to use this in case LDP bounces up and down. This command requires that LDP stay up for the entire delay interval before the IGP adjusts the link back to the default metric.
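As a sketch (the 10-second value and interface name are assumptions): on IOS-XE the delay is configured per interface, while on IOS-XR it lives under the mpls ldp process:

```
! IOS-XE - per interface
interface GigabitEthernet2
 mpls ldp igp sync delay 10
!
! IOS-XR - under the LDP process
mpls ldp
 igp sync delay 10
```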

Verification

Conclusion

LDP/IGP Sync is a feature that prevents blackholing of VPN traffic by de-preferring an IGP link if there is no functional LDP session on that link. The IGP (OSPF or ISIS) advertises the link with the maximum metric. This ensures that if there is any other path to the destination, that other path will be used. Once the LDP session is up on the interface, the IGP advertises the link with the normal metric again. You can configure a sync delay time, so that LDP will wait up to 60 seconds (IOS XE) or up to 300 seconds (IOS XR) before notifying the IGP that the LDP is up. This ensures that LDP is stable before reducing the IGP metric back to the normal metric value.

LDP/IGP Sync is a feature that you should always have configured in your core. There is no reason not to have it enabled. Failing to configure the feature leaves you open to issues whenever LDP and the IGP reconverge. In this lab we broke LDP purposefully to examine the functionality of the feature, but even if the IGP simply converges faster than LDP for a second or two, you can drop traffic during that period.

A huge benefit of SR (Segment Routing) is that there is no need to worry about this sync problem. In SR, the labels are advertised within the IGP itself using new TLVs (ISIS) or LSAs (OSPF). LDP is no longer needed to disseminate label bindings for IGP prefixes. If you migrate to SR, you can completely remove LDP and LDP sync.
