Classic Traceroute Behavior in MPLS Networks
Before we explore LSP Ping and LSP Traceroute, we should first understand the behavior of regular ping and traceroute in an MPLS network. We can then compare the MPLS tools to see what problems they solve.
In this article we will explore the behavior of traceroute in an MPLS environment from a customer’s perspective. We will also see how we can hide the internal MPLS network from the customer by disabling IP to MPLS TTL propagation.
We will use the following basic L3VPN topology:
Here are the startup configs:
We’ll first do a traceroute from CE1 to CE2. Notice that the MPLS labels are visible in the traceroute, as well as the MPLS EXP value.
The MPLS labels are visible thanks to an ICMP extension for MPLS. You can see that the entire MPLS header is included in the ICMP TTL exceeded message sent back to the original sender:
Let’s briefly review traceroute operations. CE1 begins by sending a UDP packet with destination port 33434 and TTL=1 towards the specified destination. PE1 decrements the TTL to 0 and sends an ICMP TTL exceeded message to CE1. PE1 sources the ICMP error message from Gi1, the interface on which it received the packet:
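As a side note, the probe sequence itself can be sketched in a few lines of Python. This is only an illustration: it assumes one probe per hop, as the walkthrough in this article does (real traceroute implementations usually send several probes per TTL value), and the function name `probe_schedule` is hypothetical. Each later probe raises both the TTL and the UDP destination port by one.

```python
BASE_PORT = 33434  # UDP destination port of the first probe, as stated in the article


def probe_schedule(max_ttl, base_port=BASE_PORT):
    """Return (ttl, udp_dst_port) pairs, assuming one probe per hop."""
    return [(ttl, base_port + ttl - 1) for ttl in range(1, max_ttl + 1)]


for ttl, port in probe_schedule(3):
    print(f"TTL={ttl} UDP dst port={port}")
```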
Next, CE1 increments the TTL by one (1 to 2) and the UDP destination port by one (33434 to 33435) and sends the packet towards the destination. PE1 decrements the TTL by 1 and forwards the packet towards PE2. To do this, it must encapsulate the packet in two MPLS headers. The IP TTL of 1 is copied into the TTL field of each MPLS header (more on this later). P3 receives the packet with a top MPLS header that has a TTL of 1. P3 decrements the MPLS TTL by 1 and now must generate an ICMP TTL exceeded message. But how can P3 forward this directly to CE1? P3 does not have the CUSTOMER VRF table and has no knowledge of 10.1.1.1. Yet P3 sets 10.1.1.1 as the destination. How can this work?
Notice that the IP header of the packet P3 sends to CE1 has a TTL of 248 when it arrives at CE1. This is a hint as to how P3 delivers the packet to CE1.
An ICMP extension for MPLS dictates that an ICMP TTL exceeded message generated in response to an MPLS-labeled packet whose TTL was decremented to 0 should be forwarded along the rest of the original LSP so that it can be delivered back to the source. So P3 generates the ICMP TTL exceeded message and then sends it onward along the LSP towards PE2. PE2 then sees the destination of 10.1.1.1 and forwards the message back across the MPLS network to PE1.
Let’s take packet captures at each link to examine this. I will run traceroute on CE1 but only send a TTL value of 2, so that we can focus on the process for only P3’s TTL exceeded message.
The first “2” is the minimum TTL, and the second “2” is the maximum TTL.
Here is the packet that PE1 forwards to P3. The IP TTL was decremented to 1, and this TTL was copied into the MPLS headers:
P3 generates the ICMP TTL exceeded message, which has a new IP TTL of 255. The labels that were received on the MPLS packet (19/24005) are included in the ICMP message. This TTL of 255 is copied into the new MPLS headers, and the packet is sent along the same LSP on which P3 received the original packet.
The labels that P3 imposed above are the same labels you would see if the TTL never expired on the packet and it continued being sent along the LSP. The service label, 24005, is unchanged. 24003 is the swap result for incoming label 19 on P3.
P4 pops the top label, decrements the remaining service label’s TTL by 1, and forwards to PE2. The IP packet generated by P3 still has a TTL of 255 at this point.
PE2 decrements the MPLS TTL by 1 (to 253) and copies this to the IP packet’s TTL. PE2 then does a lookup on the destination and sees a next-hop of PE1. PE2 should then decrement the IP packet’s TTL by 1 (to 252) and encapsulate the IP packet for the LSP towards PE1. The IP TTL is copied into the MPLS TTL on the imposed labels, which, as seen on the P4-PE2 link, is actually 251, not 252. To be honest I don’t quite understand why this is, but it doesn’t affect the rest of the explanation. Each MPLS router along the path now decrements the top MPLS TTL by 1.
From PE2 to P4:
From P4 to P3:
From P3 to PE1:
Because the top label’s TTL was lower than the bottom label’s TTL, when the top label was popped, its TTL value was copied to the bottom label. Then that TTL was decremented by one by P3 (to 249) before sending to PE1.
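The TTL values on this return leg, including the final hop to CE1 covered in the next paragraph, can be replayed with a small Python sketch. It is only bookkeeping under stated assumptions: it starts from the MPLS TTL of 251 observed on the PE2-P4 link (taken as given, since the reason it is not 252 is unclear), and the function name `return_leg_ttls` is hypothetical.

```python
def return_leg_ttls(imposed_ttl=251):
    """Replay the TTL seen on each link from PE2 back to CE1."""
    trace = []
    top = bottom = imposed_ttl   # PE2 copies the same TTL into both imposed labels
    trace.append(("PE2 -> P4", top))
    top -= 1                     # P4 decrements the top label's TTL
    trace.append(("P4 -> P3", top))
    bottom = min(top, bottom)    # P3 pops the top label; the lower TTL is copied down
    bottom -= 1                  # P3 then decrements the remaining label's TTL
    trace.append(("P3 -> PE1", bottom))
    ip_ttl = bottom - 1          # PE1 copies the MPLS TTL to the IP header and decrements it
    trace.append(("PE1 -> CE1", ip_ttl))
    return trace


for link, ttl in return_leg_ttls():
    print(f"{link}: TTL {ttl}")
```

The last value, 248, matches the IP TTL that CE1 sees on the received ICMP message.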
PE1 copies the MPLS TTL of 249 to the IP packet TTL. PE1 decrements the IP packet’s TTL by 1 to 248 and sends to CE1. On CE1 we see this:
As you’ve seen above, when an IP packet is encapsulated into an LSP, by default, the IP TTL is copied into the MPLS TTL. LSRs only decrement the MPLS TTL, leaving the IP TTL alone. In fact, LSRs don’t even inspect the IP TTL. The egress router copies the MPLS TTL to the IP TTL. This is why a traceroute on CE1 shows every single router along the path. The IP TTL and MPLS TTL are essentially one and the same.
We can alter this behaviour by turning off IP to MPLS TTL propagation on the PEs. Let’s try this out and see how it affects a traceroute from the CEs.
First we’ll disable IP to MPLS TTL propagation for forwarded packets on PE1. When you disable IP to MPLS TTL propagation you have three choices: forwarded traffic, locally generated traffic, or both. If you select forwarded traffic, locally generated traceroutes will still show your entire LSP, because the locally generated IP packet’s TTL is still copied to the MPLS TTL. However, traffic received on an interface will not have its IP TTL propagated to the MPLS TTL; the MPLS TTL will be 255 on imposition. On disposition (at the egress PE, when the final label is removed), the MPLS TTL is not copied to the IP TTL. This hides the internal MPLS network from customers.
From CE1 run the traceroute again:
Routers P3 and P4 are now missing from the traceroute. If we do a packet capture on the PE1-P3 link, we can see that the packet sent with a TTL of 2 has an MPLS TTL of 255. This means that neither P3 nor P4 will generate a TTL exceeded message.
When PE2 receives the MPLS TTL value of 253, it would normally decrement by 1 and copy this to the IP TTL. However, there is a rule that if the MPLS TTL is higher than the IP TTL, you cannot copy it, as it could result in loops in certain circumstances.
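To make the effect of TTL propagation concrete, here is a small Python simulation of a CE1-to-CE2 traceroute through PE1, P3, P4 and PE2. It is a simplified model, not a description of any router's implementation: the function names are hypothetical, and the egress step is reduced to "keep the lower of the MPLS and IP TTLs, then decrement once", which is an assumption that reproduces the hop lists described in this article.

```python
CORE = ["P3", "P4"]  # LSRs between the two PEs in this article's topology


def expiring_hop(ip_ttl, propagate=True):
    """Return the router that answers a CE1 -> CE2 probe sent with this TTL."""
    ip_ttl -= 1                              # PE1 routes the packet and decrements the IP TTL
    if ip_ttl == 0:
        return "PE1"
    mpls_ttl = ip_ttl if propagate else 255  # TTL written into the imposed labels
    for lsr in CORE:
        mpls_ttl -= 1                        # LSRs only touch the MPLS TTL
        if mpls_ttl == 0:
            return lsr
    # Egress PE (simplified): never raise the IP TTL, then decrement once.
    ip_ttl = min(mpls_ttl, ip_ttl) - 1
    if ip_ttl == 0:
        return "PE2"
    return "CE2"                             # the probe reaches the destination


def traceroute(propagate=True, max_ttl=10):
    """Collect the responding hop for each TTL until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        hop = expiring_hop(ttl, propagate)
        hops.append(hop)
        if hop == "CE2":
            break
    return hops


print(traceroute(propagate=True))
print(traceroute(propagate=False))
```

With propagation enabled the model lists every router in the path; with it disabled, the probe sent with a TTL of 2 carries an MPLS TTL of 255 through the core and expires at PE2 instead of P3.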
Do you notice anything a little strange about the second hop in the traceroute? PE2 responds with its internal IP of 10.2.4.2. We probably don’t want the customer to see our internal IPs, so let’s configure PE2 to respond with an interface in the customer VRF:
This command instructs the router to source ICMP TTL exceeded messages from the source VRF, not the interface it received the packet on.
What happens if we trace from CE2 to CE1 right now? Will we see every router in the path, or will the MPLS network be hidden?
The answer is that we will see every router in the path, because PE2 still has the default IP to MPLS TTL propagation.
To turn this off on XR, we use the following command:
Now the traceroute from the other direction also hides the internal MPLS network:
However, if we trace from PE1 or PE2 to a destination, the IP TTL will be propagated.
If we use the command no mpls ip propagate-ttl on PE1, neither forwarded nor locally generated traffic will have its IP TTL propagated to the MPLS TTL.
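The three scopes (forwarded, locally generated, or both) can be summarized with a tiny Python sketch. The function name and the `disabled_for` values are hypothetical labels for illustration only; they model the behaviour described above and are not CLI keywords.

```python
def imposed_mpls_ttl(ip_ttl, locally_generated, disabled_for):
    """Return the MPLS TTL an ingress PE writes at label imposition.

    disabled_for is a set holding "forwarded", "local", or both, naming the
    traffic classes for which IP to MPLS TTL propagation has been turned off.
    """
    traffic = "local" if locally_generated else "forwarded"
    return 255 if traffic in disabled_for else ip_ttl


# Propagation disabled for forwarded traffic only:
print(imposed_mpls_ttl(1, locally_generated=False, disabled_for={"forwarded"}))
print(imposed_mpls_ttl(1, locally_generated=True, disabled_for={"forwarded"}))
# Propagation disabled for both classes:
print(imposed_mpls_ttl(1, locally_generated=True, disabled_for={"forwarded", "local"}))
```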
ICMP has been extended for MPLS in order to provide label information in TTL exceeded messages. The entire stack of MPLS headers is included in the ICMP data.
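For reference, each MPLS header (label stack entry) is 32 bits: a 20-bit label, a 3-bit EXP field, a 1-bit bottom-of-stack flag and an 8-bit TTL. The Python sketch below packs and unpacks such entries. The labels 19 and 24005 and the TTL of 1 come from the example in this article; the EXP value of 0 is an assumption, and the helper names are hypothetical.

```python
import struct


def encode_entry(label, exp, bottom, ttl):
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_entry(data):
    """Unpack 4 bytes into the label, EXP, bottom-of-stack and TTL fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


# Two-label stack as received by P3: transport label 19 on top of service label 24005.
stack = encode_entry(19, 0, False, 1) + encode_entry(24005, 0, True, 1)
entries = [decode_entry(stack[i:i + 4]) for i in range(0, len(stack), 4)]
for entry in entries:
    print(entry)
```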
By default the IP TTL of a received or generated packet is copied to the MPLS label(s) imposed by the ingress PE when it sends the traffic down the LSP. To prevent this behavior, and hide the internal MPLS network, you use the following command:
This command causes the ingress PE to use an MPLS TTL value of 255 regardless of what the IP TTL is. When the egress PE tries to copy the MPLS TTL to the IP packet, if the MPLS TTL is larger than the existing IP TTL, the IP TTL is left as it is.
https://packetlife.net/blog/2008/dec/22/disabling-mpls-ttl-propagation/