Classic Traceroute Behavior in MPLS Networks

Before we explore LSP Ping and LSP Traceroute, we should first understand the behavior of regular ping and traceroute in an MPLS network. We can then compare the MPLS tools to see what problems they solve.

In this article we will explore the behavior of traceroute in an MPLS environment from a customer’s perspective. We will also see how we can hide the internal MPLS network from the customer by disabling IP to MPLS TTL propagation.

Lab

We will use the following basic L3VPN topology:

Here are the startup configs:

#CE1
hostname CE1
!
int Gi1
 ip address 100.64.0.2 255.255.255.252
 no shut
!
int lo0
 ip address 10.1.1.1 255.255.255.0
!
router bgp 65000
 neighbor 100.64.0.1 remote-as 100
 network 10.1.1.0 mask 255.255.255.0
!
no ip domain lookup

#CE2
hostname CE2
!
int Gi1
 ip address 100.64.0.6 255.255.255.252
 no shut
!
int lo0
 ip address 10.1.2.1 255.255.255.0
!
router bgp 65001
 neighbor 100.64.0.5 remote-as 100
 network 10.1.2.0 mask 255.255.255.0
!
no ip domain lookup

#PE1
hostname PE1
!
vrf definition CUSTOMER
 rd 1:1
 route-target both 1:1
 address-family ipv4 unicast
!
int Gi1
 vrf forwarding CUSTOMER
 ip address 100.64.0.1 255.255.255.252
 no shut
!
int Gi2
 ip address 10.1.3.1 255.255.255.0
 no shut
 mpls ip
 ip ospf network point-to-point
!
int lo0
 ip address 1.1.1.1 255.255.255.255
!
router bgp 100
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source lo0
 address-family vpnv4
  neighbor 2.2.2.2 activate
 address-family ipv4 unicast vrf CUSTOMER
  neighbor 100.64.0.2 remote-as 65000
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
!
no ip domain lookup

#P3
hostname P3
!
int Gi1
 ip address 10.3.4.3 255.255.255.0
 mpls ip
 ip ospf network point-to-point
 no shut
!
int Gi2
 ip address 10.1.3.3 255.255.255.0
 no shut
 mpls ip
 ip ospf network point-to-point
!
int lo0
 ip address 3.3.3.3 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

#P4(XR)
hostname P4
!
int Gi0/0/0/0
 ip address 10.3.4.4/24
 no shut
!
int Gi0/0/0/1
 ip address 10.2.4.4/24
 no shut
!
int lo0
 ip address 4.4.4.4/32
!
router ospf 1
 area 0
  int Gi0/0/0/0
   network point-to-point
  int Gi0/0/0/1
   network point-to-point
  int Lo0
!
mpls ldp
 int Gi0/0/0/0
 int Gi0/0/0/1

#PE2(XR)
hostname PE2
!
vrf CUSTOMER
 address-family ipv4 unicast
  import route-target 1:1
  export route-target 1:1
!
int Gi0/0/0/0
 vrf CUSTOMER
 ip address 100.64.0.5/30
 no shut
!
int Gi0/0/0/1
 ip address 10.2.4.2/24
 no shut
!
int lo0
 ip address 2.2.2.2/32
!
router ospf 1
 area 0
  int Gi0/0/0/1
   network point-to-point
  int Lo0
!
router bgp 100
 address-family ipv4 unicast
 address-family vpnv4 unicast
 neighbor 1.1.1.1
  remote-as 100
  update-source Lo0
  address-family vpnv4 unicast
 vrf CUSTOMER
  rd 1:1
  address-family ipv4 unicast
  neighbor 100.64.0.6
   remote-as 65001
   address-family ipv4 unicast
    route-policy PASS in
    route-policy PASS out
!
route-policy PASS
 pass
end-policy
!
mpls ldp
 int Gi0/0/0/1

We’ll first do a traceroute from CE1 to CE2. Notice that the MPLS labels are visible in the traceroute, as well as the MPLS EXP value.

CE1#traceroute 10.1.2.1 source lo0 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.1 28 msec
  2 10.1.3.3 [MPLS: Labels 19/24005 Exp 0] 24 msec
  3 10.3.4.4 [MPLS: Labels 24003/24005 Exp 0] 15 msec
  4 10.2.4.2 [MPLS: Label 24005 Exp 0] 16 msec
  5 100.64.0.6 44 msec

The MPLS labels are visible due to an ICMP extension for MPLS. The entire MPLS header of the expired packet is included in the ICMP TTL exceeded message sent back to the original sender.
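To make the label information concrete, here is a minimal Python sketch that decodes MPLS label stack entries. Each entry is 32 bits: a 20-bit label, a 3-bit EXP field, a 1-bit bottom-of-stack flag, and an 8-bit TTL. The function names and the sample bytes are assumptions made for illustration. The bytes are hand-built to represent labels 19/24005 with Exp 0 and TTL 1, and are not taken from a real capture.

```python
import struct


def decode_label_stack_entry(raw: bytes) -> dict:
    """Decode one 4-byte MPLS label stack entry into its four fields."""
    (word,) = struct.unpack("!I", raw)  # network byte order, 32 bits
    return {
        "label": word >> 12,                          # top 20 bits
        "exp": (word >> 9) & 0x7,                     # next 3 bits
        "bottom_of_stack": bool((word >> 8) & 0x1),   # S bit
        "ttl": word & 0xFF,                           # low 8 bits
    }


def decode_label_stack(raw: bytes) -> list:
    """Decode consecutive entries until the bottom-of-stack bit is set."""
    entries = []
    for offset in range(0, len(raw), 4):
        entry = decode_label_stack_entry(raw[offset:offset + 4])
        entries.append(entry)
        if entry["bottom_of_stack"]:
            break
    return entries


# Hypothetical bytes: transport label 19 and service label 24005,
# both with Exp 0 and TTL 1 (illustrative, not from a capture).
sample = bytes.fromhex("0001300105dc5101")
stack = decode_label_stack(sample)
print([entry["label"] for entry in stack])  # [19, 24005]
```

The two decoded labels correspond to what the traceroute output prints as "Labels 19/24005 Exp 0".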

Let’s briefly review traceroute operations. CE1 begins with a UDP packet with DST port 33434 and TTL=1 and sends it towards the specified destination. PE1 decrements the TTL to 0 and sends an ICMP TTL exceeded message to CE1. PE1 sources the ICMP error message from Gi1, the interface on which it received the packet.
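The probe pattern can be sketched in a few lines of Python. This is only an illustration of the TTL and port numbering described here, assuming one probe per hop (as with "probe 1"). The function name probe_sequence is hypothetical, and no packets are actually sent.

```python
from typing import Iterator, Tuple

BASE_PORT = 33434  # UDP destination port of the first probe


def probe_sequence(hops: int) -> Iterator[Tuple[int, int]]:
    """Yield (ttl, udp_dst_port) for each probe, one probe per hop."""
    for i in range(hops):
        # Both the TTL and the destination port increase by one per probe.
        yield 1 + i, BASE_PORT + i


probes = list(probe_sequence(5))
for ttl, port in probes:
    print(f"TTL={ttl} UDP dst port={port}")
```

The changing destination port lets the sender match each returned ICMP message to the probe that triggered it.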

Next, CE1 increments the TTL by one (1 to 2) and the UDP DST port by one (33434 to 33435) and sends the packet towards the destination. PE1 decrements the TTL by 1 and forwards it towards PE2. To do this, it must encapsulate the packet in two MPLS headers. The IP TTL of 1 is copied into the TTL field of each MPLS header (more on this later). P3 receives the packet with a top MPLS header that has a TTL of 1. P3 decrements the MPLS TTL to 0 and now must generate an ICMP TTL exceeded message. But how can P3 forward this directly to CE1? P3 does not have the CUSTOMER VRF table and has no knowledge of 10.1.1.1. Yet P3 sets 10.1.1.1 as the destination. How can this work?

Notice that the packet P3 sends to CE1 has an IP TTL of 248 when it is received at CE1. This is a hint as to how P3 delivers the packet to CE1.

The MPLS label stack specification (RFC 3032) describes how this is handled: an ICMP TTL exceeded message generated in response to an MPLS-labeled packet whose TTL is decremented to 0 should be forwarded all the way along the original LSP in order to be delivered back to the source. So P3 generates the ICMP TTL exceeded message, and then sends it along the LSP all the way to PE2. PE2 then sees the destination of 10.1.1.1 and forwards it all the way back to PE1.

Let’s take packet captures at each link to examine this. I will run traceroute on CE1 but only send a TTL value of 2, so that we can focus on the process for only P3’s TTL exceeded message.

CE1#traceroute 10.1.2.1 source lo0 probe 1 ttl 2 2 
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  2 10.1.3.3 [MPLS: Labels 19/24005 Exp 0] 13 msec

The first “2” is the min TTL, and the second “2” is the max TTL.

Here is the packet that PE1 forwards to P3. The IP TTL was decremented to 1, and this TTL was copied into the MPLS headers:

P3 generates the ICMP TTL exceeded message, which has a new IP TTL of 255. The labels that were received on the MPLS packet (19/24005) are included in the ICMP message. This TTL of 255 is copied into the new MPLS headers, and the packet is sent along the same LSP that P3 received the original packet on.

The labels that P3 imposed above are the same labels you would see if the TTL never expired on the packet and it continued being sent along the LSP. The service label, 24005, is unchanged. 24003 is the swap result for incoming label 19 on P3.

P3#show mpls forwarding-table labels 19
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id     Switched      interface              
19         24003      2.2.2.2/32       3884          Gi1        10.3.4.4
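As a rough illustration of this swap, the forwarding-table entry above can be modeled as a Python dictionary lookup. The data structure and the function name swap_top_label are assumptions for the sketch. A real LFIB holds much more state, and only the single entry shown in the output above is modeled.

```python
# One LFIB entry on P3, taken from the show output above.
LFIB_P3 = {
    19: {"out_label": 24003, "interface": "Gi1", "next_hop": "10.3.4.4"},
}


def swap_top_label(stack, lfib):
    """Swap the top label and leave the rest of the stack untouched."""
    entry = lfib[stack[0]]
    new_stack = [entry["out_label"]] + stack[1:]
    return new_stack, entry["interface"], entry["next_hop"]


result = swap_top_label([19, 24005], LFIB_P3)
print(result)  # ([24003, 24005], 'Gi1', '10.3.4.4')
```

Only the transport label changes. The service label 24005 is never looked up by P3.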

P4 pops the top label, decrements the remaining service label’s TTL by 1, and forwards to PE2. The IP packet generated by P3 still has a TTL of 255 at this point.

PE2 decrements the MPLS TTL by 1 (to 253) and copies this to the IP packet’s TTL. PE2 then does a lookup on the destination and sees a next-hop of PE1. PE2 should decrement the IP packet’s TTL by 1 (which should now be 252) and then encapsulate the IP packet and send it down the LSP towards PE1. The IP TTL is copied into the MPLS TTL on the imposed labels, which as seen on the P4-PE2 link is actually 251, not 252. To be honest I don’t quite understand why this is, but it doesn’t affect the rest of the explanation. Each MPLS router along the path now decrements the top MPLS TTL by 1.

From PE2 to P4:

From P4 to P3:

From P3 to PE1:

Because the top label’s TTL was lower than the bottom label’s TTL, when the top label was popped, its TTL value was copied to the bottom label. Then that TTL was decremented by one by P3 (to 249) before sending to PE1.

PE1 copies the MPLS TTL of 249 to the IP packet TTL. PE1 then decrements the IP packet’s TTL by 1 to 248 and sends the packet to CE1. This is the TTL of 248 that we see on CE1.
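The TTL arithmetic for this return leg (PE2 to P4 to P3 to PE1 to CE1) can be checked with a short Python sketch. It starts from the observed imposed TTL of 251 and follows the steps as described in this article. The function name return_leg_ttl is hypothetical, and the model is a simplification rather than a description of how any router implements this.

```python
def return_leg_ttl(imposed_ttl: int) -> int:
    """Follow the TTL from PE2's label imposition to delivery at CE1."""
    # PE2 copies the IP TTL into both imposed labels.
    top = bottom = imposed_ttl
    # P4 swaps the transport label and decrements its TTL.
    top -= 1
    # P3 pops the transport label; because the top TTL is lower,
    # it is copied to the remaining service label.
    bottom = min(top, bottom)
    # P3 decrements the TTL before forwarding to PE1.
    bottom -= 1
    # PE1 copies the MPLS TTL to the IP TTL, then decrements it.
    ip_ttl = bottom
    ip_ttl -= 1
    return ip_ttl


print(return_leg_ttl(251))  # 248
```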

IP TTL Propagation

As you’ve seen above, when an IP packet is encapsulated into an LSP, by default, the IP TTL is copied into the MPLS TTL. LSRs only decrement the MPLS TTL, leaving the IP TTL alone. In fact, LSRs don’t even inspect the IP TTL. The egress router copies the MPLS TTL to the IP TTL. This is why a traceroute on CE1 shows every single router along the path. The IP TTL and MPLS TTL are essentially one and the same.

We can alter this behaviour by turning off IP to MPLS TTL propagation on the PEs. Let’s try this out and see how it affects a traceroute from the CEs.

First we’ll disable IP to MPLS TTL propagation for forwarded packets on PE1. When you disable IP to MPLS TTL propagation you have three choices: forwarded traffic, locally generated traffic, or both. If you select forwarded traffic, locally generated traceroutes will still show your entire LSP, as the locally generated IP packet’s TTL is copied to the MPLS TTL. However, traffic which is received on an interface will not have its IP TTL propagated to the MPLS TTL. Instead, the MPLS TTL will be 255 on imposition. On disposition (at the egress PE when the final label is removed), the MPLS TTL is not copied to the IP TTL. This hides the internal MPLS network from customers.

#PE1
PE1(config)#no mpls ip propagate-ttl ?
  forwarded  Propagate IP TTL for forwarded traffic
  local      Propagate IP TTL for locally originated traffic
  <cr>       <cr>

no mpls ip propagate-ttl forwarded

From CE1 run the traceroute again:

CE1#traceroute 10.1.2.1 source lo0 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.1 2 msec
  2 10.2.4.2 [MPLS: Label 24005 Exp 0] 13 msec
  3 100.64.0.6 8 msec

Routers P3 and P4 are missing from the traceroute now. If we do a packet capture at the PE1-P3 link, we can see that the packet with an IP TTL of 2 has an MPLS TTL of 255. This means that neither P3 nor P4 will generate a TTL exceeded message.

When PE2 receives the MPLS TTL value of 253, it would normally decrement by 1 and copy this to the IP TTL. However, there is a rule that if the MPLS TTL is higher than the IP TTL, you cannot copy it, as it could result in loops in certain circumstances.
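To see how these rules produce the two traceroute outputs, here is a simplified Python model of the CE1 to CE2 path. It is a sketch under stated assumptions: the router names come from the lab, but the function responding_hop and the exact order of the decrement and copy steps are illustrative choices, not a description of any platform’s forwarding code.

```python
def responding_hop(probe_ttl: int, propagate: bool) -> str:
    """Return the router that answers a probe sent by CE1 towards CE2."""
    ip_ttl = probe_ttl

    # PE1 (ingress): decrement the IP TTL, then impose labels.
    ip_ttl -= 1
    if ip_ttl == 0:
        return "PE1"
    mpls_ttl = ip_ttl if propagate else 255

    # P3 and P4 only look at the MPLS TTL.
    for lsr in ("P3", "P4"):
        mpls_ttl -= 1
        if mpls_ttl == 0:
            return lsr

    # PE2 (egress): never copy a higher MPLS TTL into the IP TTL.
    ip_ttl = min(ip_ttl, mpls_ttl)
    ip_ttl -= 1
    if ip_ttl == 0:
        return "PE2"

    # The probe reaches the destination on CE2.
    return "CE2"


def trace(propagate: bool) -> list:
    """Collect responders for increasing TTLs until CE2 answers."""
    hops = []
    ttl = 1
    while not hops or hops[-1] != "CE2":
        hops.append(responding_hop(ttl, propagate))
        ttl += 1
    return hops


print(trace(propagate=True))   # ['PE1', 'P3', 'P4', 'PE2', 'CE2']
print(trace(propagate=False))  # ['PE1', 'PE2', 'CE2']
```

With propagation disabled, the min() step at PE2 is what keeps the IP TTL of 1 from being overwritten by the much larger MPLS TTL, so PE2 still shows up as hop 2.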

Do you notice anything a little strange about the second hop in the traceroute? PE2 responds with its internal IP of 10.2.4.2. We probably don’t want the customer to see our internal IPs, so let’s configure PE2 to respond with an interface in the customer VRF:

#PE2 (XR)
icmp ipv4 source vrf

This command instructs the router to source ICMP TTL exceeded messages from the source VRF, not the interface it received the packet on.

CE1#traceroute 10.1.2.1 source lo0 probe 1        
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.1 1 msec
  2 100.64.0.5 [MPLS: Label 24005 Exp 0] 24 msec
  3 100.64.0.6 11 msec

What happens if we trace from CE2 to CE1 right now? Will we see every router in the path, or will the MPLS network be hidden?

The answer is that we will see every router in the path, because PE2 still has the default IP to MPLS TTL propagation.

CE2#traceroute 10.1.1.1 source lo0 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.1.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.5 3 msec
  2 10.2.4.4 [MPLS: Labels 24000/16 Exp 0] 15 msec
  3 10.3.4.3 [MPLS: Labels 16/16 Exp 0] 16 msec
  4 100.64.0.1 [MPLS: Label 16 Exp 0] 12 msec
  5 100.64.0.2 56 msec

To turn this off on XR, we use the following command:

RP/0/0/CPU0:PE2(config)#mpls ip-ttl-propagate disable ?
  forwarded  Disable IP TTL propagation for only forwarded MPLS packets
  local      Disable IP TTL propagation for only locally generated MPLS packets
  <cr>

mpls ip-ttl-propagate disable forwarded

Now the traceroute from the other direction also hides the internal MPLS network:

CE2#traceroute 10.1.1.1 source lo0 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.1.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.5 3 msec
  2 100.64.0.1 [MPLS: Label 16 Exp 0] 15 msec
  3 100.64.0.2 12 msec

However, if we trace from PE1 or PE2 to a destination, the IP TTL will still be propagated, because we only disabled propagation for forwarded traffic.

#PE1
! First we redistribute the connected /30 into BGP so that PE2 and CE2 have a return route.
router bgp 100
 address-family ipv4 unicast vrf CUSTOMER
  redistribute connected

PE1#traceroute vrf CUSTOMER 10.1.2.1 source Gi1 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.3.3 [MPLS: Labels 19/24005 Exp 0] 13 msec
  2 10.3.4.4 [MPLS: Labels 24003/24005 Exp 0] 14 msec
  3 100.64.0.5 [MPLS: Label 24005 Exp 0] 11 msec
  4 100.64.0.6 24 msec

If we use the command no mpls ip propagate-ttl on PE1, neither forwarded nor locally generated traffic will have its IP TTL propagated to the MPLS TTL.

#PE1
no mpls ip propagate-ttl
end


PE1#traceroute vrf CUSTOMER 10.1.2.1 source Gi1 probe 1
Type escape sequence to abort.
Tracing the route to 10.1.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.64.0.5 [MPLS: Label 24005 Exp 0] 12 msec
  2 100.64.0.6 10 msec

Conclusion

ICMP has been extended for MPLS in order to provide label information in TTL exceeded messages. The entire stack of MPLS headers is included in the ICMP data.

By default the IP TTL of a received or generated packet is copied to the MPLS label(s) imposed by the ingress PE when it sends the traffic down the LSP. To prevent this behavior, and hide the internal MPLS network, you use the following command:

#IOS-XE
no mpls ip propagate-ttl forwarded|local

#IOS-XR
mpls ip-ttl-propagate disable forwarded|local

This command causes the ingress PE to use an MPLS TTL value of 255 regardless of what the IP TTL is. When the egress PE tries to copy the MPLS TTL to the IP packet, if the MPLS TTL is larger than the existing IP TTL, the IP TTL is left as it is.
