In Draft Rosen, PEs peer with CEs in the C-PIM, and the provider core runs P-PIM. The following is a diagram of the PIM adjacencies:
In Draft Rosen, GRE is used to tunnel traffic between PEs over the P-PIM network. Each PE specifies the group address for the default MDT (multicast distribution tree) under the customer VRF. For a given VRF, this group address must be the same on every PE; each customer VRF has a single group address for its associated default MDT. This is a PIM-SSM group in which each PE is a source. In the above diagram, there will be three PIM-SSM entries in the P-PIM network, one for each PE: (PE1, 232.1.1.1), (PE2, 232.1.1.1), and (PE3, 232.1.1.1). Each PE joins the trees of all other PEs.
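For example, on an IOS-XE PE the default MDT is defined under the VRF address family. A minimal illustration (the full lab configuration appears later in this post):
vrf definition CUSTOMER
address-family ipv4
! must match on every PE that carries this VRF
mdt default 232.1.1.1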
The PEs need a way to discover which remote PEs are participating in this MDT. The routers run bgp address-family ipv4 mdt for auto-discovery. Each PE advertises a route in this AFI that includes its loopback and the MDT group, which allows the PEs to learn about each other. If a PE has a VRF with a matching default MDT group, it sends a PIM Join in the P-PIM for the PIM-SSM (S, G) pair of every other remote PE.
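On IOS-XE, the auto-discovery piece is simply another address family on the existing iBGP session. Roughly, assuming a route reflector at 10.10.10.10 as in the lab below:
router bgp 100
address-family ipv4 mdt
neighbor 10.10.10.10 activate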
Once the default MDT is formed in the P-PIM, each PE router uses a multicast GRE tunnel to establish PIM adjacencies with all other PEs in the C-PIM. The GRE tunnel's destination is the MDT group address (232.1.1.1), so a packet forwarded out the GRE tunnel is received by every other remote PE participating in the C-PIM.
You can think of the P-PIM as the underlay and the C-PIM as the overlay. The PEs build a full mesh of C-PIM adjacencies with each other over the GRE tunnels.
The tunneling technology here is GRE, not MPLS, so there are no MPLS labels on these packets. GRE tunnels the C-PIM traffic from one PE to another, and the P routers forward it based on the P-PIM multicast routing table (the underlay).
This will make more sense once you see it in the lab.
Lab
We will use the following topology. The provider core runs ISIS and LDP.
Startup Configs:
#PE1
hostname PE1
!
vrf definition CUSTOMER
rd 100:1
address-family ipv4
route-target both 100:1
!
int Gi1
ip address 10.1.10.1 255.255.255.0
no shut
ip router isis
isis network point-to-point
mpls ip
!
int Gi2
vrf forwarding CUSTOMER
ip address 100.64.0.1 255.255.255.252
no shut
!
int lo0
ip address 1.1.1.1 255.255.255.255
ip router isis
!
router isis
net 49.0001.0000.0000.0001.00
is-type level-2-only
!
router bgp 100
neighbor 10.10.10.10 remote-as 100
neighbor 10.10.10.10 update-source lo0
address-family vpnv4 unicast
neighbor 10.10.10.10 activate
address-family ipv4 vrf CUSTOMER
neighbor 100.64.0.2 remote-as 65000
neighbor 100.64.0.2 as-override
#PE2
hostname PE2
!
vrf definition CUSTOMER
rd 100:1
address-family ipv4
route-target both 100:1
!
int Gi1
ip address 10.2.20.2 255.255.255.0
no shut
ip router isis
isis network point-to-point
mpls ip
!
int Gi2
vrf forwarding CUSTOMER
ip address 100.64.0.5 255.255.255.252
no shut
!
int lo0
ip address 2.2.2.2 255.255.255.255
ip router isis
!
router isis
net 49.0001.0000.0000.0002.00
is-type level-2-only
!
router bgp 100
neighbor 10.10.10.10 remote-as 100
neighbor 10.10.10.10 update-source lo0
address-family vpnv4 unicast
neighbor 10.10.10.10 activate
address-family ipv4 vrf CUSTOMER
neighbor 100.64.0.6 remote-as 65000
neighbor 100.64.0.6 as-override
#PE3
hostname PE3
!
vrf CUSTOMER
address-family ipv4 unicast
import route-target 100:1
export route-target 100:1
!
int Gi0/0/0/0
ipv4 address 10.3.20.3/24
no shut
!
int Gi0/0/0/1
vrf CUSTOMER
ipv4 address 100.64.0.9/30
no shut
!
int lo0
ipv4 address 3.3.3.3/32
!
router isis 1
net 49.0001.0000.0000.0003.00
is-type level-2-only
int Gi0/0/0/0
point-to-point
address-family ipv4 unicast
!
int Lo0
address-family ipv4 unicast
!
router bgp 100
!
address-family ipv4 unicast
address-family vpnv4 unicast
!
neighbor 10.10.10.10
remote-as 100
update-source lo0
address-family vpnv4 unicast
!
vrf CUSTOMER
rd 100:1
address-family ipv4 unicast
neighbor 100.64.0.10
remote-as 65000
address-family ipv4 unicast
route-policy PASS in
route-policy PASS out
as-override
!
route-policy PASS
pass
end-policy
!
mpls ldp
int Gi0/0/0/0
#P1
hostname P1
!
int Gi0/0/0/0
ipv4 address 10.10.20.10/24
no shut
!
int Gi0/0/0/1
ipv4 address 10.1.10.10/24
no shut
!
int lo0
ipv4 address 10.10.10.10/32
!
router isis 1
net 49.0001.0000.0000.0010.00
is-type level-2-only
int Gi0/0/0/0
point-to-point
address-family ipv4 unicast
!
int Gi0/0/0/1
point-to-point
address-family ipv4 unicast
!
int Lo0
address-family ipv4 unicast
!
mpls ldp
int Gi0/0/0/0
int Gi0/0/0/1
!
router bgp 100
neighbor-group IBGP
update-source lo0
remote-as 100
address-family vpnv4 unicast
route-reflector-client
exit
exit
!
address-family vpnv4 unicast
!
neighbor 1.1.1.1 use neighbor-group IBGP
neighbor 2.2.2.2 use neighbor-group IBGP
neighbor 3.3.3.3 use neighbor-group IBGP
#P2
hostname P2
!
int Gi0/0/0/0
ipv4 address 10.10.20.20/24
no shut
!
int Gi0/0/0/1
ipv4 address 10.2.20.20/24
no shut
!
int Gi0/0/0/2
ipv4 address 10.3.20.20/24
no shut
!
int lo0
ipv4 address 20.20.20.20/32
!
router isis 1
net 49.0001.0000.0000.0020.00
is-type level-2-only
int Gi0/0/0/0
point-to-point
address-family ipv4 unicast
!
int Gi0/0/0/1
point-to-point
address-family ipv4 unicast
!
int Gi0/0/0/2
point-to-point
address-family ipv4 unicast
!
int Lo0
address-family ipv4 unicast
!
mpls ldp
int Gi0/0/0/0
int Gi0/0/0/1
int Gi0/0/0/2
#CE1
hostname CE1
!
int gi1
ip address 100.64.0.2 255.255.255.252
no shut
!
int gi2
ip address 10.1.1.1 255.255.255.0
no shut
!
router bgp 65000
neighbor 100.64.0.1 remote-as 100
network 10.1.1.0 mask 255.255.255.0
#CE2
hostname CE2
!
int gi1
ip address 100.64.0.6 255.255.255.252
no shut
!
int gi2
ip address 10.1.2.1 255.255.255.0
no shut
!
router bgp 65000
neighbor 100.64.0.5 remote-as 100
network 10.1.2.0 mask 255.255.255.0
#CE3
hostname CE3
!
int gi1
ip address 100.64.0.10 255.255.255.252
no shut
!
int gi2
ip address 10.1.3.1 255.255.255.0
no shut
!
router bgp 65000
neighbor 100.64.0.9 remote-as 100
network 10.1.3.0 mask 255.255.255.0
#C1
hostname C1
int gi0/0
ip address 10.1.1.10 255.255.255.0
no shut
!
ip route 0.0.0.0 0.0.0.0 10.1.1.1
#C2
hostname C2
int gi0/0
ip address 10.1.2.10 255.255.255.0
no shut
!
ip route 0.0.0.0 0.0.0.0 10.1.2.1
#C3
hostname C3
int gi0/0
ip address 10.1.3.10 255.255.255.0
no shut
!
ip route 0.0.0.0 0.0.0.0 10.1.3.1
Ensure you have a fully functional L3VPN. C1 should be able to ping C3.
C1#ping 10.1.3.10
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.1.3.10, timeout is 2 seconds:
..!!!
Success rate is 60 percent (3/5), round-trip min/avg/max = 13/14/16 ms
Enable PIM in the customer network.
#CE1, CE2, CE3
ip multicast-routing distributed
!
int range Gi1-2
ip pim sparse-mode
#CE1
ip pim bsr-candidate Gi2
ip pim rp-candidate Gi2
Each customer site is currently an isolated PIM network. CE2 and CE3 have not learned the RP yet.
Enable C-PIM on PE1, PE2, and PE3. This allows the PE to form a PIM neighborship with the CE. Notice that you must enable multicast routing for the VRF.
#PE1, PE2
ip multicast-routing vrf CUSTOMER distributed
!
int Gi2
ip pim sparse-mode
#PE3
multicast-routing vrf CUSTOMER
address-family ipv4
interface Gi0/0/0/1
enable
!
router pim
vrf CUSTOMER
address-family ipv4
int Gi0/0/0/1
enable
All PEs have a PIM neighborship with their local CE:
PE1#show ip pim vrf CUSTOMER neighbor
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
L - DR Load-balancing Capable
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
100.64.0.2 GigabitEthernet2 00:00:42/00:01:31 v2 1 / DR S P G
PE2#show ip pim vrf CUSTOMER nei
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
L - DR Load-balancing Capable
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
100.64.0.6 GigabitEthernet2 00:02:52/00:01:19 v2 1 / DR S P G
RP/0/0/CPU0:PE3#show pim vrf CUSTOMER neighbor
Tue Sep 20 22:44:27.260 UTC
PIM neighbors in VRF CUSTOMER
Flag: B - Bidir capable, P - Proxy capable, DR - Designated Router,
E - ECMP Redirect capable
* indicates the neighbor created for this router
Neighbor Address Interface Uptime Expires DR pri Flags
100.64.0.9* GigabitEthernet0/0/0/1 20:32:57 00:01:25 1 B E
100.64.0.10 GigabitEthernet0/0/0/1 00:04:55 00:01:20 1 (DR) P
So far which routers are aware of the RP in the C-PIM?
The answer is only CE1 (which is itself the RP) and PE1. We still have three separate C-PIM islands.
PE1#show ip pim vrf CUSTOMER rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
RP 10.1.1.1 (?), v2
Info source: 10.1.1.1 (?), via bootstrap, priority 0, holdtime 150
Uptime: 00:04:28, expires: 00:02:03
PE2#show ip pim vrf CUSTOMER rp map
PIM Group-to-RP Mappings
PE2#
RP/0/0/CPU0:PE3#show pim vrf CUSTOMER rp mapping
Tue Sep 20 22:47:11.878 UTC
PIM Group-to-RP Mappings
RP/0/0/CPU0:PE3#
Enable P-PIM (PIM inside the provider core). We will use PIM-SSM, so no RP is needed. P-PIM will be used as our underlay.
On IOS-XE, the BGP peering address (the update source, Loopback0) is used as the MDT source by default. On IOS-XR, you must specify the MDT source interface explicitly.
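The configuration this step implies is not shown above, so here is a minimal sketch. It assumes 232.1.1.1 as the default MDT group, Loopback0 as the MDT source, the default 232/8 PIM-SSM range, and P1 reflecting the new address family:
#PE1, PE2
ip multicast-routing distributed
! treat 232/8 as SSM in the P-PIM
ip pim ssm default
!
int Gi1
ip pim sparse-mode
!
int lo0
ip pim sparse-mode
!
vrf definition CUSTOMER
address-family ipv4
mdt default 232.1.1.1
!
router bgp 100
address-family ipv4 mdt
neighbor 10.10.10.10 activate
#PE3
multicast-routing
address-family ipv4
interface Gi0/0/0/0
enable
interface Loopback0
enable
mdt source Loopback0
!
vrf CUSTOMER
address-family ipv4
mdt default ipv4 232.1.1.1
!
router bgp 100
address-family ipv4 mdt
!
neighbor 10.10.10.10
address-family ipv4 mdt
#P1, P2
multicast-routing
address-family ipv4
interface all enable
#P1
! the route reflector must also carry the new address family
router bgp 100
address-family ipv4 mdt
!
neighbor-group IBGP
address-family ipv4 mdt
route-reflector-client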
Each PE now advertises an ipv4 mdt route that includes its loopback address, the RD for the VRF, and the MDT group address. (PE1's show bgp ipv4 mdt output is included at the end of this post.)
The PEs automatically bring up a GRE tunnel interface. Each PE also sources a PIM Join for every SSM (PE, MDT-group) pair learned via MP-BGP. On P1, examine the PIM topology. (On IOS-XR, show pim topology is roughly equivalent to examining the ip mroute table on IOS-XE.)
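To inspect this state (assuming the default MDT group 232.1.1.1 from earlier), the commands would be along these lines:
RP/0/0/CPU0:P1#show pim topology 232.1.1.1
PE1#show ip mroute 232.1.1.1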
Even though IGMPv2 is in use by default, the PEs send (S, G) SSM Joins. IGMPv3 is not necessary because there are technically no hosts in the P-PIM: the SSM membership is generated internally on the PE's loopback, and each PE then originates the PIM Joins for each (S, G) pair. The P routers never see an IGMP report.
The PEs now have a full mesh of adjacencies in the C-PIM. In PE1's show ip pim vrf CUSTOMER neighbor output (included at the end of this post), 2.2.2.2 and 3.3.3.3 both appear as neighbors over Tunnel1.
Examine the Tunnel1 interface (show int tun1, also included at the end of this post). This interface was created automatically when the default MDT group was specified under the VRF.
In this pcap you can see that C-PIM Hellos sourced from each PE are received by all other PEs.
The outer header has a source of PE2 and a destination of the default MDT group, 232.1.1.1.
The inner header is a normal PIM Hello, destined for all PIM routers (224.0.0.13).
Notice that the PIM Bootstrap messages are now being tunneled as well.
CE2 and CE3 should now learn the RP via BSR. PE2 and PE3 flood this via their C-PIM adjacencies with the CE routers.
CE2#show ip pim rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
RP 10.1.1.1 (?), v2
Info source: 10.1.1.1 (?), via bootstrap, priority 0, holdtime 150
Uptime: 02:20:57, expires: 00:02:06
CE3#show ip pim rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
RP 10.1.1.1 (?), v2
Info source: 10.1.1.1 (?), via bootstrap, priority 0, holdtime 150
Uptime: 02:21:17, expires: 00:01:49
Multicast traffic should now be working. On C1 join a group, and ping that group from C2.
#C1
int Gi0/0
ip igmp join-group 239.1.2.3
C2#ping 239.1.2.3 repeat 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 239.1.2.3, timeout is 2 seconds:
Reply to request 0 from 10.1.1.10, 234 ms
Reply to request 1 from 10.1.1.10, 60 ms
Reply to request 1 from 10.1.1.10, 101 ms
Reply to request 2 from 10.1.1.10, 17 ms
Reply to request 3 from 10.1.1.10, 18 ms
Reply to request 4 from 10.1.1.10, 14 ms
Traffic is now working! Do you notice any problems with this setup?
Take a pcap at the P2-PE3 link. What do you think we’ll see? C3 is neither a receiver nor a source for this multicast tree right now. Will PE3 receive traffic?
C2#ping 239.1.2.3 repeat 10 timeout 1
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 239.1.2.3, timeout is 1 seconds:
Reply to request 0 from 10.1.1.10, 15 ms
Reply to request 1 from 10.1.1.10, 15 ms
Reply to request 2 from 10.1.1.10, 13 ms
Reply to request 3 from 10.1.1.10, 18 ms
Reply to request 4 from 10.1.1.10, 14 ms
Reply to request 5 from 10.1.1.10, 11 ms
Reply to request 6 from 10.1.1.10, 13 ms
Reply to request 7 from 10.1.1.10, 13 ms
Reply to request 8 from 10.1.1.10, 13 ms
Reply to request 9 from 10.1.1.10, 14 ms
PE3 sees this traffic. Why is that?
It’s because PE2 tunnels this traffic using the default MDT group, which PE3 is a receiver for. Therefore PE3 must receive this traffic. When using the default MDT group, traffic is flooded to all PEs in that VRF.
There is an optimization in mVPN profile 0 to prevent this unnecessary flooding: the data MDT group. If traffic exceeds a configured bandwidth threshold, it can switch over to a different multicast group, and PE3 can simply not join that group, since it has no (S, G) state for the customer traffic in its C-PIM table.
Data MDTs
Data MDTs allow PEs to switch over to a group other than the default MDT in the P-PIM underlay. A PE advertises the C-PIM (S, G) and the new P-PIM group that the traffic will switch over to. If another PE has listeners for this C-PIM (S, G), it sends an (S, G) P-PIM Join for the new SSM group in the underlay. If not, the PE does not join the SSM group. This prevents the unnecessary flooding of customer traffic that we saw above.
Configure the range of data MDT groups that each router will use under the VRF. Since this is PIM-SSM, we can reuse the same range on every PE. We'll use 232.100.100.0/24, which allows for 256 unique groups. We also need to specify the bandwidth threshold that determines when the PE switches a flow from the default MDT to a data MDT.
#PE1, PE2
vrf definition CUSTOMER
address-family ipv4
mdt data 232.100.100.0 0.0.0.255
! MDT data threshold is in kbps
mdt data threshold 1
#PE3
multicast-routing
vrf CUSTOMER
address-family ipv4
mdt data 232.100.100.0/24 threshold 1
We’ll start a stream of traffic again on C2 and take pcaps in two places: at PE1 and at PE3.
C2#ping 239.1.2.3 size 1400 timeout 1 repeat 500
Once traffic exceeds 1 kbps, PE2 sends a Data-MDT Join, which is essentially an advertisement that says “If you want traffic for (10.1.2.10, 239.1.2.3), then I’m moving to (2.2.2.2, 232.100.100.0).”
The data contains the C-PIM (S, G) and the new P-PIM Data MDT group (232.100.100.0). Wireshark doesn’t automatically decode it.
PE1 then sends a P-PIM Join for (2.2.2.2, 232.100.100.0) because PE1 has state for (10.1.2.10, 239.1.2.3) with an outgoing interface towards CE1.
PE2 waits a few seconds and then switches over to 232.100.100.0.
The traffic is encapsulated in GRE just as before. The difference is that it is now using the data MDT group instead of the default MDT group in the underlay.
PE3 also saw the Data-MDT Join (because it is sent on the default MDT group), but PE3 has no state for the C-PIM (S, G) so it does not join the data MDT group. After a few seconds, it stops seeing the ICMP customer traffic.
Using the command below, you can see the data MDT group a router has advertised:
PE2#show ip pim vrf CUSTOMER mdt send
MDT-data send list for VRF: CUSTOMER
(source, group) MDT-data group/num ref_count
(10.1.2.10, 239.1.2.3) 232.100.100.0 1
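On the receiving side, PE1 (which joined this data MDT) can be checked with the command below; on IOS-XR, the rough equivalent is show pim vrf CUSTOMER mdt cache:
PE1#show ip pim vrf CUSTOMER mdt receive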
Conclusion
The following technologies define mVPN Profile 0:
A working L3 VPN (prerequisite)
PE routers run C-PIM with the CE
All PE and P routers run P-PIM (underlay)
PE routers configure a default MDT under the VRF
PE routers run bgp ipv4 mdt to learn about other PEs for this default MDT
PE routers automatically generate a GRE interface
PE routers automatically send PIM Joins for all SSM groups they learn via MP-BGP
PE routers then form a full mesh of C-PIM adjacencies (overlay) using the GRE tunnel interfaces
Optionally, PE routers define a data MDT to reduce unnecessary flooding of customer traffic
What defines mVPN profile 0 is the way GRE is used to tunnel traffic between PEs, on top of a PIM network running as an underlay in the provider core.
Next we will explore profile 1, in which MPLS is used to tunnel traffic instead of GRE.
For reference, the PE1 outputs mentioned earlier (the ipv4 mdt BGP routes, the full mesh of C-PIM neighbors, and the Tunnel1 interface) follow below.
PE1#show bgp ipv4 mdt all 1.1.1.1/32
BGP routing table entry for version 3
Paths: (1 available, best #1, table IPv4-MDT-BGP-Table)
Advertised to update-groups:
1
Refresh Epoch 1
Local
0.0.0.0 from 0.0.0.0 (1.1.1.1)
Origin incomplete, localpref 100, valid, sourced, local, best,
rx pathid: 0, tx pathid: 0x0
Updated on Sep 20 2022 22:49:38 UTC
PE1#show bgp ipv4 mdt all | beg Network
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUSTOMER)
*> 1.1.1.1/32 0.0.0.0 0 ?
*>i 2.2.2.2/32 2.2.2.2 0 100 0 ?
*>i 3.3.3.3/32 3.3.3.3 100 0 i
PE1#show ip pim vrf CUSTOMER neighbor
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
L - DR Load-balancing Capable
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
100.64.0.2 GigabitEthernet2 02:24:36/00:01:21 v2 1 / DR S P G
3.3.3.3 Tunnel1 02:16:36/00:01:17 v2 1 / DR G
2.2.2.2 Tunnel1 02:16:37/00:01:15 v2 1 / S P G
PE1#show int tun1
Tunnel1 is up, line protocol is up
Hardware is Tunnel
Interface is unnumbered. Using address of Loopback0 (1.1.1.1)
MTU 9976 bytes, BW 100 Kbit/sec, DLY 50000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation TUNNEL, loopback not set
Keepalive not set
Tunnel linestate evaluation up
Tunnel Subblocks:
src-track:
Tunnel1 source tracking subblock associated with Loopback0
Set of tunnels with source Loopback0, 1 member (includes iterators), on interface <OK>
Tunnel protocol/transport multi-GRE/IP
Key disabled, sequencing disabled
Checksumming of packets disabled
Tunnel TTL 255, Fast tunneling enabled
Tunnel transport MTU 1476 bytes
Tunnel transmit bandwidth 8000 (kbps)
Tunnel receive bandwidth 8000 (kbps)
Last input 00:00:00, output 00:00:03, output hang never
Last clearing of "show interface" counters 02:17:12
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
554 packets input, 32132 bytes, 0 no buffer
Received 0 broadcasts (554 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
561 packets output, 42090 bytes, 0 underruns
Output 0 broadcasts (0 IP multicasts)
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out