Interdomain Multicast (PIM-SM)

PIM-SM was developed for intradomain operation. If two separate ASes want to route multicast traffic between their domains, they must use MSDP (Multicast Source Discovery Protocol) to solve the source discovery problem.

The inherent issue with interdomain multicast is that an AS should have control over its own RPs; it does not want its internal routers to use the RPs of remote domains. This poses a problem for source discovery. In intradomain PIM-SM, the FHR Registers the source with its RP. In interdomain PIM-SM, we need another mechanism for source registration, because we don’t want every FHR to use an RP in another domain.

With MSDP, the RPs in each domain peer with each other. When an RP sees a PIM Register, it sends an SA (Source Active) message to all of its MSDP peers to notify them of the source. This allows the other RP(s) to build an SPT rooted at that source and pull the traffic across the interdomain boundary. In effect, the SA replaces the PIM Register, and the multicast data is in fact encapsulated in the SA message just as it is in a PIM Register message. The SA is re-sent every 60 seconds for as long as the source remains active, which keeps the entry from timing out on remote RPs.

MSDP uses TCP port 639 and functions somewhat like a "lite" version of BGP: peers are explicitly configured and the session is maintained via keepalives.

In addition to MSDP, you can optionally use MP-BGP to exchange reachability information for sources across the domain boundary. The address family is ipv4 multicast. The sources' prefixes are advertised, allowing routers in the other domain to create an (S, G) SPT rooted at a source in another domain. Prefixes learned over ipv4 multicast are used for the RPF check; they are not installed in the unicast RIB, but in a separate mcast table. Using MP-BGP is not required for interdomain multicast to work. As long as a route to the source is present, traffic will flow. The RPF check can use either the unicast table or the mcast table (which is populated via BGP).

This article will cover interdomain PIM-SM. Interdomain PIM-SSM is a much easier alternative: you don’t need to run MSDP because RPs are not required in the first place. You simply need reachability to the sources for PIM-SSM to work, as the sketch below illustrates.
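As a quick point of comparison, here is a minimal sketch of what an SSM deployment might look like (hypothetical router and addresses, not part of this lab). Every multicast router simply enables SSM for the default 232.0.0.0/8 range, and receiver-facing interfaces run IGMPv3 so hosts can request (S, G) directly; no RP, BSR, or MSDP configuration exists anywhere.

#Hypothetical SSM router (illustrative only)
ip multicast-routing distributed
!
! SSM for the default 232.0.0.0/8 range; no RP or MSDP required
ip pim ssm default
!
int Gi1
 ip pim sparse-mode
 ! Receiver-facing interfaces need IGMPv3 so hosts can signal (S, G) joins
 ip igmp version 3
!
! A test receiver could join an SSM channel with, for example:
!  ip igmp join-group 232.1.1.1 source 10.10.10.10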

Lab

We will use the following topology. Domain 1 runs OSPF and Domain 2 runs ISIS. Configure R3 and R6 as the RP for their domains using BSR.

Here are the startup configs:

#R1
hostname R1
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.10.13.1 255.255.255.0
 no shut
 ip ospf network point-to-point
 ip pim sparse-mode
!
int Gi2
 ip address 10.10.100.1 255.255.255.0
 ip pim sparse-mode
 no shut
!
int Lo0
 ip address 1.1.1.1 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
 passive-interface Gi2

#R2
hostname R2
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.10.10.1 255.255.255.0
 no shut
 ip pim sparse-mode
!
int Gi2
 ip address 10.10.23.2 255.255.255.0
 ip pim sparse-mode
 ip ospf network point-to-point
 no shut
!
int Lo0
 ip address 2.2.2.2 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
 passive-interface Gi1

#R3
hostname R3
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.10.23.3 255.255.255.0
 no shut
 ip pim sparse-mode
 ip ospf network point-to-point
!
int Gi2
 ip address 10.10.13.3 255.255.255.0
 ip pim sparse-mode
 ip ospf network point-to-point
 no shut
!
int Gi3
 ip address 10.10.34.3 255.255.255.0
 ip pim sparse-mode
 ip ospf network point-to-point
 no shut
!
int Lo0
 ip address 3.3.3.3 255.255.255.255
 ip pim sparse-mode
!
ip pim bsr-candidate lo0
ip pim rp-candidate lo0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

#R4
hostname R4
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.10.34.4 255.255.255.0
 no shut
 ip pim sparse-mode
 ip ospf network point-to-point
!
int Gi2
 ip address 10.4.5.4 255.255.255.0
 no shut
!
int Lo0
 ip address 4.4.4.4 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
 passive-interface Gi2

#R5
hostname R5
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.20.56.5 255.255.255.0
 no shut
 ip pim sparse-mode
 ip router isis
 isis network point-to-point
!
int Gi2
 ip address 10.4.5.5 255.255.255.0
 no shut
!
int Lo0
 ip address 5.5.5.5 255.255.255.255
 ip router isis
!
router isis
 net 49.0001.0000.0000.0005.00
 is-type level-2-only
 passive-interface Gi2

#R6
hostname R6
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.20.56.6 255.255.255.0
 no shut
 ip pim sparse-mode
 ip router isis
 isis network point-to-point
!
int Gi2
 ip address 10.20.67.6 255.255.255.0
 no shut
 ip pim sparse-mode
 ip router isis
 isis network point-to-point
!
int Lo0
 ip address 6.6.6.6 255.255.255.255
 ip router isis
 ip pim sparse-mode
!
ip pim bsr-candidate lo0
ip pim rp-candidate lo0
!
router isis
 net 49.0001.0000.0000.0006.00
 is-type level-2-only

#R7
hostname R7
!
ip multicast-routing distributed
!
int Gi1
 ip address 10.20.67.7 255.255.255.0
 no shut
 ip pim sparse-mode
 ip router isis
 isis network point-to-point
!
int Gi2
 ip address 10.20.100.1 255.255.255.0
 ip pim sparse-mode
 no shut
!
int Lo0
 ip address 7.7.7.7 255.255.255.255
 ip router isis
!
router isis
 net 49.0001.0000.0000.0007.00
 is-type level-2-only
 passive-interface Gi2

#Source1
hostname Source1
!
no ip domain lookup
!
int Gi1
 ip address 10.10.10.10 255.255.255.0
 no shut
!
ip route 0.0.0.0 0.0.0.0 10.10.10.1

#Host1
hostname Host1
!
int Gi0/0
 ip address 10.10.100.10 255.255.255.0
 ip igmp join-group 239.1.1.1
 no shut
!
ip route 0.0.0.0 0.0.0.0 10.10.100.1

#Host2
hostname Host2
!
int Gi0/0
 ip address 10.20.100.10 255.255.255.0
 ip igmp join-group 239.1.1.1
 no shut
!
ip route 0.0.0.0 0.0.0.0 10.20.100.1

Both Host1 and Host2 have joined the group 239.1.1.1. However, traffic from Source1 is only reaching Host1 right now.

Source1#ping 239.1.1.1 repeat 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 10.10.100.10, 47 ms
Reply to request 1 from 10.10.100.10, 6 ms
Reply to request 1 from 10.10.100.10, 39 ms
Reply to request 2 from 10.10.100.10, 5 ms

In order for Host2 to receive the traffic, we must do two things:

  • Advertise routes for the RPs and sources into each domain via eBGP at the ASBRs (R4 and R5)

  • Configure MSDP peering between the two RPs

First we’ll enable eBGP between R4 and R5 and advertise the RP and Source prefixes. Configure R3 and R6 to be the RR in each domain.

#R1, R2
router bgp 100
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source lo0

#R3
router bgp 100
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source lo0
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source lo0
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source lo0
 !
 neighbor 1.1.1.1 route-reflector-client
 neighbor 2.2.2.2 route-reflector-client
 neighbor 4.4.4.4 route-reflector-client

#R4
router bgp 100
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source lo0
 neighbor 3.3.3.3 next-hop-self
 neighbor 10.4.5.5 remote-as 200
 !
 network 10.10.10.0 mask 255.255.255.0
 network 3.3.3.3 mask 255.255.255.255

#R5
router bgp 200
 neighbor 6.6.6.6 remote-as 200
 neighbor 6.6.6.6 update-source lo0
 neighbor 6.6.6.6 next-hop-self
 neighbor 10.4.5.4 remote-as 100
 !
 network 6.6.6.6 mask 255.255.255.255
 ! Host2's prefix is only advertised for the ICMP Reply to work
 network 10.20.100.0 mask 255.255.255.0

#R6
router bgp 200
 neighbor 5.5.5.5 remote-as 200
 neighbor 5.5.5.5 update-source lo0
 neighbor 7.7.7.7 remote-as 200
 neighbor 7.7.7.7 update-source lo0
 !
 neighbor 5.5.5.5 route-reflector-client
 neighbor 7.7.7.7 route-reflector-client


#R7
router bgp 200
 neighbor 6.6.6.6 remote-as 200
 neighbor 6.6.6.6 update-source lo0

Next we need to configure R3 and R6 as MSDP peers. This configuration is quite simple. The connect-source is analogous to BGP's update-source.

#R3
ip msdp peer 6.6.6.6 connect-source lo0

#R6
ip msdp peer 3.3.3.3 connect-source lo0
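Optionally, the MSDP session can be hardened. The commands below are an illustrative sketch (the password and ACL names are hypothetical): an MD5 password protects the TCP session, and SA filters control which (S, G) entries are advertised to or accepted from a peer. These correspond to the "MD5 signature protection" and "SA Filtering" fields in the show ip msdp peer output further down.

#R3 (optional hardening, illustrative only)
! MD5 authentication on the MSDP TCP session (must match on R6)
ip msdp password peer 6.6.6.6 MSDP-SECRET
!
! Filter SAs with extended ACLs matching (source, group); hypothetical names
ip msdp sa-filter out 6.6.6.6 list MSDP-SA-OUT
ip msdp sa-filter in 6.6.6.6 list MSDP-SA-IN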

Confirm the MSDP session is up:

R3#show ip msdp summary 
MSDP Peer Status Summary
Peer Address     AS    State    Uptime/  Reset SA    Peer Name
                                Downtime Count Count
6.6.6.6          200   Up       00:00:36 0     0     ?


R3#show ip msdp peer 
MSDP Peer 6.6.6.6 (?), AS 200
  Connection status:
    State: Up, Resets: 0, Connection source: Loopback0 (3.3.3.3)
    Uptime(Downtime): 00:00:32, Messages sent/received: 1/1
    Output messages discarded: 0
    Connection and counters cleared 00:01:32 ago
  SA Filtering:
    Input (S,G) filter: none, route-map: none
    Input RP filter: none, route-map: none
    Output (S,G) filter: none, route-map: none
    Output RP filter: none, route-map: none
  SA-Requests: 
    Input filter: none
  Peer ttl threshold: 0
  SAs learned from this peer: 0
  Number of connection transitions to Established state: 1
    Input queue size: 0, Output queue size: 0
  MD5 signature protection on MSDP TCP connection: not enabled
  Message counters:
    RPF Failure count: 0
    SA Messages in/out: 0/0
    SA Requests in: 0
    SA Responses out: 0
    Data Packets in/out: 0/0

Begin sending multicast traffic again at Source1. When R3 learns the source, it will send a Source Active message to R6. R6 will then join an (S, G) tree toward the source. This is the same process as intradomain multicast, but the source just happens to be in a different domain.

Here is the MSDP SA message sent from R3 to R6. Notice that the multicast packet is encapsulated in the SA message. This is similar to a PIM Register, except the peer will never send the equivalent of a Register-Stop.
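If you don’t have a packet capture handy, the same discovery can be confirmed from the CLI on R6. A couple of commands to check (output omitted here): the SA cache should contain (10.10.10.10, 239.1.1.1) learned from peer 3.3.3.3, and the peer's SA message counters should be incrementing.

R6#show ip msdp sa-cache
R6#show ip msdp peer 3.3.3.3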

Source1#ping 239.1.1.1 repeat 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 10.10.100.10, 38 ms
Reply to request 1 from 10.10.100.10, 6 ms
Reply to request 1 from 10.10.100.10, 42 ms
Reply to request 2 from 10.10.100.10, 6 ms

Why don’t we see replies from Host2 right now? R6 has joined an (S, G) tree rooted at Source1 (output below). Shouldn’t this be working?

R6#show ip mroute 239.1.1.1

(*, 239.1.1.1), 00:00:01/stopped, RP 6.6.6.6, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2, Forward/Sparse, 00:00:01/00:03:28
          
(10.10.10.10, 239.1.1.1), 00:00:01/00:02:58, flags: M
  Incoming interface: GigabitEthernet1, RPF nbr 10.20.56.5
  Outgoing interface list:
    GigabitEthernet2, Forward/Sparse, 00:00:01/00:03:28

Spend a few minutes troubleshooting on your own before scrolling down to see the answer.

Here’s a hint: take a look at R5’s mroute table:

R5#show ip mroute 239.1.1.1

(*, 239.1.1.1), 00:01:29/stopped, RP 6.6.6.6, flags: SP
  Incoming interface: GigabitEthernet1, RPF nbr 10.20.56.6
  Outgoing interface list: Null

(10.10.10.10, 239.1.1.1), 00:01:29/00:01:30, flags: 
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 00:01:29/00:02:58

There is no RPF neighbor for (10.10.10.10, 239.1.1.1). This is because R4 and R5 are not running PIM between themselves. This is an important point about interdomain multicast: the ASBRs must run PIM with each other across the interdomain link. Configure PIM sparse-mode on the interfaces connecting R4 and R5.

#R4, R5
int Gi2
 ip pim sparse-mode

Traffic is now reaching Host2 as well:

Source1#ping 239.1.1.1 repeat 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 10.10.100.10, 5 ms
Reply to request 0 from 10.20.100.10, 8 ms
Reply to request 1 from 10.10.100.10, 5 ms
Reply to request 1 from 10.20.100.10, 103 ms
Reply to request 2 from 10.10.100.10, 5 ms
Reply to request 2 from 10.20.100.10, 7 ms

We have one issue though. Examine the RP mappings on R1 and R7.

R1#show ip pim rp map
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 6.6.6.6 (?), v2
    Info source: 6.6.6.6 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:00:19, expires: 00:02:07
  RP 3.3.3.3 (?), v2
    Info source: 6.6.6.6 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:34:49, expires: 00:02:09
R1#show ip pim rp
Group: 239.1.1.1, RP: 6.6.6.6, uptime 00:00:25, expires 00:02:01

R7#show ip pim rp map
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 6.6.6.6 (?), v2
    Info source: 6.6.6.6 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:27:11, expires: 00:01:34
  RP 3.3.3.3 (?), v2
    Info source: 6.6.6.6 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:00:52, expires: 00:01:36
R7#show ip pim rp
Group: 239.1.1.1, RP: 6.6.6.6, uptime 00:27:13, expires 00:01:33

Both routers are learning the other domain’s RP information. In fact, R1 is currently using R6 as its RP! (6.6.6.6 wins because it is the highest IP address.) How did this happen? When we configured PIM between R4 and R5, the BSR messages leaked between the two domains. To fix this we need to configure Gi2 on each ASBR as the BSR boundary.

#R4, R5
int Gi2
 ip pim bsr-border

! The above command prevents PIM Bootstrap messages from being sent or received on the link

Wait a few minutes and the interdomain RP mappings should time out on each router.
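Note that bsr-border only stops PIM Bootstrap messages. If you also wanted to keep certain group ranges from crossing the interdomain link entirely, a multicast boundary could be added on the same interfaces. The sketch below is illustrative only; the ACL name and the 239.255.0.0/16 range are assumptions, chosen so the lab group 239.1.1.1 would remain unaffected.

#R4, R5 (optional, illustrative only)
ip access-list standard BLOCK-SITE-LOCAL
 deny   239.255.0.0 0.0.255.255
 permit any
!
int Gi2
 ! Drops multicast data and blocks PIM state for the denied group range
 ip multicast boundary BLOCK-SITE-LOCAL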

Using MP-BGP

Currently the route for Source1 (10.10.10.0/24) is present in the unicast RIB of all routers in domain 2.

R6#show ip route | in 10.10.10.
B        10.10.10.0/24 [200/3] via 5.5.5.5, 00:25:14

If this is not desirable, you can exchange this route over BGP ipv4 multicast instead of BGP ipv4 unicast. This will remove the prefix from the unicast routing table, but still allow the route to be used for the RPF check.

Using BGP ipv4 multicast can also be a traffic engineering technique. Perhaps you want one interdomain link to be used for multicast and another interdomain link for unicast traffic. You can configure separate peering sessions over each link in order to have greater control.
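As a sketch of that idea, suppose R4 and R5 had a second link (Gi3, 10.4.6.0/24, purely hypothetical). You could keep the ipv4 unicast session on the first link and run an ipv4 multicast-only session on the second, so the multicast RPF path uses a different link than unicast traffic. PIM (and a bsr-border) would also be needed on that second link for it to carry the multicast trees.

#R4 (hypothetical second link to R5 on Gi3, 10.4.6.0/24)
router bgp 100
 ! Existing unicast session over the first link
 neighbor 10.4.5.5 remote-as 200
 ! Multicast-only session over the second link
 neighbor 10.4.6.5 remote-as 200
 address-family ipv4
  neighbor 10.4.5.5 activate
  no neighbor 10.4.6.5 activate
 address-family ipv4 multicast
  neighbor 10.4.6.5 activate
  network 10.10.10.0 mask 255.255.255.0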

Let’s configure an ipv4 multicast session between R4 and R5 and advertise the source prefix over this session. We will also need to run the ipv4 multicast AFI on the internal routers in domain 2.

#R4
router bgp 100
 no network 10.10.10.0 mask 255.255.255.0
 address-family ipv4 multicast
  neighbor 10.4.5.5 activate
  network 10.10.10.0 mask 255.255.255.0

#R5
router bgp 200
 address-family ipv4 multicast
  neighbor 10.4.5.4 activate
! Within Domain 2, we also need to run the ipv4 multicast AFI
  neighbor 6.6.6.6 activate
  neighbor 6.6.6.6 next-hop-self

#R6
router bgp 200
 address-family ipv4 multicast
   neighbor 5.5.5.5 activate
   neighbor 5.5.5.5 route-reflector-client
   neighbor 7.7.7.7 activate
   neighbor 7.7.7.7 route-reflector-client
   
#R7
router bgp 200
 address-family ipv4 multicast
   neighbor 6.6.6.6 activate

The prefix is now gone from the RIB, but the RPF check can still work by using the mcast BGP table.

R6#show ip route | in 10.10.10.
R6#
R6#show ip rpf 10.10.10.10     
RPF information for ? (10.10.10.10)
  RPF interface: GigabitEthernet1
  RPF neighbor: ? (10.20.56.5)
  RPF route/mask: 10.10.10.0/24
  RPF type: multicast (bgp 200)
  Doing distance-preferred lookups across tables
  RPF topology: ipv4 multicast base, originated from ipv4 unicast base
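You can also inspect the BGP multicast table directly to see the prefix that is satisfying the RPF check (output omitted here):

R6#show bgp ipv4 multicast
R6#show bgp ipv4 multicast 10.10.10.0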

It is important to realize that the BGP mcast table is only used for RPF checks against multicast sources. You do not advertise any multicast group addresses (224/4) into BGP. The route itself is never used to forward traffic toward the source; it is only used to determine a valid upstream (RPF) interface so that the (S, G) tree can be built. The traffic then flows down this tree. (Traffic arrives from the source; it is not sent to the source. We still need a route to the source, though, in order to perform the RPF check.)

Traffic from Source1 still arrives at Host2; however, Domain 2 no longer has a unicast route back to the source, so Host2's ICMP replies never make it back to Source1. The multicast traffic itself is nevertheless working.

Source1#ping 239.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 10.10.100.10, 47 ms

Host2# debug ip icmp
*Sep 17 21:36:48.600: ICMP: echo reply sent, src 10.20.100.10, dst 10.10.10.10, topology BASE, dscp 0 topoid 0
*Sep 17 21:36:48.601: ICMP: dst (10.20.100.10) host unreachable rcv from 10.20.100.1
  • R7 sends a host unreachable to Host2 because 10.10.10.0/24 is no longer in the unicast RIB

Conclusion

Interdomain multicast requires either PIM-SSM (the simplest deployment) or PIM-SM with MSDP. MSDP (Multicast Source Discovery Protocol) allows an RP to discover a source in another domain. When a source goes active, the RP local to the source’s domain sends a Source Active message to all of its MSDP peers, which lets the RPs in other domains learn of the source just as they would from a PIM Register message. If those RPs have state for the matching (*, G) shared tree, they join an (S, G) tree rooted at the source in order to “pull in” the traffic.

Interdomain multicast also requires routes to sources in other domains in order to pass the RPF check. This can be done using unicast routes, or by running BGP for the ipv4 multicast AFI. Additionally, you must run PIM on the ASBRs across the interdomain link. Be sure to take precautions so that these ASBRs do not leak RP information between domains (for example, by configuring ip pim bsr-border).
