PIM-BiDir

Up to this point we have been examining one-to-many multicast scenarios: a single source originated traffic to a multicast group, and that group had one or more receivers.

Some applications use a many-to-many approach. For example, a multicast-enabled video chat application may have every participant be both a source and a receiver. The participant is a source when it sends its own video stream, and a receiver of all other participants’ streams.

PIM-BiDir (bidirectional PIM) was created to efficiently handle many-to-many multicast traffic. It builds on PIM-SM with a few modifications. First, there is no (S, G) state in PIM-BiDir. In PIM-SM, routers have the option of switching over to the SPT; this is not an option in PIM-BiDir.

Second, there is no source registration process. In PIM-SM, when a source begins sending, the FHR unicasts Register messages to the RP. The RP creates (S, G) state and joins a SPT rooted at the sender. In contrast, in PIM-BiDir traffic flows both up and down the shared tree. When a source starts sending, the FHR simply starts forwarding the traffic towards the RP. The RP then forwards the traffic down the shared tree towards receivers.

In PIM-BiDir, the relationship between the RP and receivers is the same as you’ve seen in normal PIM-SM. Receivers join via IGMP/PIM Joins being sent upstream towards the RP. The process of the source traffic flowing “up” the shared tree is what is different.

To fully understand why source traffic cannot flow “up” the shared tree in PIM-SM, we need to look at the RPF process. The shared tree is rooted at the RP. Let’s imagine a router has a shortest path to the RP via Gi2. Therefore any shared tree has an incoming interface of Gi2. A packet from a source arrives via Gi1. If the router were to consider this as part of the shared tree, it would fail the RPF check. In contrast, when the RP forwards traffic down the shared tree, downstream routers will see the traffic arrive via their RPF interface facing the RP. This is fundamentally why the PIM Register process exists, and why the RP joins a (S, G) tree rooted at the source. (The PIM Register circumvents the RPF check because it is unicast traffic destined directly to the RP. Routers “in the middle” between the FHR and RP do IP forwarding of the Register message instead of multicast forwarding. The RP then has to do a (S, G) Join in order to “pull” the traffic from the source directly to itself.)
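The RPF check described above can be sketched in a few lines of Python. This is an illustrative toy model, not router code; the route table and interface names are hypothetical.

```python
# Minimal model of the PIM-SM RPF check (illustrative only).
# The unicast route table maps a tree root's address to the interface
# we would use to reach it; names here are hypothetical.
rpf_table = {
    "10.0.0.4": "Gi2",  # the RP: shared-tree traffic must arrive on Gi2
}

def rpf_check(tree_root: str, arrival_iface: str) -> bool:
    """Accept a packet on a tree only if it arrived on the interface
    we would use to reach the tree's root."""
    return rpf_table[tree_root] == arrival_iface

# Traffic flowing DOWN the shared tree arrives from the RP's direction:
assert rpf_check("10.0.0.4", "Gi2")

# A source's packet arriving on Gi1 cannot be treated as shared-tree
# traffic -- it fails the RPF check, which is why PIM-SM needs the
# Register process instead of forwarding "up" the shared tree:
assert not rpf_check("10.0.0.4", "Gi1")
```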

PIM-BiDir allows packets to be forwarded “up” the shared tree, circumventing the RPF check. Because of this, we need an additional loop prevention mechanism. PIM-BiDir uses the concept of a Designated Forwarder (DF) on every link in order to handle loop prevention. The DF is not to be confused with the Designated Router (DR) which is elected during neighbor adjacency formation. A DF is elected on every segment. The router with the best unicast metric to the RP is elected DF. In PIM-BiDir the DF also takes on the role of the DR.

Let’s configure this in the lab to really understand how this works.

Lab

We’ll reuse our topology from the previous articles but move the RP to R4.

BSR is still configured from the last article. To enable PIM-BiDir we must do two things:

  • Enable bidir on every router using ip pim bidir-enable

  • Set the RP as bidir

    • Statically: ip pim rp-address 1.1.1.1 bidir

    • BSR: ip pim rp-candidate Lo0 bidir

    • Auto-RP: ip pim send-rp-announce lo0 scope 255 bidir

All interfaces will still be set to ip pim sparse-mode.

Let’s move the BSR rp-candidate to only R4, and enable bidir on every router.

#R1, R6
no ip pim rp-candidate lo0

#R4
int lo0
 ip pim sparse-mode
!
ip pim bidir-enable
ip pim rp-candidate lo0 bidir

#All routers
ip pim bidir-enable

Take a look at any non-RP’s mroute table. Notice that there is a (*, 224/4) entry.

R1#show ip mroute

(*,224.0.0.0/4), 00:35:20/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: GigabitEthernet4, RPF nbr: 10.1.6.6
  Incoming interface list:
    Loopback0, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse
    GigabitEthernet4, Accepting/Sparse

There is now a bidir-upstream interface. This is the RPF interface facing the RP. The incoming interface list includes every interface for which the router is the DF plus the upstream interface. The router will not accept traffic on an interface for which it is not the DF. Traffic that does arrive on an interface in the incoming interface list will be forwarded up the RPT out the bidir-upstream interface.
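The accept-and-forward decision described above can be sketched as follows. This is a simplification of the BiDir data plane, not IOS logic; the interface names mirror R1's output.

```python
# Simplified PIM-BiDir forwarding decision for the (*, 224/4) entry.
# Interface names mirror R1's mroute output; this is a sketch, not IOS logic.
accepting = {"Loopback0", "GigabitEthernet1", "GigabitEthernet4"}  # DF-won + upstream
bidir_upstream = "GigabitEthernet4"

def forward(arrival_iface: str) -> list[str]:
    """Return the interfaces a packet is sent out of, or [] if dropped."""
    if arrival_iface not in accepting:
        return []  # router is not DF on this link: drop the packet
    out = []
    if arrival_iface != bidir_upstream:
        out.append(bidir_upstream)  # push source traffic up toward the RP
    # any (*, G) OIL interfaces for joined groups would be appended here too
    return out

assert forward("GigabitEthernet1") == ["GigabitEthernet4"]  # up the shared tree
assert forward("GigabitEthernet2") == []  # R1 is not DF on Gi2: dropped
```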

DF Election

Let’s look at the DF election process on R1 in a little more detail. The DF election is based on the lowest metric to the RP (4.4.4.4), with the highest IP address as the tiebreaker.

  • Gi1

    • No other neighbors on this interface. R1 wins the DF election

  • Gi2

    • R2 has a metric of 2 to 4.4.4.4/32. R1 has a metric of 3 to 4.4.4.4/32. R2 wins the DF election.

  • Gi3

    • R5 has a metric of 2 as well, so R5 wins

  • Gi4

    • R6 has a metric of 2, so R6 wins

    • This is also the upstream interface towards R4

The DF process ensures loop-free multicast delivery by allowing incoming traffic only on certain interfaces. If R1 received a multicast packet on Gi2 and forwarded it to R6, and R6 could forward it back to R2, this would create a loop. The routers instead form a spanning-tree-like, loop-free topology. You can almost think of the non-DF interfaces as being “blocked”: a “blocked” interface will not show up under the incoming interfaces of the (*, 224/4) entry. In this analogy, the RP plays the role of the spanning-tree root bridge.
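The election rule, lowest metric to the RP with highest IP address as the tiebreaker, can be sketched in Python. This is a toy model of the comparison only (a real election also compares administrative distance); the router names, metrics, and addresses are illustrative.

```python
# Sketch of the per-segment DF election: lowest unicast metric to the RP
# wins; the highest IP address breaks ties. Values mirror the R1 example.
def elect_df(candidates: list[tuple[str, int, str]]) -> str:
    """candidates: (router, metric_to_rp, ip_address). Returns the winner."""
    # sort key: lowest metric first; among equal metrics, the higher IP
    # should win, so negate each octet to make higher addresses sort first
    return min(candidates,
               key=lambda c: (c[1], [-int(o) for o in c[2].split(".")]))[0]

# Gi2: R2 (metric 2) beats R1 (metric 3)
assert elect_df([("R1", 3, "10.1.2.1"), ("R2", 2, "10.1.2.2")]) == "R2"

# Equal metrics: the higher IP address wins the tie
assert elect_df([("R5", 2, "10.1.5.5"), ("R6", 2, "10.1.5.6")]) == "R6"
```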

You can easily see which interfaces a router won the DF election on by using the following command:

R1#show ip pim int df
* implies this system is the DF
Interface                RP               DF Winner        Metric     Uptime
GigabitEthernet1         4.4.4.4          *10.10.10.1       3          00:41:39
GigabitEthernet2         4.4.4.4           10.1.2.2         2          00:41:37
GigabitEthernet3         4.4.4.4           10.1.5.5         2          00:41:22
GigabitEthernet4         4.4.4.4           10.1.6.6         2          00:41:39
Loopback0                4.4.4.4          *1.1.1.1          3          00:41:39

This election uses a dedicated PIM DF Election packet type. Each neighbor advertises its support for BiDir in its PIM Hello. Once both neighbors see that the other supports BiDir, they exchange PIM DF Election packets listing the AD and metric of their route to the RP.

Traffic flow

Let’s examine how traffic flows. Ensure that Host4 has joined 239.100.100.100. Run a ping from Source1.

Source#ping 239.100.100.100 repeat 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.100.100.100, timeout is 2 seconds:

Reply to request 0 from 10.10.200.10, 5 ms
Reply to request 1 from 10.10.200.10, 4 ms
Reply to request 2 from 10.10.200.10, 5 ms

Notice that the routers on the path from the FHR to the RP have no state for this multicast group.

R1#show ip mroute

(*,224.0.0.0/4), 00:43:48/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: GigabitEthernet4, RPF nbr: 10.1.6.6
  Incoming interface list:
    Loopback0, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse
    GigabitEthernet4, Accepting/Sparse

(*, 224.0.1.40), 1d00h/00:02:40, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet4, Forward/Sparse, 1d00h/stopped
    GigabitEthernet3, Forward/Sparse, 1d00h/stopped
    GigabitEthernet2, Forward/Sparse, 1d00h/stopped
    GigabitEthernet1, Forward/Sparse, 1d00h/stopped

R6#show ip mroute

(*,224.0.0.0/4), 00:44:01/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: GigabitEthernet3, RPF nbr: 10.4.6.4
  Incoming interface list:
    GigabitEthernet1, Accepting/Sparse
    GigabitEthernet2, Accepting/Sparse
    Loopback0, Accepting/Sparse
    GigabitEthernet3, Accepting/Sparse

(*, 224.0.1.40), 1d00h/00:02:26, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 1d00h/00:02:26

These routers forwarded the traffic using the (*, 224/4) entry alone. PIM-BiDir allows for greater scalability, as routers will never create (S, G) state, nor will routers that forward “up” the tree ever create the (*, G) state. The only state required here is (*, 224/4).

The routers between the LHR and the RP have the (*, 239.100.100.100) entry due to the PIM Join process. The LHR will not switch over to the (S, G) tree. This looks like PIM-SM with no SPT switchover.

R5#show ip mroute 239.100.100.100

(*, 239.100.100.100), 00:44:57/00:02:26, RP 4.4.4.4, flags: BC
  Bidir-Upstream: GigabitEthernet3, RPF nbr 10.4.5.4
  Outgoing interface list:
    GigabitEthernet2, Forward/Sparse, 00:44:57/00:02:26
    GigabitEthernet3, Bidir-Upstream/Sparse, 00:44:57/stopped

Interestingly, the bidir-upstream interface is always added to the OIL. This allows a downstream host to source traffic to the group, which is what makes the tree bidirectional.

Phantom RP

In PIM-BiDir, the RP never needs to take any action besides acting as the root of the shared tree. The RP does not join a (S, G) tree rooted at the source as in PIM-SM. The RP therefore does not actually need to be an interface on a router in PIM-BiDir. You can use a longest-matching routing trick to create a redundant RP setup with a single static RP address.

In our lab we will delete the BSR configuration on R4, and make R4 the failover phantom RP and R6 the primary phantom RP. To do this, we will statically configure all routers with an RP address of 192.168.100.1. R4 will have a loopback of 192.168.100.2/29 and R6 will have a loopback of 192.168.100.2/30. Longest-prefix-match routing will deliver PIM Joins to R6, even though R6 does not have an interface with the actual address of the RP. PIM-BiDir still works, because the PIM Join simply needs to make its way toward the phantom RP.
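The longest-prefix-match behavior this relies on can be sketched with Python's ipaddress module. The prefixes mirror the lab; the lookup function is a toy model of a routing table, not real router code.

```python
import ipaddress

# Sketch of why the phantom-RP trick works: both loopback prefixes cover
# the RP address 192.168.100.1, and longest-prefix-match routing prefers
# the /30 (R6, primary) over the /29 (R4, failover).
rp = ipaddress.ip_address("192.168.100.1")
routes = {
    ipaddress.ip_network("192.168.100.0/29"): "R4",  # failover
    ipaddress.ip_network("192.168.100.0/30"): "R6",  # primary
}

def lookup(addr, table):
    """Longest-prefix match: of all covering routes, pick the longest mask."""
    matches = [net for net in table if addr in net]
    return table[max(matches, key=lambda n: n.prefixlen)]

assert lookup(rp, routes) == "R6"  # both up: Joins flow toward R6

del routes[ipaddress.ip_network("192.168.100.0/30")]  # R6 fails
assert lookup(rp, routes) == "R4"  # the network falls back to R4
```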

#R6
int lo1
 ip address 192.168.100.2 255.255.255.252
 ip ospf network point-to-point ! Used to allow OSPF to advertise the loopback's mask
 ip pim sparse-mode
!
ip pim rp-address 192.168.100.1 bidir

#R4
int lo1
 ip address 192.168.100.2 255.255.255.248
 ip ospf network point-to-point
 ip pim sparse-mode
!
no ip pim rp-candidate Loopback0 bidir
ip pim rp-address 192.168.100.1 bidir

#All other routers
ip pim rp-address 192.168.100.1 bidir

The PIM Join for 239.100.100.100 has now made its way up to R6:

R6#show ip mroute 239.100.100.100

(*, 239.100.100.100), 00:03:23/00:03:03, RP 192.168.100.1, flags: B
  Bidir-Upstream: GigabitEthernet3, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2, Forward/Sparse, 00:03:23/00:03:03
    GigabitEthernet3, Bidir-Upstream/Sparse, 00:03:23/stopped

It’s important that the loopbacks do not use the actual IP of the RP. R4 in this case needs to forward traffic for 192.168.100.1 towards R6, not accept it as if it were the RP itself. R4 forwards the traffic to R6 because of the longer prefix match: 192.168.100.2/29 is its own connected interface, but 192.168.100.1 matches the /30 learned from R6 and is reachable via Gi3.

R4#show ip route 192.168.100.2
Routing entry for 192.168.100.2/32
  Known via "connected", distance 0, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Loopback1
      Route metric is 0, traffic share count is 1

R4#show ip route 192.168.100.1
Routing entry for 192.168.100.0/30
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 10.4.6.6 on GigabitEthernet3, 00:06:19 ago
  Routing Descriptor Blocks:
  * 10.4.6.6, from 6.6.6.6, 00:06:19 ago, via GigabitEthernet3
      Route metric is 2, traffic share count is 1

To simulate RP failover, we need to run BFD on all of R6’s links. In our lab environment, simply shutting down interfaces on R6 would not propagate the link-down status to the other side of each link.

#R6
int range gi1-3
 bfd interval 250 min_rx 250 multiplier 3
!
router ospf 1
 bfd all-interfaces

#R4
int gi3
 bfd interval 250 min_rx 250 multiplier 3
!
router ospf 1
 bfd all-interfaces

#R2
int gi4
 bfd interval 250 min_rx 250 multiplier 3
!
router ospf 1
 bfd all-interfaces

#R1
int gi4
 bfd interval 250 min_rx 250 multiplier 3
!
router ospf 1
 bfd all-interfaces

Run a repeated ping from Source1 and then shut down all interfaces on R6 while the ping is running.

#Source1
ping 239.100.100.100 repeat 10

#R6
int range Gi1-3
 shutdown


! In my lab, no pings are lost. You may lose a single ping.

Source#ping 239.100.100.100 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 239.100.100.100, timeout is 2 seconds:

Reply to request 0 from 10.10.200.10, 4 ms
Reply to request 1 from 10.10.200.10, 5 ms
Reply to request 2 from 10.10.200.10, 4 ms
Reply to request 3 from 10.10.200.10, 4 ms
Reply to request 4 from 10.10.200.10, 4 ms
Reply to request 5 from 10.10.200.10, 4 ms
Reply to request 6 from 10.10.200.10, 4 ms
Reply to request 7 from 10.10.200.10, 4 ms
Reply to request 8 from 10.10.200.10, 4 ms
Reply to request 9 from 10.10.200.10, 5 ms

R1 converges and picks a new bidir-upstream interface, Gi3, towards 192.168.100.1. R1 has two equal-cost routes to R4, via R2 and R5. However, a multicast router can only have a single upstream interface, so it picks the next hop with the highest IP (R5).

R1#show ip mroute 224.0.0.0/4
(*,224.0.0.0/4), 00:00:41/-, RP 192.168.100.1, flags: B
  Bidir-Upstream: GigabitEthernet3, RPF nbr: 10.1.5.5
  Incoming interface list:
    GigabitEthernet3, Accepting/Sparse
    GigabitEthernet4, Accepting/Sparse
    Loopback0, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse

When R5 receives the traffic on Gi1, it forwards it both to R4 and onto the LAN via Gi2, because both Gi3 and Gi2 are in the OIL for (*, 239.100.100.100).

R5#show ip mroute 239.100.100.100   

(*, 239.100.100.100), 00:31:33/00:02:15, RP 192.168.100.1, flags: BC
  Bidir-Upstream: GigabitEthernet3, RPF nbr 10.4.5.4
  Outgoing interface list:
    GigabitEthernet3, Bidir-Upstream/Sparse, 00:10:58/stopped
    GigabitEthernet2, Forward/Sparse, 00:31:33/00:02:15

R4 receives the traffic but does nothing with it, as the only other outgoing interface is Lo1.

(Figure: blue is the path when R6 is the RP; green is the path when R4 becomes the RP.)

To test your knowledge, try to answer this question:

Q: Why does the phantom RP technique not work for PIM-SM?

A: The RP in PIM-SM needs to receive PIM Register messages that are delivered via unicast to it. The RP needs to have an interface on the router which matches the IP in the packet in order to process the PIM Register and then initiate a (S, G) tree to the source.

Also note that in PIM-BiDir, the RP never creates the encap/decap tunnel interfaces, because PIM Register is never used in BiDir.

Conclusion

PIM-BiDir is a PIM mode which allows for scalable many-to-many multicast traffic. In PIM-BiDir no (S, G) state is created. Multicast traffic flowing down from the RP to the receivers works like PIM-SM with no SPT switchover. Traffic flowing from sources to the RP is very different however. Routers simply forward traffic towards the RP, using a blanket (*, 224/4) entry. (Keep in mind you can map another range besides 224/4).
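As a hedged example of mapping a narrower range, you can attach a standard ACL to the bidir RP statement so only specific groups use the bidirectional tree (the ACL number and group range below are arbitrary, chosen for illustration):

```
access-list 10 permit 239.100.0.0 0.0.255.255
ip pim rp-address 192.168.100.1 10 bidir
```

Groups outside the ACL would then fall back to whatever other RP mappings (or dense-mode behavior) are configured.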

PIM-BiDir also adds the concept of a DF. This creates a loop-free multicast topology, because a router will not permit multicast traffic incoming on an interface for which it is not the DF.

PIM-BiDir adds scalability in terms of less state in the network; however, it adds some complexity when it comes to operations and troubleshooting. Additionally, PIM-BiDir places a high burden on the RP, as all traffic must flow through it. You cannot switch over to a (S, G) tree and bypass the RP as with PIM-SM. In BiDir, the RP is forced to be in the data plane, while in ASM the RP only needs to be in the data plane for the first packet, just to facilitate source discovery at the LHR.

Further Reading

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipmulti_pim/configuration/xe-16-5/imc-pim-xe-16-5-book/imc-basic-cfg.html#GUID-028F70C9-0791-4BB2-8D1E-E8415EB12F76

https://lostintransit.se/2015/08/12/more-pim-bidir-considerations/

https://community.cisco.com/t5/networking-knowledge-base/rp-redundancy-with-pim-bidir-phantom-rp/ta-p/3117191
