Troubleshooting OSPF Adjacencies

To troubleshoot OSPF adjacency issues, you first need a thorough understanding of the OSPF adjacency process. Two routers go through several states when establishing an adjacency, and there are a few states where the routers can get “stuck.” If you understand what is supposed to happen at each state, you can troubleshoot issues much more easily.

I would suggest not trying to memorize each issue along with the state it leaves the routers stuck in. Instead, try to deeply understand what happens at each state; from there, you will naturally understand which problems leave routers stuck in which state.

OSPF adjacency state machine

Down
- This is only seen if a neighbor is manually defined, or the dead interval has passed for a previously Full neighbor. If a router has never been discovered, then it cannot be shown as “Down.” It will simply not be present in “show ip ospf neighbor” output.
Init
- The router has received a Hello from its neighbor, but its own RID is not seen in the Hello yet.
2-Way
- The local router’s RID is seen in the neighbor’s Hello.
- It is expected for two routers to be “stuck” in 2-Way on multiaccess network types (broadcast and NBMA) when neither router is the DR or BDR.
Exstart
- During this phase, the master/slave relationship is negotiated using DBD packets. The master controls the database exchange process and is responsible for incrementing the initial sequence number. The router with the highest RID becomes the master.
- The DBD packets do not contain the LSA headers yet at this point. This will happen in the next state (Exchange).
- The MTU is signaled here in the DBD packet. If there is a mismatch, the routers will be stuck in Exstart/Exchange.
Exchange
- During this phase, DBD packets are exchanged which describe the LSDB. The DBD packets contain each LSA’s header.
Loading
- During this phase, the actual contents of the LSAs are exchanged. A router sends LSR (Link State Request) packets for LSAs which it doesn’t have, or which are outdated. The neighbor sends the full LSAs in response to LSRs. The router then sends LSAcks to acknowledge the receipt of LSAs.
Full
- The routers are now fully adjacent. They have fully exchanged their database with one another and the databases are synchronized.
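To make the Exstart master/slave rule concrete, here is a minimal Python sketch (the function name exstart_master is my own, not taken from any OSPF implementation). RIDs are 32-bit values written in dotted-quad form, so the comparison has to be numeric; comparing them as text would pick the wrong master.

```python
from ipaddress import IPv4Address


def exstart_master(rid_a: str, rid_b: str) -> str:
    """Return the RID that becomes master during Exstart.

    The router with the numerically highest RID is the master. RIDs are
    dotted-quad 32-bit values, so they are compared as integers, not text.
    """
    if rid_a == rid_b:
        # Duplicate RIDs never get this far; the neighbor is rejected earlier.
        raise ValueError("duplicate router ID")
    return max(rid_a, rid_b, key=IPv4Address)


# A naive string comparison gets this wrong: "9.9.9.9" > "10.0.0.1" as text.
print(exstart_master("9.9.9.9", "10.0.0.1"))  # 10.0.0.1
print(exstart_master("1.1.1.1", "2.2.2.2"))   # 2.2.2.2
```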

You should also be familiar with the criteria that must match in order for two OSPF routers to become neighbors:

Same area
Same authentication
Same subnet and subnet mask
- If using the p2p network type, the subnet mask does not need to match
Same Hello/Dead interval
Same stub type (stub, NSSA, or “regular” area)
The RID must be unique between the two routers

You can see these parameters in the OSPF Header and Hello:

The stub/NSSA area type is signaled using the E and N bits in the Hello Options field. The E bit, when set, means the area is not a stub (it accepts external routes).

N=0 and E=1 is a normal area
N=0 and E=0 is a stub area
N=1 and E=0 is an NSSA area
The “totally” aspect of an NSSA or stub area is not signaled in the OSPF Hello. An ABR uses the command area 1 stub no-summary or area 1 nssa no-summary to filter Type 3 LSAs from entering the area. This is why a regular internal router does not need the no-summary command.
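As a rough illustration, the sketch below decodes the E and N bits from an Options byte. The bit values (E = 0x02, N/P = 0x08) come from RFC 2328 and RFC 3101; the function name area_type is hypothetical. The sample value 0x52 is the opt field from the DBD debug in the MTU section of this article.

```python
# Bit values in the OSPF Options field (RFC 2328 Appendix A.2, RFC 3101).
E_BIT = 0x02  # set = area accepts AS-external LSAs (not a stub)
N_BIT = 0x08  # N/P bit; in a Hello, set = NSSA


def area_type(options: int) -> str:
    """Classify the area type advertised in an Options byte."""
    e = bool(options & E_BIT)
    n = bool(options & N_BIT)
    if e and not n:
        return "normal"
    if not e and not n:
        return "stub"
    if n and not e:
        return "nssa"
    return "invalid"  # N=1 together with E=1 is not a legal combination


print(area_type(0x52))  # normal
print(area_type(0x00))  # stub
print(area_type(0x08))  # nssa
```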

What happens when the neighborship criteria do not match?

In all cases, when the area, authentication, subnet mask, Hello/Dead interval, or stub type don’t match, or when the RIDs are identical, a neighbor simply will be stuck in the “Down” state (if manually configured), or not seen at all (if dynamically discovered - which is usually the case).
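To tie the checklist above to the debug messages that follow, here is a simplified Python model of the Hello checks. The HelloParams class and mismatches function are hypothetical; the model only compares the authentication type (not keys), and, like the checklist, it skips the mask check on p2p links.

```python
from dataclasses import dataclass
from ipaddress import IPv4Interface


@dataclass
class HelloParams:
    """Hypothetical container for the fields a router checks in a Hello."""
    rid: str
    area: str
    auth_type: int
    interface: str               # address/prefix, e.g. "10.1.2.1/24"
    hello: int = 10
    dead: int = 40
    area_kind: str = "normal"    # "normal", "stub", or "nssa"
    point_to_point: bool = False


def mismatches(local: HelloParams, rcvd: HelloParams) -> list[str]:
    """Return the reasons a received Hello would be rejected (empty = OK)."""
    problems = []
    if rcvd.area != local.area:
        problems.append("area")
    if rcvd.auth_type != local.auth_type:
        problems.append("authentication type")
    if not local.point_to_point:
        # Subnet and mask only have to match on non-p2p network types.
        lif, rif = IPv4Interface(local.interface), IPv4Interface(rcvd.interface)
        if lif.netmask != rif.netmask or lif.network != rif.network:
            problems.append("subnet/mask")
    if (rcvd.hello, rcvd.dead) != (local.hello, local.dead):
        problems.append("hello/dead interval")
    if rcvd.area_kind != local.area_kind:
        problems.append("stub/NSSA option bits")
    if rcvd.rid == local.rid:
        problems.append("duplicate router ID")
    return problems


# R1 uses /25 while R2 uses /24 on a broadcast link, as in the debug below.
r1 = HelloParams("1.1.1.1", "0.0.0.1", 0, "10.1.2.1/25")
r2 = HelloParams("3.3.3.3", "0.0.0.1", 0, "10.1.2.2/24")
print(mismatches(r1, r2))  # ['subnet/mask']
```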

By using debug ip ospf adj and debug ip ospf hello you can see why an adjacency is not coming up:

! R1 places its interface in area 2, and R2 places its interface in area 1
R1#
*Nov 30 14:34:11.663: OSPF-1 ADJ   Gi1: Rcv pkt from 10.1.2.2, area 0.0.0.2, mismatched area 0.0.0.1 in the header

! R1 sets clear text authentication (type 1) but R2 uses no authentication
R1#
*Nov 30 14:35:55.458: OSPF-1 ADJ   Gi1: Rcv pkt from 10.1.2.2 : Mismatched Authentication type. Input packet specified type 0, we use type 1

! R1 sets its interface netmask to /25, and R2 uses /24. The network type is broadcast.
! I believe "R" means received and "C" means configured
R1#
*Nov 30 14:48:16.837: OSPF-1 HELLO Gi1: Rcv hello from 3.3.3.3 area 1 10.1.2.2
*Nov 30 14:48:16.837: OSPF-1 HELLO Gi1: Mismatched hello parameters from 10.1.2.2
*Nov 30 14:48:16.837: OSPF-1 HELLO Gi1: Dead R 40 C 40, Hello R 10 C 10 Mask R 255.255.255.0 C 255.255.255.128

! R1 sets its dead-interval to 30 seconds, while R2 uses the default of 40 seconds
R1#
*Nov 30 14:47:11.392: OSPF-1 HELLO Gi1: Rcv hello from 3.3.3.3 area 1 10.1.2.2
*Nov 30 14:47:11.392: OSPF-1 HELLO Gi1: Mismatched hello parameters from 10.1.2.2
*Nov 30 14:47:11.392: OSPF-1 HELLO Gi1: Dead R 40 C 30, Hello R 10 C 10 Mask R 255.255.255.0 C 255.255.255.0

! R1 sets area 1 to stub, and R2 does not
R1#
*Nov 30 14:49:32.722: OSPF-1 HELLO Gi1: Send hello to 224.0.0.5 area 1 from 10.1.2.1
*Nov 30 14:49:32.809: OSPF-1 HELLO Gi1: Rcv hello from 3.3.3.3 area 1 10.1.2.2
*Nov 30 14:49:32.809: OSPF-1 HELLO Gi1: Hello from 10.1.2.2 with mismatched Stub/Transit area option bit

! R1 and R2 use the same RID of 3.3.3.3
R1#
*Nov 30 14:50:28.232: %OSPF-4-DUP_RTRID_NBR: OSPF detected duplicate router-id 3.3.3.3 from 10.1.2.2 on interface GigabitEthernet1

What about mismatching network types, such as p2p and broadcast?

As you saw in the pcap earlier, the network type is not signaled in the Hello. Therefore, two routers with mismatched network types can still form a Full adjacency, as I will demonstrate below:

In the lab I set R1 to broadcast (which is the default), and R2 to point-to-point.

R1#show run int gi1
!
interface GigabitEthernet1
 ip address 10.1.2.1 255.255.255.0
 ip ospf 1 area 1

R2#show run int gi1
!
interface GigabitEthernet1
 ip address 10.1.2.2 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 1

Both routers accept each other’s Hellos, and proceed through the neighbor state transitions. R1 waits 40 seconds for the DR election. During this time, R2 is stuck in 2-Way, as seen on R1, and R1 is stuck in Exstart, as seen on R2.

Once R1 elects itself as the DR, database exchange begins, and both routers achieve a Full adjacency. The only indication that the network types mismatch is that R2 sees R1 set itself as the DR in its OSPF Hellos. Because R2 considers the network p2p, no DR should be elected, so R2 generates a logging message:

R2#
*Nov 30 15:08:23.821: %OSPF-4-NET_TYPE_MISMATCH: Received Hello from 1.1.1.1 on GigabitEthernet1 indicating a  potential network type mismatch
R2#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           0   FULL/  -        00:00:34    10.1.2.1        GigabitEthernet1

R1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   FULL/DR         00:00:35    10.1.2.2        GigabitEthernet1
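The one symptom R2 can detect boils down to a single comparison. The sketch below is a hypothetical Python model, assuming that only the broadcast and non-broadcast network types elect a DR; it is not how IOS implements the check.

```python
# Assumption: only these network types (Cisco names) elect a DR/BDR.
DR_NETWORK_TYPES = {"broadcast", "non-broadcast"}


def elects_dr(network_type: str) -> bool:
    """True if the local network type runs a DR/BDR election."""
    return network_type in DR_NETWORK_TYPES


def net_type_warning(local_type: str, hello_dr: str) -> bool:
    """True if a Hello names a DR on a link we treat as DR-less.

    This mirrors R2's situation: it runs point-to-point, yet R1's Hellos
    carry a DR address. The broadcast side has no equivalent check.
    """
    return not elects_dr(local_type) and hello_dr != "0.0.0.0"


print(net_type_warning("point-to-point", "10.1.2.1"))  # True
print(net_type_warning("broadcast", "10.1.2.1"))       # False
```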

What happens when MTU mismatches?

The interface MTU is signaled in the OSPF DBD message:

When MTU does not match, routers will be stuck in Exstart/Exchange, and will not proceed through the database exchange process.
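The rule in RFC 2328 (section 10.6) is that a router rejects a DBD whose advertised MTU is larger than it can receive on that interface. Here is a minimal Python sketch of that rule, with a flag standing in for ip ospf mtu-ignore. It does not model the extra IOS behavior hinted at by R1's "has smaller interface MTU" debug message in this section.

```python
def dbd_accepted(local_mtu: int, nbr_mtu: int, mtu_ignore: bool = False) -> bool:
    """RFC 2328 (section 10.6) MTU check on a received DBD packet.

    The DBD is rejected if the neighbor advertises an MTU larger than the
    local interface can accept. mtu_ignore models "ip ospf mtu-ignore".
    """
    if mtu_ignore:
        return True
    return nbr_mtu <= local_mtu


# R1 (mtu 2000) and R2 (default 1500), as in the lab that follows.
print(dbd_accepted(local_mtu=2000, nbr_mtu=1500))  # True  (R1 accepts R2's DBD)
print(dbd_accepted(local_mtu=1500, nbr_mtu=2000))  # False (R2 rejects R1's DBD)
```

Even under this one-sided rule, the exchange stalls: R1's DBDs are never accepted by R2, so R1 keeps retransmitting them.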

Below, R1 Gi1 is set to mtu 2000, while R2 Gi1 uses the default of 1500.

R1#
*Nov 30 15:15:29.023: OSPF-1 ADJ   Gi1: Rcv DBD from 2.2.2.2 seq 0xF4 opt 0x52 flag 0x7 len 32  mtu 1500 state EXCHANGE
*Nov 30 15:15:29.024: OSPF-1 ADJ   Gi1: Nbr 2.2.2.2 has smaller interface MTU

When testing this, I saw times where both routers were stuck in Exstart, and times where one was Exstart and one was Exchange.

It is also important to note that you will eventually see the neighborship transition from Exstart/Exchange to Down, over and over again.

R1#
*Nov 30 15:22:56.866: OSPF-1 ADJ   Gi1: Killing nbr 2.2.2.2 due to excessive (25) retransmissions
*Nov 30 15:22:56.866: OSPF-1 ADJ   Gi1: 2.2.2.2 address 10.1.2.2 is dead, state DOWN
*Nov 30 15:22:56.867: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on GigabitEthernet1 from EXSTART to DOWN, Neighbor Down: Too many retransmissions

If you want to, you can use the knob ip ospf mtu-ignore to bring the adjacency up. There shouldn’t really be a reason to do this, because it makes more sense to simply set the MTU to the same value on both devices. But if you don’t have access to the other device and need to bring the adjacency up immediately, this might work. The other device would most likely need to ignore the MTU mismatch as well, so all in all it’s probably easier to just fix the root issue.

#R1 and R2
int Gi1
 ip ospf mtu-ignore

What happens when an ACL drops OSPF packets?

This situation does not come up much in the real world, but it can appear as an answer option on an exam, so it is useful to think through the problem.

Interestingly, if an ACL blocks OSPF packets inbound, this is the only situation I can think of which results in a neighbor being stuck in INIT. This is because the router with no ACL receives Hellos from its neighbor, but the Hellos never list its own RID as being seen.

Being stuck in INIT simply means that two-way communication is not working: Hellos are only getting through in one direction. The router that still receives its neighbor’s Hellos sees that neighbor stuck in Init, while the router that receives nothing does not list the neighbor at all.
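The Init logic can be modeled as a lookup on the neighbor list in the last Hello received. This is a hypothetical Python sketch of what each router shows in the lab below; it stops at 2-Way and ignores later states.

```python
from typing import Optional


def view_of_neighbor(my_rid: str, hello: Optional[set]) -> Optional[str]:
    """State a router shows for a neighbor, given the last Hello it received.

    hello is the set of RIDs listed as seen in that neighbor's Hello, or
    None if no Hello arrived at all (for example, dropped by an ACL).
    """
    if hello is None:
        return None      # neighbor never appears in "show ip ospf nei"
    if my_rid in hello:
        return "2WAY"    # bidirectional communication confirmed
    return "INIT"        # we hear them, but they do not hear us


hello_from_r2 = set()          # R2 never hears R1, so its Hello lists nobody
hello_from_r1 = {"2.2.2.2"}    # R1's Hello lists R2, but R2's ACL drops it

print(view_of_neighbor("1.1.1.1", hello_from_r2))  # INIT  (R1's view of R2)
print(view_of_neighbor("2.2.2.2", None))           # None  (R2's view of R1)
# Without the ACL, R2 would receive R1's Hello and move to 2-Way.
print(view_of_neighbor("2.2.2.2", hello_from_r1))  # 2WAY
```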

To test this, I put an ACL on R2 that blocks OSPF packets inbound (which have an IP protocol number of 89).

#R2
ip access-list extended BLOCK_OSPF
 deny 89 any any
 permit ip any any
!
int Gi1
 ip access-group BLOCK_OSPF in

On R1, the neighbor is stuck in INIT, because the Hello packets from the neighbor do not list R1’s RID as being seen. R1 puts R2’s RID as being seen in its own Hellos, but R2’s ACL drops them, so R2 never learns of R1.

! R1 still receives R2's Hellos, so R1 sees R2 stuck in INIT
R1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           0   INIT/  -        00:00:34    10.1.2.2        GigabitEthernet

! R2 lists no neighbors, because its ACL drops every Hello from R1
R2#show ip ospf nei
R2#

Hello from R2 lists no routers as being seen

Hello from R1 lists R2 as being seen, but R2’s ACL drops this packet.

You might wonder what happens if you apply the ACL in the outbound direction instead of inbound. This will do nothing, because an ACL does not filter traffic generated by the router itself.

#R2
int Gi1
 no ip access-group BLOCK_OSPF in
 ip access-group BLOCK_OSPF out

! Adjacency goes full because the ACL does not filter locally-generated traffic
R2#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1           0   FULL/  -        00:00:35    10.1.2.1        GigabitEthernet1

What happens if you use ip unnumbered?

It is totally valid to use ip unnumbered on a p2p adjacency.

In fact, when you configure an interface for ip unnumbered, the router warns you that this will only work on an IGP adjacency when using the point-to-point network type.

R1(config-if)#int gi1
R1(config-if)#no ip add
R1(config-if)#ip unnumbered lo0
Warning: dynamic routing protocols will not work on non-point-to-point interfaces with IP unnumbered configured.

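The router LSAs list what looks like a bogus interface address (0.0.0.7). This is expected: for unnumbered point-to-point links, RFC 2328 specifies that the Link Data field carries the interface’s MIB-II ifIndex value instead of an IP address.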

R1#show ip ospf data router       

            OSPF Router with ID (1.1.1.1) (Process ID 1)

                Router Link States (Area 1)

  LS age: 194
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 1.1.1.1
  Advertising Router: 1.1.1.1
  LS Seq Number: 80000002
  Checksum: 0x35E6
  Length: 36
  Number of Links: 1

    Link connected to: another Router (point-to-point)
     (Link ID) Neighboring Router ID: 2.2.2.2
     (Link Data) Router Interface address: 0.0.0.7
      Number of MTID metrics: 0
       TOS 0 Metrics: 1


  LS age: 195
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 2.2.2.2
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000002
  Checksum: 0xC651
  Length: 36
  Number of Links: 1

    Link connected to: another Router (point-to-point)
     (Link ID) Neighboring Router ID: 1.1.1.1
     (Link Data) Router Interface address: 0.0.0.7
      Number of MTID metrics: 0
       TOS 0 Metrics: 1
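RFC 2328 specifies that, for unnumbered point-to-point links, the Link Data field of the router LSA holds the interface’s MIB-II ifIndex. Because Link Data is a 32-bit field, show commands print that ifIndex in dotted-quad form, just like an address. A one-function Python sketch, assuming GigabitEthernet1 has ifIndex 7 on both routers (not verified in this lab):

```python
from ipaddress import IPv4Address


def link_data_for_unnumbered(if_index: int) -> str:
    """Render an ifIndex the way it appears in a router LSA's Link Data field."""
    # Link Data is 32 bits wide, so it is displayed as a dotted quad.
    return str(IPv4Address(if_index))


print(link_data_for_unnumbered(7))  # 0.0.0.7
```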

The next hop for all routes is the RID of the neighbor, and the router simply resolves that RID via ARP. For this to work, the Loopback that the RID is based on needs to be advertised into OSPF.

For example, R1 can reach 2.2.2.20/32 on R2 using 2.2.2.2 as the nexthop.

R2#show run int gi1
interface GigabitEthernet1
 ip unnumbered Loopback0
 ip ospf network point-to-point
 ip ospf 1 area 1
!
R2#show run int lo0
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip ospf 1 area 1
!
R2#show run int lo100
interface Loopback100
 ip address 2.2.2.20 255.255.255.255
 ip ospf 1 area 1


R1#show ip route ospf | beg Gateway
Gateway of last resort is not set

      2.0.0.0/32 is subnetted, 2 subnets
O        2.2.2.2 [110/2] via 2.2.2.2, 00:00:03, GigabitEthernet1
O        2.2.2.20 [110/2] via 2.2.2.2, 00:00:03, GigabitEthernet1

R1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  1.1.1.1                 -   5254.0015.acc8  ARPA   GigabitEthernet1
Internet  2.2.2.2                 1   5254.0007.066b  ARPA   GigabitEthernet1

R1#ping 2.2.2.20 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.20, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

All routes that have a best path via R2 have a nexthop of R2’s RID. The RID is resolved via ARP.

Is it valid for neighbors to be “stuck” in 2-Way?

Yes, this is expected when there are four or more routers on a shared multiaccess network (assuming all of them are eligible to become DR or BDR). Any adjacency between two non-DR/BDR routers will be “stuck” in 2-Way, because routers perform the database exchange process only with the DR and BDR. (When there are only three routers on the segment, the single DROther forms FULL adjacencies with the DR and BDR, so no adjacency remains in 2-Way. This is why you need four or more routers to see 2-Way.)
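The rule above can be written as a small Python sketch that computes the expected final state of every pair on a segment once the DR and BDR are known. The function name is hypothetical, and the sketch does not model the DR election itself.

```python
from itertools import combinations


def expected_states(routers: list[str], dr: str, bdr: str) -> dict:
    """Expected final state of every adjacency on one multiaccess segment.

    Routers form FULL adjacencies only with the DR and BDR; two DROthers
    stop at 2-Way by design.
    """
    states = {}
    for a, b in combinations(routers, 2):
        full = a in (dr, bdr) or b in (dr, bdr)
        states[(a, b)] = "FULL" if full else "2WAY"
    return states


three = expected_states(["R1", "R2", "R3"], dr="R1", bdr="R2")
four = expected_states(["R1", "R2", "R3", "R4"], dr="R1", bdr="R2")
print(sorted(three.values()))  # ['FULL', 'FULL', 'FULL']
print(four[("R3", "R4")])      # 2WAY
```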

To demonstrate this, I use four routers, R1-R4, using the default broadcast network type. R1 is the DR and R2 is the BDR, which I controlled using the OSPF priority.

When looking at the DR or BDR, you will see all adjacencies as being FULL:

! R1 is the DR, with priority 100
R1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2          50   FULL/BDR        00:00:35    10.0.0.2        GigabitEthernet1
3.3.3.3           1   FULL/DROTHER    00:00:37    10.0.0.3        GigabitEthernet1
4.4.4.4           1   FULL/DROTHER    00:00:38    10.0.0.4        GigabitEthernet1

! R2 is the BDR, with priority 50
R2#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:37    10.0.0.1        GigabitEthernet1
3.3.3.3           1   FULL/DROTHER    00:00:39    10.0.0.3        GigabitEthernet1
4.4.4.4           1   FULL/DROTHER    00:00:31    10.0.0.4        GigabitEthernet1

When looking at a DROther, you will see all adjacencies with other DROthers as “stuck” in 2-Way:

R3#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
1.1.1.1         100   FULL/DR         00:00:36    10.0.0.1        GigabitEthernet1
2.2.2.2          50   FULL/BDR        00:00:34    10.0.0.2        GigabitEthernet1
4.4.4.4           1   2WAY/DROTHER    00:00:39    10.0.0.4        GigabitEthernet1

I cannot find a way to produce a p2p adjacency that is stuck in 2-way, so as far as I know, “stuck in 2-way” should never actually be an issue. If an adjacency reaches the 2-way state, then two-way communication is working, so the routers simply move on to Exstart and Exchange. If two-way communication is not working, they will be stuck in Init.

Some astute readers may wonder about situations in which multicast traffic (Hellos) works but unicast traffic (Exstart/Exchange) doesn’t. Would this cause neighbors to be stuck in 2-way? I wondered the same thing, which leads us to the next section!

What happens if unicast traffic does not work?

OSPF neighbor discovery works by multicasting Hellos to 224.0.0.5. However, once neighbors are discovered, the adjacency process “switches over” to unicast addresses when performing the database exchange. Therefore, if unicast reachability is broken for some reason, the two routers will be stuck in Exstart/Exchange.

OSPF Hellos, LSUs, and LSAcks are multicast to 224.0.0.5 (on broadcast networks, DROthers send LSUs and LSAcks to 224.0.0.6, AllDRouters)
OSPF DBD and LSRs are unicast to the neighbor’s interface IP (learned from the Hello)

To demonstrate this issue, we can create an access-list which blocks traffic with a destination address of the router itself. Multicast packets will pass through, but unicast will not.

#R2
ip access-list extended BLOCK_UNICAST
 deny   ip any host 10.1.2.2
 permit ip any any
!
int Gi0/0
 ip access-group BLOCK_UNICAST in
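To see why this ACL splits multicast from unicast, here is a hypothetical Python model of BLOCK_UNICAST applied to the packet types listed above. It only looks at the destination address, which is all this ACL matches on.

```python
from ipaddress import IPv4Address

ROUTER_IP = IPv4Address("10.1.2.2")  # R2's Gi0/0 address in this lab


def block_unicast_permits(dst: str) -> bool:
    """Model of BLOCK_UNICAST: deny ip any host 10.1.2.2, permit the rest."""
    return IPv4Address(dst) != ROUTER_IP


# (packet type, destination) pairs arriving at R2 from R1.
packets = [
    ("Hello", "224.0.0.5"),
    ("DBD", "10.1.2.2"),
    ("LSR", "10.1.2.2"),
]
for kind, dst in packets:
    verdict = "permitted" if block_unicast_permits(dst) else "dropped"
    multicast = IPv4Address(dst).is_multicast
    print(f"{kind:5} to {dst:9} multicast={multicast} -> {verdict}")
```

Hellos keep flowing, so the neighbor reaches 2-Way and moves on, but the unicast DBDs never arrive, which matches the Exstart/Exchange states below.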

Both routers are stuck in Exstart/Exchange, a symptom much more commonly caused by an MTU mismatch.

R1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.2.2          1   EXCHANGE/BDR    00:00:38    10.1.2.2        GigabitEthernet0/0

R2#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.2.1          1   EXSTART/DR      00:00:32    10.1.2.1        GigabitEthernet0/0

Can LDP/IGP Sync prevent adjacencies?

The LDP/IGP sync feature is used to “discourage” the use of the interface by the IGP until LDP is up. When the IGP adjacency happens before LDP is up, labeled traffic can be blackholed. The IGP tries to use the interface as the best path based on metric alone, but the labeled traffic is unknown to the router because LDP is not up yet. The LDP/IGP sync feature advertises the link with the maximum metric if an LDP neighborship is not established on the link.

But did you know that LDP/IGP sync can cause an adjacency to not form in the first place? For this reason I’ve included this problem in this article.

When LDP/IGP sync is configured and an LDP neighbor is not discovered on an interface, the IGP adjacency will not come up at all.

To demonstrate this, I enable LDP/IGP sync on R1, enable LDP on Gi0/0, but do not enable LDP on R2.

#R1
router ospf 1
 mpls ldp sync
!
int Gi0/0
 ip address 10.1.2.1 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
 mpls ip
 shutdown
!
int lo0
 ip address 1.1.1.1 255.255.255.255
 ip ospf 1 area 0

#R2
int Gi0/0
 ip address 10.1.2.2 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
 ! "mpls ip" is not enabled
!
int lo0
 ip address 2.2.2.2 255.255.255.255
 ! This interface is purposefully not advertised via OSPF

I then no shut Gi0/0 on R1. The router never sends Hellos because an LDP neighbor is not discovered on the interface. The router waits indefinitely for an LDP neighbor to be discovered before transmitting Hellos on the interface.

R1#show ip ospf int gi0/0
GigabitEthernet0/0 is up, line protocol is up 
  Internet Address 10.1.2.1/24, Area 0, Attached via Interface Enable
  Process ID 1, Router ID 1.1.1.1, Network Type POINT_TO_POINT, Cost: 1
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           1         no          no            Base
  Enabled by interface config, including secondary ip addresses
  Transmit Delay is 1 sec, State DOWN (waiting for LDP)
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40

R1#show ip ospf mpls ldp int gi0/0
GigabitEthernet0/0
  Process ID 1, Area 0
  LDP is not configured through LDP autoconfig
  LDP-IGP Synchronization : Required
  Holddown timer is not configured
  Interface is down and pending LDP

I then enable LDP on Gi0/0 of R2. This allows R1 to discover R2 via LDP, so R1 starts transmitting OSPF Hellos. However, R2’s lo0 interface is not advertised via OSPF, so an LDP neighborship will not form (R1 cannot route to R2’s transport address). Therefore, R1 uses the max metric for Gi0/0, which “discourages” the IGP from using the interface.

#R2
int Gi0/0
 mpls ip


R1#
*Dec  1 03:03:34.216: %OSPF-5-ADJCHG: Process 1, Nbr 10.1.2.2 on GigabitEthernet0/0 from LOADING to FULL, Loading Done
R1#show ip ospf mpls ldp int gi0/0
GigabitEthernet0/0
  Process ID 1, Area 0
  LDP is not configured through LDP autoconfig
  LDP-IGP Synchronization : Required
  Holddown timer is not configured
  Interface is up and sending maximum metric
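The three stages of this lab can be summarized as a small decision function. This is a hypothetical Python sketch, not IOS logic; the max-metric value of 65535 is an assumption (the largest 16-bit OSPF cost).

```python
from typing import Optional, Tuple

MAX_METRIC = 65535  # assumption: largest 16-bit OSPF cost used as "max metric"


def ldp_sync_behavior(ldp_neighbor_discovered: bool, ldp_session_up: bool,
                      cost: int = 1) -> Tuple[bool, Optional[int]]:
    """Return (sends_hellos, advertised_cost) for an interface with LDP/IGP sync.

    Mirrors the lab: no LDP neighbor discovered (no Hellos), LDP neighbor
    discovered but no session (Hellos, max metric), session up (normal cost).
    """
    if not ldp_neighbor_discovered:
        return (False, None)       # "State DOWN (waiting for LDP)"
    if not ldp_session_up:
        return (True, MAX_METRIC)  # "sending maximum metric"
    return (True, cost)


print(ldp_sync_behavior(False, False))  # (False, None)
print(ldp_sync_behavior(True, False))   # (True, 65535)
print(ldp_sync_behavior(True, True))    # (True, 1)
```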

The way that LDP/IGP sync can completely prevent an adjacency from coming up at all can potentially be a huge issue. If a router only has a single OSPF link to the core, this problem could prevent you from even being able to remotely manage the router at all in very specific failure cases.

Luckily, you can tell the router to form an adjacency anyway, even if no LDP neighbor is discovered on the interface. You configure a holddown timer, which specifies how long to wait for an LDP neighbor to be discovered before giving up and bringing up the IGP adjacency:

R1(config)#mpls ldp igp sync holddown ?
  <1-2147483647>  Hold down time in milliseconds

When the holddown timer expires, the router transmits Hellos even if no LDP peers have been discovered.
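Extending the idea, here is a hypothetical Python sketch of the holddown decision. The holddown is expressed in milliseconds to match the CLI help above; with no holddown configured, the sketch waits forever, as R1 did in the lab.

```python
from typing import Optional


def may_send_hellos(ldp_neighbor_discovered: bool, elapsed_ms: int,
                    holddown_ms: Optional[int] = None) -> bool:
    """Whether OSPF may send Hellos on an interface with LDP/IGP sync.

    holddown_ms mirrors "mpls ldp igp sync holddown <1-2147483647>".
    With no holddown configured, the router waits for LDP indefinitely.
    """
    if ldp_neighbor_discovered:
        return True
    if holddown_ms is None:
        return False
    return elapsed_ms >= holddown_ms


print(may_send_hellos(False, elapsed_ms=600_000))                     # False
print(may_send_hellos(False, elapsed_ms=60_000, holddown_ms=60_000))  # True
```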