Troubleshooting OSPF Adjacencies
To troubleshoot OSPF adjacency issues, you first need a thorough understanding of the OSPF adjacency process. Two routers go through several states when establishing an adjacency, and there are a few states where the routers can get “stuck.” If you understand what is supposed to happen at each state, you can more easily troubleshoot issues.
I would suggest not trying to memorize each issue and the state it leaves the routers stuck in. Instead, try to understand deeply what happens at each state, and from there you will naturally understand which problems leave routers stuck in which state.
OSPF adjacency state machine
Down
This is only seen if a neighbor is manually defined, or the dead interval has passed for a previously Full neighbor. If a router has never been discovered, then it cannot be shown as “Down.” It will simply not be present in “show ip ospf neighbor” output.
Init
The router has received a Hello from its neighbor, but its own RID is not seen in the Hello yet.
2-Way
The local router’s RID is seen in the neighbor’s Hello.
It is expected for a neighbor to be “stuck” in 2-Way on multiaccess network types (broadcast and NBMA) when neither router is the DR or BDR.
Exstart
During this phase, the master-slave status is negotiated using DBD packets. The master controls the database exchange process and is responsible for incrementing the initial sequence number. The router with the highest RID becomes the master.
The DBD packets do not contain the LSA headers yet at this point. This will happen in the next state (Exchange).
The MTU is signaled here in the DBD packet. If there is a mismatch, the routers will be stuck in Exstart/Exchange.
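The master election described above can be sketched in a few lines of Python. This is only an illustration of the "highest RID wins" rule, not how any router implements it. The function name exstart_master is hypothetical. The one detail worth showing is that RIDs are compared as 32-bit numbers, not as text.

```python
import ipaddress

def exstart_master(rid_a, rid_b):
    """Return the RID that becomes master: the numerically higher one."""
    # Compare as 32-bit integers; a plain string comparison would
    # wrongly rank "9.255.255.255" above "10.0.0.1".
    return max(rid_a, rid_b, key=lambda rid: int(ipaddress.IPv4Address(rid)))

if __name__ == "__main__":
    print(exstart_master("1.1.1.1", "2.2.2.2"))
    print(exstart_master("10.0.0.1", "9.255.255.255"))
```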
Exchange
During this phase, DBD packets are exchanged which describe the LSDB. The DBD packets contain each LSA’s header.
Loading
During this phase, the actual content of the LSAs is exchanged. A router sends LSR (Link State Request) packets for LSAs which it doesn’t have, or which are outdated. The neighbor sends the full LSAs in LSU (Link State Update) packets in response to the LSRs. The router then sends LSAcks to acknowledge receipt of the LSAs.
Full
The routers are now fully adjacent. They have fully exchanged their database with one another and the databases are synchronized.
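As a study aid, the "stuck" states can be turned into a small lookup table. The sketch below is in Python and only restates causes that are covered in the rest of this article. The names LIKELY_CAUSES and likely_causes are hypothetical, and the list is not exhaustive.

```python
# Causes are taken from the sections of this article; not an exhaustive list.
LIKELY_CAUSES = {
    "ABSENT_OR_DOWN": [
        "area mismatch",
        "authentication mismatch",
        "subnet or mask mismatch",
        "Hello/Dead interval mismatch",
        "stub type mismatch",
        "duplicate RID",
    ],
    "INIT": ["neighbor is dropping inbound OSPF packets (for example an ACL)"],
    "2WAY": ["expected between two DROthers on a multiaccess segment"],
    "EXSTART_EXCHANGE": ["MTU mismatch", "unicast between the neighbors is blocked"],
}

def likely_causes(state):
    """Map a neighbor state as displayed by the router to candidate causes."""
    key = state.strip().upper().replace("-", "")
    if key in ("EXSTART", "EXCHANGE"):
        key = "EXSTART_EXCHANGE"
    elif key in ("DOWN", "ABSENT"):
        key = "ABSENT_OR_DOWN"
    return LIKELY_CAUSES.get(key, [])

if __name__ == "__main__":
    for shown in ("Init", "2-Way", "Exstart"):
        print(shown, "->", likely_causes(shown))
```

A state that is not in the table, such as Full, returns an empty list.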
You should also be familiar with the criteria that need to match in order for two OSPF routers to become neighbors:
Same area
Same authentication
Same subnet and subnet mask
If using the p2p network type, the subnet mask does not need to match
Same Hello/Dead interval
Same stub type (stub, NSSA, or “regular” area)
The RID must be unique between the two routers
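The checklist above can be expressed as a comparison function. The Python sketch below is a simplification under stated assumptions. HelloParams and mismatches are hypothetical names. Authentication is collapsed into a single comparable string. For the point-to-point case it skips the mask comparison and only checks that the neighbor's address falls inside the local subnet, which is my own simplification.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class HelloParams:
    router_id: str
    area_id: str
    auth: str            # simplification: auth type and key as one string
    interface: str       # address with prefix length, e.g. "10.0.12.1/24"
    hello_interval: int
    dead_interval: int
    area_type: str       # "normal", "stub" or "nssa"

def mismatches(local, remote, local_network_type="broadcast"):
    """Return the list of criteria that would prevent a neighborship."""
    problems = []
    if local.area_id != remote.area_id:
        problems.append("area")
    if local.auth != remote.auth:
        problems.append("authentication")
    local_if = ipaddress.ip_interface(local.interface)
    remote_if = ipaddress.ip_interface(remote.interface)
    if local_network_type == "point-to-point":
        # Mask is not compared on p2p; only check the neighbor's address.
        if remote_if.ip not in local_if.network:
            problems.append("subnet")
    elif local_if.network != remote_if.network:
        problems.append("subnet/mask")
    if (local.hello_interval, local.dead_interval) != (
        remote.hello_interval, remote.dead_interval
    ):
        problems.append("hello/dead interval")
    if local.area_type != remote.area_type:
        problems.append("stub type")
    if local.router_id == remote.router_id:
        problems.append("duplicate RID")
    return problems

if __name__ == "__main__":
    r1 = HelloParams("1.1.1.1", "0.0.0.0", "none", "10.0.12.1/24", 10, 40, "normal")
    r2 = HelloParams("2.2.2.2", "0.0.0.0", "none", "10.0.12.2/25", 5, 20, "normal")
    print(mismatches(r1, r2))
```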
You can see these parameters in the OSPF Header and Hello:
The Stub/NSSA area is set using the E and N flags under the Hello Options. The E flag, when set, means the area is not a stub (it can externally route).
N=0 and E=1 is a normal area
N=0 and E=0 is a stub area
N=1 and E=0 is an NSSA area
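The three combinations above can be decoded from the Options byte of a Hello. The Python sketch below assumes the standard bit positions for the OSPFv2 Options field (E is 0x02, N/P is 0x08). The function name area_type_from_options is hypothetical. All other option bits are ignored.

```python
E_BIT = 0x02   # external routing capability
NP_BIT = 0x08  # N bit in Hellos (NSSA)

def area_type_from_options(options):
    """Classify the area type signaled by the E and N flags of a Hello."""
    e = bool(options & E_BIT)
    n = bool(options & NP_BIT)
    if e and not n:
        return "normal"
    if not e and not n:
        return "stub"
    if n and not e:
        return "nssa"
    # N=1 together with E=1 is not a valid combination.
    return "invalid"

if __name__ == "__main__":
    for value in (0x02, 0x00, 0x08):
        print(hex(value), area_type_from_options(value))
```

Two routers only become neighbors when both decode to the same area type. The “totally” variants produce the same flags as their plain counterparts.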
The “totally” aspect of an NSSA or stub area is not signaled in the OSPF Hello. An ABR uses the command area 1 stub no-summary or area 1 nssa no-summary to filter Type 3 LSAs from entering the area. This is why a regular internal router does not need the no-summary command.
What happens when the neighborship criteria do not match?
In all cases, when the area, authentication, subnet mask, Hello/Dead interval, or stub type don’t match, or when the RIDs are identical, a neighbor simply will be stuck in the “Down” state (if manually configured), or not seen at all (if dynamically discovered - which is usually the case).
By using debug ip ospf adj and debug ip ospf hello you can see why an adjacency is not coming up:
What about mismatching network types, such as p2p and broadcast?
As you saw in the pcap earlier, the network type is not signaled. Therefore two routers with a mismatched network type can form a Full adjacency, as I will demonstrate below:
In the lab I set R1 to broadcast (which is the default), and R2 to point-to-point.
Both routers accept each other’s Hellos, and proceed through the neighbor state transitions. R1 waits 40 seconds for the DR election. During this time, R2 is stuck in 2-Way, as seen on R1, and R1 is stuck in Exstart, as seen on R2.
Once R1 elects itself as the DR, database exchange begins, and both routers achieve a Full adjacency. The only reason you may know that the network types mismatch is that R2 sees that R1 has set a DR in the OSPF Hello. Because R2 considers the network p2p, a DR should not be elected, so R2 generates a logging message:
What happens when MTU mismatches?
The interface MTU is signaled in the OSPF DBD message:
When MTU does not match, routers will be stuck in Exstart/Exchange, and will not proceed through the database exchange process.
Below, R1 Gi1 is set to mtu 2000. R2 Gi1 is by default 1500.
When testing this, I saw times where both routers were stuck in Exstart, and times where one was Exstart and one was Exchange.
It is also important to note that you will eventually see the neighborship transition from Exstart/Exchange to Down, over and over again.
If you want to, you can use the knob ip ospf mtu-ignore to bring the adjacency up. There shouldn’t really be a reason to do this, because it makes more sense to just set the MTU to be the same on both devices. But if you don’t have access to the other device and need to bring the adjacency up immediately, this might work. The other device would most likely also need to ignore the MTU mismatch, so all in all it’s probably easier to just fix the root issue.
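The MTU check can be sketched as a single comparison. The Python below follows the rule in RFC 2328 section 10.6, where a DBD is rejected if its Interface MTU field is larger than what the receiving interface can accept. Whether a given platform behaves exactly this way is an assumption you should verify in a lab. The function name accepts_dbd is hypothetical.

```python
def accepts_dbd(local_if_mtu, dbd_mtu, mtu_ignore=False):
    """Return True if a received DBD passes the MTU check (RFC 2328 rule)."""
    if mtu_ignore:
        # Models "ip ospf mtu-ignore" on the receiving interface.
        return True
    return dbd_mtu <= local_if_mtu

if __name__ == "__main__":
    r1_mtu, r2_mtu = 2000, 1500  # values from the lab above
    print("R1 accepts R2's DBD:", accepts_dbd(r1_mtu, r2_mtu))
    print("R2 accepts R1's DBD:", accepts_dbd(r2_mtu, r1_mtu))
    print("R2 with mtu-ignore: ", accepts_dbd(r2_mtu, r1_mtu, mtu_ignore=True))
```

Under this rule only the router with the smaller MTU rejects the DBD. That may be one reason the two sides do not always show the same state.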
What happens when an ACL drops OSPF packets?
This situation does not happen much in the real world, but it can come up as an option in an exam, so it is useful to think through the problem.
Interestingly, if an ACL blocks OSPF packets inbound, this is the only situation I can think of which results in a neighbor being stuck in INIT. This is because the router with no ACL receives Hellos from its neighbor, but the Hellos never list its own RID as being seen.
A neighbor stuck in Init simply means that two-way communication is not working. One router receives Hellos from its neighbor, but the neighbor never receives the Hellos sent back. The router that still receives Hellos sees the neighbor as stuck in Init, while the router whose inbound Hellos are dropped never learns of the neighbor at all.
To test this, I put an ACL on R2 that blocks OSPF packets inbound (which have an IP protocol number of 89).
On R1, the neighbor is stuck in INIT, because the Hello packets from the neighbor do not list R1’s RID as being seen. R1 puts R2’s RID as being seen in its own Hellos, but R2’s ACL drops them, so R2 never learns of R1.
Hello from R2 lists no routers as being seen
Hello from R1 lists R2 as being seen, but R2’s ACL drops this packet.
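The Hello logic behind this result can be simulated in a few lines. The Python below is a toy model under stated assumptions. It only tracks Init and 2-Way, the Router class and simulate function are hypothetical, and the inbound ACL is reduced to a flag that discards every received Hello.

```python
class Router:
    def __init__(self, rid, drop_inbound_ospf=False):
        self.rid = rid
        self.drop_inbound_ospf = drop_inbound_ospf  # models the inbound ACL
        self.neighbors = {}  # neighbor RID -> state

    def build_hello(self):
        # A Hello lists every neighbor RID this router has heard from.
        return {"rid": self.rid, "seen": sorted(self.neighbors)}

    def receive_hello(self, hello):
        if self.drop_inbound_ospf:
            return  # packet dropped before OSPF ever sees it
        if self.rid in hello["seen"]:
            self.neighbors[hello["rid"]] = "2WAY"
        else:
            self.neighbors[hello["rid"]] = "INIT"

def simulate(drop_on_r2, rounds=3):
    r1 = Router("1.1.1.1")
    r2 = Router("2.2.2.2", drop_inbound_ospf=drop_on_r2)
    for _ in range(rounds):
        r1.receive_hello(r2.build_hello())
        r2.receive_hello(r1.build_hello())
    return r1.neighbors, r2.neighbors

if __name__ == "__main__":
    print(simulate(drop_on_r2=True))   # ({'2.2.2.2': 'INIT'}, {})
    print(simulate(drop_on_r2=False))
```

With the ACL flag set, R1 keeps R2 in Init and R2 has no neighbor entry at all, which matches the lab result.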
You might wonder what happens if you apply the ACL in the outbound direction instead of inbound. This will do nothing, because an ACL does not filter traffic generated by the router itself.
What happens if you use ip unnumbered?
It is totally valid to use ip unnumbered on a p2p adjacency.
In fact, when you configure an interface for ip unnumbered, the router warns you that this will only work on an IGP adjacency when using the point-to-point network type.
The router LSAs simply list a bogus interface address, which seems to possibly be related to the ifIndex of the interface?
The next-hop for all routes is the RID of the neighbor. The router simply resolves the RID via ARP. For this to work, the Loopback that the RID is based off of needs to be advertised into OSPF.
For example, R1 can reach 2.2.2.20/32 on R2 using 2.2.2.2 as the nexthop.
All routes that have a best path via R2 have a nexthop of R2’s RID. The RID is resolved via ARP.
Is it valid for neighbors to be “stuck” in 2-Way?
Yes, this is expected when there are four or more routers on a shared multiaccess network. Any adjacency between two non-DR/BDR routers will be “stuck” in 2-Way. This is because routers perform the database exchange process only with the DR and BDR. (When there are only three routers on the segment, the single DROther will form FULL adjacencies with the DR and BDR, so there will be no adjacencies in 2-Way. This is why you must have four or more to see 2-Way.)
To demonstrate this, I use four routers, R1-R4, using the default broadcast network type. R1 is the DR and R2 is the BDR, which I controlled using the OSPF priority.
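The expected neighbor table for this topology can be derived from the roles alone. The Python sketch below encodes the rule that an adjacency goes to FULL only if at least one side is the DR or BDR. The function names are hypothetical, and the roles mirror the lab (R1 as DR, R2 as BDR).

```python
def expected_state(role_a, role_b):
    """FULL if either router is the DR or BDR, otherwise 2WAY."""
    return "FULL" if {"DR", "BDR"} & {role_a, role_b} else "2WAY"

def neighbor_table(roles, local):
    """Expected state of every neighbor as seen from one router."""
    return {
        name: expected_state(roles[local], role)
        for name, role in roles.items()
        if name != local
    }

if __name__ == "__main__":
    roles = {"R1": "DR", "R2": "BDR", "R3": "DROTHER", "R4": "DROTHER"}
    for router in roles:
        print(router, neighbor_table(roles, router))
```

Removing R4 from the roles leaves no DROther pair, so no 2WAY entry appears. This is the three-router case described above.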
When looking at the DR or BDR, you will see all adjacencies as being FULL:
When looking at a DROther, you will see all adjacencies with other DROthers as “stuck” in 2-Way:
I cannot find a way to produce a p2p adjacency that is stuck in 2-way. So as far as I know, “stuck in 2-way” should never actually be an issue. If an adjacency achieves the 2-way state, then two-way communication is working, so they will just enter Exchange/Exstart next. If two-way communication is not working, they will be stuck in Init.
Some astute readers may wonder about situations in which multicast traffic works (Hellos) but unicast doesn’t (Exchange/Exstart). Would this cause neighbors to be stuck in 2-way? Indeed I wondered the same thing, which leads us to the next section!
What happens if unicast traffic does not work?
OSPF neighbor discovery works by multicasting Hellos to 224.0.0.5. However, once neighbors are discovered, the adjacency process “switches over” to unicast addresses when performing the database exchange process. Therefore, if for some reason unicast reachability is not working, two routers will be stuck in Exstart/Exchange.
OSPF Hellos, LSUs, and LSAcks are multicast to 224.0.0.5
OSPF DBD and LSRs are unicast to the neighbor’s interface IP (learned from the Hello)
To demonstrate this issue, we can create an access-list which blocks traffic with a destination address of the router itself. Multicast packets will pass through, but unicast will not.
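The effect of such an access-list can be modeled by pairing each packet type with its destination as listed above. The Python below is a sketch of that reasoning only. The address 10.0.12.2 and the function names are hypothetical, and the ACL is reduced to "drop anything addressed to the router's own interface IP".

```python
ALL_SPF_ROUTERS = "224.0.0.5"
MULTICAST_TYPES = ("Hello", "LSU", "LSAck")
UNICAST_TYPES = ("DBD", "LSR")

def destination(packet_type, neighbor_ip):
    """Destination address per packet type, as described in this section."""
    if packet_type in MULTICAST_TYPES:
        return ALL_SPF_ROUTERS
    if packet_type in UNICAST_TYPES:
        return neighbor_ip
    raise ValueError("unknown OSPF packet type: " + packet_type)

def delivered(local_ip):
    """Which packet types get past an ACL that drops traffic to local_ip."""
    result = {}
    for packet_type in MULTICAST_TYPES + UNICAST_TYPES:
        # The neighbor addresses unicast packets to our interface IP.
        dst = destination(packet_type, neighbor_ip=local_ip)
        result[packet_type] = dst != local_ip
    return result

if __name__ == "__main__":
    print(delivered("10.0.12.2"))
```

Hellos still arrive, so the neighbor is discovered and reaches 2-Way, but no DBD is ever received, which is why the adjacency cannot progress past Exstart/Exchange.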
Both routers are stuck in Exstart/Exchange, which is much more often seen when MTU mismatches.
Can LDP/IGP Sync prevent adjacencies?
The LDP/IGP sync feature is used to “discourage” the use of the interface by the IGP until LDP is up. When the IGP adjacency happens before LDP is up, labeled traffic can be blackholed. The IGP tries to use the interface as the best path based on metric alone, but the labeled traffic is unknown to the router because LDP is not up yet. The LDP/IGP sync feature advertises the link with the maximum metric if an LDP neighborship is not established on the link.
But did you know that LDP/IGP sync can cause an adjacency to not form in the first place? For this reason I’ve included this problem in this article.
When LDP/IGP sync is configured and an LDP neighbor is not discovered on an interface, the IGP adjacency will not come up at all.
To demonstrate this, I enable LDP/IGP sync on R1, enable LDP on Gi0/0, but do not enable LDP on R2.
I then no shut Gi0/0 on R1. The router never sends Hellos because an LDP neighbor is not discovered on the interface. The router waits indefinitely for an LDP neighbor to be discovered before transmitting Hellos on the interface.
I then enable LDP on Gi0/0 of R2. This will allow R1 to discover R2 via LDP, and R1 will start transmitting OSPF Hellos. However, R2’s lo0 interface is not advertised via OSPF, so an LDP neighborship will not form. (R1 cannot route to R2’s transport address). Therefore R1 will use the max metric for Gi0/0. This “discourages” the IGP from using the interface.
The way that LDP/IGP sync can completely prevent an adjacency from coming up at all can potentially be a huge issue. If a router only has a single OSPF link to the core, this problem could prevent you from even being able to remotely manage the router at all in very specific failure cases.
Luckily you can tell the router to form an adjacency anyway, even if no LDP neighbor is discovered on the interface. You configure a holddown timer which specifies how long to wait to discover an LDP neighbor before giving up and bringing up the IGP adjacency:
When the holddown timer expires, the router will transmit Hellos, even if no LDP peers are discovered.
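The behavior described in this section can be summarized as a small decision function. The Python below is a sketch based only on what the lab showed. The function and parameter names are hypothetical. It assumes the link keeps the maximum OSPF metric (65535) for as long as the LDP session is not up, including after the holddown timer expires.

```python
OSPF_MAX_METRIC = 65535

def ospf_interface_behavior(sync_enabled, ldp_neighbor_discovered,
                            ldp_session_up, holddown_expired=False,
                            configured_cost=1):
    """Return (send_hellos, advertised_cost) for one interface.

    holddown_expired=False also covers the case where no holddown timer
    is configured, in which the router waits indefinitely.
    """
    if not sync_enabled:
        return True, configured_cost
    if not ldp_neighbor_discovered and not holddown_expired:
        # No LDP neighbor seen yet: OSPF Hellos are not sent at all.
        return False, None
    if not ldp_session_up:
        # Adjacency may form, but the link is discouraged with max metric.
        return True, OSPF_MAX_METRIC
    return True, configured_cost

if __name__ == "__main__":
    print(ospf_interface_behavior(True, False, False))
    print(ospf_interface_behavior(True, True, False))
    print(ospf_interface_behavior(True, False, False, holddown_expired=True))
    print(ospf_interface_behavior(True, True, True))
```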
Further Reading
https://www.cisco.com/c/en/us/support/docs/ip/open-shortest-path-first-ospf/13685-13.html
See articles listed in “Related Information” but know that they are a bit old. I found one piece of outdated information, which said that neighbors that do not have matching authentication will be stuck in Init. When testing this, I found this to not be true.