# Shortest Path First (SPF) Algorithm Demystified ..

For the sake of brevity i would refer an Intermediate System (IS) in IS-IS parlance as a router in the following post – thus a router in this post could mean either an OSPF speaker or an IS-IS speaker or some other routing element from your favorite link state protocol. For the SPF algorithm to work, it would require *all* routers in the network to know about all the other routers in the network and the links connecting them. How a link state routing protocol encodes this information and ensures that its disseminated properly is left to that protocol. OSPF encodes this information in Link State Advertisements (LSAs) and floods it reliably, while IS-IS encodes this in a Link State Packet (LSP) that it originates.

Once each router knows about all the other routers and the links connecting them, it runs the Dijkstra Shortest Path First algorithm to determine the shortest path from itself to all the other routers in the network. Since each router has a similar copy of the link state database and each runs the same algorithm, they end up constructing the same view of the network and packets get routed consistently at each hop.

So, how does SPF algorithm work in OSPF and IS-IS.

Imagine a simple network as shown in the figure below.

Once each router has flooded its link state information in the network, all routers know about all the other routers and the links connecting them. The link state database on each router looks like the following:

[A, B, 3], [A, C, 6], [B, A, 3], [B, D, 3], [B, E, 5], [C, A, 6], [C, D, 9], [D, C, 9], [D, B, 3], [D, E, 3] , [E, B, 5] and [E, D, 3]

Each triple should be read as {originating router, router its connected to, the cost of the link connecting the two routers}

So what does Router A do with this information and how is this used in SPF?

While running SPF, each router maintains two lists – the first is a list of nodes for which the shortest path has been determined and we are sure that no path shorter than the one we have computed can exist. This list is called the PATH (or PATHS) list. The second is the list of paths through the routers that may or may not be the shortest to a destination. This list is called the TENTative list, or simply the TENT list From now on, TENT would refer to the TENT list and PATH, to the PATH list.

Each element in the list is a triplet of the kind {endpoint router that we’re trying to reach, total distance from the calculating router, next-hop to reach the endpoint router}

Each router runs the following algorithm to compute the shortest path to each node:

Step I: Put “self” on the PATH with a distance of 0 and a next hop of self. The router running the SPF refers to itself as either “self” or the root node, because this node is the root of the shortest-path tree.

Step II: Take the node (call it the PATH node) just placed on the PATH list and examine its list of neighbors. Add each neighbor to the TENT with a next hop of the PATH node, unless that neighbor is already in the TENT or PATH list with a lower cost.

Call the node just added to the TENT as the TENT node. Set the cost to reach the TENT node equal to the cost to get from the root node to the PATH node plus the cost to get from the PATH node to the TENT node.

If the node just added to the TENT already exists in the TENT, but with a higher cost, replace the higher-cost node with the node currently under consideration.

Step III: Find the lowest cost neighbor in the TENT and move that neighbor to the PATH, and repeat Step 2. Stop only when TENT becomes empty.

Lets follow the sequence that Router A goes through for building its SPF tree.

1st Iteration of the SPF run

Step I: Put “self” on the PATH with a distance of 0 and a next hop of self. After this step the PATH and TENT look as follows:

PATH – {A, 0, A}

TENT – { }

Step II: Take the node (call it the PATH node) just placed on the PATH list and examine its list of neighbors. Patently, A is the PATH node. Examine its list of neighbors ({A, B, 3}, {A, C, 6}). OSPF does this by looking at the LSAs advertised by Router A, while IS-IS does this by looking at the neighbors TLV found in Router A’s LSP. When an IS-IS node is placed in PATHS, all IP prefixes advertised by it are installed in the IS-IS Routing Information Base (RIB) with the corresponding metric and next hop.

The Step II further says – “Add each neighbor to the TENT with a next hop of the PATH node, unless that neighbor is already in the TENT or PATH list with a lower cost.” A’s neighbors are B and C, and since neither of them is in the PATH or TENT, we add both of them to the TENT.

PATH – {A, 0, A}

TENT – {B, 3, A}, {C, 6, A}

Step II says – “Call the node just added to the TENT as the TENT node. Set the cost to reach the TENT node equal to the cost to get from the root node to the PATH node plus the cost to get from the PATH node to the TENT node.”

Lets pick up the TENT node B. The cost to reach B would be the cost to reach from root node to PATH node + cost from PATH node to TENT node. In the first iteration of SPF, both the root node and PATH node is A. Thus total cost to reach B is cost to reach from A (PATH node) to B (TENT node) which is 3.

Step II says – “If the node just added to the TENT already exists in the TENT, but with a higher cost, replace the higher-cost node with the node currently under consideration.”

B isnt in the TENT, so skip this.

Step III says – “Find the lowest cost neighbor in the TENT and move that neighbor to the PATH, and repeat“. The lowest cost neighbor is B (with cost 3). Move this to PATH. We go back to Step II since TENT isnt yet empty.

2nd Iteration of the SPF run

PATH – {A, 0, A} {B, 3, A}

TENT – {C, 6, A}

We have thus added the neighbor B in PATH, since we know that there cannot be any other shorter path to reach it. And this is, as you will note, consistent with our definition of PATH wherein we had earlier stated that nodes can only be placed there once we are sure that there cannot be any shorter path to reach them from the root node.

We begin our 2nd iteration and go to Step II which says – “Take the node (call it the PATH node) just placed on the PATH list and examine its list of neighbors“. B is the PATH node and we examine its neighbors (A, E and D). Step II further says – “Add each neighbor to the TENT with a next hop of the PATH node, unless that neighbor is already in the TENT or PATH list with a lower cost.”

Since A is already in PATH with ignore it and only add E and D to TENT.

PATH – {A, 0, A} {B, 3, A}

TENT – {C, 6, A} {D, 3, B}, {E, 5, B}

Step II says – “Call the node just added to the TENT as the TENT node. Set the cost to reach the TENT node equal to the cost to get from the root node to the PATH node plus the cost to get from the PATH node to the TENT node.

Aah .. this means that cost against D and E would not be 3 and 5, as what i have shown, but would instead be 3 (cost from A to B) +3 (cost from B to D) = 6 and 3 (cost from A to B) +5 (B to E) = 8

Thus the PATH and TENT look as follows:

PATH – {A, 0, A} {B, 3, A}

TENT – {C, 6, A} {D, 6, B}, {E, 8, B}

The rest of the Step II does not apply here since D and E dont exist in the TENT.

Come to Step III which says “Find the lowest cost neighbor in the TENT and move that neighbor to the PATH, and repeat Step 2″.

Lowest cost neighbor can either be C or D so pick on up randomly. It can mathematically be proven that we would end up with the same SPF tree irrespective of which equal cost neighbor is picked up from the TENT first. In our case, lets pick up C.

It is thus moved to the PATH

PATH – {A, 0, A} {B, 3, A} {C, 6, A}

TENT – {D, 6, B}, {E, 8, B}

3rd Iteration of the SPF run

Go back to Step II which says “Take the node (call it the PATH node) just placed on the PATH list and examine its list of neighbors. Add each neighbor to the TENT with a next hop of the PATH node, unless that neighbor is already in the TENT or PATH list with a lower cost

C’s neighbors are A and D. Since A is already in the PATH only D is added in the TENT.

PATH – {A, 0, A} {B, 3, A} {C, 6, A}

TENT – {D, 6, B}, {E, 8, B} {D, 9, C}

As per Step II we now need to fix the cost to reach D from the root node. This would be cost to reach from A to C (6) + cost from C to D (9) = 15

PATH and TENT now:

PATH – {A, 0, A} {B, 3, A} {C, 6, A}

TENT – {D, 6, B}, {E, 8, B} {D, 15, C}

Step II further says – “If the node just added to the TENT already exists in the TENT, but with a higher cost, replace the higher-cost node with the node currently under consideration.

node D already exists in the TENT (via B with cost 6) and since its with a lesser cost, we remove the node that we had just added from the TENT. This is because a lower cost path to reach node D already exists in the TENT.

PATH – {A, 0, A} {B, 3, A} {C, 6, A}

TENT – {D, 6, B}, {E, 8, B}

We come to Step III which says “Find the lowest cost neighbor in the TENT and move that neighbor to the PATH, and repeat Step 2″.

Lowest cost neighbor is D – which means we move D now to the PATH and go to Step II, since the TENT isnt yet empty.

4th Iteration of the SPF run

PATH – {A, 0, A} {B, 3, A} {C, 6, A} {D, 6, B}

TENT – {E, 8, B}

Step II says “Take the node (call it the PATH node) just placed on the PATH list and examine its list of neighbors. Add each neighbor to the TENT with a next hop of the PATH node, unless that neighbor is already in the TENT or PATH list with a lower cost.”

To examine D’s neighbors we look at the link state information it advertised. It advertised the following information:

[D, C, 9], [D, B, 3], [D, E, 3]

This means that D is says that its connected to C, B and E. We ignore its connection to B and C, since they are already in PATH. We thus only add neighbor E in the TENT.

PATH – {A, 0, A} {B, 3, A} {C, 6, A} {D, 6, B}

TENT – {E, 8, B} {E, 3, D}

Continuing with Step II which further says – “Call the node just added to the TENT as the TENT node. Set the cost to reach the TENT node equal to the cost to get from the root node to the PATH node plus the cost to get from the PATH node to the TENT node.”

This means that we need to adjust the cost of the triple {E, 3, D} that we just added to the TENT. The cost to reach E via D would thus be the cost to reach D from A (which is the root node) + the cost to reach E from D. This comes out to be 6 + 3 = 9.

TENT thus looks like this – {E, 8, B} {E, 9, D}

Step II further says – “If the node just added to the TENT already exists in the TENT, but with a higher cost, replace the higher-cost node with the node currently under consideration”

We just added node E in the TENT and a route to E already exists in the TENT, and its with a lower cost. This means that we remove the route that we had just added.

So PATH and TENT at this point look as follows:

PATH – {A, 0, A} {B, 3, A} {C, 6, A} {D, 6, B}

TENT – {E, 8, B}

We go to Step III which says – “Find the lowest cost neighbor in the TENT and move that neighbor to the PATH, and repeat Step 2″.

The lowest cost neighbor in TENT right now is E. We move this to PATH.

So PATH and TENT at this point look as follows:

PATH – {A, 0, A} {B, 3, A} {C, 6, A} {D, 6, B} {E, 8, B}

TENT – { }

Step III further states that we continue if and only if something remains in the TENT. TENT is now empty, which means that we have computed the shortest paths to all the nodes that A was aware of.

This is marks the end of the SPF algorithm run and the SPF tree that it has computed looks as follows

# Convergence and Scalability Issues in OSPF and IS-IS

This seems to be the favorite question that every newbie has! There is no unequivocal answer to this and it all depends upon the kind of network and the topology. Having said this, lets try to see how the two IGPs can be compared.

IS-IS

This protocol is limited by the maximum number of LSPs that each IS-IS router can issue. This is 256 as its LSP ID is 1 octet long. The total number of IP prefixes carried by IS-IS can be easily computed and it comes to O(31000). However, RFC 3786 describes mechanisms to relax the limit on the number of LSP fragments, thereby increasing the number of IP prefixes that can be carried within IS-IS.

I have however, never seen any network carrying more than O(30K) IP prefixes inside IS-IS. Do let me know if you’re aware of networks where you see IS-IS carrying more than 30K routes.

I say this because this is a reasonable number for any sane IS-IS deployment and it will not run out of space unless someone actually injects the entire (or even partial) BGP feed into the IGP. In that case we will run out of space at about 20% of the way into redistribution and not be able to advertise the rest. It is for this reason that this practice has now been deprecated and the RFC 1745 which lays down the rules for BGP- OSPF interaction, has been moved to the HISTORICAL status.

There are 8 bits to define a pseudonode number in the LSPID which means that a router can be a Designated Intermediate System (DIS) for only 256 LANs. Additionally there is also a limitation on the number of routers that can be advertised in pseudonode LSP of the DIS. Dont worry – RFC 3786 fixes this!

RFC 3373 OTOH proposes a new TLV thats carried in the IIH PDUs that can increase the number of point-to-point adjacencies from 256 on a single router.

The “Remaining lifetime” field which gives the number of seconds before LSP is considered expired is 16 bits wide.

This gives the life time of the LSP as 2^16/60/60 Hrs = 18.7 Hrs

Thus the LSP issued by a router needs to be refreshed after every 18.7 Hrs. So youre not going to see a lot of IS-IS control packets being regenerated in a stable topology.

OSPF

In theory, OSPF topology is limited by the number of links that can be advertised in the Router LSA as each router gets only one Router LSA and it cant be bigger than 64K which is the biggest an IP packet can be. The same constraint applies to the Network LSA also.

Each link in the router can take up at most 24 bytes. Thus, number of links which can be supported is given by (64 * 1024) / 24 = 2370

However, if we take the minimum link size per link (12 bytes) then the maximum is about 2 * 2370 = O(5000) links

To be more specific, we can have O(2300) p2p and p2mp links (not considering virtual links, etc) and O(5000) broadcast/NMBA links described in OSPF’s Router LSA and its Network LSA.

Thus each Router LSA can carry some 5000 links information in it. It is hard to imagine a router having 5000 neighbors but there are already routers with 400 neighbors in some ISPs, and it may not take long to reach the order of the magnitude limited by OSPF.

The Network LSAs are generated by the designated router (DR) for each broadcast network it is connected to. To have scaling problems it should have 2730 * 6 times neighbors on that interface. This is even less probable and hence there are no scalability problems with OSPF per se.

All other LSAs apart from Type 1 and Type 2 hold single prefixes. Because there is no limit to the number of such LSAs, a large number of inter-area and externals can be generated depending upon the memory resources of the router.

Each LSA has an LS Age field which is counted upwards starting from zero. Its life is an architectural constant which says one hour. When an LSA’s LS age field reaches MaxAge, it is reflooded in an attempt to flush the LSA from the routing domain. One hour seems like a long time but if one originates 50,000 LSAs then OSPF will be refreshing on an average of just 36ms!

Total number of LSAs to be refreshed = 50,000

Time by which all the LSAs must be refreshed = LSRefreshTime = 30mins = 1800 secs

Rate at which the LSAs need to be refreshed = 1800/50000 = 36ms

However, if the refreshes are perfectly spread out across time and perfectly batched, the actual update transmission rate may be on the order of one packet per second.

There is however a “do-not-age” LSA which in theory can be pressed into service and which never gets aged. However, such LSAs will be eventually purged from the LS database if they become stale after being held for at least 60 minutes and the originator not reachable for the same period. Moreover it is not backward compatible and if one deploys that in the network today with some routers not supporting this then the network can really get weird. So there isn’t really much that can be done using these unless the whole network is changed!

Theoretically, both the routing protocols are scalable and there should not be any issues with either one of these if implemented properly. Both have similar stability and convergence properties. Practically, providers must go with what their vendors suggest since the vendors are best aware of how each protocol has been implemented on their platforms.

I discuss more of this here.

# Database Granularity in OSPF vs IS-IS ..

This post compares how the two link state protocols hold their routing information in their databases as this affects their behavior in how they flood/distribute the change of routing information and the internal implementation complexity.

OSPF

o Organization of Routing Information

OSPF encodes the routing information into small chunks, which it calls Link State Advertisement (LSA). Each LSA has its own 20-byte header in order to be identified uniquely. This header is called the LSA Header. There is no limitation on the size of a LSA, though the actual LSA size is limited by IP packet size limitation: 65,535 bytes minus the LSA Header size and IP packet header size. The database access in OSPF is per LSA basis.

In OSPF routing, the information within an area is described by type 1 and type 2 LSAs (known as Router-LSA and Network-LSA respectively). These LSAs can become big depending upon the number of adjacencies to be advertised and prefixes to be carried inside an area. In other words, the routing information with respect to a single node (either router or network node) is encoded inside a single LSA. On the other hand, each inter-area or external prefix is advertised in a separate LSA (AS-External LSA).

An OSPFv2 router may originate only one Router-LSA for itself, while in OSPFv3, a router is allowed to originate multiple Router-LSAs. A router may originate a Network-LSA for each IP subnet on which the router acts as a designated router (DR). A router may originate one LSA for each inter-area and external prefix, with no limitations on the number of LSAs that it may originate.

Consequences

Originating a new and a unique LSA for each inter-area route and an external prefix implies that there is a LSA Header overhead involved while the information is kept in the database or is flooded to the neighbors. There is thus some extra memory and bandwidth consumed in total.

o Carrying Routing Information

LSAs are carried in Link State Update packets (called LS Updates or LSUs). Each LS Update packet has its own header, consists of a 24 byte OSPF protocol header, and a 4-bytes field indicating the number of LSAs contained in the packet. Thus multiple LSAs can be packed into a single LS Update packet. Some implementations may not do this as its considered difficult achieving this during flooding.

Consequences

In the face of network changes, OSPF floods only the updated LSAs. Therefore, even if an implementation does not pack multiple LSAs into a single LS Update packet (and so bandwidth is consumed by LS Update header for each update of a single LSA), the bandwidth consumption for each network change can be considered adequately small.

IS-IS

o Organization of the Routing Information

In IS-IS, protocol packets are called Protocol Data Units or PDUs. IS-IS encodes the link state information into the set of TLVs and packs these TLVs into one or more Link State PDUs (LSPs). The size limit of a LSP is configurable. The Routing database consists of these PDUs and the access to the database is per PDU basis. The original IS-IS specification places an upper bound on the number of LSPs a router can originate to 255. There are however techniques which enable a router to originate more than 255 LSPs, by using multiple system-id’s for itself.

Consequences

Since routing information in IS-IS for each router is packed in fewer LSPs, the memory consumed for bookkeeping of the routing data within the database is less and is more efficient.

o Carrying Routing Information

Each LSP is flooded independently, without being modified all the way from the originator through the routers till the very end. This results in all the routers having the same LSPs as that originated by the first router.

Consequences

Since LSPs are not modified in any way and are not allowed to be fragmented, in order to be flooded successfully over all links existing in the IS-IS network, great care must be ensured when configuring the size limit of LSP that routers can originate and receive.

If the size limit of the LSP is set without taking into account the minimum value of the MTUs throughout the network, or if the size limit of LSPs conflict among some the routers in the network, the database synchronization may not be achieved, and this can result in routing loops and/or blackholes.

When a change occurs to a LSP, the whole LSP needs to be flooded, and therefore the bandwidth usage can be non-optimal. There is however a solution which exists in theory. If an implementation finds some of the entities to be flapping, then they may be packed into smaller LSPs or may be isolated from the other stable entities. This way one needs to only advertise the unstable LSP/LSPs. I have not btw come across any implementation that does that. Leave a comment if you know one that does this!

Database granularity also affects when two routers need to synchronize their databases. In OSPF, because of its high database granularity there are a lot of items which it needs to synchronize and that process is somewhat complicated with a lot of DBD packets being exchanged back and forth. This gets worse if the router trying to sync is being inundated with a lot of other data traffic also. This is not much of an issue these days as any router worth its salt would prioritize the OSPF control packets.

This is however much simpler in case of IS-IS and there isn’t any finite state machine that the neighbors need to go through to synchronize their databases. It just uses it regular flooding mechanism (a couple of CSNPs describe their entire topology information) to exchange its entire database. You plug in the new IS-IS router and before you realize the router is already sync’ed up with all the other IS-IS routers in the network!

# Checks on HELLOs for OSPF and IS-IS during Adjacency Formation ..

The HELLOs (or IIHs in IS-IS parlance)  are responsible bringing up the adjacencies between the two (or multiple) routers. Forming adjacencies is an integral part of all link state routing protocols as all protocol packets other than HELLOs are flooded only over the adjacencies. The rules for formation of such adjacencies however differ between IS-IS, OSPF v2 and OSPF v3.

IS-IS

Besides the basic checks to verify the integrity of the packet, IS-IS does a few checks before forming any adjacency upon receiving the IIHs.

o It allows multiple area addresses to be configured on a router. During the IIH exchange the adjacency is formed only if at least one of the area address matches. The advantage of having multiple areas is explained in the further posts. NOTE that Level 2 only adjacencies would be formed even if the area addresses are not matching.

o To prevent the LSPs and CSNPs from being dropped due to different values for originatingLSPBufferSize and ReceiveLSPBufferSize, all IIHs are padded till the maximum MTU when the adjacency comes up. This check verifies consistent settings between the adjacent routers. This is however not a sufficient check.

o Adjacencies are formed without regard to interface addressing or asymmetric HOLD timer values. Values of IIH interval are not sent in IIH packets. While the IS-IS protocol provides sufficient routing information for relaying packets between adjacent routers, many implementations nonetheless require ARP support to do this. These implementations typically refuse to form an adjacency unless the neighbor interface IP address is on the local interface’s IP subnet.

o IS-IS can carry addressing information of different protocols in its TLV’s. However, the protocol supported field must be sent in Dual and IP-Only routers. RFC1195 specifies no checks for the protocol supported field for adjacency formation. It places topology restrictions on multi-protocol networks. In networks that conform to these restrictions, neighboring routers will always have a protocol in common. Therefore, it does not state whether adjacency formation should take protocols supported into account. However, many implementations, do not form an adjacency with a neighbor unless they share at least one protocol in common.

o Not matching Hold Timer values has advantages wherein the administrator can set different Hold times for different routers. This helps in cases where the going down of a DIS or some router needs to be detected faster. For such routers the hold timer can be set to a lower value.

OSPFv2

The checks for formation of adjacencies are stricter in OSPFv2 as compare to that of IS-IS.

o The area-id of the received packet should always match the incoming interface (with the exception of virtual links). Area type is strictly checked by checking the E-bit (not set for non-default areas) and the N- bit (not-set for non-NSSA areas).