Convergence and Scalability Issues in OSPF and IS-IS

This seems to be the favorite question that every newbie has! There is no unequivocal answer to this and it all depends upon the kind of network and the topology. Having said this, lets try to see how the two IGPs can be compared.

IS-IS

This protocol is limited by the maximum number of LSPs that each IS-IS router can issue. This is 256 as its LSP ID is 1 octet long. The total number of IP prefixes carried by IS-IS can be easily computed and it comes to O(31000). However, RFC 3786 describes mechanisms to relax the limit on the number of LSP fragments, thereby increasing the number of IP prefixes that can be carried within IS-IS.

I have however, never seen any network carrying more than O(30K) IP prefixes inside IS-IS. Do let me know if you’re aware of networks where you see IS-IS carrying more than 30K routes.

I say this because this is a reasonable number for any sane IS-IS deployment and it will not run out of space unless someone actually injects the entire (or even partial) BGP feed into the IGP. In that case we will run out of space at about 20% of the way into redistribution and not be able to advertise the rest. It is for this reason that this practice has now been deprecated and the RFC 1745 which lays down the rules for BGP- OSPF interaction, has been moved to the HISTORICAL status.

There are 8 bits to define a pseudonode number in the LSPID which means that a router can be a Designated Intermediate System (DIS) for only 256 LANs. Additionally there is also a limitation on the number of routers that can be advertised in pseudonode LSP of the DIS. Dont worry – RFC 3786 fixes this!

RFC 3373 OTOH proposes a new TLV thats carried in the IIH PDUs that can increase the number of point-to-point adjacencies from 256 on a single router.

The “Remaining lifetime” field which gives the number of seconds before LSP is considered expired is 16 bits wide.

This gives the life time of the LSP as 2^16/60/60 Hrs = 18.7 Hrs

Thus the LSP issued by a router needs to be refreshed after every 18.7 Hrs. So youre not going to see a lot of IS-IS control packets being regenerated in a stable topology.

OSPF

In theory, OSPF topology is limited by the number of links that can be advertised in the Router LSA as each router gets only one Router LSA and it cant be bigger than 64K which is the biggest an IP packet can be. The same constraint applies to the Network LSA also.

Each link in the router can take up at most 24 bytes. Thus, number of links which can be supported is given by (64 * 1024) / 24 = 2370

However, if we take the minimum link size per link (12 bytes) then the maximum is about 2 * 2370 = O(5000) links

To be more specific, we can have O(2300) p2p and p2mp links (not considering virtual links, etc) and O(5000) broadcast/NMBA links described in OSPF’s Router LSA and its Network LSA.

Thus each Router LSA can carry some 5000 links information in it. It is hard to imagine a router having 5000 neighbors but there are already routers with 400 neighbors in some ISPs, and it may not take long to reach the order of the magnitude limited by OSPF.

The Network LSAs are generated by the designated router (DR) for each broadcast network it is connected to. To have scaling problems it should have 2730 * 6 times neighbors on that interface. This is even less probable and hence there are no scalability problems with OSPF per se.

All other LSAs apart from Type 1 and Type 2 hold single prefixes. Because there is no limit to the number of such LSAs, a large number of inter-area and externals can be generated depending upon the memory resources of the router.

Each LSA has an LS Age field which is counted upwards starting from zero. Its life is an architectural constant which says one hour. When an LSA’s LS age field reaches MaxAge, it is reflooded in an attempt to flush the LSA from the routing domain. One hour seems like a long time but if one originates 50,000 LSAs then OSPF will be refreshing on an average of just 36ms!

Total number of LSAs to be refreshed = 50,000

Time by which all the LSAs must be refreshed = LSRefreshTime = 30mins = 1800 secs

Rate at which the LSAs need to be refreshed = 1800/50000 = 36ms

However, if the refreshes are perfectly spread out across time and perfectly batched, the actual update transmission rate may be on the order of one packet per second.

There is however a “do-not-age” LSA which in theory can be pressed into service and which never gets aged. However, such LSAs will be eventually purged from the LS database if they become stale after being held for at least 60 minutes and the originator not reachable for the same period. Moreover it is not backward compatible and if one deploys that in the network today with some routers not supporting this then the network can really get weird. So there isn’t really much that can be done using these unless the whole network is changed!

Theoretically, both the routing protocols are scalable and there should not be any issues with either one of these if implemented properly. Both have similar stability and convergence properties. Practically, providers must go with what their vendors suggest since the vendors are best aware of how each protocol has been implemented on their platforms.

I discuss more of this here.

Database Granularity in OSPF vs IS-IS ..

This post compares how the two link state protocols hold their routing information in their databases as this affects their behavior in how they flood/distribute the change of routing information and the internal implementation complexity.

OSPF

o Organization of Routing Information

OSPF encodes the routing information into small chunks, which it calls Link State Advertisement (LSA). Each LSA has its own 20-byte header in order to be identified uniquely. This header is called the LSA Header. There is no limitation on the size of a LSA, though the actual LSA size is limited by IP packet size limitation: 65,535 bytes minus the LSA Header size and IP packet header size. The database access in OSPF is per LSA basis.

In OSPF routing, the information within an area is described by type 1 and type 2 LSAs (known as Router-LSA and Network-LSA respectively). These LSAs can become big depending upon the number of adjacencies to be advertised and prefixes to be carried inside an area. In other words, the routing information with respect to a single node (either router or network node) is encoded inside a single LSA. On the other hand, each inter-area or external prefix is advertised in a separate LSA (AS-External LSA).

An OSPFv2 router may originate only one Router-LSA for itself, while in OSPFv3, a router is allowed to originate multiple Router-LSAs. A router may originate a Network-LSA for each IP subnet on which the router acts as a designated router (DR). A router may originate one LSA for each inter-area and external prefix, with no limitations on the number of LSAs that it may originate.

Consequences

Originating a new and a unique LSA for each inter-area route and an external prefix implies that there is a LSA Header overhead involved while the information is kept in the database or is flooded to the neighbors. There is thus some extra memory and bandwidth consumed in total.

o Carrying Routing Information

LSAs are carried in Link State Update packets (called LS Updates or LSUs). Each LS Update packet has its own header, consists of a 24 byte OSPF protocol header, and a 4-bytes field indicating the number of LSAs contained in the packet. Thus multiple LSAs can be packed into a single LS Update packet. Some implementations may not do this as its considered difficult achieving this during flooding.

Consequences

In the face of network changes, OSPF floods only the updated LSAs. Therefore, even if an implementation does not pack multiple LSAs into a single LS Update packet (and so bandwidth is consumed by LS Update header for each update of a single LSA), the bandwidth consumption for each network change can be considered adequately small.

IS-IS

o Organization of the Routing Information

In IS-IS, protocol packets are called Protocol Data Units or PDUs. IS-IS encodes the link state information into the set of TLVs and packs these TLVs into one or more Link State PDUs (LSPs). The size limit of a LSP is configurable. The Routing database consists of these PDUs and the access to the database is per PDU basis. The original IS-IS specification places an upper bound on the number of LSPs a router can originate to 255. There are however techniques which enable a router to originate more than 255 LSPs, by using multiple system-id’s for itself.

Consequences

Since routing information in IS-IS for each router is packed in fewer LSPs, the memory consumed for bookkeeping of the routing data within the database is less and is more efficient.

o Carrying Routing Information

Each LSP is flooded independently, without being modified all the way from the originator through the routers till the very end. This results in all the routers having the same LSPs as that originated by the first router.

Consequences

Since LSPs are not modified in any way and are not allowed to be fragmented, in order to be flooded successfully over all links existing in the IS-IS network, great care must be ensured when configuring the size limit of LSP that routers can originate and receive.

If the size limit of the LSP is set without taking into account the minimum value of the MTUs throughout the network, or if the size limit of LSPs conflict among some the routers in the network, the database synchronization may not be achieved, and this can result in routing loops and/or blackholes.

When a change occurs to a LSP, the whole LSP needs to be flooded, and therefore the bandwidth usage can be non-optimal. There is however a solution which exists in theory. If an implementation finds some of the entities to be flapping, then they may be packed into smaller LSPs or may be isolated from the other stable entities. This way one needs to only advertise the unstable LSP/LSPs. I have not btw come across any implementation that does that. Leave a comment if you know one that does this!

Database granularity also affects when two routers need to synchronize their databases. In OSPF, because of its high database granularity there are a lot of items which it needs to synchronize and that process is somewhat complicated with a lot of DBD packets being exchanged back and forth. This gets worse if the router trying to sync is being inundated with a lot of other data traffic also. This is not much of an issue these days as any router worth its salt would prioritize the OSPF control packets.

This is however much simpler in case of IS-IS and there isn’t any finite state machine that the neighbors need to go through to synchronize their databases. It just uses it regular flooding mechanism (a couple of CSNPs describe their entire topology information) to exchange its entire database. You plug in the new IS-IS router and before you realize the router is already sync’ed up with all the other IS-IS routers in the network!