Category Archives: Security

Securing BFD now possible!

Confession Time.

I am guilty of committing several sins. One that egregiously stands out is writing two IETF specs for BFD security (here and here) without considering the impact on the routers and switches implementing those specs. Bear in mind that Bi-directional Forwarding Detection (BFD) is a hard protocol to implement well. Its hard to get into a conversation with engineers working on BFD without a few of them shedding copious quantities of tears on what it took them to avoid those dreaded BFD flaps in scaled setups. They will tell you how they resorted to clever tricks (hacks, if you will) to process BFD packets as fast as they could (plucking them out of order from a shared queue, dedicated tasks picking up BFD packets in the ISR contexts, etc) . In a candid conversation, an ex-employee of a reputed vendor revealed how they stage managed their BFD during a demo to a major customer since they didnt want their BFD to flap while the show (completely scripted) was on. So, long story short — BFD is hard when you start scaling. It just becomes a LOT worse, when you add security on top of it.

The reason BFD is hard is because of the high rate at which packets are sent and consumed. If you miss out a few packets from your neighbor you consider him dead and you bring down your routing adjacency with that neighbor. This causes a lot of bad things (micro-loops, traffic storms, angry phone calls) to happen , least of which trust me, is rerouting the traffic around the “affected” node.

When BFD flaps

The cost of losing BFD packets is very high — so you really want to keep the packet processing minimal, the protocol lean, which is why folks in the BFD WG get a migraine whenever an enthusiastic (though noble) soul mentions a TLV extension to BFD or (even worse) a BFD v2.

Now when you add security, things become a waaaaaaaaaaaaay more complex. Not only do you need to process the packets at a high rate, you also need to compute the SHA or the MD5 digest for each one of those. This becomes difficult when the sessions scale even with hardware assist for BFD.  The current BFD specification for security mandates the digest to be computed for each packet that is sent (you could do something clever with the non-meticulous mode and we’ll talk about it some other day) so the spec is really useless as there is no vendor who can do that at the rate at which BFD packets need to be processed.

This also explains why the BFD specs have not moved further down on the standards track — or simply why they arent RFCs yet.

But there is a need to enhance BFD security, since thats currently the weakest link in the service provider network security. The routing and the signalling protocols have all been enhanced to support stronger cryptographic algorithms and BFD is the only protocol left thats still running without any authentication  (!!!) . I hope this doesnt inspire hackers all around the world to break into the Verizons, the Comcasts and the Tatas. Well, if somebody does, then please pass me a pointer so that i can increase my bandwidth to get all those Pink Floyd bootlegs that i have been scavenging for.

So now, we need  to secure BFD and we are stuck with a proposal that cant be used. Kind of cute, if youre not responsible for running a network.

One way to crack BFD security

The solution to this routing quagmire is however quite simple. I dribbled coffee all over my shirt when i thought of it the first time — checked if I wasnt missing out something obvious and when i was sure that it would hold ground, i pinged one of my co-authors who happened to be thinking on similar lines, and we quickly came up with a draft (after more than a year).

What we’re essentially proposing is this:

Most BFD packets are ping-pong packets carrying same information as was carried in the earlier packets — the payload doesnt change at all (used by most vendors to optimize their implementation — HINT use caching). Only when the state changes, that is, the BFD sessions go Up or Down, or a parameter changes (rarely), does the payload change. Use authentication only when the payload changes and never otherwise. This means that in most cases the packets will be sent in clear-text which can be easily handled as is done today. Only when the state changes, the digest needs to be computed which we know from our extensive experience is a relatively low occurrence event.

This proposal makes it very easy for the vendors to support BFD security, something which folks have been wishing for since long. You can get all the sordid details of our proposal here.

This is the first iteration of the draft and things will change as we move forward. While the current version suggests no changes to the existing BFD protocol, we might going ahead suggest a few changes to the state machine if  that’s what it takes to make the protocol secure. Who said securing BFD was simple ? Its perhaps for this reason that the IETF community still hasnt proposed a solid mechanism for stronger authentication of BFD packets.

You can follow the discussion on the BFD WG mailing list or keep looking at this space for more updates.

Is this draft a reparation for the sins i had mentioned at the beginning of my post earlier?

How bad is the OSPF vulnerability exposed by Black Hat?


I was asked a few weeks ago by our field engineers to provide a fix for the OSPF vulnerability exposed by Black Hat last month. Prima facie there appeared nothing new in this attack as everyone knows that OSPF (or ISIS) networks can be brought down by insider attacks. This isnt the first time that OSPF vulnerability has been announced at Black Hat. Way back in 2011 Gabi  Nakibly, the researcher at Israel’s Electronic Warfare Research and Simulation Center, had demonstrated how OSPF could be brought down using insider attacks.  Folks were not impressed, as anybody who had access to one of the routers could launch attacks on the routing infrastructure. So it was with certain skepticism that i started looking at yet another OSPF vulnerability exposed by Gabi, again at Black Hat. Its only when i started delving deep into the attack vector that the real scale of the attack dawned on me. This attack evades OSPF’s natural fight back mechanism against malacious LSAs which makes it a bit more insidious than the other attacks reported so far.

I exchanged a few emails with Gabi when i heard about his latest exposé. I wanted to understand how this attack was really different from the numerous other insider attacks that have been published in the past. Insider attacks are not very interesting, really. Well, if you were careless enough to let somebody access your trusted router, or somebody was smart enough to masquerade as one of your routers and was able to inject malicious LSAs then the least that you can expect is a little turbulence in your routing infrastructure. However, this attack stands apart from the others as we shall soon see.

OSPF (and ISIS too) has a natural fight back mechanism against any malacious LSA that has been injected in a network. When an OSPF router receives an LSA that lists that router as the originating router (referred to as a self-originated LSA) it looks at the contents of the LSA (just in case you didnt realize this). If the received LSA looks newer than the LSA that this router had last originated, the router advances the LSA’s LS sequence number one past the received LS sequence number and originates a new instance of this LSA. In case its not interested in this LSA, it flushes the LSA by originating a new LSA with age set to MaxAge.

All other routers in the network now update their LS database with this new instance and the malacious LSA effectively gets purged from the network. Viola,its that simple!

As a result of this, the attacker can only flood malacious LSAs inside the network till the router that the malacious LSA purports to come from (victim router) receives a copy. As soon as this router floods an updated copy, it doesnt take long for other routers in the network to update their LS DB as well – the flooding process is very efficient in disseminating information since network diameters are typically not huge, and yes, packets travel with the speed of light. Did you know that?

In the attack that Gabi described, the victim router does not recognize the malacious LSA as its own and thus never attempts at refreshing it. As a result the malicious LSA remains stealthily hidden in the routing domain and can go undetected for a really long time. Thus by controlling a single router inside an AS (the one that will flood the malacious LSA), an attacker can gain control over the entire routing domain. In fact, an attacker need not even gain control of an entire router inside the AS.  Its enough if it can somehow inject the malacious LSAs over a link such that one of the OSPF routers in the network accept this. In the media release, Black Hat claimed ” The new attack allows an attacker that owns just a single router within an AS to effectively own the routing tables of ALL the routers in that AS without actually owning the routers themselves. This may be utilized to induce routing loops, network cuts or longer routes in order to facilitate DoS of the routing domain or to gain access to information flows which otherwise the attacker had no access to.

So what is this attack?

Lets start by looking at what the LS header looks like.

LS Header

In this attack we are only interested in the two fields, the Link State ID and the Advertising Router, in the LS Header. In the context of a Router LSA, the Link State ID identifies the router whose links are listed in the LSA. Its always populated with the router ID of that router.  The Advertising Router field identifies the router that initially advertised (originated) the LSA. The OSPF spec dictates that only a router itself can originate its own LSA (i.e. no router is expected to originate a LSA on behalf of other routers), therefore in Router LSAs the two fields – ‘Link State ID’ and ‘Advertising Router’ – must have the exact same value. However, the OSPF spec does not specify a check to verify this equality on Router LSA reception.

Unlike several other IETF standards, the OSPF spec is very detailed, leaving little room for any ambiguity in interpreting and implementing the standard. This is usually good as it results in interoperable implementations where everybody does the right thing. The flip side however is that since everybody follows the spec to the tittle, a potential bug or an omission in the standard, would very likely affect several vendor implementations.

This attack exploits a potential omission (or a bug if you will) in the standard where it does not mandate that the receiving router verifies that the Link State ID and the Advertising Router fields in the Router LSA are the exact same value.

This attack sends malacious Router LSAs with two different values in the LS header. The Link State ID carries the Router ID of the router that is being attacked (the victim) and the Advertising Router is set to some different (any) value.

When the victim receives the malacious Router LSA, it does not refresh this LSA as it doesnt recognize this as its own self generated LSA. This is because the OSPF spec clearly says in Sec 13.4 that “A self-originated LSA is detected when either 1) The LSA’s Advertising Router is equal to the router’s own Router ID or 2) the LSA is a network LSA .. “.

This means that OSPF’s natural fight back mechanism is NOT triggered by the victim router as long as the field ‘Advertising Router’ of a LSA is NOT equal to the victim’s Router ID. This is true even if the ‘Link State ID’ of that LSA is equal to the victim’s Router ID. Going further it means no LSA refresh is triggered even if the malacious LSA claims to describe the links of the victim router!

When this LSA is flooded all the routers accept and install this LSA in their LS database. This exists along side the valid LSA originated by the victim router. Thus each router in the network now has two Router LSAs for the victim router – the first that was genuinely originated by the victim router and the second that has been inserted by the attacker.

When computing the shortest path first algorithm, the OSPF spec in Sec 16.1 requires implementations to pick up the LSA from the LS DB by doing a lookup “based on the Vertex ID“. The Vertex ID refers to the Link State ID field in the Router LSAs. This means that when computing SPF, routers only identify the LSAs based on their Link State ID. This creates an ambiguity on which LSA will be picked up from the LS database. Will it be the genuine one originated by the victim router or will it be the malacious LSA injected by the attacker? The answer depends on how the data structures for LS DB lookup have been implemented in the vendor’s routers. Ones that pick up the wrong LSA will be susceptible to the attack. The ones that dont, would be oblivious to the malacious LSA sitting in their LSA DBs.

Most router implementations are vulnerable to this attack since nobody expects the scenario where multiple LSAs with the same Link State ID will exist in the LS DB. It turns out that at least 3 major router vendors (Cisco, Juniper and Alcatel-Lucent) have already released advisories and announced fixes/patches that fixes this issue. The fix for 7210 would be out soon ..

Once again, the attacker does not need to have an OSPF adjacency to inject the forged LSAs.

Doing this is not as difficult as we might think it is. There is no need for the attacker to access the LS DB sequence number – all it needs to do is to send an LSA with a reasonably high sequence number, say something like MAX_SEQUENCE – 1 to get this LSA accepted.

The attack can also be performed without complete information about the OSPF topology. But, this is highly dependent on the attack scenario and what piece of false information the attacker wishes to advertise on behalf of the victim. For example, if the attacker wishes to disconnect the victim router from the OSPF topology then merely sending an empty LSA without knowing the OSPF topology in advance would also work. In the worst case, the attacker can also get partial information on the OSPF topology by using trace routes, etc. This way the attacker can construct LSAs that look very close to what has been originally advertised by the victim router, making it all the more difficult to suspect that such LSAs exist in the network.

Its time we retire Authentication Header (AH) from the IPsec Suite!

Folks who think Authentication Header (AH) is a manna from heavens need to read the Bible again. Thankfully you dont find too many such folks these days. But there are still some who thank Him everyday for blessing their lives with AH. I dread getting stuck with such people in the elevators — actually, i dont think i would like getting stuck with anybody in an elevator, but these are definitely the worst kind to get stuck with.

So lets start from the beginning.

IPsec, for reasons that nobody cares to remember now, decided to come out with two protocols – Encapsulating Security Payload (ESP) and AH, as part of the core architecture. ESP did pretty much what AH did, with the addition of providing encryption services. While both provided data integrity protection, AH went a step further and also secured a few fields from the IP header for you.

There are bigots, and i unfortunately met one a few days ago, who like to argue that AH provides greater security than ESP since AH covers the IP header as well. They parrot this since that’s what most textbooks and wannabe CCIE blogs and websites say. Lets see if securing the IP header really helps us.

When IPsec successfully authenticates the payload, we know that the packet came from someone who knew the authentication key. I would wager that that should be enough to accept the packet. The IP header is just required to route the packet to reach the recipient – its not meant to do anything else. Thats networking 101 really.

IPsec Security Associations are established based on the source and destination addresses and some L4 port information. The receiver matches the incoming packet’s against SPI and inbound selectors associated with the SA. Packet is only accepted if it came from the correct source and destination IP address. If an attacker somehow manages to change the IP header then there are high chances that it will get rejected by IPsec since it will fail the Security Policy Database (SPD) check.  So, what is protecting the header really giving us?

BTW ESP can also protect the IP header if its used in the tunnel mode. So, if someone is really keen on protecting the IP header then ESP in the tunnel mode can also be used. It should however be noted that ESP tunnel mode SA applied to an, say IPv6 flow,  results in at least 50 bytes of additional overhead per packet. This additional overhead may be undesirable for many bandwidth-constrained wireless and/or satellite communications networks, as these types of infrastructure are not over provisioned.

Packet overhead is particularly significant for traffic profiles characterized by small packet payloads (e.g., various voice codecs). If these small packets are afforded the security services of an IPsec tunnel mode SA, the amount of per-packet overhead is increased.

This issue will be alleviated by header compression schemes defined in the IETF.

I have recently published an IETF draft where i explicitly ask for AH to be retired since there is nothing useful that it does that cant be achieved with ESP with NULL encryption algorithm.

Please note that i have absolutely no complaints with AH and the claims that it makes. It does its job really well. Its just that its completely redundant and the world can certainly do with one less protocol to manage.

Retiring AH doesn’t mean that people have to stop using AH right now. It only means that in the opinion of the community there are now better alternatives. This will discourage new applications and protocols to mandate the use of AH. It however, does not preclude the possibility of new work to IETF that will require or enhance AH. It just means that the authors will have to do a real good job of convincing the community on why that solution is really needed and the reason why ESP with NULL encryption algorithm cannot be used instead.

The IETF draft that i have written aims to dispel several myths  surrounding AH and i show that in each case ESP with NULL encryption algorithm can be used instead, often with better results.

Life of Crypto Keys employed in Routing Protocols

Everyone knows that the cryptographic key used for securing your favorite protocol (OSPF, IS-IS, BGP TCP-AO, PIM-SM, BFD, etc)  must have a limited life time and the keys must be changed frequently. However, most people don’t understand the real reason for doing so. They argue that keys must be regularly changed since they are vulnerable to cryptanalysis attacks. Each time a crypto key is employed it generates a cipher text. In case of routing protocols the cipher text is the authentication data that is carried by the protocol packets. Its alleged that using the same key repetitively allows an attacker to build up a store of cipher texts which can prove sufficient for a successful cryptanalysis of the key value. It is also believed that if a routing protocol is transmitting packets at a high rate then the “long life” may be in order of a few hours. Thus it’s the amount of traffic that has been put on the wire using a specific key for authentication and not necessarily the duration for which the key has been in use that determines how long the key should be employed.

This was true in the Jurassic ages but not any more. The number of times a key can be used is  dependent upon the properties of the cryptographic mode than the algorithms themselves. In a cipher block chaining mode, with a b-bit block, one can safely encrypt to around 2^(b/2) blocks. AES (Advanced Encryption Standard)  used worldwide has a fixed block size of 128, which means that it can be safely used for 2^(64+4) bytes of routing data. If we assume a protocol that sends 1 Gig (!!) worth of control traffic *every* second, even then it is safe enough to be used for around 8700 *years* without changing the key! Hopefully, the system admin will remember to change the crypto key after 8700 years! 😉

So, if the data is secure then why do we really need to change the crypto keys ever?

As a general rule, where strong cryptography is employed, physical, procedural, and logical access protection considerations often have more impact on the key life than do algorithm and key size factors. People need to change the keys when an operator who had access to the keys leaves the company. Using a key chain, a set of keys derived from the same keying material and used one after the other, also does not help as one still has to change all the keys in the key chain when an operator having access to all those keys leaves the company. Additionally, key chains will not help if the routing transport subsystem does not support rolling over to the new keys without bouncing the routing sessions and adjacencies.

Another threat against a long-lived key is that one of the systems storing the key, or one of the users entrusted with the key, could be subverted. So, while there may not be cryptographic motivations of changing the keys, there could be system security motivations for rolling or changing the key.

What complicates this further is that more frequent manual key changes might actually increase the risk of exposure as it is during the time that the keys are being changed that they are likely to be disclosed! In these cases, especially when very strong cryptography is employed, it may be more prudent to have fewer, well controlled manual key distributions rather than more frequent, poorly controlled manual key distributions.

To summarize, operators need to change their crypto keys because of social and political, rather than scientific or engineering driven reasons.

You can read more about this in the IETF draft that i have co-authored here.

So what are inter-session Replay attacks?

Inter-session replay attacks are extremely hard to fix and most IETF routing and signaling protocols are vulnerable to them. Lets first understand what an inter-session replay attack is before we delve deeper into how we can fix them.

A reply attack is a type of an attack where the attacker captures the packets exchanged between two routers and later retransmits, or “replays”, this same packet back to the routers and thereby deceiving them into believing that this is a legitimate packet sent by their remote neighbor. Lets see how this will work:

Assume router A is sending an integrity protected (via some authentication mechanism) protocol packet to router B. The attacker can record the packet that A is sending. The attacker now waits for some time and retransmits this packet without any modification, back to B. B upon receiving this packet will as usual first try to verify the contents for any tampering. It will do this by verifying the authentication digest (usually Keyed-MD5 or HMAC-SHA) that the packet carries. Since the attacker has not modified the packet it will pass the integrity check as long as the key exchanged between the two routers remains unchanged. The integrity checks will pass on Router B and it will accept this packet as a legitimate packet sent by A.

This is a replay attack – So, how can it harm you?

Assume A was not advertising any route, or any neighbor reachability when the attacker had recorded this control packet.  In OSPF parlance, this could be a Hello without any neighbors or a RIPv2 packet without any routing information. Later when A learns some routes or neighbors it sends an updated protocol packet listing this information. B receives this packet and updates its protocol state and routing tables based on the information that A provides. Now the attacker replays the earlier recorded packet. B, upon receiving this “new” packet believes this to have come from A and updates its routing tables accordingly. This is incorrect as B will now update its forwarding tables based on stale information. If the replayed packet is an old OSPF Hello when A did not have any neighbors, B will, upon receiving this packet assume that A has now lost all its neighbors and will delete all routes via A. I had co-authored RFC 6039 some time back which describes many such replay attacks in great detail.

So, how do IETF protocols protect themselves from such attacks?

Most protocols packets carry a Cryptographic Sequence Number that increases as each packet is sent. The receivers only accept a packet if it carries a sequence number that is higher than what it had received earlier from the same neighbor.

This fixes the problem that i had described earlier as the replayed packet will carry a sequence number that will be lower than what B would have last heard from A. B, upon receiving this replayed packet will not accept it and would thus prevent itself from such replay attacks.  Its appears that we have a solution against all replay attacks – do we?

Well it turns out that the answer to this question is a big NO!

The cryptographic sequence number can protect us from what i call the intra-session replay attacks. However, it cannot protect us against inter-session replay attacks. Let see why?

Assume that the cryptographic sequence number currently being used by router A for some specific routing protocol is 1000. This means that B will not accept any protocol packet if it comes with a sequence number less than 1000. This is fine, and this will protect us against some attacks. Now assume that the attacker captures and records this packet with sequence number 1000. No one will know about this as the attacker has silently recorded this packet.

Now the attacker has to wait patiently till the current session between the Router A and Router B goes down and a new one is established. This can happen if one of the routers reboots (could be planned or unplanned). When this happens the routers reset their cryptographic sequence number to 0 and start all over again. If the password key  between the two routers has not changed, and it usually doesn’t, then the packet that the attacker has captured is carrying a valid cryptographic digest. The attacker can replay this packet any time and this will get accepted by B if the current sequence number that its seeing in the new session from A is less than 1000. This is an inter-session replay attack and is extremely difficult to fix with the current IETF security and authentication mechanisms. Note that a trivial way to protect against inter-session replay attacks is by changing the key each time a new session is established. However changing the key requires manual intervention and thus cannot be easily done all the time.

So, how do you fix this issue?

Sam Hartman (Huawei), Dacheng (Huawei) and I have submitted two proposals in the IETF to fix this inter-session replay attacks that i have described above.

The first is extremely simple.

We propose to change the current cryptographic sequence number space from 32 bits to 64.  The least significant 32 bits would be the usual cryptographic sequence number that will monotonically increase with each fresh packet transmitted. The most significant 32 bits would indicate the number of times this router has cold booted. Thus when the router initially comes up for the first time its value would be 0. Next time when it reboots and comes up, its value would be 1.

Consider a state when the router has cold booted “n” times and its current cryptographic sequence number is “m”. The aggregated cryptographic sequence number that will be used by the routing protocols would be:

m << 32 || n, where << is the left shift operator and || is the bit-wise OR operation.

Now this router reboots (again planned or unplanned).

Now its cryptographic sequence space starts from:

(m+1) << 32

Its trivial to see that the ((m+1) << 32) > ((m << 32) || n) for all values of m and n where each m and n > 0.

This mechanism will solve the inter-session replay attacks that have been described above. I will describe the second method in some other post. We have defined a generic mechanism that all protocols can use here in this KARP draft.

Catching Corrupted OSPF Packets!

I was having a discussion with Paul Jakma (a friend, co-author in a few IETF drafts, a routing protocols expert, the guy behind Quagga, the list just goes on ..) some time back on a weird problem that he came across at a customer network where the OSPF packets were being corrupted in between being read off the wire and having CRC and IP checksum verified and being delivered to OSPF stack. While the problem was repeatable within 30 minutes on that particular network, he could never reproduce it on his VM network (and neither could the folks who reported this problem).

Eventually, for some inexplicable reason, he asked them to turn on MD5 authentication (with a tweak to drop duplicate sequence number packets – duplicate packets as the trigger of the problems being a theory). With this, their problems changed from “weird” to “adjacencies just start dropping, with lots of log messages about MD5 failures”!

So it appears that the customer had some kind of corruption bug in custom parts of their network stack, on input, such that OSPF gets handed a good long sequence of corrupt packets – all of which  (we dont know how many) seem to pass the internet checksum and then cause very odd problems for OSPF.

So, is this a realistic scenario and can this actually happen? While i have personally never experienced this, there are chances of this happening because of any of the following reasons:

o PCI transmission error (PCI parallel had parity checks, but not always enabled, PCI express has a 32bit CRC though)

o memory bus error (though, all routers and hosts should use ECC RAM)

o bad memory (same)

o bad cache (CPUs don’t always ECC their caches – Sun its seems was badly bitten by this; While the last few generations of Intel and AMD CPU do this, what about all those embedded CPUs that we use in the routers?)

o logic errors, particularly in network hardware drivers

o finally, CRCs and the internet checksum are not very good and its not impossible for wire-corrupted packets to get past link, IP AND OSPF CRC/checksums.

The internet checksum, which is used for the OSPF packet checksum, is incredibly weak. There are various papers out there, particularly ones by Partridge (who helped author the internet checksum RFC!) which cover this, basically it offers very little protection:

– it can’t detect re-ordering of 2-byte aligned words
– it can’t detect various bit flips that keep the 1s complement sum the same (e.g. 0x0000 to 0xffff and vice versa)

Even the link-layer CRC also is not perfect, and Partridge has co-authored papers detailling how corrupted packets can even get past both CRC and internet checksum.

So what choice do the operators have for catching corrupted packets in the SW?

Well, they could either use the incredibly poor internet checksum that exists today or they could turn on cryptographic authentication (keyed MD5 with RFC 2328 or different HMAC-SHA variants with RFC 5709) and catch all such failures. The former would not always work as there are errors that can creep in with these algorithms. The latter would work but there are certain disadvantages  is using cryptographic HMACs purely for integrity checking. The algorithms require more computation, which may be noticable on less powerful and/or energy-sensitive platforms. Additionally, the need to configure and synchronize the keying material is an additional administrative burden. I had posted a survey on Nanog some time back where i had asked the operators if they had ever turned on crypto protection to detect such failures and i received a couple of responses offline where they alluded to doing this to prevent checksum failures.

Paul and I wrote a short IETF draft some time back where we propose to change the checksum algorithm used for verifying OSPFv2 and OSPFv3 packets. We would only like to upgrade the very weak packet checksum with something thats more stronger without having to go with the full crypto hash protection way. You can find all the gory details here!

Can we solve the inter-session replay attacks?

Most routing (OSPF, BFD, RIP, OSPFv3-AT, etc) and signalling (LDP, RSVP, etc) protocols defined by IETF have a cryptographic sequence number within the authentication data that increases monotonically with each new packet that the router originates. This protects the protocol from replay attacks as the receivers now keep track of the sequence numbers and ignore all packets that arrive with a number thats lower than the currently active one.

At worst, the attacker can keep replaying the last packet that was originated since most protocols accept packets with sequence number greater than or equal to what they had last received. This in my opinion is a hole that can be trivially plugged by mandating that protocols must only accept protocol packets if they come with a sequence number thats greater than what they have received till now.

So does this solve all replay attacks problem?

No, not really.

Imagine an attacker who captures a protocol packet when the cryptographic sequence number is say, 1000.  Now the next time this router cold boots it will reinitialize its sequence space to 1 and start sending packets from this value. The attacker can now replay the earlier captured packet – the one with the sequence number 1000. The receivers will accept the replayed packet since it comes with a sequence number thats higher than what they were currently seeing from the router. This is a vulnerability that most IETF protocols are susceptible to. This is not an issue with protocols that use an automated key management protocol (like IKEv2) as all the security parameters are renegotiated when a session bounces. However, most routing and signalling protocols DONT use an automated key management protocol and are thus exposed to this risk.

I call this as an inter-session replay attack where packets from the previous/stale sessions can be replayed. So, do we have a solution to this problem?

Well, there are a couple of things that we could do here. The most obvious solution is to update the last cryptographic sequence number in the non volatile memory of the router. Thus we update the memory each time we increment the sequence number. This can be read when the router cold boots and it can start using sequence numbers from this value. The problem with this solution is that this will involve frequent writes to the non volatile memory on the routers which is not recommended because of the limited life of such media.

The other solution is to use the clock (number of seconds elapsed since midnight UTC  January 1 1970) as the sequence numbers. In theory this time will always advance and we will thus never have a router issuing sequence numbers that will ever go back. This would ideally also work when the router reboots as the time would only have advanced. The problem with this solution is that we end up relying on NTP or 1588 and an assumption that clocks on a router will NEVER go back. This is unrealistic and cannot be the basis of a security system defined for any protocol. Its fragile and can be broken.

So what are we left with?

Sam Hartman, Dacheng Zhang and I start looking at this problem for OSPF and have written an IETF draft that we think addresses this problem. It associates two scalars with a router – the Session ID and the Nonce, and uses these in combination with the cryptographic sequence numbers to protect OSPF routers against inter and intra replay attacks. The mechanism described in this draft can be easily generalized and extended to other routing and signalling protocols.

This is currently being discussed actively in the OSPF WG and the KARP WG mailing lists. I will in some other post explain how the concept of Nonce and Session ID helps in solving the inter-reply attacks which is the key problem that needs to be solved.