Securing BFD now possible!

Confession Time.

I am guilty of committing several sins. One that egregiously stands out is writing two IETF specs for BFD security (here and here) without considering the impact on the routers and switches implementing those specs. Bear in mind that Bi-directional Forwarding Detection (BFD) is a hard protocol to implement well. Its hard to get into a conversation with engineers working on BFD without a few of them shedding copious quantities of tears on what it took them to avoid those dreaded BFD flaps in scaled setups. They will tell you how they resorted to clever tricks (hacks, if you will) to process BFD packets as fast as they could (plucking them out of order from a shared queue, dedicated tasks picking up BFD packets in the ISR contexts, etc) . In a candid conversation, an ex-employee of a reputed vendor revealed how they stage managed their BFD during a demo to a major customer since they didnt want their BFD to flap while the show (completely scripted) was on. So, long story short — BFD is hard when you start scaling. It just becomes a LOT worse, when you add security on top of it.

The reason BFD is hard is because of the high rate at which packets are sent and consumed. If you miss out a few packets from your neighbor you consider him dead and you bring down your routing adjacency with that neighbor. This causes a lot of bad things (micro-loops, traffic storms, angry phone calls) to happen , least of which trust me, is rerouting the traffic around the “affected” node.

The cost of losing BFD packets is very high — so you really want to keep the packet processing minimal, the protocol lean, which is why folks in the BFD WG get a migraine whenever an enthusiastic (though noble) soul mentions a TLV extension to BFD or (even worse) a BFD v2.

Now when you add security, things become a waaaaaaaaaaaaay more complex. Not only do you need to process the packets at a high rate, you also need to compute the SHA or the MD5 digest for each one of those. This becomes difficult when the sessions scale even with hardware assist for BFD. The current BFD specification for security mandates the digest to be computed for each packet that is sent (you could do something clever with the non-meticulous mode and we’ll talk about it some other day) so the spec is really useless as there is no vendor who can do that at the rate at which BFD packets need to be processed.

This also explains why the BFD specs have not moved further down on the standards track — or simply why they arent RFCs yet.

But there is a need to enhance BFD security, since thats currently the weakest link in the service provider network security. The routing and the signalling protocols have all been enhanced to support stronger cryptographic algorithms and BFD is the only protocol left thats still running without any authentication (!!!) . I hope this doesnt inspire hackers all around the world to break into the Verizons, the Comcasts and the Tatas. Well, if somebody does, then please pass me a pointer so that i can increase my bandwidth to get all those Pink Floyd bootlegs that i have been scavenging for.

So now, we need to secure BFD and we are stuck with a proposal that cant be used. Kind of cute, if youre not responsible for running a network.

The solution to this routing quagmire is however quite simple. I dribbled coffee all over my shirt when i thought of it the first time — checked if I wasnt missing out something obvious and when i was sure that it would hold ground, i pinged one of my co-authors who happened to be thinking on similar lines, and we quickly came up with a draft (after more than a year).

What we’re essentially proposing is this:

Most BFD packets are ping-pong packets carrying same information as was carried in the earlier packets — the payload doesnt change at all (used by most vendors to optimize their implementation — HINT use caching). Only when the state changes, that is, the BFD sessions go Up or Down, or a parameter changes (rarely), does the payload change. Use authentication only when the payload changes and never otherwise. This means that in most cases the packets will be sent in clear-text which can be easily handled as is done today. Only when the state changes, the digest needs to be computed which we know from our extensive experience is a relatively low occurrence event.

This proposal makes it very easy for the vendors to support BFD security, something which folks have been wishing for since long. You can get all the sordid details of our proposal here.

This is the first iteration of the draft and things will change as we move forward. While the current version suggests no changes to the existing BFD protocol, we might going ahead suggest a few changes to the state machine if that’s what it takes to make the protocol secure. Who said securing BFD was simple ? Its perhaps for this reason that the IETF community still hasnt proposed a solid mechanism for stronger authentication of BFD packets.

You can follow the discussion on the BFD WG mailing list or keep looking at this space for more updates.

Is this draft a reparation for the sins i had mentioned at the beginning of my post earlier?

Catching Corrupted OSPF Packets!

I was having a discussion with Paul Jakma (a friend, co-author in a few IETF drafts, a routing protocols expert, the guy behind Quagga, the list just goes on ..) some time back on a weird problem that he came across at a customer network where the OSPF packets were being corrupted in between being read off the wire and having CRC and IP checksum verified and being delivered to OSPF stack. While the problem was repeatable within 30 minutes on that particular network, he could never reproduce it on his VM network (and neither could the folks who reported this problem).

Eventually, for some inexplicable reason, he asked them to turn on MD5 authentication (with a tweak to drop duplicate sequence number packets – duplicate packets as the trigger of the problems being a theory). With this, their problems changed from “weird” to “adjacencies just start dropping, with lots of log messages about MD5 failures”!

So it appears that the customer had some kind of corruption bug in custom parts of their network stack, on input, such that OSPF gets handed a good long sequence of corrupt packets – all of which (we dont know how many) seem to pass the internet checksum and then cause very odd problems for OSPF.

So, is this a realistic scenario and can this actually happen? While i have personally never experienced this, there are chances of this happening because of any of the following reasons:

o PCI transmission error (PCI parallel had parity checks, but not always enabled, PCI express has a 32bit CRC though)

o memory bus error (though, all routers and hosts should use ECC RAM)

o bad memory (same)

o bad cache (CPUs don’t always ECC their caches – Sun its seems was badly bitten by this; While the last few generations of Intel and AMD CPU do this, what about all those embedded CPUs that we use in the routers?)

o logic errors, particularly in network hardware drivers

o finally, CRCs and the internet checksum are not very good and its not impossible for wire-corrupted packets to get past link, IP AND OSPF CRC/checksums.

The internet checksum, which is used for the OSPF packet checksum, is incredibly weak. There are various papers out there, particularly ones by Partridge (who helped author the internet checksum RFC!) which cover this, basically it offers very little protection:

– it can’t detect re-ordering of 2-byte aligned words
– it can’t detect various bit flips that keep the 1s complement sum the same (e.g. 0x0000 to 0xffff and vice versa)

Even the link-layer CRC also is not perfect, and Partridge has co-authored papers detailling how corrupted packets can even get past both CRC and internet checksum.

So what choice do the operators have for catching corrupted packets in the SW?

Well, they could either use the incredibly poor internet checksum that exists today or they could turn on cryptographic authentication (keyed MD5 with RFC 2328 or different HMAC-SHA variants with RFC 5709) and catch all such failures. The former would not always work as there are errors that can creep in with these algorithms. The latter would work but there are certain disadvantages is using cryptographic HMACs purely for integrity checking. The algorithms require more computation, which may be noticable on less powerful and/or energy-sensitive platforms. Additionally, the need to configure and synchronize the keying material is an additional administrative burden. I had posted a survey on Nanog some time back where i had asked the operators if they had ever turned on crypto protection to detect such failures and i received a couple of responses offline where they alluded to doing this to prevent checksum failures.

Paul and I wrote a short IETF draft some time back where we propose to change the checksum algorithm used for verifying OSPFv2 and OSPFv3 packets. We would only like to upgrade the very weak packet checksum with something thats more stronger without having to go with the full crypto hash protection way. You can find all the gory details here!

BFD Generic Cryptographic Authentication

Bidirectional Forwarding Detection (BFD) specification includes five different types of authentication schemes: Simple Password, Keyed Message Digest 5 (MD5), Meticulous Keyed MD5, Keyed Secure Hash Algorithm (SHA-1) and Meticulous SHA-1. In the simple password scheme of authentication, the passwords are exchanged in the clear text on the network and anyone with physical access to the network can learn the password and compromise the security of the BFD domain.

It was discovered that collisions can be found in MD5 algorithm in less than 24 hours, making MD5 insecure. Further research has verified this result and shown other ways to find collisions in MD5 hashes.

It should however be noted that these attacks may not necessarily result in direct vulnerabilities in Keyed-MD5 as used in BFD authentication purposes, because the colliding message may not necessarily be a syntactically correct protocol packet. However, there is a need felt to move away from MD5 towards more complex and difficult to break hash algorithms.

In Keyed SHA-1 and Meticulous SHA-1, the BFD routers share a secret key which is used to generate a keyed SHA-1 digest for each packet and a monotonically increasing sequence number scheme is used to prevent replay attacks.

Like MD5 there have been reports of attacks on SHA-1. Such attacks do not mean that all the protocols using SHA-1 for authentication are at risk. However, it does mean that SHA-1 should be replaced as soon as possible and should not be used for new applications.

However, if SHA-1 is used in the Hashed Message Authentication Code (HMAC) construction then collision attacks currently known against SHA-1 do not apply. The new attacks on SHA-1 have no impact on the security of HMAC-SHA-1.

I have written an IETF document that proposes two new authentication types – the cryptographic authentication and the meticulous cryptographic authentication . These can be used to specify any authentication algorithm for authenticating and verifying the BFD packets (aka key agility). In addition to this, this memo also explains how HMAC-SHA authentication can be used for BFD.

HMAC can be used, without modifying any hash function, for calculating and verifying the message authentication values. It verifies both the data integrity and the authenticity of a message.

By definition, HMAC requires a cryptographic hash function. We propose to use any one of SHA-1, SHA-256, SHA-384 and SHA-512 for this purpose to authenticate the BFD packets.

I recently co-authored an IETF draft that does BFD’s security and authentication mechanism’s gap analysis for the KARP WG – that draft can be found here.

Issues with existing Cryptographic Protection Methods for Routing Protocols

Most of us believe that using cryptographic authentication methods (MD5, etc) for the routing protocols running inside our networks really makes them very secure. Well, not really ..

We have published RFC 6039 that explains how each routing protocol can be exploited despite using the cryptographic authentication mechanisms endorsed by the IETF community.

To cite an example, a simple IP header attack on OSPF or RIP can result in the two adjacent routers bringing down the peering relationship between them. This can, in the worst case, blackhole a substantial amount of data traffic inside the network, something that will certainly not go well with the customers!

So how can an OSPF adjacency be brought down?

OSPF neighbors on the broadcast, NBMA and point-to-multipoint networks are identified by the IP address in the IP header. Because the IP header is not covered by the MAC in the cryptographic authentication scheme as described in RFC 2328, an attack can be made exploiting this vulnerability.

R1 sends an authenticated HELLO to R2. This HELLO is captured and replayed back to R1, changing the source IP in the IP header to that of R2.

R1 not finding itself in HELLO would deduce that the connection is not bidirectional and would bring down the adjacency!

The RFC also discusses some issues that we found with Bidirectional Forwarding Detection (BFD) protocol thats very frequently used in the service provider networks.