Securing BFD now possible!

Confession Time.

I am guilty of committing several sins. One that egregiously stands out is writing two IETF specs for BFD security (here and here) without considering the impact on the routers and switches implementing those specs. Bear in mind that Bi-directional Forwarding Detection (BFD) is a hard protocol to implement well. Its hard to get into a conversation with engineers working on BFD without a few of them shedding copious quantities of tears on what it took them to avoid those dreaded BFD flaps in scaled setups. They will tell you how they resorted to clever tricks (hacks, if you will) to process BFD packets as fast as they could (plucking them out of order from a shared queue, dedicated tasks picking up BFD packets in the ISR contexts, etc) . In a candid conversation, an ex-employee of a reputed vendor revealed how they stage managed their BFD during a demo to a major customer since they didnt want their BFD to flap while the show (completely scripted) was on. So, long story short — BFD is hard when you start scaling. It just becomes a LOT worse, when you add security on top of it.

The reason BFD is hard is because of the high rate at which packets are sent and consumed. If you miss out a few packets from your neighbor you consider him dead and you bring down your routing adjacency with that neighbor. This causes a lot of bad things (micro-loops, traffic storms, angry phone calls) to happen , least of which trust me, is rerouting the traffic around the “affected” node.

When BFD flaps

The cost of losing BFD packets is very high — so you really want to keep the packet processing minimal, the protocol lean, which is why folks in the BFD WG get a migraine whenever an enthusiastic (though noble) soul mentions a TLV extension to BFD or (even worse) a BFD v2.

Now when you add security, things become a waaaaaaaaaaaaay more complex. Not only do you need to process the packets at a high rate, you also need to compute the SHA or the MD5 digest for each one of those. This becomes difficult when the sessions scale even with hardware assist for BFD.  The current BFD specification for security mandates the digest to be computed for each packet that is sent (you could do something clever with the non-meticulous mode and we’ll talk about it some other day) so the spec is really useless as there is no vendor who can do that at the rate at which BFD packets need to be processed.

This also explains why the BFD specs have not moved further down on the standards track — or simply why they arent RFCs yet.

But there is a need to enhance BFD security, since thats currently the weakest link in the service provider network security. The routing and the signalling protocols have all been enhanced to support stronger cryptographic algorithms and BFD is the only protocol left thats still running without any authentication  (!!!) . I hope this doesnt inspire hackers all around the world to break into the Verizons, the Comcasts and the Tatas. Well, if somebody does, then please pass me a pointer so that i can increase my bandwidth to get all those Pink Floyd bootlegs that i have been scavenging for.

So now, we need  to secure BFD and we are stuck with a proposal that cant be used. Kind of cute, if youre not responsible for running a network.

One way to crack BFD security

The solution to this routing quagmire is however quite simple. I dribbled coffee all over my shirt when i thought of it the first time — checked if I wasnt missing out something obvious and when i was sure that it would hold ground, i pinged one of my co-authors who happened to be thinking on similar lines, and we quickly came up with a draft (after more than a year).

What we’re essentially proposing is this:

Most BFD packets are ping-pong packets carrying same information as was carried in the earlier packets — the payload doesnt change at all (used by most vendors to optimize their implementation — HINT use caching). Only when the state changes, that is, the BFD sessions go Up or Down, or a parameter changes (rarely), does the payload change. Use authentication only when the payload changes and never otherwise. This means that in most cases the packets will be sent in clear-text which can be easily handled as is done today. Only when the state changes, the digest needs to be computed which we know from our extensive experience is a relatively low occurrence event.

This proposal makes it very easy for the vendors to support BFD security, something which folks have been wishing for since long. You can get all the sordid details of our proposal here.

This is the first iteration of the draft and things will change as we move forward. While the current version suggests no changes to the existing BFD protocol, we might going ahead suggest a few changes to the state machine if  that’s what it takes to make the protocol secure. Who said securing BFD was simple ? Its perhaps for this reason that the IETF community still hasnt proposed a solid mechanism for stronger authentication of BFD packets.

You can follow the discussion on the BFD WG mailing list or keep looking at this space for more updates.

Is this draft a reparation for the sins i had mentioned at the beginning of my post earlier?


BFD in the new Avatar


BFDWe all love Bi-directional Forwarding Detection (BFD) and cant possibly imagine our lives without it. We love it so much that we were ready with sabers and daggers drawn when we approached IEEE to let BFD control the individual links inside a LAG — something thats traditionally done by LACP.

Having done that, you would imagine that people would have settled down for a while (after their small victory dance of course) — but no, not the folks in the BFD WG. We are now working on a new enhancement that really takes BFD to the next level.

There isnt anything egregiously wrong or missing per se in BFD today. Its just not very optimal in certain scenarios and we’re trying to plug those holes (and doing our bit to ensure that folks in data comm industry have ample work and remain perennially employed).

Ok, lets not be modest – there are some scenarios where it doesnt work (as we shall see).

So what are we fixing here?

Slow Start

Well for one, BFD takes awfully looooong to bring up the session. Remember BFD starts with sedate timers and then slowly picks up (each side needs to come to an agreement on the rate at which they will send packets) . So it takes a while before BFD can really be used for path/end node liveliness detection. If BFD is being used to validate an MPLS path/LSP then it will take a few additional seconds for BFD to come up because of the LSP ping bootstrapping procedures (RFC 5884).

In certain deployments, this delay is bad and we want to eliminate it. It is expected that some MPLS deployments would require traffic engineered LSPs to be created dynamically, driven by external applications as in Software Defined Networks (SDN). It is operationally critical to ensure that the forwarding paths are up (via BFD) before the applications start utilizing the newly created tunnels. We cant hence wait for BFD to take its time in coming up since the applications are ready to push data down the tunnels. So, something needs to be done to get BFD to come up FAST!

This is an issue in SDN domains where a centralized controller is managing and maintaining the dynamic network. Since the tunnels are being engineered by this centralized entity we want to be really sure that the new tunnel is up before sending traffic down that path. In the absence of additional control protocols (eg. RSVP) we might want to use BFD to ensure that the path is up before using it. Current BFD, with large set up times, can become a bottle neck. If the centralized controller can quickly verify the forwarding path, it can steer the traffic to the traffic engineered tunnel very quickly without adversely affecting the service.

The problem exacerbates as the scale of the network and the number of traffic engineered tunnels increase.

Unidirectional Forwarding Path Validation

The “B” in BFD, stands for “Bi-directional” (in case you missed that). The protocol was originally defined to verify bidirectional connectivity between two nodes. This means that when you run BFD between routers A and B, then both A and B come to know when either router goes down (or when something nasty happens to the link). However, there are many scenarios where only one of the routers is interested in verifying the data plane continuity between the two nodes (e.g., static route using BFD to validate reachability to the next-hop router OR a Unidirectional tunnel using BFD to validate reachability to the egress node). In such cases, validating the reverse direction is not required.

However, traditional BFD requires the other side to maintain the entire BFD state even if its not interested in the liveliness of the remote end.  So if you have “n” routers using a particular gateway, then the gateway has to maintain “n” BFD sessions with all its clients. This is not required and can easily be done away with.

Anycast Addresses

Anycast addressing is used for high availability, fast recovery, load balancing and dispersed deployments where the IGPs direct the traffic to the nearest server(s) within a group of potential servers, all sharing the same Anycast address. BFD as defined today is stateful, and hence cannot work with Anycast addresses.

With the growing need to use Anycast addresses for higher reliability (DNS, multicast, 6to4, etc) there is a need for a BFD variant that can work with Anycast addresses.

BFD Fault Isolation

BFD works in a binary state – it either tells you that the session is UP or its DOWN. In case of failures it doesnt help you identify and localize the fault. Using other tools to isolate the fault may not necessarily work as the OAM packets may not follow the exact same path as the BFD packets (e.g., when ECMP is employed).

There is hence a need for a BFD variant that has some capabilities that can help in fault isolation.

So, where does this lead to?

We have attempted to fix all the issues that i have described above in a new BFD variant that we call the “Seamless BFD” (S-BFD). Its stateless and the receiver (or the reflector) responds with an S-BFD response packet whenever it receives an S-BFD packet from the source. You can imagine this as a ping-pong game between the source and the destination routers. The source (or the client in S-BFD speak) wants to check if the path to the destination (or the Reflector in S-BFD speak) is UP or the reflector is UP and sends an S-BFD “ping” packet. The Reflector upon receiving this, responds with a S-BFD “Pong” packet.  The client upon receiving the “Pong” knows that the Reflector is alive and starts using the path.

Each Reflector selects a well known “Discriminator” that all the other devices in the network know about. This can be statically configured, or a routing protocol can be used to flood/distribute this information. We could use OSPF/IS-IS within an AS and BGP across the ASes. Any clinet that wants to send an S-BFD packet to this Reflector (or a server if it helps) sends the S-BFD packet with the peer’s Discriminator value.

A reflector receiving an S-BFD packet with its own Discriminator value responds with a S-BFD packet. It must NOT transmit any BFD packet based on a local timer expiration.

A router can also advertise more than one Discriminator value for others to use. In such cases it should accept all S-BFD packets addressed to any of those Discriminator values. Why would somebody do that?

You could, if you want to implement some sort of redundancy. A node could choose to terminate S-BFD packets with different Discriminator values on different line cards for load distribution (works for architectures where a BFD controller in HW resides on a line card). Two nodes can now have multiple S-BFD sessions between them (similar to micro-BFD sessions that we have defined for the LAG in RFC 7130) — where each terminates on a different line card (demuxed using different Discriminator values). The aggregate BFD session will  only go down when all the component S-BFD sessions go down. Hence the aggregate BFD session between the two nodes will remain alive as long as there at least one component S-BFD session alive. This is another use case that can be added to S-BFD btw!

This helps in the SDN environments where you want to verify the forwarding path before actually using it. With S-BFD you no longer need to wait for the session to come up. The centralized controller can quickly use S-BFD to determine if the path is up. If the originating node receives an S-BFD response from the destination then it knows that the end point is alive and this information can be passed to the controller.

Similarly applications in the SDN environments can quickly send a S-BFD packet to the destination. If they receive an S-BFD response then they know that the path can be used.

This also alleviates the issue of maintaining redundant BFD sesssion states on the servers since they only need to respond with S-BFD packets.

Authentication becomes a slight challenge since the reflector is not keeping track of the crypto sequence numbers (remember the point was to make it stateless!). However, this isnt an insurmountable problem and can be fixed.

For more sordid details refer to the IETF draft in the BFD WG which explains the Seamless BFD protocol and another one with the use-cases. I have not covered all use cases for Seamless BFD (S-BFD) and we have a few more described there in the use-case document.

Issues with how BFD is currently implemented over LAGs

The BFD standards dont explicitly talk about how BFD should be implemented on Link Aggregation Groups (LAGs). This leaves a lot of room for imagination and vendors have implemented their own proprietary mechanisms to make BFD work on LAGs. Now, there is only this much room for innovation and most vendors have naturally arrived at similar techniques to implement interoperable BFD over LAGs. So, what makes BFD so sticky to implement on LAGs?

BFD being an L3 protocol, is oblivious to the physical link that the BFD packets go out on. Usually, there is only one link associated with an L3 interface, and there is thus no ambiguity on the link that packet needs to go out on. However, when an IP interface is configured over a LAG, there are multiple constituent links that the packet can go out on, and BFD has to decide the link it wants to use for sending the packets out.

A LAG binds together several physical ports between two adjacent nodes so they appear to higher-layer protocols and applications as a single, higher bandwidth “virtual” pipe.

The problem with running BFD over a LAG is that without internal knowledge of the LAG structure it is impossible for BFD to guarantee a detection of anything but a full LAG shutdown within the BFD timeout period. The LAG shutdown is typically initiated by some LAG module. LAG timers are typically multiple times slower than the BFD detection timers (multiple 100ms vs. multiple 10ms of BFD). There is thus a need to bring some sort of determinism in how BFD runs over a LAG. There is also a need to detect member link failures much faster than what Link Aggregation Control Protocol (LACP) allows.

Lets look at what implementations currently do to implement BFD on LAGs.

The simplest approach to run BFD on a LAG interface is to ignore the internal structure and treat the LAG as one “big, virtual pipe”.

Because there is no standard, vendors have implemented their own proprietary mechanisms to run BFD over LAG interfaces. Two examples are shown here.

Some implementations send BFD packets only over the “primary” member link of the LAG. Others spray BFD packets over all member links of the LAG. There are issues with both these designs.

In the first design, BFD will remain Up as long as the primary link is alive. If the primary link goes down, and another link is not selected as the primary, before BFD times out (around 30-50ms), then the BFD session on the LAG comes crashing down. Problems arise as BFD, in this design, is oblivious to the presence of other member links in the LAG. If a non-primary link goes down, the BFD session remains unaffected as it can still send and receive BFD packets over the primary link. Since the BFD session is Up, other routers in the network continue sending traffic meant to egress out of this interface. As expected from the LAG, all traffic egressing out of this interface gets load distributed on all LAG member links. Now, there is one link thats down. All traffic sent over that failed link gets dropped, till the LAG manager detects this and removes it from the LAG.

In the second design, BFD packets are sprayed over all the member links of a LAG. This is done naively via round-robin, where each BFD packet is sent using the subsequent member link, in a round-robin fashion. It solves the problem of BFD going down because of the primary link going down, but it still does not solve the problem of traffic getting lost when one of the member link goes down. This is because, when a member link goes down, BFD remains up and traffic continues to go over the link that has failed till a higher layer protocol (usually LACP or the LAG Manager) detects this and removes the offending link from the LAG.

The above two designs defeat the core purpose of a BFD, which is to detect faults between the two forwarding engines. In each design traffic gets lost on a failed link till some protocol other than BFD detects this and removes that link from the LAG. The timers associated with the other protocol are an order of magnitude higher than BFD.

Operators have since long expressed a need to be able to detect the failed links fast so that their traffic doesnt get lost. The idea is to get BFD to take charge of the LAG and make it responsible for maintaining the list of active links in a LAG. This way we can use the BFD fast timers to quickly detect link failures.

One could argue that there are native Ethernet OAM mechanisms (.1ag, .3ah) that can be used to detect link failures in a LAG, and one need not rely on slow protocols like LACP or the LAG manager. The reality is that operators who have deployed BFD in their IP/MPLS networks want a common failure detection mechanism and dont want a mix of different technologies.

To solve the above mentioned issues I have co-authored an IETF document that proposes running BFD on each constituent link of the LAG. We call the BFD sessions running on each link a “micro BFD session”. We call this mode of BFD on LAGs as BLM – BFD on Lag Members.

BLM is an umbrella BFD session that contains information about the LAG (or the aggregated interface) that its running on. It consists of a set of micro BFD sessions that are running on each constituent link of the LAG. And it contains a state that we call the “Concluded State”, which describes the overall state of the LAG (Up, Down, AdminDown).

Each micro BFD session is a regular RFC 5880 and RFC 5881 compliant BFD session. Only Asynchronous mode is supported for the micro BFD sessions as the sole reason for running BFD on each member link is to verify the link connectivity. The Echo function for the micro BFD sessions is not recommended as it requires twice as many packets to achieve a particular Detection time as does the pure Asynchronous mode.

At least one system MUST take the Active role (possibly both). The micro BFD sessions on the member links are independent BFD sessions. They use their own unique, local discriminator values, maintain their own set of state variables and have their own independent state machine. Typically each micro BFD session will have the same timer values, however, nothing precludes the possibility of having different timer values among the different micro BFD sessions belonging to the same LAG.

A session begins with the periodic, slow transmission of BFD Control packets. When bidirectional communication is achieved, the BFD session becomes Up. The LAG manager is informed at this point, and the member link becomes an active link of the LAG.

If the micro session goes Down, the transmission of Control packets goes back to the slow rate. The LAG Manager is informed which removes the member link from the LAG.

Once a session has been declared Down, it cannot come back up until the remote end first signals that it is down (by leaving the Upstate), thus implementing a three-way handshake.

A session MAY be kept administratively down by entering the AdminDown state and sending an explanatory diagnostic code in the Diagnostic field.

In short, its pretty much the same as a standard BFD session.

This solves the issues that i had described in the earlier two designs. The micro BFD sessions will quickly detect a failed link, and will instantly remove it from the LAG. Traffic that was earlier egressing out over the failed link, will now get hashed to a different link in the LAG. This results in zero traffic loss on the LAG.

You can read more about our proposal here (more on how it evolved within IETF here).

Recognizing the need for running BFD on all member links, various vendors support their own proprietary, un-interoperable implementation of BFD over LAGs. We’re hoping that our IETF proposal to standardize this behavior will bring some order to the chaos thats out there and a relief to the providers who are stuck with proprietary solutions.

BFD Generic Cryptographic Authentication

Bidirectional Forwarding Detection (BFD) specification includes five different types of authentication schemes: Simple Password, Keyed Message Digest 5 (MD5), Meticulous Keyed MD5, Keyed Secure Hash Algorithm (SHA-1) and Meticulous SHA-1. In the simple password scheme of authentication, the passwords are exchanged in the clear text on the network and anyone with physical access to the network can learn the password and compromise the security of the BFD domain.

It was discovered that collisions can be found in MD5 algorithm in less than 24 hours, making MD5 insecure. Further research has verified this result and shown other ways to find collisions in MD5 hashes.

It should however be noted that these attacks may not necessarily result in direct vulnerabilities in Keyed-MD5 as used in BFD authentication purposes, because the colliding message may not necessarily be a syntactically correct protocol packet. However, there is a need felt to move away from MD5 towards more complex and difficult to break hash algorithms.

In Keyed SHA-1 and Meticulous SHA-1, the BFD routers share a secret key which is used to generate a keyed SHA-1 digest for each packet and a monotonically increasing sequence number scheme is used to prevent replay attacks.

Like MD5 there have been reports of attacks on SHA-1. Such attacks do not mean that all the protocols using SHA-1 for authentication are at risk. However, it does mean that SHA-1 should be replaced as soon as possible and should not be used for new applications.

However, if SHA-1 is used in the Hashed Message Authentication Code (HMAC) construction then collision attacks currently known against SHA-1 do not apply. The new attacks on SHA-1 have no impact on the security of HMAC-SHA-1.

I have written an IETF document that proposes two new authentication types – the cryptographic authentication and the meticulous cryptographic authentication . These can be used to specify any authentication algorithm for authenticating and verifying the BFD packets (aka key agility). In addition to this, this memo also explains how HMAC-SHA authentication can be used for BFD.

HMAC can be used, without modifying any hash function, for calculating and verifying the message authentication values. It verifies both the data integrity and the authenticity of a message.

By definition, HMAC requires a cryptographic hash function. We propose to use any one of SHA-1, SHA-256, SHA-384 and SHA-512 for this purpose to authenticate the BFD packets.

I recently co-authored an IETF draft that does BFD’s security and authentication mechanism’s gap analysis for the KARP WG – that draft can be found here.