OpenFlow, Controllers – Whats missing in Routing Protocols today?

openflowThere is a lot of hype around OpenFlow as a technology and as a protocol these days. Few envision this to be the most exciting innovation in the networking industry after the vaccum tubes, diodes and transistors were miniaturized to form integrated circuits.  This is obviously an exaggeration, but you get the drift, right?

The idea in itself is quite radical. It changes the classical IP forwarding model from one where all decisions are distributed to one where there is a centralized beast – the controller – that takes the forwarding decisions and pushes that state to all the devices (could be routers, switches, WiFi access points, remote access devices such as CPEs) in the network.

Before we get into the details, let’s look at the main components – the Management, Control and the Forwarding (Data) plane – of a networking device. The Management plane is used to manage (CLI, loading firmware, etc) and monitor the device through its connection to the network and also coordinates functions between the Control and the Forwarding plane. Examples of protocols processed in the management plane are SNMP, Telnet, HTTP, Secure HTTP (HTTPS), and SSH.

The Forwarding plane is responsible for forwarding frames – it receives frames from an ingress port, processes them, and sends those out on an egress port based on what’s programmed in the forwarding tables. The Control plane gathers and maintains network topology information, and passes it to the forwarding plane so that it knows where to forward the received frames. It’s in here that we run OSPF, LDP, BGP, STP, TRILL, etc – basically, whatever it takes us to program the forwarding tables.

Routing Protocols gather information about all the devices and the routes in the network and populate the Routing Information Base (RIB) with that information. The RIB then selects the best route from all the routing protocols and populates the forwarding tables – and Routing thus becomes Forwarding.

So far, so good.

The question that keeps coming up is whether our routing protocols are good enough? Are ISIS, OSPF, BGP, STP, etc the only protocols that we can use today to map the paths in the network? Are there other, better options – Can we do better than what we have today?

Note that these protocols were designed more than 20 years ago (STP was invented in 1985 and the first version of OSPF in 1989) with the mathematics that goes in behind these protocols even further. The code that we have running in our networks is highly reliable, practical, proven to be scalable – and it works. So, the question before us is – Are there other, alternate, efficient ways to program the network?

Lets start with what’s good in the Routing Protocols today.

They are reliable – We’ve had them since last 20+ years. They have proven themselves to be workable. The code that we use to run them has proven itself to be reliable. There wouldn’t be an Internet if these protocols weren’t working.

They are deterministic in that we know and understand them and are highly predictable – we have experience with them. So we know that when we configure OSPF, what exactly will it end up doing and how exactly will it work – there are no surprises.

Also what’s important about today’s protocols are that they are self healing. In a network where there are multiple paths between the source and the destination, a loss of an interface or a device causes the network to self heal. It will autonomously discover alternate paths and will begin to forward frames along the secondary path. While this may not necessarily be the best path, the frames will get delivered.

We can also say that today’s protocols are scalable.  BGP certainly has proven itself to run at the Internet’s scale with extraordinarily large number of routes. ISIS has as per the local folklore proven to be more scalable than OSPF. Trust me when i say that the scalability aspect is not the limitation of the protocol, but is rather the limitation of perhaps the implementation. More on this here.

And like everything else in the world, there are certain things that are not so good.

Routing Protocols work under the idea that if you have a room full of people and you want them to agree on something then they must speak the same language. This means that if we’re running OSPFv3, then all the devices in the network must run the exact same version of OSPFv3 and must understand the same thing. This means that if you throw in a lot of different devices with varying capabilities in the network then they must all support OSPFv3 if they want to be heard.

Most of the protocols are change resistant, i.e., we find it very difficult to extend OSPFv2 to say introduce newer types of LSAs. We find it difficult to make enhancements to STP to make it better, faster – more scalable, to add more features. Nobody wants to radically change the design of these protocols.

Another argument that’s often discussed is that the metrics used by these protocols are really not good enough. BGP for example considers the entire AS as one hop. In OSPF and ISIS, the metrics are a function of the BW of the link. But is BW really the best way to calculate a metric of an interface to feed in to the computation to select the best path?

When OSPF and all the routing protocols that we use today were designed and built they were never designed to forward data packets while they were still re-converging. They were designed to drop data as that was the right thing to do at that time because the mathematical computation/algorithms took long enough and it was more important to avoid loops by dropping packets.  To cite an example, when OSPF comes up, it installs the routes only after it has exchanged the entire LSDB with its neighbors and has reached a FULL state. Given the volume of ancillary data that OSPF today exchanges via Opaque LSAs this design is an over-kill and folks at IETF are already working on addressing this.

We also have poor multipath ability with our current protocols today. We can load balance between multiple interfaces, but we have problems with the return path which does not necessarily come back the way you wanted. We work around that to some extent by network designs that adapt to that.

Current routing protocols forward data based on destination address only. We send traffic to 192.168.1.1 but we don’t care where it came from. In truth as networks get more complex and applications get more sophisticated, we need a way to route by source as well by destination. We need to be able to do more sophisticated forwarding. Is it just enough to send an envelope by writing somebody’s address on an envelope and putting it in a post box and letting it go in the hope that it gets there? Shouldn’t it say that Hey this message is from the electricity deptt. That can go at a lower priority than say a birthday card from grandma that goes at a higher priority. They all go to the same address but do we want to treat them with the same priority?

So the question is that are our current protocols good enough – The answer is of course Yes, but they do have some weaknesses and that’s the part which has been driving the next generation of networking and a part of which is where OpenFlow comes in ..

If we want to replace the Routing protocols (OSPF, STP, LDP, RSVP-TE, etc) then we need something to replace those with. We’ve seen that Routing protocols have only one purpose for their existence, and that’s to update the forwarding tables in the networking devices. The SW that runs the whole system today is reasonably complex, i.e., SW like OSPF, LDP, BGP, multicast is all sitting inside the SW in an attempt to load the data into the forwarding tables. So a reasonably complex layer of Control Plane is sitting inside each device in the network to load the correct data into the forwarding tables so that correct forwarding decisions are taken.

Now imagine for a moment that we can replace all this Control Plane with some central controller that can update the forwarding tables on all the devices in the network. This is essentially the OpenFlow idea, or the OpenFlow model.

In the OpenFlow model there is an OpenFlow controller that sends the Forwarding table data to the OpenFlow client in each device. The device firmware then loads that into the forwarding path. So now we’ve taken all that complexity around the Control Plane in the networking device and replaced it with a simple client that merely receives and processes data from the Controller. The OpenFlow controller loads data directly into the OpenFlow client which then loads it directly into the FIB. In this situation the only SW in the device is the chip firmware to load the data into the FIB or TCAM memories and to run the simple device management functions, the CLI, to run the flash and monitor the system environmentals. All the complexity around generating the forwarding table has been abstracted away into an external controller. Now its also possible that the device can still maintain the complex Control Plane and have OpenFlow support. OpenFlow in such cases would load data into the FIBs in addition to the RIB that’s maintained by the Control Plane.

The Networking OS would change a little to handle all device operations such as Boot, Flash, Memory Management, OpenFlow protocol handler, SNMP agent, etc. This device will have no OSPF, ISIS,RSVP or Multicast – none of the complex protocols running. Typically, routers spend close to 30+% of CPU cycles doing topology discovery. If this information is already available in some central server, then this frees up significant CPU cycles on all routers in the network. There will also be no code bloat – we will only keep what we need on the devices. Clearly, smaller the code running on the devices, lesser is the bugs, resources required to maintain it – all translating into lower cost.

If we have a controller that’s dumping data into the FIB of a network device then it’s a piece of SW – its an application. It’s a SW program that sits on a computer somewhere. It could be an appliance, a virtual machine (VM) or could reside somewhere on a router. The controller needs to have connectivity to all the networking devices so that it can write out, send the FIB updates to all devices. And it would need to receive data back from the devices. It is envisioned that the controller would build a topology of the network in memory and run some algorithm to decide how the forwarding tables should be programmed in each networking device. Once the algorithm has been executed across the network topology then it could dispatch topology updates to the forwarding tables using OpenFlow.

OpenFlow is an API and a protocol which decides how to map the FIB entries out of the controller and into the device. In this sense a controller is, if we look back at what we understand today, very similar to Stack Master in Cisco. So if one has 5 switches in a stack then one of them becomes the Stack Master. It takes all of the data about the forwarding table. It’s the one that runs the STP algo, decides what the FIB looks like and sends the FIB data on the stacking backplane to each of the devices so that each has a local FIB (that was decided by the Stack Master).

To better understand the Controllers we need to think of 5 elements as shown in the figure.

Controller

At the bottom we have the network with all the devices. The OpenFlow protocol communicates with these devices and the Controller. The Controller has its own model of the network (as shown on the right) and presents the User Interface out to the user so that the config data can come in. Via the User Interface the admin selects the rules, does some configuration, instructs on how it wants the network to look like. The Controller then looks at its model of the network that it has constructed by gathering information from the network and then proceeds with programming the forwarding tables in all the network devices to be able to achieve that successful outcome. OpenFlow is a protocol – its not a SW or a platform – it’s a defined information style that allows for dynamic configuration of the networking devices.

A controller could build a model of the network and have a database and then run SPF, RSVP-TE, etc algorithms across the network to produce the same results as OSPF, RSVP-TE running on live devices. We could build an SPF model inside the controller and run SPF over that model and load the forwarding tables in all devices in the network. This would free up each device in the network from running OSPF, etc.

The controller has real time visibility of the network in terms of the topology, preferences, faults, performance, capacity, etc. This data can be aggregated by the controller and made available to the network applications.  The modern network applications can be made adaptive, with the potential to become more network-efficient and achieve better application performance (e.g., accelerated download rates, higher resolution videos), by leveraging better network provided information.

Theoretically these concepts can be used for saving energy by identifying underused devices and shutting them down when they are not needed.

So for one last time, lets see what OpenFlow is.

OpenFlow is a protocol between networking devices and an external controller, or in other words a standard method to interface between the control and data planes. In today’s network switches, the data forwarding path and the control path execute in the same device. The OpenFlow specification defines a new operational model for these devices that separates these two functions with the packet processing path on the switch but with the control functions such as routing protocols, ACL definition moved from the switch to a separate controller. The OpenFlow specification defines the protocol and messages that are communicated between the controller and network elements to manage their forwarding operation.

Added Later: Network Function Virtualization is not directly SDN. However, if youre interested i have covered it here and here.

BGP Path Hunting and Exploration ..

It has been shown through various studies that BGP convergence can be roughly given by the following formula:

Convergence = (Maximum AS_PATH – Minimum AS_PATH) x MRAI (MinimumRouteAdvertisementInterval)

MRAI as per the specification is the amount of time that a BGP speaker will wait before passing on successive route updates for the same prefix – it defaults to 30 seconds, and applies to all announcements, with no exemption for withdrawals. Since there can be some variance in the MRAI across different BGP implementations the convergence time is roughly given by the above formula.

Lets see how we arrive at this. Consider the topology as show in the fig 1 below:

Network Topology
Figure 1: Network Topology

A, B, C, D and E are all routers in ASes A, B, C, D and E respectively. Assume that A has announced its reachability to 10/8 and all routers have converged. This implies that E has the following paths for 10/8 in its adj-RIBs-In:

p(B) -> AS_PATH {B A} (path learnt from B)

p(C) -> AS_PATH {C B A} (path learnt from C)

p(D) -> AS_PATH {D C B A} (path learnt from D)

In steady state E would advertise 10/8 with AS_PATH {E B A}

Lets see what happens when A loses its connection to the network 10/8.

A announces a withdrawal for 10/8 to B as shown in figure 2.

Figure 2
Figure 2

B runs its decision process and finds that there is no other route to 10/8, and thus sends an explicit withdrawal to its peers C and E.

E upon receiving the withdrawl runs its decision process and finds that the best route, now available for 10/8 is the one that it learnt from C – the one with the AS_PATH {C B A}. It installs this route in the FIB and advertises this, as shown in figure 3.

Figure 3
Figure 3

E switches to the next best route (via C) and continues to forward traffic for 10/8. It should however be noted, that E at this point of time, has absolutely no idea that C too has received a withdrawal for 10/8.

C meanwhile runs its decision process and decides to send an explicit withdrawal to its peers D and E.

E upon receiving the withdrawal from C runs its decision process and finds that it still has a valid path to reach 10/8 – the one via D (refer to figure 4). It switches to this path, again unaware that D too has received an withdrawal for 10/8.

Figure 4
Figure 4

D now sends a withdrawal to E, and thereby removing all possible paths to 10/8 from E (fig 5). Its now that the network is converged.

Figure 5
Figure 5

What we just saw happening above on E is “BGP Path Hunting”. i.e., BGP “hunts” though all possible AS paths, starting from the shorter ones to the longer ones, until it finally converges.

It can be easily proven that the amount of path hunting will increase as the meshiness of the topology increases. In an acyclic topology (say a tree) there is only one possible path so there is no path hunting. An addition of a single link in this topology creates a cycle in the graph and thus at most two possible paths for BGP to “hunt”. Subsequent links can add many more alternate paths for BGP to hunt, depending on where they’re placed in the graph.

In the worst possible case path hunting in BGP can explore every possible path of each path length. More commonly it has been observed that path hunting in today’s Internet can add an additional 2 or 3 BGP updates to a prefix withdrawal.

AboveNet Hijacks Africa Online!

Internet, only a few weeks ago, had seen Pakistan Telecom Authority (PTA) hijacking the IP prefixes announced by Youtube, as protest against some videos that had been put up there, knocking millions off, all around the globe from accessing Youtube. I wrote about this here. This time its an ISP from USA and Europe, AboveNet (AS 6461) thats hijacked prefix announced and owned by Africa Online (AS 36915).

AboveNet inexplicably started announcing reachability to one of the prefixes (194.9.82.0/24) owned by Africa Online. It took AboveNet more than 22 hours since the problem was first reported, to fix it. Wonder what took them so long! As a result of this prefix hijack, potentially millions of users in Kenya or Africa, all behind 194.9.82.0/24, lost connectivity to the Internet. In isolation 194.9.82.0/24 is not a huge space, but add a couple of NATs and the number of users easily swells to millions. What this means for you and me, who are not being served by Africa Online is, that we lose connectivity to all the websites being hosted behind this IP address block. Imagine what it would do to the Internet if emergency services, banks, google were being hosted there!

Lets see why users in Kenya would lose total connectivity to the Internet:

A user accesses google.com with a (NATed or otherwise) source IP address 194.9.82.x. Google graciously responds, and the IP packet carries the destination IP 194.9.82.x. Because AboveNet has announced reachability to this IP address block, all traffic destined to 194.9.82.x comes to AboveNet where it gets royally dumped, while the user sitting in Kenya (or Africa) is still hopelessly waiting for the packet to arrive.

So, why are the service providers all over the world preferring the route announcement from AboveNet over the one originated from Africa Online?

Well, thats unfortunately how Internet, and my favorite routing protocol – BGP, works!

In BGP, the route advertisement from the provider which has a better vantage point on the Internet, usually wins.

In this particular case both AboveNet(AS 6461) and Africa Online (AS 36915) announced the route to 194.9.82.0/24 . AboveNet, operating from US, sits much closer to the core as compared to Africa Online, and is thus better connected to the other networks than the latter. The AS_PATH length thus seen by the other service providers for the route advertised by AboveNet is much shorter than the one advertised by Africa Online. As a result of this, other BGP speakers pick up the route advertised by AboveNet against the route advertised by Africa Online.

The figure below, constructed using BGPlay from RIPE NCC, shows a snapshot of the routing activity for 194.9.82.0/24 during the period when it was hijacked. The colored lines indicate the path different ASes would take to reach this prefix. Clearly most of the ASes believed AboveNet to be a better path for 194.9.82.0/24.

Vantage Point on the Internet

This is how BGP works and mind you, this isnt broken.

Whats broken is our inability to verify the claim of a service provider when it announces ownership of an address block in BGP. Restrictive route filtering can be applied where the providers only accept the specific prefixes allocated to the customers or where the upstream accepts only specific prefixes allocated to the ISPs, but this is too cumbersome and rarely works. As the matter stands today, there isn’t any clean way to know if the reachability announced by your friendly peer is genuine or whether the provider has a feasible path to the destinations advertised. This needs to be fixed and there is work going on towards this direction in the SIDR WG of IETF.

Can the service providers do something when they learn that an IP prefix has been hijacked by some AS? The answer, fortunately, to this is an unequivocal Yes.

A service provider can override BGP’s decision process in selecting the route advertisement with the shortest AS_PATH length by (i) manipulating the BGP path attribute like LOCAL_PREF since its checked before the AS_PATH length or (ii)  decreasing the weight of the offending peer you learn the hijacked route from (this would only work for routers connected  directly to AboveNet) (iii) Use regular expressions to filter all or the specific hijacked route advertisement from AS 6461 (AboveNet) so that the announcement from Africa Online wins. The legitimate route is now propagated to other parts of the world.

Each time an ISP inadvertently hijacks someone else’s address block it risks to lose some amount of credibility in the service provider world. Their names are splashed on the mails/PPTs in NANOG and IETF whenever there’s a discussion on interdomain security or on the blogs all over the world.

Fortunately for AboveNet, their hijacking didn’t throw millions off popular websites like Youtube, Google, Yahoo! etc. That would have attracted a LOT more attention than what this event did. When PTA had hijacked YouTube, it was all over the news and there were columns running in Wall Street Journal and New York Times about how tenuous the Internet architecture is. Also what unfortunately went in favor of AboveNet was that the affected users were not in US/Europe/Japan, but were in a relatively silent African subcontinent.

Pakistan Hijacks Youtube!

We’ve been reminded yet again, of how vulnerable the Internet architecture is, to malicious attacks and simple mis-configuration (oversight?) from the service provider side. A week ago, the Pakistan Telecommunication Authority released an order instructing the country’s 70 odd ISPs to block Youtube.com until further notice. The ISPs acting on this directive could have installed access-control lists (ACLs) on all their router interfaces dropping packets bound to this website, but they instead, chose a more convoluted way of blocking traffic bound to this site – they created a more specific route pointing to a NULL (or a discard) interface, thereby black-holing all traffic bound to this address. Fair enough, this is also a way to drop traffic since the former would entail augmenting all existing ACL filtering policies on all router interfaces. Whats intriguing is how this route “accidentally” leaked into BGP and got advertised to PCCW (AS 3491), the upstream provider providing services to Pakistan Telecom (AS 17557), from where it propagated further to other parts of the Internet.

Youtube advertises 208.65.152.0/22 and Pakistan Telecom advertised a more specific route, 208.65.153.0/24, to its provider PCCW . PCCW, like most ISPs, without validating the prefix announcements based on Regional Internet Registry (RIR) allocations or even Internet Routing Registry (IRR) objects, further propagated this BGP UPDATE to its peers. Within no time, the erroneous BGP announcement permeated across large parts of the internet, resulting in all traffic bound to Youtube, to go towards Pakistan, where it would end up getting blackholed. Youtube was thus successfully “hijacked” by Pakistan Telecom .

I dont intend to discuss, whether this was done maliciously or whether it was an error on part of the PT, but the whole saga raises uncomfortable questions on the security and frailty of the Internet as it works today. Currently, the larger ISPs tend to trust the network providers that they are connected to and work on the tenuous assumption that the smaller providers would not illegitimately “hijack” someone else’s IP address and will behave decorously and only announce the IPs that they own. Patently, this doesn’t seem to be working out.

An attacker can masquerade as a big financial website by “hijacking” all traffic bound towards the legitimate website, thereby wreaking havoc on the gullible users. It should be noted that this sort of attack is more potent and dangerous than the spam mails that we often receive in our mailboxes, asking us to update and enter our bank details. In the latter cases, an alert user can always detect, that the server on which the page is loaded does not belong to the bank or the financial institution that the mail purports to be. However when the IP address block is “hijacked”, there are no such defenses.

Internet connectivity is vital in todays age, with a lot of emergency services being routed over the Internet. Imagine a natural disaster or a terrorist attack followed by an IP address block “hijack”. The full repercussions of what all can be achieved with this kind of an attack makes your mind spin and wobble.

A timeline created by Renesys, which provides real-time monitoring services, says that it took about 15 seconds for large Pacific-rim providers to direct YouTube.com traffic to the Pakistan ISP, and about 45 seconds for the central routers on which most of the Internet traffic relies, to misroute the traffic.

So, what happened to Pakistan’s Internet Connectivity as it attracted zillions of bytes of data intended for Youtube.com? It probably went down as PT could not have handled that massive amount of data, rendering millions inside Pakistan without any Internet connectivity.

In the coming days i would discuss what BGP Prefix Hijacking is and what it entails to protect the Internet from this and the work being done in IETF wrt this.

Also read about AboveNet hijacking Africa Online here.

Issues with existing Cryptographic Protection Methods for Routing Protocols

Most of us believe that using cryptographic authentication methods (MD5, etc) for the routing protocols running inside our networks really makes them very secure. Well, not really ..

We have published RFC 6039 that explains how each routing protocol can be exploited despite using the cryptographic authentication mechanisms endorsed by the IETF community.

To cite an example, a simple IP header attack on OSPF or RIP can result in the two adjacent routers bringing down the peering relationship between them. This can, in the worst case, blackhole a substantial amount of data traffic inside the network, something that will certainly not go well with the customers!

So how can an OSPF adjacency be brought down?

OSPF neighbors on the broadcast, NBMA and point-to-multipoint networks are identified by the IP address in the IP header. Because the IP header is not covered by the MAC in the cryptographic authentication scheme as described in RFC 2328, an attack can be made exploiting this vulnerability.

R1 sends an authenticated HELLO to R2. This HELLO is captured and replayed back to R1, changing the source IP in the IP header to that of R2.

R1 not finding itself in HELLO would deduce that the connection is not bidirectional and would bring down the adjacency!

The RFC also discusses some issues that we found with Bidirectional Forwarding Detection (BFD) protocol thats very frequently used in the service provider networks.