RSVP-TE – Routing Freak!

Multiprotocol Label Switching (MPLS) and Generalized MPLS (GMPLS) provide a mechanism to set up point-to-multipoint (P2MP) LSPs which carry traffic from one ingress point (the root node) to several egress points (the leaf nodes), thus enabling multicast forwarding in an MPLS domain. However, there is no provision to provide a co-routed path back from the egress points (the leaves) to the ingress node (the root). The only way to do this is by configuring Unidirectional point-to-point LSPs from the leaves back to the root node. This entails configuring each leaf node with an LSP back to the root which could be a configuration and management nightmare if the number of leaves are large. Second, it can also not guarantee a co-routed path back from the leaf to the node, as the process of setting up the Unidirectional P2P path is independent from setting up the P2MP path.

This post introduces the concept of a hub-and-spoke multipoint (HSMP) LSP that allows traffic from the root to the leaves via a P2MP LSP and back from the leaves to the root via a unidirectional co-routed LSP. The proposed technique targets one-to-many applications that require reverse one-to-one traffic flow (thus many one-to-one in the reverse direction).

Consider the figure shown below.

PE1 is the ingress router for the HSMP LSP. The egress routers (leaves) are PE2, PE3 and PE4. As can be seen from the figure, PE1 creates a single copy of each packet arriving from the data source. This packet carries the MPLS label value L1. P1 is an ordinary Label Switch Router (LSR) that swaps the incoming label L1 with L2. Lets now focus on the packet forwarding process on the node P2.

For each packet belonging to the HSMP LSP, P2 makes three copies (just like how its done for a P2MP LSP), each of which is sent to PE2, PE3 and PE4 respectively. Packets arrive at P2 with MPLS label L2. As shown in the above figure, P2’s ILM contains an entry for label L2 saying that one copy of the packet should be sent out on interface if1 with label L3, another copy on interface if2 with label L4 and a last copy on interface if3 with label L5. Since the LSR P2 is replicating the MPLS packets its called the branch node.

An obvious advantage of this scheme is the bandwidth optimization. If we had used unidirectional P2P LSPs instead of an HSMP LSP, PE1 would have sent three copies of each packet that it had received from the data source and thus congesting the links PE1-P1 and P1-P2.

What sets an HSMP LSP apart from a regular P2MP LSP is the ability in the former to set up a path from the leaves back to the root. P2MP LSPs are unidirectional, so no traffic can flow from the leaf node routers to the ingress head end router along the P2MP LSP. In HSMP LSP, the leaf nodes can also send unidirectional traffic back to the root. This is shown in the figure below.

Because of the mechanisms defined in HSMP LSP, the branch node P2 advertises the same upstream label L1 for a given HSMP LSP to the nodes PE2, PE3 and PE4. It programs its ILM table as shown above, where it simply swaps L1 with L2 for all incoming MPLS packets and sends those towards P1. This way HSMP LSP is also able to provide a path back from the leaf nodes to the root node.

In the last post i had discussed issues that exist in VPLS. Lets see how HSMP LSP can solve them. I am using the same topology as was used there to demonstrate how HSMP LSP helps.

The figure 3 above shows the same VPLS service that we had discussed in the earlier post.

PE1 knows through some out-of-band mechanism (could be via BGP, Radius, manual configs, etc) that PE2, PE3 and PE4 are the egress nodes that belong to the same VPLS domain. PE1 now needs to establish an HSMP LSP (can be trivially extended to support a pseudowire) to PE2, PE3 and PE4. Figure shows 3 HSMP LSPs that will be required in this arrangement. The red HSMP LSP has PE1 as the root node and PE2, PE3 and PE4 as the leaf nodes. The green HSMP LSP has been initiated by PE2 and has PE2 as the root, and PE1, PE3 and PE4 as the leaf nodes. The blue HSMP LSP has been initiated by PE3 and has PE1, PE2 and PE4 as the leaf nodes. There is another HSMP LSP thats required – the one initiated by PE4 for the VPLS service to function. It has been omitted from the figure for the sake of clarity.

Thus all PE nodes in a VPLS service need to initiate an HSMP LSP (or a HSMP pseudowire) that terminates at the other PE routers.

In VPLS all BUM (broadcast, unknown unicast and multicast) traffic is flooded to all PE nodes. Its only the learnt traffic thats sent unicast by one PE to the other PE.

As explained earlier, a single copy sent by PE1 over an HSMP LSP will reach PE2, PE3 and PE4 (due to its P2MP component). Also the PE routers PE2, PE3 and PE4 can use this HSMP LSP (terminating at them) to send unicast traffic back to PE1.

Thus PE1 sends all BUM traffic on the HSMP LSP it initiates and all learnt unicast over the HSMP LSP that terminates on it.

Going back to figure 3, we can see that PE1 can use the red HSMP LSP to send all BUM traffic. This way it only sends one copy, and all the PE routers receive this packet. If PE1 wants to send learnt unicast traffic back to PE2, it uses the green HSMP LSP that terminates on it. PE1 can use this to send traffic back to PE2, which is the root node for this HSMP LSP. Similarly, PE1 uses the blue HSMP LSP whenever it wants to send learnt unicast traffic back to PE3.

Lets look at LSPs that PE2 uses. Whenever it wants to send BUM traffic, it uses the green HSMP LSP that it has originated. Any packet sent over that LSP is received by all the leaf nodes (which in this case happen to be PE1, PE3 and PE4). Like PE1, if it has to send learnt unicast traffic back to PE1, it uses the red HSMP LSP that was originated by PE1 and terminates at PE2.

Thus for a VPLS to fully function, all PE nodes must establish an HSMP LSP with all the other participating PE routers. It can use the optimized HSMP LSP that it originates for the BUM traffic and the HSMP LSP that other PEs originate for unicast communication.

The above table compares a VPLS service using HSMP LSPs with a regular VPLS service or Hierarchical VPLS (H-VPLS) service. Clearly, the former wins against the regular VPLS and H-VPLS on all counts. This may also perhaps be an answer to Juniper’s claim that H-VPLS is not scalable. Operators now need not implement H-VPLS; they can instead go in for VPLS services implemented using HSMP LSPs.

This post only briefly explains the idea behind HSMP LSPs. Its explained in detail here in this draft.

Virtual Private Lan Service (VPLS) is a multipoint-to-multipoint ethernet bridging service over an IP/MPLS backbone and is used for connecting geographically separated customer sites by emulating a LAN segment. This post assumes that the reader is familiar with the basic concepts of VPLS and IP/MPLS and does not attempt explain them.

VPLS requires a logical full mesh of all the participating Provider Edge (PE) routers since its emulating a LAN. This means that every PE router is connected to every other PE router by a pseudowire resulting in a full mesh infrastructure. VPLS solves the loop problem by using a split-horizon rule which states that member PE routers (PE1, PE2, PE3 and PE4 in the below Figure) must forward VPLS traffic only to the local attachment circuits when they receive the traffic from the other PE routers. Exchanging traffic learnt from one remote PE router to the other is not allowed. This prevents loops and also eliminates the need to run STP in the VPLS core network.

Consider a Video broadcast service (implemented as a VPLS service) that uses a video codec and offers 200 channels of standard-definition content. Assuming around 2 Mbps for each channel it would require 400 Mbps of total standard-definition traffic. Add another 50 HDTV channels, each consuming around 10Mbps, the total network bandwidth approaches 1Gbps.

Assume that the source of the Video transmission falls behind PE1. In that case, PE1 needs to send 1Gbps worth of Video traffic to PE2, PE3 and PE4 respectively.

Figure 2 shows the physical topology where PE1 is connected to the other PE routers via LSRs P1 and P2. Since PE1 is in a full logical mesh with PE2, PE3 and PE4, it means that PE1 needs to replicate all BUM (Broadcast, Unknown Unicast and Multicast) traffic three times, so that each PE can receive a copy.

Thus PE1 needs to make three copies of all the Video traffic that it receives which means that the link PE1 – P1 and P1-P2 will be now carrying 3Gbps. The amount of traffic that these links will carry will increase linearly as the number of PEs that PE1 is in full mesh with. Clearly this is NOT scalable!

So, whats the solution?

The solution apparently lies in using Hierarchical VPLS, also popularly known as H-VPLS.

The original VPLS architecture requires all PEs to be in a full mesh. This however may not always be practical if the number of PE routers is too high. Provisioning a full meshed network may also in some instances not be an efficient network design, as was illustrated in the topology in Figure 2.

To fix these issues H-VPLS architecture introduces the concept of spoke-pseudowires. Unlike mesh-pseudowires, that are used in regular VPLS, spoke-pseudowires can exchange traffic with other pseudowires (both mesh and spoke), so they can relay traffic between PE routers. Let us see how we can re-design the above network using H-VPLS.

To illustrate this, the figure above shows two H-VPLS architectures that can be used to break the logical full mesh that is required in the regular VPLS.

In the first there is a logical connection between PE1-PE2, PE2 – PE4 and PE1 – PE3. There is thus a VPLS service defined on PE1 which has just two spoke pseudowires and a connection to the local attachment circuit which is the source of the Video traffic. The first pseudowire connects PE1 to PE2 and the other connects it to PE3. There is no pseudowire connecting PE1 to PE4.

In the other design, PE1, PE2, PE3 and PE4 are all connected in a ring. In this case PE1 is connected to PE2 and PE3 using spoke pseudowires; PE2 to PE1 and PE4; PE4 to PE3 and PE2 and PE3 to PE1 and PE4.

While the two examples that i have taken dont have a mesh pseudowire, there is nothing that precludes that from happening. The true potential of HVPLS is only exploited when the network is designed using a combination of spoke and mesh pseudowires.

The split-horizon rule “Do not relay traffic among mesh-pseudowires” is used to prevent forwarding loops in H-VPLS networks. The mesh pseudowires dont exchange traffic as in the regular VPLS architecture. The spoke pseudowires otoh do not obey the split-horizon rule – thus traffic arriving on a spoke pseudowire is forwarded to the other spokes, meshes and local attachment circuits if any. This requires the provider to run Spanning Tree Protocol (STP) in the core to keep it loop-free, something that the providers dont feel to happy about given the high convergence values of STP. Since the traffic can loop providers need to be extremely careful when planning where the spokes and mesh pseudowires are placed.

While H-VPLS solves the problems seen in VPLS, it introduces a few of its own.

Lets look at each design and see how ..

In the first H-VPLS design (as shown above) the links PE1-P1 and P1-P2 are still carrying two copies of each BUM traffic, thus we have not gained significantly from the original VPLS design in terms of saving the bandwidth efficiency.

The second problem is that PE4 only gets the packets after they are relayed by PE2. If you go back to Figure 2, you will see that there is no physical connection between PE2 and PE4. Thus all packets go back to P2 before they reach P4, thus congesting this link. Its also introduces a single point of failure, where PE4 can get completely disconnected from the rest of the PEs if PE2 goes down. There is thus a huge amount of redundancy planned in H-VPLS, which can result in loops and thus using STP becomes extremely important here.

The third issue, and as per some critics, the biggest problem with H-VPLS is that PE2 now has to learn all the customer MACs that fall behind PE1 and P4. This is because all traffic from PE1 is terminated at PE2 and relayed to PE4. Similarly, all traffic coming from PE4 gets terminated at PE2 and is then forwarded to PE1. During this process PE2 has to learn all these MACs. This is primarily because H-VPLS implements the Hub-and-Spoke architecture in the data plane as against Route Reflectors in BGP that do it in the control plane.

Lets now look at the other H-VPLS design where all the PE routers are in a ring.

Clearly we need to run STP to break the forwarding loop. Assume that STP puts the spoke pseudowire connecting PE1 – PE3 in a blocked state. In this case PE1 only has one pseudowire (PE1 – PE2) to send traffic to. This solves the bandwidth wastage problem on the links PE1-P1 and P1-P2 as they only carry one copy of the BUM traffic.

This however, doubles the traffic on the links PE2-P2 and P2-PE4 as each packet is first relayed by PE2 to PE4 and then later by PE4 to PE3. So, while we had saved bandwidth on some links, we ended up wasting it somewhere else!

Like before, PE2 and PE4 have to unnecessarily learn all the customer MACs exchanged between learnt traffic between PE1 and PE3.

Another issue is that the learnt traffic between PE1 and PE3 cannot pick up the most optimal IGP path since it has to necessarily get routed via PE2 and PE4. This is a direct consequence of H-VPLS implementing the Hub-Spoke architecture in the data plane.

Its thus easy to sum up the issues that exist in VPLS and H-VPLS.

(1) Bandwidth is wasted since traffic for different pseudowires is replicated on a shared physical path. This is a big issue as more and more multicast video traffic is sent using VPLS services.

(2) A full mesh of PE routers can result in a significant amount of control traffic.

(3) If one uses H-VPLS then operators need to ensure loop prevention and detection. This entails running STP which is not the most attractive choice.

(4) Learnt traffic may not follow the best and the most optimal path since it has to get relayed via multiple PE routers.

(5) Unnecessary MAC learning happens on all PE routers participating in H-VPLS.

So, do we have a solution to the above problems? Thankfully we do!

I have written a draft with Lizhong Jin from ZTE and Frederic Jounay from France Telecom where we have introduced a concept of a Hub-and-Spoke Multipoint LSPs. This can be trivially extended to a Hub-and-Spoke Multipoint Pseudowires which can solve the issues described above that exist in the regular VPLS and the H-VPLS architectures. More detail about Hub-and-Spoke Multipoint LSPs and how they solve all the issues described above in the next post!

Tag: RSVP-TE

Hub and Spoke Multi-Point LSPs for Scalable VPLS Architecture

Does Hierarchical VPLS solve all Scaling issues found in VPLS?