CSE 588 Network Systems, Spring 1997

A Comparison of IP Switching Technologies from 3Com, Cascade, and IBM

Executive summary

Network usage continues to grow rapidly. Web-based computing and an ever increasing number of users have brought unprecedented challenges to network infrastructures. LAN switching is currently a popular and cost-effective means of increasing bandwidth. However, it creates new problems. For example, conventional routers can't handle the increased traffic made possible by high performance switches. Further, emerging real-time applications such as video conferencing loom on the horizon and will require massive bandwidth as well as high quality of service (QoS).

Are faster routers and larger pipes, e.g., gigabit Ethernet, the way to go? Ideally, we would like something that scales well into the future and provides good QoS. ATM hardware looks promising but, being connection-oriented, it doesn't mesh well with connectionless IP. A number of schemes have been devised to run IP on top of ATM, none of which is fully satisfying, because none takes full advantage of ATM: they are too complex, too inefficient, or scale poorly. In early 1996, however, a startup company, Ipsilon Networks, introduced an elegant solution to this problem which they called IP Switching. The industry has since embraced it. Some companies license it from Ipsilon, while others have created their own versions with different twists. Indeed, a number of them provide some sort of "cut-through" switching over link-level technologies other than ATM. In this paper we compare the offerings from 3Com, Cascade, and IBM, who have recently joined forces to provide an integrated, end-to-end, desktop to LAN to WAN to LAN to server IP switching solution.

Background

The Ipsilon approach emerged as a high performance alternative to existing ATM protocols such as Multiprotocol Encapsulation (RFC 1483) and the ATM Forum's Multiprotocol over ATM (MPOA). In the first protocol, every router is connected to every other router by a direct ATM virtual circuit (VC) to minimize the number of layer 3 hops. But this full connectivity leads to the "n-squared" problem: the number of VCs required grows quadratically as routers are added (a full mesh of n routers needs n(n-1)/2 VCs, so 100 routers already require 4,950), which limits scalability.

The MPOA model groups workstations and servers within virtual subnets and uses ATM LAN Emulation (LANE) to move packets from one subnet to another. An external route server forwards an arriving packet and simultaneously downloads layer 3 information to the source device, which determines the ATM address of the destination using the Next Hop Resolution Protocol (NHRP). Subsequent packets between the same source and destination then use this ATM address, bypassing the route server completely. Overall response to the MPOA model remains mixed, perhaps because of the extensive hype and high expectations. Critics point out that MPOA requires many new, complicated protocols and depends on a route server, which could limit scalability.

Ipsilon proposed what they feel is a simpler, more robust solution. Ipsilon IP Switches are still based on high speed, high capacity ATM hardware, but General Switch Management Protocol (GSMP) and Ipsilon Flow Management Protocol (IFMP) software (at approximately 2,000 and 10,000 lines of code, respectively) replace the more complicated MPOA protocols (which come in at roughly 300,000 lines of code). GSMP (RFC 1987) replaces standard ATM signaling to request, tear down, and monitor VCs.
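The division of labor is easy to picture in code. Below is a minimal sketch, in Python, of the kind of interface GSMP gives the controller over the switch's VC table; the class, method names, and table layout are our own inventions, and the actual message formats and procedures are specified in RFC 1987.

    # Illustrative sketch only: real GSMP messages and fields are in RFC 1987.
    class GsmpController:
        """The routing/flow software drives the ATM hardware through a small
        request / tear down / monitor interface instead of full ATM signaling."""

        def __init__(self):
            # per-switch VC table: (in_port, in_vci) -> (out_port, out_vci)
            self.vc_table = {}

        def add_branch(self, in_port, in_vci, out_port, out_vci):
            self.vc_table[(in_port, in_vci)] = (out_port, out_vci)   # "request"

        def delete_branch(self, in_port, in_vci):
            self.vc_table.pop((in_port, in_vci), None)               # "tear down"

        def connections(self):
            return len(self.vc_table)                                # "monitor"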
IFMP (RFC 1953) is used between IP Switches, and between IP Switches and edge devices, to associate flows with VCs. An IP switch controller routes like an ordinary router, forwarding packets on a default VC. However, it also performs flow classification for traffic optimization. A flow is an extended IP conversation, i.e., a long-lived sequence of IP packets sent from a particular source to a particular destination sharing the same protocol type. Once a flow is identified, the IP switch sets up a cut-through connection, first establishing a VC for subsequent flow traffic and then asking the upstream node to use this VC. If the upstream node concurs, the traffic begins to flow on the new VC, bypassing the routing software and its associated processing overhead. Ipsilon estimates that flows make up more than 80% of internetwork traffic; the remaining traffic is forwarded by layer 3 hops in the usual way. Flows also provide a convenient hook for QoS. By analyzing IP headers, an IP Switch can relate individual flows to performance requirements and request ATM VCs with the proper type of service. Individual QoS requests for each flow will be supported using RSVP.

This scheme's greatest attribute is its performance: Ipsilon claims throughput of up to 5.3 million packets per second (pps) for their first-generation product, while more expensive high-end routers max out at around 1 million pps or less. It is fully compatible with existing and emerging IP protocols such as RIP, OSPF, DVMRP, and IGMP, and support for IPv6, RSVP, and BGP is planned. On the other hand, because IP switching is based on IP, tunneling or encapsulation techniques are needed for non-IP protocols; the multiservice capability normally expected from ATM doesn't exist in IP Switches, although IPX will be supported soon. The bulk of the criticism, however, relates to Ipsilon's use of virtual circuits. Flows are associated with application-to-application conversations, and each flow gets its very own VC. Large environments like the Internet, with millions of individual flows, would exhaust VC tables. Under these conditions, the design would need to be modified to support flows at a coarser granularity.
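To make the flow-classification mechanism, and the per-flow VC cost just criticized, concrete, here is a minimal sketch of the per-packet decision an IP switch controller might make. The flow key and the packet-count threshold are our own illustrative choices; the actual redirect exchange is specified in IFMP (RFC 1953).

    FLOW_THRESHOLD = 10  # packets seen before we bother setting up a VC

    class FlowClassifier:
        def __init__(self, allocate_vc, ask_upstream_to_redirect):
            self.counts = {}          # (src, dst, proto) -> packets seen
            self.cut_through = {}     # (src, dst, proto) -> dedicated VCI
            self.allocate_vc = allocate_vc
            self.redirect = ask_upstream_to_redirect

        def packet(self, src, dst, proto):
            key = (src, dst, proto)
            if key in self.cut_through:
                return ("switch", self.cut_through[key])  # hardware path
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] >= FLOW_THRESHOLD:
                vci = self.allocate_vc()                  # set up the VC first...
                if self.redirect(key, vci):               # ...then ask upstream
                    self.cut_through[key] = vci
            return ("route", None)                        # default VC, via router

    fc = FlowClassifier(allocate_vc=iter(range(100, 200)).__next__,
                        ask_upstream_to_redirect=lambda key, vci: True)
    for _ in range(11):
        action, vci = fc.packet("10.0.0.1", "10.0.0.9", "TCP")
    # the 11th packet returns ("switch", 100): the flow has been cut through

Every entry in cut_through is a dedicated VC, which is exactly why millions of concurrent flows would strain VC tables.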
In January of 1997, networking industry leaders 3Com, Cascade, and IBM announced their plans to cooperate in the implementation of end-to-end IP switching solutions across enterprise and public networks. The plan looks very promising: it has the advantage of being interoperable with other vendors' gear through industry standards, and it promises tremendous gains in end-to-end network performance. Each company has its own area of expertise. 3Com will represent the desktop and LAN community with its Transcend architecture and Fast IP product, while Cascade's IP Navigator will focus on superior WAN service. IBM will continue to develop its Aggregate Route-based IP Switching (ARIS) and Multiprotocol Switched Services (MSS) software, which control the routing, bridging, traffic, and congestion functions of its switches. The rest of the industry has clearly taken notice of the cooperative effort and eagerly awaits demonstrations in May.

3Com's Fast IP

3Com's Fast IP product focuses on LANs and provides IP switching across all types of backbone technologies, including Ethernet, Fast Ethernet, Gigabit Ethernet, FDDI, Token Ring, and ATM. A significant portion of Fast IP's functionality lies in new end-system software which provides 802.1p VLAN registration and NHRP address resolution. In environments where end systems cannot be upgraded, Fast IP switches supporting 802.1p, 802.1Q, and NHRP may establish a Fast IP connection on their behalf.

Fast IP is based on several emerging standards: 802.1Q, 802.1p, and NHRP. 802.1Q provides an architecture, protocol, and mapping for bridges between VLANs; it will enable standards-based identification of VLANs and deliver VLAN communications across common switched backbones. 802.1p specifies protocol mechanisms that allow end systems and switches to dynamically register VLAN membership and convey other information. Fast IP will use the Generic Attribute Registration Protocol (GARP) defined in 802.1p to provide VLAN membership registration and to enable switches to map and exchange topology information. For example, a desktop will issue a GARP message indicating its VLAN membership and location; in this way, all switches learn the VLAN topology. Lastly, NHRP specifies a mechanism that allows a source node to determine the subnetwork-layer address of either the destination node or the next hop toward the destination node. Although primarily designed for non-broadcast multi-access networks such as ATM, NHRP techniques may be extended to broadcast multi-access networks such as Ethernet, FDDI, and Token Ring.

The Fast IP process can be started by either an end system or a Fast IP-enabled switch. An end system will issue an NHRP request based on data to be forwarded to a separate subnet or VLAN. The event that triggers the NHRP request is configurable, e.g., after a specified number of packets are sent to the destination address, or when certain types of packets are sent, such as those based on QoS priorities. The NHRP request is a standard-format packet with source and destination MAC and IP addresses and a frame type indicating an NHRP packet. Contained in the data portion of the packet are the source end node's MAC address and VLAN ID, which the receiving end system will use to send an NHRP response back to the originating source.

The NHRP packet is forwarded to a router just like any other packet. The router can filter the packet or forward it according to configured policies; e.g., it may be configured to deny access to the destination subnet based on the source node's subnet address, or it may filter the packet based on the NHRP type field. If there are no restrictions, the router will forward the NHRP request to the destination node.

The destination node will issue an NHRP response directly to the originating source node using the source MAC address and VLAN ID contained in the NHRP request. Switches along the data path of the NHRP response forward the packet based on either the destination MAC address or the VLAN ID; where a switch does not have the destination MAC address in its address tables, it forwards the packet based on VLAN ID. This ensures that as long as the underlying infrastructure is switched, the NHRP response will reach the originating node. In returning the NHRP response, switches in the data path also learn and map the address of the source node. An NHRP response received by the originating source node indicates that there is an underlying switched connection between VLANs. The source node will then redirect data packets directly to the destination node using its MAC address, effectively bypassing the router and enabling wire-speed switching.
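A rough sketch of the end-system side of this process follows. The trigger value, method names, and message contents are illustrative assumptions, not 3Com's.

    TRIGGER = 5  # configurable: packets to an off-subnet destination before asking

    class FastIpEndSystem:
        def __init__(self, mac, vlan_id, send_nhrp_request):
            self.mac = mac
            self.vlan_id = vlan_id
            self.send_nhrp_request = send_nhrp_request
            self.counts = {}        # dst IP -> packets sent via the router
            self.shortcuts = {}     # dst IP -> destination MAC (bypasses router)

        def next_hop(self, dst_ip):
            if dst_ip in self.shortcuts:
                return ("direct", self.shortcuts[dst_ip])   # switched path
            self.counts[dst_ip] = self.counts.get(dst_ip, 0) + 1
            if self.counts[dst_ip] == TRIGGER:
                # the request carries our MAC and VLAN ID so the destination
                # can answer straight back across the switched fabric
                self.send_nhrp_request(dst_ip, self.mac, self.vlan_id)
            return ("router", None)     # default gateway until a reply arrives

        def nhrp_response(self, dst_ip, dst_mac):
            self.shortcuts[dst_ip] = dst_mac   # no response -> keep routing

    es = FastIpEndSystem("00:a0:24:01:02:03", vlan_id=7,
                         send_nhrp_request=lambda ip, mac, vlan: None)
    for _ in range(TRIGGER):
        es.next_hop("10.1.2.3")     # routed; the 5th packet fires the request
    es.nhrp_response("10.1.2.3", "00:a0:24:0a:0b:0c")
    assert es.next_hop("10.1.2.3") == ("direct", "00:a0:24:0a:0b:0c")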
If a response to a Fast IP connection request is not received, the requesting node simply continues to send packets through the default router gateway. Fast IP is designed to work over multiple network architectures; further, none of the underlying techniques (802.1p/Q, NHRP) is tied to IP, so Fast IP can easily be extended to other protocols. It is the only IP switching proposal that works across multiple backbone technologies and for multiple protocols. A distinctive feature of the 3Com solution is that flow policy is based on requests initiated from desktop and server systems. It seems straightforward: data senders can explicitly tag associated frames with the desired policy, eliminating the guesswork and performance-compromising analysis at downstream devices.

Cascade's IP Navigator

Cascade's IP switching product, IP Navigator, is IP switching for WANs. It's a software upgrade to their existing ATM and frame relay switches running Virtual Network Navigator (VNN). VNN is Cascade's OSPF-based networking architecture, which provides the internal communications for their family of multiservice WAN switches. VNN manages frame relay and ATM attributes such as available bandwidth and QoS, and it performs the routing functions needed to establish end-to-end VCs throughout the network, taking QoS into consideration when calculating the best routes. Unlike Ipsilon's IP Switching, which replaces standard ATM signaling with GSMP, IP Navigator can run alongside other ATM protocols.

IP Navigator adds an IP routing table to VNN, except that instead of recording the IP address of the next hop for each IP destination address, it records the end-destination switch id (e.g., a virtual circuit identifier (VCI) in the case of ATM). This is much like Cisco's Tag Switching or IBM's ARIS.

IP Navigator/VNN addresses the O(N^2) virtual circuit scaling problem by defining a new type of virtual circuit, the Multipoint-to-Point Tunneling (MPT) virtual circuit. In MPT, a switch, call it A, uses itself as the root and establishes a single multicast circuit to all other switches in the network, adding them as leaves. This multicast circuit informs all other switches of the circuit to be used for forwarding data to switch A, and it actually functions as a reverse forwarding tree for data destined to switch A. To forward traffic to switch A, another switch looks for the multicast circuit rooted at switch A and sends data in the reverse direction, from leaf to root. This dramatically reduces the total number of VCs in the core to N, where N is the number of edge switches.

So when a frame is received by a port of a Cascade switch configured for IP Navigator, its IP header is examined and its egress switch is looked up in the IP routing table. The packet then moves rapidly through a preestablished Multipoint-to-Point Tunnel to the egress switch, where another routing table lookup determines the egress port. Unlike Ipsilon's IP Switching, under this scheme every packet gets switched, and no time is spent setting up a session: the MPT VCs are established at startup time. Latency is reduced and packet processing speeds are increased by removing the layer 3 routing hops in the core.
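The data path just described amounts to one longest-prefix match at the ingress. In the sketch below (the prefixes, switch ids, and table layout are our own illustrative choices), the match maps a destination to an egress switch, and the preestablished MPT VC for that switch carries the packet the rest of the way.

    from ipaddress import ip_address, ip_network

    class MptIngress:
        def __init__(self):
            self.routes = []   # (prefix, egress switch id): egress, not next hop
            self.mpt_vc = {}   # egress switch id -> VCI of its reverse tree

        def add_route(self, prefix, egress):
            self.routes.append((ip_network(prefix), egress))

        def forward(self, dst_ip):
            addr = ip_address(dst_ip)
            matches = [(n, sw) for n, sw in self.routes if addr in n]
            if not matches:
                return None
            _, egress = max(matches, key=lambda m: m[0].prefixlen)
            return self.mpt_vc[egress]   # one table lookup, then pure switching

    ingress = MptIngress()
    ingress.add_route("192.168.0.0/16", "switch-A")
    ingress.mpt_vc["switch-A"] = 42
    assert ingress.forward("192.168.7.9") == 42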
Cascade's literature makes a big deal of this MPT technology. We thought we must be missing something here, but as far as we can tell from the limited information available, this is pretty much what Cisco's TDP and IBM's ARIS do as well; they just give it other names and don't make it sound so grandiose. Once an incoming packet is assigned a VCI (in the case of ATM) by an edge switch, the ATM switches do all the rest to move it to its egress switch. Perhaps there is more to it for traffic that isn't best-effort, i.e., traffic requiring some higher QoS. For best-effort traffic, the normal hop-by-hop IP routed path and the reverse MPT path are the same.

In the case of ATM, because IP packets from different sources can converge and share a VCI on the way to the same destination, cells from two packets can become interleaved, with no way to sort them out again. One solution is to buffer colliding packets, which conserves VCs but may require additional hardware; Cisco is taking this approach. Another is to use ATM virtual path (VP) labels: one VP per egress point, with each source point using a different VC within the VP. The destination switch can then sort out interleaved cells. Although this method uses more VCs, the amount of state information is still O(N), where N is the number of destinations. IP Navigator takes this latter approach; IBM's ARIS supports both. Note that this is not a problem for Ipsilon's IP Switching, since every flow receives its own VC and VCs are never shared.
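The VP approach is easy to see in a sketch. Assuming one VPI per egress point and one VCI per ingress (the labels below are invented for illustration), the egress can reassemble interleaved cells per (VPI, VCI) pair.

    class EgressReassembler:
        def __init__(self):
            self.partial = {}   # (vpi, vci) -> cells collected so far

        def cell(self, vpi, vci, payload, last):
            buf = self.partial.setdefault((vpi, vci), [])
            buf.append(payload)
            if last:                       # AAL5-style end-of-packet cell
                del self.partial[(vpi, vci)]
                return b"".join(buf)       # one whole packet, one source
            return None

    rx = EgressReassembler()
    rx.cell(vpi=7, vci=1, payload=b"AA", last=False)   # from ingress 1
    rx.cell(vpi=7, vci=2, payload=b"BB", last=False)   # from ingress 2, interleaved
    assert rx.cell(7, 1, b"aa", True) == b"AAaa"
    assert rx.cell(7, 2, b"bb", True) == b"BBbb"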
IP Navigator provides full ATM-quality QoS for IP by using VNN's existing base of QoS features, including large buffers, weighted fair queuing, the Quad-Plane architecture (four separate routing planes in their switches, giving four QoS levels which can be further subdivided), and rapid convergence and rerouting with OSPF. It allows programmable QoS for IP connections, configurable by port, route, or IP address, or user defined through RSVP. Later enhancements will coordinate QoS support between campus and wide-area backbones by mapping QoS from IFMP, NHRP, PNNI, and TDP to IP Navigator. Also, because the number of VCs can be kept relatively small using MPT VCs, it is practical to later add additional MPT VCs dedicated to guaranteed levels of service.

The number of layer 3 hops can be reduced even further by using IFMP to link NHRP local-area connections to IP Navigator wide-area connections, essentially moving the edge of the MPT tree to the campus. And with 3Com's Fast IP using 802.1p/Q, the edge of the MPT tree moves all the way to the desktop and server.

IBM's ARIS

IBM's Aggregate Route-based IP Switching (ARIS) takes, according to IBM, a more general approach to IP switching, one that is not specific to any one set of products. It's actually quite similar to Cisco's Tag Switching and to IP Navigator. Indeed, IBM and Cisco are co-chairs of the IETF Multi-Protocol Label Switching (MPLS) working group, which will define where label-swapping-based forwarding, i.e., tag switching, is headed for the ultimate standard. IBM will use ARIS in their ATM and frame relay switches as well as their LAN switches.

In ARIS parlance, a switch that has had IP routing capability added to it is known as an Integrated Switch Router (ISR). Edge ISRs perform the usual forwarding of IP datagrams, except that the next-hop field in the IP routing table now contains a reference to a switched path known as the "egress identifier"; in the case of ATM, it would contain an ATM VCI. This switched path may lead just to a neighboring ISR (comparable to IP next hops on conventional routers), or it may traverse a series of ISRs, following a standard IP routing path, to an egress ISR. ARIS pre-establishes switched paths to "well known" egress ISRs; as a result, virtually all best-effort traffic is switched. These well-known egress nodes are learned through standard routing protocols such as OSPF and BGP.

Egress ISRs initiate the setup of switched paths by sending Establish messages to their upstream neighbors. These neighbors forward the messages on to their own upstream neighbors in Reverse Path Multicast (RPM) style, but only after ensuring the switched path is loop free. Eventually all ISRs establish switched paths to all egress ISRs. The switched path to an egress ISR in general takes the form of a tree rooted at the egress ISR; a tree results from the "merging" that occurs at a node when multiple upstream switched paths for an egress point are spliced to a single downstream switched path for that egress point. These ARIS switched-path trees, which look very similar if not identical to the MPT trees in IP Navigator, also solve the VC scaling problem, keeping the number of VCs used in the core to O(N).

However, this isn't the whole story for ARIS. ARIS uses different types of egress identifiers to balance the desire to share the same egress identifier among many IP destination prefixes against the desire to maximize switching benefits; ISRs choose the type of egress identifier based on routing protocol information and local configuration. The first type of egress identifier is the IP destination prefix. It gives each IP destination prefix its own switched-path tree and thus will not scale in large backbone and enterprise networks; however, it is the only information that some routing protocols, such as RIP, can provide, and it may work well in networks where the number of destination prefixes is limited, such as campus environments. The second type is the egress IP address, used primarily with BGP protocol updates, which carry this information in the next_hop attribute. The third type is the OSPF router id, which allows OSPF to aggregate traffic on behalf of multiple datagram protocols. The fourth type is the multicast pair, used by multicast protocols such as DVMRP, MOSPF, and PIM. Other egress identifiers may be defined, such as IS-IS NSAP addresses, NLSP IPX addresses, and IPv6 destination prefixes.

As mentioned before, in the case of ATM, cells corresponding to IP packets from different sources can become interleaved. ARIS supports both the use of ATM switching hardware capable of preventing cell interleaving, of which there is very little currently, and the use of ATM virtual paths (VPs) to the egress points rather than VCs. The current ARIS specifications say ARIS can be extended to support QoS parameters, but this will be addressed in a future revision; currently there is no QoS in ARIS, just best effort.
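The tree building can be sketched as follows. This is illustrative only: the real Establish message contents and label distribution rules are in the ARIS Internet Draft, and per-hop label splicing is simplified here to a single VC id.

    class Isr:
        def __init__(self, name, upstream_neighbors):
            self.name = name
            self.upstream = upstream_neighbors   # reverse-path-multicast style
            self.downstream_vc = {}              # egress id -> VC toward egress

        def establish(self, egress_id, via_vc, path):
            if self.name in path:                # loop check before forwarding
                return
            if egress_id in self.downstream_vc:  # already on the tree: merge,
                return                           # don't grow a second branch
            self.downstream_vc[egress_id] = via_vc
            for nbr in self.upstream:
                nbr.establish(egress_id, via_vc, path + [self.name])

    a = Isr("A", [])
    b = Isr("B", [a])
    c = Isr("C", [a, b])            # two upstream paths toward egress C exist
    c.establish("egress-C", via_vc=5, path=[])
    assert a.downstream_vc == b.downstream_vc == {"egress-C": 5}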
Conclusions

There are a number of concerns surrounding these various IP switching implementations. A significant issue is scalability, and the granularity of what gets switched determines in large part how scalable a given solution is. In Ipsilon's IP Switching, an application-to-application conversation gets its own VC and, at least currently, this is the only granularity provided. Fast IP, it seems, can also provide this application-to-application granularity, but would use it only for traffic needing better than best-effort QoS; for best-effort traffic, Fast IP provides node-to-node granularity, aggregating all the applications communicating between two nodes into one circuit. ARIS provides a few different granularities, ranging from node-to-node to switch-to-switch, through its egress identifier types; switch-to-switch granularity aggregates traffic to all nodes reachable from an egress switch into one circuit. IP Navigator provides only switch-to-switch granularity.

Other issues include achievable QoS, which protocols an implementation is tied to, which backbone technologies it can use, how well it interoperates with other equipment, and how easily it can be upgraded. This, combined with the fact that the standards are still evolving, makes these proposals difficult to evaluate. Having said this, the interoperability initiative taken by 3Com, Cascade, and IBM is encouraging: it is based on external standards rather than a single company's technology, and it provides a promising end-to-end solution.

References

"Draft Standard for Traffic Class Expediting and Dynamic Multicast Filtering", IEEE 802.1p/D6, April 1997.
"Draft Standard for Virtual Bridged Local Area Networks", IEEE 802.1Q/D5, February 1997.
"IP Navigator White Paper", Cascade Communications Corp., December 1996.
"LAN Emulation over ATM Version 2 - LUNI Specification - Straw Ballot", ATM Forum/STR-LANE-LUNI-02.00, April 1997.
"Multiprotocol Over ATM Version 1.0 - Straw Ballot", ATM Forum/STR-MPOA-MPOA-01.00, February 1997.
R. Bellman, "IP Switching -- Which Flavor Works for You?", Business Communications Review 27(4), April 1997, 41-46.
J. Hart, "Fast IP: The Foundation for 3D Networking", 3Com Corporation PN 501312-001, January 1997.
J. Heinanen, "Multiprotocol Encapsulation over ATM Adaptation Layer 5", IETF RFC 1483, July 1993.
J. Luciani, et al., "NBMA Next Hop Resolution Protocol (NHRP)", IETF Internet Draft, draft-ietf-rolc-nhrp-11.txt, March 1997.
R. McGee, Keynote Speech at Networks Expo.
P. Newman, et al., "Flow Labeled IP: A Connectionless Approach to ATM", Proc. IEEE Infocom, San Francisco, March 1996, 1251-1260.
P. Newman, et al., "Ipsilon Flow Management Protocol Specification for IPv4", IETF RFC 1953, May 1996.
P. Newman, et al., "Ipsilon's General Switch Management Protocol Specification", IETF RFC 1987, August 1996.
A. Viswanathan, et al., "ARIS: Aggregate Route-Based IP Switching", IETF Internet Draft, draft-viswanathan-aris-overview-00.txt, March 1997.