Cheating the I/O Bottleneck:
Network Storage with Trapeze/Myrinet

Darrell Anderson, Jeff Chase, Syam Gadde,
Andrew Gallatin, Alvin Lebeck, and Ken Yocum
Duke University

Mike Feeley
University of British Columbia

Two recent hardware advances boost the potential of cluster computing: high-quality PCI bus implementations, and switched cluster interconnects that can deliver a gigabit or more of point-to-point bandwidth. We are developing system facilities to realize the potential for high-speed data transfer over Myricom's 1.28 Gb/s Myrinet LAN, and harness it for cluster file systems, network memory systems, and other distributed OS services that cooperatively share data across the cluster. Our broad goal is to use the power of the network to sidestep the disk I/O bottleneck for data-intensive computing on workstation clusters. Our recent work focuses on three areas:

We are pursuing several directions of continuing work. First, our preliminary results have exposed unnecessary overheads in common-case paths through the file and VM systems; these overheads went unnoticed while disk speeds were the limiting factor.

Second, cluster OS services built on RPC-style messaging raise new flow-control issues, particularly for prefetching, which is bursty and hungry for bandwidth. GMS/Trapeze systems under load can interfere with the network traffic of running applications, or even saturate network trunk links and node I/O buses. We have extended the Trapeze/Myrinet firmware to reflect back to the host precise information about the congestion delays a packet incurs while moving through the interconnect, and we are experimenting with a modified GMS system that factors this data into its policies for selecting the peer sites that receive its page-migration requests (a sketch of such a policy follows). This is one example of a network-aware distributed system that adapts what it sends, and where, in response to feedback about network topology and conditions. We believe that these mechanisms, which supplement the more familiar mechanisms for controlling the send rate and routing of generated traffic, are essential to realizing the potential of gigabit interconnects for data-intensive computing on workstation clusters.
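To make the peer-selection idea concrete, the following is a minimal sketch in C of a congestion-aware policy of the kind described above. It is an illustration under stated assumptions, not the GMS/Trapeze implementation: the peer table, the names (peer_stat, record_congestion_feedback, choose_peer), and the EWMA smoothing are invented for this example; only the idea of feeding firmware-reported congestion delay into the choice of a page-migration target comes from the text above.

    /*
     * A minimal sketch, assuming details the text does not give: the
     * fixed peer table, the names below, and the EWMA smoothing are
     * illustrative choices, not the actual GMS/Trapeze interface.
     */
    #include <stdio.h>
    #include <stdint.h>

    #define NPEERS 8                /* candidate peer sites (hypothetical) */

    struct peer_stat {
        uint32_t avg_delay_us;      /* smoothed congestion delay (usec) */
    };

    static struct peer_stat peers[NPEERS];

    /* Invoked when the firmware reflects back the congestion delay that
     * a packet bound for peer `id` incurred inside the interconnect.
     * Smooth with an EWMA (weight 1/8), as TCP does for RTT samples. */
    void record_congestion_feedback(int id, uint32_t delay_us)
    {
        peers[id].avg_delay_us = (7 * peers[id].avg_delay_us + delay_us) / 8;
    }

    /* Select a target for the next page migration: the peer with the
     * lowest smoothed congestion delay.  A real policy would also weigh
     * free memory at each peer and spread load to avoid herding. */
    int choose_peer(void)
    {
        int best = 0;
        for (int i = 1; i < NPEERS; i++)
            if (peers[i].avg_delay_us < peers[best].avg_delay_us)
                best = i;
        return best;
    }

    int main(void)
    {
        for (int i = 0; i < NPEERS; i++)    /* seed a common baseline */
            record_congestion_feedback(i, 100);
        record_congestion_feedback(2, 900); /* peer 2 behind a hot trunk */
        printf("migrate next page to peer %d\n", choose_peer());
        return 0;
    }

The EWMA here mirrors the familiar TCP round-trip-time estimator; any smoothing that damps transient spikes while tracking sustained congestion would serve the same purpose.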


