Protocol Compilation: High Performance Communication for Parallel Programs

By Edward W. Felten

Abstract

One obstacle to the use of distributed-memory multicomputers has been the disappointing performance of communication primitives. Although the hardware is capable of moving data very quickly, the software has been unable to exploit this potential.

The main cause of this "communication gap" is "protocol overhead". In addition to moving the application's data between processors, communication systems must engage in other communication and synchronization to ensure correct execution. Protocol overhead is an inherent attribute of parallel computation - it occurs on all machines, under all programming models.

To reduce the effect of protocol overhead, I advocate the use of "tailored protocols" - communication protocols custom-designed to work with a particular application program. A tailored protocol takes advantage of advance knowledge about the behavior of the application program to "slim down" the protocol. Unfortunately, designing tailored protocols by hand is very difficult.

The process of designing tailored protocols can be automated. A "protocol compiler" is a tool that takes an application program as input, and produces as output a tailored protocol for that program. This tailored protocol is plug-compatible with the existing message-passing library, so its use is transparent to the application program.

The bulk of this dissertation discusses the principles underlying the design of a protocol compiler, including the forms of program analysis the protocol compiler must carry out. In addition to this general discussion, I describe the details of Parachute, a working prototype I have constructed. Parachute generates tailored protocols for data-parallel programs running on the Intel series of multicomputers.

Performance models predict, and experiments with Parachute on real applications confirm, that using a protocol compiler leads to a large decrease in communication time. Experiments indicate that Parachute, despite several engineering compromises in its design, cuts communication time by 17%. Reductions in overall application running time range from 0% to 20%, with an average reduction of 7.3%. More aggressive implementation, or new hardware, will lead to even larger speedups.
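As a rough illustration of how a cut in communication time relates to a cut in overall running time, the sketch below uses a simple fixed-fraction model. The model is an assumption made here for illustration, not the performance model from the dissertation: it supposes that computation time is unchanged and does not overlap with communication. The function name and parameters are hypothetical.

```python
def overall_reduction(comm_fraction: float, comm_cut: float) -> float:
    """Fractional reduction in total running time under a simple model.

    comm_fraction: share of the original running time spent communicating (0..1).
    comm_cut: fractional reduction in communication time (0..1).

    Assumption (illustrative only): computation time is unchanged and is
    not overlapped with communication, so the saving is the product.
    """
    if not (0.0 <= comm_fraction <= 1.0 and 0.0 <= comm_cut <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    return comm_fraction * comm_cut


if __name__ == "__main__":
    # Hypothetical communication shares; these are not measurements
    # from the dissertation.
    for share in (0.0, 0.25, 0.5, 1.0):
        saving = overall_reduction(share, 0.17)
        print(f"communication share {share:.2f} -> overall reduction {saving:.1%}")
```

Under this model an application that does no communication sees no benefit, and the overall reduction for a single application can never exceed its own reduction in communication time. The spread of overall reductions reported above is therefore consistent with applications that differ both in how much of their running time is communication and in how much of that communication the tailored protocol removes.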

%A Edward W. Felten
%T Protocol Compilation: High Performance Communication for Parallel Programs
%R TR 93-09-09
%I University of Washington Department of Computer Science and Engineering
%C Seattle, Washington
%D 1993

pardo@cs.washington.edu