Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture

 


Richard Martin, Amin Vahdat, David Culler, and Thomas Anderson. Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture. Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA), pages 85-97, June 1997.


This work provides a systematic study of the impact of communication performance on parallel applications in a high-performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 microseconds. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
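
As a rough illustration of what a linear dependence on overhead implies, the sketch below interpolates between the two data points quoted in the abstract (no slowdown at 3 microseconds, a factor of 60 at 103 microseconds). It assumes perfect linearity and applies only to the most sensitive case reported; the function name and parameters are hypothetical and are not taken from the paper.

```python
def estimated_slowdown(overhead_us, base_us=3.0, max_us=103.0, max_slowdown=60.0):
    """Linearly interpolate slowdown between the abstract's two endpoints.

    Assumption: slowdown grows exactly linearly with overhead, which the
    abstract only describes as "highly linear" for most applications.
    """
    # Slowdown gained per added microsecond of overhead.
    slope = (max_slowdown - 1.0) / (max_us - base_us)
    return 1.0 + slope * (overhead_us - base_us)


if __name__ == "__main__":
    for overhead in (3.0, 13.0, 53.0, 103.0):
        print(f"{overhead:6.1f} us -> {estimated_slowdown(overhead):5.2f}x")
```

Under this assumption, each added microsecond of overhead costs roughly 0.59x of the baseline run time in the worst case, which is why reductions in overhead translate so directly into application speedup.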

