Relaxed Consistency and Synchronization in Parallel Processors, UW CSE TR 92-12-05

By Richard N. Zucker

ABSTRACT

Parallel programs often fall well short of linear speed-up over a sequential version of the same program running on a uniprocessor. There are many reasons for this shortfall. Two important ones are the overhead of synchronization and memory latency.

Synchronization, the coordination of the work done by different processors, is an overhead that does not exist in uniprocessor programs. Excessive time spent synchronizing therefore leads to a loss of performance. Many previous studies that evaluate this overhead have used artificial benchmarks with high levels of lock contention. In this dissertation I study both the effect of synchronization on the performance of real parallel programs and the impact of how efficiently the synchronization algorithm is implemented. The results show that the frequency of synchronization is the most significant factor leading to performance loss. When synchronization occurs sufficiently often, the choice of implementation algorithm also has a non-negligible effect.

Memory latency, the length of time from when a request to memory is initiated until it completes, is a major problem in multiprocessors. Many hardware and software enhancements have been proposed to deal with the problem. One of these ideas is relaxed models of memory consistency. Relaxed models, such as weak ordering or release consistency, replace sequential consistency, the usual intuitive model of how the memory of the system is implemented. With this change in the memory model, many architectural features can be used that are not allowed under sequential consistency, but at the cost of imposing constraints on the programmer of parallel systems. In this dissertation I consider many of these architectural features, such as bypassing, lockup-free caches, and a software-controlled cache coherence scheme that I propose. I attempt to determine the performance benefits of using such features and which features provide the most benefit. The results show that relaxed consistency can provide significant performance gains for some programs and architectures. The choice of a given relaxed model does not significantly affect the gains. Software-controlled cache coherence, a scheme that requires a smaller hardware investment, can provide equivalent performance in some cases and competitive performance in others.
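
As an illustration of the kind of constraint a relaxed model places on the programmer, the following is a minimal sketch in C++ and is not taken from the dissertation. It uses C++ release/acquire atomics as a present-day software analogue of release consistency, not the hardware mechanisms studied here. The names payload, ready, and run_handoff are hypothetical. The point is that ordinary accesses may be reordered freely, so the programmer must mark the synchronization accesses explicitly for the handoff to be correct.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: one producer hands a value to one consumer.
int run_handoff() {
    int payload = 0;                 // ordinary (non-atomic) shared data
    std::atomic<bool> ready{false};  // synchronization variable

    std::thread producer([&] {
        payload = 42;  // ordinary write, may be buffered or reordered
        // Release: all earlier writes become visible before this store.
        ready.store(true, std::memory_order_release);
    });

    int seen = 0;
    std::thread consumer([&] {
        // Acquire: later reads cannot move ahead of this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // guaranteed to observe the producer's write
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

Calling run_handoff() returns 42. If the release and acquire annotations were replaced with std::memory_order_relaxed, the language would no longer guarantee that the consumer observes the write to payload, which is the sort of obligation that sequential consistency does not place on the programmer.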

%A Richard N. Zucker
%T Relaxed Consistency and Synchronization in Parallel Processors
%R 92-12-05
%I University of Washington Department of Computer Science and Engineering
%D December 1992

pardo@cs.washington.edu