An on-the-fly Data Race Detector for RECPLAY,
a Record/Replay System for Parallel Programs

Michiel Ronsse and Koen De Bosschere
Department of Electronics and Information Systems
Universiteit Gent, Belgium
ronsse at

Developing multi-threaded programs is clearly a more difficult task than developing sequential programs. Besides the fact that correctness of multi-threaded programs has more aspects (communication, synchronization, performance, etc.), there is also a clear lack of adequate development tools, especially for finding data races. A data race occurs when two threads can concurrently access the same shared variable and at least one thread modifies the variable. These data races often introduce undesired nondeterminism in the execution of multi-threaded programs and, as such, should be removed. As statical race detection is known to be undecidable, most data race detection mechanisms use a dynamic on-the-fly approach. Finding data races on-the-fly basically consists of three steps: (i) tracing all memory accesses, (ii) detecting unordered (parallel) accesses, and (iii) identifying the offending memory instructions.

This abstract describes the data race detection tool that is part of RECPLAY, a fully integrated practical record/replay mechanism enabling cyclic debugging with automatic data race detection and reference identification during the replay phase. RECPLAY was developed for Solaris 2.5 and is fully operational. A demo is available on the web.

As tracing all memory accesses introduces an intolerable overhead in a parallel program, possibly introducing Heisenbugs and the probe effect, tracing is done under the supervision of a record/replay mechanism. The record/replay mechanism of RECPLAY traces the order of the Solaris synchronization operations during a first execution. The usage of scalar Lamport clocks to record the ordering introduces only a minimal overhead (<2%). These traces are used during the replay phase to enforce the program flow of the first execution. A faithful replay can however only be guaranteed if the behavior of the program is fully determined by the synchronization operations only (i.e., there are no data races). Fortunately, by running the race detector during replay we can automatically detect the first data race, warn the user, and stop the execution. Hence, there is a perfect marriage between our race detector and our record/replay system: the race detector helps the record/replay system to faithfully reproduce an execution, while the record/replay system allows the data race detector to substantially slow down the program execution without altering its behavior. The current race detector slows down the program execution by a factor 30.

The race detector starts by collecting the sets of read and write memory accesses per segment, i.e., between successive synchronization operations per thread. At the end of each segment, these sets are compared with the corresponding sets of all the concurrent segments (i.e., segments that are not ordered by the happened-before relation) that have already completed. This approach guarantees to find all the data races in that particular execution, even the races that are dormant, i.e., memory accesses that were executed in the correct order, but are not ordered by the program. When a data race is detected (i.e., a memory location involved in a data race), a new replay execution automatically finds the two offending instructions and the variable they access.

To the best of our knowledge, there are no race detectors that work on general thread-based programs. Eraser, a recent data race detector is limited to programs based on mutexes; it does not work on general parallel programs that use other synchronization operations. It is not based on the happened-before relation, but on the verification of a so-called locking discipline. As such, it claims to find more data races in a program than can be found from a single execution (e.g., RECPLAY). Eraser will indeed find data races that RECPLAY cannot detect, but unfortunately, some of these races are false positives. RECPLAY is guaranteed to find all the races in a particular execution, but it will never return false positives (even the initialization of variables is handled automatically because threads are also ordered by the thread creation calls), and hence we do not need program annotations to adapt a program (switching off the race detector can be very dangerous because it can also mask real data races). Furthermore, the record/replay mechanism that is sitting behind the race detector completely masks the probe effect caused by the race detector overhead. Eraser cannot find data races in a real execution, only data races in a program which might have a different behavior due to the extra delays introduced.

Michiel Ronsse is supported by a grant from the Flemish Institute for the Promotion of the Scientific-Technological Research in the Industry (IWT). Koen De Bosschere is a research associate with the Fund for Scientific Research -- Flanders. The authors would like to thank Jan Van Campenhout for his support and advice. They are also endebted to Koen Audenaert for the many discussions.