A Parallel Trace-driven Simulator: Implementation and Performance, UW CSE TR 94-01-04

By Xiaohan Qin and Jean-Loup Baer.

ABSTRACT

The simulation of parallel architectures requires an enormous number of CPU cycles and, in the case of trace-driven simulation, enormous amounts of disk storage. In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation. We refine the original algorithm of Lin et al., whose main characteristic is to insert the shared references from every trace into all other traces, by reducing the amount of communication between simulation processes. We have implemented our algorithm on a KSR-1. Results of our experiments on traces of four applications and three different cache coherence protocols show that parallel trace-driven simulation yields significant speedups over its sequential counterpart. The communication overhead is not substantial compared to the dominant overhead, which is due to the processing of replicated inserted references.
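
To make the central idea concrete, the following is a minimal sketch of inserting the shared references from every trace into all other traces. It is an illustration under stated assumptions, not the implementation evaluated in the report: the Ref record, its global time field, and the function name insert_shared are hypothetical, and each input trace is assumed to be already sorted by time.

```python
import heapq
from typing import List, NamedTuple


class Ref(NamedTuple):
    """Hypothetical trace record; the real trace format is not given here."""
    time: int     # assumed global timestamp used to order references
    cpu: int      # processor that issued the reference
    addr: int     # memory address
    shared: bool  # True if the reference touches shared data


def insert_shared(traces: List[List[Ref]]) -> List[List[Ref]]:
    """For each processor, merge its own trace with the shared
    references of every other processor, ordered by timestamp."""
    # Extract the shared references of each trace once.
    shared = [[r for r in trace if r.shared] for trace in traces]
    merged = []
    for i, own in enumerate(traces):
        others = [s for j, s in enumerate(shared) if j != i]
        # heapq.merge performs a k-way merge of already sorted inputs.
        merged.append(list(heapq.merge(own, *others, key=lambda r: r.time)))
    return merged


traces = [
    [Ref(0, 0, 0x10, False), Ref(2, 0, 0x20, True)],
    [Ref(1, 1, 0x20, True), Ref(3, 1, 0x30, False)],
]
merged = insert_shared(traces)
```

In this sketch every shared reference appears once in each of the merged traces, so each simulation process has to handle the shared references of all the others. This replication is the source of the dominant overhead mentioned above.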

We also investigate filtering techniques for multiprocessor traces. We show how to filter private and shared references in parallel. Our technique generates filtered traces for various block sizes in a single pass. As expected, the simulation of filtered traces is much faster, but parallel simulation of filtered traces is not as effective, since the ratio of shared to private references remaining after filtering is now much larger.
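
As a rough illustration of producing filtered traces for several block sizes in one pass, the sketch below runs one small direct-mapped filter cache per block size over a single reference stream and keeps only the references that miss. This is a simplified stand-in, not the technique of the report: the function name filter_trace and the num_sets parameter are hypothetical, and the sketch ignores coherence, which a real multiprocessor filter must account for when handling shared references.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def filter_trace(addrs: Iterable[int],
                 block_sizes: Sequence[int],
                 num_sets: int = 4) -> Dict[int, List[int]]:
    """Return one filtered trace per block size from a single pass.

    A reference is dropped for a given block size if it hits in that
    block size's direct-mapped filter cache of num_sets entries.
    """
    # One filter cache and one output trace per block size.
    caches: Dict[int, List[Optional[int]]] = {
        b: [None] * num_sets for b in block_sizes
    }
    out: Dict[int, List[int]] = {b: [] for b in block_sizes}
    for addr in addrs:
        for b in block_sizes:
            block = addr // b
            idx = block % num_sets
            if caches[b][idx] != block:
                # Miss: install the block and keep the reference.
                caches[b][idx] = block
                out[b].append(addr)
    return out


filtered = filter_trace([0, 4, 16, 0, 64], block_sizes=(16, 32))
```

The input is read once and every block size is updated for each reference, so adding a block size adds work per reference but not another pass over the trace.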

%A Xiaohan Qin
%A Jean-Loup Baer
%T A Parallel Trace-driven Simulator: Implementation and Performance
%R 94-01-04
%I University of Washington Department of Computer Science and Engineering
%D January 1994

pardo@cs.washington.edu