Appears In Proceedings of the SIGMETRICS '98/ PERFORMANCE '98 Joint International Conference on Measurement and Modeling of Computer Systems, June 1998.
[B&W PostScript] [B&W PDF] (TR Version: [Color PostScript] [Color PDF])
Abstract. We consider the problem of scheduling tasks with unpredictable service times on distinct processing nodes so as to meet a real-time deadline, given that all communication among nodes entails some (possibly large) overhead. This work is motivated by our effort to build a system that uses a cluster of workstations to improve the performance of multimedia applications that require real-time 3D rendering. In this context, the tasks correspond to (sets of) scene objects that must be rendered to create a single image within each frame time. A basic difficulty that must be confronted is that task execution times may be very imbalance, and can be difficult to estimate accurately prior to the start of a frame. Thus, while the initial assignment of tasks at the beginning of a frame may have equal expected processing times, the system must recover from imbalances discovered during execution.
We consider two distinct classes of scheduling policies, static, in which task reassignments can only occur at specific times, and dynamic, in which reassignments are triggered by some node going idle. For both classes, we further examine global reassignment, in which all nodes are rescheduled at a rescheduling moment, and local reassignment, in which only a subset of the nodes engage in rescheduling at any one time.
We show that, over a range of parameterizations appropriate to clusters of commodity workstations, global dynamic policies work best. We introduce a new policy, Dynamic with Shadowing, that places a small number of tasks in the schedules of multiple nodes to reduce the amount of communication required for load-balancing. This policy dominates all other alternatives considered over most of the parameter space.