The lack of mature languages and tools is a chief obstacle to obtaining high performance from parallel computers. The Orca Parallel Programming Project addresses this problem with a programming language, Orca C, and support software for efficient, portable and scalable parallel programming.
Like its successful antecedent, the Poker Parallel Programming Environment, Orca C has been designed from first principles. It is founded on a machine model known as the CTA, an idealized parallel machine that abstracts the commonly available facilities of physical machines. Layered on top of this machine model is a new programming model, known as Phase Abstractions, that enhances the CTA by providing a global view of data and communication as well as explicit high-level control. Orca C is the programming language that realizes the Phase Abstractions programming model; it borrows syntactically from the C language while building upon the parallel foundations of Phase Abstractions.
The CTA machine model and the Phase Abstractions programming model, both developed at Washington, have been tested in numerous experiments to demonstrate that they promote the project's goals of high performance, portability and scalability. For example, the SIMPLE benchmark, a widely studied fluid dynamics computation, was programmed using the CTA and Phase Abstractions models and translated by hand (no Orca C compiler exists yet) to five different parallel machines representing the major classes of MIMD parallel architectures. On every machine and in every experiment, SIMPLE achieved a speedup of at least P/2 on P processors, that is, at least half of the theoretically best speedup factor of P, making it the first demonstration of portability across MIMD machines.
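The P/2 criterion can be made concrete with a minimal sketch. It assumes the usual definition of speedup as serial running time divided by parallel running time on P processors; the function names and the timings below are hypothetical illustrations, not code or measurements from the SIMPLE experiments.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Observed speedup: serial running time divided by parallel running time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("running times must be positive")
    return t_serial / t_parallel


def meets_half_ideal(t_serial: float, t_parallel: float, p: int) -> bool:
    """True when the observed speedup is at least P/2, half the ideal factor of P."""
    if p < 1:
        raise ValueError("processor count must be at least 1")
    return speedup(t_serial, t_parallel) >= p / 2


# Hypothetical timings (seconds), not data from the SIMPLE study:
# speedup is 120 / 12 = 10, and the threshold for P = 16 is 8.
print(meets_half_ideal(t_serial=120.0, t_parallel=12.0, p=16))
```

Stated this way, the criterion is equivalent to requiring a parallel efficiency (speedup divided by P) of at least 0.5 on each machine.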
Other studies have investigated parallel computations including matrix multiplication, LU decomposition, QR factorization, Jacobi iteration, and a molecular simulation of water. These programs have run on nearly all of the recently available commercial and experimental parallel machines. Among the discoveries is a relationship between the memory structure of a programming model and the memory hierarchy of the object machine: a nonshared memory model becomes more effective as the hierarchy deepens.
In the past year a compiler has been constructed for ZPL, the data parallel subset of Orca C, and the earlier SIMPLE portability results have been successfully duplicated using compiled code. ZPL is currently being used in collaborations with other departments (Astronomy, Civil Engineering) and other universities (U. of Oregon, U. of Arizona) to build parallel applications and to develop parallel tools such as parallel debuggers. Beyond building a prototype compiler for the full Orca C language, the project has many other ongoing efforts, including parallel algorithm design and analysis, studies in parallel debugging, further research into parallel models of computation, and the support software for an Orca C based parallel programming environment.
Principal Investigators: Lin and Snyder