Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge. Ever-increasing wire delay relative to switching speed and the exponential cost of circuit complexity make simply scaling up existing processor designs futile. We present WaveScalar, an alternative to superscalar design. WaveScalar is a dataflow instruction set architecture and execution model designed for scalable, low-complexity, high-performance processors. It is unique among dataflow architectures in that it efficiently provides traditional memory semantics, which allows it to execute applications written in imperative languages.

The WaveScalar ISA is designed to run on an intelligent memory system. Each instruction in a WaveScalar binary executes in place in the memory system and explicitly communicates with its dependents in dataflow fashion. WaveScalar architectures cache instructions and the values they operate on in a WaveCache, a simple grid of ``ALU-in-cache'' nodes. By co-locating computation and data in physical space, the WaveCache minimizes long-wire, high-latency communication.
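
To make the dataflow execution model concrete, the following is a minimal sketch of the general dataflow firing rule: an instruction fires as soon as all of its operands have arrived, then sends its result directly to its dependents, with no program counter involved. It is a toy illustration, not the WaveScalar ISA. The names `Instruction`, `run`, and the token format are hypothetical, and the sketch deliberately omits memory ordering and the physical placement of instructions in the WaveCache.

```python
from collections import deque
import operator


class Instruction:
    """Toy dataflow instruction (hypothetical): fires once every operand slot is filled."""

    def __init__(self, name, op, arity, consumers=()):
        self.name = name
        self.op = op
        self.arity = arity
        # Each consumer is (target instruction name, operand slot on the target).
        self.consumers = list(consumers)
        self.operands = {}

    def ready(self):
        return len(self.operands) == self.arity

    def fire(self):
        # Consume the operands in slot order and clear the slots for reuse.
        args = [self.operands[i] for i in range(self.arity)]
        self.operands = {}
        return self.op(*args)


def run(program, tokens):
    """Deliver tokens until none remain; collect results of instructions with no consumers."""
    insts = {inst.name: inst for inst in program}
    pending = deque(tokens)  # each token is (target name, operand slot, value)
    outputs = {}
    while pending:
        target, slot, value = pending.popleft()
        inst = insts[target]
        inst.operands[slot] = value
        if inst.ready():
            result = inst.fire()
            if not inst.consumers:
                outputs[inst.name] = result
            # Explicit producer-to-consumer communication.
            for consumer, consumer_slot in inst.consumers:
                pending.append((consumer, consumer_slot, result))
    return outputs


# Dataflow graph for (a + b) * (a - b) with a = 7, b = 3.
program = [
    Instruction("add", operator.add, 2, [("mul", 0)]),
    Instruction("sub", operator.sub, 2, [("mul", 1)]),
    Instruction("mul", operator.mul, 2),
]
tokens = [("add", 0, 7), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 3)]
result = run(program, tokens)
print(result)  # {'mul': 40}
```

Because firing depends only on operand arrival, delivering the same input tokens in a different order yields the same result in this sketch, which is the property that lets a dataflow machine exploit parallelism without a central fetch unit.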