|
|
|||||||||||||
|

Compiling to hybrid sequential/micro-parallel architectures presents some new and interesting challenges. We divide the Compilation task up into front-end transformations and back-end transformations. We are currently working on a compiler called MacahC that takes Macah code and generates code for micro-parallel architectures.
Many micro-parallel architectures offer special resources for transfering data efficiently, such as DMA engines. Because Macah has streaming constructs built in, taking advantage of these resources is relatively easy. MacahC analyses memory stream accessor functions to determine whether they fit the constraints of the target architecture and if so, replaces corresponding calls to the spawn functions with code to configure the architecture's special memory access features. Since our current target is an FPGA, we can generate custom hardware for each memory accessor function in a program.
It is also possible to automatically analyze memory access patterns and synthesize DMA commands, or configurations for similar streaming engines. Since Macah has streaming construct built in, we consider such analyses more of a convenience than a necessity, and have left them as future work.
Several researchers have observed that converting programs to static single assignment (SSA) form is a useful first step in translating a program to a spatial implementation (for example, 1). Our compiler follows their lead and converts all code within micro-parallel blocks into SSA form before applying any other transformations. In addition to making many conventional optimizations easier \cite{cytron1991}, conversion to SSA form makes immediately evident where multiplexors and registers are needed in a synchronous sequential circuit implementation of a code block.
Though it would result in an inefficient implementation, we can translate directly from the initial SSA version of a code block to a circuit, by turning each operator in each node of the SSA control flow graph into a combinational circuit, and the graph itself into a finite state machine that controls the activity of the rest of the circuit. Although they do not fit within the HMP model, even features like calls to sequential functions can be supported, as described in \cite{Budiu2002a}.
One of the few optimizations currently performed by our compiler that is not strictly necessary is if-conversion. Within micro-parallel blocks, all simple if-then-else and switch statements are converted to straight-line code with multiplexors to choose results based on the branch condition. Operations with side-effects, such as stream sends and receives must also be predicated on the branch condition, in order to ensure that operations that are not supposed to take place have no effect.
Finally, our compiler must generate ``glue code'' that causes the transfer of control from the processor to the micro-parallel engine and back again. This includes copying shared data to the micro-parallel engine before control is transfered, and copying shared data back when control is returned. Of course, only ``dirty'' data needs to be copied. Typically it is more convenient to initialize data in the micro-parallel engine via this shared memory protocol, and more convenient to transfer the high-bandwidth data used by the micro-parallel engine via memory streams. Shared data is transferred automatically, while the user-defined streams are allocated, configured and initiated by the runtime system.
|
UW
Embedded Research Group Last modified: Wed Jun 7 16:38:57 PDT 2006 |