SPR: Schedule, Place and Route for Coarse-Grained Reconfigurable Architectures

Backend Transformations
The high-level synthesis community has made considerable progress in
synthesizing hardware from high-level programming languages such as C or
Java. However, high-level synthesis assumes that an arbitrary hardware
structure can be built, and it uses this freedom to find a solution that
optimizes some combination of cost and performance. Although this work
applies to FPGA-based architectures, it does not translate well to
coarse-grained adaptable architectures, which present a very constrained
computational substrate.

Compiling to an adaptable architecture also shares some aspects with
compiling to a traditional VLIW architecture: operations specified in the
program must be scheduled onto a fixed number of function units, and data
operands and results must be transferred between function units via
registers. However, compiling to an adaptable architecture is much more
difficult:
- Adaptable architectures have many more function units than a VLIW,
  typically an order of magnitude more. The compiler must discover and
  expose enough parallelism to use these function units efficiently, and
  the scheduler must assign operations to them in both space and time.
- Data communication is much more constrained in an adaptable architecture,
  because a general interconnection network such as a crossbar or a
  multi-ported register file is far too expensive for so many function
  units. Thus the compiler must schedule both the function units and the
  wires.
- Enough data bandwidth must be provided to keep all function units
  operating in parallel. Instead of centralized, multi-ported register
  files, registers are distributed throughout the datapath to achieve high
  bandwidth, locality of reference, and low power. In addition, small
  memories distributed throughout the datapath serve as local data caches
  that supply data close to where it is used. The compiler must use these
  registers and memories effectively.
- Control in a coarse-grained reconfigurable architecture is typically very
  constrained, and the compiler must generate solutions that can be
  implemented with the given control architecture.

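To make the point about scheduling wires as well as function units concrete, here is a minimal sketch, assuming a modulo-scheduled loop with initiation interval II (the number of cycles between the starts of successive iterations). Both function units and wires are entered as resources in a modulo reservation table. The resource names (`alu0`, `alu1`, `bus0`), the one-cycle result latency, and the first-fit search are hypothetical illustrations, not the SPR implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ModuloReservationTable:
    """Records which resource is busy in each cycle slot modulo II.

    Hypothetical sketch: resources include both function units and wires,
    because the compiler must schedule both.
    """
    ii: int  # initiation interval
    busy: dict[tuple[str, int], str] = field(default_factory=dict)

    def is_free(self, resource: str, cycle: int) -> bool:
        return (resource, cycle % self.ii) not in self.busy

    def reserve(self, resource: str, cycle: int, op: str) -> None:
        assert self.is_free(resource, cycle), f"{resource} busy at {cycle}"
        self.busy[(resource, cycle % self.ii)] = op


def schedule(ops, fus, wire, ii, latency=1, max_cycle=32):
    """First-fit: put each op on a free function unit and reserve the wire
    that carries its result `latency` cycles later.

    Returns {op: (function_unit, cycle)}, or None if some op cannot be
    placed at this II (a modulo scheduler would then retry with II + 1).
    """
    mrt = ModuloReservationTable(ii)
    placement = {}
    for op in ops:
        for cycle in range(max_cycle):
            fu = next((f for f in fus if mrt.is_free(f, cycle)), None)
            if fu is not None and mrt.is_free(wire, cycle + latency):
                mrt.reserve(fu, cycle, op)
                mrt.reserve(wire, cycle + latency, op)
                placement[op] = (fu, cycle)
                break
        else:  # no cycle worked for this op at this II
            return None
    return placement
```

With two ALUs there are four function-unit slots at II = 2, but the single wire `bus0` has only two, so `schedule(["a", "b", "c", "d"], ["alu0", "alu1"], "bus0", ii=2)` returns `None`. At `ii=4` all four operations fit, on `alu0` in cycles 0 through 3. The wire, not the function units, limits throughput.
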
Our approach treats the coarse-grained configurable substrate as a very
large-scale, very constrained VLIW architecture, and combines traditional
VLIW scheduling techniques based on iterative modulo scheduling with the
place and route algorithms used for configurable architectures. The
compiler front end transforms programs written in the Macah high-level
language into a control/dataflow graph, which is then scheduled onto the
configurable datapath. The scheduling problem is formulated as a place and
route problem that maps dataflow graphs from the program's control/dataflow
graph onto a computing substrate comprising multiple instances of the
datapath unrolled in time. By casting scheduling as place and route, many
difficult, interacting subproblems imposed by the configurable architecture
can be solved simultaneously. These include resource allocation and
scheduling, register allocation and data transfer, efficient time
multiplexing of hardware resources, pipelining and retiming, and
satisfaction of configurable control constraints.

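Routing on a datapath unrolled in time can be illustrated with a small sketch. It assumes a hypothetical datapath description in which `links` lists the resources each resource can drive in one cycle (a register that can hold its value lists itself) and `cost` gives a per-cycle cost for occupying each resource. The sketch finds the cheapest path for one value from a producing unit at cycle `t_src` to a consuming unit at cycle `t_dst`, skipping slots already claimed by other values. Holding a value in a register (register allocation) and moving it over a link (data transfer) become the same step: an edge into the next time layer. This is a single-net shortest-path step under those assumptions, not the SPR router; a full router would handle many nets and congestion, and a modulo schedule would also fold cycles modulo II.

```python
import math


def route(links, cost, src, dst, t_src, t_dst, occupied=frozenset()):
    """Min-cost route of one value through the datapath unrolled in time.

    links:    resource -> resources reachable in one cycle (hypothetical).
    cost:     resource -> cost of occupying it for one cycle (hypothetical).
    occupied: (resource, cycle) slots already used by other values.
    Returns the list of (resource, cycle) slots, or None if unroutable.
    """
    # slot -> (cost so far, predecessor slot); every edge advances one cycle,
    # so the time-unrolled graph is layered and a forward sweep suffices.
    best = {(src, t_src): (0, None)}
    for t in range(t_src, t_dst):
        for (r, cyc), (c, _) in list(best.items()):
            if cyc != t:
                continue
            for nxt in links.get(r, []):
                slot = (nxt, t + 1)
                if slot in occupied:
                    continue
                new_cost = c + cost[nxt]
                if new_cost < best.get(slot, (math.inf, None))[0]:
                    best[slot] = (new_cost, (r, cyc))
    if (dst, t_dst) not in best:
        return None
    path, slot = [], (dst, t_dst)
    while slot is not None:  # walk predecessors back to the source
        path.append(slot)
        slot = best[slot][1]
    return path[::-1]


# Hypothetical datapath: alu0 writes reg0 or reg1; both registers can hold
# their value and both drive alu1. reg1 is assumed to be more expensive.
links = {"alu0": ["reg0", "reg1"], "reg0": ["reg0", "alu1"], "reg1": ["reg1", "alu1"]}
cost = {"reg0": 1, "reg1": 2, "alu1": 0}
```

`route(links, cost, "alu0", "alu1", 0, 3)` holds the value in `reg0` for two cycles. With `occupied={("reg0", 2)}` the same call detours through the more expensive `reg1`, and with both registers blocked at cycle 2 it returns `None`.
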
Because the architecture-dependent factors are isolated in the objective
function, the scheduling algorithms themselves can be
architecture-independent. We will use this property to explore different
architecture parameters and their effect on the compiler, for example the
tradeoff between interconnect resources and control complexity. It will
also make it easier to retarget the compiler to other coarse-grained
configurable architectures.
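The separation of an architecture-independent search from an architecture-dependent objective can be sketched as a placement routine that takes the architecture as a parameter. The two architecture models (`LinearArray`, `Crossbar`), their cost formulas, and the exhaustive search are hypothetical stand-ins chosen to keep the example small; a realistic placer would use a heuristic search and a much richer cost model.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Protocol


class Architecture(Protocol):
    """The only interface the search sees: a cost for each dataflow edge."""
    def edge_cost(self, a: int, b: int) -> float: ...


@dataclass(frozen=True)
class LinearArray:
    """Hypothetical: sites in a row; cost grows with wire length."""
    hop_cost: float = 1.0

    def edge_cost(self, a: int, b: int) -> float:
        return self.hop_cost * abs(a - b)


@dataclass(frozen=True)
class Crossbar:
    """Hypothetical: any site reaches any other site at a flat cost."""
    flat_cost: float = 1.0

    def edge_cost(self, a: int, b: int) -> float:
        return 0.0 if a == b else self.flat_cost


def place(ops, edges, n_sites, arch: Architecture):
    """Architecture-independent search: try every assignment of ops to
    distinct sites and keep the first one with the lowest cost. Only
    arch.edge_cost knows anything about the hardware."""
    best, best_cost = None, float("inf")
    for sites in permutations(range(n_sites), len(ops)):
        p = dict(zip(ops, sites))
        c = sum(arch.edge_cost(p[u], p[v]) for u, v in edges)
        if c < best_cost:
            best, best_cost = p, c
    return best, best_cost
```

For operations `a`, `b`, `c` with edges `a -> c` and `b -> c` on three sites, `place(..., LinearArray())` puts `c` on the middle site (`{"a": 0, "b": 2, "c": 1}`), while under `Crossbar()` every distinct placement costs the same and the first one found, `{"a": 0, "b": 1, "c": 2}`, is kept. The search code is unchanged; only the cost model differs.
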