Mosaic

Developing power-efficient coarse-grained reconfigurable architectures and tools

Department of Computer Science and Engineering
University of Washington

Summary

Coarse-grained reconfigurable architectures (CGRAs) have the potential to offer performance approaching that of an ASIC with flexibility, within an application domain, similar to that of a digital signal processor. In the past, CGRAs have been encumbered by challenging programming models that are either too far removed from the hardware to offer reasonable performance or bury the programmer in the minutiae of hardware specification. Additionally, the ratio of performance to power has not been compelling enough to overcome the hurdles of the programming model and drive adoption.

The goal of our research is to improve the power efficiency of a CGRA at an architectural level, with respect to a traditional island-style FPGA. Additionally, we are continuing previous research into a unified mapping tool that simplifies the scheduling, placement, and routing of an application onto a CGRA.

Overview

Reconfigurable computing architectures provide large numbers of tightly integrated processing elements to achieve the performance of custom hardware without sacrificing reprogrammability. These "micro-parallel" architectures have yet to be adopted for general computation for two main reasons: First, they do not execute sequential code efficiently. Second, writing programs for micro-parallel execution is an arcane art far removed from the experience of most programmers.

The first problem has been addressed by a hybrid computing model that integrates a sequential processor with a spatial fabric such as an FPGA. Such hybrid processors have become prevalent with recent FPGAs like the Altera Stratix and the Xilinx Virtex. While hybrid computers address the issue of combining sequential and spatial computation, integrating sequential and spatial code in a single application, and especially programming the spatial fabric, remains challenging. Part of the difficulty lies in the lack of an agreed-upon computational model and family of programming languages.

To address this challenge, we have developed a type architecture[1] that extends the familiar von Neumann model to include the micro-parallel engine in hybrid architectures. This hybrid type architecture provides an abstraction for programmers that allows them to understand the essential features and constraints of the underlying hybrid system without being overwhelmed by second-order details.

In addition to a mental model, programmers need a language that allows them to address key features of the model. We propose a language based on C called Macah, with extensions that reflect our proposed type architecture. Programmers can take advantage of the features of the type architecture using the language extensions provided by Macah.

Compiler Backend

Compiling Macah programs to hybrid sequential/micro-parallel architectures presents a real challenge. We divide this compilation task into front-end transformations and back-end transformations. Front-end transformations are used to create large, parallel control-dataflow graphs from the program description, partitioning the computation into sequential and micro-parallel sections. Sequential code is compiled using standard processor compilers, while the micro-parallel code is mapped to a spatial fabric by the compiler back-end. Back-end transformations perform detailed scheduling, pipelining, and time-multiplexing of the hardware resources while placing and routing the resulting circuit onto the configurable fabric.
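The scheduling and time-multiplexing step can be illustrated with a small sketch. It is not the Mosaic back-end; it is a generic resource-constrained list scheduler, written in Python, that assumes every operation takes one cycle and that all functional units are interchangeable. The function name list_schedule and the operation names (m1, m2, add, m3) are hypothetical and chosen only for illustration.

```python
def list_schedule(ops, deps, num_units):
    """Assign each operation a (cycle, unit) slot.

    ops:       operation names in priority order; each takes one cycle (assumption).
    deps:      dict mapping an operation to the operations that produce its inputs.
    num_units: number of interchangeable functional units available per cycle.
    """
    ops = list(ops)
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        # An operation is ready once every producer has been placed in an
        # earlier cycle (everything already in `schedule` is from an earlier cycle).
        ready = [op for op in ops
                 if op not in schedule
                 and all(p in schedule for p in deps.get(op, ()))]
        if not ready:
            raise ValueError("dependency cycle or unknown producer")
        # Time-multiplexing: only num_units operations fit in this cycle;
        # the rest wait and reuse the same units in a later cycle.
        for unit, op in enumerate(ready[:num_units]):
            schedule[op] = (cycle, unit)
        cycle += 1
    return schedule


# Dataflow graph for z = ((a * b) + (c * d)) * e
OPS = ["m1", "m2", "add", "m3"]
DEPS = {"add": ["m1", "m2"], "m3": ["add"]}

two_units = list_schedule(OPS, DEPS, num_units=2)
one_unit = list_schedule(OPS, DEPS, num_units=1)
```

With two units the two multiplies run side by side and the graph finishes in three cycles; with one unit the same graph is time-multiplexed onto a single unit and takes four. A real back-end would also account for operation latency, placement, and routing, which this sketch ignores.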

Reconfigurable Architectures

Although many current reconfigurable architectures conform to the hybrid type architecture, they may not be the most efficient way to implement it. For example, the size and organization of the workspace memory is crucial to the performance of the spatial fabric. Moreover, the interface between the micro-parallel engine and the sequential processor and memory may constrain both performance and the range of applications that can be hosted by the hybrid system. Finally, fine-grained reconfigurable fabrics like FPGAs will become increasingly inefficient with respect to power and area for many types of computations. We are investigating a class of coarse-grained reconfigurable fabrics that implement our type architecture efficiently and support the compilation process, especially hardware virtualization. This work leverages our previous experience with the RaPiD and PipeRench coarse-grained architectures. Our architecture-independent compiler back-end will enable us to explore a range of different architectures: we can retarget the compiler simply by providing a detailed architectural description of the reconfigurable fabric.

Micro-Parallelism

We use "micro-parallel" to describe both computations and architectures. The defining features of micro-parallel computations are:

Note that programs can have several micro-parallel sections embedded within control-flow dominated code, which drives the need for hybrid architectures. The defining features of micro-parallel architectures are:

Micro-parallel computations can be mapped to efficient spatial implementations whose static structure reflects the structure of the computation: Parallel operations are executed by different, dedicated, functional units, communication between operations is performed by dedicated wires and registers, and repetition means that the same structure can be reused many times for different data.

Computations that run efficiently on systolic arrays are good examples of micro-parallel computations, and systolic arrays are good examples of micro-parallel engines. FPGAs, as well as various FPGA cousins (for example RaPiD and PipeRench), are "configurable" micro-parallel engines that can be restructured dynamically to execute different micro-parallel computations. The vector unit of a vector processor might also be considered a micro-parallel engine, but we do not consider the datapath of a superscalar processor to be a micro-parallel engine, because the number of function units is relatively small and cannot be dedicated, the communication between function units is not scalable, and such processors utilize significant control resources to maintain the appearance that instructions execute sequentially.
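As a concrete illustration of a computation whose static structure mirrors a spatial implementation, the following Python sketch models a transposed-form FIR filter cycle by cycle, in the spirit of a systolic array. It is a simplified software model under stated assumptions, not a description of any Mosaic, RaPiD, or PipeRench design: each processing element (PE) owns one multiplier, one adder, and one register, the input sample is broadcast to all PEs, and the name fir_spatial is hypothetical.

```python
def fir_spatial(samples, weights):
    """Cycle-by-cycle model of a transposed-form FIR filter.

    PE k holds the fixed weight weights[k] and one register acc[k].
    In every cycle each PE computes
        acc[k] = acc[k + 1] (value from the previous cycle) + weights[k] * x
    so partial sums move one PE toward the output per cycle.
    """
    n = len(weights)
    acc = [0] * n            # one dedicated register per PE
    out = []
    for x in samples:        # one loop iteration models one clock cycle
        nxt = [0] * n
        for k in range(n):   # all PEs fire in parallel in hardware
            upstream = acc[k + 1] if k + 1 < n else 0  # dedicated wire from PE k+1
            nxt[k] = upstream + weights[k] * x
        acc = nxt            # registers update together at the clock edge
        out.append(acc[0])   # PE 0 drives the filter output
    return out


impulse_response = fir_spatial([1, 0, 0, 0], [2, -1, 3])
running_sum = fir_spatial([1, 2, 3, 4], [1, 1])
```

The inner loop over k stands in for functional units that operate simultaneously, the acc list stands in for the registers between them, and the outer loop shows the same fixed structure being reused for every new input sample.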

There is an important distinction between micro-parallelism and task-, process-, or thread-level parallelism[2]. Algorithms that can be implemented to take advantage of micro-parallelism can, in most cases, be implemented in a task-parallel style as well. However, multiprocessor architectures have so much overhead relative to micro-parallel architectures that a micro-parallel implementation is much cheaper, in terms of both dollars and energy, for equivalent performance. In some cases, the close, fine-grained communication between the operations in a micro-parallel computation precludes the efficient use of a task-parallel architecture like a multiprocessor. Many computations exhibit both task- and micro-parallelism, and for those it is appropriate to build a multiprocessor from the hybrid compute nodes that we describe here. Architectures such as the Cray XD1, SCORE, and Merrimac focus on both task- and micro-parallelism.


[1] A type architecture is an abstract model of a family of computers.

[2] Micro-parallelism encompasses traditional instruction-level parallelism (ILP).


UW Embedded Research Group
Last modified: Wed Jun 7 16:38:57 PDT 2006