Embedded System Synthesis
(Borriello)
Automated synthesis of the hardware and software elements of embedded
controllers is critical to rapidly evaluating the many design choices
faced by designers of task-specific computing and communication devices.
We are currently developing a synthesis system, as part of the Chinook
project, that will automatically generate many of the low-level software
needed to coordinate the elements of an embedded system including detailed
scheduling to meet real-time constraints. Thus, our principal interest
is to make embedded software more portable by automatically generating
the most error-prone and system-specific portions of the software. This
includes the automatic customization of generic device drivers and
application-specific, real-time kernels to execute the software on
multiple processors.
Component-Based Design
(Borriello)
Modern software design involves the reuse of a variety of existing components.
The elements encapsulate functionality associated with a particular task
such as: a web browser, management of a wireless connection, user interface
widgets, etc.. Currently, these components are provided to designers via
an API abstraction for a particular platform or system architecture. This
style supports data-composition well (i.e., the transfer of data packets in a
data-flow style) but does not do as well for control-composition (where
the internal state changes of one component must also effect others). We are
developing a specification methodology that supports both data and control
composition at a high-level of abstraction and supports the automatic
synthesis of all the coordinating code (that handles communication,
synchronization, and consistency between the components). Our goal is to
be make software components more easily reusable and retargetable and
optimize them to different system architectures (e.g., use the same
specification to generate a family of applications including those that may
run solely on a workstation, as part of an embedded system, or distributed
over a network). We are currently developing an integrated development
environment that supports this style of specification and software design.
Information Appliances and Wearable Computing
(Borriello)
The future of computing, for the majority of people, is not going to
be on the desktop. Rather, it will be in devices unobtrusively embedded
in their homes, cars, and carried on their person. There are interesting
issues in constructing such devices that related to their connectivity,
user input/output modalities, and power sources. We are actively working
on speech recognition and synthesis (as may be used in a cellular phone
that connects to information services on the web) and on uses of small
personal digital assistants in home and automobile environments (for
example, to remote control home appliances and utilize GPS information
for coordinating routes and modes of transportation). Our aim is to
prototype interesting new devices and use them as drivers for our work
in component-based design and embedded system synthesis.
Silicon Neuroscience
(Diorio [home page, research page])
Nervous tissue (e.g. the brain) computes using hardware and
algorithms that are fundamentally different from our digital
computers. Our research group believes that we can learn new
principles of computing by studying neurobiology. Our goal is
to investigate the foundations of neuronal computation, and to
build silicon integrated circuits that compute using these
principles. We call our approach silicon neuroscience: the
development of neurally inspired silicon-learning systems.
We have already developed, in a standard CMOS process, a
family of single-transistor devices that we call synapse
transistors. Like neural synapses, synapse transistors
learn locally. And they do so at biologically realistic
timescales. Although we do not believe that a single
transistor can model the complex behavior of a neural
synapse completely, our synapse transistors do learn
from their inputs. Using them, we intend to develop
silicon systems that learn from their environment.
Pipelined-Systolic RADAR Signal Processing
(Diorio,
Sahr
(Electrical Engineering))
Researchers in the Radar Remote Sensing Laboratory
(rcs.ee.washington.edu/SPP/)
are building
a passive, bistatic radar for real-time measurements
of ionospheric fluctuations. This RADAR requires a
processing throughput of 10-100GOPS for a two-receiver
system; the computational burden increases as the square
of the number of receivers (additional receivers enable
angle-of-arrival estimation and target imaging).
We are optimizing the signal-processing algorithms
for integrated circuit implementation, and we are
developing custom, pipelined-systolic CMOS ICs with
the requisite throughput to enable real-time
ionospheric measurement.
Action-Potential Based Computing
(Diorio,
Atlas
(Electrical Engineering))
Conventional digital-logic systems communicate using
discrete, binary voltages, transmitted on passive metal
wire, and are synchronized by a system clock. By contrast,
neurons, the brain's primary computing elements, communicate
using millisecond-long impulse-like voltage waveforms
called action potentials, transmitted on nondispersive
active wire (called axon), and are not clocked. Recent
neurophysiological data lends credence to computing models
that use action-potential timing, and tuned wire delays,
to enable self-timed signal processing. We are investigating
the relationship between delay-based computing and
time-frequency analysis, and are beginning to develop
hardware models for computing using delta-function
impulses and wire delays.
A Configurable ASIC Compiler for Compute-Intensive Applications
(Ebeling)
ASICs can provide a large price-performance advantage over DSPs for
many compute-intensive applications. But ASIC technology
is not appropriate for applications that present a moving target in
terms of evolving standards, multiple uses, or algorithmic
improvements. For such applications, the implementation must be
reprogrammable to avoid premature obsolescence.
We are working on a hardware compiler that generates configurable
ASICs from high-level descriptions of compute-intensive tasks. A
configurable ASIC uses a combination of programmed control and static
reconfiguration of hardware resources to achieve flexibility without
compromising performance. Depending on the intended application, the
configurable ASIC can be built with more or less flexibility depending
on the required range of in situ reprogrammability.
Field-Programmable Gate Arrays
(Ebeling)
Reconfigurable hardware is rapidly increasing in popularity and complexity.
Our research in FGPAs falls in three different categories.
First, we have developed and patented a new FPGA architecture,
called Triptych, that achieves 2 to 4 times higher logic density than
existing commercial FPGAs.
Second, we have developed a set of architecture-independent,
performance-driven mapping tools for exploring new FPGA architectures.
These tools have been retargeted to a new IBM FPGA architecture,
where they have achieved better performance and density than industry tools.
Third, we are now investigating a new coarse-grained FPGA architecture
intended for computation-intensive applications,
such as digital signal processing.
We expect to achieve an order of magnitude improvement in density over
current FPGA architectures.
Architectural Retiming
(Ebeling)
Pipelining is probably the most common technique for improving system
performance, and retiming can be used to automatically optimize the
performance obtained by pipelining.
However, in many situations pipelining and retiming are not possible,
either because of latency constraints or because a feedback cycle makes
pipelining impossible.
We are investigating a set of techniques that allow pipelining in these
cases by changing the architecture of the circuit.
Whereas retiming preserves the function of the circuit exactly on a cycle
by cycle basis,
changing only the position of registers in the circuit,
architectural retiming relies on logic transformations,
preserving functionality only in terms of input/output sequences.
We are developing a CAD tool called ART which applies these transformations,
relying initially on the designer to evaluate and choose those that achieve
the best performance.
Chaos Routing
(Snyder,
Ebeling, Bolding)
The Chaos router is a randomizing, non-minimal adaptive packet router
that has been shown to have higher throughput and lower latency
than the state of the art oblivious routers for both two dimensional
(e.g., mesh and torus) networks and hypercube networks
on uniform and non-uniform traffic patterns.
Randomization is crucial,
since it allows the Chaos router to be naturally livelock free
without any special circuitry
(i.e., packets do not circulate forever).
The Chaos router has been implemented in 1.2u CMOS with a speed of 66MHz,
the speed limit for this technology, using a pipelined design.
Research has included processor/network interface designs,
characterizations of Chaos routing under various workloads,
studies of fault tolerance issues in adaptive routing,
theoretical characterizations of routing algorithms,
and a general study of high performance routing algorithms.
Currently, the 0.6u Chaos chip is back from fab, and being
incorporated into a Chaos LAN design.
Click
here for more information.
Memory Hierarchy Studies
(Baer)
Processor speed has steadily improved during the last decade but
memory latency and bandwidth have been progressing
at a slower pace. Several memory organizations,
as well as techniques to reduce or tolerate
high memory latencies are investigated for single as well
as multiprocessor systems.
Particular projects include:
(1) monitoring the execution characteristics of "real-world" (i.e., Wintel)
applications for their potential optimizations in (maybe) non-conventional
memory systems
(2) code and data reorganization for improving instruction
and data cache, TLB and Branch target buffer efficiencies.
(3) design and evaluation of user-primitives for
applications running on
clustered shared-memory multiprocessors,
Simultaneous Multithreading
(Eggers,
H. Levy)
Simultaneous multithreading (SMT) is a processor design that
permits several independent threads to issue instructions to a (wide)
superscalar's functional units in a single cycle.
Its microarchitecture is a fairly straightforward extension of today's
wide-issue, dynamically scheduling processors,
which should ease its potential transition in the commercial marketplace.
Our experiments to date have shown that,
relative to several other architectures that are also competing to be the
design paradigm in the next few years,
SMT has higher program speedups and useful instruction throughput on a
variety of multi-threaded workloads,
while barely impacting single-thread performance.
Our early studies exposed the performance advantages of simultaneously
issuing instructions from several threads and dynamically sharing almost
all hardware resources among those threads.
Much of this research prompted advances in architectural techniques
(examples are effective multithread instruction fetch hardware and
compiler- and operating systems-directed support for early remapping of
renaming registers)
and using existing software techniques in a different manner
(examples include applying different policies for OS page mapping and
several compiler optimizations).
Our future research will investigate using SMTs in a parallel processing
environment, including the development of SMT-specific synchronization
mechanisms, revisiting operating policies that drive SMT's thread-shared
hardware resources and expanding our experimental workload to include
real, threaded applications.
Our first foray into the latter area was an investigation of SMT
performance on commercial workloads.
Chaos Routing
(Snyder,
Ebeling, Bolding)
The Chaos router is a randomizing, non-minimal adaptive packet router
that has been shown to have higher throughput and lower latency
than the state of the art oblivious routers for both two dimensional
(e.g., mesh and torus) networks and hypercube networks
on uniform and non-uniform traffic patterns.
Randomization is crucial,
since it allows the Chaos router to be naturally livelock free
without any special circuitry
(i.e., packets do not circulate forever).
The Chaos router has been implemented in 1.2u CMOS with a speed of 66MHz,
the speed limit for this technology, using a pipelined design.
Research has included processor/network interface designs,
characterizations of Chaos routing under various workloads,
studies of fault tolerance issues in adaptive routing,
theoretical characterizations of routing algorithms,
and a general study of high performance routing algorithms.
Currently, the 0.6u Chaos chip is back from fab, and being
incorporated into a Chaos LAN design.
A Configurable ASIC Compiler for Compute-Intensive Applications
(Ebeling)
ASICs can provide a large price-performance advantage over DSPs for
many compute-intensive applications. But ASIC technology
is not appropriate for applications that present a moving target in
terms of evolving standards, multiple uses, or algorithmic
improvements. For such applications, the implementation must be
reprogrammable to avoid premature obsolescence.
We are working on a hardware compiler that generates configurable
ASICs from high-level descriptions of compute-intensive tasks. A
configurable ASIC uses a combination of programmed control and static
reconfiguration of hardware resources to achieve flexibility without
compromising performance. Depending on the intended application, the
configurable ASIC can be built with more or less flexibility depending
on the required range of in situ reprogrammability.
WebOS: Operating System Services for Wide Area Applications
(T. Anderson)
WebOS provides OS services to wide-area applications, including
mechanisms for resource discovery, a global namespace, remote process
execution, resource management, authentication, and security. On a
single machine, application developers can rely on the local operating
system to provide these abstractions. In the wide area, however,
application developers are forced to build these abstractions themselves
or to do without. This ad-hoc approach wastes programmer effort and
system resources. To address these problems, WebOS provides basic
operating systems services needed to build applications that are
geographically distributed, highly available, incrementally scalable,
and dynamically reconfiguring. An application that demonstrates the
utility of WebOS is Rent-A-Server, a web server capable of dynamically
replicating itself geographically in response to client access patterns.
CRISIS: A Wide Area Security Architecture
(T. Anderson)
CRISIS re-examines wide area authentication and access control
from the perspective of programmable control of remote resources.
CRISIS explores the systematic application of a number of design
principles to the problem of wide area security: redundancy to eliminate
single points of attack, caching to improve performance and availability
over slow and unreliable wide area networks, fine-grained capabilities
and roles to enable lightweight control of privilege,
and complete local logging of all evidence used to
make each access control decision.
Self-Tuning Network File Systems
(T. Anderson)
File system designs have long been driven by changes in the cost and
performance of the underlying hardware. A designer must consider the
relative cost per byte of memory versus disk, the relative performance
of the CPU versus a network access versus a disk access, the relative
magnitudes of seek time, rotational delay, and disk bandwidth, not to
mention changes in the workload placed on the file system. These
changes pose a tremendous challenge to file system designers. Although
it may be possible to design a system that is efficient for today's
hardware and workload patterns, file system implementations are often
used for decades, long after their design assumptions are no longer valid.
In this work, we are exploring a new design principle for file systems,
called self-tuning. A self-tuning system (1) measures the physical
characteristics of the underlying hardware, (2) measures the workload
placed on the file system, and (3) adapts the file system behavior to
match. Building a self-tuning system requires new algorithms that monitor
the environment and adjust behavior appropriately. The alternative to
self-tuning is building a file system for a fixed point on the moving
target of hardware and workload characteristics, and either living with
the resulting system well past its applicability or rebuilding
the system from scratch every few years.
Memory System Analysis
(Baer,
Bershad, Karlin,
H. Levy)
Our research group is investigating techniques that use the
operating system to improve memory system performance. We rely on
a combination of simple hardware support and operating system
modifications to monitor the dynamic behavior of applications.
These monitoring mechanisms incur a small overhead at runtime, but
the information they collect can be used to identify sources of
memory system delays such as cache misses and TLB misses. By
identifying and resolving these bottlenecks, we not only pay for the
overhead of the monitoring mechanisms, but also significantly
improve overall system performance. Click
here
SPIN -- An Extensible Microkernel for Application-Specific
Operating System Services
(
Bershad,
Chambers, Eggers)
Application domains, such as multimedia, databases and parallel
computing, require operating system services with high performance
and high functionality. Existing operating systems provide fixed
interfaces and implementations to system services and resources.
This makes them inappropriate for applications whose resource demands
and usage patterns are poorly matched by the services provided.
The SPIN operating system enables system services to be defined in an
application-specific fashion through an extensible microkernel.
It offers fine-grained control over a machine's logical and physical
resources to applications through run-time adaptation of the system to
application requirements. The application-specific services are
installed into the kernel at run-time. A trusted compiler and
pointer-safe language run-time environment ensure that the installed
sequences do not violate system integrity. The system runs on DEC ALPHA
workstations and Intel PCs.
Click here for
more information.
Instrumentation and Optimization of WIN32/Intel Executables with Etch
(Bershad,
H. Levy)
We have developed an application program performance evaluation and
optimization system called Etch for Intel x86 platforms running the
Windows/NT operating system. The system allows you to annotate
existing binaries with arbitrary instructions (for example, to trace, or
perform coverage analysis), or to rewrite an existing binary so that
it executes more efficiently. Etch works directly on Win32 x86
executables. It does not require program source code for either
measurement or optimization. Click here for more details.
Large Scale Mail Services
(Bershad,
H. Levy, T. Anderson)
To better understand the types and structures of services appropriate
for large scale clusters, we are developing Porcupine, a highly available, highly scalable, low cost cluster-based mail
server. The goal of Porcupine is to provide mail service for 100 million
users, or roughly one billion mail messages per day. at a cost of
$3/year/user. Porcupine is being developed on a PC-based cluster.
Kimera -- A Factored Java Architecture
(Bershad)
We are implementing a new Java architecture based on
factored components for security, performance, and scalability.
By locating crucial Java Virtual Machine services at
trust-domain boundaries, such as Intranet
firewalls, we can make safety enforcement mandatory, ease security
management and reduce the processing
requirements of Java endpoints. Under such a centralized security
architecture, the trusted computing base is minimal and consists of
small and simple components whose security can be more readily assured.
SPINE -- A Safe Programmable Integrated Network Environment
(Bershad)
SPINE provides a safe execution environment for applications to
directly compute on the network interface (NI).
Several network interfaces, for example the Myrinet, as well as
devices from the new I2O architecture, provide
the infrastructure necessary to compute on the NI itself. SPINE
uses ideas from the SPIN extensible operating
system (safe language, extensible interfaces, and resource
management) and applies these to the NI, which allows
applications to customize their interaction with the network.
The motivation is to migrate application-specific
functionality directly to the NI, thereby reducing I/O related
data and control transfers to the host system.
Security in Extensible Systems
(Bershad)
Extensible systems, such as SPIN or Java, raise new security
concerns. In these systems, units of code, or extensions, can be
added to a running system in almost arbitrary fashion.
Extensions closely interact through low latency, but type-safe
interfaces to form a tightly integrated system. As extensions
can come from arbitrary sources, not all of whom can be trusted
to conform to an organization's security policy, such structuring
raises the question of how security constraints are enforced in an
extensible system. In the SPIN extensible operating system built
at the University of Washington, we are experimenting with an
access control mechanism to address this problem. Our access control
mechanism decomposes access control into a policy-neutral enforcement
manager and a security policy manager, and it is transparent to
extensions in the absence of security violations. It structures the
system into protection domains, enforces the protection domains through
access control checks, and performs auditing of system operations.
It works by inspecting extensions for their types and operations to
determine which abstraction require protection, and by redirecting
procedure or method invocations to inject access control operations
into the system. By only imposing those dynamic access control
operations that are absolutely necessary, we intend to minimize the
performance overhead of system security. Click
here for more details.
SAAM II Development
(Golde)
SAAM II is a sophisticated computer program for the analysis of certain
biomedical data in the context of a proposed model.
Its development is an interdisciplinary effort,
involving faculty, students, and staff from Computer Science and
Engineering, Bioengineering, and the Applied Physics Laboratory.
The program encompasses both simulation and fitting of parameters to
experimental data.
The equations specifying the model can be algebraic or first order ordinary
linear or nonlinear differential equations.
One of the key elements of the program is that the differential equations
are automatically derived from a proposed graphical model of a physical
system.
The program also performs a statistical analysis of the results.
Design and Analysis of Competitive Algorithms for Online
Problems
(Karlin)
Competitive algorithms for online problems are guaranteed to have
performance which is close to that of the optimal offline performance
on each input. We are in the process of designing and implementing
such algorithms for a number of basic systems problems, including
cache management algorithms for exploiting spatial locality and
bypassing (with Baer), cooperative caching and prefetching in PC
clusters and wider area networks (with Levy), and other resource
allocation problems. Our focus is on the empirical evaluation of
algorithms with provable performance guarantees, with an eye towards
algorithms that adapt to observed behavior patterns. In addition, we
are looking at a number of more theoretical research questions,
revolving around the long-term goal of coming up with general
principles that characterize competitive algorithms for online
problems.
Vector Quantization and Wavelet Image Compression
(Ladner, Riskin (Electrical Engineering))
The use of digital images is rapidly expanding due to the emerging
technologies in numerous fields, such as remote sensing and HDTV.
The aim of this project is to provide state of the art, commercially
available, tools for storing, transmitting, and analyzing image data.
The goal of this project is to develop new algorithms based on VQ and
wavelets for different image processing tasks.
These are document image processing, document image browsing,
and compression for fast transmission of ice information location images.
Opal: A Single-Address-Space Operating System
(H. Levy)
The Opal project is exploring a new operating system structure,
tuned to the needs of complex CAD-style applications,
where many cooperating programs share a large persistent object database.
In Opal, all code and data exist within a single, huge, shared address
space.
Protection is independent of the single address space;
each Opal thread executes within a protection domain that defines which
virtual pages it has the right to access.
The single address space enhances sharing and cooperation,
because addresses have a unique (for all time) interpretation.
Thus, pointer-based data structures can be directly communicated and
shared between programs at any time, and
can be stored directly on secondary storage without the need for translation.
This structure is simplified by the availability of a large address space,
such as those provided by the DEC Alpha, MIPS R4000, HP/PA-RISC, IBM RS6000,
and Sun UltraSPARC.
We are currently building an Opal prototype,
in conjunction with The Boeing Company,
which is testing Opal's structure as a basis for their next-generation
airplane CAD system.
Click here for
more information.
Global Memory Management in Workstation and PC Clusters
(H. Levy,
Karlin)
Our work involves the general issue of managing resources
globally in high-speed local-area networks. The first phase
has been the design and implementation of algorithms that
manage memory globally in a cluster. In our system, called
GMS (for global memory system), active nodes can use available
physical memory pages of "idle" nodes as backing store,
reducing the fetch time on those pages by an order of magnitude.
We have implemented our system at the lowest level of
on the OSF operating system, underneath of both the file and
virtual memory systems. GMS currently runs
on an ATM-connected network of DEC Alpha workstations.
Click here for
more information.
Real-Time Systems
(Shaw)
Our research has been concerned with models, techniques, and tools for
specifying, analyzing, and generating software for real-time
applications. The emphasis is on requirements and design specifications;
fast prototyping; real-time operating systems; real-time programming;
and methods for
describing, predicting, and verifying the timing behavior of programming
languages and systems. Current research includes:
using execution-time bounds and
assertional logic to state and prove timing properties of higher-level
language programs; building software tools that predict the
deterministic execution times of programs; refining, implementing, and
testing a specification method, called communicating real-time state
machines; and studying the use of time-stamped event histories
in real-time programming. Two themes
of the research are the incorporation of time as a first-class
specification and programming object, and the analysis and exploitation
of concurrency.
DDDDrraw - Distributed 3D Real-time Rendering at Washington
(Zahorjan)
Real-time 3D rendering is an essential component of an
increasing number of multimedia applications (e.g., VRML, virtual
reality, and geographical information systems). However, despite recent
performance gains, the capability of a single workstation to render in
real-time is still quite limited. In the DDDDrraw project (pronounced
draw), we are working to improve real-time rendering performance through
the use of distributed rendering on a cluster of workstations. In this
context, improving performance means extending the complexity of the
scenes that can be rendered at a sufficiently fast as well as
predictable frame rate.
Design of Object-Oriented Languages
(Chambers)
Object-oriented languages promise to make programs easier to write, to
modify, and to extend. However, challenges remain in developing
expressive language features that support static, modular reasoning
and typechecking. We have been working on a series of theoretical and
practical language designs, including the Cecil language and its
successor Diesel, with the goal of marrying espressive features like
first-class functions, programmable multiple dispatch, and classless
objects with first-class modules and strong, static, separate
typechecking. We have implemented the Cecil language and written
real, large programs in it, in order to gain practical exprience with
our language designs and to suggest future ideas.
Efficient Implementation of High-Level Languages
(Chambers)
High-level languages can be powerful to program in, but they usually
run more slowly than lower-level languages. We are developing
implementation strategies for optimizing high-level languages,
particularly those with object-oriented features and first-class
functions, in an effort to close the performance gap between
high-level and low-level languages. We are studying static analyses,
at the intraprocedural, interprocedural, and whole-program level;
dynamic profile-guided optimizations; and control flow and data layout
transformations. We also are studying which optimizations can best be
performed at which stage of a program's lifetime (at traditional
compile-time of individual files, or at link-time, or at run-time),
and also how a single optimization can be divided into phases, each
performed at a different stage.
We have implemented many of these ideas in a language-independent compiler framework, named Vortex. Vortex currently includes front-ends for Java, C++, Smalltalk, Modula-3, and Cecil, and we study the effectiveness and cost of our techniques on large programs in each of these languages. In addition, Vortex itself is a 100,000-line Cecil program, providing practical feedback to the object-oriented language design project.
Dynamic Compilation
(Chambers,
Eggers)
Dynamic compilation enables constant-based optimizations on the values
of invariant data computed at run-time. Using the values of these
run-time constants, a dynamic compiler can eliminate their memory
loads, perform constant propagation and folding, remove branches they
determine, and fully unroll loops they bound. However, the
performance benefits of the more efficient, dynamically-compiled code
are offset by the run-time cost of the dynamic compile. Our approach
to dynamic compilation strives for both fast dynamic compilation and
high-quality dynamically-compiled code: the programmer annotates
regions of the programs that should be compiled dynamically; a static
optimizing compiler automatically produces machine-code templates that
have been pre-optimized by an analysis that identifies which variables
will be constant at run-time; and a simple, dynamic compiler copies
the templates, patching in the computed values of the run-time
constants, to produce optimized, executable code. Our work targets
general-purpose, imperative programming languages, initially C.
Initial experiments applying dynamic compilation to C programs have
produced speed-ups up to nearly a factor of 9. In the short term we
plan to extend the framework in several dimensions: to track the flow
of run-time constants across procedure boundaries, to develop special
forms of additional optimizations, such as register allocation and
instruction scheduling, that can be applied quickly at run-time, to
investigate the effect on dynamic compilation performance of different
microarchitecture designs, to
more fully automate the selection of run-time constants and dynamic
regions, and to extend our benchmark suite to include a wide range of
application areas and larger programs.
Compiling for Simultaneous Multithreading
(Eggers,
H. Levy)
Simultaneous Multithreading processors (SMT) is a processor design that
permits several independent threads to issue instructions to a (wide)
superscalar's functional units in a single cycle.
Its microarchitecture is a fairly straightforward extension of today's
wide-issue, dynamically scheduling processors.
Our experiments have shown that, relative to several other architectures
that are competing to be the design paradigm in the next few years,
SMT has higher program speedups and useful instruction throughput on a
variety of multi-threaded workloads,
while barely impacting single-thread performance.
Since an SMT architecture changes some fundamental architectural assumptions
on which several machine-dependent compiler optimizations rely,
we speculated that it also changes the use of these optimizations.
So far our experiments have validated this hypothesis:
loop distribution, loop tiling, and software speculation should all be
applied differently (or not at all!) on an SMT processor than for their
original, targetted machines.
We suspect the same is true for software prefetching, predicated execution and software pipelining.
ZPL: A Parallel Array Language
(Snyder)
ZPL is a portable, high performance parallel programming language
designed for science and engineering applications. Portability
means that ZPL runs well on all contemporary parallel machines,
including both shared memory and message passing machines and
networked clusters. (It has been targetted to many sequential
platforms as well.) High performance means that ZPL programs
exhibit running times comparable to parallel programs written in C
with programmer-specified interprocessor communication. ZPL is
also very convenient, introducing the region concept as a
replacement for the loops and indexing found in traditional
imperative languages. The ZPL compiler performs aggressive code
optimization, and pioneered machine independent communication
optimizations based on the Ironman Interface. The ZPL language and
compiler (released in July 1997) are available from the ZPL home page . Researchers
are developing ZPL applications in many areas of science and
engineering, including software libraries.
Advanced ZPL: A General Purpose Parallel Programming Language
(Snyder)
Advanced ZPL is the parent language of the recently released ZPL
Array Language, a stand alone data parallel programming language.
Advanced ZPL contains ZPL as a proper subset. It also includes
additional facilities making it a general purpose parallel
programming language. A-ZPL's features include support for
irregular and sparse data structures, facilities for user specified
data allocation and load balancing, pipelining, thread and task
parallelism, etc. Advanced ZPL uses the compilation and run-time
technology proved out in the ZPL implementation. These facilities
demonstrated high performance and portability by building on an
underlying abstract machine model (CTA) and an underlying
programming model (Phase Abstractions). Critical research topics
include the seamless integration of the data parallel facilities of
ZPL with the new A-ZPL features, the extension of the region
concept to control parallelism, and the exploitation of locality to
reduce communications costs. Advanced ZPL's features are designed
with close cooperation of scientists and engineers from
applications disciplines.
Internet Softbots
(Etzioni,
Weld)
AI research is moving away from "toy tasks, such as block-stacking, towards
more realistic problems. Building autonomous
agents that interact with real-world software environments, such as the
World Wide Web, is a pragmatically
convenient, yet intellectually challenging, substrate for AI research. To
support this claim, we are utilizing planning and
machine-learning technology to develop several Internet softbots (software
robots), customizable and (moderately) intelligent
assistants for Internet access. The softbot accepts goals in a high-level
language, generates and executes plans to achieve these
goals, and learns from its experience. Custom-built execution and sensing
modules enable the softbot to interact with the
Web in real time.
MetaCrawler
(Etzioni)
The MetaCrawler is a Web service that provides a single, central interface
for Web document searching. Upon receiving a
query, the MetaCrawler posts the query to multiple search engines in
parallel, and performs sophisticated pruning on the
responses returned. A commercial version of MetaCrawler is available at
www.metacrawler.com. The research version at
http://huskysearch.cs.washington.edu continues to be the platform for
cutting-edge research on web search mechanisms including fast clustering
algorithms for search engines results, and collaborative searching
architectures.
Management of Semistructured Data
( A. Levy)
Several important applications involve large amounts of irregular
semistructured data. Examples include the management of
biological data, large volumes of online documentation, program
libraries, and integration of data from multiple external
sources. Currently, such data resides mostly in structured files, and
basic database services such as querying and access control and
transaction management are not supported. The main characteristics
distinguishing semistructured data from traditional data are the
absence of a fixed, predetemined schema, and the fact that the data is
not strongly typed as in traditional models. Our research considers
the management of semistructured data from several
perspectives. First, we are developing appropriate models and query
languages for such data. Second, we have developed novel algorithms
for optimizing queries over semistructured data. Finally, we are also
considering problems conerning the storage of semistructured data and
architectures for uniformly combining structured and semistructured
data. Our solutions are being tested and incorporated into the Strudel
Web site management system, which is based on a semistructured data
model.
Web Site Management
( A. Levy)
The World-Wide Web (WWW) has become a prime vehicle for disseminating
information. As a result, the number of large Web sites with complex
structure and that serve information derived from multiple data
sources is increasing. Managing the content and the
structure of such Web sites has raised a novel data management
problem. We are developing the Strudel Web site management system
which is the first system to apply concepts from database management
systems to the problem of Web site management. The architecture of Strudel is
based on separating three distinct tasks in building Web sites:
(1) choosing and accessing the data that will be displayed at the
site, (2) designing the site's structure, i.e., specifying the data
contained within each page and the links between pages, and (3)
designing the graphical presentation of pages. In particular, Strudel
provides a mechanism for declaratively describing the
structure and content of a Web site. As a result, several tasks that
are otherwise very tedious to perform are supported very naturally in
Strudel, such as (1) automatically updating a site, (2) restructuring
a site, (3) creating multiple versions of a Web site and (4) enforcing
integrity constraints on a site's structure.
[an error occurred while processing this directive]
Data Integration
( A. Levy,
Weld)
An information integration system provides a user with a uniform
query interface to multiple information sources such as the thousands
available on the Internet. As such, the system frees the user from having
to locate the data sources relevant to a query, query each source in
isolation, and manually combine the information from the different sources.
Our research considers several problems related to data integration
systems. First, we have developed methods for describing the contents,
capabilities and completeness of data sources. Second, we are developing
methods for automatically creating robust and efficient query execution
plans for user queries. Our work develops novel techniques in the areas of
planning, database query optimization and knowledge representation.
HCI Issues in Web Site Management
( Borning,
A. Levy)
Building tools for Web site design and management raises a number of
interesting challenges in human-computer interaction and data management.
The web site design and maintenance task can be seen as consisting of three
subtasks: data management, structure management and graphical presentation.
The work on
Strudel has
demonstrated the value of this decomposition, as well as providing powerful
declarative tools for data and structure management. We are now
investigating user interface metaphors to make these tools more accessible
to designers, along with interfaces that embody these metaphors. Also, we
are investigating whether the HTML generation languages that support the
third subtask -- graphical presentation -- can incorporate additional
declarative tools as well, namely
constraints on
possible layout of the pages, as well as specifications of how the
layout is to be determined from the structure. At a more fundamental level
we are developing constraint languages and efficient constraint solvers
that incorporate both layout constraints and structural constraints.
Symbolic Model Checking of Software Specifications
(R. Anderson,
Beame,
Notkin)
Errors in software specifications cost money and, in some cases,
threaten lives.
Symbolic model checking based on binary decision diagrams (BDDs)
is an efficient automatic verification technique that has been applied
successfully to many hardware designs but only in a very limited way to
software specifications. We apply this technique to large safety-critical
software specifications. We show how to provide effective input encodings to
symbolic model checkers and tune their algorithms to handle specifications
given as communicating hierarchical state machines. Although BDD-based
symbolic model checking typically is efficient, it can bog down when dealing
with non-linear arithmetic that is prevalent in some software specifications.
We develop extensions of standard model-checking techniques that hold the
promise of circumventing this non-linear arithmetic bottleneck.
Constraint-Based Systems
(Borning)
A constraint describes some relation that we would like to keep maintained,
for example, that a column in a table in a web page is at least 20 pixels
wide, that the background color in a document becomes more yellow with age,
or that window w1 is twice as high as window w2. In recent research we
developed a number of fast and expressive constraint solvers for
interactive graphical applications. We are now applying these solvers to
web page layout, including html extensions to allow the page author to
express constraints, and to interactive graphical editors. For more
information please see the web pages for Constraint-Based
Systems.
Program Restructuring
(Notkin)
Software maintenance tends to become more difficult as a system evolves,
in part because the structure of the system degrades.
Restructuring the system to overcome such entropy is one approach to
reducing maintenance costs.
However, restructuring does not make visible "progress" and
it is itself error-prone.
We are investigating an alternative approach, meaning-preserving
restructuring.
In this approach, the software engineer applies structural changes and
a tool then finds a set of global, compensating changes that
(together with the engineer's explicit change)
will guarantee that the meaning of the program remains unchanged.
Reflection Models
(Notkin)
Given an existing software system and a task to accomplish,
software engineers decide whether and how to proceed based on experience,
judgment, and system-specific knowledge.
For even modest-sized software systems,
there is a substantial risk of making poor decisions,
because it is difficult (if not impossible) for software engineers to
maintain adequate knowledge of the source.
To manage this risk,
engineers often use high-level models to focus on particular aspects
of the system relevant to the task.
These models help the engineer reduce risk
only if they are sufficiently accurate representations of the actual system.
We have developed an approach that may help engineers determine whether
their task-specific models are "good enough" to make informed decisions.
An engineer defines a high-level structural model appropriate for the task
at hand and also specifies how the model maps to the source.
A tool then computes a reflexion model
that shows the engineer where the high-level model is consistent with
and differs from the source.
Analyzing the reflection model and applying personal experience and
judgment,
an engineer can then assess the risk of proceeding with the task.
Object-Oriented Frameworks
(Notkin)
Frameworks are one of several approaches to exploiting the commonalities
of a family of software systems.
They have been used successfully in user interfaces, network management,
and operating systems.
Applying framework operations,
such as instantiating a framework to a specific application,
can be error-prone.
This is, in part, because any errors in the operation are not found
until the application itself is executed.
We are exploring approaches to reducing these problems
(and thus increasing the benefits of using frameworks)
through the use of static typing in object-oriented languages and
tools to help execute these framework operations.
Implicit Invocation Mechanisms
(Notkin)
Implicit invocation (also known as event-based) mechanisms allow software
components to announce interesting events that cause the invocation of other
components that have registered interest in those events.
When carefully applied, implicit invocation can be used to structure
software systems that are more easily integrated and
that are more easy to evolve.
We are exploring effective design approaches using implicit invocation,
as well as the design space for implicit invocation mechanisms
(including language-based and domain-specific mechanisms).
Source Model Extraction
(Notkin)
Reverse engineers depend on the automatic extraction of information
from source code.
Some useful kinds of information -- source models -- are well-known:
call graphs, file dependences, etc.
Predicting every kind of source model that a reverse engineer may need
is impossible.
We have developed a lightweight approach for generating flexible and
tolerant source model extractors from lexical specifications.
We have also performed an empirical study of a set of C call graph
extractors.
These extractors return different, often significantly different,
call graphs when applied to the same programs.
Real-Time Systems
(Shaw)
Our research has been concerned with models, techniques, and tools for
specifying, analyzing, and generating software for real-time
applications. The emphasis is on requirements and design specifications;
fast prototyping; real-time operating systems; real-time programming;
and methods for
describing, predicting, and verifying the timing behavior of programming
languages and systems. Current research includes:
using execution-time bounds and
assertional logic to state and prove timing properties of higher-level
language programs; building software tools that predict the
deterministic execution times of programs; refining, implementing, and
testing a specification method, called communicating real-time state
machines; and studying the use of time-stamped event histories
in real-time programming. Two themes
of the research are the incorporation of time as a first-class
specification and programming object, and the analysis and exploitation
of concurrency.
3D Scanning, Reconstruction, and Rendering
(Curless,
Duchamp
(Mathematics),Stuetzle (Statistics))
3D scanning is a powerful tool for acquiring the shape and appearance
of unique physical objects, and for bringing the creative design
process from the sculptor into the computer. Capturing the properties
of existing objects has driving applications in reverse engineering,
inspection, virtual dissemination of museum artifacts, and anatomical
modeling for medicine. In addition, when desiging new objects such as
automobiles or synthetic movie characters, clay is frequently a
superior "interface" than most computer-aided geometric design
systems; 3D scanning provides the means of bringing the resulting
shape into the computer. In our research, we are concerned with all
phases of 3D scanning. We are developing new methods for acquiring
high resolution shape and color data. We are exploring ways of
reconstructing useful computer models from this data. And we are
devising algorithms that enable real-time interaction with these
models by taking advantage of current trends in 3D graphics hardware
development.
Multiresolution Methods in Computer Graphics
(Salesin)
We are exploring the application of multiresolution methods, such as
"wavelets," to problems in computer graphics. These graphics
applications include image editing and compression, automatic
level-of-detail control for editing and rendering curves and surfaces,
surface reconstruction from contours, and fast methods for solving
simulation problems in global illumination and animation. Our results
to date are described in the research monograph, Wavelets for Computer Graphics:
Theory and Applications, and in the following papers:
| Clustering for Glossy Global Illumination | (1996) |
| Fast Rendering of Complex Environments Using a Spatial Hierarchy | (1996) |
| Global Illumination of Glossy Environments Using Wavelets and Importance | (1996) |
| Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments | (1996) |
| Interactive Multiresolution Surface Viewing | (1996) |
| Multiresolution Video | (1996) |
| Fast Multiresolution Image Querying | (1995) |
| Wavelets for Computer Graphics: A Primer | (1995) |
| Multiresolution Curves | (1994) |
| Multiresolution Painting and Compositing | (1994) |
| Wavelet Radiance | (1994) |
| Computer-Generated Watercolor | (1997) | |
| Multiperspective Panoramas for Cel Animation | (1997) |
| Orientable Textures for Image-Based Pen-and-Ink Illustration | (1997) |
| Comic Chat | (1996) |
| Declarative Camera Control for Automatic Cinematography | (1996) |
| Rendering Parametric Surfaces in Pen and Ink | (1996) |
| Scale-Dependent Reproduction of Pen-and-Ink Illustrations | (1996) |
| The Virtual Cinematographer: A Paradigm for Automatic Real-Time Camera Control and Directing | (1996) |
| Computer-Generated Pen-and-Ink Illustration | (1994) |
| Interactive Pen-and-Ink Illustration | (1994) |
| Electronic ``How Things Work'' articles: Two early prototypes | (1993) |