Monday, October 30, 2006
SESSION I, 9:25 - 10:30
- Future Architectures: WaveScalar and Quantum Computing (CSE 305)
- The Future of Search (CSE 403)
- Networking Your World (CSE 691)

SESSION II, 10:40 - 11:30
- Internet Systems (CSE 305)
- Computing in the Biology Revolution (CSE 403)
- 21st Century Programming (CSE 691)

SESSION III, 11:40 - 12:30
- Monitoring the Physical World with RFID (CSE 305)
- Accessible Computing (CSE 403)
- Digital Entertainment: Games and Animation in UW CSE (CSE 691)
Session I
Future Architectures: WaveScalar and Quantum Computing (CSE 305)
- 9:25-9:30: Introduction and Overview: Susan Eggers
- 9:30-9:45: WaveScalar, Martha Mercaldi
Tiled architectures offer a great deal of promise in the face of recent architectural challenges. However, to extract the most performance from a tiled design, one must schedule computation onto the tiles carefully. In this study, we explore hierarchical instruction scheduling techniques for one particular tiled processor, WaveScalar. Our results show that at the top level of the hierarchy, a simple profile-driven algorithm effectively minimizes operand latency. After this schedule has been partitioned into large sections, the bottom-level algorithm must analyze program structure more carefully when producing the final schedule.
Our analysis reveals that at this bottom level, good scheduling depends upon carefully balancing instruction contention for processing elements and operand latency between producer and consumer instructions. We develop a parameterizable instruction scheduler that more effectively optimizes this trade-off. We use this scheduler to determine the contention-latency sweet spot that generates the best instruction schedule for each application.
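As a rough illustration of the contention-latency trade-off described above, the sketch below scores candidate placements of dataflow instructions onto a small grid of processing elements with a single weight alpha. The grid, instruction list, and greedy placer are hypothetical illustrations, not the WaveScalar scheduler itself.

```python
# Toy illustration of the contention/latency trade-off in placing dataflow
# instructions onto a grid of processing elements (PEs). Hypothetical sketch,
# not the WaveScalar scheduler described in the talk.
from itertools import product

def greedy_place(instrs, edges, grid, capacity, alpha):
    """Place each instruction on a PE, scoring candidates by a weighted sum of
    added operand latency (distance to already-placed producers) and added
    contention (instructions already on that PE). alpha in [0, 1] sets the mix."""
    placement, load = {}, {pe: 0 for pe in grid}
    producers = {c: [p for p, c2 in edges if c2 == c] for c in instrs}
    for instr in instrs:                      # assume instrs arrive in topological order
        def score(pe):
            latency = sum(abs(pe[0] - placement[p][0]) + abs(pe[1] - placement[p][1])
                          for p in producers.get(instr, []) if p in placement)
            return alpha * load[pe] + (1 - alpha) * latency
        best = min((pe for pe in grid if load[pe] < capacity), key=score)
        placement[instr] = best
        load[best] += 1
    return placement

# Example: a small dependence chain on a 2x2 grid; sweeping alpha mimics
# searching for the per-application contention-latency sweet spot.
grid = list(product(range(2), range(2)))
instrs = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "d")]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, greedy_place(instrs, edges, grid, capacity=2, alpha=alpha))
```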
- 9:45-10:00: Reducing Control Overhead in Dataflow Architectures, Andrew Petersen (PDF Slides)
Tiled architectures are a response to emerging problems in processor design including design complexity, wire delay, and fabrication reliability. One tiled architecture, WaveScalar, uses a dynamic, tagged-token dataflow execution model to simplify the design of the processor tiles and to achieve good parallel performance.
However, using the dataflow execution model reawakens old problems, including a 2-3x dynamic instruction overhead for executing conventional, control-oriented code in languages like C. We present three compiler optimizations that significantly reduce control overhead with minimal additional hardware. Together, these optimizations reduce control overhead by 80% on average and increase application performance by 21-37%.
- 10:00-10:15: How to Build a Quantum Computer, David Bacon
The quantum theory of nature is one of the most successful physical theories of all time. It is essential for almost all of physics and chemistry, yet it was only recently realized that it could have an impact on the world of computation, since one could conceivably construct computers whose operation is entirely quantum mechanical. Such quantum computers can exploit the counterintuitive properties of quantum theory to achieve speedups over the best known classical algorithms. Most importantly, a quantum computer could be used to efficiently factor numbers and hence break the most widely used public key cryptosystems. Here I will discuss the current vision for how we will actually build such a quantum computer, focusing in particular on the novel hardware challenges that arise in such an endeavor.
- 10:15-10:30: Shake and Bake Silicon Manufacturing, Mark Oskin
Technology scaling has long benefited the information technology industry. Unfortunately, at each new technology node, the fixed costs per design increase; they currently exceed a million dollars for the latest generation of silicon fabrication. This overhead makes spinning new ASICs an expensive proposition, and one not undertaken lightly. Ultimately, as these costs continue to increase, the effect on the industry will be to stifle innovation. In this proposal we describe an entirely new way to manufacture silicon chips. The idea is to take a brick-and-mortar approach: chips are constructed from a collection of pre-built standardized bricks (processors, codecs, signal processors, etc.). These bricks are assembled using a fluidic self-assembly process into an application-specific arrangement, which is then flip-chip bonded onto an I/O routing chip to produce the custom ASIC. The research team assembled for this work spans a wide range of specialties, including fluidic self-assembly, analog VLSI, FPGA/CAD, and computer systems architecture. Only through innovations at all of these layers will this technology be successful.
The Future of Search (CSE 403)
- 9:25-9:30: Introduction and Overview: Oren Etzioni
- 9:30-9:45: Searching the Web for Open-Source Code, Raphael Hoffmann (Powerpoint Slides)
Increasingly, programmers are using the web as a giant repository of source code. By searching for reusable libraries they can save significant development effort, and by finding example code they can easily learn how to use an API. Unfortunately, traditional Information Retrieval (IR) techniques (e.g. the vector-space model, anchor text indexing, and pagerank-style link analysis) tend to perform poorly for code. Our analysis of 15 million queries collected by a popular search engine revealed interesting results about what information developers are seeking and how they are finding it. A large number of queries referred to example code snippets, and click-through data suggests that these are typically found in technical articles and tutorials. Motivated by these results, we implemented a novel search engine that automatically finds source-code snippets on web pages, parses the fragmentary code, and links the snippets to the APIs they reference. The result is a hybrid call-graph / hyperlink structure, which allows us to better rank and organize search engine results.
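One way to picture the hybrid call-graph / hyperlink structure is as a single graph whose edges are either ordinary hyperlinks or "snippet references API" links, ranked with a link-analysis recurrence. The node names, edge set, and plain PageRank below are illustrative assumptions, not the system's actual ranking function.

```python
# Minimal sketch of ranking over a combined hyperlink / call-graph structure.
# Node names, edges, and the plain PageRank recurrence are illustrative only.

def pagerank(edges, damping=0.85, iters=50):
    """edges: list of (src, dst); returns node -> score."""
    nodes = {n for e in edges for n in e}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)   # dangling nodes spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Hyperlinks between pages plus "snippet calls API" edges form one graph.
edges = [
    ("tutorial.html", "java.net.Socket"),   # a snippet on the page references this API
    ("forum_post.html", "tutorial.html"),   # an ordinary hyperlink
    ("blog.html", "java.net.Socket"),
]
print(sorted(pagerank(edges).items(), key=lambda kv: -kv[1]))
```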
- 9:45-10:00: Cross-Lingual Image Search, Stephen Soderland (Powerpoint Slides)
Image search on the Web relies on tags found near the image, which limits the ability to retrieve an image unless the query is in the same language as the page where the image is found. Cross-lingual homonyms cause problems as well. For example, the Hungarian word for tooth is 'fog', which makes it challenging for Hungarian searchers to find images of teeth on the Web. To solve these and related problems we introduce PanImages, a cross-lingual image search engine. PanImages enables searchers to translate and disambiguate queries before sending them to Google. PanImages utilizes several machine-readable dictionaries and composes them into a translation graph. Our experiments show that for non-English queries, PanImages increases precision by 38% while improving recall 38-fold.
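The translation graph can be pictured as dictionaries contributing edges between (word, language) pairs, with candidate translations gathered by walking a few hops. The toy entries and two-hop limit below are assumptions for illustration only.

```python
# Sketch of composing machine-readable dictionaries into a translation graph
# and collecting candidate translations of a query by following edges.
from collections import deque

# Each entry: (word, language) -> set of (word, language) given by some dictionary.
graph = {
    ("fog", "hu"): {("tooth", "en")},
    ("tooth", "en"): {("dent", "fr"), ("diente", "es")},
    ("fog", "en"): {("brouillard", "fr")},
}

def translations(word, lang, max_hops=2):
    """Breadth-first walk of the translation graph, up to max_hops dictionary links."""
    start = (word, lang)
    seen, queue, found = {start}, deque([(start, 0)]), set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, hops + 1))
    return found

# A Hungarian searcher typing "fog" gets ("tooth", "en") and its translations,
# which can then be offered for disambiguation before querying an image engine.
print(translations("fog", "hu"))
```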
- 10:00-10:15: Information Extraction from the Web, Ana Maria Popescu
In the past few years the World Wide Web has emerged as an important source of data, much of it in the form of unstructured text. In this talk we describe an extensible extract-and-assess architecture for the acquisition of high-quality information from Web text. Our approach is characterized by the extensive use of Web statistics, lexico-syntactic patterns and unsupervised learning methods. We show how our approach can be used in a number of information extraction tasks: subclass and related class extraction, relation property acquisition, the extraction of product features together with corresponding user opinions and finally, the extraction of commonsense knowledge.
- 10:15-10:30: Search with Partial Information: Data-Driven Word Sense Disambiguation, Andrei Alexandrescu (PDF Slides)
Traditional approaches to word sense disambiguation use encodings of the surrounding context as input features to a standard classifier. We introduce a new two-pass method combining supervised and semi-supervised learning. First, a supervised classifier learns indicative features in the form of a continuous representation of the initial features; second, the learned features are used to create a graph that is then used by a graph-based semi-supervised algorithm. The technique combines the advantages of supervised and semi-supervised learning and performs better than either alone.
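A minimal sketch of the second pass, assuming a small similarity graph built from the learned continuous features: sense labels propagate from labeled to unlabeled examples over weighted edges. The graph, weights, and simple iterative update below are illustrative, not the paper's exact algorithm.

```python
# Toy graph-based label propagation for word sense disambiguation.
# The graph, edge weights, and update rule are illustrative assumptions.

def label_propagation(weights, labels, iters=100):
    """weights: dict (i, j) -> similarity (treated as symmetric); labels: node ->
    sense id for labeled nodes only. Returns soft sense distributions for all nodes."""
    nodes = {n for e in weights for n in e}
    senses = sorted(set(labels.values()))
    # Labeled nodes start one-hot; unlabeled nodes start uniform.
    dist = {n: {s: (1.0 if labels.get(n) == s else 0.0) if n in labels
                else 1.0 / len(senses) for s in senses} for n in nodes}
    for _ in range(iters):
        for n in nodes:
            if n in labels:
                continue                      # clamp labeled nodes
            neigh = [(m, w) for (i, j), w in weights.items()
                     for m in ((j,) if i == n else (i,) if j == n else ())]
            total = sum(w for _, w in neigh)
            if total == 0:
                continue
            dist[n] = {s: sum(w * dist[m][s] for m, w in neigh) / total for s in senses}
    return dist

# "fog": two labeled examples (tooth sense, mist sense) and two unlabeled ones.
weights = {("x1", "u1"): 0.9, ("x2", "u1"): 0.2, ("x2", "u2"): 0.8}
labels = {"x1": "tooth", "x2": "mist"}
result = label_propagation(weights, labels)
print(result["u1"], result["u2"])
```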
Networking Your World (CSE 691)
- 9:25-9:30: Introduction and Overview: Tom Anderson
- 9:30-9:45: iPlane: An Information Plane for Distributed Services, Harsha Madhyastha (Powerpoint Slides)
In this talk, we will present the design, implementation, and evaluation of iPlane, a scalable service providing accurate predictions of Internet path performance for emerging overlay services. Unlike the more common black box latency prediction techniques in use today, iPlane adopts a structural approach and predicts end-to-end path performance by composing the performance of measured segments of Internet paths. For the paths we observed, this method allows us to accurately and efficiently predict latency, bandwidth, capacity and loss rates between arbitrary Internet hosts. We demonstrate the feasibility and utility of the iPlane service by applying it to several representative overlay services in use today: content distribution, swarming peer-to-peer filesharing, and voice-over-IP. In each case, using iPlane's predictions leads to improved overlay performance.
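The structural approach can be illustrated by composing per-segment measurements into an end-to-end prediction. The segment values below are made up, and the composition rules (latencies add, loss rates combine multiplicatively, capacity is the bottleneck) are the textbook ones rather than iPlane's exact estimators.

```python
# Sketch of predicting end-to-end path performance by composing measured segments.
# Segment names and numbers are illustrative.

def compose(segments):
    """segments: list of dicts with per-segment 'latency_ms', 'loss', 'capacity_mbps'."""
    latency = sum(s["latency_ms"] for s in segments)
    delivery = 1.0
    for s in segments:
        delivery *= (1.0 - s["loss"])              # packet survives every segment
    capacity = min(s["capacity_mbps"] for s in segments)   # bottleneck link
    return {"latency_ms": latency, "loss": 1.0 - delivery, "capacity_mbps": capacity}

path = [
    {"latency_ms": 12.0, "loss": 0.001, "capacity_mbps": 1000},
    {"latency_ms": 38.0, "loss": 0.010, "capacity_mbps": 100},
    {"latency_ms": 5.0,  "loss": 0.000, "capacity_mbps": 1000},
]
print(compose(path))
```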
- 9:45-10:00: Internet Geolocation Using Topology and Delay Measurements, John P. John
We present Topology-based Geolocation (TBG), a novel approach to estimating the geographic location of arbitrary Internet hosts. We motivate our work by showing that 1) existing approaches, based on end-to-end delay measurements from a set of landmarks, fail to outperform much simpler techniques, and 2) the error of these approaches is strongly determined by the distance to the nearest landmark, even when triangulation is used to combine estimates from different landmarks. Our approach improves on these earlier techniques by leveraging network topology, along with measurements of network delay. We convert topology and delay data into a set of constraints, then solve for router and host locations simultaneously. This approach improves the consistency of location estimates, reducing the error substantially for well-constrained structured networks. For networks with insufficient structural constraints, our techniques integrate external hints that help improve accuracy. Together, these techniques lower the median estimation error for our dataset to 67 km vs. 228 km for the best previous approach.
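As a sketch of the constraint-based formulation, the toy solver below converts each delay measurement into a maximum-distance constraint and nudges unknown hosts until the constraints are approximately satisfied, with landmarks held fixed. The planar coordinates, the 100 km-per-ms bound, and the gradient-style solver are assumptions, not the paper's method.

```python
# Toy constraint-based geolocation: delay measurements become distance bounds,
# and unknown positions are adjusted to satisfy them. Illustrative only.
import math, random

C_KM_PER_MS = 100.0          # rough upper bound on distance per ms of one-way delay

def solve(landmarks, unknowns, delays, steps=2000, lr=0.5):
    """landmarks: name -> (x, y) fixed; unknowns: list of names to place;
    delays: (a, b) -> one-way delay in ms, giving distance(a, b) <= delay * C."""
    pos = dict(landmarks)
    pos.update({u: (random.uniform(0, 1000), random.uniform(0, 1000)) for u in unknowns})
    for _ in range(steps):
        for (a, b), d in delays.items():
            limit = d * C_KM_PER_MS
            ax, ay = pos[a]; bx, by = pos[b]
            dist = math.hypot(ax - bx, ay - by) or 1e-9
            if dist <= limit:
                continue                      # constraint already satisfied
            # Pull movable endpoints toward each other to shrink the violation.
            pull = lr * (dist - limit) / dist
            if a in unknowns:
                pos[a] = (ax - pull * (ax - bx), ay - pull * (ay - by))
            if b in unknowns:
                pos[b] = (bx - pull * (bx - ax), by - pull * (by - ay))
    return {u: pos[u] for u in unknowns}

landmarks = {"L1": (0.0, 0.0), "L2": (600.0, 0.0), "L3": (300.0, 500.0)}
delays = {("L1", "H"): 5.0, ("L2", "H"): 5.0, ("H", "L3"): 2.0}  # target host H
print(solve(landmarks, ["H"], delays))
```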
- 10:00-10:15: Analyzing the MAC-Level Behavior of Wireless Networks, John Zahorjan (Powerpoint Slides)
Despite the widespread deployment of wireless networks, very little is known about their operational behavior in live use. This gap in understanding is due in large part to the surprising difficulty of taking meaningful measurements of wireless traffic. I'll present Wit, a tool for collecting and analyzing MAC-level wireless traces. Wit employs novel techniques to merge independent observations of a single system, to infer information still missing after merging, and to deduce key performance measures involving relationships among multiple samples, rather than just sample counts. I'll also present preliminary results about the effectiveness of the 802.11 MAC layer obtained by applying Wit to data collected at a live hotspot.
- 10:15-10:30: Do Incentives Build Robustness in BitTorrent?, Michael Piatek (PDF Slides)
A fundamental problem with many peer-to-peer systems is the tendency for users to "free ride"--consume resources without contributing to the system. The popular file distribution tool BitTorrent was explicitly designed to address this problem, using a tit-for-tat reciprocity mechanism to provide positive incentives for nodes to contribute resources to the swarm. We show that although BitTorrent has been fantastically successful, its incentive mechanism is not robust to selfish clients. Through performance modeling parameterized by real-world traces, we demonstrate that high capacity peers provide low capacity peers with an unfair share of aggregate swarm resources. We use these results to drive the design and implementation of BitTyrant, a selfish BitTorrent client that provides significant performance gains in practice on real Internet swarms. We further show that this selfishness, when applied universally, hurts average swarm performance compared to today's BitTorrent client implementations.
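The strategic intuition can be sketched as a budgeted selection problem: estimate, for each peer, the download rate it would return and the upload rate needed to earn reciprocation, then spend the upload budget on the best ratios first. The peer numbers and greedy allocation below are illustrative assumptions, not BitTyrant's actual policy.

```python
# Sketch of a selfish unchoking strategy: rank peers by expected return per unit
# of upload and allocate a fixed upload budget greedily. Illustrative numbers.

def choose_peers(peers, upload_budget_kbps):
    """peers: list of dicts with 'name', 'expected_down_kbps', 'required_up_kbps'.
    Returns the peers to unchoke this round."""
    ranked = sorted(peers, key=lambda p: p["expected_down_kbps"] / p["required_up_kbps"],
                    reverse=True)
    chosen, spent = [], 0.0
    for p in ranked:
        if spent + p["required_up_kbps"] > upload_budget_kbps:
            continue
        chosen.append(p["name"])
        spent += p["required_up_kbps"]
    return chosen

peers = [
    {"name": "A", "expected_down_kbps": 400, "required_up_kbps": 80},
    {"name": "B", "expected_down_kbps": 300, "required_up_kbps": 30},
    {"name": "C", "expected_down_kbps": 100, "required_up_kbps": 90},
]
print(choose_peers(peers, upload_budget_kbps=120))   # -> ['B', 'A']
```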
Session II
Internet Systems (CSE 305)
- 10:40-10:45: Introduction and Overview: Steve Gribble
- 10:45-11:00: A Web of Personal Files, Roxana Geambasu (PDF Slides)
We describe SharedViews, a peer-to-peer data management system that simplifies file organization and facilitates file sharing and protection among home users on the Internet. The key innovation of SharedViews is the integration of queries and dynamic views from database systems with a capability-based protection model from the operating systems world. Users organize their files using views and share their views by exchanging capabilities. The system allows users to integrate local and remote dynamic file collections into their own through a seamless view composition mechanism. What results is a Web of personal files. We show that SharedViews is easy to use and provides clients with flexible protection, without the account-creation and centralized protection-management problems inherent in current shared-data systems.
- 11:00-11:15: Safe and Efficient Isolation for Web-based Applications, Charlie Reis (PDF Slides)
Web applications written in JavaScript have become prevalent, and they require a set of services on par with those needed by many desktop applications. We quantify this increasing use of JavaScript on the web, and we investigate how well current web browsers support the needs of web applications. Through observations of adversarial and actual web sites, we demonstrate poor browser support for numerous important services, including concurrency, reliability, protection, and communication. We then discuss how to properly isolate JavaScript applications in a way that is effective and efficient, proposing two changes to current web browsers. First, we introduce OS process boundaries at an appropriate granularity to improve concurrency and reliability, without disrupting current web sites. Second, we extend the browser's same-origin security policy to cover active content, to improve protection and the safety of communication. We demonstrate how our changes resolve problems with both adversarial pages and observed web sites, and we quantify the impact of our changes on existing sites.
- 11:15-11:30: SpyProxy: On-the-fly Protection from Malicious Web Content, Alex Moshchuk (Powerpoint Slides)
The Web has recently moved towards highly interactive applications, such as Ajax-based sites. As a result, today’s users are exposed to a myriad of scripts and other code that are downloaded from untrusted sources and executed in their browsers. Although it improves responsiveness, this new active content can also carry threats such as spyware to the user’s desktop. We designed, implemented, and evaluated a new defense system for malicious Web content, called SpyProxy. It is a network proxy that acts as both a cache and a checker for Web pages. SpyProxy intercepts users’ Web requests and analyzes them on the fly using a combination of static analysis and virtual-machine-based dynamic techniques. It detects and blocks threats before they reach the user’s browser. Unlike current signature-based systems, SpyProxy observes the behavior of Web content during rendering and therefore can detect previously unseen threats. Our tests demonstrate the effectiveness and importance of using on-the-fly techniques. In experiments where a client fetched malicious Web pages, SpyProxy detected every threat. Moreover, many of those threats were not detected by other antispyware systems. Equally important, our evaluations show that with careful implementation and optimization, the performance impact of a proxy-based malware detector can be reduced to the point where it is negligible.
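The proxy's control flow can be sketched as fetch, cheap static check, then behavior-based check in a disposable virtual machine, releasing content only when it is judged clean. The helper functions below are hypothetical stand-ins, not SpyProxy's interfaces.

```python
# Control-flow sketch of an on-the-fly checking proxy. The checker callbacks are
# hypothetical stand-ins for a static scanner and a VM-based behavior monitor.

def handle_request(url, fetch, static_scan, render_in_vm):
    """fetch(url) -> page bytes; static_scan(page) -> 'clean' | 'malicious' | 'unknown';
    render_in_vm(page) -> True if rendering caused suspicious state changes."""
    page = fetch(url)
    verdict = static_scan(page)
    if verdict == "malicious":
        return block(url)
    if verdict == "unknown" and render_in_vm(page):   # dynamic, behavior-based check
        return block(url)
    return page                                       # judged clean: forward to the browser

def block(url):
    return ("<html><body>Blocked: %s triggered the malware checker.</body></html>" % url).encode()

# Toy usage with stub checkers.
print(handle_request(
    "http://example.test/",
    fetch=lambda u: b"<html>...</html>",
    static_scan=lambda page: "unknown",
    render_in_vm=lambda page: False,
))
```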
Computing in the Biology Revolution (CSE 403)
- 10:40-10:45: Introduction and Overview: Martin Tompa
- 10:45-11:00: Tracing the Effects of Immune System Selection Pressure on HIV, Jonathan Carlson (Powerpoint Slides)
Human Immunodeficiency Virus type 1 (HIV-1) is one of the most successful human pathogens in recorded history. Since its identification in 1981, HIV-1 has killed more than 25 million people, and the United Nations estimates that it currently infects more than 50 million people worldwide. Despite tens of billions of dollars spent on research, a viable vaccine remains elusive and the virus continues to evade even our best anti-viral cocktails. One key reason for HIV's success is its high rate of evolution: each day, millions of genetically distinct variants are generated in a single patient, and those variants that show better resistance to the immune system eventually dominate that patient's HIV population. Using a novel phylogenetic model of selection pressure, we have identified hundreds of escape mutations in HIV-1 that are associated with specific human immune system variants, as well as apparent compensatory mutations that maintain protein structure in the face of these changes. These predictions are being validated in the lab and further our understanding of the complex pathogen-host interaction during the course of clinical infection.
- 11:00-11:15: Neurobotics: A New Addition to CSE toward Neural Engineering Initiatives, Yoky Matsuoka
Neurobotics is a new field that lies at the intersection of Robotics and Neuroscience. In Neurobotics, robotic models and environments are used to understand the biomechanics and neuromuscular control of human limbs. In parallel, robotic systems are developed to augment, replace, and rehabilitate damaged sensorimotor functions. This talk provides an overview of Neurobotics and connects it to UW's strength in Neural Engineering.
- 11:15-11:30: Novel Techniques in Brain Mapping and Thought-Based Object Control Using Electrocorticography, Kai J. Miller
Using new insight into the nature of the potential at the brain surface, we have been able to engineer a new technique in real time brain mapping. This understanding has also allowed us to further our creation of new types of brain-computer interfaces. This brain mapping technique and thought-based object control are explained conceptually and demonstrated in video.
21st Century Programming (CSE 691)
- 10:40-10:45: Introduction and Overview: Dan Grossman
- 10:45-11:00: Matching Program Elements Across Versions, Miryung Kim (PDF Slides)
Identifying matches between two versions of a program is a fundamental building block for many software engineering tools. Our recent survey of existing matching techniques identified a disparity between the kinds of information that tools use and the kinds of information that programmers use when matching code. Existing tools use only similarity measures to match code at fixed granularities such as lines and methods. Programmers, on the other hand, may also use information about changes between the versions; for example, they may look for code that was refactored. In this talk, we present an approach that enables tools to benefit from such change information. In particular, our approach automatically infers likely changes between two versions, represents them as a set of change rules, and then determines matches based on the rules.
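A change rule can be thought of as a systematic rewrite from old names to new ones, applied before comparing the two versions. The regular-expression rule and method names below are illustrative assumptions, not the talk's rule language.

```python
# Sketch of rule-based matching between two program versions: a change rule
# rewrites old names to new names, and matching applies the rules before comparing.
import re

def apply_rules(name, rules):
    for pattern, replacement in rules:
        new = re.sub(pattern, replacement, name)
        if new != name:
            return new
    return name

def match(old_methods, new_methods, rules):
    """Return pairs (old, new) whose names agree after applying the change rules."""
    new_set = set(new_methods)
    return [(m, apply_rules(m, rules)) for m in old_methods
            if apply_rules(m, rules) in new_set]

# Hypothetical inferred rule: all get* accessors in class Chart were renamed to fetch*.
rules = [(r"^Chart\.get(\w+)$", r"Chart.fetch\1")]
old = ["Chart.getTitle", "Chart.getAxis", "Chart.draw"]
new = ["Chart.fetchTitle", "Chart.fetchAxis", "Chart.draw"]
print(match(old, new, rules))
```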
- 11:00-11:15: Seminal: Searching for Type-Error Messages, Benjamin Lerner (Powerpoint Slides)
Seminal is a compiler-construction project for designing compilers that produce better type-error messages while remaining simpler, more reliable, and faster. There is a strong tension between the complex, expressive type systems found in many modern programming languages and the still more complex bookkeeping needed to explain the errors that arise in those systems. Type-checkers are the arbiters of program safety, and as such must be trusted to perform correctly. While algorithms for inferring types are often simple, elegant, and fast in practice for well-typed programs, and hence well trusted, these straightforward algorithms often do not produce good error messages. But extending the type-checker with state to generate better error messages violates the goal of simplicity.
We maintain the simple elegance of a type-checker that makes little attempt to produce good error messages by not using it directly to produce error messages. Instead, the error-message producer is a search procedure that uses the simple type-checker as an oracle to determine whether programs type-check. The goal of the search is to find a new program very much like the ill-typed one, except that it type-checks. We have a prototype implemented for Objective Caml, and a paper describing our work appeared in ML Workshop 2006. We present our experiences with this prototype, and current work implementing this approach for C++ template instantiation errors.
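The search-with-an-oracle idea can be sketched on a toy expression language: enumerate small, local edits of the ill-typed program and return the first one the type-checker accepts. The expression forms, mutations, and tiny checker below are assumptions for illustration, not Seminal's implementation.

```python
# Toy version of "search with the type-checker as an oracle": generate small
# mutations of an ill-typed expression and keep the first one that type-checks.

def type_of(expr, env):
    """Expressions: ints, ('var', name), ('add', e1, e2), ('call', fname, arg)."""
    if isinstance(expr, int):
        return "int"
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return "int" if type_of(expr[1], env) == type_of(expr[2], env) == "int" else None
    if kind == "call":
        arg_t, ret_t = env[expr[1]]
        return ret_t if type_of(expr[2], env) == arg_t else None
    return None

def mutations(expr):
    """One-step edits, most local first: replace some subexpression with a
    placeholder literal (0); replacing the whole expression is tried last."""
    if isinstance(expr, tuple) and expr[0] in ("add", "call"):
        for i in range(1, len(expr)):
            if isinstance(expr[i], (int, tuple)):
                for sub in mutations(expr[i]):
                    yield expr[:i] + (sub,) + expr[i + 1:]
    yield 0

def suggest(expr, env):
    """Return the first (most local) edit that makes the expression type-check."""
    for candidate in mutations(expr):
        if type_of(candidate, env) is not None:
            return candidate
    return None

# ('add', x, s) is ill-typed because s is a string; the suggestion replaces the
# second operand, pinpointing s as the likely culprit.
env = {"x": "int", "s": "string", "len": ("string", "int")}
bad = ("add", ("var", "x"), ("var", "s"))
print(suggest(bad, env))
```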
- 11:15-11:30: Coping With Implementation Dependencies, Marius Nita (PDF Slides)
A typical programming language definition will mark a subset of the language as "unspecified", meaning an implementor is offered some amount of freedom in deciding how this unspecified subset should behave. It is possible, often convenient, and sometimes necessary, to write programs that are implementation-dependent -- that make particular assumptions about these idiosyncrasies. This talk presents our ongoing effort to build tools for aiding with porting efforts and for implementing portable code from scratch. We present an abstract model for a C-like language which isolates the notion of "implementation," and show how a program's implementation dependencies (which we can extract from the program text) can be safe on some implementations but not on others. Based on this model, we are currently implementing a tool for automatic discovery of portability bugs in C code.
Session III
Monitoring the Physical World with RFID (CSE 305)
- 11:40-11:45: Introduction and Overview: Magdalena Balazinska
- 11:45-12:00: RFID Ecosystem: Experimenting with a Pervasive RFID-Based Infrastructure, Evan Welbourne (Powerpoint Slides)
The success of RFID in supply-chain management has led many to consider more personal and pervasive deployments of RFID technology. Unlike industrial settings, however, deployments that involve humans raise new and critical problems of privacy, security, uncertainty, and diverse, evolving applications.
At UW CSE, we are deploying a building-wide RFID-based infrastructure with hundreds of antennas and thousands of tags. Our goal is to uncover the challenges of pervasive RFID deployments and devise techniques for addressing these challenges before such deployments become commonplace.
In this talk, we present the challenges encountered and lessons learned during a smaller-scale pilot deployment of the system. We show some preliminary results and, for each challenge, discuss how we addressed it or how we plan to address it.
- 12:00-12:15: StreamClean: Near Real-Time RFID Data Cleaning, Nodira Khoussainova
The use of radio frequency identification (RFID) is expanding across many different application domains, including supply-chain management, inventory tracking, and even more user-oriented applications such as reminder services.
RFID technology offers users and applications the ability to track tagged objects and people in a monitored environment. This technology, however, is unreliable. Antennas often fail to detect tags in their vicinity, or they detect tags that are in fact far away, causing multiple antennas to report the same tag at the same time.
In this talk, we present StreamClean, a system we are currently building for cleaning RFID data automatically using application-defined integrity constraints. Since it is often not possible to clean errors with certainty, we clean the data probabilistically, assigning to each tuple the probability that it is correct. We show the effectiveness of StreamClean by demonstrating its application on data from the RFID Ecosystem project.
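As a sketch of probabilistic cleaning, suppose the integrity constraint says a tag can be at only one antenna at a time; conflicting sightings in the same epoch then share probability mass instead of all being accepted. The confidence model below (normalizing per-read confidences) is an illustrative assumption, not StreamClean's inference.

```python
# Toy probabilistic cleaning under the constraint "one tag, one antenna per epoch".
# The per-read confidences and normalization rule are illustrative assumptions.
from collections import defaultdict

def clean(readings):
    """readings: list of (epoch, tag, antenna, confidence in [0, 1]).
    Returns (epoch, tag, antenna) -> probability that the sighting is correct."""
    by_key = defaultdict(list)
    for epoch, tag, antenna, conf in readings:
        by_key[(epoch, tag)].append((antenna, conf))
    probs = {}
    for (epoch, tag), sightings in by_key.items():
        total = sum(conf for _, conf in sightings)
        for antenna, conf in sightings:
            probs[(epoch, tag, antenna)] = conf / total if total else 1.0 / len(sightings)
    return probs

# Tag 42 is reported by two antennas in epoch 7; the constraint forces the two
# sightings to share probability mass instead of both being accepted outright.
readings = [(7, 42, "door-A", 0.9), (7, 42, "hallway-B", 0.3), (8, 42, "door-A", 0.8)]
print(clean(readings))
```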
- 12:15-12:30: WISP: A Wirelessly Powered, RFID-Compatible Sensing Platform, Joshua R. Smith, Intel Research Seattle
When Tesla began investigating the wireless delivery of power about a century ago, the available electrical devices (light bulbs and motors) required large amounts of power. Moore's law has lowered the power requirements of small electronic devices far enough that it now makes sense to revisit wireless power.
WISP is a platform for sensing and computation that is wirelessly powered and read by a standard Ultra-High-Frequency (UHF) RFID reader. We will describe a WISP that includes a 16-bit microcontroller and a 3-axis accelerometer. We believe that this is the first wirelessly powered accelerometer system.
We will discuss the research issues and practical possibilities suggested by RFID-compatible, wirelessly-powered sensing and computation.
Accessible Computing (CSE 403)
- 11:40-11:45: Introduction and Overview: Gaetano Borriello
- 11:45-12:00: Personalized Adaptive Interfaces, Krzysztof Gajos (PDF Slides)
Graphical user interfaces (GUIs) for desktop applications are typically optimized for "average" users who interact with computers via keyboard, mouse, and a small range of display sizes. Part of the reason some users with vision or motor impairments find it hard to use computers is not an inherent inability to use computers effectively, but the mismatch between those users' individual needs and the designers' assumptions.
We argue that flexible automatic UI generation can help provide disabled users with custom-tailored user interfaces that optimally take advantage of these users' abilities -- a task that cannot be accomplished by human designers due to the idiosyncratic nature of many disabilities. We have built SUPPLE to automatically generate custom UIs, and we are also building a system that allows non-experts to easily reparametrize the UI generator for the needs of individual users, thus making our solution scalable.
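Interface generation can be sketched as constrained optimization: pick one widget per element so the layout fits the display while minimizing a per-user cost. The elements, widget sizes, and cost numbers below are made-up assumptions, not SUPPLE's cost model.

```python
# Toy interface generation as constrained optimization: choose one widget per
# element, fit the screen, minimize a per-user cost. Numbers are illustrative.
from itertools import product

def generate_ui(elements, screen_size):
    """elements: name -> list of (widget, size, user_cost). Returns the cheapest
    widget assignment that fits, or None if nothing fits."""
    names = list(elements)
    best, best_cost = None, float("inf")
    for combo in product(*(elements[n] for n in names)):
        size = sum(w[1] for w in combo)
        cost = sum(w[2] for w in combo)
        if size <= screen_size and cost < best_cost:
            best, best_cost = dict(zip(names, (w[0] for w in combo))), cost
    return best

# For a motor-impaired user a slider may be cheap and a small spin-box costly;
# the same specification on a smaller screen yields a different concrete UI.
elements = {
    "volume":  [("slider", 4, 1.0), ("spinbox", 1, 3.0)],
    "channel": [("button-list", 6, 1.0), ("dropdown", 2, 2.5)],
}
print(generate_ui(elements, screen_size=8))
print(generate_ui(elements, screen_size=4))
```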
- 12:00-12:15: Indoor Wayfinding: Developing a Functional Interface for Individuals with Cognitive Impairments, Alan Liu (Powerpoint Slides)
Assistive technology for wayfinding will significantly improve the quality of life for many individuals with cognitive impairments. The user interface of such a system is as crucial as the underlying implementation and localization technology. We built a system using the Wizard-of-Oz technique that let us experiment with many guidance strategies and interface modalities. Through user studies, we evaluated various configurations of the user interface for accuracy of route completion, time to completion, and user preferences. We used a counter-balanced design that included different modalities (images, audio, and text) and different routes. We found that although users were able to use all types of modalities to find their way indoors, they varied significantly in their preferred modalities. We also found that timing of directions requires careful attention, as does providing users with confirmation messages at appropriate times. Our findings suggest that the ability to adapt indoor wayfinding devices for specific users’ preferences and needs will be particularly important.
- 12:15-12:30: MobileASL: Intelligibility of Sign Language Video as Constrained by Mobile Phone Technology, Anna Cavender (Powerpoint Slides)
For Deaf people, access to the mobile telephone network in the United States is currently limited to text messaging, forcing communication in English as opposed to American Sign Language (ASL). Mobile video phones have the potential to give Deaf people access to real-time mobile communication in their preferred language. However, even today's best video compression techniques cannot yield intelligible ASL at limited cell network bandwidths. Motivated by this constraint, we conducted user studies with members of the Deaf Community to determine the intelligibility effects of video compression techniques such as region-of-interest encodings and reduced frame rates.
Digital Entertainment: Games and Animation in UW CSE (CSE 691)
- 11:40-11:45: Introduction and Overview: Zoran Popović
- 11:45-12:00: Continuum Crowds: Massive Urban Crowds in Real-Time, Seth Cooper (Powerpoint Slides) (associated video)
Understanding human motion is a difficult and important area of research in computer graphics. As models for individual human motion develop, research has begun to look into the motion of crowds of people, a development which introduces considerable new difficulties. We propose a new theory of crowd dynamics, adopting a novel energy minimization perspective on crowd flow rather than the traditional per-agent decision processes. This approach yields smooth, highly realistic simulations of large crowds in real-time. The speed of our technique makes it attractive for interactive applications such as video games.
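The energy-minimization perspective can be sketched on a grid: build a potential field from per-cell traversal costs (higher where the crowd is dense) and move every agent downhill toward the goal. The grid, costs, and discrete shortest-path solver below are illustrative; the talk's model works with a continuous formulation rather than this toy discretization.

```python
# Toy "crowd as a continuum" step: a potential field over a grid, with agents
# moving toward lower potential. Grid, costs, and solver are illustrative.
import heapq

def potential_field(costs, goal):
    """costs: dict (x, y) -> traversal cost; goal: (x, y). Dijkstra from the goal."""
    phi = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if d > phi.get((x, y), float("inf")):
            continue
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in costs:
                nd = d + costs[nxt]
                if nd < phi.get(nxt, float("inf")):
                    phi[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
    return phi

def step(agent, phi):
    """Move one grid cell toward lower potential (i.e., along the crowd flow)."""
    x, y = agent
    options = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1), (x, y)]
    return min((p for p in options if p in phi), key=lambda p: phi[p])

# A 5x3 corridor whose middle column is congested (higher traversal cost).
costs = {(x, y): (5.0 if x == 2 else 1.0) for x in range(5) for y in range(3)}
phi = potential_field(costs, goal=(4, 1))
print(step((0, 1), phi), step((2, 2), phi))
```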
- 12:00-12:15: Real-time Fluids, Adrien Treuille (zipped files, includes videos)
Computer graphics researchers have made great strides toward simulating complex phenomena for special effects and other off-line applications. A much less explored but equally important domain is interactive virtual worlds, including training simulations, computer games, and other situations where interactivity is required. This project explores the use of model reduction to achieve drastically faster simulations of complex, high-dimensional phenomena. Our current work focuses on incompressible fluids. Within this context we have found that:
* Speedups of up to 5 orders of magnitude are possible.
* Simulations are unconditionally stable.
* Moving objects can be correctly incorporated into the flow.
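The model-reduction idea above can be illustrated on a textbook problem: project a high-dimensional linear simulation onto a few smooth basis vectors and time-step only the reduced coordinates. The 1-D diffusion operator and sine basis below are a stand-in; reducing incompressible fluids, as in this project, is considerably more involved.

```python
# Toy model reduction: project a 1-D diffusion simulation onto a few sine modes
# (eigenvectors of the discrete Laplacian) and step only the reduced coordinates.
import math

N, MODES, DT, NU = 64, 3, 0.01, 1.0   # grid size, basis size, time step, viscosity

# Orthonormal sine modes on the grid.
basis = [[math.sqrt(2.0 / (N + 1)) * math.sin((k + 1) * math.pi * (i + 1) / (N + 1))
          for i in range(N)] for k in range(MODES)]

def project(state):                    # full state (N values) -> reduced coords (MODES values)
    return [sum(b[i] * state[i] for i in range(N)) for b in basis]

def reconstruct(coords):               # reduced coords -> approximate full state
    return [sum(coords[k] * basis[k][i] for k in range(MODES)) for i in range(N)]

def reduced_step(coords):
    """Explicit Euler on the reduced system: each sine mode decays at its own rate."""
    return [c * (1.0 - DT * NU * (2.0 - 2.0 * math.cos((k + 1) * math.pi / (N + 1))))
            for k, c in enumerate(coords)]

# Simulate a heat bump: O(MODES) work per step instead of O(N).
state = [math.exp(-0.05 * (i - N / 2) ** 2) for i in range(N)]
coords = project(state)
for _ in range(100):
    coords = reduced_step(coords)
print(round(max(reconstruct(coords)), 4))
```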
- 12:15-12:30: Animation Production at UW: Character Motion, Story and Digital Production, Barbara Mones
For the past eight years, upper-level computer science undergraduates have joined students from many other areas of the University (English, Cinema Studies, DXArts, Music, and Architecture, to name a few) in a collaborative and interdisciplinary adventure to design and produce a short animated film from initial concept to completion. We'll screen some of the best work we've produced to date and then provide an in-depth review of several "shot breakdowns" from our films. This will give you an inside view of the challenges the groups face and an understanding of the production pipeline, with an emphasis on how we iterate character motion to support our story.