Break-out Topics and Talks | Paul G. Allen School of Computer Science & Engineering

Thursday, October 29, 2009

SESSION I 9:40 - 10:30	Security and Privacy CSE 305	Making Parallelism Safe for Everyone, Preventing Threads from Going Wild CSE 403	New Approaches to Everyday Interaction with Machine Learning CSE 691
SESSION II 10:40 - 11:30	Better Software Development CSE 305	Data Intensive Scientific Analytics CSE 403	Sustainability and Information Technology CSE 691
SESSION III 11:40 - 12:30	Technology for Under-Served Regions CSE 305	The Revolution in Computational Biology CSE 403	Reconstructing the World from Web Images CSE 691

Session I

Security and Privacy (CSE 305)

9:40-9:45: Introduction and Overview: Yoshi Kohno
9:45-10:00: Security and Privacy for RFIP Tags: Passport Cards, Enhanced Drivers Licenses, and Beyond, Karl Koscher

EPC (Electronic Product Code) tags are industry-standard RFID devices poised to supplant optical barcodes in many applications. We explore the systemic risks and challenges created by the increasingly common use of EPC for security applications. As a central case study, we examine the recently issued United States Passport Card and Washington State "enhanced drivers license" (WA EDL), both of which incorporate Gen-2 EPC tags. We measure multiple weaknesses, including susceptibility to cloning, extended read ranges, and the ability to remotely kill a WA EDL.

We study the implications of these vulnerabilities to overall system security, and other suggestions for improvement. We demonstrate anti-cloning techniques for off-the-shelf EPC tags, overcoming practical challenges in a previous proposal to co-opt the EPC "kill" command to achieve tag authentication. Our research fills a vacuum of experimentally grounded evaluation of and guidance for security applications for EPC tags not just in identity documents, but more broadly in the authentication of objects and people.
10:00-10:15: A Spotlight on Security and Privacy Risks with Future Household Robots: Attacks and Lessons, Tamara Denning(PDF Slides)

Future homes will be populated with large numbers of robots with diverse functionalities, ranging from chore robots to elder care robots to entertainment robots. While household robots will offer numerous benefits, they also have the potential to introduce new security and privacy vulnerabilities into the home. We experimentally analyzed three of today's household robots for security and privacy vulnerabilities — the WowWee Rovio, the Erector Spykee, and the WowWee RoboSapien V2 — and used the results to identify key lessons and challenges for securing future household robots.
10:15-10:30: Vanish: Increasing Data Privacy with Self-Destructing Data, Roxana Geambasu

Computing and communicating through the Web makes it virtually impossible to leave the past behind. College Facebook posts or pictures can resurface during a job interview; a lost or stolen laptop can expose personal photos or messages; or a legal investigation can subpoena the entire contents of a home or work computer, uncovering incriminating or just embarrassing details from the past. This talk presents Vanish, a research system that seeks to protect the privacy of past, archived data — such as copies of emails maintained by an email provider — against accidental, malicious, and legal attacks. Specifically, Vanish aims to ensure that all copies of certain data become unreadable after a user-specified time, without any specific action on the part of a user, and even if an attacker obtains both a cached copy of that data and the user's cryptographic keys and passwords. Vanish addresses this goal via a novel integration of cryptographic techniques with global-scale, peer-to-peer distributed hash tables (DHTs). The talk introduces Vanish and presents some of the challenges of building it on top of today's DHTs.

Making Parallelism Safe for Everyone, Preventing Threads from Going Wild (CSE 403)

9:40-9:45: Introduction and Overview: Susan Eggers
9:45-10:00: Deterministic Multiprocessing, Joseph Devietti(PDF Slides)

Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming.

In this talk, we make the case for fully deterministic shared memory multiprocessing (DMP). The behavior of an arbitrary multithreaded program on a DMP system is only a function of its inputs. The core idea is to make inter-thread communication fully deterministic. Previous approaches to coping with nondeterminism in multithreaded programs have focused on replay, a technique useful only for debugging. In contrast, while DMP systems are directly useful for debugging by offering repeatability by default, we argue that parallel programs should execute deterministically in the ?eld as well. This has the potential to make testing more assuring and increase the reliability of deployed multithreaded software. We propose a range of approaches to enforcing determinism and discuss their implementation trade-offs. We show that determinism can be provided with little performance cost using our architecture proposals on future hardware, and that software-only approaches can be utilized on existing systems.
10:00-10:15: Bug Finding with Big Fancy Graphs, Brandon Lucia

Incorrect thread synchronization often leads to concurrency bugs that manifest nondeterministically and are difficult to detect and fix. Past work on detecting concurrency bugs has addressed the general problem in an ad-hoc fashion, focusing mostly on data races and atomicity violations.

Using graphs to represent a multithreaded program execution is very natural, nodes represent static instructions and edges represent communication via shared memory. In this paper we make the fundamental observation that such basic context-oblivious graphs do not encode enough information to enable accurate bug detection. We propose context-aware communication graphs, a new kind of communication graph that encodes global ordering information by embedding communication contexts. We then build Bugaboo, a simple and generic framework that accurately detects complex concurrency bugs. Our framework collects communication graphs from multiple executions and uses invariant-based techniques to detect anomalies in the graphs.

We built two versions of Bugaboo: BB-SW, which is fully implemented in software but suffers from significant slowdowns; and BB-HW, which relies on custom architecture support but has negligible performance degradation. BB-HW requires modest extensions to a commodity multicore processor and can be used in deployment settings. We evaluate both versions using applications such as MySQL, Apache, PARSEC, and several others.
10:15-10:30: Finding Concurrency in Large Windows of Execution, Jacob Nelson

Dynamically finding parallelism in sequential applications with hardware mechanisms is typically limited to short windows of execution due to resource limitations. This is because precisely keeping track of memory dependences is too expensive. We propose trading off precision for efficiency. The key idea is to encode a superset of the dependences in a way that saves storage and makes traversal for concurrency discovery efficient. We propose a hardware structure that stores an imprecise map of memory addresses to timestamps that summarizes dependences on-the-fly. Our evaluation with SPEC2006 applications shows that they lead to little imprecision in the dependence graph and find similar parallelization opportunities as an exact approach.

New Approaches to Everyday Interaction with Machine Learning (CSE 691)

9:40-9:45: Introduction and Overview: James Fogarty
10:00-10:15: Learning to Generalize for Complex Selection Tasks, Alan Ritter(PDF Slides)

Selection tasks are common in modern computer interfaces: we are often required to select a set of files, emails, data entries, and the like. File and data browsers have sorting and block selection facilities to make these tasks easier, but for complex selections there is little to aid the user without writing complex search queries. We propose an interactive machine learning solution to this problem called “smart selection,” in which the user selects and deselects items as inputs to a selection classifier which attempts at each step to correctly generalize to the user’s target state. Furthermore, we take advantage of our data on how users perform selection tasks over many sessions, and use it to train a label regressor that models their generalization behavior: we call this process learning to generalize. We then combine the user’s explicit labels as well the label regressor outputs in the selection classifier to predict the user’s desired selections. We show that the selection classifier alone takes dramatically fewer mouse clicks than the standard file browser, and when used in conjunction with the label regressor, the predictions of the classifier are significantly more accurate with respect to the target selection state.
9:45-10:00: Designing for End-User Interactive Concept Learning, James Fogarty(PDF Slides)

End-user interactive machine learning is the process by which people define concepts that can be recognized by an intelligent system, providing the building blocks needed for configuring complex automated behaviors on large unstructured data sets in pursuit of their everyday goals. People define concepts by iteratively providing examples of objects matching a desired concept and inspecting feedback presented by the system to illustrate its current understanding. Defining concepts via examples enables end-user personalization of intelligent systems while circumventing the interpretation or manipulation of low-level computational representations. Recent work has shown that we can create end-user interactive machine learning systems for specific applications. In this research, we explore the question: how should we design effective end-user interactive machine learning systems?
10:15-10:30: Reform the Internet with Machine Learning, Michael Toomim(PDF Slides)

Fellow citizens, some put forth that the Internet is a place for freedom of information and ideas. But are you really free from the webmasters who control what you can and cannot do on websites?

We present a technology, powered by Support Vector Machines and Dynamic Programming, for a new Internet. An Internet that leverages its 1.4 billion end users to improve its interface. An Internet where a Democracy adds features to Google, Facebook, and your bank's website. An Internet where you, not Mark Zuckerberg, control how you view your friends. An Internet that breaks the iron grip of the webmaster, and lets programmers and end users surf, hand-in-hand, mouse-in-keyboard, to a better tomorrow.com.

Session II

Better Software Development (CSE 305)

10:40-10:45: Introduction and Overview: Michael Ernst
10:45-11:00: Detecting and preventing bugs with pluggable type-checking, Michael Ernst(PDF Slides)

Are you tired of null pointer exceptions, unintended side effects, SQL injections, concurrency errors, mistaken equality tests, and other run-time errors that appear during testing or in the field? A pluggable type system can guarantee the absence of these errors, and many more.

We have built a set of pluggable type checkers for Java (though the ideas extend to other languages). The type checkers are easy to use and have found many errors in real programs. Java 7 will contain syntactic support for type annotations, but in the meanwhile your code remains backward-compatible with all versions of Java.
We will also describe the Checker Framework, which enables a programmer to write an annotation processor that checks custom properties of your code and prevents even more bugs.
The type-checkers and the Checker Framework are publicly available at http://types.cs.washington.edu/jsr308/
11:00-11:15: Conflicts of Interest: Making Web Browser Extensions Work Better Together, Ben Lerner(PDF Slides)

An extensible system permits multiple third-parties to add to, revise, or fundamentally alter the functionality provided by the base system. As the capabilities of extensions grow, so too does the likelihood and severity of conflicts among extensions. Working with these systems thus requires a solid understanding of the extensibility model they provide: what exactly are extensions capable of doing, both to the underlying system and to each other, and what can go wrong when extensions interact? Recently, a new extensible system has grown in importance: the web browser. Browser vendors have experimented with various extensibility mechanisms, but little systematic work has been done to understand the design space. In this talk, I'll give several examples of popular kinds of browser extensions, the ways in which they conflict with each other, and report on language design work-in-progress trying to improve the way these extensions are written.
11:15-11:30: Exposing Opaque Changes by Contrasting Static and Dynamic Analyses, Reid Holmes(PDF Slides)

Programmers usually modify a program intending to fix a bug or to add a new feature. Unfortunately, it can be hard to predict and comprehend the runtime effects of these changes. We propose an approach that identifies when a program change inconsistently alters the source-behaviour relationship of a system. By contrasting static and dynamic dependence graphs from two program versions, our approach can concisely identify specific program elements and dependencies that behave differently at runtime than their corresponding static change might suggest. For example, a consistent change would occur if a developer added a method call to their code and this method call was executed at runtime. In contrast, examples of inconsistent changes are adding a method call to the source code but the call not executing at run time, or a method call that stops executing dynamically without a corresponding static change. Inconsistent changes are likely to be of interest to a developer, who can use them to identify errors, untested code, or changes in a program's operating environment.

Data Intensive Scientific Analytics (CSE 403)

10:40-10:45: Introduction and Overview: Magdalena Balazinska
10:45-11:00: The eScience Institute: Commoditizing Data-Intensive Scentific Analysis, Bill Howe(PDF Slides)

Science is becoming data-intensive. New discoveries are increasingly made through semi-automatic analysis of massive datasets, requiring new research in, and ubiquitous access to, technologies and techniques related to databases, visualization, data mining, machine learning, and cluster/cloud computing. In this talk, I will introduce the eScience Institute, a state-funded group within the University of Washington created to provide the "intellectual infrastructure" needed for data-intensive science. I will also discuss several case studies in bridging the gap between the frontiers of commercial and research data analysis systems and the daily needs of domain scientists in Microbiology, Oceanography, and Astronomy.
11:00-11:15: Scientific Data Management in the Cloud: Overview of the Nuage Project, Magdalena Balazinska(PDF Slides)

In the Nuage project, we are developing data management systems and techniques for enabling scientists to store, analyze, and share large volumes of data using cloud-computing environments. In this talk, we will present the vision behind the Nuage project, the techniques and systems that we are currently developing, and some of our preliminary results.
11:15-11:30: Parallax: A Progress Indicator for MapReduce Pipelines, Kristi Morton(PDF Slides)

In this talk, we will present our ongoing work developing Parallax, the first non-trivial, time-oriented progress indicator for parallel queries. We developed the current version of Parallax for Pig queries running in a Hadoop cluster, an environment that is a popular open-source parallel data-processing engine under active development. As an initial step, we focused on Pig Latin queries that compile into a series of MapReduce jobs. In this talk, we will present the techniques behind Parallax and our preliminary experimental results.

Sustainability and Information Technology (CSE 691)

10:40-10:45: Introduction and Overview: Ed Lazowska
10:45-11:00: OneBusAway: Improving the Usability of Public Transit, Brian Ferris(PDF Slides)

Public transit is an important tool for those looking to ease their commutes, reduce their car dependence, or perhaps minimize their environmental impact. Unfortunately, the usability of transit systems often leaves much to be desired, to the point of deterring new riders. Tools on the web and mobile devices are increasingly being used to help tame confusing transit systems. OneBusAway is one such set of tools, providing access to real-time transit information for Seattle bus riders through a variety of interfaces, including web (http://onebusaway.org), phone, SMS, and mobile devices. We describe our current system and discuss current and planned research that builds on OneBusAway to use increasingly-powerful smart mobile devices to provide location and context-aware tools for navigating transit systems.
11:00-11:15: Sensing and Feedback of Everyday Activities to Promote Environmentally Sustainable Behaviors, Jon Froehlich(Zip File)

Everyday human behaviors such as home energy consumption and personal travel can have a significant impact on the environment. Feedback has been shown to be one the most effective strategies in reducing wasteful consumption, particularly for home energy, yet there has been a lack of research in other domains. With advances in sensing technologies and computerized displays, we now have the potential to provide personalized feedback in real time for a variety of environmentally impactful activities. In this talk, I describe both completed and proposed work to further the design, tools and techniques of sensing and feedback solutions of everyday activities to promote pro-environmental behaviors.
11:15-11:30: Infrastructure Mediated Sensing: Low Cost, Single Point Sensing of Gas and Electrical Events, Sidhant Gupta(PDF Slides)

Infrastructure Mediated Sensing (IMS) leverages the existing infrastructure of a home to detect the transduction of events. Such sensors are easy to install, economical and reduces maintenance barriers to adoption by reducing the overall cost and complexity of deploying and maintaining the sensing infrastructure. In this talk, I describe two example IMS projects geared towards sustainability sensing. First, Powerline Event Detection, uses a single plug-in module to monitor electrical noise generated by a device on the powerline in order to determine its identity. Second, GasSense, which provides the unique capability of sensing both the individual appliance at which gas is currently being consumed as well as an estimate of the amount of gas flow, by analyzing the acoustic response of a home's gas regulator.

Session III

Technology for Under-Served Regions (CSE 305)

11:40-11:45: Introduction and Overview: Carl Hartung
11:45-12:00: Digital Study Hall Evaluation Study, Amit Saxena

The Digital StudyHall (DSH) project aims to improve education for students in rural and slum schools in India. The researchers video record classes taught by experienced teachers, distribute these videos over "Postmannet" (effected by DVDs sent in the postal system), collect them in a large distributed database, and distribute them on DVDs to poor rural and slum schools. Education experts train local teachers to mediate the video lessons for use in their teaching. For more details, please see:

http://dsh.cs.washington.edu/

The University of Washington and the StudyHall Educational Foundation are currently undertaking a two year mixed methods study of Digital StudyHall in 11 village schools in the the Chinhat Development Block, Lucknow, India. The study is supported by the National Science Foundation. In the presentation, I will share our preliminary analysis of the teaching and learning outcomes as well as the implementation of the DSH system in these schools.
12:00-12:15: Open Data Kit, Carl Hartung(PDF Slides)

In developing regions, the lack of reliable infrastructure, ubiquitous connectivity, and adequate expertise makes data collection difficult. Currently, most organizations collect data on paper forms despite inefficiencies such as the physical collection of completed forms, data transcription errors, and long delays before data can be aggregated and put to use. To help solve these issues we present Open Data Kit (ODK), a modular suite of tools that allows users to use mobile phones to collect their own rich data including text, GPS location, images, audio, video, and barcodes, and upload their data to scalable cloud storage. ODK allows users to collect, own, visualize, and share data without the difficulties of setting up and maintaining servers. In this talk I’ll describe the system’s design and discuss how organizations have begun using ODK for projects spanning from community health counseling in Kenya to deforestation monitoring in the Amazon.
12:15-12:30: Building a Grassroots Transportation Information System, Ruth Anderson

We present an SMS-based system for providing transit information based solely on existing cellular and GPS networks. The aim is to permit the development of information services that do not rely on a central authority or complex web hosting to deploy new applications. We relate our experience in developing our system and applying it to the network of privately-run marshrutka buses in Bishkek, Kyrgyzstan. A custom designed GPS-GSM unit is placed on a bus and users can query our server over SMS with their own non-GPS-enabled cell phones. Our system is a grassroots solution to the persistent lack of transport information in developing countries.

The Revolution in Computational Biology (CSE 403)

11:40-11:45: Introduction and Overview: Larry Ruzzo(PDF Slides)
11:45-12:00: Decoding gene regulation based on DNA accessibility data, Xiaoyu Chen(PDF Slides)

The binding of transcription factors in the context of DNA accessibility is critical to understanding gene regulation mechanism and genome function. DNaseI digestion coupled with massively parallel sequencing (digital genomic footprinting) makes it feasible to identify protein binding footprints with high resolution on a genome-wide scale. However, accurately inferring the precise locations of these footprints remains a challenging computational problem. In this talk, I will present approaches for identifying and assigning statistical confidence estimates to protein binding footprints from digital genomic footprinting data.
12:00-12:15: Nucleic acid circuits for programming biology, Georg Seelig (PDF Slides)

Biological organisms process information and control cellular behavior using sophisticated biochemical circuits. In order to engineer circuits of similar complexity and thus “program” biology we need to develop molecular tools for (i) accurate detection of complex cellular states and (ii) control and modulation of those states.

As a first step towards addressing these challenges we have recently implemented multi-layered nucleic acid logic circuits that function reliably in an aqueous, cell-free environment. Hybridization reactions provide the free energy to move computation forward and Watson-Crick base pairing between modular recognition domains determines the connectivity of circuit components. Circuits embody key design principles of digital electronics: logic, cascading, restoration, fan-in/fan-out and modularity. Our approach demonstrates that adherence to these principles provides a viable path towards the de novo construction of biochemical reaction networks.

Working in vitro it is possible to rapidly scale up circuit complexity, to explore new modes of molecular computation, and to quantitatively characterize circuit behavior without degradation of DNA or RNA that may occur in a cell. These experiments represent a foundation for developing in vivo computation using related molecular schemes.
12:15-12:30: Synthetic Biology and Feedback Control, Eric Klavins(PDF Slides)

Feedback is ubiquitous in natural processes. For example, blood pressure, blood glucose levels, and body temperature all all regulated via feedback by molecular mechanisms in cells. Feedback is also ubiquitous in engineered systems, from aircraft autopilots to TCP/IP control of Internet congestion. But how to apply engineering methods to understanding a designing feedback in biology is a relatively new. In this talk, I will describe how synthetic biologists are learning to build feedback control devices out of molecules in cells, and describe my lab's work in a feedback regulated transcriptional network.

Reconstructing the World from Web Images (CSE 691)

11:40-11:45: Introduction and Overview: Zoran Popović
11:45-12:00: Building Rome in a Day, Sameer Agarwal(PDF Slides, Quicktime)

Entering the search term "Rome" on Flickr.com returns more than two million photos. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.

In this talk, I will presents the first system capable of city-scale reconstruction from images harvested from the web. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms to scale gracefully with both the size of the problem and the amount of available computation.

I will show three dimensional models that are up to two orders of magnitude larger than the next largest results reported in the literature. Furthermore, our system enables reconstruction from data sets of 150,000 images in less than a day.
12:00-12:15: Reconstructing Building Interiors from Images, Yasutaka Furukawa(PDF Slides)

3D reconstruction and visualization of architectural scenes are getting more and more popular with large scale efforts underway to recover models of cities at a global scale (e.g., Google Earth, Virtual Earth). However, relatively little attention has been paid to the reconstruction and visualization of indoor scenes such as our offices or houses. The reconstruction of indoor environments from photographs is particularly challenging due to texture-poor planar surfaces such as uniformly-painted walls and complicated visibility analysis. In this talk, I will present a fully automatic system that reconstructs a 3D model of architectural scenes (mostly indoors) from photographs, then provides interactive virtual exploration of a reconstructed scene. Our examples include a single kitchen room, an entire floor of a house, and Henry Art Gallery in our campus.
12:15-12:30: Snapshots from inside the CSE Animation Capstone, Barbara Mones

Our CSE Animation Capstone series of courses culminates in an annual digital short film based on a story written by a UW student. Our students begin the Capstone with no experience in animation or filmmaking. The capstone class experience is designed to be both collaborative and interdisciplinary. We will bring you inside the studio to give you an inside view of our most recent products and the process we use to create them. We will also focus on several innovative approaches we've developed in house to improve the student experience and our production pipeline here at UW.