Thursday, November 3, 2005

SESSION I: 9:40 - 10:30
    Scalable Location Systems (CSE 305)
    Proof in Computing (CSE 403)
    Information Extraction: Discovering Relations (CSE 691)

SESSION II: 10:40 - 11:30
    A Gazillion Transistors (CSE 305)
    Computing in the Biology Revolution (CSE 403)
    The Internet of Today and Tomorrow (CSE 691)

SESSION III: 11:40 - 12:30
    Technology for Teaching and Learning (CSE 305)
    Technologies for Personal Health (CSE 403)
    Animation Research and Production (CSE 691)

Session I

    Scalable Location Systems (CSE 305)

    • 9:40-9:45: Introduction and Overview: Dieter Fox

    • 9:45-10:00: Large-Scale Localization From WiFi Signals, Julie Letchner, (Zipped files, includes video)

      Knowledge of location is the most fundamental component of a context-aware system that can understand and respond to its environment. For example, a location-aware cell phone might automatically switch off when it enters a lecture hall, or prompt with directions when it detects that its user has become lost. We present a system that determines location both indoors (~2 meters error) and outdoors (~10 meters error) from wireless internet signals. We demonstrate that the system requires little training and improves with increased usage, and discuss current research directions building upon the location-inference system.
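      The abstract does not say how location is inferred from the signals. As a rough illustration of one common approach (not necessarily the method used in this system), WiFi localization can be done by nearest-neighbor matching against a database of surveyed signal-strength fingerprints; all names and numbers below are hypothetical.

        # Minimal WiFi-fingerprint localization sketch (illustrative only).
        # A fingerprint maps access-point IDs to signal strengths in dBm.

        def distance(obs, fp, missing=-100.0):
            """Euclidean distance between two signal-strength fingerprints."""
            aps = set(obs) | set(fp)
            return sum((obs.get(ap, missing) - fp.get(ap, missing)) ** 2
                       for ap in aps) ** 0.5

        def localize(obs, database):
            """Return the surveyed (x, y) whose fingerprint best matches obs."""
            return min(database, key=lambda xy: distance(obs, database[xy]))

        # Hypothetical survey data: (x, y) -> {AP id: mean dBm}
        database = {
            (0, 0):  {"ap1": -40, "ap2": -70},
            (10, 0): {"ap1": -65, "ap2": -50},
        }
        print(localize({"ap1": -42, "ap2": -68}, database))   # -> (0, 0)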

    • 10:00-10:15: Location-Based Activity Recognition, Lin Liao, (Powerpoint Slides)

      Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract and label a person's activities and significant places from traces of GPS data. In contrast to existing techniques, our approach simultaneously detects and classifies a person's significant locations and takes high-level context into account. We present experiments that show significant improvements over existing techniques.
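      The abstract does not describe the model itself. As a toy illustration of just the first step (finding significant places, without the talk's simultaneous labeling and high-level context), one can look for long stays within a small radius:

        # Toy "significant place" detection from a GPS trace (a common first
        # step; the talk's approach is more sophisticated and also labels
        # activities).  trace: time-ordered list of (t_sec, x_m, y_m).

        def significant_places(trace, radius=30.0, min_stay=300.0):
            places, i = [], 0
            while i < len(trace):
                j = i
                # Grow the window while points stay within `radius` of its start.
                while (j + 1 < len(trace) and
                       ((trace[j + 1][1] - trace[i][1]) ** 2 +
                        (trace[j + 1][2] - trace[i][2]) ** 2) ** 0.5 <= radius):
                    j += 1
                if trace[j][0] - trace[i][0] >= min_stay:   # lingered long enough
                    n = j - i + 1
                    places.append((sum(p[1] for p in trace[i:j + 1]) / n,
                                   sum(p[2] for p in trace[i:j + 1]) / n))
                i = j + 1
            return places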

    • 10:15-10:30: Highly Accurate Room-Level Location With No Extra Hardware, Jong Hee Kang, (PDF Slides)

      We propose a system that uses the wireless networking and microphone interfaces of mobile devices to determine location to room-level accuracy. The wireless network provides a synchronizing pulse along with information about the room. This is accompanied by an ultrasound beacon that allows us to resolve locations to the confines of a physical room (since audio is mostly bounded by walls). We generate the wireless data and ultrasound pulses from the existing PCs in each room; a PDA carried by a user listens for both signals. Thus, our approach does not require special hardware. We do not use ultrasound to send data; as a result, we dramatically reduce the computational burden on the mobile device while also decreasing the latency of location resolution. Our results indicate that (i) ultrasound detection is robust even in noisy environments with many reflective surfaces, and (ii) we can determine the correct room within a couple of seconds with high probability even when the ultrasound-emitting PCs are not synchronized.
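      Client-side, the decision logic is simple; a hedged sketch (the interfaces below are hypothetical, not the actual system's API):

        # wifi_beacons() yields (timestamp, room_id) sync pulses;
        # ultrasound_heard(t0, t1) reports whether the microphone detected
        # a chirp in that window.

        DETECTION_WINDOW = 0.5   # seconds to listen after the RF sync pulse

        def current_room(wifi_beacons, ultrasound_heard):
            for t, room_id in wifi_beacons():
                # The RF pulse reaches neighboring rooms too, but the
                # ultrasound chirp is mostly confined by walls, so hearing
                # it means we share a room with the emitting PC.
                if ultrasound_heard(t, t + DETECTION_WINDOW):
                    return room_id
            return None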

    Proof in Computing (CSE 403)

    • 9:40-9:45: Introduction and Overview: Richard Ladner

    • 9:45-10:00: How Well Can Priceline Sell Airline Tickets?, Ning Chen, (Powerpoint Slides)

      We consider the following simple model for how Priceline sets airline prices. Assume Priceline wants to sell tickets for a single flight (assumed to be in unlimited supply) over (discrete) time. Each bidder submits a triple (s, e, b), where s is the time when the bidder arrives, e is the time when the bidder leaves, and b is the maximum amount the bidder is willing to pay for a ticket (every bidder wants only one ticket). On each day t, Priceline sets a price p(t), and every bidder buys a ticket on the first day t (at price p(t)) such that p(t) is at most its bid value and t is between its arrival and departure times. Priceline's goal is to set prices so as to maximize its revenue. We also consider a related model studied by Guruswami et al. (SODA'05), where each bidder buys the ticket at the minimum price p(t0) (provided it is at most b), where t0 is between its arrival and departure times. We study these optimization problems in both the offline and online settings. We show that the offline problem can be solved exactly in polynomial time, and we give upper and lower bounds for online algorithms (both deterministic and randomized) for this problem.
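      For intuition, here is a brute-force solver for a tiny instance of the first model (illustration only; the talk's polynomial-time algorithm is of course not brute force). It suffices to consider prices drawn from the bid values:

        from itertools import product

        bidders = [(1, 2, 5), (1, 1, 3), (2, 3, 4)]  # (arrive s, depart e, bid b)
        days = [1, 2, 3]
        candidate_prices = sorted({b for _, _, b in bidders})

        def revenue(prices):                  # prices: dict day -> price
            total = 0
            for s, e, b in bidders:
                for t in range(s, e + 1):     # buys on the FIRST affordable day
                    if prices[t] <= b:
                        total += prices[t]
                        break
            return total

        best = max(product(candidate_prices, repeat=len(days)),
                   key=lambda ps: revenue(dict(zip(days, ps))))
        print(best, revenue(dict(zip(days, best))))   # (3, 4, 3) with revenue 10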

    • 10:00-10:15: Ordering by Wins Gives a Good Tournament Ranking, Atri Rudra (Powerpoint Slides)

      Consider a sports tournament in which every pair of players plays a match. After all the games are completed, one would like to rank the players with as few inconsistencies as possible, where an inconsistency means a higher-ranked player actually lost to a lower-ranked player. A natural way to generate such a ranking is to order the players by their number of wins (with ties broken in some manner). In this work we prove that this scheme always produces a ranking with at most 5 times as many inconsistencies as the optimal ranking. Further, the choice of tie-breaking mechanism does not seem to matter. Our results also have applications to the related problem of rank aggregation. This is joint work with Don Coppersmith and Lisa Fleischer.
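      The scheme itself fits in a few lines; here is a small sketch of the ranking and of the quantity being approximated:

        # Rank players by win count; count inconsistencies (upsets where a
        # lower-ranked player beat a higher-ranked one).  The talk shows this
        # count is at most 5x that of the optimal ranking.

        def rank_by_wins(players, beat):
            """beat[(a, b)] is True iff a beat b; every pair plays once."""
            wins = {p: sum(beat[(p, q)] for q in players if q != p)
                    for p in players}
            return sorted(players, key=lambda p: -wins[p])  # ties broken arbitrarily

        def inconsistencies(ranking, beat):
            return sum(beat[(ranking[j], ranking[i])]       # lower beat higher
                       for i in range(len(ranking))
                       for j in range(i + 1, len(ranking)))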

    • 10:15-10:30: Cache Efficient Dynamic Programming, Cary Cherng, (Powerpoint Slides)

      We present new cache-oblivious and cache-aware algorithms for simple dynamic programming, based on Valiant's context-free language recognition algorithm. Experiments show that for large inputs the cache-oblivious and cache-aware dynamic programming algorithms are significantly faster than the standard dynamic programming algorithm.
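      Valiant's algorithm reduces the dynamic program to matrix multiplication, and the cache-oblivious behavior comes from divide-and-conquer recursion. A generic sketch of such a kernel (not the talk's actual algorithms):

        # Cache-oblivious matrix multiply: C += A * B on n x n matrices
        # (n a power of two).  Recursing on quadrants reuses sub-blocks at
        # every scale, exploiting each cache level without knowing its size.

        def matmul_co(A, B, C, ri, ci, ki, n):
            if n == 1:
                C[ri][ci] += A[ri][ki] * B[ki][ci]
                return
            h = n // 2
            for dr in (0, h):
                for dc in (0, h):
                    for dk in (0, h):
                        matmul_co(A, B, C, ri + dr, ci + dc, ki + dk, h)

        n = 4
        A = [[1] * n for _ in range(n)]
        B = [[2] * n for _ in range(n)]
        C = [[0] * n for _ in range(n)]
        matmul_co(A, B, C, 0, 0, 0, n)    # every entry of C is now 8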

    Information Extraction: Discovering Relations (CSE 691)

    • 9:40-9:45: Introduction and Overview: Stephen Soderland

    • 9:45-10:00: KnowItNow: Fast, Scalable Information Extraction from the Web, Michael Cafarella

      Numerous NLP applications rely on search engines, both to extract information from and to compute statistics over the Web corpus. But search engines often limit the number of available queries. As a result, query-intensive NLP applications such as Information Extraction distribute their query load over several days, making IE a slow, offline process. We introduce a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KnowItNow that performs high-precision IE in minutes instead of days.

    • 10:00-10:15: Extracting Product Features and Opinions from Reviews, Ana-Maria Popescu

      Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised information-extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products.

      Compared to previous work, OPINE achieves 22% higher precision (with only 3% lower recall) on the feature extraction task. OPINE's novel use of relaxation labeling for finding the semantic orientation of words in context leads to strong performance on the tasks of finding opinion phrases and their polarity.

    • 10:15-10:30: Managing Imprecisions in Data, Nilesh Dalvi

      The problem of managing imprecisions in databases has received renewed interest with the emergence of several recent applications: exploratory queries in databases, novel IR-style approaches to data integration, querying information extracted from the Web, queries over sensor networks, data acquisition, querying data sources that violate integrity constraints, controlling information disclosure in data exchange, and reasoning about privacy breaches in data mining. In this talk, I will illustrate through several examples that a probabilistic database can provide a unifying framework to support various kinds of imprecisions. Although various kinds of probabilistic databases have been studied in the past, a general system that can support this diverse range of applications has so far been elusive. I will also present some of our recent results and techniques towards an efficient implementation of such a system.
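      As a small concrete example of one such model (a tuple-independent probabilistic table, one of several kinds of imprecision; the names and numbers are made up):

        # Each row exists with the given probability, independently.
        # P("some row with city == 'Seattle' exists") = 1 - prod(1 - p).

        rows = [  # (name, city, probability the fact is correct)
            ("Alice", "Seattle", 0.9),
            ("Bob",   "Seattle", 0.5),
            ("Carol", "Tacoma",  0.8),
        ]

        def prob_exists(rows, pred):
            q = 1.0
            for row in (r for r in rows if pred(r)):
                q *= 1 - row[2]          # probability the row is absent
            return 1 - q

        print(prob_exists(rows, lambda r: r[1] == "Seattle"))   # 0.95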

Session II

    A Gazillion Transistors (CSE 305)

    • 10:40-10:45: Introduction and Overview: Mark Oskin

    • 10:45-11:00: The WaveScalar Architecture, Steven Swanson, (Powerpoint Slides)

      Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge that conventional superscalar designs will not be able to meet. We present WaveScalar as a scalable alternative to conventional designs. WaveScalar is a dataflow instruction set and execution model designed for scalable, low-complexity/high-performance processors. Unlike previous dataflow machines, WaveScalar can efficiently provide the sequential semantics imperative languages require. To allow programmers to easily express parallelism, WaveScalar supports pthread-style, coarse-grain multithreading and dataflow-style, fine-grain threading. In addition, it permits blending the two styles within an application or even a single function. To execute WaveScalar programs, we have designed a scalable, tile-based processor architecture called the WaveCache. As a program executes, the WaveCache maps the program's instructions onto its array of processing elements (PEs). The instructions remain at their processing elements for many invocations, and as the working set of instructions changes, the WaveCache removes unused instructions and maps new instructions in their place. The instructions communicate directly with one another over a scalable, hierarchical on-chip interconnect, obviating the need for long wires and broadcast communication. For single-threaded applications, the WaveCache achieves performance on par with conventional processors, but in less area. For coarse-grain threaded applications the WaveCache achieves nearly linear speedup with up to 64 threads and can sustain 7-14 multiply-accumulates per cycle on fine-grain threaded versions of well-known kernels. Finally, we apply both styles of threading to equake from spec2000 and speed it up by 9x compared to the serial version.
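      The execution model can be illustrated with a toy dataflow interpreter: an instruction fires as soon as all of its operands have arrived, with no program counter. This sketches the dataflow idea only, not the WaveCache hardware:

        program = [                                 # (op, input tags, output tag)
            (lambda a, b: a + b, ("x", "y"), "s"),  # s = x + y
            (lambda a, b: a * b, ("s", "z"), "r"),  # r = s * z
        ]

        def run(inputs):
            values = dict(inputs)                   # tag -> arrived value
            fired, progress = set(), True
            while progress:
                progress = False
                for i, (op, ins, out) in enumerate(program):
                    if i not in fired and all(t in values for t in ins):
                        values[out] = op(*(values[t] for t in ins))
                        fired.add(i)
                        progress = True
            return values

        print(run({"x": 1, "y": 2, "z": 10})["r"])  # (1 + 2) * 10 = 30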

    • 11:00-11:15: The WaveScalar Microarchitecture and Prototype, Andrew Putnam, (Powerpoint Slides)

      This talk fleshes out the microarchitectural details of WaveScalar and shows one way to implement the architecture. Specifically, we answer the question of whether a WaveScalar processor can really be built to rival conventional superscalars in size, power consumption, and performance. We also include some exploration of the design space to find the best design point for this architecture. The result is an RTL model that requires approximately 252 mm^2 and operates at a clock cycle of 20 FO4. The talk also deals with the broader issue of how academic research teams can still contribute to computer architecture research without huge budgets and expensive fabrication runs. We introduce our 36 square foot prototype and discuss the new avenues that the prototype will open for WaveScalar and dataflow research.

    • 11:15-11:30: Programming Hybrid Sequential/MicroParallel Computers, Brian Van Essen (PDF Slides)

      Reconfigurable computing architectures provide large numbers of tightly integrated processing elements to achieve the performance of custom hardware without sacrificing reprogrammability. These "micro-parallel" architectures have yet to be adopted for general computation for two main reasons: First, they do not execute sequential code efficiently. Second, writing programs for micro-parallel execution is an arcane art far removed from the experience of most programmers. The first problem can be addressed by a hybrid processor model that integrates a sequential processor and a micro-parallel engine. This talk will describe a new "type architecture" that we are developing to address the challenge of programming these hybrid architectures. This abstract model extends the familiar von Neumann model to include the micro-parallel engine in hybrid architectures. We will describe by example how this model can be used by programmers to simplify the task of writing efficient micro-parallel algorithms.

    Computing in the Biology Revolution (CSE 403)

    • 10:40-10:45: Introduction and Overview: Larry Ruzzo and Martin Tompa

    • 10:45-11:00: Making Sense of the DNA Sequences of Dozens of Vertebrates, Amol Prakash, (Powerpoint Slides)

      The number of vertebrates whose DNA sequences are completely known has already reached a dozen, and is growing at a fast pace. One aim of this effort is to aid in the understanding of every character of the human genome by giving its evolutionary history. Genome browsers such as those at UCSC and Ensembl help this process by aligning the DNA sequences of all the vertebrates and annotating the various functional regions of the genome. Using a recently developed tool called SigMA (Significance of Multiple Alignments), we analyze the quality of such alignments. We believe this is an important step before these alignments are used to make sense of the DNA sequences of the dozen vertebrates.

    • 11:00-11:15: A Comprehensive Framework for High-Quality Genome-Wide Search of Non-Coding RNA, Zizhen Yao (PDF Slides)

      Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. In the last 5 years, dramatic discoveries have greatly expanded both the number of known ncRNAs and the breadth of their biological roles. The increasing biological research activity in this area creates a pressing need for tools for automatic discovery, characterization and annotation of ncRNAs. Classical methods are either inaccurate or slow. To address the need for genome-scale search of ncRNA, our lab has designed an effective pipeline to assist in key steps in the discovery of novel ncRNAs. The pipeline includes two major components: automatic conserved RNA motif identification and RNA homolog search. The two components can be integrated to iteratively refine the RNA model, and our experiments demonstrate that our automatically predicted models are comparable to high-quality hand-curated models. This framework has been used to search several bacterial genomes, and we have succeeded in re-discovering a large number of known ncRNAs, as well as locating a set of novel ncRNA candidates, which are still under investigation by our collaborators.
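      The iterate-and-refine idea can be caricatured in a few lines (real RNA models are covariance models that also score secondary structure; here the "model" is just a consensus string and "homolog search" is approximate matching):

        def consensus(seqs):                    # all sequences equal length
            return "".join(max("ACGU", key=col.count) for col in zip(*seqs))

        def scan(model, genome, max_mismatch=1):
            k = len(model)
            for i in range(len(genome) - k + 1):
                window = genome[i:i + k]
                if sum(a != b for a, b in zip(model, window)) <= max_mismatch:
                    yield window

        def refine(seed_seqs, genomes, rounds=2):
            model = consensus(seed_seqs)
            for _ in range(rounds):
                hits = [h for g in genomes for h in scan(model, g)]
                if hits:
                    model = consensus(hits)     # refine model with new homologs
            return model

        print(refine(["ACGU", "ACGA"], ["UUACGUAA", "GGACGACC"]))   # ACGA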

    • 11:15-11:30: Toward a Multidimensional Robust Brain Computer Interface, Kai Miller, (Zipped files)

      Using electrocorticographic electrodes implanted on the brain surface, we measure electrical activity at the surface of the brain. Patients can learn to consciously modify this activity through thought alone. By coupling these modifications to various output regimes, a robust method of thought control over real and virtual objects is attained.

    The Internet of Today and Tomorrow (CSE 691)

    • 10:40-10:45: Introduction and Overview: Steve Gribble

    • 10:45-11:00: GENI: Global Environment for Network Investigations, Tom Anderson

      In collaboration with the NSF, the networking and distributed systems research community is planning an initiative to explore the capabilities of a Future Internet that meets the demands of the 21st century. The initiative, called GENI, consists of both a research program and an experimental facility to support the research program. This talk will focus on the facility, unique among experimental platforms in that it is designed to support both research and deployment, effectively filling the gap between small-scale experiments in the lab, and mature technology that is ready for commercial deployment. We envision researchers using GENI to evaluate new network systems on a large-scale (global) system, learning from real workloads as users begin to exercise those systems, and hardening the successful systems as user adoption grows.

    • 11:00-11:15: A Crawler-based Study of Spyware in the Web, Alexander Moshchuk, (Powerpoint Slides)

      Malicious spyware has recently become one of the greatest threats to desktop security and integrity. We have built a measurement system to perform a large-scale study of spyware on the web and to determine the extent of spyware content in both executables and conventional web pages. In this talk, I will describe our methodology and present our current results, which analyze the density of spyware, the types of spyware threats, and the most dangerous web zones in which spyware is likely to be encountered.

    • 11:15-11:30: Efficient Endpoint Congestion Control, Arvind Krishnamurthy, (Powerpoint Slides)

      TCP is the dominant transport protocol in today's Internet. TCP's popularity and strength arise from its ability to adapt to a wide variety of environments without requiring any explicit network support. However, it is well-known that TCP performs poorly in many scenarios, including the common case when the network is uncongested, but also for very high bandwidth transfers, interactive applications, real-time applications, and wireless networks. To address these limitations, several commercial "box" products have been developed to better manage TCP traffic; these products sit inside the network at potential congestion points and spoof congestion signals back to the end hosts to control their sending rates. Of course, the problem with box solutions embedded in the network is that they are expensive, require separate administration, and typically need to be deployed at every point of possible congestion. We have taken a radically different approach and have developed a solution that allows end hosts, without any special support from the network, to achieve near-optimal performance. Our approach, called the Probe Control Protocol (PCP), emulates network-based control by using explicit short probes to test and temporarily acquire available bandwidth. As an end host software solution, it can be quickly and universally deployed at much lower cost than existing commercial box solutions. Our experiments show that PCP, unlike TCP, achieves rapid startup, small queues, and low loss rates, and that the efficiency of our approach does not compromise eventual fairness and stability.
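      A cartoon of the probe-and-acquire idea (not the actual protocol details): before raising its rate, a sender tests the higher rate with a short probe burst, and jumps to it only if the probes come back un-queued.

        def pcp_sender(probe, rate=1.0, rounds=10):
            """probe(r) -> True if a short burst at rate r saw no queuing."""
            for _ in range(rounds):
                if probe(rate * 2):   # test the higher rate before using it
                    rate *= 2         # acquire it outright (rapid startup)
                # else keep the current rate; the real protocol also decays
                # its rate periodically so competing flows converge to fairness
            return rate

        # Hypothetical link with 8 units of spare capacity:
        print(pcp_sender(lambda r: r <= 8.0))   # climbs 1 -> 2 -> 4 -> 8, holds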

Session III

    Technology for Teaching and Learning (CSE 305)

    • 11:40-11:45: Introduction and Overview: Tom Anderson

    • 11:45-12:00: Classroom Interaction with Tablet PCs, Ruth Anderson, (Powerpoint Slides)

      In this session we will demonstrate Classroom Presenter, a Tablet PC based system that facilitates active and collaborative learning in the classroom. In our system, students are equipped with tablet computers and, at various points during the lecture, are asked to solve a problem or respond to a question. Students respond by writing their solution on the tablet and submitting it wirelessly to the instructor. The instructor can view all student responses, select one or more to display to the class, and annotate responses with ink as they are being displayed. In this talk we will demonstrate the system and describe our initial experiences in the classroom.

    • 12:00-12:15: Information Systems for the Developing World: Managing Information from the Grassroots, Tapan Parikh, (PDF Slides)

      In this talk I will discuss my experiences designing, developing and deploying information systems in rural India. I have worked on making information more accessible (by designing a user interface usable by semi-literate users) and on applications supporting sustainable rural development (such as a distributed multimedia workspace to share and catalog inventions by rural villagers). My current projects include CAM, a paper-based programming toolkit for camera-equipped mobile phones, and Mahakalasm MIS, an information system for village micro-finance cooperatives that uses CAM to link paper-based processes to on-line data services. I will present some of the lessons I have learned in this work and how they have culminated in a toolkit for facilitating several important rural information flows. My long-term objective is the development of low-cost, accessible information services providing economic opportunity and potential for expression in disconnected communities around the world.

    • 12:15-12:30: The Digital Study Hall: An E-Learning System for Improving Basic Education in Third World Countries, Tom Anderson

      Good primary education is one of the most crucial factors in combating extreme poverty. In this project, computer scientists and education experts collaborate to build a distance learning system that seeks to offer resource-starved village schools in rural India human and content resources comparable to those of urban environments. To avoid retracing the missteps of earlier "wire-the-schools" projects, we follow two important principles: (1) cost realism, essential if we are to scale the system up to a significant number of villages, schools, and students; and (2) building systems that solve end-to-end education problems, beyond just providing connectivity. Our Digital Study Hall system is based on a unique approach leveraging the postal system, DVDs, robotically operated DVD publishers, long-distance ham radio transceivers, and short-range TV transmitters with radio controllers. An important goal of the system is to enable customized any-to-any communication and effective group learning, which may ultimately address the scalability problem of the education system. A prototype has been in live deployment with students in India since July 2005. One key current research topic is mediation-based pedagogy and training models that can simultaneously engage students and improve teachers' teaching skills.

    Technologies for Personal Health (CSE 403)

    • 11:40-11:45: Introduction and Overview: Gaetano Borriello

    • 11:45-12:00: Overview of Research at UW CSE, Gaetano Borriello

    • 12:00-12:15: Personal Sensors and Activity Inference, Jonathan Lester

      Accurate recognition and tracking of human activities is an important goal of ubiquitous computing and of significant importance to the medical community. Recent advances in multi-modal wearable sensors have enabled us to gather rich datasets of human activities. We will present an unobtrusive wearable platform we have developed to capture sensor readings, along with the machine learning techniques we use to classify this sensor data. We are able to recognize 10 activities, such as standing, walking, walking up/down stairs, riding an elevator up/down, and riding a bicycle, with better than 90% accuracy. We will also present some potential applications we are currently working on in conjunction with the UW Medical School.
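      The overall pattern is windowed sensor data -> features -> classifier. A toy nearest-centroid version (the actual system uses richer multi-modal features and stronger classifiers):

        def features(window):                  # window: list of accel magnitudes
            mean = sum(window) / len(window)
            var = sum((x - mean) ** 2 for x in window) / len(window)
            return (mean, var)

        def train(labeled_windows):            # [(label, window), ...]
            sums = {}
            for label, w in labeled_windows:
                f = features(w)
                s = sums.setdefault(label, [0.0, 0.0, 0])
                s[0] += f[0]; s[1] += f[1]; s[2] += 1
            return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

        def classify(window, centroids):       # label of the nearest centroid
            f = features(window)
            return min(centroids, key=lambda lab:
                       (f[0] - centroids[lab][0]) ** 2 +
                       (f[1] - centroids[lab][1]) ** 2)

        centroids = train([("standing", [1.0, 1.0, 1.0]),
                           ("walking",  [0.5, 1.5, 0.6])])
        print(classify([0.6, 1.4, 0.7], centroids))   # -> "walking"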

    • 12:15-12:30: Increasing Activity using Social Support Through Cell Phones, Kate Everitt, (Powerpoint Slides)

      Obesity and overweight have become a global epidemic, with over one billion overweight adults worldwide (300+ million of whom are obese). Obesity is linked to several health problems and serious medical conditions. Medical experts agree that physical activity is critical to maintaining fitness, reducing weight, and improving health, yet many people have difficulty increasing and maintaining physical activity in their everyday lives. Clinical studies have shown that health benefits can come from simply increasing the number of steps one takes each day and that social support can motivate people to stay active. In this talk I will describe Houston, a prototype mobile phone application for encouraging physical activity by sharing step counts with friends, and present four key design requirements for physical activity-enabling technologies that we derived from a three-week in situ pilot study. The pilot study was conducted with three groups of women who wanted to increase their levels of physical activity.

    Animation Research and Production (CSE 691)

    • 11:40-11:45: Introduction and Overview: Brian Curless

    • 11:45-12:00: Controller Synthesis for Bird Flight Animations, Jia-Chi Wu, (Powerpoint Slides)

      Motor learning in animals is a complex process because of the redundancy and high dimensionality of their musculoskeletal structure and their nonlinear dynamics. When designing learning algorithms, we need to make sure that they scale up to the complexity seen in such a system and that they handle the high dimensionality efficiently. We will present our current progress on controller design for bird flight animations using reinforcement learning.
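      For readers unfamiliar with reinforcement learning, here is the generic tabular Q-learning skeleton (flight control is continuous and high-dimensional, so the talk's methods are necessarily more involved than this):

        import random

        def q_learning(states, actions, step, episodes=500,
                       alpha=0.1, gamma=0.95, eps=0.1):
            """step(s, a) -> (next_state, reward, done)."""
            Q = {(s, a): 0.0 for s in states for a in actions}
            for _ in range(episodes):
                s = random.choice(states)
                for _ in range(100):
                    a = (random.choice(actions) if random.random() < eps
                         else max(actions, key=lambda a2: Q[(s, a2)]))
                    s2, r, done = step(s, a)
                    nxt = 0.0 if done else max(Q[(s2, b)] for b in actions)
                    Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
                    if done:
                        break
                    s = s2
            return Q

        # Toy 1-D world: walk right from any state to reach state 3.
        states, actions = [0, 1, 2, 3], [-1, 1]
        def step(s, a):
            s2 = max(0, min(3, s + a))
            return s2, (1.0 if s2 == 3 else -0.01), s2 == 3
        Q = q_learning(states, actions, step)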

    • 12:00-12:15: Crowd Flows, Seth Cooper, (Powerpoint Slides)

      Understanding human motion is a difficult and important area of research in computer graphics. As models for individual human motion develop, research has begun to look into the motion of crowds of people, a development which introduces considerable new difficulties. We propose a new theory of crowd dynamics, adopting a novel energy minimization perspective on crowd flow rather than the traditional per-agent decision processes. This approach yields smooth, highly realistic simulations of large crowds in real-time. The speed of our technique makes it attractive for interactive applications such as video games.
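      The contrast with per-agent planning can be sketched on a grid: build one cost-to-goal field and let every agent slide down it. The talk's model additionally couples cost to crowd density and works in the continuum; this is a minimal discrete caricature:

        from collections import deque

        def cost_field(grid, goal):            # grid: 0 = free, 1 = obstacle
            h, w = len(grid), len(grid[0])
            cost = [[float("inf")] * w for _ in range(h)]
            cost[goal[0]][goal[1]] = 0
            q = deque([goal])
            while q:                            # BFS = unit-cost shortest paths
                r, c = q.popleft()
                for r2, c2 in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                    if (0 <= r2 < h and 0 <= c2 < w and not grid[r2][c2]
                            and cost[r2][c2] > cost[r][c] + 1):
                        cost[r2][c2] = cost[r][c] + 1
                        q.append((r2, c2))
            return cost

        def advance(agents, cost):              # one step for ALL agents at once
            h, w = len(cost), len(cost[0])
            out = []
            for r, c in agents:
                nbrs = [(r, c), (r-1, c), (r+1, c), (r, c-1), (r, c+1)]
                nbrs = [(a, b) for a, b in nbrs if 0 <= a < h and 0 <= b < w]
                out.append(min(nbrs, key=lambda p: cost[p[0]][p[1]]))
            return out

        grid = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
        field = cost_field(grid, (0, 2))
        print(advance([(2, 0), (2, 2)], field))   # both step toward the goal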

    • 12:15-12:30: "2 Old Chicks" and Other Strange Characters: Character Motion, Story and Digital Production, Barbara Mones

      For the past eight years, upper-level computer science undergraduates have joined students from many other areas of the University (English, Cinema Studies, DXArts, Music, and Architecture, to name a few) in a collaborative and interdisciplinary adventure to design and produce a short animated film from initial concept to completion. We'll screen some of the best work we've produced to date and then provide an in-depth review of several "shot breakdowns" from our newly completed film, 2 Old Chicks. This will give you an inside view of the challenges the groups face and an understanding of the production pipeline, with an emphasis on how we iterate character motion to support our story.