Tuesday, February 27, 2001
SESSION I, 9:50 - 10:50
  Artificial Intelligence (HUB 200AB)
  Computational Biology (HUB 200C)
  Systems and Networking I (HUB 209A)

SESSION II, 11:00 - 12:00
  Programming Languages (HUB 200AB)
  Computer-based Learning (HUB 200C)
  Systems and Networking II (HUB 209A)

SESSION III, 3:20 - 4:20
  Databases (HUB 200AB)
  Architecture (HUB 200C)
  Graphics and Vision (HUB 209A)

SESSION IV, 4:30 - 5:30
  Ubiquitous Computing (HUB 200AB)
  Theory (HUB 200C)
  Data Compression and Image Processing (HUB 209A)

Session I

Artificial Intelligence: Pedro Domingos
  • Autonomous Mobile Robots  
    Dieter Fox
    "This talk provides an overview of research topics in mobile robotics. Probabilistic methods are well suited for dealing with the uncertainties involved in sensing and acting in the real world. We will present applications of probabilistic methods to fundamental problems in mobile robot navigation, such as position estimation and map building. Furthermore, we will discuss future application areas for mobile robots.”
  • Personalizing Web Sites for Wireless Devices
    Corin Anderson
    "The fastest growing community of web users is that of mobile visitors who browse with wireless PDAs, cell phones, and pagers. Unfortunately, most web sites today are optimized exclusively for desktop, broadband clients, and deliver content poorly suited for mobile devices --- devices that can display only a few lines of text, are on slow wireless network connections, and cannot run client-side programs or scripts.
    To best serve the needs of this growing community, we propose building web site personalizers that observe the behavior of web visitors and automatically customize and adapt web sites for each individual mobile visitor. In this talk, I will give an overview of web site personalization and our approach to personalization using search. Following this framework we have implemented a personalizer, Proteus, and I will present early results from a user study of Proteus. Initial results indicate Proteus saves mobile visitors considerable time and effort when seeking information 'on the go.'"
  • Mining High-Speed Data Streams
    Geoff Hulten
    "Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. This talk describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can incorporate tens of thousands of examples per second using off-the-shelf hardware. It uses Hoeffding bounds to guarantee that its output is asymptotically nearly identical to that of a conventional learner. We study VFDT's properties and demonstrate its utility through an extensive set of experiments on synthetic data. We apply VFDT to mining the continuous stream of Web access data from the whole University of Washington main campus.”
  • Adaptive Knowledge Representation and Reasoning Systems  
    Henry Kautz
    "A hallmark of intelligence is the ability to solve novel problems by making inferences from a general body of commonsense knowledge. However, general inference is hard: it is believed that there is no reasoning algorithm that is sound, complete, and efficient in all cases. In response to such general worst-case limits, research has turned to the problem of discovering characteristics that make particular problems hard or easy to solve. Our goal is to build reasoning systems that improve their performance by learning to adapt to these characteristics. I will illustrate these concepts by describing the link between problem hardness and phase transition phenomena, and our work toward building a phase transition aware satisfiability testing engine.”
Computational Biology: Rimli Sengupta
  • Algorithms for Phylogenetic Footprinting
    Mathieu Blanchette
    "Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous non-coding DNA sequences from multiple species. We investigate the following formalization of the problem: given a collection of orthologous sequences related by a known phylogenetic tree, search for motifs (short stretches of DNA) that are well conserved, according to a parsimony measure, in most or all of the species. We present an exact algorithm for solving this problem, as well as a number of extensions that allow us to identify weaker signals. Our program successfully identifies a large number of known binding sites, as well as several highly conserved but uncharacterized regions.”
  • Genomic Sequence Annotation Through Random Projection  
    Jeremy Buhler
    "In annotating very long genomic DNA sequences, it is useful to compare these sequences to find high-scoring pairwise local alignments, which often represent conserved biological features. Comparisons may be performed either between two or more long sequences, such as orthologous stretches of human and mouse DNA, or within one sequence, as when finding repetitive elements in a chromosome. Typically, algorithms for discovering local alignments on a large scale rely on the presence of sufficiently long runs of matching nucleotides in any alignment of interest. We describe a new randomized algorithm, LSH-ALL-PAIRS, that lifts this restriction by directly finding ungapped alignments with up to a specified fraction of substitutions, regardless of whether they contain a long exact substring match. Our algorithm adapts the technique of locality-sensitive hashing, previously used to solve high-dimensional computational geometry problems, to compute local alignments. Our implementation of LSH-ALL-PAIRS scales to sequences tens of megabases in length while discovering similarities with as little as 60-70% nucleotide identity. The algorithm can be effectively parallelized by iterating its core randomized trial simultaneously on multiple processors.
  • Expression Analysis of Barrett's Epithelium and Normal Gastrointestinal Tissues: a Computational Perspective
    Ka Yee Yeung
    "Barrett's esophagus is a premalignant condition that arises due to chronic acid reflux in which the normal squamous epithelium of the esophagus is replaced by a metaplastic columnar epithelium. A fundamental question is the distinction between neoplastic Barrett's epithelium (BE) and surrounding normal tissues of the upper gastrointestinal tract. For example, although it arises in the esophagus, BE more closely resembles the epithelium of the duodenum at the histologic level. Therefore, we compared the transcriptional profile of BE to those of normal upper gastrointestinal tissues, including gastric epithelium, squamous epithelium of the esophagus and duodenal epithelium. Endoscopic biopsies from each tissue were collected from a series of patients during routine surveillance. Poly A+ RNA was prepared from pooled samples (2-4 patients/pool) of BE (4 pools), esophageal squamous epithelium (4 pools), gastric (3 pools) and duodenum (3 pools) and used to interrogate Affymetrix HU6800 and FL6800 chips. We found no difference in the correlation coefficients between neoplastic BE and each of the normal tissues. In addition, we searched for tissue-specific patterns of gene expression in the normal tissues and the neoplastic BE. We compared the performance of several clustering algorithms, and our results suggest that the data set consists of approximately 8 clusters, and the best performance were obtained using CAST and k-means. From this analysis, we applied CAST to identify 8 clusters, among which some clusters were specific for squamous epithelium (203 genes), duodenum (211 genes), gastric (105 genes), and BE (36 genes)."  (produced in collaboration with Michael T. Barrett, Jeff Delrow, Patricia L. Blount, Li Hsu, Walter L. Ruzzo, Brian J. Reid, Peter S. Rabinovitch)
Systems and Networking I: Steve Gribble
  • A Measurement Study of Gnutella
    Stefan Saroiu
    "This talk describes our efforts in validating and monitoring the behavior of the Gnutella peer-to-peer system. We present a detailed study of the characteristics of the Gnutella file-sharing system and its network protocol. We start by first identifying a set of explicit and implicit design decisions made in Gnutella. For each such design principle we devise experiments that measure and observe their validity in practice. We conclude that many of Gnutella's design decisions do not perform well in practice and there are many valuable lessons to be learned in designing future P2P file-sharing systems.”
  • Characterizing Internet Congestion using End-to-End Measurements  
    Andy Collins
    "We are investigating the time-series behavior of queuing and congestion in the Internet. How fast do congestion episodes begin and how long do they last? Is it possible to predict future congestion from past behavior, and if so over what timescales and with what accuracy? We use a custom, TCP-based tool to collect closely-spaced latency data between our measurement sites and arbitrary Internet servers, and analyze this data to show predictability for both queuing delay and packet losses. We believe that this data has important implications for the design of future congestion control protocols.”
  • Halting Runaway Network Protocols with Icarus  
    Andrew Whitaker
    "Deploying new protocols is currently risky because there is no way to bound the resource consumption of incorrect protocols. Icarus adds a containment layer to the protocol stack that prevents runaway protocols from damaging the network. As a proof-of-concept, we have implemented Icarus inside the ANTS active network toolkit.”

Session II

Programming Languages: Larry Snyder
  • Cecil: More Expressive OO Typechecking for More Reliable OO Systems
    Vassily Litvinov
    "Increasing expressiveness of a type system allows programmers to include more specification with the code. The typechecker will ensure continued conformance to such specification as software evolves, a task especially challenging for large software systems.
    We are working on an expressive type system for OO languages. It is based on constraint-bound polymorphism and supports, for example, conditional subtyping and multi-methods. We have successfully applied this type system to the Vortex optimizing compiler infrastructure of 130,000 lines of Cecil code.”
  • Array Language Support for Parallel Sparse Computation
    Brad Chamberlain
    "Specifying parallel sparse computations has traditionally involved creating and managing data structures by hand to represent the arrays' sparsity patterns. This results in code which is confusing for readers to understand and for the compiler to optimize. In this work, we describe extensions to the ZPL parallel programming language which support sparse computation using the traditional dense syntax, leaving the details of implementing parallel sparsity to the compiler. We use a unique sparse representation that supports general array operators, but which can automatically be optimized to simpler formats such as Compressed Row Storage (CRS). We compare the clarity, performance, and memory requirements of our approach to the hand-coded NAS CG and MG benchmarks.”
  • Automatic Elimination of Redundant Array Subexpressions
    Steve Deitz
    "We introduce a new compiler optimization called Array Sub-expression Elimination (ASE) that is critical to the performance of an important class of computations known as stencils. A stencil is a stylized matrix computation in which a group of neighboring data elements are combined in the form of a sum of products to produce a new value. We show that it is both reasonable and desirable to express a stencil in the simplest, most concise way and rely on the compiler to optimize what is otherwise done in an ad hoc, error-prone manner: by hand.”
Computer-based Learning: Steve Tanimoto
  • Educational Assessment with the DIAGNOSER System  
    Earl Hunt
  • The Conversational Authoring Paradigm and its Realization in a Pedagogical Agent Architecture
    Jeremy Baer
    "Traditionally, programming and authoring environments have been viewed as passive tools. The programmer/author provides the initiative when interacting with the system. This paradigm automatically allows a maximum of flexibility for the author in the programming process, but places the majority of the burden on the author. The author must know what the building blocks of the system are, how to create them, and how to compose them in meaningful ways to achieve the desired authoring outcome. Here we describe a new paradigm for authoring and programming called conversational authoring. In conversational authoring, the authoring process may be seen as a mixed-initiative collaboration between the author and the system. The system guides the author by asking questions and requesting information, and participates in the construction of the authored artifact. In this way, the burden on the author is lessened, making conversational authoring particularly appropriate for systems that embody very complex, abstract, or unintuitive concepts. We have applied the ideas of conversational authoring to the domain of authoring pedagogical agents.”
  • A Toolkit for Statistical Analysis of Language Usage  
    Adam Carlson
    "The recent explosion of online text makes it possible to ask new kinds of questions about what people's use of language reveals about them. How does one group of writers differ from another? How effectively can they communicate and could we make that communication smoother? What can we infer about an author's understanding of a topic by comparing their writing to that of others in the area? We present a tool for analyzing text which addresses these issues. We have developed a number of novel ways to view the relationships between word usages in different corpora (bodies of text.) These views highlight words which may have different meanings among the corpora, words which are used in multiple ways within a single corpus, and allow us to compare documents for their similarity with respect to a given corpus. This tool has a wide variety of possible applications. It could be used to help develop a targeted glossary aimed at defining terms that might be not be known by a particular target audience. It could also be used to groups students according to their level of writing about a particular topic as compared to an "expert text" on that topic. Finally, it could assist in developing a specialized search engine "front-end" that allows a general audience to more effectively query a topic-specific corpus.”
  • Design Issues for Distributed Transcripts
    Steve Tanimoto
    "Current efforts to develop standards for interoperable online learning resources are largely driven by institutions rather than learners. In order to make online learning as efficient as possible, the architecture of an information system for educational assessment data needs to put the learner at the center. Important components of the architecture are representations for assessment data as evidence, support for diverse media, for validation processes, for distributed representation, and to a large extent, for learner control. Such an approach is necessitated by the growing richness of assessment data, concerns for privacy, the growing richness of opportunities for online learning, and the increasing complexity and dynamics of educational goal setting.”
Systems and Networking II: David Wetherall, David Notkin
  • Alpine: A User-Level Infrastructure for Network Protocol Development  
    David Ely
    "In traditional operating systems, modifying the network protocol code is a tedious and error-prone task, largely because the networking stack resides in the kernel. We present Alpine, a networking infrastructure that moves an unmodified FreeBSD networking stack into a user-level library for development. Alpine is easy to use because it requires no modifications to the host operating system or the applications that use it. Once stack modifications have been developed and tested in Alpine with user-level tools such as source-level debuggers, they are easily moved back into the kernel because Alpine uses unmodified kernel source files. We discuss the challenges we faced in virtualizing the FreeBSD networking stack and then show how Alpine is effective at easing the burden of debugging and testing protocol modifications or new network protocol.”
  • one.world - Programming for Pervasive Computing Environments
    Robert Grimm
    "While pervasive computing provides an attractive vision for the future of computing, it is currently difficult to design, build, and deploy applications in such an environment. The key challenge for developers is to build applications that adapt to a highly dynamic environment and that continue to work, even if devices are roaming across the infrastructure and if the network provides only limited services. In this talk, we describe our approach to meeting this challenge and present one.world, a system architecture that provides powerful primitives to simplify pervasive applications.”
  • Supporting Refactoring with Invariants
    Josh Kataoka
    "Program refactoring - transforming a program to improve readability, structure, performance, abstraction, maintainability, or other features - is not applied in practice as much as might be desired. One deterrent is the cost of detecting candidates for refactoring and of choosing the appropriate refactoring transformation. We demonstrate the feasibility of automatically finding places in the program that are candidates for specific refactorings. The approach uses program invariants: when a particular pattern of invariant relationships appears at a program point, a specific refactoring is applicable. Since most programs lack explicit invariants, an invariant detection tool called Daikon is used to infer the required invariants. We developed an invariant pattern matcher for several common refactorings and applied it to an existing Java code base. Numerous refactorings were detected, and one of the developers of the code base assessed their efficacy.”

Session III

Databases: Alon Halevy
  • XML Data Management
    Dan Suciu
    "XML, the new Web data exchange format, is just syntax: the science behind it still needs to be developed. This talk describes four XML data management tasks that need to be addressed in order to enable universal XML data sharing: XML publishing, XML transport, XML storage, and XML processing. I will give brief overviews of four of our research projects that attempt to address these tasks: a prototype for XML publishing from relational databases called SilkRoute; an XML compressor called XMill; an architecture for storing arbitrary XML data in relations called STORED; and an XML toolkit for processing XML in a pipelined fashion.”
  • A Query Processing Architecture for Network-Based XML Data  
    Zack Ives
    "The emergence of XML as a standard for data interchange provides new opportunities for querying and integrating data across the intranet and Internet. We present the Tukwila system, the first data integration system designed specifically for efficient processing of network-bound XML data. Tukwila features several novel components, including the x-scan operator, and it leverages adaptive techniques developed for querying relational data over a network. In contrast to previous systems, which must read, parse, and often store XML data before processing it, Tukwila produces query results even as data is being read from its sources -- returning answers to the user much earlier and also resulting in superior overall performance.”
  • Model Management  
    Rachel Pottinger
    "Many of the problems encountered when building applications of databases systems (DBMSs) involve the manipulation of models. A model is a complex discrete structure that represents a design artifact. For example, a model could be an XML DTD, web site schema, interface definition, relational schema, database transformation script, work-flow definition, semantic network, software configuration or complex document. Many uses of models involve managing the change in models and the transformation of data from one model into another. These uses require an explicit representation of mapping between models. There is an opportunity to make DBMSs easier to use for these model management applications by making ``model" and ``mapping" first-class objects with high-level operators that simplify their use. This capability is known as model management. This talk describes the work currently being done on model management at the University of Washington in conjunction with Phil Bernstein at Microsoft Research.”
Architecture: Susan Eggers
  • An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture
    Joshua Redstone
    "We provide a brief overview of some recent results about operating systems execution on a simultaneous multithreaded processor. To carry out this study, we modified the Digital 4.0 Unix operating system to run on a simulated SMT CPU based on a Compaq Alpha processor. We executed this environment by integrating our SMT Alpha instruction set simulator into the SimOS machine simulator. As our principle workload, we executed the Apache Web server running on an 8-context SMT under Digital Unix. Our results demonstrate the micro-architectural impact of an OS-intensive workload on a simultaneous multithreaded processor, and provide insight into the OS demands of the OS intensive Apache Web server.”
  • The Impact of Speculative Execution on Simultaneous Multithreaded Processors  
    Steven Swanson
    "Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that 93% of SPECInt95 instructions committed on a 6-wide superscalar are speculative! Without speculation, processor resources on such a machine would be largely idle. In contrast, simultaneous multithreaded (SMT) processors achieve high utilization by issuing instructions from multiple threads every cycle and dynamically sharing hardware resources among threads. On an SMT, therefore, speculation is questionable, because wrong-path speculative instructions may compete with and displace useful instructions from execution. For this reason, common wisdom indicates (and many have hoped) that SMT performance would be higher if speculation were foregone and branch prediction hardware were used instead for shared resources, such as the TLB. This paper evaluates the behavior of speculative instruction execution on SMT processors. Using both multiprogrammed (SPECInt and SPECFP) and multithreaded (the Apache Web server) workloads, we measure and analyze the impact of speculation. We also examine the effect of speculation-aware fetch and branch prediction policies in the processor. Overall, our results show that, contrary to common wisdom, speculation is critical to performance on an SMT; on our workloads, a speculative SMT realizes up to a 32 percent performance gain over a non-specula-tive version of the same processor. Our analysis shows why speculation is effective on an SMT and also identifies the workload and processor environments in which speculation could cease to be effective.”
  • Memory System and Datapath Design for Network Processors  
    Patrick Crowley
    "Our recent work investigates architectural tradeoffs in network processor design. This talk presents our current work on supporting the flow of packets through the (often) parallel and (sometimes) shared resources found in and among the network processor core, memory and supporting hardware units.”
  • The Importance of Timeliness for Prefetching
    Wayne Wong
    "For many classes of applications, the disparity between memory and processor speeds is the main impediment to sustained high-performance. One technique for reducing the memory latency seen by the processor is to prefetch data. In the context of a modern processor, we evaluated the effectiveness of data prefetching from main memory. Using a oracle prefetcing model, we show that prefetching strategies will need to be able to prefetch with more than one miss ahead in order to have a significant overall benefit. As the relative memory latency increases, the prefetches will need to speculate further ahead in time, a challenge that cannot be met adequately by some of the prefetching techniques that we evaluated.”
Graphics and Vision: Steve Seitz
  • Interactive Elastic Dynamics
    Seth Green
    "We present a novel framework for interactive dynamic simulation of elastically deformable solids. We represent deformations using a lazy wavelet hierarchy constructed from 3D Catmull-Clark subdivision. This wavelet basis is defined both volumetrically and over every point on the surface. The surface is represented in the same subdivision framework, which permits specification of sharp creases and surfaces of arbitrary topology. We develop the equations of motion for linear elastic deformations in the presence of body forces and constraints, all within the wavelet framework. We solve the resulting ordinary differential equations using an implicit method, which lends stability. Further, we refine the wavelet basis near constraints to accelerate the simulation without compromising accuracy. The result is a system that permits high-fidelity, dynamic interaction with complex elastic bodies with volumetrically varying material properties running on a desktop PC.”
  • Single View Modeling
    Li Zhang
    "This talk presents a novel approach for constructing 3D textured surface models from a single painting or photograph. Given a sparse set of user-specified constraints such as normals, silhouettes and creases, a fair 3D surface is generated that satisfies the constraints. We formulate this problem as a constrained optimization problem. Users are free to specify any constraints on a single input image. Our algorithm solves the smoothest surface subject to user’s constraints. Different from most of previous variational surface modeling work, our algorithm explicitly keeps various surface discontinuity constraints while computing the smoothest surface. As our modeling is based on images, surface textures are obtained during surface geometry modeling. The approach is shown to yield good results on real images.”
  • 3D Object Recognition  
    Salvador Ruiz
    "We present a 3D shape based object-recognition algorithm for recognition of objects in scenes containing clutter and occlusion. Recognition is based on matching surface points using novel surface signatures that are closely related to the standard spin image representation. Each signature corresponds to a point in a n-1 dimensional sphere S^(n-1). The maching is implemented as a nearest-neighbor search on S^(n-1). A variant of the algorithm uses a random projection of the original data to S^(d-1), d < n. The algorithms is tested in a data base of 115 3D scenes and 4 models and compared with two standard methods for object recognition. The results of 1640 experiments show that the proposed algorithms have a better recognition rates and and recognition times than the standard algorithms.”
  • Cavalcade of Polyhedra
    Doug Zongker
    "We introduce a novel method for constructing polyhedra by blending together two or more existing polyhedra, using a dualization method. The method is a Minkowski sum, analogous to that of overlay tilings [Zongker 1999], which uses a topological dual operation to create tilings from a network (a graph embedded in a plane). This paper extends that method to networks on the surface of the unit sphere. We begin with polyhedra in canonical form, dualize them to form networks, overlay the networks, and dualize the result to obtain a new polyhedron that blends together the faces of the original polyhedra.”
  • Craig Kaplan
    "In the quest for new visually interesting polyhedra with regular faces, we define and present an infinite class of solids, constructed by placing regular polygons at the rotational axes of a polyhedral symmetry group. This new technique can be used to generate many existing polyhedra, including most of the Archimedean solids. It also yields novel families of attractive symmetric polyhedra.”

Session IV

Ubiquitous Computing: Gaetano Borriello
  • Communication by Low-Power Electrical Signals Through the Human Body  
    Kurt Partridge
    "Intrabody signalling is a technology that allows devices to communicate by using the human body as a circuit element. Communication is limited to physical contact or near physical contact, and therefore can distinguish nearby individuals much better than conventional wireless radio. This talk will cover our bandwidth improvements to 56 kbps and the effect of plate size and location on system performance.”
  • SpotON: Ad-hoc Location Sensing Using Radio Signal Strength
    Jeff Hightower
    "The location of equipment, people, and other real world things is essential data to many up-and-coming applications; such data is often not easy to obtain. We have created SpotON to investigate flexible location sensor deployments in small-scale environments. SpotON tags use radio signal strength information (RSSI) as a distance estimator. This talk highlights selected results in embedded design, calibration, ad-hoc location algorithms, and both infrastructural and wearable application scenarios.
  • Portolano Services and Service Infrastructure  
    Stefan Sigurdson
    "We present our service-based approach to invisible computing and the service infrastructure we are developing on top of the one.world distributed system platform. A collection of generic, networked services we are building for two Portolano projects (Labscape and Communicator) will highlight the flexibility and modularity of our approach."
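
  An illustration of using received signal strength as a distance estimator, as in the SpotON abstract above (a sketch based on the common log-distance path-loss model with assumed calibration constants, not SpotON's own calibration or hardware):

      def rssi_to_distance(rssi_dbm, rssi_at_1m=-45.0, path_loss_exponent=2.2):
          """Invert the log-distance path-loss model
              rssi(d) = rssi_at_1m - 10 * n * log10(d)
          to estimate distance in meters.  The reference power at one meter and
          the exponent n are calibration constants, assumed here for illustration."""
          return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

      for rssi in (-45.0, -60.0, -75.0):
          print(f"RSSI {rssi:6.1f} dBm  ->  ~{rssi_to_distance(rssi):.1f} m")
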
Theory: Anna Karlin
  • How Important is Optimal Routing in the Internet?
    Eric Anderson
    "Recent advances in router hardware technology, algorithm design, and traffic measurement now make it possible to consider Internet routing schemes that incorporate knowledge of traffic demands. In this study, we examine the characteristics of "optimal" routing, where routing tables are computed based on a unified, global analysis of traffic demands. Optimal routing is efficient, in that average packet delay can be reduced substantially compared to standard shortest-path routing. Optimal routing is robust, in that over most of the range of traffic load factors, the routing tables produced by optimal routing are resilient to inaccuracies in the traffic demand matrix [and indeed more so than shortest-path routing]. Finally, optimal routing is practical, in the sense that approximations in the traffic demand matrix and in the routing tables, as well as in the algorithm itself, do not cause significant deterioration in the routing performance. We base our conclusions on the analysis of realistic and synthetic topologies and workloads that we think are typical of medium-scale Internet networks. The conclusions apply to 'quasi-static' routing, and not directly to dynamic routing. The results do not rely on specific assumptions about stochastic properties of network traffic.”
  • Time-space Tradeoff Lower Bounds for Non-uniform Computation  
    Erik Vee
    "A fundamental area of research in theoretical computer science concerns itself with proving lower bounds for running time and space requirements of computational problems. Time-space tradeoff lower bounds explicitly give the minimum running time for any algorithm in terms of the space it uses. Here we explore tradeoff lower bounds using a computational model called a branching program. Branching programs offer an elegant model for non-uniform computation, a model more powerful than the traditional Turing machine. This talk will give an overview of tradeoff lower bounds on branching programs. Although several recent results showing superlinear tradeoff lower bounds on general branching programs will be discussed, the majority of this presentation will deal with more accessible areas, including a restricted model of computation called an oblivious branching program.”
  • Spectral Analysis for Data Mining  
    Frank McSherry
    "Experimental evidence suggests that spectral techniques are valuable for a wide range of applications. A partial list of such applications include (i) semantic analysis of documents used to cluster documents by area of interest, (ii) collaborative filtering -- the reconstruction of missing data items, and (iii) determining the relative importance of documents based on citation/link structure. Intuitive arguments can explain some of the phenomena that have been observed, but little theoretical study has been done. In this talk, we present a model for framing data mining tasks and a unified approach to solving the resulting data mining problems using spectral analysis. For example, our results yield the first theoretical justification for the use of spectral algorithms for collaborative filtering."
    (Joint work with Y. Azar, A. Fiat, A. Karlin and J. Saia.)
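
  A toy sketch of one application named above, reconstructing missing data with spectral methods (invented ratings and a simple iterated truncated-SVD fill; this is not the algorithm analyzed in the talk): missing entries of a user-by-item matrix are estimated from a low-rank approximation fitted to the observed entries.

      import numpy as np

      def svd_fill(ratings, mask, k=2, iters=20):
          """Estimate entries where mask == 0 by repeatedly fitting a rank-k
          truncated SVD while keeping the observed entries fixed."""
          filled = np.where(mask, ratings, ratings[mask.astype(bool)].mean())
          for _ in range(iters):
              u, s, vt = np.linalg.svd(filled, full_matrices=False)
              low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]
              filled = np.where(mask, ratings, low_rank)
          return filled

      # Invented 4x4 user-by-item ratings; a 0 in the mask marks an unknown entry.
      ratings = np.array([[5, 4, 1, 1],
                          [4, 5, 1, 2],
                          [1, 1, 5, 4],
                          [2, 1, 4, 5]], dtype=float)
      mask = np.ones_like(ratings)
      mask[0, 3] = 0                         # hold out one rating
      estimate = svd_fill(ratings, mask)[0, 3]
      print(round(estimate, 2))              # expected to be low, like the held-out value of 1
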
Data Compression and Image Processing: Richard Ladner and Eve Riskin
  • Computational Neuroscience and Adaptive Image Processing
    Raj Rao
    "The last two decades have seen a tremendous growth in our knowledge of how the brain processes information. Improvements in imaging and multi-electrode technologies are shedding new light on the neural mechanisms of purposeful behavior and cognition, while breakthroughs in molecular biology are allowing increasingly precise characterizations of learning and memory. These advances offer a unique opportunity to build adaptive intelligent systems modeled after neurobiology. This talk will provide a brief overview of my research efforts in this direction. I will illustrate how recent results in computational neuroscience suggest novel methods for efficient image representation, object recognition, and visual learning."
  • Group Testing for Image Compression  
    Ed Hong
    "Group Testing for Wavelets (GTW) is a novel embedded wavelet-based image compression algorithm based on the concept of group testing. We present an overview of both embedded image compression algorithms and group testing. We then show how group testing can be applied to image compression.”
  • Transport of Compressed Media  
    Alex Mohr
    "With the increasing use of unreliable packet networks to transmit media content in real-time, protection of that media from packet losses becomes increasingly important. With real-time services, waiting for retransmission of lost data would impose too much delay on the system, so a number of approaches have been used to tackle this problem. We focus on combining media compression and channel coding techniques (forward error correction) to create a stream that is robust to packet losses, such that media quality degrades gracefully as packet losses increase.”
  • Unequal Loss Protection for H.263 Compressed Video  
    Justin Goshi
    "We study the application of Unequal Loss Protection (ULP) algorithms for motion-compensated video over lossy packet networks. The original ULP framework applies unequal amounts of forward error correction (FEC) to embedded data to provide graceful degradation of image quality in the presense of increasing packet loss. Embedded data enable coarse approximations of an image to be reconstructed from the beginning parts of the bitstream. In this paper, we extend the ULP framework for baseline H.263, a video compression standard that targets low bit rates. In particular, we look at ways to make the encoded bit stream embedded, allowing us to use the ULP framework to provide graceful degradation of video quality in the presense of packet loss.”