Tuesday, February 26, 2013 - 12:30

Fitting Manifolds to Probability Measures

Speaker: Hari Narayanan, UW Stats and Mathematics
Location: CSE 403
We are confronted with very high dimensional data sets. As a result, methods of dealing with high dimensional data have become prominent. One geometrically motivated approach for analyzing data is called manifold learning. The underlying hypothesis of this approach is that due to symmetries and constraints in the generating process, high dimensional data often lie near a low dimensional manifold. Although there are many well known algorithms based on this hypothesis, the basic question of understanding when data lies near a manifold is poorly understood. We will describe joint work with Charles Fefferman and Sanjoy Mitter on developing a provably correct algorithm to fit a nearly optimal manifold to an unknown probability distribution using i.i.d samples, and thereby test this hypothesis.

Tuesday, February 19, 2013 - 12:30

Discovering Attributes of Objects in Images

Speaker: Ali Farhadi, UW-CSE
Location: CSE Gates Commons
Visual attributes have shown to be useful in several recognition tasks. The vocabulary of attributes is typically defined manually. In this talk I will show a method to discover the vocabulary of attributes by learning to project images, using learnable projections, to a binary space where categories are well separated. Category memberships are usually good proxies for visual similarity but should not be enforced as a hard constraint. Our method learns codes that maximize separability of categories unless there is strong visual evidence against it. The learned binary codes are highly discriminative; A simple linear SVMs can achieve state-of-the-art results with our short codes. In fact, our method produces state-of-the-art results on Caltech256 with only 128- dimensional bit vectors and outperforms state of the art by using longer codes. Using these codes, I will show a method to reason about conjunctions of multiple classification tasks to reason about complex queries and how to learn categories from very few training samples by borrowing examples from other categories.

Tuesday, February 12, 2013 - 12:30

Limited Rationality, Limited Perception, and Understanding Others

Speaker: Tomas Singliar, Boeing Research
Location: CSE Gates Commons

Inverse reinforcement learning is now a well established method for creating controllers by imitation learning. It is less often used for predicting behavior, and only rarely for understanding and interpreting behavior. We seek to improve throughput of surveillance data processing by using IRL techniques to detect anomalous actions for the analyst, together with a plausible explanation for the behavior. For this purpose, real-world interpretations must be associated with the basis functions in the linear reward function approximation. We create a practical scheme where the basis is built up in user interactions triggered by false alarms, thus yielding a small and relevant set of basis functions. The usability of this interactive method is enhanced by formulating the basis functions in natural language.

Experiments revealed limited rationality as the most important barrier to this approach to behavioral anomaly detection. The traditional IRL approach to limited rationality is to assume the agent is making noisy decisions with perfect knowledge of the value function. We argue that in reality, the exact value function is not (cannot) be known and that people are in fact good at choosing the value-function-maximizing action under believed cost, but have misperceptions of the cost. Thus, we present a formulation of the IRL problem which trades off the planning margin (maximal rationality) and accuracy of action cost perception.

Tuesday, November 13, 2012 - 12:30

Learning with Weak Supervision

Speaker: Hannaneh Hajishirzi
Location: Room 305
In this talk, I will introduce a novel approach to learning in weakly supervised settings. The core idea is to exploit the underlying structure between positive instances using a discriminative notion of similarity coupled with a ranking function. By reasoning in terms of pairwise discriminative similarities, we tackle two challenging problem domains: semantic understanding of professional soccer commentaries, and Multiple Instance Learning (MIL). In semantic understanding, I demonstrate a novel algorithm in learning the correspondences between complex sentences and a rich set of events. Loose temporal alignments between sentences and events, weak labels, impose inevitable uncertainties. I will show an algorithm that reasons in terms of pairwise discriminative similarities and utilizes popularity metrics to learn the alignements between events and sentences and even discover group of events, called macro-events, that best describe a sentence. I will show extensive evaluations on our new dataset of professional soccer commentaries. I will also introduce an extension of this approach to tackle an instance-level multiple instance learning problem. This bottom-up approach learns a discriminative notion of similarity between instances in positive bags and use it to form a discriminative similarity graph. The main idea is to learn a similarity metric that relies only on confident labels in MIL; negative bags. The underlying structure among positive instances can be discovered by reasoning in terms of similarity preserving quasi-cliques, large quasi-cliques with high scores of within-clique similarities. I show experimental evaluations that demonstrate the advantages of our bottom-up approach to the state-of-the-art MIL methods both at the bag-level and instance-level predictions in standard benchmarks and image and text datasets.

Tuesday, October 30, 2012 - 12:30

Gaussian Processes on the Brain

Speaker: Emily Fox
Location: Room 305
In this talk, we focus on a set of modeling challenges associated with Magnetoencephalography (MEG) recordings of brain activity: (i) the time series are high dimensional with long-range dependencies, (ii) the recordings are extremely noisy, and (iii) gathering multiple trials for a given stimulus is costly. Our goal then is to harness shared structure both within and between trials. Correlations between sensors arise based on spatial proximity, but also from coactivation patterns of brain activity that change with time. Capturing this unknown and changing correlation structure is crucial in effectively sharing information. Motivated by the structure of our high-dimensional time series, we propose a Bayesian nonparametric dynamic latent factor model based on a sparse combination of Gaussian processes (GPs). Key to the formulation is a time-varying mapping from the lower dimensional embedding of the dynamics to the full observation space, thus capturing time-varying correlations between sensors. Finally, we turn to another challenge: in addition to long-range dependencies, there are abrupt changes in the MEG signal. We propose a multiresolution GP that hierarchically couples GPs over a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes in the time series. The inherent conjugacy of the GPs allows for efficient inference of the hierarchical partition, for which we employ graph-theoretic techniques. Portions of this talk are joint work with David Dunson, Alona Fyshe, and Tom Mitchell.

Tuesday, October 16, 2012 - 12:30

Recovery of Simultaneously Structured Models by Convex Optimization

Speaker: Maryam Fazel
Location: Room 305
The topic of deriving a structured model from a small number of linear observations by solving a convex optimization problem has been well-studied in recent years. Examples include the recovery of sparse or group-sparse vectors (which gave rise to the area of Compressed Sensing), low-rank matrices (arising in collaborative filtering and system identification), and the sum of sparse and low-rank matrices (arising in PCA with sparse outliers, graphical models). In many applications in signal processing and machine learning, the model of interest is known to be structured in several ways at the same time, for example, a matrix that is simultaneously sparse and low-rank. An application in signal processing is the classic sparse phase retrieval problem, where the goal is to recover a sparse signal from phaseless (magnitude-only) measurements. In machine learning, the problem comes up when combining several regularizers that each promote a certain desired structure. In this work, we address the questions: what convex relaxation should we use for the recovery of a ``simultaneously structured'' model? And how many measurements are needed generically? Often penalties (norms) that promote each structure are known (e.g. l1 norm for sparsity, nuclear norm for matrix rank), so it is reasonable to minimize a combination of these norms. We show that, surprisingly, if we use as objective function any convex joint function of the individual norms, we can do no better than an algorithm that exploits only one of the several structures. We then specialize our result to the case of simultaneously sparse and low-rank matrices, and present numerical simulations that support the theoretical bounds on required number of observations.

Tuesday, October 2, 2012 - 12:30

Large-Scale Machine Learning

Speaker: Carlos Guestrin
Location: Room 305
Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data." In this talk, I'll first present some recent advances in large-scale algorithms for tackling such huge problems. Unfortunately, implementing efficient parallel ML algorithms is challenging. Existing high-level parallel abstractions such as MapReduce and Pregel are insufficiently expressive to achieve the desired performance, while low-level tools such as MPI are difficult to use, leaving ML experts repeatedly solving the same design challenges. In this talk, I will also describe the GraphLab framework, which naturally expresses asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab will effectively address many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities. This talk represents joint work with Yucheng Low, Joey Gonzalez, Aapo Kyrola, Jay Gu, Danny Bickson, and Joseph Bradley.

Tuesday, May 22, 2012 - 12:30

Learning Hierarchical Representations for Recognition

Speaker: Liefeng Bo, Dept. Computer Science & Eng., Univ. Washington
Location: Gates Commons

Humans are able to recognize objects despite significant
variations in their appearance due to changing viewpoints, scales,
lighting conditions and deformations. This ability fundamentally relies
on robust representations/features of the physical world. In this talk,
I will introduce our recent work on representation/feature learning:
hierarchical kernel descriptor and hierarchical matching pursuit.
Hierarchical kernel descriptor is a kernel based feature learning
approach that uses efficient matching kernels to build feature
hierarchies. Hierarchical matching pursuit is a multi-layer sparse
coding approach that learns multi-level features from raw sensor data by
recursively applying matching pursuit coding followed by spatial pyramid
pooling. Our models outperform the state-of-the-art, often by a large
margin, on more than fifteen popular computer vision benchmarks for many
types of recognition tasks including RGB-(D) object recognition,
fine-grained object recognition, digit recognition, face recognition,
material recognition, scene recognition, and RGB-(D) scene labeling. In
addition, I will discuss the connections and differences between our
models, and bag-of-words models, deformable parts model and deep
networks/feature learning in terms of the underlying architecture.

Tuesday, May 8, 2012 - 12:30

Linguistic Structure Prediction with AD³

Speaker: Noah Smith, Carnegie Mellon University
Location: Gates Commons

In this talk, I will present AD³ (Alternating Directions Dual
Decomposition), an algorithm for approximate MAP inference in loopy
graphical models with discrete random variables, including structured
prediction problems. AD³ is simple to implement and well-suited to
problems with hard constraints expressed in first-order logic. It often
finds the exact MAP solution, giving a certificate when it does; when it
doesn't, it can be embedded within an exact branch and bound technique.
I'll show experimental results on two natural language processing tasks,
dependency parsing and frame-semantic parsing. This work was done in
collaboration with André Martins, Dipanjan Das, Pedro Aguiar, Mário
Figueiredo, and Eric Xing.

Tuesday, April 24, 2012 - 12:30

Decision making in the primate brain: A model based on POMDPs and reinforcement learning

Speaker: Raj Rao, University of Washington
Location: Room 305

How does the brain learn to make decisions based on noisy sensory
information and incomplete knowledge of the world? In this talk, I
will sketch a neural model of action selection and decision making
based on the general framework of partially observable Markov decision
processes (POMDPs). The model postulates that actions are selected so
as to maximize expected cumulative reward, where rewards can be
external (e.g., food) or internal (e.g., penalty for delay). Action
selection is based on the posterior distribution over states (the
"belief" state) computed using Bayesian inference. A reinforcement
learning algorithm known as temporal difference (TD) learning
maximizes the expected reward. I will describe how such a model
provides a unified framework for explaining recent experimental
results on decision making in the primate brain. The resulting neural
architecture posits an active role for the neocortex in Bayesian
inference while ascribing a role to the basal ganglia in value
computation and action selection. The model suggests an important role
for interactions between the neocortex and the basal ganglia in
learning the mapping between probabilistic sensory representations and
actions that maximize rewards.
(Joint work with Yanping Huang and Abe Friesen)

Machine Learning Seminars

Sponsored by