**Title: ** Interpretable patient-level predictive models

**Speaker: ** Tyler McCormick, UW Sociology & Statistics

**When: ** Tuesday, April 30, 2013 - 12:30

**Location: ** CSE 403

This talk presents statistical methods which generate patent-level predictions that are both accurate and highly interpretable to healthcare providers and patients. In this context, an interpretable model should be able to pinpoint exactly why a particular prediction was made, and provide the reason in a clear and natural way. The talk begins by introducing the Hierarchical Association Rule Model (HARM) which sequentially predicts a patient's possible future medical conditions given the patient's current and past history of reported conditions. The core of our technique is a Bayesian hierarchical model for selecting predictive association rules (such as ``dyspepsia and epigastric pain imply heartburn") from a large set of candidate rules. We next present a model for traditional classification problems based on decision lists, which consist of a series of if...then... statements (for example, if high blood pressure, then stroke). Decision lists discretize the high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. Our Bayesian framework, known as the Bayesian List Machine (BLM), introduces a formal relationship between sparsity and interpretability through a prior structure over lists. We compare our model with the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate.

This is collaborative work with Cynthia Rudin, Ben Letham, and David Madigan.

**Title: ** Machine learning for protein structure modeling

**Speaker: ** David Baker, Hetunandan Kamisetty, Frank DiMaio, UW Biochemistry

**When: ** Tuesday, March 19, 2013 - 12:30

**Location: ** CSE 305

David will introduce protein structure modeling and describe the conformational samping and scoring problems. Hetu will describe a method to constrain this conformational space that exploits the many-to-one relationship between protein sequence and structure using graphical models. Frank will describe ongoing efforts at improving scoring by optimizing scorefunction parameters directly against experimental data.

**Title: ** Graphical Event Models

**Speaker: ** Asela Gunawardana, Microsoft Research

**When: ** Tuesday, March 12, 2013 - 12:30

**Location: ** CSE Gates Commons

Modeling the temporal dynamics of heterogeneous event streams is central to many real world applications. Graphical Event Models (GEMs) represent the dependencies of each type of event in a stream on past events. Learning the structure and parameters of GEMs from event stream data is valuable for understanding the dynamics of event streams, as well as for predicting their future behavior. This talk will describe specific classes of GEMs, along with computationally tractable learning algorithms and inference algorithms for them, giving empirical results on a number of real world applications. It will also touch on some recent learnability results.

**Title: ** Fitting Manifolds to Probability Measures

**Speaker: ** Hari Narayanan, UW Stats and Mathematics

**When: ** Tuesday, February 26, 2013 - 12:30

**Location: ** CSE 403

We are confronted with very high dimensional data sets. As a result, methods of dealing with high dimensional data have become prominent. One geometrically motivated approach for analyzing data is called manifold learning. The underlying hypothesis of this approach is that due to symmetries and constraints in the generating process, high dimensional data often lie near a low dimensional manifold. Although there are many well known algorithms based on this hypothesis, the basic question of understanding when data lies near a manifold is poorly understood. We will describe joint work with Charles Fefferman and Sanjoy Mitter on developing a provably correct algorithm to fit a nearly optimal manifold to an unknown probability distribution using i.i.d samples, and thereby test this hypothesis.

**Title: ** Discovering Attributes of Objects in Images

**Speaker: ** Ali Farhadi, UW-CSE

**When: ** Tuesday, February 19, 2013 - 12:30

**Location: ** CSE Gates Commons

Visual attributes have shown to be useful in several recognition tasks. The vocabulary of attributes is typically defined manually. In this talk I will show a method to discover the vocabulary of attributes by learning to project images, using learnable projections, to a binary space where categories are well separated. Category memberships are usually good proxies for visual similarity but should not be enforced as a hard constraint. Our method learns codes that maximize separability of categories unless there is strong visual evidence against it. The learned binary codes are highly discriminative; A simple linear SVMs can achieve state-of-the-art results with our short codes. In fact, our method produces state-of-the-art results on Caltech256 with only 128- dimensional bit vectors and outperforms state of the art by using longer codes. Using these codes, I will show a method to reason about conjunctions of multiple classification tasks to reason about complex queries and how to learn categories from very few training samples by borrowing examples from other categories.

**Title: ** Limited Rationality, Limited Perception, and Understanding Others

**Speaker: ** Tomas Singliar, Boeing Research

**When: ** Tuesday, February 12, 2013 - 12:30

**Location: ** CSE Gates Commons

Inverse reinforcement learning is now a well established method for creating controllers by imitation learning. It is less often used for predicting behavior, and only rarely for understanding and interpreting behavior. We seek to improve throughput of surveillance data processing by using IRL techniques to detect anomalous actions for the analyst, together with a plausible explanation for the behavior. For this purpose, real-world interpretations must be associated with the basis functions in the linear reward function approximation. We create a practical scheme where the basis is built up in user interactions triggered by false alarms, thus yielding a small and relevant set of basis functions. The usability of this interactive method is enhanced by formulating the basis functions in natural language.

Experiments revealed limited rationality as the most important barrier to this approach to behavioral anomaly detection. The traditional IRL approach to limited rationality is to assume the agent is making noisy decisions with perfect knowledge of the value function. We argue that in reality, the exact value function is not (cannot) be known and that people are in fact good at choosing the value-function-maximizing action under believed cost, but have misperceptions of the cost. Thus, we present a formulation of the IRL problem which trades off the planning margin (maximal rationality) and accuracy of action cost perception.

**Title: ** Learning with Weak Supervision

**Speaker: ** Hannaneh Hajishirzi

**When: ** Tuesday, November 13, 2012 - 12:30

**Location: ** Room 305

In this talk, I will introduce a novel approach to learning in weakly
supervised settings. The core idea is to exploit the underlying
structure between positive instances using a discriminative notion of
similarity coupled with a ranking function. By reasoning in terms of
pairwise discriminative similarities, we tackle two challenging
problem domains: semantic understanding of professional soccer
commentaries, and Multiple Instance Learning (MIL).
In semantic understanding, I demonstrate a novel algorithm in learning
the correspondences between complex sentences and a rich set of
events. Loose temporal alignments between sentences and events, weak
labels, impose inevitable uncertainties. I will show an algorithm that
reasons in terms of pairwise discriminative similarities and utilizes
popularity metrics to learn the alignements between events and
sentences and even discover group of events, called macro-events, that
best describe a sentence. I will show extensive evaluations on our new
dataset of professional soccer commentaries.
I will also introduce an extension of this approach to tackle an
instance-level multiple instance learning problem. This bottom-up
approach learns a discriminative notion of similarity between
instances in positive bags and use it to form a discriminative
similarity graph. The main idea is to learn a similarity metric that
relies only on confident labels in MIL; negative bags. The underlying
structure among positive instances can be discovered by reasoning in
terms of similarity preserving quasi-cliques, large quasi-cliques with
high scores of within-clique similarities. I show experimental
evaluations that demonstrate the advantages of our bottom-up approach
to the state-of-the-art MIL methods both at the bag-level and
instance-level predictions in standard benchmarks and image and text
datasets.

**Title: ** Gaussian Processes on the Brain

**Speaker: ** Emily Fox

**When: ** Tuesday, October 30, 2012 - 12:30

**Location: ** Room 305

In this talk, we focus on a set of modeling challenges associated with Magnetoencephalography (MEG) recordings of brain activity: (i) the time series are high dimensional with long-range dependencies, (ii) the recordings are extremely noisy, and (iii) gathering multiple trials for a given stimulus is costly. Our goal then is to harness shared structure both within and between trials. Correlations between sensors arise based on spatial proximity, but also from coactivation patterns of brain activity that change with time. Capturing this unknown and changing correlation structure is crucial in effectively sharing information.
Motivated by the structure of our high-dimensional time series, we propose a Bayesian nonparametric dynamic latent factor model based on a sparse combination of Gaussian processes (GPs). Key to the formulation is a time-varying mapping from the lower dimensional embedding of the dynamics to the full observation space, thus capturing time-varying correlations between sensors.
Finally, we turn to another challenge: in addition to long-range dependencies, there are abrupt changes in the MEG signal. We propose a multiresolution GP that hierarchically couples GPs over a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes in the time series. The inherent conjugacy of the GPs allows for efficient inference of the hierarchical partition, for which we employ graph-theoretic techniques.
Portions of this talk are joint work with David Dunson, Alona Fyshe, and Tom Mitchell.

**Title: ** Recovery of Simultaneously Structured Models by Convex Optimization

**Speaker: ** Maryam Fazel

**When: ** Tuesday, October 16, 2012 - 12:30

**Location: ** Room 305

The topic of deriving a structured model from a small number of linear
observations by solving a convex optimization problem has been
well-studied in recent years. Examples include the recovery of sparse
or group-sparse vectors (which gave rise to the area of Compressed
Sensing), low-rank matrices (arising in collaborative filtering and
system identification), and the sum of sparse and low-rank matrices
(arising in PCA with sparse outliers, graphical models).
In many applications in signal processing and machine learning, the
model of interest is known to be structured in several ways at the
same time, for example, a matrix that is simultaneously sparse and
low-rank. An application in signal processing is the classic sparse
phase retrieval problem, where the goal is to recover a sparse signal
from phaseless (magnitude-only) measurements. In machine learning, the
problem comes up when combining several regularizers that each promote
a certain desired structure.
In this work, we address the questions: what convex relaxation should
we use for the recovery of a ``simultaneously structured'' model? And
how many measurements are needed generically?
Often penalties (norms) that promote each structure are known (e.g. l1
norm for sparsity, nuclear norm for matrix rank), so it is reasonable
to minimize a combination of these norms. We show that, surprisingly,
if we use as objective function any convex joint function of the
individual norms, we can do no better than an algorithm that exploits
only one of the several structures.
We then specialize our result to the case of simultaneously sparse and
low-rank matrices, and present numerical simulations that support the
theoretical bounds on required number of observations.

**Title: ** Large-Scale Machine Learning

**Speaker: ** Carlos Guestrin

**When: ** Tuesday, October 2, 2012 - 12:30

**Location: ** Room 305

Today, machine learning (ML) methods play a central role in industry
and science. The growth of the Web and improvements in sensor data
collection technology have been rapidly increasing the magnitude and
complexity of the ML tasks we must solve. This growth is driving the
need for scalable, parallel ML algorithms that can handle "Big Data."
In this talk, I'll first present some recent advances in large-scale
algorithms for tackling such huge problems.
Unfortunately, implementing efficient parallel ML algorithms is
challenging. Existing high-level parallel abstractions such as
MapReduce and Pregel are insufficiently expressive to achieve the
desired performance, while low-level tools such as MPI are difficult
to use, leaving ML experts repeatedly solving the same design
challenges.
In this talk, I will also describe the GraphLab framework, which
naturally expresses asynchronous, dynamic graph computations that are
key for state-of-the-art ML algorithms. When these algorithms are
expressed in our higher-level abstraction, GraphLab will effectively
address many of the underlying parallelism challenges, including data
distribution, optimized communication, and guaranteeing sequential
consistency, a property that is surprisingly important for many ML
algorithms. On a variety of large-scale tasks, GraphLab provides
20-100x performance improvements over Hadoop. In recent months,
GraphLab has received thousands of downloads, and is being actively
used by a number of startups, companies, research labs and
universities.
This talk represents joint work with Yucheng Low, Joey Gonzalez, Aapo
Kyrola, Jay Gu, Danny Bickson, and Joseph Bradley.