Machine Learning | Paul G. Allen School of Computer Science & Engineering

Machine Learning Seminars

Sponsored by

Title: Local Privacy, Minimax Rates, and Learning

Speaker: John Duchi, UC Berkeley

When: Tuesday, May 21, 2013 - 12:30

Location: CSE 305

We study statistical estimation minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical inference procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, as measured by convergence rate, of any statistical estimator or learning procedure.

Joint work with Michael Jordan and Martin Wainwright.

Title: Recursive Deep Learning for Modeling Semantic Compositionality

Speaker: Richard Socher, Stanford

When: Tuesday, May 14, 2013 - 12:30

Location: CSE Gates Commons

Compositional and recursive structure is commonly found in different modalities, including natural language sentences and scene images. I will introduce several recursive deep learning models that, unlike standard deep learning methods can learn compositional meaning vector representations for phrases, sentences and images. These recursive neural network based models obtain state-of-the-art performance on a variety of syntactic and semantic language tasks such as parsing, paraphrase detection, relation classification and sentiment analysis.

Besides the good performance, the models capture interesting phenomena in language such as compositionality. For instance the models learn different types of high level negation and how it can change the meaning of longer phrases with many positive words. They can learn that the sentiment following a "but" usually dominates that of phrases preceding the "but." Furthermore, unlike many other machine learning approaches that rely on human designed feature sets, features are learned as part of the model.

I will focus on two recent results: Pushing the performance of the Stanford parser by 3.8% and improving its speed by 20% using syntactically untied RNNs and a recursive neural tensor network trained on a novel sentiment treebank.

Title: Validating Network Classifiers and Pricing Information

Speaker: Eric Bax

When: Tuesday, May 7, 2013 - 12:30

Location: CSE Gates Commons

Validating Network Classifiers: Networks are fundamental to our lives, from the network of gene interactions that shapes our bodies to the social networks that can eat up the hours of our lives. :) Collective classification uses network structure to predict node information. For example, if your friends all like jazz, are you likely to as well? Since networks grow by adding nodes based on the nodes already in the network, nodes are not drawn i.i.d. This makes trouble for most machine learning approaches to validation of classifier performance. We will discuss a method to validate network classifiers that is based on understanding how the network grows.

Pricing Information: In auctions for online advertising, data providers tell advertisers which users are the best bets for their ads. So advertisers buy a combination of information (from data providers) and advertising space (from publishers like Yahoo). How much should advertisers pay for each? Let's have a hands-on experience to find out.

Title: Interpretable patient-level predictive models

Speaker: Tyler McCormick, UW Sociology & Statistics

When: Tuesday, April 30, 2013 - 12:30

Location: CSE 403

This talk presents statistical methods which generate patent-level predictions that are both accurate and highly interpretable to healthcare providers and patients. In this context, an interpretable model should be able to pinpoint exactly why a particular prediction was made, and provide the reason in a clear and natural way. The talk begins by introducing the Hierarchical Association Rule Model (HARM) which sequentially predicts a patient's possible future medical conditions given the patient's current and past history of reported conditions. The core of our technique is a Bayesian hierarchical model for selecting predictive association rules (such as ``dyspepsia and epigastric pain imply heartburn") from a large set of candidate rules. We next present a model for traditional classification problems based on decision lists, which consist of a series of if...then... statements (for example, if high blood pressure, then stroke). Decision lists discretize the high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. Our Bayesian framework, known as the Bayesian List Machine (BLM), introduces a formal relationship between sparsity and interpretability through a prior structure over lists. We compare our model with the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate.

This is collaborative work with Cynthia Rudin, Ben Letham, and David Madigan.

Title: Machine learning for protein structure modeling

Speaker: David Baker, Hetunandan Kamisetty, Frank DiMaio, UW Biochemistry

When: Tuesday, March 19, 2013 - 12:30

Location: CSE 305

David will introduce protein structure modeling and describe the conformational samping and scoring problems. Hetu will describe a method to constrain this conformational space that exploits the many-to-one relationship between protein sequence and structure using graphical models. Frank will describe ongoing efforts at improving scoring by optimizing scorefunction parameters directly against experimental data.

Title: Graphical Event Models

Speaker: Asela Gunawardana, Microsoft Research

When: Tuesday, March 12, 2013 - 12:30

Location: CSE Gates Commons

Modeling the temporal dynamics of heterogeneous event streams is central to many real world applications. Graphical Event Models (GEMs) represent the dependencies of each type of event in a stream on past events. Learning the structure and parameters of GEMs from event stream data is valuable for understanding the dynamics of event streams, as well as for predicting their future behavior. This talk will describe specific classes of GEMs, along with computationally tractable learning algorithms and inference algorithms for them, giving empirical results on a number of real world applications. It will also touch on some recent learnability results.

Title: Fitting Manifolds to Probability Measures

Speaker: Hari Narayanan, UW Stats and Mathematics

When: Tuesday, February 26, 2013 - 12:30

Location: CSE 403

We are confronted with very high dimensional data sets. As a result, methods of dealing with high dimensional data have become prominent. One geometrically motivated approach for analyzing data is called manifold learning. The underlying hypothesis of this approach is that due to symmetries and constraints in the generating process, high dimensional data often lie near a low dimensional manifold. Although there are many well known algorithms based on this hypothesis, the basic question of understanding when data lies near a manifold is poorly understood. We will describe joint work with Charles Fefferman and Sanjoy Mitter on developing a provably correct algorithm to fit a nearly optimal manifold to an unknown probability distribution using i.i.d samples, and thereby test this hypothesis.

Title: Discovering Attributes of Objects in Images

Speaker: Ali Farhadi, UW-CSE

When: Tuesday, February 19, 2013 - 12:30

Location: CSE Gates Commons

Visual attributes have shown to be useful in several recognition tasks. The vocabulary of attributes is typically defined manually. In this talk I will show a method to discover the vocabulary of attributes by learning to project images, using learnable projections, to a binary space where categories are well separated. Category memberships are usually good proxies for visual similarity but should not be enforced as a hard constraint. Our method learns codes that maximize separability of categories unless there is strong visual evidence against it. The learned binary codes are highly discriminative; A simple linear SVMs can achieve state-of-the-art results with our short codes. In fact, our method produces state-of-the-art results on Caltech256 with only 128- dimensional bit vectors and outperforms state of the art by using longer codes. Using these codes, I will show a method to reason about conjunctions of multiple classification tasks to reason about complex queries and how to learn categories from very few training samples by borrowing examples from other categories.

Title: Limited Rationality, Limited Perception, and Understanding Others

Speaker: Tomas Singliar, Boeing Research

When: Tuesday, February 12, 2013 - 12:30

Location: CSE Gates Commons

Inverse reinforcement learning is now a well established method for creating controllers by imitation learning. It is less often used for predicting behavior, and only rarely for understanding and interpreting behavior. We seek to improve throughput of surveillance data processing by using IRL techniques to detect anomalous actions for the analyst, together with a plausible explanation for the behavior. For this purpose, real-world interpretations must be associated with the basis functions in the linear reward function approximation. We create a practical scheme where the basis is built up in user interactions triggered by false alarms, thus yielding a small and relevant set of basis functions. The usability of this interactive method is enhanced by formulating the basis functions in natural language.

Experiments revealed limited rationality as the most important barrier to this approach to behavioral anomaly detection. The traditional IRL approach to limited rationality is to assume the agent is making noisy decisions with perfect knowledge of the value function. We argue that in reality, the exact value function is not (cannot) be known and that people are in fact good at choosing the value-function-maximizing action under believed cost, but have misperceptions of the cost. Thus, we present a formulation of the IRL problem which trades off the planning margin (maximal rationality) and accuracy of action cost perception.

Title: Learning with Weak Supervision

Speaker: Hannaneh Hajishirzi

When: Tuesday, November 13, 2012 - 12:30

Location: Room 305

In this talk, I will introduce a novel approach to learning in weakly supervised settings. The core idea is to exploit the underlying structure between positive instances using a discriminative notion of similarity coupled with a ranking function. By reasoning in terms of pairwise discriminative similarities, we tackle two challenging problem domains: semantic understanding of professional soccer commentaries, and Multiple Instance Learning (MIL). In semantic understanding, I demonstrate a novel algorithm in learning the correspondences between complex sentences and a rich set of events. Loose temporal alignments between sentences and events, weak labels, impose inevitable uncertainties. I will show an algorithm that reasons in terms of pairwise discriminative similarities and utilizes popularity metrics to learn the alignements between events and sentences and even discover group of events, called macro-events, that best describe a sentence. I will show extensive evaluations on our new dataset of professional soccer commentaries. I will also introduce an extension of this approach to tackle an instance-level multiple instance learning problem. This bottom-up approach learns a discriminative notion of similarity between instances in positive bags and use it to form a discriminative similarity graph. The main idea is to learn a similarity metric that relies only on confident labels in MIL; negative bags. The underlying structure among positive instances can be discovered by reasoning in terms of similarity preserving quasi-cliques, large quasi-cliques with high scores of within-clique similarities. I show experimental evaluations that demonstrate the advantages of our bottom-up approach to the state-of-the-art MIL methods both at the bag-level and instance-level predictions in standard benchmarks and image and text datasets.

Video