
# Machine Learning

## Tuesday, June 2, 2015 - 12:30

## Statistical machine learning methods for the analysis of large networks

**Speaker:** Edo Airoldi

**Location:** CSE 305

Edo Airoldi received a PhD from Carnegie Mellon University in 2007, working at the intersection of statistical machine learning and computational social science with Stephen Fienberg and Kathleen Carley. His PhD thesis explored modeling approaches and inference strategies for analyzing social and biological networks. Until December 2008, he was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, working with Olga Troyanskaya and David Botstein. They developed mechanistic models of regulation, leveraging high-throughput technology, to gain insights into aspects of cellular dynamics that are not directly measurable at the desired resolution, such as growth rate. He joined the Statistics Department at Harvard University in 2009.

## Tuesday, May 19, 2015 - 12:30

## Diverse Particle Selection for High-Dimensional Inference in Graphical Models

**Speaker:** Erik Sudderth, Brown University

**Location:** CSE 305

Rich graphical models for real-world scene understanding encode the shape and pose of objects via high-dimensional, continuous variables. We describe a particle-based max-product inference algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization. At each iteration, the set of particle hypotheses is augmented via stochastic proposals, and then reduced via an optimization algorithm that minimizes distortions in max-product messages. Our particle selection metric is submodular, and thus efficient greedy algorithms have rigorous optimality guarantees. By avoiding the stochastic resampling steps underlying standard particle filters, we also avoid common degeneracies where particles collapse onto a single hypothesis. Our approach significantly outperforms previous particle-based algorithms in the estimation of human pose from single images, and the prediction of protein side-chain conformations.
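The remark that a submodular selection metric admits efficient greedy algorithms can be illustrated with a generic sketch. The code below is not the talk's particle selection objective (which minimizes distortion in max-product messages); it assumes a stand-in facility-location score over a hypothetical similarity matrix `sim`, which is monotone submodular when similarities are nonnegative, so greedy selection is within a factor of 1 - 1/e of the best subset of size `k`.

```python
from typing import List, Sequence


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Coverage score: each item is credited with its best similarity to the selected set."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: List[List[float]], k: int) -> List[int]:
    """Pick up to k indices, each time adding the one with the largest marginal gain."""
    chosen: List[int] = []
    for _ in range(min(k, len(sim))):
        base = facility_location(chosen, sim)
        best, best_gain = -1, float("-inf")
        for j in range(len(sim)):
            if j in chosen:
                continue
            gain = facility_location(chosen + [j], sim) - base
            if gain > best_gain:
                best, best_gain = j, gain
        chosen.append(best)
    return chosen


# Hypothetical example: two tight clusters of candidates, {0, 1} and {2, 3}.
sim = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.25],
    [0.0, 0.0, 0.25, 1.0],
]
diverse = greedy_select(sim, k=2)  # one representative from each cluster
```

Because the second pick is scored by its marginal gain, a near-duplicate of an already chosen candidate adds little, which is how this kind of selection keeps the retained set diverse.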

Erik B. Sudderth is an Assistant Professor in the Brown University Department of Computer Science. He received a Bachelor's degree (summa cum laude, 1999) in Electrical Engineering from the University of California, San Diego, and Master's and Ph.D. degrees (2006) in EECS from the Massachusetts Institute of Technology. His research interests include probabilistic graphical models; nonparametric Bayesian methods; and applications of statistical machine learning in computer vision and the sciences. He received an NSF CAREER award and was named one of "AI's 10 to Watch" by IEEE Intelligent Systems Magazine.

## Wednesday, May 6, 2015 - 12:30

## Graphical Modeling with the Bethe Approximation

**Speaker:** Tony Jebara, Department of Computer Science, Columbia University

**Location:** Room CSE 305, Allen Center

## Tuesday, January 27, 2015 - 12:30

## Degree, curvature, and mixing of random walks on the phylogenetic subtree-prune-regraft graph, and what it tells us about phylogenetic inference via MCMC

**Speaker:** Erick Matsen, Fred Hutchinson Cancer Research Center (http://matsen.fhcrc.org/)

**Location:** CSE 305

## Tuesday, January 13, 2015 - 12:30

## Driving Time Variability Prediction Using Mobile Phone Location Data

**Speaker:** Dawn Woodard, Cornell University

**Location:** CSE 305

We introduce a method to predict the variability in (that is, the probability distribution of) driving time on an arbitrary route in a road network at a given time, using mobile phone GPS data. Although commercial mapping services currently provide a high-quality estimate of driving time on a given route, there can be considerable uncertainty in that prediction due, for example, to unknown timing of traffic signals, uncertainties in traffic congestion levels, and differences in driver habits. For this reason, a distribution prediction can be more valuable than a deterministic prediction of driving time, because it accounts not just for the measured traffic conditions and other available information, but also for the presence of unmeasured conditions that affect driving time. Accurate distribution predictions can be used to report variability to the user, to provide risk-averse route recommendations, and as part of vehicle fleet decision support systems. Simple approaches to distribution prediction assume independence in driving time across road segments and as a result dramatically underestimate the variability in driving time. We propose a method that accurately accounts for dependencies in driving time across road segments, and apply it to large volumes of mobile phone GPS data from the Seattle metropolitan region.
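The claim that assuming independence across segments underestimates variability follows from the variance of a sum: Var(T_1 + ... + T_n) is the sum of all pairwise covariances, and the independence assumption drops every off-diagonal term. The sketch below is only an illustration, not the method from the talk; it assumes a hypothetical route of 20 segments, each with a 30-second standard deviation, and a single shared correlation `rho` between every pair of segments.

```python
import math
from typing import Sequence


def route_time_std(segment_stds: Sequence[float], rho: float) -> float:
    """Std. dev. of total route time when every pair of segments has correlation rho.

    Assumes rho >= -1 / (n - 1), so the implied covariance matrix is valid.
    """
    total_var = 0.0
    for i, si in enumerate(segment_stds):
        for j, sj in enumerate(segment_stds):
            # Diagonal terms are variances; off-diagonal terms are covariances.
            total_var += si * sj * (1.0 if i == j else rho)
    return math.sqrt(total_var)


stds = [30.0] * 20  # hypothetical: 20 segments, 30 s standard deviation each
independent = route_time_std(stds, rho=0.0)  # about 134 s
correlated = route_time_std(stds, rho=0.5)   # about 435 s
```

With these made-up numbers, ignoring a correlation of 0.5 understates the standard deviation of the route time by more than a factor of three.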

## Tuesday, November 4, 2014 - 12:30

## TBA

**Speaker:** Yi Chang, Yahoo! Research

**Location:** CSE 305

## Thursday, October 30, 2014 - 12:30

## Deep Representation Learning: Challenges and New Directions

**Speaker:** Honglak Lee, University of Michigan

**Location:** CSE 305

Machine learning is a powerful tool for tackling challenging problems in artificial intelligence. In practice, the success of machine learning algorithms depends critically on the feature representations of the input data, which often become a limiting factor. To address this problem, deep learning methods have recently emerged as successful techniques for learning feature hierarchies from unlabeled and labeled data. In this talk, I will present my perspectives on the progress, challenges, and some new directions. Specifically, I will talk about my recent work to address the following interrelated challenges: (1) How can we learn invariant yet discriminative features, and furthermore disentangle underlying factors of variation to model high-order interactions between the factors? (2) How can we learn representations of the output data when the output variables have complex high-order dependencies? (3) How can we learn shared representations from heterogeneous input data modalities?

Bio:

Honglak Lee is an Assistant Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor. He received his Ph.D. from the Computer Science Department at Stanford University in 2010, advised by Prof. Andrew Ng. His primary research interests lie in machine learning, spanning deep learning, unsupervised and semi-supervised learning, transfer learning, graphical models, and optimization. He also works on application problems in computer vision, audio recognition, robot perception, and text processing. His work received best paper awards at ICML and CEAS. He has served as a guest editor of the IEEE TPAMI Special Issue on Learning Deep Architectures, as well as an area chair for ICML and NIPS. He received the Google Faculty Research Award in 2011, and was selected by IEEE Intelligent Systems as one of AI's 10 to Watch in 2013.

## Tuesday, October 21, 2014 - 12:30

## Massive, Sparse, Efficient Multilabel Learning

**Speaker:** Charles Elkan, UCSD and Amazon

**Location:** TBA (not CSE 305)

Amazon has many applications whose core is multilabel classification. This talk will present progress towards a multilabel learning method that can handle 10^7 training examples, 10^6 features, and 10^5 labels on a single workstation. A sparse linear model is learned for each label simultaneously by stochastic gradient descent with L2 and L1 regularization. Tractability is achieved through careful use of sparse data structures, and speed is achieved by using the latest stochastic gradient methods that do variance reduction. Both theoretically and practically, these methods achieve order-of-magnitude faster convergence than Adagrad. We have extended them to handle non-differentiable L1 regularization. We show experimental results on classifying biomedical articles into 26,853 scientific categories. [Joint work with Galen Andrew, ML intern at Amazon.]
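One standard way to combine stochastic gradient steps with a non-differentiable L1 penalty is a proximal (soft-thresholding) update. The sketch below shows that generic technique for a single label with logistic loss and dictionary-based sparse vectors. It is not the variance-reduced method described in the talk, and the function names, the logistic loss, and the choice to regularize only the features active in the current example are all assumptions made for illustration.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t * |w|: shrink w toward zero by t, clipping at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def prox_sgd_step(weights: Dict[int, float], x: Dict[int, float], y: int,
                  lr: float, l1: float, l2: float) -> None:
    """One proximal SGD step on a sparse example x with label y in {0, 1}.

    Only features present in x are touched, so the cost of a step scales
    with the number of nonzeros rather than the full feature dimension.
    """
    score = sum(weights.get(i, 0.0) * v for i, v in x.items())
    p = sigmoid(score)
    for i, v in x.items():
        w = weights.get(i, 0.0)
        grad = (p - y) * v + l2 * w          # smooth part: loss plus L2
        w = soft_threshold(w - lr * grad, lr * l1)  # non-smooth part: L1
        if w == 0.0:
            weights.pop(i, None)  # drop exact zeros to keep the model sparse
        else:
            weights[i] = w
```

Soft-thresholding sets small weights to exactly zero, which a plain subgradient step on the L1 term generally does not, so the stored model stays sparse.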

**Bio** Charles Elkan is the first Amazon Fellow, on leave from being a professor of computer science at the University of California, San Diego. In the past, he has been a visiting associate professor at Harvard and a researcher at MIT. His published research has been mainly in machine learning, data science, and computational biology. The MEME algorithm that he developed with Ph.D. students has been used in over 3000 published research projects in biology and computer science. He is fortunate to have had inspiring undergraduate and graduate students who are now in leadership positions, such as vice president at Google.

## Tuesday, October 7, 2014 - 12:30

## Learning Mixtures of Ranking Models

**Speaker:** Pranjal Awasthi, Princeton University

**Location:** CSE 305

Probabilistic modeling of ranking data is an extensively studied problem with applications ranging from understanding user preferences in electoral systems and social choice theory, to more modern learning tasks in online web search, crowd-sourcing and recommendation systems. This work concerns learning the Mallows model, one of the most popular probabilistic models for analyzing ranking data. In this model, the user's preference ranking is generated as a noisy version of an unknown central base ranking. The learning task is to recover the base ranking and the model parameters using access to noisy rankings generated from the model.
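To make the generative side of the Mallows model concrete, here is a minimal sampling sketch. It assumes the common parameterization in which a ranking has probability proportional to `phi` raised to its Kendall tau distance from the base ranking, with `phi` in [0, 1], and it uses the repeated insertion procedure to draw samples. The function names are illustrative, and this shows only how noisy rankings arise, not the learning algorithm from the talk.

```python
import random
from typing import List, Sequence


def kendall_tau(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of item pairs that rankings a and b order differently."""
    pos = {item: r for r, item in enumerate(b)}
    seq = [pos[item] for item in a]
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def sample_mallows(base: Sequence[int], phi: float, rng: random.Random) -> List[int]:
    """Draw one ranking with probability proportional to phi ** kendall_tau(ranking, base)."""
    ranking: List[int] = []
    for i, item in enumerate(base):
        # Inserting at index j (0..i) places the item ahead of i - j items
        # that precede it in the base ranking, creating i - j inversions,
        # so that choice is weighted by phi ** (i - j).
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking


rng = random.Random(0)
base = [0, 1, 2, 3, 4]
noisy = [sample_mallows(base, phi=0.5, rng=rng) for _ in range(5)]
```

Setting `phi=0` always reproduces the base ranking, while `phi=1` gives a uniformly random permutation; values in between concentrate the samples around the base ranking.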

Although well understood in the setting of a homogeneous population (a single base ranking), the case of a heterogeneous population (mixture of multiple base rankings) has so far resisted algorithms with guarantees on worst case instances. In this talk I will present the first polynomial time algorithm which provably learns the parameters and the unknown base rankings of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-k prefix in both the rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved.

Joint work with Avrim Blum, Or Sheffet and Aravindan Vijayaraghavan.

## Monday, September 29, 2014 - 12:30

## Convex and Bayesian methods for link prediction using distributed representations

**Speaker:** Guillaume Bouchard, Xerox Research Centre Europe

**Location:** CSE 305

Many applications involve multiple interlinked data sources, but existing approaches to handling them are often based on latent factor models (i.e. distributed representations), which are difficult to learn. At the same time, recent advances in convex analysis, mainly based on the nuclear norm (a relaxation of the matrix rank) and sparse structured approximations, have shown great theoretical and practical performance in handling very large matrix factorization problems with non-Gaussian noise and missing data. In this talk, we will show how multiple matrices or tensors can be jointly factorized using a convex formulation of the problem, with a particular focus on:

- Multi-view learning: A popular approach is to assume that both the correlations between the views and the view-specific correlations have low-rank structure, leading to a model closely related to canonical correlation analysis called inter-battery factor analysis. We propose a convex relaxation of this model, based on a structured nuclear norm regularization.
- Collective matrix factorization: When multiple matrices are related, they share common latent factors, leading to a simple yet powerful way of handling complex data structures, such as relational databases. Again, a convex formulation of this approach is proposed. We also show that the Bayesian version of this model can be used to tune the multiple regularization parameters involved in such models, avoiding costly cross-validation.

Another contribution to knowledge base (KB) modeling relates to binary tensor and matrix factorization with many zeros. We present a new learning approach for binary data that scales linearly with the number of positive examples. It is based on an iterative split of the tensor (or matrix) on which the binary loss is approximated by a Gaussian loss, which itself can be efficiently minimized. Experiments on popular tasks such as data imputation, multi-label prediction, link prediction in graphs and item recommendation illustrate the benefit of the proposed approaches.

**Bio** Guillaume Bouchard is a senior research scientist at Xerox Research Centre Europe in Grenoble, France. After an engineering degree and a master's in mathematics at the Université de Rouen, he obtained a PhD in statistics from the Institut National de Recherche en Informatique et en Automatique (INRIA) in 2004. Since then, he has worked for Xerox on multiple machine learning research projects in big data analysis, including user modelling, recommender systems and natural language processing. He was involved in the French and European research projects LAVA, FUPOL, Fusepool and Dynamicité. His current research focuses on the development of distributed statistical relational models for knowledge bases, applied to the development of virtual agents.