We propose combinatorial inference to explore the global topological structures of graphical models. In particular, we conduct hypothesis tests on many combinatorial graph properties, including connectivity, hub detection, and perfect matching. Our methods can be applied to any graph property that is invariant under the deletion of edges. We also develop a generic minimax lower bound which shows the optimality of the proposed method for a large family of graph properties. Our methods are applied to neuroscience to discover hub voxels contributing to visual memories.

## Wednesday, November 14, 2018 - 12:00

## Combinatorial Inference

**Speaker:**Junwei Lu

**Location:**CSE 403

## Wednesday, October 31, 2018 - 12:00

## Best-of-all-worlds: Robust structural risk minimization

**Speaker:**Vidya Muthukumar

**Location:**CSE 403

Sequential learning algorithms typically operate under two possible sets of assumptions: that the environment is stochastic, possibly with temporal dependencies, or that the environment is controlled by a strategic agent that is adversarial to our learning. There are two factors that can make learning difficult: the model complexity of the environment if it is stochastic, and the potential presence of misleading information from an adversary. Can we learn about the nature of our environment online? Can we learn as well as we could have in hindsight had we known about the presence or absence of stochasticity, and the right notion of model complexity?
We design an algorithm inspired by recent advances in adaptivity in online learning to solve this problem in binary sequence prediction, in which the sequence is either adversarially designed or stochastic with temporal dependencies. We show that when the sequence is Markovian with a finite memory (the “in-model” case), our algorithm learns both the presence of stochasticity and the memory length, and we achieve regret rates that are competitive with the optimal greedy algorithm that has side information about the stochastic process. Additionally, in the “out-of-model” cases, where the temporal dependencies can have infinite memory, we show that our algorithm adapts appropriately and improves its performance to match that of higher-memory benchmarks as we see more of the sequence. We thus adaptively recover the fundamental estimation-approximation tradeoff in structural risk minimization while guaranteeing adversarial robustness.
For clarity of understanding, we present our insights in the context of binary sequence prediction, but they apply more generally in a contextual input-output prediction setting with discrete model hierarchies, discrete outputs and a black-box structural risk minimization framework. This includes online supervised classification under function classes that are VC-classes/have bounded covering numbers.

## Wednesday, October 24, 2018 - 12:00

## Battling Demons in Peer Review

**Speaker:**Nihar Shah

**Location:**CSE 403

Peer review is the backbone of scholarly research. It is, however, faced with a number of challenges (or "demons") such as subjectivity, bias/miscalibration, noise, and strategic behavior. The growing number of submissions in many areas of research, such as machine learning, has significantly increased the scale of these demons. This talk will present some principled and practical approaches to battle these demons in peer review:
(1) Subjectivity: How to ensure that all papers are judged by the same yardstick?
(2) Bias/miscalibration: How to use ratings in presence of arbitrary or adversarial miscalibration?
(3) Noise: How to assign reviewers to papers to simultaneously ensure fair and accurate evaluations in the presence of review noise?
(4) Strategic behavior: How to insulate peer review from strategic behavior of author-reviewers?
The work uses tools from statistics and learning theory, social choice theory, information theory, game theory, and decision theory. (No prior knowledge of these topics will be assumed.)

## Tuesday, October 2, 2018 - 12:00

## New Frontiers in Imitation Learning

**Speaker:**Yisong Yue

**Location:**CSE 403

The ongoing explosion of spatiotemporal tracking data has now made it possible to analyze and model fine-grained behaviors in a wide range of domains. For instance, tracking data is now being collected for every NBA basketball game with players, referees, and the ball tracked at 25 Hz, along with annotated game events such as passes, shots, and fouls. Other settings include laboratory animals, people in public spaces, professionals in settings such as operating rooms, actors speaking and performing, digital avatars in virtual environments, and even the behavior of other computational systems.
In this talk, I will describe ongoing research in using imitation learning to develop predictive models of fine-grained behavior. Imitation learning is a branch of machine learning that deals with learning to imitate dynamic demonstrated behavior. I will provide a high-level overview of the basic problem setting, as well as specific projects in modeling laboratory animals, professional sports, speech animation, and expensive computational oracles.

## Tuesday, May 22, 2018 - 12:00

## Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

**Speaker:**Max Simchowitz

**Location:**Savery Hall, Room 409

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory. Our upper bound relies on a generalization of Mendelson's small-ball method to dependent data, eschewing the use of standard mixing-time arguments. Our lower bounds reveal that these upper bounds match up to logarithmic factors. In particular, we capture the correct signal-to-noise behavior of the problem, showing that more unstable linear systems are easier to estimate. This behavior is qualitatively different from arguments which rely on mixing-time calculations that suggest that unstable systems are more difficult to estimate. We generalize our technique to provide bounds for a more general class of linear response time-series. Joint work with Horia Mania, Stephen Tu, Michael I. Jordan, and Benjamin Recht.
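As a rough illustration of the estimator discussed in this abstract, the sketch below fits the transition matrix of a linear system from one simulated trajectory by ordinary least squares. The model `x[t+1] = A x[t] + w[t]` with Gaussian noise, the matrix `A_true`, and the function names are assumptions made for this example; they are not taken from the talk or the paper.

```python
import numpy as np

def simulate_trajectory(A, T, noise_std=1.0, seed=0):
    """Roll out x[t+1] = A x[t] + w[t] from x[0] = 0 with Gaussian noise."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        X[t + 1] = A @ X[t] + noise_std * rng.standard_normal(d)
    return X

def ols_estimate(X):
    """Least-squares fit of A from consecutive state pairs of one trajectory."""
    past, future = X[:-1], X[1:]
    # future ~= past @ A.T, so lstsq returns an estimate of A.T
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T

# Hypothetical stable 2x2 system used only for illustration.
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
X = simulate_trajectory(A_true, T=2000)
A_hat = ols_estimate(X)
error = np.linalg.norm(A_hat - A_true, ord=2)  # operator-norm error
print("operator-norm error:", error)
```

Rerunning with a larger spectral radius for `A_true` is one informal way to look at the signal-to-noise effect the abstract describes, though a single simulation is not evidence for the stated bounds.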

## Saturday, May 19, 2018 - 12:00

## Active-set complexity of proximal-gradient

**Speaker:**Mark Schmidt (UBC)

**Location:**CSE 403

Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable non-degeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may take. We introduce the notion of the "active-set complexity", which in these cases is the number of iterations before an algorithm is guaranteed to have identified the final sparsity pattern. We further give a bound on the active-set complexity of proximal gradient methods in the common case of minimizing the sum of a strongly-convex smooth function and a separable convex non-smooth function.
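To make the setting concrete, here is a minimal sketch of proximal gradient applied to an L1-regularized least-squares problem, recording the last iteration at which the sparsity pattern changed. The problem data, the step size `1/L`, and the helper names are assumptions for illustration. The recorded iteration count is only an empirical observation for one instance, not the active-set complexity bound from the talk.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(A, b, lam, n_iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    support = x != 0
    last_change = 0  # last iteration at which the sparsity pattern changed
    for k in range(1, n_iters + 1):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
        new_support = x != 0
        if not np.array_equal(new_support, support):
            last_change = k
            support = new_support
    return x, last_change

# Hypothetical problem instance: 100 x 20 Gaussian design, 3 true nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[[0, 5, 10]] = [2.0, -3.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(100)

x_hat, last_change = proximal_gradient_lasso(A, b, lam=5.0)
print("nonzero coordinates:", np.flatnonzero(x_hat))
print("sparsity pattern last changed at iteration:", last_change)
```

Because `A` has more rows than columns here, the smooth term is strongly convex with probability one, which matches the case the abstract singles out.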

## Tuesday, May 1, 2018 - 12:00

## Bandits with Delayed Feedback: (Not) Everything Comes to Him Who Waits

**Speaker:**Claire Vernade (Amazon AI)

**Location:**Savery Hall, Room 409

Almost all real-world implementations of bandit algorithms actually deal with delayed feedback: after a customer is presented an ad, his click (if any) is not sent within milliseconds but rather minutes or even hours or days, depending on the application. Moreover, this problem is coupled with an observation ambiguity: while the system is waiting for click feedback, the customer might already have decided not to click at all and the learner will never get the awaited reward.
In this talk we introduce a delayed feedback model for stochastic bandits. We first consider the situation when the learner has infinite patience and show that in that case the problem is actually not harder than the non-delayed one and can be solved similarly. However, this comes at a huge memory cost in O(T), T being the length of the experiment. Thus, we introduce a short-memory setting that mitigates the previously mentioned issue at the price of an additional censoring effect on the feedback that we carefully handle. We present an asymptotically optimal algorithm together with a regret bound and empirically demonstrate its behavior on simulated data.
Short Bio:
Claire got her PhD from Telecom ParisTech in October 2017 and is now a post-doc with Amazon CoreAI in Berlin and the University of Magdeburg. Her work focuses on designing and analyzing bandit models for recommendation, A/B testing and other marketing-related applications. From a larger perspective, she is interested in modeling external sources of uncertainty -- or bias -- in order to understand the impact that it may have on the complexity of the learning and on the final result.

## Tuesday, April 24, 2018 - 12:00

## How to tie your parameters? Parameter-sharing as a powerful solution to deep model design

**Speaker:**Siamak Ravanbakhsh (UBC)

**Location:**Savery Hall, Room 409

The language of invariance (and equivariance) can express the data structure in many application domains. Translation invariance in image data, time-invariance of the Markov property in time series, and exchangeability in relational data are examples of this perspective on structured domains. In this talk, I will present theoretical results demonstrating that parameter-sharing can be a "generic" technique for encoding our prior knowledge about the domain structure within a deep model. I will show experimental results in several areas including deep models for exchangeable sequences -- i.e., sets -- and their generalization to exchangeable tensors that encode interactions across multiple sets of objects.
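One familiar instance of parameter-sharing for sets is a layer that applies the same weights to every element and adds a shared term computed from a pooled summary. The sketch below is an illustrative assumption in that spirit, not necessarily the construction presented in the talk; the names `equivariant_layer`, `lam`, `gamma`, and `bias` are hypothetical.

```python
import numpy as np

def equivariant_layer(X, lam, gamma, bias):
    """Permutation-equivariant linear layer on a set of n elements.

    X has shape (n, d_in). Every row shares the weights `lam`, and every
    row receives the same contribution from the mean-pooled set via `gamma`.
    """
    pooled = X.mean(axis=0, keepdims=True)  # (1, d_in), order-independent
    return X @ lam + pooled @ gamma + bias

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))       # a set of 5 elements with 3 features
lam = rng.standard_normal((3, 4))
gamma = rng.standard_normal((3, 4))
bias = rng.standard_normal(4)

perm = rng.permutation(5)
out = equivariant_layer(X, lam, gamma, bias)
out_perm = equivariant_layer(X[perm], lam, gamma, bias)

# Permuting the input rows permutes the output rows in the same way.
print(np.allclose(out[perm], out_perm))
```

The weights do not depend on the position of an element in the set, which is what makes reordering the inputs reorder the outputs identically.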
Bio: Before joining the Computer Science Department at UBC in the summer of 2017, Siamak was a postdoctoral fellow at the Machine Learning Department and the Robotics Institute at Carnegie Mellon University, where he worked with Barnabás Póczos and Jeff Schneider. Siamak was affiliated with the Auton Lab and the McWilliams Center for Cosmology. He obtained his M.Sc. and Ph.D. at the University of Alberta, advised by Russell Greiner. There, he was affiliated with the Alberta Ingenuity Center for Machine Learning (now amii) as well as The Metabolomics Innovation Centre. Before that, he received his B.Sc. from Sharif University.

## Tuesday, March 27, 2018 - 12:00

## DeepCuts: Learning sonic similarity

**Speaker:**Fabian Moerchen (Amazon Music)

**Location:**Savery Hall, Room 409

We present DeepCuts, a content-based model to assess sonic similarity. DeepCuts learns how to extract features from the audio signal of a song such that similar sounding songs are close. We use behavioral signals from customers ('Customers who listened to X, also listened to...') as weak labels to train Deep Siamese Neural Networks with large margin loss. This enables similarity based recommendations, cold start recommendations, sonically consistent sequencing of songs and other improvements to the customer experience of Amazon Music and Alexa.
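The abstract does not specify the exact form of the large margin loss. As one common possibility, the sketch below computes a triplet hinge loss on embedding vectors, where a co-listened song acts as the positive example. The function name, the Euclidean distance, and the margin value are assumptions for illustration, not details of DeepCuts.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the negative at least `margin` farther
    from the anchor than the positive, using Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings of three songs.
anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])       # distance 1 from the anchor
far_negative = np.array([3.0, 4.0])   # distance 5 from the anchor
near_negative = np.array([0.0, 2.0])  # distance 2 from the anchor

loss_far = triplet_margin_loss(anchor, positive, far_negative, margin=2.0)
loss_near = triplet_margin_loss(anchor, positive, near_negative, margin=2.0)
print(loss_far, loss_near)
```

In a Siamese setup the same network would produce all three embeddings, so gradients of this loss update a single shared set of weights.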
Bio:
Fabian Moerchen is a senior scientist in the Amazon Music Machine Learning team working on recommendations, search, and content understanding. Previously he led a data science team improving the quality of the world-wide Amazon retail catalog and measuring the impact of such improvements with attribution models. At Siemens Corporate Research he worked on data-driven decision support systems using machine learning for preventative maintenance, text mining, bioinformatics, medical imaging, geothermal energy, and finance. His publications center on pattern mining, neural networks, and applications including music information retrieval.

## Tuesday, March 6, 2018 - 12:00

## Active Querying for Crowdsourced Clustering

**Speaker:**Ramya Korlakai Vinayak

**Location:**CSE 403

We consider the problem of crowdsourced clustering: the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly; however, they can compare the items and answer queries of the type: “Are items i and j from the same cluster?” Since the workers are not experts, they provide noisy answers. Therefore, the problem of designing queries and inferring quality data from non-expert crowd workers is of importance. We present a novel active querying algorithm which is simple and computationally efficient. We show that the proposed algorithm is guaranteed to succeed in recovering all the clusters as long as the workers provide answers with error probability less than 1/2. We provide upper and lower bounds on the number of queries made by the algorithm. While the bounds depend on the error probability, the algorithm does not require this knowledge. In addition to theoretical guarantees, we also provide extensive numerical simulations as well as experiments on real datasets, using both synthetic and real crowd workers. Joint work with Babak Hassibi.
Bio:
Ramya Korlakai Vinayak is a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington in Seattle, working with Sham Kakade. Her research interests broadly span the areas of machine learning, crowdsourcing, and optimization. She received her B.Tech. from IIT Madras and obtained her M.S. and Ph.D. in Electrical Engineering at Caltech, where she worked with Babak Hassibi.