**Title:** Reluctant Interaction Modeling

**Speaker:** Guo (Hugo) Yu

**When:** Tuesday, January 22, 2019 - 12:00

**Location:** CSE 691

Data sets with 100,000 or more predictor variables are common in fields such as biology. Yet if we wish to fit regression models with (pairwise) interactions, we face enormous variable selection problems with at least five billion candidate features. The scale of such problems demands computationally cheap methods (in both time and memory) that still have sound statistical properties. Motivated by these large-scale problems, we develop a new method that performs feature selection for regression models with interactions. We provide theoretical results indicating favorable statistical properties, and empirical results showing strong statistical performance on large-scale interaction problems.
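The "five billion features" figure follows directly from counting main effects plus all pairwise interactions; a quick check with p = 100,000 predictors:

```python
# With p predictors, an all-pairwise-interactions model has
# p main effects plus p*(p-1)/2 interaction terms.
p = 100_000
n_interactions = p * (p - 1) // 2
n_features = p + n_interactions
print(n_features)  # 5000050000, i.e. roughly five billion candidate features
```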

**Title:** Deep Decoder: Concise Image Representations From Untrained Neural Networks

**Speaker:** Reinhard Heckel

**When:** Tuesday, January 8, 2019 - 12:00

**Location:** Gates Commons, CSE 691

Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this talk, we discuss a simple, untrained image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to achieve state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, a pixel-wise linear combination of channels, a ReLU activation, and channel-wise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.
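A minimal NumPy sketch of the layer structure described above, with made-up sizes (nearest-neighbor upsampling stands in for the paper's upsampling unit; this is an illustration of the architecture, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W):
    """One deep-decoder-style layer: upsample, mix channels, ReLU, normalize."""
    # Nearest-neighbor upsampling by a factor of 2 in both spatial dimensions.
    x = x.repeat(2, axis=0).repeat(2, axis=1)
    # Pixel-wise linear combination of channels (equivalent to a 1x1 convolution).
    x = x @ W
    x = np.maximum(x, 0.0)  # ReLU activation
    # Channel-wise normalization: zero mean, unit variance per channel.
    mu = x.mean(axis=(0, 1), keepdims=True)
    sd = x.std(axis=(0, 1), keepdims=True) + 1e-8
    return (x - mu) / sd

k = 16                                      # channels per layer (illustrative)
x = rng.standard_normal((4, 4, k))          # fixed input tensor
weights = [rng.standard_normal((k, k)) * 0.1 for _ in range(3)]
W_out = rng.standard_normal((k, 1)) * 0.1   # final mixing down to 1 output channel

for W in weights:
    x = layer(x, W)
img = 1.0 / (1.0 + np.exp(-(x @ W_out)))    # sigmoid maps to pixel range

# The output has more pixels (32*32 = 1024) than the network has weights
# (3*16*16 + 16 = 784): the model is underparameterized.
print(img.shape)  # (32, 32, 1)
```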

**Title:** Structured Prediction via Machine Learning for Inverse Problems

**Speaker:** Eric Jonas

**When:** Wednesday, November 28, 2018 - 12:00

**Location:** CSE 403

Inverse problems are a well-studied branch of applied mathematics in which we invert a well-understood forward model to learn the properties of some hidden source. Classical examples include seismography in geoscience and tomography in medical imaging, where boundary-value measurements are used to infer properties of the interior. Many scientific problems can be cast as inverse problems with structured outputs -- graphs, discrete point sources, and more. I will present some of our recent work applying machine learning techniques for structured prediction to inverse problems in neuroscience, computational microscopy, and nuclear spectroscopy. By leveraging advances in scalable compute, deep models, and function approximation, we can solve these problems faster and more accurately, and can even compensate for more complicated physical generative processes.
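One generic way to cast inversion as a learning problem, far simpler than the talk's methods but in the same spirit: simulate (source, measurement) pairs through the known forward model, then fit a regression from measurements back to sources. A minimal linear sketch (the forward model, dimensions, and ridge regressor here are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear forward model: measurements y = A @ s + noise.
d, m, n_train = 20, 30, 5000
A = rng.standard_normal((m, d))

# Simulate sources from a simple prior and push them through the forward model.
S = rng.standard_normal((n_train, d))
Y = S @ A.T + 0.01 * rng.standard_normal((n_train, m))

# Learn the inverse map y -> s by ridge regression (a stand-in for a deep model).
lam = 1e-3
W = np.linalg.solve(Y.T @ Y + lam * np.eye(m), Y.T @ S)

# Invert a new measurement with the learned map.
s_true = rng.standard_normal(d)
y_obs = A @ s_true
s_hat = W.T @ y_obs
print(np.linalg.norm(s_hat - s_true))  # near zero: the learned map inverts A
```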

**Title:** Combinatorial Inference

**Speaker:** Junwei Lu

**When:** Wednesday, November 14, 2018 - 12:00

**Location:** CSE 403

We propose combinatorial inference to explore the global topological structures of graphical models. In particular, we conduct hypothesis tests on many combinatorial graph properties, including connectivity, hub detection, and perfect matching. Our methods can be applied to any graph property that is invariant under the deletion of edges. On the other hand, we also develop a generic minimax lower bound which shows the optimality of the proposed method for a large family of graph properties. We apply our methods to neuroscience by discovering hub voxels contributing to visual memories.

**Title:** Best-of-all-worlds: Robust structural risk minimization

**Speaker:** Vidya Muthukumar

**When:** Wednesday, October 31, 2018 - 12:00

**Location:** CSE 403

Sequential learning algorithms typically operate under two possible sets of assumptions: that the environment is stochastic, possibly with temporal dependencies, or that the environment is controlled by a strategic agent that is adversarial to our learning. There are two factors that can make learning difficult: the model complexity of the environment if it is stochastic, and the potential presence of misleading information from an adversary. Can we learn about the nature of our environment online? Can we learn as well as we could have in hindsight had we known about the presence or absence of stochasticity, and the right notion of model complexity?
We design an algorithm inspired by recent advances in adaptivity in online learning to solve this problem in binary sequence prediction, in which the sequence is either adversarially designed or stochastic with temporal dependencies. We show that when the sequence is Markovian with finite memory (the “in-model” case), our algorithm learns both the presence of stochasticity and the memory length, and we achieve regret rates that are competitive with the optimal greedy algorithm that has side information about the stochastic process. Additionally, in the “out-of-model” case, where the temporal dependencies can have infinite memory, we show that our algorithm appropriately adapts, improving performance toward that of higher-memory benchmarks as we see more of the sequence. We thus adaptively recover the fundamental estimation-approximation tradeoff in structural risk minimization while guaranteeing adversarial robustness.
For clarity, we present our insights in the context of binary sequence prediction, but they apply more generally to a contextual input-output prediction setting with discrete model hierarchies, discrete outputs, and a black-box structural risk minimization framework. This includes online supervised classification under function classes that are VC classes or have bounded covering numbers.

**Title:** Battling Demons in Peer Review

**Speaker:** Nihar Shah

**When:** Wednesday, October 24, 2018 - 12:00

**Location:** CSE 403

Peer review is the backbone of scholarly research. It is however faced with a number of challenges (or "demons") such as subjectivity, bias/miscalibration, noise, and strategic behavior. The growing number of submissions in many areas of research such as machine learning has significantly increased the scale of these demons. This talk will present some principled and practical approaches to battle these demons in peer review:
(1) Subjectivity: How to ensure that all papers are judged by the same yardstick?
(2) Bias/miscalibration: How to use ratings in the presence of arbitrary or adversarial miscalibration?
(3) Noise: How to assign reviewers to papers to simultaneously ensure fair and accurate evaluations in the presence of review noise?
(4) Strategic behavior: How to insulate peer review from strategic behavior of author-reviewers?
The work uses tools from statistics and learning theory, social choice theory, information theory, game theory and decision theory. (No prior knowledge on these topics will be assumed.)

**Title:** New Frontiers in Imitation Learning

**Speaker:** Yisong Yue

**When:** Tuesday, October 2, 2018 - 12:00

**Location:** CSE 403

The ongoing explosion of spatiotemporal tracking data has now made it possible to analyze and model fine-grained behaviors in a wide range of domains. For instance, tracking data is now being collected for every NBA basketball game with players, referees, and the ball tracked at 25 Hz, along with annotated game events such as passes, shots, and fouls. Other settings include laboratory animals, people in public spaces, professionals in settings such as operating rooms, actors speaking and performing, digital avatars in virtual environments, and even the behavior of other computational systems.
In this talk, I will describe ongoing research in using imitation learning to develop predictive models of fine-grained behavior. Imitation learning is a branch of machine learning that deals with learning to imitate demonstrated dynamic behavior. I will provide a high-level overview of the basic problem setting, as well as specific projects in modeling laboratory animals, professional sports, speech animation, and expensive computational oracles.
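The simplest instance of imitation learning is behavioral cloning: treat demonstrated (state, action) pairs as supervised data and fit a policy by regression. A minimal sketch with a hypothetical linear expert (an illustration of the problem setting, not one of the talk's applications):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical expert: a linear feedback policy a = K_expert @ s.
d_state, n_demos = 4, 500
K_expert = rng.standard_normal((1, d_state))

# Demonstrations: observed (state, action) pairs, with a little action noise.
S = rng.standard_normal((n_demos, d_state))
A = S @ K_expert.T + 0.05 * rng.standard_normal((n_demos, 1))

# Behavioral cloning: reduce imitation to supervised least-squares regression.
K_hat, *_ = np.linalg.lstsq(S, A, rcond=None)

print(np.abs(K_hat.T - K_expert).max())  # small: the learned policy imitates the expert
```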

**Title:** Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

**Speaker:** Max Simchowitz

**When:** Tuesday, May 22, 2018 - 12:00

**Location:** Savery Hall, Room 409

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory. Our upper bound relies on a generalization of Mendelson's small-ball method to dependent data, eschewing the use of standard mixing-time arguments. Our lower bounds reveal that these upper bounds match up to logarithmic factors. In particular, we capture the correct signal-to-noise behavior of the problem, showing that more unstable linear systems are easier to estimate. This behavior is qualitatively different from what mixing-time arguments suggest, namely that unstable systems are more difficult to estimate. We generalize our technique to provide bounds for a more general class of linear response time series. Joint work with Horia Mania, Stephen Tu, Michael I. Jordan, and Benjamin Recht.
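The OLS estimator studied in the talk can be sketched on a toy problem: simulate a single trajectory of x_{t+1} = A x_t + w_t and regress x_{t+1} on x_t. A minimal illustration (the system and dimensions below are made up, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)

# A stable 3-dimensional linear system (illustrative choice).
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 0.7]])
d, T = 3, 2000

# Simulate a single trajectory x_{t+1} = A x_t + w_t with Gaussian noise.
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = A @ X[t] + rng.standard_normal(d)

# OLS: regress x_{t+1} on x_t over the one observed trajectory.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_hat = B.T

print(np.linalg.norm(A_hat - A))  # estimation error; shrinks as T grows
```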

**Title:** Active-set complexity of proximal-gradient

**Speaker:** Mark Schmidt (UBC)

**When:** Saturday, May 19, 2018 - 12:00

**Location:** CSE 403

Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable non-degeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may take. We introduce the notion of "active-set complexity", which in these cases is the number of iterations before an algorithm is guaranteed to have identified the final sparsity pattern. We further give a bound on the active-set complexity of proximal gradient methods in the common case of minimizing the sum of a strongly convex smooth function and a separable convex non-smooth function.
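Active-set identification can be observed on a toy lasso problem: run the proximal gradient (ISTA) iteration and record the first iteration at which the iterate's sparsity pattern matches the planted support. A minimal sketch, not the speaker's analysis (problem sizes and the regularization level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Lasso: minimize 0.5*||Xw - y||^2 + lam*||w||_1 by proximal gradient (ISTA).
n, d = 100, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # planted sparsity pattern: first 3 coords
y = X @ w_true + 0.1 * rng.standard_normal(n)

lam = 5.0
L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
support_found_at = None
for it in range(1, 501):
    grad = X.T @ (X @ w - y)
    w = soft_threshold(w - grad / L, lam / L)   # one proximal gradient step
    if support_found_at is None and np.array_equal(w != 0, w_true != 0):
        support_found_at = it   # iterations until the sparsity pattern is identified

print(support_found_at)
```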

**Title:** Bandits with Delayed Feedback: (Not) Everything Comes to Him Who Waits

**Speaker:** Claire Vernade (Amazon AI)

**When:** Tuesday, May 1, 2018 - 12:00

**Location:** Savery Hall, Room 409

Almost all real-world implementations of bandit algorithms actually deal with delayed feedback: after a customer is presented an ad, their click (if any) is not sent within milliseconds but rather minutes, hours, or even days, depending on the application. Moreover, this problem is coupled with an observation ambiguity: while the system is waiting for a click, the customer might already have decided not to click at all, and the learner will never get the awaited reward.
In this talk we introduce a delayed feedback model for stochastic bandits. We first consider the situation where the learner has infinite patience and show that, in that case, the problem is actually no harder than the non-delayed one and can be solved similarly. However, this comes at a huge memory cost of O(T), where T is the length of the experiment. Thus, we introduce a short-memory setting that mitigates this issue at the price of an additional censoring effect on the feedback, which we handle carefully. We present an asymptotically optimal algorithm together with a regret bound and demonstrate its behavior empirically on simulated data.
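The delayed-feedback setting can be simulated in a few lines: clicks are generated when an arm is played but only delivered to the learner after a random delay. The sketch below runs plain UCB on such delayed rewards; it illustrates the model only and is not the speaker's algorithm (arm means, delay distribution, and horizon are all made up):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two Bernoulli arms; each click arrives only after a random (geometric) delay.
means = [0.3, 0.6]
T = 5000
counts = np.zeros(2)    # plays per arm
rewards = np.zeros(2)   # clicks received so far (after their delay elapsed)
pending = []            # (arrival_time, arm) for clicks still "in flight"

for t in range(T):
    # Deliver any feedback whose delay has elapsed.
    arrived = [(s, a) for (s, a) in pending if s <= t]
    pending = [(s, a) for (s, a) in pending if s > t]
    for _, a in arrived:
        rewards[a] += 1.0
    # UCB index computed only from the feedback received so far.
    ucb = [rewards[a] / max(counts[a], 1)
           + np.sqrt(2 * np.log(t + 1) / max(counts[a], 1))
           for a in range(2)]
    arm = int(np.argmax(ucb))
    counts[arm] += 1
    if rng.random() < means[arm]:                       # a click happens...
        pending.append((t + rng.geometric(0.05), arm))  # ...but arrives later

print(counts)  # the better arm (index 1) ends up played far more often
```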
Short Bio:
Claire got her PhD from Telecom ParisTech in October 2017 and is now a post-doc with Amazon CoreAI in Berlin and the University of Magdeburg. Her work focuses on designing and analyzing bandit models for recommendation, A/B testing, and other marketing-related applications. From a larger perspective, she is interested in modeling external sources of uncertainty -- or bias -- in order to understand the impact they may have on the complexity of learning and on the final result.