**Title: ** Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis

**Speaker: ** Siva Theja Maguluri (Georgia Tech)

**When: ** Friday, October 18, 2019 - 01:30

**Location: ** CSE 303

We consider the model-free reinforcement learning problem and study the popular Q-learning algorithm with linear function approximation for finding the optimal policy. We provide finite-time error bounds for the convergence of Q-learning with linear function approximation under an assumption on the sampling policy. Unlike some prior work in the literature, we do not make the unnatural assumption that the samples are i.i.d. (since they are Markovian), and do not require an additional projection step in the algorithm. To show this result, we first consider a more general nonlinear stochastic approximation algorithm under Markovian noise, and derive a finite-time bound on the mean-square error, which we believe is of independent interest. The key idea of our proof is to use Lyapunov drift arguments and exploit the geometric mixing property of the underlying Markov chain. This talk is based on the following paper:

https://arxiv.org/abs/1905.11425
Personal Website,

Semantic Scholar
**Title: ** Safe and Reliable Reinforcement Learning for Continuous Control

**Speaker: ** Stephen Tu

**When: ** Tuesday, April 23, 2019 - 12:00

**Location: ** CSE2 271

Many autonomous systems such as self-driving cars, unmanned aerial vehicles, and personalized robotic assistants are inherently complex. In order to deal with this complexity, practitioners are increasingly turning towards data-driven learning techniques such as reinforcement learning (RL) for designing sophisticated control policies. However, there are currently two fundamental issues that limit the widespread deployment RL: sample inefficiency and the lack of formal safety guarantees. In this talk, I will propose solutions for both these issues in the context of continuous control tasks. In particular, I will show that in the widely applicable setting where the dynamics are linear, model-based algorithms which exploit this structure are substantially more sample efficient than model-free algorithms, such as the widely used policy gradient method. Furthermore, I will describe a new model-based algorithm which comes with provable safety guarantees and is computationally efficient, relying only on convex programming. I will conclude the talk by discussing the next steps towards safe and reliable deployment of reinforcement learning.

**Title: ** Learning to learn from data: using deep adversarial learning to construct optimal statistical procedures

**Speaker: ** Alex Luedtke

**When: ** Tuesday, March 12, 2019 - 12:00

**Location: ** CSE2 271

Traditionally, statistical procedures have been derived via analytic calculations whose validity often relies on sample size growing to infinity. We use tools from deep learning to develop a new approach, adversarial Monte Carlo meta-learning, for constructing optimal statistical procedures. Statistical problems are framed as two-player games in which Nature adversarially selects a distribution that makes it difficult for a Statistician to answer the scientific question using data drawn from this distribution. The players’ strategies are parameterized via neural networks, and optimal play is learned by modifying the network weights over many repetitions of the game. Given sufficient computing time, the Statistician’s strategy is (nearly) optimal at the finite observed sample size, rather than in the hypothetical scenario where sample size grows to infinity. In numerical experiments and data examples, this approach performs favorably compared to standard practice in point estimation, individual-level predictions, and interval estimation, without requiring specialized statistical knowledge. The talk will conclude with an overview of computational challenges that we have faced and areas for future work.

**Title: ** Efficient Approaches to Changepoint Problems with Dependence Across Segments

**Speaker: ** Paul Fearnhead

**When: ** Tuesday, March 5, 2019 - 12:00

**Location: ** CSE2 271

Changepoint detection is an increasingly important problem across a range of applications. It is most commonly encountered when analysing time-series data, where changepoints correspond to points in time where some feature of the data, for example its mean, changes abruptly. Often there are important computational constraints when analysing such data, with the number of data sequences and their lengths meaning that only very efficient methods for detecting changepoints are practically feasible.
A natural way of estimating the number and location of changepoints is to minimise a cost that trades-off a measure of fit to the data with the number of changepoints fitted. There are now some efficient algorithms that can exactly solve the resulting optimisation problem, but they are only applicable in situations where there is no dependence of the mean of the data across segments. Using such methods can lead to a loss of statistical efficiency in situations where e.g. it is known that the change in mean must be positive.
This talk will present a new class of efficient algorithms that can exactly minimise our cost whilst imposing certain constraints on the relationship of the mean before and after a change. These algorithms have links to recursions that are seen for discrete-state hidden Markov Models, and within sequential Monte Carlo. We demonstrate the usefulness of these algorithms on problems such as detecting spikes in calcium imaging data. Our algorithm can analyse data of length 100,000 in less than a second, and has been used by the Allen Brain Institute to analyse the spike patterns of over 60,000 neurons.
This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and Daniela Witten.

**Title: ** Efficient Z_q synchronization on the Euclidean lattice up to the phase transition

**Speaker: ** Ahmed El Alaoui

**When: ** Tuesday, February 19, 2019 - 12:00

**Location: ** CSE2 271

Group synchronization is an inverse problem on a graph: Given a group g and a graph G, every vertex of G is assigned an element from g. One observes a noisy version of the group difference of the endpoints of every edge in G, and the task is to estimate the original assignment. This problem is relevant to computer vision, microscopy, and is a close relative to the problem of community detection in networks.
Abbé et al. (2017) studied this problem for a variety of compact groups g when the graph G is the d-dimensional Euclidean lattice.
They established the existence of a phase transition in terms of the noise level separating a regime where recovery is possible from a regime where it is not. I will speak about the special case of the cyclic group g = Z/qZ, still on the d-dimensional lattice. I will show that under mild assumptions, recovery is efficiently possible whenever it is information-theoretically so. The algorithm has a multi-scale structure and its analysis relies on a lattice-renormalization scheme.
The fact that a “possible-but-hard phase”, where recovery is possible but computationally hard, is absent extends far beyond group-synchronization or lattices. Time permitting, I will briefly show that any inference problem built on a graph with sub-exponential growth cannot be computationally hard.
Joint work with Andrea Montanari.

**Title: ** Reluctant Interaction Modeling

**Speaker: ** Guo (Hugo) Yu

**When: ** Tuesday, January 22, 2019 - 12:00

**Location: ** CSE 691

Data sets with 100,000 or more predictor variables are common in fields such as biology. Yet if we wish to fit regression models with (pairwise) interactions, we are faced with enormous variable selection problems with at least five billion features. The scale of such problems demands seeking out computationally cheap methods (both in time and memory) that still have sound statistical properties. Motivated by these large-scale problem sizes, we develop a new method that performs feature selection for regression models with interactions. We provide theoretical results indicating favorable statistical properties and empirical results showing our method applied to large-scale interaction problems with strong statistical performance.

**Title: ** Deep Decoder: Concise Image Representations From Untrained Neural Networks

**Speaker: ** Reinhard Heckel

**When: ** Tuesday, January 8, 2019 - 12:00

**Location: ** Gates Commons CSE 691

Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this talk, we discuss an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, pixel-wise linear combination of channels, ReLU activation, and channelwise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.

**Title: ** Structured Prediction via Machine Learning for Inverse Problems

**Speaker: ** Eric Jonas

**When: ** Wednesday, November 28, 2018 - 12:00

**Location: ** CSE 403

Inverse problems are a well-studied branch of applied mathematics where we attempt to invert a well-understood forward model in an attempt to learn the properties of some hidden source. Classical examples include seismography in geoscience and tomography in medical imaging, where boundary-value measurements are used to infer properties of the interior. Many scientific problems can be cast as inverse problems with structured outputs -- graphs, discrete point sources, and more. I will present some of our recent work developing machine learning techniques for structured prediction to inverse problems in neuroscience, computational microscopy, and nuclear spectroscopy. By leveraging advances in scalable compute, deep models, and function approximation, we can solve these problems faster, more accurately, and even compensate for more complicated physical generative processes.

**Title: ** Combinatorial Inference

**Speaker: ** Junwei Lu

**When: ** Wednesday, November 14, 2018 - 12:00

**Location: ** CSE 403

We propose the combinatorial inference to explore the global topological structures of graphical models.In particular, we conduct hypothesis tests on many combinatorial graph properties including connectivity, hub detection, perfect matching, etc. Our methods can be applied to any graph property which is invariant under the deletion of edges. On the other side, we also develop a generic minimax lower bound which shows the optimality of the proposed method for a large family of graph properties. Our methods are applied to the neuroscience by discovering hub voxels contributing to visual memories.

**Title: ** Best-of-all-worlds: Robust structural risk minimization

**Speaker: ** Vidya Muthukumar

**When: ** Wednesday, October 31, 2018 - 12:00

**Location: ** CSE 403

Sequential learning algorithms typically operate under two possible sets of assumptions: that the environment is stochastic, possibly with temporal dependencies, or that the environment is controlled by a strategic agent that is adversarial to our learning. There are two factors that can make learning difficult: the model complexity of the environment if it is stochastic, and the potential presence of misleading information from an adversary. Can we learn about the nature of our environment online? Can we learn as well as we could have in hindsight had we known about the presence or absence of stochasticity, and the right notion of model complexity?
We design an algorithm inspired by recent advances in adaptivity in online learning to solve this problem in binary sequence prediction, in which the sequence is either adversarially designed or stochastic with temporal dependencies. We show that when the sequence is Markovian with a finite memory (the “in-model” case), our algorithm learns both the presence of stochasticity and the memory length, and we achieve regret rates that are competitive with the optimal greedy algorithm that has side information about the stochastic process. Additionally, in the “out-of-model cases” where the temporal dependencies can be of infinite memory, we show that our algorithm appropriately adapts and improves performance to those of higher-memory benchmarks as we see more of the sequence. We thus adaptively recover the fundamental estimation-approximation tradeoff in structural risk minimization while guaranteeing adversarial robustness.
For clarity of understanding, we present our insights in the context of binary sequence prediction, but they apply more generally in a contextual input-output prediction setting with discrete model hierarchies, discrete outputs and a black-box structural risk minimization framework. This includes online supervised classification under function classes that are VC-classes/have bounded covering numbers.