**Title:**TBD: Dylan Foster

**Speaker:**Dylan Foster (MIT)

**When:**Thursday, December 5, 2019 - 10:00

**Location:**CSE2 371

**Title:**The brain outside the lab

**Speaker:**Bing Brunton (University of Washington)

**When:**Tuesday, November 12, 2019 - 12:00

**Location:**Allen 403

Developing useful interfaces between brains and machines is a grand challenge of neuroengineering. An effective interface has the capacity to not only interpret neural signals, but predict the intentions of the human to perform an action in the near future; prediction is made even more challenging outside well-controlled laboratory experiments. I will talk about past and ongoing work in my lab to understand and model neural activity underlying natural human behavior, focusing on arm movements. We work with an opportunistic clinical dataset that comprises continuously monitored poses for 12 human subjects over ~1500 total hours, along with the simultaneously recorded high-resolution intracranial neural activity sampled at 1 kHz. The size and scope of this dataset greatly exceeds all previous datasets with movements and neural recordings (including our previously published AJILE), making it possible to leverage modern data-intensive techniques in machine learning to decode and understand the neuroscience of natural human behaviors.

Personal Website, Semantic Scholar

Personal Website, Semantic Scholar

**Title:**Some recent advances in learning from user behavior

**Speaker:**Adith Swaminathan (Microsoft Research)

**When:**Tuesday, October 29, 2019 - 12:00

**Location:**Allen 403

How can we re-use the logs of an interactive system to train new policies to engage with users? Building on work in off-policy contextual bandits and reinforcement learning, we will describe recent work (with Yao Liu, Alekh Agarwal and Emma Brunskill; https://arxiv.org/abs/1904.08473, UAI’19) that describes a class of off-policy estimators for temporally-extended interaction settings. Then, we will describe recent work (with Eric Zhan, Matthew Hausknecht and Yisong Yue; https://arxiv.org/pdf/1910.01179.pdf) that derives calibratable policies which exhibit controllable generation of diverse long-term sequential behavior. We will conclude by sketching some open problems in counterfactual learning for user-interactive systems.

Personal Website, Semantic Scholar

Personal Website, Semantic Scholar

**Title:**Learning to Decide: Dynamics and Economics

**Speaker:**Nika Haghtalab (Cornell University)

**When:**Thursday, October 24, 2019 - 11:00

**Location:**Gates 172

Machine learning systems are increasingly used for automated decision making; for example for designing economic policies and for identifying qualified candidates in education, financial, or judicial systems. When designing such systems, it is important to consider how changes in the target population or the environment affect the performance of the systems. Moreover, it is important to consider how these systems influence the societal forces that impact the target population. In this talk, I will explore three lines of my research addressing this dynamical and economic perspective on machine learning: Learning parameters of auctions in presence of changes in user preferences, learning admission and hiring classifiers that encourage candidates to invest in valuable skill sets, and augmenting human decision making in hiring and admission to increase the diversity of candidates.

Personal Website, Semantic Scholar

Personal Website, Semantic Scholar

**Title:**Speeding up Distributed SGD via Communication-Efficient Model Aggregation

**Speaker:**Gauri Joshi (Carnegie Mellon University)

**When:**Tuesday, October 22, 2019 - 12:00

**Location:**Allen 403

Large-scale machine learning training, in particular, distributed stochastic gradient descent (SGD), needs to be robust to inherent system variability such as unpredictable computation and communication delays. This work considers a distributed SGD framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. Our goal is to analyze and improve the true speed of error convergence with respect to wall-clock time (instead of the number of iterations). For centralized model-averaging, we propose a strategy called AdaComm that gradually increases the model-averaging frequency in order to strike the best error-runtime trade-off. For decentralized model-averaging, we propose MATCHA, where we use matching decomposition sampling of the base graph to parallelize inter-worker information exchange and reduce communication delay. Experiments on training deep neural networks show that AdaComm and MATCHA can take 3x less time to achieve the same final training loss as compared to fully synchronous SGD and vanilla decentralized SGD respectively.

Based on joint work with Jianyu Wang, Anit Sahu, and Soummya Kar

Personal Website, Semantic Scholar

Based on joint work with Jianyu Wang, Anit Sahu, and Soummya Kar

Personal Website, Semantic Scholar

**Title:**Adaptive Discretization for Decision making in large continuous spaces

**Speaker:**Christina Yu (Cornell University)

**When:**Monday, October 21, 2019 - 01:00

**Location:**Gates 271

In this talk, I will present a sequence of two works that explore adaptive discretization for decision making in large (continuous) state and action spaces. In the first work, we present a novel Q-learning policy for epsiodic reinforcement learning with adaptive data-driven discretization; the algorithm selectively maintains a finer partition of the state-action space in regions which are frequently visited in historical trajectories, and have higher payoff estimates. We recover the worst case regret guarantees of prior algorithms for continuous state-action spaces, which additionally require either an optimal discretization as input, and/or access to a simulation oracle. Moreover, experiments suggest that the algorithm automatically adapts to the underlying structure of the problem, resulting in a lower memory footprint and a faster convergence compared to using uniform discretization. In the second work, we consider the challenge when the metric is unknown. We consider a nonparametric contextual multi-arm bandit problem where each arm is associated to a nonparametric reward function mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. finite types or smooth with respect to an unknown metric space. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.

Personal Website, Semantic Scholar

Personal Website, Semantic Scholar

**Title:**Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis

**Speaker:**Siva Theja Maguluri (Georgia Tech)

**When:**Friday, October 18, 2019 - 01:30

**Location:**CSE 303

We consider the model-free reinforcement learning problem and study the popular Q-learning algorithm with linear function approximation for finding the optimal policy. We provide finite-time error bounds for the convergence of Q-learning with linear function approximation under an assumption on the sampling policy. Unlike some prior work in the literature, we do not make the unnatural assumption that the samples are i.i.d. (since they are Markovian), and do not require an additional projection step in the algorithm. To show this result, we first consider a more general nonlinear stochastic approximation algorithm under Markovian noise, and derive a finite-time bound on the mean-square error, which we believe is of independent interest. The key idea of our proof is to use Lyapunov drift arguments and exploit the geometric mixing property of the underlying Markov chain. This talk is based on the following paper: https://arxiv.org/abs/1905.11425

Personal Website, Semantic Scholar

Personal Website, Semantic Scholar

**Title:**Safe and Reliable Reinforcement Learning for Continuous Control

**Speaker:**Stephen Tu

**When:**Tuesday, April 23, 2019 - 12:00

**Location:**CSE2 271

Many autonomous systems such as self-driving cars, unmanned aerial vehicles, and personalized robotic assistants are inherently complex. In order to deal with this complexity, practitioners are increasingly turning towards data-driven learning techniques such as reinforcement learning (RL) for designing sophisticated control policies. However, there are currently two fundamental issues that limit the widespread deployment RL: sample inefficiency and the lack of formal safety guarantees. In this talk, I will propose solutions for both these issues in the context of continuous control tasks. In particular, I will show that in the widely applicable setting where the dynamics are linear, model-based algorithms which exploit this structure are substantially more sample efficient than model-free algorithms, such as the widely used policy gradient method. Furthermore, I will describe a new model-based algorithm which comes with provable safety guarantees and is computationally efficient, relying only on convex programming. I will conclude the talk by discussing the next steps towards safe and reliable deployment of reinforcement learning.

**Title:**Learning to learn from data: using deep adversarial learning to construct optimal statistical procedures

**Speaker:**Alex Luedtke

**When:**Tuesday, March 12, 2019 - 12:00

**Location:**CSE2 271

Traditionally, statistical procedures have been derived via analytic calculations whose validity often relies on sample size growing to infinity. We use tools from deep learning to develop a new approach, adversarial Monte Carlo meta-learning, for constructing optimal statistical procedures. Statistical problems are framed as two-player games in which Nature adversarially selects a distribution that makes it difficult for a Statistician to answer the scientific question using data drawn from this distribution. The players’ strategies are parameterized via neural networks, and optimal play is learned by modifying the network weights over many repetitions of the game. Given sufficient computing time, the Statistician’s strategy is (nearly) optimal at the finite observed sample size, rather than in the hypothetical scenario where sample size grows to infinity. In numerical experiments and data examples, this approach performs favorably compared to standard practice in point estimation, individual-level predictions, and interval estimation, without requiring specialized statistical knowledge. The talk will conclude with an overview of computational challenges that we have faced and areas for future work.

**Title:**Efficient Approaches to Changepoint Problems with Dependence Across Segments

**Speaker:**Paul Fearnhead

**When:**Tuesday, March 5, 2019 - 12:00

**Location:**CSE2 271

Changepoint detection is an increasingly important problem across a range of applications. It is most commonly encountered when analysing time-series data, where changepoints correspond to points in time where some feature of the data, for example its mean, changes abruptly. Often there are important computational constraints when analysing such data, with the number of data sequences and their lengths meaning that only very efficient methods for detecting changepoints are practically feasible.
A natural way of estimating the number and location of changepoints is to minimise a cost that trades-off a measure of fit to the data with the number of changepoints fitted. There are now some efficient algorithms that can exactly solve the resulting optimisation problem, but they are only applicable in situations where there is no dependence of the mean of the data across segments. Using such methods can lead to a loss of statistical efficiency in situations where e.g. it is known that the change in mean must be positive.
This talk will present a new class of efficient algorithms that can exactly minimise our cost whilst imposing certain constraints on the relationship of the mean before and after a change. These algorithms have links to recursions that are seen for discrete-state hidden Markov Models, and within sequential Monte Carlo. We demonstrate the usefulness of these algorithms on problems such as detecting spikes in calcium imaging data. Our algorithm can analyse data of length 100,000 in less than a second, and has been used by the Allen Brain Institute to analyse the spike patterns of over 60,000 neurons.
This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and Daniela Witten.