Machine Learning Seminars

Title: Two Facets of Learning Robust Models: Fundamental Limits and Generalization to Natural Out-of-Distribution Inputs
Speaker: Hamed Hassani
When: Tuesday, December 1, 2020 - 12:10
Location: Zoom TBD
In this talk, we will focus on the recently emerged field of (adversarially) robust learning. The field began with the observation that modern learning models, despite their breakthrough performance, remain fragile to seemingly innocuous changes in the data, such as small, norm-bounded perturbations of the input. In response, various training methodologies have been developed for enhancing robustness. However, it is fair to say that our understanding of this field is still in its infancy, and several key questions remain wide open. We will consider two such questions.
(1) Fundamental limits: It has been repeatedly observed that improving robustness to perturbed inputs (robust accuracy) comes at the cost of decreasing the accuracy on benign inputs (standard accuracy), leading to a fundamental tradeoff between these often competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In the first part of the talk, we will develop a precise and comprehensive understanding of such tradeoffs in the context of the simple yet foundational problem of linear regression.
(2) Robustness to other types of out-of-distribution inputs: There are other sources of fragility for deep learning that are arguably more common and less studied. Indeed, natural variation such as lighting or weather conditions, or device imperfections, can significantly degrade the accuracy of trained neural networks, presenting a significant challenge. To this end, in the second part of the talk we propose a paradigm shift from perturbation-based adversarial robustness toward a new framework called "model-based robust deep learning". Using this framework, we will provide general training algorithms that improve the robustness of neural networks against natural variation in data. We will show that this framework consistently improves the robustness of modern learning models against many types of natural out-of-distribution inputs and across a variety of commonly used datasets.
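The norm-bounded fragility described above can be made concrete with a toy calculation (an illustration of the general phenomenon, not the speaker's method; the weights and budget below are hypothetical): for a linear classifier sign(w·x) with label y, the worst-case L2 perturbation of budget eps is -eps·y·w/||w||, which reduces the margin by exactly eps·||w||, so even a correctly classified point can be flipped.

```python
import numpy as np

# Toy illustration of adversarial fragility for a linear classifier
# sign(w . x). The worst-case L2 perturbation of norm eps is
# -eps * y * w / ||w||, shrinking the margin by exactly eps * ||w||.
w = np.array([1.0, -2.0, 0.5])   # hypothetical trained weights
x = np.array([0.3, -0.2, 0.1])   # benign input with label y = +1
y = 1

margin = y * w @ x                          # > 0 means correctly classified
eps = 0.4                                   # perturbation budget
delta = -eps * y * w / np.linalg.norm(w)    # worst-case L2 perturbation
adv_margin = y * w @ (x + delta)            # margin after the attack

clean_correct = margin > 0
adv_correct = adv_margin > 0
```

Here the benign input is classified correctly while the perturbed one is not, even though the perturbation has small norm.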

 

Title: Compressing Variational Bayes
Speaker: Stephan Mandt
When: Tuesday, November 17, 2020 - 12:30
Location: Zoom TBD
Neural image compression methods have recently outperformed their classical counterparts in rate-distortion performance and show great potential to also revolutionize video coding. In this talk, I will show how methods from approximate Bayesian inference and generative modeling can lead to dramatic performance improvements in compression. In particular, I will explain how sequential variational autoencoders can be converted into video codecs, how deep latent variable models can be compressed in post-processing with variable bitrates, and how iterative amortized inference can be used to achieve the world record in image compression performance.
Stephan Mandt is an Assistant Professor of Computer Science at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and Head of the statistical machine learning group at Disney Research, first in Pittsburgh and later in Los Angeles. He previously held postdoctoral positions at Columbia University and Princeton University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne. He is a Fellow of the German National Merit Foundation, a Kavli Fellow of the U.S. National Academy of Sciences, and was a visiting researcher at Google Brain. Stephan regularly serves as an Area Chair for NeurIPS, ICML, AAAI, and ICLR, and is a member of the Editorial Board of JMLR. His research is currently supported by NSF, DARPA, Intel, and Qualcomm.

 

Title: Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis
Speaker: Courtney Paquette
When: Tuesday, November 10, 2020 - 12:00
Location: TBD
Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.
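The universality claim lends itself to a quick numerical check (a sketch in the spirit of the talk's simulations, with problem sizes and tolerances chosen by me, not taken from the paper): run gradient descent on a random least-squares problem and compare halting times when the entries of the data matrix are Gaussian versus Rademacher.

```python
import numpy as np

# Halting time of gradient descent on min_x ||Ax - b||^2: the first
# iteration at which the gradient norm drops below a tolerance.
# Universality predicts it barely depends on the entry distribution of A.
def halting_time(A, b, tol=1e-6, max_iter=10_000):
    d = A.shape[1]
    x = np.zeros(d)
    L = np.linalg.eigvalsh(A.T @ A).max()   # smoothness constant
    for t in range(1, max_iter + 1):
        grad = A.T @ (A @ x - b)
        if np.linalg.norm(grad) < tol:
            return t
        x -= grad / L                        # step size 1/L
    return max_iter

rng = np.random.default_rng(0)
n, d = 400, 200
b = rng.standard_normal(n)
A_gauss = rng.standard_normal((n, d)) / np.sqrt(n)      # Gaussian entries
A_rad = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(n)  # Rademacher entries
t_gauss = halting_time(A_gauss, b)
t_rad = halting_time(A_rad, b)
```

At this scale the two halting times come out nearly identical, consistent with the universality property.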

Courtney Paquette is an Assistant Professor at McGill University. Paquette’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science. She received her PhD from the mathematics department at the University of Washington (2017), held a postdoctoral position at Lehigh University (2017-2018), and was an NSF postdoctoral fellow at the University of Waterloo (2018-2019). Prior to starting at McGill, she was a research scientist with the Google Research, Brain Team in Montreal. Currently, she is a CIFAR Canada AI Chair with the Quebec AI Institute (Mila).

 

Title: Oblivious data for kernel methods
Speaker: Steffen Grunewalder (Lancaster)
When: Thursday, January 30, 2020 - 12:00
Location: Allen CSE1 403
I’ll present an approach to reducing the influence of sensitive features in data in the context of kernel methods. The resulting method uses Hilbert space valued conditional expectations to create new features that are close approximations of the original (non-sensitive) features while having a reduced dependence on the sensitive features. I’ll provide optimality statements about these new features and a bound on the alpha-mixing coefficient between the sensitive features and these new features. In practice, standard techniques for estimating conditional expectations can be used to generate these features. I’ll discuss a plug-in approach for estimating conditional expectations, which uses properties of the empirical process to control estimation errors.
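The basic idea can be sketched in a simple linear special case (my illustration, not the speaker's estimator; the Hilbert-space formulation in the talk is far more general): replace a feature X by the residual X − Ê[X | S], where the conditional expectation given the sensitive feature S is estimated by least squares. The residual is then empirically uncorrelated with S.

```python
import numpy as np

# Plug-in sketch of an "oblivious" feature: regress X on the sensitive
# feature S and keep the residual X - E_hat[X | S]. This removes the
# linear part of the dependence on S (only the linear part, in this toy).
rng = np.random.default_rng(1)
n = 2_000
S = rng.standard_normal(n)              # sensitive feature
X = 0.8 * S + rng.standard_normal(n)    # original feature, depends on S

design = np.column_stack([np.ones(n), S])           # intercept + S
coef, *_ = np.linalg.lstsq(design, X, rcond=None)   # estimate E[X | S]
X_oblivious = X - design @ coef                     # new feature

corr_before = abs(np.corrcoef(S, X)[0, 1])
corr_after = abs(np.corrcoef(S, X_oblivious)[0, 1])
```

The least-squares residual is orthogonal to the design matrix, so the empirical correlation with S drops to numerical zero, while the nonlinear dependence that the talk's mixing-coefficient bound addresses is not captured by this linear sketch.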

 

Title: The Complexity of Non-Convex Stochastic Optimization
Speaker: Dylan Foster (MIT)
When: Thursday, December 5, 2019 - 10:00
Location: CSE2 371
We characterize the complexity of finding approximate stationary points (points with gradient norm at most epsilon) using stochastic first-order methods. In the well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point. This lower bound is tight, and establishes that stochastic gradient descent is minimax optimal. In a more restrictive model where the noisy gradient estimates satisfy an additional "mean-squared smoothness" property, we prove a lower bound of $\epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques. Joint work with Yossi Arjevani, Yair Carmon, John Duchi, Nathan Srebro, and Blake Woodworth.
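The oracle model above can be illustrated with a toy run (a hedged sketch with a function, step size, and noise level of my choosing, not from the paper): SGD queries an unbiased stochastic gradient with bounded variance and halts once it reaches an epsilon-stationary point of a smooth non-convex function.

```python
import numpy as np

# Toy instance of the stochastic first-order oracle model: SGD on the
# smooth non-convex f(x) = x^2 + sin(x), halting at an epsilon-stationary
# point, i.e. |f'(x)| <= eps.
def f_grad(x):
    return 2 * x + np.cos(x)   # gradient of f(x) = x^2 + sin(x)

def sgd_until_stationary(x0, eps=0.1, step=0.05, noise=0.1, seed=0,
                         max_iter=100_000):
    rng = np.random.default_rng(seed)
    x = x0
    for t in range(1, max_iter + 1):
        if abs(f_grad(x)) <= eps:    # check the true gradient (illustration only)
            return x, t
        # unbiased stochastic gradient with bounded variance
        g = f_grad(x) + noise * rng.standard_normal()
        x -= step * g
    return x, max_iter

x_star, queries = sgd_until_stationary(x0=3.0)
```

The lower bounds in the talk say that no algorithm in this oracle model can beat the epsilon^{-4} query scaling in the worst case (epsilon^{-3} under mean-squared smoothness), so plain SGD is already minimax optimal in the first setting.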

 

Title: The brain outside the lab
Speaker: Bing Brunton (University of Washington)
When: Tuesday, November 12, 2019 - 12:00
Location: Allen 403
Developing useful interfaces between brains and machines is a grand challenge of neuroengineering. An effective interface has the capacity to not only interpret neural signals, but predict the intentions of the human to perform an action in the near future; prediction is made even more challenging outside well-controlled laboratory experiments. I will talk about past and ongoing work in my lab to understand and model neural activity underlying natural human behavior, focusing on arm movements. We work with an opportunistic clinical dataset that comprises continuously monitored poses for 12 human subjects over ~1500 total hours, along with the simultaneously recorded high-resolution intracranial neural activity sampled at 1 kHz. The size and scope of this dataset greatly exceeds all previous datasets with movements and neural recordings (including our previously published AJILE), making it possible to leverage modern data-intensive techniques in machine learning to decode and understand the neuroscience of natural human behaviors.

 

Title: Some recent advances in learning from user behavior
Speaker: Adith Swaminathan (Microsoft Research)
When: Tuesday, October 29, 2019 - 12:00
Location: Allen 403
How can we re-use the logs of an interactive system to train new policies to engage with users? Building on work in off-policy contextual bandits and reinforcement learning, we will describe recent work (with Yao Liu, Alekh Agarwal and Emma Brunskill; https://arxiv.org/abs/1904.08473, UAI’19) that describes a class of off-policy estimators for temporally-extended interaction settings. Then, we will describe recent work (with Eric Zhan, Matthew Hausknecht and Yisong Yue; https://arxiv.org/pdf/1910.01179.pdf) that derives calibratable policies which exhibit controllable generation of diverse long-term sequential behavior. We will conclude by sketching some open problems in counterfactual learning for user-interactive systems.

 

Title: Learning to Decide: Dynamics and Economics
Speaker: Nika Haghtalab (Cornell University)
When: Thursday, October 24, 2019 - 11:00
Location: Gates 172
Machine learning systems are increasingly used for automated decision making, for example, for designing economic policies and for identifying qualified candidates in education, financial, or judicial systems. When designing such systems, it is important to consider how changes in the target population or the environment affect the performance of the systems. Moreover, it is important to consider how these systems influence the societal forces that impact the target population. In this talk, I will explore three lines of my research addressing this dynamical and economic perspective on machine learning: learning parameters of auctions in the presence of changes in user preferences, learning admission and hiring classifiers that encourage candidates to invest in valuable skill sets, and augmenting human decision making in hiring and admission to increase the diversity of candidates.

 

Title: Speeding up Distributed SGD via Communication-Efficient Model Aggregation
Speaker: Gauri Joshi (Carnegie Mellon University)
When: Tuesday, October 22, 2019 - 12:00
Location: Allen 403
Large-scale machine learning training, in particular, distributed stochastic gradient descent (SGD), needs to be robust to inherent system variability such as unpredictable computation and communication delays. This work considers a distributed SGD framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. Our goal is to analyze and improve the true speed of error convergence with respect to wall-clock time (instead of the number of iterations). For centralized model-averaging, we propose a strategy called AdaComm that gradually increases the model-averaging frequency in order to strike the best error-runtime trade-off. For decentralized model-averaging, we propose MATCHA, where we use matching decomposition sampling of the base graph to parallelize inter-worker information exchange and reduce communication delay. Experiments on training deep neural networks show that AdaComm and MATCHA can take 3x less time to achieve the same final training loss as compared to fully synchronous SGD and vanilla decentralized SGD respectively.
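The periodic model-averaging scheme that AdaComm builds on can be sketched as follows (a minimal sketch on a toy quadratic, with the adaptive averaging schedule and the MATCHA graph sampling omitted; all parameters are my own choices):

```python
import numpy as np

# Local SGD with periodic averaging: K workers each take `tau` local
# SGD steps on a shared quadratic objective, then all models are
# averaged (the communication step). Larger tau means fewer
# communication rounds per local step.
def local_sgd(K=4, tau=5, rounds=40, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])       # minimizer of f(x) = ||x - target||^2 / 2
    models = [np.zeros(2) for _ in range(K)]
    for _ in range(rounds):
        for k in range(K):
            for _ in range(tau):         # tau local updates between communications
                grad = models[k] - target + 0.05 * rng.standard_normal(2)
                models[k] -= step * grad
        avg = sum(models) / K            # communication: average the K models
        models = [avg.copy() for _ in range(K)]
    return models[0]

x_final = local_sgd()
```

The error-runtime trade-off in the talk comes from this tau: averaging less often (larger tau) cuts communication delay per local step but lets worker models drift apart, which is exactly what AdaComm's adaptive schedule balances over the course of training.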
Based on joint work with Jianyu Wang, Anit Sahu, and Soummya Kar.

 

Title: Adaptive Discretization for Decision Making in Large Continuous Spaces
Speaker: Christina Yu (Cornell University)
When: Monday, October 21, 2019 - 01:00
Location: Gates 271
In this talk, I will present a sequence of two works that explore adaptive discretization for decision making in large (continuous) state and action spaces. In the first work, we present a novel Q-learning algorithm for episodic reinforcement learning with adaptive data-driven discretization; the algorithm selectively maintains a finer partition of the state-action space in regions which are frequently visited in historical trajectories and have higher payoff estimates. We recover the worst-case regret guarantees of prior algorithms for continuous state-action spaces, which additionally require either an optimal discretization as input or access to a simulation oracle. Moreover, experiments suggest that the algorithm automatically adapts to the underlying structure of the problem, resulting in a lower memory footprint and faster convergence compared to using a uniform discretization. In the second work, we consider the challenge that arises when the metric is unknown. We consider a nonparametric contextual multi-armed bandit problem where each arm is associated with a nonparametric reward function mapping contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g., finitely many types or smoothness with respect to an unknown metric space. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.
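The flavor of adaptive discretization can be conveyed with a schematic one-dimensional continuous-armed bandit (a simplified sketch with rules of my own devising, not the paper's algorithm or its regret-optimal splitting schedule): cells with high payoff estimates get played more, and a frequently played cell is split in half, so the partition is refined only where it matters.

```python
import numpy as np

# Schematic adaptive discretization for a 1-d continuous-armed bandit.
# Each cell is [lo, hi, pull_count, reward_sum]; we play the cell with
# the highest UCB (plus a width bonus for discretization error) and
# split a cell in half once it has been played often relative to its width.
def adaptive_bandit(reward, T=3_000, seed=0):
    rng = np.random.default_rng(seed)
    cells = [[0.0, 1.0, 0, 0.0]]
    for t in range(1, T + 1):
        def ucb(c):
            lo, hi, n, s = c
            if n == 0:
                return np.inf
            return s / n + np.sqrt(2 * np.log(t) / n) + (hi - lo)
        c = max(cells, key=ucb)
        lo, hi, n, s = c
        x = rng.uniform(lo, hi)                  # play a point in the chosen cell
        c[2] += 1
        c[3] += reward(x) + 0.1 * rng.standard_normal()
        # refine: split once the cell has been played ~ 1 / width^2 times
        if c[2] >= (1.0 / (hi - lo)) ** 2 and c[2] >= 4:
            mid = (lo + hi) / 2
            cells.remove(c)
            cells.append([lo, mid, 0, 0.0])
            cells.append([mid, hi, 0, 0.0])
    return cells

cells = adaptive_bandit(lambda x: 1.0 - abs(x - 0.7))
# the partition should end up finer near the maximizer x = 0.7
```

Compared with a uniform grid, this spends its budget (and memory) refining the neighborhood of the maximizer, which mirrors the lower-memory, faster-convergence behavior reported in the talk's experiments.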