Proximal gradient methods have been found to be highly effective for solving minimization problems with non-negative constraints or L1-regularization. Under suitable non-degeneracy conditions, it is known that these algorithms identify the optimal sparsity pattern for these types of problems in a finite number of iterations. However, it is not known how many iterations this may take. We introduce the notion of the "active-set complexity", which in these cases is the number of iterations before an algorithm is guaranteed to have identified the final sparsity pattern. We further give a bound on the active-set complexity of proximal gradient methods in the common case of minimizing the sum of a strongly-convex smooth function and a separable convex non-smooth function.

**Title:**Active-set complexity of proximal-gradient

**Speaker:**Mark Schmidt (UBC)

**When:**Saturday, May 19, 2018 - 12:00

**Location:**CSE 403

**Title:**Bandits with Delayed Feedback: (Not) Everything Comes to Him Who Waits

**Speaker:**Claire Vernade (Amazon AI)

**When:**Tuesday, May 1, 2018 - 12:00

**Location:**Savery Hall, Room 409

Almost all real world implementations of bandit algorithms actually deal with bandit feedback: after a customer is presented an ad, his click (if any) is not sent within milliseconds but rather minutes or even hours or days, depending on the application. Moreover, this problem is coupled with an observation ambiguity: while the system is waiting for a click feedback, the customer might already have decided not to click at all and the learner will never get the awaited reward.
In this talk we introduce a delayed feedback model for stochastic bandits. We first consider the situation when the learner has an infinite patience and show that in that case the problem is actually not harder than the non-delayed one and can be solved similarly. However, this comes at a huge memory cost in O(T), T being the length of the experiment. Thus, we introduce a short-memory setting that mitigates the previously mentioned issue at the price of an additional censoring effect on the feedback that we carefully handle. We present an asymptotically optimal algorithm together with a regret bound and demonstrate empirically its behavior on simulated data.
Short Bio:
Claire got her PhD from Telecom ParisTech in October 2017 and is now a post-doc with Amazon CoreAI in Berlin and the University of Magdeburg. Her work focuses on designing and analyzing bandit models for recommendation, A/B testing and other marketing-related applications. From a larger perspective, she is interested in modeling external sources of uncertainty -- or bias -- in order to understand the impact that it may have on the complexity of the learning and on the final result.

**Title:**How to tie your parameters? Parameter-sharing as a powerful solution to deep model design

**Speaker:**Siamak Ravanbakhsh (UBC)

**When:**Tuesday, April 24, 2018 - 12:00

**Location:**Savery Hall, Room 409

The language of invariance (and equivariance) can express the data structure in many application domains. Translation invariance in image data, time-invariance of Markov property in time series and exchangeability in relational data are examples of this perspective on structured domains. In this talk, I will present theoretical results demonstrating that parameter-sharing can be a "generic" technique for encoding our prior knowledge about the domain structure within a deep model. I will show experimental results in several areas including deep models for exchangeable sequences -- i.e., sets -- and its generalization to exchangeable tensors that encode interactions across multiple sets of objects
Bio: Before joining the Computer Science Department at UBC in the summer of 2017, Siamak was a postdoctoral fellow at Machine Learning Department and Robotics Institute at Carnegie Mellon University, where he worked with Barnabás Póczos and Jeff Schneider. Siamak was affiliated with the Auton Lab and the McWilliams Center for Cosmology. He obtained my M.Sc. and Ph.D. at University of Alberta, advised by Russell Greiner. There, he was affiliated with Alberta Ingenuity Center for Machine Learning (now amii) as well as The Metabolomics Innovation Centre. Before that, he received my B.Sc. from Sharif University.

**Title:**DeepCuts: Learning sonic similarity

**Speaker:**Fabian Moerchen (Amazon Music)

**When:**Tuesday, March 27, 2018 - 12:00

**Location:**Savery Hall, Room 409

We present DeepCuts, a content-based model to assess sonic similarity. DeepCuts learns how to extract features from the audio signal of a song such that similar sounding songs are close. We use behavioral signals from customers ('Customers who listened to X, also listened to...') as weak labels to train Deep Siamese Neural Networks with large margin loss. This enables similarity based recommendations, cold start recommendations, sonically consistent sequencing of songs and other improvements to the customer experience of Amazon Music and Alexa.
Bio
Fabian Moerchen is a senior scientist in the Amazon Music Machine Learning team working on recommendations, search, and content understanding. Previously he lead a data science team improving the quality of the world-wide Amazon retail catalog and measure the impact of such improvements with attribution models. At Siemens Corporate Research he worked data driven decision support systems using machine learning for preventative maintenance, text mining, bioinformatics, medical imaging, geothermal energy, and finance. His publications centers around patterns mining, neural networks, and applications including music information retrieval.

**Title:**Active Querying for Crowdsourced Clustering

**Speaker:**Ramya Korlakai Vinayak

**When:**Tuesday, March 6, 2018 - 12:00

**Location:**CSE 403

We consider the problem of crowdsourced clustering – the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often not able to label the items directly; however, they can compare the items and answer queries of type: “Are items i and j from the same cluster?” Since the workers are not experts, they provide noisy answers. Therefore, the problem of designing queries and inferring quality data from non-expert crowd workers is of importance. We present a novel active querying algorithm which is simple and computationally efficient. We show that the proposed algorithm is guaranteed to succeed in recovering all the clusters as long as the workers provide answers with error probability less than 1/2. We provide upper and lower bounds on the number of queries made by the algorithm. While the bounds depend on the error probability, the algorithm does not require this knowledge. In addition to theoretical guarantees, we also provide extensive numerical simulations as well as experiments on real datasets, using both synthetic and real crowd workers. Joint work with Babak Hassibi.
Bio:
Ramya Korlakai Vinayak is a postdoctoral researcher in Paul G. Allen School of Computer Science and Engineering at the University of Washington in Seattle, working with Sham Kakade. Her research interests broadly span the areas of machine learning, crowdsourcing and optimization. She received her B.Tech from the IIT Madras and obtained her MS and Ph.D in Electrical Engineering at Caltech where she worked with Babak Hassibi.

**Title:**Continuous-time dynamics for convex optimization

**Speaker:**Walid Krichene

**When:**Tuesday, February 20, 2018 - 12:00

**Location:**CSE 403

Many optimization algorithms can be viewed as a discretization of a continuous-time process (described using an ordinary differential equation, or a stochastic differential equation in the presence of noise). The continuous-time point of view can be useful for simplifying the analysis, drawing connections with physics, and streamlining the design of new algorithms and heuristics.
We will review results from continuous-time optimization, from the simple case of gradient flow, to more recent results on accelerated methods. In particular, we give simple interpretations of acceleration, and show how these interpretations motivate heuristics (restarting and adaptive averaging) which, empirically, can significantly improve the speed of convergence. We will then focus on the stochastic case, and study the interaction between acceleration and noise, and their effect on the convergence rates. We will conclude with a brief review of how the same tools can be applied in other problems, such as approximate sampling and non-convex optimization.
Bio:
Walid Krichene is at Google Research, where he works on large-scale optimization and recommendation. He received his Ph.D. in EECS in 2016 from UC Berkeley, where he was advised by Alex Bayen and Peter Bartlett, a M.A. in Mathematics from U.C. Berkeley, and a M.S. in Engineering and Applied Math from the Ecole des Mines Paristech. He received the Leon Chua Award and two outstanding GSI awards from U.C. Berkeley. His research interests include convex optimization, stochastic approximation, recommender systems, and online learning.

**Title:**Adaptive Averaging in Accelerated Descent Dynamics

**Speaker:**Walid Krichene (Google)

**When:**Tuesday, February 20, 2018 - 12:00

**Location:**CSE 403

We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate eta(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging in continuous-time, and give numerical experiments in discrete-time to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.

**Title:**Active Learning for Convex Regression

**Speaker:**Kevin Jamieson

**When:**Tuesday, February 6, 2018 - 12:00

**Location:**CSE 403

In this work, we will introduce the first principled adaptive-sampling procedure for learning a convex function in the L∞ norm, a problem that arises often in economics, psychology, and the social sciences. We present a function-specific measure of complexity and use it to prove that our algorithm is information theoretically near-optimal in a strong, function-specific sense. We also corroborate our theoretical contributions with extensive numerical experiments, finding that our method substantially outperforms passive, uniform sampling for favorable synthetic and data-derived functions in low-noise settings with large sampling budgets. Our results also suggest an idealized “oracle strategy”, which we use to gauge the potential for deploying the adaptive-sampling strategy on any function in any particular setting.

**Title:**Subspace clustering via matching pursuits

**Speaker:**Michael Tschannen (ETH Zürich)

**When:**Tuesday, January 9, 2018 - 12:10

**Location:**CSE 403

Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown. Among the large variety of subspace clustering algorithms, sparsity-based algorithms have attracted special attention thanks to their excellent performance in practical applications. A prominent example is the sparse subspace clustering (SSC) algorithm, which performs spectral clustering based on an adjacency matrix obtained by sparsely representing each data point in terms of all the other data points via the Lasso. When the number of data points is large or the dimension of the ambient space is high, the computational complexity of SSC quickly becomes prohibitive. Replacing the Lasso in SSC by the greedy orthogonal matching pursuit (OMP) or matching pursuit (MP) algorithm significantly reduces computational complexity while often maintaining clustering accuracy. In this talk, we present an analytical performance characterization of SSC-OMP and SSC-MP for noisy data. Both SSC-OMP and SSC-MP are shown to succeed even when the subspaces intersect and when the data points are contaminated by severe noise. Analytical results in combination with numerical results indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion automatically detect the dimensions of the subspaces underlying the data. Finally, to further reduce computational complexity of SSC-OMP and SSC-MP, we consider the scenario where the dimensionality of the data is reduced by random projections, and present corresponding performance guarantees.

**Title:**Unified optimization for self-learning robust penalties

**Speaker:**Sasha Aravkin

**When:**Tuesday, November 28, 2017 - 12:15

**Location:**CSE 403

We consider extended optimization formulations that unify tasks classically done sequentially or hierarchically. In particular, we develop self-tuning robust penalties, that require optimizing over both model parameters and shape parameters (which are typically computed with cross-validation or black-box optimization). If time permits, we will also show extended formulations that simultaneously learn a model and detect outliers in the input data.
Bio: Aleksandr Aravkin is a Washington Research Foundation Data Science Assistant Professor in Applied Mathematics at the University of Washington. He is also a Fellow at the UW eSciences Institute, and an adjunct professor in the Mathematics and Statistics. He received his M.S. in Statistics and PhD in Mathematics (Optimization) from the University of Washington in 2010. From 2010-2012, he was a postdoctoral fellow in Computer Science and Earth and Ocean Sciences at UBC, working on robust approaches for large-scale inverse problems. From 2012-2015, Dr. Aravkin was a Research Staff Member at the IBM TJ Watson Research center, and an adjunct professor at Columbia University in Computer Science and IEOR departments.