We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters.
Deducing the state or structure of a system from partial, noisy measurements is a fundamental task throughout the sciences and engineering. A commonly encountered difficulty that arises in such inverse problems is the very limited availability of data relative to the ambient dimension of the signal to be estimated. We analyze a class of ill-posed linear inverse problems in which the underlying model of interest has simple algebraic structure. Specifically the focus of this talk is on the setting in which we are given a small number of linear measurements of a simple model, which is formed by adding a small number of elements from some atomic set. Examples of such models include previously studied cases such as sparse signals and low-rank matrices, as well as others such as low-rank tensors, binary vectors, orthogonal matrices, and matrices formed as the sum of a few permutations. Inspired by the success of the L1-norm and nuclear-norm heuristics, we propose a general convex regularization framework to recover such simple models. We discuss general recovery guarantees for our approach, and describe several example applications. Thus our work extends the catalog of objects and structures (beyond sparse vectors and low-rank matrices) that can be recovered from limited information via tractable convex programming. Joint work with Benjamin Recht, Pablo Parrilo, and Alan Willsky.
This talk is concerned with summarizing -- by means of statistical models -- of data that expresses preferences. This data is typically a set of rankings of n items by a panel of experts; the simplest summary is the "consensus ranking", or the "centroid" of the set of rankings. Such problems appear in many tasks, ranging from combining voter preferences to boosting of search engines. We study the problem in its more general form of estimating a parametric model known as the Generalized Mallows (GM) model. I will present an exact estimation algorithm, non-polynomial in theory, but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family, and introduce the conjugate prior for this model class. Then we introduce the infinite GM model, corresponding to "rankings" over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering. Joint work with: Harr Chen, Alnur Ali, Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes.
I will present in this talk a theory of anytime, approximate inference, which explains some of the most influential algorithms in probabilistic reasoning, yet is based on one of the older ideas in symbolic reasoning: Relaxations. According to this theory, the fundamental notion of approximation is that of "relaxing" logical constraints (equalities in particular) for the purpose of decomposing a problem into smaller pieces that can be solved independently. A second fundamental notion is that of "compensation", which calls for imposing weaker and, typically, probabilistic notions of equality to compensate for the relaxed equalities. The third fundamental notion of the presented theory is that of "recovery," where some of the relaxed equalities are incrementally recovered, based on an assessment of their impact on improving the quality of approximations. I will discuss how this theory subsumes one of the most influential algorithms in probabilistic inference: loopy belief propagation and some of its generalizations, and comment on its relevance to lifted probabilistic inference. I will also show how this theory has formed the basis of an award-winning solver at the third UAI inference competition in 2010. I will finally discuss the relationship between this theory and current developments in symbolic reasoning -- in particular, how advances in knowledge compilation can impact the state of the art in approximate (and exact) probabilistic inference.
Sequence census methods such as ChIP-seq have begun to produce an unprecedented amount of genome-anchored data. Numerous techniques exist to analyze the results of single assays, but genomics research has lacked an integrative method to identify patterns from multiple experiments simultaneously, while taking full advantage of the high-resolution data now available. We have developed such a method, which employs a dynamic Bayesian network to discover joint patterns across different experiment types. We have applied this method to ENCODE chromatin data for the human chronic myeloid leukemia cell line K562. In an unsupervised fashion, we have identified patterns associated with transcription initiation and elongation, as well as transcriptional repression, finding the locations of genes without using DNA sequence, mRNA or multispecies conservation data. We have also successfully performed a semi-supervised identification of enhancer regions across the genome using a curated set of seed enhancers. In both the unsupervised and semi-supervised cases, the method identifies patterns that elucidate the relationships among open chromatin, various transcription factors and histone modifications.
In this talk, I'll discuss a journey taken by my research interests leading from dynamic graphical models to image segmentation. I'll start out discussing dynamic graphical models (DGMs) and their efficient inference. Entailed in performing inference in such models is the problem of identifying a vertex cut that determines the boundary of a critical periodic graphical component. This component is used when the model is expanded to an arbitrary length, and also determines a local junction tree segment used as part of this expansion. Depending on the boundary of this component, inference costs can change by orders of magnitude. There are many ways of judging the quality of the component without performing inference itself, including the standard vertex cut criterion. Most interestingly, however, is the case where the DGM may be sparse, and the quality is determined by an objective that takes into account this sparsity. It turns out that such an objective is submodular. Considering this, and when vertex cut is seen as standard graph cut, a new problem is naturally introduced that involves finding a minimal cut in a graph where edges are judged not by a sum of edge weights but instead by a submodular function, a problem we call cooperative cut. The remainder of the talk will provide a brief background on submodularity, cooperative cut, and various hardness and approximate solutions to this problem, and how it can be applied to image segmentation. In this latter application, results show significant improvements over the standard (node submodular) graph cut approaches, in particular for images that exhibit a "shrinking bias" problem and for those images with smooth illumination gradients. Joint work with Stefanie Jegelka, with earlier contributions from Chris Bartels.