Online data, including the web and social media, presents an unprecedented visual and textual record of human lives, events, and activities. We --- the UW NLP and Computer Vision groups --- are working to design scalable new machine learning algorithms that can make sense of this ever-growing body of information by automatically inferring the latent semantic correspondences between language, images, videos, and 3D models of the world.
In machine learning, as throughout computer science, there is a tradeoff between expressiveness and tractability. On the one hand, we need powerful model classes to capture the richness and complexity of the real world. On the other, we need inference in those models to remain tractable; otherwise their potential for widespread practical use is limited. Deep learning can induce powerful representations, with multiple layers of latent variables, but these models are generally intractable. We are developing new classes of similarly expressive but still tractable models, including sum-product networks and tractable Markov logic. These models capture both class-subclass and part-subpart structure in the domain, and are in some respects more expressive than traditional graphical models like Bayesian networks and Markov random fields. Our research includes designing representations, studying their properties, developing efficient algorithms for learning them, and applying them to challenging problems in natural language understanding, vision, and other areas.
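The tractability of sum-product networks can be sketched in a few lines: product nodes factor a distribution, sum nodes mix, and a single bottom-up pass answers joint and marginal queries in time linear in the network's size. The structure and weights below are purely illustrative, not taken from any trained model:

```python
from math import prod

def leaf(var, val):
    # Indicator leaf over a binary variable; a variable absent from the
    # evidence dict is treated as marginalized out (indicator returns 1).
    return lambda e: 1.0 if e.get(var, val) == val else 0.0

def product(*children):
    # Product node: children over disjoint variables (decomposability).
    return lambda e: prod(c(e) for c in children)

def sum_node(weighted):
    # Sum node: weighted mixture of children over the same variables.
    return lambda e: sum(w * c(e) for w, c in weighted)

x1t, x1f = leaf("X1", 1), leaf("X1", 0)
x2t, x2f = leaf("X2", 1), leaf("X2", 0)

# A two-component mixture over X1 and X2 (weights are illustrative).
spn = sum_node([
    (0.7, product(sum_node([(0.8, x1t), (0.2, x1f)]),
                  sum_node([(0.3, x2t), (0.7, x2f)]))),
    (0.3, product(sum_node([(0.1, x1t), (0.9, x1f)]),
                  sum_node([(0.6, x2t), (0.4, x2f)]))),
])

print(spn({"X1": 1, "X2": 0}))  # joint probability of a full assignment
print(spn({"X1": 1}))           # marginal of X1=1: X2 simply left unset
```

Note that the marginal query costs exactly one evaluation pass, whereas in a general graphical model marginalization can be exponential.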
Alchemy is a software package providing a series of algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation. Alchemy allows you to easily develop a wide range of AI applications, including:
- Collective classification
- Link prediction
- Entity resolution
- Social network modeling
- Information extraction
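As a flavor of the representation, a Markov logic network in Alchemy is a set of first-order formulas with attached weights; the classic smoking example below is illustrative (weights are arbitrary), and softens each hard rule into a probabilistic constraint:

```
// Predicate declarations
Smokes(person)
Cancer(person)
Friends(person, person)

// Weighted formulas: the higher the weight, the stronger the constraint
1.5  Smokes(x) => Cancer(x)
1.1  Friends(x, y) => (Smokes(x) <=> Smokes(y))
```

Given evidence about particular people, Alchemy's inference algorithms compute the probability of queries such as who is likely to smoke or develop cancer.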
Our research focuses on two important aspects of the design of information: our theoretical work investigates principled estimation, and our more applied work explores solutions for color image processing. Estimation problems are common in engineering, from estimating the integrity of a pipeline to reconstructing an object based on transmitted signals. Our work in color image processing combines statistical signal processing with human vision science to design visual information for humans.
Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, best addressed by bootstrapping: creating enough structured data to motivate the development of applications. However, automatic information extraction systems produce errors that users will not tolerate, while user contributions require incentives and management to control vandalism.
In many domains, data now arrives faster than we are able to learn from it. To avoid wasting this data, we must switch from the traditional "one-shot" machine learning approach to systems that can mine continuous, high-volume, open-ended data streams as they arrive. We have identified a set of desiderata for such systems and developed an approach to building stream mining algorithms that satisfies all of them. The approach is based on explicitly minimizing the number of examples used in each learning step, while guaranteeing that user-defined targets for predictive performance are met.
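The example-minimizing idea can be sketched with the Hoeffding bound, the concentration inequality used in VFDT-style stream learners: after n i.i.d. examples, the empirical mean of a quantity with range R is within epsilon of its true mean with probability 1 - delta, where epsilon = sqrt(R^2 ln(1/delta) / 2n). The learner therefore reads only as many stream examples as each decision needs. The parameter values below are illustrative:

```python
import math

def hoeffding_epsilon(R, delta, n):
    # Hoeffding bound: with probability >= 1 - delta, the observed mean of
    # n i.i.d. samples of a range-R quantity is within epsilon of the truth.
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def examples_needed(R, delta, target_eps):
    # Smallest n for which hoeffding_epsilon(R, delta, n) <= target_eps:
    # the learner can stop consuming the stream for this decision at n.
    return math.ceil(R * R * math.log(1.0 / delta) / (2.0 * target_eps ** 2))

n = examples_needed(R=1.0, delta=1e-6, target_eps=0.05)
print(n)  # examples to read before committing to this learning step
```

Because n depends only on R, delta, and the target accuracy, the cost of each decision is fixed regardless of how fast the stream arrives, which is what makes the approach open-ended.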
We seek to apply natural language processing, information extraction, and machine learning methods to build semantic representations of individual texts and of large corpora such as the WWW.
How can a computer accumulate a massive body of knowledge? What will web search engines look like in 10 years?
To address these questions, the Open Information Extraction (Open IE) project has been developing a web-scale information extraction system that reads arbitrary text from any domain on the web, extracts meaningful information, and stores it in a unified knowledge base for efficient querying. In contrast to traditional information extraction, the Open IE paradigm attempts to overcome the knowledge acquisition bottleneck by extracting a large number of relations at once.
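As a toy illustration of the output format only: Open IE systems emit (argument, relation, argument) triples without a predefined relation vocabulary. Real extractors are learned from data, not hand-written; the regex below is a stand-in that matches one simple sentence shape:

```python
import re

# Hypothetical toy pattern: "CapitalizedPhrase <relation phrase> Capitalized".
# A real Open IE extractor learns to identify relation phrases; this regex
# only recognizes a few fixed ones, purely for demonstration.
PATTERN = re.compile(
    r"([A-Z][\w ]*?)\s+(is the \w+ of|was born in|invented)\s+([A-Z]\w*)"
)

def extract(sentence):
    # Returns an (arg1, relation, arg2) triple, or None if nothing matches.
    m = PATTERN.search(sentence)
    return m.groups() if m else None

print(extract("Paris is the capital of France"))
```

A knowledge base accumulates millions of such triples, which can then be queried efficiently (e.g. "what is the capital of France?").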
Scaling existing translation technology to all language pairs in the world is not feasible, due to the lack of aligned parallel corpora and other resources needed by statistical machine translation algorithms. This project seeks to combine all of the world's existing translation dictionaries into a single resource, a translation graph, and to perform probabilistic inference on this graph to automatically infer translations between language pairs for which no dictionary exists.
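One way to picture the inference is the sketch below: words are nodes, dictionary entries are weighted edges, and a missing language pair is bridged through a pivot language. The words are real translations, but the edge probabilities and the scoring rule (product of edge probabilities along an acyclic path) are illustrative assumptions, not the project's actual model:

```python
# Hypothetical dictionaries: en->es and es->fr exist, en->fr does not.
# Each node is a (language, word) pair; edge weights are made-up confidences.
graph = {
    ("en", "dog"): [(("es", "perro"), 0.95)],
    ("es", "perro"): [(("fr", "chien"), 0.9)],
}

def infer(src, tgt_lang, graph, p=1.0):
    # Depth-first search (graph assumed acyclic here); a candidate's score
    # is the product of edge probabilities along the best path to it.
    results = {}
    for nbr, w in graph.get(src, []):
        if nbr[0] == tgt_lang:
            results[nbr[1]] = max(results.get(nbr[1], 0.0), p * w)
        else:
            for word, q in infer(nbr, tgt_lang, graph, p * w).items():
                results[word] = max(results.get(word, 0.0), q)
    return results

print(infer(("en", "dog"), "fr", graph))  # en->fr inferred via the es pivot
```

Multiplying probabilities captures the key difficulty: long pivot chains can drift across word senses, so candidates must be scored, not merely reached.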
The Spoken Networks project studies how real-world, face-to-face social behavior can be measured and modeled in ways that simultaneously protect privacy and provide new insight into the dynamics of human social behavior.
Intelligent agents must function in a world that is characterized by high uncertainty and missing information, and by a rich structure of objects, classes, and relations. Current AI systems are, for the most part, able to handle one of these issues but not both. Overcoming this limitation will lay the foundation for the next generation of AI, bringing it significantly closer to human-level performance on the hardest problems. In particular, learning algorithms almost invariably assume that all training examples are mutually independent, when in reality examples often have complex relations among them.
Hierarchical Matching Pursuit uses sparse coding to learn codebooks at each layer in an unsupervised way, and then builds hierarchical feature representations from the learned codebooks. It achieves state-of-the-art results on many types of recognition tasks.
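The core sparse-coding step can be sketched as plain matching pursuit: greedily pick the codebook atom most correlated with the current residual and subtract its contribution. The tiny orthonormal codebook below is a stand-in for a learned one, and real systems use more refined variants (e.g. orthogonal matching pursuit):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, atoms, n_iters):
    # atoms: list of unit-norm codebook vectors.
    # Greedily approximate x as a sparse combination of atoms.
    residual = list(x)
    code = [0.0] * len(atoms)
    for _ in range(n_iters):
        scores = [dot(a, residual) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        code[k] += scores[k]  # record the chosen atom's coefficient
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return code, residual

# Toy codebook: the standard basis (trivially orthonormal).
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
code, res = matching_pursuit([3.0, 0.0, -1.0, 0.0], atoms, 2)
print(code)  # sparse code; with this codebook the residual is zero
```

Stacking this step, pooling the codes over image regions, and repeating with a second-layer codebook yields the hierarchical representations used for recognition.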
The RGB-D Object Dataset is a large dataset of 300 common household objects. The objects are organized into 51 categories arranged using WordNet hypernym-hyponym relationships (similar to ImageNet). This dataset was recorded using a Kinect-style 3D camera that records synchronized and aligned 640x480 RGB and depth images at 30 Hz.
In this project we address joint object category, instance, and pose recognition in the context of rapid advances in RGB-D cameras, which combine visual and 3D shape information. The focus is on detection and classification of objects in indoor scenes, such as domestic environments.
We introduce an approach for identifying objects based on natural language descriptions containing appearance and name attributes.
Kernel descriptors are a general approach to extracting multi-level representations from high-dimensional structured data such as images, depth maps, and 3D point clouds.