If we knew what it was we were doing, it would not be called research, would it?
-- Albert Einstein


Current Projects

  • Open Information Extraction: We hope to overcome the "knowledge-acquisition bottleneck" by automatically extracting information from natural language text in a domain-independent manner. We work on improving the quality of Open IE extractors by raising both their precision and their recall. A recent paper on this work.
  • NLP over Microblogs: Micro-blogging sites such as Twitter have exploded in popularity in recent years. Tweets often represent the most up-to-date information and "buzz" on a vast spectrum of topics; however, their sheer number adds to an already huge information overload. We recently released a suite of NLP tools for tweets. We are currently designing automated information extraction systems over Twitter. A recent paper and a demo of an automatically generated calendar of events.
  • AI Applications to Crowd-sourcing: Crowd-sourcing has taken the business world by storm in the last few years. Although it is touted as "Artificial Artificial Intelligence", there are huge opportunities for AI to contribute to its success. A vision paper describes our approach to this synergy. We have investigated decision-theoretic techniques to automatically control workflows on a crowd-sourcing platform such as Amazon's Mechanical Turk, and have obtained significant quality improvements for the same price. Recent papers on this work: Paper 1 and Paper 2.
  • Large-scale Probabilistic Planning: Solving large Markov Decision Processes (MDPs) by combining several optimal and approximate techniques. We hope to alleviate the memory bottleneck in solving large MDPs and to scale to large, industry-sized probabilistic planning problems. Some recent papers on this work: Paper 1 and Paper 2. Our planner, Glutton, was the runner-up in the 2011 International Probabilistic Planning Competition.
  • Commonsense Knowledge Extraction: Automatically creating corpora of commonsense knowledge by reasoning over information extracted from the Web. We are currently building a large repository of selectional preferences for the different arguments taken by verbs in a sentence. We are also trying to infer many meta-properties of relations present in natural language text. A recent demo on selectional preferences and recent papers, Paper 1 and Paper 2.

Past Projects

  • Half-Open Information Extraction: Open Information Extraction, while a scalable paradigm, suffers from the drawback that it does not normalize its extractions against a domain schema. Our recent work explores the middle ground between completely open and completely closed variants of IE to leverage the benefits of both. An article on this work.
  • Formal Inference in Translation Graph: Developing probabilistic techniques to formalize inference in a translation graph, a graph formed by combining all available dictionaries between all possible languages in the world. An efficient and high-quality inference procedure will enable the system to produce good translations of a word sense from one language into several others, even when no dictionary is available for that exact pair of languages. Try our demo at the Panimages website. A journal paper on this work and the AAAI Nectar version.
  • Open Information Extraction over News: A relation-independent question-answering system over thousands of current news articles. We apply Textrunner information extraction technology as well as news-specific heuristics to construct a massive knowledge base of current events. This information can be queried by asking specific questions or by keyword search.
  • Hybridizing Planners: A fast but suboptimal planner may be hybridized with a slow but optimal one to yield a high-quality, anytime planner that solves problems in an intermediate amount of time. We developed HybPlan, a planner that hybridizes GPT and MBP for probabilistic planning.
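
The hybridization idea in the last item can be illustrated with a small sketch. This is not HybPlan itself, and it does not reproduce GPT or MBP: the toy MDP, the always-"safe" fallback policy, and every function name below are assumptions made for illustration. The slow planner is plain value iteration run one sweep at a time; the hybrid policy uses its greedy action wherever the Bellman residual is already small and falls back to the fast policy everywhere else, so a usable policy is available at any time.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical toy MDP (illustration only): states 0..2 plus goal state 3.
# TRANSITIONS[state][action] = list of (probability, next_state, cost).
GOAL = 3
TRANSITIONS: Dict[int, Dict[str, List[Tuple[float, int, float]]]] = {
    0: {"safe": [(1.0, 1, 2.0)], "risky": [(0.5, 2, 1.0), (0.5, 0, 1.0)]},
    1: {"safe": [(1.0, 2, 2.0)], "risky": [(0.5, 3, 1.0), (0.5, 1, 1.0)]},
    2: {"safe": [(1.0, 3, 2.0)], "risky": [(0.9, 3, 1.0), (0.1, 2, 1.0)]},
}


def fast_policy(state: int) -> str:
    """Stand-in for a fast, suboptimal planner: always reaches the goal."""
    return "safe"


def q_value(values: Dict[int, float], state: int, action: str) -> float:
    """Expected cost of taking `action` in `state`, then following `values`."""
    return sum(p * (cost + values[nxt])
               for p, nxt, cost in TRANSITIONS[state][action])


def value_iteration_sweep(values: Dict[int, float]) -> Dict[int, float]:
    """One in-place Bellman backup per state; returns per-state residuals."""
    residuals = {}
    for state, actions in TRANSITIONS.items():
        best = min(q_value(values, state, a) for a in actions)
        residuals[state] = abs(best - values[state])
        values[state] = best
    return residuals


def hybrid_policy(values: Dict[int, float], residuals: Dict[int, float],
                  epsilon: float = 1e-3) -> Dict[int, str]:
    """Greedy action where the slow planner has converged, fallback elsewhere."""
    policy = {}
    for state, actions in TRANSITIONS.items():
        if residuals.get(state, inf) <= epsilon:
            policy[state] = min(actions, key=lambda a: q_value(values, state, a))
        else:
            policy[state] = fast_policy(state)
    return policy


if __name__ == "__main__":
    values = {state: 0.0 for state in range(GOAL + 1)}  # goal cost stays 0
    residuals: Dict[int, float] = {}
    for sweep in range(0, 31, 10):
        print(sweep, hybrid_policy(values, residuals))
        for _ in range(10):
            residuals = value_iteration_sweep(values)
```

Before any sweep has run, the hybrid policy is just the fallback policy; as the sweeps accumulate, more states switch to the value-iteration action. Stopping early therefore still returns a policy that reaches the goal, which is the anytime behaviour the item describes.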