Oren Etzioni

Machine Reading

We seek to apply natural language processing, information extraction, and machine learning methods to build semantic representations of individual texts and of large corpora such as the WWW.

Demos

Open Information Extraction

How can a computer accumulate a massive body of knowledge? What will Web search engines look like in ten years?


To address these questions, the Open IE project has been developing a Web-scale information extraction system that reads arbitrary text from any domain on the Web, extracts meaningful information, and stores it in a unified knowledge base for efficient querying. In contrast to traditional information extraction, the Open Information Extraction paradigm attempts to overcome the knowledge acquisition bottleneck by extracting a large number of relations at once.
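As a rough illustration of the pipeline described above (read text, extract relations, store them for querying), here is a minimal sketch in Python. It is not the Open IE project's method: the regular expression, the `extract_triples` function, and the `KnowledgeBase` class are all hypothetical, and the pattern only handles toy sentences of the form "Capitalized-argument lowercase-relation-phrase Capitalized-argument". The point is only that the relation phrase is taken from the text itself rather than from a fixed, hand-specified list.

```python
import re
from collections import defaultdict

# Hypothetical toy pattern: <Arg1> <relation phrase> <Arg2>, where arguments are
# runs of capitalized words and the relation is a run of lowercase words.
# Real systems rely on much richer linguistic analysis; this is illustrative only.
TRIPLE = re.compile(
    r"(?P<arg1>[A-Z]\w*(?: [A-Z]\w*)*) "
    r"(?P<rel>[a-z]+(?: [a-z]+)*) "
    r"(?P<arg2>[A-Z]\w*(?: [A-Z]\w*)*)"
)


def extract_triples(text):
    """Return (arg1, relation, arg2) tuples for every pattern match in text."""
    return [(m["arg1"], m["rel"], m["arg2"]) for m in TRIPLE.finditer(text)]


class KnowledgeBase:
    """Hypothetical store that indexes extracted pairs by relation phrase."""

    def __init__(self):
        self.by_relation = defaultdict(set)

    def add(self, triple):
        arg1, rel, arg2 = triple
        self.by_relation[rel].add((arg1, arg2))

    def query(self, rel):
        """Return all (arg1, arg2) pairs seen with this relation, sorted."""
        return sorted(self.by_relation.get(rel, ()))


kb = KnowledgeBase()
for triple in extract_triples(
    "Seattle is located in Washington. Einstein was born in Ulm."
):
    kb.add(triple)

print(kb.query("was born in"))
```

Note that no relation names are declared anywhere: "is located in" and "was born in" both come straight from the input sentences, which is the sense in which many relations are extracted in one pass.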

Panlingual Translation

Scaling existing translation technology to all language pairs in the world is not feasible because of the lack of aligned parallel corpora and other resources that statistical machine translation algorithms need. This project seeks to combine all existing translation dictionaries into a single resource, a translation graph, and to perform probabilistic inference on this graph to automatically infer translations between language pairs for which no dictionary exists.
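The translation-graph idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration, not the project's actual algorithm: the `TranslationGraph` class, the per-entry confidence values, and the scoring rule, which treats each two-hop path through an intermediate word as independent evidence and combines paths with a noisy-OR.

```python
from collections import defaultdict


class TranslationGraph:
    """Hypothetical graph whose nodes are (language, word) pairs."""

    def __init__(self):
        # edges[node][neighbor] = assumed confidence that the entry is correct
        self.edges = defaultdict(dict)

    def add_entry(self, a, b, prob):
        """Record a bilingual dictionary entry as an undirected edge."""
        self.edges[a][b] = prob
        self.edges[b][a] = prob

    def infer(self, source, target_lang):
        """Score target-language words reachable in two hops (noisy-OR)."""
        miss = defaultdict(lambda: 1.0)  # chance that every path is wrong
        for pivot, p1 in self.edges[source].items():
            for cand, p2 in self.edges[pivot].items():
                if cand[0] == target_lang and cand != source:
                    miss[cand] *= 1.0 - p1 * p2
        return {cand: 1.0 - m for cand, m in miss.items()}


g = TranslationGraph()
# Made-up entries from English-French, French-Spanish, English-Dutch,
# and Dutch-Spanish dictionaries; no English-Spanish dictionary is given.
g.add_entry(("en", "spring"), ("fr", "printemps"), 0.9)
g.add_entry(("en", "spring"), ("fr", "ressort"), 0.9)
g.add_entry(("fr", "printemps"), ("es", "primavera"), 0.9)
g.add_entry(("fr", "ressort"), ("es", "muelle"), 0.9)
g.add_entry(("en", "spring"), ("nl", "lente"), 0.8)
g.add_entry(("nl", "lente"), ("es", "primavera"), 0.8)

scores = g.infer(("en", "spring"), "es")
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 3))
```

In this toy data, "primavera" is supported by two independent paths (via French and via Dutch) and so scores higher than "muelle", which is reached by a single path through the polysemous French word "ressort". Handling such word-sense ambiguity is the main reason a probabilistic treatment is needed rather than simply chaining dictionaries.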


KnowItAll

  1. How can a computer accumulate a massive body of knowledge?
  2. What will Web search engines look like in ten years?

To address the questions above, the KnowItAll project has been developing a variety of domain-independent systems that extract information from the Web in an autonomous, scalable manner.