The Intelligence in Wikipedia Project
University of Washington Department of Computer Science & Engineering

Project faculty
 Daniel Weld
Project students
 Fei Wu
 Raphael Hoffmann
 Eytan Adar


Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem: applications need structured data, but there is little incentive to create structured data until applications exist to use it. This problem is best solved by bootstrapping, that is, by creating enough structured data to motivate the development of applications. We believe that autonomously "semantifying Wikipedia" is the best way to do so. We chose Wikipedia as the initial data source because it is comprehensive, not too large, of high quality, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. Specifically, Wikipedia contains infoboxes, taxonomic data, multilingual correspondences, link structure, edit history, and other features that greatly simplify extraction.
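
As a rough illustration of how infoboxes can seed a self-supervised process, the sketch below pairs each infobox attribute with the article sentences that mention its value, yielding labeled examples without manual annotation. It is a minimal sketch under stated assumptions, not the project's actual extractor: the function names `parse_infobox` and `label_sentences`, the simplified one-attribute-per-line infobox syntax, and the case-insensitive substring match are all hypothetical choices made for illustration.

```python
import re


def parse_infobox(wikitext):
    """Collect attribute/value pairs from simple '| key = value' lines.

    Assumption: one attribute per line and no nested templates.
    """
    pairs = {}
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.+?)\s*$", line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def label_sentences(article_text, infobox):
    """Pair each attribute with every sentence that mentions its value.

    The match is a case-insensitive substring test, a deliberately
    naive heuristic chosen to keep the sketch short.
    """
    sentences = re.split(r"(?<=[.!?])\s+", article_text.strip())
    examples = []
    for sentence in sentences:
        for attribute, value in infobox.items():
            if value.lower() in sentence.lower():
                examples.append((attribute, sentence))
    return examples


# Hypothetical sample data, not taken from a real Wikipedia dump.
SAMPLE_INFOBOX = """{{Infobox settlement
| name = Seattle
| state = Washington
| founded = 1851
}}"""
SAMPLE_TEXT = "Seattle is a city in Washington. It was founded in 1851. It rains often."

if __name__ == "__main__":
    infobox = parse_infobox(SAMPLE_INFOBOX)
    for attribute, sentence in label_sentences(SAMPLE_TEXT, infobox):
        print(attribute, "->", sentence)
```

On the sample data, the first sentence is labeled for both `name` and `state`, the second for `founded`, and the third is left unlabeled. Pairs of this kind could then serve as training data for a per-attribute extractor; a realistic system would need a more robust matching heuristic than a substring test.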

The Intelligence in Wikipedia Project aims to accelerate the extraction of knowledge from Wikipedia, for example by constructing infoboxes, and to link the resulting schemata together to form a knowledge base of outstanding size. Not only will this "semantified Wikipedia" be an even more valuable resource for AI, but it will also support faceted browsing and simple forms of inference that may increase the recall of question-answering systems.

Research Activities


  • Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, Daniel S. Weld. Amplifying Community Content Creation Using Mixed-Initiative Information Extraction. In CHI 2009, Boston, USA, April 2009. [pdf] Best Paper Nominee
  • Daniel S. Weld, Raphael Hoffmann, Fei Wu. Using Wikipedia to Bootstrap Open Information Extraction. ACM SIGMOD Record, December 2008. [pdf]
  • Etzioni, O., Banko, M., Soderland, S., and Weld, D. Open Information Extraction from the Web. Communications of the ACM 51(12), December 2008. [pdf]
  • Stefan Schoenmackers, Oren Etzioni, and Daniel Weld. Scaling Textual Inference to the Web. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), Honolulu, Hawaii, October 2008. [pdf]
  • Daniel S. Weld, Fei Wu, Eytan Adar, Saleema Amershi, James Fogarty, Raphael Hoffmann, Kayur Patel, Michael Skinner. Intelligence in Wikipedia. In the 23rd AAAI Conference (AAAI-08), Chicago, USA, July 2008. [pdf]
  • Fei Wu, Raphael Hoffmann, Daniel S. Weld. Information Extraction from Wikipedia: Moving Down the Long Tail. In the 14th International Conference on Knowledge Discovery & Data Mining (KDD-08), Las Vegas, USA, August 2008. [pdf]
  • Fei Wu, Daniel S. Weld. Automatically Refining the Wikipedia Infobox Ontology. In the 17th International World Wide Web Conference (WWW-08), Beijing, China, April 2008. [pdf] Best Student Paper Nominee
  • Fei Wu, Daniel S. Weld. Autonomously Semantifying Wikipedia. In the Sixteenth Conference on Information and Knowledge Management (CIKM-07), Lisbon, Portugal, November 2007. [pdf] Awarded Best Paper

Other Wikipedia-Related Work at UW

  • Open information extraction using TextRunner [demo]
  • Borning, A., B. Friedman, J. Davis, B. Gill, P. Kahn, T. Kriplean, and P. Lin. Laying the Foundations for Public Participation and Value Advocacy: Interaction Design for a Large-Scale Urban Simulation. To appear in Proceedings of the 9th Annual International Conference on Digital Government Research (DGO '08).
  • Beschastnikh, I., T. Kriplean, and D.W. McDonald. Wikipedian Self-Governance in Action: Motivating the Policy Lens. Proceedings of the 2008 AAAI International Conference on Weblogs and Social Media (ICWSM '08).
  • Kriplean, T., I. Beschastnikh, D.W. McDonald, and S. Golder. Community, Consensus, Coercion, Control: CS*W or How Policy Mediates Mass Participation. Proceedings of the 2007 ACM Conference on Supporting Group Work (GROUP '07).


Turing Center
The Turing Center is a multidisciplinary research center at the University of Washington, investigating problems at the crossroads of natural language processing, data mining, Web search, and the Semantic Web.
DUB
DUB is an alliance of faculty and students across the University of Washington exploring Human-Computer Interaction and Design.

Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to Fei Wu or Raphael Hoffmann]

Last updated: March 6, 2008