Michael J. Cafarella
Email: username mjc, found at cs dot washington dot edu.
Physical mail:
Mike Cafarella
University of Washington
Department of Computer Science and Engineering
Box 352350
Seattle, WA 98195-2350
Office: 482 Allen Center
I am a 5th-year graduate student at the Department of Computer Science and Engineering at the University of Washington. My research interests are databases, information retrieval and extraction, and machine learning/data mining. My recent work has focused on recovering and querying structured data found on the Web.
My advisors are Oren Etzioni and Dan Suciu. I've collaborated with many fellow students, most recently with Michele Banko, Chris Re, and Nodira Khoussainova. I have also completed two research projects at Google with Alon Halevy.
I've earned degrees from Brown and the University of Edinburgh, Scotland.
Before grad school, I worked at Marimba (later bought by BMC) and Tellme Networks (later bought by Microsoft). I also co-founded the Nutch and Hadoop open source search projects with Doug Cutting (but the demands of grad school have finally pushed me to emeritus status).
Publications
2008
- Data Management Projects at Google. Michael Cafarella, Edward Chang, Andrew Fikes, Alon Halevy, Wilson Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan. SIGMOD Record, 37(1), 2008.
- Ontology-driven, Unsupervised Instance Population. Luke K. McDowell and Michael Cafarella. To appear, Journal of Web Semantics, 2008.
- Uncovering the Relational Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of the Eleventh International Workshop on the Web and Databases (WebDB), June 2008. Vancouver, Canada.
- WebTables: Exploring the Power of Tables on the Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of VLDB 2008, August 2008. Auckland, New Zealand.
2007
- Navigating Extracted Data with Schema Discovery. Michael J. Cafarella, Dan Suciu, Oren Etzioni. Proceedings of the Tenth International Workshop on the Web and Databases (WebDB), June 2007. Beijing, China.
- Structured Querying of Web Text: A Technical Challenge. Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko. Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2007. Asilomar, CA.
- Open Information Extraction from the Web. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), January 2007. Hyderabad, India.
2006
2005
- KnowItNow: Fast, Scalable Information Extraction from the Web. Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2005. Vancouver, Canada.
- A Search Engine for Natural Language Applications. Michael J. Cafarella, Oren Etzioni. Proceedings of the 14th International World Wide Web Conference (WWW 2005).
- Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Artificial Intelligence, 165, pp. 91-134, 2005.
2004
- Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Proceedings of AAAI 2004.
- Web-scale Information Extraction in KnowItAll. Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Proceedings of the 13th International World Wide Web Conference (WWW 2004).
- Building Nutch: Open Source Search. Mike Cafarella, Doug Cutting. ACM Queue, 2(2), April 2004.
Teaching
I TA'ed CSE454, Advanced Internet and Web Services, in Winter '04 and Autumn '06. I really enjoyed helping to teach this class; if you're a UW student, give it a shot.
Personal
Last modified: April 17, 2008