PEDRO DOMINGOS
Associate Professor
Address:
Dept. of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
Telephone: (206) 543-4229
Fax: (206) 543-2969
Email:
pedrod at cs dot washington dot edu
Office: 648 Allen Center
Research Interests
My main research interests are in the fields of machine learning and data
mining. I'd like to make computers do more with less help from us, learn from
experience, adapt effortlessly, and discover new knowledge. We need computers
that reduce the information overload by extracting the important patterns from
masses of data. This poses many deep and fascinating scientific problems: How
can a computer decide autonomously which representation is best for the target
knowledge? How can it tell genuine regularities from chance occurrences? How
can pre-existing knowledge be exploited? How can a computer learn with limited
computational resources? How can learned results be made understandable to us?
My research addresses these and related questions. Research topics that I'm
working on, or have recently worked on, include:
- Learning concepts represented by sets of rules
- Using examples as implicit definitions of concepts
- Using probabilistic representations and analyses to address the
uncertainty inherent in learning
- Automating the process of selecting representations for concepts
- Learning several models and combining them to improve accuracy
and stability
- Evaluating and selecting candidate models to avoid "overfitting"
(i.e., to distinguish between genuine regularities and chance occurrences)
- Learning models that can be easily understood by people
- Using pre-existing knowledge to guide and improve learning
- Developing knowledge discovery algorithms that run in linear or
near-linear time, and so scale up to large databases
- Using subsampling techniques to scale up pre-existing approaches
- Developing algorithms that take into account the costs of decisions
- Understanding the probabilistic properties and foundations of data
mining algorithms
- Developing techniques for mining semi-structured data sources
(e.g., text, the Web)
Current Projects
Brief Bio
I received an undergraduate degree (1988) and M.S. in Electrical
Engineering and Computer Science (1992) from IST, in Lisbon. I
received an M.S. (1994) and Ph.D. (1997) in Information and Computer
Science from the University of California at Irvine. I spent two years
as an assistant professor at IST before joining the faculty of the
University of Washington in 1999. I'm the author or co-author of over
150 technical publications in machine learning, data mining, and other
areas. I'm a member of the editorial board of the Machine Learning
journal and the advisory board of JAIR, and a co-founder of the
International Machine Learning Society. I was program co-chair of
KDD-2003, and I've served on the program committees of AAAI, ICML,
IJCAI, KDD, SIGMOD, WWW, and others. I've received a Sloan Fellowship,
an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award,
two KDD best paper awards, and other distinctions.
Current Students
Current Post-Docs
Alumni
- Corin Anderson,
Software Engineer, Google, Inc.
- AnHai Doan,
Assistant Professor, University of Illinois at Urbana-Champaign.
(Winner of the 2003 ACM Distinguished Dissertation Award.)
- Geoff Hulten,
Researcher, Microsoft Corp.
- Tessa Lau,
Research Staff Member, IBM Almaden Research Center.
- Matt Richardson,
Researcher, Microsoft Research.
Software
- Alchemy:
Statistical relational AI.
- BVD:
Bias-variance decomposition for zero-one loss.
- NBE:
Bayesian learner with very fast inference.
- RISE:
Unified rule- and instance-based learner.
- VFML:
Toolkit for mining massive data sources.
Ph.D. Dissertation
Selected Book Chapters
-
What's Missing in AI: The Interface Layer. In P. Cohen (ed.), Artificial
Intelligence: The First Hundred Years. Menlo Park, CA: AAAI Press. To appear.
-
Markov Logic, with various coauthors. In L. De Raedt, P. Frasconi,
K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic
Programming (pp. 92-117), 2008. New York: Springer.
-
Markov Logic: A Unifying Framework for Statistical Relational Learning,
with Matt Richardson. In L. Getoor and B. Taskar (eds.), Introduction to
Statistical Relational Learning (pp. 339-371), 2007. Cambridge, MA: MIT Press.
-
Combining Link and Content Information in Web Search, with Matt
Richardson. In M. Levene and A. Poulovassilis (eds.), Web Dynamics
(pp. 179-193), 2004. New York: Springer.
-
Ontology Matching: A Machine Learning Approach, with AnHai Doan,
Jayant Madhavan and Alon Halevy. In S. Staab and R. Studer (eds.),
Handbook on Ontologies in Information Systems (pp. 385-403), 2004.
New York: Springer.
-
Machine Learning. In W. Klosgen and J. Zytkow (eds.), Handbook of
Data Mining and Knowledge Discovery (pp. 660-670), 2002. New York:
Oxford University Press.
-
Learning Repetitive Text-Editing Procedures with SMARTedit, with
Tessa Lau, Steve Wolfman and Dan Weld. In H. Lieberman (ed.), Your Wish
Is My Command: Giving Users the Power to Instruct their Software
(pp. 209-225), 2001. San Francisco, CA: Morgan Kaufmann.
Selected Journal Papers
-
Structured Machine Learning: Ten Problems for the Next Ten Years (Section 5 in
Structured Machine Learning: The Next Ten Years). Machine Learning.
To appear.
-
Toward Knowledge-Rich Data Mining (position paper). Data Mining and
Knowledge Discovery, 15, 21-28, 2007.
-
Markov Logic Networks, with Matt Richardson. Machine Learning, 62,
107-136, 2006.
-
Mining Social Networks for Viral Marketing (short paper). IEEE Intelligent
Systems, 20(1), 80-82, 2005.
-
Learning to Match Ontologies on the Semantic Web, with AnHai Doan,
Jayant Madhavan, Robin Dhamankar and Alon Halevy. VLDB Journal, 12(4),
303-319, 2003.
-
Programming by Demonstration Using Version Space Algebra, with Tessa Lau,
Steve Wolfman and Dan Weld. Machine Learning, 53, 111-156, 2003.
-
Tree Induction for Probability-Based Ranking, with Foster Provost.
Machine Learning, 52, 199-216, 2003.
-
Learning to Match the Schemas of Data Sources: A Multistrategy
Approach, with AnHai Doan and Alon Halevy. Machine Learning, 50,
279-301, 2003.
-
A General Framework for Mining Massive Data Streams, with Geoff Hulten
(short paper). Journal of Computational and Graphical Statistics, 12, 2003.
-
Prospects and Challenges for Multi-Relational Data Mining (position
paper). SIGKDD Explorations, 5, 80-83, 2003.
-
The Role of Occam's Razor in Knowledge Discovery.
Data Mining and Knowledge Discovery, 3, 409-425, 1999.
-
Knowledge Discovery Via Multiple Models.
Intelligent Data Analysis, 2, 187-202, 1998.
-
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss,
with Michael Pazzani. Machine Learning, 29, 103-130, 1997.
-
Context-Sensitive Feature Selection for Lazy Learners.
Artificial Intelligence Review, 11, 227-253, 1997.
-
Unifying Instance-Based and Rule-Based Induction.
Machine Learning, 24, 141-168, 1996.
-
Two-Way Induction.
International Journal on Artificial Intelligence Tools, 5, 113-125, 1996.
Selected Conference Papers
-
Learning Arithmetic Circuits, with Daniel Lowd. Proceedings of the
Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (pp.
383-392), 2008. Helsinki, Finland: AUAI Press.
-
Lifted First-Order Belief Propagation, with Parag Singla. Proceedings
of the Twenty-Third AAAI Conference on Artificial Intelligence (pp. 1094-1099),
2008. Chicago, IL: AAAI Press.
-
Hybrid Markov Logic Networks, with Jue Wang. Proceedings of the
Twenty-Third AAAI Conference on Artificial Intelligence (pp. 1106-1111), 2008.
Chicago, IL: AAAI Press.
-
A General Method for Reducing the Complexity of Relational Inference and its
Application to MCMC, with Hoifung Poon and Marc Sumner. Proceedings of the
Twenty-Third AAAI Conference on Artificial Intelligence (pp. 1075-1080),
2008. Chicago, IL: AAAI Press.
-
Deep Transfer via Second-Order Markov Logic, with Jesse Davis. Proceedings
of the AAAI-2008 Workshop on Transfer Learning for Complex Tasks (pp. 13-18),
2008. Chicago, IL: AAAI Press.
-
Extracting Semantic Networks from Text via Relational Clustering,
with Stanley Kok. Proceedings of the Nineteenth European Conference on
Machine Learning. Antwerp, Belgium: Springer. To appear.
-
Efficient Weight Learning for Markov Logic Networks, with Daniel Lowd.
Proceedings of the Eleventh European Conference on Principles and Practice of
Knowledge Discovery in Databases (pp. 200-211), 2007. Warsaw, Poland:
Springer.
-
Markov Logic in Infinite Domains, with Parag Singla. Proceedings of
the Twenty-Third Conference on Uncertainty in Artificial Intelligence
(pp. 368-375), 2007. Vancouver, Canada: AUAI Press.
-
Joint Inference in Information Extraction, with Hoifung Poon. Proceedings
of the Twenty-Second National Conference on Artificial Intelligence (pp.
913-918), 2007. Vancouver, Canada: AAAI Press.
-
Statistical Predicate Invention, with Stanley Kok. Proceedings of the
Twenty-Fourth International Conference on Machine Learning (pp. 433-440),
2007. Corvallis, OR: ACM Press.
-
Recursive Random Fields, with Daniel Lowd. Proceedings of the Twentieth
International Joint Conference on Artificial Intelligence (pp. 950-955), 2007.
Hyderabad, India: AAAI Press.
-
Entity Resolution with Markov Logic, with Parag Singla. Proceedings of
the Sixth IEEE International Conference on Data Mining (pp. 572-582), 2006.
Hong Kong: IEEE Computer Society Press.
-
Unifying Logical and Statistical AI, with various coauthors. Proceedings
of the Twenty-First National Conference on Artificial Intelligence (pp. 2-7),
2006. Boston, MA: AAAI Press.
-
Sound and Efficient Inference with Probabilistic and Deterministic
Dependencies, with Hoifung Poon. Proceedings of the Twenty-First
National Conference on Artificial Intelligence (pp. 458-463), 2006.
Boston, MA: AAAI Press.
-
Memory-Efficient Inference in Relational Domains, with Parag Singla.
Proceedings of the Twenty-First National Conference on Artificial
Intelligence (pp. 488-493), 2006. Boston, MA: AAAI Press.
-
Object Identification with Attribute-Mediated Dependences, with Parag
Singla. Proceedings of the Ninth European Conference on Principles and
Practice of Knowledge Discovery in Databases (pp. 297-308), 2005. Porto,
Portugal: Springer. Winner of the Best Paper Award.
-
Learning the Structure of Markov Logic Networks, with Stanley Kok.
Proceedings of the Twenty-Second International Conference on Machine
Learning (pp. 441-448), 2005. Bonn, Germany: ACM Press.
-
Naive Bayes Models for Probability Estimation, with Daniel Lowd.
Proceedings of the Twenty-Second International Conference on Machine
Learning (pp. 529-536), 2005. Bonn, Germany: ACM Press.
-
Discriminative Training of Markov Logic Networks, with Parag Singla.
Proceedings of the Twentieth National Conference on Artificial Intelligence
(pp. 868-873), 2005. Pittsburgh, PA: AAAI Press.
-
Markov Logic: A Unifying Framework for Statistical Relational Learning,
with Matt Richardson. Proceedings of the ICML-2004 Workshop on Statistical
Relational Learning and its Connections to Other Fields (pp. 49-54), 2004.
Banff, Canada: IMLS.
-
Multi-Relational Record Linkage, with Parag Singla. Proceedings of the
KDD-2004 Workshop on Multi-Relational Data Mining (pp. 31-48), 2004.
Seattle, WA: ACM Press.
-
Adversarial Classification, with Nilesh Dalvi, Mausam, Sumit Sanghai
and Deepak Verma. Proceedings of the Tenth International Conference on
Knowledge Discovery and Data Mining (pp. 99-108), 2004. Seattle, WA: ACM Press.
-
Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood,
with Dan Grossman. Proceedings of the Twenty-First International Conference on
Machine Learning (pp. 361-368), 2004. Banff, Canada: ACM Press.
-
iMAP: Discovering Complex Semantic Matches between Database Schemas,
with Robin Dhamankar, Yoonkyong Lee, AnHai Doan and Alon Halevy. Proceedings
of the 2004 ACM SIGMOD International Conference on Management of Data
(pp. 383-394), 2004. Paris, France: ACM Press.
-
Building Large Knowledge Bases by Mass Collaboration, with Matt
Richardson. Proceedings of the Second International Conference on
Knowledge Capture (pp. 129-137), 2003. Sanibel Island, FL: ACM Press.
-
Learning Programs from Traces Using Version Space Algebra, with Tessa Lau
and Dan Weld. Proceedings of the Second International Conference on
Knowledge Capture (pp. 36-43), 2003. Sanibel Island, FL: ACM Press.
-
Trust Management for the Semantic Web, with Matt Richardson and Rakesh
Agrawal. Proceedings of the Second International Semantic Web Conference
(pp. 351-368), 2003. Sanibel Island, FL: Springer.
-
Learning with Knowledge from Multiple Experts, with Matt Richardson.
Proceedings of the Twentieth International Conference on Machine Learning
(pp. 624-631), 2003. Washington, DC: Morgan Kaufmann.
-
Mining Massive Relational Databases, with Geoff Hulten and Yeuhi Abe.
Proceedings of the IJCAI-2003 Workshop on Learning Statistical Models from
Relational Data (pp. 53-60), 2003. Acapulco, Mexico: IJCAII.
-
Research on Statistical Relational Learning at the University of
Washington, with various coauthors. Proceedings of the IJCAI-2003
Workshop on Learning Statistical Models from Relational Data (pp. 43-47),
2003. Acapulco, Mexico: IJCAII.
-
Relational Markov Models and their Application to Adaptive Web
Navigation, with Corin Anderson and Dan Weld. Proceedings of the
Eighth International Conference on Knowledge Discovery and Data Mining
(pp. 143-152), 2002. Edmonton, Canada: ACM Press.
-
Mining Knowledge-Sharing Sites for Viral Marketing, with Matt
Richardson. Proceedings of the Eighth International Conference on
Knowledge Discovery and Data Mining (pp. 61-70), 2002. Edmonton,
Canada: ACM Press.
-
Mining Complex Models from Arbitrarily Large Databases in Constant
Time, with Geoff Hulten. Proceedings of the Eighth International
Conference on Knowledge Discovery and Data Mining (pp. 525-531),
2002. Edmonton, Canada: ACM Press.
-
Representing and Reasoning about Mappings between Domain Models,
with Jayant Madhavan, Phil Bernstein and Alon Halevy. Proceedings of the
Eighteenth National Conference on Artificial Intelligence (pp. 80-86), 2002.
Edmonton, Canada: AAAI Press.
-
Learning to Map between Ontologies on the Semantic Web, with AnHai Doan,
Jayant Madhavan and Alon Halevy. Proceedings of the Eleventh International
World Wide Web Conference (pp. 662-673), 2002. Honolulu, HI: ACM Press.
-
Learning from Infinite Data in Finite Time, with Geoff Hulten. Advances
in Neural Information Processing Systems 14 (pp. 673-680), 2002. Cambridge,
MA: MIT Press.
-
The Intelligent Surfer: Probabilistic Combination of Link and Content
Information in PageRank, with Matt Richardson. Advances in Neural
Information Processing Systems 14 (pp. 1441-1448), 2002. Cambridge, MA:
MIT Press.
-
Mining the Network Value of Customers, with Matt Richardson. Proceedings
of the Seventh International Conference on Knowledge Discovery and Data
Mining (pp. 57-66), 2001. San Francisco, CA: ACM Press.
-
Mining Time-Changing Data Streams, with Geoff Hulten and Laurie Spencer.
Proceedings of the Seventh International Conference on Knowledge Discovery
and Data Mining (pp. 97-106), 2001. San Francisco, CA: ACM Press.
-
Adaptive Web Navigation for Wireless Devices, with Corin Anderson and Dan
Weld. Proceedings of the Seventeenth International Joint Conference on
Artificial Intelligence (pp. 879-884), 2001. Seattle, WA: Morgan Kaufmann.
-
A General Method for Scaling Up Machine Learning Algorithms and its
Application to Clustering, with Geoff Hulten. Proceedings of the
Eighteenth International Conference on Machine Learning (pp. 106-113), 2001.
Williamstown, MA: Morgan Kaufmann.
-
Reconciling Schemas of Disparate Data Sources: A Machine-Learning
Approach, with AnHai Doan and Alon Halevy. Proceedings of the 2001 ACM
SIGMOD International Conference on Management of Data (pp. 509-520), 2001.
Santa Barbara, CA: ACM Press.
-
Personalizing Web Sites for Mobile Users, with Corin Anderson and
Dan Weld. Proceedings of the Tenth International World Wide Web Conference
(pp. 565-575), 2001. Hong Kong: ACM Press.
-
Mixed Initiative Interfaces for Learning Tasks: SMARTedit Talks Back,
with Steve Wolfman, Tessa Lau and Dan Weld. Proceedings of the 2001
Conference on Intelligent User Interfaces (pp. 167-174), 2001. Santa Fe,
NM: ACM Press.
-
Mining High-Speed Data Streams, with Geoff Hulten. Proceedings of the
Sixth International Conference on Knowledge Discovery and Data Mining
(pp. 71-80), 2000. Boston, MA: ACM Press.
-
A Unified Bias-Variance Decomposition for Zero-One and Squared Loss.
Proceedings of the Seventeenth National Conference on Artificial Intelligence
(pp. 564-569), 2000. Austin, TX: AAAI Press.
-
Version Space Algebra and its Application to Programming by Demonstration,
with Tessa Lau and Dan Weld. Proceedings of the Seventeenth International
Conference on Machine Learning (pp. 527-534), 2000. Stanford, CA: Morgan
Kaufmann.
-
A Unified Bias-Variance Decomposition and its Applications.
Proceedings of the Seventeenth International Conference on
Machine Learning (pp. 231-238), 2000. Stanford, CA: Morgan Kaufmann.
-
Bayesian Averaging of Classifiers and the Overfitting Problem.
Proceedings of the Seventeenth International Conference on
Machine Learning (pp. 223-230), 2000. Stanford, CA: Morgan Kaufmann.
-
Learning Source Descriptions for Data Integration, with AnHai Doan and
Alon Levy. Proceedings of the Third International Workshop on the Web and
Databases (pp. 81-86), 2000. Dallas, TX: ACM SIGMOD.
-
MetaCost: A General Method for Making Classifiers Cost-Sensitive.
Proceedings of the Fifth International Conference on Knowledge
Discovery and Data Mining (pp. 155-164), 1999. San Diego, CA: ACM
Press. Winner of the Best Paper Award for Fundamental Research.
-
Process-Oriented Estimation of Generalization Error.
Proceedings of the Sixteenth International Joint Conference on Artificial
Intelligence (pp. 714-719), 1999. Stockholm, Sweden: Morgan Kaufmann.
-
Occam's Two Razors: The Sharp and the Blunt. Proceedings of the
Fourth International Conference on Knowledge Discovery and Data Mining
(pp. 37-43), 1998. New York, NY: AAAI Press. Winner of the Best Paper
Award for Fundamental Research.
-
A Process-Oriented Heuristic for Model Selection.
Proceedings of the Fifteenth International Conference on
Machine Learning (pp. 127-135), 1998. Madison, WI: Morgan Kaufmann.
-
How to Get a Free Lunch: A Simple Cost Model for Machine Learning
Applications. Proceedings of the AAAI-98/ICML-98 Workshop on the
Methodology of Applying Machine Learning (pp. 1-7), 1998. Madison,
WI: AAAI Press.
-
Knowledge Acquisition from Examples Via Multiple Models.
Proceedings of the Fourteenth International Conference on
Machine Learning (pp. 98-106), 1997. Nashville, TN: Morgan Kaufmann.
-
Why Does Bagging Work? A Bayesian Account and its Implications.
Proceedings of the Third International Conference on Knowledge Discovery
and Data Mining (pp. 155-158), 1997. Newport Beach, CA: AAAI Press.
-
Bayesian Model Averaging in Rule Induction. Preliminary Papers of
the Sixth International Workshop on Artificial Intelligence and
Statistics (pp. 157-164), 1997. Ft. Lauderdale, FL: Society for
Artificial Intelligence and Statistics.
-
Linear-Time Rule Induction. Proceedings of the Second
International Conference on Knowledge Discovery and Data Mining
(pp. 96-101), 1996. Portland, OR: AAAI Press.
-
Using Partitioning to Speed Up Specific-to-General Rule Induction.
Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned
Models (pp. 29-34), 1996. Portland, OR: AAAI Press.
-
Beyond Independence: Conditions for the Optimality of the Simple
Bayesian Classifier, with Michael Pazzani. Proceedings of the
Thirteenth International Conference on Machine Learning (pp. 105-112),
1996. Bari, Italy: Morgan Kaufmann.
-
From Instances to Rules: A Comparison of Biases. Proceedings of
the Third International Workshop on Multistrategy Learning
(pp. 147-154), 1996. Harpers Ferry, WV: AAAI Press.
-
Two-Way Induction. Proceedings of the Seventh IEEE International
Conference on Tools with Artificial Intelligence (pp. 182-189), 1995.
Herndon, VA: IEEE Computer Society Press.
-
Rule Induction and Instance-Based Learning: A Unified Approach.
Proceedings of the Fourteenth International Joint Conference on
Artificial Intelligence (pp. 1226-1232), 1995. Montreal, Canada:
Morgan Kaufmann.
-
The RISE System: Conquering Without Separating. Proceedings of
the Sixth IEEE International Conference on Tools with Artificial
Intelligence (pp. 704-707), 1994. New Orleans, LA: IEEE Computer
Society Press.
Teaching
Other Interests
- Literature, cinema, science fiction, music, interactive art.
- Sports: tennis, swimming.
Department of Computer
Science and Engineering, University of Washington, Seattle
Last modified: August 27, 2008