Title: Building and Querying Probabilistic Models for Open World Database Systems
Advisors: Magda Balazinska & Dan Suciu
Supervisory Committee: Magda Balazinska (Co-Chair), Dan Suciu (Co-Chair), Colette Moore (GSR, English), and Christopher Althoff (CSE)
A fundamental assumption of traditional database management systems is that the database contains all information necessary to answer a query; i.e., the database contains the entire population of data. However, with the increasing availability of public data samples (e.g., government data) and easy-to-use scientific programming languages (e.g., Python), data scientists are turning to these samples to analyze and understand the population they represent. Because databases assume a closed world, data scientists are forced to use tools outside of the database for their data processing needs.
For database management systems to accommodate this growing group of users, they need to adopt the open world assumption that tuples not in the database still exist. In this dissertation, I answer two main research questions on building an open world database system that approximately answers queries as if they were issued over the entire population.