Daniela Florescu, Daphne Koller, Alon Levy. Using Probabilistic Information in Data Integration. Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.
Abstract: The goal of a mediator system is to provide users with a uniform interface to a multitude of information sources. To translate user queries, given in a mediated schema, to queries on the data sources, mediators rely on explicit mappings between the contents of the data sources and the meanings of the relations in the mediated schema.
Thus far, contents of data sources were described qualitatively. In
this paper we describe the use of quantitative information in the form of
probabilistic knowledge in mediator systems. We consider several kinds
of probabilistic information: information about overlap between
collections in the mediated schema, coverage of the
information sources, and degrees of overlap between information
sources. We address the problem of ordering accesses to
multiple information sources, in order to maximize the likelihood of
obtaining answers as early as possible. We describe a declarative
formalism for specifying these kinds of probabilistic information, and
we propose algorithms for ordering the information sources. Finally,
we discuss a preliminary experimental evaluation of these algorithms
on the domain of bibliographic sources available on the WWW.
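
The source-ordering problem mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It is a simple greedy heuristic written for illustration, and the function name `order_sources`, the input format, and the numbers are all assumptions. Each source is given a coverage estimate (the probability that it holds an answer) and pairwise overlap estimates (the probability that an answer is in both sources). Sources are then picked one at a time by their estimated chance of contributing answers not already obtained.

```python
from typing import Dict, FrozenSet, List


def order_sources(
    coverage: Dict[str, float],
    overlap: Dict[FrozenSet[str], float],
) -> List[str]:
    """Greedily order sources by a rough estimate of new answers gained.

    coverage: hypothetical P(answer is in source s).
    overlap:  hypothetical P(answer is in both sources), keyed by an
              unordered pair; missing pairs are treated as 0.
    """
    remaining = set(coverage)
    ordered: List[str] = []

    while remaining:
        def gain(source: str) -> float:
            # Heuristic assumption: discount a source's coverage by its
            # largest overlap with any source that was already chosen.
            already_seen = max(
                (overlap.get(frozenset((source, chosen)), 0.0) for chosen in ordered),
                default=0.0,
            )
            return max(coverage[source] - already_seen, 0.0)

        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        ordered.append(best)
        remaining.remove(best)

    return ordered


# Made-up numbers: B has high coverage but mostly duplicates A.
coverage = {"A": 0.6, "B": 0.5, "C": 0.3}
overlap = {frozenset(("A", "B")): 0.45, frozenset(("A", "C")): 0.05}
print(order_sources(coverage, overlap))
```

With these illustrative numbers, source C is accessed before B even though B has higher coverage, because most of what B offers is already expected from A.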