Anhai Doan, Pedro Domingos, Alon Y. Levy. Learning Source Descriptions for Data Integration. Proceedings of the International Workshop on the Web and Databases (WebDB), 2000.
Abstract: To build a data-integration system, the application designer
must specify a mediated schema and supply the descriptions
of data sources. A source description contains a source
schema that describes the content of the source, and a
mapping between the corresponding elements of the source
schema and the mediated schema. Manually constructing these
mappings is both labor-intensive and error-prone, and has
proven to be a major bottleneck in deploying large-scale
data integration systems in practice. In this paper we
report on our initial work toward automatically learning
mappings between source schemas and the mediated schema.
Specifically, we investigate finding one-to-one mappings for
the leaf elements of source schemas. We describe LSD, a
system that automatically finds such mappings. LSD consults
a set of learner modules, each of which examines the
problem from a different perspective, and then combines their
predictions using a meta-learner. We report
on experimental results of applying LSD to five sources in
the real-estate domain.
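The abstract describes the multi-learner architecture only at a high level. The sketch below is a minimal illustration of that general idea, not LSD's actual learners or meta-learner. It assumes that each learner returns a confidence score for every mediated-schema leaf element, and that the meta-learner is a fixed weighted sum of those scores. The learners (`name_learner`, `value_learner`), the weights, the mediated elements, and the sample source data are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical mediated-schema leaf elements (real-estate flavored, for illustration only).
MEDIATED = ["price", "address", "agent-phone"]

Scores = Dict[str, float]
Learner = Callable[[str, List[str]], Scores]


def name_learner(name: str, values: List[str]) -> Scores:
    """Hypothetical learner: token overlap between the source element's name and each mediated name."""
    tokens = set(name.lower().replace("_", "-").split("-"))
    return {m: len(tokens & set(m.split("-"))) / len(set(m.split("-"))) for m in MEDIATED}


def value_learner(name: str, values: List[str]) -> Scores:
    """Hypothetical learner: fraction of instance values matching a crude pattern per mediated element."""
    def frac(pred: Callable[[str], bool]) -> float:
        return sum(1 for v in values if pred(v)) / len(values) if values else 0.0

    street_words = {"st", "ave", "rd", "street"}
    return {
        "price": frac(lambda v: v.startswith("$")),
        "address": frac(lambda v: bool(street_words & set(v.lower().replace(".", "").split()))),
        "agent-phone": frac(lambda v: sum(c.isdigit() for c in v) >= 10),
    }


def combine(learners: List[Learner], weights: List[float], name: str, values: List[str]) -> Scores:
    """Assumed meta-learner: a weighted sum of the base learners' scores."""
    combined = {m: 0.0 for m in MEDIATED}
    for learner, w in zip(learners, weights):
        for m, s in learner(name, values).items():
            combined[m] += w * s
    return combined


def match(source: Dict[str, List[str]], learners: List[Learner], weights: List[float]) -> Dict[str, str]:
    """Map each source leaf element to the single highest-scoring mediated element."""
    mapping = {}
    for name, values in source.items():
        scores = combine(learners, weights, name, values)
        mapping[name] = max(scores, key=scores.get)
    return mapping


if __name__ == "__main__":
    # Hypothetical source: leaf element name -> sample instance values.
    source = {
        "listed-price": ["$250,000", "$310,000"],
        "location": ["123 Main St", "9 Oak Ave"],
        "phone": ["(206) 555-1234", "206-555-9876"],
    }
    print(match(source, [name_learner, value_learner], [0.5, 0.5]))
```

Each learner sees only one kind of evidence, either element names or instance values, and the combination step lets one learner's strong signal compensate for another's silence. For example, `location` shares no tokens with any mediated name, but its values still point to `address`. The weights here are fixed by hand. A learned meta-learner would instead fit them from sources whose mappings are already known. This sketch also picks each element's best match independently, so it does not prevent two source elements from mapping to the same mediated element.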