Anhai Doan, Pedro Domingos, Alon Y. Levy. Learning Source Descriptions for Data Integration. Proceedings of the International Workshop on the Web and Databases (WebDB), 2000.

Abstract: To build a data-integration system, the application designer must specify a mediated schema and supply descriptions of the data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone, and has proven to be a major bottleneck in deploying large-scale data integration systems in practice. In this paper we report on our initial work toward automatically learning mappings between source schemas and the mediated schema. Specifically, we investigate finding one-to-one mappings for the leaf elements of source schemas. We describe LSD, a system that automatically finds such mappings. LSD consults a set of learner modules, each of which looks at the problem from a different perspective, and then combines their predictions using a meta-learner. We report experimental results from applying LSD to five sources in the real-estate domain.
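The abstract describes LSD's architecture only at a high level: several learner modules each predict which mediated-schema element a source leaf element maps to, and a meta-learner combines those predictions. The sketch below illustrates that general idea under stated assumptions. Each learner's output is taken to be a dictionary of confidence scores over mediated-schema elements, and the meta-learner is assumed to be a fixed weighted sum of those scores. The function names `combine_predictions` and `best_mapping`, the example learners, the element names, and the weights are all hypothetical, and this is not necessarily how LSD's own meta-learner works.

```python
from typing import Dict, List

# Assumption: a learner's prediction for one source leaf element maps each
# mediated-schema element to a confidence score.
Prediction = Dict[str, float]


def combine_predictions(predictions: List[Prediction], weights: List[float]) -> Prediction:
    """Hypothetical meta-learner: a weighted sum of the base learners' scores.

    In a real system the weights would be learned from training data;
    here they are simply passed in.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per learner")
    combined: Prediction = {}
    for pred, weight in zip(predictions, weights):
        for label, score in pred.items():
            combined[label] = combined.get(label, 0.0) + weight * score
    return combined


def best_mapping(combined: Prediction) -> str:
    """Map the source leaf element to its single highest-scoring mediated element.

    Note: this picks a target per source element independently; it does not
    enforce that two source elements never map to the same target.
    """
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Hypothetical outputs of two learners for a source element such as "list-price":
    # one looking at element names, one looking at data values.
    name_pred = {"price": 0.6, "address": 0.1, "agent-phone": 0.3}
    content_pred = {"price": 0.8, "address": 0.05, "agent-phone": 0.15}
    combined = combine_predictions([name_pred, content_pred], [0.4, 0.6])
    print(best_mapping(combined))  # → price
```

A weighted sum is used here only because it is the simplest combiner to show. The point it illustrates is that a learner that is strong on one kind of evidence (for example, data values) can outweigh a weaker learner on the same decision.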