Zachary Ives, Daniela Florescu, Marc Friedman, Alon Levy, Dan Weld. An Adaptive Query Execution Engine for Data Integration. Proc. of ACM SIGMOD Conf. on Management of Data, 1999.
Abstract: Query processing in data integration occurs over
network-bound, autonomous data sources. This requires extensions
to traditional optimization and execution techniques for three
reasons: there is an absence of quality statistics about the data,
data transfer rates are unpredictable and bursty, and slow or
unavailable data sources can often be replaced by overlapping or
mirrored sources. This paper presents the Tukwila data integration
system, designed to support adaptivity at its core using a
two-pronged approach. Interleaved planning and execution with
partial optimization allows Tukwila to quickly recover from
decisions based on inaccurate estimates. During execution,
Tukwila uses adaptive query operators such as the double pipelined
hash join, which produces answers quickly, and the dynamic
collector, which robustly and efficiently computes unions across
overlapping data sources. We demonstrate that the Tukwila
architecture extends previous innovations in adaptive execution
(such as query scrambling, mid-execution re-optimization, and choose
nodes), and we present experimental evidence that our techniques
result in behavior desirable for a data integration system.
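
To illustrate why a double pipelined hash join can produce answers quickly, the following is a minimal in-memory sketch of the general technique (also known as a symmetric hash join), not Tukwila's actual implementation. It assumes both inputs fit in memory and ignores the overflow handling a real engine would need. The function name, the `(side, row)` arrival encoding, and the key-extractor parameters are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Iterator, Tuple

Row = Any


def double_pipelined_hash_join(
    arrivals: Iterable[Tuple[str, Row]],
    left_key: Callable[[Row], Hashable],
    right_key: Callable[[Row], Hashable],
) -> Iterator[Tuple[Row, Row]]:
    """Join two inputs whose rows arrive interleaved in arbitrary order.

    `arrivals` yields (side, row) pairs, where side is "L" or "R".
    This encoding is an assumption of the sketch: it stands in for
    two network sources delivering tuples at unpredictable rates.
    """
    # One hash table per input; both are built incrementally.
    left_table = defaultdict(list)
    right_table = defaultdict(list)

    for side, row in arrivals:
        if side == "L":
            key = left_key(row)
            # Insert into this side's table, then probe the other side.
            left_table[key].append(row)
            for match in right_table.get(key, ()):
                yield (row, match)
        elif side == "R":
            key = right_key(row)
            right_table[key].append(row)
            for match in left_table.get(key, ()):
                yield (match, row)
        else:
            raise ValueError(f"unknown side: {side!r}")


if __name__ == "__main__":
    # Rows are (join_key, payload) tuples; arrival order is interleaved.
    stream = [
        ("L", (1, "a")),
        ("R", (1, "x")),
        ("R", (2, "y")),
        ("L", (2, "b")),
    ]
    for pair in double_pipelined_hash_join(
        stream, lambda r: r[0], lambda r: r[0]
    ):
        print(pair)
```

Because each arriving row is inserted into its own table and immediately probed against the other, a result is emitted as soon as both matching rows have arrived, regardless of which source delivers first. A conventional hash join, by contrast, must finish reading its build input before producing any output.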