Zachary Ives, Alon Halevy, Dan Weld. Integrating Network-Bound XML Data. Data Engineering Bulletin 24(2), 2001.
Abstract: Although XML was originally envisioned as a replacement for HTML on
the web, to this point it has instead been used primarily as a format
for on-demand interchange of data between applications and
enterprises. The web is rather sparsely populated with static XML
documents, but nearly every data management application today can
export XML data. There is great interest in integrating such exported
data across applications and administrative boundaries, and as a
result, efficient techniques for integrating XML data across local-
and wide-area networks are an important research focus.
In this paper, we provide an overview of the Tukwila data integration
system, which is based on the first XML query engine designed
specifically for processing network-bound XML data sources. In
contrast to previous approaches, which must read, parse, and often
store XML data before querying it, the Tukwila XML engine can return
query results even as the data is streaming into the system. Tukwila
features a new system architecture that extends relational query
processing techniques, such as pipelining and adaptive query
processing, into the XML realm. We compare the focus of the Tukwila
project to that of other XML research systems, and then we present our
system architecture and novel query operators, such as the
x-scan operator. We conclude with a description of our current
research directions in extending XML-based adaptive query processing.
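
To make the idea of returning results while data is still streaming in more concrete, here is a minimal sketch in Python. It is not Tukwila's x-scan operator or any part of the Tukwila code base; it only illustrates the general pattern of incremental parsing with the standard library's xml.etree.ElementTree.XMLPullParser. The function name stream_matches, the sample feed, and the tag names are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def stream_matches(chunks, tag):
    """Yield the text of each <tag> element as soon as its end tag arrives.

    `chunks` is any iterable of string fragments, e.g. pieces read from a
    network socket. Results are produced before the document is complete.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # hand over whatever has arrived so far
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem.text
                elem.clear()  # drop content we no longer need
    parser.close()


# Hypothetical feed, split at arbitrary points as a network might deliver it.
chunks = [
    "<books><book><title>A</tit",
    "le></book><book><title>B</title>",
    "</book></books>",
]

for title in stream_matches(chunks, "title"):
    print(title)
```

In this sketch the first title is emitted as soon as the second fragment arrives, before the closing tag of the document has been seen, which is the behavior the abstract contrasts with engines that read and parse the full input first.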