CSE590ds Course Overview
Course Summary
XML and, more generally, semistructured data differ in shape from
traditional relational data. More importantly, they are used in novel
kinds of applications: exchange and retrieval on the Web, rather than
local storage and processing. This course covers some major
developments in semistructured data and XML from the last few years:
data models, syntax, query languages, schemas, query analysis,
type-checking, publishing, indexes, storage methods, and systems
aspects.
Outline (tentative)
- Introduction. Semistructured (SS) data: its origins, its data models
(OEM, value-based), comparison with relational data.
- XML: syntax (elements, attributes, DTDs, ID/IDREFS). XPath,
XPointer, XLink.
- Query languages: path expressions (regular expressions, evaluation
in graphs), Lorel, UnQL, StruQL, XML-QL, Quilt, XSL. A
general-purpose programming language: XDuce (pronounced
"transduce").
- Schemas: upper-bound, lower-bound schemas, DTDs, XSchema, some
theory about reasoning with XML keys.
- Advanced query analysis: query pruning, query containment (regular
path queries with constraints, conjunctive queries with regular
expressions, applications to XPath queries).
- Type-checking: simple inference (YAT), type inference in XDuce,
type-checking for k-pebble transformations.
- XML publishing from relational databases: the issues
(virtual/materialized XML views); query composition; optimization
of extraction queries; illustration with XPERANTO, SilkRoute.
- Indexes for semistructured data and XML.
- XML storage: ternary relations, schema-based, data-based.
- Miscellaneous: data mining in semistructured data, XML
compression, incomplete information.
Requirements
There will be a couple of reading assignments and 1-2 homeworks; no
project and no midterm/final.
Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to gerome]