CSE 599C Hot Topics in Data Management Systems

 

Announcements

Administration

Instructor: Magdalena Balazinska

Meeting times: Tuesdays and Thursdays, 12pm-1:20pm. Feel free to bring your lunch to class.

Location: CSE 503.

Overview

Advances in the area of sensor networks have created the need for systems capable of processing continuous streams of information. The dramatic rise in the number of applications based on the Internet and the World Wide Web has increased the need for efficient and scalable mechanisms to manage distributed data repositories. In today's systems, data repositories are also frequently owned by different autonomous parties and contain data in many different formats: structured, semi-structured, text, maps, images, video, etc. In this seminar, we will examine how modern data management systems cope with these new challenges, and explore open questions.

Format

One or two papers will be assigned for each class. Please read the papers and come prepared to discuss them. For each paper, a few reading questions will be provided to help you prepare.

Evaluation

The seminar will be graded as credit or no credit. To get credit, you must read the papers, come to class, and participate in the discussions. In addition, you should pick one of the following:

  1. If you are taking the class to get some breadth, select three papers discussed in the seminar and hand in written answers to the reading questions. Please do not write more than a total of three pages.
  2. If you are particularly interested in one topic covered in the seminar, identify an open question related to that topic and briefly discuss the problem, the related work, and some possible solutions. Please do not write more than a total of three pages.
  3. If you would like to get more seriously involved in a topic covered in the seminar, you can start a research project on that topic. Please come and see me for a list of possible projects.

DEADLINE: March 10th, 2006 at 6pm.


Course Calendar

Date

Topic and readings

01/03

Background

Topic: Class introduction and introduction to relational database management systems.

Readings: None assigned.

01/05

Background

Topic: Fundamentals of query evaluation in relational database management systems.

Readings: None assigned.

Slides in html (these slides are only available from within cs.washington.edu).

01/10

Background

Topic: Active databases.

Readings:

  • Eric N. Hanson, Chris Carnes, Lan Huang, Mohan Konyala, Lloyd Noronha, Sashi Parthasarathy, J. B. Park, and Albert Vernon. Scalable Trigger Processing. ICDE 1999. [pdf]

Reading questions

Slides in html (these slides are only available from within cs.washington.edu).

01/12

Streams

Topic: Stream processing overview.

Readings:

  • B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. PODS 2002. [pdf]

Reading questions

No slides today.

01/17

Streams

Topic: Stream data models and operators.

Readings:

  • D. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal 12(2), 2003. [pdf] (Required reading: Sections 2 and 5)
  • Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal (to appear). [pdf] (Required reading: Sections 3, 4, 5, and 6)

Reading questions

Slides in OpenOffice format (.sxi) and html

01/19

Streams

Topic: Continuous and adaptive query processing.

Readings:

  • Samuel R. Madden, Mehul A. Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD 2002. [pdf]

Background:

  • Joseph M. Hellerstein and Ron Avnur. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

01/24

Streams

Topic: Approximate stream processing.

Readings:

  • Required: N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load Shedding in a Data Stream Manager. VLDB 2003. [pdf]
  • Optional (we will not discuss it in class): Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Sampling Algorithms in a Stream Operator. SIGMOD 2005. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

01/26

Streams

Topic: Distributed stream processing. Managing load and resource utilization.

Readings:

  • Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, and Margo Seltzer. Network-Aware Operator Placement for Stream-Processing Systems. ICDE 2006. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

01/31

Streams

Topic: Distributed stream processing. Fault-tolerance.

Readings:

  • Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Uğur Çetintemel, Michael Stonebraker, and Stan Zdonik. High-Availability Algorithms for Distributed Stream Processing. ICDE 2005. [pdf]
  • Magdalena Balazinska, Hari Balakrishnan, Samuel Madden, and Michael Stonebraker. Fault-Tolerance in the Borealis Distributed Stream Processing System. SIGMOD 2005. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/02

Streams & device heterogeneity

Topic: Sensor networks.

Readings:

  • Samuel Madden, Michael Franklin, Joseph Hellerstein, and Wei Hong. TinyDB: An Acquisitional Query Processing System for Sensor Networks. TODS 30(1), 2005. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/07

Distributed data management

Topics:

  • Overview of distributed data management.
  • Introduction to traditional distributed database systems.

Readings: None assigned.

Slides in OpenOffice format (.sxi) and html

02/09

Background

Topic: Transactions (background needed to further discuss distributed databases).

Readings:

  • Michael J. Franklin. Concurrency Control and Recovery. The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf]

No reading questions.

Slides in OpenOffice format (.sxi) and html

02/14

Distributed data management

Topic: Traditional distributed databases. Uniformity and tight coupling.

Readings:

  • C. Mohan, B. Lindsay, and R. Obermarck. Transaction Management in the R* Distributed Database Management System. ACM Transactions on Database Systems 11(4), 1986. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/16

Distributed data management

Topic: Federated databases. Autonomy and incentives.

Readings:

  • M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: A Wide-area Distributed Database System. VLDB Journal 5(1), 1996. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/21

Distributed data management

Topic: Federated systems. Autonomy and heterogeneity.

Readings:

  • U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services. Technical Report, Stanford University, Oct 2005. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/23

Distributed data management

Topic: Federated systems. Fault-tolerance.

Readings:

  • R. Barga, D. Lomet, G. Shegalov, and G. Weikum. Recovery Guarantees for Internet Applications. ACM Transactions on Internet Technology, 2004. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

02/28

Distributed data management

Topic: Peer-to-peer systems. Large-scale distribution.

Readings:

  • Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, and Ion Stoica. Querying the Internet with PIER. VLDB 2003. [pdf]

Background:

  • Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM 2001. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

03/02

Distributed data management

Topic: Caching, replication, and disconnected operation.

Readings:

  • Required: Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. The Dangers of Replication and a Solution. ACM SIGMOD Record 25(2), 1996. [pdf]
  • Optional (we will not discuss it in class): Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, and Jonathan Goldstein. Relaxed Currency and Consistency: How to Say "Good Enough" in SQL. SIGMOD 2004. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

03/07

Data type heterogeneity

Topic: Structured and semi-structured data.

Readings:

  • Matthias Nicola and Bert van der Linden. Native XML Support in DB2 Universal Database. VLDB 2005. [pdf]

Reading questions

Slides in OpenOffice format (.sxi) and html

03/09

Data type heterogeneity

Topic: Unstructured data and more...

Readings:

  • Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW7. 1998. [pdf]
  • Google services and tools.

Reading questions

Slides in OpenOffice format (.sxi) and html