UW MSR Summer Institute 2010

Breakout Topics

Breakout A - Database as a Service (DaaS)

August 2, 2010

Topic 1: Performance and Scalability
Overview: How should database as a service be architected to enable scaling to (a) large numbers of databases and (b) large databases? What applications, beyond simple OLTP applications, can be supported efficiently and at scale?
Participants:
- Daniel Abadi
- Mike Cafarella
- Mike Carey (leader)
- Jeff Hammerbacher
- Chris Jermaine
- Paul Larson
- Luke Lonergan
- Balan Sethu Raman
- Yuan Yu
Topic 2: Consistency, Replication, Availability, Programmability
Overview: What consistency models are important to support? What are the trade-offs in the choices of querying paradigms? What new challenges arise due to geo-replication?
Participants:
- Amr El Abbadi
- Phil Bernstein
- Bill Bolosky
- Molham Aref
- Jim Larus
- Hank Levy
- Doug Terry
- Jeff Ullman
- Raghu Ramakrishnan (leader)
Topic 3: Manageability and Autonomics
Overview: What challenges arise in managing, deploying, and debugging DaaS? What new opportunities exist for self-tuning and self-healing technology? [Resource management, monitoring, diagnostics, automatic tuning/corrections]
Participants:
- Shivnath Babu
- Vinayak Borkar
- Goetz Graefe
- Martin Kersten
- Vivek Narasayya
- Berthold Reinwald
- Donovan Schneider
- Haixun Wang
- Jingren Zhou (leader)
Topic 4: SLAs, Pricing Models, Benchmarking
Overview: What kinds of SLAs and Pricing Models are meaningful? What are the technical challenges in achieving such SLAs and Pricing Models? What metrics are important to benchmark?
Participants:
- Magda Balazinska
- Roger Barga
- Paul Brown
- Surajit Chaudhuri
- Bill Howe
- Donald Kossmann (leader)
- Tim Kraska
- David Maier
- Nigel Ellis

Breakout B - Large Scale Data Processing

August 3, 2010

Topic 1: Performance, fault-tolerance, scalability
Overview: What architectures should we adopt to make our data-intensive scalable computing (DISC) systems faster and more scalable? [Parallel DB vs MapReduce vs hybrid; Fault-tolerance, scalability, heterogeneity, new hardware trends, query optimization, benchmarks, etc.]
Participants:
- Daniel Abadi (leader)
- Phil Bernstein
- Bill Bolosky
- Mike Carey
- Donald Kossmann
- Jeff Hammerbacher
- Martin Kersten
- Paul Larson
- Hank Levy
- Luke Lonergan
- Doug Terry
Topic 2: Resource management
Overview: How should we manage resources in data-intensive computing systems? What guarantees do users need with respect to performance and result quality? What if input data is streaming? What if users want to see results incrementally? [Data placement, query scheduling, sampling, online query processing, SLAs, monitoring and diagnostics, tuning, autonomics, scalability, heterogeneity, multitenancy.]
Participants:
- Shivnath Babu (leader)
- Surajit Chaudhuri
- Nigel Ellis
- Goetz Graefe
- Chris Jermaine
- Vivek Narasayya
- Balan Sethu Raman
Topic 3: New Requirements
Overview: Existing DISC systems are designed for specific types of workloads. What other data-intensive workloads should also be supported and how do they affect the design of massive-scale data processing systems? [New applications (search, social networking, eScience, etc.) and what they require from DISC systems. Different data models for DISC systems (arrays, unstructured data, XML, etc.). Handling dirty data.]
Participants:
- Roger Barga (leader)
- Amr El Abbadi
- Vinayak Borkar
- Paul Brown
- Mike Cafarella
- Tim Kraska
- Raghu Ramakrishnan
- Berthold Reinwald
- Donovan Schneider
- Stan Zdonik
Topic 4: Languages, programmability, and APIs
Overview: What interfaces should be offered by DISC systems? How should DISC systems interact with users and applications? Is batch enough? [Declarative or procedural. Functionality vs. developer ease of use; recursive queries, Datalog]
Participants:
- Bill Howe (leader)
- Molham Aref
- Jim Larus
- Dave Maier
- Jeff Ullman
- Yuan Yu
- Jingren Zhou

Last updated: 2 August 2010


