Large-scale parallel data analysis in the cloud

Week 1 - Fri, September 26th

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Ronnie Chaiken, Bob Jenkins (Microsoft), Paul Larson (Microsoft Research, USA), Bill Ramsey, Darren Shakib, Simon Weaver (Microsoft), Jingren Zhou (Microsoft Research, USA). VLDB 2008.
Background: Pig Latin, Dryad, MapReduce, etc.
Presenter:

Week 2 - Fri, October 3rd

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, and Mihai Budiu (Microsoft Research); Úlfar Erlingsson (Reykjavík University and Microsoft Research); Pradeep Kumar Gunda and Jon Currey (Microsoft Research). OSDI 2008.
Background: Dryad, MapReduce, BigTable, Pig, etc. (mostly Pig and MapReduce).
Presenter: Nicholas Murphy

Week 3 - Fri, October 10th

Automatic Optimization of Parallel Dataflow Programs
C. Olston, B. Reed, A. Silberstein and U. Srivastava.
2008 USENIX Annual Technical Conference, Boston, Massachusetts, June 2008.
Presenter: YongChul Kwon

Database as a service in the cloud

Week 4 - Fri, October 17th

Dynamo: Amazon's Highly Available Key-value Store
Giuseppe DeCandia et al. SOSP 2007.
Related: Amazon Web Services, Facebook's Cassandra, and Google Megastore
Presenters: Nodira Khoussainova & Flavio Pfaffhauser

Week 5 - Fri, October 24th

PNUTS: Yahoo!'s Hosted Data Serving Platform
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein (Yahoo! Research), Phil Bohannon (Yahoo!), Hans-Arno Jacobsen (Yahoo! Research and University of Toronto), Nick Puz, Daniel Weaver, Ramana Yerneni (Yahoo! Research). VLDB 2008.
Related: SSDS, Amazon SimpleDB, Google App Engine, Building a Database on S3
Presenter: Tom Bergan

Impact of flash memory

Week 6 - Fri, October 31st

Flashing Up The Storage Layer
Ioannis Koltsidas, Stratis Viglas (University of Edinburgh). VLDB 2008.
Related: A Case for Flash Memory SSD in Enterprise Database Applications
Sang-Won Lee (Sungkyunkwan University), Bongki Moon (University of Arizona), Chanik Park (Samsung Electronics), Jae-Myung Kim (Altibase), Sang-Woo Kim (Sungkyunkwan University). SIGMOD 2008.
Presenters: Michael J. Cafarella & Christopher M. Ré

Week 7 - Wed, November 5th

No meeting.

Combining computation and data management in the cloud

Week 8 - Wed, November 12th

Clustera: An Integrated Computation and Data Management System
David DeWitt, Eric Robinson, Srinath Shankar, Erik Paulson, Jeffrey Naughton, Andrew Krioukov, Joshua Royalty (UW-Madison). VLDB 2008.
Presenters: Katherine Moore & Kristi Morton

Scientific data management in the cloud

Week 9 - Fri, November 21st

[SUBJECT TO CHANGE]
Scalable Multi-Query Optimization for Exploratory Queries over Federated Scientific Databases
Dieter Van de Craen, Frank Neven (Hasselt University), Anastasios Kementsietsidis (IBM T.J. Watson Research Center), Stijn Vansummeren (Hasselt University). VLDB 2008.
Presenter: Prasang Upadhyaya