Title: Enabling Performance-based Service Level Agreements for Data Analytics in the Cloud
Advisor: Magda Balazinska
Supervisory Committee: Magda Balazinska (Chair), Elizabeth Sanders (GSR, College of Ed.), Dan Suciu, and Hannaneh Hajishirzi (EE)
A variety of data analytics systems are available as cloud services today, for example, Amazon Elastic MapReduce (EMR) and Azure's HDInsight. To buy these services, users select and pay for a given cluster configuration, i.e., the number and type of service instances. It is well known, however, that users often have difficulty selecting configurations that meet their needs. We address this challenge in three parts. First, we investigate how to offer database-specific, performance-oriented service level agreements (SLAs) to users. Second, we show how to use elastic scaling via online learning to meet SLA runtime guarantees at low cost. Finally, because query cost estimates are an important factor in determining performance estimates in our system, we study whether deep learning can predict these costs more accurately.