Title: Cuttlefish: A Lightweight Primitive for Online Tuning
Advisors: Magda Balazinska and Alvin Cheung
Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query optimizer rules and cost models to choose physical operators for all of these novel logical operators is impractical. To address this challenge, we develop Cuttlefish, a new primitive for online query plan tuning that explores the physical operator instances during query execution and exploits the fastest ones, using multi-armed bandit reinforcement learning techniques. We prototype Cuttlefish in Apache Spark and tune operators for image convolution, regular expression matching, and relational joins. Our experiments show Cuttlefish-tuned adaptive convolution and regular expression operators can reach 72-99% of the throughput of an all-knowing oracle that always selects the optimal algorithm, even when individual physical operators are up to 105x slower than the optimal. Additionally, Cuttlefish achieves join throughput improvements of up to 7.5x compared with Spark SQL’s query optimizer.