Title: Optimizing Large Scale Data Intensive Applications Using Verified Lifting
Advisors: Alvin Cheung and Ras Bodik
Abstract: We present Casper, a tool that enables sequential data-intensive programs to automatically leverage the optimizations provided by parallel data processing frameworks. Casper works by lifting sequential code fragments to a high-level domain-specific language that can be easily retargeted to parallel data processing frameworks based on the MapReduce programming model. We use a novel cost-based syntax-guided synthesis algorithm and program verification to find sound functional MapReduce representations of the input code. Our Casper prototype currently targets sequential code fragments written in Java and retargets them to Apache Spark. Our evaluation shows that Casper can translate 46 benchmarks from 7 different suites, with the generated code running on average 13x faster than the original sequential implementation.
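
To make the idea of lifting concrete, the following is a minimal hand-written sketch, not Casper's output and not its intermediate language. It pairs a sequential Java loop (word counting, chosen here purely for illustration) with a functionally equivalent map-and-reduce-by-key formulation. The class and method names are hypothetical, and Java parallel streams stand in for a MapReduce-style framework such as Apache Spark so that the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: names and structure are assumptions,
// not taken from Casper.
public class LiftingSketch {

    // Sequential fragment: the kind of loop a lifting tool takes as input.
    public static Map<String, Integer> wordCountSequential(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Functional equivalent: map each word to the pair (word, 1), then
    // reduce by key with integer addition. Addition is associative and
    // commutative, so the reduction can safely run in parallel.
    public static Map<String, Integer> wordCountMapReduce(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Both versions compute the same counts on this input.
        System.out.println(wordCountSequential(words).equals(wordCountMapReduce(words)));
    }
}
```

Agreement on one input, as checked in `main`, is only a test. The abstract's approach instead relies on program verification to establish that the lifted form is sound for all inputs.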