Title: PROSE: A Programming-by-Examples Framework for Industrial Data Wrangling and Beyond
Advisors: Sumit Gulwani (MSR) and Zoran Popovic
Supervisory Committee: Sumit Gulwani (Co-Chair), Zoran Popovic (Co-Chair), Gary Hsieh (GSR, HCDE), and Ras Bodik
Abstract: Inductive program synthesis, or programming by examples (PBE), is the task of synthesizing a program in an underlying domain-specific language (DSL) from an under-specification of the user's intent (e.g., input-output examples of the program's behavior). Thanks to their ease of use, PBE-based technologies have gained considerable prominence in the last five years, successfully automating computational tasks for end-users and repetitive routines for professionals. However, designing, developing, and maintaining an effective industrial-quality PBE technology used to be an intellectual and engineering challenge, requiring 1-2 man-years of Ph.D.-level effort.
In this work, we present PROSE (PROgram Synthesis using Examples), a universal PBE framework that vastly simplifies industrial applications of inductive program synthesis. The key idea behind PROSE is that many PBE technologies can be refactored as instantiations of one generic meta-algorithm for synthesis, parameterized by declarative properties of the DSL operators. This meta-algorithm is based on backpropagation: it pushes the example-based spec downward through the DSL grammar, reducing the synthesis problem to smaller synthesis subproblems on the desired program's subexpressions. Compared to state-of-the-art PBE implementations, whose components are complexly entangled, this approach is more maintainable and accessible. It is also more efficient, since it does not need to generate and then filter out incorrect program candidates during synthesis.
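To make the backpropagation idea concrete, here is a minimal Python sketch on a hypothetical toy string DSL: an expression is either an atom or Concat(atom, expression), and an atom is either Const(s) or SubStr(i, j) with absolute input positions. This is not the PROSE API; the names Const, SubStr, learn_atoms, learn, and run are illustrative assumptions. The only operator-specific knowledge is a "witness" for Concat: once the first argument's value is known, the spec for the remaining subexpression is the leftover suffix of each output.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Example = Tuple[str, str]  # (input, expected output)


@dataclass(frozen=True)
class Const:
    """Hypothetical atom that always returns a fixed string."""
    s: str

    def run(self, inp: str) -> str:
        return self.s


@dataclass(frozen=True)
class SubStr:
    """Hypothetical atom returning inp[i:j] (absolute positions, a toy simplification)."""
    i: int
    j: int

    def run(self, inp: str) -> str:
        return inp[self.i:self.j]


Atom = Union[Const, SubStr]
Program = Tuple[Atom, ...]  # Concat(a1, Concat(a2, ...)) flattened into a tuple


def run(program: Program, inp: str) -> str:
    return "".join(a.run(inp) for a in program)


def learn_atoms(examples: List[Example]) -> List[Atom]:
    """Atoms whose value is a non-empty prefix of the output in every example.

    This is the spec that the Concat witness pushes down to its first argument.
    """
    inp0, out0 = examples[0]
    candidates: List[Atom] = [Const(out0[:k]) for k in range(1, len(out0) + 1)]
    candidates += [SubStr(i, j) for i in range(len(inp0)) for j in range(i + 1, len(inp0) + 1)]
    return [
        a for a in candidates
        if all(
            # Guard: SubStr must stay in bounds so its value is non-empty,
            # which guarantees the residual spec strictly shrinks.
            (not isinstance(a, SubStr) or a.j <= len(inp)) and out.startswith(a.run(inp))
            for inp, out in examples
        )
    ]


def learn(examples: List[Example]) -> List[Program]:
    """All programs in the toy DSL consistent with the examples (worst case exponential)."""
    if all(out == "" for _, out in examples):
        return [()]  # spec fully satisfied by the empty concatenation
    programs: List[Program] = []
    for a in learn_atoms(examples):
        # Concat witness: the spec for the rest is each output's remaining suffix.
        # Whole programs are never built and checked afterwards; every returned
        # program is consistent with all examples by construction.
        residual = [(inp, out[len(a.run(inp)):]) for inp, out in examples]
        programs += [(a,) + rest for rest in learn(residual)]
    return programs
```

A possible usage, with a deliberately naive ranking (prefer the fewest atoms) that is also an assumption rather than PROSE's ranking:

```python
examples = [("2024-01-15", "01/15/2024"), ("1999-12-31", "12/31/1999")]
best = min(learn(examples), key=len)
print(run(best, "2000-07-04"))  # 07/04/2000
```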
We have built 10+ PBE technologies with PROSE in the domains of data wrangling, software refactoring, and programming education. Many of them have been deployed in Microsoft mass-market products, including Excel, PowerShell, Exchange, Azure Operations Management Suite, and Azure Data Factory. In this talk, I will also give an overview of the insights we have gained from these deployments, including (a) challenges in resolving ambiguity in user intent, (b) HCI research on user interaction models, (c) approaches to making program synthesis accessible to software developers, and (d) the importance of effortless parameterization with domain-specific insight. Finally, I will present our future research plans: investigating novel domains, conducting user studies of disambiguation models, and developing adaptive synthesis technologies.