Title: Learning to Map Natural Language to General Purpose Source Code

Advisors: Luke Zettlemoyer and Alvin Cheung

Supervisory Committee: Luke Zettlemoyer (Co-Chair), Alvin Cheung (Co-Chair), Yejin Choi, and Amy Ko (GSR, iSchool)

Abstract: Mapping natural language (NL) into executable programs, also known as semantic parsing, is a popular task with applications to voice assistants, home automation devices, NL interfaces to databases (NLIDBs), and connected cars. While methods for semantic parsing have evolved from hand-crafted rules to statistical semantic parsers and, more recently, neural methods, a number of limitations remain. Most semantic parsers rely on special-purpose intermediate meaning representations that are expensive to obtain and limit the expressivity of their programs. These systems are also hard to deploy in practice and to port to new domains. Parsers that bypass these intermediate representations either require extensive feature engineering or only consider limited language/code environments with fixed code templates, a fixed context, or no context at all. These limitations have restricted the widespread adoption of semantic parsing models beyond the research community.

Therefore, our work has two main goals. The first is to develop new methods, accompanied by new datasets, to map NL instructions directly into fully expressive general-purpose programs in ways that are both inexpensive and domain agnostic. Specifically, we present neural semantic parsing models that learn from two different sources: (a) code snippets with accompanying NL descriptions gathered directly from the web, and (b) annotations provided by crowd programmers. We also present the first neural models that learn to map NL to source code within the context of a real-world programming environment. Our second goal is to research methods for deploying semantic parsers for new domains in the real world from scratch, taking into account issues relating to user interaction and failure management. To achieve this, we present an approach to rapidly and easily build NLIDBs for new domains whose performance improves over time based on user feedback and that require minimal intervention. Additionally, we develop neural models that can explain the functionality of generated source code back to users to avoid misunderstandings in cases of failure. We hope that our contributions open up new ways of building high-quality semantic parsers that are easy to train and deploy.
Place: CSE 203

When: Tuesday, July 3, 2018, 15:30 to 17:30