The 2013 Summer Institute, cosponsored by University of Washington Computer Science & Engineering and Microsoft Research, will be held at the Alderbrook Resort in Union, Washington, July 22-25, 2013. Union is located on the Hood Canal, approximately two hours southwest of Seattle.
Robust natural language understanding systems have the potential to revolutionize our interactions with computers. From Apple's Siri to Google Now and Microsoft's Xbox Kinect, we now talk to our computers, phones, and entertainment systems on a daily basis. Similarly, as we interact with social media we constantly watch, comment on, and otherwise caption massive streams of image and video data. Recently, there has been growing interest in approaches that learn to understand these rich data sources, with a common focus on studying how language use is grounded in a physical or virtual world. This work spans a number of different communities, which have related and overlapping research goals:
- NLP and Speech: Language researchers have recently increased their focus on learning to understand grounded language, including an emphasis on interactive learning and on understanding how language refers to an external world. Example applications include learning to read and execute instructions, narrating events in videos, and participating in robust multimodal conversational interaction. Given recent progress, there is a significant opportunity to consider ever more realistic sensing models, such as those used in the vision and robotics communities.
- Computer Vision: Recently, linguistic resources have had significant impact in the computer vision community. For example, ImageNet uses WordNet categories to define object classes, and much of the work on attribute detection is closely aligned with linguistic notions of objects and properties. As visual detectors become increasingly reliable, there has been significant effort to more closely tie language understanding to visual input, including, for example, systems that automatically caption or otherwise describe images.
- Robotics: Researchers have long dreamed of creating robots that we can converse with. Such systems could more naturally assist users and learn from their demonstrations and explanations. Recent progress has enabled a wide class of new human-robot interaction scenarios, where people can describe goals and teach robots how to achieve them. However, there remain significant opportunities to learn to understand more complex language in everyday environments.
- Cognitive Science: Finally, the classic challenge for the study of language grounding is the question of how children learn to understand language. Although the models and algorithms differ from those used in the other settings above, much can be learned by studying the assumptions that underlie what inputs are available and what behaviors can be learned.
The goal of this symposium is to provide a forum for identifying common research themes and challenges across all of these disciplines. We will discuss topics including (but not limited to):
- Situated, spoken dialog systems
- Language-driven human-robot interaction
- Interactive language learning
- Child language learning
- Procedural language understanding
- Image/video captioning and description generation
- Natural language-driven search of image/video data
- Interactive physical tutorial systems
- Using language to interact with context-aware wearable devices
- Language-driven interaction with characters in a virtual world
The Summer Institute will be a forum for discussion and interaction across disciplines. We feel the time is right to share the steady progress that has been made in each area, and that this cross-fertilization of ideas has the potential to enable new collaborations and insights that advance our general understanding of situated language use.
Organizers: Luke Zettlemoyer (UW CSE) and Bill Dolan (MSR)
Descriptions of past summer institutes may be viewed at: