The task of automatic image captioning (i.e. associating images with sentences that describe what is depicted in them) has received a lot of attention over the last couple of years. But how well do these systems actually work? What are the limits of current approaches? In this talk, I will attempt to give an overview of how work in this area has developed. I will also highlight some shortcomings of current approaches, and discuss future directions. Bio: Julia Hockenmaier is associate professor in Computer Science at the University of Illinois at Urbana-Champaign. She works on natural language processing. Her current research focuses on automatic image description, statistical parsing and unsupervised grammar induction. Her group produced the popular Flickr30K dataset. She has given tutorials on image description at EACL and CVPR. Julia received her PhD from the University of Edinburgh and did postdoctoral work at the University of Pennsylvania. She has received an NSF CAREER award and was shortlisted for the British Computer Society’s Distinguished Dissertation award.
Monday, February 13, 2017 - 10:00
Speaker: Julia Hockenmaier (UIUC)
Location: CSE 305
Monday, January 23, 2017 - 10:00
Speaker: Hal Daume (University of Maryland)
Location: CSE 305
Machine learning-based natural language processing systems are amazingly effective when plentiful labeled training data exists for the task/domain of interest. Unfortunately, for broad coverage (both in task and domain) language understanding, we're unlikely to ever have sufficient labeled data, and systems must find some other way to learn. I'll describe a novel algorithm for learning from interactions, and several problems of interest, most notably machine simultaneous interpretation (translation while someone is still speaking). This is all joint work with some amazing (former) students He He, Alvin Grissom II, John Morgan, Mohit Iyyer, Sudha Rao and Leonardo Claudino, as well as colleagues Jordan Boyd-Graber, Kai-Wei Chang, John Langford, Akshay Krishnamurthy, Alekh Agarwal, Stéphane Ross, Alina Beygelzimer and Paul Mineiro. Bio: Hal Daume III is an associate professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He has received best paper awards at NAACL 2016, CEAS 2010 and ECML 2008, and a best demonstration award at NIPS 2015. He was an executive board member of the North American Association for Computational Linguistics and then, in 2013, one of two program co-chairs for its conference (NAACL), and was previously the chair of the NAACL executive board. He has served as an editor for the Machine Learning Journal, the Computational Linguistics Journal and the Journal for Artificial Intelligence Research. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning, as well as multilingual modeling and affect analysis. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu).
Monday, December 5, 2016 - 10:00
Speaker: Martha Palmer (University of Colorado)
Location: Gates Commons (CSE 691)
Abstract Meaning Representations (AMRs) provide a single, graph-based semantic representation that abstracts away from the word order and syntactic structure of a sentence, resulting in a more language-neutral representation of its meaning. AMR implements a simplified, standard neo-Davidsonian semantics. A word in a sentence either maps to a concept or a relation or is omitted if it is already inherent in the representation or it conveys inter-personal attitude (e.g., stance or distancing). The basis of AMR is PropBank’s lexicon of coarse-grained senses of verb, noun and adjective relations as well as the roles associated with each sense (each lexicon entry is a ‘roleset’). By marking the appropriate roles for each sense, this level of annotation provides information regarding who is doing what to whom. However, unlike PropBank, AMR also provides a deeper level of representation of discourse relations, non-relational noun phrases, prepositional phrases, quantities and time expressions (which PropBank largely leaves unanalyzed), as well as Named Entity tags with Wikipedia links. Additionally, AMR makes a greater effort to abstract away from language-particular syntactic facts. The latest version of AMR adds coreference links across sentences, including links to implicit arguments. This talk will explore the differences between PropBank and AMR, the current and future plans for AMR annotation, and the potential of AMR as a basis for machine translation. It will end with a discussion of areas of semantic representation that AMR is not currently addressing, which remain as open challenges. Martha Palmer is a Professor at the University of Colorado in Linguistics, Computer Science and Cognitive Science, and a Fellow of the Association for Computational Linguistics. She works on trying to capture elements of the meanings of words that can comprise automatic representations of complex sentences and documents.
Supervised machine learning techniques rely on vast amounts of annotated training data so she and her students are engaged in providing data with word sense tags, semantic role labels and AMRs for English, Chinese, Arabic, Hindi, and Urdu, both manually and automatically, funded by DARPA and NSF. These methods have also recently been applied to biomedical journal articles, clinical notes, and geo-science documents, funded by NIH and NSF. She is a co-editor of LiLT, Linguistic Issues in Language Technology, and has been on the CLJ Editorial Board and a co-editor of JNLE. She is a past President of the Association for Computational Linguistics, past Chair of SIGLEX and SIGHAN, co-organizer of the first few Sensevals, and was the Director of the 2011 Linguistics Institute held in Boulder, Colorado.
Monday, November 14, 2016 - 10:00
Speaker: Reut Tsarfaty (Open University of Israel)
Location: CSE 305
Can we program computers in our native tongue? This idea, termed natural language programming (NLPRO), has attracted attention almost since the inception of computers themselves. From the point of view of software engineering (SE), efforts to program in natural language (NL) have relied thus far on controlled natural languages (CNL) -- small unambiguous fragments of English with restricted grammars and limited expressivity. Is it possible to replace these CNLs with truly natural, human language? From the point of view of natural language processing (NLP), current technology successfully extracts static information from NL texts. However, the level of NL understanding required for programming in NL goes far beyond such extraction -- it requires human-like interpretation of dynamic processes which are affected by the environment, update states and lead to action. Is it possible to endow computers with this kind of NL understanding? These two questions are fundamental to SE and NLP, respectively. In this talk I argue that the solutions to these seemingly separate challenges are actually closely intertwined, and that one community’s challenge is the other community’s stepping stone for a huge leap and vice versa. Specifically, in this talk I propose to view executable programs in SE as semantic structures in NLP, as the basis for broad-coverage semantic parsing. I present a feasibility study on the statistical modeling of semantic parsing of requirement documents into executable scenarios, where the input documents are written in a restricted yet highly ambiguous fragment of English, and the target representation employs live sequence charts (LSC), a multi-modal visual-executable language for scenario-based programming. The parsing architecture I propose jointly models sentence-level and discourse-level processing in a generative probabilistic framework.
I empirically show that the discourse-based model consistently outperforms the sentence-based model when constructing a system that reflects the static (entities, properties) and dynamic (behavioral scenarios) requirements in the document. I conjecture that LSCs, joint sentence-discourse modeling, and statistical learning are key ingredients for effectively tackling the long-standing NLPRO challenge, and discuss ways in which NLPRO bots have the potential to change the ways humans and computers interact. Bio: Reut Tsarfaty is a senior lecturer at the department of Mathematics and Computer Science at the Open University in Israel. Reut holds a BSc. from the Technion in Israel and an MSc./PhD. from the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. She also held postdoctoral research fellowships at Uppsala University in Sweden and at the Weizmann Institute of Science in Israel. Reut is a recipient of an ERC starting grant from the EU research council, an ISF individual research grant from the Israel Science Foundation, and a MOSAIC grant from the Dutch Science Foundation (NWO). Reut is a renowned expert on statistical parsing of morphologically rich languages (PMRL). She served as a guest editor on PMRL for the Computational Linguistics Journal, and she is the author of the PMRL book (to be published by Morgan and Claypool publishers). Reut's research focuses on statistical models for morphological, syntactic and semantic parsing, and their applications, including (but not limited to) natural language programming, automated essay scoring, and natural language generation.
Thursday, November 10, 2016 - 15:30
Speaker: Mirella Lapata (University of Edinburgh)
Location: EEB 105
Movie analysis is an umbrella term for many tasks aiming to automatically interpret, extract, and summarize the content of a movie. Potential applications include generating shorter versions of scripts to help with the decision making process in a production company, enhancing movie recommendation engines by abstracting over specific keywords to more general concepts (e.g., thrillers with psychopaths), and notably generating movie previews. In this talk I will illustrate how NLP-based models together with video analysis can be used to facilitate various steps in the movie production pipeline. I will formalize the process of generating a shorter version of a movie as the task of finding an optimal chain of scenes and present a graph-based model that selects a chain by jointly optimizing its logical progression, diversity, and importance. I will then apply this framework to screenplay summarization, a task which could enhance script browsing and speed up reading time. I will also show that by aligning the screenplay to the movie, the model can generate movie previews with minimal modification. Finally, I will discuss how the computational analysis of movies can lead to tools that automatically create movie "profiles" which give a first impression of the movie by describing its plot, mood, location, or style. Mirella Lapata is a Professor at the School of Informatics at the University of Edinburgh. Her recent research interests are in natural language processing. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR). She is the first recipient (2009) of the British Computer Society and Information Retrieval Specialist Group (BCS/IRSG) Karen Sparck Jones award. She has also received best paper awards in leading NLP conferences and financial support from the EPSRC (the UK Engineering and Physical Sciences Research Council) and ERC (the European Research Council).
Monday, October 10, 2016 - 10:00
Speaker: Francis Bond (Nanyang Technological University, sabbatical at UW)
In this talk I introduce the Open Multilingual Wordnet, a large lexical network of words grouped into concepts and linked by typed semantic relations. The talk will cover how the resource has evolved over time (increases in both size and complexity) and introduce some of the latest extensions. Bio: Francis Bond is an Associate Professor at the Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore. He worked on machine translation and natural language understanding in Japan, first at Nippon Telegraph and Telephone Corporation and then at the National Institute of Information and Communications Technology, where his focus was on open source natural language processing. He is an active member of the Deep Linguistic Processing with HPSG Initiative (DELPH-IN) and the Global WordNet Association. His main research interest is in natural language understanding. Francis has developed and released wordnets for Chinese, Japanese, Malay and Indonesian and coordinates the open multilingual wordnet. He is currently on sabbatical at the University of Washington.
Monday, June 6, 2016 - 12:00
Speaker: Taylor Berg-Kirkpatrick (Semantic Machines)
While acoustic signals are continuous in nature, the ways that humans generate pitch in speech and music involve important discrete decisions. As a result, models of pitch must resolve a tension between continuous and combinatorial structure. Similarly, interpreting images of printed documents requires reasoning about both continuous pixels and discrete characters. Focusing on several different tasks that involve human artifacts, I'll present probabilistic models with this goal in mind. First, I'll describe an approach to historical document recognition that uses a statistical model of the historical printing press to reason about images, and, as a result, is able to decipher historical documents in an unsupervised fashion. Second, I'll present an unsupervised system that transcribes acoustic piano music into a symbolic representation by jointly describing the discrete structure of sheet music and the continuous structure of piano sounds. Finally, I'll present a supervised method for predicting prosodic intonation from text that treats discrete prosodic decisions as latent variables, but directly models pitch in a continuous fashion. Bio: Taylor Berg-Kirkpatrick will be starting as an Assistant Professor of Language Technologies in the School of Computer Science at Carnegie Mellon University in the Fall of 2016. Currently, Taylor is a Research Scientist at Semantic Machines Inc. He recently completed his PhD in computer science at the University of California, Berkeley, working with professor Dan Klein. Taylor's research focuses on using machine learning to understand structured human data, including language but also sources like music, document images, and other complex artifacts.
Thursday, May 12, 2016 - 10:00
Speaker: Kenji Sagae (KITT.AI)
Location: CSE 305
Interest in local classification for transition-based parsing has been renewed in the past couple of years as it provides a straightforward way to parse with neural networks. This parsing framework is simple and efficient, but allows only greedy inference. I will present a new approach for approximate structured prediction that retains these desirable aspects of greedy parsing with local classifiers, but uses local classification scores suitable for global scoring and search. This is accomplished with the introduction of error states in local training, which add information about incorrect derivation paths typically left out completely in locally-trained models. Coupled with best-first search, error state models improve both efficiency and accuracy compared to commonly used global linear models with beam search. Kenji Sagae is a co-founder of KITT.AI, a Seattle-based NLP startup. After receiving a PhD in Language Technologies from Carnegie Mellon University in 2006, he held research positions at the University of Tokyo and the University of Southern California, where he was a Computer Science research faculty member from 2010 to 2015.
Friday, April 15, 2016 - 12:00
Speaker: Philip Resnik (University of Maryland)
Location: CSE 305
According to one classic definition, framing is the use of language to "select some aspects of a perceived reality and make them more salient in a communicating text". It's a familiar notion in political science -- when politicians do it deliberately and egregiously, we call it "spin" -- but I would argue that it is also a fundamental property of linguistic communication in any setting. In this talk I'll discuss work on computational modeling of framing, primarily using extensions of Bayesian topic models, as a way of studying the connection between signals in language use and underlying mental state. This has applications not only in political science, where underlying mental state includes notions like party self-identification or ideological bias, but in mental health, where underlying mental state can include conditions such as clinical depression. Bio: Philip Resnik is Professor of Linguistics at the University of Maryland, holding a joint appointment at UMD's Institute for Advanced Computer Studies. He received his Ph.D. in Computer and Information Science at the University of Pennsylvania (1993), and has worked in industry R&D at Bolt Beranek and Newman, IBM T.J. Watson Research Center, and Sun Microsystems Laboratories. His research emphasizes combining linguistic knowledge and statistical methods in computational linguistics, with a focus on multilingual applications and computational social science. As extracurricular activities, he was a technical co-founder of CodeRyte Inc., a provider of language technology solutions in healthcare (acquired in 2012 by 3M), he has served as lead scientist for Converseon, a leading social media consultancy, and he is currently commercializing React Labs, a mobile platform for real-time polling and audience engagement.
Monday, April 11, 2016 - 12:00
Speaker: Percy Liang (Stanford)
Location: Gates Commons - CSE 691
Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I discuss recent work that circumvents this circular dependence by interleaving user interaction and learning. Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research interests include modeling natural language semantics and developing machine learning methods that infer rich latent structures from limited supervision. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), a Microsoft Research Faculty Fellowship (2014), and the best student paper at the International Conference on Machine Learning (2008).