Abstract: Despite large advances in the fluency of neural text generation, existing generation techniques are prone to hallucination and often produce output that is factually incorrect or structurally incoherent. In this talk, we study this problem from the perspectives of evaluation, modeling, and robustness. We first discuss how existing evaluation metrics such as BLEU and ROUGE correlate poorly with human judgment when the reference text diverges from the information in the source, a common phenomenon in generation datasets. We propose a new metric, PARENT, which aligns n-grams to the source before computing precision and recall, making it considerably more robust to divergence. Next, we discuss modeling, proposing an exemplar-based approach to conditional text generation that aims to leverage training instances to build instance-specific decoders, which can more easily capture style and structure. Results on three datasets show that our model achieves strong performance and outperforms comparable baselines. Lastly, we discuss the generalization of neural generation in non-i.i.d. settings, focusing on the problem of zero-shot translation, a challenging setup that tests models on translation directions they were not optimized for at training time. We define the notion of zero-shot consistency and introduce a consistent agreement-based training method that improves zero-shot translation by 2-3 BLEU points over strong baselines.
Collaborators: This is joint work with Maruan Al-Shedivat, Bhuwan Dhingra, Hao Peng, Manaal Faruqui, Ming-Wei Chang, William Cohen, and Dipanjan Das.
Bio: Ankur Parikh is a Senior Research Scientist at Google NYC and an adjunct assistant professor at NYU. His primary interests are in natural language processing and machine learning. Ankur received his PhD from Carnegie Mellon in 2015 and his B.S.E. from Princeton University in 2009. He has received a best paper runner-up award at EMNLP 2014 and a best paper award in translational bioinformatics at ISMB 2011.