Adaptive Parser-Centric Text Normalization
Submitted by clzhang on Thu, 2013-04-18 20:25
Text normalization is often viewed as an important first step towards supporting many natural language processing (NLP) tasks. Although many of these tasks, such as parsing, perform best on fully grammatical text, most existing text normalization approaches narrowly define the task as mapping each out-of-vocabulary non-standard word token to its in-vocabulary standard form. In this paper, we take a parser-centric view of normalization and convert raw informal text into grammatically correct text. To understand the real effect of normalization on the parser, we tie normalization performance directly to parser performance. Additionally, we design a customizable normalization framework to address the often overlooked concept of domain adaptability, and illustrate that the system can be transferred to new domains with a minimal amount of data and effort. Our experimental study over datasets from three domains demonstrates that our approach outperforms not only state-of-the-art word-to-word normalization techniques, but also manual word-to-word annotations.
Last changed Thu, 2013-04-18 20:25
