"Introduction to Natural Language Understanding"
Motivation:
1. More natural human/computer interfaces.
"Find me a good deal on a new bicycle at a dealer near my home."
2. Enable computers to learn from books and audio.
"Columbus sailed the ocean blue in the year 1492."
3. Intelligent aids to communication.
"Merci beaucoup." --> "Thank you very much."
4. Content analysis: computer programs that summarize documents and make notes of significant points.
5. Intelligent search for information on the web.
Stages in NLU
Levels of analysis for NLU:
Signal
Phoneme
Lexical
Syntax
Semantic
Speech Act (Pragmatic)
Also: Dialog
Speech understanding deals with all levels, including the signal and phoneme levels.
Text understanding begins at the lexical level.
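As a minimal sketch of where text understanding starts, the Python below tokenizes a sentence and looks each token up in a lexicon of possible word categories. The lexicon `LEXICON` and the helpers `tokenize` and `lexical_analysis` are hypothetical illustrations, not part of any particular system; a real lexical analyzer would also need morphology and a much larger vocabulary.

```python
import re

# Hypothetical toy lexicon: each word maps to the lexical categories it can take.
LEXICON = {
    "i": {"Pronoun"},
    "saw": {"Verb", "Noun"},   # past tense of "see", or a cutting tool
    "the": {"Det"},
    "man": {"Noun", "Verb"},   # a person, or "to man a ship"
    "with": {"Prep"},
    "telescope": {"Noun"},
}

def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_analysis(text):
    """Pair each token with its sorted possible categories ('Unknown' if absent)."""
    return [(tok, sorted(LEXICON.get(tok, {"Unknown"}))) for tok in tokenize(text)]

for word, categories in lexical_analysis("I saw the man with the telescope."):
    print(word, categories)
```

Even at this level ambiguity appears: "saw" and "man" each receive two possible categories, which the later levels of analysis must resolve.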
Aspects of Communication with Language
Language is one-dimensional (linear), but is used to describe multidimensional situations and events.
People seek economy of expression, often at the expense of ambiguity. In fact, ambiguity is pervasive in natural language, which makes it a central problem for NLU.
Ambiguity is often resolved using "context": knowledge about the situation.
Syntax
A language (from a formal syntactic point of view) is a set of strings over a given finite alphabet.
In order to map strings to meanings, there must be a way to group the elements of a string into units and phrases. This is usually done with grammars: sets of rules that describe string transformations, each rule replacing a symbol or substring with another substring.
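To make "rules that replace symbols with substrings" concrete, here is a minimal Python sketch of a toy context-free grammar (our own example, not one from these notes) together with a leftmost derivation: starting from S, each step rewrites the leftmost symbol that has rules, using whichever alternative the caller chooses. The names `GRAMMAR` and `leftmost_derivation` are hypothetical.

```python
# Toy context-free grammar written as rewrite rules: each nonterminal
# (key) may be replaced by any one of its listed right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}

def leftmost_derivation(choices):
    """Rewrite the leftmost nonterminal at each step, starting from S.

    `choices` gives, for each step in order, the index of the right-hand
    side to use; it must not contain more steps than there are
    nonterminals left to rewrite. Returns every sentential form produced.
    """
    form = ["S"]
    steps = [form]
    for choice in choices:
        # Locate the leftmost symbol that still has rewrite rules.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(form)
    return steps

# S -> NP VP -> Det N VP -> the N VP -> the dog VP -> ...
for form in leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 0, 1]):
    print(" ".join(form))
```

A derivation generates a string from the grammar; parsing, described next, works in the opposite direction, recovering the structure from a given string.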
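The Chomsky hierarchy gives a well-known taxonomy for classes of languages described by grammars: regular, context-free, context-sensitive, and recursively enumerable (unrestricted) languages, each class properly contained in the next.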
The process of mapping a sentence in a language into a syntactic description according to a grammar is called parsing.
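The notes do not name a parsing algorithm; as one standard choice, the sketch below uses the CYK (Cocke-Younger-Kasami) algorithm, which works for context-free grammars in Chomsky normal form. The toy grammar (`BINARY`, `LEXICAL`) and the function `cyk_count` are assumptions for illustration. Rather than building trees, it counts them, which exposes syntactic ambiguity directly.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
BINARY = [            # rules of the form A -> B C
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICAL = {           # rules of the form A -> word
    "i": {"NP"}, "saw": {"V"}, "the": {"Det"},
    "man": {"N"}, "telescope": {"N"}, "with": {"P"},
}

def cyk_count(words, start="S"):
    """Return the number of distinct parse trees rooted in `start` for `words`."""
    n = len(words)
    # chart[i][j][A] = number of ways nonterminal A can span words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a in LEXICAL.get(w, ()):
            chart[i][i + 1][a] += 1
    for length in range(2, n + 1):              # span length
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):           # split point
                for a, b, c in BINARY:
                    left = chart[i][k].get(b, 0)
                    right = chart[k][j].get(c, 0)
                    if left and right:
                        chart[i][j][a] += left * right
    return chart[0][n].get(start, 0)

print(cyk_count("i saw the man with the telescope".split()))  # 2
```

The sentence gets two parses because "with the telescope" can attach to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man has the telescope): exactly the kind of ambiguity that context must resolve.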
The most important classes of languages for NLU are the context-free and context-sensitive languages. Building grammars that handle English in a general way is a major challenge; typical "industrial" grammars have hundreds of production rules. Boeing uses a large parser to help improve the technical writing in airplane user manuals.
Because methods for syntax analysis tend to be simpler than those for semantic analysis, the fuzzy line between syntax and semantics has sometimes been pushed towards the semantics side in order to perform more of the processing with parsing technology. For example, "semantic grammars" use standard syntactic mechanisms to process phrases into semantically specific categories.
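Here is a minimal sketch of the semantic-grammar idea, assuming a hypothetical shopping domain modeled on the bicycle request from the motivation list. The rules use ordinary phrase-structure rewriting, but their categories (ITEM, VENDOR, LOCATION) are domain concepts rather than syntactic ones such as NP, so a successful parse directly yields a frame of meaningful slots. The rule set and the functions `match`, `match_seq`, and `interpret` are illustrative only.

```python
# Hypothetical semantic grammar for a toy shopping domain. Uppercase names
# are domain categories (not parts of speech); other symbols are literal words.
RULES = {
    "REQUEST": [["find", "me", "a", "good", "deal", "on", "ITEM",
                 "at", "VENDOR", "near", "LOCATION"]],
    "ITEM": [["a", "new", "PRODUCT"], ["a", "PRODUCT"]],
    "PRODUCT": [["bicycle"], ["car"]],
    "VENDOR": [["a", "dealer"], ["a", "shop"]],
    "LOCATION": [["my", "home"], ["my", "office"]],
}

def match(symbol, words, pos):
    """Yield (end, frame) for each way `symbol` can cover words[pos:end].

    `frame` maps each category matched along the way to the words it covered.
    """
    if symbol not in RULES:  # literal word: must equal the next token
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1, {}
        return
    for rhs in RULES[symbol]:
        for end, frame in match_seq(rhs, words, pos):
            yield end, {**frame, symbol: " ".join(words[pos:end])}

def match_seq(symbols, words, pos):
    """Yield (end, frame) for each way a symbol sequence covers words[pos:end]."""
    if not symbols:
        yield pos, {}
        return
    for mid, left in match(symbols[0], words, pos):
        for end, right in match_seq(symbols[1:], words, mid):
            yield end, {**left, **right}

def interpret(sentence):
    """Return the slot frame for the first full parse, or None if there is none."""
    words = sentence.lower().rstrip(".").split()
    for end, frame in match("REQUEST", words, 0):
        if end == len(words):
            return frame
    return None

print(interpret("Find me a good deal on a new bicycle at a dealer near my home."))
```

The trade-off is the one described above: the parser does more of the semantic work, but the grammar is tied to one domain and rejects sentences outside it, such as "Columbus sailed the ocean blue."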
Last modified: November 23, 1998
Steve Tanimoto