Much of the motivation for extending CSP comes from our work in spoken language parsing [Harper & Helzerman, 1995a, Harper, Jamieson, Zoltowski, & Helzerman, 1992, Zoltowski, Harper, Jamieson, & Helzerman, 1992]. The output of a hidden-Markov-model-based speech recognizer can be thought of as a lattice of word candidates. Unfortunately, a lattice contains many word candidates that can never appear in a sentence covering the duration of a speech utterance. By converting the lattice to a word graph, many word candidates in the lattice can be eliminated. Figure 7 depicts a word graph constructed from a simple lattice. Notice that the word tour can be eliminated when the word graph is constructed. In order to accommodate words that occur over time intervals that may overlap, each word's position in the lattice is now represented as a tuple (b,e) such that b<e. The positional relations defined for constraints are easily modified to operate on tuples [Harper & Helzerman, 1995a].
Figure 7: Multiple sentence hypotheses can be parsed simultaneously
by applying constraints over a word graph rather than individual
sentences extracted from a lattice.
After construction, the word graph often contains spurious sentence hypotheses which can be pruned by using a variety of constraints (e.g., syntactic, semantic, etc.). We can apply constraints to individual sentences to rule out those that are ungrammatical; however, individually processing each sentence hypothesis is inefficient since many have a high degree of similarity. If the spoken language parsing problem is structured as a MUSE CSP problem, then the constraints used to parse individual sentences would be applied to the word graph of sentence hypotheses, eliminating from further consideration many hypotheses which are ungrammatical.
We have developed a MUSE CSP constraint-based parser, PARSEC [Harper & Helzerman, 1995a, Harper & Helzerman, 1995b, Harper, Jamieson, Zoltowski, & Helzerman, 1992, Zoltowski, Harper, Jamieson, & Helzerman, 1992], which is capable of parsing word graphs containing multiple sentences produced by a speech recognition module. We have developed syntactic and semantic constraints for parsing single sentences, which when applied to a word graph, eliminate those hypotheses that are syntactically or semantically incorrect. The MUSE CSP used by our parser can be thought of as a parse forest which is pruned by using constraints. By applying constraints from a wide variety of knowledge sources, the parser prunes the composite structure of many of the role values associated with a role, as well as word nodes with no remaining role values. Several experiments [Harper, Jamieson, Zoltowski, & Helzerman, 1992, Zoltowski, Harper, Jamieson, & Helzerman, 1992] have considered how effective syntactic and semantic constraints are at pruning word nodes that can appear in no sentence hypothesis. For our work in speech processing, the MUSE arc consistency algorithm is very effective at pruning the role values from the composite structure that can never appear in a parse for a sentence (i.e., an individual CSP). Constraints are usually tight enough that MUSE arc consistency eliminates role values that do not participate in at least one parse for the represented sentences.
MUSE CSP is a useful way to process multiple sentences because the arc consistency algorithm is effective at eliminating role values that cannot appear in sentence parses. Several factors contribute to the effectiveness of the arc consistency algorithm for this problem. First, the syntactic constraints are fairly tight constraints. Second, the role values contain some segmental information that constrain the problem. Consider the word graph in Figure 8. The value s-(3,4) associated with the role marked N for the word are cannot support any of the values for the role marked G for the word dogs at position (3,5), because it is not legal in a segment involving position (3,5). In the figure, we mark those entries where a value associated with one role is segmentally incompatible with the values of another with an N. These entries are equivalent to 0. Third, many times constraints create symmetric dependencies between words in the sentence. For example, one constraint might indicate that a verb needs a subject to its left, and another that a subject must be governed by a verb to its right.
Figure 8: In parsing word graphs, some of the values assigned to roles contain
segmental information which make them incompatible with the values
associated with some of the other roles. For example, s-(3,4) cannot support any of the
values associated with the G or N roles of the word dogs.