4.2. The Running Time, Space Complexity, and Correctness of MUSE PC-1

5. Combining CSPs into a MUSE CSP

Problems which have an inherent lattice structure or problems which can be solved by the node splitting approach are natural areas of application for MUSE CSP, because an exponential number of CSPs are replaced by a single instance of MUSE CSP, and the DAG representation of tex2html_wrap_inline18743 is inherent in the problem. In this section we discuss DAG construction for other application areas which would benefit from the MUSE CSP approach, but for which it is not as obvious how to construct the DAG. Any set of CSP problems can be used as the segments of a MUSE CSP. For example, Figure 35 illustrates how two instances of a CSP can be combined into a single MUSE CSP. However, using MUSE CSP for this example would not be the right choice; node sharing cannot offset the cost of using the extra MUSE AC-1 data structures.

   figure16942
Figure 35: An example set of CSP problems which would not be a good candidate for MUSE CSP because of the lack of node sharing.

   figure16969
Figure 36: An example of how maximal node sharing leads to spurious segments. The first DAG contains two paths, {1,2,3} and {2}, which correspond with none of the segments. The second DAG presents a preferred sharing as created by the CREATE-DAG routine.

Multiple nodes which have the same name in various CSPs can potentially be represented as a single node in a MUSE CSP. We assume that if two nodes, tex2html_wrap_inline22157 and tex2html_wrap_inline22159 are given the same name (say k) in two instances of CSP, then they have the same domain and obey the same constraints, i.e.:

  1. tex2html_wrap_inline22163 (i.e., their domains are equal.)
  2. tex2html_wrap_inline22165 for every tex2html_wrap_inline22167 (i.e., their unary constraints are the same.)
  3. tex2html_wrap_inline22169 for labels tex2html_wrap_inline22167 and tex2html_wrap_inline22173 , where i is in both segments (i.e., their binary constraints are the same.)
However, as illustrated in Figure 36, too much sharing of common nodes can introduce additional segments that do not appear in the original list of CSPs. Because the extra segments can cause extra work to be done, it is often desirable to create a DAG which shares nodes without introducing extra segments. The algorithm CREATE-DAG, shown in Figure 38 takes an arbitrary set of CSP problems as input (a list of segments), and outputs a DAG representation for those CSPs which shares nodes without introducing spurious segments. CREATE-DAG calls an auxiliary procedure ORDER-SIGMA defined in Figure 39. The data structures used in these two routines are defined in Figure 37.

   figure17023
Figure 37: Data structures used in CREATE-DAG and ORDER-SIGMA.

    figure17114
Figure 38: Routine to create a DAG to represent tex2html_wrap_inline18743 .

   figure17202
Figure 39: The routine to arrange the nodes within the segments for convenient merging.

To hold the individual segments in tex2html_wrap_inline18743 , the routine CREATE-DAG uses a special data structure for ordered sets which supports some useful operations. If tex2html_wrap_inline22405 is a segment and n is an integer, then tex2html_wrap_inline22409 is the node at position n in tex2html_wrap_inline22405 . tex2html_wrap_inline22415 is always the start node, and tex2html_wrap_inline22417 is always the end node. tex2html_wrap_inline22419 is the ordered subset of tex2html_wrap_inline22405 consisting of all the nodes in positions k through m. In addition, the ordered set allows us to insert a node i immediately after any node j which is already in the set. Each node tex2html_wrap_inline22431 is a structure with a name field and a next-set field, which is the set of names of nodes that follow the node tex2html_wrap_inline22431 in a set of segments.

CREATE-DAG begins by adding special purpose start and end nodes to each segment. It then calls the routine ORDER-SIGMA shown in Figure 39 to order the nodes in each segment. ORDER-SIGMA orders the nodes of each segment such that the ones that are the most common tend to occur earlier in the set. To order the elements, it uses the operator > (i.e., larger than) which is defined on nodes. Note that the start node is defined to be the ``largest'' node, and the end node the ``smallest'' node. In addition, i > j means either that i appears in more segments than j does, or if they both appear in the same number of segments, then i has a lower ordinal number than j. Thus the operator > induces a total ordering on the nodes in N.

When ORDER-SIGMA is first called by CREATE-DAG it selects the largest node i which is smaller than the start node. It then constructs the set S, which is a set of those segments containing i. At this point, the segments in S are ordered such that the start node is first and i is second. It then calls ORDER-SIGMA to order S for nodes smaller than i. Once the recursive call is done, any segments that were not in S are considered (i.e., Z). Note that after the first iteration of the loop, there is a preference to select the largest node that was not contained in the segments that were ordered by the recursive call to ORDER-SIGMA. These items are independent of the ordered segments, and so will not create spurious paths when placed early in the DAG; however, items that occur in the already ordered segments, if placed earlier than items that do not occur in the ordered segments would tend to introduce spurious paths. The while loop continues until all segments in tex2html_wrap_inline18743 are ordered. The worst-case running time of ORDER-SIGMA is O tex2html_wrap_inline20597 , where n is the sum of the cardinalities of the segments in tex2html_wrap_inline18743 .

Once ORDER-SIGMA orders the nodes in the segments, CREATE-DAG begins to construct the DAG, which is represented as a set of nodes N and a set of directed edges G. The DAG is constructed by going through each segment beginning with the position of the second element (the position after start). The for loop on line 3 looks at nodes in a left to right order, one position at a time, until all the elements of each segment have been added to G. If a node with a certain name has not already been placed in N (i.e., the set of nodes already in the DAG being created) then adding the node to the graph (as well as a directed edge between tex2html_wrap_inline22485 and tex2html_wrap_inline22431 to G) cannot create any spurious paths in the DAG. On the other hand, if a node with the same name as tex2html_wrap_inline22431 had already been placed in N, then it is possible that the current segment could add paths to the DAG that do not correspond to any of the segments in tex2html_wrap_inline18743 . To avoid adding spurious segments, we deal with all segments at one time that share the same previous node and have a node with the same name at the current position. The basic idea is to add that edge only once and to keep track of all nodes that can follow that node in the DAG. By doing this, we can easily determine whether that same node can be used if it occurs in another segment in a later position. The same node can be used only if it is followed by precisely the same set of next nodes that follow the node already placed in the graph; otherwise, the second node would have to be renamed to avoid adding spurious segments. In such an event, we create a new name for the node.

Note that once the DAG is complete, we eliminate the start and end nodes from G (and their corresponding outgoing and incoming edges) to make G consistent with its use in the MUSE arc consistency and MUSE path consistency algorithms. The running time of CREATE-DAG is also O tex2html_wrap_inline20597 , where n is the sum of the cardinalities of the segments in tex2html_wrap_inline18743 .

Even though the DAGs produced by the routine CREATE-DAG do have nice properties, this routine should probably be used only as a starting point for custom combining routines which are specific to the intended application area. We believe that domain-specific information can play an important role in MUSE combination. An example of a domain specific combining algorithm is presented in [Harper, Jamieson, Zoltowski, & Helzerman, 1992], which describes a spoken-language parsing system which uses MUSE CSP. A distinguishing feature of this application's combining algorithm is that instead of avoiding the creation of extra segments, it allows controlled introduction of extra segments because the extra segments often represent sentences which an N-Best sentence spoken language recognition system would miss.



6. Conclusion