Problems which have an inherent lattice structure or problems
which can be solved by the node splitting approach are natural areas
of application for MUSE CSP, because an exponential number of CSPs are
replaced by a single instance of MUSE CSP, and the DAG representation
of
is inherent in the problem. In this section we discuss DAG
construction for other application areas which would benefit from the
MUSE CSP approach, but for which it is not as obvious how to construct
the DAG. Any set of CSP problems can be used as the segments of a MUSE
CSP. For example, Figure 35 illustrates how two
instances of a CSP can be combined into a single
MUSE CSP. However, using MUSE CSP for this example would not be the
right choice; node sharing cannot offset the cost of
using the extra MUSE AC-1 data structures.
Figure 35: An example set of CSP problems which would not be a good
candidate for MUSE CSP because of the lack of node sharing.
Figure 36: An example of how maximal node sharing leads to spurious
segments. The first DAG contains two paths, {1,2,3} and {2}, which correspond
with none of the segments. The second DAG presents a preferred sharing
as created by the CREATE-DAG routine.
Multiple nodes which have the same name in various CSPs can potentially be
represented as a single node in a MUSE CSP.
We assume that if two nodes,
and
are given
the same name (say k) in two instances of CSP, then they have the
same domain and obey the same constraints, i.e.:
Figure 37: Data structures used in CREATE-DAG and ORDER-SIGMA.
Figure 38: Routine to create a DAG to represent
.
Figure 39: The routine to arrange the nodes within the segments for
convenient merging.
To hold the individual segments in
, the routine CREATE-DAG uses a
special data structure for ordered sets which supports some useful
operations. If
is a segment and n is an integer, then
is the node at position n in
.
is always the
start node, and
is always the end
node.
is the ordered subset of
consisting of
all the nodes in positions k through m. In addition, the ordered
set allows us to insert a node i immediately after any node j which
is already in the set. Each node
is a structure with a
name field and a next-set
field, which is the set of names of nodes that follow the node
in a set of segments.
CREATE-DAG begins by adding special purpose start and end nodes to each segment. It then calls the routine ORDER-SIGMA shown in Figure 39 to order the nodes in each segment. ORDER-SIGMA orders the nodes of each segment such that the ones that are the most common tend to occur earlier in the set. To order the elements, it uses the operator > (i.e., larger than) which is defined on nodes. Note that the start node is defined to be the ``largest'' node, and the end node the ``smallest'' node. In addition, i > j means either that i appears in more segments than j does, or if they both appear in the same number of segments, then i has a lower ordinal number than j. Thus the operator > induces a total ordering on the nodes in N.
When ORDER-SIGMA is first called by CREATE-DAG it selects the
largest node i which is smaller than the start node. It then constructs
the set S, which is a set of those segments containing i. At this point,
the segments in S are ordered such that the start node is first and i
is second. It then calls ORDER-SIGMA to order S for nodes smaller
than i. Once the recursive call is done, any segments that were not
in S are considered (i.e., Z). Note that after the first
iteration of the loop, there is a preference to select the
largest node that was not contained in the segments that were ordered
by the recursive call to ORDER-SIGMA. These items are independent of
the ordered segments, and so will not create spurious paths when placed
early in the DAG; however, items that occur in the already ordered
segments, if placed earlier than items that do not occur in the
ordered segments would tend to introduce spurious paths.
The while loop continues until all segments in
are ordered.
The worst-case running time of ORDER-SIGMA is O
, where n is the sum of
the cardinalities of the segments in
.
Once ORDER-SIGMA orders the nodes in the segments, CREATE-DAG
begins to construct the DAG, which is represented as a set of nodes
N and a set of directed edges G. The DAG is constructed by going
through each segment beginning with the position of the second element
(the position after start). The for loop on line 3 looks at nodes in a left to right order, one
position at a time, until all the elements of each segment have been added to
G. If a node with a certain name has not already been placed in
N (i.e., the set of nodes already in the DAG being created) then
adding the node to the graph (as well as a directed edge between
and
to G) cannot create any spurious paths in the DAG. On the other hand,
if a node with the same name as
had already been placed
in N, then it is possible that the current segment
could add paths to the DAG that do not correspond to any of the segments
in
. To avoid adding spurious segments, we deal with all
segments at one time that share the same previous node
and have a node with the same
name at the current position. The basic idea is to add that edge only
once and to keep track of all nodes that can follow that node in
the DAG. By doing this, we can easily determine whether that same
node can be used if it occurs in another segment in a later position.
The same node can be used only if it is followed by precisely the same set of
next nodes that follow the node already placed in the graph; otherwise,
the second node would have to be renamed to avoid adding spurious segments.
In such an event, we create a new name for the node.
Note that once the DAG is complete, we eliminate the
start and end nodes from G (and their corresponding outgoing and
incoming edges) to make G consistent with its use in the MUSE
arc consistency and MUSE path consistency algorithms.
The running time of CREATE-DAG is also O
, where n is the sum of
the cardinalities of the segments in
.
Even though the DAGs produced by the routine CREATE-DAG do have nice properties, this routine should probably be used only as a starting point for custom combining routines which are specific to the intended application area. We believe that domain-specific information can play an important role in MUSE combination. An example of a domain specific combining algorithm is presented in [Harper, Jamieson, Zoltowski, & Helzerman, 1992], which describes a spoken-language parsing system which uses MUSE CSP. A distinguishing feature of this application's combining algorithm is that instead of avoiding the creation of extra segments, it allows controlled introduction of extra segments because the extra segments often represent sentences which an N-Best sentence spoken language recognition system would miss.