"Case Based Reasoning"
Motivation: Reuse of Past Problem-Solving
Results and Experience
In professions such as law or business, problem solving may involve preparing a legal defense, or planning for a corporate merger. Rather than applying a scientific theory, lawyers and chief executive officers typically recall previous experience in the form of "cases."
A case is a previously solved problem together with the solution and, ideally, some evaluation of how effective the solution turned out to be.
Referring to a previous case can be very useful when the problem at hand is very difficult to solve yet there is a highly similar problem in a case for which an effective solution was found.
Using a previously solved problem to solve a new problem is analogous to software reuse or general design reuse. The reuse saves problem-solving effort or improves the likeliness of successful solution. The reuse may of course involve some adaptation to the old design so that it fits the new requirements.
Fundamental Components of a CBR System
Case based-reasoning (a.k.a. case-based problem-solving) requires the following components:
1. a database of cases. These cases represent previous problem-solving experience.
2. a means to take a new problem and retrieve from the database those cases that are most likely to be helpful in solving the new problem.
3. a means to adapt an old solution
(from one of the retrieved cases) to meet the requirements of the new problem.
Case Representation
One way to represent a case is with
a triple...
(problem, solution, result)
where "result" refers to the relative
success or failure of the solution, according to some evaluation that presumably
took place after the solution was administered.
The representations of the problem
and solution will be application-dependent.
Problems are often specified by
listing a number of properties the solution must have or constraints that
it must meet. If there are a fixed number of such properties, then
the problem can be described by a vector of values.
A solution typically consists of an arrangement of parts: a plan, an object, etc. If the number and types of parts is known in advance, then the solution can also be represented as a vector.
The result could be represented briefly using a binary value from {SUCCESS, FAILURE}, or it reflect more gradations of judgment (e.g., with integers in the range 0 to 10). It could also be a vector itself, if there are a fixed number of evaluation criteria and values are available for each criterion.
For the example of planning a menu
for a dinner party, the representation of a case can have a specific format
such as that shown on p.377 of the text.
Distance Functions
The problem of determining which cases from the database best match a new problem requires a means to compare the current problem with the old problems in the case database.
It is customary to compute some kind of "distance" value between the new problem and some or all of the problems in the case database.
The traditional mathematical definition of a distance function is a nonnegative real-valued function of two arguments of the same type such that...
1. d(x, x) = 0;
2. d(x, y) = d(y, x)
3. d(x, z) <= d(x,
y) + d(y, z) "triangle inequality"
Here x is a point in some kind of space. For case-based reasoning, x should be a problem in a space of possible problems.
It is usually up to the designer of a case-based reasoning system to design a suitable function d(x, y) that will capture numerically the relevant notion of similarity between problems that matches the application.
One approach is to use a weighted sum of component distances, where each component distance measures the distance between a pair of corresponding attributes of the problems.
d(x, y) = a1 d1(x, y) + a2 d2(x, y) + . . . + an dn(x, y)
The individual component distances are simpler to design because they each deal only with one dimenion of the problem space.
Because of a natural asymmetry in
the relationship between the problem to be solved and the problems in the
case database, the symmetry condition for a distance metric can be relaxed,
if desired.
Some designers also relax the constraint
that the triangle inequality must hold.
Some of the more interesting design
problems involve the distance functions for symbolic objects such as words
and phrases or irregularly structured objects such as graphs.
Adaptation of Old Solutions
Once a case has been retrieved from the database, its solution may need to be adjusted in response to two factors: (1) the differences between the new problem and the problem in the stored case, and (2) the degree of success in the result from the stored case. If the result actually includes a "lesson learned", then this lesson should be taken to heart in handling the new problem.
Assuming that the stored solution was a successful one for its problem, adaptation for a new problem need deal only with item (1) of the above. If the new problem poses additional constraints over those of the old problem, then the old solution may need to simply be adjusted in such a way that it meets the new constraints, without violating any of the old problem constraints.
It may help to have a kind of expert
system built into the case-based reasoner so that it can take advantage
of knowledge (perhaps in the form of rules) to transform the old solution
into a new one appropriate to the new problem.
Indexing
As a case database grows larger, there comes a point where a linear search through the list of old problems becomes a very slow way to find the relevant cases. Consequently, it eventually pays to provide an organizational scheme for the database to speed up retrieval. This is usually done with an index.
There are lots of strategies of indexing databases of complicated objects. However, it is traditional to identify one component of such an object to be its "primary key" and to organize the index around that.
With objects of complex structure, such as labelled graphs, it may be helpful to compute scalar features from the graphs and index on those values -- number of nodes, number of edges, average valence of nodes, etc.
The more complicated the objects,
the greater is the need to provide taxonomies or ontologies for the various
components. Means to classify an object are an important part of
placing its index entries logically in the index.
Last modified: December 3, 1998
Steve Tanimoto