Table of Contents

5. Discussion


5.1 Related Work

There are several other algorithms that are sensitive to test costs (Nunez, 1988, 1991; Tan & Schlimmer, 1989, 1990; Tan, 1993; Norton, 1989). As we have discussed, the main limitation of these algorithms is that they do not consider the cost of classification errors. We cannot rationally determine whether a test should be performed until we know both the cost of the test and the cost of classification errors.

There are also several algorithms that are sensitive to classification error costs (Breiman et al., 1984; Friedman & Stuetzle, 1981; Hermans et al., 1974; Gordon & Perlis, 1989; Pazzani et al., 1994; Provost, 1994; Provost & Buchanan, in press; Knoll et al., 1994). None of these algorithms consider the cost of tests. Therefore they all focus on complex classification cost matrices, since, when tests have no cost and the classification error matrix is simple, the problem reduces to maximizing accuracy.

The FIS system (Pipitone et al., 1991) attempts to find a decision tree that minimizes the average total cost of the tests required to achieve a certain level of accuracy. This approach could be implemented in ICET by altering the fitness function. The main distinction between FIS (Pipitone et al., 1991) and ICET is that FIS does not learn from data. The information gain of a test is estimated using a qualitative causal model, instead of training cases. Qualitative causal models are elicited from domain experts, using a special knowledge acquisition tool. When training data are available, ICET can be used to avoid the need for knowledge acquisition. Otherwise, ICET is not applicable and the FIS approach is suitable.

Another feature of ICET is that it does not perform purely greedy search. Several other authors have proposed non-greedy classification algorithms (Tcheng et al., 1989; Ragavan & Rendell, 1993; Norton, 1989; Schaffer, 1993; Rymon, 1993; Seshu, 1989). In general, these results show that there can be an advantage to more sophisticated search procedures. ICET is different from these algorithms in that it uses a genetic algorithm and it is applied to minimizing both test costs and classification error costs.

ICET uses a two-tiered search strategy. At the bottom tier, EG2 performs a greedy search through the space of classifiers. On the second tier, GENESIS performs a non-greedy search through a space of biases. The idea of a two-tiered search strategy (where the first tier is search in classifier space and the second tier is search in bias space) also appears in (Provost, 1994; Provost & Buchanan, in press; Aha & Bankert, 1994; Schaffer, 1993). Our work goes beyond Aha and Bankert (1994) by considering search in a real bias space, rather than search in a binary space. Our work fits in the general framework of Provost and Buchanan (in press), but differs in many details. For example, their method of calculating cost is a special case of ours (Section 2.3).

Other researchers have applied genetic algorithms to classification problems. For example, Frey and Slate (1991) applied a genetic algorithm (in particular, a learning classifier system (LCS)) to letter recognition. However, Fogarty (1992) obtained higher accuracy using a simple nearest neighbor algorithm. More recent applications of genetic algorithms to classification have been more successful (De Jong et al., 1993). However, the work described here is the first application of genetic algorithms to the problem of cost-sensitive classification.

We mentioned in Section 2.1 that decision theory may be used to define the optimal solution to the problem of cost-sensitive classification. However, searching for the optimal solution is computationally infeasible (Pearl, 1988). We attempted to take a decision theoretic approach to this problem by implementing the AO* algorithm (Pearl, 1984) and designing a heuristic evaluation function to speed up the AO* search (Lirov & Yue, 1991). We were unable to make this approach execute fast enough to be practical.

We also attempted to apply genetic programming (Koza, 1992) to the problem of cost-sensitive classification. Again, we were unable to make this approach execute fast enough to be practical, although it was faster than the AO* approach.

The cost-sensitive classification problem, as we have treated it here, is essentially a problem in reinforcement learning (Sutton, 1992; Karakoulas, in preparation). The average cost of classification, measured as described in Section 2.2, is a reward/punishment signal that could be optimized using reinforcement learning techniques. This is something that might be explored as an alternative approach.


5.2 Future Work