This paper discusses two types of costs, the cost of tests and the cost of misclassification errors. These two costs have been treated together in decision theory, but ICET is the first machine learning system that handles both costs together. The experiments in this paper have compared ICET to other machine learning systems that can handle test costs (Nunez, 1988, 1991; Tan & Schlimmer, 1989, 1990; Tan, 1993; Norton, 1989), but we have not compared ICET to other machine learning systems that can handle classification error costs (Breiman et al., 1984; Friedman & Stuetzle, 1981; Hermans et al., 1974; Gordon & Perlis, 1989; Pazzani et al., 1994; Provost, 1994; Provost & Buchanan, in press; Knoll et al., 1994). In future work, we plan to address this omission. A proper treatment of this issue would make this paper too long.
The absence of comparison with machine learning systems that can handle classification error costs has no impact on most of the experiments reported here. The experiments in this paper focussed on simple classification cost matrices (except for Section 4.2.3). When the classification cost matrix is simple and the cost of tests is ignored, minimizing cost is exactly equivalent to maximizing accuracy (see Section 2.3). Therefore, C4.5 (which is designed to maximize accuracy) is a suitable surrogate for any of the systems that can handle classification error costs.
We also did not experiment with setting the test costs to zero. However, the behavior of ICET when the penalty for misclassification errors is very high (the extreme right-hand sides of the plots in Figure 3) is necessarily the same as its behavior when the cost of tests is very low, since ICET is sensitive to the relative differences between test costs and error costs, not the absolute costs. Therefore (given the behavior we can observe in the extreme right-hand sides of the plots in Figure 3) we can expect that the performance of ICET will tend to converge with the performance of the other algorithms as the cost of tests approaches zero.
One natural addition to ICET would be the ability to output an "I don't know" class. This is easily handled by the GENESIS component, by extending the classification cost matrix so that a cost is assigned to classifying a case as "unknown". We need to also make a small modification to the EG2 component, so that it can generate decision trees with leaves labelled "unknown". One way to do this would be to introduce a parameter that defines a confidence threshold. Whenever the confidence in a certain leaf drops below the confidence threshold, that leaf would be labelled "unknown". This confidence parameter would be made accessible to the GENESIS component, so that it could be tuned to minimize average classification cost.
The mechanism in ICET for handling conditional test costs has some limitations. As it is currently implemented, it does not handle the cost of attributes that are calculated from other attributes. For example, in the Thyroid dataset (Appendix A.5), the FTI test is calculated based on the results of the TT4 and T4U tests. If the FTI test is selected, we must pay for the TT4 and T4U tests. If the TT4 and T4U tests have already been selected, the FTI test is free (since the calculation is trivial). The ability to deal with calculated test results could be added to ICET with relatively little effort.
ICET, as currently implemented, only handles two classes of test results: tests with "immediate" results and tests with "delayed" results. Clearly there can be a continuous range of delays, from seconds to years. We have chosen to treat delays as distinct from test costs, but it could be argued that a delay is simply another type of test cost. For example, we could say that a group of blood tests shares the common cost of a one-day wait for results. The cost of one of the blood tests is conditional on whether we are prepared to commit ourselves to doing one or more of the other tests in the group, before we see the results of the first test. One difficultly with this approach to handling delays is the problem of assigning a cost to the delay. How much does it cost to bring a patient in for two blood samples, instead of one? Do we include the disruption to the patient's life in our estimate of the cost? To avoid these questions, we have not treated delays as another type of test cost, but our approach does not readily handle a continuous range of delays.
The cost of a test can be a function of several things: (1) It can be a function of the prior tests that have been selected. (2) It can be a function of the actual class of the case. (3) It can be a function of other aspects of the case, where information about these other aspects may be available through other tests. (4) It can be a function of the test result. This list seems comprehensive, but there may be some possibilities we have overlooked. Let us consider each of these four possibilities.
First, the cost of a test can be a function of the prior tests that have been selected. ICET handles a special case of this, where a group of tests shares a common cost. As it is currently implemented, ICET does not handle the general case. However, we could easily add this capability to ICET by modifying the fitness function.
Second, the cost of a test can be a function of the actual class of the case. For example, a test for heart disease might involve heavy exercise (Appendix A.2). If the patient actually has heart disease, the exercise might trigger a heart attack. This risk should be included in the cost of this particular test. Thus the cost of this test should vary, depending on whether the patient actually has heart disease. We have not implemented this, although it could easily be added to ICET by modifying the fitness function.
Third, the cost of a test can be a function of the results of other tests. For example, drawing blood from a newborn is more costly than drawing blood from an adult. To assign a cost to a blood test, we need to know the age of the patient. The age of the patient can be represented as the result of another test -- the "patient-age" test. This is slightly more complex than the preceding cases, because we must now insure that the blood test is always accompanied with the patient-age test. We have not implemented this, although it could be added to ICET.
Fourth, the cost of a test can be a function of the test result. For example, injecting a radio-opaque die for an X-ray might cause an allergic reaction in the patient. This risk should be added to the cost of the test. This makes the cost of the test a function of one of the possible outcomes of the test. In a situation like this, it may be wise to precede the injection of the die with a screening test for allergies. This could be as simple as asking a question to the patient. This question may have no relevance at all for determining the correct diagnosis of the patient, but it may serve to reduce the average cost of classification. This case is similar to the third case, above. Again, we have not implemented this, although it could be added to ICET.
Attribute selection in EG2, CS-ID3, and IDX shares a common form. We
may view attribute selection as a function from
to
, which takes as input
n information gain values
(one for each attribute) and generates as output the index of one of
the attributes. We may view
and
as parameters in the attribute selection
function. These parameters may be used to control the bias of the
attribute selection procedure. In this view, ICET uses GENESIS to tune
the parameters of EG2's attribute selection function. In the future,
we would like to investigate more general attribute selection
functions. For example, we might use a neural network to implement a
function from
to
. GENESIS would then be used to tune the weights in the neural network. The
attribute selection function might also benefit from the addition of an
input that specifies the depth of the decision tree at the current
node, where the information gain values are measured. This would enable
the bias for a test to vary, depending on how many tests have already
been selected.
Another area for future work is to explore the parameter settings that control GENESIS (Table 4). There are many parameters that could be adjusted in GENESIS. We think it is significant that ICET works well with the default parameter settings in GENESIS, since it shows that ICET is robust with respect to the parameters. However, it might be possible to substantially improve the performance of ICET by tuning some of these parameters. A recent trend in genetic algorithm research is to let the genetic algorithm adjust some of its own parameters, such as mutation rate and crossover rate (Whitley et al., 1993). Another possibility is to stop breeding when the fitness levels stop improving, instead of stopping after a fixed number of generations. Provost and Buchanan (in press) use a goodness measure as a stopping condition for the bias space search.