The prototypical example of the problem of cost-sensitive classification is medical diagnosis, where a doctor would like to balance the costs of various possible medical tests with the expected benefits of the tests for the patient. There are several aspects to this problem: When does the benefit of a test, in terms of more accurate diagnosis, justify the cost of the test? When is it time to stop testing and make a commitment to a particular diagnosis? How much time should be spent pondering these issues? Does an extensive examination of the various possible sequences of tests yield a significant improvement over a simpler, heuristic choice of tests? These are some of the questions investigated here.
The words "cost", "expense", and "benefit" are used in this paper in the broadest sense, to include factors such as quality of life, in addition to economic or monetary cost. Cost is domain-specific and is quantified in arbitrary units. It is assumed here that the costs of tests are measured in the same units as the benefits of correct classification. Benefit is treated as negative cost.
This paper introduces a new algorithm for cost-sensitive classification, called ICET (Inexpensive Classification with Expensive Tests -- pronounced "iced tea"). ICET uses a genetic algorithm (Grefenstette, 1986) to evolve a population of biases for a decision tree induction algorithm (a modified version of C4.5, Quinlan, 1992). The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET has the following features: (1) It is sensitive to test costs. (2) It is sensitive to classification error costs. (3) It combines a greedy search heuristic with a genetic search algorithm. (4) It can handle conditional costs, where the cost of one test is conditional on whether a second test has been selected yet. (5) It distinguishes tests with immediate results from tests with delayed results.
The problem of cost-sensitive classification arises frequently. It is a problem in medical diagnosis (Nunez, 1988, 1991), robotics (Tan & Schlimmer, 1989, 1990; Tan, 1993), industrial production processes (Verdenius, 1991), communication network troubleshooting (Lirov & Yue, 1991), machinery diagnosis (where the main cost is skilled labor), automated testing of electronic equipment (where the main cost is time), and many other areas.
There are several machine learning algorithms that consider the costs of tests, such as EG2 (Nunez, 1988, 1991), CS-ID3 (Tan & Schlimmer, 1989, 1990; Tan, 1993), and IDX (Norton, 1989). There are also several algorithms that consider the costs of classification errors (Breiman et al., 1984; Friedman & Stuetzle, 1981; Hermans et al., 1974; Gordon & Perlis, 1989; Pazzani et al., 1994; Provost, 1994; Provost & Buchanan, in press; Knoll et al., 1994). However, there is very little work that considers both costs together.
There are good reasons for considering both the costs of tests and the costs of classification errors. An agent cannot rationally determine whether a test should be performed without knowing the costs of correct and incorrect classification. An agent must balance the cost of each test with the contribution of the test to accurate classification. The agent must also consider when further testing is not economically justified. It often happens that the benefits of further testing are not worth the costs of the tests. This means that a cost must be assigned to both the tests and the classification errors.
Another limitation of many existing cost-sensitive classification algorithms (EG2, CS-ID3) is that they use greedy heuristics, which select at each step whatever test contributes most to accuracy and least to cost. A more sophisticated approach would evaluate the inter- actions among tests in a sequence of tests. A test that appears useful considered in isolation, using a greedy heuristic, may not appear as useful when considered in combination with other tests. Past work has demonstrated that more sophisticated algorithms can have superior performance (Tcheng et al., 1989; Ragavan & Rendell, 1993; Norton, 1989; Schaffer, 1993; Rymon, 1993; Seshu, 1989; Provost, 1994; Provost & Buchanan, in press).
Section 2 discusses why a decision tree is the natural form of knowledge representation for classification with expensive tests and how we measure the average cost of classification of a decision tree. Section 3 introduces the five algorithms that we examine here, C4.5 (Quinlan, 1992), EG2 (Nunez, 1991), CS-ID3 (Tan & Schlimmer, 1989, 1990; Tan, 1993), IDX (Norton, 1989), and ICET. The five algorithms are evaluated empirically on five real-world medical datasets. The datasets are discussed in detail in Appendix A. Section 4 presents three sets of experiments. The first set (Section 4.1) of experiments examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors for the given datasets. The second set (Section 4.2) tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set (Section 4.3) looks at ICET's search in bias space and discovers a way to improve the search. We then discuss related work and future work in Section 5. We end with a summary of what we have learned with this research and a statement of the general motivation for this type of research.