Table of Contents

2. Cost-Sensitive Classification


2.1 Decision Trees and Cost-Sensitive Classification

The decision trees used in decision theory (Pearl, 1988) are somewhat different from the classification decision trees that are typically used in machine learning (Quinlan, 1992). When we refer to decision trees in this paper, we mean the standard classification decision trees of machine learning. The claims we make here about classification decision trees also apply to decision theoretical decision trees, with some modification. A full discussion of decision theoretical decision trees is outside of the scope of this paper.

The decision to do a test must be based on both the cost of tests and the cost of classification errors. If a test costs $10 and the maximum penalty for a classification error is $5, then there is clearly no point in doing the test. On the other hand, if the penalty for a classification error is $10,000, the test may be quite worthwhile, even if its information content is relatively low. Past work with algorithms that are sensitive to test costs (Nunez, 1988, 1991; Tan, 1993; Norton, 1989) has overlooked the importance of also considering the cost of classification errors.

When tests are inexpensive, relative to the cost of classification errors, it may be rational to do all tests (i.e., measure all features; determine the values of all attributes) that seem possibly relevant. In this kind of situation, it is convenient to separate the selection of tests from the process of making a classification. First we can decide on the set of tests that are relevant, then we can focus on the problem of learning to classify a case, using the results of these tests. This is a common approach to classification in the machine learning literature. Often a paper focuses on the problem of learning to classify a case, without any mention of the decisions involved in selecting the set of relevant tests.

When tests are expensive, relative to the cost of classification errors, it may be suboptimal to separate the selection of tests from the process of making a classification. We may be able to achieve much lower costs by interleaving the two. First we choose a test, then we examine the test result. The result of the test gives us information, which we can use to influence our choice for the next test. At some point, we decide that the cost of further tests is not justified, so we stop testing and make a classification.

When the selection of tests is interleaved with classification in this way, a decision tree is the natural form of representation. The root of the decision tree represents the first test that we choose. The next level of the decision tree represents the next test that we choose. The decision tree explicitly shows how the outcome of the first test determines the choice of the second test. A leaf represents the point at which we decide to stop testing and make a classification.

Decision theory can be used to define what constitutes an optimal decision tree, given (1) the costs of the tests, (2) the costs of classification errors, (3) the conditional probabilities of test results, given sequences of prior test results, and (4) the conditional probabilities of classes, given sequences of test results. However, searching for an optimal tree is infeasible (Pearl, 1988). ICET was designed to find a good (but not necessarily optimal) tree, where "good" is defined as "better than the competition" (i.e., IDX, CS-ID3, and EG2).


2.2 Calculating the Average Cost of Classification