Our method for calculating cost does not explicitly deal with accuracy;
however, we can handle accuracy as a special case. If the test cost is
set to $0.00 for all tests and the classification cost matrix is set to
a positive constant value k when the guess class i
does not equal the actual class j, but it is set to $0.00 when
i equals j, then the average total cost of using the
decision tree is pk, where
is the frequency of errors on the testing dataset and 100(1-p)
is the percentage accuracy on the testing dataset. Thus there is a
linear relationship between average total cost and percentage accuracy,
in this situation.
More generally, let C be a classification cost matrix that has
cost x on the diagonal,
, and
cost y off the diagonal,
,
where x is less than y,
. We will call this type
of classification cost matrix a simple classification cost
matrix. A cost matrix that is not simple will be called a
complex classification cost matrix. When we have a simple cost
matrix and test costs are zero (equivalently, test costs are ignored),
minimizing cost is exactly equivalent to maximizing accuracy.
It follows from this that an algorithm that is sensitive to misclassification error costs but ignores test costs (Breiman et al., 1984; Friedman & Stuetzle, 1981; Hermans et al., 1974; Gordon & Perlis, 1989; Pazzani et al., 1994; Provost, 1994; Provost & Buchanan, in press; Knoll et al., 1994) will only be interesting when we have a complex cost matrix. If we have a simple cost matrix, an algorithm such as CART (Breiman et al., 1984) that is sensitive to misclassification error cost has no advantage over an algorithm such as C4.5 (Quinlan, 1992) that maximizes accuracy (assuming other differences between these two algorithms are negligible). Most of the experiments in this paper use a simple cost matrix (the only exception is Section 4.2.3). Therefore we focus on comparison of ICET with algorithms that are sensitive to test cost (IDX, CS-ID3, and EG2). In future work, we will examine complex cost matrices and compare ICET with algorithms that are sensitive to misclassification error cost.
It is difficult to find information on the costs of misclassification errors in medical practice, but it seems likely that a complex cost matrix is more appropriate than a simple cost matrix for most medical applications. This paper focuses on simple cost matrices because, as a research strategy, it seems wise to start with the simple cases before we attempt the complex cases.
Provost (Provost, 1994; Provost & Buchanan, in press) combines accuracy and classification error cost using the following formula:
In this formula, A and B are arbitrary weights that the user can set for a particular application. Both "accuracy" and "cost", as defined by Provost (Provost, 1994; Provost & Buchanan, in press), can be represented using classification cost matrices. We can represent "accuracy" using any simple cost matrix. In interesting applications, "cost" will be represented by a complex cost matrix. Thus "score" is a weighted sum of two classification cost matrices, which means that "score" is itself a classification cost matrix. This shows that equation (1) can be handled as a special case of the method presented here. There is no loss of information in this translation of Provost's formula into a cost matrix. This does not mean that all criteria can be represented as costs. An example of a criterion that cannot be represented as a cost is stability (Turney, in press).