4.2.2 No Group Discounts


4.2.3 Complex Classification Cost Matrices

So far, we have only used simple classification cost matrices, where the penalty for a classification error is the same for all types of error. This assumption is not inherent in ICET. Each element in the classification cost matrix can have a different value. In this experiment, we explore ICET's behavior when the classification cost matrix is complex.
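The distinction between simple and complex matrices can be made concrete with a small sketch. Everything here is illustrative rather than taken from the experiments: the dollar values, the names `simple_matrix`, `complex_matrix`, and `is_simple`, the `cost[predicted][actual]` indexing convention, and the zero cost for correct classifications are all assumptions.

```python
# Hypothetical two-class example; all dollar values are invented for illustration.
# Assumed convention: cost[predicted][actual] is the penalty for predicting
# `predicted` when the true class is `actual`. Correct guesses are assumed free.
simple_matrix = {
    "healthy": {"healthy": 0.0, "sick": 50.0},
    "sick": {"healthy": 50.0, "sick": 0.0},
}
complex_matrix = {
    "healthy": {"healthy": 0.0, "sick": 400.0},
    "sick": {"healthy": 50.0, "sick": 0.0},
}


def is_simple(matrix):
    """Return True when every error (off-diagonal) entry has the same value."""
    error_costs = {
        matrix[predicted][actual]
        for predicted in matrix
        for actual in matrix[predicted]
        if predicted != actual
    }
    return len(error_costs) <= 1


print(is_simple(simple_matrix))   # True
print(is_simple(complex_matrix))  # False
```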

We use the term "positive error" to refer to a false positive diagnosis, which occurs when a patient is diagnosed as sick but is actually healthy. Conversely, "negative error" refers to a false negative diagnosis, which occurs when a patient is diagnosed as healthy but is actually sick. The "positive error cost" is the cost assigned to positive errors, and the "negative error cost" is the cost assigned to negative errors. See Appendix A for examples. We were interested in ICET's behavior as the ratio of negative error cost to positive error cost was varied. Table 9 shows the ratios that we examined. Figure 5 shows the performance of the five algorithms at each ratio.
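As a minimal sketch of these definitions, the following code totals the error cost of a set of diagnoses under separate positive and negative error costs. The function name `error_cost`, the label strings, the example diagnoses, and the dollar values are hypothetical; they are not drawn from Table 9 or Appendix A.

```python
def error_cost(predicted, actual, positive_error_cost, negative_error_cost):
    """Total classification error cost over paired 'sick'/'healthy' labels."""
    total = 0.0
    for p, a in zip(predicted, actual):
        if p == "sick" and a == "healthy":
            # Positive error: diagnosed as sick, actually healthy.
            total += positive_error_cost
        elif p == "healthy" and a == "sick":
            # Negative error: diagnosed as healthy, actually sick.
            total += negative_error_cost
    return total


# Hypothetical diagnoses containing one positive error and one negative error.
predicted = ["sick", "healthy", "healthy", "sick"]
actual = ["healthy", "sick", "healthy", "sick"]

# Assumed costs giving a negative-to-positive ratio of 400 / 50 = 8.0.
print(error_cost(predicted, actual, positive_error_cost=50.0,
                 negative_error_cost=400.0))  # 450.0
```

With the two costs swapped (a ratio of 0.125), the same diagnoses incur the same total here only because they contain exactly one error of each kind; in general, the ratio determines which kind of error dominates the total.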

Our hypothesis was that the difference in performance between ICET and the other algorithms would increase as we moved away from the middles of the plots, where the ratio is 1.0, since the other algorithms have no mechanism for dealing with complex classification cost matrices; they were designed under the implicit assumption of simple classification cost matrices. In fact, Figure 5 shows that the difference tends to decrease as we move away from the middles. This is most pronounced on the right-hand sides of the plots. When the ratio is 8.0 (the extreme right-hand sides of the plots), there is no advantage to using ICET. When the ratio is 0.125 (the extreme left-hand sides of the plots), there is still some advantage to using ICET.

The interpretation of these plots is complicated by the fact that the gap between the algorithms tends to decrease as the penalty for classification errors increases, as we can see in Figure 3. In retrospect, we should have held the sum of the negative error cost and the positive error cost constant as we varied their ratio. However, there is clearly an asymmetry in the plots, which we expected to be symmetrical about a vertical line centered on 1.0 on the x axis. The plots are close to symmetrical for the other algorithms, but they are asymmetrical for ICET. This is also apparent in Table 10, which compares the performance of ICET and EG2, averaged across all five datasets (see the sixth plot in Figure 5). This suggests that it is more difficult to reduce negative errors (on the right-hand sides of the plots, where negative errors have more weight) than it is to reduce positive errors (on the left-hand sides, where positive errors have more weight). That is, it is easier to avoid false positive diagnoses than it is to avoid false negative diagnoses. This is unfortunate, since false negative diagnoses usually carry a heavier penalty in real life. Preliminary investigation suggests that false negative diagnoses are harder to avoid because the "sick" class is usually less frequent than the "healthy" class, which makes the "sick" class harder to learn.
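The constant-sum design mentioned above can be sketched as follows. This is not the procedure used in the experiments; it only illustrates how the two costs could be derived from a ratio r = negative / positive while their sum stays fixed, using positive = total / (1 + r) and negative = total - positive. The function name `split_constant_sum` and the total of 90 are hypothetical.

```python
def split_constant_sum(ratio, total):
    """Return (positive_error_cost, negative_error_cost).

    The two costs satisfy negative / positive == ratio and
    positive + negative == total.
    """
    if ratio <= 0 or total <= 0:
        raise ValueError("ratio and total must be positive")
    positive = total / (1.0 + ratio)
    negative = total - positive
    return positive, negative


# The three ratios named in the text, with an assumed constant sum of 90.
for ratio in (0.125, 1.0, 8.0):
    positive, negative = split_constant_sum(ratio, total=90.0)
    print(ratio, positive, negative)
```

Under this scheme the overall penalty for errors stays comparable across ratios, so a change in the gap between algorithms could be attributed to the ratio itself rather than to the rising magnitude of the costs.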


4.2.4 Poorly Estimated Classification Cost