
Appendix A. Five Medical Datasets


A.1 BUPA Liver Disorders

The BUPA Liver Disorders dataset was created by BUPA Medical Research Ltd. and was donated to the Irvine collection by Richard Forsyth. Table 15 shows the test costs for the dataset. The tests in group A are blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. These tests share the common cost of $2.10 for collecting blood.

The target concept was defined using the sixth column: class 0 was defined as "drinks < 3" and class 1 was defined as "drinks ≥ 3". Table 16 shows the general form of the classification cost matrix that was used in the experiments in Section 4. For most of the experiments, the positive error cost and the negative error cost are both equal to a single classification error cost. The exception is Section 4.2.3, which describes the experiments with complex classification cost matrices and explains the terms "positive error cost" and "negative error cost".

There are 345 cases in this dataset, with no missing values. Column seven was originally used to split the data into training and testing sets. We did not use this column, since we required ten different random splits of the data. In each of our ten random splits, the training set had 230 cases and the testing set had 115 cases.
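The class definition and the ten random splits described above can be illustrated with a minimal Python sketch. It is not the code used in the experiments: the function names, the fixed seed, and the use of Python's `random` module are assumptions made for illustration, and class 1 is taken to be the complement of class 0 (three or more drinks). Only the figures 345, 230, 115, and ten splits come from the text.

```python
import random

N_CASES = 345   # cases in the BUPA Liver Disorders dataset
N_TRAIN = 230   # training cases per split (the remaining 115 are for testing)
N_SPLITS = 10   # number of independent random splits


def target_class(drinks):
    """Map the sixth column to a class label.

    Class 0 is "drinks < 3"; class 1 is assumed to be its complement.
    """
    return 0 if drinks < 3 else 1


def random_splits(n_cases=N_CASES, n_train=N_TRAIN, n_splits=N_SPLITS, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each split is an independent random partition of the case indices.
    The seed is an arbitrary choice that makes the sketch repeatable.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_cases))
        rng.shuffle(indices)
        train = sorted(indices[:n_train])
        test = sorted(indices[n_train:])
        splits.append((train, test))
    return splits


splits = random_splits()
train, test = splits[0]
print(len(splits), len(train), len(test))  # prints: 10 230 115
```

Each split partitions the case indices afresh, so a given case may fall in the training set of one split and the testing set of another. Column seven, the original train/test indicator, plays no role.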


A.2 Heart Disease