
4.3.2 Search in Binary Space


4.3.3 Seeded Population

In this experiment, we returned to searching in real bias space, but we seeded the initial population of biases with the true test costs. This gave ICET direct access to the true test costs. For conditional test costs, we used the no-discount cost (see Section 4.2.2). In the baseline experiment (Section 4.1), the initial population consists of 50 randomly generated strings, each representing n+2 real numbers. In this experiment, the initial population consists of 49 randomly generated strings and one manually generated string. In the manually generated string, the first n numbers are the true test costs. The last two numbers were set to 1.0 (for ω) and 25 (for CF). This string is exactly the bias of EG2, as implemented here (Section 3.2).
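The construction of the seeded initial population can be sketched in Python as follows. This is only a minimal illustration, not the GENESIS implementation: the function name `make_seeded_population`, the example test costs, and the sampling ranges used for the random individuals (1 to 10,000 for each cost bias, 0 to 1 for ω, 1 to 100 for CF) are assumptions made for the sketch.

```python
import random


def make_seeded_population(true_costs, pop_size=50, omega=1.0, cf=25.0, seed=None):
    """Return pop_size individuals, each a list of n+2 real numbers.

    The first n numbers are cost biases; the last two are omega and CF.
    The sampling ranges below are assumed for illustration only.
    """
    rng = random.Random(seed)
    n = len(true_costs)

    def random_individual():
        biases = [rng.uniform(1.0, 10000.0) for _ in range(n)]  # assumed range
        return biases + [rng.uniform(0.0, 1.0),    # omega, assumed range
                         rng.uniform(1.0, 100.0)]  # CF, assumed range

    # pop_size - 1 random individuals, as in the baseline experiment
    population = [random_individual() for _ in range(pop_size - 1)]
    # One manually generated individual: true test costs, omega = 1.0, CF = 25
    population.append(list(true_costs) + [omega, cf])
    return population


# Hypothetical test costs for a dataset with three tests
population = make_seeded_population([1.0, 5.5, 20.0], seed=0)
```

With the default arguments, the last individual in `population` carries the true costs followed by 1.0 and 25, which corresponds to the bias of EG2 described above; the remaining 49 individuals are random.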

Our hypotheses were (1) that ICET would perform better (on average) when the initial population is seeded than when it is purely random, (2) that ICET would perform better (on average) searching in real space with a seeded population than when searching in binary space, and (3) that ICET would perform better on the Heart Disease dataset when the initial population is seeded than when it is purely random. Table 14 appears to support the first two hypotheses. Figure 6 appears to support the third hypothesis. However, the results are not statistically significant.

This experiment raises some interesting questions: Should seeding the population be built into the ICET algorithm? Should we seed the whole population with the true costs, perturbed by some random noise? Perhaps this is the right approach, but we prefer to modify ICF_i (equation (2)), the device by which GENESIS controls the decision tree induction. We could alter this equation so that it contains both the true costs and some bias parameters. This seems to make more sense than our current approach, which deprives EG2 of direct access to the true costs. We discuss some other ideas for modifying the equation in Section 5.2.

Incidentally, this experiment lets us answer the following question: Does the genetic search in bias space do anything useful? If we start with the true costs of the tests and reasonable values for the parameters ω and CF, how much improvement do we get from the genetic search? In this experiment, we seeded the population with an individual that represents exactly the bias of EG2 (the first n numbers are the true test costs and the last two numbers are 1.0 for ω and 25 for CF). Therefore we can determine the value of genetic search by comparing EG2 with ICET. ICET starts with the bias of EG2 (as a seed in the first generation) and attempts to improve the bias. The score of EG2 in Table 14 shows the value of the bias built into EG2. The score of ICET in Table 14 shows how genetic search in bias space can improve the built-in bias of EG2. When the cost of misclassification errors has the same order of magnitude as the test costs ($10 to $100), EG2 averages 43% of the standard cost, while ICET averages 25% of the standard cost. When the cost of misclassification errors ranges from $10 to $10,000, EG2 averages 58% of the standard cost, while ICET averages 46% of the standard cost. Both of these differences are significant with more than 95% confidence. This makes it clear that genetic search is adding value.


5. Discussion