In this experiment, we returned to searching in real bias space, but we
seeded the initial population of biases with the true test costs. This
gave ICET direct access to the true test costs. For conditional test
costs, we used the no-discount cost (see Section 4.2.2). In the baseline experiment
(Section 4.1), the initial population consists of 50 randomly generated
strings, each representing n+2 real numbers. In this experiment, the
initial population consists of 49
randomly generated strings and one manually generated string. In the
manually generated string, the first n numbers are the true test costs
and the last two numbers are set to 1.0 (for ω) and 25 (for CF). This
string is exactly the bias of EG2, as implemented here (Section 3.2).
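
The seeding step described above can be sketched as follows. This is a minimal illustration, not the GENESIS implementation. Only the population size (50), the string length (n+2), and the seed values (true test costs, then 1.0 and 25) come from the text. The function name, the parameter names `omega` and `cf`, the sampling ranges for the random individuals, and the example costs are all assumptions made for illustration.

```python
import random


def seeded_population(true_costs, pop_size=50, omega=1.0, cf=25.0,
                      cost_range=(1.0, 10000.0),   # assumed range
                      omega_range=(0.0, 1.0),      # assumed range
                      cf_range=(1.0, 100.0),       # assumed range
                      rng=None):
    """Return pop_size individuals, each a list of n+2 real numbers.

    The first pop_size - 1 individuals are random; the last one is the
    manually generated seed: the n true test costs followed by the two
    fixed parameter values.
    """
    rng = rng or random.Random()
    n = len(true_costs)

    def random_individual():
        costs = [rng.uniform(*cost_range) for _ in range(n)]
        return costs + [rng.uniform(*omega_range), rng.uniform(*cf_range)]

    population = [random_individual() for _ in range(pop_size - 1)]
    population.append(list(true_costs) + [omega, cf])
    return population


# Hypothetical test costs for a dataset with n = 3 tests.
population = seeded_population([1.0, 5.0, 20.0], rng=random.Random(0))
print(len(population), population[-1])
```

Only one of the 50 individuals carries the true costs, so the genetic search remains free to move away from the seed if other biases score better.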
Our hypotheses were (1) that ICET would perform better (on average) when the initial population is seeded than when it is purely random, (2) that ICET would perform better (on average) searching in real space with a seeded population than when searching in binary space, and (3) that ICET would perform better on the Heart Disease dataset when the initial population is seeded than when it is purely random. Table 14 appears to support the first two hypotheses, and Figure 6 appears to support the third. However, the results are not statistically significant.
This experiment raises some interesting questions: Should seeding the
population be built into the ICET algorithm? Should we seed the whole
population with the true costs, perturbed by some random noise?
Perhaps this is the right approach, but we prefer to modify equation (2),
the device by which GENESIS controls the decision tree
induction. We could alter this equation
so that it contains both the true costs and some bias parameters. This
seems to make more sense than our current approach, which deprives EG2
of direct access to the true costs. We discuss some other ideas for
modifying the equation in Section 5.2.
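
The alternative raised as a question above, seeding the whole population with noisy copies of the true costs, might look like the sketch below. The text does not specify a noise model. The multiplicative Gaussian noise, the `rel_sigma` parameter, the clamping at zero, and the function name are all assumptions introduced here.

```python
import random


def perturbed_seed_population(true_costs, pop_size=50, tail=(1.0, 25.0),
                              rel_sigma=0.1, rng=None):
    """Return pop_size individuals built from noisy copies of true_costs.

    Each cost is scaled by (1 + Gaussian noise) with standard deviation
    rel_sigma and clamped at zero. The last two numbers (tail) are kept
    fixed in every individual. The noise model is an assumption.
    """
    rng = rng or random.Random()
    population = []
    for _ in range(pop_size):
        noisy = [max(0.0, c * (1.0 + rng.gauss(0.0, rel_sigma)))
                 for c in true_costs]
        population.append(noisy + list(tail))
    return population


# Hypothetical test costs; rel_sigma controls how far the population
# spreads around the true costs.
population = perturbed_seed_population([1.0, 5.0, 20.0],
                                       rng=random.Random(0))
print(len(population), len(population[0]))
```

A small `rel_sigma` keeps every individual close to the true costs, which trades diversity in the initial population for a stronger prior.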
Incidentally, this experiment lets us answer the following question:
Does the genetic search in bias space do anything useful? If we start
with the true costs of the tests and reasonable values for the
parameters ω and CF, how much improvement
do we get from the genetic search? In this experiment, we seeded the
population with an individual that represents exactly the bias of EG2
(the first n numbers are the true test costs and the last two
numbers are 1.0 for ω and 25 for CF).
Therefore we can determine the value of genetic search by comparing EG2
with ICET. ICET starts with the bias of EG2 (as a seed in the first
generation) and attempts to improve the bias. The score of EG2 in Table
14 shows the value of the bias built into EG2. The score of ICET in
Table 14 shows how genetic search in bias space can improve the
built-in bias of EG2. When the cost of misclassification errors has the
same order of magnitude as the test costs ($10 to $100), EG2 averages
43% of the standard cost, while ICET averages 25% of the standard cost.
When the cost of misclassification errors ranges from $10 to $10,000,
EG2 averages 58% of the standard cost, while ICET averages 46% of the
standard cost. Both of these differences are significant with more than
95% confidence. This makes it clear that genetic search is adding
value.
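
The size of the gap between EG2 and ICET can be made concrete with a little arithmetic on the averages quoted above. The percentages are taken from the text. The `improvement` helper and the idea of expressing the gap as a relative reduction are illustrative additions. Because it is computed from averages, the relative figure describes the gap between the two averages and is not a per-dataset average.

```python
# Average cost as a percentage of the standard cost, from the text.
results = {
    "$10 to $100": {"EG2": 43, "ICET": 25},
    "$10 to $10,000": {"EG2": 58, "ICET": 46},
}


def improvement(eg2, icet):
    """Return (percentage-point gap, relative reduction in percent)."""
    points = eg2 - icet
    relative = 100.0 * points / eg2
    return points, relative


for label, r in results.items():
    points, relative = improvement(r["EG2"], r["ICET"])
    print(f"{label}: {points} points lower, "
          f"{relative:.1f}% relative reduction")
# $10 to $100: 18 points lower, 41.9% relative reduction
# $10 to $10,000: 12 points lower, 20.7% relative reduction
```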