4.3.1 Crossover Versus Mutation


4.3.2 Search in Binary Space

ICET searches for biases in a space of n+2 real numbers. Inspired by Aha and Bankert (1994), we decided to see what would happen when ICET was restricted to a space of n binary numbers and 2 real numbers. We modified ICET so that EG2 was given the true cost of each test, instead of a "pseudo-cost" or bias. For conditional test costs, we used the no-discount cost (see Section 4.2.2). The n binary digits were used to exclude or include a test. EG2 was not allowed to use excluded tests in the decision trees that it generated.

To be more precise, let $B_1, \ldots, B_n$ be n binary numbers and let $C_1, \ldots, C_n$ be n real numbers. For this experiment, we set $C_i$ to the true cost of the i-th test. In this experiment, GENESIS does not change $C_i$. That is, $C_i$ is constant for a given test in a given dataset. Instead, GENESIS manipulates the value of $B_i$ for each i. The binary number $B_i$ is used to determine whether EG2 is allowed to use a test in its decision tree. If $B_i = 0$, then EG2 is not allowed to use the i-th test (the i-th attribute). Otherwise, if $B_i = 1$, EG2 is allowed to use the i-th test. EG2 uses the ICF equation as usual, with the true costs $C_i$. Thus this modified version of ICET is searching through a binary bias space instead of a real bias space.
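The restriction described above can be sketched as a test-selection step that skips excluded tests and scores the rest with the true costs. This is a minimal illustration, not the ICET or EG2 source code. The function names `icf` and `select_test` are hypothetical, and the ICF form used, (2^gain - 1) / (cost + 1)^omega, is the form usually given for EG2; it is assumed here because this section refers to the ICF equation without restating it.

```python
from typing import Optional, Sequence


def icf(info_gain: float, cost: float, omega: float) -> float:
    """Assumed ICF form for EG2: (2^gain - 1) / (cost + 1)^omega."""
    return (2.0 ** info_gain - 1.0) / ((cost + 1.0) ** omega)


def select_test(
    info_gains: Sequence[float],
    true_costs: Sequence[float],
    include_bits: Sequence[int],
    omega: float,
) -> Optional[int]:
    """Return the index of the allowed test with the highest ICF score.

    include_bits[i] == 0 excludes the i-th test from the tree;
    include_bits[i] == 1 allows it. Costs are the fixed true costs,
    so only the bits change from one individual to the next.
    Returns None when every test is excluded.
    """
    best_index: Optional[int] = None
    best_score = float("-inf")
    for i, (gain, cost, bit) in enumerate(
        zip(info_gains, true_costs, include_bits)
    ):
        if bit == 0:
            continue  # excluded test: EG2 may not use it
        score = icf(gain, cost, omega)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With hypothetical gains `[0.9, 0.5, 0.4]`, true costs `[100.0, 1.0, 1.0]`, and `omega = 1.0`, flipping the second bit from 1 to 0 changes the selected test even though no cost value changes, which is the only lever the binary search has.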

Our hypothesis was that ICET would perform better when searching in real bias space than when searching in binary bias space. Table 13 shows that this hypothesis was not confirmed. It appears to be better to search in binary bias space, rather than real bias space. However, the differences are not statistically significant.

When we searched in binary space, we set the cost $C_i$ to the true cost of the i-th test. GENESIS manipulated the binary inclusion flag $B_i$ instead of $C_i$. When we searched in real space, GENESIS set $C_i$ to whatever value it found useful in its attempt to optimize fitness. We hypothesized that this gives an advantage to binary space search over real space search. Binary space search has direct access to the true costs of the tests, but real space search only learns about the true costs of the tests indirectly, by the feedback it gets from the fitness function.
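The difference between the two search spaces can be made concrete by looking at how an individual might be decoded into the inputs EG2 receives. The sketch below is an assumption about representation, not the GENESIS encoding: `decode_real` and `decode_binary` are hypothetical names, an individual is modeled as a flat sequence of n genes followed by the 2 remaining real parameters, and those two parameters are passed through unchanged.

```python
from typing import List, Sequence, Tuple

Decoded = Tuple[List[float], List[int], List[float]]


def decode_real(individual: Sequence[float], n: int) -> Decoded:
    """Real bias space: the first n genes are the costs handed to EG2.

    Every test is allowed, and the true costs never appear here; the
    search can only learn about them through the fitness function.
    """
    costs = [float(g) for g in individual[:n]]
    bits = [1] * n
    extra = [float(g) for g in individual[n:]]
    return costs, bits, extra


def decode_binary(
    individual: Sequence[float], true_costs: Sequence[float]
) -> Decoded:
    """Binary bias space: the first n genes are include/exclude bits.

    The costs handed to EG2 are the fixed true costs, so the search
    has direct access to them.
    """
    n = len(true_costs)
    genes = individual[:n]
    if any(g not in (0, 1) for g in genes):
        raise ValueError("the first n genes must each be 0 or 1")
    bits = [int(g) for g in genes]
    extra = [float(g) for g in individual[n:]]
    return [float(c) for c in true_costs], bits, extra
```

In this sketch both decoders return the same triple (costs, inclusion bits, remaining parameters), so the same tree-building step can consume either one; only the source of the cost values differs.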

When we examined the experiment in detail, we found that ICET did well on the Heart Disease dataset when it was searching in binary bias space, although it did poorly when it was searching in real bias space (see Section 4.1.1). We hypothesized that ICET, when searching in real space, suffered most from the lack of direct access to the true costs when it was applied to the Heart Disease dataset. These hypotheses were tested by the next experiment.


4.3.3 Seeded Population