ICET searches for biases in a space of $n+2$ real numbers. Inspired by Aha and Bankert (1994), we investigated what would happen if ICET were restricted to a space of $n$ binary numbers and 2 real numbers. We modified ICET so that EG2 was given the true cost of each test, instead of a "pseudo-cost" (bias). For conditional test costs, we used the no-discount cost (see Section 4.2.2). The $n$ binary digits were used to include or exclude tests: EG2 was not allowed to use excluded tests in the decision trees that it generated.
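The difference between the two bias encodings can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ICET's actual code: the names `EG2Bias`, `decode_real`, and `decode_binary` are hypothetical, and the two remaining real-valued biases are simply passed through unchanged. For conditional test costs, `true_costs` would hold the no-discount costs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EG2Bias:
    costs: List[float]   # costs EG2 plugs into its ICF equation
    allowed: List[bool]  # which tests EG2 may use in its trees
    extra: List[float]   # the two remaining real-valued biases (passed through unchanged)


def decode_real(genome: Sequence[float], n: int) -> EG2Bias:
    """Original ICET (hypothetical decoder): n + 2 reals; the first n are
    pseudo-costs chosen by the genetic search, and every test is allowed."""
    if len(genome) != n + 2:
        raise ValueError("expected n + 2 genes")
    return EG2Bias(costs=list(genome[:n]), allowed=[True] * n, extra=list(genome[n:]))


def decode_binary(bits: Sequence[int], reals: Sequence[float],
                  true_costs: Sequence[float]) -> EG2Bias:
    """Modified ICET (hypothetical decoder): n bits include or exclude tests,
    and EG2 always sees the true (fixed) cost of each test."""
    if len(bits) != len(true_costs) or len(reals) != 2:
        raise ValueError("expected n bits, n true costs, and 2 reals")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    return EG2Bias(costs=list(true_costs),
                   allowed=[b == 1 for b in bits],
                   extra=list(reals))
```

For example, `decode_binary([1, 0, 1], [1.0, 0.25], [5.0, 20.0, 1.0])` excludes the second test while leaving all three costs at their true values; in the real-valued encoding, by contrast, the costs themselves are what the search manipulates.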
To be more precise, let $B_1, \ldots, B_n$ be $n$ binary numbers and let $C_1, \ldots, C_n$ be $n$ real numbers. For this experiment, we set $C_i$ to the true cost of the $i$-th test. In this experiment, GENESIS does not change $C_i$; that is, $C_i$ is constant for a given test in a given dataset. Instead, GENESIS manipulates the value of $B_i$ for each $i$. The binary number $B_i$ determines whether EG2 is allowed to use the $i$-th test in its decision tree. If $B_i = 0$, then EG2 is not allowed to use the $i$-th test (the $i$-th attribute); otherwise, if $B_i = 1$, EG2 is allowed to use the $i$-th test. EG2 uses the ICF equation as usual, with the true costs $C_i$. Thus, this modified version of ICET searches through a binary bias space instead of a real bias space.
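How the include/exclude bits constrain EG2 at a single node of the tree can be sketched as follows. This assumes EG2's Information Cost Function in its usual form, $\mathrm{ICF} = (2^{\Delta I} - 1)/(C + 1)^{\omega}$, where $\Delta I$ is the information gain of a test, $C$ its cost, and $\omega$ a cost-weighting exponent. The function names are hypothetical, `bits[i]` stands for the binary number attached to test $i$, and the rest of tree induction (recursion, stopping, pruning) is omitted.

```python
import math
from typing import Optional, Sequence


def icf(info_gain: float, cost: float, omega: float) -> float:
    """EG2's Information Cost Function: (2**gain - 1) / (cost + 1)**omega."""
    return (2.0 ** info_gain - 1.0) / (cost + 1.0) ** omega


def choose_test(info_gains: Sequence[float], true_costs: Sequence[float],
                bits: Sequence[int], omega: float) -> Optional[int]:
    """Return the index of the allowed test with the highest ICF,
    or None if every test is excluded."""
    best, best_score = None, -math.inf
    for i, (gain, cost, bit) in enumerate(zip(info_gains, true_costs, bits)):
        if bit == 0:
            continue  # excluded test: EG2 may not use it
        score = icf(gain, cost, omega)  # true cost, not a pseudo-cost
        if score > best_score:
            best, best_score = i, score
    return best
```

With information gains `[0.9, 0.5, 0.3]`, true costs `[50.0, 1.0, 1.0]`, and `omega = 1.0`, the sketch picks test 1 when all tests are allowed, and test 2 once `bits = [1, 0, 1]` excludes test 1. Excluding a test changes which tests can compete, but never the costs they are scored with.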
Our hypothesis was that ICET would perform better when searching in real bias space than when searching in binary bias space. The results in Table 13 do not confirm this hypothesis; if anything, searching in binary bias space appears to be better than searching in real bias space. However, the differences are not statistically significant.
When we searched in binary space, we set $C_i$ to the true cost of the $i$-th test, and GENESIS manipulated $B_i$ instead of $C_i$. When we searched in real space, GENESIS set $C_i$ to whatever value it found useful in its attempt to optimize fitness. We hypothesized that this difference gives binary space search an advantage over real space search: binary space search has direct access to the true costs of the tests, whereas real space search learns about the true costs only indirectly, through the feedback it gets from the fitness function.
On closer examination of the results, we found that ICET did well on the Heart Disease dataset when searching in binary bias space, although it did poorly when searching in real bias space (see Section 4.1.1). We hypothesized that, when searching in real space, ICET suffered most from its lack of direct access to the true costs on the Heart Disease dataset. These hypotheses were tested in the next experiment.