ICET is a hybrid of a genetic algorithm and a decision tree induction algorithm. The genetic algorithm evolves a population of biases for the decision tree induction algorithm. The genetic algorithm we use is GENESIS (Grefenstette, 1986). The decision tree induction algorithm is C4.5 (Quinlan, 1992), modified to use ICF. That is, the decision tree induction algorithm is EG2, implemented as described in Section 3.2.
ICET uses a two-tiered search strategy. On the bottom tier, EG2 performs a greedy search through the space of decision trees, using the standard TDIDT strategy. On the top tier, GENESIS performs a genetic search through a space of biases. The biases are used to modify the behavior of EG2. In other words, GENESIS controls EG2's preference for one type of decision tree over another.
ICET does not use EG2 the way it was designed to be used. The
n costs,
, used in EG2's
attribute selection function, are treated by ICET as bias parameters,
not as costs. That is, ICET manipulates the bias of EG2 by adjusting
the parameters,
. In ICET, the values of
the bias parameters,
, have no direct
connection to the actual costs of the tests.
Genetic algorithms are inspired by biological evolution. The individuals that are evolved by GENESIS are strings of bits. GENESIS begins with a population of randomly generated individuals (bit strings) and then it measures the "fitness" of each individual. In ICET, an individual (a bit string) represents a bias for EG2. An individual is evaluated by running EG2 on the data, using the bias of the given individual. The "fitness" of the individual is the average cost of classification of the decision tree that is generated by EG2. In the next generation, the population is replaced with new individuals. The new individuals are generated from the previous generation, using mutation and crossover (sex). The fittest individuals in the first generation have the most offspring in the second generation. After a fixed number of generations, ICET halts and its output is the decision tree determined by the fittest individual. Figure 2 gives a sketch of the ICET algorithm.
GENESIS has several parameters that can be used to alter its
performance. The parameters we used are listed in Table 4. These are
essentially the default parameter settings (Grefenstette, 1986). We used
a population size of 50 individuals and 1,000 trials, which results in
20 generations. An individual in the population consists of a string of
n+2 numbers, where n is the number of attributes
(tests) in the given dataset. The n+2 numbers are represented
in binary format, using a Gray code. This
binary string is used as a bias for EG2. The first n numbers
in the string are treated as if they were the n costs,
, used in ICF (equation (2)). The first n
numbers range from 1 to 10,000 and are coded with 12 binary digits
each. The last two numbers in the string are used to set
and CF. The parameter
is used in ICF. The parameter CF is used in C4.5 to control the level
of pruning of the decision tree. The last two numbers are coded with 8
binary digits each.
ranges from 0 (cost
is ignored) to 1 (maximum sensitivity to cost) and CF ranges from 1
(high pruning) to 100 (low pruning). Thus an individual is a string of
12n+16 bits.
Each trial of an individual consists of running EG2 (implemented as a
modification to C4.5) on a given training dataset, using the numbers
specified in the binary string to set
(i = 1, ..., n),
, and
CF. The training dataset is randomly split into two equal-sized subsets (
for odd-sized training sets), a sub-training set and a
sub-testing set. A different random split is used for each trial, so
the outcome of a trial is stochastic. We cannot assume that identical
individuals yield identical outcomes, so every individual must be
evaluated. This means that there will be duplicate individuals in the
population, with slightly different fitness scores. The measure of
fitness of an individual is the average cost of classification on the
sub-testing set, using the decision tree that was generated on the
sub-training set. The average cost is measured as described in Section 2.2. After 1,000 trials, the most
fit (lowest cost) individual is then used as a bias for EG2 with the
whole training set as input. The resulting decision tree is the output
of ICET for the given training dataset.
The n costs (bias parameters),
,
used in ICF, are not directly related to the true costs of the
attributes. The 50 individuals in the first generation are generated
randomly, so the initial values of
have
no relation to the true costs. After 20 generations, the values of
may have some relation to the true costs, but
it will not be a simple relationship. These values of
are more appropriately thought of as biases
than costs. Thus GENESIS is searching through a bias space for biases
for C4.5 that result in decision trees with low average cost.
The biases
range from 1 to 10,000. When a
bias
is greater than 9,000, the
i-th attribute is ignored. That is, the i-th
attribute is not available for C4.5 to include in the decision tree,
even if it might maximize
. This
threshold of 9,000 was arbitrarily chosen. There was no attempt to
optimize this value by experimentation.
We chose to use EG2 in ICET, rather than IDX or CS-ID3, because EG2 has
the parameter
, which gives GENESIS
greater control over the bias of EG2.
is partly based on the data (via the information gain,
) and it is partly based on the bias (via the
"pseudo-cost",
). The exact mix of data
and bias can be controlled by varying
.
Otherwise, there is no reason to prefer EG2 to IDX or CS-ID3, which
could easily be used instead of EG2.
The treatment of delayed tests and conditional test costs is not "hard-wired" into EG2. It is built into the fitness function used by GENESIS, the average cost of classification (measured as described in Section 2). This makes it relatively simple to extend ICET to handle other pragmatic constraints on the decision trees.
In effect, GENESIS "lies" to EG2 about the costs of the tests. How can lies improve the performance of EG2? EG2 is a hill-climbing algorithm that can get trapped at a local optimum. It is a greedy algorithm that looks only one test ahead as it builds a decision tree. Because it looks only one step ahead, EG2 suffers from the horizon effect. This term is taken from the literature on chess playing programs. Suppose that a chess playing program has a fixed three-move lookahead depth and it finds that it will loose its queen in three moves, if it follows a certain branch of the game tree. There may be an alternate branch where the program first sacrifices a pawn and then loses its queen in four moves. Because the loss of the queen is over its three-move horizon, the program may foolishly decide to sacrifice its pawn. One move later, it is again faced with the loss of its queen. Analogously, EG2 may try to avoid a certain expensive test by selecting a less expensive test. One test later, it is again faced with the more expensive test. After it has exhausted all the cheaper tests, it may be forced to do the expensive test, in spite of its efforts to avoid the test. GENESIS can prevent this short-sighted behavior by telling lies to EG2. GENESIS can exaggerate the cost of the cheap tests or it can understate the cost of the expensive test. Based on past trials, GENESIS can find the lies that yield the best performance from EG2.
In ICET, learning (local search in EG2) and evolution (in GENESIS) interact. A common form of hybrid genetic algorithm uses local search to improve the individuals in a population (Schaffer et al., 1992). The improvements are then coded into the strings that represent the individuals. This is a form of Lamarckian evolution. In ICET, the improvements due to EG2 are not coded into the strings. However, the improvements can accelerate evolution by altering the fitness landscape. This phenomenon (and other phenomena that result from this form of hybrid) is known as the Baldwin effect (Baldwin, 1896; Morgan, 1896; Waddington, 1942; Maynard Smith, 1987; Hinton & Nowlan, 1987; Ackley & Littman, 1991; Whitley & Gruau, 1993; Whitley et al., 1994; Anderson, in press). The Baldwin effect may explain much of the success of ICET.