Table of Contents

3.4 IDX


3.5 ICET

ICET is a hybrid of a genetic algorithm and a decision tree induction algorithm. The genetic algorithm evolves a population of biases for the decision tree induction algorithm. The genetic algorithm we use is GENESIS (Grefenstette, 1986). The decision tree induction algorithm is C4.5 (Quinlan, 1992), modified to use ICF. That is, the decision tree induction algorithm is EG2, implemented as described in Section 3.2.

ICET uses a two-tiered search strategy. On the bottom tier, EG2 performs a greedy search through the space of decision trees, using the standard TDIDT strategy. On the top tier, GENESIS performs a genetic search through a space of biases. The biases are used to modify the behavior of EG2. In other words, GENESIS controls EG2's preference for one type of decision tree over another.

ICET does not use EG2 the way it was designed to be used. The n costs, , used in EG2's attribute selection function, are treated by ICET as bias parameters, not as costs. That is, ICET manipulates the bias of EG2 by adjusting the parameters, . In ICET, the values of the bias parameters, , have no direct connection to the actual costs of the tests.

Genetic algorithms are inspired by biological evolution. The individuals that are evolved by GENESIS are strings of bits. GENESIS begins with a population of randomly generated individuals (bit strings) and then it measures the "fitness" of each individual. In ICET, an individual (a bit string) represents a bias for EG2. An individual is evaluated by running EG2 on the data, using the bias of the given individual. The "fitness" of the individual is the average cost of classification of the decision tree that is generated by EG2. In the next generation, the population is replaced with new individuals. The new individuals are generated from the previous generation, using mutation and crossover (sex). The fittest individuals in the first generation have the most offspring in the second generation. After a fixed number of generations, ICET halts and its output is the decision tree determined by the fittest individual. Figure 2 gives a sketch of the ICET algorithm.

GENESIS has several parameters that can be used to alter its performance. The parameters we used are listed in Table 4. These are essentially the default parameter settings (Grefenstette, 1986). We used a population size of 50 individuals and 1,000 trials, which results in 20 generations. An individual in the population consists of a string of n+2 numbers, where n is the number of attributes (tests) in the given dataset. The n+2 numbers are represented in binary format, using a Gray code. This binary string is used as a bias for EG2. The first n numbers in the string are treated as if they were the n costs, , used in ICF (equation (2)). The first n numbers range from 1 to 10,000 and are coded with 12 binary digits each. The last two numbers in the string are used to set and CF. The parameter is used in ICF. The parameter CF is used in C4.5 to control the level of pruning of the decision tree. The last two numbers are coded with 8 binary digits each. ranges from 0 (cost is ignored) to 1 (maximum sensitivity to cost) and CF ranges from 1 (high pruning) to 100 (low pruning). Thus an individual is a string of 12n+16 bits.

Each trial of an individual consists of running EG2 (implemented as a modification to C4.5) on a given training dataset, using the numbers specified in the binary string to set (i = 1, ..., n), , and CF. The training dataset is randomly split into two equal-sized subsets ( for odd-sized training sets), a sub-training set and a sub-testing set. A different random split is used for each trial, so the outcome of a trial is stochastic. We cannot assume that identical individuals yield identical outcomes, so every individual must be evaluated. This means that there will be duplicate individuals in the population, with slightly different fitness scores. The measure of fitness of an individual is the average cost of classification on the sub-testing set, using the decision tree that was generated on the sub-training set. The average cost is measured as described in Section 2.2. After 1,000 trials, the most fit (lowest cost) individual is then used as a bias for EG2 with the whole training set as input. The resulting decision tree is the output of ICET for the given training dataset.

The n costs (bias parameters), , used in ICF, are not directly related to the true costs of the attributes. The 50 individuals in the first generation are generated randomly, so the initial values of have no relation to the true costs. After 20 generations, the values of may have some relation to the true costs, but it will not be a simple relationship. These values of are more appropriately thought of as biases than costs. Thus GENESIS is searching through a bias space for biases for C4.5 that result in decision trees with low average cost.

The biases range from 1 to 10,000. When a bias is greater than 9,000, the i-th attribute is ignored. That is, the i-th attribute is not available for C4.5 to include in the decision tree, even if it might maximize . This threshold of 9,000 was arbitrarily chosen. There was no attempt to optimize this value by experimentation.

We chose to use EG2 in ICET, rather than IDX or CS-ID3, because EG2 has the parameter , which gives GENESIS greater control over the bias of EG2. is partly based on the data (via the information gain, ) and it is partly based on the bias (via the "pseudo-cost", ). The exact mix of data and bias can be controlled by varying . Otherwise, there is no reason to prefer EG2 to IDX or CS-ID3, which could easily be used instead of EG2.

The treatment of delayed tests and conditional test costs is not "hard-wired" into EG2. It is built into the fitness function used by GENESIS, the average cost of classification (measured as described in Section 2). This makes it relatively simple to extend ICET to handle other pragmatic constraints on the decision trees.

In effect, GENESIS "lies" to EG2 about the costs of the tests. How can lies improve the performance of EG2? EG2 is a hill-climbing algorithm that can get trapped at a local optimum. It is a greedy algorithm that looks only one test ahead as it builds a decision tree. Because it looks only one step ahead, EG2 suffers from the horizon effect. This term is taken from the literature on chess playing programs. Suppose that a chess playing program has a fixed three-move lookahead depth and it finds that it will loose its queen in three moves, if it follows a certain branch of the game tree. There may be an alternate branch where the program first sacrifices a pawn and then loses its queen in four moves. Because the loss of the queen is over its three-move horizon, the program may foolishly decide to sacrifice its pawn. One move later, it is again faced with the loss of its queen. Analogously, EG2 may try to avoid a certain expensive test by selecting a less expensive test. One test later, it is again faced with the more expensive test. After it has exhausted all the cheaper tests, it may be forced to do the expensive test, in spite of its efforts to avoid the test. GENESIS can prevent this short-sighted behavior by telling lies to EG2. GENESIS can exaggerate the cost of the cheap tests or it can understate the cost of the expensive test. Based on past trials, GENESIS can find the lies that yield the best performance from EG2.

In ICET, learning (local search in EG2) and evolution (in GENESIS) interact. A common form of hybrid genetic algorithm uses local search to improve the individuals in a population (Schaffer et al., 1992). The improvements are then coded into the strings that represent the individuals. This is a form of Lamarckian evolution. In ICET, the improvements due to EG2 are not coded into the strings. However, the improvements can accelerate evolution by altering the fitness landscape. This phenomenon (and other phenomena that result from this form of hybrid) is known as the Baldwin effect (Baldwin, 1896; Morgan, 1896; Waddington, 1942; Maynard Smith, 1987; Hinton & Nowlan, 1987; Ackley & Littman, 1991; Whitley & Gruau, 1993; Whitley et al., 1994; Anderson, in press). The Baldwin effect may explain much of the success of ICET.


4. Experiments