EG2 (Nunez, 1991) is a TDIDT algorithm that uses the Information Cost Function (ICF) (Nunez, 1991) for selection of attributes. ICF selects attributes based on both their information gain and their cost. We implemented EG2 by modifying the C4.5 source code so that ICF was used instead of information gain ratio.
ICF for the i-th attribute, $ICF_i$, is defined as follows:

$$ICF_i = \frac{2^{\Delta I_i} - 1}{(C_i + 1)^{\omega}}, \qquad 0 \le \omega \le 1$$

In this equation, $\Delta I_i$ is the information gain associated with the i-th attribute at a given stage in the construction of the decision tree and $C_i$ is the cost of measuring the i-th attribute. C4.5 selects the attribute that maximizes the information gain ratio, which is a function of the information gain $\Delta I_i$. We modified C4.5 so that it selects the attribute that maximizes $ICF_i$.

The parameter $\omega$ adjusts the strength of the bias towards lower cost attributes. When $\omega = 0$, cost is ignored and selection by $ICF_i$ is equivalent to selection by $\Delta I_i$. When $\omega = 1$, $ICF_i$ is strongly biased by cost. Ideally, $\omega$ would be selected in a way that is sensitive to classification error cost (this is done in ICET -- see Section 3.5). Nunez (1991) does not suggest a principled way of setting $\omega$. In our experiments with EG2, $\omega$ was set to 1. In other words, we used the following selection measure:

$$ICF_i = \frac{2^{\Delta I_i} - 1}{C_i + 1}$$
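As a minimal illustration of this selection rule, the sketch below computes ICF as (2^gain - 1) / (cost + 1)^omega and picks the attribute with the largest value. It is not the modified C4.5 code: the function names `icf` and `select_attribute`, the attribute names, and the gain and cost figures are all hypothetical, chosen only to show how omega shifts the choice.

```python
def icf(info_gain, cost, omega=1.0):
    """Information Cost Function: (2**info_gain - 1) / (cost + 1)**omega."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return (2.0 ** info_gain - 1.0) / (cost + 1.0) ** omega


def select_attribute(candidates, omega=1.0):
    """Return the attribute name with the largest ICF.

    candidates maps a (hypothetical) attribute name to a pair
    (information gain, measurement cost).
    """
    return max(candidates, key=lambda name: icf(*candidates[name], omega=omega))


# Made-up values: an informative but expensive test versus a
# less informative test that costs nothing to measure.
candidates = {
    "blood_test": (0.60, 7.0),
    "age": (0.25, 0.0),
}

print(select_attribute(candidates, omega=0.0))  # cost ignored
print(select_attribute(candidates, omega=1.0))  # strong cost bias
```

With omega = 0 the denominator is 1, and since 2^gain - 1 increases monotonically with the gain, the ranking is the same as ranking by information gain alone, so the expensive `blood_test` is chosen. With omega = 1 the cost term dominates in this example and the free attribute `age` is chosen instead.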
In addition to its sensitivity to the cost of tests, EG2 generalizes attributes by using an ISA tree (a generalization hierarchy). We did not implement this aspect of EG2, since it was not relevant for the experiments reported here.