How to choose the next attribute
What is our goal in building the tree in the first place?
- Maximize accuracy over the entire data set
- Minimize expected number of tests to classify an example
(In both cases this can argue for building the shortest tree.)
We can’t really achieve the first goal by looking only at the training set: we can only build a tree that is accurate on that subset and assume that the full data set has the same characteristics.
To minimize the expected number of tests:
- the best test would be one where each branch has all positive or all negative instances
- the worst test would be one where the proportion of positive to negative instances is the same in every branch
- in that worst case, knowing the value of the tested attribute A would provide no information about the example’s ultimate classification
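
The best and worst cases above can be made concrete with a small sketch. It scores a test by entropy-based information gain, which is one common way to quantify "information" and is an assumption here, since the notes have not named a measure at this point. The function names (`entropy`, `information_gain`) and the example counts are hypothetical, chosen only for illustration.

```python
from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log2(0) as 0
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(branches):
    """Gain of a test, given one (pos, neg) count pair per branch.

    Assumed measure: entropy before the test minus the
    size-weighted average entropy of the branches after it.
    """
    pos = sum(p for p, _ in branches)
    neg = sum(n for _, n in branches)
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(pos, neg) - remainder

# Best case: every branch is all positive or all negative.
best_split = [(6, 0), (0, 6)]
# Worst case: every branch has the same positive-to-negative proportion.
worst_split = [(3, 3), (3, 3)]

print(information_gain(best_split))   # the test settles the classification
print(information_gain(worst_split))  # the test tells us nothing
```

Under this measure the pure split removes all uncertainty about the class, while the split that preserves the proportions in every branch has zero gain, which matches the statement that knowing A gives no information in that case.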