How to choose the next attribute

What is our goal in building the tree in the first place?
- Maximize accuracy over the entire data set
- Minimize expected number of tests to classify an example

The first goal cannot be achieved directly from the training set alone: we can only build a tree that is accurate on that sample and assume the full data set has the same characteristics.

To minimize the expected number of tests:
- the best test would be one where each branch contains only positive or only negative instances
- the worst test would be one where the proportion of positive to negative instances is the same in every branch
  - knowing the value of the tested attribute A would then provide no information about the example’s ultimate classification
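
The best and worst cases above can be made concrete with a small sketch. It assumes entropy-based information gain as the measure of how informative a test is; the notes so far have not named a measure, so treat that choice as an assumption. The function names, the attribute names `perfect` and `useless`, and the four-example data set are all hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute):
    """Entropy of the whole set minus the size-weighted entropy of each branch."""
    labels = [label for _, label in examples]
    # Group the class labels by the value the attribute takes (one group per branch).
    branches = {}
    for features, label in examples:
        branches.setdefault(features[attribute], []).append(label)
    remainder = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


# Hypothetical data: two positive and two negative examples.
examples = [
    ({"perfect": "a", "useless": "x"}, "+"),
    ({"perfect": "a", "useless": "y"}, "+"),
    ({"perfect": "b", "useless": "x"}, "-"),
    ({"perfect": "b", "useless": "y"}, "-"),
]

# "perfect" puts all positives in one branch and all negatives in the other.
print(information_gain(examples, "perfect"))
# "useless" leaves the same 1:1 proportion of positives to negatives in every branch.
print(information_gain(examples, "useless"))
```

Under this measure, the attribute whose branches are pure recovers the full one bit of uncertainty in the labels, while the attribute whose branches all keep the original proportion gains nothing, which matches the best and worst tests described above.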
