The Entropy (Disorder) of a Collection
Suppose S is a collection containing positive and negative examples of the target concept:
- Entropy(S) ≡ -(p+ log2 p+ + p- log2 p-)
- where p+ is the fraction of examples that are positive and p- is the fraction of examples that are negative
Good properties
- minimum of 0 where p+ = 0 or p- = 0 (all examples in one class; by convention 0 log2 0 = 0)
- maximum of 1 where p+ = p- = 0.5
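The definition and its two boundary properties can be checked numerically. The following is a minimal sketch, assuming the collection is summarized by its counts of positive and negative examples; the function name `entropy` and its parameters are illustrative choices, not part of the original slides.

```python
import math


def entropy(num_pos, num_neg):
    """Entropy (in bits) of a collection with the given class counts."""
    total = num_pos + num_neg
    if total == 0:
        raise ValueError("collection is empty")
    result = 0.0
    for count in (num_pos, num_neg):
        p = count / total
        if p > 0:  # convention: 0 * log2(0) is treated as 0
            result -= p * math.log2(p)
    return result


print(entropy(10, 0))  # pure collection: minimum entropy
print(entropy(5, 5))   # evenly mixed collection: maximum entropy
print(entropy(9, 5))   # mostly positive: somewhere in between
```

A pure collection gives 0, an even split gives 1, and a 9-to-5 split gives roughly 0.940.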
Interpretation: how far away are we from having a leaf node in the tree?
The best attribute is the one whose split reduces the entropy of the child collections the most.
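One way to make this comparison concrete is to score each attribute by the size-weighted average entropy of the child collections it produces, and pick the attribute with the lowest score. This is a sketch under assumptions: the weighting scheme is a common choice rather than something stated above, and the names `entropy_after_split` and `best_attribute`, the example representation, and the toy data are all hypothetical.

```python
import math


def entropy(labels):
    """Entropy (in bits) of a list of boolean class labels."""
    if not labels:
        return 0.0
    p_pos = sum(labels) / len(labels)
    result = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:  # convention: 0 * log2(0) is treated as 0
            result -= p * math.log2(p)
    return result


def entropy_after_split(examples, attribute):
    """Size-weighted average entropy of the children after splitting."""
    children = {}
    for features, label in examples:
        # Group labels by the value this example has for the attribute.
        children.setdefault(features[attribute], []).append(label)
    total = len(examples)
    return sum(len(c) / total * entropy(c) for c in children.values())


def best_attribute(examples, attributes):
    """Attribute whose split leaves the lowest weighted child entropy."""
    return min(attributes, key=lambda a: entropy_after_split(examples, a))


# Hypothetical toy data: (attribute values, is_positive_example)
examples = [
    ({"windy": True, "sunny": True}, True),
    ({"windy": False, "sunny": True}, True),
    ({"windy": True, "sunny": False}, False),
    ({"windy": False, "sunny": False}, False),
]

print(best_attribute(examples, ["windy", "sunny"]))
```

In this toy data, splitting on `sunny` produces two pure children (entropy 0), while splitting on `windy` leaves both children evenly mixed (entropy 1), so `sunny` is selected.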