Entropy and Information Gain

The best attribute is one that maximizes the expected decrease in entropy
- if entropy decreases to 0, the tree need not be expanded further
- if entropy does not decrease at all, the attribute was useless

Gain is defined to be
- Gain(S, A) = Entropy(S) - Σ{v ∈ values(A)} p{A=v} Entropy(S{A=v})
- where p{A=v} is the proportion of S where A=v, and
- S{A=v} is the collection taken by selecting those elements of S where A=v
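
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names entropy and gain, the dictionary representation of examples, and the toy data set are all assumptions made for the example, and entropy is taken to be Shannon entropy in bits (log base 2), which these notes do not spell out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels (assumed base 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of p{A=v} * Entropy(S{A=v})."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        # S{A=v}: the target labels of the examples where attribute A equals v
        subset = [e[target] for e in examples if e[attribute] == v]
        # p{A=v}: the proportion of S where A equals v
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: "windy" separates the classes perfectly,
# while "humid" tells us nothing about "play".
S = [
    {"windy": "yes", "humid": "high", "play": "no"},
    {"windy": "yes", "humid": "low", "play": "no"},
    {"windy": "no", "humid": "high", "play": "yes"},
    {"windy": "no", "humid": "low", "play": "yes"},
]

gain_windy = gain(S, "windy", "play")  # entropy drops to 0 in every subset
gain_humid = gain(S, "humid", "play")  # entropy does not decrease at all
best = max(["windy", "humid"], key=lambda a: gain(S, a, "play"))
```

In this toy set, splitting on "windy" leaves every subset with entropy 0, so that branch need not be expanded further, whereas splitting on "humid" leaves the entropy unchanged, so that attribute is useless here.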
