
Attributes with Many Values

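Problem: if an attribute has many values, $Gain$ will tend to select it, because splitting on it produces many small, nearly pure subsets.

Imagine using $Date = Jun\_3\_1996$ as an attribute: each date matches very few training examples, so the split looks almost perfect on the training data but predicts poorly on unseen examples.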



One approach: use $GainRatio$ instead

\begin{displaymath}GainRatio(S,A) \equiv \frac{Gain(S,A)}{SplitInformation(S,A)} \end{displaymath}


\begin{displaymath}SplitInformation(S,A) \equiv - \sum_{i=1}^{c} \frac{\vert S_{i}\vert}{\vert S\vert} \log_{2} \frac{\vert S_{i}\vert}{\vert S\vert} \end{displaymath}

where $S_{i}$ is the subset of $S$ for which $A$ has value $v_{i}$, and $c$ is the number of distinct values of $A$
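
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy data set (16 examples, a two-valued `wind` attribute, and a `date` attribute that is unique per example) and the choice to return 0 when $SplitInformation$ is zero are all hypothetical and not part of the original notes.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy (base 2) of the empirical distribution of the items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def partition(values, labels):
    """Group the labels by attribute value: one subset S_i per value v_i."""
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return subsets


def gain(values, labels):
    """Gain(S, A): entropy of S minus the weighted entropy of the subsets S_i."""
    n = len(labels)
    subsets = partition(values, labels).values()
    remainder = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - remainder


def split_information(values):
    """SplitInformation(S, A): entropy of S with respect to the values of A."""
    return entropy(values)


def gain_ratio(values, labels):
    """GainRatio(S, A) = Gain / SplitInformation.

    Assumption (not from the notes): return 0.0 when SplitInformation is 0,
    which happens when the attribute takes a single value on S.
    """
    si = split_information(values)
    return gain(values, labels) / si if si > 0 else 0.0


# Hypothetical toy data: 16 examples, 8 "yes" and 8 "no".
labels = ["yes"] * 7 + ["no"] + ["no"] * 7 + ["yes"]
wind = ["weak"] * 8 + ["strong"] * 8      # two values, fairly predictive
date = [f"day_{i}" for i in range(16)]    # a different value for every example

for name, attr in (("wind", wind), ("date", date)):
    print(name, round(gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```

On this toy data, `date` has the larger $Gain$ because every one of its subsets is pure, but its $SplitInformation$ is $\log_{2} 16 = 4$, so its $GainRatio$ falls below that of `wind`. Other data sets can behave differently; the ratio penalizes many-valued attributes but does not guarantee they are never selected.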



Don Patterson 2001-12-13