next up previous
Next: ch3 Up: ch3 Previous: ch3

Avoiding Overfitting

How can we avoid overfitting?

stop growing when data split not statistically significant grow full tree, then post-prune


How to select ``best'' tree: Measure performance over training data

Measure performance over separate validation data set

MDL: minimize $size(tree) + size(misclassifications(tree))$



Don Patterson 2001-12-13