Why prefer short hypotheses?
Argument in favor: There are fewer short hypotheses than long hypotheses, so
  - a short hypothesis that fits the data is unlikely to be a coincidence;
  - a long hypothesis that fits the data might be a coincidence.
Argument opposed: There are many ways to define small sets of hypotheses, e.g., all trees with a prime number of nodes that use attributes beginning with "Z". What is so special about small sets based on the size of the hypothesis?
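
The counting behind both arguments can be sketched numerically. The snippet below is an illustration only, under assumptions the slide does not make: hypotheses are encoded as bit strings (so there are 2^n hypotheses of length n), and a hypothesis unrelated to the target agrees with each of m binary training labels independently with probability 1/2. The function names `count_hypotheses` and `expected_chance_fits` are hypothetical.

```python
from fractions import Fraction


def count_hypotheses(max_bits: int) -> int:
    """Count distinct bit-string hypotheses of length 1..max_bits.

    Assumption: each hypothesis is encoded as a bit string, so there
    are 2**n hypotheses of length exactly n.
    """
    return sum(2 ** n for n in range(1, max_bits + 1))


def expected_chance_fits(num_hypotheses: int, num_examples: int) -> Fraction:
    """Expected number of hypotheses in a set that fit the data by luck.

    Assumption: a hypothesis unrelated to the target matches each of
    the num_examples binary labels independently with probability 1/2,
    so it fits all of them with probability 2**-num_examples.
    """
    return Fraction(num_hypotheses, 2 ** num_examples)


m = 20                                   # number of training examples
short_set = count_hypotheses(5)          # hypotheses of at most 5 bits
long_set = count_hypotheses(30)          # hypotheses of at most 30 bits

print("short hypotheses:", short_set)
print("long hypotheses: ", long_set)
print("expected chance fits (short):", float(expected_chance_fits(short_set, m)))
print("expected chance fits (long): ", float(expected_chance_fits(long_set, m)))
```

Under these assumptions the short set is expected to contain far fewer than one chance fit, while the long set is expected to contain many. The result depends only on the size of the set, not on how the set is defined, which is the point of the opposing argument: any other set of the same size as the short one would get the same number.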