Homework 4B Machine Learning Assignment
Assigned 11/28/01

Notes: You may work in teams of up to three people for this assignment. You may *not* work with anyone you have previously worked with during this class.

Due Date: Monday, December 17th

Assignment:

1. Write a Naive Bayes classifier. Look at the format of the datasets before you start programming.

2. Download C4.5 and get it working (http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz).

3. Download the following datasets from the UCI machine learning website (http://www1.ics.uci.edu/~mlearn/MLSummary.html):

   Chess Endgames (king-rook vs king-pawn)
   Congressional Voting Records
   Echocardiogram
   Hepatitis
   Horse Colic
   Hypothyroid
   Labor Relations
   Lung Cancer
   Post-operative Patient
   Promoter Gene Sequences
   Sonar
   Soybean

4. Perform a 10-fold cross-validation accuracy study of the performance of your Naive Bayes classifier and Quinlan's C4.5 decision tree implementation. For each of the datasets above, report the mean and standard deviation of each machine learning algorithm's accuracy across the folds. Do not differentiate between different types of errors (for example, false positives and false negatives). Discretize any continuous attributes into 10 equal-sized bins before training/testing your machine learners. Make sure you include the -s option when you run C4.5.

5. Create two additional datasets with at least 100 examples. Using the same procedures as above, demonstrate that your Naive Bayes implementation is more accurate on the first dataset and that C4.5 is more accurate on the second dataset. If, on either dataset, the intervals of mean accuracy plus or minus one standard deviation overlap for the two machine learners, then in addition to reporting the mean and standard deviation, demonstrate the statistical significance of your results using an appropriate test (for example, a paired t-test or a paired Wilcoxon signed-rank test).

6.
   Turn in a hard copy of the table of results from part 4, a discussion of the characteristics of the datasets in part 5, and a cover sheet listing the group members' names, email addresses, and signatures acknowledging that this is your group's independent work.
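
The evaluation procedure described in parts 4 and 5 (discretize, split into 10 folds, report mean and standard deviation, then run a paired test) can be sketched as below. This is only a minimal sketch under stated assumptions, not a required implementation: it assumes "equal sized bins" means equal-width bins (equal-frequency bins are another reading), that each learner's accuracy is recorded as one number per fold, and that both learners are scored on the same folds. The function names are hypothetical, and the classifiers themselves are not shown.

```python
import math
import random
import statistics


def discretize_equal_width(values, n_bins=10):
    """Map each numeric value to a bin index in 0..n_bins-1.

    Assumption: "equal sized" is read as equal-width bins over [min, max].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has a single value; put everything in bin 0.
        return [0 for _ in values]
    bins = []
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        bins.append(min(idx, n_bins - 1))  # the maximum value goes in the last bin
    return bins


def k_fold_indices(n_examples, k=10, seed=0):
    """Shuffle example indices and deal them into k disjoint test folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_and_std(per_fold_accuracy):
    """Mean and sample standard deviation of the per-fold accuracies."""
    return statistics.mean(per_fold_accuracy), statistics.stdev(per_fold_accuracy)


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic and degrees of freedom for two learners
    whose accuracies were measured on the same folds."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all per-fold differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

Each fold returned by k_fold_indices serves once as the test set while the remaining examples form the training set. With 10 folds the paired t statistic has 9 degrees of freedom, so a two-sided test at the 5% level compares |t| against a critical value of about 2.262.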