
cleandata File Reference


Detailed Description

Cleans up a data set in several ways.

This tool cleans a data set in several ways and outputs the cleaned data. If needed (see below), it first scans the training data and gathers some simple statistics about it. It then makes a pass over the training and testing data, filling in each missing categorical attribute value with the most common value for the example's class, and each missing continuous attribute value with the average value for the example's class.
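The class-conditional imputation described above can be sketched in a few lines. This is a minimal illustration, not VFML's C implementation: it assumes each example is a list of strings with the class label in the last column, that a missing value is written as '?' (the C4.5 convention), and that `gather_stats`, `fill_missing`, and `is_continuous` are hypothetical names. Statistics come from the training rows only and are then applied to both training and test rows.

```python
from collections import Counter, defaultdict

MISSING = "?"  # assumed C4.5-style marker for a missing value


def gather_stats(train_rows, is_continuous):
    """Statistics pass over the training rows (class label in the last column)."""
    counts = defaultdict(Counter)  # (attr, class) -> value frequencies
    sums = defaultdict(float)      # (attr, class) -> sum of known values
    known = defaultdict(int)       # (attr, class) -> number of known values
    for row in train_rows:
        cls = row[-1]
        for i, cont in enumerate(is_continuous):
            value = row[i]
            if value == MISSING:
                continue
            if cont:
                sums[(i, cls)] += float(value)
                known[(i, cls)] += 1
            else:
                counts[(i, cls)][value] += 1
    return counts, sums, known


def fill_missing(rows, is_continuous, stats):
    """Cleaning pass: applied to both training and test rows."""
    counts, sums, known = stats
    cleaned = []
    for row in rows:
        cls = row[-1]
        new_row = list(row)
        for i, cont in enumerate(is_continuous):
            if new_row[i] != MISSING:
                continue
            if cont and known[(i, cls)]:
                # Continuous: mean of the known training values for this class.
                new_row[i] = str(sums[(i, cls)] / known[(i, cls)])
            elif not cont and counts[(i, cls)]:
                # Categorical: most frequent training value for this class.
                new_row[i] = counts[(i, cls)].most_common(1)[0][0]
            # Otherwise (no known value for this class) the '?' is left as-is;
            # this fallback is an assumption, not documented behavior.
        cleaned.append(new_row)
    return cleaned
```

For example, `fill_missing(test_rows, is_continuous, gather_stats(train_rows, is_continuous))` cleans a test set using only training statistics.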

The tool can also add a new value to each categorical attribute, called 'u' (short for unknown), and rewrite the data set accordingly.
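One plausible reading of this option is that missing categorical values are rewritten as 'u' instead of being imputed. The sketch below illustrates that reading only; it assumes '?' marks a missing value (the C4.5 convention), that categorical attributes are identified by column index, and that `add_unknown_value` and `categorical_values` are hypothetical names. Continuous columns are left untouched.

```python
MISSING = "?"  # assumed C4.5-style marker for a missing value
UNKNOWN = "u"  # the extra categorical value described above


def add_unknown_value(categorical_values, rows):
    """categorical_values maps a categorical attribute's column index to its
    declared values. Returns the extended value lists and the rewritten rows."""
    new_values = {}
    for i, values in categorical_values.items():
        # Append 'u' once; keep the list as-is if it is already declared.
        new_values[i] = list(values) if UNKNOWN in values else list(values) + [UNKNOWN]
    rewritten = []
    for row in rows:
        new_row = list(row)
        for i in categorical_values:
            if new_row[i] == MISSING:
                new_row[i] = UNKNOWN  # no per-class statistics are needed here
        rewritten.append(new_row)
    return new_values, rewritten
```

Because this rewrite needs no statistics, it fits the single-pass case described later on this page.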

The tool can also remove every attribute in the data set that is marked as ignored.
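Removing ignored attributes amounts to a column filter over every example. This minimal sketch assumes the ignore flags have already been read from the .names file into a list of booleans, and that the class label is the last column and is always kept; `drop_ignored` is a hypothetical name. The matching attribute declarations would presumably also be dropped from the cleaned .names output.

```python
def drop_ignored(ignored, rows):
    """ignored[i] is True when attribute i is marked as ignored.
    The class label (assumed to be the last column) is always kept."""
    keep = [i for i, flag in enumerate(ignored) if not flag]
    return [[row[i] for i in keep] + [row[-1]] for row in rows]
```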

cleandata accepts an input stem and expects to find files named stem.data and stem.names, and optionally one named stem.test. It outputs files named stem-clean.data and stem-clean.names, and, if stem.test is present, stem-clean.test.
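The naming convention maps each input file to an output file by inserting "-clean" before the extension. The sketch below only illustrates that mapping; `cleaned_file_names` is a hypothetical helper, and checking for stem.test on disk is an assumption about how the optional file is detected.

```python
import os


def cleaned_file_names(stem):
    """Map each input file of a data set stem to its cleaned output file."""
    names = {
        stem + ".data": stem + "-clean.data",
        stem + ".names": stem + "-clean.names",
    }
    if os.path.exists(stem + ".test"):  # the test file is optional
        names[stem + ".test"] = stem + "-clean.test"
    return names
```

For a stem of `iris`, this yields `iris-clean.data` and `iris-clean.names`, plus `iris-clean.test` when `iris.test` exists.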

cleandata makes a single pass over the data set if and only if you use -addValue and there are no continuous attributes; otherwise it makes an additional pass.
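The pass-count rule can be written as a small predicate. The reasoning in the comments is our interpretation, not documented behavior: with -addValue, missing categorical values become 'u' and need no statistics, while any continuous attribute still needs per-class averages gathered in a separate pass. `passes_needed`, `add_value`, and `is_continuous` are hypothetical names.

```python
def passes_needed(add_value, is_continuous):
    """Passes over the data under the rule stated above.

    add_value mirrors the -addValue flag; is_continuous has one flag per attribute.
    """
    # Interpretation: a statistics pass is needed unless missing categorical
    # values are rewritten as 'u' and no continuous attribute needs an average.
    needs_stats_pass = (not add_value) or any(is_continuous)
    return 2 if needs_stats_pass else 1
```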

Thanks:
to Chun-Hsiang Hung for doing the core development work for this tool.

Arguments


Generated for VFML by doxygen, hosted by SourceForge.net.