
bindata File Reference

Detailed Description

Converts continuous attributes into discrete ones.

Converts all continuous attributes in a data set to categorical ones. It makes two passes over the data: one to gather the statistics needed to pick bin boundaries, and one to do the conversion. The first pass can instead be run on a sample of the data by using the -samples argument below.
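The two-pass structure can be illustrated with a minimal Python sketch. This is not bindata's source code: the function names gather_stats, midpoint_cuts and discretize, the sample_size parameter, and the single midpoint cut used as a stand-in boundary rule are all assumptions made for illustration.

```python
import bisect
import random

def gather_stats(rows, sample_size=None, seed=0):
    """Pass 1: record (min, max) for each column, optionally from a random sample.

    sample_size is a hypothetical stand-in for the idea behind -samples.
    """
    rows = list(rows)
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)
    return [(min(col), max(col)) for col in zip(*rows)]

def midpoint_cuts(stats):
    """Placeholder boundary rule: one cut at the middle of each column's range."""
    return [[(lo + hi) / 2.0] for lo, hi in stats]

def discretize(rows, boundaries):
    """Pass 2: replace each value with the index of the bin it falls into."""
    return [
        [bisect.bisect_right(cuts, value) for value, cuts in zip(row, boundaries)]
        for row in rows
    ]

rows = [[0.0, 10.0], [1.0, 20.0], [4.0, 30.0]]
stats = gather_stats(rows)
binned = discretize(rows, midpoint_cuts(stats))
print(binned)
```

Because the second pass only compares values against the cut points, a value that lies outside the range seen in a sampled first pass still lands in the first or last bin.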

bindata uses one of two methods to select bin boundaries. The first, which is the default, finds the range of each attribute (by identifying its highest and lowest values) and divides that range into equal-width bins. The other method assumes that the attribute was generated from a Gaussian, estimates the mean and variance of the Gaussian from the data, and sets bin boundaries so that each bin holds an equal share of the Gaussian's probability mass.
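As a rough illustration of the two boundary rules, the following Python sketch computes the interior cut points for a single attribute. It is not taken from bindata: the function names are hypothetical, and the use of the population standard deviation (pstdev) is an assumption, since the text does not say which variance estimator the tool uses.

```python
from statistics import NormalDist, fmean, pstdev

def equal_width_cuts(values, bins):
    """Split [min, max] into `bins` intervals of equal width.

    Returns the bins - 1 interior boundaries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]

def gaussian_cuts(values, bins):
    """Fit a Gaussian and cut it so each bin holds 1/bins of its probability mass.

    Assumption: mean and standard deviation are estimated with fmean and pstdev.
    Raises statistics.StatisticsError if all values are identical (sigma == 0).
    """
    fitted = NormalDist(fmean(values), pstdev(values))
    return [fitted.inv_cdf(i / bins) for i in range(1, bins)]

values = [0.0, 2.0, 10.0]
print(equal_width_cuts(values, 5))
print(gaussian_cuts(values, 4))
```

The equal-width cuts depend only on the two extreme values, so a single outlier can leave most bins nearly empty. The Gaussian cuts depend on the estimated mean and variance, and they are spaced closer together near the mean, where the fitted density is highest.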

Thanks to Chun-Hsiang Hung for doing the core development work for this tool.

Wish List:
More methods for selecting bin boundaries, for example ones chosen to reduce entropy.


Generated for VFML by doxygen.