vfkm File Reference

Detailed Description

Performs k-means clustering accelerated with sampling.

Performs k-means clustering accelerated with sampling, as described in this paper. This learner ignores categorical attributes.

vfkm performs several iterations of clustering on progressively larger samples until it determines with high confidence (see -delta below) that the clustering it achieves is within -epsilon of the one that would be achieved using infinite data for each decision. vfkm can either use an optimization to select the number of samples for each iteration of the next round, or it can use straight progressive sampling. Use the -batch argument to turn off sampling and run traditional k-means clustering.

vfkm evaluates the learned centers by comparing them to the centers found in <stem>.test, as follows. Learned centers are greedily matched to the closest test centers until each center has a match; the evaluation score is then the sum of the squared distances between each test center and its matched learned center.
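The evaluation step can be illustrated with a short Python sketch. This is not vfkm's own code: the function names are hypothetical, and it assumes that "greedily matched" means repeatedly pairing the globally closest unmatched learned center and test center. vfkm may order the matching differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_match_score(learned, test):
    """Greedily pair learned centers with test centers.

    Assumption: at each step the closest remaining (learned, test)
    pair is matched and removed from further consideration.
    Returns (sum of squared distances, list of (learned_idx, test_idx)).
    """
    unmatched_learned = set(range(len(learned)))
    unmatched_test = set(range(len(test)))
    total = 0.0
    pairs = []
    while unmatched_learned and unmatched_test:
        # Pick the closest pair among all still-unmatched centers.
        dist, i, j = min(
            (squared_distance(learned[i], test[j]), i, j)
            for i in unmatched_learned
            for j in unmatched_test
        )
        unmatched_learned.remove(i)
        unmatched_test.remove(j)
        pairs.append((i, j))
        total += dist
    return total, pairs


if __name__ == "__main__":
    learned_centers = [(0.0, 0.0), (10.0, 10.0)]
    test_centers = [(9.0, 10.0), (0.0, 1.0)]
    score, matching = greedy_match_score(learned_centers, test_centers)
    print(score, matching)
```

A lower score means the learned centers lie closer to the reference centers. Because the matching is greedy, it is not guaranteed to find the pairing with the smallest possible total.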

The learner reads its input and writes its output in C4.5 format. It expects to find the files <stem>.names and <stem>.data, and it writes the learned centers to a file called <stem>.centers.
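To make the input layout concrete, here is a rough Python sketch of reading C4.5-style text and keeping only the continuous attributes, mirroring the statement that the learner ignores categorical ones. It assumes the common C4.5 conventions (a first .names entry listing the classes, then one `name: continuous.` or `name: v1, v2.` entry per attribute, `|` starting a comment, and comma-separated .data rows with the class label last). The helper names are hypothetical, missing values are not handled, and VFML's own parser may differ in details.

```python
def parse_names(names_text):
    """Return a list of (attribute_name, is_continuous) pairs.

    Assumption: the first non-comment entry lists the classes and
    every later entry describes one attribute.
    """
    entries = []
    for raw in names_text.splitlines():
        line = raw.split("|", 1)[0].strip()  # drop '|' comments
        if line:
            entries.append(line.rstrip("."))
    attributes = []
    for entry in entries[1:]:  # skip the class list
        name, _, spec = entry.partition(":")
        attributes.append((name.strip(), spec.strip() == "continuous"))
    return attributes


def load_continuous(data_text, attributes):
    """Return rows of floats, keeping only continuous attributes."""
    keep = [i for i, (_, is_cont) in enumerate(attributes) if is_cont]
    rows = []
    for raw in data_text.splitlines():
        line = raw.split("|", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        rows.append([float(fields[i]) for i in keep])
    return rows


if __name__ == "__main__":
    names = "a, b.\nx: continuous.\ncolor: red, green.\ny: continuous.\n"
    data = "1.0,red,2.5,a\n3.0,green,4.5,b\n"
    attrs = parse_names(names)
    print(load_continuous(data, attrs))
```

The sketch takes strings rather than file paths so it can be tried without creating <stem>.names and <stem>.data on disk.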


Generated for VFML by doxygen