# clusterdata File Reference

## Detailed Description

Creates a synthetic data set from randomly generated clusters.
This program creates a synthetic data set by selecting cluster centroids and then generating samples: each sample is produced by randomly picking a centroid and sampling from a spherical Gaussian with that centroid as its mean and a user-specified standard deviation. This data generator has been used to evaluate the VFKM and VFEM systems.

The centroids are placed uniformly at random in an N-dimensional unit hypercube (where N is the number of continuous dimensions), except that if any centroid is placed closer than: `(sqrt(N) / (num centroids + 1)) * std deviation`

to an already placed one, its location is resampled. (If a centroid cannot be placed after 1000 resamples, an error is reported.) Note that 'unit hypercube' means that each dimension ranges from 0 to 1.0.
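The placement rule above can be sketched in Python as follows. This is an illustrative sketch, not the program's actual source: the function name `place_centroids`, its parameters, the use of Euclidean distance, and the way resamples are counted are all assumptions made for the example.

```python
import math
import random


def place_centroids(num_clusters, num_dims, stdev, rng, max_resamples=1000):
    """Place centroids uniformly in the unit hypercube with a minimum separation."""
    # Threshold from the text: (sqrt(N) / (num centroids + 1)) * std deviation.
    min_dist = (math.sqrt(num_dims) / (num_clusters + 1)) * stdev
    centroids = []
    for index in range(num_clusters):
        # One initial draw plus up to max_resamples redraws (assumed counting).
        for _ in range(max_resamples + 1):
            candidate = [rng.random() for _ in range(num_dims)]
            # Euclidean distance is an assumption; the text only says "closer than".
            if all(math.dist(candidate, c) >= min_dist for c in centroids):
                centroids.append(candidate)
                break
        else:
            # No acceptable location was found: report an error, as the text describes.
            raise RuntimeError(
                f"could not place centroid {index} after {max_resamples} resamples"
            )
    return centroids
```

For instance, `place_centroids(3, 20, 0.05, random.Random(21))` would return three 20-dimensional points, each coordinate lying in the 0 to 1.0 range.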

Finally, training samples are generated by randomly selecting a centroid and, for each dimension, sampling from a Gaussian whose mean is the centroid's value in that dimension and whose standard deviation is the one given by the -stdev argument. Note that a sample's value in any dimension may fall outside the 0 to 1.0 range.
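The sampling step might look roughly like the sketch below. The name `generate_samples` and its parameters are hypothetical, and choosing each centroid with equal probability is an assumption; the text only says the centroid is selected randomly.

```python
import random


def generate_samples(centroids, stdev, count, rng):
    """Yield `count` samples, each drawn around a randomly chosen centroid."""
    for _ in range(count):
        # Assumed: every centroid is equally likely to be picked.
        centroid = rng.choice(centroids)
        # Independent Gaussian per dimension with a shared standard deviation,
        # i.e. a spherical Gaussian centred on the chosen centroid.
        # Values are not clipped, so they may fall outside the 0 to 1.0 range.
        yield [rng.gauss(mean, stdev) for mean in centroid]


# Usage with two made-up 2-dimensional centroids.
example_rng = random.Random(1234)
example_centroids = [[0.2, 0.8], [0.7, 0.3]]
example_samples = list(generate_samples(example_centroids, 0.05, 5, example_rng))
```

Writing the step as a generator keeps memory use constant regardless of how many samples are requested.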

## Arguments

- -f 'stem name'
- -continuous 'number of continuous dimensions'
- -clusters 'number of clusters'
- -train 'size of training set'
- -infinite
  - Generate an infinite stream of training examples; overrides the -train flag and only makes sense with -stdout (default off)
- -stdout
  - Output the training set to stdout (default: write to 'stem'.data)
- -stdev 'the standard deviation on each dimension'
- -conceptSeed 'the multiplier for the concept seed'
- -seed 'random seed'
- -target 'dir'
  - Set the output directory (default '.')
- -v
  - Increase the message level
- -h
  - Run with this argument to get a list of arguments and their meanings.

## Example

`clusterdata -continuous 20 -clusters 3 -stdev 0.05 -conceptSeed 21 -seed 1234 -train 1000`

Creates 1000 samples in 20 dimensions by sampling from a mixture of 3 Gaussians, each with a standard deviation of 0.05. The same data set can be recreated by passing the same values for the *seed* and *conceptSeed* flags.
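As a worked check of the centroid separation rule from the Detailed Description, and assuming that 'num centroids' equals the -clusters value, the minimum allowed spacing between centroids for this example would be as follows (the symbols $d_{\min}$, $k$, and $\sigma$ are notation introduced here for the threshold, the number of clusters, and the standard deviation):

$$
d_{\min} = \frac{\sqrt{N}}{k + 1}\,\sigma = \frac{\sqrt{20}}{3 + 1} \times 0.05 \approx 0.0559
$$

This threshold is small compared with the side length of the unit hypercube, so under these settings a resample should rarely be needed.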
