
Class weka.clusterers.EM

java.lang.Object
    |
    +----weka.clusterers.Clusterer
            |
            +----weka.clusterers.DistributionClusterer
                    |
                    +----weka.clusterers.EM

public class EM
extends DistributionClusterer
implements OptionHandler
Simple EM (expectation maximisation) class.

EM assigns a probability distribution to each instance, which indicates the probability of it belonging to each of the clusters. EM can decide how many clusters to create by cross-validation, or you may specify a priori how many clusters to generate.
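To make the description above concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is not WEKA's implementation: the class name SimpleEm1D, the restriction to a single numeric attribute, the quantile-based initialisation (used instead of a random seed so the result is deterministic) and the log-likelihood stopping tolerance are all assumptions. The E-step gives each value a membership distribution over the clusters; the M-step re-estimates each cluster's prior, mean and standard deviation from those memberships, with the standard deviation floored at minStdDev.

```java
import java.util.Arrays;

/**
 * Hypothetical 1-D Gaussian-mixture EM, for illustration only (not WEKA's code).
 * Assumptions: one numeric attribute, a fixed number of clusters k, quantile
 * initialisation instead of a random seed, and a log-likelihood stopping test.
 */
class SimpleEm1D {
    final double[] means, stdDevs, priors;

    SimpleEm1D(int k) {
        means = new double[k];
        stdDevs = new double[k];
        priors = new double[k];
    }

    static double normal(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** E-step for a single value: membership probabilities summing to 1. */
    double[] memberships(double x) {
        int k = means.length;
        double[] p = new double[k];
        double total = 0;
        for (int j = 0; j < k; j++) {
            p[j] = priors[j] * normal(x, means[j], stdDevs[j]);
            total += p[j];
        }
        for (int j = 0; j < k; j++) {
            p[j] = total > 0 ? p[j] / total : 1.0 / k; // fall back to uniform on underflow
        }
        return p;
    }

    static SimpleEm1D fit(double[] xs, int k, int maxIterations, double minStdDev) {
        int n = xs.length;
        double[] sorted = Arrays.copyOf(xs, n);
        Arrays.sort(sorted);
        double mean = Arrays.stream(xs).average().orElse(0);
        double variance = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).sum() / n;

        SimpleEm1D m = new SimpleEm1D(k);
        for (int j = 0; j < k; j++) {
            m.means[j] = sorted[(2 * j + 1) * n / (2 * k)]; // spread initial means over quantiles
            m.stdDevs[j] = Math.max(Math.sqrt(variance), minStdDev);
            m.priors[j] = 1.0 / k;
        }

        double prevLogLik = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < maxIterations; iter++) {
            // E-step: responsibilities r[i][j] = P(cluster j | x_i)
            double[][] r = new double[n][];
            for (int i = 0; i < n; i++) r[i] = m.memberships(xs[i]);

            // M-step: re-estimate priors, means and (floored) standard deviations
            for (int j = 0; j < k; j++) {
                double w = 0, sum = 0;
                for (int i = 0; i < n; i++) { w += r[i][j]; sum += r[i][j] * xs[i]; }
                if (w < 1e-12) continue; // empty cluster: keep its previous parameters
                double mu = sum / w, sq = 0;
                for (int i = 0; i < n; i++) sq += r[i][j] * (xs[i] - mu) * (xs[i] - mu);
                m.priors[j] = w / n;
                m.means[j] = mu;
                m.stdDevs[j] = Math.max(Math.sqrt(sq / w), minStdDev);
            }

            // Stop once the log-likelihood no longer improves noticeably
            double logLik = 0;
            for (double x : xs) {
                double dens = 0;
                for (int j = 0; j < k; j++) dens += m.priors[j] * normal(x, m.means[j], m.stdDevs[j]);
                logLik += Math.log(Math.max(dens, Double.MIN_VALUE));
            }
            if (logLik - prevLogLik < 1e-6) break;
            prevLogLik = logLik;
        }
        return m;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.2, 9.8, 10.1, 9.9};
        SimpleEm1D m = fit(xs, 2, 100, 1e-6);
        System.out.println(Arrays.toString(m.means));
        System.out.println(Arrays.toString(m.memberships(1.0)));
    }
}
```

Running main fits two clusters to two well-separated groups of values; memberships(x) then plays the role that distributionForInstance plays for a fitted EM clusterer.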

Valid options are:

-V
Verbose.

-N
Specify the number of clusters to generate. If omitted, EM will use cross-validation to select the number of clusters automatically.

-I
Terminate after this many iterations if EM has not converged.

-S
Specify random number seed.

-M
Set the minimum allowable standard deviation for normal density calculation.

Version:
$Revision: 1.14.2.1 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)

Constructor Index

 o EM()
Constructor.

Method Index

 o buildClusterer(Instances)
Generates a clusterer.
 o densityForInstance(Instance)
Computes the density for a given instance.
 o distributionForInstance(Instance)
Predicts the cluster memberships for a given instance.
 o getDebug()
Get debug mode
 o getMaxIterations()
Get the maximum number of iterations
 o getMinStdDev()
Get the minimum allowable standard deviation.
 o getNumClusters()
Get the number of clusters
 o getOptions()
Gets the current settings of EM.
 o getSeed()
Get the random number seed
 o globalInfo()
Returns a string describing this clusterer
 o listOptions()
Returns an enumeration describing the available options.
 o main(String[])
Main method for testing this class.
 o maxIterationsTipText()
Returns the tip text for this property
 o minStdDevTipText()
Returns the tip text for this property
 o numberOfClusters()
Returns the number of clusters.
 o numClustersTipText()
Returns the tip text for this property
 o seedTipText()
Returns the tip text for this property
 o setDebug(boolean)
Set debug mode - verbose output
 o setMaxIterations(int)
Set the maximum number of iterations to perform
 o setMinStdDev(double)
Set the minimum value for standard deviation when calculating normal density.
 o setNumClusters(int)
Set the number of clusters (-1 to select by CV).
 o setOptions(String[])
Parses a given list of options.
 o setSeed(int)
Set the random number seed
 o toString()
Outputs the generated clusters into a string.

Constructor Detail

 o EM
public EM()
          Constructor.

Method Detail

 o globalInfo
public java.lang.String globalInfo()
          Returns a string describing this clusterer
Returns:
a description of the clusterer suitable for displaying in the Explorer/Experimenter GUI
 o listOptions
public java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.

Valid options are:

-V
Verbose.

-N
Specify the number of clusters to generate. If omitted, EM will use cross-validation to select the number of clusters automatically.

-I
Terminate after this many iterations if EM has not converged.

-S
Specify random number seed.

-M
Set the minimum allowable standard deviation for normal density calculation.

Returns:
an enumeration of all the available options
 o setOptions
public void setOptions(java.lang.String options[]) throws java.lang.Exception
          Parses a given list of options.
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
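The options accepted here appear to map one-to-one onto the setters (-N to setNumClusters, -I to setMaxIterations, -S to setSeed, -M to setMinStdDev, and presumably -V to setDebug). The following sketch parses such an array into a hypothetical EmOptions holder and turns it back into an array, mirroring the setOptions/getOptions round trip. The class, its field names and its default values are placeholders, not WEKA's; the checks for -N 0 and -I below 1 follow the exceptions documented for setNumClusters and setMaxIterations.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical holder for the options -V, -N, -I, -S and -M, sketching the
 * setOptions/getOptions round trip. Not WEKA's code; defaults are placeholders.
 */
class EmOptions {
    boolean verbose = false;
    int numClusters = -1;    // -1 = select the number of clusters by cross-validation
    int maxIterations = 100; // placeholder default
    int seed = 1;            // placeholder default
    double minStdDev = 1e-6; // placeholder default

    private static String valueAt(String[] args, int i) {
        if (i >= args.length) throw new IllegalArgumentException("Missing value for " + args[i - 1]);
        return args[i];
    }

    static EmOptions parse(String[] args) {
        EmOptions o = new EmOptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-V": o.verbose = true; break;
                case "-N": o.numClusters = Integer.parseInt(valueAt(args, ++i)); break;
                case "-I": o.maxIterations = Integer.parseInt(valueAt(args, ++i)); break;
                case "-S": o.seed = Integer.parseInt(valueAt(args, ++i)); break;
                case "-M": o.minStdDev = Double.parseDouble(valueAt(args, ++i)); break;
                default: throw new IllegalArgumentException("Unsupported option: " + args[i]);
            }
        }
        // Same constraints as the documented exceptions of setNumClusters and setMaxIterations.
        if (o.numClusters == 0) throw new IllegalArgumentException("-N must not be 0");
        if (o.maxIterations < 1) throw new IllegalArgumentException("-I must be at least 1");
        return o;
    }

    /** Inverse of parse: an array that parse() turns back into the same settings. */
    String[] toArgs() {
        List<String> out = new ArrayList<>();
        if (verbose) out.add("-V");
        out.add("-N"); out.add(Integer.toString(numClusters));
        out.add("-I"); out.add(Integer.toString(maxIterations));
        out.add("-S"); out.add(Integer.toString(seed));
        out.add("-M"); out.add(Double.toString(minStdDev));
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        EmOptions o = parse(new String[] {"-N", "3", "-I", "50", "-V"});
        System.out.println(String.join(" ", o.toArgs()));
    }
}
```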
 o minStdDevTipText
public java.lang.String minStdDevTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI
 o setMinStdDev
public void setMinStdDev(double m)
          Set the minimum value for standard deviation when calculating normal density. Increasing this value can help prevent arithmetic overflow resulting from multiplying large densities (arising from small standard deviations) when there are many singleton or near-singleton values.
Parameters:
m - minimum value for standard deviation
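Why the floor matters: the peak of a normal density is 1/(sigma*sqrt(2*pi)), so it grows without bound as the standard deviation sigma approaches 0, which is what happens for an attribute with a single distinct value within a cluster. A minimal sketch (the class and method names are hypothetical) of a normal density evaluated with such a floor:

```java
/**
 * Sketch of a normal density with a lower bound on the standard deviation,
 * the idea behind setMinStdDev. Illustrative only; not WEKA's estimator code.
 */
class FlooredNormal {
    static double density(double x, double mean, double stdDev, double minStdDev) {
        double sd = Math.max(stdDev, minStdDev); // never divide by a (near-)zero deviation
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        // A singleton value has standard deviation 0; the floor keeps the density finite.
        System.out.println(density(3.0, 3.0, 0.0, 1e-6));
        System.out.println(density(3.0, 3.0, 0.0, 1e-2));
    }
}
```

The second call uses a larger floor and therefore yields a peak density 100 times smaller, which is why a higher minimum keeps products of many such densities from overflowing.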
 o getMinStdDev
public double getMinStdDev()
          Get the minimum allowable standard deviation.
Returns:
the minimum allowable standard deviation
 o seedTipText
public java.lang.String seedTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI
 o setSeed
public void setSeed(int s)
          Set the random number seed
Parameters:
s - the seed
 o getSeed
public int getSeed()
          Get the random number seed
Returns:
the seed
 o numClustersTipText
public java.lang.String numClustersTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI
 o setNumClusters
public void setNumClusters(int n) throws java.lang.Exception
          Set the number of clusters (-1 to select by CV).
Parameters:
n - the number of clusters
Throws:
java.lang.Exception - if n is 0
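When the number of clusters is -1, it has to be chosen from the data. The sketch below assumes one plausible search strategy, adding clusters one at a time for as long as the cross-validated log-likelihood improves; the exact procedure used by this class is not described here, so treat this as an illustration only. The scoring function (for example, the average held-out log-likelihood of a k-cluster model) is supplied by the caller, and ClusterCountSelector is a hypothetical name.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch of selecting the number of clusters when it is set to -1.
 * Assumption: clusters are added one at a time while the cross-validated
 * log-likelihood keeps improving. Not this class's actual procedure.
 */
class ClusterCountSelector {
    /** Mirrors the documented contract of setNumClusters: 0 is rejected, -1 means "select by CV". */
    static void checkNumClusters(int n) {
        if (n == 0) throw new IllegalArgumentException("number of clusters must not be 0");
    }

    /** Returns the last k (1..maxClusters) reached before the CV score stops improving. */
    static int select(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int best = 1;
        double bestScore = cvLogLikelihood.applyAsDouble(1);
        for (int k = 2; k <= maxClusters; k++) {
            double score = cvLogLikelihood.applyAsDouble(k);
            if (score <= bestScore) break; // no improvement: keep the smaller model
            best = k;
            bestScore = score;
        }
        return best;
    }

    public static void main(String[] args) {
        checkNumClusters(-1); // accepted: -1 requests selection by cross-validation
        // Toy score that peaks at k = 3; a real score would come from held-out data.
        System.out.println(select(k -> -(k - 3) * (k - 3), 10)); // prints 3
    }
}
```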
 o getNumClusters
public int getNumClusters()
          Get the number of clusters
Returns:
the number of clusters.
 o maxIterationsTipText
public java.lang.String maxIterationsTipText()
          Returns the tip text for this property
Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI
 o setMaxIterations
public void setMaxIterations(int i) throws java.lang.Exception
          Set the maximum number of iterations to perform
Parameters:
i - the number of iterations
Throws:
java.lang.Exception - if i is less than 1
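A sketch of the stopping rule that the maximum number of iterations bounds: EM repeats E and M steps until the log-likelihood improves by less than some tolerance, or until the iteration limit is reached. The tolerance value, the DoubleSupplier standing in for one E+M step, and the class name are assumptions made for illustration.

```java
import java.util.function.DoubleSupplier;

/**
 * Sketch of EM termination: stop on convergence of the log-likelihood or when
 * the iteration cap is hit. The step function and tolerance are assumptions.
 */
class EmTermination {
    /** Calls emStep (one E+M step returning the new log-likelihood); returns iterations used. */
    static int run(DoubleSupplier emStep, int maxIterations, double tolerance) {
        if (maxIterations < 1) throw new IllegalArgumentException("maxIterations must be at least 1");
        double prev = Double.NEGATIVE_INFINITY;
        int iter = 0;
        while (iter < maxIterations) {
            double logLik = emStep.getAsDouble();
            iter++;
            if (logLik - prev < tolerance) break; // converged: improvement below tolerance
            prev = logLik;
        }
        return iter;
    }

    public static void main(String[] args) {
        double[] calls = {0};
        // A step whose log-likelihood keeps rising by 1 never converges, so the cap applies.
        System.out.println(run(() -> calls[0]++, 5, 1e-6)); // prints 5
    }
}
```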
 o getMaxIterations
public int getMaxIterations()
          Get the maximum number of iterations
Returns:
the number of iterations
 o setDebug
public void setDebug(boolean v)
          Set debug mode - verbose output
Parameters:
v - true for verbose output
 o getDebug
public boolean getDebug()
          Get debug mode
Returns:
true if debug mode is set
 o getOptions
public java.lang.String[] getOptions()
          Gets the current settings of EM.
Returns:
an array of strings suitable for passing to setOptions()
 o toString
public java.lang.String toString()
          Outputs the generated clusters into a string.
Overrides:
toString in class java.lang.Object
 o numberOfClusters
public int numberOfClusters() throws java.lang.Exception
          Returns the number of clusters.
Returns:
the number of clusters generated for a training dataset.
Throws:
java.lang.Exception - if number of clusters could not be returned successfully
Overrides:
numberOfClusters in class Clusterer
 o buildClusterer
public void buildClusterer(Instances data) throws java.lang.Exception
          Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the clusterer has not been generated successfully
Overrides:
buildClusterer in class Clusterer
 o densityForInstance
public double densityForInstance(Instance inst) throws java.lang.Exception
          Computes the density for a given instance.
Parameters:
inst - the instance to compute the density for
Returns:
the density.
Throws:
java.lang.Exception - if the density could not be computed successfully
Overrides:
densityForInstance in class DistributionClusterer
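For a mixture model, the density of an instance is the prior-weighted sum of the per-cluster densities. The sketch below assumes, for illustration, that numeric attributes are modelled independently within each cluster by normal distributions, so each per-cluster density is a product over attributes; nominal attributes are left out, and the names are hypothetical.

```java
/**
 * Sketch of a mixture density p(x) = sum_j P(j) * prod_a N(x_a | mean_ja, sd_ja).
 * Assumption: numeric attributes are independent within each cluster and
 * normally distributed. Nominal attributes are omitted; not WEKA's code.
 */
class MixtureDensity {
    static double normal(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** priors[j] = P(j); means[j][a] and sds[j][a] describe attribute a in cluster j. */
    static double density(double[] x, double[] priors, double[][] means, double[][] sds) {
        double total = 0;
        for (int j = 0; j < priors.length; j++) {
            double p = priors[j];
            for (int a = 0; a < x.length; a++) p *= normal(x[a], means[j][a], sds[j][a]);
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        double[][] means = {{0.0, 0.0}, {5.0, 5.0}};
        double[][] sds = {{1.0, 1.0}, {1.0, 1.0}};
        System.out.println(density(new double[] {0.0, 0.0}, priors, means, sds));
    }
}
```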
 o distributionForInstance
public double[] distributionForInstance(Instance inst) throws java.lang.Exception
          Predicts the cluster memberships for a given instance.
Parameters:
inst - the instance to be assigned a cluster
Returns:
an array containing the estimated membership probabilities of the test instance in each cluster (this should sum to at most 1)
Throws:
java.lang.Exception - if distribution could not be computed successfully
Overrides:
distributionForInstance in class DistributionClusterer
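The membership probabilities follow from Bayes' rule: P(j | x) = P(j) p(x | j) / sum_k P(k) p(x | k). Computing this directly can produce 0/0 when every p(x | j) underflows, so the sketch below works with log-probabilities and subtracts the largest term before exponentiating. It assumes the per-cluster log-likelihoods are already available; the names are hypothetical. In this sketch the probabilities sum to 1 up to rounding.

```java
import java.util.Arrays;

/**
 * Sketch of cluster-membership probabilities computed in log space.
 * The per-cluster log-likelihoods log p(x | j) are supplied by the caller.
 */
class Memberships {
    static double[] fromLogJoint(double[] logPriors, double[] logLikelihoods) {
        int k = logPriors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < k; j++) {
            logJoint[j] = logPriors[j] + logLikelihoods[j]; // log P(j) + log p(x | j)
            max = Math.max(max, logJoint[j]);
        }
        double[] p = new double[k];
        double sum = 0;
        for (int j = 0; j < k; j++) {
            p[j] = Math.exp(logJoint[j] - max); // largest term becomes exp(0) = 1
            sum += p[j];
        }
        for (int j = 0; j < k; j++) p[j] /= sum;
        return p;
    }

    public static void main(String[] args) {
        // exp(-1000) underflows to 0, yet the normalised memberships are still well defined.
        double[] logPriors = {Math.log(0.5), Math.log(0.5)};
        System.out.println(Arrays.toString(fromLogJoint(logPriors, new double[] {-1000, -1001})));
    }
}
```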
 o main
public static void main(java.lang.String argv[])
          Main method for testing this class.
Parameters:
argv - should contain the following arguments:

-t training file [-T test file] [-N number of clusters] [-S random seed]

