public class EM extends RandomizableDensityBasedClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-X <num> Number of folds to use when cross-validating to find the best number of clusters.
-K <num> Number of runs of k-means to perform. (default 10)
-max <num> Maximum number of clusters to consider during cross-validation. If omitted or -1 specified, then there is no upper limit on the number of clusters.
-ll-cv <num> Minimum improvement in cross-validated log likelihood required to consider increasing the number of clusters. (default 1e-6)
-I <num> max iterations. (default 100)
-ll-iter <num> Minimum improvement in log likelihood required to perform another iteration of the E and M steps. (default 1e-6)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism)
-S <num> Random number seed. (default 100)
-output-debug-info If set, clusterer is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, clusterer capabilities are not checked before clusterer is built (use with caution).
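The options above are passed to the clusterer as a flat array of flags and values. The following is a minimal sketch of how such an array could be assembled. `EmOptionsSketch` and its `build` method are hypothetical helpers for illustration and are not part of the EM class; only the flag names `-N`, `-I`, `-S` and `-V` come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of EM) that assembles an option array. */
final class EmOptionsSketch {

    /**
     * Builds flag/value pairs using the flag names documented above.
     * A numClusters of -1 requests cross-validated selection of the cluster count.
     */
    static String[] build(int numClusters, int maxIterations, int seed, boolean verbose) {
        List<String> opts = new ArrayList<>();
        opts.add("-N");
        opts.add(Integer.toString(numClusters));
        opts.add("-I");
        opts.add(Integer.toString(maxIterations));
        opts.add("-S");
        opts.add(Integer.toString(seed));
        if (verbose) {
            opts.add("-V"); // a flag that takes no value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints the options as they would appear on a command line.
        System.out.println(String.join(" ", build(-1, 100, 100, true)));
    }
}
```

An array built this way has the shape that `setOptions(java.lang.String[] options)` accepts.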
Constructor | Description |
---|---|
EM() | Constructor. |
Modifier and Type | Method | Description |
---|---|---|
void | buildClusterer(Instances data) | Generates a clusterer. |
double[] | clusterPriors() | Returns the cluster priors. |
java.lang.String | debugTipText() | Returns the tip text for this property. |
java.lang.String | displayModelInOldFormatTipText() | Returns the tip text for this property. |
Capabilities | getCapabilities() | Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans). |
double[][][] | getClusterModelsNumericAtts() | Return the normal distributions for the cluster models. |
double[] | getClusterPriors() | Return the priors for the clusters. |
boolean | getDebug() | Get debug mode. |
boolean | getDisplayModelInOldFormat() | Get whether to display model output in the old, original format. |
int | getMaximumNumberOfClusters() | Get the maximum number of clusters to consider when cross-validating. |
int | getMaxIterations() | Get the maximum number of iterations. |
double | getMinLogLikelihoodImprovementCV() | Get the minimum improvement in cross-validated log likelihood required to consider increasing the number of clusters when cross-validating to find the best number of clusters. |
double | getMinLogLikelihoodImprovementIterating() | Get the minimum improvement in log likelihood necessary to perform another iteration of the E and M steps. |
double | getMinStdDev() | Get the minimum allowable standard deviation. |
int | getNumClusters() | Get the number of clusters. |
int | getNumExecutionSlots() | Get the degree of parallelism to use. |
int | getNumFolds() | Get the number of folds to use when cross-validating to find the best number of clusters. |
int | getNumKMeansRuns() | Returns the number of runs of k-means to perform. |
java.lang.String[] | getOptions() | Gets the current settings of EM. |
java.lang.String | getRevision() | Returns the revision string. |
java.lang.String | globalInfo() | Returns a string describing this clusterer. |
java.util.Enumeration<Option> | listOptions() | Returns an enumeration describing the available options. |
double[] | logDensityPerClusterForInstance(Instance inst) | Computes the log of the conditional density (per cluster) for a given instance. |
static void | main(java.lang.String[] argv) | Main method for testing this class. |
java.lang.String | maximumNumberOfClustersTipText() | Returns the tip text for this property. |
java.lang.String | maxIterationsTipText() | Returns the tip text for this property. |
java.lang.String | minLogLikelihoodImprovementCVTipText() | Returns the tip text for this property. |
java.lang.String | minLogLikelihoodImprovementIteratingTipText() | Returns the tip text for this property. |
java.lang.String | minStdDevTipText() | Returns the tip text for this property. |
int | numberOfClusters() | Returns the number of clusters. |
java.lang.String | numClustersTipText() | Returns the tip text for this property. |
java.lang.String | numExecutionSlotsTipText() | Returns the tip text for this property. |
java.lang.String | numFoldsTipText() | Returns the tip text for this property. |
java.lang.String | numKMeansRunsTipText() | Returns the tip text for this property. |
void | setDebug(boolean v) | Set debug mode (verbose output). |
void | setDisplayModelInOldFormat(boolean d) | Set whether to display model output in the old, original format. |
void | setMaximumNumberOfClusters(int n) | Set the maximum number of clusters to consider when cross-validating. |
void | setMaxIterations(int i) | Set the maximum number of iterations to perform. |
void | setMinLogLikelihoodImprovementCV(double min) | Set the minimum improvement in cross-validated log likelihood required to consider increasing the number of clusters when cross-validating to find the best number of clusters. |
void | setMinLogLikelihoodImprovementIterating(double min) | Set the minimum improvement in log likelihood necessary to perform another iteration of the E and M steps. |
void | setMinStdDev(double m) | Set the minimum value for standard deviation when calculating normal density. |
void | setMinStdDevPerAtt(double[] m) | |
void | setNumClusters(int n) | Set the number of clusters (-1 to select by CV). |
void | setNumExecutionSlots(int slots) | Set the degree of parallelism to use. |
void | setNumFolds(int folds) | Set the number of folds to use when cross-validating to find the best number of clusters. |
void | setNumKMeansRuns(int intValue) | Set the number of runs of SimpleKMeans to perform. |
void | setOptions(java.lang.String[] options) | Parses a given list of options. |
java.lang.String | toString() | Outputs the generated clusters into a string. |
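Methods inherited from class RandomizableDensityBasedClusterer: getSeed, seedTipText, setSeed
Methods inherited from class AbstractDensityBasedClusterer: distributionForInstance, logDensityForInstance, logJointDensitiesForInstance, makeCopies
Methods inherited from class AbstractClusterer: clusterInstance, doNotCheckCapabilitiesTipText, forName, getDoNotCheckCapabilities, makeCopies, makeCopy, postExecution, preExecution, run, runClusterer, setDoNotCheckCapabilities
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait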
makeCopy
clusterInstance
public java.lang.String globalInfo()
public java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class RandomizableDensityBasedClusterer
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions in interface OptionHandler
setOptions in class RandomizableDensityBasedClusterer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String numKMeansRunsTipText()
public int getNumKMeansRuns()
public void setNumKMeansRuns(int intValue)
Parameters: intValue - the number of runs of k-means to perform
public java.lang.String numFoldsTipText()
public void setNumFolds(int folds)
Parameters: folds - the number of folds to use
public int getNumFolds()
public java.lang.String minLogLikelihoodImprovementCVTipText()
public void setMinLogLikelihoodImprovementCV(double min)
Parameters: min - the minimum improvement in log likelihood
public double getMinLogLikelihoodImprovementCV()
public java.lang.String minLogLikelihoodImprovementIteratingTipText()
public void setMinLogLikelihoodImprovementIterating(double min)
Parameters: min - the minimum improvement in log likelihood
public double getMinLogLikelihoodImprovementIterating()
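The iteration threshold and the maximum iteration count together bound the E and M loop. The following is a minimal sketch of that stopping rule, not the EM class's own code: `EmStoppingSketch`, `iterate` and the `emStep` callback are hypothetical names, and comparing the raw difference between successive log likelihoods is an assumption.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical illustration of an iteration limit combined with a minimum improvement. */
final class EmStoppingSketch {

    /**
     * Calls emStep (assumed to run one E step and one M step and return the
     * resulting log likelihood) until the improvement over the previous
     * iteration drops below minImprovement or maxIterations is reached.
     * Returns the number of iterations performed.
     */
    static int iterate(DoubleSupplier emStep, int maxIterations, double minImprovement) {
        double previous = Double.NEGATIVE_INFINITY;
        int iterations = 0;
        while (iterations < maxIterations) {
            double current = emStep.getAsDouble();
            iterations++;
            if (current - previous < minImprovement) {
                break; // improvement too small to justify another iteration
            }
            previous = current;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Simulated log likelihoods; the third value barely improves on the second.
        double[] logLikelihoods = {-10.0, -5.0, -5.0 + 1e-9, -1.0};
        int[] next = {0};
        int n = iterate(() -> logLikelihoods[next[0]++], 100, 1e-6);
        System.out.println("iterations: " + n);
    }
}
```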
public java.lang.String numExecutionSlotsTipText()
public void setNumExecutionSlots(int slots)
Parameters: slots - the number of tasks to run in parallel when computing the nearest neighbors and evaluating different values of k between the lower and upper bounds
public int getNumExecutionSlots()
public java.lang.String displayModelInOldFormatTipText()
public void setDisplayModelInOldFormat(boolean d)
Parameters: d - true if model output is to be shown in the old format
public boolean getDisplayModelInOldFormat()
public java.lang.String minStdDevTipText()
public void setMinStdDev(double m)
Parameters: m - minimum value for standard deviation
public void setMinStdDevPerAtt(double[] m)
public double getMinStdDev()
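A lower bound on the standard deviation keeps the normal density finite when an attribute has (almost) no variance within a cluster. The following is a minimal sketch of that idea under the standard normal log-density formula; `MinStdDevSketch` and `logNormalDensity` are hypothetical names and do not describe how EM applies the bound internally.

```java
/** Hypothetical illustration of clamping the standard deviation before evaluating a normal density. */
final class MinStdDevSketch {

    /** Log of the normal density at x, with stdDev clamped from below by minStdDev. */
    static double logNormalDensity(double x, double mean, double stdDev, double minStdDev) {
        double sd = Math.max(stdDev, minStdDev);
        double z = (x - mean) / sd;
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }

    public static void main(String[] args) {
        // With a standard deviation of 0 the unclamped formula would divide by zero;
        // the clamp keeps the result finite.
        System.out.println(logNormalDensity(0.0, 0.0, 0.0, 1e-6));
    }
}
```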
public java.lang.String numClustersTipText()
public void setNumClusters(int n) throws java.lang.Exception
setNumClusters in interface NumberOfClustersRequestable
Parameters: n - the number of clusters
Throws: java.lang.Exception - if n is 0
public int getNumClusters()
public void setMaximumNumberOfClusters(int n)
Parameters: n - the maximum number of clusters to consider
public int getMaximumNumberOfClusters()
public java.lang.String maximumNumberOfClustersTipText()
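When the number of clusters is left at -1, the cross-validation settings (minimum improvement and maximum number of clusters) decide where the search stops. The following is a minimal sketch of one such search. It assumes the search starts at one cluster and adds one cluster at a time, which this page does not state; `ClusterCountSelectionSketch`, `select` and the `cvLogLikelihood` callback are hypothetical names.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of choosing a cluster count from cross-validated log likelihoods. */
final class ClusterCountSelectionSketch {

    /**
     * cvLogLikelihood maps a candidate cluster count to its cross-validated
     * log likelihood. A maxClusters of -1 means there is no upper limit.
     * The count grows while the improvement exceeds minImprovement.
     */
    static int select(IntToDoubleFunction cvLogLikelihood, int maxClusters, double minImprovement) {
        int best = 1; // assumed starting point
        double bestLl = cvLogLikelihood.applyAsDouble(best);
        while (maxClusters == -1 || best < maxClusters) {
            double candidateLl = cvLogLikelihood.applyAsDouble(best + 1);
            if (candidateLl - bestLl <= minImprovement) {
                break; // one more cluster does not improve the fit enough
            }
            best++;
            bestLl = candidateLl;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up scores indexed by cluster count (index 0 is unused).
        double[] scores = {0.0, -100.0, -50.0, -40.0, -40.0, -10.0};
        System.out.println(select(k -> scores[k], -1, 1e-6));
    }
}
```

In this sketch the search stops at the first count that fails to improve, so a better score at a larger count (the last entry in the made-up data) is never evaluated.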
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
Parameters: i - the number of iterations
Throws: java.lang.Exception - if i is less than 1
public int getMaxIterations()
public java.lang.String debugTipText()
debugTipText in class AbstractClusterer
public void setDebug(boolean v)
setDebug in class AbstractClusterer
Parameters: v - true for verbose output
public boolean getDebug()
getDebug in class AbstractClusterer
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class RandomizableDensityBasedClusterer
public double[][][] getClusterModelsNumericAtts()
Returns: double[][][] value
public double[] getClusterPriors()
Returns: double[] value
public java.lang.String toString()
toString in class java.lang.Object
public int numberOfClusters() throws java.lang.Exception
numberOfClusters in interface Clusterer
numberOfClusters in class AbstractClusterer
Throws: java.lang.Exception - if number of clusters could not be returned successfully
public Capabilities getCapabilities()
getCapabilities in interface Clusterer
getCapabilities in interface CapabilitiesHandler
getCapabilities in class AbstractClusterer
Capabilities
public void buildClusterer(Instances data) throws java.lang.Exception
buildClusterer in interface Clusterer
buildClusterer in class AbstractClusterer
Parameters: data - set of instances serving as training data
Throws: java.lang.Exception - if the clusterer has not been generated successfully
public double[] clusterPriors()
clusterPriors in interface DensityBasedClusterer
clusterPriors in class AbstractDensityBasedClusterer
public double[] logDensityPerClusterForInstance(Instance inst) throws java.lang.Exception
logDensityPerClusterForInstance in interface DensityBasedClusterer
logDensityPerClusterForInstance in class AbstractDensityBasedClusterer
Parameters: inst - the instance to compute the density for
Throws: java.lang.Exception - if the density could not be computed successfully
public java.lang.String getRevision()
getRevision in interface RevisionHandler
getRevision in class AbstractClusterer
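The arrays returned by clusterPriors() and logDensityPerClusterForInstance(Instance inst) can be combined into cluster membership probabilities with Bayes' rule. The following is a minimal sketch of that calculation on plain arrays, using the log-sum-exp trick for numerical stability. `PosteriorSketch` and `posterior` are hypothetical names, and the sketch does not claim to reproduce what the inherited distributionForInstance does.

```java
/** Hypothetical illustration of turning priors and per-cluster log densities into posteriors. */
final class PosteriorSketch {

    /**
     * priors[i] is the prior of cluster i and logDensities[i] is the log of the
     * conditional density of one instance under cluster i. Returns the
     * normalized membership probabilities for that instance.
     */
    static double[] posterior(double[] priors, double[] logDensities) {
        int k = priors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            logJoint[i] = Math.log(priors[i]) + logDensities[i];
            max = Math.max(max, logJoint[i]);
        }
        // Subtract the maximum before exponentiating so very negative values do not all underflow.
        double[] result = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            result[i] = Math.exp(logJoint[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < k; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] p = posterior(new double[] {0.5, 0.5}, new double[] {Math.log(0.2), Math.log(0.6)});
        System.out.println(p[0] + " " + p[1]);
    }
}
```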
public static void main(java.lang.String[] argv)
Parameters: argv - should contain the following arguments:
-t training file [-T test file] [-N number of clusters] [-S random seed]