public class XMeans
extends weka.clusterers.RandomizableClusterer
implements weka.core.TechnicalInformationHandler
@inproceedings{Pelleg2000,
author = {Dan Pelleg and Andrew W. Moore},
booktitle = {Seventeenth International Conference on Machine Learning},
pages = {727-734},
publisher = {Morgan Kaufmann},
title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
year = {2000}
}
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree uses the KDTree internally (default no).
-K <KDTree class specification> full class name of the KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> full class name of the distance function class to use, followed by scheme options (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> debug level (default 0).
-Y <file name> debug vectors file.
-S <num> random number seed (default 10).
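The flags above are passed to the clusterer as an array of strings, one token per element. The following is a minimal sketch of assembling such an array using only the Java standard library. The class `XMeansOptionsSketch`, its `build` method, and the range check on the cluster counts are hypothetical and added purely for illustration; only the flags `-L`, `-H`, `-S`, and `-use-kdtree` come from the option list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka) that assembles an XMeans option array. */
final class XMeansOptionsSketch {

    /**
     * Builds an option array from some of the documented flags.
     * The sanity check below is an assumption of this sketch, not documented Weka behavior.
     */
    static String[] build(int minClusters, int maxClusters, int seed, boolean useKdTree) {
        if (minClusters < 1 || maxClusters < minClusters) {
            throw new IllegalArgumentException("expected 1 <= minClusters <= maxClusters");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-L");
        opts.add(Integer.toString(minClusters)); // minimum number of clusters
        opts.add("-H");
        opts.add(Integer.toString(maxClusters)); // maximum number of clusters
        opts.add("-S");
        opts.add(Integer.toString(seed));        // random number seed
        if (useKdTree) {
            opts.add("-use-kdtree");             // flag without an argument
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = build(2, 6, 42, true);
        System.out.println(String.join(" ", opts));
        // With Weka and XMeans on the classpath, the array could then be handed over:
        //   XMeans xmeans = new XMeans();
        //   xmeans.setOptions(opts);
        //   xmeans.buildClusterer(data); // data: a weka.core.Instances object
    }
}
```

The same settings can also be applied through the individual setters, such as `setMinNumClusters(int)`, `setMaxNumClusters(int)`, and `setUseKDTree(boolean)`.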
See Also: RandomizableClusterer, Serialized Form

| Modifier and Type | Field | Description |
|---|---|---|
| static int | D_CONVCHCLOSER | Takes a closer look at converging children. |
| static int | D_CURR | For current debugging. |
| static int | D_FOLLOWSPLIT | Follows the splitting of the centers. |
| static int | D_GENERAL | General debugging. |
| static int | D_ITERCOUNT | Follows iterations. |
| static int | D_KDTREE | Checks on the KDTree. |
| static int | D_METH_MISUSE | Functions may have been misused. |
| static int | D_PRINTCENTERS | Prints the centers. |
| static int | D_RANDOMVECTOR | Checks on random vectors. |
| boolean | m_CurrDebugFlag | Flag indicating that debugging is active. |
| static int | R_HIGH | Index in ranges for HIGH. |
| static int | R_LOW | Index in ranges for LOW. |
| static int | R_WIDTH | Index in ranges for WIDTH. |

| Constructor | Description |
|---|---|
| XMeans() | The default constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| java.lang.String | binValueTipText() | Returns the tip text for this property. |
| void | buildClusterer(weka.core.Instances data) | Generates the X-Means clusterer. |
| boolean | checkForNominalAttributes(weka.core.Instances data) | Checks for nominal attributes in the dataset. |
| int | clusterInstance(weka.core.Instance instance) | Classifies a given instance, i.e. assigns it to a cluster. |
| java.lang.String | cutOffFactorTipText() | Returns the tip text for this property. |
| java.lang.String | debugLevelTipText() | Returns the tip text for this property. |
| java.lang.String | debugVectorsFileTipText() | Returns the tip text for this property. |
| java.lang.String | distanceFTipText() | Returns the tip text for this property. |
| double | getBinValue() | Gets the value that represents true in a new numeric attribute. |
| weka.core.Capabilities | getCapabilities() | Returns the default capabilities of the clusterer. |
| weka.core.Instances | getClusterCenters() | Returns the centers of the clusters as an Instances object. |
| double | getCutOffFactor() | Gets the cutoff factor. |
| int | getDebugLevel() | Gets the debug level. |
| java.io.File | getDebugVectorsFile() | Gets the file that has the random vectors stored. |
| weka.core.DistanceFunction | getDistanceF() | Gets the distance function. |
| java.io.File | getInputCenterFile() | Gets the file to read the list of centers from. |
| weka.core.neighboursearch.KDTree | getKDTree() | Gets the KDTree class. |
| int | getMaxIterations() | Gets the maximum number of iterations. |
| int | getMaxKMeans() | Gets the maximum number of iterations in KMeans. |
| int | getMaxKMeansForChildren() | Gets the maximum number of KMeans iterations performed on the child centers. |
| int | getMaxNumClusters() | Gets the maximum number of clusters to generate. |
| int | getMinNumClusters() | Gets the minimum number of clusters to generate. |
| weka.core.Instance | getNextDebugVectorsInstance(weka.core.Instances model) | Reads an instance from the debug vectors file. |
| java.lang.String[] | getOptions() | Gets the current settings of XMeans. |
| java.io.File | getOutputCenterFile() | Gets the file to write the list of centers to. |
| java.lang.String | getRevision() | Returns the revision string. |
| weka.core.TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
| boolean | getUseKDTree() | Gets whether the KDTree is used or not. |
| java.lang.String | globalInfo() | Returns a string describing this clusterer. |
| void | initDebugVectorsInput() | Initialises the debug vector input. |
| java.lang.String | inputCenterFileTipText() | Returns the tip text for this property. |
| java.lang.String | KDTreeTipText() | Returns the tip text for this property. |
| java.util.Enumeration<weka.core.Option> | listOptions() | Returns an enumeration describing the available options. |
| static void | main(java.lang.String[] argv) | Main method for testing this class. |
| java.lang.String | maxIterationsTipText() | Returns the tip text for this property. |
| java.lang.String | maxKMeansForChildrenTipText() | Returns the tip text for this property. |
| java.lang.String | maxKMeansTipText() | Returns the tip text for this property. |
| java.lang.String | maxNumClustersTipText() | Returns the tip text for this property. |
| java.lang.String | minNumClustersTipText() | Returns the tip text for this property. |
| int | numberOfClusters() | Returns the number of clusters. |
| java.lang.String | outputCenterFileTipText() | Returns the tip text for this property. |
| void | setBinValue(double value) | Sets the distance value between true and false of binary attributes. |
| void | setCutOffFactor(double i) | Sets a new cutoff factor. |
| void | setDebugLevel(int d) | Sets the debug level. |
| void | setDebugVectorsFile(java.io.File value) | Sets the file that has the random vectors stored. |
| void | setDistanceF(weka.core.DistanceFunction distanceF) | Sets the distance function. |
| void | setInputCenterFile(java.io.File value) | Sets the file to read the list of centers from. |
| void | setKDTree(weka.core.neighboursearch.KDTree k) | Sets the KDTree class. |
| void | setMaxIterations(int i) | Sets the maximum number of iterations to perform. |
| void | setMaxKMeans(int i) | Sets the maximum number of iterations to perform in KMeans. |
| void | setMaxKMeansForChildren(int i) | Sets the maximum number of KMeans iterations performed on the child centers. |
| void | setMaxNumClusters(int n) | Sets the maximum number of clusters to generate. |
| void | setMinNumClusters(int n) | Sets the minimum number of clusters to generate. |
| void | setOptions(java.lang.String[] options) | Parses a given list of options. |
| void | setOutputCenterFile(java.io.File value) | Sets the file to write the list of centers to. |
| void | setUseKDTree(boolean value) | Sets whether to use the KDTree or not. |
| java.lang.String | toString() | Returns a string describing this clusterer. |
| java.lang.String | useKDTreeTipText() | Returns the tip text for this property. |

Methods inherited from class weka.clusterers.AbstractClusterer: debugTipText, distributionForInstance, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, makeCopies, makeCopy, postExecution, preExecution, run, runClusterer, setDebug, setDoNotCheckCapabilities

public static int R_LOW
public static int R_HIGH
public static int R_WIDTH
public static int D_PRINTCENTERS
public static int D_FOLLOWSPLIT
public static int D_CONVCHCLOSER
public static int D_RANDOMVECTOR
public static int D_KDTREE
public static int D_ITERCOUNT
public static int D_METH_MISUSE
public static int D_CURR
public static int D_GENERAL
public boolean m_CurrDebugFlag

public java.lang.String globalInfo()

public weka.core.TechnicalInformation getTechnicalInformation()
getTechnicalInformation in interface weka.core.TechnicalInformationHandler

public weka.core.Capabilities getCapabilities()
getCapabilities in interface weka.clusterers.Clusterer
getCapabilities in interface weka.core.CapabilitiesHandler
getCapabilities in class weka.clusterers.AbstractClusterer

public void buildClusterer(weka.core.Instances data) throws java.lang.Exception
buildClusterer in interface weka.clusterers.Clusterer
buildClusterer in class weka.clusterers.AbstractClusterer
data - set of instances serving as training data
java.lang.Exception - if the clusterer has not been generated successfully

public boolean checkForNominalAttributes(weka.core.Instances data)
data - the data to check

public int clusterInstance(weka.core.Instance instance) throws java.lang.Exception
clusterInstance in interface weka.clusterers.Clusterer
clusterInstance in class weka.clusterers.AbstractClusterer
instance - the instance to be assigned to a cluster
java.lang.Exception - if instance could not be classified successfully

public int numberOfClusters()
numberOfClusters in interface weka.clusterers.Clusterer
numberOfClusters in class weka.clusterers.AbstractClusterer

public java.util.Enumeration<weka.core.Option> listOptions()
listOptions in interface weka.core.OptionHandler
listOptions in class weka.clusterers.RandomizableClusterer

public java.lang.String minNumClustersTipText()

public void setMinNumClusters(int n)
n - the minimum number of clusters to generate

public int getMinNumClusters()

public java.lang.String maxNumClustersTipText()

public void setMaxNumClusters(int n)
n - the maximum number of clusters to generate

public int getMaxNumClusters()

public java.lang.String maxIterationsTipText()

public void setMaxIterations(int i) throws java.lang.Exception
i - the number of iterations
java.lang.Exception - if i is less than 1

public int getMaxIterations()

public java.lang.String maxKMeansTipText()

public void setMaxKMeans(int i)
i - the number of iterations

public int getMaxKMeans()

public java.lang.String maxKMeansForChildrenTipText()

public void setMaxKMeansForChildren(int i)
i - the number of iterations

public int getMaxKMeansForChildren()

public java.lang.String cutOffFactorTipText()

public void setCutOffFactor(double i)
i - the new cutoff factor

public double getCutOffFactor()

public java.lang.String binValueTipText()

public double getBinValue()

public void setBinValue(double value)
value - the distance

public java.lang.String distanceFTipText()

public void setDistanceF(weka.core.DistanceFunction distanceF)
distanceF - the distance function with all options set

public weka.core.DistanceFunction getDistanceF()

public java.lang.String debugVectorsFileTipText()

public void setDebugVectorsFile(java.io.File value)
value - the file to read the random vectors from

public java.io.File getDebugVectorsFile()

public void initDebugVectorsInput() throws java.lang.Exception
java.lang.Exception - if there is an error opening the debug input file

public weka.core.Instance getNextDebugVectorsInstance(weka.core.Instances model) throws java.lang.Exception
model - the data model for the instance
java.lang.Exception - if there is no debug vector in m_DebugVectors

public java.lang.String inputCenterFileTipText()

public void setInputCenterFile(java.io.File value)
value - the file to read centers from

public java.io.File getInputCenterFile()

public java.lang.String outputCenterFileTipText()

public void setOutputCenterFile(java.io.File value)
value - the file to write centers to

public java.io.File getOutputCenterFile()

public java.lang.String KDTreeTipText()

public void setKDTree(weka.core.neighboursearch.KDTree k)
k - a KDTree object with all options set

public weka.core.neighboursearch.KDTree getKDTree()

public java.lang.String useKDTreeTipText()

public void setUseKDTree(boolean value)
value - if true the KDTree is used

public boolean getUseKDTree()

public java.lang.String debugLevelTipText()

public void setDebugLevel(int d)
d - the debug level

public int getDebugLevel()

public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions in interface weka.core.OptionHandler
setOptions in class weka.clusterers.RandomizableClusterer
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported

public java.lang.String[] getOptions()
getOptions in interface weka.core.OptionHandler
getOptions in class weka.clusterers.RandomizableClusterer

public java.lang.String toString()
toString in class java.lang.Object

public weka.core.Instances getClusterCenters()

public java.lang.String getRevision()
getRevision in interface weka.core.RevisionHandler
getRevision in class weka.clusterers.AbstractClusterer

public static void main(java.lang.String[] argv)
argv - should contain options
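
To make the relationship between `getClusterCenters()` and `clusterInstance(...)` more concrete, the sketch below shows nearest-center assignment on plain arrays. It is a conceptual illustration only, not Weka code: it assumes an instance is assigned to the closest center, and it uses raw squared Euclidean distance, whereas the configured distance function (default weka.core.EuclideanDistance) may treat attributes differently, for example by normalizing them. The class and method names are hypothetical.

```java
/** Conceptual sketch only (not Weka code): nearest-center assignment on plain arrays. */
final class NearestCenterSketch {

    /** Returns the index of the center closest to the point under squared Euclidean distance. */
    static int nearestCenter(double[] point, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalArgumentException("at least one center is required");
        }
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff; // accumulate the squared difference per attribute
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        System.out.println(nearestCenter(new double[] {1.0, 2.0}, centers)); // prints 0
        System.out.println(nearestCenter(new double[] {9.0, 8.0}, centers)); // prints 1
    }
}
```

In this sketch the returned index plays the role of the cluster number, and the `centers` array plays the role of the rows in the Instances object that holds the cluster centers.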