public class XMeans extends RandomizableClusterer implements TechnicalInformationHandler
@inproceedings{Pelleg2000,
   author = {Dan Pelleg and Andrew W. Moore},
   booktitle = {Seventeenth International Conference on Machine Learning},
   pages = {727-734},
   publisher = {Morgan Kaufmann},
   title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
   year = {2000}
}

Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options, e.g., "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor; takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
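The -M and -J limits apply to the two phases of X-means described by Pelleg and Moore (2000). Improve-Params runs ordinary k-means over the current centers. Improve-Structure tentatively splits each centroid in two and keeps a split only if the split scores better. The following is a minimal sketch of that scoring step. It assumes the Bayesian Information Criterion (BIC) as written in the paper, with spherical Gaussian clusters sharing one variance estimate. The class and method names are hypothetical, and this is not Weka's code.

```java
/**
 * Hypothetical sketch of the BIC split test described by Pelleg and Moore (2000):
 * spherical Gaussian clusters sharing one variance estimate. Not Weka's implementation.
 */
final class BicSplitSketch {

    /** BIC of a K-cluster model; assign[i] is the cluster (0..K-1) of points[i], centers[k] its mean. */
    static double bic(double[][] points, int[] assign, double[][] centers) {
        int r = points.length;    // R: number of points
        int m = points[0].length; // M: number of dimensions
        int k = centers.length;   // K: number of clusters
        // Pooled variance estimate: sum of squared distances to own center, divided by (R - K).
        double sse = 0.0;
        int[] sizes = new int[k];
        for (int i = 0; i < r; i++) {
            sizes[assign[i]]++;
            for (int d = 0; d < m; d++) {
                double diff = points[i][d] - centers[assign[i]][d];
                sse += diff * diff;
            }
        }
        double variance = sse / (r - k);
        // Log-likelihood summed cluster by cluster, with the terms as given in the paper.
        double logLik = 0.0;
        for (int n = 0; n < k; n++) {
            double rn = sizes[n];
            if (rn == 0) continue; // an empty cluster contributes no points
            logLik += -rn / 2.0 * Math.log(2.0 * Math.PI)
                    - rn * m / 2.0 * Math.log(variance)
                    - (rn - k) / 2.0
                    + rn * Math.log(rn)
                    - rn * Math.log(r);
        }
        // Free parameters: K - 1 mixing weights, M * K center coordinates, one variance.
        double p = (k - 1) + (double) m * k + 1;
        return logLik - p / 2.0 * Math.log(r);
    }

    /** Mean of a set of points, used as the single parent center. */
    static double[] mean(double[][] points) {
        double[] mu = new double[points[0].length];
        for (double[] p : points) {
            for (int d = 0; d < mu.length; d++) {
                mu[d] += p[d] / points.length;
            }
        }
        return mu;
    }

    /** True if two children (childAssign / childCenters) score a higher BIC than one parent. */
    static boolean shouldSplit(double[][] points, int[] childAssign, double[][] childCenters) {
        double[][] parentCenter = { mean(points) };
        int[] parentAssign = new int[points.length]; // all zeros: a single cluster
        return bic(points, childAssign, childCenters) > bic(points, parentAssign, parentCenter);
    }

    public static void main(String[] args) {
        double[][] pts = { {0.0}, {0.1}, {0.2}, {10.0}, {10.1}, {10.2} };
        int[] assign = { 0, 0, 0, 1, 1, 1 };
        double[][] centers = { {0.1}, {10.1} };
        System.out.println("split? " + shouldSplit(pts, assign, centers)); // prints "split? true"
    }
}
```

Both scores are computed over the same points, so the comparison stays local to the parent's region.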
See Also: RandomizableClusterer, Serialized Form

Modifier and Type | Field and Description |
---|---|
static int | D_CONVCHCLOSER: have a closer look at converge children. |
static int | D_CURR: for current debugging. |
static int | D_FOLLOWSPLIT: follows the splitting of the centers. |
static int | D_GENERAL: general debugging. |
static int | D_ITERCOUNT: follows iterations. |
static int | D_KDTREE: checks on the KDTree. |
static int | D_METH_MISUSE: functions may have been misused. |
static int | D_PRINTCENTERS: prints the centers. |
static int | D_RANDOMVECTOR: checks on random vectors. |
boolean | m_CurrDebugFlag: flag indicating that debugging is active. |
static int | R_HIGH: Index in ranges for HIGH. |
static int | R_LOW: Index in ranges for LOW. |
static int | R_WIDTH: Index in ranges for WIDTH. |
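This page documents R_LOW, R_HIGH and R_WIDTH only as indices into a ranges array. The following minimal sketch builds such an array, with one row of {low, high, width} per attribute, and uses it to scale a value into [0, 1]. The index values 0, 1 and 2, the class name and the normalization use are all assumptions for illustration.

```java
/** Hypothetical sketch of a per-attribute ranges table indexed by R_LOW, R_HIGH and R_WIDTH. */
final class RangesSketch {
    // Assumed index values; the real constants are defined by XMeans.
    static final int R_LOW = 0;
    static final int R_HIGH = 1;
    static final int R_WIDTH = 2;

    /** Computes ranges[attr] = {low, high, width} for each column of data. */
    static double[][] ranges(double[][] data) {
        int numAttrs = data[0].length;
        double[][] r = new double[numAttrs][3];
        for (int a = 0; a < numAttrs; a++) {
            double lo = Double.POSITIVE_INFINITY;
            double hi = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                lo = Math.min(lo, row[a]);
                hi = Math.max(hi, row[a]);
            }
            r[a][R_LOW] = lo;
            r[a][R_HIGH] = hi;
            r[a][R_WIDTH] = hi - lo;
        }
        return r;
    }

    /** Scales a value of attribute a into [0, 1]; a zero-width attribute maps to 0. */
    static double normalize(double value, double[][] r, int a) {
        double width = r[a][R_WIDTH];
        return width == 0.0 ? 0.0 : (value - r[a][R_LOW]) / width;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 5.0}, {3.0, 5.0}, {2.0, 5.0} };
        double[][] r = ranges(data);
        System.out.println(normalize(2.0, r, 0)); // prints 0.5
        System.out.println(normalize(5.0, r, 1)); // prints 0.0 (constant attribute)
    }
}
```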
Constructor and Description |
---|
XMeans(): the default constructor. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | binValueTipText(): Returns the tip text for this property. |
void | buildClusterer(Instances data): Generates the X-Means clusterer. |
boolean | checkForNominalAttributes(Instances data): Checks for nominal attributes in the dataset. |
int | clusterInstance(Instance instance): Assigns a given instance to a cluster. |
java.lang.String | cutOffFactorTipText(): Returns the tip text for this property. |
java.lang.String | debugLevelTipText(): Returns the tip text for this property. |
java.lang.String | debugVectorsFileTipText(): Returns the tip text for this property. |
java.lang.String | distanceFTipText(): Returns the tip text for this property. |
double | getBinValue(): Gets the value that represents true in a new numeric attribute. |
Capabilities | getCapabilities(): Returns default capabilities of the clusterer. |
Instances | getClusterCenters(): Returns the centers of the clusters as an Instances object. |
double | getCutOffFactor(): Gets the cutoff factor. |
int | getDebugLevel(): Gets the debug level. |
java.io.File | getDebugVectorsFile(): Gets the file in which the random vectors are stored. |
DistanceFunction | getDistanceF(): Gets the distance function. |
java.io.File | getInputCenterFile(): Gets the file to read the list of centers from. |
KDTree | getKDTree(): Gets the KDTree class. |
int | getMaxIterations(): Gets the maximum number of iterations. |
int | getMaxKMeans(): Gets the maximum number of iterations in KMeans. |
int | getMaxKMeansForChildren(): Gets the maximum number of KMeans iterations performed on the child centers. |
int | getMaxNumClusters(): Gets the maximum number of clusters to generate. |
int | getMinNumClusters(): Gets the minimum number of clusters to generate. |
Instance | getNextDebugVectorsInstance(Instances model): Reads an instance from the debug vectors file. |
java.lang.String[] | getOptions(): Gets the current settings of XMeans. |
java.io.File | getOutputCenterFile(): Gets the file to write the list of centers to. |
java.lang.String | getRevision(): Returns the revision string. |
TechnicalInformation | getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
boolean | getUseKDTree(): Gets whether the KDTree is used or not. |
java.lang.String | globalInfo(): Returns a string describing this clusterer. |
void | initDebugVectorsInput(): Initialises the debug vector input. |
java.lang.String | inputCenterFileTipText(): Returns the tip text for this property. |
java.lang.String | KDTreeTipText(): Returns the tip text for this property. |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] argv): Main method for testing this class. |
java.lang.String | maxIterationsTipText(): Returns the tip text for this property. |
java.lang.String | maxKMeansForChildrenTipText(): Returns the tip text for this property. |
java.lang.String | maxKMeansTipText(): Returns the tip text for this property. |
java.lang.String | maxNumClustersTipText(): Returns the tip text for this property. |
java.lang.String | minNumClustersTipText(): Returns the tip text for this property. |
int | numberOfClusters(): Returns the number of clusters. |
java.lang.String | outputCenterFileTipText(): Returns the tip text for this property. |
void | setBinValue(double value): Sets the distance value between true and false of binary attributes. |
void | setCutOffFactor(double i): Sets a new cutoff factor. |
void | setDebugLevel(int d): Sets the debug level. |
void | setDebugVectorsFile(java.io.File value): Sets the file in which the random vectors are stored. |
void | setDistanceF(DistanceFunction distanceF): Sets the distance function to use. |
void | setInputCenterFile(java.io.File value): Sets the file to read the list of centers from. |
void | setKDTree(KDTree k): Sets the KDTree class. |
void | setMaxIterations(int i): Sets the maximum number of iterations to perform. |
void | setMaxKMeans(int i): Sets the maximum number of iterations to perform in KMeans. |
void | setMaxKMeansForChildren(int i): Sets the maximum number of KMeans iterations performed on the child centers. |
void | setMaxNumClusters(int n): Sets the maximum number of clusters to generate. |
void | setMinNumClusters(int n): Sets the minimum number of clusters to generate. |
void | setOptions(java.lang.String[] options): Parses a given list of options. |
void | setOutputCenterFile(java.io.File value): Sets the file to write the list of centers to. |
void | setUseKDTree(boolean value): Sets whether to use the KDTree or not. |
java.lang.String | toString(): Returns a string describing this clusterer. |
java.lang.String | useKDTreeTipText(): Returns the tip text for this property. |
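Methods inherited from class RandomizableClusterer: getSeed, seedTipText, setSeed
Methods inherited from class AbstractClusterer: distributionForInstance, forName, makeCopies, makeCopy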
public static int R_LOW
public static int R_HIGH
public static int R_WIDTH
public static int D_PRINTCENTERS
public static int D_FOLLOWSPLIT
public static int D_CONVCHCLOSER
public static int D_RANDOMVECTOR
public static int D_KDTREE
public static int D_ITERCOUNT
public static int D_METH_MISUSE
public static int D_CURR
public static int D_GENERAL
public boolean m_CurrDebugFlag
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface TechnicalInformationHandler
public Capabilities getCapabilities()
getCapabilities
in interface Clusterer
getCapabilities
in interface CapabilitiesHandler
getCapabilities
in class AbstractClusterer
See Also: Capabilities
public void buildClusterer(Instances data) throws java.lang.Exception
buildClusterer
in interface Clusterer
buildClusterer
in class AbstractClusterer
Parameters: data - set of instances serving as training data
Throws: java.lang.Exception - if the clusterer has not been generated successfully
public boolean checkForNominalAttributes(Instances data)
Parameters: data - the data to check
public int clusterInstance(Instance instance) throws java.lang.Exception
clusterInstance
in interface Clusterer
clusterInstance
in class AbstractClusterer
Parameters: instance - the instance to be assigned to a cluster
Throws: java.lang.Exception - if the instance could not be classified successfully
public int numberOfClusters()
numberOfClusters
in interface Clusterer
numberOfClusters
in class AbstractClusterer
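clusterInstance returns the index of the cluster an instance is assigned to, and numberOfClusters returns how many centers were found. For a centroid-based model, assignment amounts to picking the nearest center under the configured distance function. This sketch assumes plain Euclidean distance (the documented default for -D) on numeric arrays. It uses hypothetical names rather than Weka's Instance API.

```java
/** Hypothetical sketch: assign a point to its nearest cluster center (Euclidean distance assumed). */
final class NearestCenterSketch {

    /** Squared Euclidean distance; monotone in the true distance, so it yields the same nearest center. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index of the closest center; ties go to the lowest index. */
    static int clusterOf(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = squaredDistance(point, centers[0]);
        for (int k = 1; k < centers.length; k++) {
            double d = squaredDistance(point, centers[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0} };
        System.out.println("number of clusters: " + centers.length);
        // (6, 4) is closest to (5, 5), i.e. cluster index 1.
        System.out.println("cluster of (6, 4): " + clusterOf(new double[] {6.0, 4.0}, centers));
    }
}
```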
public java.util.Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class RandomizableClusterer
public java.lang.String minNumClustersTipText()
public void setMinNumClusters(int n)
Parameters: n - the minimum number of clusters to generate
public int getMinNumClusters()
public java.lang.String maxNumClustersTipText()
public void setMaxNumClusters(int n)
Parameters: n - the maximum number of clusters to generate
public int getMaxNumClusters()
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
Parameters: i - the number of iterations
Throws: java.lang.Exception - if i is less than 1
public int getMaxIterations()
public java.lang.String maxKMeansTipText()
public void setMaxKMeans(int i)
Parameters: i - the number of iterations
public int getMaxKMeans()
public java.lang.String maxKMeansForChildrenTipText()
public void setMaxKMeansForChildren(int i)
Parameters: i - the number of iterations
public int getMaxKMeansForChildren()
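maxKMeans (-M) and maxKMeansForChildren (-J) cap how many k-means rounds may run. The following minimal sketch uses a standard Lloyd-style k-means loop with such a cap. It stops early once no assignment changes, and otherwise stops when the cap is reached. The names are hypothetical, and the loop is not taken from Weka.

```java
import java.util.Arrays;

/** Hypothetical sketch: Lloyd-style k-means with an iteration cap, in the spirit of maxKMeans. */
final class CappedKMeansSketch {

    /** Runs at most maxIterations rounds, updating centers in place; returns the final assignment. */
    static int[] run(double[][] points, double[][] centers, int maxIterations) {
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1); // -1 = unassigned, so the first round always counts as a change
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest center (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int k = 0; k < centers.length; k++) {
                    double d = 0.0;
                    for (int j = 0; j < points[i].length; j++) {
                        double diff = points[i][j] - centers[k][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        best = k;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged before reaching the cap
            // Update step: move each center to the mean of its points; empty clusters keep their center.
            for (int k = 0; k < centers.length; k++) {
                double[] sum = new double[centers[k].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assign[i] != k) continue;
                    count++;
                    for (int j = 0; j < sum.length; j++) sum[j] += points[i][j];
                }
                if (count > 0) {
                    for (int j = 0; j < sum.length; j++) centers[k][j] = sum[j] / count;
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] pts = { {0.0}, {1.0}, {9.0}, {10.0} };
        double[][] centers = { {0.0}, {1.0} }; // deliberately poor starting centers
        int[] assign = run(pts, centers, 1000);
        System.out.println(Arrays.toString(assign) + " " + Arrays.deepToString(centers));
    }
}
```

With a cap of 1 on the same input, the loop stops after the first round with the assignment [0, 1, 1, 1]. Given enough rounds, it converges to [0, 0, 1, 1].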
public java.lang.String cutOffFactorTipText()
public void setCutOffFactor(double i)
Parameters: i - the new cutoff factor
public double getCutOffFactor()
public java.lang.String binValueTipText()
public double getBinValue()
public void setBinValue(double value)
Parameters: value - the distance
public java.lang.String distanceFTipText()
public void setDistanceF(DistanceFunction distanceF)
Parameters: distanceF - the distance function with all options set
public DistanceFunction getDistanceF()
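binValue (-B) has two descriptions on this page. One is the distance between true and false of a binary attribute. The other is the value that represents true in a new numeric attribute. One reading fits both: a binary attribute is recoded as numeric with false = 0 and true = binValue, so two instances that differ only in that attribute are binValue apart under Euclidean distance. This sketch illustrates that reading only. The recoding is an assumption, the names are hypothetical, and the sketch ignores any normalization the configured distance function may apply.

```java
/** Hypothetical sketch: binary attributes recoded as 0 / binValue before a Euclidean distance. */
final class BinValueSketch {

    /** Assumed recoding of a boolean attribute: false -> 0, true -> binValue. */
    static double encode(boolean value, double binValue) {
        return value ? binValue : 0.0;
    }

    /** Plain (unnormalized) Euclidean distance between two numeric vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double binValue = 1.0; // the documented default for -B
        // Same numeric attribute, different binary attribute.
        double[] x = { 3.0, encode(true, binValue) };
        double[] y = { 3.0, encode(false, binValue) };
        System.out.println(euclidean(x, y)); // prints 1.0
    }
}
```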
public java.lang.String debugVectorsFileTipText()
public void setDebugVectorsFile(java.io.File value)
Parameters: value - the file to read the random vectors from
public java.io.File getDebugVectorsFile()
public void initDebugVectorsInput() throws java.lang.Exception
Throws: java.lang.Exception - if there is an error opening the debug input file.
public Instance getNextDebugVectorsInstance(Instances model) throws java.lang.Exception
Parameters: model - the data model for the instance.
Throws: java.lang.Exception - if there are no debug vectors in m_DebugVectors.
public java.lang.String inputCenterFileTipText()
public void setInputCenterFile(java.io.File value)
Parameters: value - the file to read centers from
public java.io.File getInputCenterFile()
public java.lang.String outputCenterFileTipText()
public void setOutputCenterFile(java.io.File value)
Parameters: value - the file to write centers to
public java.io.File getOutputCenterFile()
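The input and output center files (-N and -O) use ARFF format. As a rough illustration of what such a file contains, the following sketch writes a set of numeric centers as an ARFF document. The relation and attribute names are hypothetical, and the exact header that XMeans writes may differ.

```java
import java.util.StringJoiner;

/** Hypothetical sketch: serialize numeric cluster centers as an ARFF document. */
final class ArffCentersSketch {

    /** Builds an ARFF string with one numeric attribute per name and one data row per center. */
    static String toArff(String relation, String[] attributeNames, double[][] centers) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] center : centers) {
            // Each center becomes one comma-separated data row.
            StringJoiner row = new StringJoiner(",");
            for (double v : center) row.add(Double.toString(v));
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String arff = toArff("centers", new String[] {"x", "y"},
                new double[][] { {0.5, 1.0}, {9.5, -2.0} });
        System.out.print(arff);
    }
}
```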
public java.lang.String KDTreeTipText()
public void setKDTree(KDTree k)
Parameters: k - a KDTree object with all options set
public KDTree getKDTree()
public java.lang.String useKDTreeTipText()
public void setUseKDTree(boolean value)
Parameters: value - if true, the KDTree is used
public boolean getUseKDTree()
public java.lang.String debugLevelTipText()
public void setDebugLevel(int d)
Parameters: d - the debug level
public int getDebugLevel()
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions
in interface OptionHandler
setOptions
in class RandomizableClusterer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class RandomizableClusterer
public java.lang.String toString()
toString
in class java.lang.Object
public Instances getClusterCenters()
public java.lang.String getRevision()
getRevision
in interface RevisionHandler
getRevision
in class AbstractClusterer
public static void main(java.lang.String[] argv)
Parameters: argv - should contain the options