public class BIRCHCluster extends ClusterGenerator implements TechnicalInformationHandler
 @inproceedings{Zhang1996,
    author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
    booktitle = {ACM SIGMOD International Conference on Management of Data},
    pages = {103-114},
    publisher = {ACM Press},
    title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
    year = {1996}
 }

 Valid options are:
 
 -h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 10).
-c Class Flag, if set, the cluster is listed in extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-k <num> The number of clusters (default 4)
-G Set pattern to grid (default is random). This flag cannot be used at the same time as flag I. The pattern is random if neither flag G nor flag I is set.
-I Set pattern to sine (default is random). This flag cannot be used at the same time as flag G. The pattern is random if neither flag G nor flag I is set.
-N <num>..<num> The range of number of instances per cluster (default 1..50). Lower number must be between 0 and 2500, upper number must be between 50 and 2500.
-R <num>..<num> The range of radius per cluster (default 0.1..1.4142135623730951). Lower number must be between 0 and SQRT(2), upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Sets the input order to ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
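
The numeric limits listed for -N, -R and -P, and the rule that -G and -I exclude each other, can be checked before the options are handed to the generator. The following is a minimal standalone sketch, not Weka code: the class `BirchOptionRanges` and all of its methods are hypothetical names introduced for illustration, and it assumes that the documented bounds are inclusive and that the lower bound of a range may not exceed the upper bound.

```java
// Hypothetical helper, not part of Weka: checks option values against the
// limits stated in the option list above.
final class BirchOptionRanges {
    static final double SQRT_2 = Math.sqrt(2.0);
    static final double SQRT_32 = Math.sqrt(32.0);

    /** Parses a "<num>..<num>" string, as used by -N and -R, into {lower, upper}. */
    static double[] parseRange(String text) {
        int sep = text.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected <num>..<num>, got: " + text);
        }
        double lower = Double.parseDouble(text.substring(0, sep));
        double upper = Double.parseDouble(text.substring(sep + 2));
        return new double[] {lower, upper};
    }

    /** -N: lower in [0, 2500], upper in [50, 2500]; lower <= upper is an assumption. */
    static boolean validInstanceRange(double lower, double upper) {
        return lower >= 0 && lower <= 2500
            && upper >= 50 && upper <= 2500
            && lower <= upper;
    }

    /** -R: lower in [0, SQRT(2)], upper in [SQRT(2), SQRT(32)]. */
    static boolean validRadiusRange(double lower, double upper) {
        return lower >= 0 && lower <= SQRT_2
            && upper >= SQRT_2 && upper <= SQRT_32;
    }

    /** -P: noise rate in percent, between 0 and 30. */
    static boolean validNoiseRate(double percent) {
        return percent >= 0.0 && percent <= 30.0;
    }

    /** -G and -I cannot be set at the same time. */
    static boolean validPatternFlags(boolean grid, boolean sine) {
        return !(grid && sine);
    }

    public static void main(String[] args) {
        double[] n = parseRange("1..50");                    // documented default for -N
        double[] r = parseRange("0.1..1.4142135623730951");  // documented default for -R
        System.out.println("-N valid: " + validInstanceRange(n[0], n[1]));
        System.out.println("-R valid: " + validRadiusRange(r[0], r[1]));
        System.out.println("-P valid: " + validNoiseRate(0.0));
        System.out.println("-G with -I valid: " + validPatternFlags(true, true));
    }
}
```

The generator performs its own validation when options are set; a check like this only helps to reject bad input earlier, for example in a script that assembles many configurations.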
| Modifier and Type | Field | Description |
|---|---|---|
| static int | GRID | Constant set for choice of pattern. |
| static int | ORDERED | Constant set for input order (option O). |
| static int | RANDOM | Constant set for choice of pattern. |
| static int | RANDOMIZED | Constant set for input order (default). |
| static int | SINE | Constant set for choice of pattern. |
| static Tag[] | TAGS_INPUTORDER | The input order tags. |
| static Tag[] | TAGS_PATTERN | The pattern tags. |
| Constructor | Description |
|---|---|
| BIRCHCluster() | Initializes the generator with default values. |
| Modifier and Type | Method | Description |
|---|---|---|
| Instances | defineDataFormat() | Initializes the format for the dataset produced. |
| java.lang.String | distMultTipText() | Returns the tip text for this property. |
| Instance | generateExample() | Generates an example of the dataset. |
| Instances | generateExamples() | Generates all examples of the dataset. |
| Instances | generateExamples(java.util.Random random, Instances format) | Generates all examples of the dataset. |
| java.lang.String | generateFinished() | Compiles documentation about the data generation after the generation process. |
| java.lang.String | generateStart() | Compiles documentation about the data generation before the generation process. |
| double | getDistMult() | Gets the distance multiplier. |
| SelectedTag | getInputOrder() | Gets the input order. |
| int | getMaxInstNum() | Gets the upper boundary for instances per cluster. |
| double | getMaxRadius() | Gets the upper boundary for the radii of the clusters. |
| int | getMinInstNum() | Gets the lower boundary for instances per cluster. |
| double | getMinRadius() | Gets the lower boundary for the radii of the clusters. |
| double | getNoiseRate() | Gets the percentage of noise set. |
| int | getNumClusters() | Gets the number of clusters the dataset should have. |
| int | getNumCycles() | Gets the number of cycles. |
| java.lang.String[] | getOptions() | Gets the current settings of the data generator BIRCHCluster. |
| boolean | getOrderedFlag() | Gets the ordered flag (option O). |
| SelectedTag | getPattern() | Gets the pattern type. |
| java.lang.String | getRevision() | Returns the revision string. |
| boolean | getSingleModeFlag() | Gets the single mode flag. |
| TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper reference or book this class is based on. |
| java.lang.String | globalInfo() | Returns a string describing this data generator. |
| java.lang.String | inputOrderTipText() | Returns the tip text for this property. |
| java.util.Enumeration | listOptions() | Returns an enumeration describing the available options. |
| static void | main(java.lang.String[] args) | Main method for testing this class. |
| java.lang.String | maxInstNumTipText() | Returns the tip text for this property. |
| java.lang.String | maxRadiusTipText() | Returns the tip text for this property. |
| java.lang.String | minInstNumTipText() | Returns the tip text for this property. |
| java.lang.String | minRadiusTipText() | Returns the tip text for this property. |
| java.lang.String | noiseRateTipText() | Returns the tip text for this property. |
| java.lang.String | numClustersTipText() | Returns the tip text for this property. |
| java.lang.String | numCyclesTipText() | Returns the tip text for this property. |
| java.lang.String | patternTipText() | Returns the tip text for this property. |
| void | setDistMult(double newDistMult) | Sets the distance multiplier. |
| void | setInputOrder(SelectedTag value) | Sets the input order. |
| void | setMaxInstNum(int newMaxInstNum) | Sets the upper boundary for instances per cluster. |
| void | setMaxRadius(double newMaxRadius) | Sets the upper boundary for the radii of the clusters. |
| void | setMinInstNum(int newMinInstNum) | Sets the lower boundary for instances per cluster. |
| void | setMinRadius(double newMinRadius) | Sets the lower boundary for the radii of the clusters. |
| void | setNoiseRate(double newNoiseRate) | Sets the percentage of noise set. |
| void | setNumClusters(int numClusters) | Sets the number of clusters the dataset should have. |
| void | setNumCycles(int newNumCycles) | Sets the number of cycles. |
| void | setOptions(java.lang.String[] options) | Parses a list of options for this object. |
| void | setPattern(SelectedTag value) | Sets the pattern type. |
Methods inherited from class ClusterGenerator: booleanColsTipText, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, numAttributesTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices, setNumAttributes
Methods inherited from class DataGenerator: debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
public static final int GRID
public static final int SINE
public static final int RANDOM
public static final Tag[] TAGS_PATTERN
public static final int ORDERED
public static final int RANDOMIZED
public static final Tag[] TAGS_INPUTORDER
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
    getTechnicalInformation in interface TechnicalInformationHandler
public java.util.Enumeration listOptions()
    listOptions in interface OptionHandler
    listOptions in class ClusterGenerator
public void setOptions(java.lang.String[] options) throws java.lang.Exception
    setOptions in interface OptionHandler
    setOptions in class ClusterGenerator
    Parameters: options - the list of options as an array of strings
    Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
    getOptions in interface OptionHandler
    getOptions in class ClusterGenerator
    See also: DataGenerator.removeBlacklist(String[])
public void setNumClusters(int numClusters)
    Parameters: numClusters - the new number of clusters
public int getNumClusters()
public java.lang.String numClustersTipText()
public int getMinInstNum()
public void setMinInstNum(int newMinInstNum)
    Parameters: newMinInstNum - new lower boundary for instances per cluster
public java.lang.String minInstNumTipText()
public int getMaxInstNum()
public void setMaxInstNum(int newMaxInstNum)
    Parameters: newMaxInstNum - new upper boundary for instances per cluster
public java.lang.String maxInstNumTipText()
public double getMinRadius()
public void setMinRadius(double newMinRadius)
    Parameters: newMinRadius - new lower boundary for the radii of the clusters
public java.lang.String minRadiusTipText()
public double getMaxRadius()
public void setMaxRadius(double newMaxRadius)
    Parameters: newMaxRadius - new upper boundary for the radii of the clusters
public java.lang.String maxRadiusTipText()
public SelectedTag getPattern()
public void setPattern(SelectedTag value)
    Parameters: value - new pattern type
public java.lang.String patternTipText()
public double getDistMult()
public void setDistMult(double newDistMult)
    Parameters: newDistMult - new distance multiplier
public java.lang.String distMultTipText()
public int getNumCycles()
public void setNumCycles(int newNumCycles)
    Parameters: newNumCycles - new number of cycles
public java.lang.String numCyclesTipText()
public SelectedTag getInputOrder()
public void setInputOrder(SelectedTag value)
    Parameters: value - new input order
public java.lang.String inputOrderTipText()
public boolean getOrderedFlag()
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
    Parameters: newNoiseRate - new percentage of noise
public java.lang.String noiseRateTipText()
public boolean getSingleModeFlag()
    getSingleModeFlag in class DataGenerator
public Instances defineDataFormat() throws java.lang.Exception
    defineDataFormat in class DataGenerator
    Throws: java.lang.Exception - data format could not be defined
    See also: DataGenerator.defaultRelationName()
public Instance generateExample() throws java.lang.Exception
    generateExample in class DataGenerator
    Throws: java.lang.Exception - if format not defined or generating
public Instances generateExamples() throws java.lang.Exception
    generateExamples in class DataGenerator
    Throws: java.lang.Exception - if format not defined
public Instances generateExamples(java.util.Random random, Instances format) throws java.lang.Exception
    Parameters: random - the random number generator to use
    Parameters: format - the dataset format
    Throws: java.lang.Exception - if format not defined
public java.lang.String generateFinished() throws java.lang.Exception
    generateFinished in class DataGenerator
    Throws: java.lang.Exception - no input structure has been defined
public java.lang.String generateStart()
    generateStart in class DataGenerator
public java.lang.String getRevision()
    getRevision in interface RevisionHandler
public static void main(java.lang.String[] args)
    Parameters: args - should contain arguments for the data producer: