public class SubspaceCluster extends ClusterGenerator
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
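When the generator is configured from Java code rather than from a shell, the options above are passed as a string array (for example to setOptions). The following is a minimal sketch of assembling such an array. OptionArraySketch and buildOptions are hypothetical helpers, not part of Weka; the cluster definition string is an illustrative value composed from the documented flags; and repeating -C once per cluster is an assumption. The point it illustrates is that each cluster definition occupies exactly one array element, which is what quoting achieves on the command line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka) that assembles generator options. */
final class OptionArraySketch {

    /**
     * Builds an option array from the documented flags -a, -P, -c and -C.
     * Assumption: one -C flag is emitted per cluster definition.
     */
    static String[] buildOptions(int numAttributes, double noisePercent,
                                 boolean classFlag, List<String> clusterDefs) {
        List<String> opts = new ArrayList<>();
        opts.add("-a");
        opts.add(Integer.toString(numAttributes));
        opts.add("-P");
        opts.add(Double.toString(noisePercent));
        if (classFlag) {
            opts.add("-c");
        }
        for (String def : clusterDefs) {
            opts.add("-C");
            opts.add(def); // the whole definition stays in ONE element
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Illustrative definition: uniform cluster on attribute 1.
        String[] options = buildOptions(2, 5.0, true,
                List.of("-U 1 -D 0,10 -N 10..20"));
        for (String o : options) {
            System.out.println("[" + o + "]");
        }
    }
}
```

An array built this way could then be handed to setOptions(java.lang.String[]) on a SubspaceCluster instance; that call is left out here so the sketch compiles without the Weka library.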
Modifier and Type | Field | Description |
---|---|---|
static int | CONTINUOUS | cluster subtype: continuous |
static int | GAUSSIAN | cluster type: gaussian |
static int | INTEGER | cluster subtype: integer |
static Tag[] | TAGS_CLUSTERSUBTYPE | the tags for the cluster subtypes |
static Tag[] | TAGS_CLUSTERTYPE | the tags for the cluster types |
static int | TOTAL_UNIFORM | cluster type: total uniform |
static int | UNIFORM_RANDOM | cluster type: uniform/random |
Constructor | Description |
---|---|
SubspaceCluster() | Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly. |
Modifier and Type | Method | Description |
---|---|---|
java.lang.String | clusterDefinitionsTipText() | Returns the tip text for this property. |
Instances | defineDataFormat() | Initializes the format for the dataset produced. |
Instance | generateExample() | Generates an example of the dataset. |
Instances | generateExamples() | Generates all examples of the dataset. |
java.lang.String | generateFinished() | Compiles documentation about the data generation after the generation process. |
java.lang.String | generateStart() | Compiles documentation about the data generation before the generation process. |
ClusterDefinition[] | getClusterDefinitions() | Returns the currently set clusters. |
double | getNoiseRate() | Gets the percentage of noise. |
int[] | getNumValues() | Returns the array that stores the number of values for a nominal attribute. |
java.lang.String[] | getOptions() | Gets the current settings of the data generator. |
java.lang.String | getRevision() | Returns the revision string. |
boolean | getSingleModeFlag() | Gets the single mode flag. |
java.lang.String | globalInfo() | Returns a string describing this data generator. |
boolean | isBoolean(int index) | Returns true if the attribute is boolean. |
boolean | isNominal(int index) | Returns true if the attribute is nominal. |
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options. |
static void | main(java.lang.String[] args) | Main method for testing this class. |
java.lang.String | noiseRateTipText() | Returns the tip text for this property. |
java.lang.String | numAttributesTipText() | Returns the tip text for this property. |
void | setClusterDefinitions(ClusterDefinition[] value) | Sets the clusters to use. |
void | setNoiseRate(double newNoiseRate) | Sets the percentage of noise. |
void | setNumAttributes(int numAttributes) | Sets the number of attributes the dataset should have. |
void | setOptions(java.lang.String[] options) | Parses a list of options for this object. |
Methods inherited from class ClusterGenerator: booleanColsTipText, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices
Methods inherited from class DataGenerator: debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
public static final int UNIFORM_RANDOM
public static final int TOTAL_UNIFORM
public static final int GAUSSIAN
public static final Tag[] TAGS_CLUSTERTYPE
public static final int CONTINUOUS
public static final int INTEGER
public static final Tag[] TAGS_CLUSTERSUBTYPE
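Judging by their descriptions, these constants pair up with the cluster-definition options: -A with the uniform/random type, -U with the total uniform type, -G with the gaussian type, and -I with the integer subtype (continuous being the default). The sketch below restates that pairing with hypothetical enums. ClusterType, ClusterSubtype and their methods are illustrative names, not Weka API; the real class uses int constants and Tag arrays, whose numeric values are not given here.

```java
/** Hypothetical enum mirroring the documented cluster type constants (not Weka API). */
enum ClusterType {
    UNIFORM_RANDOM("-A"), // randomly distributed instances
    TOTAL_UNIFORM("-U"),  // uniformly distributed instances
    GAUSSIAN("-G");       // gaussian distributed instances

    final String flag;

    ClusterType(String flag) {
        this.flag = flag;
    }

    /** Looks up the type for an option flag; the pairing is inferred, not confirmed. */
    static ClusterType fromFlag(String flag) {
        for (ClusterType t : values()) {
            if (t.flag.equals(flag)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown cluster type flag: " + flag);
    }
}

/** Hypothetical enum mirroring the documented cluster subtype constants. */
enum ClusterSubtype {
    CONTINUOUS, INTEGER;

    /** -I selects integer values; without it the documented default is continuous. */
    static ClusterSubtype fromIntegerFlag(boolean integerFlagSet) {
        return integerFlagSet ? INTEGER : CONTINUOUS;
    }
}

final class ClusterTypeSketch {
    public static void main(String[] args) {
        System.out.println(ClusterType.fromFlag("-G"));
        System.out.println(ClusterSubtype.fromIntegerFlag(false));
    }
}
```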
public SubspaceCluster()
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class ClusterGenerator
public void setOptions(java.lang.String[] options) throws java.lang.Exception
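Because a cluster definition is passed as one quoted argument to -C, it has to be split into its own sub-options before the -A/-U/-G, -D, -N and -I values can be read. The following is a rough sketch of that step for the simple case where no sub-option contains spaces or nested quotes. ClusterDefinitionSketch and its fields are hypothetical, the attribute range is kept as plain text rather than interpreted, and Weka's own parsing may differ.

```java
import java.util.Arrays;

/** Hypothetical holder for one parsed cluster definition (not Weka API). */
final class ClusterDefinitionSketch {
    String typeFlag;                 // "-A", "-U" or "-G"
    String attributeRange;           // range text, left uninterpreted
    double[] params = new double[0]; // -D values: min/max or mean/stddev
    int minInstances = 1;            // documented default range is 1..50
    int maxInstances = 50;
    boolean integerValues = false;   // -I not set: continuous values

    /** Returns the value following a flag, or fails if the flag was last. */
    private static String valueAt(String[] tok, int i) {
        if (i >= tok.length) {
            throw new IllegalArgumentException("Missing value after " + tok[i - 1]);
        }
        return tok[i];
    }

    /** Splits on whitespace only; quoted sub-options are not supported here. */
    static ClusterDefinitionSketch parse(String definition) {
        ClusterDefinitionSketch d = new ClusterDefinitionSketch();
        String[] tok = definition.trim().split("\\s+");
        for (int i = 0; i < tok.length; i++) {
            switch (tok[i]) {
                case "-A":
                case "-U":
                case "-G":
                    d.typeFlag = tok[i];
                    d.attributeRange = valueAt(tok, ++i);
                    break;
                case "-D":
                    d.params = Arrays.stream(valueAt(tok, ++i).split(","))
                            .mapToDouble(Double::parseDouble)
                            .toArray();
                    break;
                case "-N": {
                    // Expected form: <num>..<num>
                    String[] bounds = valueAt(tok, ++i).split("\\.\\.");
                    d.minInstances = Integer.parseInt(bounds[0]);
                    d.maxInstances = Integer.parseInt(bounds[1]);
                    break;
                }
                case "-I":
                    d.integerValues = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + tok[i]);
            }
        }
        return d;
    }
}
```

For example, parsing the illustrative string "-G 1 -D 0.5,1.5 -N 10..20 -I" with this sketch yields a gaussian definition on attribute range "1" with two parameters, an instance range of 10 to 20, and integer values enabled.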
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class ClusterGenerator
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class ClusterGenerator
See Also: DataGenerator.removeBlacklist(String[])
public void setNumAttributes(int numAttributes)
Overrides: setNumAttributes in class ClusterGenerator
Parameters: numAttributes - the new number of attributes
public java.lang.String numAttributesTipText()
Overrides: numAttributesTipText in class ClusterGenerator
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
Parameters: newNoiseRate - new percentage of noise
public java.lang.String noiseRateTipText()
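The option list states that the noise rate is a percentage between 0 and 30. A minimal sketch of a range check that a caller might apply before passing a value to setNoiseRate follows. NoiseRateSketch and its methods are hypothetical; whether the generator itself rejects or adjusts out-of-range values is not stated here.

```java
/** Hypothetical caller-side check for the documented noise range (not Weka API). */
final class NoiseRateSketch {
    static final double MIN_PERCENT = 0.0;
    static final double MAX_PERCENT = 30.0; // documented upper bound

    /** True if the value lies in the documented 0..30 percent range (NaN is rejected). */
    static boolean isValid(double percent) {
        return percent >= MIN_PERCENT && percent <= MAX_PERCENT;
    }

    /** Returns the value unchanged, or throws if it is outside the range. */
    static double requireValid(double percent) {
        if (!isValid(percent)) {
            throw new IllegalArgumentException(
                    "Noise rate must be between 0 and 30 percent, got " + percent);
        }
        return percent;
    }

    /** Converts a valid percentage into a fraction, e.g. for scaling a count. */
    static double asFraction(double percent) {
        return requireValid(percent) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(10.0));
        System.out.println(isValid(45.0));
    }
}
```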
public ClusterDefinition[] getClusterDefinitions()
public void setClusterDefinitions(ClusterDefinition[] value) throws java.lang.Exception
Parameters: value - the clusters to use
Throws: java.lang.Exception - if clusters are not the correct class
public java.lang.String clusterDefinitionsTipText()
public boolean getSingleModeFlag()
Specified by: getSingleModeFlag in class DataGenerator
public Instances defineDataFormat() throws java.lang.Exception
Overrides: defineDataFormat in class DataGenerator
Throws: java.lang.Exception - data format could not be defined
See Also: DataGenerator.defaultRelationName()
public boolean isBoolean(int index)
Parameters: index - the index of the attribute
public boolean isNominal(int index)
Parameters: index - the index of the attribute
public int[] getNumValues()
public Instance generateExample() throws java.lang.Exception
Specified by: generateExample in class DataGenerator
Throws: java.lang.Exception - if format not defined or generating
public Instances generateExamples() throws java.lang.Exception
Specified by: generateExamples in class DataGenerator
Throws: java.lang.Exception - if format not defined
public java.lang.String generateFinished() throws java.lang.Exception
Specified by: generateFinished in class DataGenerator
Throws: java.lang.Exception - no input structure has been defined
public java.lang.String generateStart()
Specified by: generateStart in class DataGenerator
public java.lang.String getRevision()
public static void main(java.lang.String[] args)
Parameters: args - should contain arguments for the data producer: