public class WekaClassifierMapTask
extends java.lang.Object
implements weka.core.OptionHandler, weka.core.EnvironmentHandler, java.io.Serializable
Classifiers may be trained on all the incoming data or on a particular cross-validation fold (this functionality is used directly by the evaluation map and reduce tasks). In the case of batch classifiers, the data for the map is stratified (if the class is nominal) and randomized before the fold to train on is extracted. In the case of incremental classifiers, a modulus operation is used to pull the instances belonging to the selected fold out of the incoming instance stream.
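The modulus-based fold selection for incremental classifiers can be sketched as follows. This is a minimal illustration, not the class's actual code: the class and method names are hypothetical, fold numbers are assumed to be 1-based, and it is assumed that an instance whose zero-based stream position modulo the total number of folds equals `foldNumber - 1` is held out of training.

```java
import java.util.ArrayList;
import java.util.List;

final class ModulusFoldSketch {

    /**
     * Decide whether the instance at the given zero-based stream position
     * is used for training when training on the given fold.
     * Assumption: fold numbers are 1-based, and position % totalFolds
     * selects the fold whose instances are held out.
     */
    static boolean isTrainingInstance(long position, int foldNumber, int totalFolds) {
        if (totalFolds <= 1 || foldNumber < 1) {
            return true; // no cross-validation requested: train on everything
        }
        return position % totalFolds != (foldNumber - 1);
    }

    /** Collect the stream positions that would be used for training. */
    static List<Integer> trainingPositions(int streamLength, int foldNumber, int totalFolds) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < streamLength; i++) {
            if (isTrainingInstance(i, foldNumber, totalFolds)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fold 1 of 5 over a stream of 10 instances: positions 0 and 5 are held out.
        System.out.println(trainingPositions(10, 1, 5)); // prints [1, 2, 3, 4, 6, 7, 8, 9]
    }
}
```

Because the decision depends only on a running counter, no instances need to be buffered, which is what makes this approach suitable for a stream.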
Classifiers can optionally have their training data passed through one or more filters as a pre-processing step. The class determines how to wrap the base classifier and filters based on the nature of the filters specified and on whether the classifier is batch or incremental and whether it is Aggregateable. Aggregateable classifiers (batch or incremental) can only be aggregated to one final model if any filters used with them are all StreamableFilters (i.e. they can determine their output structure immediately without having to see any instances).
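The aggregation rule above amounts to a simple check over the configured filters. The sketch below uses hypothetical stand-in types (`Filter`, `StreamableFilter`, `ExampleStreamable`, `ExampleBatchOnly`) rather than the real Weka classes, and only illustrates the condition as described; it does not show how the class actually wraps the classifier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AggregationCheckSketch {

    /** Stand-in for a generic filter (hypothetical, not the Weka class). */
    interface Filter { }

    /** Stand-in marker for filters that know their output format up front. */
    interface StreamableFilter extends Filter { }

    /** Hypothetical example filters used only for this illustration. */
    static final class ExampleStreamable implements StreamableFilter { }
    static final class ExampleBatchOnly implements Filter { }

    /**
     * A single aggregated model is possible only if the classifier is
     * aggregateable and every filter (if any) is streamable.
     */
    static boolean canAggregateToSingleModel(boolean classifierIsAggregateable,
                                             List<? extends Filter> filters) {
        if (!classifierIsAggregateable) {
            return false;
        }
        for (Filter f : filters) {
            if (!(f instanceof StreamableFilter)) {
                return false;
            }
        }
        return true; // also true when no filters are configured
    }

    public static void main(String[] args) {
        List<Filter> mixed = Arrays.asList(new ExampleStreamable(), new ExampleBatchOnly());
        System.out.println(canAggregateToSingleModel(true, mixed)); // prints false
        System.out.println(canAggregateToSingleModel(true, Collections.<Filter>emptyList())); // prints true
    }
}
```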
It is also possible to specify a special "preconstructed" filter to use in conjunction with, or instead of, regular filters. At present, the distributed system implements just one preconstructed filter, PreConstructedPCA, which can produce a "trained" PCA filter from a correlation matrix computed by the CorrelationMatrixMap/Reduce tasks.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | TOTAL_NUMBER_OF_MAPS: If this property is set then we can adjust the total number of requested iterations for IteratedSingleClassifierEnhancers according to the number of maps that are going to run. |

Constructor and Description |
---|
WekaClassifierMapTask() |

Modifier and Type | Method and Description |
---|---|
void | addPreconstructedFilterToUse(PreconstructedFilter f): Add a Preconstructed filter (such as PreConstructedPCA) to use with the classifier. |
void | addToTrainingHeader(weka.core.Instance toAdd): Add the supplied instance to the training header |
void | addToTrainingHeader(weka.core.Instances toAdd): Add the supplied instances to the training header |
java.lang.String | classifierTipText(): The tool tip text for this property. |
java.lang.String | continueTrainingUpdateableClassifierTipText(): The tool tip text for this property. |
java.lang.String | filtersToUseTipText(): The tool tip text for this property. |
void | finalizeTask(): Finish up the map task. |
java.lang.String | foldNumberTipText(): The tool tip text for this property. |
java.lang.String | forceBatchLearningForUpdateableClassifiersTipText(): The tool tip text for this property. |
java.lang.String | forceVotedEnsembleCreation(): The tool tip text for this property. |
weka.classifiers.Classifier | getClassifier(): Get the classifier to use |
boolean | getContinueTrainingUpdateableClassifier(): Get whether to continue training an incremental (updateable) classifier. |
weka.filters.Filter[] | getFiltersToUse(): Get the filters to wrap up with the base classifier |
int | getFoldNumber(): Get the fold number to train the classifier with. |
boolean | getForceBatchLearningForUpdateableClassifiers(): Get whether to force batch training for incremental (Updateable) classifiers |
boolean | getForceVotedEnsembleCreation(): Get whether to force the creation of a Vote ensemble for Aggregateable classifiers |
int | getNumTrainingInstances(): Get the number of training instances actually used to train the classifier. |
java.lang.String[] | getOptions() |
int | getReservoirSampleSize(): Get the sample size for reservoir sampling |
java.lang.String | getSeed(): Get the seed for randomizing the data when batch learning and for reservoir sampling. |
int | getTotalNumFolds(): Get the total number of folds to use. |
boolean | getUseReservoirSamplingWhenBatchLearning(): Get whether to use reservoir sampling when batch learning |
java.util.Enumeration<weka.core.Option> | listOptions() |
static void | main(java.lang.String[] args) |
void | processInstance(weka.core.Instance inst): Process the supplied instance. |
java.lang.String | reservoirSampleSizeTipText(): The tool tip text for this property. |
java.lang.String | seedTipText(): The tool tip text for this property. |
void | setClassifier(weka.classifiers.Classifier classifier): Set the classifier to use |
void | setContinueTrainingUpdateableClassifier(boolean u): Set whether to continue training an incremental (updateable) classifier. |
void | setEnvironment(weka.core.Environment env) |
void | setFiltersToUse(weka.filters.Filter[] toUse): Set the filters to wrap up with the base classifier |
void | setFoldNumber(int fn): Set the fold number to train the classifier with. |
void | setForceBatchLearningForUpdateableClassifiers(boolean force): Set whether to force batch training for incremental (Updateable) classifiers |
void | setForceVotedEnsembleCreation(boolean f): Set whether to force the creation of a Vote ensemble for Aggregateable classifiers |
void | setOptions(java.lang.String[] options) |
void | setReservoirSampleSize(int size): Set the sample size for reservoir sampling |
void | setSeed(java.lang.String seed): Set the seed for randomizing the data when batch learning and for reservoir sampling. |
void | setTotalNumFolds(int t): Set the total number of folds to use. |
void | setup(weka.core.Instances trainingHeader): Initialize the map task |
void | setUseReservoirSamplingWhenBatchLearning(boolean r): Set whether to use reservoir sampling when batch learning |
java.lang.String | totalNumberOfFoldsTipText(): The tool tip text for this property. |
java.lang.String | useReservoirSamplingWhenBatchLearningTipText(): The tool tip text for this property. |
public static final java.lang.String TOTAL_NUMBER_OF_MAPS
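The field description says only that the requested iteration count is adjusted according to the number of maps, not how. One plausible reading, sketched below, is an even split across maps with at least one iteration each. The helper name and the rounding rule are assumptions for illustration and may differ from what the class actually does.

```java
final class IterationsPerMapSketch {

    /**
     * Hypothetical helper: share a requested number of ensemble iterations
     * between map tasks. Assumes integer division with a floor of one
     * iteration per map.
     */
    static int iterationsPerMap(int requestedIterations, int totalMaps) {
        if (totalMaps <= 1) {
            return requestedIterations; // a single map does all the work
        }
        return Math.max(1, requestedIterations / totalMaps);
    }

    public static void main(String[] args) {
        System.out.println(iterationsPerMap(100, 8)); // prints 12
        System.out.println(iterationsPerMap(3, 10));  // prints 1
    }
}
```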
public static void main(java.lang.String[] args)
public java.util.Enumeration<weka.core.Option> listOptions()
Specified by:
listOptions in interface weka.core.OptionHandler
public java.lang.String[] getOptions()
Specified by:
getOptions in interface weka.core.OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by:
setOptions in interface weka.core.OptionHandler
Throws:
java.lang.Exception
public weka.classifiers.Classifier getClassifier()
public void setClassifier(weka.classifiers.Classifier classifier)
Parameters:
classifier - the classifier to use
public java.lang.String classifierTipText()
public java.lang.String getSeed()
public void setSeed(java.lang.String seed)
Parameters:
seed - the seed to use
public java.lang.String seedTipText()
public boolean getForceVotedEnsembleCreation()
public void setForceVotedEnsembleCreation(boolean f)
Parameters:
f - true if a Vote ensemble is to be created even in the case where the base classifier is directly aggregateable
public java.lang.String forceVotedEnsembleCreation()
public weka.filters.Filter[] getFiltersToUse()
public void setFiltersToUse(weka.filters.Filter[] toUse)
Parameters:
toUse - filters to wrap up with the base classifier
public java.lang.String filtersToUseTipText()
public void addPreconstructedFilterToUse(PreconstructedFilter f)
Parameters:
f - the Preconstructed filter to use.
public boolean getUseReservoirSamplingWhenBatchLearning()
public void setUseReservoirSamplingWhenBatchLearning(boolean r)
Parameters:
r - true if reservoir sampling is to be used
public java.lang.String useReservoirSamplingWhenBatchLearningTipText()
public int getReservoirSampleSize()
public void setReservoirSampleSize(int size)
Parameters:
size - the sample size to use for reservoir sampling
public java.lang.String reservoirSampleSizeTipText()
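For readers unfamiliar with the technique behind the reservoir sampling options, the sketch below shows the textbook algorithm (Algorithm R), which keeps a fixed-size uniform sample of a stream of unknown length. It is a generic illustration with hypothetical names, not the implementation this class uses; note also that it takes a `long` seed, whereas `setSeed` here accepts a `String`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ReservoirSketch<T> {
    private final List<T> reservoir;
    private final int capacity;
    private final Random random;
    private long seen; // number of items offered so far

    ReservoirSketch(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
        this.random = new Random(seed);
    }

    /** Offer one item from the stream to the reservoir. */
    void add(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(item); // fill phase: keep everything
        } else {
            // Pick a slot in [0, seen); the item is kept only if the slot
            // falls inside the reservoir, i.e. with probability capacity / seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < capacity) {
                reservoir.set((int) slot, item);
            }
        }
    }

    /** Return a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    long seen() {
        return seen;
    }

    public static void main(String[] args) {
        ReservoirSketch<Integer> sketch = new ReservoirSketch<>(5, 1L);
        for (int i = 0; i < 100; i++) {
            sketch.add(i);
        }
        System.out.println(sketch.sample().size()); // prints 5
    }
}
```

Fixing the seed makes the sample reproducible across runs, which is presumably why the same seed option covers both randomization and reservoir sampling.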
public boolean getForceBatchLearningForUpdateableClassifiers()
public void setForceBatchLearningForUpdateableClassifiers(boolean force)
Parameters:
force - true if incremental classifiers should be batch trained
public java.lang.String forceBatchLearningForUpdateableClassifiersTipText()
public boolean getContinueTrainingUpdateableClassifier()
public void setContinueTrainingUpdateableClassifier(boolean u)
Parameters:
u - true to continue training an incremental classifier rather than starting from scratch
public java.lang.String continueTrainingUpdateableClassifierTipText()
public int getFoldNumber()
public void setFoldNumber(int fn)
Parameters:
fn - the fold number to train on.
public java.lang.String foldNumberTipText()
public int getTotalNumFolds()
public void setTotalNumFolds(int t)
Parameters:
t - the total number of folds
public java.lang.String totalNumberOfFoldsTipText()
public int getNumTrainingInstances()
public void addToTrainingHeader(weka.core.Instances toAdd)
Parameters:
toAdd - the instances to add
public void addToTrainingHeader(weka.core.Instance toAdd)
Parameters:
toAdd - the instance to add
public void setup(weka.core.Instances trainingHeader) throws DistributedWekaException
Parameters:
trainingHeader - the header of the incoming training data.
Throws:
DistributedWekaException - if something goes wrong
public void processInstance(weka.core.Instance inst) throws DistributedWekaException
Parameters:
inst - the instance to train with
Throws:
DistributedWekaException - if a problem occurs
public void finalizeTask() throws DistributedWekaException
Throws:
DistributedWekaException - if something goes wrong
public void setEnvironment(weka.core.Environment env)
Specified by:
setEnvironment in interface weka.core.EnvironmentHandler