public class EnsembleSelection
extends weka.classifiers.RandomizableClassifier
implements weka.core.TechnicalInformationHandler
@inproceedings{RichCaruana2004,
   author = {Rich Caruana, Alex Niculescu, Geoff Crew, and Alex Ksikes},
   booktitle = {21st International Conference on Machine Learning},
   title = {Ensemble Selection from Libraries of Models},
   year = {2004}
}

Our implementation of ensemble selection differs from the other classifiers because we assume that the list of models to be trained is too large to fit in memory and that the base classifiers will need to be serialized to the file system (in the directory given by the "workingDirectory" option). In keeping with the original paper, we use the term "model library" for this large set of classifiers.

If you plan to use this classifier, we highly recommend a quick look at our FAQ/tutorial on the WIKI, because a few things unique to this classifier could trip you up. Otherwise, this method is a good way to get strong classifier performance without much parameter tuning. Even in the worst case, you get a summary of how a large number of diverse models performed on your data set.

This class relies on the package weka.classifiers.meta.ensembleSelection. When run from the Explorer or another GUI, the classifier depends on the package weka.gui.libraryEditor.

Valid options are:
-L </path/to/modelLibrary> Specifies the Model Library File, containing the list of all models.
-W </path/to/working/directory> Specifies the Working Directory, where all models will be stored.
-B <numModelBags> Set the number of bags, i.e., number of iterations to run the ensemble selection algorithm.
-E <modelRatio> Set the ratio of library models that will be randomly chosen to populate each bag of models.
-V <validationRatio> Set the ratio of the training data set that will be reserved for validation.
-H <hillClimbIterations> Set the number of hillclimbing iterations to be performed on each model bag.
-I <sortInitialization> Set the ratio of the ensemble library that the sort initialization algorithm will be able to choose from while initializing the ensemble for each model bag.
-X <numFolds> Sets the number of cross-validation folds.
-P <hillclimbMetric> Specify the metric that will be used for model selection during the hillclimbing algorithm. Valid metrics are: accuracy, rmse, roc, precision, recall, fscore, all.
-A <algorithm> Specifies the algorithm to be used for ensemble selection. Valid algorithms are: "forward" (default) for forward selection; "backward" for backward elimination; "both" for both forward and backward elimination; "best" to simply print out the top performer from the ensemble library; "library" to only train the models in the ensemble library.
-R Flag whether or not models can be selected more than once for an ensemble.
-G Whether sort initialization greedily stops adding models when performance degrades.
-O Flag for verbose output. Prints out performance of all selected models.
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
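The flags listed above can be assembled into the string array that setOptions(java.lang.String[]) expects. The following is a minimal sketch in plain Java; the class EnsembleSelectionOptions, the helper buildOptions, the placeholder paths, and the numeric values are hypothetical and chosen only for illustration. They are not part of Weka and are not its defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka) that assembles the option flags documented above. */
final class EnsembleSelectionOptions {

    /** Builds an option array from a handful of the documented flags. */
    static String[] buildOptions(String libraryFile, String workingDir,
                                 int numModelBags, double modelRatio,
                                 double validationRatio, String metric,
                                 String algorithm, boolean replacement) {
        List<String> opts = new ArrayList<>();
        opts.add("-L"); opts.add(libraryFile);                      // model library file
        opts.add("-W"); opts.add(workingDir);                       // working directory
        opts.add("-B"); opts.add(String.valueOf(numModelBags));     // number of model bags
        opts.add("-E"); opts.add(String.valueOf(modelRatio));       // ratio of models per bag
        opts.add("-V"); opts.add(String.valueOf(validationRatio));  // validation ratio
        opts.add("-P"); opts.add(metric);                           // hillclimbing metric
        opts.add("-A"); opts.add(algorithm);                        // selection algorithm
        if (replacement) {
            opts.add("-R");                                         // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Placeholder paths and illustrative values only.
        String[] opts = buildOptions("/path/to/modelLibrary", "/path/to/working/directory",
                10, 0.5, 0.25, "rmse", "forward", true);
        System.out.println(String.join(" ", opts));
    }
}
```

The resulting array would then be handed to setOptions on an EnsembleSelection instance, or the same flags passed on the command line.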
Modifier and Type | Field and Description |
---|---|
static int | ALGORITHM_BACKWARD |
static int | ALGORITHM_BEST |
static int | ALGORITHM_BUILD_LIBRARY |
static int | ALGORITHM_FORWARD: The "enumeration" of the algorithms we can use. |
static int | ALGORITHM_FORWARD_BACKWARD |
static weka.core.Tag[] | TAGS_ALGORITHM: defines algorithms that can be chosen for ensemble selection |
static weka.core.Tag[] | TAGS_METRIC: defines metrics that can be chosen for hillclimbing |
Constructor and Description |
---|
EnsembleSelection() |
Modifier and Type | Method and Description |
---|---|
java.lang.String | algorithmTipText(): Returns the tip text for this property |
void | buildClassifier(weka.core.Instances trainData): buildClassifier selects a classifier from the set of classifiers by minimising error on the training data. |
double[] | distributionForInstance(weka.core.Instance instance): Calculates the class membership probabilities for the given test instance. |
weka.core.SelectedTag | getAlgorithm(): Gets the algorithm |
weka.core.Capabilities | getCapabilities(): We return true for basically everything except for Missing class values, because we can't really answer for all the models in our library. |
static java.lang.String | getDefaultWorkingDirectory(): This method tries to find a reasonable path name for the ensemble working directory where models and files will be stored. |
boolean | getGreedySortInitialization(): Get the value of greedySortInitialization. |
int | getHillclimbIterations(): Gets the number of hillclimbIterations. |
weka.core.SelectedTag | getHillclimbMetric(): Gets the hill climbing metric. |
EnsembleSelectionLibrary | getLibrary(): Gets the ensemble library. |
double | getModelRatio(): Get the value of modelRatio. |
int | getNumFolds(): Gets the number of folds for the cross-validation. |
int | getNumModelBags(): Gets numModelBags. |
java.lang.String[] | getOptions(): Gets the current settings of the Classifier. |
boolean | getReplacement(): Get the value of replacement. |
java.lang.String | getRevision(): Returns the revision string. |
double | getSortInitializationRatio(): Get the value of sortInitializationRatio. |
weka.core.TechnicalInformation | getTechnicalInformation(): Return the technical information. |
double | getValidationRatio(): Get the value of validationRatio. |
boolean | getVerboseOutput(): Get the value of verboseOutput. |
java.io.File | getWorkingDirectory(): Get the value of working directory. |
java.lang.String | globalInfo(): Returns a string describing classifier |
java.lang.String | greedySortInitializationTipText(): Returns the tip text for this property |
java.lang.String | hillclimbIterationsTipText(): Returns the tip text for this property |
java.lang.String | hillclimbMetricTipText(): Returns the tip text for this property |
java.lang.String | libraryTipText(): Returns the tip text for this property |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] argv): Executes the classifier from commandline. |
java.lang.String | modelRatioTipText(): Returns the tip text for this property |
java.lang.String | numFoldsTipText(): Returns the tip text for this property |
java.lang.String | numModelBagsTipText(): Returns the tip text for this property |
java.lang.String | replacementTipText(): Returns the tip text for this property |
void | setAlgorithm(weka.core.SelectedTag newType): Sets the Algorithm to use |
void | setGreedySortInitialization(boolean newGreedySortInitialization): Set the value of greedySortInitialization. |
void | setHillclimbIterations(int n): Sets the number of hillclimbIterations. |
void | setHillclimbMetric(weka.core.SelectedTag newType): Sets the hill climbing metric. |
void | setLibrary(EnsembleSelectionLibrary newLibrary): Sets the ensemble library. |
void | setModelRatio(double v): Set the value of modelRatio. |
void | setNumFolds(int numFolds): Sets the number of folds for the cross-validation. |
void | setNumModelBags(int n): Sets numModelBags. |
void | setOptions(java.lang.String[] options): Sets the options (see the list of valid options in the class description). |
void | setReplacement(boolean newReplacement): Set the value of replacement. |
void | setSortInitializationRatio(double v): Set the value of sortInitializationRatio. |
void | setValidationRatio(double v): Set the value of validationRatio. |
void | setVerboseOutput(boolean newVerboseOutput): Set the value of verboseOutput. |
void | setWorkingDirectory(java.io.File newWorkingDirectory): Set the value of working directory. |
java.lang.String | sortInitializationRatioTipText(): Returns the tip text for this property |
java.lang.String | toString(): Output a representation of this classifier |
java.lang.String | validationRatioTipText(): Returns the tip text for this property |
java.lang.String | verboseOutputTipText(): Returns the tip text for this property |
java.lang.String | workingDirectoryTipText(): Returns the tip text for this property |
Methods inherited from class weka.classifiers.AbstractClassifier: batchSizeTipText, classifyInstance, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, postExecution, preExecution, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
public static final weka.core.Tag[] TAGS_METRIC
public static final int ALGORITHM_FORWARD
public static final int ALGORITHM_BACKWARD
public static final int ALGORITHM_FORWARD_BACKWARD
public static final int ALGORITHM_BEST
public static final int ALGORITHM_BUILD_LIBRARY
public static final weka.core.Tag[] TAGS_ALGORITHM
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
listOptions
in interface weka.core.OptionHandler
listOptions
in class weka.classifiers.RandomizableClassifier
public weka.core.Capabilities getCapabilities()
getCapabilities
in interface weka.classifiers.Classifier
getCapabilities
in interface weka.core.CapabilitiesHandler
getCapabilities
in class weka.classifiers.AbstractClassifier
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-L </path/to/modelLibrary> Specifies the Model Library File, containing the list of all models.
-W </path/to/working/directory> Specifies the Working Directory, where all models will be stored.
-B <numModelBags> Set the number of bags, i.e., number of iterations to run the ensemble selection algorithm.
-E <modelRatio> Set the ratio of library models that will be randomly chosen to populate each bag of models.
-V <validationRatio> Set the ratio of the training data set that will be reserved for validation.
-H <hillClimbIterations> Set the number of hillclimbing iterations to be performed on each model bag.
-I <sortInitialization> Set the ratio of the ensemble library that the sort initialization algorithm will be able to choose from while initializing the ensemble for each model bag.
-X <numFolds> Sets the number of cross-validation folds.
-P <hillclimbMetric> Specify the metric that will be used for model selection during the hillclimbing algorithm. Valid metrics are: accuracy, rmse, roc, precision, recall, fscore, all.
-A <algorithm> Specifies the algorithm to be used for ensemble selection. Valid algorithms are: "forward" (default) for forward selection; "backward" for backward elimination; "both" for both forward and backward elimination; "best" to simply print out the top performer from the ensemble library; "library" to only train the models in the ensemble library.
-R Flag whether or not models can be selected more than once for an ensemble.
-G Whether sort initialization greedily stops adding models when performance degrades.
-O Flag for verbose output. Prints out performance of all selected models.
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
setOptions
in interface weka.core.OptionHandler
setOptions
in class weka.classifiers.RandomizableClassifier
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface weka.core.OptionHandler
getOptions
in class weka.classifiers.RandomizableClassifier
public java.lang.String numFoldsTipText()
public int getNumFolds()
public void setNumFolds(int numFolds) throws java.lang.Exception
numFolds
- the number of folds for the cross-validation
java.lang.Exception
- if the parameter is illegal
public java.lang.String libraryTipText()
public EnsembleSelectionLibrary getLibrary()
public void setLibrary(EnsembleSelectionLibrary newLibrary)
newLibrary
- the ensemble library
public java.lang.String modelRatioTipText()
public double getModelRatio()
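The modelRatio (-E) and numModelBags (-B) options describe how each bag is populated: a random fraction of the library models is drawn for every bag. A minimal sketch of that sampling in plain Java follows. The class ModelBagSketch, the method sampleBags, and the rounding rule (floor, with at least one model per bag) are assumptions made for illustration, not Weka's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; not Weka's implementation. */
final class ModelBagSketch {

    /**
     * Draws numModelBags bags, each holding a random subset of library model indices.
     * Assumption: the bag size is floor(modelRatio * librarySize), but never less than 1.
     */
    static List<List<Integer>> sampleBags(int librarySize, double modelRatio,
                                          int numModelBags, long seed) {
        if (librarySize <= 0 || numModelBags <= 0 || modelRatio <= 0.0 || modelRatio > 1.0) {
            throw new IllegalArgumentException("invalid library size, bag count, or ratio");
        }
        int bagSize = Math.max(1, (int) Math.floor(modelRatio * librarySize));
        Random random = new Random(seed);

        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < librarySize; i++) {
            indices.add(i);
        }

        List<List<Integer>> bags = new ArrayList<>();
        for (int b = 0; b < numModelBags; b++) {
            // Shuffle, then take the first bagSize indices: sampling without
            // replacement within a bag, independently for each bag.
            Collections.shuffle(indices, random);
            bags.add(new ArrayList<>(indices.subList(0, bagSize)));
        }
        return bags;
    }
}
```

For example, sampleBags(10, 0.5, 3, 1L) would yield three bags of five distinct model indices each; which indices are drawn depends on the seed.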
public void setModelRatio(double v)
v
- Value to assign to modelRatio.
public java.lang.String validationRatioTipText()
public double getValidationRatio()
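The validationRatio (-V) option reserves part of the training data for validation. The sketch below shows one way such a hold-out split could be computed in plain Java. The class ValidationSplitSketch, the shuffling step, and the rounding rule are assumptions for illustration; how Weka combines this ratio with the numFolds setting is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; not Weka's implementation. */
final class ValidationSplitSketch {

    /**
     * Shuffles a copy of the data and reserves a fraction of it for validation.
     * Returns a two-element list: [training part, validation part].
     */
    static <T> List<List<T>> split(List<T> data, double validationRatio, long seed) {
        if (validationRatio < 0.0 || validationRatio >= 1.0) {
            throw new IllegalArgumentException("validationRatio must be in [0, 1)");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));

        // Assumption: the validation size is rounded to the nearest integer.
        int validationSize = (int) Math.round(validationRatio * shuffled.size());
        int trainSize = shuffled.size() - validationSize;

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }
}
```

With 20 instances and a ratio of 0.25, this sketch keeps 15 instances for training and 5 for validation.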
public void setValidationRatio(double v)
v
- Value to assign to validationRatio.
public java.lang.String hillclimbMetricTipText()
public weka.core.SelectedTag getHillclimbMetric()
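The hillclimbMetric property chooses the measure that the hillclimbing step optimizes (accuracy, rmse, roc, precision, recall, fscore, or all). As a rough illustration, two of these metrics can be computed from class probability estimates as sketched below. The class HillclimbMetricSketch and its methods are hypothetical, and averaging the squared error over both instances and classes is an assumed convention rather than a statement about Weka's internals.

```java
/** Illustrative sketch only; not Weka's implementation. */
final class HillclimbMetricSketch {

    /** Fraction of instances whose most probable class equals the true label (higher is better). */
    static double accuracy(double[][] probs, int[] labels) {
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            int argmax = 0;
            for (int c = 1; c < probs[i].length; c++) {
                if (probs[i][c] > probs[i][argmax]) {
                    argmax = c;
                }
            }
            if (argmax == labels[i]) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    /**
     * Root mean squared error between the predicted probabilities and the
     * 0/1 indicator of the true class (lower is better).
     */
    static double rmse(double[][] probs, int[] labels) {
        double sumSquared = 0.0;
        long terms = 0;
        for (int i = 0; i < labels.length; i++) {
            for (int c = 0; c < probs[i].length; c++) {
                double target = (c == labels[i]) ? 1.0 : 0.0;
                double diff = probs[i][c] - target;
                sumSquared += diff * diff;
                terms++;
            }
        }
        return Math.sqrt(sumSquared / terms);
    }
}
```

Note the different directions: a hillclimber maximizes accuracy but minimizes rmse, so a selection loop has to account for the sign of the chosen metric.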
public void setHillclimbMetric(weka.core.SelectedTag newType)
newType
- the new hillclimbMetric
public java.lang.String algorithmTipText()
public weka.core.SelectedTag getAlgorithm()
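The "forward" algorithm, together with the hillclimbIterations (-H) and replacement (-R) settings, can be pictured as a greedy loop that repeatedly adds the model that most improves the ensemble on the validation data. The sketch below follows the general idea of forward selection from the cited paper, using accuracy of the averaged predictions as the metric. The class ForwardSelectionSketch, its method names, and the tie-breaking rule are assumptions for illustration; this is not the code Weka runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not Weka's implementation. */
final class ForwardSelectionSketch {

    /**
     * @param predictions predictions[m][i][c] is the probability that model m
     *                    assigns to class c for validation instance i
     * @param labels      true class index of each validation instance
     * @param iterations  maximum number of hillclimbing steps
     * @param replacement whether a model may be added more than once
     * @return indices of the models in the best ensemble seen during the climb
     */
    static List<Integer> select(double[][][] predictions, int[] labels,
                                int iterations, boolean replacement) {
        int numModels = predictions.length;
        int numClasses = predictions[0][0].length;
        double[][] sum = new double[labels.length][numClasses]; // summed member predictions
        boolean[] used = new boolean[numModels];
        List<Integer> ensemble = new ArrayList<>();
        List<Integer> best = new ArrayList<>();
        double bestScore = Double.NEGATIVE_INFINITY;

        for (int step = 0; step < iterations; step++) {
            int chosen = -1;
            double chosenScore = Double.NEGATIVE_INFINITY;
            for (int m = 0; m < numModels; m++) {
                if (!replacement && used[m]) {
                    continue; // each model at most once when replacement is off
                }
                double score = accuracyWith(sum, predictions[m], labels);
                if (score > chosenScore) { // ties keep the lower model index
                    chosenScore = score;
                    chosen = m;
                }
            }
            if (chosen < 0) {
                break; // no candidates left
            }
            for (int i = 0; i < labels.length; i++) {
                for (int c = 0; c < numClasses; c++) {
                    sum[i][c] += predictions[chosen][i][c];
                }
            }
            used[chosen] = true;
            ensemble.add(chosen);
            if (chosenScore > bestScore) { // remember the best prefix of the climb
                bestScore = chosenScore;
                best = new ArrayList<>(ensemble);
            }
        }
        return best;
    }

    /** Accuracy of the ensemble if the candidate were added; the argmax of a sum equals the argmax of the average. */
    static double accuracyWith(double[][] sum, double[][] candidate, int[] labels) {
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            int argmax = 0;
            for (int c = 1; c < sum[i].length; c++) {
                if (sum[i][c] + candidate[i][c] > sum[i][argmax] + candidate[i][argmax]) {
                    argmax = c;
                }
            }
            if (argmax == labels[i]) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }
}
```

Returning the best prefix rather than the final ensemble means extra iterations cannot make the selected ensemble worse on the validation data, although they can still overfit to it.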
public void setAlgorithm(weka.core.SelectedTag newType)
newType
- the new algorithm
public java.lang.String hillclimbIterationsTipText()
public int getHillclimbIterations()
public void setHillclimbIterations(int n) throws java.lang.Exception
n
- the number of hillclimbIterations
java.lang.Exception
- if the parameter is illegal
public java.lang.String numModelBagsTipText()
public int getNumModelBags()
public void setNumModelBags(int n) throws java.lang.Exception
n
- the new value for numModelBags
java.lang.Exception
- if the parameter is illegal
public java.lang.String sortInitializationRatioTipText()
public double getSortInitializationRatio()
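The sortInitializationRatio (-I) and greedySortInitialization (-G) settings control how each bag's ensemble is seeded before hillclimbing: the best individual models are considered first, and the greedy variant stops as soon as adding a model degrades performance. A hedged sketch of that idea in plain Java is shown below. The class SortInitializationSketch, the floor rounding, and the caller-supplied ensembleScore function are all assumptions for illustration, not Weka's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch only; not Weka's implementation. */
final class SortInitializationSketch {

    /**
     * @param modelScores             individual score of each model (higher is better)
     * @param sortInitializationRatio fraction of the models that may be used for seeding
     * @param greedy                  stop adding models once the ensemble score drops
     * @param ensembleScore           hypothetical callback scoring a candidate ensemble
     * @return indices of the models in the initial ensemble
     */
    static List<Integer> initialize(double[] modelScores, double sortInitializationRatio,
                                    boolean greedy,
                                    ToDoubleFunction<List<Integer>> ensembleScore) {
        // Sort model indices by individual score, best first.
        List<Integer> order = new ArrayList<>();
        for (int m = 0; m < modelScores.length; m++) {
            order.add(m);
        }
        order.sort(Comparator.comparingDouble((Integer m) -> modelScores[m]).reversed());

        // Assumption: the number of eligible models is rounded down (may be zero).
        int limit = (int) Math.floor(sortInitializationRatio * modelScores.length);

        List<Integer> ensemble = new ArrayList<>();
        double previous = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < limit; k++) {
            ensemble.add(order.get(k));
            if (greedy) {
                double current = ensembleScore.applyAsDouble(ensemble);
                if (current < previous) {
                    ensemble.remove(ensemble.size() - 1); // undo the degrading addition
                    break;
                }
                previous = current;
            }
        }
        return ensemble;
    }
}
```

Without the greedy flag, the sketch simply seeds the ensemble with all eligible top-ranked models; the hillclimbing step would then continue from that starting point.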
public void setSortInitializationRatio(double v)
v
- Value to assign to sortInitializationRatio.
public java.lang.String replacementTipText()
public boolean getReplacement()
public void setReplacement(boolean newReplacement)
newReplacement
- Value to assign to replacement.
public java.lang.String greedySortInitializationTipText()
public boolean getGreedySortInitialization()
public void setGreedySortInitialization(boolean newGreedySortInitialization)
newGreedySortInitialization
- Value to assign to greedySortInitialization.
public java.lang.String verboseOutputTipText()
public boolean getVerboseOutput()
public void setVerboseOutput(boolean newVerboseOutput)
newVerboseOutput
- Value to assign to verboseOutput.
public java.lang.String workingDirectoryTipText()
public java.io.File getWorkingDirectory()
public void setWorkingDirectory(java.io.File newWorkingDirectory)
newWorkingDirectory
- the working directory value.
public void buildClassifier(weka.core.Instances trainData) throws java.lang.Exception
buildClassifier
in interface weka.classifiers.Classifier
trainData
- the training data to be used for generating the ensemble classifier.
java.lang.Exception
- if the classifier could not be built successfully
public double[] distributionForInstance(weka.core.Instance instance) throws java.lang.Exception
distributionForInstance
in interface weka.classifiers.Classifier
distributionForInstance
in class weka.classifiers.AbstractClassifier
instance
- the instance to be classified
java.lang.Exception
- if instance could not be classified successfully
public static java.lang.String getDefaultWorkingDirectory()
public java.lang.String toString()
toString
in class java.lang.Object
public weka.core.TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface weka.core.TechnicalInformationHandler
public java.lang.String getRevision()
getRevision
in interface weka.core.RevisionHandler
getRevision
in class weka.classifiers.AbstractClassifier
public static void main(java.lang.String[] argv)
argv
- should contain the following arguments: -t training file [-T test file] [-c class index]