public class WekaClassifierSparkJob extends SparkJob implements CommandlineRunnable, ClassifierProducer
Nested classes/interfaces inherited from class SparkJob: SparkJob.NoKeyTextOutputFormat<K,V>
Fields inherited from class SparkJob: TEST_DATA, TRAINING_DATA
Constructor and Description |
---|
WekaClassifierSparkJob() |
Modifier and Type | Method and Description |
---|---|
java.lang.String | classAttributeTipText(): Tip text for this property |
java.lang.String | getClassAttribute(): Get the name or index of the class attribute ("first" and "last" can also be used) |
Classifier | getClassifier() |
java.lang.String | getClassifierMapTaskOptions(): Get the options for the classifier task |
java.lang.String | getCSVMapTaskOptions(): Get the options for the header job |
java.lang.String[] | getJobOptionsOnly() |
java.lang.String | getModelFileName(): Get the name only for the model file |
int | getNumIterations(): Get the number of iterations (passes over the data) to run in the model building phase |
java.lang.String | getNumRandomlyShuffledSplits(): Get the number of randomly shuffled splits to make (if randomly shuffling the data) |
java.lang.String[] | getOptions() |
java.lang.String | getPathToPreconstructedFilter(): Get the path to a pre-constructed filter to use to pre-process the data entering each map |
boolean | getRandomizeAndStratify(): Get whether to randomize (and stratify) the input data or not |
java.lang.String | getRandomizedJobOptions(): Get the options for the randomize/stratify task |
Instances | getTrainingHeader() |
boolean | getWriteRandomlyShuffledSplitsToOutput(): Get whether the randomly shuffled data job should output its splits to the file system |
java.lang.String | globalInfo(): Help information |
java.util.Enumeration<Option> | listOptions() |
static void | main(java.lang.String[] args) |
java.lang.String | modelFileNameTipText(): Tip text for this property |
java.lang.String | numIterationsTipText(): Tip text for this property |
java.lang.String | numRandomlyShuffledSplitsTipText(): Tip text for this property |
java.lang.String | pathToPreconstructedFilterTipText(): Tip text for this property |
java.lang.String | randomizeAndStratifyTipText(): Tip text for this property |
void | run(java.lang.Object toRun, java.lang.String[] options) |
boolean | runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext): Run the job with the supplied Spark context (description inherited from SparkJob: "Clients to implement") |
void | setClassAttribute(java.lang.String c): Set the name or index of the class attribute ("first" and "last" can also be used) |
void | setClassifierMapTaskOptions(java.lang.String opts): Set the options for the classifier task |
static void | setClassIndex(java.lang.String classNameOrIndex, Instances data, boolean defaultToLast): Helper method for setting the class index in the supplied Instances object |
void | setCSVMapTaskOptions(java.lang.String opts): Set the options for the header job |
void | setModelFileName(java.lang.String m): Set the name only for the model file |
void | setNumIterations(int i): Set the number of iterations (passes over the data) to run in the model building phase |
void | setNumRandomlyShuffledSplits(java.lang.String s): Set the number of randomly shuffled splits to make (if randomly shuffling the data) |
void | setOptions(java.lang.String[] options) |
void | setPathToPreconstructedFilter(java.lang.String path): Set the path to a pre-constructed filter to use to pre-process the data entering each map |
void | setRandomizeAndStratify(boolean r): Set whether to randomize (and stratify) the input data or not |
void | setRandomizeJobOptions(java.lang.String opts): Set the options for the randomize/stratify task |
void | setWriteRandomlyShuffledSplitsToOutput(boolean write): Set whether the randomly shuffled data job should output its splits to the file system |
static void | writeModelToDestination(java.lang.Object model, Instances header, java.lang.String outputPath): Utility routine to write a Weka model to a destination path |
java.lang.String | writeRandomlyShuffledSplitsToOutput(): Tip text for this property |
Methods inherited from class SparkJob: addSubdirToPath, checkFileExists, createSparkContextForJob, debugTipText, deleteDirectory, getBaseOptionsOnly, getCachingStrategy, getDataset, getDatasets, getDebug, getFSConfigurationForPath, getSizeInBytesOfPath, getSparkContext, getSparkJobConfig, initJob, initSparkLogAppender, loadCSVFile, loadInput, loadInstanceObjectFile, openFileForRead, openFileForWrite, openTextFileForWrite, removeSparkLogAppender, resolveLocalOrOtherFileSystemPath, runJob, setCachingStrategy, setDataset, setDebug, shutdownJob, stringRDDToInstanceRDD
Methods inherited from class distributed.core.DistributedJob: environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface CommandlineRunnable: postExecution, preExecution
public static void setClassIndex(java.lang.String classNameOrIndex, Instances data, boolean defaultToLast) throws java.lang.Exception
Parameters:
classNameOrIndex - name or index of the class attribute (may be the special 'first' or 'last' strings)
data - the data to set the class index in
defaultToLast - true if the data should have the last attribute set to the class if no class name/index is supplied
Throws:
java.lang.Exception - if a problem occurs
public static void writeModelToDestination(java.lang.Object model, Instances header, java.lang.String outputPath) throws java.io.IOException
Parameters:
model - the model to write
header - the header of the training data used to train the model
outputPath - the path to write to
Throws:
java.io.IOException - if a problem occurs
public static void main(java.lang.String[] args)
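writeModelToDestination saves a trained model together with the header of its training data, so that the model can later be applied to data with the same structure. A common way to persist such a pair in plain Java is object serialization: write the model first and the header second into one stream, then read them back in the same order. The sketch below assumes exactly that. It is not taken from the class, this documentation does not confirm the on-disk format, and ModelWithHeaderIO with its write/read methods is hypothetical (any Serializable object stands in for the model and for the Instances header).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: persists a (model, header) pair with Java serialization.
// Assumption: the model is written first and the header second, in one stream.
final class ModelWithHeaderIO {
  private ModelWithHeaderIO() {}

  static void write(Serializable model, Serializable header, Path outputPath) throws IOException {
    // Make sure the destination directory exists
    Path parent = outputPath.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    try (ObjectOutputStream out = new ObjectOutputStream(
        new BufferedOutputStream(Files.newOutputStream(outputPath)))) {
      out.writeObject(model);  // the trained model
      out.writeObject(header); // the training header, needed to score new data later
    }
  }

  // Reads the pair back in the order it was written; returns {model, header}.
  static Object[] read(Path inputPath) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(
        new BufferedInputStream(Files.newInputStream(inputPath)))) {
      Object model = in.readObject();
      Object header = in.readObject();
      return new Object[] {model, header};
    }
  }
}
```

Keeping the header next to the model lets a consumer check that incoming data has the same attributes before applying the model.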
public java.util.Enumeration<Option> listOptions()
Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class SparkJob
public java.lang.String[] getJobOptionsOnly()
public java.lang.String[] getOptions()
Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class SparkJob
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class SparkJob
Throws:
java.lang.Exception
public java.lang.String globalInfo()
public java.lang.String randomizeAndStratifyTipText()
public boolean getRandomizeAndStratify()
public void setRandomizeAndStratify(boolean r)
Parameters:
r - true if the input data is to be randomized and stratified
public java.lang.String numRandomlyShuffledSplitsTipText()
public java.lang.String getNumRandomlyShuffledSplits()
public void setNumRandomlyShuffledSplits(java.lang.String s)
Parameters:
s - the number of randomly shuffled splits to make
public java.lang.String writeRandomlyShuffledSplitsToOutput()
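With randomizeAndStratify enabled, the job randomly shuffles the input into numRandomlyShuffledSplits splits. The stdlib-only sketch below illustrates the general technique of stratified random splitting, not the job's Spark implementation: rows are grouped by class label, each group is shuffled with a seeded java.util.Random, and rows are then dealt round-robin into k splits. StratifiedSplitter, its seed parameter, and the row-index representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of stratified random splitting (not the job's own code).
final class StratifiedSplitter {
  private StratifiedSplitter() {}

  /** Splits row indices into k splits; labels.get(i) is the class label of row i. */
  static List<List<Integer>> split(List<String> labels, int k, long seed) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1");
    }
    // Group row indices by class label (insertion order keeps the result deterministic)
    Map<String, List<Integer>> byClass = new LinkedHashMap<>();
    for (int i = 0; i < labels.size(); i++) {
      byClass.computeIfAbsent(labels.get(i), key -> new ArrayList<>()).add(i);
    }
    List<List<Integer>> splits = new ArrayList<>();
    for (int s = 0; s < k; s++) {
      splits.add(new ArrayList<>());
    }
    Random rng = new Random(seed);
    int next = 0; // round-robin cursor, carried across classes
    for (List<Integer> group : byClass.values()) {
      Collections.shuffle(group, rng); // randomize order within the class
      for (int row : group) {
        splits.get(next).add(row);
        next = (next + 1) % k;
      }
    }
    return splits;
  }
}
```

Carrying the cursor across classes keeps split sizes within one row of each other, and dealing each class consecutively keeps each class's count per split within one as well, which is what makes the splits stratified.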
public boolean getWriteRandomlyShuffledSplitsToOutput()
public void setWriteRandomlyShuffledSplitsToOutput(boolean write)
Parameters:
write - true if the random shuffle job should output splits to disk
public java.lang.String getCSVMapTaskOptions()
public void setCSVMapTaskOptions(java.lang.String opts)
Parameters:
opts - options for the header job
public java.lang.String getClassifierMapTaskOptions()
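The CSV map task, classifier map task, and randomize job options are passed around as single strings. In Weka, such a string is normally turned into an options array with weka.core.Utils.splitOptions. The stdlib-only sketch below is a simplified, hypothetical stand-in: it splits on whitespace and keeps double-quoted text together, so that a nested scheme specification stays one token. Unlike the Weka utility, it does not handle escape sequences, and the option names used in the test are placeholders, not options of this job.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for splitting a Weka-style option string.
// Splits on whitespace; text inside double quotes is kept together as one token.
// Unlike weka.core.Utils.splitOptions, escape sequences are not processed.
final class OptionStringSplitter {
  private OptionStringSplitter() {}

  static String[] split(String opts) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    boolean hasToken = false; // also allows an explicit empty token such as ""
    for (int i = 0; i < opts.length(); i++) {
      char c = opts.charAt(i);
      if (c == '"') {
        inQuotes = !inQuotes; // toggle quoting; the quote character itself is dropped
        hasToken = true;
      } else if (Character.isWhitespace(c) && !inQuotes) {
        if (hasToken) {
          tokens.add(current.toString());
          current.setLength(0);
          hasToken = false;
        }
      } else {
        current.append(c);
        hasToken = true;
      }
    }
    if (inQuotes) {
      throw new IllegalArgumentException("Unbalanced quotes in: " + opts);
    }
    if (hasToken) {
      tokens.add(current.toString());
    }
    return tokens.toArray(new String[0]);
  }
}
```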
public void setClassifierMapTaskOptions(java.lang.String opts)
Parameters:
opts - options for the classifier task
public void setRandomizeJobOptions(java.lang.String opts)
Parameters:
opts - the options for the randomize task
public java.lang.String getRandomizedJobOptions()
public java.lang.String classAttributeTipText()
public java.lang.String getClassAttribute()
public void setClassAttribute(java.lang.String c)
Parameters:
c - the name or index of the class attribute
public java.lang.String numIterationsTipText()
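setClassAttribute and the static helper setClassIndex accept an attribute name, a numeric index, or the special strings "first" and "last". The stdlib-only sketch below shows one way to resolve such a specification to a zero-based column index over a list of attribute names. It assumes that numeric values are 1-based (the usual convention for Weka command-line options), that an exact name match is tried before "first"/"last", and that those two keywords are case-insensitive. ClassIndexResolver and its -1 "no class" result are hypothetical; the real helper works on an Instances object.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: resolves a class attribute spec to a 0-based column index.
// Assumptions: numeric specs are 1-based; names are matched exactly before the
// special values "first"/"last"; -1 means "no class attribute set".
final class ClassIndexResolver {
  private ClassIndexResolver() {}

  static int resolve(String classNameOrIndex, List<String> attributeNames, boolean defaultToLast) {
    int numAttributes = attributeNames.size();
    if (classNameOrIndex == null || classNameOrIndex.trim().isEmpty()) {
      // No spec supplied: optionally fall back to the last attribute
      return defaultToLast && numAttributes > 0 ? numAttributes - 1 : -1;
    }
    String spec = classNameOrIndex.trim();
    try {
      int oneBased = Integer.parseInt(spec);
      if (oneBased < 1 || oneBased > numAttributes) {
        throw new IllegalArgumentException("Class index out of range: " + spec);
      }
      return oneBased - 1;
    } catch (NumberFormatException notANumber) {
      // Not numeric: fall through to name and special-value handling
    }
    int byName = attributeNames.indexOf(spec);
    if (byName >= 0) {
      return byName;
    }
    String lower = spec.toLowerCase(Locale.ROOT);
    if (lower.equals("first") && numAttributes > 0) {
      return 0;
    }
    if (lower.equals("last") && numAttributes > 0) {
      return numAttributes - 1;
    }
    throw new IllegalArgumentException("Can't find class attribute: " + spec);
  }
}
```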
public int getNumIterations()
A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
public void setNumIterations(int i)
A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
Parameters:
i - the number of iterations to run
public java.lang.String modelFileNameTipText()
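The note on numIterations reflects how incremental learners behave: each iteration is one more pass over the same data, and extra passes only help if the learner keeps refining its current solution. The sketch below makes this concrete with a tiny stochastic gradient descent fit of y = w * x under squared loss. It illustrates multi-pass SGD in general, not Weka's SGD classifier; MultiPassSgd, the data, and the learning rate are hypothetical.

```java
// Illustrative multi-pass SGD for a one-weight linear model y = w * x (squared loss).
// Not Weka's SGD; the class name, data and learning rate are hypothetical.
final class MultiPassSgd {
  private MultiPassSgd() {}

  /** Runs numIterations passes over (xs, ys) starting from w = 0 and returns the learned weight. */
  static double fit(double[] xs, double[] ys, int numIterations, double learningRate) {
    double w = 0.0;
    for (int pass = 0; pass < numIterations; pass++) {
      for (int i = 0; i < xs.length; i++) {
        double error = w * xs[i] - ys[i];
        w -= learningRate * error * xs[i]; // gradient of 0.5 * error^2 with respect to w
      }
    }
    return w;
  }

  /** Mean squared error of weight w on the data. */
  static double mse(double w, double[] xs, double[] ys) {
    double sum = 0.0;
    for (int i = 0; i < xs.length; i++) {
      double e = w * xs[i] - ys[i];
      sum += e * e;
    }
    return sum / xs.length;
  }
}
```

With xs = {1, 2, 3, 4}, ys = {2, 4, 6, 8} and a learning rate of 0.01, one pass leaves w near 0.55, while 50 passes bring it to within 0.001 of the true weight 2. A learner that fits in a single pass, or that starts from scratch on every pass, gains nothing from the extra iterations.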
public java.lang.String getModelFileName()
public void setModelFileName(java.lang.String m)
Parameters:
m - the name only (not the full path) under which the model should be saved
public java.lang.String pathToPreconstructedFilterTipText()
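setModelFileName expects a bare file name rather than a full path. A caller assembling the final location might validate this and resolve the name against an output directory, as in the sketch below. ModelFileNames, its resolve method, and the output-directory argument are hypothetical, because this documentation does not say where the job places the model file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: validates a bare model file name and resolves it against a directory.
final class ModelFileNames {
  private ModelFileNames() {}

  static Path resolve(String outputDir, String modelFileName) {
    if (modelFileName == null || modelFileName.trim().isEmpty()) {
      throw new IllegalArgumentException("Model file name must not be empty");
    }
    Path name = Paths.get(modelFileName);
    // A bare name has exactly one path element and no parent ("sgd.model", not "out/sgd.model")
    if (name.getNameCount() != 1 || name.getParent() != null || name.isAbsolute()) {
      throw new IllegalArgumentException("Expected a name only, not a path: " + modelFileName);
    }
    return Paths.get(outputDir).resolve(name);
  }
}
```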
public java.lang.String getPathToPreconstructedFilter()
public void setPathToPreconstructedFilter(java.lang.String path)
Parameters:
path - the path to a pre-constructed filter to use
public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext) throws java.io.IOException, weka.distributed.DistributedWekaException
Description copied from class: SparkJob
Clients to implement
Specified by:
runJobWithContext in class SparkJob
Parameters:
sparkContext - the context to use
Throws:
java.io.IOException - if an I/O problem occurs
weka.distributed.DistributedWekaException - if any other problem occurs
public void run(java.lang.Object toRun, java.lang.String[] options) throws java.lang.IllegalArgumentException
Specified by:
run in interface CommandlineRunnable
Overrides:
run in class distributed.core.DistributedJob
Throws:
java.lang.IllegalArgumentException
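The signature of run(Object toRun, String[] options) and its IllegalArgumentException suggest the usual Weka CommandlineRunnable pattern: main creates an instance and hands it to run, which checks the object's type, applies the command-line options, and starts the job. The self-contained sketch below mirrors that control flow with hypothetical stand-in types (SketchJob, DemoJob). It does not reproduce the class's real internals, and wrapping failures in IllegalArgumentException is this sketch's own choice.

```java
import java.util.Arrays;

// Hypothetical stand-ins illustrating a CommandlineRunnable-style entry point.
abstract class SketchJob {
  abstract void setOptions(String[] options) throws Exception;

  abstract boolean runJob() throws Exception;

  /** Type-checks the target, applies the options, then runs it. */
  static void run(Object toRun, String[] options) {
    if (!(toRun instanceof SketchJob)) {
      throw new IllegalArgumentException("Object to run is not a SketchJob!");
    }
    SketchJob job = (SketchJob) toRun;
    try {
      job.setOptions(options);
      if (!job.runJob()) {
        System.err.println("Job did not complete successfully");
      }
    } catch (Exception ex) {
      // This sketch's choice: surface option or execution problems to the caller
      throw new IllegalArgumentException(ex.getMessage(), ex);
    }
  }
}

final class DemoJob extends SketchJob {
  String[] received = new String[0];

  @Override
  void setOptions(String[] options) {
    received = Arrays.copyOf(options, options.length);
  }

  @Override
  boolean runJob() {
    return true; // a real job would do its work here
  }

  public static void main(String[] args) {
    DemoJob job = new DemoJob();
    SketchJob.run(job, args); // mirrors the usual main -> run(instance, args) delegation
  }
}
```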
public Classifier getClassifier()
Specified by:
getClassifier in interface ClassifierProducer
public Instances getTrainingHeader()
Specified by:
getTrainingHeader in interface ClassifierProducer