public class WekaScoringSparkJob extends SparkJob implements CommandlineRunnable
SparkJob.NoKeyTextOutputFormat<K,V>
TEST_DATA, TRAINING_DATA
Constructor and Description |
---|
WekaScoringSparkJob(): Constructor |
Modifier and Type | Method and Description |
---|---|
java.lang.String | classPredictionThresholdTipText(): Tip text for this property. |
java.lang.String | columnsToOutputInScoredDataTipText(): Tip text for this property. |
java.lang.String | getClassPredictionThreshold(): Get the prediction threshold to apply - predicted instances below the threshold do not make it into the output. |
java.lang.String | getColumnsToOutputInScoredData(): Get the columns to output (as a comma-separated list of indexes) in the scored data. |
java.lang.String | getCSVMapTaskOptions(): Get the options for the ARFF header job. |
java.lang.String[] | getJobOptionsOnly() |
java.lang.String | getModelPath(): Get the path (HDFS or local) to the model to use for scoring. |
java.lang.String[] | getOptions() |
java.lang.String | globalInfo(): Help information for this job. |
java.util.Enumeration<Option> | listOptions() |
static java.util.List<java.lang.Object> | loadModel(java.lang.String modelPath): Helper method to load the model to score with. |
static void | main(java.lang.String[] args) |
java.lang.String | modelPathTipText(): Tip text for this property. |
void | run(java.lang.Object toRun, java.lang.String[] options) |
boolean | runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext): Clients to implement. |
void | setClassPredictionThreshold(java.lang.String thresh): Set the prediction threshold to apply - predicted instances below the threshold do not make it into the output. |
void | setColumnsToOutputInScoredData(java.lang.String cols): Set the columns to output (as a comma-separated list of indexes) in the scored data. |
void | setCSVMapTaskOptions(java.lang.String opts): Set the options for the ARFF header job. |
void | setModel(java.lang.Object model, Instances modelHeader, Instances dataHeader): Set the model to use when scoring and initialize the output format. |
void | setModelPath(java.lang.String modelPath): Set the path (HDFS or local) to the model to use for scoring. |
void | setOptions(java.lang.String[] options) |
addSubdirToPath, checkFileExists, createSparkContextForJob, debugTipText, deleteDirectory, getBaseOptionsOnly, getCachingStrategy, getDataset, getDatasets, getDebug, getFSConfigurationForPath, getSizeInBytesOfPath, getSparkContext, getSparkJobConfig, initJob, initSparkLogAppender, loadCSVFile, loadInput, loadInstanceObjectFile, openFileForRead, openFileForWrite, openTextFileForWrite, removeSparkLogAppender, resolveLocalOrOtherFileSystemPath, runJob, setCachingStrategy, setDataset, setDebug, shutdownJob, stringRDDToInstanceRDD
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
postExecution, preExecution
public static java.util.List<java.lang.Object> loadModel(java.lang.String modelPath) throws java.io.IOException
modelPath - the name of the model file
java.io.IOException - if a problem occurs
public static void main(java.lang.String[] args)
public void setModel(java.lang.Object model, Instances modelHeader, Instances dataHeader) throws weka.distributed.DistributedWekaException
model - the model to use (supports classifiers and clusterers)
modelHeader - the header of the data used to train the model
dataHeader - the header of the incoming data (sans any summary attributes)
weka.distributed.DistributedWekaException - if a problem occurs
public java.lang.String globalInfo()
public java.lang.String classPredictionThresholdTipText()
public java.lang.String getClassPredictionThreshold()
The threshold format is [label|index]:<double>. The label or index is, respectively, the name of a class label or the zero-based index of that label. It can be omitted entirely for a numeric target; if it is omitted for a nominal target, the first label (index 0) is assumed. If this option is unspecified, then no threshold is applied.
public void setClassPredictionThreshold(java.lang.String thresh)
The threshold format is [label|index]:<double>. The label or index is, respectively, the name of a class label or the zero-based index of that label. It can be omitted entirely for a numeric target; if it is omitted for a nominal target, the first label (index 0) is assumed. If this option is unspecified, then no threshold is applied.
thresh - the prediction threshold to apply
public java.lang.String modelPathTipText()
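The [label|index]:<double> threshold format accepted by setClassPredictionThreshold can be illustrated with a small standalone sketch. This is not the library's implementation: the class ThresholdSpec and its methods are hypothetical, and the sketch assumes that a label name takes precedence over a numeric index, that both "0.5" and ":0.5" are acceptable when the label is omitted, and that a prediction passes when its value for the chosen class is greater than or equal to the threshold. Only the nominal-target case is shown.

```java
// Hypothetical helper for illustration only; not part of distributed Weka.
final class ThresholdSpec {
    final String labelOrIndex; // null when the label/index part is omitted
    final double threshold;

    private ThresholdSpec(String labelOrIndex, double threshold) {
        this.labelOrIndex = labelOrIndex;
        this.threshold = threshold;
    }

    /** Parses "[label|index]:<double>"; returns null for an unspecified (empty) spec. */
    static ThresholdSpec parse(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return null; // unspecified: no threshold is applied
        }
        String s = spec.trim();
        int colon = s.lastIndexOf(':');
        // Assumption: with the label omitted, both "0.5" and ":0.5" are accepted.
        String label = colon > 0 ? s.substring(0, colon).trim() : "";
        double value = Double.parseDouble(s.substring(colon + 1).trim());
        return new ThresholdSpec(label.isEmpty() ? null : label, value);
    }

    /** Resolves the class index for a nominal target; defaults to the first label. */
    int resolveClassIndex(String[] labels) {
        if (labelOrIndex == null) {
            return 0; // first label (index 0) is assumed
        }
        // Assumption: try the label name first, then a zero-based index.
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(labelOrIndex)) {
                return i;
            }
        }
        int idx = Integer.parseInt(labelOrIndex);
        if (idx < 0 || idx >= labels.length) {
            throw new IllegalArgumentException("Class index out of range: " + idx);
        }
        return idx;
    }

    /** True if the predicted probability for the chosen class meets the threshold. */
    boolean passes(double[] distribution, String[] labels) {
        return distribution[resolveClassIndex(labels)] >= threshold;
    }

    public static void main(String[] args) {
        String[] labels = {"no", "yes"};
        ThresholdSpec spec = ThresholdSpec.parse("yes:0.75");
        System.out.println(spec.passes(new double[] {0.2, 0.8}, labels));
        System.out.println(spec.passes(new double[] {0.4, 0.6}, labels));
    }
}
```

With the spec "yes:0.75", a prediction that assigns 0.8 to "yes" would be kept and one that assigns 0.6 would be dropped under these assumptions.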
public java.lang.String getModelPath()
public void setModelPath(java.lang.String modelPath)
modelPath - the path to the model to use for scoring
public java.lang.String columnsToOutputInScoredDataTipText()
public java.lang.String getColumnsToOutputInScoredData()
public void setColumnsToOutputInScoredData(java.lang.String cols)
cols - the columns to output in the scored data
public java.lang.String getCSVMapTaskOptions()
public void setCSVMapTaskOptions(java.lang.String opts)
opts - options for the ARFF header job
public java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class SparkJob
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class SparkJob
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions in interface OptionHandler
setOptions in class SparkJob
java.lang.Exception
public java.lang.String[] getJobOptionsOnly()
public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext) throws java.io.IOException, weka.distributed.DistributedWekaException
runJobWithContext in class SparkJob
sparkContext - the context to use
java.io.IOException - if an IO problem occurs
weka.distributed.DistributedWekaException - if any other problem occurs
public void run(java.lang.Object toRun, java.lang.String[] options) throws java.lang.IllegalArgumentException
run in interface CommandlineRunnable
run in class distributed.core.DistributedJob
java.lang.IllegalArgumentException