WekaScoringSparkJob

java.lang.Object
- distributed.core.DistributedJob
  - weka.distributed.spark.SparkJob
    - weka.distributed.spark.WekaScoringSparkJob

All Implemented Interfaces:

java.io.Serializable, CommandlineRunnable, EnvironmentHandler, OptionHandler
```
public class WekaScoringSparkJob
extends SparkJob
implements CommandlineRunnable
```
Spark job for scoring new data using an existing Weka classifier or clusterer model.

Version:

$Revision: 12253 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:

Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class weka.distributed.spark.SparkJob
  SparkJob.NoKeyTextOutputFormat<K,V>
- Nested classes/interfaces inherited from class distributed.core.DistributedJob
  distributed.core.DistributedJob.JobStatus

Field Summary
- Fields inherited from class weka.distributed.spark.SparkJob
  TEST_DATA, TRAINING_DATA
- Fields inherited from class distributed.core.DistributedJob
  WEKA_ADDITIONAL_PACKAGES_KEY

Constructor Summary

Constructors
Constructor and Description

WekaScoringSparkJob()
Constructor

Method Summary

Modifier and Type	Method and Description
`java.lang.String`	`classPredictionThresholdTipText()` Tip text for this property.
`java.lang.String`	`columnsToOutputInScoredDataTipText()` Tip text for this property
`java.lang.String`	`getClassPredictionThreshold()` Get the prediction threshold to apply - predicted instances below the threshold do not make it into the output.
`java.lang.String`	`getColumnsToOutputInScoredData()` Get the columns to output (as a comma-separated list of indexes) in the scored data.
`java.lang.String`	`getCSVMapTaskOptions()` Get the options for the ARFF header job
`java.lang.String[]`	`getJobOptionsOnly()`
`java.lang.String`	`getModelPath()` Get the path (HDFS or local) to the model to use for scoring.
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`globalInfo()` Help information for this job
`java.util.Enumeration<Option>`	`listOptions()`
`static java.util.List<java.lang.Object>`	`loadModel(java.lang.String modelPath)` Helper method to load the model to score with
`static void`	`main(java.lang.String[] args)`
`java.lang.String`	`modelPathTipText()` Tip text for this property
`void`	`run(java.lang.Object toRun, java.lang.String[] options)`
`boolean`	`runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext)` Runs the scoring job using the supplied Spark context.
`void`	`setClassPredictionThreshold(java.lang.String thresh)` Set the prediction threshold to apply - predicted instances below the threshold do not make it into the output.
`void`	`setColumnsToOutputInScoredData(java.lang.String cols)` Set the columns to output (as a comma-separated list of indexes) in the scored data.
`void`	`setCSVMapTaskOptions(java.lang.String opts)` Set the options for the ARFF header job
`void`	`setModel(java.lang.Object model, Instances modelHeader, Instances dataHeader)` Set the model to use when scoring and initialize the output format.
`void`	`setModelPath(java.lang.String modelPath)` Set the path (HDFS or local) to the model to use for scoring.
`void`	`setOptions(java.lang.String[] options)`

Methods inherited from class distributed.core.DistributedJob
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface weka.core.CommandlineRunnable
postExecution, preExecution

- Constructor Detail
  - WekaScoringSparkJob
```
public WekaScoringSparkJob()
```
    Constructor
- Method Detail
  - loadModel
```
public static java.util.List<java.lang.Object> loadModel(java.lang.String modelPath)
                                                  throws java.io.IOException
```
    Helper method to load the model to score with
    
    Parameters:
    
    modelPath - the name of the model file
    
    Returns:
    
    a list containing the model and the header of the data it was built with
    
    Throws:
    
    java.io.IOException - if a problem occurs
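    loadModel returns the model together with the header of the data it was built with. This page does not say how the model file is laid out; a minimal sketch, assuming (as with models saved from the Weka Explorer) that the model and its header are written one after the other into a single Java object stream, is to read objects until the stream ends. The class ModelFileSketch and method readAllObjects are hypothetical, not part of this API, and the sketch only reads local files, not HDFS paths.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: reads every serialized object from a local file,
 * assuming the file holds the model followed by its training header.
 */
public class ModelFileSketch {

    static List<Object> readAllObjects(Path modelFile) throws IOException {
        List<Object> result = new ArrayList<>();
        try (InputStream in = Files.newInputStream(modelFile);
             ObjectInputStream ois = new ObjectInputStream(in)) {
            while (true) {
                try {
                    result.add(ois.readObject());
                } catch (EOFException endOfStream) {
                    break; // no more objects in the file
                } catch (ClassNotFoundException e) {
                    throw new IOException("Class of a serialized object is not on the classpath", e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Write two stand-in objects, then read them back in order
        Path tmp = Files.createTempFile("model", ".bin");
        try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            oos.writeObject("stand-in model");
            oos.writeObject("stand-in header");
        }
        List<Object> loaded = readAllObjects(tmp);
        System.out.println(loaded.size() + " objects: " + loaded); // 2 objects: [stand-in model, stand-in header]
        Files.delete(tmp);
    }
}
```

    Under this assumption, element 0 of the list is the model and element 1 is the header.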
  - main
```
public static void main(java.lang.String[] args)
```
  - setModel
```
public void setModel(java.lang.Object model,
                     Instances modelHeader,
                     Instances dataHeader)
              throws weka.distributed.DistributedWekaException
```
    Set the model to use when scoring and initialize the output format.
    
    Parameters:
    
    model - the model to use (supports classifiers and clusterers)
    
    modelHeader - the header of the data used to train the model
    
    dataHeader - the header of the incoming data (sans any summary attributes)
    
    Throws:
    
    weka.distributed.DistributedWekaException - if a problem occurs
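    setModel takes two headers because the attributes of the incoming data need not line up one-to-one with those the model was trained on. One way to reconcile them is to match attributes by name; whether this class matches exactly this way is an assumption. In the sketch below, HeaderMappingSketch and mapByName are hypothetical, and plain lists of attribute names stand in for Weka's Instances headers so the example runs without Weka on the classpath.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: for each attribute the model was trained on, find the
 * position of the same-named attribute in the incoming data (-1 if absent).
 */
public class HeaderMappingSketch {

    static int[] mapByName(List<String> modelAttrs, List<String> dataAttrs) {
        Map<String, Integer> dataIndex = new HashMap<>();
        for (int i = 0; i < dataAttrs.size(); i++) {
            dataIndex.putIfAbsent(dataAttrs.get(i), i); // first occurrence wins
        }
        int[] mapping = new int[modelAttrs.size()];
        for (int i = 0; i < modelAttrs.size(); i++) {
            mapping[i] = dataIndex.getOrDefault(modelAttrs.get(i), -1);
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> model = Arrays.asList("age", "income", "churned");
        List<String> data = Arrays.asList("id", "income", "age");
        // The class attribute "churned" is absent from the new data, so it maps to -1
        System.out.println(Arrays.toString(mapByName(model, data))); // [2, 1, -1]
    }
}
```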
  - globalInfo
```
public java.lang.String globalInfo()
```
    Help information for this job
    
    Returns:
    
    help information for this job
  - classPredictionThresholdTipText
```
public java.lang.String classPredictionThresholdTipText()
```
    Tip text for this property.
    
    Returns:
    
    the tip text for this property.
  - getClassPredictionThreshold
```
public java.lang.String getClassPredictionThreshold()
```
    Get the prediction threshold to apply: predicted instances below the threshold do not make it into the output. The value takes the format [label|index]:<double>, where label is the name of a class label and index is the zero-based index of a class label. The label or index can be omitted entirely for a numeric target; if it is omitted for a nominal target, the first label (index 0) is assumed. If this option is unspecified, no threshold is applied.
    
    Returns:
    
    the prediction threshold to apply
  - setClassPredictionThreshold
```
public void setClassPredictionThreshold(java.lang.String thresh)
```
    Set the prediction threshold to apply: predicted instances below the threshold do not make it into the output. The value takes the format [label|index]:<double>, where label is the name of a class label and index is the zero-based index of a class label. The label or index can be omitted entirely for a numeric target; if it is omitted for a nominal target, the first label (index 0) is assumed. If this option is unspecified, no threshold is applied.
    
    Parameters:
    
    thresh - the prediction threshold to apply
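    The threshold format can be sketched as a small parser plus a filter. This is a hypothetical illustration, not this class's code: ThresholdSketch, parse and keep are invented names, and the sketch assumes that the prefix is split off at the last ':', that a bare number (or an empty prefix) selects class index 0, that a prefix parsing as an integer is an index while anything else is a label name, and that an instance is kept when its score for the chosen class is at least the threshold.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "[label|index]:<double>" threshold format. */
public class ThresholdSketch {

    final int classIndex;   // zero-based class label index (0 if omitted)
    final double threshold; // minimum score needed to reach the output

    ThresholdSketch(int classIndex, double threshold) {
        this.classIndex = classIndex;
        this.threshold = threshold;
    }

    static ThresholdSketch parse(String spec, List<String> classLabels) {
        int colon = spec.lastIndexOf(':');
        String prefix = colon < 0 ? "" : spec.substring(0, colon).trim();
        double value = Double.parseDouble(spec.substring(colon + 1).trim());
        if (prefix.isEmpty()) {
            return new ThresholdSketch(0, value); // label/index omitted
        }
        int index;
        try {
            index = Integer.parseInt(prefix); // treat numeric prefix as an index
        } catch (NumberFormatException notAnIndex) {
            index = classLabels.indexOf(prefix); // otherwise look up the label name
        }
        if (index < 0 || (!classLabels.isEmpty() && index >= classLabels.size())) {
            throw new IllegalArgumentException("Unknown class label or index: " + prefix);
        }
        return new ThresholdSketch(index, value);
    }

    /** Keep a prediction only if the chosen class score is not below the threshold. */
    boolean keep(double[] scores) {
        return scores[classIndex] >= threshold;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("no", "yes");
        ThresholdSketch t = parse("yes:0.8", labels);
        System.out.println(t.keep(new double[] {0.15, 0.85})); // true
        System.out.println(t.keep(new double[] {0.30, 0.70})); // false
    }
}
```

    For a numeric target, an empty label list and a one-element score array holding the predicted value would play the same roles.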
  - modelPathTipText
```
public java.lang.String modelPathTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getModelPath
```
public java.lang.String getModelPath()
```
    Get the path (HDFS or local) to the model to use for scoring.
    
    Returns:
    
    the path to the model to use for scoring
  - setModelPath
```
public void setModelPath(java.lang.String modelPath)
```
    Set the path (HDFS or local) to the model to use for scoring.
    
    Parameters:
    
    modelPath - the path to the model to use for scoring
  - columnsToOutputInScoredDataTipText
```
public java.lang.String columnsToOutputInScoredDataTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getColumnsToOutputInScoredData
```
public java.lang.String getColumnsToOutputInScoredData()
```
    Get the columns to output (as a comma-separated list of indexes) in the scored data. 'first' and 'last' may be used as well (e.g. 1,2,10-last).
    
    Returns:
    
    the columns to output in the scored data
  - setColumnsToOutputInScoredData
```
public void setColumnsToOutputInScoredData(java.lang.String cols)
```
    Set the columns to output (as a comma-separated list of indexes) in the scored data. 'first' and 'last' may be used as well (e.g. 1,2,10-last).
    
    Parameters:
    
    cols - the columns to output in the scored data
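    The column-selection syntax (1-based indexes, a-b spans, and the keywords first and last) can be illustrated with a stdlib-only parser. Weka normally handles this kind of range string with weka.core.Range; the ColumnRangeSketch class below is a hypothetical stand-in, and its error handling (rejecting out-of-range indexes and reversed spans) is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical sketch: parses comma-separated 1-based indexes or a-b spans,
 * where "first" and "last" stand for the first and last column, and returns
 * sorted zero-based indexes.
 */
public class ColumnRangeSketch {

    static int[] parse(String spec, int numColumns) {
        TreeSet<Integer> selected = new TreeSet<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) continue;
            int dash = p.indexOf('-');
            int lo, hi;
            if (dash < 0) {
                lo = hi = toIndex(p, numColumns);
            } else {
                lo = toIndex(p.substring(0, dash), numColumns);
                hi = toIndex(p.substring(dash + 1), numColumns);
            }
            if (lo > hi) throw new IllegalArgumentException("Reversed span: " + p);
            for (int i = lo; i <= hi; i++) selected.add(i - 1); // convert to zero-based
        }
        return selected.stream().mapToInt(Integer::intValue).toArray();
    }

    private static int toIndex(String token, int numColumns) {
        String t = token.trim();
        int idx;
        if (t.equalsIgnoreCase("first")) idx = 1;
        else if (t.equalsIgnoreCase("last")) idx = numColumns;
        else idx = Integer.parseInt(t);
        if (idx < 1 || idx > numColumns) throw new IllegalArgumentException("Out of range: " + t);
        return idx;
    }

    public static void main(String[] args) {
        // With 12 columns, "1,2,10-last" selects columns 1, 2, 10, 11 and 12
        System.out.println(Arrays.toString(parse("1,2,10-last", 12))); // [0, 1, 9, 10, 11]
    }
}
```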
  - getCSVMapTaskOptions
```
public java.lang.String getCSVMapTaskOptions()
```
    Get the options for the ARFF header job
    
    Returns:
    
    the options for the ARFF header job
  - setCSVMapTaskOptions
```
public void setCSVMapTaskOptions(java.lang.String opts)
```
    Set the options for the ARFF header job
    
    Parameters:
    
    opts - options for the ARFF header job
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class SparkJob
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class SparkJob
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class SparkJob
    
    Throws:
    
    java.lang.Exception
  - getJobOptionsOnly
```
public java.lang.String[] getJobOptionsOnly()
```
  - runJobWithContext
```
public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext)
                          throws java.io.IOException,
                                 weka.distributed.DistributedWekaException
```
    Runs the scoring job using the supplied Spark context. SparkJob declares this method abstract; this class provides the implementation.
    
    Specified by:
    
    runJobWithContext in class SparkJob
    
    Parameters:
    
    sparkContext - the context to use
    
    Returns:
    
    true if the job was successful
    
    Throws:
    
    java.io.IOException - if an I/O problem occurs
    
    weka.distributed.DistributedWekaException - if any other problem occurs
  - run
```
public void run(java.lang.Object toRun,
                java.lang.String[] options)
         throws java.lang.IllegalArgumentException
```
    Specified by:
    
    run in interface CommandlineRunnable
    
    Overrides:
    
    run in class distributed.core.DistributedJob
    
    Throws:
    
    java.lang.IllegalArgumentException
