WekaClassifierSparkJob

java.lang.Object
- distributed.core.DistributedJob
  - weka.distributed.spark.SparkJob
    - weka.distributed.spark.WekaClassifierSparkJob

All Implemented Interfaces:

java.io.Serializable, CommandlineRunnable, EnvironmentHandler, OptionHandler, ClassifierProducer
```
public class WekaClassifierSparkJob
extends SparkJob
implements CommandlineRunnable, ClassifierProducer
```
Job for building a classifier in Spark.

Version:

$Revision: 12253 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:

Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class weka.distributed.spark.SparkJob
  SparkJob.NoKeyTextOutputFormat<K,V>
- Nested classes/interfaces inherited from class distributed.core.DistributedJob
  distributed.core.DistributedJob.JobStatus

Field Summary
- Fields inherited from class weka.distributed.spark.SparkJob
  TEST_DATA, TRAINING_DATA
- Fields inherited from class distributed.core.DistributedJob
  WEKA_ADDITIONAL_PACKAGES_KEY

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `WekaClassifierSparkJob()` |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `java.lang.String` | `classAttributeTipText()` Tip text for this property |
| `java.lang.String` | `getClassAttribute()` Get the name or index of the class attribute ("first" and "last" can also be used) |
| `Classifier` | `getClassifier()` |
| `java.lang.String` | `getClassifierMapTaskOptions()` Get the options for the classifier task |
| `java.lang.String` | `getCSVMapTaskOptions()` Get the options to the header job |
| `java.lang.String[]` | `getJobOptionsOnly()` |
| `java.lang.String` | `getModelFileName()` Get the name only for the model file |
| `int` | `getNumIterations()` Get the number of iterations (passes over the data) to run in the model building phase. |
| `java.lang.String` | `getNumRandomlyShuffledSplits()` Get the number of randomly shuffled splits to make (if randomly shuffling the data) |
| `java.lang.String[]` | `getOptions()` |
| `java.lang.String` | `getPathToPreconstructedFilter()` Get the path to a pre-constructed filter to use to pre-process the data entering each map. |
| `boolean` | `getRandomizeAndStratify()` Get whether to randomize (and stratify) the input data or not |
| `java.lang.String` | `getRandomizedJobOptions()` Get the options for the randomize/stratify task |
| `Instances` | `getTrainingHeader()` |
| `boolean` | `getWriteRandomlyShuffledSplitsToOutput()` Get whether the randomly shuffled data job should output its splits to the file system |
| `java.lang.String` | `globalInfo()` Help information |
| `java.util.Enumeration<Option>` | `listOptions()` |
| `static void` | `main(java.lang.String[] args)` |
| `java.lang.String` | `modelFileNameTipText()` Tip text for this property |
| `java.lang.String` | `numIterationsTipText()` Tip text for this property |
| `java.lang.String` | `numRandomlyShuffledSplitsTipText()` Tip text for this property |
| `java.lang.String` | `pathToPreconstructedFilterTipText()` Tip text for this property |
| `java.lang.String` | `randomizeAndStratifyTipText()` Tip text for this property |
| `void` | `run(java.lang.Object toRun, java.lang.String[] options)` |
| `boolean` | `runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext)` Runs this job using the supplied Spark context |
| `void` | `setClassAttribute(java.lang.String c)` Set the name or index of the class attribute ("first" and "last" can also be used) |
| `void` | `setClassifierMapTaskOptions(java.lang.String opts)` Set the options for the classifier task |
| `static void` | `setClassIndex(java.lang.String classNameOrIndex, Instances data, boolean defaultToLast)` Helper method for setting the class index in the supplied Instances object |
| `void` | `setCSVMapTaskOptions(java.lang.String opts)` Set the options to the header job |
| `void` | `setModelFileName(java.lang.String m)` Set the name only for the model file |
| `void` | `setNumIterations(int i)` Set the number of iterations (passes over the data) to run in the model building phase. |
| `void` | `setNumRandomlyShuffledSplits(java.lang.String s)` Set the number of randomly shuffled splits to make (if randomly shuffling the data) |
| `void` | `setOptions(java.lang.String[] options)` |
| `void` | `setPathToPreconstructedFilter(java.lang.String path)` Set the path to a pre-constructed filter to use to pre-process the data entering each map. |
| `void` | `setRandomizeAndStratify(boolean r)` Set whether to randomize (and stratify) the input data or not |
| `void` | `setRandomizeJobOptions(java.lang.String opts)` Set the options for the randomize/stratify task |
| `void` | `setWriteRandomlyShuffledSplitsToOutput(boolean write)` Set whether the randomly shuffled data job should output its splits to the file system |
| `static void` | `writeModelToDestination(java.lang.Object model, Instances header, java.lang.String outputPath)` Utility routine to write a Weka model to a destination path |
| `java.lang.String` | `writeRandomlyShuffledSplitsToOutput()` Tip text for this property |

Methods inherited from class distributed.core.DistributedJob
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface weka.core.CommandlineRunnable
postExecution, preExecution

- Constructor Detail
  - WekaClassifierSparkJob
```
public WekaClassifierSparkJob()
```
- Method Detail
  - setClassIndex
```
public static void setClassIndex(java.lang.String classNameOrIndex,
                                 Instances data,
                                 boolean defaultToLast)
                          throws java.lang.Exception
```
    Helper method for setting the class index in the supplied Instances object
    
    Parameters:
    
    classNameOrIndex - name or index of the class attribute (may be the special 'first' or 'last' strings)
    
    data - the data to set the class index in
    
    defaultToLast - true if the data should have the last attribute set to the class if no class name/index is supplied
    
    Throws:
    
    java.lang.Exception - if a problem occurs
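    The contract of `setClassIndex` can be illustrated with a small, self-contained sketch that resolves a class specification against a plain list of attribute names instead of a Weka `Instances` object. It is not the library's implementation: `ClassIndexSketch` and `resolve` are hypothetical. The sketch assumes three things. Names are tried before numbers. A numeric value is a 1-based index, which is the usual convention for Weka command-line options. A return value of -1 means "no class set", as with `Instances.classIndex()`.

```java
import java.util.List;

/** Hypothetical sketch of resolving a class attribute specification to a 0-based index. */
final class ClassIndexSketch {

  private ClassIndexSketch() {
  }

  /**
   * @param spec           attribute name, 1-based index (assumed), "first" or "last"; may be null or empty
   * @param attributeNames the attribute names, in order
   * @param defaultToLast  use the last attribute when no spec is supplied
   * @return the 0-based class index, or -1 if no class is set
   * @throws IllegalArgumentException if the spec cannot be resolved
   */
  static int resolve(String spec, List<String> attributeNames, boolean defaultToLast) {
    int numAtts = attributeNames.size();
    if (spec == null || spec.trim().isEmpty()) {
      // No class supplied: optionally fall back to the last attribute
      return defaultToLast && numAtts > 0 ? numAtts - 1 : -1;
    }
    String s = spec.trim();
    if (s.equalsIgnoreCase("first")) {
      return 0;
    }
    if (s.equalsIgnoreCase("last")) {
      return numAtts - 1;
    }
    // Assumption: an exact attribute name wins over a numeric interpretation
    int byName = attributeNames.indexOf(s);
    if (byName >= 0) {
      return byName;
    }
    int oneBased;
    try {
      oneBased = Integer.parseInt(s);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Unknown class attribute: " + s, e);
    }
    if (oneBased < 1 || oneBased > numAtts) {
      throw new IllegalArgumentException("Class index out of range: " + s);
    }
    // Convert the assumed 1-based index to a 0-based one
    return oneBased - 1;
  }
}
```

    With the attributes `outlook, temperature, humidity, play`, the spec `"2"` resolves to 0-based index 1 (`temperature`). An empty spec resolves to 3 (`play`) when `defaultToLast` is true, and to -1 otherwise.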
  - writeModelToDestination
```
public static void writeModelToDestination(java.lang.Object model,
                                           Instances header,
                                           java.lang.String outputPath)
                                    throws java.io.IOException
```
    Utility routine to write a Weka model to a destination path
    
    Parameters:
    
    model - the model to write
    
    header - the header of the training data used to train the model
    
    outputPath - the path to write to
    
    Throws:
    
    java.io.IOException - if a problem occurs
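    Here is a minimal sketch of writing a model followed by its training header, using plain Java serialization to a local file. This page does not confirm the layout. The sketch assumes the two objects are written back to back, model first, and that the header is omitted when it is null. The real method writes to `outputPath`, which may be on HDFS. `ModelFileSketch` is hypothetical.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical sketch: persist a model followed by its (optional) training header. */
final class ModelFileSketch {

  private ModelFileSketch() {
  }

  /** Writes the model, then the header if one is supplied, to a local file. */
  static void write(Serializable model, Serializable header, String path) throws IOException {
    try (FileOutputStream fos = new FileOutputStream(path);
         ObjectOutputStream oos = new ObjectOutputStream(fos)) {
      oos.writeObject(model);
      if (header != null) {
        // Storing the header lets a later consumer check that new data matches the training format
        oos.writeObject(header);
      }
    }
  }

  /** Reads back {model, header}; the header entry is null if none was written. */
  static Object[] read(String path) throws IOException, ClassNotFoundException {
    try (FileInputStream fis = new FileInputStream(path);
         ObjectInputStream ois = new ObjectInputStream(fis)) {
      Object model = ois.readObject();
      Object header = null;
      try {
        header = ois.readObject();
      } catch (EOFException e) {
        // End of stream reached: no header was stored
      }
      return new Object[] {model, header};
    }
  }
}
```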
  - main
```
public static void main(java.lang.String[] args)
```
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class SparkJob
  - getJobOptionsOnly
```
public java.lang.String[] getJobOptionsOnly()
```
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class SparkJob
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class SparkJob
    
    Throws:
    
    java.lang.Exception
  - globalInfo
```
public java.lang.String globalInfo()
```
    Help information
    
    Returns:
    
    help information for this job
  - randomizeAndStratifyTipText
```
public java.lang.String randomizeAndStratifyTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getRandomizeAndStratify
```
public boolean getRandomizeAndStratify()
```
    Get whether to randomize (and stratify) the input data or not
    
    Returns:
    
    true if the input data is to be randomized and stratified
  - setRandomizeAndStratify
```
public void setRandomizeAndStratify(boolean r)
```
    Set whether to randomize (and stratify) the input data or not
    
    Parameters:
    
    r - true if the input data is to be randomized and stratified
  - numRandomlyShuffledSplitsTipText
```
public java.lang.String numRandomlyShuffledSplitsTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getNumRandomlyShuffledSplits
```
public java.lang.String getNumRandomlyShuffledSplits()
```
    Get the number of randomly shuffled splits to make (if randomly shuffling the data)
    
    Returns:
    
    the number of randomly shuffled splits to make
  - setNumRandomlyShuffledSplits
```
public void setNumRandomlyShuffledSplits(java.lang.String s)
```
    Set the number of randomly shuffled splits to make (if randomly shuffling the data)
    
    Parameters:
    
    s - the number of randomly shuffled splits to make
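    The idea behind randomizing and stratifying the input into a fixed number of shuffled splits can be sketched in memory in three steps. First, shuffle the rows with a seeded generator. Next, group them by class label. Finally, deal them round-robin across the splits so that each split receives a near-equal share of every class. This is a generic illustration of stratified splitting, not the Spark job's implementation. `StratifiedSplitSketch`, the `seed` parameter and the use of `String` labels are assumptions. The job itself takes the number of splits as a `String`, whereas the sketch uses an `int`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of randomize-and-stratify into a number of shuffled splits. */
final class StratifiedSplitSketch {

  private StratifiedSplitSketch() {
  }

  /**
   * @param labels    class label of each row (row i has label labels.get(i))
   * @param numSplits number of splits to produce (must be >= 1)
   * @param seed      seed for the shuffle, so results are repeatable
   * @return for each split, the row indices assigned to it
   */
  static List<List<Integer>> split(List<String> labels, int numSplits, long seed) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be >= 1");
    }
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < labels.size(); i++) {
      rows.add(i);
    }
    // 1. Randomize the row order
    Collections.shuffle(rows, new Random(seed));
    // 2. Group rows by class; List.sort is stable, so the shuffled order within a class is kept
    rows.sort(Comparator.comparing((Integer r) -> labels.get(r)));
    // 3. Deal rows round-robin so each split gets a near-equal share of every class
    List<List<Integer>> splits = new ArrayList<>();
    for (int s = 0; s < numSplits; s++) {
      splits.add(new ArrayList<>());
    }
    for (int i = 0; i < rows.size(); i++) {
      splits.get(i % numSplits).add(rows.get(i));
    }
    return splits;
  }
}
```

    Suppose the input has 8 `yes` rows and 4 `no` rows and is dealt into two splits. Each split then receives 4 `yes` and 2 `no` rows, whatever the seed. Only the order of rows within each class changes with the seed.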
  - writeRandomlyShuffledSplitsToOutput
```
public java.lang.String writeRandomlyShuffledSplitsToOutput()
```
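    Tip text for this property. Despite its name, this method returns the tip text; use getWriteRandomlyShuffledSplitsToOutput() to read the property value.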
    
    Returns:
    
    the tip text for this property
  - getWriteRandomlyShuffledSplitsToOutput
```
public boolean getWriteRandomlyShuffledSplitsToOutput()
```
    Get whether the randomly shuffled data job should output its splits to the file system
    
    Returns:
    
    true if the random shuffle job should output splits to disk
  - setWriteRandomlyShuffledSplitsToOutput
```
public void setWriteRandomlyShuffledSplitsToOutput(boolean write)
```
    Set whether the randomly shuffled data job should output its splits to the file system
    
    Parameters:
    
    write - true if the random shuffle job should output splits to disk
  - getCSVMapTaskOptions
```
public java.lang.String getCSVMapTaskOptions()
```
    Get the options to the header job
    
    Returns:
    
    options to the header job
  - setCSVMapTaskOptions
```
public void setCSVMapTaskOptions(java.lang.String opts)
```
    Set the options to the header job
    
    Parameters:
    
    opts - options to the header job
  - getClassifierMapTaskOptions
```
public java.lang.String getClassifierMapTaskOptions()
```
    Get the options for the classifier task
    
    Returns:
    
    options for the classifier task
  - setClassifierMapTaskOptions
```
public void setClassifierMapTaskOptions(java.lang.String opts)
```
    Set the options for the classifier task
    
    Parameters:
    
    opts - options for the classifier task
  - setRandomizeJobOptions
```
public void setRandomizeJobOptions(java.lang.String opts)
```
    Set the options for the randomize/stratify task
    
    Parameters:
    
    opts - the options for the randomize task
  - getRandomizedJobOptions
```
public java.lang.String getRandomizedJobOptions()
```
    Get the options for the randomize/stratify task
    
    Returns:
    
    the options for the randomize task
  - classAttributeTipText
```
public java.lang.String classAttributeTipText()
```
    Tip text for this property
    
    Returns:
    
    tip text for this property
  - getClassAttribute
```
public java.lang.String getClassAttribute()
```
    Get the name or index of the class attribute ("first" and "last" can also be used)
    
    Returns:
    
    the name or index of the class attribute
  - setClassAttribute
```
public void setClassAttribute(java.lang.String c)
```
    Set the name or index of the class attribute ("first" and "last" can also be used)
    
    Parameters:
    
    c - the name or index of the class attribute
  - numIterationsTipText
```
public java.lang.String numIterationsTipText()
```
    Tip text for this property
    
    Returns:
    
    tip text for this property
  - getNumIterations
```
public int getNumIterations()
```
    Get the number of iterations (passes over the data) to run in the model building phase. A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
    
    Returns:
    
    the number of iterations to run
  - setNumIterations
```
public void setNumIterations(int i)
```
    Set the number of iterations (passes over the data) to run in the model building phase. A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
    
    Parameters:
    
    i - the number of iterations to run
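    Extra iterations help only learners that converge incrementally. To see why, consider a minimal online gradient descent fit of `y = w * x`, where each iteration is one pass over the data. This is a generic illustration, not Weka's `SGD` classifier. `MultiPassSgdSketch`, the learning rate and the toy data are assumptions.

```java
/** Hypothetical sketch: several passes of online gradient descent for y = w * x. */
final class MultiPassSgdSketch {

  private MultiPassSgdSketch() {
  }

  /** Runs numIterations passes over (xs, ys) and returns the learned weight. */
  static double fit(double[] xs, double[] ys, int numIterations, double learningRate) {
    double w = 0.0;
    for (int pass = 0; pass < numIterations; pass++) {
      for (int i = 0; i < xs.length; i++) {
        // Gradient of 0.5 * (w*x - y)^2 with respect to w is (w*x - y) * x
        double error = w * xs[i] - ys[i];
        w -= learningRate * error * xs[i];
      }
    }
    return w;
  }
}
```

    Take the noise-free points (1, 2), (2, 4), (3, 6) and a learning rate of 0.05. One pass leaves `w` at about 1.164. Ten passes bring it within 0.001 of the exact solution `w = 2`. A learner that fits its model in a single batch step gains nothing from extra passes.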
  - modelFileNameTipText
```
public java.lang.String modelFileNameTipText()
```
    Tip text for this property
    
    Returns:
    
    tip text for this property
  - getModelFileName
```
public java.lang.String getModelFileName()
```
    Get the name only for the model file
    
    Returns:
    
    the name only (not full path) that the model should be saved to
  - setModelFileName
```
public void setModelFileName(java.lang.String m)
```
    Set the name only for the model file
    
    Parameters:
    
    m - the name only (not full path) that the model should be saved to
  - pathToPreconstructedFilterTipText
```
public java.lang.String pathToPreconstructedFilterTipText()
```
    Tip text for this property
    
    Returns:
    
    tip text for this property
  - getPathToPreconstructedFilter
```
public java.lang.String getPathToPreconstructedFilter()
```
    Get the path to a pre-constructed filter to use to pre-process the data entering each map. This path may be inside or outside of HDFS.
    
    Returns:
    
    the path to a pre-constructed filter to use
  - setPathToPreconstructedFilter
```
public void setPathToPreconstructedFilter(java.lang.String path)
```
    Set the path to a pre-constructed filter to use to pre-process the data entering each map. This path may be inside or outside of HDFS.
    
    Parameters:
    
    path - the path to a pre-constructed filter to use
  - runJobWithContext
```
public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext)
                          throws java.io.IOException,
                                 weka.distributed.DistributedWekaException
```
    Runs this job using the supplied Spark context to build the classifier.
    
    Specified by:
    
    runJobWithContext in class SparkJob
    
    Parameters:
    
    sparkContext - the context to use
    
    Returns:
    
    true if the job was successful
    
    Throws:
    
    java.io.IOException - if an I/O problem occurs
    
    weka.distributed.DistributedWekaException - if any other problem occurs
  - run
```
public void run(java.lang.Object toRun,
                java.lang.String[] options)
         throws java.lang.IllegalArgumentException
```
    Specified by:
    
    run in interface CommandlineRunnable
    
    Overrides:
    
    run in class distributed.core.DistributedJob
    
    Throws:
    
    java.lang.IllegalArgumentException
  - getClassifier
```
public Classifier getClassifier()
```
    Specified by:
    
    getClassifier in interface ClassifierProducer
  - getTrainingHeader
```
public Instances getTrainingHeader()
```
    Specified by:
    
    getTrainingHeader in interface ClassifierProducer
