WekaClassifierHadoopJob

java.lang.Object
- distributed.core.DistributedJob
  - weka.distributed.hadoop.HadoopJob
    - weka.distributed.hadoop.WekaClassifierHadoopJob

All Implemented Interfaces:

java.io.Serializable, CommandlineRunnable, EnvironmentHandler, OptionHandler, ClassifierProducer
```
public class WekaClassifierHadoopJob
extends HadoopJob
implements ClassifierProducer, CommandlineRunnable
```
Hadoop job for building a classifier or regressor.

Version:

$Revision: 11162 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:
Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class distributed.core.DistributedJob
  distributed.core.DistributedJob.JobStatus

Field Summary
- Fields inherited from class weka.distributed.hadoop.HadoopJob
  COLT_JAR, DISTRIBUTED_WEKA_BASE_JAR, DISTRIBUTED_WEKA_HADOOP_JAR, JCOMMON_JAR, JFREECHART_JAR, LA4J_JAR, OPEN_CSV_JAR
- Fields inherited from class distributed.core.DistributedJob
  WEKA_ADDITIONAL_PACKAGES_KEY

Constructor Summary

Constructors
Constructor and Description
`WekaClassifierHadoopJob()` Constructor

Method Summary

Methods
Modifier and Type	Method and Description
`java.lang.String`	`classAttributeTipText()` Tip text for this property
`java.lang.String`	`createRandomizedDataChunksTipText()` Tip text for this property
`java.lang.String`	`getClassAttribute()` Get the name or index of the class attribute ("first" and "last" can also be used)
`Classifier`	`getClassifier()`
`java.lang.String`	`getClassifierMapTaskOptions()` Get the options for the underlying map task
`boolean`	`getCreateRandomizedDataChunks()` Get whether to create randomly shuffled (and stratified if the class is nominal) data chunks via a pre-processing pass/job.
`java.lang.String`	`getCSVMapTaskOptions()` Get the options to the header job
`java.lang.String[]`	`getJobOptionsOnly()` Get the options for this job only
`java.lang.String`	`getMinTrainingFraction()` Get the minimum training fraction.
`java.lang.String`	`getModelFileName()` Get the name only for the model file
`java.lang.String`	`getNumInstancesPerRandomizedDataChunk()` Get the number of instances that each randomly shuffled data chunk should have.
`int`	`getNumIterations()` Get the number of iterations (passes over the data) to run in the model building phase.
`java.lang.String`	`getNumRandomizedDataChunks()` Get the number of randomly shuffled data chunks to create.
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`getPathToPreconstructedFilter()` Get the path to a pre-constructed filter to use to pre-process the data entering each map.
`Instances`	`getTrainingHeader()`
`java.lang.String`	`globalInfo()` Help information
`java.util.Enumeration<Option>`	`listOptions()`
`static void`	`main(java.lang.String[] args)`
`java.lang.String`	`minTrainingFractionTipText()` Tip text for this property
`java.lang.String`	`modelFileNameTipText()` Tip text for this property
`java.lang.String`	`numInstancesPerRandomizedDataChunkTipText()` Tip text for this property
`java.lang.String`	`numIterationsTipText()` Tip text for this property
`java.lang.String`	`numRandomizedDataChunksTipText()` Tip text for this property
`java.lang.String`	`pathToPreconstructedFilterTipText()` Tip text for this property
`void`	`run(java.lang.Object toRun, java.lang.String[] args)`
`boolean`	`runJob()`
`void`	`setClassAttribute(java.lang.String c)` Set the name or index of the class attribute ("first" and "last" can also be used)
`void`	`setClassifierMapTaskOptions(java.lang.String opts)` Set the options for the underlying map task
`void`	`setCreateRandomizedDataChunks(boolean s)` Set whether to create randomly shuffled (and stratified if the class is nominal) data chunks via a pre-processing pass/job.
`void`	`setCSVMapTaskOptions(java.lang.String opts)` Set the options to the header job
`void`	`setMinTrainingFraction(java.lang.String frac)` Set the minimum training fraction.
`void`	`setModelFileName(java.lang.String m)` Set the name only for the model file
`void`	`setNumInstancesPerRandomizedDataChunk(java.lang.String insts)` Set the number of instances that each randomly shuffled data chunk should have.
`void`	`setNumIterations(int i)` Set the number of iterations (passes over the data) to run in the model building phase.
`void`	`setNumRandomizedDataChunks(java.lang.String chunks)` Set the number of randomly shuffled data chunks to create.
`void`	`setOptions(java.lang.String[] options)`
`void`	`setPathToPreconstructedFilter(java.lang.String path)` Set the path to a pre-constructed filter to use to pre-process the data entering each map.
`void`	`stopJob()`

Methods inherited from class weka.distributed.hadoop.HadoopJob
additionalWekaPackagesTipText, cleanOutputDirectory, deubgTipText, getAdditionalWekaPackages, getBaseOptionsOnly, getDebug, getLoggingInterval, getMapNumber, getMapReduceJobConfig, getMapReduceNumber, getPathToWekaJar, getReduceNumber, loggingIntervalTipText, pathToWekaJarTipText, setAdditionalWekaPackages, setDebug, setLoggingInterval, setMapReduceJobConfig, setPathToWekaJar

Methods inherited from class distributed.core.DistributedJob
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, makeOptionsStr, parseInstance, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - WekaClassifierHadoopJob
```
public WekaClassifierHadoopJob()
```
    Constructor
- Method Detail
  - globalInfo
```
public java.lang.String globalInfo()
```
    Help information
    
    Returns:
    help information for this job
  - modelFileNameTipText
```
public java.lang.String modelFileNameTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setModelFileName
```
public void setModelFileName(java.lang.String m)
```
    Set the name only for the model file
    
    Parameters:
    m - the name only (not full path) that the model should be saved to
  - getModelFileName
```
public java.lang.String getModelFileName()
```
    Get the name only for the model file
    
    Returns:
    the name only (not full path) that the model should be saved to
  - numIterationsTipText
```
public java.lang.String numIterationsTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setNumIterations
```
public void setNumIterations(int i)
```
    Set the number of iterations (passes over the data) to run in the model building phase. A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
    
    Parameters:
    i - the number of iterations to run
  - getNumIterations
```
public int getNumIterations()
```
    Get the number of iterations (passes over the data) to run in the model building phase. A value greater than 1 only makes sense for incremental classifiers, such as SGD, that converge on a solution.
    
    Returns:
    the number of iterations to run
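
To illustrate why extra passes only help a learner that converges incrementally, here is a minimal, self-contained sketch of plain stochastic gradient descent on a one-parameter model. It is not Weka's SGD and involves no Hadoop; the class `MultiPassSgd`, the toy data, and the learning rate are all hypothetical and chosen for illustration only.

```java
final class MultiPassSgd {

  /**
   * Fit y = w * x by SGD on squared error, sweeping the data `passes` times.
   * Each sweep plays the role of one "iteration" in the sense used above.
   */
  static double fit(double[] x, double[] y, int passes, double learningRate) {
    double w = 0.0;
    for (int p = 0; p < passes; p++) {
      for (int i = 0; i < x.length; i++) {
        double error = y[i] - w * x[i];    // residual for this instance
        w += learningRate * error * x[i];  // gradient step on squared error
      }
    }
    return w;
  }

  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4};
    double[] y = {2, 4, 6, 8}; // generated with a true weight of 2
    System.out.println("after 1 pass:    w = " + fit(x, y, 1, 0.01));
    System.out.println("after 20 passes: w = " + fit(x, y, 20, 0.01));
  }
}
```

With this data a single pass leaves the weight well short of 2, while repeated passes move it steadily closer. A batch learner that builds its final model in one pass gains nothing from being run again on the same data.
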
  - classAttributeTipText
```
public java.lang.String classAttributeTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setClassAttribute
```
public void setClassAttribute(java.lang.String c)
```
    Set the name or index of the class attribute ("first" and "last" can also be used)
    
    Parameters:
    c - the name or index of the class attribute
  - getClassAttribute
```
public java.lang.String getClassAttribute()
```
    Get the name or index of the class attribute ("first" and "last" can also be used)
    
    Returns:
    the name or index of the class attribute
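
As a rough illustration of how a specification such as "first", "last", an index, or a name could be resolved against a list of attribute names, consider the sketch below. The class `ClassAttributeSpec` and its `resolve` method are hypothetical and are not part of this API. The sketch assumes that numeric indices are 1-based and that a numeric value is tried before a name lookup; neither assumption is confirmed by this page.

```java
import java.util.Arrays;
import java.util.List;

final class ClassAttributeSpec {

  /**
   * Resolve a class attribute specification to a zero-based position.
   * Assumptions (not taken from the library): indices are 1-based and a
   * numeric specification takes precedence over a name lookup.
   */
  static int resolve(String spec, List<String> attributeNames) {
    if (attributeNames.isEmpty()) {
      throw new IllegalArgumentException("No attributes available");
    }
    String s = spec.trim();
    if (s.equalsIgnoreCase("first")) {
      return 0;
    }
    if (s.equalsIgnoreCase("last")) {
      return attributeNames.size() - 1;
    }
    try {
      int oneBased = Integer.parseInt(s);
      if (oneBased < 1 || oneBased > attributeNames.size()) {
        throw new IllegalArgumentException("Index out of range: " + spec);
      }
      return oneBased - 1;
    } catch (NumberFormatException notANumber) {
      // Not numeric, so treat the specification as an attribute name.
      int idx = attributeNames.indexOf(s);
      if (idx < 0) {
        throw new IllegalArgumentException("Unknown attribute: " + spec);
      }
      return idx;
    }
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("sepallength", "petalwidth", "class");
    System.out.println(resolve("last", names));  // position of the last attribute
    System.out.println(resolve("2", names));     // second attribute, zero-based
    System.out.println(resolve("class", names)); // lookup by name
  }
}
```
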
  - setCSVMapTaskOptions
```
public void setCSVMapTaskOptions(java.lang.String opts)
```
    Set the options to the header job
    
    Parameters:
    opts - options to the header job
  - getCSVMapTaskOptions
```
public java.lang.String getCSVMapTaskOptions()
```
    Get the options to the header job
    
    Returns:
    options to the header job
  - setClassifierMapTaskOptions
```
public void setClassifierMapTaskOptions(java.lang.String opts)
```
    Set the options for the underlying map task
    
    Parameters:
    opts - the options for the underlying map task
  - getClassifierMapTaskOptions
```
public java.lang.String getClassifierMapTaskOptions()
```
    Get the options for the underlying map task
    
    Returns:
    the options for the underlying map task
  - minTrainingFractionTipText
```
public java.lang.String minTrainingFractionTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setMinTrainingFraction
```
public void setMinTrainingFraction(java.lang.String frac)
```
    Set the minimum training fraction. This is expressed as a percentage of the total number of instances seen by the map task that has seen the most data. The option is useful when randomly shuffled data chunks are not used: one input split may contain significantly less data than the others, and the model learned on that split can then be discarded.
    
    Parameters:
    frac - the fraction of training instances below which a model should be discarded from the aggregation
  - getMinTrainingFraction
```
public java.lang.String getMinTrainingFraction()
```
    Get the minimum training fraction. This is expressed as a percentage of the total number of instances seen by the map task that has seen the most data. The option is useful when randomly shuffled data chunks are not used: one input split may contain significantly less data than the others, and the model learned on that split can then be discarded.
    
    Returns:
    the fraction of training instances below which a model should be discarded from the aggregation
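
The discarding rule described above can be sketched in a few lines. This is an illustration only, not the library's aggregation code: the class `MinTrainingFractionFilter` is hypothetical, and the threshold is assumed to be a value between 0 and 1 (the description mentions both "fraction" and "percentage", so the expected scale should be checked against the tip text).

```java
import java.util.ArrayList;
import java.util.List;

final class MinTrainingFractionFilter {

  /**
   * Return the positions of the map tasks whose models would be kept.
   * A model is kept when its task saw at least minFraction times the number
   * of instances seen by the task that saw the most data.
   * Assumption: minFraction lies in [0, 1].
   */
  static List<Integer> keep(long[] instancesPerMap, double minFraction) {
    long max = 0;
    for (long n : instancesPerMap) {
      max = Math.max(max, n);
    }
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < instancesPerMap.length; i++) {
      if (instancesPerMap[i] >= minFraction * max) {
        kept.add(i);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // The third split is much smaller than the others, e.g. a trailing block.
    long[] seen = {1000, 980, 120};
    // The property is a String in this API, so parse it before use.
    double minFraction = Double.parseDouble("0.5");
    System.out.println("models kept: " + keep(seen, minFraction));
  }
}
```
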
  - pathToPreconstructedFilterTipText
```
public java.lang.String pathToPreconstructedFilterTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setPathToPreconstructedFilter
```
public void setPathToPreconstructedFilter(java.lang.String path)
```
    Set the path to a pre-constructed filter to use to pre-process the data entering each map. This path may be inside or outside of HDFS.
    
    Parameters:
    path - the path to a pre-constructed filter to use
  - getPathToPreconstructedFilter
```
public java.lang.String getPathToPreconstructedFilter()
```
    Get the path to a pre-constructed filter to use to pre-process the data entering each map. This path may be inside or outside of HDFS.
    
    Returns:
    the path to a pre-constructed filter to use
  - createRandomizedDataChunksTipText
```
public java.lang.String createRandomizedDataChunksTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setCreateRandomizedDataChunks
```
public void setCreateRandomizedDataChunks(boolean s)
```
    Set whether to create randomly shuffled (and stratified if the class is nominal) data chunks via a pre-processing pass/job.
    
    Parameters:
    s - true if randomly shuffled data chunks are to be created for input
  - getCreateRandomizedDataChunks
```
public boolean getCreateRandomizedDataChunks()
```
    Get whether to create randomly shuffled (and stratified if the class is nominal) data chunks via a pre-processing pass/job.
    
    Returns:
    true if randomly shuffled data chunks are to be created for input
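
To make "randomly shuffled and stratified" concrete, the following sketch splits rows into chunks so that each chunk receives roughly the same share of every class. It runs in memory on a single machine and is not the pre-processing job this class launches; the class `StratifiedChunks`, the round-robin dealing strategy, and the seed are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class StratifiedChunks {

  /**
   * Assign row indices to numChunks chunks. Rows are grouped by class label,
   * each group is shuffled, and rows are then dealt round-robin so that
   * every chunk receives a similar class distribution.
   */
  static List<List<Integer>> split(String[] classLabels, int numChunks, long seed) {
    // TreeMap keeps the class order deterministic for a given seed.
    Map<String, List<Integer>> rowsByClass = new TreeMap<>();
    for (int row = 0; row < classLabels.length; row++) {
      rowsByClass.computeIfAbsent(classLabels[row], c -> new ArrayList<>()).add(row);
    }

    List<List<Integer>> chunks = new ArrayList<>();
    for (int i = 0; i < numChunks; i++) {
      chunks.add(new ArrayList<>());
    }

    Random random = new Random(seed);
    int next = 0; // carries over between classes to keep chunk sizes even
    for (List<Integer> rows : rowsByClass.values()) {
      Collections.shuffle(rows, random);
      for (int row : rows) {
        chunks.get(next % numChunks).add(row);
        next++;
      }
    }
    return chunks;
  }

  public static void main(String[] args) {
    String[] labels = {"a", "a", "a", "a", "a", "a", "b", "b", "b"};
    List<List<Integer>> chunks = split(labels, 3, 42L);
    for (List<Integer> chunk : chunks) {
      System.out.println(chunk); // each chunk holds two "a" rows and one "b" row
    }
  }
}
```
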
  - numRandomizedDataChunksTipText
```
public java.lang.String numRandomizedDataChunksTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setNumRandomizedDataChunks
```
public void setNumRandomizedDataChunks(java.lang.String chunks)
```
    Set the number of randomly shuffled data chunks to create. Use in conjunction with createRandomizedDataChunks.
    
    Parameters:
    chunks - the number of chunks to create.
  - getNumRandomizedDataChunks
```
public java.lang.String getNumRandomizedDataChunks()
```
    Get the number of randomly shuffled data chunks to create. Use in conjunction with createRandomizedDataChunks.
    
    Returns:
    the number of chunks to create.
  - numInstancesPerRandomizedDataChunkTipText
```
public java.lang.String numInstancesPerRandomizedDataChunkTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setNumInstancesPerRandomizedDataChunk
```
public void setNumInstancesPerRandomizedDataChunk(java.lang.String insts)
```
    Set the number of instances that each randomly shuffled data chunk should have. Use in conjunction with createRandomizedDataChunks.
    
    Parameters:
    insts - the number of instances that each randomly shuffled data chunk should contain
  - getNumInstancesPerRandomizedDataChunk
```
public java.lang.String getNumInstancesPerRandomizedDataChunk()
```
    Get the number of instances that each randomly shuffled data chunk should have. Use in conjunction with createRandomizedDataChunks.
    
    Returns:
    the number of instances that each randomly shuffled data chunk should contain
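
The number of chunks and the number of instances per chunk are two ways of expressing the same split. The sketch below shows the arithmetic that relates them, assuming the total number of instances is known. The class `ChunkSizing` is hypothetical, and how the job reconciles the two properties when both are set is not stated on this page.

```java
final class ChunkSizing {

  /**
   * Number of chunks needed so that no chunk exceeds instancesPerChunk.
   * Uses ceiling division, so a final partial chunk is still counted.
   */
  static long chunksFor(long totalInstances, long instancesPerChunk) {
    if (totalInstances < 0 || instancesPerChunk <= 0) {
      throw new IllegalArgumentException("Sizes must be positive");
    }
    return (totalInstances + instancesPerChunk - 1) / instancesPerChunk;
  }

  public static void main(String[] args) {
    // Both properties are Strings in this API, so parse before computing.
    long perChunk = Long.parseLong("150000");
    long total = 1_000_000L; // hypothetical dataset size
    System.out.println("chunks needed: " + chunksFor(total, perChunk));
  }
}
```
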
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class HadoopJob
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class HadoopJob
    
    Throws:
    
    java.lang.Exception
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class HadoopJob
  - getJobOptionsOnly
```
public java.lang.String[] getJobOptionsOnly()
```
    Get the options for this job only
    
    Returns:
    the options for this job only
  - runJob
```
public boolean runJob()
               throws weka.distributed.DistributedWekaException
```
    Specified by:
    
    runJob in class distributed.core.DistributedJob
    
    Throws:
    
    weka.distributed.DistributedWekaException
  - getClassifier
```
public Classifier getClassifier()
```
    Specified by:
    
    getClassifier in interface ClassifierProducer
  - stopJob
```
public void stopJob()
```
    Overrides:
    
    stopJob in class distributed.core.DistributedJob
  - getTrainingHeader
```
public Instances getTrainingHeader()
```
    Specified by:
    
    getTrainingHeader in interface ClassifierProducer
  - main
```
public static void main(java.lang.String[] args)
```
  - run
```
public void run(java.lang.Object toRun,
       java.lang.String[] args)
         throws java.lang.IllegalArgumentException
```
    Specified by:
    
    run in interface CommandlineRunnable
    
    Throws:
    
    java.lang.IllegalArgumentException
