RandomizedDataChunkHadoopJob

java.lang.Object
- distributed.core.DistributedJob
  - weka.distributed.hadoop.HadoopJob
    - weka.distributed.hadoop.RandomizedDataChunkHadoopJob

All Implemented Interfaces:

java.io.Serializable, CommandlineRunnable, EnvironmentHandler, OptionHandler
```
public class RandomizedDataChunkHadoopJob
extends HadoopJob
implements CommandlineRunnable
```
Job for creating randomly shuffled (and stratified if a nominal class is set) data chunks.
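
The idea of shuffled, stratified chunks can be illustrated with a small in-memory sketch. This is not the job's MapReduce implementation; the class `StratifiedChunkSketch`, its `chunk` method, and the round-robin dealing strategy are assumptions made purely for illustration. Rows are shuffled with a fixed seed, grouped by class label, and then dealt across the chunks so that each chunk keeps roughly the overall class proportions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Hypothetical illustration only; not part of the distributed Weka API.
final class StratifiedChunkSketch {

    /** Shuffles rows, then deals them class by class into numChunks lists. */
    static <T> List<List<T>> chunk(List<T> rows, Function<? super T, String> classOf,
                                   int numChunks, long seed) {
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));

        // Group by class label, preserving the shuffled order within each group.
        Map<String, List<T>> byClass = new LinkedHashMap<>();
        for (T row : shuffled) {
            byClass.computeIfAbsent(classOf.apply(row), k -> new ArrayList<>()).add(row);
        }

        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < numChunks; i++) {
            chunks.add(new ArrayList<>());
        }

        // Deal round-robin; the counter continues across classes so chunk sizes stay balanced.
        int next = 0;
        for (List<T> group : byClass.values()) {
            for (T row : group) {
                chunks.get(next % numChunks).add(row);
                next++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 8; i++) rows.add("yes-" + i);
        for (int i = 0; i < 4; i++) rows.add("no-" + i);

        // The class label is the text before the dash in this toy data.
        List<List<String>> chunks = chunk(rows, r -> r.substring(0, r.indexOf('-')), 4, 42L);
        for (List<String> c : chunks) {
            long no = c.stream().filter(r -> r.startsWith("no-")).count();
            System.out.println(c.size() + " rows, " + no + " 'no'");
        }
    }
}
```

With 8 "yes" rows, 4 "no" rows, and 4 chunks, every chunk in this toy run receives three rows, exactly one of which is a "no" row.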

Version:

$Revision: 11095 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:
Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class distributed.core.DistributedJob
  distributed.core.DistributedJob.JobStatus

Field Summary
- Fields inherited from class weka.distributed.hadoop.HadoopJob
  COLT_JAR, DISTRIBUTED_WEKA_BASE_JAR, DISTRIBUTED_WEKA_HADOOP_JAR, JCOMMON_JAR, JFREECHART_JAR, LA4J_JAR, OPEN_CSV_JAR
- Fields inherited from class distributed.core.DistributedJob
  WEKA_ADDITIONAL_PACKAGES_KEY

Constructor Summary

Constructors
Constructor and Description
`RandomizedDataChunkHadoopJob()`

Method Summary

Methods
Modifier and Type	Method and Description
`java.lang.String`	`classAttributeTipText()` Tip text for this property
`java.lang.String`	`cleanOutputDirectoryTipText()` Tip text for this property
`java.lang.String`	`getClassAttribute()` Get the name or index of the class attribute ("first" and "last" can also be used)
`boolean`	`getCleanOutputDirectory()` Get whether to delete the output directory before running.
`java.lang.String`	`getCSVMapTaskOptions()` Get the options to the header job
`boolean`	`getDontDefaultToLastAttIfClassNotSpecified()` Non-command-line option that lets clients turn off the default behavior of using the last attribute as the class when no class is explicitly specified.
`java.lang.String[]`	`getJobOptionsOnly()` Get the options for this job only
`java.lang.String`	`getNumInstancesPerRandomizedDataChunk()` Get the number of instances that each randomly shuffled data chunk should have.
`java.lang.String`	`getNumRandomizedDataChunks()` Get the number of randomly shuffled data chunks to create.
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`getRandomizedChunkOutputPath()` Get the path to the output directory for this job
`java.lang.String`	`getRandomSeed()` Get the random seed for shuffling the data
`java.lang.String`	`globalInfo()` Help information
`java.util.Enumeration<Option>`	`listOptions()`
`static void`	`main(java.lang.String[] args)`
`java.lang.String`	`numInstancesPerRandomizedDataChunkTipText()` Tip text for this property
`java.lang.String`	`numRandomizedDataChunksTipText()` Tip text for this property
`java.lang.String`	`randomSeedTipText()` Tip text for this property
`void`	`run(java.lang.Object toRun, java.lang.String[] options)`
`boolean`	`runJob()`
`void`	`setClassAttribute(java.lang.String c)` Set the name or index of the class attribute ("first" and "last" can also be used)
`void`	`setCleanOutputDirectory(boolean clean)` Set whether to delete the output directory before running.
`void`	`setCSVMapTaskOptions(java.lang.String opts)` Set the options to the header job
`void`	`setDontDefaultToLastAttIfClassNotSpecified(boolean d)` Non-command-line option that lets clients turn off the default behavior of using the last attribute as the class when no class is explicitly specified.
`void`	`setNumInstancesPerRandomizedDataChunk(java.lang.String insts)` Set the number of instances that each randomly shuffled data chunk should have.
`void`	`setNumRandomizedDataChunks(java.lang.String chunks)` Set the number of randomly shuffled data chunks to create.
`void`	`setOptions(java.lang.String[] options)`
`void`	`setRandomSeed(java.lang.String seed)` Set the random seed for shuffling the data

Methods inherited from class weka.distributed.hadoop.HadoopJob
additionalWekaPackagesTipText, cleanOutputDirectory, deubgTipText, getAdditionalWekaPackages, getBaseOptionsOnly, getDebug, getLoggingInterval, getMapNumber, getMapReduceJobConfig, getMapReduceNumber, getPathToWekaJar, getReduceNumber, loggingIntervalTipText, pathToWekaJarTipText, setAdditionalWekaPackages, setDebug, setLoggingInterval, setMapReduceJobConfig, setPathToWekaJar

Methods inherited from class distributed.core.DistributedJob
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, makeOptionsStr, parseInstance, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stopJob

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - RandomizedDataChunkHadoopJob
```
public RandomizedDataChunkHadoopJob()
```
- Method Detail
  - globalInfo
```
public java.lang.String globalInfo()
```
    Help information
    
    Returns:
    help information for this job
  - setCSVMapTaskOptions
```
public void setCSVMapTaskOptions(java.lang.String opts)
```
    Set the options to the header job
    
    Parameters:
    opts - options to the header job
  - getCSVMapTaskOptions
```
public java.lang.String getCSVMapTaskOptions()
```
    Get the options to the header job
    
    Returns:
    options to the header job
  - numRandomizedDataChunksTipText
```
public java.lang.String numRandomizedDataChunksTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setDontDefaultToLastAttIfClassNotSpecified
```
public void setDontDefaultToLastAttIfClassNotSpecified(boolean d)
```
    Non-command-line option that lets clients turn off the default behavior of using the last attribute as the class when no class is explicitly specified.
    
    Parameters:
    d - true if the last attribute should not be used as the class when the user has not specified one
  - getDontDefaultToLastAttIfClassNotSpecified
```
public boolean getDontDefaultToLastAttIfClassNotSpecified()
```
    Non-command-line option that lets clients turn off the default behavior of using the last attribute as the class when no class is explicitly specified.
    
    Returns:
    true if the last attribute should not be used as the class when the user has not specified one
  - setNumRandomizedDataChunks
```
public void setNumRandomizedDataChunks(java.lang.String chunks)
```
    Set the number of randomly shuffled data chunks to create. Use in conjunction with createRandomizedDataChunks.
    
    Parameters:
    chunks - the number of chunks to create.
  - getNumRandomizedDataChunks
```
public java.lang.String getNumRandomizedDataChunks()
```
    Get the number of randomly shuffled data chunks to create. Use in conjunction with createRandomizedDataChunks.
    
    Returns:
    the number of chunks to create.
  - numInstancesPerRandomizedDataChunkTipText
```
public java.lang.String numInstancesPerRandomizedDataChunkTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setNumInstancesPerRandomizedDataChunk
```
public void setNumInstancesPerRandomizedDataChunk(java.lang.String insts)
```
    Set the number of instances that each randomly shuffled data chunk should have. Use in conjunction with createRandomizedDataChunks.
    
    Parameters:
    insts - the number of instances that each randomly shuffled data chunk should contain
  - getNumInstancesPerRandomizedDataChunk
```
public java.lang.String getNumInstancesPerRandomizedDataChunk()
```
    Get the number of instances that each randomly shuffled data chunk should have. Use in conjunction with createRandomizedDataChunks.
    
    Returns:
    the number of instances that each randomly shuffled data chunk should contain
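
    Both the chunk count and the instances-per-chunk setting are passed as strings in this API. How the job reconciles the two is not stated here. One plausible relationship, given purely as an assumption, is ceiling division of the total instance count by the requested chunk size. The class `ChunkCountSketch` and its `chunksFor` method are hypothetical names.

```java
// Hypothetical illustration only; the real job may derive the chunk count differently.
final class ChunkCountSketch {

    /** Derives a chunk count from a total instance count and a string-valued chunk size. */
    static int chunksFor(long totalInstances, String instsPerChunk) {
        long perChunk = Long.parseLong(instsPerChunk.trim());
        if (totalInstances < 0 || perChunk <= 0) {
            throw new IllegalArgumentException("Counts must be positive");
        }
        // Ceiling division: a final, partially filled chunk still counts as a chunk.
        long chunks = (totalInstances + perChunk - 1) / perChunk;
        // Assumption: always produce at least one chunk.
        return Math.toIntExact(Math.max(1L, chunks));
    }

    public static void main(String[] args) {
        System.out.println(chunksFor(1000L, "300")); // prints 4
    }
}
```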
  - classAttributeTipText
```
public java.lang.String classAttributeTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setClassAttribute
```
public void setClassAttribute(java.lang.String c)
```
    Set the name or index of the class attribute ("first" and "last" can also be used)
    
    Parameters:
    c - the name or index of the class attribute
  - getClassAttribute
```
public java.lang.String getClassAttribute()
```
    Get the name or index of the class attribute ("first" and "last" can also be used)
    
    Returns:
    the name or index of the class attribute
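
    A class attribute specification of this kind ("first", "last", an index, or a name) could be resolved to a position roughly as follows. This is a sketch under assumptions, not the job's own code: the class `ClassAttributeSketch` is hypothetical, numeric indices are assumed to be 1-based, and the `dontDefaultToLast` flag mirrors the intent of `setDontDefaultToLastAttIfClassNotSpecified`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical illustration only; not part of the distributed Weka API.
final class ClassAttributeSketch {

    /** Resolves a class attribute spec to a zero-based index, or empty if no class is set. */
    static OptionalInt resolve(String spec, List<String> attributeNames, boolean dontDefaultToLast) {
        int n = attributeNames.size();
        if (spec == null || spec.trim().isEmpty()) {
            // No class specified: fall back to the last attribute unless the caller opted out.
            return (dontDefaultToLast || n == 0) ? OptionalInt.empty() : OptionalInt.of(n - 1);
        }
        if (n == 0) {
            throw new IllegalArgumentException("No attributes available");
        }
        String s = spec.trim();
        if (s.equalsIgnoreCase("first")) return OptionalInt.of(0);
        if (s.equalsIgnoreCase("last")) return OptionalInt.of(n - 1);

        int oneBased;
        try {
            oneBased = Integer.parseInt(s); // assumption: indices are 1-based
        } catch (NumberFormatException e) {
            // Not a number, so treat the spec as an attribute name.
            int idx = attributeNames.indexOf(s);
            if (idx < 0) throw new IllegalArgumentException("Unknown attribute: " + s);
            return OptionalInt.of(idx);
        }
        if (oneBased < 1 || oneBased > n) {
            throw new IllegalArgumentException("Index out of range: " + s);
        }
        return OptionalInt.of(oneBased - 1);
    }

    public static void main(String[] args) {
        List<String> atts = Arrays.asList("outlook", "temperature", "play");
        System.out.println(resolve("last", atts, false)); // prints OptionalInt[2]
        System.out.println(resolve("", atts, true));      // prints OptionalInt.empty
    }
}
```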
  - randomSeedTipText
```
public java.lang.String randomSeedTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setRandomSeed
```
public void setRandomSeed(java.lang.String seed)
```
    Set the random seed for shuffling the data
    
    Parameters:
    seed - the random seed to use
  - getRandomSeed
```
public java.lang.String getRandomSeed()
```
    Get the random seed for shuffling the data
    
    Returns:
    the random seed to use
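
    The practical value of a fixed seed is reproducibility: the same seed yields the same shuffle order. The sketch below shows this with `java.util.Random` on an in-memory list. The class `SeedSketch` is hypothetical, and parsing the string seed as a `long` is an assumption about how the value might be interpreted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of the distributed Weka API.
final class SeedSketch {

    /** Returns the row indices 0..n-1 shuffled with the given string-valued seed. */
    static List<Integer> shuffled(int n, String seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(i);
        }
        // Assumption: the seed string holds a plain integer.
        Collections.shuffle(rows, new Random(Long.parseLong(seed.trim())));
        return rows;
    }

    public static void main(String[] args) {
        // Two runs with the same seed produce identical orderings.
        System.out.println(shuffled(10, "1").equals(shuffled(10, "1"))); // prints true
    }
}
```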
  - cleanOutputDirectoryTipText
```
public java.lang.String cleanOutputDirectoryTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setCleanOutputDirectory
```
public void setCleanOutputDirectory(boolean clean)
```
    Set whether to delete the output directory before running. If the output directory already exists and is populated with chunk files, deleting it before running forces the job to run.
    
    Parameters:
    clean - true if the output directory should be deleted first (thus forcing the job to run even if a populated output directory already exists).
  - getCleanOutputDirectory
```
public boolean getCleanOutputDirectory()
```
    Get whether to delete the output directory before running. If the output directory already exists and is populated with chunk files, deleting it before running forces the job to run.
    
    Returns:
    true if the output directory should be deleted first (thus forcing the job to run even if a populated output directory already exists).
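
    The description implies that a populated output directory would otherwise let the job skip its work. That decision can be sketched on a local filesystem as follows. This is an assumption-laden illustration: the real job works against Hadoop storage, and the class `CleanOutputSketch`, its methods, and the file name `chunk0.csv` are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; uses the local filesystem instead of HDFS.
final class CleanOutputSketch {

    /** Returns true if the chunks need to be (re)created. */
    static boolean shouldRun(Path outputDir, boolean clean) {
        try {
            if (clean && Files.exists(outputDir)) {
                deleteRecursively(outputDir); // forces a fresh run
            }
            if (!Files.isDirectory(outputDir)) {
                return true; // nothing there yet
            }
            try (Stream<Path> entries = Files.list(outputDir)) {
                return !entries.findAny().isPresent(); // an empty directory also means "run"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Builds a populated temp directory and evaluates both settings of the flag. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("chunks");
            Files.createFile(dir.resolve("chunk0.csv"));
            boolean withoutClean = shouldRun(dir, false);
            boolean withClean = shouldRun(dir, true);
            return new boolean[] { withoutClean, withClean };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("run without clean: " + r[0]); // prints false
        System.out.println("run with clean: " + r[1]);    // prints true
    }
}
```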
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class HadoopJob
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class HadoopJob
    
    Throws:
    
    java.lang.Exception
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class HadoopJob
  - getJobOptionsOnly
```
public java.lang.String[] getJobOptionsOnly()
```
    Get the options for this job only
    
    Returns:
    the options for this job only
  - runJob
```
public boolean runJob()
               throws weka.distributed.DistributedWekaException
```
    Specified by:
    
    runJob in class distributed.core.DistributedJob
    
    Throws:
    
    weka.distributed.DistributedWekaException
  - getRandomizedChunkOutputPath
```
public java.lang.String getRandomizedChunkOutputPath()
```
    Get the path to the output directory for this job
    
    Returns:
    the path to the output directory for this job
  - run
```
public void run(java.lang.Object toRun,
       java.lang.String[] options)
         throws java.lang.IllegalArgumentException
```
    Specified by:
    
    run in interface CommandlineRunnable
    
    Throws:
    
    java.lang.IllegalArgumentException
  - main
```
public static void main(java.lang.String[] args)
```
