public class CanopyClustererSparkJob extends SparkJob implements CommandlineRunnable
SparkJob.NoKeyTextOutputFormat<K,V>
TEST_DATA, TRAINING_DATA
Constructor and Description |
---|
CanopyClustererSparkJob()
Constructor
|
Modifier and Type | Method and Description |
---|---|
java.lang.String |
assignCanopiesToTrainingDataTipText()
Tip text for this property
|
boolean |
getAssignCanopiesToTrainingData()
Get whether to assign canopies to the training data
|
java.lang.String |
getCanopyMapTaskOptions() |
java.lang.String |
getCSVMapTaskOptions()
Get the options to the header job
|
org.apache.spark.api.java.JavaRDD<InstanceWithCanopyAssignments> |
getDataWithCanopiesAssigned() |
java.lang.String |
getMaxNumCanopiesReducePhase()
Get the maximum number of canopies to form in the reduce phase
|
java.lang.String |
getModelFileName()
Get the name only for the model file
|
java.lang.String[] |
getOptions() |
boolean |
getRandomizeAndStratify()
Get whether to randomize (and stratify) the input data or not
|
java.lang.String |
getRandomizedJobOptions()
Get the options for the randomize/stratify task
|
java.lang.String |
getT1ReducePhase()
Get the T1 distance to use in the reduce phase
|
java.lang.String |
getT2ReducePhase()
Get the T2 distance to use in the reduce phase
|
java.util.Enumeration<Option> |
listOptions() |
static void |
main(java.lang.String[] args) |
java.lang.String |
maxNumCanopiesReducePhaseTipText()
Tip text for this property
|
java.lang.String |
modelFileNameTipText()
Tip text for this property
|
void |
run(java.lang.Object toRun,
java.lang.String[] options) |
boolean |
runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext)
Clients to implement
|
void |
setAssignCanopiesToTrainingData(boolean assign)
Set whether to assign canopies to the training data
|
void |
setCanopyMapTaskOptions(java.lang.String opts) |
void |
setCSVMapTaskOptions(java.lang.String opts)
Set the options to the header job
|
void |
setMaxNumCanopiesReducePhase(java.lang.String max)
Set the maximum number of canopies to form in the reduce phase
|
void |
setModelFileName(java.lang.String m)
Set the name only for the model file
|
void |
setOptions(java.lang.String[] options) |
void |
setRandomizeAndStratify(boolean r)
Set whether to randomize (and stratify) the input data or not
|
void |
setRandomizeJobOptions(java.lang.String opts)
Set the options for the randomize/stratify task
|
void |
setT1ReducePhase(java.lang.String t1)
Set the T1 distance to use in the reduce phase
|
void |
setT2ReducePhase(java.lang.String t2)
Set the T2 distance to use in the reduce phase
|
java.lang.String |
t1ReducePhaseTipText()
Tip text for this property
|
java.lang.String |
t2ReducePhaseTipText()
Tip text for this property
|
addSubdirToPath, checkFileExists, createSparkContextForJob, debugTipText, deleteDirectory, getBaseOptionsOnly, getCachingStrategy, getDataset, getDatasets, getDebug, getFSConfigurationForPath, getSizeInBytesOfPath, getSparkContext, getSparkJobConfig, initJob, initSparkLogAppender, loadCSVFile, loadInput, loadInstanceObjectFile, openFileForRead, openFileForWrite, openTextFileForWrite, removeSparkLogAppender, resolveLocalOrOtherFileSystemPath, runJob, setCachingStrategy, setDataset, setDebug, shutdownJob, stringRDDToInstanceRDD
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
postExecution, preExecution
public static void main(java.lang.String[] args)
public java.util.Enumeration<Option> listOptions()
listOptions
in interface OptionHandler
listOptions
in class SparkJob
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class SparkJob
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions
in interface OptionHandler
setOptions
in class SparkJob
java.lang.Exception
public java.lang.String getCanopyMapTaskOptions()
public void setCanopyMapTaskOptions(java.lang.String opts)
public java.lang.String getCSVMapTaskOptions()
public void setCSVMapTaskOptions(java.lang.String opts)
opts
- options to the header job

public boolean getRandomizeAndStratify()
public void setRandomizeAndStratify(boolean r)
r
- true if the input data is to be randomized and stratified

public java.lang.String modelFileNameTipText()
public java.lang.String getModelFileName()
public void setModelFileName(java.lang.String m)
m
- the name only (not full path) that the model should be saved to

public void setRandomizeJobOptions(java.lang.String opts)
opts
- the options for the randomize task

public java.lang.String getRandomizedJobOptions()
public java.lang.String maxNumCanopiesReducePhaseTipText()
public java.lang.String getMaxNumCanopiesReducePhase()
public void setMaxNumCanopiesReducePhase(java.lang.String max)
max
- the maximum number of canopies to form in the reduce phase

public java.lang.String t1ReducePhaseTipText()
public java.lang.String getT1ReducePhase()
public void setT1ReducePhase(java.lang.String t1)
t1
- the T1 distance to use in the reduce phase

public java.lang.String t2ReducePhaseTipText()
public java.lang.String getT2ReducePhase()
public void setT2ReducePhase(java.lang.String t2)
t2
- the T2 distance to use in the reduce phase

public java.lang.String assignCanopiesToTrainingDataTipText()
public boolean getAssignCanopiesToTrainingData()
public void setAssignCanopiesToTrainingData(boolean assign)
assign
- true if the canopies found are to be assigned to the training
data (thus creating a new RDD)

public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext) throws java.io.IOException, weka.distributed.DistributedWekaException
SparkJob
runJobWithContext
in class SparkJob
sparkContext
- the context to use

java.io.IOException
- if an IO problem occurs

weka.distributed.DistributedWekaException
- if any other problem occurs

public org.apache.spark.api.java.JavaRDD&lt;InstanceWithCanopyAssignments&gt; getDataWithCanopiesAssigned() throws weka.distributed.DistributedWekaException
public void run(java.lang.Object toRun, java.lang.String[] options) throws java.lang.IllegalArgumentException
run
in interface CommandlineRunnable
run
in class distributed.core.DistributedJob
java.lang.IllegalArgumentException