KMeansClustererHadoopJob

java.lang.Object
- distributed.core.DistributedJob
  - weka.distributed.hadoop.HadoopJob
    - weka.distributed.hadoop.KMeansClustererHadoopJob

All Implemented Interfaces:

java.io.Serializable, CommandlineRunnable, EnvironmentHandler, OptionHandler, ClustererProducer, TextProducer
```
public class KMeansClustererHadoopJob
extends HadoopJob
implements CommandlineRunnable, TextProducer, ClustererProducer
```
Hadoop job for building a k-means clusterer using either random centroid initialization or k-means|| initialization.

Version:

$Revision: 11164 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:
Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class distributed.core.DistributedJob
  distributed.core.DistributedJob.JobStatus

Field Summary
- Fields inherited from class weka.distributed.hadoop.HadoopJob
  COLT_JAR, DISTRIBUTED_WEKA_BASE_JAR, DISTRIBUTED_WEKA_HADOOP_JAR, JCOMMON_JAR, JFREECHART_JAR, LA4J_JAR, OPEN_CSV_JAR
- Fields inherited from class distributed.core.DistributedJob
  WEKA_ADDITIONAL_PACKAGES_KEY

Constructor Summary

Constructors
Constructor and Description

KMeansClustererHadoopJob()
Constructor

Method Summary

Methods
Modifier and Type	Method and Description
`java.lang.String`	`convergenceToleranceTipText()` Tip text for this property
`java.lang.String`	`displayCentroidStdDevsTipText()` Tip text for this property
`Clusterer`	`getClusterer()`
`double`	`getConvergenceTolerance()` Get the convergence tolerance
`java.lang.String`	`getCSVMapTaskOptions()` Get the options to the header job
`boolean`	`getDisplayCentroidStdDevs()` Get whether to display the standard deviations of centroids in textual output of the model
`boolean`	`getInitWithRandomCentroids()` Get whether to initialize with randomly selected centroids rather than using the k-means|| initialization procedure.
`java.lang.String[]`	`getJobOptionsOnly()` Get the options for this job only
`java.lang.String`	`getKMeansParallelInitSteps()` Get the number of iterations of the k-means|| initialization routine to perform
`java.lang.String`	`getModelFileName()` Get the name only for the model file
`java.lang.String`	`getNumClusters()` Get the number of clusters to find
`java.lang.String`	`getNumIterations()` Get the maximum number of k-means iterations to perform
`java.lang.String`	`getNumNodesInCluster()` Get the number of nodes in the Hadoop cluster
`java.lang.String`	`getNumRuns()` Get the number of k-means runs to perform in parallel
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`getRandomizedJobOptions()` Get the options for the randomize/stratify task
`boolean`	`getRandomlyShuffleData()` Get whether to randomly shuffle the order of the instances in the input data before clustering
`java.lang.String`	`getRandomlyShuffleDataNumChunks()` Get the number of randomly shuffled data chunks to create.
`java.lang.String`	`getRandomSeed()` Get the seed for random number generation
`java.lang.String`	`getText()`
`Instances`	`getTrainingHeader()`
`java.lang.String`	`globalInfo()` Help information
`java.lang.String`	`initWithRandomCentroidsTipText()` Tip text for this property
`java.lang.String`	`kMeansParallelInitStepsTipText()` Tip text for this property.
`java.util.Enumeration<Option>`	`listOptions()`
`static void`	`main(java.lang.String[] args)` Main method for executing this job from the command line
`java.lang.String`	`modelFileNameTipText()` Tip text for this property
`java.lang.String`	`numClustersTipText()` Tip text for this property.
`java.lang.String`	`numIterationsTipText()` Tip text for this property.
`java.lang.String`	`numNodesInClusterTipText()` Tip text for this property
`java.lang.String`	`numRunsTipText()` Tip text for this property.
`java.lang.String`	`randomlyShuffleDataNumChunksTipText()` Tip text for this property
`java.lang.String`	`randomlyShuffleDataTipText()` Tip text for this property
`java.lang.String`	`randomSeedTipText()` Tip text for this property.
`void`	`run(java.lang.Object toRun, java.lang.String[] args)`
`boolean`	`runJob()`
`void`	`setConvergenceTolerance(double tol)` Set the convergence tolerance
`void`	`setCSVMapTaskOptions(java.lang.String opts)` Set the options to the header job
`void`	`setDisplayCentroidStdDevs(boolean d)` Set whether to display the standard deviations of centroids in textual output of the model
`void`	`setInitWithRandomCentroids(boolean init)` Set whether to initialize with randomly selected centroids rather than using the k-means|| initialization procedure.
`void`	`setKMeansParallelInitSteps(java.lang.String steps)` Set the number of iterations of the k-means|| initialization routine to perform
`void`	`setModelFileName(java.lang.String m)` Set the name only for the model file
`void`	`setNumClusters(java.lang.String numClusters)` Set the number of clusters to find
`void`	`setNumIterations(java.lang.String numIts)` Set the maximum number of k-means iterations to perform
`void`	`setNumNodesInCluster(java.lang.String n)` Set the number of nodes in the Hadoop cluster
`void`	`setNumRuns(java.lang.String numRuns)` Set the number of k-means runs to perform in parallel
`void`	`setOptions(java.lang.String[] options)`
`void`	`setRandomizeJobOptions(java.lang.String opts)` Set the options for the randomize/stratify task
`void`	`setRandomlyShuffleData(boolean r)` Set whether to randomly shuffle the order of the instances in the input data before clustering
`void`	`setRandomlyShuffleDataNumChunks(java.lang.String chunks)` Set the number of randomly shuffled data chunks to create.
`void`	`setRandomSeed(java.lang.String seed)` Set the seed for random number generation
`void`	`stopJob()`

Methods inherited from class weka.distributed.hadoop.HadoopJob
additionalWekaPackagesTipText, cleanOutputDirectory, deubgTipText, getAdditionalWekaPackages, getBaseOptionsOnly, getDebug, getLoggingInterval, getMapNumber, getMapReduceJobConfig, getMapReduceNumber, getPathToWekaJar, getReduceNumber, loggingIntervalTipText, pathToWekaJarTipText, setAdditionalWekaPackages, setDebug, setLoggingInterval, setMapReduceJobConfig, setPathToWekaJar

Methods inherited from class distributed.core.DistributedJob
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, makeOptionsStr, parseInstance, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - KMeansClustererHadoopJob
```
public KMeansClustererHadoopJob()
```
    Constructor
- Method Detail
  - globalInfo
```
public java.lang.String globalInfo()
```
    Help information
    
    Returns:
    the help information for this job
  - convergenceToleranceTipText
```
public java.lang.String convergenceToleranceTipText()
```
    Tip text for this property
    
    Returns:
    the tip text for this property
  - setConvergenceTolerance
```
public void setConvergenceTolerance(double tol)
```
    Set the convergence tolerance
    
    Parameters:
    tol - the convergence tolerance
  - getConvergenceTolerance
```
public double getConvergenceTolerance()
```
    Get the convergence tolerance
    
    Returns:
    the convergence tolerance
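    How the tolerance is applied internally is not stated here. One common criterion, assumed in the sketch below, is to stop once no centroid has moved by more than the tolerance (Euclidean distance) between two consecutive iterations. The class and method names are hypothetical and are not part of the Weka API.

```java
/**
 * Hypothetical helper illustrating a centroid-movement convergence test.
 * Not part of Weka; assumes dense numeric centroids of equal dimension.
 */
final class ConvergenceCheck {

  private ConvergenceCheck() { }

  /** Euclidean distance between two points of the same dimension. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns true if every centroid moved by at most {@code tol}
   * between the previous and the current iteration.
   */
  static boolean hasConverged(double[][] previous, double[][] current, double tol) {
    for (int k = 0; k < current.length; k++) {
      if (distance(previous[k], current[k]) > tol) {
        return false;
      }
    }
    return true;
  }
}
```

    A smaller tolerance makes the stopping test stricter, so more iterations (up to the configured maximum) may be needed.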
  - initWithRandomCentroidsTipText
```
public java.lang.String initWithRandomCentroidsTipText()
```
    Tip text for this property
    
    Returns:
    the tip text for this property
  - setInitWithRandomCentroids
```
public void setInitWithRandomCentroids(boolean init)
```
    Set whether to initialize with randomly selected centroids rather than using the k-means|| initialization procedure.
    
    Parameters:
    init - true if randomly selected initial centroids are to be used
  - getInitWithRandomCentroids
```
public boolean getInitWithRandomCentroids()
```
    Get whether to initialize with randomly selected centroids rather than using the k-means|| initialization procedure.
    
    Returns:
    true if randomly selected initial centroids are to be used
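    With random initialization, k distinct training instances are typically taken as the starting centroids. The sketch below assumes that approach and uses a seeded java.util.Random so the choice is reproducible for a given seed. The class name is hypothetical, and the way Weka actually draws the instances (for example, across map tasks) may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: choose k distinct instances as initial centroids. */
final class RandomCentroidInit {

  private RandomCentroidInit() { }

  /** Returns copies of k distinct rows of {@code data}, chosen with the given seed. */
  static double[][] pick(double[][] data, int k, long seed) {
    if (k > data.length) {
      throw new IllegalArgumentException("k exceeds the number of instances");
    }
    List<Integer> indices = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      indices.add(i);
    }
    // A seeded shuffle makes the selection reproducible for a given seed.
    Collections.shuffle(indices, new Random(seed));
    double[][] centroids = new double[k][];
    for (int j = 0; j < k; j++) {
      centroids[j] = data[indices.get(j)].clone();
    }
    return centroids;
  }
}
```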
  - numNodesInClusterTipText
```
public java.lang.String numNodesInClusterTipText()
```
    Tip text for this property
    
    Returns:
    the tip text for this property
  - setNumNodesInCluster
```
public void setNumNodesInCluster(java.lang.String n)
```
    Set the number of nodes in the Hadoop cluster
    
    Parameters:
    n - the number of nodes in the Hadoop cluster
  - getNumNodesInCluster
```
public java.lang.String getNumNodesInCluster()
```
    Get the number of nodes in the Hadoop cluster
    
    Returns:
    the number of nodes in the Hadoop cluster
  - randomlyShuffleDataNumChunksTipText
```
public java.lang.String randomlyShuffleDataNumChunksTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setRandomlyShuffleDataNumChunks
```
public void setRandomlyShuffleDataNumChunks(java.lang.String chunks)
```
    Set the number of randomly shuffled data chunks to create. Only relevant when random shuffling of the input data is enabled (see setRandomlyShuffleData(boolean)).
    
    Parameters:
    chunks - the number of chunks to create.
  - getRandomlyShuffleDataNumChunks
```
public java.lang.String getRandomlyShuffleDataNumChunks()
```
    Get the number of randomly shuffled data chunks to create. Only relevant when random shuffling of the input data is enabled (see getRandomlyShuffleData()).
    
    Returns:
    the number of chunks to create.
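    Shuffling into chunks can be pictured as a seeded shuffle of all instances followed by dealing them round-robin into the requested number of chunks. This is a sketch of the general idea under that assumption, not Weka's randomize/stratify task, which runs as its own Hadoop job; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: shuffle rows with a seed, then deal them into chunks. */
final class ShuffleChunks {

  private ShuffleChunks() { }

  static <T> List<List<T>> shuffleIntoChunks(List<T> rows, int numChunks, long seed) {
    if (numChunks < 1) {
      throw new IllegalArgumentException("numChunks must be >= 1");
    }
    // Shuffle a copy so the caller's list is left untouched.
    List<T> copy = new ArrayList<>(rows);
    Collections.shuffle(copy, new Random(seed));

    List<List<T>> chunks = new ArrayList<>();
    for (int c = 0; c < numChunks; c++) {
      chunks.add(new ArrayList<>());
    }
    // Round-robin assignment keeps chunk sizes within one of each other.
    for (int i = 0; i < copy.size(); i++) {
      chunks.get(i % numChunks).add(copy.get(i));
    }
    return chunks;
  }
}
```

    Round-robin dealing keeps the chunks balanced, so each downstream map task receives a similar amount of randomly ordered data.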
  - modelFileNameTipText
```
public java.lang.String modelFileNameTipText()
```
    Tip text for this property
    
    Returns:
    tip text for this property
  - setModelFileName
```
public void setModelFileName(java.lang.String m)
```
    Set the name only for the model file
    
    Parameters:
    m - the name only (not full path) that the model should be saved to
  - getModelFileName
```
public java.lang.String getModelFileName()
```
    Get the name only for the model file
    
    Returns:
    the name only (not full path) that the model should be saved to
  - randomlyShuffleDataTipText
```
public java.lang.String randomlyShuffleDataTipText()
```
    Tip text for this property
    
    Returns:
    the tip text for this property
  - setRandomlyShuffleData
```
public void setRandomlyShuffleData(boolean r)
```
    Set whether to randomly shuffle the order of the instances in the input data before clustering
    
    Parameters:
    r - true if the data should be randomly shuffled
  - getRandomlyShuffleData
```
public boolean getRandomlyShuffleData()
```
    Get whether to randomly shuffle the order of the instances in the input data before clustering
    
    Returns:
    true if the data should be randomly shuffled
  - numClustersTipText
```
public java.lang.String numClustersTipText()
```
    Tip text for this property.
    
    Returns:
    the tip text for this property
  - setNumClusters
```
public void setNumClusters(java.lang.String numClusters)
```
    Set the number of clusters to find
    
    Parameters:
    numClusters - the number of clusters to find
  - getNumClusters
```
public java.lang.String getNumClusters()
```
    Get the number of clusters to find
    
    Returns:
    the number of clusters to find
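    Numeric settings such as the number of clusters are typed as String in this class, presumably so that a value can be an environment-variable reference resolved at run time (the class implements EnvironmentHandler and inherits environmentSubstitute from DistributedJob). The sketch below illustrates that idea with a hypothetical resolver that replaces ${NAME} from a map and then parses the integer; it is not DistributedJob.environmentSubstitute, whose exact behaviour may differ.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: resolve ${NAME} references, then parse an integer. */
final class StringPropertyResolver {

  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  private StringPropertyResolver() { }

  /** Replaces each ${NAME} with its value from env; unknown names are left as-is. */
  static String substitute(String value, Map<String, String> env) {
    Matcher m = VAR.matcher(value);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String replacement = env.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  /** Resolves variables, then parses a positive integer such as the number of clusters. */
  static int resolvePositiveInt(String value, Map<String, String> env) {
    int n = Integer.parseInt(substitute(value, env).trim());
    if (n < 1) {
      throw new IllegalArgumentException("expected a positive integer: " + value);
    }
    return n;
  }
}
```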
  - numRunsTipText
```
public java.lang.String numRunsTipText()
```
    Tip text for this property.
    
    Returns:
    the tip text for this property
  - setNumRuns
```
public void setNumRuns(java.lang.String numRuns)
```
    Set the number of k-means runs to perform in parallel
    
    Parameters:
    numRuns - the number of k-means runs to perform in parallel
  - getNumRuns
```
public java.lang.String getNumRuns()
```
    Get the number of k-means runs to perform in parallel
    
    Returns:
    the number of k-means runs to perform in parallel
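    Performing several k-means runs usually means starting each run from different initial centroids and keeping the run with the lowest within-cluster sum of squared errors (SSE). The sketch below assumes that selection criterion; it does not show how the runs are spread over Hadoop tasks, and its names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: pick the best of several k-means runs by SSE. */
final class BestRunSelector {

  private BestRunSelector() { }

  static double sqDist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return s;
  }

  /** Within-cluster SSE: each point measured against its nearest centroid. */
  static double sse(double[][] data, double[][] centroids) {
    double total = 0.0;
    for (double[] x : data) {
      double best = Double.POSITIVE_INFINITY;
      for (double[] c : centroids) {
        best = Math.min(best, sqDist(x, c));
      }
      total += best;
    }
    return total;
  }

  /** Returns the index of the run whose final centroids give the lowest SSE. */
  static int bestRun(double[][] data, List<double[][]> runs) {
    int best = -1;
    double bestSse = Double.POSITIVE_INFINITY;
    for (int r = 0; r < runs.size(); r++) {
      double s = sse(data, runs.get(r));
      if (s < bestSse) {
        bestSse = s;
        best = r;
      }
    }
    return best;
  }
}
```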
  - numIterationsTipText
```
public java.lang.String numIterationsTipText()
```
    Tip text for this property.
    
    Returns:
    the tip text for this property
  - setNumIterations
```
public void setNumIterations(java.lang.String numIts)
```
    Set the maximum number of k-means iterations to perform
    
    Parameters:
    numIts - the maximum number of iterations to perform
  - getNumIterations
```
public java.lang.String getNumIterations()
```
    Get the maximum number of k-means iterations to perform
    
    Returns:
    the maximum number of iterations to perform
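    Each k-means iteration maps naturally onto MapReduce: a map task assigns the points in its split to the nearest centroid and emits per-cluster partial sums and counts, and a reducer adds the partials and divides to obtain the new centroids. The sketch below simulates that pattern on in-memory splits for a fixed maximum number of iterations. It illustrates the general technique, not the code KMeansClustererHadoopJob runs; all names are hypothetical, and early stopping via the convergence tolerance is omitted for brevity.

```java
/** Hypothetical in-memory simulation of map/reduce k-means iterations. */
final class MapReduceKMeansSketch {

  private MapReduceKMeansSketch() { }

  /** Partial statistics one map task would emit: per-cluster sums and counts. */
  static final class Partial {
    final double[][] sums;
    final long[] counts;

    Partial(int k, int dim) {
      sums = new double[k][dim];
      counts = new long[k];
    }
  }

  static int nearest(double[] x, double[][] centroids) {
    int best = 0;
    double bestD = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double s = 0.0;
      for (int d = 0; d < x.length; d++) {
        double diff = x[d] - centroids[c][d];
        s += diff * diff;
      }
      if (s < bestD) {
        bestD = s;
        best = c;
      }
    }
    return best;
  }

  /** "Map" phase: assign each point in one split to its nearest centroid. */
  static Partial map(double[][] split, double[][] centroids) {
    Partial p = new Partial(centroids.length, centroids[0].length);
    for (double[] x : split) {
      int c = nearest(x, centroids);
      p.counts[c]++;
      for (int d = 0; d < x.length; d++) {
        p.sums[c][d] += x[d];
      }
    }
    return p;
  }

  /** "Reduce" phase: combine partials; an empty cluster keeps its old centroid. */
  static double[][] reduce(Partial[] partials, double[][] old) {
    int k = old.length;
    int dim = old[0].length;
    double[][] next = new double[k][];
    for (int c = 0; c < k; c++) {
      double[] sum = new double[dim];
      long n = 0;
      for (Partial p : partials) {
        n += p.counts[c];
        for (int d = 0; d < dim; d++) {
          sum[d] += p.sums[c][d];
        }
      }
      if (n == 0) {
        next[c] = old[c].clone();
        continue;
      }
      for (int d = 0; d < dim; d++) {
        sum[d] /= n;
      }
      next[c] = sum;
    }
    return next;
  }

  /** Runs up to maxIterations map/reduce rounds over the given splits. */
  static double[][] run(double[][][] splits, double[][] init, int maxIterations) {
    double[][] centroids = init;
    for (int it = 0; it < maxIterations; it++) {
      Partial[] partials = new Partial[splits.length];
      for (int s = 0; s < splits.length; s++) {
        partials[s] = map(splits[s], centroids);
      }
      centroids = reduce(partials, centroids);
    }
    return centroids;
  }
}
```

    Only k sums and k counts per split cross the network, which is why this formulation scales with the number of map tasks rather than the size of the data.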
  - randomSeedTipText
```
public java.lang.String randomSeedTipText()
```
    Tip text for this property.
    
    Returns:
    the tip text for this property
  - setRandomSeed
```
public void setRandomSeed(java.lang.String seed)
```
    Set the seed for random number generation
    
    Parameters:
    seed - the seed for the random number generator
  - getRandomSeed
```
public java.lang.String getRandomSeed()
```
    Get the seed for random number generation
    
    Returns:
    the seed for the random number generator
  - kMeansParallelInitStepsTipText
```
public java.lang.String kMeansParallelInitStepsTipText()
```
    Tip text for this property.
    
    Returns:
    the tip text for this property
  - setKMeansParallelInitSteps
```
public void setKMeansParallelInitSteps(java.lang.String steps)
```
    Set the number of iterations of the k-means|| initialization routine to perform
    
    Parameters:
    steps - the number of iterations of the k-means|| init routine to perform
  - getKMeansParallelInitSteps
```
public java.lang.String getKMeansParallelInitSteps()
```
    Get the number of iterations of the k-means|| initialization routine to perform
    
    Returns:
    the number of iterations of the k-means|| init routine to perform
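    k-means|| (Bahmani et al., "Scalable K-Means++") seeds k-means with a small number of oversampling rounds: each round keeps every point independently with probability proportional to its squared distance from the current candidate set, and the weighted candidates are finally reduced to k centroids. The single-machine sketch below follows that outline. The oversampling factor l, the use of weighted k-means++ for the reduction step, and all names are assumptions; in KMeansClustererHadoopJob the rounds would run as distributed passes over the data rather than in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical single-machine sketch of k-means|| seeding; not Weka's code. */
final class KMeansParallelInit {

  private KMeansParallelInit() { }

  static double sqDist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return s;
  }

  /** Index of the entry in {@code centers} closest to {@code x}. */
  static int nearest(double[] x, List<double[]> centers) {
    int best = 0;
    double bestD = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centers.size(); c++) {
      double d = sqDist(x, centers.get(c));
      if (d < bestD) {
        bestD = d;
        best = c;
      }
    }
    return best;
  }

  /** Index drawn with probability proportional to weights, or -1 if all are zero. */
  static int sample(double[] weights, Random rnd) {
    double total = 0.0;
    for (double v : weights) total += v;
    if (total <= 0.0) return -1;
    double r = rnd.nextDouble() * total;
    int last = -1;
    for (int i = 0; i < weights.length; i++) {
      if (weights[i] <= 0.0) continue;
      last = i;
      r -= weights[i];
      if (r < 0.0) return i;
    }
    return last; // guards against floating-point round-off
  }

  /**
   * @param steps number of oversampling rounds (cf. kMeansParallelInitSteps)
   * @param l     oversampling factor, roughly the expected number of points kept per round
   * @return up to k centroids (fewer only if too few distinct candidates exist)
   */
  static double[][] init(double[][] data, int k, int steps, double l, long seed) {
    Random rnd = new Random(seed);
    List<double[]> cand = new ArrayList<>();
    cand.add(data[rnd.nextInt(data.length)]); // one uniformly random starting point

    for (int round = 0; round < steps; round++) {
      double[] d2 = new double[data.length];
      double cost = 0.0;
      for (int i = 0; i < data.length; i++) {
        d2[i] = sqDist(data[i], cand.get(nearest(data[i], cand)));
        cost += d2[i];
      }
      if (cost == 0.0) break; // every point coincides with a candidate
      // Keep each point independently with probability min(1, l * d2 / cost).
      for (int i = 0; i < data.length; i++) {
        if (rnd.nextDouble() < Math.min(1.0, l * d2[i] / cost)) cand.add(data[i]);
      }
    }

    // Weight each candidate by how many points are closest to it.
    double[] w = new double[cand.size()];
    for (double[] x : data) w[nearest(x, cand)] += 1.0;

    // Reduce the weighted candidates to k centroids with weighted k-means++ seeding.
    List<double[]> chosen = new ArrayList<>();
    chosen.add(cand.get(sample(w, rnd)));
    while (chosen.size() < k) {
      double[] score = new double[cand.size()];
      for (int j = 0; j < cand.size(); j++) {
        double[] c = cand.get(j);
        score[j] = w[j] * sqDist(c, chosen.get(nearest(c, chosen)));
      }
      int pick = sample(score, rnd);
      if (pick < 0) break; // no remaining candidate differs from those chosen
      chosen.add(cand.get(pick));
    }
    return chosen.toArray(new double[0][]);
  }
}
```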
  - setRandomizeJobOptions
```
public void setRandomizeJobOptions(java.lang.String opts)
```
    Set the options for the randomize/stratify task
    
    Parameters:
    opts - the options for the randomize task
  - getRandomizedJobOptions
```
public java.lang.String getRandomizedJobOptions()
```
    Get the options for the randomize/stratify task
    
    Returns:
    the options for the randomize task
  - getCSVMapTaskOptions
```
public java.lang.String getCSVMapTaskOptions()
```
    Get the options to the header job
    
    Returns:
    options to the header job
  - setCSVMapTaskOptions
```
public void setCSVMapTaskOptions(java.lang.String opts)
```
    Set the options to the header job
    
    Parameters:
    opts - options to the header job
  - displayCentroidStdDevsTipText
```
public java.lang.String displayCentroidStdDevsTipText()
```
    Tip text for this property
    
    Returns:
    the tip text for this property
  - setDisplayCentroidStdDevs
```
public void setDisplayCentroidStdDevs(boolean d)
```
    Set whether to display the standard deviations of centroids in textual output of the model
    
    Parameters:
    d - true if standard deviations are to be displayed
  - getDisplayCentroidStdDevs
```
public boolean getDisplayCentroidStdDevs()
```
    Get whether to display the standard deviations of centroids in textual output of the model
    
    Returns:
    true if standard deviations are to be displayed
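    The standard deviations shown alongside a centroid are presumably the per-attribute standard deviations of the instances assigned to that cluster. The sketch below computes them from running sums and sums of squares, which a map task could accumulate alongside the means. Whether Weka uses the sample (n - 1) or population (n) form is not stated here; the sketch assumes the sample form, and the class name is hypothetical.

```java
/** Hypothetical sketch: per-attribute standard deviations of one cluster. */
final class CentroidStdDev {

  private CentroidStdDev() { }

  /** Per-attribute sample standard deviation of the given cluster members. */
  static double[] stdDevs(double[][] members) {
    if (members.length == 0) {
      throw new IllegalArgumentException("cluster has no members");
    }
    int n = members.length;
    int dim = members[0].length;
    double[] sum = new double[dim];
    double[] sumSq = new double[dim];
    for (double[] x : members) {
      for (int d = 0; d < dim; d++) {
        sum[d] += x[d];
        sumSq[d] += x[d] * x[d];
      }
    }
    double[] sd = new double[dim];
    if (n < 2) {
      return sd; // undefined for a single member; reported as 0
    }
    for (int d = 0; d < dim; d++) {
      double variance = (sumSq[d] - sum[d] * sum[d] / n) / (n - 1);
      sd[d] = Math.sqrt(Math.max(0.0, variance)); // clamp tiny negative round-off
    }
    return sd;
  }
}
```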
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class HadoopJob
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class HadoopJob
    
    Throws:
    
    java.lang.Exception
  - getJobOptionsOnly
```
public java.lang.String[] getJobOptionsOnly()
```
    Get the options for this job only
    
    Returns:
    the options for this job only
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class HadoopJob
  - runJob
```
public boolean runJob()
               throws weka.distributed.DistributedWekaException
```
    Specified by:
    
    runJob in class distributed.core.DistributedJob
    
    Throws:
    
    weka.distributed.DistributedWekaException
  - getClusterer
```
public Clusterer getClusterer()
```
    Specified by:
    
    getClusterer in interface ClustererProducer
  - getTrainingHeader
```
public Instances getTrainingHeader()
```
    Specified by:
    
    getTrainingHeader in interface ClustererProducer
  - getText
```
public java.lang.String getText()
```
    Specified by:
    
    getText in interface TextProducer
  - stopJob
```
public void stopJob()
```
    Overrides:
    
    stopJob in class distributed.core.DistributedJob
  - main
```
public static void main(java.lang.String[] args)
```
    Main method for executing this job from the command line
    
    Parameters:
    args - arguments to the job
  - run
```
public void run(java.lang.Object toRun,
       java.lang.String[] args)
```
    Specified by:
    
    run in interface CommandlineRunnable
