KMeansMapTask

java.lang.Object
- weka.distributed.KMeansMapTask

All Implemented Interfaces:

java.io.Serializable, weka.core.OptionHandler
```
public class KMeansMapTask
extends java.lang.Object
implements weka.core.OptionHandler, java.io.Serializable
```
Map task for k-means clustering. Uses a "pre-constructed" KMeans clusterer internally to perform the clustering (i.e. assigning training points to clusters). The clusterer is constructed with the centroids found in the previous iteration. Maintains (partial) summary stats on each centroid (by re-using the ARFF header summary attributes mechanism). Can use an arbitrary number of Streamable filters for preprocessing the data on the fly.

Version:

$Revision: $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`KMeansMapTask()`

Method Summary

Modifier and Type	Method and Description
`weka.core.Instance`	`applyFilters(weka.core.Instance original)` Apply the filters (if any) for this map task to the supplied instance
`weka.core.Instances`	`applyFilters(weka.core.Instances toApplyTo)` Apply the filters (if any) set up for this map task to the supplied instances
`static java.util.List<weka.core.Instances>`	`assignStartPointsFromList(int numRuns, int numClusters, java.util.List<weka.core.Instance> candidates, weka.core.Instances headerNoSummary)` Utility method to choose start points for a number of runs of k-means given a list of randomly selected instance objects.
`double`	`distance(weka.core.Instance one, weka.core.Instance two)` Computes the distance between the two supplied instances
`java.lang.String`	`dontReplaceMissingValuesTipText()` Returns the tip text for this property.
`java.lang.String`	`filtersToUseTipText()` The tool tip text for this property.
`java.util.List<weka.core.Instances>`	`getCentroidStats()` Get the summary stats for each centroid
`boolean`	`getConverged()` Get whether the run of k-means that this map task is associated with has converged
`weka.core.NormalizableDistance`	`getDistanceFunction()` Get the distance function in use
`boolean`	`getDontReplaceMissingValues()` Gets whether replacement of missing values is turned off.
`weka.filters.Filter[]`	`getFiltersToUse()` Get the user-specified filters to use with the k-means clusterer.
`java.lang.String[]`	`getOptions()`
`weka.filters.Filter`	`getPreprocessingFilters()` Gets the full set of preprocessing filters
`weka.core.Instances`	`getTransformedHeader()` Get the header of the data after it has been through any pre-processing filters specified by the user
`weka.core.Instances`	`init(weka.core.Instances headerWithSummary)` Initializes the map task.
`java.util.Enumeration<weka.core.Option>`	`listOptions()`
`void`	`processInstance(weka.core.Instance toProcess)` Processes a training instance.
`void`	`setCentroids(weka.core.Instances centers)` Set the cluster centroids to use for this iteration.
`void`	`setConverged(boolean converged)` Set whether the run of k-means that this map task is associated with has converged or not
`void`	`setDontReplaceMissingValues(boolean r)` Sets whether replacement of missing values is turned off.
`void`	`setDummyDistancePrimingData(weka.core.Instances priming)` Set the dummy priming data (two-instance dataset that contains global min/max for numeric attributes) for the distance function to use when normalizing numeric attributes.
`void`	`setFiltersToUse(weka.filters.Filter[] toUse)` Set the user-specified filters to use with the k-means clusterer.
`void`	`setOptions(java.lang.String[] options)`

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface weka.core.OptionHandler
makeCopy

- Constructor Detail
  - KMeansMapTask
```
public KMeansMapTask()
```
- Method Detail
  - init
```
public weka.core.Instances init(weka.core.Instances headerWithSummary)
                         throws DistributedWekaException
```
    Initializes the map task. Configures any filters required.
    
    Parameters:
    
    headerWithSummary - header of the incoming instances with summary attributes included
    
    Returns:
    
    the header (without summary attributes) after it has been through any filters that the user may have specified. This structure is needed by the KMeansReduceTask
    
    Throws:
    
    DistributedWekaException
  - setDummyDistancePrimingData
```
public void setDummyDistancePrimingData(weka.core.Instances priming)
                                 throws DistributedWekaException
```
    Set the dummy priming data (two-instance dataset that contains global min/max for numeric attributes) for the distance function to use when normalizing numeric attributes. This method should be called when filters that transform the data are being used, and *after* the first iteration of k-means has completed. At this point, the reduce task can compute global min/max for transformed attributes using the partial summary metadata for the clusters computed in the first iteration.
    
    Parameters:
    
    priming - the dummy priming data to use in the distance function
    
    Throws:
    
    DistributedWekaException
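The role of the priming data can be illustrated with a small standalone sketch. It assumes a two-row array in which the first row holds the global minimum and the second the global maximum of each numeric attribute, and uses those ranges to normalize values before computing a Euclidean distance. The class `PrimedDistanceSketch` and its methods are hypothetical names for illustration only; this is not the code behind `weka.core.NormalizableDistance` or this map task.

```java
/** Hypothetical illustration; not part of Weka or distributedWekaBase. */
class PrimedDistanceSketch {
  private final double[] min;
  private final double[] max;

  /** priming[0] holds the global minimums, priming[1] the global maximums. */
  PrimedDistanceSketch(double[][] priming) {
    if (priming.length != 2 || priming[0].length != priming[1].length) {
      throw new IllegalArgumentException("expected two rows of equal length");
    }
    min = priming[0].clone();
    max = priming[1].clone();
  }

  /** Scales a value into [0, 1] using the global range of attribute i. */
  double normalize(int i, double value) {
    double range = max[i] - min[i];
    return range == 0.0 ? 0.0 : (value - min[i]) / range;
  }

  /** Euclidean distance computed over the normalized attribute values. */
  double distance(double[] one, double[] two) {
    double sum = 0.0;
    for (int i = 0; i < min.length; i++) {
      double diff = normalize(i, one[i]) - normalize(i, two[i]);
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    // Two-row priming data: attribute 0 spans 0..10, attribute 1 spans 100..300.
    double[][] priming = { { 0.0, 100.0 }, { 10.0, 300.0 } };
    PrimedDistanceSketch d = new PrimedDistanceSketch(priming);
    System.out.println(d.distance(new double[] { 0.0, 100.0 }, new double[] { 10.0, 300.0 }));
  }
}
```

Because every map task receives the same two rows, each one normalizes on the same scale even though it only sees part of the data.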
  - processInstance
```
public void processInstance(weka.core.Instance toProcess)
                     throws DistributedWekaException
```
    Processes a training instance. Uses the k-means clusterer to find the nearest centroid to the supplied instance and then updates the summary metadata header for the corresponding centroid with the training instance.
    
    Parameters:
    
    toProcess - the instance to process
    
    Throws:
    
    DistributedWekaException - if a problem occurs
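The assign-then-update step described for `processInstance` can be sketched without any Weka types. The example below assumes plain `double[]` points, squared Euclidean distance, and per-centroid counts and sums as the partial statistics. `NearestCentroidSketch` is a hypothetical class for illustration; the real task keeps its statistics in ARFF header summary attributes rather than arrays.

```java
/** Hypothetical illustration; not part of Weka or distributedWekaBase. */
class NearestCentroidSketch {
  private final double[][] centroids;
  private final long[] counts;   // number of points assigned to each centroid
  private final double[][] sums; // per-attribute running sums for each centroid

  NearestCentroidSketch(double[][] centroids) {
    this.centroids = centroids;
    this.counts = new long[centroids.length];
    this.sums = new double[centroids.length][centroids[0].length];
  }

  /** Finds the nearest centroid and folds the point into that centroid's partial stats. */
  int processInstance(double[] point) {
    int best = 0;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double dist = 0.0;
      for (int i = 0; i < point.length; i++) {
        double diff = point[i] - centroids[c][i];
        dist += diff * diff;
      }
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    counts[best]++;
    for (int i = 0; i < point.length; i++) {
      sums[best][i] += point[i];
    }
    return best;
  }

  long count(int c) {
    return counts[c];
  }

  /** Mean of the points seen so far for one centroid (NaN if none were assigned). */
  double[] partialMean(int c) {
    double[] mean = new double[sums[c].length];
    for (int i = 0; i < mean.length; i++) {
      mean[i] = counts[c] == 0 ? Double.NaN : sums[c][i] / counts[c];
    }
    return mean;
  }

  public static void main(String[] args) {
    NearestCentroidSketch task = new NearestCentroidSketch(new double[][] { { 0, 0 }, { 10, 10 } });
    System.out.println(task.processInstance(new double[] { 1, 1 }));
    System.out.println(task.processInstance(new double[] { 9, 8 }));
  }
}
```

Keeping sums and counts rather than means is what makes the statistics partial: a reducer can add them up across map tasks before computing the new centroids.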
  - distance
```
public double distance(weka.core.Instance one,
                       weka.core.Instance two)
                throws DistributedWekaException
```
    Computes the distance between the two supplied instances
    
    Parameters:
    
    one - the first instance
    
    two - the second instance
    
    Returns:
    
    the distance between the two
    
    Throws:
    
    DistributedWekaException - if a problem occurs
  - getCentroidStats
```
public java.util.List<weka.core.Instances> getCentroidStats()
```
    Get the summary stats for each centroid
    
    Returns:
    
    the summary stats (one instances object) for each centroid
  - dontReplaceMissingValuesTipText
```
public java.lang.String dontReplaceMissingValuesTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - setDontReplaceMissingValues
```
public void setDontReplaceMissingValues(boolean r)
```
    Sets whether replacement of missing values is turned off.
    
    Parameters:
    
    r - true if missing values are not to be replaced
  - getDontReplaceMissingValues
```
public boolean getDontReplaceMissingValues()
```
    Gets whether replacement of missing values is turned off.
    
    Returns:
    
    true if missing values are not to be replaced
  - getDistanceFunction
```
public weka.core.NormalizableDistance getDistanceFunction()
```
    Get the distance function in use
    
    Returns:
    
    the distance function in use
  - setCentroids
```
public void setCentroids(weka.core.Instances centers)
```
    Set the cluster centroids to use for this iteration. NOTE: These should be in the transformed space if any filters (including missing values replacement) are being used.
    
    Parameters:
    
    centers - the centroids to use
  - applyFilters
```
public weka.core.Instances applyFilters(weka.core.Instances toApplyTo)
                                 throws java.lang.Exception
```
    Apply the filters (if any) set up for this map task to the supplied instances
    
    Parameters:
    
    toApplyTo - the instances to filter
    
    Returns:
    
    a filtered set of instances
    
    Throws:
    
    java.lang.Exception - if a problem occurs
  - applyFilters
```
public weka.core.Instance applyFilters(weka.core.Instance original)
                                throws java.lang.Exception
```
    Apply the filters (if any) for this map task to the supplied instance
    
    Parameters:
    
    original - the instance in the original space
    
    Returns:
    
    a filtered instance
    
    Throws:
    
    java.lang.Exception - if a problem occurs
  - getPreprocessingFilters
```
public weka.filters.Filter getPreprocessingFilters()
```
    Gets the full set of preprocessing filters
    
    Returns:
    
    preprocessing filter(s) or null if no preprocessing/missing values handling is being done
  - setConverged
```
public void setConverged(boolean converged)
```
    Set whether the run of k-means that this map task is associated with has converged or not
    
    Parameters:
    
    converged - true if the run has converged
  - getConverged
```
public boolean getConverged()
```
    Get whether the run of k-means that this map task is associated with has converged
    
    Returns:
    
    true if the run has converged
  - getTransformedHeader
```
public weka.core.Instances getTransformedHeader()
```
    Get the header of the data after it has been through any pre-processing filters specified by the user
    
    Returns:
    
    the transformed header
  - getFiltersToUse
```
public weka.filters.Filter[] getFiltersToUse()
```
    Get the user-specified filters to use with the k-means clusterer. Does not include the missing values replacement filter that is automatically configured using global ARFF profiling summary data
    
    Returns:
    
    the user-specified filters to use with k-means
  - setFiltersToUse
```
public void setFiltersToUse(weka.filters.Filter[] toUse)
```
    Set the user-specified filters to use with the k-means clusterer. Does not include the missing values replacement filter that is automatically configured using global ARFF profiling summary data
    
    Parameters:
    
    toUse - the user-specified filters to use with k-means
  - filtersToUseTipText
```
public java.lang.String filtersToUseTipText()
```
    The tool tip text for this property.
    
    Returns:
    
    the tool tip text for this property
  - listOptions
```
public java.util.Enumeration<weka.core.Option> listOptions()
```
    Specified by:
    
    listOptions in interface weka.core.OptionHandler
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface weka.core.OptionHandler
    
    Throws:
    
    java.lang.Exception
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface weka.core.OptionHandler
  - assignStartPointsFromList
```
public static java.util.List<weka.core.Instances> assignStartPointsFromList(int numRuns,
                                                                            int numClusters,
                                                                            java.util.List<weka.core.Instance> candidates,
                                                                            weka.core.Instances headerNoSummary)
                                                                     throws DistributedWekaException
```
    Utility method to choose start points for a number of runs of k-means given a list of randomly selected instance objects. Avoids choosing duplicate instances as start points for each run.
    
    Parameters:
    
    numRuns - the number of runs of k-means
    
    numClusters - the number of clusters/start points for each run
    
    candidates - the list of total candidates to choose randomly from
    
    headerNoSummary - the header of the data
    
    Returns:
    
    a list of Instances (one for each run)
    
    Throws:
    
    DistributedWekaException - if a problem occurs
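One way to pick duplicate-free start points for several runs from a shared candidate list is sketched below. It assumes candidates are plain `double[]` rows already in random order and that each run consumes the next unused candidates, skipping any that repeat a point already chosen for that run. `StartPointsSketch` and this selection strategy are assumptions for illustration; the actual `assignStartPointsFromList` may choose differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of Weka or distributedWekaBase. */
class StartPointsSketch {

  /** Returns one list of numClusters distinct start points per run. */
  static List<List<double[]>> assignStartPoints(int numRuns, int numClusters,
      List<double[]> candidates) {
    List<List<double[]>> runs = new ArrayList<>();
    int next = 0; // index of the next unused candidate
    for (int r = 0; r < numRuns; r++) {
      List<double[]> starts = new ArrayList<>();
      Set<String> seen = new HashSet<>(); // value-based keys for this run only
      while (starts.size() < numClusters) {
        if (next >= candidates.size()) {
          throw new IllegalStateException("not enough distinct candidates");
        }
        double[] candidate = candidates.get(next++);
        // Set.add returns false when an equal point was already chosen for this run.
        if (seen.add(Arrays.toString(candidate))) {
          starts.add(candidate);
        }
      }
      runs.add(starts);
    }
    return runs;
  }

  public static void main(String[] args) {
    List<double[]> candidates = Arrays.asList(
        new double[] { 1, 1 }, new double[] { 1, 1 }, new double[] { 2, 2 },
        new double[] { 3, 3 }, new double[] { 4, 4 });
    for (List<double[]> run : assignStartPoints(2, 2, candidates)) {
      for (double[] p : run) {
        System.out.print(Arrays.toString(p) + " ");
      }
      System.out.println();
    }
  }
}
```

Duplicates are compared by value here because two sampled rows can be distinct objects yet describe the same point, which would leave a run with fewer effective clusters than requested.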
