public class KMeansReduceTask
extends java.lang.Object
implements java.io.Serializable
Constructor and Description |
---|
KMeansReduceTask() |
Modifier and Type | Method and Description |
---|---|
static weka.core.Instances | computeDistancePrimingDataFromDistanceFunctions(java.util.List<weka.core.NormalizableDistance> distanceFuncs, weka.core.Instances headerNoSummary) Utility function to examine the attribute ranges in a bunch of distance functions and return a two instance dataset with the global mins/maxes of numeric attributes set. |
java.util.List<weka.core.Instances> | getAggregatedCentroidSummaries() Get the aggregated summary data for each individual centroid. |
weka.core.Instances | getCentroidsForRun() Return the centroids for the run. |
weka.core.Instances | getGlobalDistanceFunctionPrimingData() Get the global distance function priming data. |
int | getIterationNumber() Get the current iteration number. |
int | getRunNumber() Get the run number. |
double | getTotalWithinClustersError() Get the total within cluster error for this run. |
KMeansReduceTask | reduceClusters(int runNumber, int iterationNumber, weka.core.Instances headerNoSummary, java.util.List<java.util.List<weka.core.Instances>> clusterSummaries) Reduce the cluster centroid summary metadata instances for a particular run in order to produce a new set of Instances that contains the new cluster centroids for the run. |
public KMeansReduceTask reduceClusters(int runNumber, int iterationNumber, weka.core.Instances headerNoSummary, java.util.List<java.util.List<weka.core.Instances>> clusterSummaries) throws DistributedWekaException
Parameters:
runNumber
- the current run number
iterationNumber
- the current iteration number of k-means
headerNoSummary
- the global ARFF header (as computed by the
ArffHeader job on the entire dataset, and having passed through
any preprocessing filters). We need this so that the correct index
for nominal attribute values can be set in the new centroids (map
tasks accumulating summary stats when clustering partitions of the
data may see nominal values in different orders, or not see some
values at all, compared to the global header)
clusterSummaries
- a list of cluster summary information. Each inner
list of Instances will have been generated by a map task on a
subset of the data. Each instances object in the list contains the
summary stats for one cluster centroid. Inner lists are in order
of centroid number. A particular Instances entry in a list may be
null - this indicates that the cluster was empty within that
particular map task (i.e. no training instances were assigned to
it)
Throws:
DistributedWekaException
- if a problem occurs
public weka.core.Instances getCentroidsForRun()
public java.util.List<weka.core.Instances> getAggregatedCentroidSummaries()
public weka.core.Instances getGlobalDistanceFunctionPrimingData()
public int getRunNumber()
public int getIterationNumber()
public double getTotalWithinClustersError()
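The merging that reduceClusters is described as performing (one inner list per map task, entries in centroid order, null for a cluster that was empty in that task) can be illustrated with a small self-contained sketch. This is not the library's implementation and does not use Weka types: the PartialSummary class, the reduce method, and the count-plus-sums representation are assumptions made for illustration, and returning null for a cluster that is empty in every task is a choice of this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CentroidReduceSketch {

  /** Hypothetical per-cluster summary from one map task: a count plus per-attribute sums. */
  static final class PartialSummary {
    final long count;
    final double[] sums;

    PartialSummary(long count, double[] sums) {
      this.count = count;
      this.sums = sums;
    }
  }

  /**
   * Merges partial summaries into one mean vector per cluster.
   * Outer list: one entry per map task. Inner list: one entry per centroid,
   * in centroid order; null means the cluster was empty in that task.
   */
  static List<double[]> reduce(List<List<PartialSummary>> perTask,
      int numClusters, int numAttributes) {
    List<double[]> centroids = new ArrayList<>();
    for (int c = 0; c < numClusters; c++) {
      long total = 0;
      double[] sums = new double[numAttributes];
      for (List<PartialSummary> task : perTask) {
        PartialSummary s = task.get(c);
        if (s == null) {
          continue; // cluster c received no instances in this map task
        }
        total += s.count;
        for (int a = 0; a < numAttributes; a++) {
          sums[a] += s.sums[a];
        }
      }
      if (total == 0) {
        centroids.add(null); // sketch-only choice: empty in every task
        continue;
      }
      for (int a = 0; a < numAttributes; a++) {
        sums[a] /= total; // turn the accumulated sum into a mean
      }
      centroids.add(sums);
    }
    return centroids;
  }

  public static void main(String[] args) {
    List<List<PartialSummary>> perTask = Arrays.asList(
        Arrays.asList(new PartialSummary(2, new double[] {2.0, 4.0}), null),
        Arrays.asList(new PartialSummary(2, new double[] {6.0, 4.0}),
            new PartialSummary(1, new double[] {10.0, 20.0})));
    for (double[] centroid : reduce(perTask, 2, 2)) {
      System.out.println(Arrays.toString(centroid));
    }
  }
}
```

The sketch covers numeric attributes only. Per the headerNoSummary description, nominal values additionally need their indexes resolved against the global header, because each map task may have seen the values in a different order.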
public static weka.core.Instances computeDistancePrimingDataFromDistanceFunctions(java.util.List<weka.core.NormalizableDistance> distanceFuncs, weka.core.Instances headerNoSummary) throws DistributedWekaException
Parameters:
distanceFuncs
- a list of distance functions (where each potentially
has only seen part of the overall dataset)
headerNoSummary
- the header of the data that the distance functions
have been seeing
Throws:
DistributedWekaException
- if a problem occurs
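The idea behind combining attribute ranges from several distance functions, each of which has seen only part of the data, can be sketched without Weka as follows. This is an illustrative assumption, not the library's code: the globalMinMax method, the double[attribute][2] range layout, the use of NaN for "no values seen", and the row order of the result (mins first, then maxes) are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PrimingDataSketch {

  /**
   * Combines per-partition [min, max] ranges into a two-row result:
   * row 0 holds the global minimums, row 1 the global maximums.
   * Each partial entry is indexed as ranges[attribute][0 = min, 1 = max];
   * NaN marks an attribute for which that partition saw no values.
   */
  static double[][] globalMinMax(List<double[][]> partialRanges, int numAttributes) {
    double[] mins = new double[numAttributes];
    double[] maxes = new double[numAttributes];
    Arrays.fill(mins, Double.POSITIVE_INFINITY);
    Arrays.fill(maxes, Double.NEGATIVE_INFINITY);
    for (double[][] ranges : partialRanges) {
      for (int a = 0; a < numAttributes; a++) {
        double lo = ranges[a][0];
        double hi = ranges[a][1];
        if (Double.isNaN(lo) || Double.isNaN(hi)) {
          continue; // nothing observed for this attribute in this partition
        }
        mins[a] = Math.min(mins[a], lo);
        maxes[a] = Math.max(maxes[a], hi);
      }
    }
    // Attributes never observed anywhere keep their infinite sentinels.
    return new double[][] {mins, maxes};
  }

  public static void main(String[] args) {
    List<double[][]> partial = Arrays.asList(
        new double[][] {{0.0, 5.0}, {Double.NaN, Double.NaN}},
        new double[][] {{-1.0, 3.0}, {2.0, 9.0}});
    double[][] result = globalMinMax(partial, 2);
    System.out.println("mins:  " + Arrays.toString(result[0]));
    System.out.println("maxes: " + Arrays.toString(result[1]));
  }
}
```

Priming every distance function with the same global ranges means normalized distances are computed on a common scale regardless of which partition a function originally saw.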