- All Implemented Interfaces:
- java.io.Serializable, BaseStepExtender, Step
@KFStep(name="CorrelationMatrixHadoopJob",
category="Hadoop",
toolTipText="Computes a correlation/covariance matrix for numeric data in Hadoop. The data can include a class attribute, which can be part of the correlation analysis if it is numeric or ignored if it is nominal. The user can optionally have the job perform a PCA analysis using the computed correlation/covariance matrix as input. Note that this is done outside of Hadoop on the client machine as a postprocessing step, so it is suitable for data that does not contain a large number of columns. The PCA analysis will be written back into the output directory, along with a serialized PCA filter that can be used for preprocessing data in the WekaClassifierHadoop job.",
iconPath="weka/gui/knowledgeflow/icons/CorrelationMatrixHadoopJob.gif")
public class CorrelationMatrixHadoopJob
extends AbstractHadoopJob
Knowledge Flow step for the correlation matrix/PCA job
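As a rough illustration of the quantity this job produces, the following is a minimal single-machine sketch of a Pearson correlation matrix over numeric columns. It is not Weka's implementation and uses no Weka or Hadoop APIs; the class name `CorrelationSketch` and the method `correlationMatrix` are hypothetical names introduced only for this example. It assumes every column is numeric, has no missing values, and is not constant (a constant column would yield NaN).

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka): Pearson correlation matrix for numeric rows. */
class CorrelationSketch {

  /** rows[i][j] is the value of column j in instance i; returns a cols x cols matrix. */
  static double[][] correlationMatrix(double[][] rows) {
    int n = rows.length;
    int cols = rows[0].length;

    // Pass 1: column means.
    double[] mean = new double[cols];
    for (double[] row : rows) {
      for (int j = 0; j < cols; j++) {
        mean[j] += row[j] / n;
      }
    }

    // Pass 2: sums of cross-products of deviations from the means.
    // Dividing these by (n - 1) would give the covariance matrix.
    double[][] cross = new double[cols][cols];
    for (double[] row : rows) {
      for (int j = 0; j < cols; j++) {
        for (int k = 0; k <= j; k++) {
          cross[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
        }
      }
    }

    // Normalize each entry by the two columns' spreads to obtain correlations.
    double[][] corr = new double[cols][cols];
    for (int j = 0; j < cols; j++) {
      for (int k = 0; k <= j; k++) {
        double r = cross[j][k] / Math.sqrt(cross[j][j] * cross[k][k]);
        corr[j][k] = r;
        corr[k][j] = r;
      }
    }
    return corr;
  }

  public static void main(String[] args) {
    // Small made-up data set: column 1 is twice column 0, column 2 decreases.
    double[][] data = { {1, 2, 9}, {2, 4, 7}, {3, 6, 2}, {4, 8, 1} };
    for (double[] row : correlationMatrix(data)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

The resulting matrix is square with one row and column per attribute, which is why a client-side PCA postprocessing step on it is practical only when the number of columns is modest.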
- Version:
- $Revision: $
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
- See Also:
- Serialized Form