public class ArffHeaderSparkJob extends SparkJob implements CommandlineRunnable, InstancesProducer, TextProducer
Nested classes/interfaces inherited from class SparkJob:
SparkJob.NoKeyTextOutputFormat<K,V>
Modifier and Type | Field and Description
---|---
static java.lang.String | CHART_HEIGHT_KEY: key for specifying a chart height to use
static java.lang.String | CHART_WIDTH_KEY: key for specifying a chart width to use
static int | DEFAULT_CHART_HEIGHT: default height for charts
static int | DEFAULT_CHART_WIDTH: default width for charts
static java.lang.String | OUTPUT_SUBDIR: subdirectory of the output directory in which the ARFF header is stored
TEST_DATA, TRAINING_DATA
Constructor | Description
---|---
ArffHeaderSparkJob() | Constructor
Modifier and Type | Method and Description
---|---
java.lang.String | attributeNamesFileTipText(): Tip text for this property
java.lang.String | attributeNamesTipText(): Tip text for this property
java.lang.String | csvToArffTaskOptionsTipText(): Tip text for this property
java.lang.String | getAttributeNames(): Get a comma-separated list of attribute names to use when generating the ARFF header
java.lang.String | getAttributeNamesFile(): Get the path to a file containing attribute names to use
java.lang.String | getCsvToArffTaskOptions(): Get the options to pass on to the underlying CSV to ARFF task
Instances | getHeader(): Get the final header
Instances | getInstances(): Get the final header (calls getHeader())
java.lang.String[] | getJobOptionsOnly(): Get the options specific to this job only
java.lang.String[] | getOptions()
java.lang.String | getOutputHeaderFileName(): Get the name of the header file to create in the output directory
java.lang.String | getPathToExistingHeader(): Get the path to a previously created header file to use
java.util.List<java.lang.String> | getSummaryChartAttNames(): Get the names of the attributes in the summary charts
java.util.List<java.awt.image.BufferedImage> | getSummaryCharts(): Get the summary charts (if any)
java.lang.String | getText()
java.util.Enumeration<Option> | listOptions()
static void | main(java.lang.String[] args): Main method for executing this job
java.lang.String | outputHeaderFileNameTipText(): The tip text for this property
java.lang.String | pathToExistingHeaderTipText(): The tip text for this property
void | run(java.lang.Object toRun, java.lang.String[] options)
boolean | runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext): Clients to implement
void | setAttributeNames(java.lang.String names): Set a comma-separated list of attribute names to use when generating the ARFF header
void | setAttributeNamesFile(java.lang.String namesfile): Set the path to a file containing attribute names to use
void | setCsvToArffTaskOptions(java.lang.String opts): Set the options to pass on to the underlying CSV to ARFF task
void | setOptions(java.lang.String[] options)
void | setOutputHeaderFileName(java.lang.String name): Set the name of the header file to create in the output directory
void | setPathToExistingHeader(java.lang.String path): Set the path to a previously created header file to use
Methods inherited from class SparkJob:
addSubdirToPath, checkFileExists, createSparkContextForJob, debugTipText, deleteDirectory, getBaseOptionsOnly, getCachingStrategy, getDataset, getDatasets, getDebug, getFSConfigurationForPath, getSizeInBytesOfPath, getSparkContext, getSparkJobConfig, initJob, initSparkLogAppender, loadCSVFile, loadInput, loadInstanceObjectFile, openFileForRead, openFileForWrite, openTextFileForWrite, removeSparkLogAppender, resolveLocalOrOtherFileSystemPath, runJob, setCachingStrategy, setDataset, setDebug, shutdownJob, stringRDDToInstanceRDD

Methods inherited from class distributed.core.DistributedJob:
environmentSubstitute, getAdditionalWekaPackageNames, getJobName, getJobStatus, getLog, logMessage, logMessage, logMessage, makeOptionsStr, objectRowToInstance, parseInstance, postExecution, preExecution, setEnvironment, setJobDescription, setJobName, setJobStatus, setLog, setStatusMessagePrefix, stackTraceToString, statusMessage, stopJob

Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface CommandlineRunnable:
postExecution, preExecution
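Putting the summarized methods together, the typical programmatic pattern is to configure the job, run it against a JavaSparkContext, and then retrieve the generated header. The sketch below is a minimal, hypothetical example based only on the methods documented on this page: the package name (weka.distributed.spark), Spark master, application name, and attribute names are assumptions, and in practice the input CSV location and other settings are supplied through the job's option string (see setOptions(String[])).

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import weka.core.Instances;
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class ArffHeaderSparkJobExample {

  public static void main(String[] args) throws Exception {
    // Placeholder Spark configuration (master URL and app name are assumptions)
    SparkConf conf =
      new SparkConf().setMaster("local[*]").setAppName("ARFF header demo");
    JavaSparkContext context = new JavaSparkContext(conf);

    try {
      ArffHeaderSparkJob job = new ArffHeaderSparkJob();

      // Comma-separated attribute names to use when generating the header
      job.setAttributeNames("sepallength,sepalwidth,petallength,petalwidth,class");

      // Name of the header file to create in the output directory
      job.setOutputHeaderFileName("iris.header.arff");

      // The input CSV location, output directory, etc. would normally be
      // configured via the job's full option string, e.g.
      // job.setOptions(weka.core.Utils.splitOptions("...")); the flags are
      // job-specific and can be listed with listOptions().

      // Run against the supplied context and fetch the result
      if (job.runJobWithContext(context)) {
        Instances header = job.getHeader();
        System.out.println(header);
      }
    } finally {
      context.stop();
    }
  }
}
```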
public static final java.lang.String CHART_WIDTH_KEY
public static final java.lang.String CHART_HEIGHT_KEY
public static final int DEFAULT_CHART_WIDTH
public static final int DEFAULT_CHART_HEIGHT
public static final java.lang.String OUTPUT_SUBDIR
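The OUTPUT_SUBDIR constant names the subdirectory of the job's output directory that receives the ARFF header. As a rough, hypothetical illustration only (the exact output layout is not specified on this page, and outputDir is a placeholder), client code might locate the header file like this:

```java
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class HeaderOutputPath {
  public static void main(String[] args) {
    // "outputDir" is a placeholder for whatever output directory the job was
    // configured with; the directory layout shown here is an assumption.
    String outputDir = "hdfs://namenode:8020/user/weka/output";
    ArffHeaderSparkJob job = new ArffHeaderSparkJob();
    String headerPath = outputDir + "/" + ArffHeaderSparkJob.OUTPUT_SUBDIR
      + "/" + job.getOutputHeaderFileName();
    System.out.println("Expecting the ARFF header at: " + headerPath);
  }
}
```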
public static void main(java.lang.String[] args)
Parameters:
args - arguments to the job

public java.util.Enumeration<Option> listOptions()
Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class SparkJob
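listOptions() follows Weka's standard OptionHandler contract, so the option flags understood by the job can be discovered programmatically. A small sketch (the ArffHeaderSparkJob import assumes the weka.distributed.spark package):

```java
import java.util.Enumeration;

import weka.core.Option;
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class ListJobOptions {
  public static void main(String[] args) {
    ArffHeaderSparkJob job = new ArffHeaderSparkJob();
    Enumeration<Option> opts = job.listOptions();
    while (opts.hasMoreElements()) {
      Option o = opts.nextElement();
      // synopsis() is the flag and its argument, description() the help text
      System.out.println(o.synopsis());
      System.out.println("\t" + o.description());
    }
  }
}
```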
public java.lang.String[] getJobOptionsOnly()
public java.lang.String[] getOptions()
Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class SparkJob
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class SparkJob
Throws:
java.lang.Exception
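Because the job is an OptionHandler, a complete configuration can be applied from a single command-line style string via setOptions(String[]) and read back with getOptions(); getJobOptionsOnly() returns just the options specific to this job. A minimal sketch using weka.core.Utils (the empty option string is a placeholder, since the actual flags are job-specific and can be listed with listOptions()):

```java
import weka.core.Utils;
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class OptionRoundTrip {
  public static void main(String[] args) throws Exception {
    ArffHeaderSparkJob job = new ArffHeaderSparkJob();

    // Apply options parsed from a single string (placeholder shown here)
    job.setOptions(Utils.splitOptions(""));

    // Read the configuration back: the complete option array...
    System.out.println(Utils.joinOptions(job.getOptions()));
    // ...and only the options specific to this job
    System.out.println(Utils.joinOptions(job.getJobOptionsOnly()));
  }
}
```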
public java.lang.String pathToExistingHeaderTipText()
public java.lang.String getPathToExistingHeader()
public void setPathToExistingHeader(java.lang.String path)
Parameters:
path - the path to a previously created header

public java.lang.String outputHeaderFileNameTipText()
public java.lang.String getOutputHeaderFileName()
public void setOutputHeaderFileName(java.lang.String name)
Parameters:
name - the name for the ARFF header file

public java.lang.String attributeNamesTipText()
public java.lang.String getAttributeNames()
public void setAttributeNames(java.lang.String names)
Parameters:
names - the names of the attributes

public java.lang.String attributeNamesFileTipText()
public java.lang.String getAttributeNamesFile()
public void setAttributeNamesFile(java.lang.String namesfile)
Parameters:
namesfile - the path to a names file to use

public java.lang.String csvToArffTaskOptionsTipText()
public java.lang.String getCsvToArffTaskOptions()
public void setCsvToArffTaskOptions(java.lang.String opts)
Parameters:
opts - options to pass on to the CSV to ARFF map and reduce tasks

public java.util.List<java.awt.image.BufferedImage> getSummaryCharts()
public java.util.List<java.lang.String> getSummaryChartAttNames()
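If the job was configured to produce summary charts, getSummaryCharts() and getSummaryChartAttNames() can be used after a successful run to write the images to disk. A hedged sketch using javax.imageio (it assumes the two lists are parallel, i.e. that the name at index i belongs to the chart at index i):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;

import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class SaveSummaryCharts {

  /** Writes each summary chart as a PNG into the given directory. */
  public static void save(ArffHeaderSparkJob job, File dir) throws Exception {
    List<BufferedImage> charts = job.getSummaryCharts();
    List<String> attNames = job.getSummaryChartAttNames();
    if (charts == null || charts.isEmpty()) {
      return; // no charts were generated for this run
    }
    for (int i = 0; i < charts.size(); i++) {
      // Assumption: chart i corresponds to attribute name i
      File out = new File(dir, attNames.get(i) + ".png");
      ImageIO.write(charts.get(i), "png", out);
    }
  }
}
```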
public boolean runJobWithContext(org.apache.spark.api.java.JavaSparkContext sparkContext) throws java.io.IOException, weka.distributed.DistributedWekaException
Description copied from class: SparkJob
Specified by:
runJobWithContext in class SparkJob
Parameters:
sparkContext - the context to use
Throws:
java.io.IOException - if an IO problem occurs
weka.distributed.DistributedWekaException - if any other problem occurs

public Instances getHeader()
public Instances getInstances()
Specified by:
getInstances in interface InstancesProducer
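Both getHeader() and getInstances() return the generated header as a standard weka.core.Instances object, so it can be inspected or saved with the usual Weka APIs. A brief sketch using ArffSaver from weka.core.converters:

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class InspectHeader {

  /** Prints the attribute names and writes the header to an ARFF file. */
  public static void writeHeader(ArffHeaderSparkJob job, File out)
    throws Exception {
    Instances header = job.getInstances(); // equivalent to getHeader()
    for (int i = 0; i < header.numAttributes(); i++) {
      System.out.println(header.attribute(i).name());
    }
    ArffSaver saver = new ArffSaver();
    saver.setInstances(header);
    saver.setFile(out);
    saver.writeBatch();
  }
}
```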
public void run(java.lang.Object toRun, java.lang.String[] options) throws java.lang.IllegalArgumentException
Specified by:
run in interface CommandlineRunnable
Overrides:
run in class distributed.core.DistributedJob
Throws:
java.lang.IllegalArgumentException
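run(Object, String[]) is the CommandlineRunnable entry point, which lets the job be launched from code with a command-line style argument array instead of going through main(). A hedged sketch, assuming the common Weka convention that the object to run is the job instance itself:

```java
import weka.distributed.spark.ArffHeaderSparkJob; // package name assumed

public class RunFromCode {
  public static void main(String[] args) {
    ArffHeaderSparkJob job = new ArffHeaderSparkJob();
    // args uses the same option flags accepted by setOptions()/main();
    // passing the job itself as toRun is an assumed (but common) convention.
    job.run(job, args);
  }
}
```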
public java.lang.String getText()
Specified by:
getText in interface TextProducer