BaseSparkJobConfig

java.lang.Object
- distributed.core.DistributedJobConfig
- - distributed.spark.BaseSparkJobConfig

All Implemented Interfaces:

java.io.Serializable, OptionHandler

Direct Known Subclasses:

SparkJobConfig
```
public abstract class BaseSparkJobConfig
extends distributed.core.DistributedJobConfig
```
Basic options common to batch and streaming Spark jobs.

Version:

$Revision: $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`COLT_JAR` The path to the colt.jar
`static java.lang.String`	`DEFAULT_HDFS_PORT`
`static java.lang.String`	`DEFAULT_MESOS_MASTER_PORT`
`static java.lang.String`	`DEFAULT_SPARK_MASTER_PORT`
`static java.lang.String`	`DISTRIBUTED_WEKA_BASE_JAR` The path to the distributedWekaBase.jar
`static java.lang.String`	`DISTRIBUTED_WEKA_SPARK_JAR` The path to the distributedWekaSpark.jar
`static java.lang.String`	`HADOOP_FS_DEFAULT_NAME`
`static java.lang.String`	`JCOMMON_JAR` The path to the jcommon jar
`static java.lang.String`	`JFREECHART_JAR` The path to the jfreechart jar
`static java.lang.String`	`LA4J_JAR` The path to the la4j.jar
`static java.lang.String`	`MASTER_HOST`
`static java.lang.String`	`MASTER_PORT`
`static java.lang.String`	`OPEN_CSV_JAR` The path to the opencsv.jar
`static java.lang.String`	`SPARK_HOME_DIR`
`static java.lang.String`	`TDIGEST_JAR` The path to the t-digest.jar

Constructor Summary

Constructors
Constructor and Description

`BaseSparkJobConfig()`

Method Summary

Modifier and Type	Method and Description
`void`	`addWekaLibrariesToSparkContext(org.apache.spark.api.java.JavaSparkContext context, distributed.core.DistributedJob job)` Adds necessary Weka libraries to the supplied SparkContext
`java.lang.String`	`availableClusterMemoryTipText()` Tip text for this property.
`double`	`getAvailableClusterMemory()` Get the total available cluster memory.
`org.apache.spark.api.java.JavaSparkContext`	`getBaseSparkContext(java.lang.String jobName)` Gets a configured SparkContext
`java.lang.String`	`getDefaultPortForMaster()` Attempt to get a default port based on the master URL.
`double`	`getInMemoryDataOverheadFactor()` Get the overhead factor for data in memory.
`java.lang.String`	`getMasterHost()` Get the host for the master node.
`java.lang.String`	`getMasterPort()` Get the port for the master node.
`double`	`getMemoryFraction()` Get the fraction of Java heap to use for Spark's memory cache.
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`getPathToWekaJar()` Get the path to the weka.jar file.
`java.lang.String`	`getSparkHomeDirectory()` Get the root directory of the Spark installation on the slave nodes.
`java.lang.String`	`getWekaPackages()` Get a comma-separated list of the names of additional Weka packages to use with the job.
`java.lang.String`	`inMemoryDataOverheadFactorTipText()` Tip text for this property
`java.util.Enumeration<Option>`	`listOptions()`
`java.lang.String`	`masterHostTipText()` Tip text for this property
`java.lang.String`	`masterPortTipText()` Tool tip text for this property
`java.lang.String`	`memoryFractionTipText()` Tip text for this property.
`java.lang.String`	`pathToWekaJarTipText()` Tip text for this property
`void`	`setAvailableClusterMemory(double m)` Set the total available cluster memory.
`void`	`setInMemoryDataOverheadFactor(double f)` Set the overhead factor for data in memory.
`void`	`setMasterHost(java.lang.String host)` Set the host for the master node.
`void`	`setMasterPort(java.lang.String port)` Set the port for the master node.
`void`	`setMemoryFraction(double f)` Set the fraction of Java heap to use for Spark's memory cache.
`void`	`setOptions(java.lang.String[] options)`
`void`	`setPathToWekaJar(java.lang.String path)` Set the path to the weka.jar file.
`void`	`setSparkHomeDirectory(java.lang.String sparkHome)` Set the root directory of the Spark installation on the slave nodes.
`void`	`setWekaPackages(java.lang.String packages)` Set a comma-separated list of the names of additional Weka packages to use with the job.
`java.lang.String`	`sparkHomeDirectoryTipText()` Tool tip text for this property
`java.lang.String`	`wekaPackagesTipText()` Tip text for this property.

Methods inherited from class distributed.core.DistributedJobConfig
clearUserSuppliedProperties, getProperty, getPropertyNames, getUserSuppliedProperties, getUserSuppliedProperty, getUserSuppliedPropertyNames, isEmpty, setProperty, setUserSuppliedProperty

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DISTRIBUTED_WEKA_SPARK_JAR
```
public static final java.lang.String DISTRIBUTED_WEKA_SPARK_JAR
```
    The path to the distributedWekaSpark.jar
  - DISTRIBUTED_WEKA_BASE_JAR
```
public static final java.lang.String DISTRIBUTED_WEKA_BASE_JAR
```
    The path to the distributedWekaBase.jar
  - OPEN_CSV_JAR
```
public static final java.lang.String OPEN_CSV_JAR
```
    The path to the opencsv.jar
  - JFREECHART_JAR
```
public static final java.lang.String JFREECHART_JAR
```
    The path to the jfreechart jar
  - JCOMMON_JAR
```
public static final java.lang.String JCOMMON_JAR
```
    The path to the jcommon jar
  - COLT_JAR
```
public static final java.lang.String COLT_JAR
```
    The path to the colt.jar
  - LA4J_JAR
```
public static final java.lang.String LA4J_JAR
```
    The path to the la4j.jar
  - TDIGEST_JAR
```
public static final java.lang.String TDIGEST_JAR
```
    The path to the t-digest.jar
  - MASTER_HOST
```
public static final java.lang.String MASTER_HOST
```
    See Also:
    
    Constant Field Values
  - MASTER_PORT
```
public static final java.lang.String MASTER_PORT
```
    See Also:
    
    Constant Field Values
  - SPARK_HOME_DIR
```
public static final java.lang.String SPARK_HOME_DIR
```
    See Also:
    
    Constant Field Values
  - DEFAULT_SPARK_MASTER_PORT
```
public static final java.lang.String DEFAULT_SPARK_MASTER_PORT
```
    See Also:
    
    Constant Field Values
  - DEFAULT_MESOS_MASTER_PORT
```
public static final java.lang.String DEFAULT_MESOS_MASTER_PORT
```
    See Also:
    
    Constant Field Values
  - DEFAULT_HDFS_PORT
```
public static final java.lang.String DEFAULT_HDFS_PORT
```
    See Also:
    
    Constant Field Values
  - HADOOP_FS_DEFAULT_NAME
```
public static final java.lang.String HADOOP_FS_DEFAULT_NAME
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - BaseSparkJobConfig
```
public BaseSparkJobConfig()
```
- Method Detail
  - listOptions
```
public java.util.Enumeration<Option> listOptions()
```
    Specified by:
    
    listOptions in interface OptionHandler
    
    Overrides:
    
    listOptions in class distributed.core.DistributedJobConfig
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface OptionHandler
    
    Overrides:
    
    getOptions in class distributed.core.DistributedJobConfig
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface OptionHandler
    
    Overrides:
    
    setOptions in class distributed.core.DistributedJobConfig
    
    Throws:
    
    java.lang.Exception
  - masterHostTipText
```
public java.lang.String masterHostTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getMasterHost
```
public java.lang.String getMasterHost()
```
    Get the host for the master node. This can be spark://..., mesos:// or yarn-client.
    
    Returns:
    
    the host of the master
  - setMasterHost
```
public void setMasterHost(java.lang.String host)
```
    Set the host for the master node. This can be spark://..., mesos:// or yarn-client.
    
    Parameters:
    
    host - the host of the master
  - masterPortTipText
```
public java.lang.String masterPortTipText()
```
    Tool tip text for this property
    
    Returns:
    
    the tool tip text for this property
  - getMasterPort
```
public java.lang.String getMasterPort()
```
    Get the port for the master node. If no port is specified, the default port for the type of cluster given by the master host setting is used. Note that a port is not needed in yarn-client mode.
    
    Returns:
    
    the port for the master node
  - setMasterPort
```
public void setMasterPort(java.lang.String port)
```
    Set the port for the master node. If no port is specified, the default port for the type of cluster given by the master host setting is used. Note that a port is not needed in yarn-client mode.
    
    Parameters:
    
    port - the port for the master node
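    The port rule above can be illustrated with a minimal sketch. MasterUrlSketch and masterUrl are hypothetical names, not part of BaseSparkJobConfig. The sketch assumes that the master host setting already includes its scheme (spark://... or mesos://...), that the master URL is formed as host:port, and that the caller supplies the default for the cluster type; 7077 appears only as an example value. The class's own logic may differ.

```java
/**
 * Hypothetical helper (not part of BaseSparkJobConfig) showing one way the
 * master host and master port settings could be combined into a master URL.
 */
final class MasterUrlSketch {

  private MasterUrlSketch() {
  }

  /**
   * @param masterHost  e.g. "spark://node1", "mesos://node1" or "yarn-client"
   * @param masterPort  explicit port, or null/blank to fall back to the default
   * @param defaultPort default port for the cluster type (may be null)
   * @return the combined master URL
   */
  static String masterUrl(String masterHost, String masterPort, String defaultPort) {
    if (masterHost == null || masterHost.trim().isEmpty()) {
      throw new IllegalArgumentException("No master host has been set");
    }
    String host = masterHost.trim();

    // yarn-client mode needs no port, so the host setting is used as-is
    if (host.startsWith("yarn")) {
      return host;
    }

    // Use the explicit port if given, otherwise the cluster-type default
    String port = (masterPort == null || masterPort.trim().isEmpty())
        ? defaultPort
        : masterPort.trim();

    return (port == null || port.isEmpty()) ? host : host + ":" + port;
  }

  public static void main(String[] args) {
    System.out.println(masterUrl("spark://node1", null, "7077"));   // spark://node1:7077
    System.out.println(masterUrl("mesos://node1", "5051", "5050")); // mesos://node1:5051
    System.out.println(masterUrl("yarn-client", "7077", null));     // yarn-client
  }
}
```

    Falling back to a caller-supplied default keeps the sketch independent of the actual DEFAULT_SPARK_MASTER_PORT and DEFAULT_MESOS_MASTER_PORT values, which are not listed on this page.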
  - sparkHomeDirectoryTipText
```
public java.lang.String sparkHomeDirectoryTipText()
```
    Tool tip text for this property
    
    Returns:
    
    the tool tip text for this property
  - getSparkHomeDirectory
```
public java.lang.String getSparkHomeDirectory()
```
    Get the root directory of the Spark installation on the slave nodes.
    
    Returns:
    
    the root directory of the Spark installation on the slave nodes
  - setSparkHomeDirectory
```
public void setSparkHomeDirectory(java.lang.String sparkHome)
```
    Set the root directory of the Spark installation on the slave nodes.
    
    Parameters:
    
    sparkHome - the root directory of the Spark installation on the slave nodes
  - availableClusterMemoryTipText
```
public java.lang.String availableClusterMemoryTipText()
```
    Tip text for this property.
    
    Returns:
    
    the tip text for this property.
  - getAvailableClusterMemory
```
public double getAvailableClusterMemory()
```
    Get the total available cluster memory.
    
    Returns:
    
    the total available cluster memory in GB
  - setAvailableClusterMemory
```
public void setAvailableClusterMemory(double m)
```
    Set the total available cluster memory.
    
    Parameters:
    
    m - the total available cluster memory in GB
  - inMemoryDataOverheadFactorTipText
```
public java.lang.String inMemoryDataOverheadFactorTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getInMemoryDataOverheadFactor
```
public double getInMemoryDataOverheadFactor()
```
    Get the overhead factor for data in memory. This is a multiple of the on-disk size of the input data.
    
    Returns:
    
    the overhead factor for data in memory.
  - setInMemoryDataOverheadFactor
```
public void setInMemoryDataOverheadFactor(double f)
```
    Set the overhead factor for data in memory. This is a multiple of the on-disk size of the input data.
    
    Parameters:
    
    f - the overhead factor for data in memory.
  - memoryFractionTipText
```
public java.lang.String memoryFractionTipText()
```
    Tip text for this property.
    
    Returns:
    
    the tip text for this property
  - getMemoryFraction
```
public double getMemoryFraction()
```
    Get the fraction of Java heap to use for Spark's memory cache.
    
    Returns:
    
    the fraction to use
  - setMemoryFraction
```
public void setMemoryFraction(double f)
```
    Set the fraction of Java heap to use for Spark's memory cache.
    
    Parameters:
    
    f - the fraction to use
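    The three memory settings (available cluster memory in GB, in-memory overhead factor, and memory fraction) lend themselves to a quick sizing check. This is a hypothetical sketch, not the class's own logic: MemoryEstimateSketch is an illustrative name, and it assumes the in-memory size of the data is estimated as on-disk size times the overhead factor and compared against cluster memory times the memory fraction.

```java
/**
 * Hypothetical back-of-the-envelope check combining the three memory
 * settings. How BaseSparkJobConfig actually uses them is not documented
 * on this page; this simply multiplies them out.
 */
final class MemoryEstimateSketch {

  private MemoryEstimateSketch() {
  }

  /** Estimated in-memory size (GB) = on-disk size (GB) x overhead factor. */
  static double inMemorySizeGB(double onDiskGB, double overheadFactor) {
    return onDiskGB * overheadFactor;
  }

  /** Estimated cache capacity (GB) = cluster memory (GB) x memory fraction. */
  static double cacheCapacityGB(double clusterMemoryGB, double memoryFraction) {
    if (memoryFraction < 0.0 || memoryFraction > 1.0) {
      throw new IllegalArgumentException("memory fraction must be in [0, 1]");
    }
    return clusterMemoryGB * memoryFraction;
  }

  /** True if the estimated in-memory data fits in the estimated cache. */
  static boolean fitsInCache(double onDiskGB, double overheadFactor,
                             double clusterMemoryGB, double memoryFraction) {
    return inMemorySizeGB(onDiskGB, overheadFactor)
        <= cacheCapacityGB(clusterMemoryGB, memoryFraction);
  }

  public static void main(String[] args) {
    // 10 GB on disk, overhead factor 3, 64 GB of cluster memory, fraction 0.6
    System.out.println(inMemorySizeGB(10, 3));       // 30.0
    System.out.println(cacheCapacityGB(64, 0.6));    // 38.4
    System.out.println(fitsInCache(10, 3, 64, 0.6)); // true
  }
}
```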
  - pathToWekaJarTipText
```
public java.lang.String pathToWekaJarTipText()
```
    Tip text for this property
    
    Returns:
    
    the tip text for this property
  - getPathToWekaJar
```
public java.lang.String getPathToWekaJar()
```
    Get the path to the weka.jar file. This is populated automatically if the classpath contains a weka.jar. The weka.jar is included as a library for the Spark job.
    
    Returns:
    
    the path to the weka.jar.
  - setPathToWekaJar
```
public void setPathToWekaJar(java.lang.String path)
```
    Set the path to the weka.jar file. This is populated automatically if the classpath contains a weka.jar. The weka.jar is included as a library for the Spark job.
    
    Parameters:
    
    path - the path to the weka.jar.
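    The automatic population described above could work roughly as follows. This is a hypothetical sketch (WekaJarLocatorSketch is not part of the class): it assumes detection means scanning the JVM classpath, as reported by the standard java.class.path system property, for an entry whose file name is exactly weka.jar.

```java
import java.io.File;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of auto-detecting weka.jar from a classpath string.
 */
final class WekaJarLocatorSketch {

  private WekaJarLocatorSketch() {
  }

  /** Returns the first classpath entry whose file name is weka.jar, or null. */
  static String findWekaJar(String classPath) {
    if (classPath == null || classPath.isEmpty()) {
      return null;
    }
    // Classpath entries are separated by ':' on Unix and ';' on Windows
    for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
      if (new File(entry).getName().equals("weka.jar")) {
        return entry;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Check the classpath of the running JVM
    String found = findWekaJar(System.getProperty("java.class.path"));
    System.out.println(found == null ? "weka.jar not on classpath" : found);
  }
}
```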
  - wekaPackagesTipText
```
public java.lang.String wekaPackagesTipText()
```
    Tip text for this property.
    
    Returns:
    
    the tip text for this property.
  - getWekaPackages
```
public java.lang.String getWekaPackages()
```
    Get a comma-separated list of the names of additional Weka packages to use with the job. Any jar files in the main package directory and in the package's lib directory will be included as libraries for the Spark job.
    
    Returns:
    
    a comma-separated list of Weka packages to use with the job
  - setWekaPackages
```
public void setWekaPackages(java.lang.String packages)
```
    Set a comma-separated list of the names of additional Weka packages to use with the job. Any jar files in the main package directory and in the package's lib directory will be included as libraries for the Spark job.
    
    Parameters:
    
    packages - a comma-separated list of Weka packages to use with the job
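    A minimal sketch of how the package list might be turned into job libraries. WekaPackageJarsSketch and its methods are hypothetical, and the sketch assumes each named package lives in its own directory under a packages root that is passed in (for a default Weka installation this is typically the packages folder inside wekafiles, but that location is an assumption here). As the description above states, only jar files directly in the package directory and in its lib subdirectory are collected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: resolve a comma-separated Weka package list to the
 * jar files that would be shipped with a Spark job.
 */
final class WekaPackageJarsSketch {

  private WekaPackageJarsSketch() {
  }

  /** Splits a comma-separated package list, trimming blanks and dropping empty entries. */
  static List<String> parsePackageNames(String packages) {
    List<String> names = new ArrayList<>();
    if (packages == null) {
      return names;
    }
    for (String name : packages.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  /** Collects *.jar files from packagesRoot/NAME and packagesRoot/NAME/lib. */
  static List<Path> collectJars(Path packagesRoot, String packages) throws IOException {
    List<Path> jars = new ArrayList<>();
    for (String name : parsePackageNames(packages)) {
      Path pkgDir = packagesRoot.resolve(name);
      addJarsIn(pkgDir, jars);
      addJarsIn(pkgDir.resolve("lib"), jars);
    }
    return jars;
  }

  /** Appends the jars in one directory, sorted for a deterministic order. */
  private static void addJarsIn(Path dir, List<Path> out) throws IOException {
    if (!Files.isDirectory(dir)) {
      return; // e.g. a package without a lib directory
    }
    List<Path> found = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
      for (Path jar : stream) {
        found.add(jar);
      }
    }
    Collections.sort(found);
    out.addAll(found);
  }

  public static void main(String[] args) {
    System.out.println(parsePackageNames(" pkgA, ,pkgB ")); // [pkgA, pkgB]
  }
}
```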
  - getDefaultPortForMaster
```
public java.lang.String getDefaultPortForMaster()
```
    Attempt to get a default port based on the master URL.
    
    Returns:
    
    a default port to use, or null if no master has been set
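    One plausible way to pick a default port from the master URL is to switch on its scheme. In this hypothetical sketch (DefaultPortSketch is not part of the class), 7077 and 5050 are the usual defaults for a Spark standalone master and a Mesos master respectively; they are assumptions, not the documented values of DEFAULT_SPARK_MASTER_PORT or DEFAULT_MESOS_MASTER_PORT. Returning null for yarn-client is also an assumption, based on the note that a port is not needed in that mode.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of choosing a default port from the master URL scheme.
 * The port values are assumed common defaults, not the class's constants.
 */
final class DefaultPortSketch {

  static final String ASSUMED_SPARK_PORT = "7077";
  static final String ASSUMED_MESOS_PORT = "5050";

  private DefaultPortSketch() {
  }

  /** Returns a default port, or null if no master is set or none applies. */
  static String defaultPortForMaster(String masterHost) {
    if (masterHost == null || masterHost.trim().isEmpty()) {
      return null; // no master has been set
    }
    String host = masterHost.trim().toLowerCase(Locale.ROOT);
    if (host.startsWith("spark://")) {
      return ASSUMED_SPARK_PORT;
    }
    if (host.startsWith("mesos://")) {
      return ASSUMED_MESOS_PORT;
    }
    // yarn-client and unrecognized masters: no default port
    return null;
  }

  public static void main(String[] args) {
    for (String master : new String[] {"spark://node1", "mesos://node1", "yarn-client", ""}) {
      System.out.println("'" + master + "' -> " + defaultPortForMaster(master));
    }
  }
}
```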
  - getBaseSparkContext
```
public org.apache.spark.api.java.JavaSparkContext getBaseSparkContext(java.lang.String jobName)
```
    Gets a configured SparkContext.
    
    Parameters:
    
    jobName - the job name to set on the context
    
    Returns:
    
    a configured SparkContext
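    With the real Spark API, a configured context is usually built by setting properties on a SparkConf (through set(key, value), setMaster and setAppName) and passing it to the JavaSparkContext constructor. The stdlib-only sketch below just assembles such a property map. SparkPropertiesSketch is a hypothetical name, and the mapping from this class's settings to the keys spark.app.name, spark.master, spark.home and the legacy Spark 1.x key spark.storage.memoryFraction is an assumption, not a description of what getBaseSparkContext does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the Spark properties a configured context might
 * carry. With the real API each entry would be applied via SparkConf.set().
 */
final class SparkPropertiesSketch {

  private SparkPropertiesSketch() {
  }

  static Map<String, String> baseProperties(String jobName, String masterUrl,
                                            String sparkHome, double memoryFraction) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("spark.app.name", jobName);  // the job name set on the context
    props.put("spark.master", masterUrl);  // e.g. spark://host:port
    if (sparkHome != null && !sparkHome.isEmpty()) {
      props.put("spark.home", sparkHome);  // Spark root on the slave nodes
    }
    // Legacy (Spark 1.x) key: fraction of Java heap used for Spark's memory cache
    props.put("spark.storage.memoryFraction", Double.toString(memoryFraction));
    return props;
  }

  public static void main(String[] args) {
    baseProperties("myJob", "spark://node1:7077", "/opt/spark", 0.6)
        .forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```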
  - addWekaLibrariesToSparkContext
```
public void addWekaLibrariesToSparkContext(org.apache.spark.api.java.JavaSparkContext context,
                                           distributed.core.DistributedJob job)
                                    throws WekaException
```
    Adds the necessary Weka libraries to the supplied SparkContext.
    
    Parameters:
    
    context - the context to add dependencies to
    
    job - the job that is using the context
    
    Throws:
    
    WekaException - if a problem occurs
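    Before dependencies are shipped, the various jar paths (weka.jar, supporting jars such as those named by the *_JAR constants, and package jars) have to be gathered into one list. The hypothetical sketch below (JarListSketch is not part of the class) merges them while dropping blanks and duplicates and preserving order; in real code each resulting path could then be passed to JavaSparkContext.addJar(String). Whether addWekaLibrariesToSparkContext works this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: merge groups of jar paths into one ordered,
 * duplicate-free list suitable for adding to a Spark context.
 */
final class JarListSketch {

  private JarListSketch() {
  }

  @SafeVarargs
  static List<String> mergeJarPaths(Collection<String>... groups) {
    // LinkedHashSet drops duplicates but keeps first-seen order
    Set<String> unique = new LinkedHashSet<>();
    for (Collection<String> group : groups) {
      for (String path : group) {
        if (path != null && !path.trim().isEmpty()) {
          unique.add(path.trim());
        }
      }
    }
    return new ArrayList<>(unique);
  }

  public static void main(String[] args) {
    List<String> jars = mergeJarPaths(
        Arrays.asList("/opt/weka/weka.jar"),
        Arrays.asList("/opt/lib/opencsv.jar", "/opt/weka/weka.jar"));
    System.out.println(jars); // [/opt/weka/weka.jar, /opt/lib/opencsv.jar]
  }
}
```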
