CSVToARFFHeaderMapTask

java.lang.Object
- weka.distributed.CSVToARFFHeaderMapTask

All Implemented Interfaces:

java.io.Serializable, weka.core.OptionHandler
```
public class CSVToARFFHeaderMapTask
extends java.lang.Object
implements weka.core.OptionHandler, java.io.Serializable
```
A map task that processes incoming lines in CSV format and builds up header information. It can be configured with the columns to force to type nominal, string, date, etc. Nominal values can be determined automatically or supplied in advance by the user. Besides determining the format of the columns in the data, it can also compute metadata such as means, modes, counts, and standard deviations. These statistics are encoded in special "summary" attributes in the header file, one for each numeric or nominal attribute in the data.

Version:

$Revision: 13989 $

Author:

Mark Hall (mhall{[at]}pentaho{[dot]}com)

See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`CSVToARFFHeaderMapTask.HeaderAndQuantileDataHolder` Container class for an Instances header with basic summary stats and a map of TDigest quantile estimators for numeric attributes

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`ARFF_SUMMARY_ATTRIBUTE_PREFIX` Attribute name prefix for a summary statistics attribute
`static int`	`MAX_PARSING_ERRORS`

Constructor Summary

Constructors
Constructor and Description
`CSVToARFFHeaderMapTask()` Constructor
`CSVToARFFHeaderMapTask(boolean suppressQuantileOptions)` Constructor
`CSVToARFFHeaderMapTask(boolean suppressQuantileOptions, boolean suppressCSVParsingOptions)` Constructor

Method Summary

Modifier and Type	Method and Description
`static CSVToARFFHeaderMapTask`	`combine(java.util.List<CSVToARFFHeaderMapTask> tasks)` Performs a "combine" operation using the supplied partial CSVToARFFHeaderMapTask tasks.
`java.lang.String`	`compressionLevelForQuartileEstimationTipText()` Returns the tip text for this property.
`java.lang.String`	`computeQuartilesAsPartOfSummaryStatsTipText()` Returns the tip text for this property.
`java.lang.String`	`dateAttributesTipText()` Returns the tip text for this property.
`java.lang.String`	`dateFormatTipText()` Returns the tip text for this property.
`void`	`deSerializeAllQuantileEstimators()` Deserialize all TDigest quantile estimators in use
`java.lang.String`	`enclosureCharactersTipText()` Returns the tip text for this property.
`java.lang.String`	`fieldSeparatorTipText()` Returns the tip text for this property.
`void`	`fromHeader(weka.core.Instances headerWithSummary, java.util.Map<java.lang.String,TDigest> quantileEstimators)` Initialize internal state using the supplied ARFF header with summary attributes.
`void`	`generateNames(int numAtts)` Generate attribute names.
`void`	`generateNames(int initial, int numAtts)` Generate attribute names.
`double`	`getCompressionLevelForQuartileEstimation()` Get the compression level to use in the TDigest quantile estimators
`boolean`	`getComputeQuartilesAsPartOfSummaryStats()` Get whether to include estimated quartiles in the profiling stats
`java.lang.String`	`getDateAttributes()` Returns the current attribute range to be forced to type date.
`java.lang.String`	`getDateFormat()` Get the format to use for parsing date values.
`java.lang.String`	`getDefaultValue(int attIndex)` Get the default label for a given attribute.
`java.lang.String`	`getEnclosureCharacters()` Get the character(s) to use/recognize as string enclosures
`java.lang.String`	`getFieldSeparator()` Returns the character used as column separator.
`weka.core.Instances`	`getHeader()` Get the header information (as an Instances object) from what has been seen so far by this map task
`weka.core.Instances`	`getHeader(int numFields, java.util.List<java.lang.String> attNames)` Get a header constructed using the supplied attribute names.
`CSVToARFFHeaderMapTask.HeaderAndQuantileDataHolder`	`getHeaderAndQuantileEstimators()` Get the header information and the encoded quantile estimators
`java.lang.String`	`getMissingValue()` Returns the current placeholder for missing values.
`java.lang.String`	`getNominalAttributes()` Returns the current attribute range to be forced to type nominal.
`java.lang.Object[]`	`getNominalDefaultLabelSpecs()` Get the default label specifications for nominal attributes
`java.lang.Object[]`	`getNominalLabelSpecs()` Get label specifications for nominal attributes.
`int`	`getNumDecimalPlaces()` Get the number of decimal places for outputting summary stats
`java.lang.String[]`	`getOptions()`
`java.lang.String`	`getStringAttributes()` Returns the current attribute range to be forced to type string.
`boolean`	`getTreatUnparsableNumericValuesAsMissing()` Get whether unparsable values in columns so far assumed to be numeric are treated as missing values.
`boolean`	`getTreatZerosAsMissing()` Get whether to treat zeros as missing values for numeric attributes when computing summary statistics.
`boolean`	`headerAvailableImmediately(int numFields, java.util.List<java.lang.String> attNames, java.lang.StringBuffer problems)` Check if the header can be produced immediately without having to do a pre-processing pass to determine and unify nominal attribute values.
`void`	`initParserOnly(java.util.List<java.lang.String> attNames)` Only initialize enough stuff in order to parse rows and construct instances
`static java.util.List<java.lang.String>`	`instanceHeaderToAttributeNameList(weka.core.Instances header)`
`java.util.Enumeration<weka.core.Option>`	`listOptions()`
`static void`	`main(java.lang.String[] args)`
`weka.core.Instance`	`makeInstance(weka.core.Instances trainingHeader, boolean setStringValues, java.lang.String[] parsed)` Utility method for constructing a dense instance given an array of parsed CSV values
`weka.core.Instance`	`makeInstance(weka.core.Instances trainingHeader, boolean setStringValues, java.lang.String[] parsed, boolean sparse)` Utility method for constructing an instance given an array of parsed CSV values
`weka.core.Instance`	`makeInstanceFromObjectRow(weka.core.Instances trainingHeader, boolean setStringValues, java.lang.Object[] row, boolean sparse)` Utility method for constructing an instance given an array of Objects
`java.lang.String`	`missingValueTipText()` Returns the tip text for this property.
`java.lang.String`	`nominalAttributesTipText()` Returns the tip text for this property.
`java.lang.String`	`nominalDefaultLabelSpecsTipText()` Returns the tip text for this property.
`java.lang.String`	`nominalLabelSpecsTipText()` Returns the tip text for this property.
`java.lang.String[]`	`parseRowOnly(java.lang.String row)` Just parse a row.
`void`	`processRow(java.lang.String row, java.util.List<java.lang.String> attNames)` Process a row of data coming into the map.
`void`	`processRowValues(java.lang.Object[] fieldVals, java.util.List<java.lang.String> attNames)` Process a tokenized row of values.
`void`	`serializeAllQuantileEstimators()` Serialize all TDigest quantile estimators in use
`void`	`setCompressionLevelForQuartileEstimation(double compression)` Set the compression level to use in the TDigest quantile estimators
`void`	`setComputeQuartilesAsPartOfSummaryStats(boolean c)` Set whether to include estimated quartiles in the profiling stats
`void`	`setDateAttributes(java.lang.String value)` Set the attribute range to be forced to type date.
`void`	`setDateFormat(java.lang.String value)` Set the format to use for parsing date values.
`void`	`setEnclosureCharacters(java.lang.String enclosure)` Set the character(s) to use/recognize as string enclosures
`void`	`setFieldSeparator(java.lang.String value)` Sets the character used as column separator.
`void`	`setMissingValue(java.lang.String value)` Sets the placeholder for missing values.
`void`	`setNominalAttributes(java.lang.String value)` Sets the attribute range to be forced to type nominal.
`void`	`setNominalDefaultLabelSpecs(java.lang.Object[] specs)` Set the default label specifications for nominal attributes
`void`	`setNominalLabelSpecs(java.lang.Object[] specs)` Set label specifications for nominal attributes.
`void`	`setNumDecimalPlaces(int numDecimalPlaces)` Set the number of decimal places for outputting summary stats
`void`	`setOptions(java.lang.String[] options)`
`void`	`setStringAttributes(java.lang.String value)` Sets the attribute range to be forced to type string.
`void`	`setTreatUnparsableNumericValuesAsMissing(boolean unparsableNumericValuesToMissing)` Set whether unparsable values in columns so far assumed to be numeric are treated as missing values.
`void`	`setTreatZerosAsMissing(boolean t)` Set whether to treat zeros as missing values for numeric attributes when computing summary statistics.
`java.lang.String`	`stringAttributesTipText()` Returns the tip text for this property.
`static void`	`updateSummaryStats(java.util.Map<java.lang.String,Stats> summaryStats, java.util.Map<java.lang.String,StringStats> backupStringStats, java.lang.String attName, double value, java.lang.String nominalLabel, boolean isNominal, boolean isString, boolean treatZeroAsMissing, boolean estimateQuantiles, double quantileCompression)` Update the summary statistics for a given attribute with the given value

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface weka.core.OptionHandler
makeCopy

- Field Detail
  - ARFF_SUMMARY_ATTRIBUTE_PREFIX
```
public static final java.lang.String ARFF_SUMMARY_ATTRIBUTE_PREFIX
```
    Attribute name prefix for a summary statistics attribute
    
    See Also:
    
    Constant Field Values
  - MAX_PARSING_ERRORS
```
public static final int MAX_PARSING_ERRORS
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - CSVToARFFHeaderMapTask
```
public CSVToARFFHeaderMapTask()
```
    Constructor
  - CSVToARFFHeaderMapTask
```
public CSVToARFFHeaderMapTask(boolean suppressQuantileOptions)
```
    Constructor
    
    Parameters:
    
    suppressQuantileOptions - true if commandline options relating to quantile estimation are to be suppressed
  - CSVToARFFHeaderMapTask
```
public CSVToARFFHeaderMapTask(boolean suppressQuantileOptions,
                              boolean suppressCSVParsingOptions)
```
    Constructor
    
    Parameters:
    
    suppressQuantileOptions - true if command line options relating to quantile estimation are to be suppressed
    
    suppressCSVParsingOptions - true if command line options relating to CSV parsing are to be suppressed
- Method Detail
  - updateSummaryStats
```
public static void updateSummaryStats(java.util.Map<java.lang.String,Stats> summaryStats,
                                      java.util.Map<java.lang.String,StringStats> backupStringStats,
                                      java.lang.String attName,
                                      double value,
                                      java.lang.String nominalLabel,
                                      boolean isNominal,
                                      boolean isString,
                                      boolean treatZeroAsMissing,
                                      boolean estimateQuantiles,
                                      double quantileCompression)
```
    Update the summary statistics for a given attribute with the given value
    
    Parameters:
    
    summaryStats - the map of summary statistics
    
    backupStringStats - the temporary map of backup string stats kept for numeric fields (this can be null when there is no chance of unparsable numeric values occurring)
    
    attName - the name of the attribute being updated
    
    value - the value to update with (if the attribute is numeric)
    
    nominalLabel - holds the label/string for the attribute (if it is nominal or string)
    
    isNominal - true if the attribute is nominal
    
    isString - true if the attribute is a string attribute
    
    treatZeroAsMissing - treats zero as missing value for numeric attributes
    
    estimateQuantiles - true if we should estimate quantiles too
    
    quantileCompression - the compression level to use in the TDigest estimators
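    As a rough illustration of what updating a numeric attribute's summary statistics with `treatZeroAsMissing` might involve, the sketch below keeps a running count, sum, sum of squares, minimum and maximum. `RunningStats` and all of its members are hypothetical names introduced here; the actual `Stats` class used by this method may work differently.

```java
// Hypothetical sketch; not the distributed Weka implementation.
final class RunningStats {
    long count;    // number of non-missing values seen
    long missing;  // number of values treated as missing
    double sum;
    double sumSq;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Fold one value into the running statistics. */
    void update(double value, boolean treatZeroAsMissing) {
        // Assumption: NaN marks a missing value, and zeros optionally do too.
        if (Double.isNaN(value) || (treatZeroAsMissing && value == 0.0)) {
            missing++;
            return;
        }
        count++;
        sum += value;
        sumSq += value * value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Sample standard deviation; NaN when fewer than two values were seen. */
    double stdDev() {
        if (count < 2) {
            return Double.NaN;
        }
        double variance = (sumSq - (sum * sum) / count) / (count - 1);
        return Math.sqrt(Math.max(0.0, variance));
    }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        for (double v : new double[] {2.0, 0.0, 4.0}) {
            s.update(v, true); // the zero is counted as missing
        }
        System.out.println(s.count + " values, " + s.missing + " missing, mean " + s.mean());
    }
}
```

    Keeping only sums and counts means the statistics can be updated one value at a time without storing the data, which is what a map task processing a stream of rows needs.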
  - instanceHeaderToAttributeNameList
```
public static java.util.List<java.lang.String> instanceHeaderToAttributeNameList(weka.core.Instances header)
```
  - main
```
public static void main(java.lang.String[] args)
```
  - combine
```
public static CSVToARFFHeaderMapTask combine(java.util.List<CSVToARFFHeaderMapTask> tasks)
                                      throws DistributedWekaException
```
    Performs a "combine" operation using the supplied partial CSVToARFFHeaderMapTask tasks. This is essentially a reduce operation, but returns a single CSVToARFFHeaderMapTask object (rather than the final header that is produced by CSVToARFFHeaderReduceTask). This allows several reduce stages to be implemented (if desired) or partial reduces to occur in parallel.
    
    Parameters:
    
    tasks - a list of CSVToARFFHeaderMapTasks to "combine"
    
    Returns:
    
    a CSVToARFFHeaderMapTask with the merged state
    
    Throws:
    
    DistributedWekaException - if a problem occurs
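    The idea of a "combine" that can itself be combined again can be sketched with plain partial statistics. `PartialStats` below is a hypothetical stand-in for the merged state of a map task, not the class's real internals; it only shows why returning the same type as the inputs allows multi-stage or parallel partial reduces.

```java
import java.util.List;

// Hypothetical sketch; not the distributed Weka implementation.
final class PartialStats {
    final long count;
    final double sum;
    final double min;
    final double max;

    PartialStats(long count, double sum, double min, double max) {
        this.count = count;
        this.sum = sum;
        this.min = min;
        this.max = max;
    }

    /** Merge partial results. The result has the same type, so it can be merged again. */
    static PartialStats combine(List<PartialStats> parts) {
        long count = 0;
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (PartialStats p : parts) {
            count += p.count;
            sum += p.sum;
            min = Math.min(min, p.min);
            max = Math.max(max, p.max);
        }
        return new PartialStats(count, sum, min, max);
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        PartialStats a = new PartialStats(2, 6.0, 2.0, 4.0);
        PartialStats b = new PartialStats(3, 9.0, 1.0, 5.0);
        PartialStats merged = combine(List.of(a, b));
        System.out.println(merged.count + " values, mean " + merged.mean());
    }
}
```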
  - listOptions
```
public java.util.Enumeration<weka.core.Option> listOptions()
```
    Specified by:
    
    listOptions in interface weka.core.OptionHandler
  - getOptions
```
public java.lang.String[] getOptions()
```
    Specified by:
    
    getOptions in interface weka.core.OptionHandler
  - setOptions
```
public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
```
    Specified by:
    
    setOptions in interface weka.core.OptionHandler
    
    Throws:
    
    java.lang.Exception
  - setNumDecimalPlaces
```
public void setNumDecimalPlaces(int numDecimalPlaces)
```
    Set the number of decimal places for outputting summary stats
    
    Parameters:
    
    numDecimalPlaces - number of decimal places to use
  - getNumDecimalPlaces
```
public int getNumDecimalPlaces()
```
    Get the number of decimal places for outputting summary stats
    
    Returns:
    
    number of decimal places to use
  - setTreatUnparsableNumericValuesAsMissing
```
public void setTreatUnparsableNumericValuesAsMissing(boolean unparsableNumericValuesToMissing)
```
    Set whether unparsable values in columns so far assumed to be numeric are treated as missing values.
    
    Parameters:
    
    unparsableNumericValuesToMissing - true if unparsable numeric values are to be treated as missing
  - getTreatUnparsableNumericValuesAsMissing
```
public boolean getTreatUnparsableNumericValuesAsMissing()
```
    Get whether unparsable values in columns so far assumed to be numeric are treated as missing values.
    
    Returns:
    
    true if unparsable numeric values are to be treated as missing
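    A minimal sketch of what this flag could mean when a field is parsed, assuming that a missing value is represented as NaN. `NumericFieldParser` and its `parse` method are hypothetical and are not part of distributed Weka.

```java
// Hypothetical sketch; not the distributed Weka implementation.
final class NumericFieldParser {
    /**
     * Parse a field from a column believed to be numeric.
     * NaN stands in for "missing" in this sketch.
     */
    static double parse(String field, String missingMarker, boolean unparsableAsMissing) {
        String trimmed = field.trim();
        if (trimmed.equals(missingMarker)) {
            return Double.NaN; // explicit missing-value placeholder
        }
        try {
            return Double.parseDouble(trimmed);
        } catch (NumberFormatException e) {
            if (unparsableAsMissing) {
                return Double.NaN; // quietly treat the bad value as missing
            }
            throw e; // otherwise let the caller decide what to do with the column
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("3.5", "?", true));
        System.out.println(Double.isNaN(parse("n/a", "?", true)));
    }
}
```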
  - getTreatZerosAsMissing
```
public boolean getTreatZerosAsMissing()
```
    Get whether to treat zeros as missing values for numeric attributes when computing summary statistics.
    
    Returns:
    
    true if zeros are to be treated as missing values for the purposes of computing summary stats.
  - setTreatZerosAsMissing
```
public void setTreatZerosAsMissing(boolean t)
```
    Set whether to treat zeros as missing values for numeric attributes when computing summary statistics.
    
    Parameters:
    
    t - true if zeros are to be treated as missing values for the purposes of computing summary stats.
  - getCompressionLevelForQuartileEstimation
```
public double getCompressionLevelForQuartileEstimation()
```
    Get the compression level to use in the TDigest quantile estimators
    
    Returns:
    
    the compression level (smaller values give higher compression and less accurate estimates).
  - setCompressionLevelForQuartileEstimation
```
public void setCompressionLevelForQuartileEstimation(double compression)
```
    Set the compression level to use in the TDigest quantile estimators
    
    Parameters:
    
    compression - the compression level (smaller values give higher compression and less accurate estimates).
  - compressionLevelForQuartileEstimationTipText
```
public java.lang.String compressionLevelForQuartileEstimationTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getComputeQuartilesAsPartOfSummaryStats
```
public boolean getComputeQuartilesAsPartOfSummaryStats()
```
    Get whether to include estimated quartiles in the profiling stats
    
    Returns:
    
    true if quartiles are to be estimated
  - setComputeQuartilesAsPartOfSummaryStats
```
public void setComputeQuartilesAsPartOfSummaryStats(boolean c)
```
    Set whether to include estimated quartiles in the profiling stats
    
    Parameters:
    
    c - true if quartiles are to be estimated
  - computeQuartilesAsPartOfSummaryStatsTipText
```
public java.lang.String computeQuartilesAsPartOfSummaryStatsTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getMissingValue
```
public java.lang.String getMissingValue()
```
    Returns the current placeholder for missing values.
    
    Returns:
    
    the placeholder
  - setMissingValue
```
public void setMissingValue(java.lang.String value)
```
    Sets the placeholder for missing values.
    
    Parameters:
    
    value - the placeholder
  - missingValueTipText
```
public java.lang.String missingValueTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getStringAttributes
```
public java.lang.String getStringAttributes()
```
    Returns the current attribute range to be forced to type string.
    
    Returns:
    
    the range
  - setStringAttributes
```
public void setStringAttributes(java.lang.String value)
```
    Sets the attribute range to be forced to type string.
    
    Parameters:
    
    value - the range
  - stringAttributesTipText
```
public java.lang.String stringAttributesTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getNominalAttributes
```
public java.lang.String getNominalAttributes()
```
    Returns the current attribute range to be forced to type nominal.
    
    Returns:
    
    the range
  - setNominalAttributes
```
public void setNominalAttributes(java.lang.String value)
```
    Sets the attribute range to be forced to type nominal.
    
    Parameters:
    
    value - the range
  - nominalAttributesTipText
```
public java.lang.String nominalAttributesTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getDateFormat
```
public java.lang.String getDateFormat()
```
    Get the format to use for parsing date values.
    
    Returns:
    
    the format to use for parsing date values.
  - setDateFormat
```
public void setDateFormat(java.lang.String value)
```
    Set the format to use for parsing date values.
    
    Parameters:
    
    value - the format to use.
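    The sketch below shows how a date format string might be applied to a field, assuming the format follows `java.text.SimpleDateFormat` pattern syntax and that dates are stored as milliseconds since the epoch. Both points are assumptions not stated on this page, and `DateFieldParser` is a hypothetical helper.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical sketch; not the distributed Weka implementation.
final class DateFieldParser {
    /** Parse a date field with the given pattern and return epoch milliseconds. */
    static long toEpochMillis(String field, String pattern) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);                        // reject dates such as 2014-02-31
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone keeps results reproducible
        return format.parse(field).getTime();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(toEpochMillis("1970-01-02", "yyyy-MM-dd"));
    }
}
```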
  - dateFormatTipText
```
public java.lang.String dateFormatTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getDateAttributes
```
public java.lang.String getDateAttributes()
```
    Returns the current attribute range to be forced to type date.
    
    Returns:
    
    the range.
  - setDateAttributes
```
public void setDateAttributes(java.lang.String value)
```
    Set the attribute range to be forced to type date.
    
    Parameters:
    
    value - the range
  - dateAttributesTipText
```
public java.lang.String dateAttributesTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - enclosureCharactersTipText
```
public java.lang.String enclosureCharactersTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getEnclosureCharacters
```
public java.lang.String getEnclosureCharacters()
```
    Get the character(s) to use/recognize as string enclosures
    
    Returns:
    
    the characters to use as string enclosures
  - setEnclosureCharacters
```
public void setEnclosureCharacters(java.lang.String enclosure)
```
    Set the character(s) to use/recognize as string enclosures
    
    Parameters:
    
    enclosure - the characters to use as string enclosures
  - getFieldSeparator
```
public java.lang.String getFieldSeparator()
```
    Returns the character used as column separator.
    
    Returns:
    
    the character to use
  - setFieldSeparator
```
public void setFieldSeparator(java.lang.String value)
```
    Sets the character used as column separator.
    
    Parameters:
    
    value - the character to use
  - fieldSeparatorTipText
```
public java.lang.String fieldSeparatorTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - nominalDefaultLabelSpecsTipText
```
public java.lang.String nominalDefaultLabelSpecsTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getNominalDefaultLabelSpecs
```
public java.lang.Object[] getNominalDefaultLabelSpecs()
```
    Get the default label specifications for nominal attributes
    
    Returns:
    
    an array of default label specifications
  - setNominalDefaultLabelSpecs
```
public void setNominalDefaultLabelSpecs(java.lang.Object[] specs)
```
    Set the default label specifications for nominal attributes
    
    Parameters:
    
    specs - an array of default label specifications
  - nominalLabelSpecsTipText
```
public java.lang.String nominalLabelSpecsTipText()
```
    Returns the tip text for this property.
    
    Returns:
    
    tip text for this property suitable for displaying in the explorer/experimenter gui
  - getNominalLabelSpecs
```
public java.lang.Object[] getNominalLabelSpecs()
```
    Get label specifications for nominal attributes.
    
    Returns:
    
    an array of label specifications
  - setNominalLabelSpecs
```
public void setNominalLabelSpecs(java.lang.Object[] specs)
```
    Set label specifications for nominal attributes.
    
    Parameters:
    
    specs - an array of label specifications
  - generateNames
```
public void generateNames(int initial,
                          int numAtts)
```
    Generate attribute names. Each name is "att" followed by a number, starting at initial and ending at initial+numAtts-1 (for example, initial=5 and numAtts=2 give "att5" and "att6").
    
    Parameters:
    
    initial - the number to use for the first attribute
    
    numAtts - the number of attributes to generate
  - generateNames
```
public void generateNames(int numAtts)
```
    Generate attribute names. Attributes are named "att0", "att1", ..., up to "att" followed by numAtts-1.
    
    Parameters:
    
    numAtts - the number of attribute names to generate
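    The naming scheme described for the two `generateNames` overloads can be sketched as follows. The real methods return `void` and presumably store the names internally; the hypothetical `AttributeNames` helper returns them so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the distributed Weka implementation.
final class AttributeNames {
    /** Names "att<initial>" through "att<initial + numAtts - 1>". */
    static List<String> generate(int initial, int numAtts) {
        List<String> names = new ArrayList<>();
        for (int i = initial; i < initial + numAtts; i++) {
            names.add("att" + i);
        }
        return names;
    }

    /** Names "att0" through "att<numAtts - 1>". */
    static List<String> generate(int numAtts) {
        return generate(0, numAtts);
    }

    public static void main(String[] args) {
        System.out.println(generate(3));
        System.out.println(generate(5, 2));
    }
}
```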
  - initParserOnly
```
public void initParserOnly(java.util.List<java.lang.String> attNames)
```
    Only initialize enough stuff in order to parse rows and construct instances
    
    Parameters:
    
    attNames - the names of the attributes to use
  - parseRowOnly
```
public java.lang.String[] parseRowOnly(java.lang.String row)
                                throws java.io.IOException
```
    Just parse a row.
    
    Parameters:
    
    row - the row to parse
    
    Returns:
    
    the values of the row in an array
    
    Throws:
    
    java.io.IOException - if a problem occurs
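    To make the roles of the field separator and the enclosure characters concrete, here is a deliberately small row splitter. It is a sketch only: `SimpleRowSplitter` is a hypothetical name, it accepts a single-character separator, and it does not handle escaped enclosure characters, all of which the real parser may treat differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the distributed Weka implementation.
final class SimpleRowSplitter {
    /**
     * Split a row on the separator. Any character in enclosures opens a quoted
     * section that runs until the same character appears again; separators
     * inside a quoted section are kept as part of the field.
     */
    static String[] split(String row, char separator, String enclosures) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inEnclosure = false;
        char openQuote = '\0';
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (inEnclosure) {
                if (c == openQuote) {
                    inEnclosure = false; // closing enclosure character
                } else {
                    current.append(c);
                }
            } else if (enclosures.indexOf(c) >= 0) {
                inEnclosure = true;      // opening enclosure character
                openQuote = c;
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing separator
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] parsed = split("a,\"b,c\",d", ',', "\"'");
        System.out.println(String.join(" | ", parsed));
    }
}
```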
  - processRowValues
```
public void processRowValues(java.lang.Object[] fieldVals,
                             java.util.List<java.lang.String> attNames)
                      throws DistributedWekaException,
                             java.io.IOException
```
    Process a tokenized row of values. attNames is optional and is only relevant for the first row; if it is not supplied, names are generated when the first row of data is received. An exception is raised for any subsequent row that does not have the same number of fields as the first row.
    
    Parameters:
    
    fieldVals - the row values to process
    
    attNames - the names of the attributes (fields)
    
    Throws:
    
    an exception if the number of fields in the current row does not match the number of attribute names; the declared exception types are:
    
    DistributedWekaException
    
    java.io.IOException
  - processRow
```
public void processRow(java.lang.String row,
                       java.util.List<java.lang.String> attNames)
                throws DistributedWekaException,
                       java.io.IOException
```
    Process a row of data coming into the map. The row is split into fields, and the task is initialized if this is the first row seen. attNames is optional and is only relevant for the first row; if it is not supplied, names are generated when the first row of data is received. An exception is raised for any subsequent row that does not have the same number of fields as the first row.
    
    Parameters:
    
    row - the row to process
    
    attNames - the names of the attributes (fields)
    
    Throws:
    
    an exception if the number of fields in the current row does not match the number of attribute names; the declared exception types are:
    
    DistributedWekaException
    
    java.io.IOException
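    The first-row behaviour described for `processRow` and `processRowValues` (use the supplied names or generate them, then reject rows with a different field count) might look roughly like this. `RowWidthChecker` is hypothetical, and the choice of `IOException` for the mismatch is an assumption; this page does not say which of the declared exceptions is used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the distributed Weka implementation.
final class RowWidthChecker {
    private List<String> attNames; // null until the first row has been seen

    /** Initialize names on the first row, then check every row's field count. */
    void processRowValues(Object[] fieldVals, List<String> suppliedNames) throws IOException {
        if (attNames == null) {
            if (suppliedNames != null) {
                attNames = new ArrayList<>(suppliedNames);
            } else {
                // No names supplied: generate att0, att1, ... from the first row's width.
                attNames = new ArrayList<>();
                for (int i = 0; i < fieldVals.length; i++) {
                    attNames.add("att" + i);
                }
            }
        }
        if (fieldVals.length != attNames.size()) {
            throw new IOException("Expected " + attNames.size()
                + " fields but got " + fieldVals.length);
        }
        // A real task would now update types and summary statistics per field.
    }

    List<String> names() {
        return attNames;
    }

    public static void main(String[] args) throws IOException {
        RowWidthChecker checker = new RowWidthChecker();
        checker.processRowValues(new Object[] {"1", "2", "3"}, null);
        System.out.println(checker.names());
    }
}
```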
  - getHeader
```
public weka.core.Instances getHeader()
```
    Get the header information (as an Instances object) from what has been seen so far by this map task
    
    Returns:
    
    the header information as an Instances object
  - getHeaderAndQuantileEstimators
```
public CSVToARFFHeaderMapTask.HeaderAndQuantileDataHolder getHeaderAndQuantileEstimators()
                                                                                  throws DistributedWekaException
```
    Get the header information and the encoded quantile estimators
    
    Returns:
    
    a holder instance containing both the header information and encoded quantile estimators
    
    Throws:
    
    DistributedWekaException - if we are not computing summary statistics or we are computing statistics but not quantiles
  - serializeAllQuantileEstimators
```
public void serializeAllQuantileEstimators()
```
    Serialize all TDigest quantile estimators in use
  - deSerializeAllQuantileEstimators
```
public void deSerializeAllQuantileEstimators()
```
    Deserialize all TDigest quantile estimators in use
  - headerAvailableImmediately
```
public boolean headerAvailableImmediately(int numFields,
                                          java.util.List<java.lang.String> attNames,
                                          java.lang.StringBuffer problems)
```
    Check if the header can be produced immediately without having to do a pre-processing pass to determine and unify nominal attribute values. All types should be specified via the ranges and nominal label specs.
    
    Parameters:
    
    numFields - number of fields in the data
    
    attNames - the names of the attributes (in order)
    
    problems - a StringBuffer to hold problem descriptions (if any)
    
    Returns:
    
    true if the header can be generated immediately without a pre-processing job
  - getHeader
```
public weka.core.Instances getHeader(int numFields,
                                     java.util.List<java.lang.String> attNames)
                              throws DistributedWekaException
```
    Get a header constructed using the supplied attribute names. This should only be called in the situation where the data does not require a pre-processing pass to determine and unify nominal attribute values. All types should be specified via the ranges and nominal label specifications.
    
    Parameters:
    
    numFields - the number of attributes in the data
    
    attNames - the attribute names to use. May be null, in which case names are generated
    
    Returns:
    
    an Instances object encapsulating header information
    
    Throws:
    
    DistributedWekaException - if nominal attributes have been specified but one or more of them have no user-supplied label specifications
  - fromHeader
```
public void fromHeader(weka.core.Instances headerWithSummary,
                       java.util.Map<java.lang.String,TDigest> quantileEstimators)
                throws DistributedWekaException
```
    Initialize internal state using the supplied ARFF header with summary attributes. Assumes that setOptions() has already been called on this instance of CSVToARFFHeaderMapTask.
    
    Parameters:
    
    headerWithSummary - the ARFF header (with summary attributes) to initialize with
    
    quantileEstimators - a map (keyed by attribute name) of TDigest estimators for numeric attributes (can be null if quantiles are not being estimated)
    
    Throws:
    
    DistributedWekaException - if a problem occurs
  - makeInstance
```
public weka.core.Instance makeInstance(weka.core.Instances trainingHeader,
                                       boolean setStringValues,
                                       java.lang.String[] parsed)
                                throws java.lang.Exception
```
    Utility method for constructing a dense instance given an array of parsed CSV values
    
    Parameters:
    
    trainingHeader - the header to associate the instance with. Does not add the new instance to this data set; just gives the instance a reference to the header
    
    setStringValues - true if any string values should be set in the header as opposed to being added to the header (i.e. accumulating in the header).
    
    parsed - the array of parsed CSV values
    
    Returns:
    
    an Instance
    
    Throws:
    
    java.lang.Exception - if a problem occurs
  - makeInstance
```
public weka.core.Instance makeInstance(weka.core.Instances trainingHeader,
                                       boolean setStringValues,
                                       java.lang.String[] parsed,
                                       boolean sparse)
                                throws java.lang.Exception
```
    Utility method for constructing an instance given an array of parsed CSV values
    
    Parameters:
    
    trainingHeader - the header to associate the instance with. Does not add the new instance to this data set; just gives the instance a reference to the header
    
    setStringValues - true if any string values should be set in the header as opposed to being added to the header (i.e. accumulating in the header).
    
    parsed - the array of parsed CSV values
    
    sparse - true if the new instance is to be a sparse instance
    
    Returns:
    
    an Instance
    
    Throws:
    
    java.lang.Exception - if a problem occurs
  - makeInstanceFromObjectRow
```
public weka.core.Instance makeInstanceFromObjectRow(weka.core.Instances trainingHeader,
                                                    boolean setStringValues,
                                                    java.lang.Object[] row,
                                                    boolean sparse)
                                             throws java.lang.Exception
```
    Utility method for constructing an instance given an array of Objects
    
    Parameters:
    
    trainingHeader - the header to associate the instance with. Does not add the new instance to this data set; just gives the instance a reference to the header
    
    setStringValues - true if any string values should be set in the header as opposed to being added to the header (i.e. accumulating in the header).
    
    row - the array of Object values
    
    sparse - true if the new instance is to be a sparse instance
    
    Returns:
    
    an Instance
    
    Throws:
    
    java.lang.Exception - if a problem occurs
  - getDefaultValue
```
public java.lang.String getDefaultValue(int attIndex)
```
    Get the default label for a given attribute. May be null if a default value hasn't been specified
    
    Parameters:
    
    attIndex - the index (0-based) of the attribute to get the default value for
    
    Returns:
    
    the default value or null (if a default has not been specified)
