public class CSVLoader extends AbstractFileLoader implements BatchConverter, IncrementalConverter, OptionHandler
-H No header row present in the data.
-N <range> The range of attributes to force type to be NOMINAL. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-L <nominal label spec> Optional specification of legal labels for nominal attributes. May be specified multiple times. Batch mode can determine this automatically (and so can incremental mode if the first in-memory buffer load of instances contains an example of each legal value). The spec contains two parts separated by a ":". The first part can be a range of attribute indexes or a comma-separated list of attribute names; the second part is a comma-separated list of labels. E.g. "1,2,4-6:red,green,blue" or "att1,att2:red,green,blue"
-S <range> The range of attributes to force type to be STRING. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-D <range> The range of attributes to force type to be DATE. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-format <date format> The date formatting string to use to parse date values. (default: "yyyy-MM-dd'T'HH:mm:ss")
-R <range> The range of attributes to force type to be NUMERIC. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-M <str> The string representing a missing value. (default: ?)
-F <separator> The field separator to be used. '\t' can be used as well. (default: ',')
-E <enclosures> The enclosure character(s) to use for strings. Specify as a comma-separated list, e.g. ",' (default: ",')
-B <num> The size of the in-memory buffer (in rows). (default: 100)
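The range syntax shared by -N, -S, -D and -R can be illustrated with a small stand-alone sketch. It assumes 1-based indexes (as the examples above suggest) and expands a range string into 0-based positions. The names RangeSpecSketch, expand and toIndex are hypothetical and are not part of the loader's API; the loader's own range handling may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: expands range strings such as "1,4,5-27,50-last". */
final class RangeSpecSketch {

    /** Converts "first", "last" or a 1-based number into a 0-based index. */
    static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) {
            return 0;
        }
        if (t.equalsIgnoreCase("last")) {
            return numAttributes - 1;
        }
        int oneBased = Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + t);
        }
        return oneBased - 1;
    }

    /** Returns the 0-based indexes selected by the range, in order of appearance. */
    static List<Integer> expand(String range, int numAttributes) {
        List<Integer> selected = new ArrayList<>();
        for (String part : range.split(",")) {
            // A part is either a single index or "low-high".
            String[] bounds = part.trim().split("-", 2);
            int low = toIndex(bounds[0], numAttributes);
            int high = bounds.length == 2 ? toIndex(bounds[1], numAttributes) : low;
            if (low > high) {
                throw new IllegalArgumentException("Descending range: " + part);
            }
            for (int i = low; i <= high; i++) {
                if (!selected.contains(i)) {
                    selected.add(i);
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // With 10 attributes, "9-last" covers the last two columns.
        System.out.println(expand("1,4,5-7,9-last", 10));
    }
}
```

Under these assumptions, "first-last" selects every attribute, which is why it is a convenient way to force a single type for the whole file.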
Loader.StructureNotReadyException
Modifier and Type | Field and Description |
---|---|
static java.lang.String | FILE_EXTENSION The file extension. |
Fields inherited from class AbstractFileLoader: FILE_EXTENSION_COMPRESSED
Fields inherited from interface Loader: BATCH, INCREMENTAL, NONE
Constructor and Description |
---|
CSVLoader() Default constructor. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | bufferSizeTipText() Returns the tip text for this property. |
java.lang.String | dateAttributesTipText() Returns the tip text for this property. |
java.lang.String | dateFormatTipText() Returns the tip text for this property. |
java.lang.String | enclosureCharactersTipText() Returns the tip text for this property. |
java.lang.String | fieldSeparatorTipText() Returns the tip text for this property. |
int | getBufferSize() Get the buffer size to use, i.e. the number of rows held in the in-memory buffer. |
Instances | getDataSet() Return the full data set. |
java.lang.String | getDateAttributes() Returns the current attribute range to be forced to type date. |
java.lang.String | getDateFormat() Get the format to use for parsing date values. |
java.lang.String | getEnclosureCharacters() Get the character(s) to use/recognize as string enclosures. |
java.lang.String | getFieldSeparator() Returns the character used as column separator. |
java.lang.String | getFileDescription() Get a one-line description of the type of file. |
java.lang.String | getFileExtension() Get the file extension used for this type of file. |
java.lang.String[] | getFileExtensions() Gets all the file extensions used for this type of file. |
java.lang.String | getMissingValue() Returns the current placeholder for missing values. |
Instance | getNextInstance(Instances structure) Read the data set incrementally: get the next instance in the data set, or return null if there are no more instances to get. |
boolean | getNoHeaderRowPresent() Get whether there is no header row in the data. |
java.lang.String | getNominalAttributes() Returns the current attribute range to be forced to type nominal. |
java.lang.Object[] | getNominalLabelSpecs() Get label specifications for nominal attributes. |
java.lang.String | getNumericAttributes() Gets the attribute range to be forced to type numeric. |
java.lang.String[] | getOptions() Gets the current option settings for the OptionHandler. |
java.lang.String | getRevision() Returns the revision string. |
java.lang.String | getStringAttributes() Returns the current attribute range to be forced to type string. |
Instances | getStructure() Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances. |
java.lang.String | globalInfo() Returns a string describing this loader. |
java.util.Enumeration<Option> | listOptions() Returns an enumeration of all the available options. |
static void | main(java.lang.String[] args) Main method. |
java.lang.String | missingValueTipText() Returns the tip text for this property. |
java.lang.String | noHeaderRowPresentTipText() Returns the tip text for this property. |
java.lang.String | nominalAttributesTipText() Returns the tip text for this property. |
java.lang.String | nominalLabelSpecsTipText() Returns the tip text for this property. |
java.lang.String | numericAttributesTipText() Returns the tip text for this property. |
void | reset() Resets the loader ready to read a new data set. |
void | setBufferSize(int buff) Set the buffer size to use, i.e. the number of rows held in the in-memory buffer. |
void | setDateAttributes(java.lang.String value) Set the attribute range to be forced to type date. |
void | setDateFormat(java.lang.String value) Set the format to use for parsing date values. |
void | setEnclosureCharacters(java.lang.String enclosure) Set the character(s) to use/recognize as string enclosures. |
void | setFieldSeparator(java.lang.String value) Sets the character used as column separator. |
void | setMissingValue(java.lang.String value) Sets the placeholder for missing values. |
void | setNoHeaderRowPresent(boolean b) Set whether there is no header row in the data. |
void | setNominalAttributes(java.lang.String value) Sets the attribute range to be forced to type nominal. |
void | setNominalLabelSpecs(java.lang.Object[] specs) Set label specifications for nominal attributes. |
void | setNumericAttributes(java.lang.String value) Sets the attribute range to be forced to type numeric. |
void | setOptions(java.lang.String[] options) Sets the OptionHandler's options using the given list. |
void | setSource(java.io.File file) Resets the Loader object and sets the source of the data set to be the supplied File object. |
void | setSource(java.io.InputStream input) Resets the Loader object and sets the source of the data set to be the supplied Stream object. |
void | setStringAttributes(java.lang.String value) Sets the attribute range to be forced to type string. |
java.lang.String | stringAttributesTipText() Returns the tip text for this property. |
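Methods inherited from class AbstractFileLoader: getUseRelativePath, retrieveFile, runFileLoader, setEnvironment, setFile, setUseRelativePath, useRelativePathTipText
Methods inherited from class AbstractLoader: setRetrieval
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OptionHandler: makeCopy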
public static void main(java.lang.String[] args)
Parameters: args - should contain the name of an input file.
public java.lang.String globalInfo()
public java.lang.String getFileExtension()
Specified by: getFileExtension in interface FileSourcedConverter
public java.lang.String[] getFileExtensions()
Specified by: getFileExtensions in interface FileSourcedConverter
public java.lang.String getFileDescription()
Specified by: getFileDescription in interface FileSourcedConverter
public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler
public java.lang.String noHeaderRowPresentTipText()
public boolean getNoHeaderRowPresent()
public void setNoHeaderRowPresent(boolean b)
Parameters: b - true if there is no header row in the data
public java.lang.String getMissingValue()
public void setMissingValue(java.lang.String value)
Parameters: value - the placeholder
public java.lang.String missingValueTipText()
public java.lang.String getStringAttributes()
public void setStringAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String stringAttributesTipText()
public java.lang.String getNominalAttributes()
public void setNominalAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String nominalAttributesTipText()
public java.lang.String getNumericAttributes()
public void setNumericAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String numericAttributesTipText()
public java.lang.String getDateFormat()
public void setDateFormat(java.lang.String value)
Parameters: value - the format to use.
public java.lang.String dateFormatTipText()
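The default date format string, "yyyy-MM-dd'T'HH:mm:ss", is valid java.text.SimpleDateFormat pattern syntax. The sketch below shows how a value could be parsed with such a pattern. Whether the loader uses SimpleDateFormat internally is an assumption here, and the class name DateFormatSketch, the method toEpochMillis, and the fixed UTC time zone are choices made only for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/** Illustrative only: parses date strings with a configurable pattern. */
final class DateFormatSketch {

    /** The default pattern quoted in the option description. */
    static final String DEFAULT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    /** Parses the value with the pattern and returns milliseconds since the epoch. */
    static long toEpochMillis(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);                        // reject impossible dates
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for reproducibility
        try {
            return format.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Cannot parse '" + value + "' with pattern " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("1970-01-02T00:00:00", DEFAULT_PATTERN));
        System.out.println(toEpochMillis("02/01/1970", "dd/MM/yyyy"));
    }
}
```

A value that does not match the configured pattern fails to parse, so the format string has to be changed whenever the file uses a different date layout.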
public java.lang.String getDateAttributes()
public void setDateAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String dateAttributesTipText()
public java.lang.String enclosureCharactersTipText()
public java.lang.String getEnclosureCharacters()
public void setEnclosureCharacters(java.lang.String enclosure)
Parameters: enclosure - the characters to use as string enclosures
public java.lang.String getFieldSeparator()
public void setFieldSeparator(java.lang.String value)
Parameters: value - the character to use
public java.lang.String fieldSeparatorTipText()
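How the field separator, the enclosure characters and the missing-value placeholder interact can be shown with a simplified row splitter. This is a sketch under stated assumptions, not the loader's parsing code: the names FieldSplitSketch and split are hypothetical, enclosures are passed as a plain string of characters, and a quoted placeholder is treated the same as an unquoted one.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits one row using a separator and enclosure characters. */
final class FieldSplitSketch {

    /**
     * Splits the line on the separator, ignoring separators inside enclosures.
     * Fields equal to the missing-value placeholder are returned as null.
     */
    static List<String> split(String line, char separator, String enclosures, String missing) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char open = 0; // the enclosure character currently open, or 0 when outside
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (open != 0) {
                if (c == open) {
                    open = 0;          // matching enclosure closes the string
                } else {
                    current.append(c); // separators inside enclosures are literal
                }
            } else if (enclosures.indexOf(c) >= 0) {
                open = c;
            } else if (c == separator) {
                fields.add(finish(current, missing));
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(finish(current, missing));
        return fields;
    }

    /** Maps the placeholder to null, leaving every other value unchanged. */
    private static String finish(StringBuilder raw, String missing) {
        String value = raw.toString();
        return value.equals(missing) ? null : value;
    }

    public static void main(String[] args) {
        // Defaults from the option list: separator ',', enclosures " and ', placeholder ?
        System.out.println(split("a,'b,c',?,\"d\"", ',', "\"'", "?"));
    }
}
```

With the defaults, the comma inside 'b,c' does not start a new field, and the bare ? becomes a missing value.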
public int getBufferSize()
public void setBufferSize(int buff)
Parameters: buff - the buffer size (number of rows)
public java.lang.String bufferSizeTipText()
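The buffer size matters for incremental loading because, as the -L option description notes, nominal labels can only be determined automatically if the first buffer load contains every legal value. The sketch below illustrates that effect with plain collections. BufferSketch, fillBuffer and labelsIn are hypothetical names, and the code does not reproduce the loader's internals.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a bounded row buffer and the labels visible in it. */
final class BufferSketch {

    /** Takes at most bufferSize rows from the source, leaving the rest unread. */
    static List<String[]> fillBuffer(Iterator<String[]> rows, int bufferSize) {
        List<String[]> buffer = new ArrayList<>();
        while (buffer.size() < bufferSize && rows.hasNext()) {
            buffer.add(rows.next());
        }
        return buffer;
    }

    /** Collects the distinct values of one column among the buffered rows. */
    static Set<String> labelsIn(List<String[]> buffer, int column) {
        Set<String> labels = new LinkedHashSet<>();
        for (String[] row : buffer) {
            labels.add(row[column]);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"1", "red"});
        data.add(new String[] {"2", "green"});
        data.add(new String[] {"3", "blue"});
        // A buffer of 2 rows never sees "blue", so that label would have to be
        // declared explicitly (for example with a nominal label spec).
        System.out.println(labelsIn(fillBuffer(data.iterator(), 2), 1));
    }
}
```

A larger buffer makes it more likely that all labels are seen up front, at the cost of holding more rows in memory.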
public java.lang.Object[] getNominalLabelSpecs()
public void setNominalLabelSpecs(java.lang.Object[] specs)
Parameters: specs - an array of label specifications
public java.lang.String nominalLabelSpecsTipText()
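A nominal label spec such as "1,2,4-6:red,green,blue" has two parts separated by a colon, as described for the -L option. The following sketch separates them. LabelSpecSketch and its members are hypothetical names, and splitting on the first colon is an assumption of this example rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the two parts of a nominal label specification. */
final class LabelSpecSketch {

    /** Attribute range or comma-separated attribute names (left of the colon). */
    final String attributeSelector;

    /** Legal labels (right of the colon). */
    final List<String> labels;

    private LabelSpecSketch(String attributeSelector, List<String> labels) {
        this.attributeSelector = attributeSelector;
        this.labels = labels;
    }

    /** Splits "selector:label1,label2,..." into its two parts. */
    static LabelSpecSketch parse(String spec) {
        int colon = spec.indexOf(':'); // assumption: the first colon is the divider
        if (colon <= 0 || colon == spec.length() - 1) {
            throw new IllegalArgumentException("Expected <attributes>:<labels> but got: " + spec);
        }
        List<String> labels = new ArrayList<>();
        for (String label : spec.substring(colon + 1).split(",")) {
            labels.add(label.trim());
        }
        return new LabelSpecSketch(spec.substring(0, colon).trim(), labels);
    }

    public static void main(String[] args) {
        LabelSpecSketch spec = parse("att1,att2:red,green,blue");
        System.out.println(spec.attributeSelector + " -> " + spec.labels);
    }
}
```

The selector part can then be interpreted either as an index range or as a list of attribute names, matching the two forms shown in the option description.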
public java.util.Enumeration<Option> listOptions()
Specified by: listOptions in interface OptionHandler
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
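Options travel as a flat String array in which a bare flag (such as -H) stands alone and a valued option (such as -F) is followed by its value. The sketch below shows one way such an array could be read, using the defaults listed at the top of this page as fallbacks. OptionArraySketch, hasFlag and valueOf are hypothetical helpers written for this illustration; they are not the loader's or the library's own option utilities.

```java
/** Illustrative only: reading flags and values from an options array. */
final class OptionArraySketch {

    /** Returns true if the bare flag (for example "-H") occurs in the array. */
    static boolean hasFlag(String flag, String[] options) {
        for (String option : options) {
            if (option.equals(flag)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the value that follows the flag, or the fallback if the flag is absent. */
    static String valueOf(String flag, String[] options, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String[] options = {"-H", "-F", ";", "-B", "500"};
        System.out.println("no header row: " + hasFlag("-H", options));
        System.out.println("separator:     " + valueOf("-F", options, ","));
        System.out.println("missing value: " + valueOf("-M", options, "?"));
        System.out.println("buffer size:   " + Integer.parseInt(valueOf("-B", options, "100")));
    }
}
```

An array returned by getOptions() would be expected to follow the same flag and value layout, so it can be passed back to setOptions() to restore a configuration.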
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by: setOptions in interface OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public Instance getNextInstance(Instances structure) throws java.io.IOException
Specified by: getNextInstance in interface Loader
Specified by: getNextInstance in class AbstractLoader
Parameters: structure - the dataset header information, will get updated in case of string or relational attributes
Throws: java.io.IOException - if there is an error during parsing or if getDataSet has been called on this source (either incremental or batch loading can be used, not both).
public Instances getDataSet() throws java.io.IOException
Specified by: getDataSet in interface Loader
Specified by: getDataSet in class AbstractLoader
Throws: java.io.IOException - if there is an error during parsing or if getNextInstance has been called on this source (either incremental or batch loading can be used, not both).
public_normal_behavior
  requires: model_sourceSupplied == true && (* successful parse *);
  modifiable: model_structureDetermined;
  ensures: \result != null && \result.numInstances() >= 0 && model_structureDetermined == true;
also
public_exceptional_behavior
  requires: model_sourceSupplied == false || (* unsuccessful parse *);
  signals: (IOException);
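The rule that a source can be read either incrementally or in batch, but not both, can be pictured as a small state guard. The sketch below is only a model of that rule: RetrievalGuardSketch and its methods are hypothetical, it throws IllegalStateException to stay free of checked exceptions (the loader itself is documented to signal an IOException), and it assumes that reset() clears the chosen mode.

```java
/** Illustrative only: enforces that batch and incremental reads are not mixed. */
final class RetrievalGuardSketch {

    /** Mirrors the NONE, BATCH and INCREMENTAL constants listed for Loader. */
    enum Mode { NONE, BATCH, INCREMENTAL }

    private Mode mode = Mode.NONE;

    /** Called before a batch read (the getDataSet() style of access). */
    void beginBatch() {
        if (mode == Mode.INCREMENTAL) {
            throw new IllegalStateException("Cannot mix getNextInstance and getDataSet");
        }
        mode = Mode.BATCH;
    }

    /** Called before an incremental read (the getNextInstance() style of access). */
    void beginIncremental() {
        if (mode == Mode.BATCH) {
            throw new IllegalStateException("Cannot mix getDataSet and getNextInstance");
        }
        mode = Mode.INCREMENTAL;
    }

    /** Assumption: resetting makes the source usable in either mode again. */
    void reset() {
        mode = Mode.NONE;
    }

    Mode mode() {
        return mode;
    }

    public static void main(String[] args) {
        RetrievalGuardSketch guard = new RetrievalGuardSketch();
        guard.beginIncremental();
        try {
            guard.beginBatch();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice this means choosing one access style per source and calling reset() (or setting the source again) before switching to the other.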
public void setSource(java.io.InputStream input) throws java.io.IOException
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractLoader
Parameters: input - the input stream
Throws: java.io.IOException - if an error occurs
public void setSource(java.io.File file) throws java.io.IOException
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractFileLoader
Parameters: file - the source file.
Throws: java.io.IOException - if an error occurs
public Instances getStructure() throws java.io.IOException
Specified by: getStructure in interface Loader
Specified by: getStructure in class AbstractLoader
Throws: java.io.IOException - if there is no source or parsing fails
public_normal_behavior
  requires: model_sourceSupplied == true && model_structureDetermined == false && (* successful parse *);
  modifiable: model_structureDetermined;
  ensures: \result != null && \result.numInstances() == 0 && model_structureDetermined == true;
also
public_exceptional_behavior
  requires: model_sourceSupplied == false || (* unsuccessful parse *);
  signals: (IOException);
public void reset() throws java.io.IOException
Specified by: reset in interface Loader
Overrides: reset in class AbstractFileLoader
Throws: java.io.IOException - if something goes wrong