public class Instances extends java.util.AbstractList<Instance> implements java.io.Serializable, RevisionHandler
Typical usage:
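```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TypicalUsage {
  public static void main(String[] args) throws Exception {
    // The file to read is passed as the first command-line argument.
    String filename = args[0];
    // Read all the instances in the file (ARFF, CSV, XRFF, ...)
    DataSource source = new DataSource(filename);
    Instances instances = source.getDataSet();
    // Make the last attribute be the class
    instances.setClassIndex(instances.numAttributes() - 1);
    // Print header and instances.
    System.out.println("\nDataset:\n");
    System.out.println(instances);
  }
}
```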
All methods that change a set of instances are safe, i.e. a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before changing it.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | ARFF_DATA: The keyword used to denote the start of the ARFF data section. |
static java.lang.String | ARFF_RELATION: The keyword used to denote the start of an ARFF header. |
static java.lang.String | FILE_EXTENSION: The filename extension that should be used for ARFF files. |
static java.lang.String | SERIALIZED_OBJ_FILE_EXTENSION: The filename extension that should be used for binary serialized instances files. |
Constructor and Description |
---|
Instances(Instances dataset): Constructor copying all instances and references to the header information from the given set of instances. |
Instances(Instances dataset, int capacity): Creates an empty set of instances that takes its header information from the given dataset. |
Instances(Instances source, int first, int toCopy): Creates a new set of instances by copying a subset of another set. |
Instances(java.io.Reader reader): Reads an ARFF file from a reader and assigns a weight of one to each instance. |
Instances(java.io.Reader reader, int capacity): Deprecated. Instead of using this constructor together with the readInstance(Reader) method, use the ArffLoader or DataSource class. |
Instances(java.lang.String name, java.util.ArrayList<Attribute> attInfo, int capacity): Creates an empty set of instances. |
Modifier and Type | Method and Description |
---|---|
boolean | add(Instance instance): Adds one instance to the end of the set. |
void | add(int index, Instance instance): Adds one instance at the given position in the list. |
boolean | allAttributeWeightsIdentical(): Returns true if all attribute weights are the same and false otherwise. |
boolean | allInstanceWeightsIdentical(): Returns true if all instance weights are the same and false otherwise. |
Attribute | attribute(int index): Returns the attribute at the given index. |
Attribute | attribute(java.lang.String name): Returns an attribute given its name. |
AttributeStats | attributeStats(int index): Calculates summary statistics on the values that appear in this set of instances for a specified attribute. |
double[] | attributeToDoubleArray(int index): Returns the values of a particular attribute for all instances in this dataset. |
boolean | checkForAttributeType(int attType): Checks for attributes of the given type in the dataset. |
boolean | checkForStringAttributes(): Checks for string attributes in the dataset. |
boolean | checkInstance(Instance instance): Checks if the given instance is compatible with this dataset. |
Attribute | classAttribute(): Returns the class attribute. |
int | classIndex(): Returns the class attribute's index. |
void | compactify(): Compactifies the set of instances. |
void | delete(): Removes all instances from the set. |
void | delete(int index): Removes an instance at the given position from the set. |
void | deleteAttributeAt(int position): Deletes an attribute at the given position (0 to numAttributes() - 1). |
void | deleteAttributeType(int attType): Deletes all attributes of the given type in the dataset. |
void | deleteStringAttributes(): Deletes all string attributes in the dataset. |
void | deleteWithMissing(Attribute att): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissing(int attIndex): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissingClass(): Removes all instances with a missing class value from the dataset. |
java.util.Enumeration<Attribute> | enumerateAttributes(): Returns an enumeration of all the attributes. |
java.util.Enumeration<Instance> | enumerateInstances(): Returns an enumeration of all instances in the dataset. |
boolean | equalHeaders(Instances dataset): Checks if two headers are equivalent. |
java.lang.String | equalHeadersMsg(Instances dataset): Checks if two headers are equivalent; returns null if they are, otherwise a message explaining how they differ. |
Instance | firstInstance(): Returns the first instance in the set. |
Instance | get(int index): Returns the instance at the given position. |
java.util.Random | getRandomNumberGenerator(long seed): Returns a random number generator. |
java.lang.String | getRevision(): Returns the revision string. |
void | insertAttributeAt(Attribute att, int position): Inserts an attribute at the given position (0 to numAttributes()) and sets all its values to missing. |
Instance | instance(int index): Returns the instance at the given position. |
double | kthSmallestValue(Attribute att, int k): Returns the k-th smallest value of a numeric attribute. |
double | kthSmallestValue(int attIndex, int k): Returns the k-th smallest value of a numeric attribute. |
Instance | lastInstance(): Returns the last instance in the set. |
static void | main(java.lang.String[] args): Main method for this class. |
double | meanOrMode(Attribute att): Returns the mean of a numeric attribute, or the mode of a nominal attribute, as a floating-point value. |
double | meanOrMode(int attIndex): Returns the mean of a numeric attribute, or the mode of a nominal attribute, as a floating-point value. |
static Instances | mergeInstances(Instances first, Instances second): Merges two sets of Instances together. |
int | numAttributes(): Returns the number of attributes. |
int | numClasses(): Returns the number of class labels. |
int | numDistinctValues(Attribute att): Returns the number of distinct values of a given attribute. |
int | numDistinctValues(int attIndex): Returns the number of distinct values of a given attribute. |
int | numInstances(): Returns the number of instances in the dataset. |
void | randomize(java.util.Random random): Shuffles the instances in the set so that they are ordered randomly. |
boolean | readInstance(java.io.Reader reader): Deprecated. Use the ArffLoader or DataSource class instead. |
java.lang.String | relationName(): Returns the relation's name. |
Instance | remove(int index): Removes the instance at the given position. |
void | renameAttribute(Attribute att, java.lang.String name): Renames an attribute. |
void | renameAttribute(int att, java.lang.String name): Renames an attribute. |
void | renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
void | renameAttributeValue(int att, int val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
void | replaceAttributeAt(Attribute att, int position): Replaces the attribute at the given position (0 to numAttributes()) with the given attribute and sets all its values to missing. |
Instances | resample(java.util.Random random): Creates a new dataset of the same size as this dataset using random sampling with replacement. |
Instances | resampleWithWeights(java.util.Random random): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(java.util.Random random, boolean representUsingWeights): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(java.util.Random random, boolean[] sampled): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(java.util.Random random, boolean[] sampled, boolean representUsingWeights): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(java.util.Random random, boolean[] sampled, boolean representUsingWeights, double sampleSize): Creates a new dataset from this dataset using random sampling with replacement according to the current instance weights; sampleSize gives its size as a percentage of this dataset's size. |
Instances | resampleWithWeights(java.util.Random random, double[] weights): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. |
Instances | resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. |
Instances | resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled, boolean representUsingWeights): Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. |
Instances | resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled, boolean representUsingWeights, double sampleSize): Creates a new dataset from this dataset using random sampling with replacement according to the given weight vector; sampleSize gives its size as a percentage of this dataset's size. |
Instance | set(int index, Instance instance): Replaces the instance at the given position. |
void | setAttributeWeight(Attribute att, double weight): Sets the weight of an attribute. |
void | setAttributeWeight(int att, double weight): Sets the weight of an attribute. |
void | setClass(Attribute att): Sets the class attribute. |
void | setClassIndex(int classIndex): Sets the class index of the set. |
void | setRelationName(java.lang.String newName): Sets the relation's name. |
int | size(): Returns the number of instances in the dataset. |
void | sort(Attribute att): Sorts the instances based on an attribute. |
void | sort(int attIndex): Sorts the instances based on an attribute. |
void | stableSort(Attribute att): Sorts the instances based on an attribute, using a stable sort. |
void | stableSort(int attIndex): Sorts the instances based on an attribute, using a stable sort. |
void | stratify(int numFolds): Stratifies a set of instances according to its class values if the class attribute is nominal, so that a stratified cross-validation can be performed afterwards. |
Instances | stringFreeStructure(): Creates a copy of the structure. |
double | sumOfWeights(): Computes the sum of all the instances' weights. |
void | swap(int i, int j): Swaps two instances in the set. |
static void | test(java.lang.String[] argv): Method for testing this class. |
Instances | testCV(int numFolds, int numFold): Creates the test set for one fold of a cross-validation on the dataset. |
java.lang.String | toString(): Returns the dataset as a string in ARFF format. |
java.lang.String | toSummaryString(): Generates a string summarizing the set of instances. |
Instances | trainCV(int numFolds, int numFold): Creates the training set for one fold of a cross-validation on the dataset. |
Instances | trainCV(int numFolds, int numFold, java.util.Random random): Creates the training set for one fold of a cross-validation on the dataset. |
double | variance(Attribute att): Computes the variance for a numeric attribute. |
double | variance(int attIndex): Computes the variance for a numeric attribute. |
double[] | variances(): Computes the variance for all numeric attributes simultaneously. |
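Methods inherited from class java.util.AbstractList: addAll, clear, equals, hashCode, indexOf, iterator, lastIndexOf, listIterator, listIterator, subList
Methods inherited from class java.util.AbstractCollection: addAll, contains, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray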
public static final java.lang.String FILE_EXTENSION
public static final java.lang.String SERIALIZED_OBJ_FILE_EXTENSION
public static final java.lang.String ARFF_RELATION
public static final java.lang.String ARFF_DATA
public Instances(java.io.Reader reader) throws java.io.IOException
reader - the reader
java.io.IOException - if the ARFF file is not read successfully

@Deprecated public Instances(java.io.Reader reader, int capacity) throws java.io.IOException
Deprecated. Instead of using this constructor together with the readInstance(Reader) method, use the ArffLoader or DataSource class.
reader - the reader
capacity - the capacity
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative
java.io.IOException - if there is a problem with the reader
See also: ArffLoader, ConverterUtils.DataSource
public Instances(Instances dataset)
dataset - the set to be copied

public Instances(Instances dataset, int capacity)
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset

public Instances(Instances source, int first, int toCopy)
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
java.lang.IllegalArgumentException - if first and toCopy are out of range

public Instances(java.lang.String name, java.util.ArrayList<Attribute> attInfo, int capacity)
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
java.lang.IllegalArgumentException - if attribute names are not unique

public Instances stringFreeStructure()
public boolean add(Instance instance)
public void add(int index, Instance instance)
public boolean allAttributeWeightsIdentical()
public boolean allInstanceWeightsIdentical()
public Attribute attribute(int index)
index - the attribute's index (index starts with 0)

public Attribute attribute(java.lang.String name)
name - the attribute's name

public boolean checkForAttributeType(int attType)
attType - the attribute type to look for

public boolean checkForStringAttributes()

public boolean checkInstance(Instance instance)
instance - the instance to check

public Attribute classAttribute()
UnassignedClassException - if the class is not set

public int classIndex()
public void compactify()
public void delete()
public void delete(int index)
index - the instance's position (index starts with 0)

public void deleteAttributeAt(int position)
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range or the class attribute is being deleted

public void deleteAttributeType(int attType)
attType - the attribute type to delete
java.lang.IllegalArgumentException - if an attribute could not be deleted (probably because it is the class attribute)

public void deleteStringAttributes()
java.lang.IllegalArgumentException - if a string attribute could not be deleted (probably because it is the class attribute)
See also: deleteAttributeType(int)
public void deleteWithMissing(int attIndex)
attIndex - the attribute's index (index starts with 0)

public void deleteWithMissing(Attribute att)
att - the attribute

public void deleteWithMissingClass()
UnassignedClassException - if the class is not set

public java.util.Enumeration<Attribute> enumerateAttributes()
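deleteWithMissing drops every instance whose value for one attribute is missing, and deleteWithMissingClass does the same for the class attribute. The sketch below shows that filter on a hypothetical row representation (a List<double[]>, one array per instance, with Double.NaN as the missing-value marker); the MissingFilter class is illustrative only and is not the Instances API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a dataset: each double[] is one instance and
// Double.NaN marks a missing value. Not the Weka Instances API.
final class MissingFilter {
    // Removes, in place, every row whose value at attIndex is missing.
    static void deleteWithMissing(List<double[]> rows, int attIndex) {
        rows.removeIf(row -> Double.isNaN(row[attIndex]));
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, 2.0});
        rows.add(new double[] {Double.NaN, 3.0});
        rows.add(new double[] {4.0, Double.NaN});
        deleteWithMissing(rows, 0);
        System.out.println(rows.size()); // 2: only the row missing attribute 0 is gone
    }
}
```

The filter only inspects the chosen attribute, so the third row survives even though its second value is missing.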
public java.util.Enumeration<Instance> enumerateInstances()
public java.lang.String equalHeadersMsg(Instances dataset)
dataset - another dataset

public boolean equalHeaders(Instances dataset)
dataset - another dataset

public Instance firstInstance()

public java.util.Random getRandomNumberGenerator(long seed)
seed - the given seed

public void insertAttributeAt(Attribute att, int position)
att - the attribute to be inserted
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range

public Instance instance(int index)
index - the instance's index (index starts with 0)

public Instance get(int index)
public double kthSmallestValue(Attribute att, int k)
att - the Attribute object
k - the value of k

public double kthSmallestValue(int attIndex, int k)
attIndex - the attribute's index
k - the value of k

public Instance lastInstance()
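The kthSmallestValue methods above return the k-th smallest value of a numeric attribute. A common way to compute such an order statistic without fully sorting is quickselect; the sketch below applies it to a plain double[] of attribute values. It assumes k is 1-based and that missing values (NaN) are excluded before selecting; both are assumptions about the contract, and the KthSmallest class is hypothetical, not Weka's implementation.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper illustrating quickselect; not Weka's implementation.
final class KthSmallest {
    // Returns the k-th smallest (assumed 1-based) non-missing value.
    static double kthSmallest(double[] values, int k) {
        // Drop missing values (NaN) and work on a copy so the input is untouched.
        double[] a = Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
        if (k < 1 || k > a.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        Random rnd = new Random(42); // fixed seed keeps the sketch deterministic
        int lo = 0, hi = a.length - 1, target = k - 1;
        while (true) {
            if (lo == hi) {
                return a[lo];
            }
            // A random pivot keeps the expected running time linear.
            int p = partition(a, lo, hi, lo + rnd.nextInt(hi - lo + 1));
            if (target == p) {
                return a[p];
            } else if (target < p) {
                hi = p - 1;
            } else {
                lo = p + 1;
            }
        }
    }

    // Lomuto partition around a[pivotIndex]; returns the pivot's final index.
    private static int partition(double[] a, int lo, int hi, int pivotIndex) {
        double pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```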
public double meanOrMode(int attIndex)
attIndex - the attribute's index (index starts with 0)

public double meanOrMode(Attribute att)
att - the attribute

public int numAttributes()

public int numClasses()
UnassignedClassException - if the class is not set

public int numDistinctValues(int attIndex)
attIndex - the attribute's index (index starts with 0)

public int numDistinctValues(Attribute att)
att - the attribute

public int numInstances()
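meanOrMode returns the mean of a numeric attribute or the mode of a nominal one, in both cases as a double. The sketch below shows both computations on parallel arrays of attribute values and instance weights, where a nominal value is the index of its label and Double.NaN marks a missing value. Weighting by instance weight, skipping missing values, breaking ties toward the lower label index, and returning 0 when no value is present are assumptions of this sketch, not documented behavior; the MeanOrMode class is hypothetical.

```java
// Hypothetical helper; a sketch of a weighted mean / weighted mode, not Weka's code.
final class MeanOrMode {
    // Weighted mean of a numeric attribute, ignoring missing (NaN) values.
    static double mean(double[] values, double[] weights) {
        double sum = 0, totalWeight = 0;
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                continue;
            }
            sum += weights[i] * values[i];
            totalWeight += weights[i];
        }
        return totalWeight > 0 ? sum / totalWeight : 0.0;
    }

    // Mode of a nominal attribute whose values are label indices 0..numLabels-1,
    // returned as a double like the label index itself. Ties keep the lower index.
    static double mode(double[] values, double[] weights, int numLabels) {
        double[] counts = new double[numLabels];
        for (int i = 0; i < values.length; i++) {
            if (!Double.isNaN(values[i])) {
                counts[(int) values[i]] += weights[i];
            }
        }
        int best = 0;
        for (int j = 1; j < numLabels; j++) {
            if (counts[j] > counts[best]) {
                best = j;
            }
        }
        return best;
    }
}
```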
public int size()
public void randomize(java.util.Random random)
random - a random number generator

@Deprecated public boolean readInstance(java.io.Reader reader) throws java.io.IOException
Deprecated. Use the ArffLoader or DataSource class instead.
reader - the reader
java.io.IOException - if the information is not read successfully
See also: ArffLoader, ConverterUtils.DataSource
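randomize(Random) reorders the instances using the supplied generator, so passing a java.util.Random created with a fixed seed makes the shuffle reproducible. A standard algorithm for an unbiased in-place shuffle is Fisher-Yates; the sketch below applies it to a generic list. It illustrates the technique only and is not claimed to be Weka's exact code; the Shuffle class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper illustrating a Fisher-Yates shuffle.
final class Shuffle {
    // Walk from the end, swapping each slot with a uniformly chosen slot at or
    // before it; every permutation is equally likely.
    static <T> void randomize(List<T> items, Random random) {
        for (int j = items.size() - 1; j > 0; j--) {
            Collections.swap(items, j, random.nextInt(j + 1));
        }
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> b = new ArrayList<>(a);
        randomize(a, new Random(1));
        randomize(b, new Random(1));
        System.out.println(a.equals(b)); // true: same seed, same order
    }
}
```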
public void replaceAttributeAt(Attribute att, int position)
att - the attribute to be inserted
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range

public java.lang.String relationName()
public Instance remove(int index)
public void renameAttribute(int att, java.lang.String name)
att - the attribute's index (index starts with 0)
name - the new name

public void setAttributeWeight(Attribute att, double weight)
att - the attribute
weight - the new weight

public void setAttributeWeight(int att, double weight)
att - the attribute's index (index starts with 0)
weight - the new weight

public void renameAttribute(Attribute att, java.lang.String name)
att - the attribute
name - the new name

public void renameAttributeValue(int att, int val, java.lang.String name)
att - the attribute's index (index starts with 0)
val - the value's index (index starts with 0)
name - the new name

public void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name)
att - the attribute
val - the value
name - the new name

public Instances resample(java.util.Random random)
random - a random number generator

public Instances resampleWithWeights(java.util.Random random)
random - a random number generator

public Instances resampleWithWeights(java.util.Random random, boolean[] sampled)
random - a random number generator
sampled - an array indicating what has been sampled

public Instances resampleWithWeights(java.util.Random random, boolean representUsingWeights)
random - a random number generator
representUsingWeights - if true, copies are represented using weights in the resampled data

public Instances resampleWithWeights(java.util.Random random, boolean[] sampled, boolean representUsingWeights)
random - a random number generator
sampled - an array indicating what has been sampled
representUsingWeights - if true, copies are represented using weights in the resampled data

public Instances resampleWithWeights(java.util.Random random, boolean[] sampled, boolean representUsingWeights, double sampleSize)
random - a random number generator
sampled - an array indicating what has been sampled; can be null
representUsingWeights - if true, copies are represented using weights in the resampled data
sampleSize - size of the new dataset as a percentage of the size of this dataset
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public Instances resampleWithWeights(java.util.Random random, double[] weights)
random - a random number generator
weights - the weight vector
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public Instances resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled)
random - a random number generator
weights - the weight vector
sampled - an array indicating what has been sampled; can be null
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public Instances resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled, boolean representUsingWeights)
random - a random number generator
weights - the weight vector
sampled - an array indicating what has been sampled; can be null
representUsingWeights - if true, copies are represented using weights in the resampled data
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public Instances resampleWithWeights(java.util.Random random, double[] weights, boolean[] sampled, boolean representUsingWeights, double sampleSize)
random - a random number generator
weights - the weight vector
sampled - an array indicating what has been sampled; can be null
representUsingWeights - if true, copies are represented using weights in the resampled data
sampleSize - size of the new dataset as a percentage of the size of this dataset
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public Instance set(int index, Instance instance)
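The resampleWithWeights variants above draw instances with replacement, each with probability proportional to its weight, optionally record which instances were drawn in the sampled array, and can scale the output size through sampleSize (a percentage). The sketch below shows one standard technique for this, cumulative weights plus binary search, on a plain weight array, returning the drawn indices. It is not Weka's implementation: the WeightedResample class, the index-based return value, and rounding the percentage to the nearest whole count are the sketch's own choices.

```java
import java.util.Random;

// Hypothetical helper: weighted sampling with replacement over indices.
final class WeightedResample {
    // Index i is drawn with probability weights[i] / sum(weights).
    // sampleSizePercent sets the number of draws relative to weights.length
    // (100 = same size). If sampled is non-null, sampled[i] becomes true for
    // every index drawn at least once.
    static int[] resample(double[] weights, Random random, boolean[] sampled, double sampleSizePercent) {
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }
        int n = (int) Math.round(weights.length * sampleSizePercent / 100.0);
        int[] drawn = new int[n];
        for (int s = 0; s < n; s++) {
            double u = random.nextDouble() * total; // uniform in [0, total)
            // Find the first index whose cumulative weight exceeds u;
            // zero-weight indices are never selected.
            int lo = 0, hi = cumulative.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            drawn[s] = lo;
            if (sampled != null) {
                sampled[lo] = true;
            }
        }
        return drawn;
    }
}
```

The sketch returns repeated indices for repeated draws; counting how often each index occurs gives the per-instance multiplicity that a weight-based representation (representUsingWeights) would store instead.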
public void setClass(Attribute att)
att - the attribute to be the class

public void setClassIndex(int classIndex)
classIndex - the new class index (index starts with 0)
java.lang.IllegalArgumentException - if the class index is too big or < 0

public void setRelationName(java.lang.String newName)
newName - the new relation name

public void sort(int attIndex)
attIndex - the attribute's index (index starts with 0)

public void sort(Attribute att)
att - the attribute

public void stableSort(int attIndex)
attIndex - the attribute's index (index starts with 0)

public void stableSort(Attribute att)
att - the attribute

public void stratify(int numFolds)
numFolds - the number of folds in the cross-validation
UnassignedClassException - if the class is not set

public double sumOfWeights()
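sort and stableSort order the instances by one attribute; stableSort additionally guarantees that instances with equal values keep their original relative order. In plain Java, List.sort is specified to be stable, and Double.compare orders NaN after every other value, so a comparator on the attribute value gives ascending order with missing (NaN) values last. The sketch below uses double[] rows as a hypothetical stand-in for instances; it illustrates a stable attribute sort and is not Weka's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; rows stand in for instances, NaN marks a missing value.
final class StableAttributeSort {
    // Sorts rows ascending by the value at attIndex. List.sort is stable, and
    // Comparator.comparingDouble uses Double.compare, which places NaN last.
    static void stableSort(List<double[]> rows, int attIndex) {
        rows.sort(Comparator.comparingDouble((double[] row) -> row[attIndex]));
    }
}
```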
public Instances testCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation; must be greater than 1
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public java.lang.String toString()
Overrides: toString in class java.util.AbstractCollection<Instance>
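toString returns the dataset in ARFF format: a header opened by the relation keyword, one attribute declaration per line, then the data keyword followed by one comma-separated line per instance. The sketch below produces that layout for numeric attributes only, printing missing values as "?". It assumes the standard ARFF keywords @relation, @attribute and @data; spacing, quoting of names, and number formatting are simplified and will not match Weka's output exactly. The ArffSketch class is hypothetical.

```java
import java.util.List;

// Hypothetical helper rendering numeric-only data in a simplified ARFF layout.
final class ArffSketch {
    static String toArff(String relation, List<String> attributeNames, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                // Missing values (NaN) are written as '?'.
                sb.append(Double.isNaN(row[i]) ? "?" : Double.toString(row[i]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```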
public Instances trainCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation; must be greater than 1
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public Instances trainCV(int numFolds, int numFold, java.util.Random random)
numFolds - the number of folds in the cross-validation; must be greater than 1
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public double[] variances()
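testCV and trainCV build the test and training sets for one fold, and stratify prepares the order so the folds are stratified. The sketch below works on instance indices under stated assumptions: each test fold is taken to be a contiguous block of the current order, the first (n % numFolds) folds are assumed to get one extra instance, training indices are the complement of the test block, and stratification is modeled as grouping by class label and then dealing indices out with stride numFolds. These are assumptions about how such splitting can work, not a description of Weka's code; CvFolds and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helpers illustrating stratified k-fold splitting on indices.
final class CvFolds {
    // Returns {first, count}: the contiguous index range of test fold numFold.
    static int[] testRange(int n, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("numFolds must be between 2 and n");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("numFold out of range: " + numFold);
        }
        int base = n / numFolds;
        int extra = n % numFolds; // the first 'extra' folds get one more instance
        int count = base + (numFold < extra ? 1 : 0);
        int first = numFold * base + Math.min(numFold, extra);
        return new int[] {first, count};
    }

    // Training indices are all indices outside the test fold.
    static List<Integer> trainIndices(int n, int numFolds, int numFold) {
        int[] range = testRange(n, numFolds, numFold);
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i < range[0] || i >= range[0] + range[1]) {
                train.add(i);
            }
        }
        return train;
    }

    // Assumed stratification step: group indices by class label (stable sort),
    // then deal them out with stride numFolds so each contiguous fold receives
    // a similar mix of classes.
    static List<Integer> stratifiedOrder(int[] classLabels, int numFolds) {
        List<Integer> byClass = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            byClass.add(i);
        }
        byClass.sort(Comparator.comparingInt((Integer idx) -> classLabels[idx]));
        List<Integer> order = new ArrayList<>();
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < byClass.size(); j += numFolds) {
                order.add(byClass.get(j));
            }
        }
        return order;
    }
}
```

For example, with 10 instances and 3 folds, testRange yields the test blocks 0-3, 4-6 and 7-9.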
public double variance(int attIndex)
attIndex - the numeric attribute's index (index starts with 0)
java.lang.IllegalArgumentException - if the attribute is not numeric

public double variance(Attribute att)
att - the numeric attribute
java.lang.IllegalArgumentException - if the attribute is not numeric

public AttributeStats attributeStats(int index)
index - the index of the attribute to summarize (index starts with 0)

public double[] attributeToDoubleArray(int index)
index - the index of the attribute

public java.lang.String toSummaryString()
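variance computes the variance of one numeric attribute, and variances does so for all numeric attributes. The sketch below computes a weighted sample variance in a single pass using West's weighted form of Welford's update, skipping missing (NaN) values. Treating instance weights as frequency weights, dividing by (total weight - 1), and returning NaN when the total weight is at most 1 are assumptions of this sketch, not documented behavior of the methods above; WeightedVariance is hypothetical.

```java
// Hypothetical helper; a one-pass weighted sample variance, not Weka's code.
final class WeightedVariance {
    // Missing (NaN) values and non-positive weights are skipped.
    static double variance(double[] values, double[] weights) {
        double sumW = 0, mean = 0, m2 = 0;
        for (int i = 0; i < values.length; i++) {
            double x = values[i], w = weights[i];
            if (Double.isNaN(x) || w <= 0) {
                continue;
            }
            sumW += w;
            double delta = x - mean;
            mean += (w / sumW) * delta;   // update the running weighted mean
            m2 += w * delta * (x - mean); // accumulate weighted squared deviations
        }
        // Frequency-weight convention: divide by (total weight - 1).
        return sumW > 1 ? m2 / (sumW - 1) : Double.NaN;
    }
}
```

For the unweighted values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7.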
public void swap(int i, int j)
i - the first instance's index (index starts with 0)
j - the second instance's index (index starts with 0)

public static Instances mergeInstances(Instances first, Instances second)
first - the first set of Instances
second - the second set of Instances
java.lang.IllegalArgumentException - if the datasets are not the same size

public static void test(java.lang.String[] argv)
argv - should contain one element: the name of an ARFF file

public static void main(java.lang.String[] args)
The following calls are possible:
weka.core.Instances help
weka.core.Instances <filename>
weka.core.Instances merge <filename1> <filename2>
weka.core.Instances append <filename1> <filename2>
weka.core.Instances headers <filename1> <filename2>
weka.core.Instances randomize <seed> <filename>
args - the command-line parameters

public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler