java.lang.Object
  weka.core.Instances
public class Instances
Class for handling an ordered set of weighted instances.
Typical usage:
import weka.core.converters.ConverterUtils.DataSource;
...
// Read all the instances in the file (ARFF, CSV, XRFF, ...)
DataSource source = new DataSource(filename);
Instances instances = source.getDataSet();
// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);
// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);
...
All methods that change a set of instances are safe, i.e. a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
| Field Summary | |
|---|---|
| static java.lang.String ARFF_DATA | The keyword used to denote the start of the ARFF data section. |
| static java.lang.String ARFF_RELATION | The keyword used to denote the start of an ARFF header. |
| static java.lang.String FILE_EXTENSION | The filename extension that should be used for ARFF files. |
| static java.lang.String SERIALIZED_OBJ_FILE_EXTENSION | The filename extension that should be used for binary serialized instances files. |
| Constructor Summary | |
|---|---|
| Instances(Instances dataset) | Constructor copying all instances and references to the header information from the given set of instances. |
| Instances(Instances dataset, int capacity) | Constructor creating an empty set of instances. |
| Instances(Instances source, int first, int toCopy) | Creates a new set of instances by copying a subset of another set. |
| Instances(java.io.Reader reader) | Reads an ARFF file from a reader, and assigns a weight of one to each instance. |
| Instances(java.io.Reader reader, int capacity) | Deprecated. Use the ArffLoader or DataSource class instead of combining this constructor with the readInstance(Reader) method. |
| Instances(java.lang.String name, FastVector attInfo, int capacity) | Creates an empty set of instances. |
| Method Summary | |
|---|---|
| void add(Instance instance) | Adds one instance to the end of the set. |
| Attribute attribute(int index) | Returns an attribute. |
| Attribute attribute(java.lang.String name) | Returns an attribute given its name. |
| AttributeStats attributeStats(int index) | Calculates summary statistics on the values that appear in this set of instances for a specified attribute. |
| double[] attributeToDoubleArray(int index) | Gets the value of all instances in this dataset for a particular attribute. |
| boolean checkForAttributeType(int attType) | Checks for attributes of the given type in the dataset. |
| boolean checkForStringAttributes() | Checks for string attributes in the dataset. |
| boolean checkInstance(Instance instance) | Checks if the given instance is compatible with this dataset. |
| Attribute classAttribute() | Returns the class attribute. |
| int classIndex() | Returns the class attribute's index. |
| void compactify() | Compactifies the set of instances. |
| void delete() | Removes all instances from the set. |
| void delete(int index) | Removes an instance at the given position from the set. |
| void deleteAttributeAt(int position) | Deletes an attribute at the given position (0 to numAttributes() - 1). |
| void deleteAttributeType(int attType) | Deletes all attributes of the given type in the dataset. |
| void deleteStringAttributes() | Deletes all string attributes in the dataset. |
| void deleteWithMissing(Attribute att) | Removes all instances with missing values for a particular attribute from the dataset. |
| void deleteWithMissing(int attIndex) | Removes all instances with missing values for a particular attribute from the dataset. |
| void deleteWithMissingClass() | Removes all instances with a missing class value from the dataset. |
| java.util.Enumeration enumerateAttributes() | Returns an enumeration of all the attributes. |
| java.util.Enumeration enumerateInstances() | Returns an enumeration of all instances in the dataset. |
| boolean equalHeaders(Instances dataset) | Checks if two headers are equivalent. |
| Instance firstInstance() | Returns the first instance in the set. |
| java.util.Random getRandomNumberGenerator(long seed) | Returns a random number generator. |
| java.lang.String getRevision() | Returns the revision string. |
| void insertAttributeAt(Attribute att, int position) | Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. |
| Instance instance(int index) | Returns the instance at the given position. |
| double kthSmallestValue(Attribute att, int k) | Returns the kth-smallest attribute value of a numeric attribute. |
| double kthSmallestValue(int attIndex, int k) | Returns the kth-smallest attribute value of a numeric attribute. |
| Instance lastInstance() | Returns the last instance in the set. |
| static void main(java.lang.String[] args) | Main method for this class. |
| double meanOrMode(Attribute att) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
| double meanOrMode(int attIndex) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
| static Instances mergeInstances(Instances first, Instances second) | Merges two sets of Instances together. |
| int numAttributes() | Returns the number of attributes. |
| int numClasses() | Returns the number of class labels. |
| int numDistinctValues(Attribute att) | Returns the number of distinct values of a given attribute. |
| int numDistinctValues(int attIndex) | Returns the number of distinct values of a given attribute. |
| int numInstances() | Returns the number of instances in the dataset. |
| void randomize(java.util.Random random) | Shuffles the instances in the set so that they are ordered randomly. |
| boolean readInstance(java.io.Reader reader) | Deprecated. Use the ArffLoader or DataSource class instead of this method. |
| java.lang.String relationName() | Returns the relation's name. |
| void renameAttribute(Attribute att, java.lang.String name) | Renames an attribute. |
| void renameAttribute(int att, java.lang.String name) | Renames an attribute. |
| void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name) | Renames a value of a nominal (or string) attribute. |
| void renameAttributeValue(int att, int val, java.lang.String name) | Renames a value of a nominal (or string) attribute. |
| Instances resample(java.util.Random random) | Creates a new dataset of the same size using random sampling with replacement. |
| Instances resampleWithWeights(java.util.Random random) | Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. |
| Instances resampleWithWeights(java.util.Random random, double[] weights) | Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. |
| void setClass(Attribute att) | Sets the class attribute. |
| void setClassIndex(int classIndex) | Sets the class index of the set. |
| void setRelationName(java.lang.String newName) | Sets the relation's name. |
| void sort(Attribute att) | Sorts the instances based on an attribute. |
| void sort(int attIndex) | Sorts the instances based on an attribute. |
| void stratify(int numFolds) | Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed). |
| Instances stringFreeStructure() | Creates a copy of the structure if the data has string or relational attributes, and "cleanses" string types. |
| double sumOfWeights() | Computes the sum of all the instances' weights. |
| void swap(int i, int j) | Swaps two instances in the set. |
| static void test(java.lang.String[] argv) | Method for testing this class. |
| Instances testCV(int numFolds, int numFold) | Creates the test set for one fold of a cross-validation on the dataset. |
| java.lang.String toString() | Returns the dataset as a string in ARFF format. |
| java.lang.String toSummaryString() | Generates a string summarizing the set of instances. |
| Instances trainCV(int numFolds, int numFold) | Creates the training set for one fold of a cross-validation on the dataset. |
| Instances trainCV(int numFolds, int numFold, java.util.Random random) | Creates the training set for one fold of a cross-validation on the dataset. |
| double variance(Attribute att) | Computes the variance for a numeric attribute. |
| double variance(int attIndex) | Computes the variance for a numeric attribute. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String FILE_EXTENSION
public static final java.lang.String SERIALIZED_OBJ_FILE_EXTENSION
public static final java.lang.String ARFF_RELATION
public static final java.lang.String ARFF_DATA
| Constructor Detail |
|---|
public Instances(java.io.Reader reader)
throws java.io.IOException
reader - the reader
java.io.IOException - if the ARFF file is not read
successfully
@Deprecated
public Instances(java.io.Reader reader,
int capacity)
throws java.io.IOException
Deprecated. Use the ArffLoader or DataSource class instead of combining this constructor with the readInstance(Reader) method.
reader - the reader
capacity - the capacity
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative.
java.io.IOException - if there is a problem with the reader.
See also: ArffLoader, ConverterUtils.DataSource
public Instances(Instances dataset)
dataset - the set to be copied
public Instances(Instances dataset,
int capacity)
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset
public Instances(Instances source,
int first,
int toCopy)
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
java.lang.IllegalArgumentException - if first and toCopy are out of range
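The range rule for the subset constructor can be pictured with a plain-Java sketch that copies a slice of a list. It does not use Weka: the class `SubsetCopySketch`, the method `copyRange`, and the exact form of the range check are assumptions made for illustration, based only on the parameter and exception descriptions above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for "copy toCopy instances starting at index first".
final class SubsetCopySketch {

    // Copies toCopy elements, starting at index first, into a new list.
    // Assumed reading of "first and toCopy are out of range": the slice
    // [first, first + toCopy) must lie inside the source.
    static <T> List<T> copyRange(List<T> source, int first, int toCopy) {
        if (first < 0 || toCopy < 0 || toCopy > source.size() - first) {
            throw new IllegalArgumentException(
                "Parameters first and/or toCopy out of range");
        }
        // A fresh list, so later changes to the copy leave the source untouched.
        return new ArrayList<>(source.subList(first, first + toCopy));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        System.out.println(copyRange(rows, 1, 3)); // prints [r1, r2, r3]
    }
}
```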
public Instances(java.lang.String name,
FastVector attInfo,
int capacity)
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
| Method Detail |
|---|
public Instances stringFreeStructure()
public void add(Instance instance)
instance - the instance to be added
public Attribute attribute(int index)
index - the attribute's index (index starts with 0)
public Attribute attribute(java.lang.String name)
name - the attribute's name
public boolean checkForAttributeType(int attType)
attType - the attribute type to look for
public boolean checkForStringAttributes()
public boolean checkInstance(Instance instance)
instance - the instance to check
public Attribute classAttribute()
UnassignedClassException - if the class is not set
public int classIndex()
public void compactify()
public void delete()
public void delete(int index)
index - the instance's position (index starts with 0)
public void deleteAttributeAt(int position)
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range or the class attribute is being deleted
public void deleteAttributeType(int attType)
attType - the attribute type to delete
java.lang.IllegalArgumentException - if the attribute couldn't be successfully deleted (probably because it is the class attribute).
public void deleteStringAttributes()
java.lang.IllegalArgumentException - if the string attribute couldn't be successfully deleted (probably because it is the class attribute).
See also: deleteAttributeType(int)
public void deleteWithMissing(int attIndex)
attIndex - the attribute's index (index starts with 0)
public void deleteWithMissing(Attribute att)
att - the attribute
public void deleteWithMissingClass()
UnassignedClassException - if the class is not set
public java.util.Enumeration enumerateAttributes()
public java.util.Enumeration enumerateInstances()
public boolean equalHeaders(Instances dataset)
dataset - another dataset
public Instance firstInstance()
public java.util.Random getRandomNumberGenerator(long seed)
seed - the given seed
public void insertAttributeAt(Attribute att,
int position)
att - the attribute to be inserted
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range
public Instance instance(int index)
index - the instance's index (index starts with 0)
public double kthSmallestValue(Attribute att,
int k)
att - the Attribute object
k - the value of k
public double kthSmallestValue(int attIndex,
int k)
attIndex - the attribute's index
k - the value of k
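To make "kth-smallest" concrete, here is a minimal plain-Java sketch that works on an array of attribute values, such as the one attributeToDoubleArray(int) is described as returning. It is not Weka's implementation: the class `KthSmallestSketch` is hypothetical, sorting a copy is just the simplest approach, and counting k from 1 is an assumption, since this page does not state the base of k.

```java
import java.util.Arrays;

// Hypothetical helper; not part of weka.core.
final class KthSmallestSketch {

    // Returns the k-th smallest value, with k counted from 1 (an assumption).
    static double kthSmallest(double[] values, int k) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k must be in 1.." + values.length);
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[k - 1];
    }

    public static void main(String[] args) {
        double[] column = {4.0, 1.5, 3.0, 2.5};
        System.out.println(kthSmallest(column, 2)); // prints 2.5
    }
}
```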
public Instance lastInstance()
public double meanOrMode(int attIndex)
attIndex - the attribute's index (index starts with 0)
public double meanOrMode(Attribute att)
att - the attribute
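The two cases of meanOrMode can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Weka's code: the class `MeanOrModeSketch` and both method names are hypothetical, nominal values are assumed to be stored as label indices, and instance weights are assumed to enter both statistics because the class is described as holding weighted instances.

```java
// Hypothetical helper; not part of weka.core.
final class MeanOrModeSketch {

    // Weighted mean of a numeric column.
    static double weightedMean(double[] values, double[] weights) {
        double sum = 0.0;
        double total = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += weights[i] * values[i];
            total += weights[i];
        }
        // Returning 0 when there is no weight at all is a choice of this sketch.
        return total > 0 ? sum / total : 0.0;
    }

    // Mode of a nominal column whose values are label indices 0..numLabels-1.
    // The winning index is returned as a double, mirroring "as a floating-point value".
    static double weightedMode(int[] labelIndices, double[] weights, int numLabels) {
        double[] counts = new double[numLabels];
        for (int i = 0; i < labelIndices.length; i++) {
            counts[labelIndices[i]] += weights[i];
        }
        int best = 0;
        for (int j = 1; j < numLabels; j++) {
            if (counts[j] > counts[best]) {
                best = j; // ties keep the lower index
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(weightedMean(new double[] {1, 2, 6}, new double[] {1, 1, 2})); // prints 3.75
        System.out.println(weightedMode(new int[] {0, 1, 1, 2}, new double[] {1, 1, 1, 1}, 3)); // prints 1.0
    }
}
```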
public int numAttributes()
public int numClasses()
UnassignedClassException - if the class is not set
public int numDistinctValues(int attIndex)
attIndex - the attribute (index starts with 0)
public int numDistinctValues(Attribute att)
att - the attribute
public int numInstances()
public void randomize(java.util.Random random)
random - a random number generator
@Deprecated
public boolean readInstance(java.io.Reader reader)
throws java.io.IOException
Deprecated. Use the ArffLoader or DataSource class instead of this method.
reader - the reader
java.io.IOException - if the information is not read successfully
See also: ArffLoader, ConverterUtils.DataSource
public java.lang.String relationName()
public void renameAttribute(int att,
java.lang.String name)
att - the attribute's index (index starts with 0)
name - the new name
public void renameAttribute(Attribute att,
java.lang.String name)
att - the attribute
name - the new name
public void renameAttributeValue(int att,
int val,
java.lang.String name)
att - the attribute's index (index starts with 0)
val - the value's index (index starts with 0)
name - the new name
public void renameAttributeValue(Attribute att,
java.lang.String val,
java.lang.String name)
att - the attribute
val - the value
name - the new name
public Instances resample(java.util.Random random)
random - a random number generator
public Instances resampleWithWeights(java.util.Random random)
random - a random number generator
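Sampling with replacement according to weights can be pictured as drawing row indices, where index i is chosen with probability proportional to its weight. The plain-Java sketch below uses a cumulative-sum lookup; this is a swapped-in textbook technique, not necessarily how Weka does it, and the class `WeightedResampleSketch` and method `sampleIndices` are hypothetical names. With all weights equal it reduces to plain resampling with replacement.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper; not part of weka.core.
final class WeightedResampleSketch {

    // Draws weights.length indices with replacement; index i is picked with
    // probability weights[i] / sum(weights).
    static int[] sampleIndices(double[] weights, Random random) {
        int n = weights.length;
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total; // running sum of the weights
        }
        if (n > 0 && total <= 0) {
            throw new IllegalArgumentException("weights sum to zero");
        }
        int[] picked = new int[n];
        for (int draw = 0; draw < n; draw++) {
            double u = random.nextDouble() * total; // uniform in [0, total)
            int i = 0;
            // Advance to the first index whose running sum exceeds u.
            while (i < n - 1 && u >= cumulative[i]) {
                i++;
            }
            picked[draw] = i;
        }
        return picked;
    }

    public static void main(String[] args) {
        // Only index 1 carries weight, so every draw must select it.
        int[] picked = sampleIndices(new double[] {0, 1, 0}, new Random(1));
        System.out.println(Arrays.toString(picked)); // prints [1, 1, 1]
    }
}
```

The returned indices would then be used to copy the selected rows into a new dataset of the same size.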
public Instances resampleWithWeights(java.util.Random random,
double[] weights)
random - a random number generator
weights - the weight vector
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights.
public void setClass(Attribute att)
att - attribute to be the class
public void setClassIndex(int classIndex)
classIndex - the new class index (index starts with 0)
java.lang.IllegalArgumentException - if the class index is too big or < 0
public void setRelationName(java.lang.String newName)
newName - the new relation name.
public void sort(int attIndex)
attIndex - the attribute's index (index starts with 0)
public void sort(Attribute att)
att - the attribute
public void stratify(int numFolds)
numFolds - the number of folds in the cross-validation
UnassignedClassException - if the class is not set
public double sumOfWeights()
public Instances testCV(int numFolds,
int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
public java.lang.String toString()
toString in class java.lang.Object
public Instances trainCV(int numFolds,
int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2
or greater than the number of instances.
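How testCV and trainCV might divide a dataset can be sketched with index arithmetic alone. The plain-Java example below assumes that test folds are contiguous blocks and that any remainder is spread over the first folds; this page does not state that layout, so treat it as one plausible scheme. The class `CvFoldSketch` and method `testRange` are hypothetical. The training set for a fold is then everything outside that fold's test range.

```java
// Hypothetical helper; not part of weka.core.
final class CvFoldSketch {

    // Returns {first, count}: the index of the first test instance and the
    // number of test instances for fold numFold (0 for the first fold).
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Number of folds must be at least 2");
        }
        if (numFolds > numInstances) {
            throw new IllegalArgumentException("Can't have more folds than instances");
        }
        int base = numInstances / numFolds;      // minimum size of every fold
        int remainder = numInstances % numFolds; // this many folds get one extra
        int count = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, count};
    }

    public static void main(String[] args) {
        // 10 instances in 3 folds.
        for (int fold = 0; fold < 3; fold++) {
            int[] r = testRange(10, 3, fold);
            System.out.println("fold " + fold + ": test [" + r[0] + ", "
                + (r[0] + r[1]) + "), train size " + (10 - r[1]));
        }
        // prints:
        // fold 0: test [0, 4), train size 6
        // fold 1: test [4, 7), train size 7
        // fold 2: test [7, 10), train size 7
    }
}
```

For a nominal class attribute, the method summary indicates that stratify(int numFolds) is meant to be called first so that a stratified cross-validation can be performed.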
public Instances trainCV(int numFolds,
int numFold,
java.util.Random random)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
public double variance(int attIndex)
attIndex - the numeric attribute (index starts with 0)
java.lang.IllegalArgumentException - if the attribute is not numeric
public double variance(Attribute att)
att - the numeric attribute
java.lang.IllegalArgumentException - if the attribute is not numeric
public AttributeStats attributeStats(int index)
index - the index of the attribute to summarize (index starts with 0)
public double[] attributeToDoubleArray(int index)
index - the index of the attribute.
public java.lang.String toSummaryString()
public void swap(int i,
int j)
i - the first instance's index (index starts with 0)
j - the second instance's index (index starts with 0)
public static Instances mergeInstances(Instances first,
Instances second)
first - the first set of Instances
second - the second set of Instances
java.lang.IllegalArgumentException - if the datasets are not the same size
public static void test(java.lang.String[] argv)
argv - should contain one element: the name of an ARFF file
public static void main(java.lang.String[] args)
weka.core.Instances help
weka.core.Instances <filename>
weka.core.Instances merge <filename1> <filename2>
weka.core.Instances append <filename1> <filename2>
weka.core.Instances headers <filename1> <filename2>
weka.core.Instances randomize <seed> <filename>
args - the commandline parameters
public java.lang.String getRevision()
getRevision in interface RevisionHandler