Java Doc for Evaluation.java in weka.classifiers

java.lang.Object

weka.classifiers.Evaluation

Evaluation

public class Evaluation implements Summarizable

Class for evaluating machine learning models.

-------------------------------------------------------------------

General options when evaluating a learning scheme from the command-line:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-x number
The number of folds for the cross-validation (default: 10).

-no-cv
No cross validation. If no test file is provided, no evaluation is done.

-split-percentage percentage
Sets the percentage for the train/test set split, e.g., 66.

-preserve-order
Preserves the order in the percentage split instead of randomizing the data first with the seed value ('-s').

-s seed
Random number seed for the cross-validation and percentage split (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file. In case the filename ends with ".xml" the options are loaded from XML.

-d filename
Saves the classifier built from the training data into the given file. If the filename ends with ".xml", the options are saved as XML, not the model.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics per class.

-k
Outputs information-theoretic statistics.

-p range
Outputs predictions for test instances (or the train instances if no test instances provided), along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.

-distribution
Outputs the distribution instead of only the prediction in conjunction with the '-p' option (only nominal classes).

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-xml filename | xml-string
Retrieves the options from the XML-data instead of the command line.

-threshold-file file
The file to save the threshold data to. The format is determined by the extensions, e.g., '.arff' for ARFF format or '.csv' for CSV.

-threshold-label label
The class label to determine the threshold data for (default is the first label).

-------------------------------------------------------------------

Example usage as the main of a classifier (called FunkyClassifier):

 public static void main(String [] args) {
 runClassifier(new FunkyClassifier(), args);
 }

------------------------------------------------------------------

Example usage from within an application:

 Instances trainInstances = ... instances got from somewhere
 Instances testInstances = ... instances got from somewhere
 Classifier scheme = ... scheme got from somewhere
 Evaluation evaluation = new Evaluation(trainInstances);
 evaluation.evaluateModel(scheme, testInstances);
 System.out.println(evaluation.toSummaryString());

author:
   Eibe Frank (eibe@cs.waikato.ac.nz)
author:
   Len Trigg (trigg@cs.waikato.ac.nz)
version:
   $Revision: 1.77 $

Field Summary
final protected static double	MIN_SF_PROB The minimum probability accepted from an estimator to avoid taking log(0) in SF calculations.
protected static int	k_MarginResolution
protected boolean	m_ClassIsNominal
protected String[]	m_ClassNames The names of the classes.
protected double[]	m_ClassPriors
protected double	m_ClassPriorsSum
protected double[][]	m_ConfusionMatrix Array for storing the confusion matrix.
protected double	m_Correct The weight of all correctly classified instances.
protected CostMatrix	m_CostMatrix The cost matrix (if given).
protected Estimator	m_ErrorEstimator
protected double	m_Incorrect The weight of all incorrectly classified instances.
protected double[]	m_MarginCounts
protected double	m_MissingClass The weight of all instances that had no class assigned to them.
protected boolean	m_NoPriors
protected int	m_NumClasses The number of classes.
protected int	m_NumFolds The number of folds for a cross-validation.
protected int	m_NumTrainClassVals
protected Estimator	m_PriorErrorEstimator
protected double	m_SumAbsErr Sum of absolute errors.
protected double	m_SumClass Sum of class values.
protected double	m_SumClassPredicted Sum of predicted * class values.
protected double	m_SumErr Sum of errors.
protected double	m_SumKBInfo
protected double	m_SumPredicted Sum of predicted values.
protected double	m_SumPriorAbsErr
protected double	m_SumPriorEntropy
protected double	m_SumPriorSqrErr
protected double	m_SumSchemeEntropy
protected double	m_SumSqrClass Sum of squared class values.
protected double	m_SumSqrErr Sum of squared errors.
protected double	m_SumSqrPredicted Sum of squared predicted values.
protected double	m_TotalCost
protected double[]	m_TrainClassVals
protected double[]	m_TrainClassWeights
protected double	m_Unclassified The weight of all unclassified instances.
protected double	m_WithClass The weight of all instances that had a class assigned to them.

Constructor Summary
public	Evaluation(Instances data) Initializes all the counters for the evaluation.
public	Evaluation(Instances data, CostMatrix costMatrix) Initializes all the counters for the evaluation and also takes a cost matrix as parameter.

Method Summary
final public double	KBInformation()
final public double	KBMeanInformation() Return the Kononenko & Bratko Information score in bits per instance.
final public double	KBRelativeInformation()
final public double	SFEntropyGain() Returns the total SF, which is the null model entropy minus the scheme entropy.
final public double	SFMeanEntropyGain() Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
final public double	SFMeanPriorEntropy()
final public double	SFMeanSchemeEntropy()
final public double	SFPriorEntropy()
final public double	SFSchemeEntropy()
protected void	addNumericTrainClass(double classValue, double weight) Adds a numeric (non-missing) training class value and weight to the buffer of stored values.
public double	areaUnderROC(int classIndex) Returns the area under ROC for those predictions that have been collected in the evaluateModel(Classifier, Instances) method.
protected static String	attributeValuesString(Instance instance, Range attRange) Builds a string listing the attribute values in a specified range of indices, separated by commas and enclosed in brackets.
final public double	avgCost() Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
public double[][]	confusionMatrix() Returns a copy of the confusion matrix.
final public double	correct() Gets the number of instances correctly classified (that is, for which a correct prediction was made).
final public double	correlationCoefficient() Returns the correlation coefficient if the class is numeric.
public void	crossValidateModel(Classifier classifier, Instances data, int numFolds, Random random) Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
public void	crossValidateModel(String classifierString, Instances data, int numFolds, String[] options, Random random) Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
public boolean	equals(Object obj)
final public double	errorRate() Returns the estimated error rate or the root mean squared error (if the class is numeric).
public static String	evaluateModel(String classifierString, String[] options) Evaluates a classifier with the options given in an array of strings.
public static String	evaluateModel(Classifier classifier, String[] options) Evaluates a classifier with the options given in an array of strings.
public double[]	evaluateModel(Classifier classifier, Instances data) Evaluates the classifier on a given set of instances.
public double	evaluateModelOnce(Classifier classifier, Instance instance) Evaluates the classifier on a single instance.
public double	evaluateModelOnce(double[] dist, Instance instance) Evaluates the supplied distribution on a single instance.
public void	evaluateModelOnce(double prediction, Instance instance) Evaluates the supplied prediction on a single instance.
public double	evaluateModelOnceAndRecordPrediction(Classifier classifier, Instance instance) Evaluates the classifier on a single instance and records the prediction (if the class is nominal).
public double	evaluateModelOnceAndRecordPrediction(double[] dist, Instance instance) Evaluates the supplied distribution on a single instance.
public double	fMeasure(int classIndex) Calculate the F-Measure with respect to a particular class.
public double	falseNegativeRate(int classIndex) Calculate the false negative rate with respect to a particular class.
public double	falsePositiveRate(int classIndex) Calculate the false positive rate with respect to a particular class.
public double[]	getClassPriors()
protected static CostMatrix	handleCostOption(String costFileName, int numClasses) Attempts to load a cost matrix. Parameters: costFileName - the filename of the cost matrix. Parameters: numClasses - the number of classes that should be in the cost matrix (only used if the cost file is in old format).
final public double	incorrect() Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made).
final public double	kappa() Returns value of kappa statistic if class is nominal.
public static void	main(String[] args) A test method for this class.
protected double[]	makeDistribution(double predictedClass)
protected static String	makeOptionString(Classifier classifier)
final public double	meanAbsoluteError() Returns the mean absolute error.
final public double	meanPriorAbsoluteError() Returns the mean absolute error of the prior.
protected String	num2ShortID(int num, char[] IDChars, int IDWidth) Method for generating indices for the confusion matrix.
public double	numFalseNegatives(int classIndex) Calculate number of false negatives with respect to a particular class.
public double	numFalsePositives(int classIndex) Calculate number of false positives with respect to a particular class.
final public double	numInstances() Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).
public double	numTrueNegatives(int classIndex) Calculate the number of true negatives with respect to a particular class.
public double	numTruePositives(int classIndex) Calculate the number of true positives with respect to a particular class.
final public double	pctCorrect() Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
final public double	pctIncorrect() Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
final public double	pctUnclassified() Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
public double	precision(int classIndex) Calculate the precision with respect to a particular class.
protected static String	predictionText(Classifier classifier, Instance inst, int instNum, Range attributesToOutput, boolean printDistribution)
public FastVector	predictions() Returns a reference to the FastVector containing the predictions that have been collected.
protected static String	printClassifications(Classifier classifier, Instances train, DataSource testSource, int classIndex, Range attributesToOutput) Prints the predictions for the given dataset into a String variable.
protected static String	printClassifications(Classifier classifier, Instances train, DataSource testSource, int classIndex, Range attributesToOutput, boolean printDistribution) Prints the predictions for the given dataset into a String variable.
final public double	priorEntropy()
public double	recall(int classIndex) Calculate the recall with respect to a particular class.
final public double	relativeAbsoluteError() Returns the relative absolute error.
final public double	rootMeanPriorSquaredError() Returns the root mean prior squared error.
final public double	rootMeanSquaredError() Returns the root mean squared error.
final public double	rootRelativeSquaredError() Returns the root relative squared error if the class is numeric.
protected void	setNumericPriorsFromBuffer() Sets up the priors for numeric class attributes from the training class values that have been seen so far.
public void	setPriors(Instances train)
public String	toClassDetailsString() Generates a breakdown of the accuracy for each class (with default title), incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
public String	toClassDetailsString(String title) Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
public String	toCumulativeMarginDistributionString() Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.
public String	toMatrixString() Calls toMatrixString() with a default title.
public String	toMatrixString(String title) Outputs the performance statistics as a classification confusion matrix.
public String	toSummaryString()
public String	toSummaryString(boolean printComplexityStatistics) Calls toSummaryString() with a default title.
public String	toSummaryString(String title, boolean printComplexityStatistics) Outputs the performance statistics in summary form.
final public double	totalCost() Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.
public double	trueNegativeRate(int classIndex) Calculate the true negative rate with respect to a particular class.
public double	truePositiveRate(int classIndex) Calculate the true positive rate with respect to a particular class.
final public double	unclassified() Gets the number of instances not classified (that is, for which no prediction was made by the classifier).
protected void	updateMargins(double[] predictedDistribution, int actualClass, double weight)
protected void	updateNumericScores(double[] predicted, double[] actual, double weight) Update the numeric accuracy measures.
public void	updatePriors(Instance instance)
protected void	updateStatsForClassifier(double[] predictedDistribution, Instance instance) Updates all the statistics about a classifier's performance for the current test instance.
protected void	updateStatsForPredictor(double predictedValue, Instance instance) Updates all the statistics about a predictor's performance for the current test instance.
public void	useNoPriors() Disables the use of priors, e.g., in case of de-serialized schemes that have no access to the original training set, but are evaluated on a test set.
protected static String	wekaStaticWrapper(Sourcable classifier, String className) Wraps a static classifier in enough source to test using the weka class libraries.

Field Detail

MIN_SF_PROB
final protected static double MIN_SF_PROB
	The minimum probability accepted from an estimator to avoid taking log(0) in SF calculations.

k_MarginResolution
protected static int k_MarginResolution
	Resolution of the margin histogram.

m_ClassIsNominal
protected boolean m_ClassIsNominal
	Is the class nominal or numeric?

m_ClassNames
protected String[] m_ClassNames
	The names of the classes.

m_ClassPriors
protected double[] m_ClassPriors
	The prior probabilities of the classes.

m_ClassPriorsSum
protected double m_ClassPriorsSum
	The sum of counts for priors.

m_ConfusionMatrix
protected double[][] m_ConfusionMatrix
	Array for storing the confusion matrix.

m_Correct
protected double m_Correct
	The weight of all correctly classified instances.

m_CostMatrix
protected CostMatrix m_CostMatrix
	The cost matrix (if given).

m_ErrorEstimator
protected Estimator m_ErrorEstimator
	Numeric class error estimator for scheme.

m_Incorrect
protected double m_Incorrect
	The weight of all incorrectly classified instances.

m_MarginCounts
protected double[] m_MarginCounts
	Cumulative margin distribution.

m_MissingClass
protected double m_MissingClass
	The weight of all instances that had no class assigned to them.

m_NoPriors
protected boolean m_NoPriors
	Enables/disables the use of priors, e.g., if no training set is present in case of de-serialized schemes.

m_NumClasses
protected int m_NumClasses
	The number of classes.

m_NumFolds
protected int m_NumFolds
	The number of folds for a cross-validation.

m_NumTrainClassVals
protected int m_NumTrainClassVals
	Number of non-missing class training instances seen.

m_PriorErrorEstimator
protected Estimator m_PriorErrorEstimator
	Numeric class error estimator for prior.

m_SumAbsErr
protected double m_SumAbsErr
	Sum of absolute errors.

m_SumClass
protected double m_SumClass
	Sum of class values.

m_SumClassPredicted
protected double m_SumClassPredicted
	Sum of predicted * class values.

m_SumErr
protected double m_SumErr
	Sum of errors.

m_SumKBInfo
protected double m_SumKBInfo
	Total Kononenko & Bratko Information.

m_SumPredicted
protected double m_SumPredicted
	Sum of predicted values.

m_SumPriorAbsErr
protected double m_SumPriorAbsErr
	Sum of absolute errors of the prior.

m_SumPriorEntropy
protected double m_SumPriorEntropy
	Total entropy of prior predictions.

m_SumPriorSqrErr
protected double m_SumPriorSqrErr
	Sum of squared errors of the prior.

m_SumSchemeEntropy
protected double m_SumSchemeEntropy
	Total entropy of scheme predictions.

m_SumSqrClass
protected double m_SumSqrClass
	Sum of squared class values.

m_SumSqrErr
protected double m_SumSqrErr
	Sum of squared errors.

m_SumSqrPredicted
protected double m_SumSqrPredicted
	Sum of squared predicted values.

m_TotalCost
protected double m_TotalCost
	The total cost of predictions (includes instance weights).

m_TrainClassVals
protected double[] m_TrainClassVals
	Array containing all numeric training class values seen.

m_TrainClassWeights
protected double[] m_TrainClassWeights
	Array containing all numeric training class weights.

m_Unclassified
protected double m_Unclassified
	The weight of all unclassified instances.

m_WithClass
protected double m_WithClass
	The weight of all instances that had a class assigned to them.

Constructor Detail

Evaluation
public Evaluation(Instances data) throws Exception
	Initializes all the counters for the evaluation. Use `useNoPriors()` if the dataset is the test set and you can't initialize with the priors from the training set via `setPriors(Instances)`.
	Parameters: data - set of training instances, to get some header information and prior class distribution information
	throws: Exception - if the class is not defined
	See Also: Evaluation.useNoPriors()
	See Also: Evaluation.setPriors(Instances)

Evaluation
public Evaluation(Instances data, CostMatrix costMatrix) throws Exception
	Initializes all the counters for the evaluation and also takes a cost matrix as parameter. Use `useNoPriors()` if the dataset is the test set and you can't initialize with the priors from the training set via `setPriors(Instances)`.
	Parameters: data - set of training instances, to get some header information and prior class distribution information
	Parameters: costMatrix - the cost matrix; if null, default costs will be used
	throws: Exception - if the cost matrix is not compatible with the data, the class is not defined, or the class is numeric
	See Also: Evaluation.useNoPriors()
	See Also: Evaluation.setPriors(Instances)

Method Detail

KBInformation
final public double KBInformation() throws Exception
	Returns the total Kononenko & Bratko Information score in bits.
	Returns: the K&B information score
	throws: Exception - if the class is not nominal

KBMeanInformation
final public double KBMeanInformation() throws Exception
	Returns the Kononenko & Bratko Information score in bits per instance.
	Returns: the K&B information score
	throws: Exception - if the class is not nominal

KBRelativeInformation
final public double KBRelativeInformation() throws Exception
	Returns the Kononenko & Bratko Relative Information score.
	Returns: the K&B relative information score
	throws: Exception - if the class is not nominal
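
The three K&B methods above report a per-instance information score summed (or averaged) over the test set. The following is a minimal sketch of the published Kononenko & Bratko definition, not Weka's own code: the class name `KBScoreSketch` and its method names are hypothetical, instance weights are ignored, and the prior probability is assumed to lie strictly between 0 and 1.

```java
/** Sketch of the Kononenko & Bratko information score (hypothetical helper, not part of Weka). */
final class KBScoreSketch {

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Information in bits gained on one instance.
     * priorProb is the prior probability of the instance's actual class
     * (assumed strictly between 0 and 1); predictedProb is the probability
     * the classifier assigned to that class.
     */
    static double score(double priorProb, double predictedProb) {
        if (predictedProb >= priorProb) {
            // The classifier moved probability towards the correct class.
            return log2(predictedProb) - log2(priorProb);
        }
        // The classifier moved probability away from the correct class: negative score.
        return -(log2(1.0 - predictedProb) - log2(1.0 - priorProb));
    }

    /** Total score in bits over all instances (unweighted). */
    static double total(double[] priorProbs, double[] predictedProbs) {
        double sum = 0.0;
        for (int i = 0; i < priorProbs.length; i++) {
            sum += score(priorProbs[i], predictedProbs[i]);
        }
        return sum;
    }

    /** Score in bits per instance (unweighted). */
    static double mean(double[] priorProbs, double[] predictedProbs) {
        return total(priorProbs, predictedProbs) / priorProbs.length;
    }
}
```

With a prior of 0.5, a fully confident correct prediction scores one bit and a fully confident wrong prediction scores minus one bit, so the two cancel in the total.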

SFEntropyGain
final public double SFEntropyGain()
	Returns the total SF, which is the null model entropy minus the scheme entropy.
	Returns: the total SF

SFMeanEntropyGain
final public double SFMeanEntropyGain()
	Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
	Returns: the SF per instance

SFMeanPriorEntropy
final public double SFMeanPriorEntropy()
	Returns the entropy per instance for the null model.
	Returns: the null model entropy per instance

SFMeanSchemeEntropy
final public double SFMeanSchemeEntropy()
	Returns the entropy per instance for the scheme.
	Returns: the scheme entropy per instance

SFPriorEntropy
final public double SFPriorEntropy()
	Returns the total entropy for the null model.
	Returns: the total null model entropy

SFSchemeEntropy
final public double SFSchemeEntropy()
	Returns the total entropy for the scheme.
	Returns: the total scheme entropy
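
The SF quantities above relate as "null model entropy minus scheme entropy". One way to picture this, as a sketch under stated assumptions rather than Weka's implementation: each entropy is taken to be the sum of -log2 of the probability a model assigned to each instance's actual class, with probabilities floored to avoid log(0) (the role the MIN_SF_PROB field plays). The class `EntropyGainSketch`, its method names, and the floor value `Double.MIN_VALUE` are assumptions made for illustration; instance weights are ignored.

```java
/** Sketch of entropy gain (null model entropy minus scheme entropy); hypothetical helper. */
final class EntropyGainSketch {

    /** Assumed floor on probabilities so that log2 is never taken of zero. */
    static final double MIN_PROB = Double.MIN_VALUE;

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Total entropy in bits: sum of -log2(p) over the probabilities given to the actual classes. */
    static double totalEntropy(double[] probOfActualClass) {
        double sum = 0.0;
        for (double p : probOfActualClass) {
            sum -= log2(Math.max(MIN_PROB, p));
        }
        return sum;
    }

    /** Total gain: entropy under the prior (null) model minus entropy under the scheme. */
    static double entropyGain(double[] priorProbs, double[] schemeProbs) {
        return totalEntropy(priorProbs) - totalEntropy(schemeProbs);
    }

    /** Gain per instance (unweighted). */
    static double meanEntropyGain(double[] priorProbs, double[] schemeProbs) {
        return entropyGain(priorProbs, schemeProbs) / priorProbs.length;
    }
}
```

A positive gain means the scheme assigned more probability to the true classes than the class priors alone would have.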

addNumericTrainClass
protected void addNumericTrainClass(double classValue, double weight)
	Adds a numeric (non-missing) training class value and weight to the buffer of stored values.
	Parameters: classValue - the class value
	Parameters: weight - the instance weight

areaUnderROC
public double areaUnderROC(int classIndex)
	Returns the area under ROC for those predictions that have been collected in the evaluateModel(Classifier, Instances) method. Returns Instance.missingValue() if the area is not available.
	Parameters: classIndex - the index of the class to consider as "positive"
	Returns: the area under the ROC curve, or not a number if the area is not available
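
For intuition about what the area under the ROC curve measures, the sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as half. This is a swapped-in textbook technique, not necessarily how Weka computes the value; `RocAreaSketch` and its parameters are hypothetical, and it returns NaN when no positive or no negative instance exists, mirroring the "not available" case described above.

```java
/** Sketch of area under the ROC curve via pairwise comparison; hypothetical helper. */
final class RocAreaSketch {

    /**
     * scores[i] is the predicted probability of the "positive" class for instance i;
     * isPositive[i] tells whether instance i actually belongs to that class.
     * Returns NaN if there is no positive/negative pair to compare.
     */
    static double areaUnderROC(double[] scores, boolean[] isPositive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) {
                continue; // outer loop walks the positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) {
                    continue; // inner loop walks the negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;  // positive ranked above negative
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;  // tie counts as half
                }
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

The double loop is quadratic in the number of instances; a sort-based ranking would scale better but is harder to read in a short example.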

attributeValuesString
protected static String attributeValuesString(Instance instance, Range attRange)
	Builds a string listing the attribute values in a specified range of indices, separated by commas and enclosed in brackets.
	Parameters: instance - the instance to print the values from
	Parameters: attRange - the range of the attributes to list
	Returns: a string listing values of the attributes in the range

avgCost
final public double avgCost()
	Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
	Returns: the average cost

confusionMatrix
public double[][] confusionMatrix()
	Returns a copy of the confusion matrix.
	Returns: a copy of the confusion matrix as a two-dimensional array
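
The per-class statistics listed in the method summary (precision, recall, F-Measure) can all be derived from a confusion matrix. The sketch below shows the standard definitions; it assumes rows index the actual class and columns the predicted class, which is an assumption about layout rather than something this page states. `ConfusionStatsSketch` is a hypothetical helper, not part of Weka.

```java
/** Sketch of per-class statistics from a confusion matrix m[actual][predicted]; hypothetical helper. */
final class ConfusionStatsSketch {

    /** Weight of instances of class c that were predicted as class c. */
    static double truePositives(double[][] m, int c) {
        return m[c][c];
    }

    /** Weight of instances of other classes that were predicted as class c. */
    static double falsePositives(double[][] m, int c) {
        double sum = 0.0;
        for (int actual = 0; actual < m.length; actual++) {
            if (actual != c) {
                sum += m[actual][c];
            }
        }
        return sum;
    }

    /** Weight of instances of class c that were predicted as some other class. */
    static double falseNegatives(double[][] m, int c) {
        double sum = 0.0;
        for (int predicted = 0; predicted < m[c].length; predicted++) {
            if (predicted != c) {
                sum += m[c][predicted];
            }
        }
        return sum;
    }

    /** tp / (tp + fp), or 0 when nothing was predicted as class c. */
    static double precision(double[][] m, int c) {
        double tp = truePositives(m, c);
        double denom = tp + falsePositives(m, c);
        return denom == 0.0 ? 0.0 : tp / denom;
    }

    /** tp / (tp + fn), or 0 when class c has no instances. */
    static double recall(double[][] m, int c) {
        double tp = truePositives(m, c);
        double denom = tp + falseNegatives(m, c);
        return denom == 0.0 ? 0.0 : tp / denom;
    }

    /** Harmonic mean of precision and recall, or 0 when both are 0. */
    static double fMeasure(double[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For a two-class matrix {{8, 2}, {1, 9}}, class 0 has 8 true positives, 1 false positive, and 2 false negatives under the assumed layout.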

correct
final public double correct()
	Gets the number of instances correctly classified (that is, for which a correct prediction was made). (Actually the sum of the weights of these instances.)
	Returns: the number of correctly classified instances

correlationCoefficient
final public double correlationCoefficient() throws Exception
	Returns the correlation coefficient if the class is numeric.
	Returns: the correlation coefficient
	throws: Exception - if class is not numeric
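
The running-sum fields documented above (sum of class values, sum of predicted values, sum of their products, and the two sums of squares) are exactly what a Pearson correlation needs. The sketch below computes the coefficient from such sums; it is an illustration of the textbook formula, with the hypothetical class `CorrelationSketch`, unit instance weights, and a return value of 0 for degenerate (zero-variance) input chosen as assumptions.

```java
/** Sketch of a Pearson correlation computed from running sums; hypothetical helper. */
final class CorrelationSketch {

    /** Correlation between actual and predicted numeric class values (unit weights). */
    static double correlation(double[] actual, double[] predicted) {
        int n = actual.length;
        double sumActual = 0.0, sumPredicted = 0.0, sumProduct = 0.0;
        double sumSqrActual = 0.0, sumSqrPredicted = 0.0;
        for (int i = 0; i < n; i++) {
            sumActual += actual[i];
            sumPredicted += predicted[i];
            sumProduct += actual[i] * predicted[i];
            sumSqrActual += actual[i] * actual[i];
            sumSqrPredicted += predicted[i] * predicted[i];
        }
        // Unnormalised variances and covariance; the 1/n factors cancel in the ratio.
        double varActual = sumSqrActual - sumActual * sumActual / n;
        double varPredicted = sumSqrPredicted - sumPredicted * sumPredicted / n;
        double covariance = sumProduct - sumActual * sumPredicted / n;
        if (varActual * varPredicted <= 0.0) {
            return 0.0; // assumed convention when either series is constant
        }
        return covariance / Math.sqrt(varActual * varPredicted);
    }
}
```

Keeping only sums means the statistic can be updated one instance at a time without storing the predictions.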

crossValidateModel
public void crossValidateModel(Classifier classifier, Instances data, int numFolds, Random random) throws Exception
	Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances. Now performs a deep copy of the classifier before each call to buildClassifier() (just in case the classifier is not initialized properly).
	Parameters: classifier - the classifier with any options set
	Parameters: data - the data on which the cross-validation is to be performed
	Parameters: numFolds - the number of folds for the cross-validation
	Parameters: random - random number generator for randomization
	throws: Exception - if a classifier could not be generated successfully or the class is not defined

crossValidateModel
public void crossValidateModel(String classifierString, Instances data, int numFolds, String[] options, Random random) throws Exception
	Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
	Parameters: classifierString - a string naming the class of the classifier
	Parameters: data - the data on which the cross-validation is to be performed
	Parameters: numFolds - the number of folds for the cross-validation
	Parameters: options - the options to the classifier. Any options accepted by the classifier will be removed from this array.
	Parameters: random - the random number generator for randomizing the data
	throws: Exception - if a classifier could not be generated successfully or the class is not defined
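
"Stratified" cross-validation means each fold keeps roughly the same class proportions as the full dataset. The sketch below shows one simple way to get that effect: shuffle the instance indices within each class and deal them out to folds in round-robin order. This is an illustrative technique, not Weka's actual stratification procedure; `StratifiedFoldsSketch`, `assignFolds`, and the integer class-label encoding are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Sketch of stratified fold assignment by round-robin dealing; hypothetical helper. */
final class StratifiedFoldsSketch {

    /**
     * Returns, for each instance, the index (0 .. numFolds-1) of the fold in which
     * it serves as test data. classLabels[i] is the nominal class of instance i.
     */
    static int[] assignFolds(int[] classLabels, int numFolds, Random random) {
        // Group instance indices by class label (TreeMap keeps iteration order deterministic).
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < classLabels.length; i++) {
            byClass.computeIfAbsent(classLabels[i], k -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[classLabels.length];
        int next = 0; // running counter so that dealing continues across classes
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, random); // randomise order within the class
            for (int index : indices) {
                fold[index] = next % numFolds;
                next++;
            }
        }
        return fold;
    }
}
```

For fold k, the instances with `fold[i] == k` form the test set and all others form the training set; passing a seeded `Random` makes the split reproducible.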

equals
public boolean equals(Object obj)
	Tests whether the current evaluation object is equal to another evaluation object.
	Parameters: obj - the object to compare against
	Returns: true if the two objects are equal

errorRate
final public double errorRate()
	Returns the estimated error rate or the root mean squared error (if the class is numeric). If a cost matrix was given this error rate gives the average cost.
	Returns: the estimated error rate (between 0 and 1, or between 0 and maximum cost)
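
The description above covers three different quantities depending on the setup: a misclassification rate for a nominal class, a root mean squared error for a numeric class, and an average cost when a cost matrix is supplied. The sketch below spells out one plausible reading of each using unit instance weights; `ErrorRateSketch` is a hypothetical helper, and the `cost[actual][predicted]` layout is an assumption, not something this page specifies.

```java
/** Sketch of the three quantities an "error rate" may denote; hypothetical helper. */
final class ErrorRateSketch {

    /** Fraction of instances whose predicted class differs from the actual class. */
    static double nominalErrorRate(int[] actual, int[] predicted) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != predicted[i]) {
                wrong++;
            }
        }
        return (double) wrong / actual.length;
    }

    /** Square root of the mean squared difference between actual and predicted values. */
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        double sumSqrErr = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double err = predicted[i] - actual[i];
            sumSqrErr += err * err;
        }
        return Math.sqrt(sumSqrErr / actual.length);
    }

    /** Mean of cost[actual][predicted] over all instances (assumed matrix layout). */
    static double averageCost(int[] actual, int[] predicted, double[][] cost) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += cost[actual[i]][predicted[i]];
        }
        return total / actual.length;
    }
}
```

With a cost matrix whose off-diagonal entries are all 1 and whose diagonal is 0, the average cost reduces to the plain misclassification rate.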

evaluateModel

public static String evaluateModel(String classifierString, String[] options) throws Exception

Evaluates a classifier with the options given in an array of strings.

Valid options are: