| java.lang.Object weka.estimators.CheckEstimator
CheckEstimator | public class CheckEstimator implements OptionHandler | | Class for examining the capabilities of estimators and finding problems with
them. If you implement an estimator using the WEKA libraries,
you should run the checks on it to ensure robustness and correct
operation. Passing all the tests of this object does not mean
that the estimator is free of bugs, but the checks will help find some
common ones.
Typical usage:
java weka.estimators.CheckEstimator -W estimator_name -- estimator_options
This class uses code from the CheckEstimatorClass
ATTENTION! Current estimators can only
1. split on a nominal class attribute
2. build estimators for nominal and numeric attributes
3. build estimators independently of the class type
Much of the functionality for testing other class and attribute types has been left in the code.
CheckEstimator reports on the following:
- Estimator abilities
- Possible command line options to the estimator
- Whether the estimator can predict nominal, numeric, string,
date or relational class attributes. Warnings will be displayed if
performance is worse than ZeroR
- Whether the estimator can be trained incrementally
- Whether the estimator can build estimates for numeric attributes
- Whether the estimator can handle nominal attributes
- Whether the estimator can handle string attributes
- Whether the estimator can handle date attributes
- Whether the estimator can handle relational attributes
- Whether the estimator builds estimates for multi-instance data
- Whether the estimator can handle missing attribute values
- Whether the estimator can handle missing class values
- Whether a nominal estimator only handles 2 class problems
- Whether the estimator can handle instance weights
- Correct functioning
- Correct initialisation during addValues (i.e. no result
changes when addValues is called repeatedly)
- Whether incremental training produces the same results
as during non-incremental training (which may or may not
be OK)
- Whether the estimator alters the data passed to it
(number of instances, instance order, instance weights, etc.)
- Degenerate cases
- building estimator with zero training instances
- all but one attribute value missing
- all attribute values missing
- all but one class value missing
- all class values missing
Running CheckEstimator with the debug option set will output the
training and test datasets for any failed tests.
The weka.estimators.AbstractEstimatorTest uses this
class to test all the estimators. Any changes here have to be
checked in that abstract test class, too.
Valid options are:
-D
Turn on debugging output.
-S
Silent mode - prints nothing to stdout.
-N <num>
The number of instances in the datasets (default 100).
-W
Full name of the estimator analysed.
eg: weka.estimators.NormalEstimator
Options specific to estimator weka.estimators.NormalEstimator:
-D
If set, estimator is run in debug mode and
may output additional info to the console
Options after -- are passed to the designated estimator.
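A minimal sketch of how the command line options listed above could be split up, assuming they arrive as a plain String array. The class CheckEstimatorOptionsSketch is hypothetical and self-contained; Weka's own setOptions implementation is not reproduced here. Everything after -- is collected for the estimator under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the documented options; not Weka's own parser.
class CheckEstimatorOptionsSketch {
    boolean debug = false;                               // -D
    boolean silent = false;                              // -S
    int numInstances = 100;                              // -N <num>, default 100
    String estimatorName = null;                         // -W <classname>
    List<String> estimatorOptions = new ArrayList<>();   // everything after "--"

    static CheckEstimatorOptionsSketch parse(String[] args) {
        CheckEstimatorOptionsSketch opts = new CheckEstimatorOptionsSketch();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.equals("--")) {
                // The remaining arguments belong to the estimator being checked.
                opts.estimatorOptions.addAll(Arrays.asList(args).subList(i + 1, args.length));
                break;
            }
            switch (arg) {
                case "-D": opts.debug = true; break;
                case "-S": opts.silent = true; break;
                case "-N": opts.numInstances = Integer.parseInt(requireValue(args, ++i, "-N")); break;
                case "-W": opts.estimatorName = requireValue(args, ++i, "-W"); break;
                default: throw new IllegalArgumentException("Unsupported option: " + arg);
            }
            i++;
        }
        return opts;
    }

    // Returns the value following a flag, or fails if the flag is the last argument.
    private static String requireValue(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[index];
    }
}
```

For example, parsing `-N 50 -W weka.estimators.NormalEstimator -- -D` leaves debug off, sets numInstances to 50, and hands `["-D"]` to the estimator rather than treating it as the checker's own debug flag.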
author: Len Trigg (trigg@cs.waikato.ac.nz) author: FracPete (fracpete at waikato dot ac dot nz) version: $Revision: 1.3 $ See Also: TestInstances |
Inner Class :public class PostProcessor | |
Inner Class :public static class AttrTypes | |
Inner Class :public static class EstTypes | |
Method Summary | |
protected void | addMissing(Instances data, int level, boolean attributeMissing, boolean classMissing, int attrIndex) | Add missing values to a dataset. |
protected boolean[] | canEstimate(AttrTypes attrTypes, boolean supervised, int classType) | Checks basic estimation of one attribute of the scheme, for simple non-troublesome datasets. |
protected boolean[] | canHandleClassAsNthAttribute(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int classIndex) | Checks whether the scheme can handle class attributes as Nth attribute. |
protected boolean[] | canHandleMissing(AttrTypes attrTypes, int classType, boolean attributeMissing, boolean classMissing, int missingLevel) | Checks basic missing value handling of the scheme. |
protected boolean[] | canHandleNClasses(AttrTypes attrTypes, int numClasses) | Checks whether nominal schemes can handle more than two classes. |
protected boolean[] | canHandleZeroTraining(AttrTypes attrTypes, int classType) | Checks whether the scheme can handle zero training instances. |
protected void | canSplitUpClass(AttrTypes attrTypes, int classType) | Checks basic estimation of one attribute of the scheme, for simple non-troublesome datasets. |
protected boolean[] | canSplitUpClass(int attrType, int classType) | Checks basic estimation of one attribute of the scheme, for simple non-troublesome datasets. |
protected boolean[] | canTakeOptions() | Checks whether the scheme can take command line options. |
protected void | compareDatasets(Instances data1, Instances data2) | Compare two datasets to see if they differ. |
protected boolean[] | correctBuildInitialisation(AttrTypes attrTypes, int classType) | Checks whether the scheme correctly initialises models when buildEstimator is called. |
protected boolean[] | datasetIntegrity(AttrTypes attrTypes, int classType, boolean attributeMissing, boolean classMissing) | Checks whether the scheme alters the training dataset during training. |
public void | doTests() | |
public boolean | getDebug() | |
public Estimator | getEstimator() | |
public static int | getMinMax(Instances inst, int attrIndex, double[] minMax) | Find the minimum and the maximum of the attribute and return them in the last parameter. |
protected double[] | getMinimumMaximum(Instances inst, int attrIndex) | |
public int | getNumInstances() | Gets the current number of instances to use for the datasets. |
public String[] | getOptions() | Gets the current settings of the CheckEstimator. |
public PostProcessor | getPostProcessor() | |
public boolean | getSilent() | |
public boolean | hasClasspathProblems() | |
protected boolean[] | incrementalEstimator() | Checks whether the scheme can build models incrementally. |
protected boolean[] | incrementingEquality(AttrTypes attrTypes, int classType) | Checks whether an incremental scheme produces the same model when trained incrementally as when batch trained. |
protected boolean[] | instanceWeights(AttrTypes attrTypes, int classType) | Checks whether the estimator can handle instance weights. This test compares the estimator performance on two datasets that are identical except for the training weights. |
public Enumeration | listOptions() | Returns an enumeration describing the available options. |
public static void | main(String[] args) | |
protected Instances | makeTestDataset(int seed, int numInstances, int numAttr, AttrTypes attrTypes, int numClasses, int classType) | Make a simple set of instances, which can later be modified for use in specific tests. |
protected Instances | makeTestDataset(int seed, int numInstances, int numAttr, AttrTypes attrTypes, int numClasses, int classType, int classIndex) | Make a simple set of instances with variable position of the class attribute, which can later be modified for use in specific tests. |
protected Vector | makeTestValueList(int seed, int numValues, Instances data, int attrIndex, int attrType) | Make a simple set of values. |
protected Vector | makeTestValueList(int seed, int numValues, double minValue, double maxValue, int attrType) | Make a simple set of values. |
protected void | print(Object msg) | |
protected void | printAttributeSummary(AttrTypes attrTypes, int classType) | |
protected void | printAttributeSummary(int attrType, int classType) | |
protected void | println(Object msg) | |
protected void | println() | |
protected Instances | process(Instances data) | Provides a hook for derived classes to further modify the data. |
protected boolean[] | runBasicTest(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int missingLevel, boolean attributeMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts) | Runs a test on the datasets with the given characteristics. |
protected boolean[] | runBasicTest(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int classIndex, int missingLevel, boolean attributeMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts) | Runs a test on the datasets with the given characteristics. |
public void | setDebug(boolean debug) | |
public void | setEstimator(Estimator newEstimator) | Sets the estimator to be examined. |
public void | setNumInstances(int value) | Sets the number of instances to use in the datasets (some estimators might require more instances). |
public void | setOptions(String[] options) | Parses a given list of options. |
public void | setPostProcessor(PostProcessor value) | |
public void | setSilent(boolean value) | |
protected boolean[] | supervisedEstimator() | Checks whether the estimator is supervised. |
protected Vector | testWithTestValues(Estimator est, Vector test) | Test with test values. |
protected AttrTypes | testsPerClassType(int classType, EstTypes estTypes) | |
protected boolean[] | weightedInstancesHandler() | Checks whether the scheme says it can handle instance weights. |
m_AnalysisResults | protected String m_AnalysisResults | | The results of the analysis as a string
|
m_ClasspathProblems | protected boolean m_ClasspathProblems | | Whether classpath problems occurred
|
m_Debug | protected boolean m_Debug | | Debugging mode, gives extra output if true
|
m_Estimator | protected Estimator m_Estimator | | The estimator to be examined
|
m_EstimatorOptions | protected String[] m_EstimatorOptions | | The options to be passed to the base estimator.
|
m_NumInstances | protected int m_NumInstances | | The number of instances in the datasets
|
m_PostProcessor | protected PostProcessor m_PostProcessor | | For post-processing the data even further
|
m_Silent | protected boolean m_Silent | | Silent mode, for no output at all to stdout
|
addMissing | protected void addMissing(Instances data, int level, boolean attributeMissing, boolean classMissing, int attrIndex) | | Add missing values to a dataset.
Parameters: data - the instances to add missing values to Parameters: level - the level of missing values to add (if positive, this is the probability that a value will be set to missing; if negative, all but one value will be set to missing (not yet implemented)) Parameters: attributeMissing - if true, attributes will be modified Parameters: classMissing - if true, the class attribute will be modified Parameters: attrIndex - index of the attribute |
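The following sketch illustrates the positive-level case of addMissing on a plain double[][] table instead of Weka's Instances. It assumes NaN marks a missing value and treats level as a percentage (0 to 100), matching the missingLevel description elsewhere on this page. The class name and the extra classIndex and seed parameters are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of addMissing for positive levels; NaN stands for "missing".
class MissingValueSketch {

    // Sets a candidate cell to missing with probability level / 100.
    // Candidates: column attrIndex (if attributeMissing) and column classIndex (if classMissing).
    static void addMissing(double[][] data, int classIndex, int level, boolean attributeMissing,
                           boolean classMissing, int attrIndex, long seed) {
        if (level < 0) {
            // Negative levels ("all but one value missing") are documented as not yet implemented.
            throw new UnsupportedOperationException("negative levels are not implemented");
        }
        Random random = new Random(seed);
        for (double[] row : data) {
            for (int j = 0; j < row.length; j++) {
                boolean candidate = (j == classIndex && classMissing)
                        || (j == attrIndex && attributeMissing);
                if (candidate && random.nextInt(100) < level) {
                    row[j] = Double.NaN;
                }
            }
        }
    }

    // Counts the missing (NaN) cells in one column.
    static int countMissing(double[][] data, int column) {
        int count = 0;
        for (double[] row : data) {
            if (Double.isNaN(row[column])) {
                count++;
            }
        }
        return count;
    }
}
```

Because `random.nextInt(100)` returns 0 to 99, a level of 0 never injects missing values and a level of 100 blanks every candidate cell; the other columns are never touched.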
canEstimate | protected boolean[] canEstimate(AttrTypes attrTypes, boolean supervised, int classType) | | Checks basic estimation of one attribute of the scheme, for simple non-troublesome
datasets.
Parameters: attrTypes - the types the estimator can work with Parameters: classType - the class type (NOMINAL, NUMERIC, etc.) Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
canHandleClassAsNthAttribute | protected boolean[] canHandleClassAsNthAttribute(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int classIndex) | | Checks whether the scheme can handle class attributes as Nth attribute.
Parameters: attrTypes - the attribute types the estimator accepts Parameters: numAtts - the number of attributes Parameters: attrIndex - the index of the attribute Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: classIndex - the index of the class attribute (0-based, -1 means last attribute) Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable See Also: TestInstances.CLASS_IS_LAST |
canHandleMissing | protected boolean[] canHandleMissing(AttrTypes attrTypes, int classType, boolean attributeMissing, boolean classMissing, int missingLevel) | | Checks basic missing value handling of the scheme. If the missing
values cause an exception to be thrown by the scheme, this will be
recorded.
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: attributeMissing - true if the missing values may be in the attributes Parameters: classMissing - true if the missing values may be in the class Parameters: missingLevel - the percentage of missing values Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
canHandleNClasses | protected boolean[] canHandleNClasses(AttrTypes attrTypes, int numClasses) | | Checks whether nominal schemes can handle more than two classes.
If a scheme is only designed for two-class problems, it should
throw an appropriate exception for multi-class problems.
Parameters: attrTypes - attribute types the estimator accepts Parameters: numClasses - the number of classes to test Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
canHandleZeroTraining | protected boolean[] canHandleZeroTraining(AttrTypes attrTypes, int classType) | | Checks whether the scheme can handle zero training instances.
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
canSplitUpClass | protected void canSplitUpClass(AttrTypes attrTypes, int classType) | | Checks basic estimation of one attribute of the scheme, for simple non-troublesome
datasets.
Parameters: attrTypes - the types the estimator can work with Parameters: classType - the class type (NOMINAL, NUMERIC, etc.) |
canSplitUpClass | protected boolean[] canSplitUpClass(int attrType, int classType) | | Checks basic estimation of one attribute of the scheme, for simple non-troublesome
datasets.
Parameters: attrType - the attribute type Parameters: classType - the class type (NOMINAL, NUMERIC, etc.) Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
canTakeOptions | protected boolean[] canTakeOptions() | | Checks whether the scheme can take command line options.
Returns: index 0 is true if the estimator can take options |
compareDatasets | protected void compareDatasets(Instances data1, Instances data2) throws Exception | | Compare two datasets to see if they differ.
Parameters: data1 - one set of instances Parameters: data2 - the other set of instances throws: Exception - if the datasets differ |
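A self-contained sketch of the kind of comparison compareDatasets performs, using a hypothetical CheckRow type (attribute values plus a weight) in place of Weka's Instances. It checks the number of instances, the number of attributes, the values in order, and the weights, and throws an Exception at the first difference. The exact checks made by the real method may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical row type: attribute values plus an instance weight.
final class CheckRow {
    final double[] values;
    final double weight;

    CheckRow(double[] values, double weight) {
        this.values = values;
        this.weight = weight;
    }
}

class DatasetComparisonSketch {
    // Throws an Exception describing the first difference found.
    static void compareDatasets(List<CheckRow> data1, List<CheckRow> data2) throws Exception {
        if (data1.size() != data2.size()) {
            throw new Exception("number of instances has changed");
        }
        for (int i = 0; i < data1.size(); i++) {
            CheckRow a = data1.get(i);
            CheckRow b = data2.get(i);
            if (a.values.length != b.values.length) {
                throw new Exception("number of attributes has changed");
            }
            // Arrays.equals on double[] treats NaN (missing) as equal to NaN.
            if (!Arrays.equals(a.values, b.values)) {
                throw new Exception("instance " + i + " has changed (values or order)");
            }
            if (Double.compare(a.weight, b.weight) != 0) {
                throw new Exception("weight of instance " + i + " has changed");
            }
        }
    }
}
```

Comparing row by row in order means that a pure reordering of otherwise identical instances is reported as a change, which matches the "instance order" item in the list of checks.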
correctBuildInitialisation | protected boolean[] correctBuildInitialisation(AttrTypes attrTypes, int classType) | | Checks whether the scheme correctly initialises models when
buildEstimator is called. This test calls buildEstimator with
one training dataset and records performance on a test set.
buildEstimator is then called on a training set with different
structure, and then again with the original training set. The
performance on the test set is compared with the original results,
and any performance difference is noted as incorrect build initialisation.
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Returns: index 0 is true if the test was passed, index 1 is true if the scheme performs worse than ZeroR, but without error (index 0 is false) |
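The build, rebuild, rebuild sequence described above can be sketched with two toy estimators. This is an illustration under simplifying assumptions: the SimpleEstimator interface is hypothetical, the second training set simply holds different data, and the model's mean stands in for the test-set performance that the real check compares.

```java
// Hypothetical sketch of the correctBuildInitialisation check.
class BuildInitialisationSketch {

    // Hypothetical minimal estimator interface used only in this sketch.
    interface SimpleEstimator {
        void build(double[] data);
        double estimate(); // stands in for the test-set performance
    }

    // Correct: all state is reset at the start of every build.
    static class ResettingMean implements SimpleEstimator {
        private double sum;
        private int count;

        public void build(double[] data) {
            sum = 0;
            count = 0;
            for (double v : data) { sum += v; count++; }
        }

        public double estimate() { return count == 0 ? 0.0 : sum / count; }
    }

    // Buggy: never resets, so earlier builds leak into later ones.
    static class LeakyMean implements SimpleEstimator {
        private double sum;
        private int count;

        public void build(double[] data) {
            for (double v : data) { sum += v; count++; }
        }

        public double estimate() { return count == 0 ? 0.0 : sum / count; }
    }

    // Build on `first`, rebuild on a different set, rebuild on `first` again,
    // and report whether the result matches the original one.
    static boolean correctBuildInitialisation(SimpleEstimator est, double[] first, double[] other) {
        est.build(first);
        double original = est.estimate();
        est.build(other);
        est.build(first);
        return Double.compare(original, est.estimate()) == 0;
    }
}
```

With `first = {1, 2, 3}` and `other = {10, 20}`, ResettingMean reports 2.0 both times and passes, while LeakyMean ends up averaging all eight values seen so far (42 / 8 = 5.25) and fails.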
datasetIntegrity | protected boolean[] datasetIntegrity(AttrTypes attrTypes, int classType, boolean attributeMissing, boolean classMissing) | | Checks whether the scheme alters the training dataset during
training. If the scheme needs to modify the training
data, it should take a copy of the training data. Currently checks
for changes to the header structure, number of instances, order of
instances, and instance weights.
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: attributeMissing - true if we know the estimator can handle (at least) moderate missing attribute values Parameters: classMissing - true if we know the estimator can handle (at least) moderate missing class values Returns: index 0 is true if the test was passed |
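The reason for taking a copy is easiest to see with a trainer that sorts its input. In this hypothetical sketch, IN_PLACE sorts the caller's array to find an (upper) median and therefore alters the training data, while COPYING sorts a private copy, as the description recommends. The Trainer interface and both trainers are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: why a scheme should copy its training data before modifying it.
class DatasetIntegritySketch {

    // Hypothetical trainer returning the (upper) median of its non-empty training data.
    interface Trainer {
        double train(double[] data);
    }

    // Alters the caller's data: sorts the training array in place.
    static final Trainer IN_PLACE = data -> {
        Arrays.sort(data);
        return data[data.length / 2];
    };

    // Leaves the caller's data alone: sorts a private copy.
    static final Trainer COPYING = data -> {
        double[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy[copy.length / 2];
    };

    // Snapshot the data, train, and check that values and order are unchanged.
    static boolean leavesDataIntact(Trainer trainer, double[] data) {
        double[] before = Arrays.copyOf(data, data.length);
        trainer.train(data);
        return Arrays.equals(before, data);
    }
}
```

Both trainers return the same model, so only an integrity check of this kind reveals that IN_PLACE has reordered the caller's instances.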
doTests | public void doTests() | | Begin the tests, reporting results to System.out
|
getDebug | public boolean getDebug() | | Get whether debugging is turned on
Returns: true if debugging output is on |
getEstimator | public Estimator getEstimator() | | Get the estimator being examined
Returns: the estimator being examined |
getMinMax | public static int getMinMax(Instances inst, int attrIndex, double[] minMax) throws Exception | | Find the minimum and the maximum of the attribute and return them in
the last parameter.
Parameters: inst - instances used to build the estimator Parameters: attrIndex - index of the attribute Parameters: minMax - the array to return minimum and maximum in Returns: the number of non-missing values exception: Exception - if parameter minMax wasn't initialized properly |
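A stand-alone sketch of the getMinMax contract on a single column, assuming NaN marks a missing value. It fills minMax[0] and minMax[1], returns the number of non-missing values, and throws an Exception if minMax is not a usable two-element array. The class name is hypothetical, and leaving both entries as NaN when every value is missing is an assumption of this sketch.

```java
// Hypothetical stand-alone version of the getMinMax contract; NaN marks a missing value.
class MinMaxSketch {

    static int getMinMax(double[] column, double[] minMax) throws Exception {
        if (minMax == null || minMax.length < 2) {
            throw new Exception("minMax must be an array of length 2");
        }
        // Stays NaN if every value is missing (assumption of this sketch).
        minMax[0] = Double.NaN;
        minMax[1] = Double.NaN;
        int numNotMissing = 0;
        for (double v : column) {
            if (Double.isNaN(v)) {
                continue; // skip missing values
            }
            if (numNotMissing == 0) {
                minMax[0] = v;
                minMax[1] = v;
            } else {
                if (v < minMax[0]) minMax[0] = v;
                if (v > minMax[1]) minMax[1] = v;
            }
            numNotMissing++;
        }
        return numNotMissing;
    }
}
```

For the column `{3.0, NaN, -1.0, 7.0}` the method returns 3 and leaves `{-1.0, 7.0}` in minMax.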
getMinimumMaximum | protected double[] getMinimumMaximum(Instances inst, int attrIndex) | | Gets the minimum and maximum of the values of the given attribute
in the given dataset
Parameters: inst - the instances Parameters: attrIndex - the index of the attribute to find min and max for Returns: the array with the minimum value at index 0 and the maximum at index 1 |
getNumInstances | public int getNumInstances() | | Gets the current number of instances to use for the datasets.
Returns: the number of instances |
getOptions | public String[] getOptions() | | Gets the current settings of the CheckEstimator.
Returns: an array of strings suitable for passing to setOptions |
getPostProcessor | public PostProcessor getPostProcessor() | | Returns the current PostProcessor, which can be null
Returns: the current PostProcessor |
getSilent | public boolean getSilent() | | Get whether silent mode is turned on
Returns: true if silent mode is on |
hasClasspathProblems | public boolean hasClasspathProblems() | | Returns true if the estimator threw a "not in classpath" Exception
Returns: true if CLASSPATH problems occurred |
incrementalEstimator | protected boolean[] incrementalEstimator() | | Checks whether the scheme can build models incrementally.
Returns: index 0 is true if the estimator can train incrementally |
incrementingEquality | protected boolean[] incrementingEquality(AttrTypes attrTypes, int classType) | | Checks whether an incremental scheme produces the same model when
trained incrementally as when batch trained. The model itself
cannot be compared, so we compare the evaluation on test data
for both models. It is possible to get a false positive on this
test (the likelihood depends on the estimator).
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Returns: index 0 is true if the test was passed |
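The comparison can be sketched with a hypothetical discrete CountingEstimator whose addValue(value, weight) and getProbability(value) methods are loosely modelled on Weka's Estimator methods; the class and its batch factory are made up for illustration. The same training values are fed once in batch and once one at a time, and the two models are compared only through their probabilities on test values.

```java
// Hypothetical discrete estimator; method names loosely follow Weka's Estimator.
class CountingEstimator {
    private final double[] counts;
    private double total;

    CountingEstimator(int numValues) {
        counts = new double[numValues];
    }

    // Incremental training: one (value, weight) pair at a time.
    void addValue(int value, double weight) {
        counts[value] += weight;
        total += weight;
    }

    // Batch training: tally everything first, then store the counts.
    static CountingEstimator batch(int numValues, int[] data) {
        CountingEstimator est = new CountingEstimator(numValues);
        int[] tally = new int[numValues];
        for (int v : data) {
            tally[v]++;
        }
        for (int v = 0; v < numValues; v++) {
            est.counts[v] = tally[v];
            est.total += tally[v];
        }
        return est;
    }

    double getProbability(int value) {
        return total == 0 ? 0.0 : counts[value] / total;
    }
}

class IncrementingEqualitySketch {
    // Train both ways on the same data and compare predictions on the test values.
    static boolean sameResults(int numValues, int[] train, int[] test) {
        CountingEstimator batch = CountingEstimator.batch(numValues, train);
        CountingEstimator incremental = new CountingEstimator(numValues);
        for (int v : train) {
            incremental.addValue(v, 1.0);
        }
        for (int t : test) {
            if (Double.compare(batch.getProbability(t), incremental.getProbability(t)) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because the counts here are whole numbers, both paths produce exactly the same probabilities. Estimators that accumulate floating-point sums in a different order can differ in the last bits, so a correct incremental estimator may still fail an exact comparison.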
instanceWeights | protected boolean[] instanceWeights(AttrTypes attrTypes, int classType) | | Checks whether the estimator can handle instance weights.
This test compares the estimator performance on two datasets
that are identical except for the training weights. If the
results change, then the estimator must be using the weights. It
may be possible to get a false positive from this test if the
weight changes aren't significant enough to induce a change
in estimator performance (but the weights are chosen to minimize
the likelihood of this).
Parameters: attrTypes - attribute types that can be estimated Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Returns: index 0 is true if the test was passed |
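A minimal sketch of the weight comparison, assuming the fitted model can be summarised by a single number. The WeightedFit interface and the two fitting functions are hypothetical; the idea shown is the one described above: fit the same values under two different weightings and treat a changed result as evidence that the weights are used.

```java
// Hypothetical sketch of the instanceWeights check.
class InstanceWeightsSketch {

    // Hypothetical fitting function: returns a single summary of the fitted model.
    interface WeightedFit {
        double fit(double[] values, double[] weights);
    }

    // Weighted mean: uses the instance weights.
    static final WeightedFit USES_WEIGHTS = (values, weights) -> {
        double sum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * weights[i];
            totalWeight += weights[i];
        }
        return sum / totalWeight;
    };

    // Plain mean: silently ignores the instance weights.
    static final WeightedFit IGNORES_WEIGHTS = (values, weights) -> {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    };

    // Fit on the same values twice, once with uniform and once with skewed weights.
    // A changed result indicates that the weights are being used.
    static boolean usesWeights(WeightedFit fit, double[] values) {
        double[] uniform = new double[values.length];
        double[] skewed = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            uniform[i] = 1.0;
            // Heavy weights on every fourth instance make a change in the result likely.
            skewed[i] = (i % 4 == 0) ? 10.0 : 1.0;
        }
        return Double.compare(fit.fit(values, uniform), fit.fit(values, skewed)) != 0;
    }
}
```

With the values 1 to 8, USES_WEIGHTS gives 4.5 under uniform weights and 90 / 26 (about 3.46) under the skewed weights, so the check reports true, while IGNORES_WEIGHTS reports false. If every value were identical, even USES_WEIGHTS would report false, which is the kind of misleading outcome the description above warns about.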
listOptions | public Enumeration listOptions() | | Returns an enumeration describing the available options.
Returns: an enumeration of all the available options. |
main | public static void main(String[] args) | | Test method for this class
Parameters: args - the commandline parameters |
makeTestDataset | protected Instances makeTestDataset(int seed, int numInstances, int numAttr, AttrTypes attrTypes, int numClasses, int classType) throws Exception | | Make a simple set of instances, which can later be modified
for use in specific tests.
Parameters: seed - the random number seed Parameters: numInstances - the number of instances to generate Parameters: numAttr - the number of attributes Parameters: attrTypes - the attribute types Parameters: numClasses - the number of classes (if nominal class) Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Returns: the test dataset throws: Exception - if the dataset couldn't be generated See Also: CheckEstimator.process(Instances) |
makeTestDataset | protected Instances makeTestDataset(int seed, int numInstances, int numAttr, AttrTypes attrTypes, int numClasses, int classType, int classIndex) throws Exception | | Make a simple set of instances with variable position of the class
attribute, which can later be modified for use in specific tests.
Parameters: seed - the random number seed Parameters: numInstances - the number of instances to generate Parameters: numAttr - the number of attributes to generate Parameters: attrTypes - the attribute types that are accepted Parameters: numClasses - the number of classes (if nominal class) Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: classIndex - the index of the class (0-based, -1 as last) Returns: the test dataset throws: Exception - if the dataset couldn't be generated See Also: TestInstances.CLASS_IS_LAST See Also: CheckEstimator.process(Instances) |
makeTestValueList | protected Vector makeTestValueList(int seed, int numValues, Instances data, int attrIndex, int attrType) throws Exception | | Make a simple set of values. Only one of the num'type' parameters should be larger than 0
(this only keeps the parameters similar to the makeTestDataset parameters).
Parameters: seed - the random number seed Parameters: numValues - the number of values to generate Parameters: data - the dataset to make test examples for Parameters: attrIndex - index of the attribute Parameters: attrType - the attribute type (NUMERIC, NOMINAL, etc.) throws: Exception - if the dataset couldn't be generated See Also: CheckEstimator.process(Instances) |
makeTestValueList | protected Vector makeTestValueList(int seed, int numValues, double minValue, double maxValue, int attrType) throws Exception | | Make a simple set of values. Only one of the num'type' parameters should be larger than 0
(this only keeps the parameters similar to the makeTestDataset parameters).
Parameters: seed - the random number seed Parameters: numValues - the number of values to generate Parameters: minValue - the minimal data value Parameters: maxValue - the maximal data value Parameters: attrType - the attribute type (NUMERIC, NOMINAL, etc.) throws: Exception - if the dataset couldn't be generated See Also: CheckEstimator.process(Instances) |
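A hedged sketch of generating a reproducible list of test values between minValue and maxValue. The class name and the attribute-type constants are local to the sketch (they are not taken from Weka). Numeric values are drawn uniformly from [minValue, maxValue), and nominal values are drawn as whole-number label indices; how the real method distributes its values is not described on this page.

```java
import java.util.Random;
import java.util.Vector;

// Hypothetical sketch of a reproducible test-value generator.
class TestValueListSketch {
    // Local constants for this sketch only.
    static final int NUMERIC = 0;
    static final int NOMINAL = 1;

    static Vector<Double> makeTestValueList(int seed, int numValues, double minValue,
                                            double maxValue, int attrType) throws Exception {
        if (maxValue < minValue) {
            throw new Exception("maxValue must not be smaller than minValue");
        }
        Random random = new Random(seed); // same seed, same list
        Vector<Double> values = new Vector<>();
        for (int i = 0; i < numValues; i++) {
            if (attrType == NUMERIC) {
                // Uniform draw from [minValue, maxValue)
                values.add(minValue + random.nextDouble() * (maxValue - minValue));
            } else if (attrType == NOMINAL) {
                // Nominal values are label indices, so draw whole numbers in [min, max]
                int lo = (int) Math.ceil(minValue);
                int hi = (int) Math.floor(maxValue);
                if (hi < lo) {
                    throw new Exception("no whole number between minValue and maxValue");
                }
                values.add((double) (lo + random.nextInt(hi - lo + 1)));
            } else {
                throw new Exception("unsupported attribute type: " + attrType);
            }
        }
        return values;
    }
}
```

Seeding java.util.Random makes every run of the checks see the same values, so a failure can be reproduced with the same settings.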
print | protected void print(Object msg) | | Prints the given message to stdout, unless silent mode is on
Parameters: msg - the text to print to stdout |
printAttributeSummary | protected void printAttributeSummary(AttrTypes attrTypes, int classType) | | Print out a short summary string for the dataset characteristics
Parameters: attrTypes - the attribute types used (NUMERIC, NOMINAL, etc.) Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) |
printAttributeSummary | protected void printAttributeSummary(int attrType, int classType) | | Print out a short summary string for the dataset characteristics
Parameters: attrType - the attribute type (NUMERIC, NOMINAL, etc.) Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) |
println | protected void println(Object msg) | | Prints the given message followed by a line feed to stdout, unless silent mode is on
Parameters: msg - the message to print to stdout |
println | protected void println() | | Prints a line feed to stdout, unless silent mode is on
|
runBasicTest | protected boolean[] runBasicTest(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int missingLevel, boolean attributeMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts) | | Runs a test on the datasets with the given characteristics.
Parameters: attrTypes - attribute types that can be estimated Parameters: numAtts - number of attributes Parameters: attrIndex - attribute index Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: missingLevel - the percentage of missing values Parameters: attributeMissing - true if the missing values may be in the attributes Parameters: classMissing - true if the missing values may be in the class Parameters: numTrain - the number of instances in the training set Parameters: numTest - the number of instances in the test set Parameters: numClasses - the number of classes Parameters: accepts - the acceptable strings in an exception Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
runBasicTest | protected boolean[] runBasicTest(AttrTypes attrTypes, int numAtts, int attrIndex, int classType, int classIndex, int missingLevel, boolean attributeMissing, boolean classMissing, int numTrain, int numTest, int numClasses, FastVector accepts) | | Runs a test on the datasets with the given characteristics.
Parameters: attrTypes - attribute types that can be estimated Parameters: numAtts - number of attributes Parameters: attrIndex - attribute index Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: classIndex - the attribute index of the class Parameters: missingLevel - the percentage of missing values Parameters: attributeMissing - true if the missing values may be in the attributes Parameters: classMissing - true if the missing values may be in the class Parameters: numTrain - the number of instances in the training set Parameters: numTest - the number of instances in the test set Parameters: numClasses - the number of classes Parameters: accepts - the acceptable strings in an exception Returns: index 0 is true if the test was passed, index 1 is true if the test was acceptable |
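The two-element result returned by runBasicTest and the other checks can be sketched as below. This hypothetical version only covers the exception path: index 0 is set when the test body runs without an exception, and index 1 when an exception message contains one of the accepts strings (compared case-insensitively here, which is an assumption). Other reasons for setting index 1, such as performing worse than ZeroR, are not modelled.

```java
import java.util.List;

// Hypothetical sketch of the boolean[] result convention used by the checks.
class BasicTestSketch {
    // result[0]: the test body ran without an exception (test passed)
    // result[1]: the test failed, but with an "acceptable" exception message
    static boolean[] runBasicTest(Runnable testBody, List<String> accepts) {
        boolean[] result = new boolean[2];
        try {
            testBody.run();
            result[0] = true;
        } catch (RuntimeException ex) {
            // Case-insensitive matching is an assumption of this sketch.
            String msg = ex.getMessage() == null ? "" : ex.getMessage().toLowerCase();
            for (String accept : accepts) {
                if (msg.contains(accept.toLowerCase())) {
                    result[1] = true;
                    break;
                }
            }
        }
        return result;
    }
}
```

An estimator that rejects missing values with a message such as "Cannot handle missing values" therefore yields `{false, true}` when "missing" is in accepts: it failed, but in an expected and well-reported way.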
setDebug | public void setDebug(boolean debug) | | Set debugging mode
Parameters: debug - true if debug output should be printed |
setEstimator | public void setEstimator(Estimator newEstimator) | | Sets the estimator to be examined.
Parameters: newEstimator - the Estimator to use. |
setNumInstances | public void setNumInstances(int value) | | Sets the number of instances to use in the datasets (some estimators
might require more instances).
Parameters: value - the number of instances to use |
setOptions | public void setOptions(String[] options) throws Exception | | Parses a given list of options.
Valid options are:
-D
Turn on debugging output.
-S
Silent mode - prints nothing to stdout.
-N <num>
The number of instances in the datasets (default 100).
-W
Full name of the estimator analysed.
eg: weka.estimators.NormalEstimator
Options specific to estimator weka.estimators.NormalEstimator:
-D
If set, estimator is run in debug mode and
may output additional info to the console
Parameters: options - the list of options as an array of strings throws: Exception - if an option is not supported |
setPostProcessor | public void setPostProcessor(PostProcessor value) | | Sets the PostProcessor to use
Parameters: value - the new PostProcessor See Also: CheckEstimator.m_PostProcessor |
setSilent | public void setSilent(boolean value) | | Set silent mode, i.e., no output at all to stdout
Parameters: value - whether silent mode is active or not |
supervisedEstimator | protected boolean[] supervisedEstimator() | | Checks whether the estimator is supervised.
Returns: index 0 is true if the estimator is supervised |
testWithTestValues | protected Vector testWithTestValues(Estimator est, Vector test) | | Test with test values.
Parameters: est - estimator to be tested Parameters: test - vector with test values |
testsPerClassType | protected AttrTypes testsPerClassType(int classType, EstTypes estTypes) | | Run a battery of tests for a given class attribute type
Parameters: classType - the class type (NUMERIC, NOMINAL, etc.) Parameters: estTypes - the types of the estimator (incremental, weighted, supervised, etc.) Returns: the attribute types the estimator can work with |
weightedInstancesHandler | protected boolean[] weightedInstancesHandler() | | Checks whether the scheme says it can handle instance weights.
Returns: index 0 is true if the estimator handles instance weights |