| weka.classifiers.RandomizableClassifier weka.classifiers.trees.SimpleCart
SimpleCart | public class SimpleCart extends RandomizableClassifier implements AdditionalMeasureProducer,TechnicalInformationHandler(Code) | |
Class implementing minimal cost-complexity pruning.
Note: missing values are handled with the "fractional instances" method rather than the surrogate split method.
For more information, see:
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, California.
BibTeX:
@book{Breiman1984,
address = {Belmont, California},
author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone},
publisher = {Wadsworth International Group},
title = {Classification and Regression Trees},
year = {1984}
}
Valid options are:
-S <num>
Random number seed.
(default 1)
-D
If set, classifier is run in debug mode and
may output additional info to the console
-M <min no>
The minimal number of instances at the terminal nodes.
(default 2)
-N <num folds>
The number of folds used in the minimal cost-complexity pruning.
(default 5)
-U
Don't use the minimal cost-complexity pruning.
(default: pruning is used).
-H
Don't use the heuristic method for binary split.
(default: the heuristic is used).
-A
Use 1 SE rule to make pruning decision.
(default no).
-C
Percentage of training data size (0-1].
(default 1).
author: Haijian Shi (hs69@cs.waikato.ac.nz) version: $Revision: 1.2 $ |
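The switches -U and -H are negative flags (their presence turns a feature off), while -A is a positive flag. The following is a minimal sketch of how an option array for the flags listed above could be assembled. The class CartOptionsSketch and its method buildOptions are hypothetical helpers written for illustration; they are not part of Weka, and the exact strings that getOptions() produces may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka): assembles SimpleCart-style options. */
final class CartOptionsSketch {

    static String[] buildOptions(int seed, double minNumObj, int numFolds,
                                 boolean usePrune, boolean useHeuristic,
                                 boolean useOneSE, double sizePer) {
        List<String> opts = new ArrayList<>();
        opts.add("-S"); opts.add(Integer.toString(seed));
        opts.add("-M"); opts.add(Double.toString(minNumObj));
        opts.add("-N"); opts.add(Integer.toString(numFolds));
        if (!usePrune) opts.add("-U");      // -U switches pruning OFF
        if (!useHeuristic) opts.add("-H");  // -H switches the heuristic OFF
        if (useOneSE) opts.add("-A");       // -A switches the 1 SE rule ON
        opts.add("-C"); opts.add(Double.toString(sizePer));
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Documented defaults: seed 1, min 2 instances, 5 folds, full data.
        String[] defaults = buildOptions(1, 2.0, 5, true, true, false, 1.0);
        System.out.println(String.join(" ", defaults));
    }
}
```

An array built this way would be the kind of value passed to setOptions(String[] options).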
Method Summary | |
public void | buildClassifier(Instances data) Build the classifier. | public void | calculateAlphas() Updates the alpha field for all nodes. | protected double | computeGini(double[] dist, double total) Compute and return gini index for a given distribution of a node. | protected double | computeGiniGain(double[] parentDist, double[][] childDist) Compute and return gini gain for given distributions of a node and its
successor nodes. | protected double | computeSortedInfo(Instances data, int[][] sortedIndices, double[][] weights, double[] classProbs) Compute sorted indices, weights and class probabilities for a given
dataset. | public double[] | distributionForInstance(Instance instance) Computes class probabilities for instance using the decision tree. | public Enumeration | enumerateMeasures() Return an enumeration of the measure names. | protected void | fillInnerNodes(Vector nodeList) Fills a list with all inner nodes in the tree. | public Capabilities | getCapabilities() Returns default capabilities of the classifier. | public boolean | getHeuristic() Get if use heuristic search for nominal attributes in multi-class problems. | protected Vector | getInnerNodes() Return a list of all inner nodes in the tree. | public double | getMeasure(String additionalMeasureName) Returns the value of the named measure. | public double | getMinNumObj() Get minimal number of instances at the terminal nodes. | public int | getNumFoldsPruning() Get number of folds in internal cross-validation. | public String[] | getOptions() Gets the current settings of the classifier. | public double | getSizePer() Get training set size. | public TechnicalInformation | getTechnicalInformation() Returns an instance of a TechnicalInformation object, containing
detailed information about the technical background of this class,
e.g., paper reference or book this class is based on. | public boolean | getUseOneSE() Get if use the 1SE rule to choose final model. | public boolean | getUsePrune() Get if use minimal cost-complexity pruning. | public String | globalInfo() Return a description suitable for displaying in the explorer/experimenter. | public String | heuristicTipText() | public Enumeration | listOptions() Returns an enumeration describing the available options. | public static void | main(String[] args) Main method. | protected void | makeLeaf(Instances data) Make the node leaf node. | protected void | makeTree(Instances data, int totalInstances, int[][] sortedIndices, double[][] weights, double[] classProbs, double totalWeight, double minNumObj, boolean useHeuristic) Make binary decision tree recursively. | public double | measureTreeSize() Return number of tree size. | public String | minNumObjTipText() | public void | modelErrors() Updates the numIncorrectModel field for all nodes when subtree (to be
pruned) is rooted. | protected SimpleCart | nodeToPrune(Vector nodeList) Find the node with minimal alpha value. | protected String | nominalDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data, boolean useHeuristic) Compute distributions, proportions and total weights of two successor
nodes for a given nominal attribute. | public String | numFoldsPruningTipText() | public int | numInnerNodes() Method to count the number of inner nodes in the tree. | public int | numLeaves() Compute number of leaf nodes. | public int | numNodes() Compute size of the tree. | protected double | numericDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data) Compute distributions, proportions and total weights of two successor
nodes for a given numeric attribute. | public void | prune(double alpha) Prunes the original tree using the CART pruning scheme, given a
cost-complexity parameter alpha. | public int | prune(double[] alphas, double[] errors, Instances test) Method for performing one fold in the cross-validation of minimal
cost-complexity pruning. | public void | setHeuristic(boolean value) Set if use heuristic search for nominal attributes in multi-class problems. | public void | setMinNumObj(double value) Set minimal number of instances at the terminal nodes. | public void | setNumFoldsPruning(int value) Set number of folds in internal cross-validation. | public void | setOptions(String[] options) Parses a given list of options. | public void | setSizePer(double value) Set training set size. | public void | setUseOneSE(boolean value) Set if use the 1SE rule to choose final model. | public void | setUsePrune(boolean value) Set if use minimal cost-complexity pruning. | public String | sizePerTipText() | protected void | splitData(int[][][] subsetIndices, double[][][] subsetWeights, Attribute att, double splitPoint, String splitStr, int[][] sortedIndices, double[][] weights, Instances data) Split data into two subsets and store sorted indices and weights for two
successor nodes. | public String | toString() Prints the decision tree using the protected toString method from below. | protected String | toString(int level) Outputs a tree at a certain level. | public void | treeErrors() Updates the numIncorrectTree field for all nodes. | protected void | unprune() Method to "unprune" the CART tree. | public String | useOneSETipText() | public String | usePruneTipText() |
m_Alpha | protected double m_Alpha(Code) | | Alpha-value (for pruning) at the node.
|
m_Attribute | protected Attribute m_Attribute(Code) | | Attribute used to split data.
|
m_ClassAttribute | protected Attribute m_ClassAttribute(Code) | | Class attribute of data.
|
m_ClassProbs | protected double[] m_ClassProbs(Code) | | Class probabilities.
|
m_ClassValue | protected double m_ClassValue(Code) | | Class value if the node is leaf.
|
m_Distribution | protected double[] m_Distribution(Code) | | Distributions of leaf node (or temporary leaf node in minimal cost-complexity pruning)
|
m_Heuristic | protected boolean m_Heuristic(Code) | | If use heuristic search for nominal attributes in multi-class problems (default true).
|
m_Props | protected double[] m_Props(Code) | | Proportion for each branch.
|
m_Prune | protected boolean m_Prune(Code) | | If use minimal cost-complexity pruning.
|
m_SizePer | protected double m_SizePer(Code) | | Training data size, as a proportion in (0-1] of the training data.
|
m_SplitString | protected String m_SplitString(Code) | | Split subset used to split data for nominal attributes.
|
m_SplitValue | protected double m_SplitValue(Code) | | Split point for a numeric attribute.
|
m_UseOneSE | protected boolean m_UseOneSE(Code) | | If use the 1SE rule to make final decision tree.
|
m_isLeaf | protected boolean m_isLeaf(Code) | | Indicate if the node is a leaf node.
|
m_minNumObj | protected double m_minNumObj(Code) | | Minimum number of instances at the terminal nodes.
|
m_numFoldsPruning | protected int m_numFoldsPruning(Code) | | Number of folds for minimal cost-complexity pruning.
|
m_numIncorrectModel | protected double m_numIncorrectModel(Code) | | Number of training examples misclassified by the model (subtree rooted).
|
m_numIncorrectTree | protected double m_numIncorrectTree(Code) | | Number of training examples misclassified by the model (subtree not rooted).
|
m_totalTrainInstances | protected int m_totalTrainInstances(Code) | | Total number of instances used to build the classifier.
|
buildClassifier | public void buildClassifier(Instances data) throws Exception(Code) | | Build the classifier.
Parameters: data - the training instances throws: Exception - if something goes wrong |
calculateAlphas | public void calculateAlphas() throws Exception(Code) | | Updates the alpha field for all nodes.
throws: Exception - if something goes wrong |
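As an illustration of what an alpha value means in minimal cost-complexity pruning, here is a minimal sketch of the textbook formula from Breiman et al.: the increase in training error rate from collapsing a subtree into a leaf, divided by the number of leaves removed. The class AlphaSketch and its parameter names are assumptions made for this example; the sketch is not a copy of the calculateAlphas() implementation.

```java
/** Hypothetical sketch of the textbook cost-complexity alpha for one node. */
final class AlphaSketch {

    /**
     * @param errorsAsLeaf    training errors if the node were collapsed to a leaf
     * @param errorsAsSubtree training errors made by the subtree below the node
     * @param numLeaves       number of leaves in the subtree below the node
     * @param totalTrain      total number of training instances
     */
    static double alpha(double errorsAsLeaf, double errorsAsSubtree,
                        int numLeaves, int totalTrain) {
        if (numLeaves <= 1) {
            // A leaf has nothing to prune; give it the largest possible alpha.
            return Double.MAX_VALUE;
        }
        double errorRateIncrease = (errorsAsLeaf - errorsAsSubtree) / totalTrain;
        return errorRateIncrease / (numLeaves - 1);
    }

    public static void main(String[] args) {
        // 10 errors as a leaf, 4 as a 3-leaf subtree, 100 training instances.
        System.out.println(alpha(10, 4, 3, 100));
    }
}
```

Nodes with a small alpha buy little accuracy per extra leaf, so they are the first candidates for pruning.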
computeGini | protected double computeGini(double[] dist, double total)(Code) | | Compute and return gini index for a given distribution of a node.
Parameters: dist - class distributions Parameters: total - total weight of the class distributions Gini index of the class distributions |
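For reference, the Gini index of a class distribution is one minus the sum of squared class proportions. The sketch below shows that standard definition; the class name GiniSketch is hypothetical, and returning 0 for an empty node is an assumption of this example.

```java
/** Hypothetical sketch of the standard Gini index. */
final class GiniSketch {

    /** dist holds the (possibly fractional) weight per class; total is their sum. */
    static double gini(double[] dist, double total) {
        if (total <= 0) {
            return 0.0; // assumption: an empty node counts as pure
        }
        double sumOfSquares = 0.0;
        for (double weight : dist) {
            double p = weight / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    public static void main(String[] args) {
        System.out.println(gini(new double[] {5, 5}, 10));  // evenly mixed node
        System.out.println(gini(new double[] {10, 0}, 10)); // pure node
    }
}
```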
computeGiniGain | protected double computeGiniGain(double[] parentDist, double[][] childDist)(Code) | | Compute and return gini gain for given distributions of a node and its
successor nodes.
Parameters: parentDist - class distributions of parent node Parameters: childDist - class distributions of successor nodes Gini gain computed |
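The Gini gain of a split is usually defined as the parent's Gini index minus the weighted average of the children's Gini indices. A minimal sketch of that definition follows. The class GiniGainSketch is hypothetical, and it assumes the child weights add up to the parent weight, which may not match how the real method treats instances with missing values.

```java
/** Hypothetical sketch of Gini gain for a parent node and its successors. */
final class GiniGainSketch {

    static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    static double gini(double[] dist, double total) {
        if (total <= 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (double weight : dist) {
            double p = weight / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    /** childDist[k] is the class distribution of the k-th successor node. */
    static double giniGain(double[] parentDist, double[][] childDist) {
        double parentTotal = sum(parentDist);
        if (parentTotal <= 0) {
            return 0.0;
        }
        double gain = gini(parentDist, parentTotal);
        for (double[] child : childDist) {
            double childTotal = sum(child);
            // Each child contributes in proportion to the weight it receives.
            gain -= (childTotal / parentTotal) * gini(child, childTotal);
        }
        return gain;
    }

    public static void main(String[] args) {
        double[] parent = {5, 5};
        System.out.println(giniGain(parent, new double[][] {{5, 0}, {0, 5}}));
    }
}
```

A split that separates the classes perfectly recovers the whole parent impurity, while a split that leaves both children as mixed as the parent has zero gain.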
computeSortedInfo | protected double computeSortedInfo(Instances data, int[][] sortedIndices, double[][] weights, double[] classProbs) throws Exception(Code) | | Compute sorted indices, weights and class probabilities for a given
dataset. Return total weights of the data at the node.
Parameters: data - training data Parameters: sortedIndices - sorted indices of instances at the node Parameters: weights - weights of instances at the node Parameters: classProbs - class probabilities at the node total weights of instances at the node throws: Exception - if something goes wrong |
distributionForInstance | public double[] distributionForInstance(Instance instance) throws Exception(Code) | | Computes class probabilities for instance using the decision tree.
Parameters: instance - the instance for which class probabilities are to be computed the class probabilities for the given instance throws: Exception - if something goes wrong |
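The class description mentions the "fractional instances" method for missing values. One common reading of that idea is sketched below: when the tested attribute is missing, the instance is sent down every branch and the branch distributions are blended using the branch proportions. The class FractionalNodeSketch, its fields, and the use of NaN to mark a missing value are all assumptions of this example, not the actual SimpleCart internals.

```java
/** Hypothetical binary tree node illustrating fractional handling of missing values. */
final class FractionalNodeSketch {
    boolean isLeaf;
    double[] classProbs;               // class probabilities, used at leaves
    int attIndex;                      // attribute tested at an inner node
    double splitValue;                 // go left if value < splitValue
    double[] props;                    // share of training weight per branch
    FractionalNodeSketch[] successors; // left and right child

    static FractionalNodeSketch leaf(double... probs) {
        FractionalNodeSketch n = new FractionalNodeSketch();
        n.isLeaf = true;
        n.classProbs = probs;
        return n;
    }

    static FractionalNodeSketch inner(int attIndex, double splitValue, double[] props,
                                      FractionalNodeSketch left, FractionalNodeSketch right) {
        FractionalNodeSketch n = new FractionalNodeSketch();
        n.attIndex = attIndex;
        n.splitValue = splitValue;
        n.props = props;
        n.successors = new FractionalNodeSketch[] {left, right};
        return n;
    }

    /** instance[i] is the value of attribute i; NaN marks a missing value. */
    double[] distribution(double[] instance) {
        if (isLeaf) {
            return classProbs.clone();
        }
        double value = instance[attIndex];
        if (Double.isNaN(value)) {
            // Missing: blend both branches, weighted by the branch proportions.
            double[] result = null;
            for (int i = 0; i < successors.length; i++) {
                double[] d = successors[i].distribution(instance);
                if (result == null) {
                    result = new double[d.length];
                }
                for (int j = 0; j < d.length; j++) {
                    result[j] += props[i] * d[j];
                }
            }
            return result;
        }
        return successors[value < splitValue ? 0 : 1].distribution(instance);
    }
}
```

With branch proportions of 0.25 and 0.75 and two pure leaves, an instance with a missing value would receive the blended distribution {0.25, 0.75} instead of being forced down one branch.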
enumerateMeasures | public Enumeration enumerateMeasures()(Code) | | Return an enumeration of the measure names.
an enumeration of the measure names |
fillInnerNodes | protected void fillInnerNodes(Vector nodeList)(Code) | | Fills a list with all inner nodes in the tree.
Parameters: nodeList - the list to be filled |
getCapabilities | public Capabilities getCapabilities()(Code) | | Returns default capabilities of the classifier.
the capabilities of this classifier |
getHeuristic | public boolean getHeuristic()(Code) | | Get if use heuristic search for nominal attributes in multi-class problems.
if use heuristic search for nominal attributes in multi-class problems |
getInnerNodes | protected Vector getInnerNodes()(Code) | | Return a list of all inner nodes in the tree.
the list of all inner nodes |
getMeasure | public double getMeasure(String additionalMeasureName)(Code) | | Returns the value of the named measure.
Parameters: additionalMeasureName - the name of the measure to query for its value the value of the named measure throws: IllegalArgumentException - if the named measure is not supported |
getMinNumObj | public double getMinNumObj()(Code) | | Get minimal number of instances at the terminal nodes.
minimal number of instances at the terminal nodes |
getNumFoldsPruning | public int getNumFoldsPruning()(Code) | | Get number of folds in internal cross-validation.
number of folds in internal cross-validation. |
getOptions | public String[] getOptions()(Code) | | Gets the current settings of the classifier.
the current setting of the classifier |
getSizePer | public double getSizePer()(Code) | | Get training set size.
training set size |
getTechnicalInformation | public TechnicalInformation getTechnicalInformation()(Code) | | Returns an instance of a TechnicalInformation object, containing
detailed information about the technical background of this class,
e.g., paper reference or book this class is based on.
the technical information about this class |
getUseOneSE | public boolean getUseOneSE()(Code) | | Get if use the 1SE rule to choose final model.
if use the 1SE rule to choose final model |
getUsePrune | public boolean getUsePrune()(Code) | | Get if use minimal cost-complexity pruning.
if use minimal cost-complexity pruning |
globalInfo | public String globalInfo()(Code) | | Return a description suitable for displaying in the explorer/experimenter.
a description suitable for displaying in the explorer/experimenter |
heuristicTipText | public String heuristicTipText()(Code) | | Returns the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui. |
listOptions | public Enumeration listOptions()(Code) | | Returns an enumeration describing the available options.
an enumeration of all the available options. |
main | public static void main(String[] args)(Code) | | Main method.
Parameters: args - the options for the classifier |
makeLeaf | protected void makeLeaf(Instances data)(Code) | | Make the node leaf node.
Parameters: data - training data |
makeTree | protected void makeTree(Instances data, int totalInstances, int[][] sortedIndices, double[][] weights, double[] classProbs, double totalWeight, double minNumObj, boolean useHeuristic) throws Exception(Code) | | Make binary decision tree recursively.
Parameters: data - the training instances Parameters: totalInstances - total number of instances Parameters: sortedIndices - sorted indices of the instances Parameters: weights - weights of the instances Parameters: classProbs - class probabilities Parameters: totalWeight - total weight of instances Parameters: minNumObj - minimal number of instances at leaf nodes Parameters: useHeuristic - if use heuristic search for nominal attributes in multi-class problem throws: Exception - if something goes wrong |
measureTreeSize | public double measureTreeSize()(Code) | | Return the size of the tree.
the size of the tree |
minNumObjTipText | public String minNumObjTipText()(Code) | | Returns the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui |
modelErrors | public void modelErrors() throws Exception(Code) | | Updates the numIncorrectModel field for all nodes when subtree (to be
pruned) is rooted. This is needed for calculating the alpha-values.
throws: Exception - if something goes wrong |
nodeToPrune | protected SimpleCart nodeToPrune(Vector nodeList)(Code) | | Find the node with minimal alpha value. If two nodes have the same alpha,
choose the one with more leaf nodes.
Parameters: nodeList - list of inner nodes the node to be pruned |
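The selection rule described above (smallest alpha, ties broken in favour of the node with more leaf nodes) can be sketched as a single pass over the candidate list. The classes WeakestLinkSketch and Node are hypothetical stand-ins for illustration; the real method works on SimpleCart nodes held in a Vector.

```java
import java.util.List;

/** Hypothetical sketch of choosing the weakest link among inner nodes. */
final class WeakestLinkSketch {

    static final class Node {
        final String name;
        final double alpha;
        final int numLeaves;

        Node(String name, double alpha, int numLeaves) {
            this.name = name;
            this.alpha = alpha;
            this.numLeaves = numLeaves;
        }
    }

    /** Returns the node with minimal alpha; on a tie, the one with more leaves. */
    static Node nodeToPrune(List<Node> innerNodes) {
        Node best = null;
        for (Node candidate : innerNodes) {
            if (best == null
                    || candidate.alpha < best.alpha
                    || (candidate.alpha == best.alpha
                        && candidate.numLeaves > best.numLeaves)) {
                best = candidate;
            }
        }
        return best; // null if the list is empty
    }
}
```

Preferring the larger subtree on a tie removes more nodes for the same cost-complexity trade-off.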
nominalDistribution | protected String nominalDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data, boolean useHeuristic) throws Exception(Code) | | Compute distributions, proportions and total weights of two successor
nodes for a given nominal attribute.
Parameters: props - proportions of the two branches for each attribute Parameters: dists - class distributions of the two branches for each attribute Parameters: att - nominal attribute to split on Parameters: sortedIndices - sorted indices of instances for the attribute Parameters: weights - weights of instances for the attribute Parameters: subsetWeights - total weight of the two branches split based on the attribute Parameters: giniGains - Gini gains for each attribute Parameters: data - training instances Parameters: useHeuristic - if use heuristic search Gini gain for the given nominal attribute throws: Exception - if something goes wrong |
numFoldsPruningTipText | public String numFoldsPruningTipText()(Code) | | Returns the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui |
numInnerNodes | public int numInnerNodes()(Code) | | Method to count the number of inner nodes in the tree.
the number of inner nodes |
numLeaves | public int numLeaves()(Code) | | Compute number of leaf nodes.
number of leaf nodes |
numNodes | public int numNodes()(Code) | | Compute size of the tree.
size of the tree |
numericDistribution | protected double numericDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data) throws Exception(Code) | | Compute distributions, proportions and total weights of two successor
nodes for a given numeric attribute.
Parameters: props - proportions of the two branches for each attribute Parameters: dists - class distributions of the two branches for each attribute Parameters: att - numeric attribute to split on Parameters: sortedIndices - sorted indices of instances for the attribute Parameters: weights - weights of instances for the attribute Parameters: subsetWeights - total weight of the two branches split based on the attribute Parameters: giniGains - Gini gains for each attribute Parameters: data - training instances Gini gain for the given numeric attribute throws: Exception - if something goes wrong |
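To illustrate how a binary split on a numeric attribute can be chosen by Gini gain, the sketch below scans the sorted values once and evaluates the midpoint between each pair of distinct neighbours. The class NumericSplitSketch is hypothetical and simplified: it assumes unit instance weights, no missing values, and no minimum leaf size, so it is not equivalent to numericDistribution.

```java
/** Hypothetical sketch: best binary split point for one numeric attribute. */
final class NumericSplitSketch {

    static double gini(double[] dist, double total) {
        if (total <= 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (double weight : dist) {
            double p = weight / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    /**
     * @param values  attribute values sorted in ascending order
     * @param classes class index of each instance, in the same order
     * @return {best split point, best Gini gain}; the split point is NaN
     *         if no split improves on the parent node
     */
    static double[] bestSplit(double[] values, int[] classes, int numClasses) {
        int n = values.length;
        double[] left = new double[numClasses];
        double[] right = new double[numClasses];
        for (int c : classes) {
            right[c]++;
        }
        double parentGini = gini(right, n);
        double bestGain = 0.0;
        double bestPoint = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            // Move instance i from the right branch to the left branch.
            left[classes[i]]++;
            right[classes[i]]--;
            if (values[i] == values[i + 1]) {
                continue; // cannot split between equal values
            }
            double nLeft = i + 1;
            double nRight = n - nLeft;
            double gain = parentGini
                    - (nLeft / n) * gini(left, nLeft)
                    - (nRight / n) * gini(right, nRight);
            if (gain > bestGain) {
                bestGain = gain;
                bestPoint = (values[i] + values[i + 1]) / 2.0;
            }
        }
        return new double[] {bestPoint, bestGain};
    }
}
```

Because the values are pre-sorted, each candidate split is evaluated in constant time by updating the two running class counts, which is the reason sorted indices are kept per attribute.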
prune | public void prune(double alpha) throws Exception(Code) | | Prunes the original tree using the CART pruning scheme, given a
cost-complexity parameter alpha.
Parameters: alpha - the cost-complexity parameter throws: Exception - if something goes wrong |
prune | public int prune(double[] alphas, double[] errors, Instances test) throws Exception(Code) | | Method for performing one fold in the cross-validation of minimal
cost-complexity pruning. Generates a sequence of alpha-values with error
estimates for the corresponding (partially pruned) trees, given the test
set of that fold.
Parameters: alphas - array to hold the generated alpha-values Parameters: errors - array to hold the corresponding error estimates Parameters: test - test set of that fold (to obtain error estimates) the iteration of the pruning throws: Exception - if something goes wrong |
setHeuristic | public void setHeuristic(boolean value)(Code) | | Set if use heuristic search for nominal attributes in multi-class problems.
Parameters: value - if use heuristic search for nominal attributes in multi-class problems |
setMinNumObj | public void setMinNumObj(double value)(Code) | | Set minimal number of instances at the terminal nodes.
Parameters: value - minimal number of instances at the terminal nodes |
setNumFoldsPruning | public void setNumFoldsPruning(int value)(Code) | | Set number of folds in internal cross-validation.
Parameters: value - number of folds in internal cross-validation. |
setOptions | public void setOptions(String[] options) throws Exception(Code) | | Parses a given list of options.
Valid options are:
-S <num>
Random number seed.
(default 1)
-D
If set, classifier is run in debug mode and
may output additional info to the console
-M <min no>
The minimal number of instances at the terminal nodes.
(default 2)
-N <num folds>
The number of folds used in the minimal cost-complexity pruning.
(default 5)
-U
Don't use the minimal cost-complexity pruning.
(default: pruning is used).
-H
Don't use the heuristic method for binary split.
(default: the heuristic is used).
-A
Use 1 SE rule to make pruning decision.
(default no).
-C
Percentage of training data size (0-1].
(default 1).
Parameters: options - the list of options as an array of strings throws: Exception - if an option is not supported |
setSizePer | public void setSizePer(double value)(Code) | | Set training set size.
Parameters: value - training set size |
setUseOneSE | public void setUseOneSE(boolean value)(Code) | | Set if use the 1SE rule to choose final model.
Parameters: value - if use the 1SE rule to choose final model |
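The 1 SE rule from Breiman et al. picks the simplest pruned tree whose cross-validated error is within one standard error of the best error, instead of the tree with the lowest error itself. A minimal sketch follows. The class OneSeSketch is hypothetical, it assumes the candidate trees are ordered from largest to smallest, and it uses the binomial standard error sqrt(e(1-e)/n); the real implementation may differ in these details.

```java
/** Hypothetical sketch of model selection with and without the 1 SE rule. */
final class OneSeSketch {

    /**
     * @param errors   cross-validated error rate of each candidate tree,
     *                 ordered from the largest tree to the smallest
     * @param n        number of instances the error rates were estimated on
     * @param useOneSE whether to apply the 1 SE rule
     * @return index of the chosen tree
     */
    static int chooseTree(double[] errors, int n, boolean useOneSE) {
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) {
                best = i;
            }
        }
        if (!useOneSE) {
            return best;
        }
        double e = errors[best];
        double threshold = e + Math.sqrt(e * (1.0 - e) / n);
        // Walk from the smallest tree upwards; take the first one that is
        // within one standard error of the minimum.
        for (int i = errors.length - 1; i >= 0; i--) {
            if (errors[i] <= threshold) {
                return i;
            }
        }
        return best;
    }
}
```

With error rates {0.20, 0.18, 0.19, 0.30} on 100 instances, the minimum is at index 1, but the 1 SE rule would select index 2 because 0.19 lies within one standard error (about 0.038) of 0.18 and belongs to a smaller tree.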
setUsePrune | public void setUsePrune(boolean value)(Code) | | Set if use minimal cost-complexity pruning.
Parameters: value - if use minimal cost-complexity pruning |
sizePerTipText | public String sizePerTipText()(Code) | | Returns the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui. |
splitData | protected void splitData(int[][][] subsetIndices, double[][][] subsetWeights, Attribute att, double splitPoint, String splitStr, int[][] sortedIndices, double[][] weights, Instances data) throws Exception(Code) | | Split data into two subsets and store sorted indices and weights for two
successor nodes.
Parameters: subsetIndices - sorted indices of instances for each attribute for the two successor nodes Parameters: subsetWeights - weights of instances for each attribute for the two successor nodes Parameters: att - attribute the split is based on Parameters: splitPoint - split point the split is based on if att is numeric Parameters: splitStr - split subset the split is based on if att is nominal Parameters: sortedIndices - sorted indices of the instances to be split Parameters: weights - weights of the instances to be split Parameters: data - training data throws: Exception - if something goes wrong |
toString | public String toString()(Code) | | Prints the decision tree using the protected toString method from below.
a textual description of the classifier |
toString | protected String toString(int level)(Code) | | Outputs a tree at a certain level.
Parameters: level - the level at which the tree is to be printed a tree at a certain level |
treeErrors | public void treeErrors() throws Exception(Code) | | Updates the numIncorrectTree field for all nodes. This is needed for
calculating the alpha-values.
throws: Exception - if something goes wrong |
unprune | protected void unprune()(Code) | | Method to "unprune" the CART tree. Sets all leaf-fields to false.
Faster than re-growing the tree because the CART tree does not have to be fit again.
|
useOneSETipText | public String useOneSETipText()(Code) | | Returns the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui. |
usePruneTipText | public String usePruneTipText()(Code) | | Return the tip text for this property
tip text for this property suitable for displaying in the explorer/experimenter gui. |
Fields inherited from weka.classifiers.RandomizableClassifier | protected int m_Seed(Code)(Java Doc)
|