Source code for RandomSubSpace.java (Weka, package weka.classifiers.meta)
/*
 *    This program is free software; you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation; either version 2 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program; if not, write to the Free Software
 *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 *    RandomSubSpace.java
 *    Copyright (C) 2006 University of Waikato, Hamilton, New Zealand
 *
 */

package weka.classifiers.meta;

import weka.filters.unsupervised.attribute.Remove;
import weka.classifiers.Classifier;
import weka.classifiers.RandomizableIteratedSingleClassifierEnhancer;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Option;
import weka.core.Randomizable;
import weka.core.TechnicalInformation;
import weka.core.TechnicalInformationHandler;
import weka.core.Utils;
import weka.core.WeightedInstancesHandler;
import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;

import java.util.Enumeration;
import java.util.Random;
import java.util.Vector;
import java.util.Arrays;
import java.util.Collections;

/**
 <!-- globalinfo-start -->
 * This method constructs a decision tree based classifier that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces.<br/>
 * <br/>
 * For more information, see<br/>
 * <br/>
 * Tin Kam Ho (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20(8):832-844. URL http://citeseer.ist.psu.edu/ho98random.html.
 * <p/>
 <!-- globalinfo-end -->
 *
 <!-- technical-bibtex-start -->
 * BibTeX:
 * <pre>
 * &#64;article{Ho1998,
 *    author = {Tin Kam Ho},
 *    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
 *    number = {8},
 *    pages = {832-844},
 *    title = {The Random Subspace Method for Constructing Decision Forests},
 *    volume = {20},
 *    year = {1998},
 *    ISSN = {0162-8828},
 *    URL = {http://citeseer.ist.psu.edu/ho98random.html}
 * }
 * </pre>
 * <p/>
 <!-- technical-bibtex-end -->
 *
 <!-- options-start -->
 * Valid options are: <p/>
 * 
 * <pre> -P &lt;num&gt;
 *  Size of each subspace:
 *   &lt; 1: fraction of the number of attributes
 *   &gt;=1: absolute number of attributes
 *  (default 0.5)</pre>
 * 
 * <pre> -S &lt;num&gt;
 *  Random number seed.
 *  (default 1)</pre>
 * 
 * <pre> -I &lt;num&gt;
 *  Number of iterations.
 *  (default 10)</pre>
 * 
 * <pre> -D
 *  If set, classifier is run in debug mode and
 *  may output additional info to the console</pre>
 * 
 * <pre> -W
 *  Full name of base classifier.
 *  (default: weka.classifiers.trees.REPTree)</pre>
 * 
 * <pre> 
 * Options specific to classifier weka.classifiers.trees.REPTree:
 * </pre>
 * 
 * <pre> -M &lt;minimum number of instances&gt;
 *  Set minimum number of instances per leaf (default 2).</pre>
 * 
 * <pre> -V &lt;minimum variance for split&gt;
 *  Set minimum numeric class variance proportion
 *  of train variance for split (default 1e-3).</pre>
 * 
 * <pre> -N &lt;number of folds&gt;
 *  Number of folds for reduced error pruning (default 3).</pre>
 * 
 * <pre> -S &lt;seed&gt;
 *  Seed for random data shuffling (default 1).</pre>
 * 
 * <pre> -P
 *  No pruning.</pre>
 * 
 * <pre> -L
 *  Maximum tree depth (default -1, no maximum)</pre>
 * 
 <!-- options-end -->
 *
 * Options after -- are passed to the designated classifier.<p>
 *
 * @author Bernhard Pfahringer (bernhard@cs.waikato.ac.nz)
 * @author Peter Reutemann (fracpete@cs.waikato.ac.nz)
 * @version $Revision: 1.3 $
 */
public class RandomSubSpace extends
        RandomizableIteratedSingleClassifierEnhancer implements
        WeightedInstancesHandler, TechnicalInformationHandler {

    /** for serialization */
    private static final long serialVersionUID = 1278172513912424947L;

    /**
     * The size of each subspace: a fraction of the number of attributes if
     * less than 1, otherwise the absolute number of attributes
     */
    protected double m_SubSpaceSize = 0.5;

    /** a ZeroR model in case no model can be built from the data */
    protected Classifier m_ZeroR;

    /**
     * Constructor.
     */
    public RandomSubSpace() {
        super();

        m_Classifier = new weka.classifiers.trees.REPTree();
    }

    /**
     * Returns a string describing the classifier.
     * 
     * @return 		a description suitable for
     * 			displaying in the explorer/experimenter gui
     */
    public String globalInfo() {
        return "This method constructs a decision tree based classifier that "
                + "maintains highest accuracy on training data and improves on "
                + "generalization accuracy as it grows in complexity. The classifier "
                + "consists of multiple trees constructed systematically by "
                + "pseudorandomly selecting subsets of components of the feature vector, "
                + "that is, trees constructed in randomly chosen subspaces.\n\n"
                + "For more information, see\n\n"
                + getTechnicalInformation().toString();
    }

    /**
     * Returns an instance of a TechnicalInformation object, containing 
     * detailed information about the technical background of this class,
     * e.g., paper reference or book this class is based on.
     * 
     * @return 		the technical information about this class
     */
    public TechnicalInformation getTechnicalInformation() {
        TechnicalInformation result;

        result = new TechnicalInformation(Type.ARTICLE);
        result.setValue(Field.AUTHOR, "Tin Kam Ho");
        result.setValue(Field.YEAR, "1998");
        result.setValue(Field.TITLE,
                "The Random Subspace Method for Constructing Decision Forests");
        result.setValue(Field.JOURNAL,
                "IEEE Transactions on Pattern Analysis and Machine Intelligence");
        result.setValue(Field.VOLUME, "20");
        result.setValue(Field.NUMBER, "8");
        result.setValue(Field.PAGES, "832-844");
        result.setValue(Field.URL,
                "http://citeseer.ist.psu.edu/ho98random.html");
        result.setValue(Field.ISSN, "0162-8828");

        return result;
    }

    /**
     * String describing default classifier.
     * 
     * @return 		the default classifier classname
     */
    protected String defaultClassifierString() {
        return "weka.classifiers.trees.REPTree";
    }

    /**
     * Returns an enumeration describing the available options.
     *
     * @return 		an enumeration of all the available options.
     */
    public Enumeration listOptions() {
        Vector result = new Vector();

        result.addElement(new Option("\tSize of each subspace:\n"
                + "\t\t< 1: fraction of the number of attributes\n"
                + "\t\t>=1: absolute number of attributes\n"
                + "\t(default 0.5)", "P", 1,
                "-P <num>"));

        Enumeration enu = super.listOptions();
        while (enu.hasMoreElements()) {
            result.addElement(enu.nextElement());
        }

        return result.elements();
    }

    /**
     * Parses a given list of options. <p/>
     *
     <!-- options-start -->
     * Valid options are: <p/>
     * 
     * <pre> -P &lt;num&gt;
     *  Size of each subspace:
     *   &lt; 1: fraction of the number of attributes
     *   &gt;=1: absolute number of attributes
     *  (default 0.5)</pre>
     * 
     * <pre> -S &lt;num&gt;
     *  Random number seed.
     *  (default 1)</pre>
     * 
     * <pre> -I &lt;num&gt;
     *  Number of iterations.
     *  (default 10)</pre>
     * 
     * <pre> -D
     *  If set, classifier is run in debug mode and
     *  may output additional info to the console</pre>
     * 
     * <pre> -W
     *  Full name of base classifier.
     *  (default: weka.classifiers.trees.REPTree)</pre>
     * 
     * <pre> 
     * Options specific to classifier weka.classifiers.trees.REPTree:
     * </pre>
     * 
     * <pre> -M &lt;minimum number of instances&gt;
     *  Set minimum number of instances per leaf (default 2).</pre>
     * 
     * <pre> -V &lt;minimum variance for split&gt;
     *  Set minimum numeric class variance proportion
     *  of train variance for split (default 1e-3).</pre>
     * 
     * <pre> -N &lt;number of folds&gt;
     *  Number of folds for reduced error pruning (default 3).</pre>
     * 
     * <pre> -S &lt;seed&gt;
     *  Seed for random data shuffling (default 1).</pre>
     * 
     * <pre> -P
     *  No pruning.</pre>
     * 
     * <pre> -L
     *  Maximum tree depth (default -1, no maximum)</pre>
     * 
     <!-- options-end -->
     *
     * Options after -- are passed to the designated classifier.<p>
     *
     * @param options 	the list of options as an array of strings
     * @throws Exception 	if an option is not supported
     */
    public void setOptions(String[] options) throws Exception {
        String tmpStr;

        tmpStr = Utils.getOption('P', options);
        if (tmpStr.length() != 0)
            setSubSpaceSize(Double.parseDouble(tmpStr));
        else
            setSubSpaceSize(0.5);

        super.setOptions(options);
    }

    /**
     * Gets the current settings of the Classifier.
     *
     * @return 		an array of strings suitable for passing to setOptions
     */
    public String[] getOptions() {
        Vector result;
        String[] options;
        int i;

        result = new Vector();

        result.add("-P");
        result.add("" + getSubSpaceSize());

        options = super.getOptions();
        for (i = 0; i < options.length; i++)
            result.add(options[i]);

        return (String[]) result.toArray(new String[result.size()]);
    }

    /**
     * Returns the tip text for this property
     * 
     * @return 		tip text for this property suitable for
     * 			displaying in the explorer/experimenter gui
     */
    public String subSpaceSizeTipText() {
        return "Size of each subspace: if less than 1, a fraction of the "
                + "number of attributes; otherwise, the absolute number of attributes.";
    }

    /**
     * Gets the size of each subspace: a fraction of the number of attributes
     * if less than 1, otherwise the absolute number of attributes.
     *
     * @return 		the subspace size
     */
    public double getSubSpaceSize() {
        return m_SubSpaceSize;
    }

    /**
     * Sets the size of each subspace: a fraction of the number of attributes
     * if less than 1, otherwise the absolute number of attributes.
     *
     * @param value 	the subspace size
     */
    public void setSubSpaceSize(double value) {
        m_SubSpaceSize = value;
    }

    /**
     * Calculates the number of attributes to use in each subspace.
     * 
     * @param total	the available number of attributes
     * @param fraction	if less than 1, the fraction of the available
     * 			attributes; otherwise, the absolute number of attributes
     * @return		the number of attributes to use
     */
    protected int numberOfAttributes(int total, double fraction) {
        int k = (int) Math.round((fraction < 1.0) ? total * fraction
                : fraction);

        // keep at least one attribute and never more than are available
        if (k > total)
            k = total;
        if (k < 1)
            k = 1;

        return k;
    }

    /**
     * Generates an index string describing a random subspace, suitable for
     * the Remove filter.
     * 
     * @param indices		the (1-based) indices of the non-class attributes
     * @param subSpaceSize	the size of the subspace
     * @param classIndex		the (1-based) class index
     * @param random		the random number generator
     * @return			the generated string describing the subspace
     */
    protected String randomSubSpace(Integer[] indices,
            int subSpaceSize, int classIndex, Random random) {
        // Arrays.asList is backed by the array, so this shuffles indices in place
        Collections.shuffle(Arrays.asList(indices), random);
        StringBuffer sb = new StringBuffer("");
        for (int i = 0; i < subSpaceSize; i++) {
            sb.append(indices[i] + ",");
        }
        sb.append(classIndex);

        if (getDebug())
            System.out.println("subSPACE = " + sb);

        return sb.toString();
    }

    /**
     * Builds the classifier.
     *
     * @param data 	the training data to be used for generating the
     * 			classifier.
     * @throws Exception 	if the classifier could not be built successfully
     */
    public void buildClassifier(Instances data) throws Exception {

        // can classifier handle the data?
        getCapabilities().testWithFail(data);

        // remove instances with missing class
        data = new Instances(data);
        data.deleteWithMissingClass();

        // only class? -> build ZeroR model
        if (data.numAttributes() == 1) {
            System.err.println("Cannot build model (only class attribute present in data!), "
                    + "using ZeroR model instead!");
            m_ZeroR = new weka.classifiers.rules.ZeroR();
            m_ZeroR.buildClassifier(data);
            return;
        } else {
            m_ZeroR = null;
        }

        super.buildClassifier(data);

        // collect the 1-based indices of all non-class attributes
        Integer[] indices = new Integer[data.numAttributes() - 1];
        int classIndex = data.classIndex();
        int offset = 0;
        for (int i = 0; i < indices.length + 1; i++) {
            if (i != classIndex) {
                indices[offset++] = i + 1;
            }
        }
        int subSpaceSize = numberOfAttributes(indices.length,
                getSubSpaceSize());
        Random random = data.getRandomNumberGenerator(m_Seed);

        for (int j = 0; j < m_Classifiers.length; j++) {
            if (m_Classifier instanceof Randomizable) {
                ((Randomizable) m_Classifiers[j]).setSeed(random.nextInt());
            }
            // wrap each base classifier so that it only sees its own random subspace
            FilteredClassifier fc = new FilteredClassifier();
            fc.setClassifier(m_Classifiers[j]);
            m_Classifiers[j] = fc;
            Remove rm = new Remove();
            // -V inverts the selection: the listed attributes are kept, the rest removed
            rm.setOptions(new String[] {
                    "-V",
                    "-R",
                    randomSubSpace(indices, subSpaceSize,
                            classIndex + 1, random) });
            fc.setFilter(rm);

            // build the classifier
            m_Classifiers[j].buildClassifier(data);
        }

    }

    /**
     * Calculates the class membership probabilities for the given test
     * instance.
     *
     * @param instance 	the instance to be classified
     * @return 		the predicted class probability distribution (for a
     * 			numeric class, a one-element array holding the averaged prediction)
     * @throws Exception 	if distribution can't be computed successfully 
     */
    public double[] distributionForInstance(Instance instance)
            throws Exception {

        // default model?
        if (m_ZeroR != null) {
            return m_ZeroR.distributionForInstance(instance);
        }

        double[] sums = new double[instance.numClasses()], newProbs;

        for (int i = 0; i < m_NumIterations; i++) {
            if (instance.classAttribute().isNumeric()) {
                // numeric class: accumulate the predicted values
                sums[0] += m_Classifiers[i].classifyInstance(instance);
            } else {
                // nominal class: accumulate the class distributions
                newProbs = m_Classifiers[i]
                        .distributionForInstance(instance);
                for (int j = 0; j < newProbs.length; j++)
                    sums[j] += newProbs[j];
            }
        }
        if (instance.classAttribute().isNumeric()) {
            // average the numeric predictions
            sums[0] /= (double) m_NumIterations;
            return sums;
        } else if (Utils.eq(Utils.sum(sums), 0)) {
            return sums;
        } else {
            Utils.normalize(sums);
            return sums;
        }
    }

    /**
     * Returns a description of the ensemble of subspace classifiers.
     *
     * @return 		description of the ensemble as a string
     */
    public String toString() {

        // only ZeroR model?
        if (m_ZeroR != null) {
            StringBuffer buf = new StringBuffer();
            buf.append(this.getClass().getName()
                    .replaceAll(".*\\.", "")
                    + "\n");
            // underline the simple class name with '=' characters
            buf.append(this.getClass().getName()
                    .replaceAll(".*\\.", "").replaceAll(".", "=")
                    + "\n\n");
            buf.append("Warning: No model could be built, hence ZeroR model is used:\n\n");
            buf.append(m_ZeroR.toString());
            return buf.toString();
        }

        if (m_Classifiers == null) {
            return "RandomSubSpace: No model built yet.";
        }
        StringBuffer text = new StringBuffer();
        text.append("All the base classifiers: \n\n");
        for (int i = 0; i < m_Classifiers.length; i++)
            text.append(m_Classifiers[i].toString() + "\n\n");

        return text.toString();
    }

    /**
     * Main method for testing this class.
     *
     * @param args 	the options
     */
    public static void main(String[] args) {
        runClassifier(new RandomSubSpace(), args);
    }
}
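To make the -P semantics concrete, here is a minimal, Weka-independent sketch that mirrors the logic of numberOfAttributes and randomSubSpace on plain Java collections. The class SubspaceSketch and its method signatures are hypothetical illustrations, not part of Weka's API. The sketch assumes, as buildClassifier does, that the range string uses 1-based attribute indices and always ends with the class attribute, so that a Remove filter run with -V keeps exactly those columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical, Weka-independent sketch of RandomSubSpace's subspace selection. */
final class SubspaceSketch {

    /** Same rule as numberOfAttributes: values < 1 are a fraction, values >= 1 an absolute count. */
    static int numberOfAttributes(int total, double fraction) {
        int k = (int) Math.round(fraction < 1.0 ? total * fraction : fraction);
        // Clamp so that at least one and at most `total` attributes are used.
        return Math.max(1, Math.min(k, total));
    }

    /**
     * Builds a 1-based range string such as "3,1,5,11": the sampled
     * non-class attributes followed by the class attribute.
     */
    static String randomSubSpace(int numAttributes, int classIndex, double size, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (i != classIndex) {
                indices.add(i + 1); // convert the 0-based attribute index to 1-based
            }
        }
        int k = numberOfAttributes(indices.size(), size);
        Collections.shuffle(indices, random);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < k; i++) {
            sb.append(indices.get(i)).append(',');
        }
        sb.append(classIndex + 1); // the class attribute is always kept
        return sb.toString();
    }

    public static void main(String[] args) {
        // 10 non-class attributes: fraction, absolute, too small, too large, exactly 1.0
        System.out.println(numberOfAttributes(10, 0.5));  // 5
        System.out.println(numberOfAttributes(10, 3));    // 3
        System.out.println(numberOfAttributes(10, 0.04)); // 1 (rounds to 0, then clamped)
        System.out.println(numberOfAttributes(10, 20));   // 10 (clamped)
        System.out.println(numberOfAttributes(10, 1.0));  // 1 (1.0 is an absolute count)
        // 11 attributes with the class last (0-based index 10): five sampled plus "11"
        System.out.println(randomSubSpace(11, 10, 0.5, new Random(1)));
    }
}
```

Note the boundary that follows from the `fraction < 1.0` test: -P 1.0 is read as an absolute count and therefore selects a single attribute, not all of them. To use every attribute, pass any value at least as large as the number of non-class attributes; the clamp then caps it.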
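distributionForInstance combines the ensemble members in two ways: for a numeric class it averages the members' predictions, and for a nominal class it sums their class distributions and normalizes the sum, returning an all-zero sum unchanged. The sketch below reproduces that arithmetic on plain arrays. EnsembleVoteSketch is a hypothetical name, and the 1e-6 zero tolerance is an assumption standing in for Weka's Utils.eq.

```java
/** Hypothetical sketch of how RandomSubSpace combines its members' outputs. */
final class EnsembleVoteSketch {

    /** Numeric class: arithmetic mean of the members' predictions. */
    static double averagePrediction(double[] predictions) {
        double sum = 0;
        for (double p : predictions) {
            sum += p;
        }
        return sum / predictions.length;
    }

    /** Nominal class: element-wise sum of the distributions, normalized to sum to 1. */
    static double[] combineDistributions(double[][] distributions, int numClasses) {
        double[] sums = new double[numClasses];
        for (double[] dist : distributions) {
            for (int j = 0; j < dist.length; j++) {
                sums[j] += dist[j];
            }
        }
        double total = 0;
        for (double s : sums) {
            total += s;
        }
        // An (almost) all-zero vector is returned as-is; the tolerance is an assumption.
        if (Math.abs(total) < 1e-6) {
            return sums;
        }
        for (int j = 0; j < sums.length; j++) {
            sums[j] /= total;
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(averagePrediction(new double[] {2.0, 4.0, 6.0})); // 4.0
        double[][] dists = {{0.9, 0.1}, {0.6, 0.4}, {0.3, 0.7}};
        double[] combined = combineDistributions(dists, 2);
        System.out.println(combined[0] + " " + combined[1]); // approximately 0.6 and 0.4
    }
}
```

Summing and then normalizing gives each member an equal vote, which is the same as averaging the members' distributions whenever every member returns a proper probability distribution.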