| org.archive.crawler.framework.Processor org.archive.crawler.extractor.Extractor
All known Subclasses: org.archive.crawler.extractor.ExtractorSWF, org.archive.crawler.extractor.ExtractorCSS, org.archive.crawler.extractor.ExtractorDOC, org.archive.crawler.extractor.ExtractorJS, org.archive.crawler.extractor.ExtractorXML, org.archive.crawler.extractor.ExtractorHTML, org.archive.crawler.extractor.ExtractorPDF, org.archive.crawler.extractor.ExtractorImpliedURI, org.archive.crawler.extractor.ExtractorUniversal, org.archive.crawler.extractor.TrapSuppressExtractor, org.archive.crawler.extractor.ExtractorURI
Extractor | abstract public class Extractor extends Processor | | Convenience shared superclass for Extractor Processors.
Currently it only wraps the extractor-specific extract() action in
a StackOverflowError catch/log/proceed handler, so that any
extractor that recurses too deeply on problematic input will
suffer only a local error, and other normal CrawlURI processing
can continue. See:
[ 1122836 ] Localize StackOverflowError in Extractors
http://sourceforge.net/tracker/index.php?func=detail&aid=1122836&group_id=73833&atid=539099
This class could also become home to common utility features
of extractors, like a running tally of the URIs examined/discovered,
etc.
author: gojomo |
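The catch/log/proceed idea described above can be illustrated with a minimal, standalone sketch. It does not use the Heritrix classes: `SketchExtractor`, `RecursingExtractor`, `process(String)`, and the `localErrors` list are hypothetical names invented for this illustration, and a plain `String` stands in for a CrawlURI. The real class may record and log the error differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for an extractor superclass; not the Heritrix API. */
abstract class SketchExtractor {
    private static final Logger LOGGER =
            Logger.getLogger(SketchExtractor.class.getName());

    private final String name;
    // URIs whose extraction failed locally (assumption: a simple list is enough here).
    private final List<String> localErrors = new ArrayList<>();

    SketchExtractor(String name) {
        this.name = name;
    }

    /** Wraps extract() so that runaway recursion becomes a local, logged error. */
    final void process(String uri) {
        try {
            extract(uri);
        } catch (StackOverflowError soe) {
            // Catch, log, and proceed: the failure stays local to this URI.
            localErrors.add(uri);
            LOGGER.log(Level.WARNING, name + ": StackOverflowError on " + uri);
        }
    }

    /** Extractor-specific work, supplied by subclasses. */
    protected abstract void extract(String uri);

    List<String> getLocalErrors() {
        return localErrors;
    }
}

/** Hypothetical subclass that recurses without bound on "bad" input. */
class RecursingExtractor extends SketchExtractor {
    int extracted = 0;

    RecursingExtractor() {
        super("RecursingExtractor");
    }

    @Override
    protected void extract(String uri) {
        if (uri.contains("bad")) {
            recurseForever(0); // simulates recursing too deep on problematic input
        }
        extracted++;
    }

    private int recurseForever(int depth) {
        return recurseForever(depth + 1) + 1;
    }
}

public class ExtractorSketchDemo {
    public static void main(String[] args) {
        RecursingExtractor ex = new RecursingExtractor();
        ex.process("http://example.com/bad");  // overflows, is logged, does not propagate
        ex.process("http://example.com/good"); // processing continues normally
        System.out.println("local errors: " + ex.getLocalErrors());
        System.out.println("extracted: " + ex.extracted);
    }
}
```

In this sketch the wrapper method is `final` and the subclass hook is `abstract`, so every subclass gets the same protection without repeating the try/catch.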
Extractor | public Extractor(String name, String description) | | Passthrough constructor.
Parameters: name, description |