org.archive.crawler.postprocessor |
|
Java Source File Name | Type | Comment |
ContentBasedWaitEvaluator.java | Class | A WaitEvaluator that compares the CrawlURI's content type to a configurable
regular expression. |
CrawlStateUpdater.java | Class | A step, late in the processing of a CrawlURI, for updating the per-host
information that may have been affected by the fetch. |
FrontierScheduler.java | Class | Schedules with the Frontier the CandidateURIs carried by the passed
CrawlURI.
Adds either the prerequisites or whatever is in the CrawlURI's outlinks to the
Frontier. |
ImageWaitEvaluator.java | Class | A specialized ContentBasedWaitEvaluator. |
LinksScoper.java | Class | Determine which extracted links are within scope.
TODO: To test scope, requires that Link be converted to
a CandidateURI. |
LowDiskPauseProcessor.java | Class | Processor module which uses 'df -k', where available and with
the expected output format (on Linux), to monitor available
disk space and pause the crawl if free space on monitored
filesystems falls below certain thresholds. |
SupplementaryLinksScoper.java | Class | Run CandidateURI links carried in the passed CrawlURI through a filter
and 'handle' rejections.
Used to do supplementary processing of links after they've been
scope-processed and ruled 'in-scope' by LinksScoper. |
TextWaitEvaluator.java | Class | A specialized ContentBasedWaitEvaluator. |
WaitEvaluator.java | Class | A processor that determines when a URI should be revisited next. |
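
The LowDiskPauseProcessor entry above describes reading 'df -k' output and pausing the crawl when free space falls below a threshold. The following is only a minimal sketch of that idea, not the Heritrix implementation: the class DiskSpaceSketch, its methods, the sample output, and the threshold value are all hypothetical, and it assumes the six-column Linux layout (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on) with no wrapped rows and no spaces in mount points.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Heritrix.
final class DiskSpaceSketch {

    // Parses 'df -k' style text into mount point -> available kilobytes.
    // The first line is treated as the header and skipped; rows without the
    // six expected columns or with a non-numeric Available field are ignored.
    static Map<String, Long> parseAvailableKb(String dfOutput) {
        Map<String, Long> available = new LinkedHashMap<>();
        String[] lines = dfOutput.split("\\R");
        for (int i = 1; i < lines.length; i++) {
            String[] cols = lines[i].trim().split("\\s+");
            if (cols.length < 6) {
                continue;
            }
            try {
                available.put(cols[5], Long.parseLong(cols[3]));
            } catch (NumberFormatException e) {
                // Unexpected format: skip the row rather than guess.
            }
        }
        return available;
    }

    // True when the mount is known and its free space is below the threshold.
    static boolean shouldPause(Map<String, Long> availableKb, String mount, long thresholdKb) {
        Long free = availableKb.get(mount);
        return free != null && free < thresholdKb;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed Linux format.
        String sample =
              "Filesystem 1K-blocks Used Available Use% Mounted on\n"
            + "/dev/sda1 1000000 900000 100000 90% /\n"
            + "/dev/sdb1 5000000 1000000 4000000 20% /data\n";
        Map<String, Long> free = parseAvailableKb(sample);
        System.out.println("pause for /: " + shouldPause(free, "/", 500000L));
        System.out.println("pause for /data: " + shouldPause(free, "/data", 500000L));
    }
}
```

In a real processor the text would come from running 'df -k' as a subprocess; the sketch takes a string instead so the parsing and threshold check can be exercised on any platform.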