org.archive.crawler.framework |
|
Java Source File Name | Type | Comment |
AbstractTracker.java | Class | A partial implementation of the StatisticsTracking interface.
It covers the thread handling. |
AlertManager.java | Interface | Manager for application alerts. |
Checkpointer.java | Class | Runs checkpointing. |
CrawlController.java | Class | CrawlController collects all the classes which cooperate to
perform a crawl and provides a high-level interface to the
running crawl. |
CrawlScope.java | Class | A CrawlScope instance defines which URIs are "in"
a particular crawl.
It is essentially a Filter which determines, looking at
the totality of information available about a
CandidateURI/CrawlURI instance, if that URI should be
scheduled for crawling.
Dynamic information inherent in the discovery of the
URI -- such as the path by which it was discovered --
may be considered.
Dynamic information which requires the consultation
of external and potentially volatile information --
such as current robots.txt requests and the history
of attempts to crawl the same URI -- should NOT be
considered. |
Filter.java | Class | Base class for filter classes.
Several classes allow 'filters' to be applied to them. |
Frontier.java | Interface | An interface for URI Frontiers.
A URI Frontier is a pluggable module in Heritrix that maintains the
internal state of the crawl. |
FrontierHostStatistics.java | Interface | An optional interface that Frontiers can implement to provide information
about specific hosts.
Some URIFrontier implementations will want to provide a number of
statistics relating to the progress of particular hosts. |
FrontierMarker.java | Interface | A marker is a pointer to a place somewhere inside a frontier's list of
pending URIs. |
Processor.java | Class | Base class for URI processing classes.
Each URI is processed by a user defined series of processors. |
ProcessorChain.java | Class | This class groups together a number of processors that logically fit
together. |
ProcessorChainList.java | Class | A list of all the ProcessorChains. |
Scoper.java | Class | Base class for Scopers. |
StatisticsTracking.java | Interface | An interface for objects that want to collect statistics on
running crawls. |
ToePool.java | Class | A collection of ToeThreads. |
ToeThread.java | Class | One "worker thread"; asks for CrawlURIs, processes them,
repeats unless told otherwise. |
WriterPoolProcessor.java | Class | Abstract implementation of a file pool processor. |
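
The descriptions above suggest how the main pieces cooperate: a scope decides which URIs are scheduled, a frontier holds the pending URIs, a worker asks for URIs one at a time, and each URI passes through an ordered series of processors. The following is a minimal, single-threaded sketch of that flow, not the actual Heritrix API. All names in it (CrawlSketch, Step, Chain, PendingQueue, crawl) are hypothetical stand-ins, URIs are plain strings, and the example.org scope rule is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins; these are NOT the real Heritrix classes.
class CrawlSketch {

    /** One processing step applied to a URI (loosely mirrors the role of Processor). */
    interface Step {
        String process(String uri);
    }

    /** An ordered group of steps (loosely mirrors the role of ProcessorChain). */
    static final class Chain {
        private final List<Step> steps = new ArrayList<>();

        Chain add(Step step) {
            steps.add(step);
            return this;
        }

        /** Runs every step in insertion order, feeding each result to the next step. */
        String run(String uri) {
            String current = uri;
            for (Step step : steps) {
                current = step.process(current);
            }
            return current;
        }
    }

    /** Holds pending URIs; only in-scope URIs are accepted (Frontier plus CrawlScope roles). */
    static final class PendingQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        private final Predicate<String> scope;

        PendingQueue(Predicate<String> scope) {
            this.scope = scope;
        }

        /** Returns true if the URI was in scope and has been queued. */
        boolean schedule(String uri) {
            if (!scope.test(uri)) {
                return false;
            }
            pending.add(uri);
            return true;
        }

        /** Returns the next pending URI, or null when none remain. */
        String next() {
            return pending.poll();
        }
    }

    /** Worker loop (ToeThread role): ask for a URI, process it, repeat until none remain. */
    static List<String> crawl(PendingQueue frontier, Chain chain) {
        List<String> results = new ArrayList<>();
        String uri;
        while ((uri = frontier.next()) != null) {
            results.add(chain.run(uri));
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed scope rule: only URIs under http://example.org/ are "in" the crawl.
        PendingQueue frontier = new PendingQueue(u -> u.startsWith("http://example.org/"));
        frontier.schedule("http://example.org/a");
        frontier.schedule("http://other.example/b"); // rejected by the scope

        Chain chain = new Chain()
                .add(String::trim)          // placeholder pre-processing step
                .add(u -> "fetched:" + u);  // placeholder "fetch" step
        System.out.println(crawl(frontier, chain));
    }
}
```

The sketch runs one worker loop for brevity; the table describes ToePool as a collection of ToeThreads, so a real crawl would presumably run several such loops concurrently against a thread-safe frontier.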