org.archive.crawler.processor |
Java Source File Name | Type | Comment |
BeanShellProcessor.java | Class | A processor which runs a BeanShell script on the CrawlURI. Script source may be provided via a file local to the crawler. |
CrawlMapper.java | Class | A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers). |
HashCrawlMapper.java | Class | Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey. |
LexicalCrawlMapper.java | Class | A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers). |
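
The HashCrawlMapper entry describes mapping a URI's classKey to one of N crawler names by hashing. A minimal sketch of that idea follows. It is not the class's actual code: the class name `HashMapperSketch`, the methods `bucketFor` and `crawlerNameFor`, and the sample crawler names are all hypothetical, and `String.hashCode()` stands in for whatever hash function the real class uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of org.archive.crawler.processor.
public class HashMapperSketch {

    // Reduce a classKey to a bucket index in [0, crawlerCount).
    // Assumption: String.hashCode() stands in for the real hash function.
    static int bucketFor(String classKey, int crawlerCount) {
        if (crawlerCount <= 0) {
            throw new IllegalArgumentException("crawlerCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(classKey.hashCode(), crawlerCount);
    }

    // Pick the crawler name responsible for the given classKey.
    static String crawlerNameFor(String classKey, List<String> crawlerNames) {
        return crawlerNames.get(bucketFor(classKey, crawlerNames.size()));
    }

    public static void main(String[] args) {
        // Hypothetical crawler names and classKeys.
        List<String> crawlers = Arrays.asList("crawler0", "crawler1", "crawler2");
        for (String key : Arrays.asList("example.com", "example.org", "archive.org")) {
            System.out.println(key + " -> " + crawlerNameFor(key, crawlers));
        }
    }
}
```

Because the bucket depends only on the classKey and the number of crawlers, every crawler that applies the same function reaches the same decision about which crawler owns a URI, with no coordination needed.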