| org.archive.crawler.settings.ModuleType org.archive.crawler.framework.Processor
All known Subclasses: org.archive.crawler.processor.recrawl.PersistProcessor, org.archive.crawler.extractor.Extractor, org.archive.crawler.fetcher.FetchHTTP, org.archive.crawler.processor.BeanShellProcessor, org.archive.crawler.framework.WriterPoolProcessor, org.archive.crawler.prefetch.RuntimeLimitEnforcer, org.archive.crawler.postprocessor.FrontierScheduler, org.archive.crawler.prefetch.QuotaEnforcer, org.archive.crawler.extractor.ChangeEvaluator, org.archive.crawler.fetcher.FetchDNS, org.archive.crawler.prefetch.PreconditionEnforcer, org.archive.crawler.postprocessor.LowDiskPauseProcessor, org.archive.crawler.processor.recrawl.FetchHistoryProcessor, org.archive.crawler.writer.MirrorWriterProcessor, org.archive.crawler.processor.CrawlMapper, org.archive.crawler.postprocessor.WaitEvaluator, org.archive.crawler.extractor.HTTPContentDigest, org.archive.crawler.fetcher.FetchFTP, org.archive.crawler.extractor.ExtractorHTTP, org.archive.crawler.writer.Kw3WriterProcessor, org.archive.crawler.postprocessor.CrawlStateUpdater, org.archive.crawler.framework.Scoper,
Processor | public class Processor extends ModuleType (Code) | | Base class for URI processing classes.
Each URI is processed by a user defined series of processors. This class
provides the basic infrastructure for these but does not actually do
anything. New processors can be easily created by subclassing this class.
Classes subclassing this one should not trap InterruptedExceptions.
They should be allowed to propagate to the ToeThread executing the processor.
Also they should immediately exit their main method (innerProcess())
if the interrupted flag is set.
author: Gordon Mohr See Also: org.archive.crawler.framework.ToeThread |
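The interruption contract described above can be sketched as follows. This is a minimal, self-contained illustration and not Heritrix code: `SketchProcessor`, `SketchCrawlURI`, `CountingProcessor`, and the `process()` driver are hypothetical stand-ins introduced so the example compiles without the crawler on the classpath. Only the shape of `innerProcess()` mirrors the documented signature.

```java
// Stand-in types (assumptions): NOT the real org.archive.crawler classes.
class SketchCrawlURI {
    final String uri;
    SketchCrawlURI(String uri) { this.uri = uri; }
}

abstract class SketchProcessor {
    private final String name;
    SketchProcessor(String name, String description) { this.name = name; }
    String getName() { return name; }

    // Hook for subclasses; the base version does nothing, as in the documented class.
    protected void innerProcess(SketchCrawlURI curi) throws InterruptedException { }

    // Hypothetical driver; in the crawler, a ToeThread executes the processor.
    final void process(SketchCrawlURI curi) throws InterruptedException {
        innerProcess(curi);
    }
}

class CountingProcessor extends SketchProcessor {
    int handled = 0;

    CountingProcessor() { super("Counter", "Counts URIs (illustrative only)"); }

    @Override
    protected void innerProcess(SketchCrawlURI curi) throws InterruptedException {
        // Exit immediately if the interrupted flag is already set.
        if (Thread.currentThread().isInterrupted()) {
            return;
        }
        // A blocking call may throw InterruptedException. It is not caught here,
        // so it propagates to the thread that executes the processor.
        Thread.sleep(1);
        handled++;
    }
}

class ProcessorSketch {
    public static void main(String[] args) throws InterruptedException {
        CountingProcessor p = new CountingProcessor();
        p.process(new SketchCrawlURI("http://example.com/"));
        System.out.println("handled=" + p.handled);
    }
}
```

The subclass neither catches `InterruptedException` nor clears the interrupted flag, so the calling thread remains free to react to the interruption.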
ATTR_DECIDE_RULES | final public static String ATTR_DECIDE_RULES(Code) | | Key to use when asking settings for the decide-rules value.
|
ATTR_ENABLED | final public static String ATTR_ENABLED(Code) | | Key to use when asking settings for the enabled value.
|
attrDecideRules | protected String attrDecideRules(Code) | | local name for decide-rules
|
Processor | public Processor(String name, String description)(Code) | | Parameters: name - Parameters: description - |
finalTasks | protected void finalTasks()(Code) | | Classes subclassing this one should override this method to perform
processor specific actions.
|
getController | public CrawlController getController()(Code) | | Get the controller object.
Returns: the controller object. |
getDefaultNextProcessor | public Processor getDefaultNextProcessor(CrawlURI curi)(Code) | | Returns the next processor for the given CrawlURI in the processor chain.
Parameters: curi - The CrawlURI that we want to find the next processor for. Returns: The next processor for the given CrawlURI in the processor chain. |
initialTasks | protected void initialTasks()(Code) | | Classes subclassing this one should override this method to perform
processor specific actions.
This method is guaranteed to be called after the crawl is set up, but
before any URI processing has occurred.
|
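The ordering of the lifecycle hooks can be illustrated with a small sketch. It assumes, as the name suggests but the text here does not state, that `finalTasks()` runs once when the crawl ends. `LifecycleBase`, `LoggingProcessor`, and the `run()` driver are hypothetical stand-ins, and a plain `String` takes the place of `CrawlURI`.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in base class (assumption): NOT the Heritrix Processor.
abstract class LifecycleBase {
    protected void initialTasks() { }
    protected void innerProcess(String uri) throws InterruptedException { }
    protected void finalTasks() { }
}

class LoggingProcessor extends LifecycleBase {
    final List<String> events = new ArrayList<>();

    @Override
    protected void initialTasks() {
        // One-time setup: runs after crawl setup and before any URI is processed.
        events.add("init");
    }

    @Override
    protected void innerProcess(String uri) {
        events.add("process " + uri);
    }

    @Override
    protected void finalTasks() {
        // One-time cleanup (assumed to run at the end of the crawl).
        events.add("final");
    }
}

class LifecycleSketch {
    // Hypothetical driver standing in for the crawl framework.
    static List<String> run(LoggingProcessor p, List<String> uris) {
        p.initialTasks();
        try {
            for (String uri : uris) {
                p.innerProcess(uri);
            }
        } finally {
            p.finalTasks();
        }
        return p.events;
    }

    public static void main(String[] args) {
        List<String> uris = new ArrayList<>();
        uris.add("http://example.com/a");
        uris.add("http://example.com/b");
        System.out.println(run(new LoggingProcessor(), uris));
    }
}
```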
innerProcess | protected void innerProcess(CrawlURI curi) throws InterruptedException(Code) | | Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI.
Parameters: curi - The CrawlURI being processed. throws: InterruptedException - |
isContentToProcess | protected boolean isContentToProcess(CrawlURI curi)(Code) | | Parameters: curi - CrawlURI to examine. Returns: True if there is content to process (content length is > 0) and links have not yet been extracted. |
isExpectedMimeType | protected boolean isExpectedMimeType(String contentType, String expectedPrefix)(Code) | | Parameters: contentType - Found content type. Parameters: expectedPrefix - String to find at the start of the content type, e.g. text/html. Returns: True if the passed content type begins with the expected MIME type. |
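A prefix check of this kind might look like the sketch below. It is not the library's implementation: the null handling and the case-insensitive comparison are assumptions that the description above does not confirm, and `MimeTypeSketch` is a hypothetical holder class.

```java
class MimeTypeSketch {
    // Returns true when contentType starts with expectedPrefix.
    // Assumptions: a null argument yields false, and the match ignores case.
    static boolean isExpectedMimeType(String contentType, String expectedPrefix) {
        if (contentType == null || expectedPrefix == null) {
            return false;
        }
        // regionMatches compares the first expectedPrefix.length() characters,
        // ignoring case, and returns false if contentType is too short.
        return contentType.regionMatches(true, 0, expectedPrefix, 0, expectedPrefix.length());
    }

    public static void main(String[] args) {
        System.out.println(isExpectedMimeType("text/html; charset=UTF-8", "text/html"));
        System.out.println(isExpectedMimeType("application/pdf", "text/html"));
    }
}
```

Matching on a prefix rather than the whole string lets a header value with parameters, such as a trailing `charset`, still count as the expected type.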
kickUpdate | public void kickUpdate()(Code) | | |
report | public String report()(Code) | | Compiles and returns a report (in human readable form) about the status
of the processor. The processor's name (of implementing class) should
always be included.
Examples of stats declared would include:
Number of CrawlURIs handled.
Number of links extracted (for link extractors)
etc.
Returns: A human readable report on the processor's state. |
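As an illustration of the kind of report described above, the sketch below builds a short text summary that includes the processor name and two counters. The class `ReportSketch`, its field names, and the exact layout of the output are assumptions made for the example and are not taken from the crawler.

```java
class ReportSketch {
    private final String name;   // name of the implementing processor
    private long urisHandled;    // number of CrawlURIs handled
    private long linksExtracted; // number of links extracted

    ReportSketch(String name) { this.name = name; }

    // Hypothetical helper that records one processed URI.
    void record(int linksFound) {
        urisHandled++;
        linksExtracted += linksFound;
    }

    // Builds a human-readable report that always includes the processor name.
    String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("Processor: ").append(name).append('\n');
        sb.append("  CrawlURIs handled: ").append(urisHandled).append('\n');
        sb.append("  Links extracted: ").append(linksExtracted).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        ReportSketch r = new ReportSketch("ExampleExtractor");
        r.record(4);
        r.record(2);
        System.out.print(r.report());
    }
}
```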
setDefaultNextProcessor | public void setDefaultNextProcessor(Processor nextProcessor)(Code) | | Set the default next processor in the chain.
Parameters: nextProcessor - the default next processor in the chain. |
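The pairing of `setDefaultNextProcessor` and `getDefaultNextProcessor` suggests a singly linked chain. The sketch below shows one way such a chain could be wired and walked. `ChainLink` and `ChainSketch` are hypothetical stand-ins, a `String` replaces `CrawlURI`, and whether the real method consults the URI when choosing the next processor is not stated here, so this version simply ignores it.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a processor in a chain (assumption): NOT the Heritrix class.
class ChainLink {
    final String name;
    private ChainLink defaultNext;

    ChainLink(String name) { this.name = name; }

    void setDefaultNextProcessor(ChainLink nextProcessor) {
        this.defaultNext = nextProcessor;
    }

    // The curi argument is unused in this sketch.
    ChainLink getDefaultNextProcessor(String curi) {
        return defaultNext;
    }
}

class ChainSketch {
    // Follows the default-next links from the first processor and records each name.
    static List<String> walk(ChainLink first, String curi) {
        List<String> visited = new ArrayList<>();
        for (ChainLink p = first; p != null; p = p.getDefaultNextProcessor(curi)) {
            visited.add(p.name);
        }
        return visited;
    }

    public static void main(String[] args) {
        ChainLink fetch = new ChainLink("fetch");
        ChainLink extract = new ChainLink("extract");
        ChainLink write = new ChainLink("write");
        fetch.setDefaultNextProcessor(extract);
        extract.setDefaultNextProcessor(write);
        System.out.println(walk(fetch, "http://example.com/"));
    }
}
```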