Abstract implementation of a file pool processor.
Subclass it to implement processing for a particular
WriterPoolMember implementation.
Authors: Parker Thompson, stack
getAttributeUnchecked(String name) Version of getAttributes that catches and logs exceptions
and returns null if the attribute cannot be fetched.
Parameters: name - Attribute name.
getFirstrecordBody(File orderFile) Writes the ARC metadata body content.
It's based on the order XML file, to which we'll add other information
such as the machine IP.
Parameters: orderFile - Order file.
Version of getAttributes that catches and logs exceptions
and returns null if the attribute cannot be fetched.
Parameters: name - Attribute name. Returns: the attribute, or null.
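A minimal sketch of the catch-log-return-null pattern described above, assuming a simple map-backed attribute store. The classes `AttributeLookup` and `AttributeNotFoundException` are hypothetical stand-ins, not the Heritrix types; only `java.util.logging` is a real API here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical checked exception standing in for whatever the real
// attribute lookup throws (an assumption, not the Heritrix type).
class AttributeNotFoundException extends Exception {
    AttributeNotFoundException(String name) {
        super("No such attribute: " + name);
    }
}

// Hypothetical attribute holder illustrating the "unchecked" accessor.
class AttributeLookup {
    private static final Logger LOGGER =
        Logger.getLogger(AttributeLookup.class.getName());

    private final Map<String, Object> attributes = new HashMap<>();

    void put(String name, Object value) {
        attributes.put(name, value);
    }

    // Checked variant: throws when the attribute is missing.
    Object getAttribute(String name) throws AttributeNotFoundException {
        if (!attributes.containsKey(name)) {
            throw new AttributeNotFoundException(name);
        }
        return attributes.get(name);
    }

    // Unchecked variant: logs the failure and returns null instead of throwing.
    Object getAttributeUnchecked(String name) {
        try {
            return getAttribute(name);
        } catch (AttributeNotFoundException e) {
            LOGGER.log(Level.WARNING, "Failed to fetch attribute " + name, e);
            return null;
        }
    }
}
```

Callers trade the compile-time obligation to handle the exception for a null check, so this variant suits settings whose absence is tolerable.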
Writes the ARC metadata body content.
It's based on the order XML file, to which we'll add other information
such as the machine IP.
Parameters: orderFile - Order file. Returns: String that holds the ARC metadata header body.
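A rough sketch of building such a body: read the order file and append the machine IP. The class `FirstRecordBody`, the `machine-ip` label, and appending it as a trailing XML comment are all illustrative assumptions; the real method may merge the extra information differently.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: order-file contents plus extra info (here, only the IP).
class FirstRecordBody {
    static String build(File orderFile, String machineIp) throws IOException {
        // Start from the raw order XML as the base of the metadata body.
        String order = new String(Files.readAllBytes(orderFile.toPath()),
                                  StandardCharsets.UTF_8);
        // Append the machine IP as a comment after the root element,
        // which keeps the result well-formed XML.
        return order + "\n<!-- machine-ip: " + machineIp + " -->\n";
    }
}
```

The IP is passed in rather than looked up so the sketch stays deterministic; a caller could obtain it with `InetAddress.getLocalHost().getHostAddress()`.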
Maximum size we want files to be (bytes).
Default is ARCConstants.DEFAULT_MAX_ARC_FILE_SIZE. Note that ARC
files will usually be bigger than maxSize: a new file is only started at a
record boundary, so a file grows to maxSize plus the length up to the next boundary.
Returns: ARC maximum size.
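To illustrate why files overshoot maxSize, here is a minimal sketch, assuming the size check happens before each record and records are never split. `SizeRollover` is a hypothetical class, not part of the ARC writer API.

```java
// Hypothetical model of size-based rollover at record boundaries.
class SizeRollover {
    private final long maxSize;
    private long currentFileLength = 0;
    private int filesStarted = 1;

    SizeRollover(long maxSize) {
        this.maxSize = maxSize;
    }

    // Checked before each record: start a new file only if the current one
    // has already reached maxSize. The record itself is written whole.
    void writeRecord(long recordLength) {
        if (currentFileLength >= maxSize) {
            filesStarted++;
            currentFileLength = 0;
        }
        currentFileLength += recordLength;
    }

    long currentFileLength() { return currentFileLength; }
    int filesStarted() { return filesStarted; }
}
```

With maxSize = 100, two 60-byte records both land in the first file (120 bytes); the next record opens a second file.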
Returns a list of metadata items to add to the first ARC file's metadata record.
By default, the order file is transformed with a stylesheet. To specify the stylesheet,
override
WriterPoolProcessor.getFirstrecordStylesheet().
Gets XML files from the settings handler. Currently the order file is the
only XML file. We're NOT adding seeds to the metadata.
Returns: list of strings and/or files to add to the ARC file as metadata, or null.
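"Stylesheeting" the order file means applying an XSLT transform to it. A minimal sketch using the standard `javax.xml.transform` API follows; `OrderStylesheet`, the toy order XML, and its element names are hypothetical, and the real stylesheet returned by getFirstrecordStylesheet() is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: apply an XSLT stylesheet to order XML, return the output.
class OrderStylesheet {
    static String transform(String orderXml, String stylesheet)
            throws TransformerException {
        // Compile the stylesheet into a reusable Transformer.
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        // Run the transform, capturing the result as a string.
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(orderXml)),
                    new StreamResult(out));
        return out.toString();
    }
}
```

The resulting string is the kind of item that could go into the returned metadata list alongside raw files.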
Writes a CrawlURI and its associated data to the store file.
Currently this method understands the following URI schemes: dns, http,
and https.
Parameters: curi - CrawlURI to process.
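The scheme restriction above amounts to a dispatch on the URI scheme. A minimal sketch, assuming hypothetical route labels (`dns-record`, `http-record`, `skip`) in place of the real write paths, which are not described here:

```java
import java.net.URI;

// Hypothetical dispatcher: picks a write path from the URI scheme.
class SchemeDispatch {
    static String route(String uri) {
        String scheme = URI.create(uri).getScheme();
        if (scheme == null) {
            return "skip"; // relative URI: nothing to archive
        }
        switch (scheme.toLowerCase()) {
            case "dns":
                return "dns-record";
            case "http":
            case "https":
                return "http-record"; // both share one record format here
            default:
                return "skip"; // unsupported scheme
        }
    }
}
```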
Whether the given CrawlURI should be written to archive files.
Annotates the CrawlURI with a reason for any negative answer.
Parameters: curi - CrawlURI. Returns: true if the URI should be written; false otherwise.
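A minimal sketch of a check that answers no and records why, assuming two hypothetical rules: the scheme must be dns, http, or https, and a non-positive fetch status means the fetch failed. `FakeCrawlUri` and the annotation strings are illustrative stand-ins, not the real CrawlURI API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CrawlURI carrying only what this sketch needs.
class FakeCrawlUri {
    final String scheme;
    final int fetchStatus;
    final List<String> annotations = new ArrayList<>();

    FakeCrawlUri(String scheme, int fetchStatus) {
        this.scheme = scheme;
        this.fetchStatus = fetchStatus;
    }
}

class WriteFilter {
    // Returns true if the URI should be archived; on a negative answer,
    // records the reason as an annotation on the URI.
    static boolean shouldWrite(FakeCrawlUri curi) {
        boolean supported = "dns".equals(curi.scheme)
            || "http".equals(curi.scheme)
            || "https".equals(curi.scheme);
        if (!supported) {
            curi.annotations.add("unwritten:scheme");
            return false;
        }
        if (curi.fetchStatus <= 0) {
            curi.annotations.add("unwritten:status");
            return false;
        }
        return true;
    }
}
```

Annotating rather than logging keeps the reason attached to the URI, so later reports can explain why it was not archived.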
Fields inherited from org.archive.crawler.framework.Processor