| java.lang.Object org.archive.io.WriterPoolMember org.archive.io.warc.v10.ExperimentalWARCWriter
ExperimentalWARCWriter | public class ExperimentalWARCWriter extends WriterPoolMember implements WARCConstants | | Experimental WARC implementation.
Based on the unreleased version 0.9 of the WARC
File Format document. The specification and implementation are subject to
change.
The caller is assumed to manage access to this
ExperimentalWARCWriter, ensuring that only one thread accesses this WARC instance
at any one time.
While being written, WARCs have a '.open' suffix appended.
author: stack version: $Revision: 4604 $ $Date: 2006-09-05 22:38:18 -0700 (Tue, 05 Sep 2006) $ |
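The '.open' suffix convention described above can be illustrated with a minimal, self-contained sketch. It does not use ExperimentalWARCWriter itself. The class name `OpenSuffixSketch`, the method `writeThenFinalize`, and the rename-on-completion step are all assumptions made for illustration; the documentation only states that the suffix is present while the file is being written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of org.archive.io.
final class OpenSuffixSketch {
    // Suffix named in the class description above.
    static final String OPEN_SUFFIX = ".open";

    /**
     * Writes content to "<finalName>.open" and, once the write has completed,
     * renames the file to "<finalName>". The rename step is an assumption
     * about how an in-progress marker is typically removed.
     */
    static Path writeThenFinalize(Path dir, String finalName, byte[] content)
            throws IOException {
        Path inProgress = dir.resolve(finalName + OPEN_SUFFIX);
        Files.write(inProgress, content);
        Path finished = dir.resolve(finalName);
        return Files.move(inProgress, finished, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("warc-sketch");
        byte[] content = "placeholder".getBytes(StandardCharsets.UTF_8);
        Path done = writeThenFinalize(dir, "example.warc", content);
        System.out.println(done.getFileName());
    }
}
```

A reader scanning the output directory could then skip any file whose name still ends in '.open', since such a file may be incomplete.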
Field Summary | |
public static byte[] | CRLF_BYTES NEWLINE as bytes. |
Constructor Summary | |
| ExperimentalWARCWriter() Shutdown Constructor. Has default access so that an instance can be created to test utility methods. |
public | ExperimentalWARCWriter(AtomicInteger serialNo, OutputStream out, File f, boolean cmprs, String a14DigitDate, List warcinfoData) Constructor. Takes a stream. |
public | ExperimentalWARCWriter(AtomicInteger serialNo, List<File> dirs, String prefix, String suffix, boolean cmprs, long maxSize, List warcinfoData) Constructor. Parameters: dirs - Where to drop files. Parameters: prefix - File prefix to use. Parameters: cmprs - Compress the records written. |
Method Summary | |
protected void | baseCharacterCheck(char c, String parameter) |
protected String | checkHeaderLineMimetypeParameter(String parameter) |
protected String | checkHeaderLineParameters(String parameter) |
protected String | createFile(File file) |
protected byte[] | createRecordHeaderline(String type, String url, String create14DigitDate, String mimetype, URI recordId, int namedFieldsLength, long contentLength) |
protected URI | generateRecordId(Map<String, String> qualifiers) |
protected URI | generateRecordId(String key, String value) |
public static URI | getRecordID() Convenience method for getting Record-Ids. |
public void | writeMetadataRecord(String url, String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream metadata, long metadataLength) |
protected void | writeRecord(String type, String url, String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream contentStream, long contentLength) |
public void | writeRequestRecord(String url, String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream request, long requestLength) |
public void | writeResourceRecord(String url, String create14DigitDate, String mimetype, ANVLRecord namedFields, InputStream response, long responseLength) |
public void | writeResourceRecord(String url, String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream response, long responseLength) |
public void | writeResponseRecord(String url, String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream response, long responseLength) |
public URI | writeWarcinfoRecord(String filename) |
public URI | writeWarcinfoRecord(String filename, String description) |
public URI | writeWarcinfoRecord(String mimetype, ANVLRecord namedFields, InputStream fileMetadata, long fileMetadataLength) Write a warcinfo to current file. TODO: Write crawl metadata or pointers to crawl description. Parameters: mimetype - Mimetype of the fileMetadata block. Parameters: namedFields - Named fields. |
public void | writeWarcinfoRecord(String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream fileMetadata, long fileMetadataLength) Write a warcinfo to current file. |
CRLF_BYTES | public static byte[] CRLF_BYTES | | NEWLINE as bytes.
|
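As a rough illustration of what "NEWLINE as bytes" means, the sketch below builds a carriage return plus line feed pair as a byte array and appends it to a line. This is not the library's source: the class `CrlfSketch`, the constant `CRLF`, and the method `terminateLine` are hypothetical, and the assumption that CRLF_BYTES holds exactly the two bytes 13 and 10 is inferred from the field name only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.archive.io.
final class CrlfSketch {
    // Assumption: a CRLF newline is the byte pair 13 (CR), 10 (LF).
    static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Returns the UTF-8 bytes of the given line followed by CRLF. */
    static byte[] terminateLine(String line) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(line.getBytes(StandardCharsets.UTF_8));
        out.write(CRLF);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = terminateLine("example");
        // Prints the total length: 7 characters plus the 2 newline bytes.
        System.out.println(bytes.length);
    }
}
```

Holding the newline as a ready-made byte array avoids re-encoding a string each time a line is written to a byte stream.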
ExperimentalWARCWriter | ExperimentalWARCWriter() | | Shutdown Constructor
Has default access so that an instance can be created to test utility methods.
|
ExperimentalWARCWriter | public ExperimentalWARCWriter(AtomicInteger serialNo, OutputStream out, File f, boolean cmprs, String a14DigitDate, List warcinfoData) throws IOException | | Constructor.
Takes a stream. Use with caution. There is no upper-bound check on size;
the writer will just keep writing. Only pass streams that are bounded.
Parameters: serialNo - Used to generate unique file name sequences. Parameters: out - Where to write. Parameters: f - File that out is connected to. Parameters: cmprs - Compress the content written. Parameters: a14DigitDate - If null, the current time is written. throws: IOException - |
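One way a caller might satisfy the "only pass streams that are bounded" advice is to wrap the target stream in a filter that refuses to write past a limit. The following is a minimal sketch under that assumption. `BoundedOutputStream` is a hypothetical class written for this example and is not part of the documented API; whether failing with an IOException is the right reaction to hitting the limit depends on the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper, not part of org.archive.io.
final class BoundedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    BoundedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureRoom(1);
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureRoom(len);
        out.write(b, off, len);
        written += len;
    }

    /** Number of bytes passed through so far. */
    long getWritten() {
        return written;
    }

    // Rejects a write that would take the total past the configured limit.
    private void ensureRoom(long n) throws IOException {
        if (written + n > maxBytes) {
            throw new IOException("Write of " + n + " bytes would exceed limit of "
                    + maxBytes + " bytes");
        }
    }

    public static void main(String[] args) throws IOException {
        BoundedOutputStream bounded =
                new BoundedOutputStream(new ByteArrayOutputStream(), 4);
        bounded.write(new byte[] {1, 2, 3, 4}); // fits exactly
        try {
            bounded.write(5); // one byte too many
        } catch (IOException expected) {
            System.out.println("limit reached");
        }
    }
}
```

The wrapped stream would then be passed as the `out` argument, so that a runaway write fails fast instead of growing without limit.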
ExperimentalWARCWriter | public ExperimentalWARCWriter(AtomicInteger serialNo, List<File> dirs, String prefix, String suffix, boolean cmprs, long maxSize, List warcinfoData) | | Constructor.
Parameters: dirs - Where to drop files. Parameters: prefix - File prefix to use. Parameters: cmprs - Compress the records written. Parameters: maxSize - Maximum size for WARC files written. Parameters: suffix - File tail to use. If null, unused. Parameters: warcinfoData - File metadata for warcinfo record. |
writeWarcinfoRecord | public URI writeWarcinfoRecord(String mimetype, ANVLRecord namedFields, InputStream fileMetadata, long fileMetadataLength) throws IOException | | Write a warcinfo to current file.
TODO: Write crawl metadata or pointers to crawl description.
Parameters: mimetype - Mimetype of the fileMetadata block. Parameters: namedFields - Named fields. Pass null if none. Parameters: fileMetadata - Metadata about this WARC as RDF, ANVL, etc. Parameters: fileMetadataLength - Length of fileMetadata. Returns: Generated record-id made with data: scheme and the current filename. throws: IOException - |
writeWarcinfoRecord | public void writeWarcinfoRecord(String create14DigitDate, String mimetype, URI recordId, ANVLRecord namedFields, InputStream fileMetadata, long fileMetadataLength) throws IOException | | Write a warcinfo to current file.
The warcinfo type uses its recordId as its URL.
Parameters: recordId - URI to use for this warcinfo. Parameters: create14DigitDate - Record creation date as 14 digit date. Parameters: mimetype - Mimetype of the fileMetadata. Parameters: namedFields - Named fields. Parameters: fileMetadata - Metadata about this WARC as RDF, ANVL, etc. Parameters: fileMetadataLength - Length of fileMetadata. throws: IOException - |
|
|
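Several methods on this page take a create14DigitDate or a14DigitDate string. The page does not define the format, so the sketch below assumes the common reading of a "14 digit date": year, month, day, hour, minute and second (`yyyyMMddHHmmss`) in UTC. The class `FourteenDigitDate` and its `format` method are hypothetical and exist only for this example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of org.archive.io.
final class FourteenDigitDate {
    // Assumption: 14 digits means yyyyMMddHHmmss, expressed in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    /** Formats the given instant as a 14-character digit string. */
    static String format(Instant instant) {
        return FORMAT.format(instant);
    }

    public static void main(String[] args) {
        // 2006-09-05 22:38:18 -0700 expressed in UTC.
        System.out.println(format(Instant.parse("2006-09-06T05:38:18Z")));
        // prints 20060906053818
    }
}
```

A string produced this way could be supplied wherever a 14 digit date parameter is expected, provided the assumed format matches what the writer accepts.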