| java.lang.Object java.io.InputStream org.archive.io.ArchiveRecord org.archive.io.warc.v10.WARCRecord
Constructor Summary | |
public | WARCRecord(InputStream in, String identifier, long offset) Constructor. | public | WARCRecord(InputStream in, ArchiveRecordHeader headers) Constructor. | public | WARCRecord(InputStream in, String identifier, long offset, boolean digest, boolean strict) Constructor.
Parameters: in - Stream cued up to be at the start of the record this instance is to represent or, if headers is not null, just past the Header Line and Named Fields. Parameters: identifier - Identifier for the hosting Reader. Parameters: offset - Current offset into in (used to keep position properly aligned). |
Method Summary | |
protected String | getMimetype4Cdx(ArchiveRecordHeader h) | public static boolean | isCR(char c) | public static boolean | isCROrLF(char c) | public static boolean | isLF(char c) | protected int | parseHeaderLine(InputStream in, Map<Object, Object> fields, boolean strict) | protected ArchiveRecordHeader | parseHeaders(InputStream in, String identifier, long offset, boolean strict) Parse WARC Header Line and Named Fields.
Parameters: in - Stream to read. Parameters: identifier - Identifier for the hosting Reader. Parameters: offset - Absolute offset into Reader. Parameters: strict - Whether to parse strictly; if false, parsing is loose. | protected void | parseNamedFields(InputStream in, Map<Object, Object> fields) | protected byte[] | readLine(InputStream in, boolean strict) Read a line.
A 'line' in this context ends in CRLF and contains only ASCII characters and no
control characters.
Parameters: in - InputStream to read. Parameters: strict - Strict parsing (if false, we'll eat whitespace before the record). |
WARCRecord | public WARCRecord(InputStream in, String identifier, long offset) throws IOException | | Constructor.
Parameters: in - Stream cued up to be at the start of the record this instance is to represent. throws: IOException - |
WARCRecord | public WARCRecord(InputStream in, String identifier, long offset, boolean digest, boolean strict) throws IOException | | Constructor.
Parameters: in - Stream cued up to be at the start of the record this instance is to represent or, if headers is not null, just past the Header Line and Named Fields. Parameters: identifier - Identifier for the hosting Reader. Parameters: offset - Current offset into in (used to keep position properly aligned). Usually 0. Parameters: digest - True if we're to calculate a digest for this record. Not digesting saves about 15% of CPU during parse. Parameters: strict - Be strict when parsing (parsing stops if the file is improperly formatted). throws: IOException - |
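The digest flag trades CPU time for a checksum of the record content. The following is a minimal sketch of that trade-off using only the JDK; it is not the WARCRecord implementation. The class name DigestToggleSketch, the drain and toHex methods, and the choice of SHA-1 are all assumptions made for illustration, since this page does not say which algorithm the record uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of org.archive.io.
final class DigestToggleSketch {

    // Reads the stream to its end. When digest is true, every byte also
    // passes through a MessageDigest (SHA-1 is an assumed algorithm).
    // Returns the digest bytes, or null when digesting is switched off.
    static byte[] drain(InputStream in, boolean digest)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = digest ? MessageDigest.getInstance("SHA-1") : null;
        InputStream source = digest ? new DigestInputStream(in, md) : in;
        byte[] buffer = new byte[4096];
        while (source.read(buffer) != -1) {
            // Nothing to do here: reading is what feeds the digest.
        }
        return digest ? md.digest() : null;
    }

    // Lower-case hex rendering of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(drain(new ByteArrayInputStream(content), true)));
        System.out.println(drain(new ByteArrayInputStream(content), false));
    }
}
```

With digesting off, the stream is read directly and no hashing work is done, which is where a saving of the kind described for the digest parameter would come from.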
isCR | public static boolean isCR(char c) | | |
isCROrLF | public static boolean isCROrLF(char c) | | |
isLF | public static boolean isLF(char c) | | |
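This page gives no description for the three character helpers. Judging by their names alone, they test for carriage return and line feed; a sketch under that assumption is shown here. The class CrLfSketch is hypothetical, and the method bodies are inferred from the names rather than taken from the WARCRecord source.

```java
// Hypothetical illustration only; bodies are inferred from the method names.
final class CrLfSketch {

    // True for carriage return (U+000D).
    static boolean isCR(char c) {
        return c == '\r';
    }

    // True for line feed (U+000A).
    static boolean isLF(char c) {
        return c == '\n';
    }

    // True for either half of a CRLF line terminator.
    static boolean isCROrLF(char c) {
        return isCR(c) || isLF(c);
    }

    public static void main(String[] args) {
        System.out.println(isCR('\r'));
        System.out.println(isLF('\r'));
        System.out.println(isCROrLF('\n'));
    }
}
```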
parseHeaders | protected ArchiveRecordHeader parseHeaders(InputStream in, String identifier, long offset, boolean strict) throws IOException | | Parse WARC Header Line and Named Fields.
Parameters: in - Stream to read. Parameters: identifier - Identifier for the hosting Reader. Parameters: offset - Absolute offset into Reader. Parameters: strict - Whether to parse strictly; if false, parsing is loose. Returns: An ArchiveRecordHeader. throws: IOException - |
readLine | protected byte[] readLine(InputStream in, boolean strict) throws IOException | | Read a line.
A 'line' in this context ends in CRLF and contains only ASCII characters and no
control characters.
Parameters: in - InputStream to read. Parameters: strict - Strict parsing (if false, we'll eat whitespace before the record). Returns: All bytes in the line, including the terminating CRLF. throws: IOException - |
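The readLine contract (CRLF-terminated, ASCII only, no control characters, leading whitespace skipped when not strict) can be sketched with plain JDK streams. This is an illustrative reimplementation, not the WARCRecord code: the class ReadLineSketch is hypothetical, and the choices to throw IOException on a bad byte or on end of stream before CRLF, and to treat space, tab, CR, and LF as the skippable whitespace, are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of org.archive.io.
final class ReadLineSketch {

    // Returns all bytes of the next line, including the terminating CRLF.
    static byte[] readLine(InputStream in, boolean strict) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean atStart = true;
        int previous = -1;
        int b;
        while ((b = in.read()) != -1) {
            boolean cr = b == '\r';
            boolean lf = b == '\n';
            // Loose mode (assumed behaviour): skip whitespace before the line.
            if (atStart && !strict && (b == ' ' || b == '\t' || cr || lf)) {
                continue;
            }
            atStart = false;
            if (b > 0x7F) {
                throw new IOException("Non-ASCII byte: " + b);
            }
            if ((b < 0x20 || b == 0x7F) && !cr && !lf) {
                throw new IOException("Control character: " + b);
            }
            line.write(b);
            if (previous == '\r' && lf) {
                return line.toByteArray(); // CRLF seen: the line is complete
            }
            previous = b;
        }
        throw new IOException("End of stream before CRLF");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "  \r\nabc\r\nrest".getBytes(StandardCharsets.US_ASCII);
        byte[] line = readLine(new ByteArrayInputStream(data), false);
        System.out.println(line.length);
    }
}
```

In strict mode the same input is not trimmed, so the first line returned is the two spaces followed by CRLF.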