| java.lang.Object org.archive.io.ArchiveReader org.archive.io.warc.v10.WARCReader
createArchiveRecord | protected WARCRecord createArchiveRecord(InputStream is, long offset) throws IOException(Code) | | Create new WARC record.
Encapsulates the housekeeping involved in creating a new record.
Parameters: is - InputStream to use. Parameters: offset - Absolute offset into the WARC file. Returns: A WARCRecord. throws: IOException - |
getDotFileExtension | public String getDotFileExtension()(Code) | | |
gotoEOR | protected void gotoEOR(ArchiveRecord record) throws IOException(Code) | | Skip over any trailing newlines at the end of the record so that the reader is lined up
and ready to read the next record.
Parameters: record - throws: IOException - |
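The newline skipping described for gotoEOR can be illustrated with a small standalone sketch. This is not the WARCReader implementation: the class EndOfRecordSketch and the method skipTrailingNewlines are hypothetical names, and the sketch assumes the record terminators are plain CR and LF bytes read from a PushbackInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.archive.io: shows one way to line a
// stream up on the first byte that follows a run of trailing newlines.
final class EndOfRecordSketch {

    /**
     * Consumes CR and LF bytes until a different byte or end of stream is seen.
     * The first non-newline byte is pushed back so the caller reads it next.
     *
     * @return the number of newline bytes skipped
     */
    static int skipTrailingNewlines(PushbackInputStream in) throws IOException {
        int skipped = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b != '\r' && b != '\n') {
                in.unread(b); // not a newline: give it back to the stream
                break;
            }
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: two CRLF pairs followed by the next record's bytes.
        byte[] data = "\r\n\r\nNEXT".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        int skipped = skipTrailingNewlines(in);
        System.out.println("skipped=" + skipped + " next=" + (char) in.read());
    }
}
```

Pushing the first non-newline byte back keeps the stream positioned exactly at the start of the following record, which is the alignment the method description asks for.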
main | public static void main(String[] args) throws ParseException, IOException, java.text.ParseException(Code) | | Command-line interface to WARCReader.
Here is the command-line interface:
usage: java org.archive.io.warc.v10.WARCReader [--offset=#] WARCFILE
-h,--help Prints this message and exits.
-o,--offset Outputs record at this offset into the WARC file.
Outputs using a pseudo-CDX format as described here: CDX Legend and here: Example.
The legend used below is: 'CDX b e a m s c V (or v if uncompressed) n g'.
The hash is hard-coded as a plain SHA-1 hash of the content.
Parameters: args - Command-line arguments. throws: ParseException - Failed parse of the command line. throws: IOException - throws: java.text.ParseException - |
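The "plain SHA-1 hash of the content" mentioned above can be sketched with the standard java.security.MessageDigest API. This is an illustration only: Sha1Sketch and sha1Hex are hypothetical names, and the hexadecimal rendering is an assumption, since this page does not say how WARCReader encodes the digest in its CDX output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of org.archive.io: computes a SHA-1 digest
// over a byte array and renders it as lowercase hexadecimal (assumed encoding).
final class Sha1Sketch {

    static String sha1Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte yields exactly two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha1Hex(content));
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```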
outputRecord | protected static void outputRecord(WARCReader r, String format) throws IOException(Code) | | Output the passed record using the passed format specifier.
Parameters: r - WARCReader instance to output. Parameters: format - Format to use for the output. throws: IOException - |
Fields inherited from org.archive.io.ArchiveReader | final public static int MAX_ALLOWED_RECOVERABLES(Code)(Java Doc)
|