org.archive.io.warc package
Experimental WARC writer and readers. Code and specification are subject to change
with no guarantees of backward compatibility: i.e. newer readers
may not be able to parse WARCs written with older writers. This package
contains prototyping code for revision 0.12 of the WARC specification.
See the latest revision of the specification
for its current state (version 0.10 code and its documentation have been moved into the
v10 subpackage).
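Because files from different revisions are not guaranteed to be mutually readable, a reader has to check which revision a record claims before it parses the rest. The sketch below shows one way to gate on the version in a record's lead-in line. It assumes the lead-in begins with a `WARC/<version>` token (as in the `WARC/0.12` form mentioned in the TODO list) and ignores anything after the first whitespace; the class and method names are hypothetical and are not part of this package's API.

```java
/**
 * Hypothetical sketch: decide whether a record can be handled by 0.12 code
 * by inspecting its lead-in line. Assumes the line starts with "WARC/<version>".
 */
public class WarcVersionCheck {
    // The single revision this sketch claims to understand.
    static final String SUPPORTED = "0.12";

    /** Extracts the version token from a lead-in such as "WARC/0.12 ...". */
    static String versionOf(String leadIn) {
        // Only the first whitespace-delimited token is examined.
        String first = leadIn.trim().split("\\s+", 2)[0];
        if (!first.startsWith("WARC/")) {
            throw new IllegalArgumentException("Not a WARC lead-in: " + leadIn);
        }
        return first.substring("WARC/".length());
    }

    /** True only for 0.12; anything else (e.g. 0.10) needs other code, such as the v10 subpackage. */
    static boolean isSupported(String leadIn) {
        return SUPPORTED.equals(versionOf(leadIn));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("WARC/0.12")); // true
        System.out.println(isSupported("WARC/0.10")); // false
    }
}
```

Failing fast on an unrecognized version is a deliberate choice here: with no compatibility guarantees, guessing at an older layout risks misreading the record silently.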
Implementation Notes
Tools
Initial implementations of the Arc2Warc and Warc2Arc
tools can be found in the parent package, at
{@link org.archive.io.Arc2Warc} and {@link org.archive.io.Warc2Arc}
respectively. Pass --help to either tool to see its usage.
TODO
- Is a MIME-Version header needed? MIME parsers (the Python email
library and JavaMail) seem fine without one.
- Should we write out a Content-Transfer-Encoding
header? (Currently we do not.) The spec needs a section that is explicit about our
interpretation of MIME and our deviations from it (e.g. content-transfer-encoding should
be assumed binary in the case of WARCs; multipart is not disallowed but is not
encouraged, etc.).
- Minor: should the lead-in to an ARCRecord be written as a header,
WARC-Version: 0.12, like MIME-Version: 1.0, rather than as WARC/0.12?
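To make the trade-off in the last TODO item concrete, here is a minimal sketch of a MIME-style header parser. With a `WARC-Version: 0.12` field, the version is just another "Name: value" line and needs no special case; with a `WARC/0.12` first line, the parser must strip and interpret that line separately. The sketch also reflects the first two items: it requires neither MIME-Version nor Content-Transfer-Encoding, and it stops at the blank line so the content that follows can be read as raw bytes. The class, method names, and sample headers are hypothetical, not this package's API; field names are matched case-insensitively, as in MIME.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch comparing the two lead-in forms discussed in the TODO list. */
public class HeaderLeadIn {
    /** Parses "Name: value" lines up to the first blank line (CRLF line endings assumed). */
    static Map<String, String> parseFields(String block) {
        // MIME field names are case-insensitive, so use a case-insensitive map.
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : block.split("\r\n")) {
            if (line.isEmpty()) {
                break; // end of headers; the content that follows is treated as binary
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Not a header field: " + line);
            }
            fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return fields;
    }

    /** Current form: the first line is "WARC/<version>", which is not a field and needs special handling. */
    static Map<String, String> parseWithLeadInLine(String block) {
        int eol = block.indexOf("\r\n");
        if (eol < 0 || !block.startsWith("WARC/")) {
            throw new IllegalArgumentException("Missing WARC/ lead-in line");
        }
        Map<String, String> fields = parseFields(block.substring(eol + 2));
        // Normalize the lead-in into the same field the header-style form would carry.
        fields.put("WARC-Version", block.substring("WARC/".length(), eol).trim());
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> headerStyle =
            parseFields("WARC-Version: 0.12\r\nContent-Type: text/plain\r\n\r\n");
        Map<String, String> leadInStyle =
            parseWithLeadInLine("WARC/0.12\r\nContent-Type: text/plain\r\n\r\n");
        System.out.println(headerStyle.equals(leadInStyle)); // true
    }
}
```

Both forms carry the same information; the header-style form simply lets generic MIME tooling read the version without knowing anything about WARC.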