WARC Constants used by WARC readers and writers.
The constants below are used by versions 0.10 and 0.12 of the WARC Reader/Writer.
Author: stack
Version: $Revision: 4976 $ $Date: 2007-03-09 13:59:07 +0000 (Fri, 09 Mar 2007) $
Encoding to use when getting bytes from strings.
Specify an encoding explicitly rather than leave it to chance, i.e. to whatever
the JVM's default encoding happens to be. Use an encoding that gets the stream as bytes, not chars.
TODO: ARC uses ISO-8859-1. In general, we should use UTF-8, but we
probably need a single-byte encoding if we want to preserve the
binary data as received over the net (we probably don't want to transform
the supra-ASCII characters to UTF-8 before storing in ARC). For now,
until we figure it out, DEFAULT_ENCODING is a single-byte charset, the same as
ARC's.
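The trade-off described in the TODO can be checked directly. The following is a minimal sketch, not code from the WARC reader/writer itself: the class name `EncodingRoundTrip` and the method `roundTrips` are hypothetical. It decodes every possible byte value to a String and encodes it back, once with ISO-8859-1 and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the documented API.
final class EncodingRoundTrip {

    /** Decodes raw bytes to a String and encodes them back; true if the bytes survive unchanged. */
    static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // All 256 possible byte values, standing in for arbitrary data received over the net.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        System.out.println("ISO-8859-1 lossless: " + roundTrips(raw, StandardCharsets.ISO_8859_1));
        // Bytes 0x80-0xFF are not valid stand-alone UTF-8 and get replaced on decoding.
        System.out.println("UTF-8 lossless: " + roundTrips(raw, StandardCharsets.UTF_8));
    }
}
```

The single-byte charset preserves every byte, while UTF-8 decoding substitutes a replacement character for the stand-alone high bytes, which is the reason the comment above gives for staying with ARC's encoding for now.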
DEFAULT_MAX_WARC_FILE_SIZE
final public static int DEFAULT_MAX_WARC_FILE_SIZE(Code)
Default maximum WARC file size: 1 gigabyte.
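As a rough illustration of how a writer might apply such a limit, here is a sketch under stated assumptions: the documented size is taken to mean 1024 * 1024 * 1024 bytes, and the class `WarcRolloverSketch` with its `needsNewFile` check is hypothetical. The actual writer may decide when to roll over to a new file differently.

```java
// Hypothetical sketch; the real writer's rollover logic is not shown in this documentation.
final class WarcRolloverSketch {

    // Assumption: one gigabyte is taken as 2^30 bytes, which still fits in an int.
    static final int MAX_WARC_FILE_SIZE = 1024 * 1024 * 1024;

    /** True if appending the next record would push the current file past the limit. */
    static boolean needsNewFile(long currentFileSize, long nextRecordSize) {
        // long arithmetic avoids int overflow when the two sizes are added.
        return currentFileSize + nextRecordSize > MAX_WARC_FILE_SIZE;
    }

    public static void main(String[] args) {
        long current = MAX_WARC_FILE_SIZE - 10L;
        System.out.println(needsNewFile(current, 10L)); // record fits exactly
        System.out.println(needsNewFile(current, 11L)); // record would exceed the limit
    }
}
```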
DOT_COMPRESSED_FILE_EXTENSION
final public static String DOT_COMPRESSED_FILE_EXTENSION(Code)
DOT_COMPRESSED_WARC_FILE_EXTENSION
final public static String DOT_COMPRESSED_WARC_FILE_EXTENSION(Code)
Compressed WARC file extension, including the leading dot.
DOT_WARC_FILE_EXTENSION
final public static String DOT_WARC_FILE_EXTENSION(Code)
WARC file extension, including the leading dot.
HEADER_FIELD_KEYS
final public static String[] HEADER_FIELD_KEYS(Code)
HEADER_FIELD_SEPARATOR
final public static char HEADER_FIELD_SEPARATOR(Code)
Header field separator character.
HEADER_KEY_CHECKSUM
final public static String HEADER_KEY_CHECKSUM(Code)
HEADER_KEY_CONCURRENT_TO
final public static String HEADER_KEY_CONCURRENT_TO(Code)
HEADER_LINE_ENCODING
final public static String HEADER_LINE_ENCODING(Code)
HTTP_REQUEST_MIMETYPE
final public static String HTTP_REQUEST_MIMETYPE(Code)
To be safe, let's use the 'application' type rather than 'message'. Regarding
'message/http', the RFC says "...provided that it obeys the MIME restrictions
for all 'message' types regarding line length and encodings." This
usually means lines of 1000 octets max (unless a
'Content-Transfer-Encoding: binary' MIME header is present).
See Also: RFC 2616, section 19.1
HTTP_RESPONSE_MIMETYPE
final public static String HTTP_RESPONSE_MIMETYPE(Code)
NAMED_FIELD_CHECKSUM_LABEL
final public static String NAMED_FIELD_CHECKSUM_LABEL(Code)
NAMED_FIELD_DESCRIPTION
final public static String NAMED_FIELD_DESCRIPTION(Code)
NAMED_FIELD_FILEDESC
final public static String NAMED_FIELD_FILEDESC(Code)
NAMED_FIELD_IP_LABEL
final public static String NAMED_FIELD_IP_LABEL(Code)
NAMED_FIELD_RELATED_LABEL
final public static String NAMED_FIELD_RELATED_LABEL(Code)
NAMED_FIELD_TRUNCATED
final public static String NAMED_FIELD_TRUNCATED(Code)
NAMED_FIELD_TRUNCATED_VALUE_HEAD
final public static String NAMED_FIELD_TRUNCATED_VALUE_HEAD(Code)
NAMED_FIELD_TRUNCATED_VALUE_LEN
final public static String NAMED_FIELD_TRUNCATED_VALUE_LEN(Code)
NAMED_FIELD_TRUNCATED_VALUE_TIME
final public static String NAMED_FIELD_TRUNCATED_VALUE_TIME(Code)
NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED
final public static String NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED(Code)
NAMED_FIELD_WARCFILENAME
final public static String NAMED_FIELD_WARCFILENAME(Code)
PLACEHOLDER_RECORD_LENGTH_STRING
final public static String PLACEHOLDER_RECORD_LENGTH_STRING(Code)
Placeholder for the length in the header line.
The placeholder is the same size as the fixed-width field allocated for the length:
12 characters, which allows records of almost 1 TB in size.
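A fixed-width placeholder lets a writer emit the header line before the record length is known and then overwrite the field in place. The sketch below illustrates that idea only. The placeholder content, the zero-padded formatting, the sample header text, and the class `LengthPlaceholderSketch` are all assumptions; the documentation confirms just the 12-character width. Twelve decimal digits top out at 999,999,999,999 bytes, which is where the "almost 1 TB" figure comes from.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch; only the 12-character field width comes from the documentation.
final class LengthPlaceholderSketch {

    static final int LENGTH_FIELD_WIDTH = 12;
    // Assumed placeholder content; the real constant's value is not shown here.
    static final String PLACEHOLDER = "000000000000";
    // Largest value that fits in 12 decimal digits, just under 10^12 bytes.
    static final long MAX_RECORD_LENGTH = 999_999_999_999L;

    /** Formats a length as exactly 12 zero-padded decimal digits. */
    static String formatLength(long length) {
        if (length < 0 || length > MAX_RECORD_LENGTH) {
            throw new IllegalArgumentException("Length does not fit in 12 digits: " + length);
        }
        return String.format(Locale.ROOT, "%0" + LENGTH_FIELD_WIDTH + "d", length);
    }

    /** Overwrites the placeholder at the given offset with the real length, in place. */
    static byte[] patchLength(byte[] record, int offset, long length) {
        byte[] field = formatLength(length).getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(field, 0, record, offset, field.length);
        return record;
    }

    public static void main(String[] args) {
        // "HDR " is a made-up prefix standing in for whatever precedes the length field.
        byte[] record = ("HDR " + PLACEHOLDER + " body").getBytes(StandardCharsets.ISO_8859_1);
        patchLength(record, 4, record.length);
        System.out.println(new String(record, StandardCharsets.ISO_8859_1));
    }
}
```

Because the placeholder and the final value have the same width, patching the length never shifts the bytes that follow it.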
PROFILE_CONVERSION_SOFTWARE_COMMAND
final public static String PROFILE_CONVERSION_SOFTWARE_COMMAND(Code)
PROFILE_REVISIT_IDENTICAL_DIGEST
final public static String PROFILE_REVISIT_IDENTICAL_DIGEST(Code)
PROFILE_REVISIT_NOT_MODIFIED
final public static String PROFILE_REVISIT_NOT_MODIFIED(Code)