java.lang.Object
   org.apache.harmony.pack200.Codec
All known Subclasses: org.apache.harmony.pack200.BHSDCodec, org.apache.harmony.pack200.RunCodec, org.apache.harmony.pack200.PopulationCodec
Codec | abstract public class Codec
A codec allows a sequence of bytes to be decoded into integer values (or vice
versa). It uses a variable-length encoding and a modified sign representation
such that small numbers are represented as a single byte, whilst larger
numbers take more bytes to encode. The number may be signed or unsigned; if
it is signed, it can be weighted towards positive numbers or equally
distributed using a one's complement. The codec also supports delta coding,
where a sequence of numbers is represented as a series of first-order
differences. So a delta encoding of the integers [1..10] would be represented
as a sequence of ten 1s. This allows the absolute value of a coded integer to
fall outside of the 'small number' range, whilst still being encoded as a
single byte.
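As an illustration of the delta coding described above, the following minimal sketch turns a sequence into first-order differences and back. The class DeltaSketch and its methods are hypothetical helpers written for this example; they are not part of org.apache.harmony.pack200, and the starting value of 0 is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the Pack200 classes.
final class DeltaSketch {
    // Replace each value with its difference from the previous one (assumed start: 0).
    static long[] toDeltas(long[] values) {
        long[] deltas = new long[values.length];
        long previous = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous;
            previous = values[i];
        }
        return deltas;
    }

    // Rebuild the absolute values by accumulating the differences.
    static long[] fromDeltas(long[] deltas) {
        long[] values = new long[deltas.length];
        long running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long[] deltas = toDeltas(values);
        // prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(deltas));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(Arrays.toString(fromDeltas(deltas)));
    }
}
```

Every difference here is 1, so each would fit in a single byte even if the absolute values grew large.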
A codec is configured with four parameters:
- B
- The maximum number of bytes that each value is encoded as. B must be a
value in the range [1..5]. For a pass-through coding (where each byte is encoded
as itself, aka Codec.BYTE1), B is 1 (each byte takes a maximum of 1 byte).
- H
- The radix of the integer. Values are defined as a sequence of values,
where value n is multiplied by H^n.
So the number 1234 may be represented as the sequence 4 3 2 1 with a radix
(H) of 10. Note that other sequences are also possible; 4 3 12 will also
encode 1234 (4 + 3*10 + 12*100). The co-parameter L is defined as 256-H.
This is important because only the last value in a sequence may be < L; all
prior values must be >= L.
- S
- Whether the codec represents signed values (or not). This may have 3
values: 0 (unsigned), 1 (signed, one's complement, with negative and
non-negative values equally distributed) or 2 (signed, but weighted towards
non-negative values, as in Codec.MDELTA5).
- D
- Whether the codec represents a delta encoding. This may be 0 (no delta)
or 1 (delta encoding). A delta encoding of 1 indicates that values are
cumulative; a sequence of 1 1 1 1 1 will represent the
sequence 1 2 3 4 5. For this reason, the codec supports two
variants of decode: Codec.decode(InputStream,long), which takes the last
value as a parameter, and Codec.decode(InputStream), which does not. If the
codec is a non-delta encoding, then the last value is ignored if passed. If the
codec is a delta encoding, it is a run-time error to call decode without
the extra parameter, and the previously decoded value should be passed in.
(It was designed this way to support multi-threaded access without requiring
a new instance of the Codec to be cloned for each use.)
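The B, H and S parameters above can be sketched as follows. This is a minimal illustration and not the BHSDCodec implementation: the class BhsSketch and its methods are hypothetical, and the sign mapping in applySign (a value is negative when its low S bits are all set) is an assumption based on a reading of the Pack200 specification rather than on anything stated in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; the real decoding logic lives in BHSDCodec and may differ.
final class BhsSketch {
    // Reads at most b bytes. Byte n contributes x * h^n, and a byte
    // below L = 256 - h marks the end of the value.
    static long decodeUnsigned(InputStream in, int b, int h) throws IOException {
        int l = 256 - h;
        long value = 0;
        long weight = 1; // h^n for the current byte
        for (int n = 0; n < b; n++) {
            int x = in.read();
            if (x < 0) {
                throw new EOFException("stream ended inside a value");
            }
            value += x * weight;
            weight *= h;
            if (x < l) {
                break; // last byte of this value
            }
        }
        return value;
    }

    // Assumed sign mapping: if the low s bits are all 1 the value is negative.
    static long applySign(long u, int s) {
        if (s == 0) {
            return u; // unsigned
        }
        long mask = (1L << s) - 1;
        if ((u & mask) == mask) {
            return ~(u >>> s);
        }
        return u - (u >>> s);
    }

    public static void main(String[] args) throws IOException {
        // (5,64): L is 192, so 210 continues and 16 terminates: 210 + 16*64.
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 210, 16});
        System.out.println(decodeUnsigned(in, 5, 64)); // prints 1234
        // Under the assumed mapping, S=1 alternates 0, -1, 1, -2, 2.
        for (long u = 0; u < 5; u++) {
            System.out.print(applySign(u, 1) + " ");
        }
        System.out.println();
    }
}
```

Under this assumed mapping, S=2 yields 0, 1, 2, -1, 3, 4, 5, -2 for the unsigned inputs 0 to 7, which is consistent with the description of a coding weighted towards non-negative values.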
Codecs are notated as (B,H,S,D) and either D or S,D may be omitted if zero.
Thus Codec.BYTE1 is denoted (1,256,0,0) or (1,256). The
Codec.toString() method prints out the condensed form of the encoding.
Often, the last character in the name (Codec.BYTE1, Codec.UNSIGNED5)
gives a clue as to the B value. Those that start with U (Codec.UDELTA5,
Codec.UNSIGNED5) are unsigned; otherwise, in most cases, they are signed.
The presence of the word Delta (Codec.DELTA5, Codec.UDELTA5)
indicates a delta encoding is used.
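The condensed notation can be illustrated with a small formatter. This is only a sketch of the rule stated above (drop D when it is zero, drop S as well when both are zero); NotationSketch and notation are hypothetical names, and the actual output of Codec.toString() may differ in detail.

```java
// Hypothetical formatter for the (B,H,S,D) notation; not the library's toString().
final class NotationSketch {
    static String notation(int b, int h, int s, int d) {
        StringBuilder sb = new StringBuilder();
        sb.append('(').append(b).append(',').append(h);
        // S must be written whenever D follows it, even if S itself is zero.
        if (s != 0 || d != 0) {
            sb.append(',').append(s);
        }
        if (d != 0) {
            sb.append(',').append(d);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(notation(1, 256, 0, 0)); // prints (1,256)
        System.out.println(notation(5, 64, 1, 0));  // prints (5,64,1)
        System.out.println(notation(5, 64, 0, 1));  // prints (5,64,0,1)
    }
}
```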
This codec is really quite cool for storing compressed information, and could
be used entirely separately from the Pack200 implementation for efficient
transfer of integer data if required.
Note that all information is byte-oriented; for decoding float/double
information, the bit values are converted (not cast) into a long type. Note
that long values are used throughout even though most may be cast to ints;
this is primarily to avoid having to worry about signed values, even though
using ints would be more efficient.
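The difference between converting bit values and casting can be shown with the standard java.lang.Float and java.lang.Double methods. This is a sketch of one plausible reading of the note above; BitsSketch and its method names are hypothetical and are not taken from the Pack200 classes.

```java
// Hypothetical helpers showing bit conversion as opposed to numeric casting.
final class BitsSketch {
    // Keep the 32 raw IEEE 754 bits of a float in the low half of a long.
    static long floatToBits(float f) {
        return Float.floatToIntBits(f) & 0xFFFFFFFFL;
    }

    static float bitsToFloat(long bits) {
        return Float.intBitsToFloat((int) bits);
    }

    static long doubleToBits(double d) {
        return Double.doubleToLongBits(d);
    }

    static double bitsToDouble(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        float f = 1.5f;
        System.out.println((long) f);                             // cast truncates: prints 1
        System.out.println(Long.toHexString(floatToBits(f)));     // prints 3fc00000
        System.out.println(bitsToFloat(floatToBits(f)));          // prints 1.5
    }
}
```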
There are a number of standard codecs (Codec.UDELTA5, Codec.UNSIGNED5,
Codec.BYTE1, Codec.CHAR3) that are used in the implementation of many
bands; but there are a variety of other ones, and indeed the specification
assumes that other combinations of values can result in more specific and
efficient formats. There is also a sequence of canonical encodings defined
by the Pack200 specification, which allow a codec to be referred to by
canonical number (see CodecEncoding.canonicalCodec).
Field Summary
final public static BHSDCodec | BCI5 | BCI5 = (5,4): Used for storing branching information in bytecode.
final public static BHSDCodec | BRANCH5 | BRANCH5 = (5,4,2): Used for storing branching information in bytecode.
final public static BHSDCodec | BYTE1 | BYTE1 = (1,256): Used for storing plain bytes.
final public static BHSDCodec | CHAR3 | CHAR3 = (3,128): Used for storing text (UTF-8) strings.
final public static BHSDCodec | DELTA5 | DELTA5 = (5,64,1,1): Used for the majority of numerical codings where there is a correlated sequence of signed values.
final public static BHSDCodec | MDELTA5 | MDELTA5 = (5,64,2,1): Used for the majority of numerical codings where there is a correlated sequence of signed values, but where most of them are expected to be non-negative.
final public static BHSDCodec | SIGNED5 | SIGNED5 = (5,64,1): Used for small signed values.
final public static BHSDCodec | UDELTA5 | UDELTA5 = (5,64,0,1): Used for the majority of numerical codings where there is a correlated sequence of unsigned values.
final public static BHSDCodec | UNSIGNED5 | UNSIGNED5 = (5,64): Used for small unsigned values.
Method Summary
abstract public long | decode(InputStream in) | Decode a sequence of bytes from the given input stream, returning the value as a long.
abstract public long | decode(InputStream in, long last) | Decode a sequence of bytes from the given input stream, returning the value as a long.
public long[] | decode(int n, InputStream in) | Decodes a sequence of n values from in.
public long[] | decode(int n, InputStream in, long firstValue) | Decodes a sequence of n values from in.
BCI5 | final public static BHSDCodec BCI5 | BCI5 = (5,4): Used for storing branching information in bytecode.
BRANCH5 | final public static BHSDCodec BRANCH5 | BRANCH5 = (5,4,2): Used for storing branching information in bytecode.
BYTE1 | final public static BHSDCodec BYTE1 | BYTE1 = (1,256): Used for storing plain bytes.
CHAR3 | final public static BHSDCodec CHAR3 | CHAR3 = (3,128): Used for storing text (UTF-8) strings. NB This isn't
quite the same as UTF-8, but has similar properties; ASCII characters
(values <= 127) are stored in a single byte.
DELTA5 | final public static BHSDCodec DELTA5 | DELTA5 = (5,64,1,1): Used for the majority of numerical codings where
there is a correlated sequence of signed values.
MDELTA5 | final public static BHSDCodec MDELTA5 | MDELTA5 = (5,64,2,1): Used for the majority of numerical codings where
there is a correlated sequence of signed values, but where most of them
are expected to be non-negative.
SIGNED5 | final public static BHSDCodec SIGNED5 | SIGNED5 = (5,64,1): Used for small signed values.
UDELTA5 | final public static BHSDCodec UDELTA5 | UDELTA5 = (5,64,0,1): Used for the majority of numerical codings where
there is a correlated sequence of unsigned values.
UNSIGNED5 | final public static BHSDCodec UNSIGNED5 | UNSIGNED5 = (5,64): Used for small unsigned values.
decode | abstract public long decode(InputStream in) throws IOException, Pack200Exception | Decode a sequence of bytes from the given input stream, returning the
value as a long. Note that this method can only be applied for non-delta
encodings.
Parameters: in - the input stream to read from
Returns: the value as a long
Throws: IOException - if there is a problem reading from the underlying input stream
Throws: Pack200Exception - if the encoding is a delta encoding
decode | abstract public long decode(InputStream in, long last) throws IOException, Pack200Exception | Decode a sequence of bytes from the given input stream, returning the
value as a long. If this encoding is a delta encoding (D=1) then the
previous value must be passed in as a parameter. If it is a non-delta
encoding, then it does not matter what value is passed in, so it makes
sense for the value to be passed in by default using code similar to:
    long last = 0;
    while (condition) {
        last = codec.decode(in, last);
        // do something with last
    }
Parameters: in - the input stream to read from
Parameters: last - the previous value read, which must be supplied if the codec is a delta encoding
Returns: the value as a long
Throws: IOException - if there is a problem reading from the underlying input stream
Throws: Pack200Exception - if there is a problem decoding the value or the value is invalid
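The calling pattern for decode(InputStream, long) can be tried out with a stand-in decoder. The sketch below is an assumption-laden illustration: LastValueSketch is hypothetical, and its decode method simply reads one unsigned byte and adds it to last, which is far simpler than a real delta codec. It only shows that the caller, not the codec, carries the running value.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stand-in for a delta codec; not the Pack200 implementation.
final class LastValueSketch {
    // One unsigned byte per value, interpreted as a difference from `last`.
    static long decode(InputStream in, long last) throws IOException {
        int x = in.read();
        if (x < 0) {
            throw new EOFException("no more values");
        }
        return last + x;
    }

    // The caller owns `last`, so the decoder itself holds no state.
    static long[] decodeAll(InputStream in, int n) throws IOException {
        long[] out = new long[n];
        long last = 0;
        for (int i = 0; i < n; i++) {
            last = decode(in, last);
            out[i] = last;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 1, 1, 1, 1});
        // prints [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(decodeAll(in, 5)));
    }
}
```

Because the running value is a local variable of the caller, the same decoder can be shared between threads, which matches the design rationale given for the two decode variants.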
decode | public long[] decode(int n, InputStream in) throws IOException, Pack200Exception | Decodes a sequence of n values from in.
This should probably be used in most cases, since some codecs
(such as PopulationCodec) only work when the number of values
to be read is known.
Parameters: n - the number of values to decode
Parameters: in - the input stream to read from
Returns: an array of long values corresponding to values decoded
Throws: IOException - if there is a problem reading from the underlying input stream
Throws: Pack200Exception - if there is a problem decoding the value or the value is invalid
decode | public long[] decode(int n, InputStream in, long firstValue) throws IOException, Pack200Exception | Decodes a sequence of n values from in.
Parameters: n - the number of values to decode
Parameters: in - the input stream to read from
Parameters: firstValue - the first value in the band if it has already been read
Returns: an array of long values corresponding to values decoded, with firstValue as the first value in the array.
Throws: IOException - if there is a problem reading from the underlying input stream
Throws: Pack200Exception - if there is a problem decoding the value or the value is invalid