it.unimi.dsi.mg4j.io |
MG4J: Managing Gigabytes for Java
Bit-level I/O classes.
Package Specification
The standard Java API lacks bit-level I/O classes. To fill this gap, MG4J
provides {@link it.unimi.dsi.mg4j.io.InputBitStream} and {@link
it.unimi.dsi.mg4j.io.OutputBitStream}, which wrap the corresponding
standard Java stream and make it work at the bit level; moreover, they
provide support for several useful formats (such as unary, binary, minimal
binary, γ, δ and Golomb encoding).
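As an illustration of one of the formats listed above, the following self-contained sketch computes the γ code of a natural number as a string of bits. It does not use the MG4J classes: the class name GammaSketch and its methods are hypothetical, and the conventions chosen (unary as n zeros followed by a one, γ applied to x + 1 so that zero can be encoded) are assumptions of this sketch, not a statement about the bit layout MG4J actually writes.

```java
public class GammaSketch {
    // Unary code of a natural number n: n zeros followed by a one
    // (a convention assumed for this sketch).
    static String unary(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('0');
        return sb.append('1').toString();
    }

    // Gamma code of a natural number x (0 <= x < Integer.MAX_VALUE):
    // shift to v = x + 1, write the index of the most significant bit of v
    // in unary, then the remaining lower bits of v in binary.
    static String gamma(int x) {
        int v = x + 1;
        int msb = 31 - Integer.numberOfLeadingZeros(v);
        StringBuilder sb = new StringBuilder(unary(msb));
        for (int i = msb - 1; i >= 0; i--) sb.append((v >>> i) & 1);
        return sb.toString();
    }

    // Decodes a single gamma code word produced by gamma().
    static int decodeGamma(String bits) {
        int msb = bits.indexOf('1');  // number of leading zeros = unary part
        int v = 1;                    // implicit most significant bit
        for (int i = 1; i <= msb; i++) v = (v << 1) | (bits.charAt(msb + i) - '0');
        return v - 1;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 6; x++) System.out.println(x + " -> " + gamma(x));
    }
}
```

The code is self-delimiting: the unary prefix tells the decoder how many binary digits follow, so code words can be concatenated without separators.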
Compression can be achieved using the self-delimiting formats supported by
the classes above, or by arithmetic coding, using the
classes {@link it.unimi.dsi.mg4j.io.ArithmeticCoder} and {@link
it.unimi.dsi.mg4j.io.ArithmeticDecoder}. Note that arithmetic coding is
not very efficient in the present implementation, as it does not allow a
varying number of symbols.
Bit input and output streams also offer efficient buffering and a way to
reposition the bit stream when the underlying byte stream is a
file-based stream or a {@link it.unimi.dsi.fastutil.io.RepositionableStream}.
Conventions
All coding methods work on natural numbers. The
encoding of zero is very natural for some techniques, and much less natural
for others. To keep the methods uniform, every method can encode any natural
number, zero included. If, for instance, you want to write positive
numbers in unary encoding and you do not want to waste a bit, you have to
decrement them first (i.e., instead of p you must encode
p−1).
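The effect of this convention can be checked with a small, self-contained sketch. It does not call MG4J: the class UnaryConventionSketch is hypothetical, and it assumes that the unary code of a natural number n occupies n + 1 bits (here, n zeros followed by a one). Under that assumption, decrementing each positive number before encoding saves exactly one bit per number.

```java
public class UnaryConventionSketch {
    // Unary code of a natural number n: n zeros followed by a one
    // (an assumed convention; the point is only that it takes n + 1 bits).
    static String unary(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('0');
        return sb.append('1').toString();
    }

    // Total number of bits needed to write the given positive numbers,
    // either as they are or decremented by one first.
    static int totalBits(int[] positives, boolean decrement) {
        int bits = 0;
        for (int p : positives) bits += unary(decrement ? p - 1 : p).length();
        return bits;
    }

    public static void main(String[] args) {
        int[] positives = {1, 2, 3};
        System.out.println("as is:       " + totalBits(positives, false) + " bits");
        System.out.println("decremented: " + totalBits(positives, true) + " bits");
    }
}
```

A decoder must of course undo the shift by adding one to every value it reads.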
|
Java Source File Name | Type | Comment |
ArithmeticCoder.java | Class | An arithmetic coder.
This class provides an arithmetic coder. |
ArithmeticDecoder.java | Class | An arithmetic decoder.
This class provides an arithmetic decoder. |
ByteArrayPostingList.java | Class | Lightweight posting accumulator with a format similar to that generated by
BitStreamIndexWriter.
This class is essentially a dirty trick: it borrows some code and precomputed tables from
OutputBitStream and exposes two simple methods
(ByteArrayPostingList.setDocumentPointer(int) and
ByteArrayPostingList.addPosition(int)) with obvious
semantics. |
ByteBufferInputStream.java | Class | A bridge between byte buffers and input streams. |
DebugInputBitStream.java | Class | A debugging wrapper for input bit streams.
This class can be used to wrap an input bit stream. |
DebugOutputBitStream.java | Class | A debugging wrapper for output bit streams.
This class can be used to wrap an output bit stream. |
FastBufferedReader.java | Class | A lightweight, unsynchronised buffered reader based on mutable strings.
This class provides buffering for readers, but its purposes and internal
logic are radically different from those of
java.io.BufferedReader.
There is no support for marking. |
FastByteArrayInputStream.java | Class | Simple, fast and repositionable byte-array input stream. |
FastByteArrayOutputStream.java | Class | Simple, fast byte-array output stream that exposes the backing array.
java.io.ByteArrayOutputStream is nice, but to get its content you
must generate a new object each time. |
FastMultiByteArrayInputStream.java | Class | Simple, fast and repositionable byte array input stream that multiplexes its content among several arrays.
This class is significantly slower than
it.unimi.dsi.mg4j.io.FastByteArrayInputStream,
but it can hold 256 PiB of data. |
FileLinesCollection.java | Class | A wrapper exhibiting the lines of a file as a
java.util.Collection.
Warning: the lines returned by iterators generated by
instances of this class are not cacheable. |
InputBitStream.java | Class | Bit-level input stream.
Warning: for the sake of simplicity and efficiency, and because they were of little use,
the overflowing and unget features were removed in MG4J 1.1. |
InterpolativeCoding.java | Class | Static methods implementing interpolative coding.
Interpolative coding is a sophisticated compression technique that can be
applied to increasing sequences of integers. |
LineIterator.java | Class | An adapter that exposes a fast buffered reader as an iterator
over the returned lines. |
LineWordReader.java | Class | A trivial
it.unimi.dsi.mg4j.io.WordReader that considers each line
of a document a single word.
This class is intended for indexing material such as lists of document
identifiers: if the identifiers contain nonalphabetical characters, the default
it.unimi.dsi.mg4j.io.FastBufferedReader might do a poor job. |
MultipleInputStream.java | Class | A multiple input stream. |
NullInputStream.java | Class | End-of-stream-only input stream.
This stream has length 0, and will always return end-of-file on any read attempt.
This class is a singleton. |
NullOutputStream.java | Class | Throw-it-away output stream.
This stream discards whatever is written into it. |
NullReader.java | Class | End-of-stream-only reader.
This reader will always return end-of-file on any read attempt.
This class is a singleton. |
OutputBitStream.java | Class | Bit-level output stream.
This class wraps any
OutputStream so that you can treat it as
a bit stream. |
SafelyCloseable.java | Interface | A marker interface for a closeable resource that implements safety measures to
make resource tracking easier.
Classes implementing this interface must provide a safety-net finaliser, that is, a
finaliser that closes the resource and logs that the resource should have been closed.
When the implementing class is abstract, concrete subclasses must
call super.close() in their own
java.io.Closeable.close method
to let the abstract class track the resource correctly. |
SegmentedInputStream.java | Class | Exhibits a single
InputStream as a number of streams divided into
reset()-separated segments.
An instance of this class wraps a given input stream (usually a replicable one, such as
a java.io.FileInputStream) and exposes its contents as a number of separated input
streams. |
WordReader.java | Interface | An interface providing methods to break the input from a reader into words.
The intended implementations of this interface should decorate
a given reader (see, for instance,
it.unimi.dsi.mg4j.io.FastBufferedReader).
The reader can be changed at any time using
WordReader.setReader(Reader).
This interface is heavily oriented towards reusability and
streaming. |