| java.lang.Object it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter it.unimi.dsi.mg4j.index.BitStreamIndexWriter
All known subclasses: it.unimi.dsi.mg4j.index.SkipBitStreamIndexWriter
BitStreamIndexWriter | public class BitStreamIndexWriter extends AbstractBitStreamIndexWriter | Writes a bitstream-based interleaved index.
Offsets bit stream
An inverted index may have an associated OutputBitStream of offsets. This file contains T+1 integers, where T
is the number of inverted lists (i.e., the number of terms); the
i-th entry is a suitable coding of the position, in bits, where
the i-th inverted list starts (the last entry is actually the
length, in bytes, of the inverted index file itself). The offset
stream is coded as the γ code of the difference between the current
position and the previous position.
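The offset coding described above can be illustrated with a minimal sketch. It assumes the textbook γ code for natural numbers (x + 1 is written as a run of zeros, one per bit below its most significant bit, then a one, then its lower bits) and assumes the first gap is taken against position 0. The class OffsetGammaSketch and its methods are hypothetical: they build strings of '0' and '1' for readability and do not use OutputBitStream, so the exact bits MG4J emits may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetGammaSketch {

    /** Returns the gamma code of a natural number x >= 0 as a string of '0'/'1'. */
    static String gamma(long x) {
        if (x < 0 || x == Long.MAX_VALUE) {
            throw new IllegalArgumentException("x out of range: " + x);
        }
        long v = x + 1;                                // gamma codes positive integers
        int msb = 63 - Long.numberOfLeadingZeros(v);   // index of the highest set bit
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < msb; i++) sb.append('0');  // unary part: msb zeros
        sb.append('1');                                // unary terminator
        for (int i = msb - 1; i >= 0; i--) {           // lower msb bits of v
            sb.append(((v >>> i) & 1) == 0 ? '0' : '1');
        }
        return sb.toString();
    }

    /** Gamma-codes each position as a gap from the previous one (starting from 0). */
    static List<String> encodeOffsets(long[] positions) {
        List<String> codes = new ArrayList<>();
        long previous = 0;
        for (long p : positions) {
            if (p < previous) {
                throw new IllegalArgumentException("positions must be nondecreasing");
            }
            codes.add(gamma(p - previous));
            previous = p;
        }
        return codes;
    }

    public static void main(String[] args) {
        // Three hypothetical entries: two list starts and a final length.
        System.out.println(encodeOffsets(new long[] { 0, 5, 12 }));
    }
}
```

With this sketch, gamma(0) yields 1 and gamma(3) yields 00100; small gaps between consecutive inverted lists therefore cost only a few bits each.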
author: Paolo Boldi author: Sebastiano Vigna since: 0.6 |
Constructor Summary |
public | BitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, Map<Component, Coding> flags) | Creates a new index writer, with the specified basename. |
public | BitStreamIndexWriter(OutputBitStream obs, OutputBitStream offset, int numberOfDocuments, Map<Component, Coding> flags) | Creates a new index writer with payloads using the specified underlying OutputBitStream. |
public | BitStreamIndexWriter(OutputBitStream obs, int numberOfDocuments, Map<Component, Coding> flags) | Creates a new index writer, with the specified underlying OutputBitStream, without an associated offset bit stream. |
BEFORE_COUNT | protected static final int BEFORE_COUNT | This value of
BitStreamIndexWriter.state can be assumed only in indices that contain counts; it
means that we are positioned just before the count for the current document record.
|
BEFORE_PAYLOAD | protected static final int BEFORE_PAYLOAD | This value of
BitStreamIndexWriter.state can be assumed only in indices that contain payloads; it
means that we are positioned just before the payload for the current document record.
|
BEFORE_POSITIONS | protected static final int BEFORE_POSITIONS | This value of
BitStreamIndexWriter.state can be assumed only in indices that contain document positions;
it means that we are positioned just before the position list of the current document record.
|
FIRST_UNUSED_STATE | protected static final int FIRST_UNUSED_STATE | This is the first unused state. Subclasses may start from this value to define new states.
|
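The way a subclass might define new states starting from FIRST_UNUSED_STATE can be sketched as follows. This is an illustration only: BaseWriterSketch and ExtendedWriterSketch are stand-in classes, the numeric values of the constants are hypothetical, and BEFORE_EXTRA and describe() are invented names that do not belong to MG4J.

```java
class BaseWriterSketch {
    // Hypothetical values; the real constants are defined by MG4J.
    protected static final int BEFORE_COUNT = 0;
    protected static final int BEFORE_PAYLOAD = 1;
    protected static final int BEFORE_POSITIONS = 2;
    // First value that the base class does not use for its own states.
    protected static final int FIRST_UNUSED_STATE = 3;

    protected int state = BEFORE_COUNT;

    /** Returns a readable name for a state known to this class. */
    String describe(int s) {
        switch (s) {
            case BEFORE_COUNT: return "before count";
            case BEFORE_PAYLOAD: return "before payload";
            case BEFORE_POSITIONS: return "before positions";
            default: throw new IllegalStateException("unknown state " + s);
        }
    }
}

class ExtendedWriterSketch extends BaseWriterSketch {
    // A new (hypothetical) state that cannot clash with the inherited ones.
    protected static final int BEFORE_EXTRA = FIRST_UNUSED_STATE;

    @Override
    String describe(int s) {
        return s == BEFORE_EXTRA ? "before extra" : super.describe(s);
    }

    public static void main(String[] args) {
        ExtendedWriterSketch w = new ExtendedWriterSketch();
        w.state = BEFORE_EXTRA;
        System.out.println(w.describe(w.state));
    }
}
```

Numbering new states from FIRST_UNUSED_STATE keeps them distinct from the inherited ones even if the base class later changes how its own states are numbered.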
b | protected int b | The parameter b for Golomb coding of pointers.
|
currentDocument | protected int currentDocument | The current document pointer.
|
frequency | protected int frequency | The number of document records that the current inverted list will contain.
|
lastDocument | protected int lastDocument | The last document pointer in the current list.
|
log2b | protected int log2b | The parameter log2b for Golomb coding of pointers; it is the index of the most significant bit of
BitStreamIndexWriter.b.
|
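The relation between b and log2b can be made concrete with a small sketch. It assumes that "most significant bit" means the index of the highest set bit of b (that is, the floor of its base-2 logarithm) and uses the textbook Golomb code, with the quotient in unary and the remainder in minimal binary, to show where log2b enters. The class GolombSketch and its methods are hypothetical and do not reproduce MG4J's own implementation.

```java
final class GolombSketch {

    /** Index of the highest set bit of b, i.e. floor(log2(b)); b must be positive. */
    static int mostSignificantBit(int b) {
        if (b <= 0) throw new IllegalArgumentException("b must be positive");
        return 31 - Integer.numberOfLeadingZeros(b);
    }

    /** Length in bits of the textbook Golomb code of x >= 0 with modulus b. */
    static int golombLength(int x, int b) {
        if (x < 0) throw new IllegalArgumentException("x must be nonnegative");
        int log2b = mostSignificantBit(b);
        int quotient = x / b;
        int remainder = x % b;
        // Minimal binary coding: the first 'threshold' remainders take log2b bits,
        // the others take log2b + 1 bits.
        int threshold = (1 << (log2b + 1)) - b;
        int remainderBits = remainder < threshold ? log2b : log2b + 1;
        return quotient + 1 + remainderBits;  // unary quotient plus its terminator
    }

    public static void main(String[] args) {
        System.out.println(mostSignificantBit(5));
        System.out.println(golombLength(7, 3));
    }
}
```

Keeping log2b alongside b avoids recomputing the bit index for every pointer written with the same modulus.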
maxCount | public int maxCount | The maximum number of positions in a document record so far.
|
obs | protected OutputBitStream obs | The underlying OutputBitStream.
|
state | protected int state | The current state of the writer.
|
writtenDocuments | protected int writtenDocuments | The number of document records already written for the current inverted list.
|
BitStreamIndexWriter | public BitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, Map<Component, Coding> flags) throws IOException | Creates a new index writer, with the specified basename. The index will be written to a file whose name is the basename followed by .index.
If writeOffsets is true, an offset file will also be produced (basename followed by .offsets).
When BitStreamIndexWriter.close() is called, the property file (basename followed by .properties) will also be produced,
or enriched if it already exists.
Parameters:
basename - the basename.
numberOfDocuments - the number of documents in the collection to be indexed.
writeOffsets - if true, the offset file will also be produced.
flags - a flag map setting the coding techniques to be used (see CompressionFlags). |
BitStreamIndexWriter | public BitStreamIndexWriter(OutputBitStream obs, OutputBitStream offset, int numberOfDocuments, Map<Component, Coding> flags) | Creates a new index writer with payloads using the specified underlying OutputBitStream.
Parameters:
obs - the underlying output bit stream.
offset - the offset bit stream, or null if offsets should not be written.
numberOfDocuments - the number of documents in the collection to be indexed.
flags - a flag map setting the coding techniques to be used (see CompressionFlags). |
BitStreamIndexWriter | public BitStreamIndexWriter(OutputBitStream obs, int numberOfDocuments, Map<Component, Coding> flags) | Creates a new index writer, with the specified underlying OutputBitStream, without an associated offset bit stream.
Parameters:
obs - the underlying output bit stream.
numberOfDocuments - the number of documents in the collection to be indexed.
flags - a flag map setting the coding techniques to be used (see CompressionFlags). |
properties | public Properties properties() |
writeDocumentPointer | public int writeDocumentPointer(OutputBitStream out, int pointer) throws IOException |
writeDocumentPositions | public int writeDocumentPositions(OutputBitStream out, int[] occ, int offset, int len, int docSize) throws IOException |
writePositionCount | public int writePositionCount(OutputBitStream out, int count) throws IOException |
writtenBits | public long writtenBits() |