| java.lang.Object it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter it.unimi.dsi.mg4j.index.BitStreamHPIndexWriter
BitStreamHPIndexWriter | public class BitStreamHPIndexWriter extends AbstractBitStreamIndexWriter implements IndexWriter(Code) | | Writes a bitstream-based high-performance index. The comments about
offsets in the documentation of
BitStreamIndexWriter apply here, too.
The difference between indices generated by this class and those generated
by
BitStreamIndexWriter lies in the level
of interleaving. Indices generated by this class have positions in a separate stream (similarly to Lucene), and
a compulsory skip structure (an extension of that used by a
BitStreamIndexWriter)
that indexes both the main index file and the positions file. This can result in major performance
improvement in the resolution of position-based operators (e.g., phrases) and in the evaluation
of
. Since the overhead due to the additional
skip structure and to the separate positions stream is negligible, indices generated by
this class are the default in MG4J.
Presently, indices generated by this class cannot carry payloads: you must use a
BitStreamIndexWriter in that case. Moreover, only nonparametric codings can be used for positions
(this limitation rules out
Coding.GOLOMB,
Coding.SKEWED_GOLOMB, and
Coding.INTERPOLATIVE).
author: Sebastiano Vigna since: 1.2 |
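The layout described above can be illustrated with a small, self-contained sketch. It is not MG4J code: the class `SplitPostingList`, its `Skip` entries and the `quantum` parameter are hypothetical names, and plain integer lists stand in for the compressed bit streams. The sketch only shows the idea of keeping positions in a separate stream and letting each skip entry record an offset into both the main stream and the positions stream.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not MG4J code) of a posting list with positions in a separate stream. */
final class SplitPostingList {
    /** Hypothetical skip entry, created once every {@code quantum} document records. */
    static final class Skip {
        final int pointer;          // document pointer of the record the entry points at
        final int mainOffset;       // offset of that record in the main stream
        final int positionsOffset;  // offset of its first position in the positions stream
        Skip(int pointer, int mainOffset, int positionsOffset) {
            this.pointer = pointer;
            this.mainOffset = mainOffset;
            this.positionsOffset = positionsOffset;
        }
    }

    final List<Integer> main = new ArrayList<>();      // pointer, count, pointer, count, ...
    final List<Integer> positions = new ArrayList<>(); // all position lists, back to back
    final List<Skip> skips = new ArrayList<>();
    private final int quantum;
    private int records;

    SplitPostingList(int quantum) {
        if (quantum < 1) throw new IllegalArgumentException("quantum must be positive");
        this.quantum = quantum;
    }

    /** Appends a document record; pointers must be added in increasing order. */
    void add(int pointer, int... occ) {
        if (records % quantum == 0) skips.add(new Skip(pointer, main.size(), positions.size()));
        main.add(pointer);
        main.add(occ.length);
        for (int p : occ) positions.add(p);
        records++;
    }

    /** Returns the positions of the given document, or null if it is not in the list. */
    int[] positionsOf(int target) {
        Skip start = null;
        for (Skip s : skips) {            // last skip entry whose pointer is <= target
            if (s.pointer <= target) start = s;
            else break;
        }
        if (start == null) return null;
        int m = start.mainOffset, p = start.positionsOffset;
        while (m < main.size() && main.get(m) < target) {
            p += main.get(m + 1);         // step over a position list without reading it
            m += 2;
        }
        if (m == main.size() || main.get(m) != target) return null;
        int[] result = new int[main.get(m + 1)];
        for (int i = 0; i < result.length; i++) result[i] = positions.get(p + i);
        return result;
    }
}
```

In this toy model, a scan that needs only pointers and counts never touches `positions`, and a lookup jumps into both streams through a single skip entry.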
Inner Class :public static class TowerData | |
Constructor Summary | |
public | BitStreamHPIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<Component, Coding> flags, int q, int h) Creates a new index writer with the specified basename. | public | BitStreamHPIndexWriter(OutputBitStream obs, OutputBitStream positions, OutputBitStream offset, int numberOfDocuments, int tempBufferSize, Map<Component, Coding> flags, int q, int h) Creates a new index writer using the specified underlying
OutputBitStream instances. |
BEFORE_COUNT | final protected static int BEFORE_COUNT(Code) | | This value of
BitStreamHPIndexWriter.state can be assumed only in indices that contain counts; it
means that we are positioned just before the count for the current document record.
|
BEFORE_PAYLOAD | final protected static int BEFORE_PAYLOAD(Code) | | This value of
BitStreamHPIndexWriter.state can be assumed only in indices that contain payloads; it
means that we are positioned just before the payload for the current document record.
|
BEFORE_POSITIONS | final protected static int BEFORE_POSITIONS(Code) | | This value of
BitStreamHPIndexWriter.state can be assumed only in indices that contain document positions;
it means that we are positioned just before the position list of the current document record.
|
DEFAULT_TEMP_BUFFER_SIZE | final public static int DEFAULT_TEMP_BUFFER_SIZE(Code) | | The size of the buffer for the temporary file used to build an inverted list. Inverted lists
shorter than this number of bytes will be directly rebuilt from the buffer, and never flushed to disk.
|
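The buffering behaviour described for DEFAULT_TEMP_BUFFER_SIZE can be sketched as follows. This is an illustration under stated assumptions, not the class's actual implementation: `SpillingListBuffer` is a hypothetical name, and a second in-memory stream stands in for the temporary file so that the example stays self-contained.

```java
import java.io.ByteArrayOutputStream;

/** Toy buffer (not MG4J code): data is spilled only once the in-memory buffer is full. */
final class SpillingListBuffer {
    private final byte[] buffer;
    private int used;
    // Stand-in for the temporary file on disk (an assumption made for this sketch).
    private final ByteArrayOutputStream spill = new ByteArrayOutputStream();
    private boolean spilled;

    SpillingListBuffer(int tempBufferSize) {
        if (tempBufferSize < 1) throw new IllegalArgumentException("buffer size must be positive");
        buffer = new byte[tempBufferSize];
    }

    /** Appends bytes of the inverted list being built. */
    void write(byte[] data) {
        for (byte b : data) {
            if (used == buffer.length) flush(); // buffer full: spill before storing more
            buffer[used++] = b;
        }
    }

    private void flush() {
        spill.write(buffer, 0, used);
        used = 0;
        spilled = true;
    }

    /** True if anything had to leave the in-memory buffer. */
    boolean spilled() {
        return spilled;
    }

    /** Rebuilds the whole list: spilled bytes first, then what is still buffered. */
    byte[] rebuild() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(spill.toByteArray(), 0, spill.size());
        out.write(buffer, 0, used);
        return out.toByteArray();
    }
}
```

A list that fits in the buffer is rebuilt from memory alone, which is why a larger buffer trades memory for fewer flushes.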
FIRST_UNUSED_STATE | final protected static int FIRST_UNUSED_STATE(Code) | | This is the first unused state. Subclasses may start from this value to define new states.
|
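The state constants above suggest a writer that checks the order of its calls. The following sketch shows one way such a check could work; it is an assumption-laden illustration rather than the class's code. The numeric values, the extra state `BEFORE_DOCUMENT_RECORD` and the class name `RecordStateSketch` are hypothetical, and the payload state is left out for brevity.

```java
/** Toy state machine (not MG4J code) enforcing pointer, count, positions order. */
final class RecordStateSketch {
    // Hypothetical values: the real constants' values are not given in this documentation.
    static final int BEFORE_DOCUMENT_RECORD = 0; // hypothetical state, not listed above
    static final int BEFORE_COUNT = 1;
    static final int BEFORE_POSITIONS = 2;
    static final int FIRST_UNUSED_STATE = 3;     // a subclass could number new states from here

    int state = BEFORE_DOCUMENT_RECORD;
    private int pendingCount;

    void writeDocumentPointer(int pointer) {
        require(BEFORE_DOCUMENT_RECORD, "a document pointer");
        state = BEFORE_COUNT;
    }

    void writePositionCount(int count) {
        require(BEFORE_COUNT, "a count");
        pendingCount = count;
        state = BEFORE_POSITIONS;
    }

    void writeDocumentPositions(int[] occ) {
        require(BEFORE_POSITIONS, "a position list");
        if (occ.length != pendingCount)
            throw new IllegalArgumentException("expected " + pendingCount + " positions");
        state = BEFORE_DOCUMENT_RECORD;
    }

    private void require(int expected, String what) {
        if (state != expected)
            throw new IllegalStateException("Cannot write " + what + " in state " + state);
    }
}
```

Calling the three methods in order returns the sketch to its initial state, while an out-of-order call fails immediately instead of corrupting the output.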
b | protected int b(Code) | | The parameter b for Golomb coding of pointers.
|
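To see why Golomb coding needs the parameter b while a nonparametric code does not, the textbook definitions of the γ code and the Golomb code can be written out as bit strings. This is a sketch for illustration only, not the library's encoder: `CodingSketch` is a hypothetical name, unary codes are assumed to be written as zeros followed by a one, and inputs are assumed to satisfy x ≥ 0 and b ≥ 1.

```java
/** Textbook gamma and Golomb codes as bit strings (a sketch, not MG4J code). */
final class CodingSketch {
    /** Unary code of x: x zeros followed by a one (a convention assumed here). */
    static String unary(int x) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < x; i++) sb.append('0');
        return sb.append('1').toString();
    }

    /** The lowest {@code width} bits of x, most significant bit first. */
    static String binary(int x, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) sb.append((x >>> i) & 1);
        return sb.toString();
    }

    /** Gamma code of x >= 0: nonparametric, the code depends on x alone. */
    static String gamma(int x) {
        int v = x + 1;
        int msb = 31 - Integer.numberOfLeadingZeros(v);
        return unary(msb) + binary(v, msb);
    }

    /** Golomb code of x >= 0 with modulus b >= 1: the decoder must know b. */
    static String golomb(int x, int b) {
        int msb = 31 - Integer.numberOfLeadingZeros(b);
        int m = (1 << (msb + 1)) - b;   // number of remainders that get the short code
        int r = x % b;
        String remainder = r < m ? binary(r, msb) : binary(r + m, msb + 1);
        return unary(x / b) + remainder;
    }
}
```

The same value gets different Golomb codes for different moduli, so a reader must obtain b before decoding, whereas a γ-coded value can be decoded with no side information.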
bitsForEntryBitLengths | public long bitsForEntryBitLengths(Code) | | The number of bits written for entry lengths.
|
bitsForPositionsOffsets | public long bitsForPositionsOffsets(Code) | | The number of bits written for offsets in the file of positions.
|
bitsForPositionsQuantumBitLengths | public long bitsForPositionsQuantumBitLengths(Code) | | The number of bits written for quantum lengths in the positions stream.
|
bitsForQuantumBitLengths | public long bitsForQuantumBitLengths(Code) | | The number of bits written for quantum lengths.
|
currentDocument | protected int currentDocument(Code) | | The current document pointer.
|
frequency | protected int frequency(Code) | | The number of document records that the current inverted list will contain.
|
lastDocument | protected int lastDocument(Code) | | The last document pointer in the current list.
|
maxCount | public int maxCount(Code) | | The maximum number of positions in a document record so far.
|
numberOfBlocks | public long numberOfBlocks(Code) | | The number of written blocks.
|
obs | protected OutputBitStream obs(Code) | | The underlying index
OutputBitStream .
|
positions | protected OutputBitStream positions(Code) | | The underlying positions
OutputBitStream .
|
prevEntryBitLength | public int prevEntryBitLength(Code) | | An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been
written for the current inverted list.
|
prevPositionsQuantumBitLength | public int prevPositionsQuantumBitLength(Code) | | An estimate of the number of bits occupied per quantum in the positions stream in the last written cache, or -1 if no cache has been
written for the current inverted list.
|
prevQuantumBitLength | public int prevQuantumBitLength(Code) | | An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been
written for the current inverted list.
|
state | protected int state(Code) | | The current state of the writer.
|
towerData | final public TowerData towerData(Code) | | The sum of all tower data computed so far.
|
writtenDocuments | protected int writtenDocuments(Code) | | The number of document records already written for the current inverted list.
|
BitStreamHPIndexWriter | public BitStreamHPIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<Component, Coding> flags, int q, int h) throws IOException(Code) | | Creates a new index writer, with the specified basename. The index will be written on a file (stemmed with .index).
If writeOffsets is true, an offset file will also be produced (stemmed with .offsets).
Parameters: basename - the basename. Parameters: numberOfDocuments - the number of documents in the collection to be indexed. Parameters: writeOffsets - if true, the offset file will also be produced. Parameters: flags - a flag map setting the coding techniques to be used (see CompressionFlags). |
BitStreamHPIndexWriter | public BitStreamHPIndexWriter(OutputBitStream obs, OutputBitStream positions, OutputBitStream offset, int numberOfDocuments, int tempBufferSize, Map<Component, Coding> flags, int q, int h) throws IOException(Code) | | Creates a new index writer using the specified underlying
OutputBitStream instances.
Parameters: obs - the underlying output bit stream. Parameters: offset - the offset bit stream, or null if offsets should not be written. Parameters: numberOfDocuments - the number of documents in the collection to be indexed. Parameters: flags - a flag map setting the coding techniques to be used (see CompressionFlags). throws: IOException - |
properties | public Properties properties()(Code) | | |
writeDocumentPointer | public int writeDocumentPointer(OutputBitStream unused, int pointer) throws IOException(Code) | | |
writeDocumentPositions | public int writeDocumentPositions(OutputBitStream unused, int[] occ, int offset, int len, int docSize) throws IOException(Code) | | |
writePositionCount | public int writePositionCount(OutputBitStream out, int count) throws IOException(Code) | | |
writtenBits | public long writtenBits()(Code) | | |
|
|