| it.unimi.dsi.mg4j.index.IndexWriter
All known Subclasses: it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter, it.unimi.dsi.mg4j.index.BitStreamHPIndexWriter,
IndexWriter | public interface IndexWriter (Code) | | An interface for classes that generate indices.
Implementations of this interface are used to write inverted lists in
sequential order, as follows:
- to create a new inverted list, you must call
IndexWriter.newInvertedList() ;
- then, you must specify the frequency using
IndexWriter.writeFrequency(int) ;
- the document records follow; before writing a new document record, you must call
IndexWriter.newDocumentRecord() ;
note that, all in all, the number of calls to
IndexWriter.newDocumentRecord() must be equal to the frequency;
- for each document record, you must supply the information needed for the index you are building
(pointer, payload, count, and positions, in this order).
IndexWriter.newDocumentRecord() returns an
OutputBitStream that must be used to write the document-record data.
Note that there is no guarantee that the returned
OutputBitStream coincides with the
underlying bit stream. Moreover, there is no guarantee as to when the bits will be actually
written on the underlying stream, except that when starting a new inverted list, the previous
inverted list, if any, will be written onto the underlying stream.
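The call order described above can be sketched as follows. This is a minimal illustration, not MG4J code: CallLog is a hypothetical stand-in that only records method names, its methods take no OutputBitStream, and the payload, count, and positions steps are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder, not part of MG4J: each method only logs its own name,
// mirroring in simplified form the IndexWriter methods described above.
class CallLog {
    final List<String> calls = new ArrayList<>();

    void newInvertedList() { calls.add("newInvertedList"); }
    void writeFrequency(int frequency) { calls.add("writeFrequency(" + frequency + ")"); }
    void newDocumentRecord() { calls.add("newDocumentRecord"); }
    void writeDocumentPointer(int pointer) { calls.add("writeDocumentPointer(" + pointer + ")"); }
    void close() { calls.add("close"); }
}

class CallOrderSketch {
    // Writes one inverted list: open the list, declare the frequency, then start
    // one document record per document and write its pointer right away.
    static void writeList(CallLog log, int[] documentPointers) {
        log.newInvertedList();
        log.writeFrequency(documentPointers.length);
        for (int pointer : documentPointers) {
            log.newDocumentRecord();
            log.writeDocumentPointer(pointer);
        }
    }

    public static void main(String[] args) {
        CallLog log = new CallLog();
        writeList(log, new int[] { 3, 7 }); // a term occurring in documents 3 and 7
        writeList(log, new int[] { 5 });    // a term occurring in document 5 only
        log.close();
        log.calls.forEach(System.out::println);
    }
}
```

With a real implementation, each write call would also receive the OutputBitStream returned by newDocumentRecord().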
author: Paolo Boldi author: Sebastiano Vigna since: 1.2 |
Method Summary | |
void | close() Closes this index writer, completing the index creation process and releasing all resources.
OutputBitStream | newDocumentRecord() Starts a new document record.
long | newInvertedList() Starts a new inverted list.
void | printStats(PrintStream stats) Writes to the given print stream statistical information about the index just built.
Properties | properties() Returns properties of the index generated by this index writer. This method should only be called after IndexWriter.close().
int | writeDocumentPointer(OutputBitStream out, int pointer) Writes a document pointer. This method must be called immediately after IndexWriter.newDocumentRecord(). Parameters: out - the output bit stream where the pointer will be written. Parameters: pointer - the document pointer.
int | writeDocumentPositions(OutputBitStream out, int[] occ, int offset, int len, int docSize) Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
int | writeFrequency(int frequency) Writes the frequency. Parameters: frequency - the (positive) number of document records that this inverted list will contain.
int | writePayload(OutputBitStream out, Payload payload) Writes the payload for the current document. This method must be called immediately after IndexWriter.writeDocumentPointer(OutputBitStream,int). Parameters: out - the output bit stream where the payload will be written. Parameters: payload - the payload.
int | writePositionCount(OutputBitStream out, int count) Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
long | writtenBits() Returns the overall number of bits written onto the underlying stream(s). |
close | void close() throws IOException(Code) | | Closes this index writer, completing the index creation process and releasing all resources.
throws: IllegalStateException - if too few records were written for the last inverted list. |
newDocumentRecord | OutputBitStream newDocumentRecord() throws IOException(Code) | | Starts a new document record.
This method must be called exactly f times, where f is the frequency specified with
IndexWriter.writeFrequency(int) .
Returns: the output bit stream where the next document record data should be written. throws: IllegalStateException - if too many records were written for the current inverted list, or if there is no current inverted list. |
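The record-count rules can be modeled with a small state check. The sketch below is hypothetical and not the MG4J implementation: RecordCountSketch tracks only whether a list is open and how many records were started, and it throws IllegalStateException in the situations the documentation names (no current inverted list, too many records, too few records).

```java
// Hypothetical checker, not part of MG4J: it models only the record-count rules.
class RecordCountSketch {
    private boolean listOpen = false;
    private int frequency = 0; // records promised for the current list
    private int written = 0;   // records started so far

    void newInvertedList() {
        checkComplete();       // the previous list, if any, must be complete
        listOpen = true;
        frequency = 0;
        written = 0;
    }

    void writeFrequency(int f) {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        if (f <= 0) throw new IllegalArgumentException("frequency must be positive");
        frequency = f;
    }

    void newDocumentRecord() {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        // Also rejects a record started before any frequency was declared.
        if (written == frequency) throw new IllegalStateException("too many records");
        written++;
    }

    void close() {
        checkComplete();       // the last list must be complete as well
        listOpen = false;
    }

    private void checkComplete() {
        if (listOpen && written < frequency) throw new IllegalStateException("too few records");
    }

    // Demonstration helper: reports whether the action raised IllegalStateException.
    static boolean throwsIllegalState(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        RecordCountSketch w = new RecordCountSketch();
        w.newInvertedList();
        w.writeFrequency(1);
        w.newDocumentRecord();
        System.out.println("extra record rejected: " + throwsIllegalState(w::newDocumentRecord));
        w.close();
    }
}
```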
newInvertedList | long newInvertedList() throws IOException(Code) | | Starts a new inverted list. The previous inverted list, if any, is actually written
to the underlying bit stream.
Returns: the position (in bytes) of the underlying bit stream where the new inverted list starts. throws: IllegalStateException - if too few records were written for the previous inverted list. |
printStats | void printStats(PrintStream stats)(Code) | | Writes to the given print stream statistical information about the index just built.
This method must be called after
IndexWriter.close() .
Parameters: stats - a print stream where statistical information will be written. |
writeDocumentPointer | int writeDocumentPointer(OutputBitStream out, int pointer) throws IOException(Code) | | Writes a document pointer.
This method must be called immediately after
IndexWriter.newDocumentRecord() .
Parameters: out - the output bit stream where the pointer will be written. Parameters: pointer - the document pointer. Returns: the number of bits written. |
writeDocumentPositions | int writeDocumentPositions(OutputBitStream out, int[] occ, int offset, int len, int docSize) throws IOException(Code) | | Writes the positions of the occurrences of the current term in the current document to the given
OutputBitStream .
Parameters: out - the output stream where the occurrences should be written. Parameters: occ - the position vector (a sequence of strictly increasing natural numbers). Parameters: offset - the first valid entry in occ. Parameters: len - the number of valid entries in occ. Parameters: docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise). Returns: the number of bits written. throws: IllegalStateException - if there is no current inverted list. |
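The occ, offset, and len parameters describe a slice of an array that holds strictly increasing natural numbers. A caller could check that precondition before writing, as in the sketch below. PositionSliceSketch and isValidSlice are hypothetical names, not part of MG4J, and whether an implementation performs such a check itself is not stated here.

```java
// Hypothetical helper, not part of MG4J.
class PositionSliceSketch {
    // Returns true if occ[offset .. offset + len - 1] lies inside the array and
    // holds strictly increasing natural numbers (non-negative integers).
    static boolean isValidSlice(int[] occ, int offset, int len) {
        // Written as a subtraction so that offset + len cannot overflow.
        if (offset < 0 || len < 0 || offset > occ.length - len) return false;
        for (int i = offset; i < offset + len; i++) {
            if (occ[i] < 0) return false;                          // not a natural number
            if (i > offset && occ[i] <= occ[i - 1]) return false;  // not strictly increasing
        }
        return true;
    }

    public static void main(String[] args) {
        int[] buffer = { 9, 1, 4, 8, 0 };
        // Only entries 1..3 (the values 1, 4, 8) are treated as valid positions.
        System.out.println(isValidSlice(buffer, 1, 3));
        // Entries 0..2 (the values 9, 1, 4) are not increasing.
        System.out.println(isValidSlice(buffer, 0, 3));
    }
}
```

Passing a slice rather than a whole array lets the caller reuse one buffer across documents.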
writeFrequency | int writeFrequency(int frequency) throws IOException(Code) | | Writes the frequency.
Parameters: frequency - the (positive) number of document records that this inverted list will contain. Returns: the number of bits written. |
writePositionCount | int writePositionCount(OutputBitStream out, int count) throws IOException(Code) | | Writes the count of the occurrences of the current term in the current document to the given
OutputBitStream .
Parameters: out - the output stream where the count should be written. Parameters: count - the count. Returns: the number of bits written. |
writtenBits | long writtenBits()(Code) | | Returns the overall number of bits written onto the underlying stream(s).
Returns: the number of bits written, according to the variables keeping statistical records. |