| java.lang.Object it.unimi.dsi.mg4j.tool.PartitionDocumentally
PartitionDocumentally | public class PartitionDocumentally | | Partitions an index documentally.
A global index is partitioned documentally by providing a
DocumentalPartitioningStrategy that specifies a destination local index for each document, and a local document pointer. The global index
is scanned, and the postings are partitioned among the local indices using the provided strategy. For instance,
a
ContiguousDocumentalStrategy divides an index into blocks of contiguous documents.
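As a rough illustration of what such a strategy computes, the sketch below maps a global document pointer to a local index and a local document pointer for blocks of contiguous documents. The class name ContiguousPartitionSketch, its methods, and the cut-point representation are hypothetical and chosen only for this example; they are not the MG4J DocumentalPartitioningStrategy API.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not MG4J's ContiguousDocumentalStrategy. */
final class ContiguousPartitionSketch {
    /** cutPoints[i] is the first global document of block i; the last entry is the total number of documents. */
    private final int[] cutPoints;

    ContiguousPartitionSketch(int... cutPoints) {
        if (cutPoints.length < 2 || cutPoints[0] != 0)
            throw new IllegalArgumentException("Need at least two cut points, starting at 0");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i] <= cutPoints[i - 1])
                throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    int numberOfLocalIndices() {
        return cutPoints.length - 1;
    }

    /** Returns the number of the local index (block) that receives the given global document. */
    int localIndex(int globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // An exact hit is the first document of block pos; otherwise the
        // document belongs to the block just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the document pointer inside the destination local index. */
    int localPointer(int globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    public static void main(String[] args) {
        // Ten documents split into the blocks [0..4) and [4..10).
        ContiguousPartitionSketch strategy = new ContiguousPartitionSketch(0, 4, 10);
        for (int d : new int[] { 0, 3, 4, 9 })
            System.out.println("global " + d + " -> local index " + strategy.localIndex(d)
                    + ", local pointer " + strategy.localPointer(d));
    }
}
```

A non-contiguous strategy would need an explicit table (or a function) in place of the cut points, but the two questions it answers for each document stay the same.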
Since each local index contains a (proper) subset of the original set of documents, it contains in general a (proper)
subset of the terms in the global index. Thus, the local term numbers and the global term numbers will not in general coincide.
As a result, when a set of local indices is accessed transparently as a single index
using a
it.unimi.dsi.mg4j.index.cluster.DocumentalCluster ,
a call to
it.unimi.dsi.mg4j.index.Index.documents(int) will throw a
java.lang.UnsupportedOperationException ,
because there is no way to map the global term numbers to local term numbers.
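The mismatch between term numbers can be seen on a toy collection. The sketch below assumes, for illustration only, that a term's number is its position in the sorted list of distinct terms of an index; the class LocalTermNumbersSketch and its methods are hypothetical and do not use any MG4J class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; no MG4J class is involved. */
final class LocalTermNumbersSketch {
    /** Three tiny documents, each given as a list of terms. */
    static List<List<String>> sampleDocuments() {
        return Arrays.asList(
                Arrays.asList("alpha", "gamma"),
                Arrays.asList("beta", "gamma"),
                Arrays.asList("beta", "delta"));
    }

    /** Returns the sorted distinct terms of the documents; a term's number is its position in this list. */
    static List<String> termList(List<List<String>> documents) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> document : documents) terms.addAll(document);
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        List<List<String>> documents = sampleDocuments();
        List<String> global = termList(documents);
        // Documents 0 and 1 go to the first local index, document 2 to the second.
        List<String> local0 = termList(documents.subList(0, 2));
        List<String> local1 = termList(documents.subList(2, 3));
        for (String term : global)
            // indexOf returns -1 when the term does not occur in that local index.
            System.out.println(term + ": global " + global.indexOf(term)
                    + ", local0 " + local0.indexOf(term)
                    + ", local1 " + local1.indexOf(term));
    }
}
```

In this toy setting the same term can have a different number in each index, or none at all, which is why only lookups by term (a CharSequence) can be forwarded to the local indices.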
On the other hand, a call to
it.unimi.dsi.mg4j.index.Index.documents(CharSequence) will be passed to each local index to
build a global iterator. To speed up this phase for not-so-frequent terms, when partitioning an index you can request
the construction of Bloom filters
that will be used to try to avoid
querying indices that do not contain a term. The precision of the filters is configurable.
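The idea of filtering out local indices before querying them can be sketched with a small Bloom filter per local index. Everything in the example is an assumption made for illustration: the class names, the hashing scheme, and the reading of "precision" as the number of hash functions are not taken from the MG4J implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; this is not the filter implementation used by MG4J. */
final class BloomRoutingSketch {
    /** A minimal Bloom filter over terms: no false negatives, occasional false positives. */
    static final class TinyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        TinyBloomFilter(int size, int hashes) {
            if (size <= 0 || hashes <= 0) throw new IllegalArgumentException("size and hashes must be positive");
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        /** Derives the i-th bit position from two hash values (double hashing). */
        private int position(CharSequence term, int i) {
            int h1 = term.toString().hashCode();
            int h2 = (h1 >>> 16) | 1; // forced odd so that successive probes differ
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(CharSequence term) {
            for (int i = 0; i < hashes; i++) bits.set(position(term, i));
        }

        boolean mightContain(CharSequence term) {
            for (int i = 0; i < hashes; i++)
                if (!bits.get(position(term, i))) return false; // definitely absent
            return true; // present, or a false positive
        }
    }

    /** Returns the local indices that still have to be queried for the term. */
    static List<Integer> candidates(List<TinyBloomFilter> filters, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++)
            if (filters.get(i).mightContain(term)) result.add(i);
        return result;
    }

    public static void main(String[] args) {
        TinyBloomFilter first = new TinyBloomFilter(1024, 4);
        TinyBloomFilter second = new TinyBloomFilter(1024, 4);
        first.add("alpha");
        first.add("gamma");
        second.add("beta");
        List<TinyBloomFilter> filters = new ArrayList<>();
        filters.add(first);
        filters.add(second);
        System.out.println("Indices to query for gamma: " + candidates(filters, "gamma"));
    }
}
```

A filter can only rule an index out; an index that passes the filter must still be queried, since the answer may be a false positive. More hash functions (with a proportionally larger bit array) make such false positives rarer at the cost of space.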
The property file will use a
it.unimi.dsi.mg4j.index.cluster.DocumentalMergedCluster unless you provide
a
ContiguousDocumentalStrategy , in which case a
it.unimi.dsi.mg4j.index.cluster.DocumentalConcatenatedCluster will be used instead. Note that there might
be other cases in which the latter is suitable, in which case you can manually edit the property file.
Important: this class just partitions the index. No auxiliary files (most notably,
term maps or prefix maps) will be generated. Please refer to a
StringMap implementation (e.g.,
ShiftAddXorSignedStringMap or
ImmutableExternalPrefixMap ).
Write-once output and distributed index partitioning
Please see
it.unimi.dsi.mg4j.tool.PartitionLexically; the same comments apply.
author: Alessandro Arrabito author: Sebastiano Vigna since: 1.0.1 |
Field Summary | |
final public static int | DEFAULT_BUFFER_SIZE The default buffer size for all involved indices. |
Constructor Summary | |
public | PartitionDocumentally(String inputBasename, String outputBasename, DocumentalPartitioningStrategy strategy, String strategyFilename, int bloomFilterPrecision, int bufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval) |
Method Summary | |
public static void | main(String[] arg) |
public void | run() |
DEFAULT_BUFFER_SIZE | final public static int DEFAULT_BUFFER_SIZE | | The default buffer size for all involved indices.
PartitionDocumentally | public PartitionDocumentally(String inputBasename, String outputBasename, DocumentalPartitioningStrategy strategy, String strategyFilename, int bloomFilterPrecision, int bufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval) throws ConfigurationException, IOException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException |