| java.lang.Object it.unimi.dsi.mg4j.tool.Combine it.unimi.dsi.mg4j.tool.Paste
Paste | final public class Paste extends Combine | | Pastes several indices.
Pasting is a very slow way of combining indices: we assume
that not only documents, but also document occurrences might be scattered
throughout several indices. When a document appears in several indices,
its occurrences in a given index are combined by renumbering them starting
from the sum of the sizes for the document in the previous indices.
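
The renumbering rule can be illustrated with a minimal sketch for a single term in a single document. Plain arrays stand in for posting data here: the class `PastePositionsSketch` and its method are hypothetical names chosen for illustration and are not part of MG4J, whose actual implementation works on bit streams.

```java
import java.util.Arrays;

/** Illustrative only: plain arrays stand in for posting data; this is not MG4J code. */
final class PastePositionsSketch {

    /**
     * Pastes the occurrences of one term in one document.
     *
     * @param sizes     sizes[k] is the size (in words) of the document in index k
     * @param positions positions[k] holds the term's positions in index k, local to that index
     * @return the positions renumbered into the pasted document
     */
    static int[] pastePositions(int[] sizes, int[][] positions) {
        int total = 0;
        for (int[] p : positions) total += p.length;

        int[] result = new int[total];
        int offset = 0; // sum of the sizes of the document in the previous indices
        int n = 0;
        for (int k = 0; k < positions.length; k++) {
            for (int pos : positions[k]) result[n++] = offset + pos;
            offset += sizes[k];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] sizes = {4, 0, 3};                  // the document is empty in the second index
        int[][] positions = {{1, 3}, {}, {0, 2}};
        System.out.println(Arrays.toString(pastePositions(sizes, positions))); // prints [1, 3, 4, 6]
    }
}
```

Positions from the third index are shifted by 4 + 0 = 4, the combined size of the document in the first two indices.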
Conceptually, this operation is equivalent to splitting a collection
vertically: each document is divided into a fixed number n
of consecutive segments (possibly of length 0), and a set of n indices
is created using the k-th segment of all documents. Pasting the
resulting indices will produce an index that is identical to the index generated
by the original collection. The behaviour is analogous to that of the UN*X
paste command if documents are single-line lists of words.
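
The vertical-split view can be checked on a toy document. The sketch below works on word lists rather than on indices, so it only illustrates the concept; `VerticalSplitSketch`, `split` and `paste` are hypothetical names, and the cut points are an assumption of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: word lists stand in for the per-segment indices. */
final class VerticalSplitSketch {

    /** Splits a document into cuts.length + 1 consecutive segments; cuts must be ascending. */
    static List<List<String>> split(List<String> doc, int[] cuts) {
        List<List<String>> segments = new ArrayList<>();
        int start = 0;
        for (int cut : cuts) {
            segments.add(new ArrayList<>(doc.subList(start, cut)));
            start = cut;
        }
        segments.add(new ArrayList<>(doc.subList(start, doc.size())));
        return segments;
    }

    /** Pastes the segments back together in order, like the paste command on single-line files. */
    static List<String> paste(List<List<String>> segments) {
        List<String> doc = new ArrayList<>();
        for (List<String> segment : segments) doc.addAll(segment);
        return doc;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "b", "c", "d", "e");
        // Three segments: [a, b], [] (length 0), [c, d, e].
        List<List<String>> segments = split(doc, new int[] {2, 2});
        System.out.println(segments);
        System.out.println(paste(segments).equals(doc)); // prints true
    }
}
```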
In practice, pasting is usually applied to indices obtained from
a
(e.g., indices containing anchor text fragments).
Note that if every document appears in at most one index, pasting
is equivalent to
. It is, however,
significantly slower, as the presence of the same document in several lists makes
it necessary to scan the inverted lists to be pasted in full in order to compute the
frequency.
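
The following sketch suggests why the frequency cannot simply be summed when lists overlap. It assumes that the frequency of a term is the number of distinct documents containing it and that each input list is a sorted array of nonnegative document pointers; `PastedFrequencySketch` is a hypothetical name, and the merge shown here is an illustration rather than the code MG4J uses.

```java
import java.util.PriorityQueue;

/** Illustrative only: sorted int arrays stand in for the inverted lists of one term. */
final class PastedFrequencySketch {

    /** Counts the distinct document pointers across all lists by scanning every list in full. */
    static int pastedFrequency(int[][] lists) {
        // Queue entries: {document pointer, list index, position in that list}.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < lists.length; i++)
            if (lists[i].length > 0) queue.add(new int[] {lists[i][0], i, 0});

        int frequency = 0;
        int last = -1; // document pointers are assumed to be nonnegative
        while (!queue.isEmpty()) {
            int[] entry = queue.poll();
            if (entry[0] != last) { // a document seen in several lists is counted once
                frequency++;
                last = entry[0];
            }
            int next = entry[2] + 1;
            if (next < lists[entry[1]].length)
                queue.add(new int[] {lists[entry[1]][next], entry[1], next});
        }
        return frequency;
    }

    public static void main(String[] args) {
        int[][] lists = {{0, 2, 5}, {2, 3}, {5}};
        // The per-list frequencies sum to 6, but only documents 0, 2, 3 and 5 are distinct.
        System.out.println(pastedFrequency(lists)); // prints 4
    }
}
```

When no document appears in more than one list, the sum of the per-list frequencies is already the right answer, which is why that case needs no such scan.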
author: Sebastiano Vigna since: 1.0 |
Field Summary | |
final public static int | DEFAULT_MEMORY_BUFFER_SIZE The default size of the temporary bit stream buffer used while pasting. | protected int[] | doc The reference array of the document queue. | protected IntHeapPriorityQueue | documentQueue The queue containing document pointers (for remapped indices). |
Constructor Summary | |
public | Paste(String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, File tempFileDir, int tempBufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval) |
DEFAULT_MEMORY_BUFFER_SIZE | final public static int DEFAULT_MEMORY_BUFFER_SIZE | | The default size of the temporary bit stream buffer used while pasting. Posting lists larger
than this size will be precomputed on disk and then added to the index.
|
doc | protected int[] doc | | The reference array of the document queue.
|
documentQueue | protected IntHeapPriorityQueue documentQueue | | The queue containing document pointers (for remapped indices).
|
Paste | public Paste(String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, File tempFileDir, int tempBufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval) throws IOException, ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException | | |
combineNumberOfDocuments | protected int combineNumberOfDocuments() | | |
Methods inherited from it.unimi.dsi.mg4j.tool.Combine |
abstract protected int combine(int numUsedIndices) throws IOException
abstract protected int combineNumberOfDocuments()
abstract protected int combineSizes() throws IOException
protected BitStreamIndex getIndex(CharSequence basename) throws ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
public static void main(String[] arg) throws JSAPException, ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
public static void main(String[] arg, Class<? extends Combine> combineClass) throws JSAPException, ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
public void run() throws ConfigurationException, IOException
protected IntIterator sizes(int numIndex) throws FileNotFoundException
|