it.unimi.dsi.mg4j.tool |
MG4J: Managing Gigabytes for Java
Command-line tools for index construction.
The classes in this package
contain a main method and can be used to build indices starting from a
document sequence. See the MG4J manual to learn
how to build an index.
|
Java Source File Name | Type | Comment |
Combine.java | Class | Combines several indices.
Indices may be combined in several different ways. |
Concatenate.java | Class | Concatenates several indices.
This implementation of
it.unimi.dsi.mg4j.tool.Combine concatenates
the involved indices: document 0 of the first index is document 0 of the
final collection, while document 0 of the second index is assigned the
number of documents in the first index as its number, and so on. |
IndexBuilder.java | Class | An index builder. |
Merge.java | Class | Merges several indices.
This class merges indices by performing a simple ordered list merge. |
PartitionDocumentally.java | Class | Partitions an index documentally.
A global index is partitioned documentally by providing a
DocumentalPartitioningStrategy that specifies a destination local index for each document, and a local document pointer. |
PartitionLexically.java | Class | Partitions an index lexically.
A global index is partitioned lexically by providing a
LexicalPartitioningStrategy that specifies a destination local index for each term, and a local term number. |
Paste.java | Class | Pastes several indices.
Pasting is a very slow way of combining indices: we assume
that not only documents, but also document occurrences might be scattered
throughout several indices. |
Scan.java | Class | Scans a document sequence, dividing it into batches of occurrences and writing for each batch a
corresponding subindex.
This class (more precisely, its
run() method) reads a document sequence and produces several batches, that is, subindices
corresponding to subsets of term/document pairs of the collection. |
ScanMetadata.java | Class | Scans a document sequence and prints on standard output the corresponding URIs. |
URLMPHVirtualDocumentResolver.java | Class | A virtual-document resolver based on document URIs.
Instances of this class store all URIs from a collection in a
StringMap instance,
and treat a virtual-document specification as a (possibly relative) URI. |
VirtualDocumentResolver.java | Interface | A resolver for virtual documents.
Virtual fields return a list of fragments, each
containing a document specification (e.g., its URI) and the virtual text associated with the document. |
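
The difference between the Concatenate and Merge entries above can be illustrated with a small, self-contained sketch. It is not MG4J code: the class IndexCombinationSketch and its methods concatenatedPointer and mergePointers are hypothetical names introduced only for this example, and posting lists are reduced to plain increasing arrays of document pointers. The first method renumbers a local document pointer the way concatenation is described (offset by the number of documents in all preceding indices); the second performs a simple ordered list merge of two lists that already share one document numbering. Keeping a pointer that occurs in both lists only once is an assumption of the sketch, not a statement about how Merge treats such input.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of MG4J. All names here are hypothetical. */
final class IndexCombinationSketch {

    /**
     * Concatenation: maps a document pointer that is local to one input index
     * to its pointer in the concatenated index, by adding the number of
     * documents of all preceding indices.
     */
    static int concatenatedPointer(int[] numberOfDocuments, int index, int localPointer) {
        if (localPointer < 0 || localPointer >= numberOfDocuments[index]) {
            throw new IllegalArgumentException("Local pointer out of range: " + localPointer);
        }
        int offset = 0;
        for (int i = 0; i < index; i++) {
            offset += numberOfDocuments[i];
        }
        return offset + localPointer;
    }

    /**
     * Merging: ordered list merge of two strictly increasing lists of document
     * pointers that share the same document numbering.
     * Assumption of this sketch: a pointer present in both lists is kept once.
     */
    static int[] mergePointers(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                result[n++] = a[i++];
            } else if (a[i] > b[j]) {
                result[n++] = b[j++];
            } else {
                // Same pointer in both lists: emit once, advance both.
                result[n++] = a[i++];
                j++;
            }
        }
        // Copy whatever remains of the list that was not exhausted.
        while (i < a.length) result[n++] = a[i++];
        while (j < b.length) result[n++] = b[j++];
        return Arrays.copyOf(result, n);
    }
}
```

With three indices of 3, 5 and 2 documents, concatenatedPointer maps document 0 of the second index to 3 and document 1 of the third index to 9, whereas mergePointers leaves pointers unchanged and only interleaves them in increasing order.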