MG4J: Managing Gigabytes for Java
Visitors for composite {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator document iterators}.
Composites and visitors
A {@link it.unimi.dsi.mg4j.search.DocumentIterator}
(in particular, those provided by MG4J in the package {@link it.unimi.dsi.mg4j.search})
is usually structured as a composite,
with operators as internal nodes and {@link it.unimi.dsi.mg4j.index.IndexIterator}s
as leaves. A composite can be explored using a visitor: thus,
the {@link it.unimi.dsi.mg4j.search.DocumentIterator} interface provides two methods,
{@link it.unimi.dsi.mg4j.search.DocumentIterator#accept(it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor) accept(DocumentIteratorVisitor)} and
{@link it.unimi.dsi.mg4j.search.DocumentIterator#acceptOnTruePaths(it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor) acceptOnTruePaths(DocumentIteratorVisitor)},
which let a {@link it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor} visit the composite structure.
A {@link it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor} provides methods
for visiting all internal nodes in {@linkplain it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor#visitPre(it.unimi.dsi.mg4j.search.DocumentIterator) preorder}
and in {@linkplain it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor#visitPost(it.unimi.dsi.mg4j.search.DocumentIterator) postorder}.
Leaves have a {@linkplain it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor#visit(it.unimi.dsi.mg4j.index.IndexIterator) single visit method} instead. Note that a {@link it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor}
must be (re)usable after each call
to {@link it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor#prepare() prepare()}.
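The traversal order can be sketched in plain Java. In the minimal, self-contained example below, {@code Node}, {@code Leaf}, {@code Operator}, {@code Visitor} and {@code Tracer} are hypothetical stand-ins for a document iterator, an index iterator, a composite operator and a document-iterator visitor; they model only the visiting order and are not the MG4J API, whose actual signatures may differ. The tracer records the events generated by visiting the query {@code a AND (b OR c)}.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a composite explored by a visitor. Node, Leaf, Operator,
// Visitor and Tracer are HYPOTHETICAL stand-ins, not MG4J classes.
public class CompositeSketch {
    /** Pre/post visits for internal nodes, a single visit for leaves. */
    interface Visitor {
        void prepare();
        void visitPre(Operator op);
        void visitPost(Operator op);
        void visit(Leaf leaf);
    }

    /** Common type of all nodes, playing the role of a document iterator. */
    interface Node {
        void accept(Visitor v);
    }

    /** Leaf, playing the role of an index iterator over one term. */
    record Leaf(String term) implements Node {
        public void accept(Visitor v) { v.visit(this); }
    }

    /** Internal node: an operator (e.g. AND, OR) over its subnodes. */
    record Operator(String name, List<Node> children) implements Node {
        public void accept(Visitor v) {
            v.visitPre(this);                            // preorder visit
            for (Node child : children) child.accept(v); // recurse into subtrees
            v.visitPost(this);                           // postorder visit
        }
    }

    /** Records the sequence of visit events. */
    static final class Tracer implements Visitor {
        final List<String> events = new ArrayList<>();
        public void prepare() { events.clear(); }  // resets state so the tracer is reusable
        public void visitPre(Operator op) { events.add("pre:" + op.name()); }
        public void visitPost(Operator op) { events.add("post:" + op.name()); }
        public void visit(Leaf leaf) { events.add("leaf:" + leaf.term()); }
    }

    static List<String> trace(Node root) {
        Tracer tracer = new Tracer();
        tracer.prepare();
        root.accept(tracer);
        return tracer.events;
    }

    public static void main(String[] args) {
        // Query: a AND (b OR c)
        Node query = new Operator("AND", List.of(new Leaf("a"),
                new Operator("OR", List.of(new Leaf("b"), new Leaf("c")))));
        System.out.println(trace(query));
        // prints [pre:AND, leaf:a, pre:OR, leaf:b, leaf:c, post:OR, post:AND]
    }
}
```

Each internal node is reported twice, once before and once after its subtrees, while each leaf is reported exactly once.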
The abstract class {@link it.unimi.dsi.mg4j.search.visitor.AbstractDocumentIteratorVisitor} provides
stubs implementing internal visits and {@link it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor#prepare() prepare()}
as no-ops.
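The convenience of no-op stubs can be sketched as follows, again with hypothetical stand-in types rather than MG4J classes. {@code AbstractVisitor} plays the role of {@link it.unimi.dsi.mg4j.search.visitor.AbstractDocumentIteratorVisitor}, so the concrete {@code LeafCounter} implements only the leaf visit and {@code prepare()}, which resets its state so that the same instance can be reused on a second query.

```java
import java.util.List;

// HYPOTHETICAL stand-in types; they mimic the shape of the visitor pattern
// described above, not the actual MG4J signatures.
public class AbstractVisitorSketch {
    interface Visitor {
        void prepare();
        void visitPre(Operator op);
        void visitPost(Operator op);
        void visit(Leaf leaf);
    }

    interface Node { void accept(Visitor v); }

    record Leaf(String term) implements Node {
        public void accept(Visitor v) { v.visit(this); }
    }

    record Operator(String name, List<Node> children) implements Node {
        public void accept(Visitor v) {
            v.visitPre(this);
            for (Node child : children) child.accept(v);
            v.visitPost(this);
        }
    }

    /** Internal visits and prepare() are no-ops; subclasses override what they need. */
    static abstract class AbstractVisitor implements Visitor {
        @Override public void prepare() { }
        @Override public void visitPre(Operator op) { }
        @Override public void visitPost(Operator op) { }
    }

    /** Counts leaves; only the leaf visit and prepare() are written by hand. */
    static final class LeafCounter extends AbstractVisitor {
        int leaves;
        @Override public void prepare() { leaves = 0; } // reset, so the visitor can be reused
        @Override public void visit(Leaf leaf) { leaves++; }
    }

    static int countLeaves(LeafCounter counter, Node query) {
        counter.prepare();
        query.accept(counter);
        return counter.leaves;
    }

    public static void main(String[] args) {
        LeafCounter counter = new LeafCounter();
        Node q1 = new Operator("AND", List.of(new Leaf("a"), new Leaf("b")));
        Node q2 = new Operator("OR", List.of(new Leaf("c"),
                new Operator("AND", List.of(new Leaf("d"), new Leaf("e")))));
        System.out.println(countLeaves(counter, q1)); // prints 2
        System.out.println(countLeaves(counter, q2)); // prints 3, not 5: prepare() reset the count
    }
}
```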
Counting term occurrences
Term counting is an example of how useful visitors for document iterators can be:
using a number of coordinated visitors, it is possible to compute
a count for each term appearing in a query, no matter how complex. The counts can then be used as
input for counting-based scoring schemes, such as BM25 or cosine-based measures. For more information,
see the documentation of {@link it.unimi.dsi.mg4j.search.visitor.CounterCollectionVisitor}.
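The idea of coordinated visitors can be sketched with two passes over a query, under simplifying assumptions: each hypothetical {@code Leaf} carries a made-up count of its term's occurrences in the current document, and the two visitors below ({@code TermIndexer} and {@code CountCollector}) are illustrations, not the MG4J classes. The first pass gives each distinct term an offset; the second stores counts at those offsets, so a term that occurs twice in the query is reported once.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch of term counting with two coordinated visitors. The types
// are stand-ins, not MG4J classes; each Leaf carries an assumed count of its
// term's occurrences in the current document.
public class TermCountSketch {
    interface Visitor {
        void prepare();
        void visitPre(Operator op);
        void visitPost(Operator op);
        void visit(Leaf leaf);
    }

    interface Node { void accept(Visitor v); }

    record Leaf(String term, int count) implements Node {
        public void accept(Visitor v) { v.visit(this); }
    }

    record Operator(String name, List<Node> children) implements Node {
        public void accept(Visitor v) {
            v.visitPre(this);
            for (Node child : children) child.accept(v);
            v.visitPost(this);
        }
    }

    static abstract class AbstractVisitor implements Visitor {
        @Override public void prepare() { }
        @Override public void visitPre(Operator op) { }
        @Override public void visitPost(Operator op) { }
    }

    /** First pass: gives each distinct term a dense offset, in order of first appearance. */
    static final class TermIndexer extends AbstractVisitor {
        final Map<String, Integer> offset = new LinkedHashMap<>();
        @Override public void prepare() { offset.clear(); }
        @Override public void visit(Leaf leaf) { offset.putIfAbsent(leaf.term(), offset.size()); }
    }

    /** Second pass: stores each leaf's count at the offset chosen by the first pass. */
    static final class CountCollector extends AbstractVisitor {
        private final Map<String, Integer> offset;
        final int[] counts;
        CountCollector(Map<String, Integer> offset) {
            this.offset = offset;
            this.counts = new int[offset.size()];
        }
        @Override public void prepare() { Arrays.fill(counts, 0); }
        // Assumes leaves for the same term carry the same count, so a repeated
        // term overwrites its slot with the same value instead of adding to it.
        @Override public void visit(Leaf leaf) { counts[offset.get(leaf.term())] = leaf.count(); }
    }

    /** Runs both passes and returns term -> count for the current document. */
    static Map<String, Integer> countTerms(Node query) {
        TermIndexer indexer = new TermIndexer();
        indexer.prepare();
        query.accept(indexer);
        CountCollector collector = new CountCollector(indexer.offset);
        collector.prepare();
        query.accept(collector);
        Map<String, Integer> result = new LinkedHashMap<>();
        indexer.offset.forEach((term, i) -> result.put(term, collector.counts[i]));
        return result;
    }

    public static void main(String[] args) {
        // Query: (a OR b) AND (a OR c); the term "a" occurs twice in the query.
        Node query = new Operator("AND", List.of(
                new Operator("OR", List.of(new Leaf("a", 3), new Leaf("b", 1))),
                new Operator("OR", List.of(new Leaf("a", 3), new Leaf("c", 0)))));
        System.out.println(countTerms(query)); // prints {a=3, b=1, c=0}
    }
}
```

Keeping the offsets separate from the counts means the first pass can run once per query, while only the second pass needs repeating for each document; whether the MG4J visitors split the work in exactly this way is not claimed here.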