it.unimi.dsi.mg4j.search.score |
MG4J: Managing Gigabytes for Java
Classes for assigning scores to documents.
The content of this package has changed significantly in MG4J 1.1 (hopefully for the better).
A {@link it.unimi.dsi.mg4j.search.score.Scorer} is an
object that wraps an underlying {@link it.unimi.dsi.mg4j.search.DocumentIterator} and assigns scores to the documents
returned by the underlying iterator. In general, once a scorer has
{@linkplain it.unimi.dsi.mg4j.search.score.Scorer#wrap(it.unimi.dsi.mg4j.search.DocumentIterator) wrapped a document iterator}
one just calls {@link it.unimi.dsi.fastutil.ints.IntIterator#nextInt() nextInt()} and
{@link it.unimi.dsi.mg4j.search.score.Scorer#score()} to get scored documents (some scorers
might also support {@linkplain it.unimi.dsi.mg4j.search.score.Scorer#score(it.unimi.dsi.mg4j.index.Index) index-restricted scoring},
but this is optional).
If the scorer is a {@link it.unimi.dsi.mg4j.search.score.DelegatingScorer}, then by contract
it just delegates all {@link it.unimi.dsi.fastutil.ints.IntIterator}'s
methods to the underlying iterator.
In this case, it is possible to advance the underlying iterator manually and then call
{@link it.unimi.dsi.mg4j.search.score.Scorer#score()}. While this behaviour is of no use to
general users, it is essential for {@linkplain it.unimi.dsi.mg4j.search.score.AbstractAggregator aggregated scorers},
which combine several delegating scorers and provide services such as equalisation
and interval caching (in case more than one component scorer uses intervals). See, for instance,
{@link it.unimi.dsi.mg4j.search.score.LinearAggregator}.
|
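To make the protocol above concrete, here is a minimal Java sketch of the "advance, then score" loop and of how an aggregator drives delegating scorers. It does not use MG4J: the classes `ToyIterator`, `ToyDelegatingScorer` and `ToyLinearAggregator`, and their per-document scoring rules, are hypothetical simplifications (real scorers receive their iterator through `wrap()`, not a constructor). What it illustrates is that the aggregator advances the shared iterator exactly once per document, and each component, since it merely delegates iteration, can then score that same document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntToDoubleFunction;

/** Hypothetical simplified types illustrating the scoring protocol; these are NOT MG4J classes. */
public class ScoringProtocolSketch {
    /** Stand-in for a document iterator returning increasing document pointers. */
    static final class ToyIterator {
        private final int[] docs;
        private int pos = 0;
        private int current = -1;
        ToyIterator(int... docs) { this.docs = docs; }
        boolean hasNext() { return pos < docs.length; }
        int nextInt() {
            if (!hasNext()) throw new NoSuchElementException();
            return current = docs[pos++];
        }
        int current() { return current; }
    }

    /** Delegating scorer: iteration goes straight to the shared iterator; score() only reads the current document. */
    static final class ToyDelegatingScorer {
        private final ToyIterator it;
        private final IntToDoubleFunction rule; // hypothetical per-document scoring rule
        ToyDelegatingScorer(ToyIterator it, IntToDoubleFunction rule) { this.it = it; this.rule = rule; }
        boolean hasNext() { return it.hasNext(); }
        int nextInt() { return it.nextInt(); }
        double score() { return rule.applyAsDouble(it.current()); }
    }

    /** Aggregator: advances the shared iterator once, then combines the components' scores linearly. */
    static final class ToyLinearAggregator {
        private final ToyIterator it;
        private final ToyDelegatingScorer[] components;
        private final double[] weights; // parallel to components
        ToyLinearAggregator(ToyIterator it, ToyDelegatingScorer[] components, double[] weights) {
            if (components.length != weights.length) throw new IllegalArgumentException("arrays must be parallel");
            this.it = it;
            this.components = components;
            this.weights = weights;
        }
        boolean hasNext() { return it.hasNext(); }
        // The components are not advanced here: they all read the same shared iterator.
        int nextInt() { return it.nextInt(); }
        double score() {
            double total = 0;
            for (int i = 0; i < components.length; i++) total += weights[i] * components[i].score();
            return total;
        }
    }

    /** The usual client loop: nextInt() followed by score(). */
    static List<String> run(ToyLinearAggregator scorer) {
        List<String> out = new ArrayList<>();
        while (scorer.hasNext()) {
            int doc = scorer.nextInt();
            out.add(doc + ":" + scorer.score());
        }
        return out;
    }

    static ToyLinearAggregator example() {
        ToyIterator it = new ToyIterator(2, 5);
        ToyDelegatingScorer byPointer = new ToyDelegatingScorer(it, d -> d);  // score = document pointer
        ToyDelegatingScorer constant = new ToyDelegatingScorer(it, d -> 1.0); // like a constant scorer
        return new ToyLinearAggregator(it, new ToyDelegatingScorer[] { byPointer, constant }, new double[] { 0.5, 2.0 });
    }

    public static void main(String[] args) {
        System.out.println(run(example())); // [2:3.0, 5:4.5]
    }
}
```

Document 2 scores 0.5 * 2 + 2.0 * 1 = 3.0 and document 5 scores 0.5 * 5 + 2.0 * 1 = 4.5.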
Java Source File Name | Type | Comment |
AbstractAggregator.java | Class | A
Scorer that aggregates a number of underlying
delegating scorers (it.unimi.dsi.mg4j.search.score.DelegatingScorer), providing equalisation if required.
An aggregator combines the results of several scorers following some policy (see, e.g.,
it.unimi.dsi.mg4j.search.score.LinearAggregator). |
AbstractIndexScorer.java | Class | An abstract subclass of
it.unimi.dsi.mg4j.search.score.AbstractScorer. |
AbstractScorer.java | Class | An abstract implementation of
it.unimi.dsi.mg4j.search.score.Scorer.
It provides internal caching of the underlying
document iterator during wrapping,
and a complete implementation of the
it.unimi.dsi.fastutil.ints.IntIterator methods by delegation to the underlying document iterator (implementing subclasses
that do not alter this behaviour should implement
it.unimi.dsi.mg4j.search.score.DelegatingScorer). |
AbstractWeightedScorer.java | Class | An abstract subclass of
it.unimi.dsi.mg4j.search.score.AbstractIndexScorer providing internal storage and copying of the weight map, faster array-based
access to the latter, and a default implementation of
AbstractWeightedScorer.score(). |
BM25Scorer.java | Class | A scorer that implements the BM25 ranking formula.
Warning: the default values
BM25Scorer.DEFAULT_K1 and
BM25Scorer.DEFAULT_B changed in MG4J 1.1.2.
BM25 is the name of a formula derived from the probabilistic model. |
ClarkeCormackScorer.java | Class | Computes the Clarke–Cormack score of all interval iterators of a document.
This score function is defined in Charles L.A. |
ConstantScorer.java | Class | A scorer assigning a constant score (0 by default) to all documents. |
CountScorer.java | Class | A trivial scorer that computes the score by adding the counts
(the number of occurrences within the current document) of each term,
each multiplied by the weight of the corresponding index. |
DecreasingDocumentRankScorer.java | Class | Computes scores that do not depend on intervals: it
just assigns a fixed score to each document, normalised so that the first document has score 1; scores are read
from a file whose name is passed to the constructor.
This scorer assumes that scores are nonnegative and that
documents are ordered by decreasing
score: that is, if i < j then
the score of i is greater than or equal to the score of j.
This makes it possible to normalise the scores (the document with the highest score gets
exactly score 1) at no additional cost. |
DelegatingScorer.java | Interface | A marker interface for those scorers that delegate all of
it.unimi.dsi.fastutil.ints.IntIterator's methods to the
underlying
it.unimi.dsi.mg4j.search.DocumentIterator.
An
it.unimi.dsi.mg4j.search.score.AbstractAggregator
can only aggregate scorers of this kind. |
DocumentRankScorer.java | Class | Computes scores that do not depend on intervals: it
just assigns a fixed score to each document; scores are read
from a file whose name is passed to the constructor. |
DocumentScoreInfo.java | Class | A container used to return scored results with additional information. |
LinearAggregator.java | Class | An aggregator that computes a linear combination of the component scorers.
Besides the usual array of scorers, this class requires a parallel array
of weights (not to be confused with the index weights set by
it.unimi.dsi.mg4j.search.score.Scorer.setWeights(Reference2DoubleMap)).
The score from each scorer is multiplied by the respective weight, and the
overall score is the sum of these values. |
ScoredDocumentBoundedSizeQueue.java | Class | A queue of scored documents with fixed maximum capacity.
Instances of this class contain a queue into which it is possible to
enqueue scored documents.
The capacity of the queue is fixed at creation time: once the queue is full, a new element is either enqueued,
by dequeueing the lowest-scoring element in the queue, or discarded, depending on its score; the return
value of
ScoredDocumentBoundedSizeQueue.enqueue(int,double,Object) can be used to check whether the argument
has actually been enqueued. |
Scorer.java | Interface | A wrapper for a
DocumentIterator returning scored document pointers. |
TfIdfScorer.java | Class | A scorer that implements the TF/IDF ranking formula.
The formula exists in a number of incarnations
that differ in small details. |
VignaScorer.java | Class | Computes the Vigna score of all interval iterators of a document.
This scorer progressively moves score from a residual (initialised to 1)
to the current score (initialised to 0). |
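For orientation on the BM25Scorer entry in the table above, here is a sketch of the textbook per-term BM25 weight. It is not MG4J's code: the Robertson-Sparck Jones IDF variant, the method names, and the values k1 = 1.2 and b = 0.75 in the example (common textbook choices, not necessarily `BM25Scorer.DEFAULT_K1` and `DEFAULT_B`) are all assumptions.

```java
/** Textbook BM25 sketch; an assumption for illustration, NOT MG4J's BM25Scorer. */
public class Bm25Sketch {
    /**
     * Contribution of one query term to one document's score.
     * tf = occurrences of the term in the document, df = documents containing the term,
     * n = documents in the collection, docLength / avgDocLength = length normalisation inputs.
     */
    static double termScore(int tf, int df, int n, double docLength, double avgDocLength, double k1, double b) {
        // Robertson-Sparck Jones IDF (assumed variant); it is negative when df > n / 2.
        double idf = Math.log((n - df + 0.5) / (df + 0.5));
        // Length normalisation: documents longer than average get a larger denominator.
        double denominator = tf + k1 * ((1 - b) + b * docLength / avgDocLength);
        return idf * tf * (k1 + 1) / denominator;
    }

    /** Document score: sum of the per-term contributions (hypothetical parallel arrays over query terms). */
    static double documentScore(int[] tfs, int[] dfs, int n, double docLength, double avgDocLength, double k1, double b) {
        double s = 0;
        for (int i = 0; i < tfs.length; i++) s += termScore(tfs[i], dfs[i], n, docLength, avgDocLength, k1, b);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(termScore(3, 10, 1000, 100, 100, 1.2, 0.75));
    }
}
```

The parameter k1 controls how quickly repeated occurrences saturate, and b controls how strongly long documents are penalised.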
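The CountScorer entry in the table above describes a simple rule: sum, over the query terms, the term's count in the current document times the weight of the term's index. A minimal sketch, assuming a hypothetical `TermCount` record in place of MG4J's interval and index machinery, and assuming a weight of 1.0 for indices missing from the map:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the CountScorer rule with hypothetical types; NOT MG4J's API. */
public class CountScoreSketch {
    /** Hypothetical record: the index a query term refers to and its count in the current document. */
    static final class TermCount {
        final String index;
        final int count;
        TermCount(String index, int count) { this.index = index; this.count = count; }
    }

    /** Sum over terms of count * weight of the term's index (weight 1.0 if the index is unlisted: an assumption). */
    static double score(List<TermCount> terms, Map<String, Double> indexWeights) {
        double score = 0;
        for (TermCount t : terms) score += t.count * indexWeights.getOrDefault(t.index, 1.0);
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("title", 2.0);
        weights.put("text", 0.5);
        System.out.println(score(Arrays.asList(new TermCount("title", 2), new TermCount("text", 3)), weights)); // 5.5
    }
}
```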
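The DecreasingDocumentRankScorer entry in the table above explains why its ordering assumption makes normalisation free: the maximum is always the score of document 0, so no pass over the scores is needed to find it. A minimal sketch, assuming the scores are already in memory as an array (the real class reads them from a file whose format is not described here); `DecreasingRankSketch` is a hypothetical name.

```java
/** Sketch of the normalisation idea behind DecreasingDocumentRankScorer; hypothetical, NOT MG4J code. */
public class DecreasingRankSketch {
    // scores[d] is the precomputed score of document d. Assumed (not checked) to be
    // nonnegative and nonincreasing in d, so scores[0] is the maximum.
    private final double[] scores;

    public DecreasingRankSketch(double[] scores) {
        if (scores.length == 0) throw new IllegalArgumentException("no scores");
        this.scores = scores.clone();
    }

    /** Normalised score of document d: dividing by scores[0] gives the top document exactly 1. */
    public double score(int d) {
        // If the maximum is 0, every score is 0 (nonnegative, nonincreasing): return 0 instead of NaN.
        return scores[0] == 0 ? 0 : scores[d] / scores[0];
    }

    public static void main(String[] args) {
        DecreasingRankSketch s = new DecreasingRankSketch(new double[] { 8, 4, 2, 0 });
        for (int d = 0; d < 4; d++) System.out.print(s.score(d) + " "); // 1.0 0.5 0.25 0.0
        System.out.println();
    }
}
```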
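The ScoredDocumentBoundedSizeQueue entry in the table above describes a fixed-capacity queue whose `enqueue` reports whether a document was kept. One standard way to obtain that behaviour is a min-heap on score, sketched below with `java.util.PriorityQueue`. This is an assumption about the technique, not MG4J's implementation: the class and method names are hypothetical, the `Object` payload of the real `enqueue(int,double,Object)` is omitted, and discarding ties is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Fixed-capacity queue of scored documents (hypothetical sketch; NOT MG4J's class). */
public class BoundedScoreQueueSketch {
    /** A document pointer with its score. */
    static final class Scored {
        final int document;
        final double score;
        Scored(int document, double score) { this.document = document; this.score = score; }
    }

    private final int capacity;
    // Min-heap on score: the head is the lowest-scoring element, i.e., the first candidate for eviction.
    private final PriorityQueue<Scored> heap = new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));

    BoundedScoreQueueSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Returns true if the document was actually enqueued, false if it was discarded. */
    boolean enqueue(int document, double score) {
        if (heap.size() < capacity) {
            heap.add(new Scored(document, score));
            return true;
        }
        if (score <= heap.peek().score) return false; // no better than the current minimum: discard
        heap.poll();                                   // evict the lowest-scoring document
        heap.add(new Scored(document, score));
        return true;
    }

    /** The retained documents, highest score first. */
    List<Integer> documentsByDecreasingScore() {
        List<Scored> copy = new ArrayList<>(heap);
        copy.sort((a, b) -> Double.compare(b.score, a.score));
        List<Integer> documents = new ArrayList<>();
        for (Scored s : copy) documents.add(s.document);
        return documents;
    }

    public static void main(String[] args) {
        BoundedScoreQueueSketch q = new BoundedScoreQueueSketch(2);
        q.enqueue(1, 0.5);
        q.enqueue(2, 0.9);
        System.out.println(q.enqueue(3, 0.1)); // false: lower than everything retained
        System.out.println(q.enqueue(4, 0.7)); // true: evicts document 1
        System.out.println(q.documentsByDecreasingScore()); // [2, 4]
    }
}
```

Each `enqueue` costs O(log capacity), so keeping the best k of n scored documents costs O(n log k) rather than sorting all n.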
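The TfIdfScorer entry in the table above notes that TF/IDF comes in several incarnations. The sketch below shows one common variant, raw term frequency times log(N / df), summed over the query terms. This choice of variant and the method names are assumptions; TfIdfScorer may normalise differently.

```java
/** One common TF/IDF incarnation (an assumption; TfIdfScorer may use a different variant). */
public class TfIdfSketch {
    /** Raw term frequency times log(n / df); 0 if the term occurs in no document. */
    static double termScore(int tf, int df, int n) {
        if (df == 0) return 0;
        return tf * Math.log((double) n / df);
    }

    /** Document score: sum of the per-term contributions over the query terms (parallel arrays). */
    static double documentScore(int[] tfs, int[] dfs, int n) {
        double s = 0;
        for (int i = 0; i < tfs.length; i++) s += termScore(tfs[i], dfs[i], n);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(termScore(2, 10, 1000));   // 2 * ln(100)
        System.out.println(termScore(3, 1000, 1000)); // 0.0: a term in every document carries no weight
    }
}
```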