The calculus of spans.
A span is a <doc,startPosition,endPosition> tuple.
The following span query operators are implemented:
- A SpanTermQuery matches all spans containing a particular Term.
- A SpanNearQuery matches spans which occur near one another, and can be
used to implement things like phrase search (when constructed from
SpanTermQueries) and inter-phrase proximity (when constructed from other
SpanNearQueries).
- A SpanOrQuery merges spans from a number of other SpanQueries.
- A SpanNotQuery removes spans matching one SpanQuery which overlap spans
matching another. This can be used, e.g., to implement within-paragraph
search.
- A SpanFirstQuery matches spans matching a SpanQuery q whose end position
is less than a given position n. This can be used to constrain matches to
the first part of the document.
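To make the SpanNotQuery behaviour concrete, here is a minimal sketch in plain Java that does not use Lucene at all. The class SpanNotSketch, its nested Span type, and the methods overlaps and not are hypothetical names introduced only for illustration, and the sketch assumes that a span's end position is exclusive.

```java
import java.util.ArrayList;
import java.util.List;

public final class SpanNotSketch {
    /** Hypothetical span model: positions [start, end) in one document (exclusive end is an assumption). */
    public static final class Span {
        public final int doc;
        public final int start;
        public final int end;

        public Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    /** True when both spans lie in the same document and share at least one position. */
    static boolean overlaps(Span a, Span b) {
        return a.doc == b.doc && a.start < b.end && b.start < a.end;
    }

    /** Keeps every span of include that overlaps no span of exclude. */
    public static List<Span> not(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : include) {
            boolean blocked = false;
            for (Span e : exclude) {
                if (overlaps(candidate, e)) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For within-paragraph search, include would hold the proximity matches and exclude would hold spans for a paragraph-boundary marker, so that any match straddling a boundary is dropped.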
In all cases, output spans are minimally inclusive. In other words, a
span formed by matching a span in x and y starts at the lesser of the
two starts and ends at the greater of the two ends.
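The "minimally inclusive" rule can be sketched in a few lines of plain Java. This is an illustration of the rule as stated above, not Lucene's implementation; SpanCover, Span, and cover are hypothetical names, and the sketch assumes both spans come from the same document.

```java
public final class SpanCover {
    /** Hypothetical stand-in for a <doc,startPosition,endPosition> tuple. */
    public static final class Span {
        public final int doc;
        public final int start;
        public final int end;

        public Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    /** Smallest span that includes both x and y: lesser start, greater end. */
    public static Span cover(Span x, Span y) {
        if (x.doc != y.doc) {
            throw new IllegalArgumentException("spans must come from the same document");
        }
        return new Span(x.doc, Math.min(x.start, y.start), Math.max(x.end, y.end));
    }
}
```

For instance, covering a span at positions 3 to 5 with one at positions 10 to 12 in the same document gives a span from 3 to 12, regardless of argument order.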
For example, a span query which matches "John Kerry" within ten
words of "George Bush" within the first 100 words of the document
could be constructed with:
SpanQuery john = new SpanTermQuery(new Term("content", "john"));
SpanQuery kerry = new SpanTermQuery(new Term("content", "kerry"));
SpanQuery george = new SpanTermQuery(new Term("content", "george"));
SpanQuery bush = new SpanTermQuery(new Term("content", "bush"));

SpanQuery johnKerry =
    new SpanNearQuery(new SpanQuery[] {john, kerry}, 0, true);
SpanQuery georgeBush =
    new SpanNearQuery(new SpanQuery[] {george, bush}, 0, true);
SpanQuery johnKerryNearGeorgeBush =
    new SpanNearQuery(new SpanQuery[] {johnKerry, georgeBush}, 10, false);
SpanQuery johnKerryNearGeorgeBushAtStart =
    new SpanFirstQuery(johnKerryNearGeorgeBush, 100);
Span queries may be freely intermixed with other Lucene queries.
So, for example, the above query can be restricted to documents which
also use the word "iraq" with:
BooleanQuery query = new BooleanQuery();
// add(query, required, prohibited): both clauses are required
query.add(johnKerryNearGeorgeBushAtStart, true, false);
query.add(new TermQuery(new Term("content", "iraq")), true, false);
|
NearSpansOrdered.java | Class | A Spans that is formed from the ordered subspans of a SpanNearQuery
where the subspans do not overlap and have a maximum slop between them.
The formed spans only contain minimum slop matches.
The matching slop is computed from the distance(s) between
the non-overlapping matching Spans.
Successive matches are always formed from the successive Spans
of the SpanNearQuery.
The formed spans may contain overlaps when the slop is at least 1.
For example, when querying using
t1 t2 t3
with slop at least 1, the fragment:
t1 t2 t1 t3 t2 t3
matches twice:
t1 t2 .. |