org.apache.lucene.index.memory |
High-performance single-document main memory Apache Lucene fulltext search index.
|
Java Source File Name | Type | Comment |
AnalyzerUtil.java | Class | Various fulltext analysis utilities that avoid redundant code across several
classes. |
MemoryIndex.java | Class | High-performance single-document main memory Apache Lucene fulltext search index. |
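A minimal usage sketch of MemoryIndex, assuming a Lucene-2.x-era classpath with the core and contrib-memory jars; the field names and query text are illustrative only. A single document is indexed entirely in main memory and queried directly, with search(Query) returning a relevance score greater than 0.0f on a match:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;

public class MemoryIndexSketch {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();

        // Index one document entirely in main memory.
        MemoryIndex index = new MemoryIndex();
        index.addField("author", "james joyce", analyzer);
        index.addField("content", "the quick brown fox jumps over the lazy dog", analyzer);

        // search(Query) returns a score > 0.0f on a match, 0.0f otherwise.
        QueryParser parser = new QueryParser("content", analyzer);
        float score = index.search(parser.parse("+author:james +quick"));
        System.out.println(score > 0.0f ? "hit" : "miss");
    }
}
```

Because there is no disk I/O and no Directory abstraction involved, this pattern suits high-frequency transient matching, e.g. classifying a stream of incoming documents against stored queries.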
MemoryIndexTest.java | Class | Verifies that Lucene MemoryIndex and RAMDirectory have the same behaviour,
returning the same results for any given query. |
PatternAnalyzer.java | Class | Efficient Lucene analyzer/tokenizer that preferably operates on a String rather than a
java.io.Reader , that can flexibly separate text into terms via a regular expression
Pattern (with behaviour identical to
String.split(String) ),
and that combines the functionality of
org.apache.lucene.analysis.LetterTokenizer ,
org.apache.lucene.analysis.LowerCaseTokenizer ,
org.apache.lucene.analysis.WhitespaceTokenizer ,
org.apache.lucene.analysis.StopFilter into a single efficient
multi-purpose class.
If you are unsure what exactly a regular expression should look like, consider
prototyping by simply trying various expressions on some test texts via
String.split(String) . |
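The prototyping tip above works in plain JDK Java, since Pattern.split and String.split produce identical token arrays for the same expression; the sample text and separator pattern here are illustrative:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitPrototype {
    public static void main(String[] args) {
        String text = "The quick\tbrown  fox";
        String regex = "\\s+"; // candidate pattern separating tokens on whitespace runs

        // String.split(String) behaves identically to Pattern.compile(regex).split(text)
        String[] viaString = text.split(regex);
        String[] viaPattern = Pattern.compile(regex).split(text);

        System.out.println(Arrays.equals(viaString, viaPattern)); // true
        System.out.println(Arrays.toString(viaString));           // [The, quick, brown, fox]
    }
}
```

Once the expression behaves as desired on test texts, the same pattern string can be handed to the analyzer.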
PatternAnalyzerTest.java | Class | Verifies that Lucene PatternAnalyzer and normal Lucene Analyzers have the same behaviour,
returning the same results for any given free text.
Runs a set of texts against tokenizers/analyzers;
can also be used as a simple benchmark.
Example usage:
cd lucene-cvs
java org.apache.lucene.index.memory.PatternAnalyzerTest 1 1 patluc 1 2 2 *.txt *.xml docs/*.html src/java/org/apache/lucene/index/*.java xdocs/*.xml ../nux/samples/data/*.xml
With WhitespaceAnalyzer, discrepancies can be found; these are not bugs but questionable
Lucene features: because CharTokenizer.MAX_WORD_LEN = 255, the WhitespaceAnalyzer
silently truncates overlong tokens, whereas the PatternAnalyzer produces correct output,
and so the comparison results in assertEquals() don't match up. |
SynonymMap.java | Class | Loads the WordNet prolog file wn_s.pl
into a thread-safe main-memory hash map that can be used for fast
high-frequency lookups of synonyms for any given (lowercase) word string.
The relation is symmetric: if B is a synonym of A (A -> B), then A is also a synonym of B (B -> A).
It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
Loading typically takes about 1.5 seconds, so it should be done only once per
(server) program execution, using a singleton pattern. |
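The symmetry and non-transitivity properties above can be sketched with a plain hash map; this is a simplified illustration under assumed names, not the actual SynonymMap implementation. Each inserted pair is recorded in both directions, while transitive closure is deliberately not computed:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SymmetricSynonyms {
    private final Map<String, Set<String>> map = new HashMap<>();

    // Record A -> B and, for symmetry, also B -> A.
    public void add(String a, String b) {
        map.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        map.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    public Set<String> synonyms(String word) {
        return map.getOrDefault(word, Collections.emptySet());
    }

    public static void main(String[] args) {
        SymmetricSynonyms s = new SymmetricSynonyms();
        s.add("big", "large");
        s.add("large", "huge");
        System.out.println(s.synonyms("large")); // [big, huge]
        System.out.println(s.synonyms("big"));   // [large] -- no transitive "huge"
    }
}
```

A real lookup table built this way can be shared read-only across threads after loading, matching the once-per-process singleton usage recommended above.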
SynonymTokenFilter.java | Class | Injects additional tokens for synonyms of token terms fetched from the
underlying child stream; the child stream must deliver lowercase tokens
for synonyms to be found. |