| java.lang.Object > org.apache.lucene.analysis.Analyzer > org.apache.lucene.index.memory.PatternAnalyzer
Inner Class: final static class FastStringReader extends StringReader | |
Constructor Summary | |
public | PatternAnalyzer(Pattern pattern, boolean toLowerCase, Set stopWords) Constructs a new instance with the given parameters.
Parameters: pattern - a regular expression delimiting tokens. Parameters: toLowerCase - if true, returns tokens after applying String.toLowerCase(). Parameters: stopWords - if non-null, ignores all tokens that are contained in the given stop set (after previously having applied toLowerCase() if applicable). |
Method Summary | |
public boolean | equals(Object other) Indicates whether some other object is "equal to" this one.
Parameters: other - the reference object with which to compare. | public int | hashCode() Returns a hash code value for the object. | public TokenStream | tokenStream(String fieldName, String text) Creates a token stream that tokenizes the given string into token terms
(aka words). | public TokenStream | tokenStream(String fieldName, Reader reader) Creates a token stream that tokenizes all the text in the given Reader.
This implementation forwards to tokenStream(String, String) and is
less efficient than tokenStream(String, String). |
DEFAULT_ANALYZER | final public static PatternAnalyzer DEFAULT_ANALYZER | | A lower-casing word analyzer with English stop words (can be shared
freely across threads without harm); global per class loader.
|
EXTENDED_ANALYZER | final public static PatternAnalyzer EXTENDED_ANALYZER | | A lower-casing word analyzer with extended English stop words
(can be shared freely across threads without harm); global per class
loader. The stop words are borrowed from
http://thomas.loc.gov/home/stopwords.html; see also
http://thomas.loc.gov/home/all.about.inquery.html
|
NON_WORD_PATTERN | final public static Pattern NON_WORD_PATTERN | | "\\W+"; divides text at non-letters (NOT Character.isLetter(c))
|
WHITESPACE_PATTERN | final public static Pattern WHITESPACE_PATTERN | | "\\s+"; divides text at whitespace (Character.isWhitespace(c))
|
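The difference between the two delimiter patterns can be illustrated with plain java.util.regex, without going through the analyzer. This is only a sketch: the class name DelimiterPatternSketch and the sample sentence are hypothetical, and the input deliberately contains only letters, punctuation, and spaces, so it says nothing about how digits or underscores are handled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Lucene.
class DelimiterPatternSketch {
    // The same regular expressions quoted in the field descriptions above.
    static final Pattern NON_WORD = Pattern.compile("\\W+");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Splits the text wherever the delimiter pattern matches.
    static String[] split(Pattern delimiter, String text) {
        return delimiter.split(text);
    }

    public static void main(String[] args) {
        String text = "James said: don't stop-now";
        // Punctuation and spaces both act as delimiters.
        System.out.println(Arrays.toString(split(NON_WORD, text)));
        // prints [James, said, don, t, stop, now]
        // Only whitespace delimits, so punctuation stays attached to the tokens.
        System.out.println(Arrays.toString(split(WHITESPACE, text)));
        // prints [James, said:, don't, stop-now]
    }
}
```

Splitting at whitespace keeps punctuation inside tokens, which is usually undesirable for stop-word matching; the non-word pattern gives cleaner terms at the cost of breaking contractions and hyphenated words apart.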
PatternAnalyzer | public PatternAnalyzer(Pattern pattern, boolean toLowerCase, Set stopWords) | | Constructs a new instance with the given parameters.
Parameters: pattern - a regular expression delimiting tokens. Parameters: toLowerCase - if true, returns tokens after applying String.toLowerCase(). Parameters: stopWords - if non-null, ignores all tokens that are contained in the given stop set (after previously having applied toLowerCase() if applicable). For example, created via StopFilter.makeStopSet(String[]) and/or org.apache.lucene.analysis.WordlistLoader, as in WordlistLoader.getWordSet(new File("samples/fulltext/stopwords.txt")), or other stop word lists. |
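The way the three constructor parameters interact (split at the delimiter, then optionally lower-case, then drop stop words) can be sketched with the standard library alone. This is an illustration of the documented order of operations, not Lucene's implementation: the class PatternTokenizeSketch and its tokenize method are hypothetical, it returns a plain list rather than a TokenStream, and it uses Locale.ROOT for deterministic lower-casing, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for the analyzer's behaviour; not a Lucene class.
class PatternTokenizeSketch {
    static List<String> tokenize(String text, Pattern delimiter,
                                 boolean toLowerCase, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String piece : delimiter.split(text)) {
            if (piece.isEmpty()) {
                continue; // a leading delimiter yields an empty first piece
            }
            // Lower-casing happens before the stop-word check, as documented.
            String term = toLowerCase ? piece.toLowerCase(Locale.ROOT) : piece;
            if (stopWords != null && stopWords.contains(term)) {
                continue; // a null stop set disables filtering
            }
            tokens.add(term);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "a", "of");
        System.out.println(
            tokenize("The Art of Search", Pattern.compile("\\W+"), true, stop));
        // prints [art, search]
    }
}
```

Because filtering runs on the lower-cased term, a stop set of lower-case words also removes capitalized occurrences such as "The" when toLowerCase is true; with toLowerCase set to false, "The" would pass through unless the set contained it in that exact case.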
equals | public boolean equals(Object other) | | Indicates whether some other object is "equal to" this one.
Parameters: other - the reference object with which to compare. Returns: true if equal, false otherwise. |
hashCode | public int hashCode() | | Returns a hash code value for the object.
Returns: the hash code. |
tokenStream | public TokenStream tokenStream(String fieldName, String text) | | Creates a token stream that tokenizes the given string into token terms
(aka words).
Parameters: fieldName - the name of the field to tokenize (currently ignored). Parameters: text - the string to tokenize. Returns: a new token stream. |
tokenStream | public TokenStream tokenStream(String fieldName, Reader reader) | | Creates a token stream that tokenizes all the text in the given Reader.
This implementation forwards to tokenStream(String, String) and is
less efficient than tokenStream(String, String).
Parameters: fieldName - the name of the field to tokenize (currently ignored). Parameters: reader - the reader delivering the text. Returns: a new token stream. |
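The forwarding described for the Reader overload implies that the whole text is read into memory before tokenizing, which is where the extra cost comes from. A minimal sketch of that pattern follows; the class ReaderForwardingSketch, its readAll helper, and the whitespace-splitting stand-in tokenizer are all hypothetical and only mirror the shape of the two overloads.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration of a Reader overload that delegates to a String overload.
class ReaderForwardingSketch {
    // Drains the reader into a single String (the extra copy the String overload avoids).
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the String-based overload: splits at whitespace.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    // Stand-in for the Reader-based overload: buffers everything, then forwards.
    static String[] tokens(Reader reader) {
        return tokens(readAll(reader));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens(new StringReader("alpha beta\ngamma"))));
        // prints [alpha, beta, gamma]
    }
}
```

When the text is already available as a String, calling the String overload directly skips the buffering step entirely.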