| java.lang.Object org.apache.solr.analysis.PatternTokenizerFactory
PatternTokenizerFactory | public class PatternTokenizerFactory implements TokenizerFactory | | This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. It takes two arguments, "pattern" and "group":
"pattern" is the regular expression.
"group" selects which capture group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will
be equivalent to the output from:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input
but group=1, the output would be bbb and ccc (no ' marks).
author: ryan since: solr1.2 version: $Id:$ |
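The group semantics described above can be illustrated with plain java.util.regex, without any Solr classes. This is only a sketch: the class PatternGroupDemo and the helper tokens are hypothetical names, the helper returns strings rather than Token objects, and it is not the factory's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternGroupDemo {
    // Hypothetical helper (not part of Solr): mimics the documented meaning of
    // "pattern" and "group" using only java.util.regex.
    static List<String> tokens(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        if (group < 0) {
            // group = -1: treat the pattern as a delimiter, as String.split does.
            return Arrays.asList(pattern.split(input));
        }
        // group >= 0: each match contributes the text of the selected group.
        List<String> out = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            out.add(matcher.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokens("'([^']+)'", 0, input)); // ['bbb', 'ccc']
        System.out.println(tokens("'([^']+)'", 1, input)); // [bbb, ccc]
        System.out.println(tokens("\\s+", -1, input));     // [aaa, 'bbb', 'ccc']
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the first parenthesized group, so they are dropped. The last call assumes a whitespace pattern purely to show the split mode.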
group | protected int group | | |
create | public TokenStream create(Reader input) | | Split the input using the configured pattern.
|
group | public static List<Token> group(Matcher matcher, String input, int group) | | Create tokens from the matches found by a matcher.
|
split | public static List<Token> split(Matcher matcher, String input) | | This behaves just like String.split(), but returns a list of Tokens
rather than an array of strings.
|
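Since split is documented as behaving like String.split(), the edge cases of the standard Java split are worth keeping in mind. The sketch below shows only the java.util.regex behavior; SplitSemanticsDemo and splitToList are hypothetical names, and whether the Solr method matches these edge cases exactly is an assumption to verify against the source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitSemanticsDemo {
    // Hypothetical helper: wraps Pattern.split so the result is a List.
    static List<String> splitToList(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        // Trailing empty strings are discarded by the standard split.
        System.out.println(splitToList(",", "a,b,,")); // [a, b]
        // A leading delimiter yields an empty first element.
        System.out.println(splitToList(",", ",a").size()); // 2
    }
}
```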
|
|