| org.apache.lucene.analysis.CharTokenizer org.apache.lucene.analysis.ru.RussianLetterTokenizer
RussianLetterTokenizer | public class RussianLetterTokenizer extends CharTokenizer | | A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters
in a given "russian charset". The problem with LetterTokenizer is that it relies on the Character.isLetter() method,
which misses letters from 8-bit encodings like CP1251 and KOI8 when their bytes are read as ISO-8859-1 characters
(the well-known problem with the 0xD7 and 0xF7 chars, which then become × and ÷ and are not classified as letters)
version: $Id: RussianLetterTokenizer.java 564236 2007-08-09 15:21:19Z gsingers $ |
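The 0xD7/0xF7 problem can be reproduced without Lucene. The sketch below decodes each byte with ISO-8859-1, windows-1251 (CP1251) and KOI8-R and asks Character.isLetter() about the result. The class name LegacyCyrillicLetters is hypothetical, and it assumes the windows-1251 and KOI8-R charsets are installed, which is the case in standard JDKs but is not guaranteed by the Java platform specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LegacyCyrillicLetters {

    // Decode a single byte (0x00..0xFF) with the given charset and return the resulting char.
    public static char decodeByte(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs).charAt(0);
    }

    public static void main(String[] args) {
        // Assumption: windows-1251 and KOI8-R are available (true for standard JDKs).
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,
            Charset.forName("windows-1251"),
            Charset.forName("KOI8-R")
        };
        for (int b : new int[] { 0xD7, 0xF7 }) {
            for (Charset cs : charsets) {
                char c = decodeByte(b, cs);
                System.out.printf("0x%02X as %-12s -> U+%04X isLetter=%b%n",
                        b, cs.name(), (int) c, Character.isLetter(c));
            }
        }
    }
}
```

Decoded as ISO-8859-1, the two bytes become U+00D7 (×) and U+00F7 (÷), for which isLetter() is false; decoded as windows-1251 they are Ч and ч, and as KOI8-R they are в and В, all of which are letters.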
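RussianLetterTokenizer | public RussianLetterTokenizer(Reader in, char[] charset) | | Creates a tokenizer that reads text from in and treats the characters in charset as letters, in addition to those accepted by Character.isLetter(). |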
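The token rule described above (a char is part of a token if Character.isLetter() accepts it or it appears in the supplied charset) can be sketched as a standalone class. This is not Lucene's API: CharsetLetterSplitter and its methods are hypothetical, written only to illustrate the idea that a CharTokenizer subclass expresses through its isTokenChar(char) check. Tokens are taken to be maximal runs of accepted chars, with no lowercasing or length limit.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CharsetLetterSplitter {
    private final char[] charset;

    public CharsetLetterSplitter(char[] charset) {
        this.charset = charset.clone();
    }

    // A char belongs to a token if Java considers it a letter or it is listed in the charset.
    public boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char extra : charset) {
            if (c == extra) {
                return true;
            }
        }
        return false;
    }

    // Split the input into maximal runs of token chars; all other chars act as separators.
    public List<String> tokenize(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Convenience overload for in-memory text; wraps the checked IOException.
    public List<String> tokenize(String text) {
        try {
            return tokenize(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "abc\u00D7def 42 ghi";
        // Without an extra charset, U+00D7 splits "abc" from "def".
        System.out.println(new CharsetLetterSplitter(new char[0]).tokenize(text));
        // Listing U+00D7 and U+00F7 in the charset keeps them inside tokens.
        System.out.println(new CharsetLetterSplitter(new char[] { '\u00D7', '\u00F7' }).tokenize(text));
    }
}
```

With an empty charset the example text yields [abc, def, ghi]; with U+00D7 and U+00F7 in the charset it yields [abc×def, ghi]. Digits are dropped in both cases because neither rule accepts them.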