| org.apache.lucene.analysis.ru.RussianLetterTokenizer
RussianLetterTokenizer | public class RussianLetterTokenizer extends CharTokenizer | | A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer (the class itself derives from CharTokenizer) by additionally looking up letters
in a given "Russian charset". The problem with LetterTokenizer is that it relies on the Character.isLetter() method,
which does not detect some letters in encodings like CP1252 and KOI8
(the well-known problems with the 0xD7 and 0xF7 chars).
author: Boris Okner, b.okner@rogers.com
version: $Id: RussianLetterTokenizer.java,v 1.1 2005/06/02 01:35:59 jfendler Exp $ |
RussianLetterTokenizer | public RussianLetterTokenizer(Reader in, char[] charset) | | |
|
|
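The lookup described above can be illustrated with a minimal, Lucene-independent sketch. It is not the library's source: the class `RussianLetterSketch` and the helper `tokenize` are hypothetical names introduced here, and the rule "a char is a token char if Character.isLetter() accepts it or if it appears in the supplied charset" is an assumption drawn from the class description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class RussianLetterSketch {

    // Assumed rule: accept anything Character.isLetter() accepts,
    // plus any char listed in the caller-supplied charset.
    static boolean isTokenChar(char c, char[] charset) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char known : charset) {
            if (c == known) {
                return true;
            }
        }
        return false;
    }

    // Splits text into maximal runs of token chars.
    static List<String> tokenize(String text, char[] charset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c, charset)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+00D7 and U+00F7 are not letters according to Character.isLetter().
        char[] charset = {(char) 0xD7, (char) 0xF7};
        String text = "ab" + (char) 0xD7 + "cd ef";
        System.out.println(tokenize(text, new char[0])); // isLetter() only
        System.out.println(tokenize(text, charset));     // with charset lookup
    }
}
```

With an empty charset the sample text is split at the 0xD7 char, because Character.isLetter() rejects it; with the charset supplied, the same char is kept inside the token.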