org.apache.lucene.analysis.cn |
Java Source File Name | Type | Comment |
ChineseAnalyzer.java | Class | Title: ChineseAnalyzer
Description:
Subclass of org.apache.lucene.analysis.Analyzer,
built from a ChineseTokenizer and filtered with ChineseFilter. |
ChineseFilter.java | Class | Title: ChineseFilter
Description: Filters tokens against a stop-word table.
Rules: No digits are allowed.
English words/tokens must be longer than 1 character.
Each Chinese character is treated as one Chinese word.
TO DO:
1. |
ChineseTokenizer.java | Class | Title: ChineseTokenizer
Description: Extracts tokens from the stream using Character.getType().
Rule: Each Chinese character is a single token.
Copyright: Copyright (c) 2001
Company:
The difference between the ChineseTokenizer and the
CJKTokenizer is that they have different
token parsing logic.
Let me use an example. |
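The ChineseAnalyzer chain (a ChineseTokenizer followed by a ChineseFilter) can be roughly approximated in plain Java as shown below. This is a minimal sketch and not Lucene's code. The class `ChineseAnalysisSketch`, its methods, and the three-word stop list are hypothetical, and the sketch operates on a `String` rather than a Lucene `TokenStream`. It assumes that runs of Latin letters and digits are grouped into one lowercase token. Each character that `Character.getType()` reports as `OTHER_LETTER`, which is the category for CJK ideographs, becomes its own token. The filter then applies the rules listed for ChineseFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, Lucene-free sketch of a tokenizer + filter chain in the
 * spirit of ChineseTokenizer/ChineseFilter. Names are illustrative only.
 */
public class ChineseAnalysisSketch {

    // Assumed stop-word table; the real ChineseFilter ships its own list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the", "of"));

    /** Tokenizer step: runs of Latin letters/digits form one lowercase token;
     *  every OTHER_LETTER character (e.g. a CJK ideograph) is its own token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (Character.getType(c)) {
                case Character.DECIMAL_DIGIT_NUMBER:
                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:
                    run.append(Character.toLowerCase(c)); // extend the current run
                    break;
                case Character.OTHER_LETTER:
                    flush(run, tokens);                   // end any Latin/digit run
                    tokens.add(String.valueOf(c));        // one character = one token
                    break;
                default:                                  // whitespace, punctuation, ...
                    flush(run, tokens);
            }
        }
        flush(run, tokens);
        return tokens;
    }

    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    /** Filter step: drop stop words and tokens whose first character is a digit;
     *  keep Latin tokens only if longer than one character; keep CJK characters. */
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (STOP_WORDS.contains(t)) continue;
            int type = Character.getType(t.charAt(0));
            if (type == Character.LOWERCASE_LETTER || type == Character.UPPERCASE_LETTER) {
                if (t.length() > 1) kept.add(t);
            } else if (type == Character.OTHER_LETTER) {
                kept.add(t);
            }
            // anything else (e.g. a token starting with a digit) is discarded
        }
        return kept;
    }

    /** Analyzer = tokenizer followed by filter. */
    static List<String> analyze(String text) {
        return filter(tokenize(text));
    }

    public static void main(String[] args) {
        // Chinese text written as Unicode escapes so the file compiles under any source encoding
        System.out.println(analyze("Lucene 2001 \u4e2d\u6587\u5206\u8bcd a test"));
    }
}
```

For the input in `main`, the filter drops the digit token `2001`, the one-letter token `a`, and no stop words. The remaining tokens are `lucene`, the four Chinese characters (one token each), and `test`.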
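The sketch below gives a rough illustration of how the two parsing strategies differ. It compares one token per character, the ChineseTokenizer rule stated above, with overlapping two-character tokens, which is how CJKTokenizer is generally documented to split runs of CJK text. This is a simplified approximation and not the implementation of either class. The class `UnigramVsBigram` and its methods are hypothetical. Each input is assumed to be a single run of CJK characters from the Basic Multilingual Plane, so no surrogate pairs are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of unigram vs. overlapping-bigram tokenization
 *  for a single run of CJK characters (BMP only, no surrogate pairs). */
public class UnigramVsBigram {

    // ChineseTokenizer-style rule: each CJK character is its own token.
    static List<String> unigrams(String run) {
        List<String> out = new ArrayList<>();
        for (char c : run.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }

    // CJKTokenizer-style approximation: overlapping two-character tokens;
    // a run of exactly one character is emitted unchanged.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() == 1) {
            out.add(run);
            return out;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        String run = "\u4e2d\u6587\u5206\u8bcd"; // four CJK characters, written as escapes
        System.out.println("unigram tokens: " + unigrams(run).size() + " " + unigrams(run));
        System.out.println("bigram tokens:  " + bigrams(run).size() + " " + bigrams(run));
    }
}
```

For a four-character run C1C2C3C4, the unigram version yields C1, C2, C3, C4, and the bigram version yields C1C2, C2C3, C3C4. Bigrams produce many more distinct terms, while unigrams keep the term set small but can match characters that belong to different words.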