java.lang.Object
  org.apache.lucene.analysis.WordlistLoader
WordlistLoader
public class WordlistLoader
Loader for text files that represent a list of stopwords.
Version: $Id: WordlistLoader.java 564236 2007-08-09 15:21:19Z gsingers $
Method Summary
public static HashMap | getStemDict(File wordstemfile) | Reads a stem dictionary.
public static HashSet | getWordSet(File wordfile) | Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
public static HashSet | getWordSet(Reader reader) | Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
getStemDict
public static HashMap getStemDict(File wordstemfile) throws IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Returns: a stem dictionary that overrules the stemming algorithm
Throws: IOException
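The word\tstem line format can be illustrated with a minimal parsing sketch. It is not Lucene's implementation: the class `StemDictSketch` and its `readStemDict` method are hypothetical, the sketch reads from a `Reader` instead of a `File` so it can be tried without a file on disk, and it assumes that lines without a tab are skipped (the documentation does not say how malformed lines are handled).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class StemDictSketch {
    // Hypothetical helper: parses "word<TAB>stem" lines into a word -> stem map.
    public static Map<String, String> readStemDict(Reader in) throws IOException {
        Map<String, String> dict = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Split on the first tab only, so the stem may itself contain tabs.
                String[] parts = line.split("\t", 2);
                // Assumption: lines without a tab are ignored rather than rejected.
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]);
                }
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> dict =
            readStemDict(new StringReader("running\trun\nmice\tmouse\n"));
        // An analyzer would consult this map before falling back to the stemmer.
        System.out.println(dict.get("mice")); // prints "mouse"
    }
}
```

A map lookup that takes precedence over the algorithmic stemmer is what "overrules the stemming algorithm" means in practice: irregular forms such as "mice" get a fixed stem.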
getWordSet
public static HashSet getWordSet(File wordfile) throws IOException
Loads a text file and adds every line as an entry to a HashSet (omitting
leading and trailing whitespace). Every line of the file should contain only
one word. The words need to be in lowercase if you make use of an
Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
Parameters: wordfile - File containing the wordlist
Returns: A HashSet with the file's words
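A minimal sketch of the documented behaviour of getWordSet(File): one entry per line, trimmed, collected into a set. `WordFileSketch` and `loadWordSet` are hypothetical names, not part of Lucene. The sketch reads with `java.io.FileReader`, which decodes using the JVM's default charset, so a word list in a different encoding would need an explicitly configured reader.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class WordFileSketch {
    // Hypothetical helper: one word per line, leading/trailing whitespace trimmed.
    public static Set<String> loadWordSet(File wordfile) throws IOException {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = new BufferedReader(new FileReader(wordfile))) {
            String line;
            while ((line = br.readLine()) != null) {
                words.add(line.trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("stopwords", ".txt");
        f.deleteOnExit();
        try (FileWriter w = new FileWriter(f)) {
            w.write("the\n  and \nof\n");
        }
        Set<String> stop = loadWordSet(f);
        System.out.println(stop.contains("and")); // prints "true"
    }
}
```

Following the literal "every line" wording, this sketch adds a blank line as the empty string rather than skipping it.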
getWordSet
public static HashSet getWordSet(Reader reader) throws IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting
leading and trailing whitespace). Every line of the Reader should contain only
one word. The words need to be in lowercase if you make use of an
Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
Parameters: reader - Reader containing the wordlist
Returns: A HashSet with the reader's words
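Why the entries must be lowercase can be seen with a small plain-Java simulation. `StopwordCaseSketch`, its `loadWordSet(Reader)` helper and its `filter` method are hypothetical: `filter` stands in for an analyzer chain that lowercases each token (as LowerCaseFilter does) and then drops tokens found in the stopword set. It models the idea only and does not use Lucene's StopFilter API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordCaseSketch {
    // Hypothetical loader: every line, trimmed, becomes one set entry.
    public static Set<String> loadWordSet(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                words.add(line.trim());
            }
        }
        return words;
    }

    // Hypothetical stand-in for "lowercase, then remove stopwords".
    public static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        List<String> tokens = Arrays.asList("The", "Quick", "Fox");
        Set<String> lowerList = loadWordSet(new StringReader("the\nquick\n"));
        Set<String> mixedList = loadWordSet(new StringReader("The\nQuick\n"));
        System.out.println(filter(tokens, lowerList)); // prints "[fox]"
        System.out.println(filter(tokens, mixedList)); // prints "[the, quick, fox]"
    }
}
```

With the mixed-case list nothing is removed, because the lowercased tokens "the" and "quick" never equal the stored entries "The" and "Quick".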