| java.lang.Object org.apache.lucene.search.spell.SpellChecker
SpellChecker | public class SpellChecker | |
Main spell checker class
(initially inspired by David Spencer's code).
Example Usage:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
version: 1.0 |
Constructor Summary | |
public | SpellChecker(Directory spellIndex) Use the given directory as a spell checker index. |
Method Summary | |
public void | clearIndex() Removes all terms from the spell check index. |
public boolean | exist(String word) Check whether the word exists in the index. |
protected void | finalize() Closes the internal IndexReader. |
public void | indexDictionary(Dictionary dict) |
public void | setAccuracy(float minScore) |
public void | setSpellIndex(Directory spellIndex) Use a different index as the spell checker index, or re-open the existing index if spellIndex is the same value as given in the constructor. |
public String[] | suggestSimilar(String word, int numSug) Suggest similar words. |
public String[] | suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular) Suggest similar words (optionally restricted to a field of an index). |
F_WORD | final public static String F_WORD | | Field name for each word in the n-gram index.
|
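The layout of the spell index is not spelled out here. As a rough illustration of what "n-grammed terms" means, the sketch below splits a word into overlapping character n-grams. The class `NGramSketch` and its method are hypothetical helpers written for this example, not part of the Lucene API, and the gram size used by the real index is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: splits a word into character n-grams. */
final class NGramSketch {

    /** Returns every substring of length n, left to right; empty if the word is shorter than n. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [luc, uce, cen, ene]
        System.out.println(ngrams("lucene", 3));
    }
}
```

A misspelt word still shares most of its n-grams with the intended word, which is what makes an n-gram index usable for fetching candidates.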
SpellChecker | public SpellChecker(Directory spellIndex) throws IOException | | Use the given directory as a spell checker index. The directory
is created if it doesn't exist yet.
Parameters: spellIndex - the directory to use as the spell checker index
throws: IOException |
exist | public boolean exist(String word) throws IOException | | Check whether the word exists in the index.
Parameters: word - the word to look up
throws: IOException
Returns: true iff the word exists in the index |
finalize | protected void finalize() throws Throwable | | Closes the internal IndexReader.
|
setAccuracy | public void setAccuracy(float minScore) | | Sets the accuracy, i.e. the minimum score a suggestion must reach: 0 < minScore < 1; default 0.5.
|
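How a threshold such as minScore can prune candidates is sketched below. This is a minimal illustration under stated assumptions, not the library's implementation: the class `MinScoreSketch` is hypothetical, the score is assumed to be 1 minus the Levenshtein distance divided by the length of the shorter word, and candidates scoring below minScore are assumed to be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Lucene code: filters candidates by a minimum similarity score. */
final class MinScoreSketch {

    /** Classic Levenshtein distance (insertions, deletions, substitutions), two-row version. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the shorter word. */
    static float score(String word, String candidate) {
        int shorter = Math.min(word.length(), candidate.length());
        if (shorter == 0) {
            return word.equals(candidate) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(word, candidate) / shorter;
    }

    /** Keeps only the candidates whose score reaches minScore (assumed cut-off rule). */
    static List<String> filter(String word, List<String> candidates, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (score(word, candidate) >= minScore) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("spelling", "spring", "sing");
        System.out.println(filter("speling", candidates, 0.5f)); // [spelling, spring]
        System.out.println(filter("speling", candidates, 0.7f)); // [spelling]
    }
}
```

Under these assumptions, raising the threshold trades recall for precision: at 0.7 only the one-edit candidate survives.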
setSpellIndex | public void setSpellIndex(Directory spellIndex) throws IOException | | Use a different index as the spell checker index, or re-open
the existing index if spellIndex is the same value
as given in the constructor.
Parameters: spellIndex - the directory to use as the spell checker index
throws: IOException |
suggestSimilar | public String[] suggestSimilar(String word, int numSug) throws IOException | | Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms
is not the same as the edit distance strategy used to pick the best
matching spell-checked word from the hits that Lucene found, so one usually has
to request several suggestions (numSug greater than 1) in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one.
Thus, you should set this value to at least 5 for a good suggestion.
Parameters: word - the word you want a spell check done on
Parameters: numSug - the number of suggested words
throws: IOException
Returns: String[] - the suggested words |
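The mismatch between the retrieval ranking and the edit distance ranking can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the library's algorithm: shared-bigram count stands in for the Lucene similarity, `RerankSketch` and `poolSize` are hypothetical names, and the size of the candidate pool is assumed to grow with numSug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not Lucene code: fetch by n-gram overlap, then rerank by edit distance. */
final class RerankSketch {

    static Set<String> bigrams(String w) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= w.length(); i++) {
            grams.add(w.substring(i, i + 2));
        }
        return grams;
    }

    /** Number of distinct bigrams shared by both words (stand-in for the retrieval score). */
    static int overlap(String a, String b) {
        Set<String> shared = bigrams(a);
        shared.retainAll(bigrams(b));
        return shared.size();
    }

    /** Classic Levenshtein distance, two-row version. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Takes the poolSize best candidates by overlap, then orders that pool by edit distance. */
    static List<String> suggest(String word, List<String> dictionary, int poolSize) {
        List<String> ranked = new ArrayList<>(dictionary);
        ranked.sort(Comparator.comparingInt((String c) -> overlap(word, c)).reversed());
        List<String> pool = new ArrayList<>(ranked.subList(0, Math.min(poolSize, ranked.size())));
        pool.sort(Comparator.comparingInt((String c) -> editDistance(word, c)));
        return pool;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("house", "horse", "because");
        System.out.println(suggest("hause", dictionary, 1)); // [because]
        System.out.println(suggest("hause", dictionary, 3)); // [house, horse, because]
    }
}
```

With a pool of one, the candidate sharing the most bigrams ("because", three edits away) is returned; only a larger pool lets the one-edit candidate "house" surface.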
suggestSimilar | public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular) throws IOException | | Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms
is not the same as the edit distance strategy used to pick the best
matching spell-checked word from the hits that Lucene found, so one usually has
to request several suggestions (numSug greater than 1) in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one.
Thus, you should set this value to at least 5 for a good suggestion.
Parameters: word - the word you want a spell check done on
Parameters: numSug - the number of suggested words
Parameters: ir - the IndexReader of the user index (can be null; see the field parameter)
Parameters: field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
Parameters: morePopular - return only the suggested words that are more frequent than the searched word (only in restricted mode, i.e. ir != null and field != null)
throws: IOException
Returns: String[] - the list of suggested words, sorted by two criteria: first the edit distance, then (only in restricted mode) the popularity of the suggested word in the field of the user index |
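The two-criteria ordering and the morePopular filter can be sketched as follows. This is an illustration only: `PopularitySketch`, `Suggestion`, and `wordFreq` are hypothetical names, the scores and frequencies are invented, and treating "more frequent" as strictly greater is an assumption about how ties are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Lucene code: rank by score, break ties by field frequency. */
final class PopularitySketch {

    /** Hypothetical value type: a candidate with its similarity score and field frequency. */
    static final class Suggestion {
        final String word;
        final float score; // higher means closer in edit distance
        final int freq;    // occurrences in the field of the user index

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    static List<String> rank(List<Suggestion> candidates, int wordFreq, boolean morePopular, int numSug) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion s : candidates) {
            // Assumption: "more frequent" means strictly greater than the searched word's frequency.
            if (morePopular && s.freq <= wordFreq) {
                continue;
            }
            kept.add(s);
        }
        // First criterion: score, best first. Second criterion: frequency, most popular first.
        kept.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(b.freq, a.freq);
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(numSug, kept.size()); i++) {
            out.add(kept.get(i).word);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> candidates = Arrays.asList(
                new Suggestion("house", 0.75f, 40),
                new Suggestion("mouse", 0.75f, 90),
                new Suggestion("horse", 0.5f, 200));
        System.out.println(rank(candidates, 50, false, 3)); // [mouse, house, horse]
        System.out.println(rank(candidates, 50, true, 3));  // [mouse, horse]
    }
}
```

Frequency only breaks ties between equally scored words, so a very common but distant word ("horse" here) never overtakes a closer one.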