| Loads the WordNet prolog file wn_s.pl
into a thread-safe main-memory hash map that can be used for fast
high-frequency lookups of synonyms for any given (lowercase) word string.
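As a rough illustration of the input being loaded, the following sketch parses a single fact line, assuming the `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).` layout described in the prologdb man page. The class name `PrologLineSketch`, the `parse` helper, and the sample line are hypothetical and are not part of SynonymMap.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the parser that SynonymMap actually uses.
final class PrologLineSketch {
    // Assumed layout: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    private static final Pattern S_FACT =
        Pattern.compile("^s\\((\\d+),(\\d+),'(.*)',([a-z]),(\\d+),(\\d+)\\)\\.$");

    /** Returns {synsetId, lowercased word}, or null if the line is not such a fact. */
    static String[] parse(String line) {
        Matcher m = S_FACT.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        // Prolog escapes a single quote inside a quoted atom by doubling it.
        String word = m.group(3).replace("''", "'").toLowerCase(Locale.ROOT);
        return new String[] { m.group(1), word };
    }

    public static void main(String[] args) {
        // Illustrative input line, not copied from a specific WordNet release.
        String[] parsed = parse("s(100001740,1,'Entity',n,1,11).");
        System.out.println(parsed[0] + " -> " + parsed[1]);
    }
}
```

Words that share a synset id are the ones that end up as synonyms of each other.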
The synonym relation is symmetric: if B is a synonym for A (A -> B), then A is also a synonym for B (B -> A).
It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
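The two properties follow naturally if synonyms are derived from shared synsets. The sketch below is a minimal illustration under that assumption; the class `SynsetSketch`, the `build` method, and the two toy synsets are hypothetical and do not reproduce SynonymMap's internals or real WordNet data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not the SynonymMap implementation.
final class SynsetSketch {
    /** Maps each word to the sorted set of other words that share a synset with it. */
    static Map<String, SortedSet<String>> build(List<List<String>> synsets) {
        Map<String, SortedSet<String>> map = new HashMap<>();
        for (List<String> synset : synsets) {
            for (String a : synset) {
                for (String b : synset) {
                    if (!a.equals(b)) {
                        // Both (a, b) and (b, a) are visited, so the relation is symmetric.
                        map.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
                    }
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Toy synsets: "forest" belongs to both, "woods" and "woodland" to one each.
        List<List<String>> synsets = Arrays.asList(
            Arrays.asList("woods", "forest", "wood"),
            Arrays.asList("forest", "timberland", "woodland"));
        Map<String, SortedSet<String>> map = build(synsets);
        System.out.println("woods:" + map.get("woods"));
        System.out.println("forest:" + map.get("forest"));
        System.out.println("woodland:" + map.get("woodland"));
    }
}
```

In this toy data, woods -> forest and forest -> woodland both hold, but woods -> woodland does not, because the two words never share a synset.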
Loading typically takes about 1.5 seconds, so it should be done only once per
(server) program execution, for example via a singleton. Once loaded, a
synonym lookup via
SynonymMap.getSynonyms(String) takes constant time, O(1).
A loaded default synonym map consumes about 10 MB of main memory.
An instance is immutable, hence thread-safe.
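One common way to load once and share the result is the initialization-on-demand holder idiom. The sketch below is only an illustration: `LoadOnceSketch`, `load()`, and the `LOADS` counter are hypothetical, and the in-memory toy table stands in for the real load, which would presumably be `new SynonymMap(new FileInputStream(...))` as in the usage example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of load-once sharing; the toy table replaces the real WordNet load.
final class LoadOnceSketch {
    /** Counts how often the expensive load ran; present only for demonstration. */
    static final AtomicInteger LOADS = new AtomicInteger();

    private static Map<String, String[]> load() {
        LOADS.incrementAndGet();
        Map<String, String[]> table = new HashMap<>();
        table.put("woods", new String[] { "forest", "wood" });
        // Never mutated after construction, so it can be shared across threads.
        return Collections.unmodifiableMap(table);
    }

    // The JVM initializes Holder lazily and thread-safely, so load() runs at most once.
    private static final class Holder {
        static final Map<String, String[]> INSTANCE = load();
    }

    /** Returns a copy of the synonyms for the word, or an empty array if none are known. */
    static String[] getSynonyms(String word) {
        String[] synonyms = Holder.INSTANCE.get(word);
        return synonyms == null ? new String[0] : synonyms.clone();
    }

    public static void main(String[] args) {
        System.out.println(getSynonyms("woods").length);
        System.out.println(getSynonyms("xxxx").length);
        System.out.println("loads: " + LOADS.get());
    }
}
```

Returning a copy of the array keeps callers from modifying the shared table, which preserves the immutability that makes the shared instance safe.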
This implementation borrows some ideas from the Lucene Syns2Index demo that
Dave Spencer originally contributed to Lucene. Dave's approach
involved a persistent Lucene index, which is suitable for occasional
lookups or very large synonym tables but is considered unsuitable for
high-frequency lookups of medium-sized synonym tables.
Example Usage:
String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx"};
SynonymMap map = new SynonymMap(new FileInputStream("samples/fulltext/wn_s.pl"));
for (int i = 0; i < words.length; i++) {
String[] synonyms = map.getSynonyms(words[i]);
System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
}
Example output:
hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
woods:[forest, wood]
forest:[afforest, timber, timberland, wood, woodland, woods]
wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]
xxxx:[]
author: whoschek.AT.lbl.DOT.gov
See Also: prologdb man page (http://www.cogsci.princeton.edu/~wn/man/prologdb.5WN.html), Dave's synonym demo site |