java.lang.Object
    bdd.search.spider.WordExtractor

All known subclasses: bdd.search.spider.TextWordExtractor, bdd.search.spider.HTMLWordExtractor

public class WordExtractor
Written by Tim Macinta, 1997.
Distributed under the GNU Public License
(a copy of which is enclosed with the source).
A WordExtractor should be able to extract the words from a given file.
This class is meant to be subclassed by classes that understand
particular document types, such as TextWordExtractor and HTMLWordExtractor.
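The subclassing design can be sketched as follows. This is a minimal illustration, not the original code: the base class `BaseExtractorSketch`, its `extract(String)` hook, the `ArrayList` storage, and the tokenizing rule in the hypothetical `PlainTextExtractorSketch` (runs of letters and digits) are all assumptions. The documented WordExtractor is a concrete class; making the sketch's base abstract is a simplification, and the real TextWordExtractor and HTMLWordExtractor may parse their input differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WordExtractor: stores words in document order.
abstract class BaseExtractorSketch {
    protected final List<String> words = new ArrayList<>();

    // Mirrors the documented addWord(String): record one word as it is found.
    public void addWord(String word) {
        words.add(word);
    }

    // Total number of words recorded, repeats included.
    public int countWords() {
        return words.size();
    }

    // Hypothetical hook: each subclass parses its own document type.
    public abstract void extract(String document);
}

// Hypothetical plain-text subclass: any run of letters or digits is a word.
class PlainTextExtractorSketch extends BaseExtractorSketch {
    @Override
    public void extract(String document) {
        for (String token : document.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty token
                addWord(token);
            }
        }
    }
}
```

Calling `extract("Once upon a time, the giant tomato")` on a `PlainTextExtractorSketch` records seven words. An HTML subclass would follow the same pattern, skipping markup before calling `addWord`.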
Method Summary
public void | addWord(String word) | Used internally to add each word to the list of words as it is found in the document.
public int | countOccurances(String word) | Returns the number of times that "word" appears in the document.
public int | countWords() | Returns the number of words in this document.
public int | firstOccurance(String word) | Returns the index of "word".
public Enumeration | getWords() | Returns an Enumeration that yields each word in the document in no particular order.
current_index | int current_index

WordExtractor | public WordExtractor()
addWord | public void addWord(String word)
Used internally to add each word to the list of words as it is found
in the document.
countOccurances | public int countOccurances(String word)
Returns the number of times that "word" appears in the document.
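Occurrences can be counted either by scanning the stored words on every call or by tallying them as they arrive. A minimal sketch of the second approach, assuming a hypothetical `WordCountSketch` class whose `addWord` updates a `HashMap`. The original class's storage is not documented, the sketch is case-sensitive (whether the original normalizes case is not stated), and it assumes `countWords()` includes repeated words.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tally each word as addWord() receives it,
// so countOccurances() is a single map lookup instead of a scan.
class WordCountSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    public void addWord(String word) {
        counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing tally
        total++;
    }

    // Spelling kept from the documented method name.
    public int countOccurances(String word) {
        return counts.getOrDefault(word, 0); // absent words count as 0
    }

    // Assumption: repeated words are counted each time they appear.
    public int countWords() {
        return total;
    }
}
```

After adding the words of "the cat saw the dog", `countOccurances("the")` is 2, `countOccurances("bird")` is 0, and `countWords()` is 5. The trade-off is extra memory for the map in exchange for constant-time lookups.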
countWords | public int countWords()
Returns the number of words in this document.
firstOccurance | public int firstOccurance(String word)
Returns the index of "word". The index is determined by counting
the words in the document until the first occurrence of "word" is
found. For instance, firstOccurance("the") would return 5 if the
document began "Once upon a time the giant tomato of...".
Returns -1 if the word is not in the document.
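The documented example implies that positions are counted from 1: "the" is the fifth word of "Once upon a time the giant tomato of...", and the method returns 5. A minimal sketch under that assumption, using a hypothetical `FirstOccurrenceSketch` class that keeps words in document order in a `List`; this is not the original implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: words are kept in document order so that the
// position of the first match can be reported.
class FirstOccurrenceSketch {
    private final List<String> words = new ArrayList<>();

    public void addWord(String word) {
        words.add(word);
    }

    // Returns the 1-based position of the first occurrence of word,
    // matching the documented example, or -1 if it never appears.
    public int firstOccurance(String word) {
        int i = words.indexOf(word); // 0-based index, or -1 if absent
        return i < 0 ? -1 : i + 1;
    }
}
```

Feeding in "Once upon a time the giant tomato of" word by word, `firstOccurance("the")` returns 5 and `firstOccurance("dragon")` returns -1. Because `List.indexOf` already returns -1 for a missing element, only found positions need shifting by one.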
getWords | public Enumeration getWords()
Returns an Enumeration that yields each word in the document in
no particular order. A word is returned at most once, regardless of
how many times it appears in the document. The Enumeration
returns a String for each call to nextElement().
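A minimal sketch of the getWords() contract, assuming a hypothetical `DistinctWordsSketch` class that collects words into a `HashSet` and wraps it with `Collections.enumeration`. The documented signature returns a raw `Enumeration`, since the class predates Java generics; the sketch uses `Enumeration<String>` instead.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a Set drops duplicates, and HashSet makes no
// ordering promise, which matches "no particular order".
class DistinctWordsSketch {
    private final Set<String> distinct = new HashSet<>();

    public void addWord(String word) {
        distinct.add(word); // adding an existing word has no effect
    }

    // Each distinct word is returned at most once; nextElement() yields a String.
    public Enumeration<String> getWords() {
        return Collections.enumeration(distinct);
    }
}
```

For the words of "to be or not to be", iterating with `hasMoreElements()` and `nextElement()` yields the four strings "to", "be", "or", and "not", in an unspecified order.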