Method Summary |
|
public String | current() Returns the current word in the text. |
public String | currentNGram(int n) Returns the current word N-gram from the input. |
public String | currentSegment() Returns the current text segment from the input. |
public String | currentWordGram(int n) Returns the current word N-gram from the input. |
public String | getText() Returns the text associated with this DefaultWordFinder. |
public boolean | hasNext() Tests if there are more words available from the text. |
protected int | ignore(int index, char startIgnore) Ignores all characters from the text after the first occurrence of a given character.
Parameters: index - A starting index for the text from where characters should be ignored. Parameters: startIgnore - The character that marks the beginning of the sequence to be ignored. |
protected int | ignore(int index, char startIgnore, char endIgnore) Ignores all characters from the text between the first occurrence of a given character
and the next occurrence of another given character.
Parameters: index - A starting index for the text from where characters should be ignored. Parameters: startIgnore - The character that marks the beginning of the sequence to be ignored. Parameters: endIgnore - The character that marks the end of the sequence to be ignored. |
protected int | ignore(int index, Character startIgnore, Character endIgnore) Ignores all characters from the text between the first occurrence of a given character
and the next occurrence of another given character.
Parameters: index - A starting index for the text from where characters should be ignored. Parameters: startIgnore - The character that marks the beginning of the sequence to be ignored. Parameters: endIgnore - The character that marks the end of the sequence to be ignored, or null if all the following characters in the text are to be ignored. |
protected int | ignore(int index, String startIgnore, String endIgnore) Ignores all characters from the text between the first occurrence of a given String
and the next occurrence of another given String.
Parameters: index - A starting index for the text from where characters should be ignored. Parameters: startIgnore - The String that marks the beginning of the sequence to be ignored. Parameters: endIgnore - The String that marks the end of the sequence to be ignored. |
protected static boolean | isWordChar(String text, int posn) Checks if the character at a given position in a String is part of a word.
Special characters such as '.' or '-' are considered alphanumeric or not depending
on the surrounding characters. |
protected static boolean | isWordChar(char c) Checks if a given character is alphanumeric.
Parameters: c - The char to check. |
public String | lookAhead() Returns the next word without advancing the tokenizer, checking whether the character
separating the two words is an empty space. |
public String | next() Scans the text from the end of the last word and returns a
String corresponding to the next word. |
public String | nextSegment() Returns the next text segment from the input. |
public void | replace(String newWord) Replaces the current word in the text. |
public void | replaceBigram(String newBigram) Replaces the current bigram (the current word and the next word, as returned by lookAhead()) in
the text. |
public void | replaceSegment(String newSegment) Replaces the current text segment. |
public void | setText(String newText) Changes the text associated with this DefaultWordFinder. |
public static String[] | splitNGrams(String text, int n) Splits a given String into an array with its constituent character n-grams.
Parameters: text - A String. Parameters: n - Number of consecutive characters on the n-grams. |
public static String[] | splitSegments(String text) Splits a given String into an array with its constituent text segments.
Parameters: text - A String. |
public static String[] | splitWordGrams(String text, int n) Splits a given String into an array with its constituent word n-grams.
Parameters: text - A String. Parameters: n - Number of consecutive words on the n-grams. |
public static String[] | splitWords(String text) Splits a given String into an array with its constituent words.
Parameters: text - A String. |
public boolean | startsSentence() Checks if the current word marks the beginning of a sentence. |
public String | toString() Produces a string representation of this word finder by returning
the associated text. |
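
The summary lists both character n-grams (splitNGrams) and word n-grams (splitWordGrams) without showing how the two differ. The sketch below is a minimal, standalone illustration of that difference. It is not the DefaultWordFinder implementation: the class NGramSketch and the methods charNGrams and wordNGrams are hypothetical names, and the sketch assumes that words are separated by whitespace only, whereas the class documented here decides word boundaries through isWordChar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the documented class.
public class NGramSketch {

    // Character n-grams: every run of n consecutive characters in the text.
    public static String[] charNGrams(String text, int n) {
        if (n <= 0 || text.length() < n) {
            return new String[0];
        }
        String[] grams = new String[text.length() - n + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = text.substring(i, i + n);
        }
        return grams;
    }

    // Word n-grams: every run of n consecutive words, joined by one space.
    // Assumption: words are delimited by whitespace only.
    public static String[] wordNGrams(String text, int n) {
        String trimmed = text.trim();
        if (n <= 0 || trimmed.isEmpty()) {
            return new String[0];
        }
        String[] words = trimmed.split("\\s+");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Character bigrams of a single word.
        System.out.println(String.join(",", charNGrams("word", 2)));
        // Word bigrams of a short phrase.
        System.out.println(String.join("|", wordNGrams("the quick brown fox", 2)));
    }
}
```

Under these assumptions, a text of k characters yields k - n + 1 character n-grams, and a text of w words yields w - n + 1 word n-grams. Both methods return an empty array when the input is shorter than n.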