| java.lang.Object
   com.ibm.icu.text.BreakIterator
      com.ibm.icu.text.RuleBasedBreakIterator
All known Subclasses: com.ibm.icu.text.DictionaryBasedBreakIterator
RuleBasedBreakIterator | public class RuleBasedBreakIterator extends BreakIterator | | Rule Based Break Iterator
This is a port of the C++ class RuleBasedBreakIterator from ICU4C.
|
Field Summary | |
final public static int | WORD_IDEO |
final public static int | WORD_IDEO_LIMIT |
final public static int | WORD_KANA |
final public static int | WORD_KANA_LIMIT |
final public static int | WORD_LETTER Tag value for words that contain letters, excluding hiragana, katakana or ideographic characters, lower limit. |
final public static int | WORD_LETTER_LIMIT |
final public static int | WORD_NONE Tag value for "words" that do not fit into any of other categories. |
final public static int | WORD_NONE_LIMIT Upper bound for tags for uncategorized words. |
final public static int | WORD_NUMBER Tag value for words that appear to be numbers, lower limit. |
final public static int | WORD_NUMBER_LIMIT Tag value for words that appear to be numbers, upper limit. |
protected static String | fDebugEnv Control debug, trace and dump options. |
protected int | fDictionaryCharCount Counter for the number of characters encountered with the "dictionary" flag set. |
protected RBBIDataWrapper | fRData |
public static boolean | fTrace Debugging flag. |
Method Summary | |
static int | CICurrent32(CharacterIterator ci) |
static int | CINext32(CharacterIterator ci) Move the iterator forward to the next code point, and return that code point, leaving the iterator positioned at the char returned. |
final protected static void | checkOffset(int offset, CharacterIterator text) Throw IllegalArgumentException unless begin <= offset < end. |
public Object | clone() Clones this iterator. |
public int | current() Returns the current iteration position. |
public void | dump() Dump the contents of the state table and character classes for this break iterator. |
public boolean | equals(Object that) Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text. |
public int | first() Sets the current iteration position to the beginning of the text. |
public int | following(int offset) Sets the iterator to refer to the first boundary position following the specified position. |
public static RuleBasedBreakIterator | getInstanceFromCompiledRules(InputStream is) Create a break iterator from a precompiled set of rules. |
public int | getRuleStatus() Return the status tag from the break rule that determined the most recently returned break position. |
public int | getRuleStatusVec(int[] fillInArray) Get the status (tag) values from the break rule(s) that determined the most recently returned break position. |
public CharacterIterator | getText() Return a CharacterIterator over the text being analyzed. |
int | handleNext() |
public int | hashCode() |
public boolean | isBoundary(int offset) Returns true if the specified position is a boundary position. |
boolean | isDictionaryChar(int c) |
public int | last() Sets the current iteration position to the end of the text. |
public int | next(int n) Advances the iterator either forward or backward the specified number of steps. Negative values move backward, and positive values move forward. |
public int | next() Advances the iterator to the next boundary position. |
public int | preceding(int offset) Sets the iterator to refer to the last boundary position before the specified position. |
public int | previous() Moves the iterator backwards, to the last boundary preceding this one. |
public void | setText(CharacterIterator newText) Set the iterator to analyze a new piece of text. |
public String | toString() Returns the description (rules) used to create this iterator. |
WORD_IDEO | final public static int WORD_IDEO | | Tag value for words containing ideographic characters, lower limit.
|
WORD_IDEO_LIMIT | final public static int WORD_IDEO_LIMIT | | Tag value for words containing ideographic characters, upper limit.
|
WORD_KANA | final public static int WORD_KANA | | Tag value for words containing kana characters, lower limit.
|
WORD_KANA_LIMIT | final public static int WORD_KANA_LIMIT | | Tag value for words containing kana characters, upper limit.
|
WORD_LETTER | final public static int WORD_LETTER | | Tag value for words that contain letters, excluding
hiragana, katakana or ideographic characters, lower limit.
|
WORD_LETTER_LIMIT | final public static int WORD_LETTER_LIMIT | | Tag value for words containing letters, upper limit.
|
WORD_NONE | final public static int WORD_NONE | | Tag value for "words" that do not fit into any of the other categories.
Includes spaces and most punctuation.
|
WORD_NONE_LIMIT | final public static int WORD_NONE_LIMIT | | Upper bound for tags for uncategorized words.
|
WORD_NUMBER | final public static int WORD_NUMBER | | Tag value for words that appear to be numbers, lower limit.
|
WORD_NUMBER_LIMIT | final public static int WORD_NUMBER_LIMIT | | Tag value for words that appear to be numbers, upper limit.
|
fDebugEnv | protected static String fDebugEnv | | Control debug, trace and dump options.
|
fDictionaryCharCount | protected int fDictionaryCharCount | | Counter for the number of characters encountered with the "dictionary"
flag set. Normal RBBI iterators don't use it, although the code
for updating it is live. Dictionary-based break iterators (a subclass
of this class) access this field directly.
|
fTrace | public static boolean fTrace | | Debugging flag. Trace operation of state machine when true.
|
RuleBasedBreakIterator | public RuleBasedBreakIterator() | | |
RuleBasedBreakIterator | public RuleBasedBreakIterator(String rules) | | Construct a RuleBasedBreakIterator from a set of rules supplied as a string.
Parameters: rules - The break rules to be used.
The parseError and status parameters of the ICU4C constructor (the location of a syntax error within the rules, and information on any errors encountered) have no counterpart in this signature. |
CINext32 | static int CINext32(CharacterIterator ci) | | Move the iterator forward to the next code point, and return that code point,
leaving the iterator positioned at the char returned.
For supplementary chars, the iterator is left positioned at the lead surrogate.
Parameters: ci - The character iterator.
Returns: The next code point. |
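The stepping behavior described for CINext32 can be sketched with the JDK's java.text.CharacterIterator alone. This is a minimal illustration, not the ICU4J implementation: the class name, the method name nextCodePoint, and the DONE32 sentinel are hypothetical, and only the documented contract (advance one code point, stay on the lead surrogate of a supplementary character) is reproduced.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class CodePointStepSketch {
    /** Hypothetical sentinel for "no more code points"; not an ICU constant. */
    static final int DONE32 = -1;

    /** Advances ci by one code point and returns the code point now under the iterator. */
    static int nextCodePoint(CharacterIterator ci) {
        // If we start on the lead half of a valid pair, step onto the trail half
        // so that the next() call below leaves the pair entirely.
        if (Character.isHighSurrogate(ci.current())) {
            char maybeTrail = ci.next();
            if (!Character.isLowSurrogate(maybeTrail)) {
                ci.previous(); // unpaired lead surrogate: undo the peek
            }
        }
        char c = ci.next();
        if (ci.getIndex() >= ci.getEndIndex()) {
            return DONE32; // ran off the end of the text
        }
        if (Character.isHighSurrogate(c)) {
            char trail = ci.next(); // peek at the following char
            ci.previous();          // stay positioned on the lead surrogate
            if (Character.isLowSurrogate(trail)) {
                return Character.toCodePoint(c, trail);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (a surrogate pair), 'b'
        CharacterIterator ci = new StringCharacterIterator("a\uD83D\uDE00b");
        for (int cp = nextCodePoint(ci); cp != DONE32; cp = nextCodePoint(ci)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " at index " + ci.getIndex());
        }
    }
}
```

Leaving the index on the lead surrogate keeps the iterator on a code point boundary, so a later call can recognize the pair and skip both halves.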
checkOffset | final protected static void checkOffset(int offset, CharacterIterator text) | | Throw IllegalArgumentException unless begin <= offset < end.
|
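Read literally, the checkOffset description amounts to a half-open range test against the CharacterIterator's bounds. The sketch below takes that wording at face value; requireInRange and accepts are hypothetical names, and the real method may treat the end offset differently.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class OffsetCheckSketch {
    /** Throws unless begin <= offset < end, as the description above states. */
    static void requireInRange(int offset, CharacterIterator text) {
        if (offset < text.getBeginIndex() || offset >= text.getEndIndex()) {
            throw new IllegalArgumentException("offset out of bounds: " + offset);
        }
    }

    /** Convenience wrapper for trying the check on a plain string. */
    static boolean accepts(int offset, String s) {
        try {
            requireInRange(offset, new StringCharacterIterator(s));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(0, "abc"));
        System.out.println(accepts(3, "abc"));
    }
}
```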
clone | public Object clone() | | Clones this iterator.
Returns: A newly constructed RuleBasedBreakIterator with the same behavior as this one. |
current | public int current() | | Returns the current iteration position.
Returns: The current iteration position. |
dump | public void dump() | | Dump the contents of the state table and character classes for this break iterator.
For debugging only.
|
equals | public boolean equals(Object that) | | Returns true if both BreakIterators are of the same class, have the same
rules, and iterate over the same text.
|
first | public int first() | | Sets the current iteration position to the beginning of the text
(i.e., the CharacterIterator's starting offset).
Returns: The offset of the beginning of the text. |
following | public int following(int offset) | | Sets the iterator to refer to the first boundary position following
the specified position.
Parameters: offset - The position from which to begin searching for a break position.
Returns: The position of the first break after the specified position. |
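As an illustration of following(), the sketch below collects every boundary after a given offset. To stay self-contained it uses the JDK's java.text.BreakIterator as a stand-in, which exposes methods with the same names; the assumption is that the ICU4J class behaves the same way for simple text like this. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FollowingSketch {
    /** Returns all word boundaries strictly after offset, in ascending order. */
    static List<Integer> boundariesAfter(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        List<Integer> result = new ArrayList<>();
        // following() positions the iterator; next() then continues from there.
        for (int pos = words.following(offset); pos != BreakIterator.DONE; pos = words.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundariesAfter("Hello world", 0));
    }
}
```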
getRuleStatus | public int getRuleStatus() | | Return the status tag from the break rule that determined the most recently
returned break position. The values appear in the rule source
within brackets, {123}, for example. For rules that do not specify a
status, a default value of 0 is returned. If more than one rule applies,
the numerically largest of the possible status values is returned.
Of the standard types of ICU break iterators, only the word break
iterator provides status values. The values are defined in
class RuleBasedBreakIterator, and allow distinguishing between words
that contain alphabetic letters, "words" that appear to be numbers,
punctuation and spaces, words containing ideographic characters, and
more. Call getRuleStatus after obtaining a boundary
position from next(), previous(), or
any other break iterator function that returns a boundary position.
Returns: The status from the break rule that determined the most recently returned break position. |
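A status value is typically interpreted by testing which lower-limit/upper-limit pair it falls between. The sketch below shows that range test in isolation. The numeric values are assumptions that mirror the ICU word-break tags and are declared locally only to keep the example self-contained; real code should use the WORD_* constants of RuleBasedBreakIterator. Treating each upper limit as exclusive is likewise an assumption.

```java
class RuleStatusSketch {
    // Assumed values, declared here only for illustration.
    static final int WORD_NONE = 0,     WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300,   WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500;

    /** Maps a rule status to a category name using half-open ranges [lower, upperLimit). */
    static String classify(int status) {
        if (status >= WORD_NONE && status < WORD_NONE_LIMIT) return "none";
        if (status >= WORD_NUMBER && status < WORD_NUMBER_LIMIT) return "number";
        if (status >= WORD_LETTER && status < WORD_LETTER_LIMIT) return "letter";
        if (status >= WORD_KANA && status < WORD_KANA_LIMIT) return "kana";
        if (status >= WORD_IDEO && status < WORD_IDEO_LIMIT) return "ideographic";
        return "unknown";
    }

    public static void main(String[] args) {
        // In real use the argument would come from getRuleStatus() after next().
        System.out.println(classify(200));
    }
}
```

Testing ranges rather than exact values leaves room for custom rules that use other tag values within a category.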
getRuleStatusVec | public int getRuleStatusVec(int[] fillInArray) | | Get the status (tag) values from the break rule(s) that determined the most
recently returned break position. The values appear in the rule source
within brackets, {123}, for example. The default status value for rules
that do not explicitly provide one is zero.
The status values used by the standard ICU break rules are defined
as public constants in class RuleBasedBreakIterator.
If the size of the output array is insufficient to hold the data,
the output will be truncated to the available length. No exception
will be thrown.
Parameters: fillInArray - An array to be filled in with the status values.
Returns: The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned. |
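Because the return value is the total count even when the array is too small, a caller can retry with a larger array. The sketch below shows that pattern against a hypothetical stand-in, fillStatusVec, which only imitates the documented contract (copy what fits, return the total); it is not the ICU4J method.

```java
import java.util.Arrays;

class StatusVecSketch {
    /** Hypothetical stand-in: copies as many values as fit and returns the total available. */
    static int fillStatusVec(int[] available, int[] fillInArray) {
        int n = Math.min(available.length, fillInArray.length);
        System.arraycopy(available, 0, fillInArray, 0, n);
        return available.length;
    }

    /** Reads all values, growing the buffer once if the first attempt was truncated. */
    static int[] readAll(int[] available) {
        int[] buf = new int[1];
        int total = fillStatusVec(available, buf);
        if (total > buf.length) {
            buf = new int[total];          // the return value says how much room is needed
            total = fillStatusVec(available, buf);
        }
        return Arrays.copyOf(buf, total);  // trim to the values actually present
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readAll(new int[] {0, 100, 200})));
    }
}
```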
getText | public CharacterIterator getText() | | Return a CharacterIterator over the text being analyzed. This version
of this method returns the actual CharacterIterator we're using internally.
Changing the state of this iterator can have undefined consequences. If
you need to change it, clone it first.
Returns: An iterator over the text being analyzed. |
handleNext | int handleNext() | | |
hashCode | public int hashCode() | | Compute a hash code for this BreakIterator.
Returns: A hash code. |
isBoundary | public boolean isBoundary(int offset) | | Returns true if the specified position is a boundary position. As a side
effect, leaves the iterator pointing to the first boundary position at
or after "offset".
Parameters: offset - The offset to check.
Returns: True if "offset" is a boundary position. |
isDictionaryChar | boolean isDictionaryChar(int c) | | |
last | public int last() | | Sets the current iteration position to the end of the text
(i.e., the CharacterIterator's ending offset).
Returns: The text's past-the-end offset. |
next | public int next(int n) | | Advances the iterator either forward or backward the specified number of steps.
Negative values move backward, and positive values move forward. This is
equivalent to repeatedly calling next() or previous().
Parameters: n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
Returns: The character offset of the boundary position n boundaries away from the current one. |
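The stated equivalence between next(n) and repeated single steps can be checked with a small sketch. It uses the JDK's java.text.BreakIterator as a stand-in with the same method names, on the assumption that the ICU4J class behaves alike for this input; jump and stepwise are hypothetical helper names.

```java
import java.text.BreakIterator;
import java.util.Locale;

class NextNSketch {
    private static BreakIterator start(String text, int n) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        // Start at the beginning for forward moves and at the end for backward moves.
        if (n >= 0) words.first(); else words.last();
        return words;
    }

    /** Moves n boundaries in a single call. */
    static int jump(String text, int n) {
        return start(text, n).next(n);
    }

    /** Moves n boundaries one step at a time with next() or previous(). */
    static int stepwise(String text, int n) {
        BreakIterator words = start(text, n);
        int pos = words.current();
        for (int i = 0; i < n; i++) pos = words.next();
        for (int i = 0; i > n; i--) pos = words.previous();
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(jump("Hello world", 2) + " " + stepwise("Hello world", 2));
    }
}
```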
next | public int next() | | Advances the iterator to the next boundary position.
Returns: The position of the first boundary after this one. |
preceding | public int preceding(int offset) | | Sets the iterator to refer to the last boundary position before the
specified position.
Parameters: offset - The position to begin searching for a break from.
Returns: The position of the last boundary before the starting position. |
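One common use of preceding() together with following() is finding the segment that contains a given character. The sketch below again relies on the JDK's java.text.BreakIterator as a stand-in with identically named methods. It assumes 0 <= offset < text.length() and that offset + 1 is an acceptable argument to preceding(); segmentAround is a hypothetical helper.

```java
import java.text.BreakIterator;
import java.util.Locale;

class PrecedingSketch {
    /** Returns the word-break segment that contains the char at offset. */
    static String segmentAround(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        int start = words.preceding(offset + 1); // last boundary at or before offset
        int end = words.following(offset);       // first boundary after offset
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(segmentAround("Hello world", 7));
    }
}
```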
previous | public int previous() | | Moves the iterator backwards, to the last boundary preceding this one.
Returns: The position of the last boundary preceding this one. |
setText | public void setText(CharacterIterator newText) | | Set the iterator to analyze a new piece of text. This function resets
the current iteration position to the beginning of the text.
Parameters: newText - An iterator over the text to analyze. |
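The reset performed by setText can be observed by moving an iterator to the end of one text and then supplying another. This sketch uses the JDK's java.text.BreakIterator as a stand-in (same method names, assumed to match ICU4J here); positionAfterSetText is a hypothetical helper.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;

class SetTextSketch {
    /** Returns current() right after switching from one text to another. */
    static int positionAfterSetText(String first, String second) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(new StringCharacterIterator(first));
        words.last(); // move away from the beginning
        words.setText(new StringCharacterIterator(second));
        return words.current(); // position within the new text
    }

    public static void main(String[] args) {
        System.out.println(positionAfterSetText("Hello world", "Hi"));
    }
}
```

Reusing one iterator this way avoids constructing a new one for each piece of text.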
toString | public String toString() | | Returns the description (rules) used to create this iterator.
(In ICU4C, the equivalent function is RuleBasedBreakIterator::getRules().)
|
|
|