| com.ibm.icu.text.UnicodeMatcher
All known Subclasses: com.ibm.icu.text.UnicodeFilter, com.ibm.icu.text.Quantifier, com.ibm.icu.text.StringMatcher,
UnicodeMatcher | public interface UnicodeMatcher | | UnicodeMatcher defines a protocol for objects that can
match a range of characters in a Replaceable string.
|
Field Summary | |
final static char | ETHER The character at index i, where i < contextStart || i >= contextLimit,
is ETHER. | final public static int | U_MATCH Constant returned by matches() indicating a
complete match between the text and this matcher. | final public static int | U_MISMATCH Constant returned by matches() indicating a
mismatch between the text and this matcher. | final public static int | U_PARTIAL_MATCH Constant returned by matches() indicating a
partial match between the text and this matcher. |
Method Summary | |
abstract public void | addMatchSetTo(UnicodeSet toUnionTo) Union the set of all characters that may be matched by this object
into the given set. | abstract public int | matches(Replaceable text, int[] offset, int limit, boolean incremental) Return a UMatchDegree value indicating the degree of match for
the given text at the given offset. | abstract public boolean | matchesIndexValue(int v) Returns TRUE if this matcher will match a character c, where
(c & 0xFF) == v, at offset, in the forward direction (with limit >
offset). | abstract public String | toPattern(boolean escapeUnprintable) Returns a string representation of this matcher. |
ETHER | final static char ETHER | | The character at index i, where i < contextStart || i >= contextLimit,
is ETHER. This allows explicit matching by rules and UnicodeSets
of text outside the context. In traditional terms, this allows anchoring
at the start and/or end.
|
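The anchoring behaviour described for ETHER can be illustrated with a small standalone sketch. It does not use ICU4J: the class `EtherDemo`, the helper `charAtInContext`, and the sentinel value chosen for `ETHER` are all assumptions made for illustration only.

```java
// Hypothetical sketch, independent of ICU4J.
final class EtherDemo {
    // Assumed sentinel value for illustration; consult ICU4J for the real constant.
    static final char ETHER = '\uFFFF';

    // Returns the character at index i, or ETHER when i lies outside
    // the context window [contextStart, contextLimit).
    static char charAtInContext(CharSequence text, int i, int contextStart, int contextLimit) {
        if (i < contextStart || i >= contextLimit) {
            return ETHER;
        }
        return text.charAt(i);
    }

    public static void main(String[] args) {
        String text = "abc";
        // Index -1 is before the context start, so it reads as ETHER (a start anchor).
        System.out.println(charAtInContext(text, -1, 0, 3) == ETHER);
        // Index 3 is at the context limit, so it also reads as ETHER (an end anchor).
        System.out.println(charAtInContext(text, 3, 0, 3) == ETHER);
    }
}
```

A rule that wants to anchor at the start of the text can then simply require ETHER at the position before the first matched character.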
U_MATCH | final public static int U_MATCH | | Constant returned by matches() indicating a
complete match between the text and this matcher. For an
incremental variable-length match, this value is returned if
the given text matches, and it is known that additional
characters would not alter the extent of the match.
|
U_MISMATCH | final public static int U_MISMATCH | | Constant returned by matches() indicating a
mismatch between the text and this matcher. The text contains
a character which does not match, or the text does not contain
all desired characters for a non-incremental match.
|
U_PARTIAL_MATCH | final public static int U_PARTIAL_MATCH | | Constant returned by matches() indicating a
partial match between the text and this matcher. This value is
only returned for incremental match operations. All characters
of the text match, but more characters are required for a
complete match. Alternatively, for variable-length matchers,
all characters of the text match, and if more characters were
supplied at limit, they might also match.
|
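The three return constants are easiest to compare side by side. The following is a minimal sketch of a fixed-length literal matcher, assuming a forward-only match over a plain CharSequence. The class `IncrementalLiteralDemo`, the method `matchDegree`, and the numeric values of the constants are placeholders for illustration, not the ICU4J definitions.

```java
// Hypothetical sketch, independent of ICU4J.
final class IncrementalLiteralDemo {
    // Placeholder values; only their distinctness matters here.
    static final int U_MISMATCH = 0;
    static final int U_PARTIAL_MATCH = 1;
    static final int U_MATCH = 2;

    // Forward-only match of a literal against text[offset, limit).
    static int matchDegree(String literal, CharSequence text, int offset, int limit, boolean incremental) {
        for (int i = 0; i < literal.length(); i++) {
            int pos = offset + i;
            if (pos >= limit) {
                // The text ran out before the literal did: everything so far
                // matched, so this is partial only if more text may arrive.
                return incremental ? U_PARTIAL_MATCH : U_MISMATCH;
            }
            if (text.charAt(pos) != literal.charAt(i)) {
                return U_MISMATCH;
            }
        }
        // A fixed-length literal cannot be extended by further characters.
        return U_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(matchDegree("abc", "abc", 0, 3, true));
        System.out.println(matchDegree("abc", "ab", 0, 2, true));
        System.out.println(matchDegree("abc", "ab", 0, 2, false));
    }
}
```

With the text "ab" and the literal "abc", the incremental call reports a partial match, while the non-incremental call reports a mismatch because the text is treated as complete.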
addMatchSetTo | abstract public void addMatchSetTo(UnicodeSet toUnionTo) | | Union the set of all characters that may be matched by this object
into the given set.
Parameters: toUnionTo - the set into which to union the source characters |
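The union semantics can be sketched without ICU4J by using java.util.BitSet as a stand-in for UnicodeSet. The class `MatchSetDemo` and its literal-based matcher are hypothetical. The point is only that the method adds to the caller's set and never clears it.

```java
import java.util.BitSet;

// Hypothetical sketch; BitSet stands in for UnicodeSet.
final class MatchSetDemo {
    // Adds every code point that a literal matcher could match to the
    // caller's set, leaving any existing members in place.
    static void addMatchSetTo(String literal, BitSet toUnionTo) {
        literal.codePoints().forEach(cp -> toUnionTo.set(cp));
    }

    public static void main(String[] args) {
        BitSet accumulated = new BitSet();
        accumulated.set('z');               // already present before the call
        addMatchSetTo("abca", accumulated); // contributes a, b, c
        System.out.println(accumulated.cardinality());
    }
}
```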
matches | abstract public int matches(Replaceable text, int[] offset, int limit, boolean incremental) | | Return a UMatchDegree value indicating the degree of match for
the given text at the given offset. Zero, one, or more
characters may be matched.
Matching in the forward direction is indicated by limit >
offset. Characters from offset forwards to limit-1 will be
considered for matching.
Matching in the reverse direction is indicated by limit <
offset. Characters from offset backwards to limit+1 will be
considered for matching.
If limit == offset then the only match possible is a zero
character match (which subclasses may implement if desired).
If U_MATCH is returned, then as a side effect, advance the
offset parameter to the limit of the matched substring. In the
forward direction, this will be the index of the last matched
character plus one. In the reverse direction, this will be the
index of the last matched character minus one.
Parameters: text - the text to be matched
Parameters: offset - on input, the index into text at which to begin matching. On output, the limit of the matched text. The number of matched characters is the output value of offset minus the input value. Offset should always point to the HIGH SURROGATE (leading code unit) of a pair of surrogates, both on entry and upon return.
Parameters: limit - the limit index of text to be matched. Greater than offset for a forward direction match, less than offset for a backward direction match. The last character to be considered for matching will be text.charAt(limit-1) in the forward direction or text.charAt(limit+1) in the backward direction.
Parameters: incremental - if TRUE, then assume further characters may be inserted at limit and check for partial matching. Otherwise assume the text as given is complete.
Returns: a match degree value indicating a full match, a partial match, or a mismatch. If incremental is FALSE then U_PARTIAL_MATCH should never be returned. |
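The direction and offset rules above can be sketched as a literal matcher over a plain CharSequence. This is not the ICU4J implementation: the class `DirectionalLiteralDemo`, the extra `literal` parameter, and the constant values are assumptions. Returning a partial match in the reverse direction is a simplification chosen for this sketch.

```java
// Hypothetical sketch, independent of ICU4J.
final class DirectionalLiteralDemo {
    static final int U_MISMATCH = 0;      // placeholder values
    static final int U_PARTIAL_MATCH = 1;
    static final int U_MATCH = 2;

    static int matches(String literal, CharSequence text, int[] offset, int limit, boolean incremental) {
        int cursor = offset[0];
        if (limit < cursor) {
            // Reverse: walk the literal from its last character, consuming
            // text from offset backwards down to limit + 1.
            for (int i = literal.length() - 1; i >= 0; i--) {
                if (cursor <= limit) {
                    return incremental ? U_PARTIAL_MATCH : U_MISMATCH;
                }
                if (text.charAt(cursor) != literal.charAt(i)) {
                    return U_MISMATCH;
                }
                cursor--;
            }
        } else {
            // Forward: consume text from offset up to limit - 1.
            for (int i = 0; i < literal.length(); i++) {
                if (cursor >= limit) {
                    return incremental ? U_PARTIAL_MATCH : U_MISMATCH;
                }
                if (text.charAt(cursor) != literal.charAt(i)) {
                    return U_MISMATCH;
                }
                cursor++;
            }
        }
        // Only a complete match advances the caller's offset.
        offset[0] = cursor;
        return U_MATCH;
    }

    public static void main(String[] args) {
        int[] forward = {1};
        System.out.println(matches("abc", "xabcx", forward, 5, false) + " " + forward[0]);
        int[] reverse = {3};
        System.out.println(matches("abc", "xabcx", reverse, -1, false) + " " + reverse[0]);
    }
}
```

In the forward call the offset moves from 1 to 4, the index after the last matched character. In the reverse call it moves from 3 to 0, the index before the last matched character. On a mismatch the offset is left untouched.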
matchesIndexValue | abstract public boolean matchesIndexValue(int v) | | Returns TRUE if this matcher will match a character c, where
(c & 0xFF) == v, at offset, in the forward direction (with limit >
offset). This is used by RuleBasedTransliterator for
indexing.
Note: This API uses an int even though the value will be
restricted to 8 bits in order to avoid complications with
signedness (bytes convert to ints in the range -128..127).
|
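The low-byte test and the signedness note can be illustrated with a short sketch. The class `IndexValueDemo` and the `candidates` array (the characters a hypothetical matcher could match first) are assumptions for illustration.

```java
// Hypothetical sketch, independent of ICU4J.
final class IndexValueDemo {
    // True if any candidate character c satisfies (c & 0xFF) == v.
    static boolean matchesIndexValue(int[] candidates, int v) {
        for (int c : candidates) {
            if ((c & 0xFF) == v) {
                return true;
            }
        }
        return false;
    }

    // Low byte kept in an int: always in the range 0..255.
    static int lowByteAsInt(int c) {
        return c & 0xFF;
    }

    // Low byte narrowed to a Java byte: values above 127 become negative.
    static byte lowByteAsByte(int c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        int[] candidates = {'a', 0x03B1}; // 'a' and GREEK SMALL LETTER ALPHA
        System.out.println(matchesIndexValue(candidates, 0xB1));
        System.out.println(lowByteAsInt(0x03B1) + " vs " + lowByteAsByte(0x03B1));
    }
}
```

Keeping the index value as an int avoids the negative values that a byte would produce for low bytes of 0x80 and above.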
toPattern | abstract public String toPattern(boolean escapeUnprintable) | | Returns a string representation of this matcher. If the result of
calling this function is passed to the appropriate parser, it
will produce another matcher that is equal to this one.
Parameters: escapeUnprintable - if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E. |
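The escaping rule for escapeUnprintable can be sketched as a standalone helper. The class `EscapeDemo` and its methods are hypothetical and are not the ICU4J utility. The sketch assumes a four-digit escape for code points up to U+FFFF and an eight-digit escape above that, with uppercase hex digits as an arbitrary choice.

```java
// Hypothetical sketch, independent of ICU4J.
final class EscapeDemo {
    // Printable means U+000A or the range U+0020..U+007E; all else is escaped.
    static boolean isUnprintable(int cp) {
        return !(cp == 0x0A || (cp >= 0x20 && cp <= 0x7E));
    }

    static String escapeUnprintable(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as a unit
            if (!isUnprintable(cp)) {
                out.appendCodePoint(cp);
            } else if (cp <= 0xFFFF) {
                // Four hex digits for code points in the Basic Multilingual Plane.
                out.append(String.format("\\u%04X", cp));
            } else {
                // Eight hex digits for supplementary code points.
                out.append(String.format("\\U%08X", cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnprintable("a\tb"));
        System.out.println(escapeUnprintable(new String(Character.toChars(0x1F600))));
    }
}
```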
|
|