| java.lang.Object com.ibm.icu.text.TransliterationRule
TransliterationRule | class TransliterationRule (Code) | | A transliteration rule used by
RuleBasedTransliterator.
TransliterationRule is an immutable object.
A rule consists of an input pattern and an output string. When
the input pattern is matched, the output string is emitted. The
input pattern consists of zero or more characters which are matched
exactly (the key) and optional context. Context must match if it
is specified. Context may be specified before the key, after the
key, or both. The key, preceding context, and following context
may contain variables. Variables represent a set of Unicode
characters, such as the letters a through z.
Variables are detected by looking up each character in a supplied
variable list to see if it has been so defined.
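As a rough illustration of how the key and its context combine, here is a minimal sketch that models a rule whose ante context, key, and post context are plain strings. It leaves out variables (character sets) entirely. The class `LiteralRuleSketch` and its `matchesAt` method are hypothetical and not part of ICU; the rule matches at a position only if the key starts there and both contexts surround it.

```java
// Hypothetical model of a rule with literal ante context, key, and post
// context. Variables (character sets) are deliberately left out.
final class LiteralRuleSketch {
    final String ante;
    final String key;
    final String post;

    LiteralRuleSketch(String ante, String key, String post) {
        this.ante = ante;
        this.key = key;
        this.post = post;
    }

    /** True if the key begins at index pos and both contexts match around it. */
    boolean matchesAt(String text, int pos) {
        int anteStart = pos - ante.length();
        return anteStart >= 0
                && text.startsWith(ante, anteStart)             // preceding context
                && text.startsWith(key, pos)                    // the key itself
                && text.startsWith(post, pos + key.length());   // following context
    }

    public static void main(String[] args) {
        // Roughly the rule "a{b}c": key "b", only when preceded by "a" and followed by "c".
        LiteralRuleSketch rule = new LiteralRuleSketch("a", "b", "c");
        System.out.println(rule.matchesAt("abc", 1)); // true
        System.out.println(rule.matchesAt("xbc", 1)); // false: ante context fails
    }
}
```

Only the key would be replaced on a match; the contexts are checked but left untouched.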
A rule may contain segments in its input string and segment
references in its output string. A segment is a substring of the
input pattern, indicated by an offset and limit. The segment may
be in the preceding or following context. It may not span a
context boundary. A segment reference is a special character in
the output string that causes a segment of the input string (not
the input pattern) to be copied to the output string. The range of
special characters that represent segment references is defined by
RuleBasedTransliterator.Data.
Example: The rule "([a-z]) . ([0-9]) > $2 . $1" will change the input
string "abc.123" to "ab1.c23".
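The segment reordering in this example can be mimicked with `java.util.regex`, where capture groups play the role of segments and `$1`/`$2` in the replacement play the role of segment references. This is an analogy using a different engine, not ICU's transliterator; in particular, the sketch escapes the `.` so that it matches a literal period, which may differ from how ICU rule syntax treats `.`. The class name `SegmentSketch` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: regex capture groups stand in for segments, and $1/$2 in the
// replacement string stand in for segment references.
final class SegmentSketch {
    // The '.' is escaped so that, in this sketch, it matches a literal period.
    private static final Pattern RULE = Pattern.compile("([a-z])\\.([0-9])");

    static String apply(String input) {
        Matcher m = RULE.matcher(input);
        return m.replaceAll("$2.$1"); // swap the two captured segments
    }

    public static void main(String[] args) {
        System.out.println(apply("abc.123")); // ab1.c23
    }
}
```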
Copyright © IBM Corporation 1999. All rights reserved.
author: Alan Liu |
Constructor Summary | |
public | TransliterationRule(String input, int anteContextPos, int postContextPos, String output, int cursorPos, int cursorOffset, UnicodeMatcher[] segs, boolean anchorStart, boolean anchorEnd, RuleBasedTransliterator.Data theData) Construct a new rule with the given input, output text, and other
attributes. |
ANCHOR_END | final static int ANCHOR_END(Code) | | |
ANCHOR_START | final static int ANCHOR_START(Code) | | Flag attributes.
|
flags | byte flags(Code) | | Miscellaneous attributes.
|
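The `flags` byte together with the `ANCHOR_START` and `ANCHOR_END` constants suggests a bit-flag encoding. A minimal sketch, assuming each anchor is a distinct single bit, follows; the values 1 and 2 are placeholders chosen for illustration, not values documented on this page, and `RuleFlagsSketch` is a hypothetical name.

```java
// Sketch of the flags byte. ANCHOR_START and ANCHOR_END values here are
// placeholders chosen for illustration.
final class RuleFlagsSketch {
    static final int ANCHOR_START = 1;
    static final int ANCHOR_END = 2;

    static byte makeFlags(boolean anchorStart, boolean anchorEnd) {
        byte flags = 0;
        if (anchorStart) {
            flags |= ANCHOR_START; // set the start-anchor bit
        }
        if (anchorEnd) {
            flags |= ANCHOR_END;   // set the end-anchor bit
        }
        return flags;
    }

    static boolean isAnchoredStart(byte flags) {
        return (flags & ANCHOR_START) != 0;
    }

    static boolean isAnchoredEnd(byte flags) {
        return (flags & ANCHOR_END) != 0;
    }

    public static void main(String[] args) {
        byte flags = makeFlags(true, false);
        System.out.println(isAnchoredStart(flags)); // true
        System.out.println(isAnchoredEnd(flags));   // false
    }
}
```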
segments | UnicodeMatcher[] segments(Code) | | An array of matcher objects corresponding to the input pattern
segments. If there are no segments this is null. N.B. This is
a UnicodeMatcher for generality, but in practice it is always a
StringMatcher. In the future we may generalize this, but for
now we sometimes cast down to StringMatcher.
|
TransliterationRule | public TransliterationRule(String input, int anteContextPos, int postContextPos, String output, int cursorPos, int cursorOffset, UnicodeMatcher[] segs, boolean anchorStart, boolean anchorEnd, RuleBasedTransliterator.Data theData)(Code) | | Construct a new rule with the given input, output text, and other
attributes. A cursor position may be specified for the output text.
Parameters: input - input string, including key and optional ante and post context
Parameters: anteContextPos - offset into input to end of ante context, or -1 if none. Must be <= input.length() if not -1.
Parameters: postContextPos - offset into input to start of post context, or -1 if none. Must be <= input.length() if not -1, and must be >= anteContextPos.
Parameters: output - output string
Parameters: cursorPos - offset into output at which cursor is located, or -1 if none. If less than zero, then the cursor is placed after the output; that is, -1 is equivalent to output.length(). If greater than output.length(), then an exception is thrown.
Parameters: cursorOffset - an offset to be added to cursorPos to position the cursor either in the ante context, if < 0, or in the post context, if > 0. For example, the rule "abc{def} > | @@@ xyz;" changes "def" to "xyz" and moves the cursor to before "a". It would have a cursorOffset of -3.
Parameters: segs - array of UnicodeMatcher corresponding to input pattern segments, or null if there are none
Parameters: anchorStart - true if the rule is anchored on the left to the context start
Parameters: anchorEnd - true if the rule is anchored on the right to the context limit |
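The parameter rules for the context positions and the cursor can be summarized in a short sketch. Both helpers are hypothetical and not ICU's constructor: `split` cuts the input into ante context, key, and post context using the -1 conventions, and `newCursor` places the cursor relative to where the replaced key began. Throwing `IllegalArgumentException` is an assumption; the text only says "an exception is thrown".

```java
// Hypothetical helpers mirroring the constructor parameter rules; names are
// illustrative and not part of ICU.
final class RuleLayoutSketch {
    /** Splits input into {ante context, key, post context}; -1 means "none". */
    static String[] split(String input, int anteContextPos, int postContextPos) {
        int keyStart = anteContextPos < 0 ? 0 : anteContextPos;
        int keyLimit = postContextPos < 0 ? input.length() : postContextPos;
        if (keyStart > input.length() || keyLimit > input.length() || keyLimit < keyStart) {
            throw new IllegalArgumentException("invalid context positions");
        }
        return new String[] {
            input.substring(0, keyStart),
            input.substring(keyStart, keyLimit),
            input.substring(keyLimit)
        };
    }

    /**
     * Cursor index in the text after a key that began at keyStart has been
     * replaced by output. A negative cursorPos means "after the output".
     */
    static int newCursor(int keyStart, String output, int cursorPos, int cursorOffset) {
        int pos = cursorPos < 0 ? output.length() : cursorPos;
        if (pos > output.length()) {
            throw new IllegalArgumentException("cursorPos > output.length()");
        }
        return keyStart + pos + cursorOffset;
    }

    public static void main(String[] args) {
        // "abc{def} > | @@@ xyz;": ante "abc", key "def", output "xyz",
        // cursorPos 0, cursorOffset -3.
        String[] parts = split("abcdef", 3, -1);
        System.out.println(String.join("|", parts));    // abc|def|
        System.out.println(newCursor(3, "xyz", 0, -3)); // 0, i.e. before "a"
    }
}
```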
addSourceSetTo | void addSourceSetTo(UnicodeSet toUnionTo)(Code) | | Union the set of all characters that may be modified by this rule
into the given set.
|
addTargetSetTo | void addTargetSetTo(UnicodeSet toUnionTo)(Code) | | Union the set of all characters that may be emitted by this rule
into the given set.
|
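For literal rules, the two sets can be pictured as follows: the key contributes the characters a rule may modify (context is only checked, not changed), and the output contributes the characters it may emit. This is a sketch under those assumptions; it uses `Set<Character>` in place of ICU's `UnicodeSet`, works on UTF-16 chars rather than code points, and the names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers for literal rules: the key feeds the source set and
// the output feeds the target set. Works on UTF-16 chars, not code points.
final class RuleSetsSketch {
    static void addSourceSetTo(String key, Set<Character> toUnionTo) {
        for (char c : key.toCharArray()) {
            toUnionTo.add(c);
        }
    }

    static void addTargetSetTo(String output, Set<Character> toUnionTo) {
        for (char c : output.toCharArray()) {
            toUnionTo.add(c);
        }
    }

    public static void main(String[] args) {
        Set<Character> source = new TreeSet<>();
        Set<Character> target = new TreeSet<>();
        // Two rules: "ab > x" and "c > yz".
        addSourceSetTo("ab", source);
        addTargetSetTo("x", target);
        addSourceSetTo("c", source);
        addTargetSetTo("yz", target);
        System.out.println(source); // [a, b, c]
        System.out.println(target); // [x, y, z]
    }
}
```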
getAnteContextLength | public int getAnteContextLength()(Code) | | Return the preceding context length. This method is needed to
support the Transliterator method
getMaximumContextLength().
|
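One plausible way a rule-based transliterator could derive its maximum context length from its rules is to take the longest ante context over all rules. The sketch below assumes that approach; `ContextLengthSketch` is hypothetical and takes the ante contexts directly as strings.

```java
// Hypothetical helper, assuming the maximum context length is the longest
// ante context over all rules.
final class ContextLengthSketch {
    static int maximumContextLength(String... anteContexts) {
        int max = 0;
        for (String ante : anteContexts) {
            max = Math.max(max, ante.length());
        }
        return max;
    }

    public static void main(String[] args) {
        // Ante contexts of "ab{c} > x", "d > y", and "e{f} > z".
        System.out.println(maximumContextLength("ab", "", "e")); // 2
    }
}
```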
getIndexValue | final int getIndexValue()(Code) | | Internal method. Returns the 8-bit index value for this rule.
This is the low byte of the first character of the key,
unless the first character of the key is a set. If it is a
set, or the rule can otherwise match multiple keys, the index value is -1.
|
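The index value computation can be sketched for a key given as a string plus a flag saying whether its first element is a set. Treating an empty key (ante context only) as -1 is an assumption, on the grounds that such a rule can match any key; `IndexValueSketch` is a hypothetical name.

```java
// Hypothetical sketch: index value of a rule whose key is given as a string,
// plus a flag telling whether the key's first element is a character set.
final class IndexValueSketch {
    static int indexValue(String key, boolean keyStartsWithSet) {
        if (keyStartsWithSet || key.isEmpty()) {
            return -1; // may match more than one first character
        }
        return key.charAt(0) & 0xFF; // low byte of the first key character
    }

    public static void main(String[] args) {
        System.out.println(indexValue("abc", false));    // 97 (0x61)
        System.out.println(indexValue("\u0161", false)); // 97: U+0161 also has low byte 0x61
        System.out.println(indexValue("abc", true));     // -1
    }
}
```

Because only the low byte is kept, unrelated characters such as U+0061 and U+0161 share an index value; the index narrows the candidate rules, and a full match is still needed.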
masks | public boolean masks(TransliterationRule r2)(Code) | | Return true if this rule (r1) masks the rule r2. If r1 masks r2 then
r1 matches any input string that r2 matches. If r1 masks r2 and r2 masks
r1 then r1 == r2. Examples: "a>x" masks "ab>y". "a>x" masks "a[b]>y".
"[c]a>x" masks "[dc]a>y".
|
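For rules made only of literal characters, a sufficient condition for masking follows directly from the definition: if r1's ante context is a suffix of r2's ante context, and r1's key plus post context is a prefix of r2's key plus post context, then r1 matches wherever r2 matches. The sketch below checks only that condition; it ignores sets and anchors and is not ICU's implementation. `MaskSketch` is hypothetical.

```java
// Hypothetical sketch for literal, unanchored rules without sets. It checks a
// sufficient condition for r1 to match wherever r2 matches.
final class MaskSketch {
    static final class Rule {
        final String ante;
        final String key;
        final String post;

        Rule(String ante, String key, String post) {
            this.ante = ante;
            this.key = key;
            this.post = post;
        }
    }

    static boolean masks(Rule r1, Rule r2) {
        // r1's ante context must end r2's ante context, and r1's key plus
        // post context must begin r2's key plus post context.
        return r2.ante.endsWith(r1.ante)
                && (r2.key + r2.post).startsWith(r1.key + r1.post);
    }

    public static void main(String[] args) {
        Rule ax = new Rule("", "a", "");   // "a>x"
        Rule aby = new Rule("", "ab", ""); // "ab>y"
        System.out.println(masks(ax, aby)); // true
        System.out.println(masks(aby, ax)); // false
    }
}
```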
matchAndReplace | public int matchAndReplace(Replaceable text, Transliterator.Position pos, boolean incremental)(Code) | | Attempt a match and replacement at the given position. Return
the degree of match between this rule and the given text. The
degree of match may be mismatch, a partial match, or a full
match. A mismatch means at least one character of the text
does not match the context or key. A partial match means some
context and key characters match, but the text is not long
enough to match all of them. A full match means all context
and key characters match.
If a full match is obtained, perform a replacement, update pos,
and return U_MATCH. Otherwise both text and pos are unchanged.
Parameters: text - the text
Parameters: pos - the position indices
Parameters: incremental - if TRUE, test for partial matches that may be completed by additional text inserted at pos.limit.
Returns: one of U_MISMATCH, U_PARTIAL_MATCH, or U_MATCH. If incremental is FALSE then U_PARTIAL_MATCH will not be returned. |
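The three-way result can be illustrated with a simplified matcher that handles a literal key only, with no context and no sets. The constants, the `Position` class, and `matchAndReplace` below are hypothetical stand-ins, not ICU's `Transliterator.Position` or its match constants; after a full match the sketch simply moves `start` past the output and shifts `limit` by the change in length.

```java
// Hypothetical sketch of mismatch / partial match / full match for a literal key.
final class MatchSketch {
    // Stand-ins for the three return codes named in the text.
    static final int MISMATCH = 0;
    static final int PARTIAL_MATCH = 1;
    static final int MATCH = 2;

    static final class Position {
        int start;
        int limit;

        Position(int start, int limit) {
            this.start = start;
            this.limit = limit;
        }
    }

    static int matchAndReplace(StringBuilder text, Position pos, String key,
                               String output, boolean incremental) {
        for (int i = 0; i < key.length(); i++) {
            int idx = pos.start + i;
            if (idx >= pos.limit) {
                // Text ran out while every character so far matched.
                return incremental ? PARTIAL_MATCH : MISMATCH;
            }
            if (text.charAt(idx) != key.charAt(i)) {
                return MISMATCH; // text and pos are left unchanged
            }
        }
        // Full match: replace the key, then update the position indices.
        text.replace(pos.start, pos.start + key.length(), output);
        pos.limit += output.length() - key.length();
        pos.start += output.length();
        return MATCH;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("abcx");
        Position pos = new Position(0, 4);
        System.out.println(matchAndReplace(text, pos, "ab", "Z", false)); // 2
        System.out.println(text + " " + pos.start + " " + pos.limit);    // Zcx 1 3
    }
}
```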
matchesIndexValue | final boolean matchesIndexValue(int v)(Code) | | Internal method. Returns true if this rule matches the given
index value. The index value is an 8-bit integer, 0..255,
representing the low byte of the first character of the key.
It matches this rule if it matches the first character of the
key, or if the first character of the key is a set, and the set
contains any character with a low byte equal to the index
value. If the rule contains only ante context, as in foo)>bar,
then it matches any index value.
|
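The matching test can be sketched by describing the first key element as the set of characters it can match (a singleton for a literal character), with `null` standing for a rule that has no key at all. That representation and the name `MatchesIndexSketch` are assumptions made for illustration.

```java
import java.util.Set;

// Hypothetical sketch: firstKeyChars is the set of characters the first key
// element can match (a singleton for a literal), or null if the rule has no
// key at all (ante context only).
final class MatchesIndexSketch {
    static boolean matchesIndexValue(Set<Character> firstKeyChars, int v) {
        if (firstKeyChars == null) {
            return true; // no key: the rule is not restricted by the index
        }
        for (char c : firstKeyChars) {
            if ((c & 0xFF) == v) {
                return true; // some candidate first character has low byte v
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesIndexValue(Set.of('a'), 0x61));           // true
        System.out.println(matchesIndexValue(Set.of('x', '\u0161'), 0x61)); // true
        System.out.println(matchesIndexValue(Set.of('b'), 0x61));           // false
        System.out.println(matchesIndexValue(null, 0x61));                  // true
    }
}
```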
toRule | public String toRule(boolean escapeUnprintable)(Code) | | Return a source string that represents this rule. If
escapeUnprintable is true, unprintable characters in the result are escaped.
|
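A minimal sketch of producing rule source text for a literal rule follows. It assumes the layout `ante{key}post > output;`, with braces emitted only when there is context, and it treats characters outside U+0020..U+007E as unprintable, escaping them as `\uXXXX`. ICU's actual output may differ (for example, in quoting of special characters or spacing); `ToRuleSketch` is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of rule source generation for a literal rule.
final class ToRuleSketch {
    static String toRule(String ante, String key, String post, String output,
                         boolean escapeUnprintable) {
        boolean hasContext = !ante.isEmpty() || !post.isEmpty();
        StringBuilder sb = new StringBuilder();
        sb.append(ante);
        if (hasContext) {
            sb.append('{'); // braces delimit the key only when context exists
        }
        sb.append(key);
        if (hasContext) {
            sb.append('}');
        }
        sb.append(post).append(" > ").append(output).append(';');
        return escapeUnprintable ? escape(sb.toString()) : sb.toString();
    }

    // Replaces each char outside printable ASCII with a backslash-u escape
    // followed by four uppercase hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRule("abc", "def", "", "xyz", false)); // abc{def} > xyz;
    }
}
```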
toString | public String toString()(Code) | | Return a string representation of this object.
Returns: a string representation of this object |