001: /*
002: *******************************************************************************
003: * Copyright (C) 1996-2006, International Business Machines Corporation and *
004: * others. All Rights Reserved. *
005: *******************************************************************************
006: */
007: package com.ibm.icu.text;
008:
009: import java.util.Hashtable;
010:
011: /**
012: * <p><code>RuleBasedTransliterator</code> is a transliterator
013: * that reads a set of rules in order to determine how to perform
014: * translations. Rule sets are stored in resource bundles indexed by
015: * name. Rules within a rule set are separated by semicolons (';').
016: * To include a literal semicolon, prefix it with a backslash ('\').
017: * Whitespace, as defined by <code>UCharacterProperty.isRuleWhiteSpace()</code>,
018: * is ignored. If the first non-blank character on a line is '#',
019: * the entire line is ignored as a comment. </p>
020: *
021: * <p>Each set of rules consists of two groups, one forward, and one
022: * reverse. This is a convention that is not enforced; rules for one
023: * direction may be omitted, with the result that translations in
024: * that direction will not modify the source text. In addition,
025: * bidirectional forward-reverse rules may be specified for
026: * symmetrical transformations.</p>
027: *
028: * <p><b>Rule syntax</b> </p>
029: *
030: * <p>Rule statements take one of the following forms: </p>
031: *
032: * <dl>
033: * <dt><code>$alefmadda=\u0622;</code></dt>
034: * <dd><strong>Variable definition.</strong> The name on the
035: * left is assigned the text on the right. In this example,
036: * after this statement, instances of the left hand name,
037: * "<code>$alefmadda</code>", will be replaced by
038: * the Unicode character U+0622. Variable names must begin
039: * with a letter and consist only of letters, digits, and
040: * underscores. Case is significant. Duplicate names cause
041: * an exception to be thrown, that is, variables cannot be
042: * redefined. The right hand side may contain well-formed
043: * text of any length, including no text at all ("<code>$empty=;</code>").
044: * The right hand side may contain embedded <code>UnicodeSet</code>
045: * patterns, for example, "<code>$softvowel=[eiyEIY]</code>".</dd>
046: * <dd> </dd>
047: * <dt><code>ai>$alefmadda;</code></dt>
048: * <dd><strong>Forward translation rule.</strong> This rule
049: * states that the string on the left will be changed to the
050: * string on the right when performing forward
051: * transliteration.</dd>
052: * <dt> </dt>
053: * <dt><code>ai<$alefmadda;</code></dt>
054: * <dd><strong>Reverse translation rule.</strong> This rule
055: * states that the string on the right will be changed to
056: * the string on the left when performing reverse
057: * transliteration.</dd>
058: * </dl>
059: *
060: * <dl>
061: * <dt><code>ai<>$alefmadda;</code></dt>
062: * <dd><strong>Bidirectional translation rule.</strong> This
063: * rule states that the string on the left will be changed
064: * to the string on the right when performing forward
065: * transliteration, and vice versa when performing reverse
066: * transliteration.</dd>
067: * </dl>
068: *
069: * <p>Translation rules consist of a <em>match pattern</em> and an <em>output
070: * string</em>. The match pattern consists of literal characters,
071: * optionally preceded by context, and optionally followed by
072: * context. Context characters, like literal pattern characters,
073: * must be matched in the text being transliterated. However, unlike
074: * literal pattern characters, they are not replaced by the output
075: * text. For example, the pattern "<code>abc{def}</code>"
076: * indicates the characters "<code>def</code>" must be
077: * preceded by "<code>abc</code>" for a successful match.
078: * If there is a successful match, "<code>def</code>" will
079: * be replaced, but not "<code>abc</code>". The final '<code>}</code>'
080: * is optional, so "<code>abc{def</code>" is equivalent to
081: * "<code>abc{def}</code>". Another example is "<code>{123}456</code>"
082: * (or "<code>123}456</code>") in which the literal
083: * pattern "<code>123</code>" must be followed by "<code>456</code>".
084: * </p>
085: *
086: * <p>The output string of a forward or reverse rule consists of
087: * characters to replace the literal pattern characters. If the
088: * output string contains the character '<code>|</code>', this is
089: * taken to indicate the location of the <em>cursor</em> after
090: * replacement. The cursor is the point in the text at which the
091: * next replacement, if any, will be applied. The cursor is usually
092: * placed within the replacement text; however, it can also be
093: * placed into the preceding or following context by using the
094: * special character '<code>@</code>'. Examples:</p>
095: *
096: * <blockquote>
097: * <p><code>a {foo} z > | @ bar; # foo -> bar, move cursor
098: * before a<br>
099: * {foo} xyz > bar @@|; # foo -> bar, cursor between
100: * y and z</code></p>
101: * </blockquote>
102: *
103: * <p><b>UnicodeSet</b></p>
104: *
105: * <p><code>UnicodeSet</code> patterns may appear anywhere that
106: * makes sense. They may appear in variable definitions.
107: * Contrariwise, <code>UnicodeSet</code> patterns may themselves
108: * contain variable references, such as "<code>$a=[a-z];$not_a=[^$a]</code>",
109: * or "<code>$range=a-z;$ll=[$range]</code>".</p>
110: *
111: * <p><code>UnicodeSet</code> patterns may also be embedded directly
112: * into rule strings. Thus, the following two rules are equivalent:</p>
113: *
114: * <blockquote>
115: * <p><code>$vowel=[aeiou]; $vowel>'*'; # One way to do this<br>
116: * [aeiou]>'*';
117: * #
118: * Another way</code></p>
119: * </blockquote>
120: *
121: * <p>See {@link UnicodeSet} for more documentation and examples.</p>
122: *
123: * <p><b>Segments</b></p>
124: *
125: * <p>Segments of the input string can be matched and copied to the
126: * output string. This makes certain sets of rules simpler and more
127: * general, and makes reordering possible. For example:</p>
128: *
129: * <blockquote>
130: * <p><code>([a-z]) > $1 $1;
131: * #
132: * double lowercase letters<br>
133: * ([:Lu:]) ([:Ll:]) > $2 $1; # reverse order of Lu-Ll pairs</code></p>
134: * </blockquote>
135: *
136: * <p>The segment of the input string to be copied is delimited by
137: * "<code>(</code>" and "<code>)</code>". Up to
138: * nine segments may be defined. Segments may not overlap. In the
139: * output string, "<code>$1</code>" through "<code>$9</code>"
140: * represent the input string segments, in left-to-right order of
141: * definition.</p>
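 *
 * <p>A minimal sketch of the first segment rule in use, assuming the
 * public constructor declared in this class and the inherited
 * <code>Transliterator.transliterate(String)</code> method. The ID
 * "Doubler" and the variable names are illustrative only:</p>
 *
 * <blockquote><pre>
 * Transliterator doubler =
 *     new RuleBasedTransliterator("Doubler", "([a-z]) > $1 $1;");
 * // Each lowercase letter in the input is emitted twice.
 * String doubled = doubler.transliterate("abc");
 * </pre></blockquote>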
142: *
143: * <p><b>Anchors</b></p>
144: *
145: * <p>Patterns can be anchored to the beginning or the end of the text. This is done with the
146: * special characters '<code>^</code>' and '<code>$</code>'. For example:</p>
147: *
148: * <blockquote>
149: * <p><code>^ a > 'BEG_A'; # match 'a' at start of text<br>
150: * a > 'A'; # match other instances
151: * of 'a'<br>
152: * z $ > 'END_Z'; # match 'z' at end of text<br>
153: * z > 'Z'; # match other instances
154: * of 'z'</code></p>
155: * </blockquote>
156: *
157: * <p>It is also possible to match the beginning or the end of the text using a <code>UnicodeSet</code>.
158: * This is done by including a virtual anchor character '<code>$</code>' at the end of the
159: * set pattern. Although this is usually the match character for the end anchor, the set will
160: * match either the beginning or the end of the text, depending on its placement. For
161: * example:</p>
162: *
163: * <blockquote>
164: * <p><code>$x = [a-z$]; # match 'a' through 'z' OR anchor<br>
165: * $x 1 > 2; # match '1' after a-z or at the start<br>
166: * 3 $x > 4; # match '3' before a-z or at the end</code></p>
167: * </blockquote>
168: *
169: * <p><b>Example</b> </p>
170: *
171: * <p>The following example rules illustrate many of the features of
172: * the rule language. </p>
173: *
174: * <table border="0" cellpadding="4">
175: * <tr>
176: * <td valign="top">Rule 1.</td>
177: * <td valign="top" nowrap><code>abc{def}>x|y</code></td>
178: * </tr>
179: * <tr>
180: * <td valign="top">Rule 2.</td>
181: * <td valign="top" nowrap><code>xyz>r</code></td>
182: * </tr>
183: * <tr>
184: * <td valign="top">Rule 3.</td>
185: * <td valign="top" nowrap><code>yz>q</code></td>
186: * </tr>
187: * </table>
188: *
189: * <p>Applying these rules to the string "<code>adefabcdefz</code>"
190: * yields the following results: </p>
191: *
192: * <table border="0" cellpadding="4">
193: * <tr>
194: * <td valign="top" nowrap><code>|adefabcdefz</code></td>
195: * <td valign="top">Initial state, no rules match. Advance
196: * cursor.</td>
197: * </tr>
198: * <tr>
199: * <td valign="top" nowrap><code>a|defabcdefz</code></td>
200: * <td valign="top">Still no match. Rule 1 does not match
201: * because the preceding context is not present.</td>
202: * </tr>
203: * <tr>
204: * <td valign="top" nowrap><code>ad|efabcdefz</code></td>
205: * <td valign="top">Still no match. Keep advancing until
206: * there is a match...</td>
207: * </tr>
208: * <tr>
209: * <td valign="top" nowrap><code>ade|fabcdefz</code></td>
210: * <td valign="top">...</td>
211: * </tr>
212: * <tr>
213: * <td valign="top" nowrap><code>adef|abcdefz</code></td>
214: * <td valign="top">...</td>
215: * </tr>
216: * <tr>
217: * <td valign="top" nowrap><code>adefa|bcdefz</code></td>
218: * <td valign="top">...</td>
219: * </tr>
220: * <tr>
221: * <td valign="top" nowrap><code>adefab|cdefz</code></td>
222: * <td valign="top">...</td>
223: * </tr>
224: * <tr>
225: * <td valign="top" nowrap><code>adefabc|defz</code></td>
226: * <td valign="top">Rule 1 matches; replace "<code>def</code>"
227: * with "<code>xy</code>" and back up the cursor
228: * to before the '<code>y</code>'.</td>
229: * </tr>
230: * <tr>
231: * <td valign="top" nowrap><code>adefabcx|yz</code></td>
232: * <td valign="top">Although "<code>xyz</code>" is
233: * present, rule 2 does not match because the cursor is
234: * before the '<code>y</code>', not before the '<code>x</code>'.
235: * Rule 3 does match. Replace "<code>yz</code>"
236: * with "<code>q</code>".</td>
237: * </tr>
238: * <tr>
239: * <td valign="top" nowrap><code>adefabcxq|</code></td>
240: * <td valign="top">The cursor is at the end;
241: * transliteration is complete.</td>
242: * </tr>
243: * </table>
244: *
245: * <p>The order of rules is significant. If multiple rules may match
246: * at some point, the first matching rule is applied. </p>
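 *
 * <p>As a minimal usage sketch, assuming the public constructor declared
 * in this class and the inherited
 * <code>Transliterator.transliterate(String)</code> method, the three
 * example rules above can be applied as follows. The ID "Example" and
 * the variable names are illustrative only; the expected result is the
 * final state of the walk-through table.</p>
 *
 * <blockquote><pre>
 * String rules = "abc{def}>x|y; xyz>r; yz>q;";
 * Transliterator t = new RuleBasedTransliterator("Example", rules);
 * // Per the table above, the result is "adefabcxq".
 * String result = t.transliterate("adefabcdefz");
 * </pre></blockquote>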
247: *
248: * <p>Forward and reverse rules may have an empty output string.
249: * Otherwise, an empty left or right hand side of any statement is a
250: * syntax error. </p>
251: *
252: * <p>Single quotes are used to quote any character other than a
253: * digit or letter. To specify a single quote itself, inside or
254: * outside of quotes, use two single quotes in a row. For example,
255: * the rule "<code>'>'>o''clock</code>" changes the
256: * string "<code>></code>" to the string "<code>o'clock</code>".
257: * </p>
258: *
259: * <p><b>Notes</b> </p>
260: *
261: * <p>While a RuleBasedTransliterator is being built, it checks that
262: * the rules are added in proper order. For example, if the rule
263: * "a>x" is followed by the rule "ab>y",
264: * then adding the second rule throws an exception. The reason is that
265: * the second rule can never be triggered, since the first rule
266: * always matches anything the second rule matches. In other words, the first
267: * rule <em>masks</em> the second rule. </p>
268: *
269: * <p>Copyright (c) IBM Corporation 1999-2000. All rights reserved.</p>
270: *
271: * @author Alan Liu
272: * @internal
273: * @deprecated This API is ICU internal only.
274: */
275: public class RuleBasedTransliterator extends Transliterator {
276:
277: private Data data;
278:
279: private static final String COPYRIGHT = "\u00A9 IBM Corporation 1999. All rights reserved.";
280:
281: /**
282: * Constructs a new transliterator with the given ID and filter from the given rules.
283: * @param rules rules, separated by ';'
284: * @param direction either FORWARD or REVERSE.
285: * @exception IllegalArgumentException if rules are malformed,
286: * contain ::ID blocks, or direction is invalid.
287: * @internal
288: * @deprecated This API is ICU internal only.
289: */
290: public RuleBasedTransliterator(String ID, String rules,
291: int direction, UnicodeFilter filter) {
292: super (ID, filter);
293: if (direction != FORWARD && direction != REVERSE) {
294: throw new IllegalArgumentException("Invalid direction");
295: }
296:
297: TransliteratorParser parser = new TransliteratorParser();
298: parser.parse(rules, direction);
299: if (parser.idBlockVector.size() != 0
300: || parser.compoundFilter != null) {
301: throw new IllegalArgumentException(
302: "::ID blocks illegal in RuleBasedTransliterator constructor");
303: }
304:
305: data = (Data) parser.dataVector.get(0);
306: setMaximumContextLength(data.ruleSet.getMaximumContextLength());
307: }
308:
309: /**
310: * Constructs a new transliterator from the given rules in the
311: * <code>FORWARD</code> direction.
312: * @param rules rules, separated by ';'
313: * @exception IllegalArgumentException if rules are malformed
314: * or contain ::ID blocks.
315: * @internal
316: * @deprecated This API is ICU internal only.
317: */
318: public RuleBasedTransliterator(String ID, String rules) {
319: this (ID, rules, FORWARD, null);
320: }
321:
322: RuleBasedTransliterator(String ID, Data data, UnicodeFilter filter) {
323: super (ID, filter);
324: this .data = data;
325: setMaximumContextLength(data.ruleSet.getMaximumContextLength());
326: }
327:
328: /**
329: * Implements {@link Transliterator#handleTransliterate}.
330: * @internal
331: * @deprecated This API is ICU internal only.
332: */
333: protected synchronized void handleTransliterate(Replaceable text,
334: Position index, boolean incremental) {
335: /* We keep start and limit fixed the entire time,
336: * relative to the text -- limit may move numerically if text is
337: * inserted or removed. The cursor moves from start to limit, with
338: * replacements happening under it.
339: *
340: * Example: rules 1. ab>x|y
341: * 2. yc>z
342: *
343: * |eabcd start - no match, advance cursor
344: * e|abcd match rule 1 - change text & adjust cursor
345: * ex|ycd match rule 2 - change text & adjust cursor
346: * exz|d no match, advance cursor
347: * exzd| done
348: */
349:
350: /* A rule like
351: * a>b|a
352: * creates an infinite loop. To prevent that, we put an arbitrary
353: * limit on the number of iterations that we take, one that is
354: * high enough that any reasonable rules are ok, but low enough to
355: * prevent a server from hanging. The limit is 16 times the
356: * number of characters n, unless n is so large that 16n overflows
357: * an int, in which case the limit is Integer.MAX_VALUE.
358: */
359: int loopCount = 0;
360: int loopLimit = (index.limit - index.start) << 4;
361: if (loopLimit < 0) {
362: loopLimit = 0x7FFFFFFF;
363: }
364:
365: while (index.start < index.limit && loopCount <= loopLimit
366: && data.ruleSet.transliterate(text, index, incremental)) {
367: ++loopCount;
368: }
369: }
370:
371: static class Data {
372: public Data() {
373: variableNames = new Hashtable();
374: ruleSet = new TransliterationRuleSet();
375: }
376:
377: /**
378: * Rule table. May be empty.
379: */
380: public TransliterationRuleSet ruleSet;
381:
382: /**
383: * Map variable name (String) to variable (char[]). A variable name
384: * corresponds to zero or more characters, stored in a char[] array in
385: * this hash. One or more of these chars may also correspond to a
386: * UnicodeSet, in which case the character in the char[] in this hash is
387: * a stand-in: it is an index for a secondary lookup in
388: * data.variables. The stand-in also represents the UnicodeSet in
389: * the stored rules.
390: */
391: Hashtable variableNames;
392:
393: /**
394: * Map category variable (Character) to UnicodeMatcher or UnicodeReplacer.
395: * Variables that correspond to a set of characters are mapped
396: * from variable name to a stand-in character in data.variableNames.
397: * The stand-in, offset by variablesBase, then serves as an index into this array to look up the
398: * actual UnicodeSet object. In addition, the stand-in is
399: * stored in the rule text to represent the set of characters.
400: * variables[i] represents character (variablesBase + i).
401: */
402: Object[] variables;
403:
404: /**
405: * The character that represents variables[0]. Characters
406: * variablesBase through variablesBase +
407: * variables.length - 1 represent UnicodeSet objects.
408: */
409: char variablesBase;
410:
411: /**
412: * Return the UnicodeMatcher represented by the given character, or
413: * null if none.
414: */
415: public UnicodeMatcher lookupMatcher(int standIn) {
416: int i = standIn - variablesBase;
417: return (i >= 0 && i < variables.length) ? (UnicodeMatcher) variables[i]
418: : null;
419: }
420:
421: /**
422: * Return the UnicodeReplacer represented by the given character, or
423: * null if none.
424: */
425: public UnicodeReplacer lookupReplacer(int standIn) {
426: int i = standIn - variablesBase;
427: return (i >= 0 && i < variables.length) ? (UnicodeReplacer) variables[i]
428: : null;
429: }
430: }
431:
432: /**
433: * Return a representation of this transliterator as source rules.
434: * These rules will produce an equivalent transliterator if used
435: * to construct a new transliterator.
436: * @param escapeUnprintable if true, convert unprintable
437: * characters to their hex escape representations, \\uxxxx or
438: * \\Uxxxxxxxx. Unprintable characters are those other than
439: * U+000A, U+0020..U+007E.
440: * @return rules string
441: * @internal
442: * @deprecated This API is ICU internal only.
443: */
444: public String toRules(boolean escapeUnprintable) {
445: return data.ruleSet.toRules(escapeUnprintable);
446: }
447:
448: /**
449: * Return the set of all characters that may be modified by this
450: * Transliterator, ignoring the effect of our filter.
451: * @internal
452: * @deprecated This API is ICU internal only.
453: */
454: protected UnicodeSet handleGetSourceSet() {
455: return data.ruleSet.getSourceTargetSet(false);
456: }
457:
458: /**
459: * Returns the set of all characters that may be generated as
460: * replacement text by this transliterator.
461: * @internal
462: * @deprecated This API is ICU internal only.
463: */
464: public UnicodeSet getTargetSet() {
465: return data.ruleSet.getSourceTargetSet(true);
466: }
467: }
468:
469: /**
470: * Revision 1.61 2004/02/25 01:26:23 alan
471: * jitterbug 3517: make concrete transilterators package private and @internal
472: *
473: * Revision 1.60 2003/06/03 18:49:35 alan
474: * jitterbug 2959: update copyright dates to include 2003
475: *
476: * Revision 1.59 2003/05/14 19:03:30 rviswanadha
477: * jitterbug 2836: fix compiler warnings
478: *
479: * Revision 1.58 2002/12/03 18:57:36 alan
480: * jitterbug 2087: fix @ tags
481: *
482: * Revision 1.57 2002/07/26 21:12:36 alan
483: * jitterbug 1997: use UCharacterProperty.isRuleWhiteSpace() in parsers
484: *
485: * Revision 1.56 2002/06/28 19:15:52 alan
486: * jitterbug 1434: improve method names; minor cleanup
487: *
488: * Revision 1.55 2002/06/26 18:12:39 alan
489: * jitterbug 1434: initial public implementation of getSourceSet and getTargetSet
490: *
491: * Revision 1.54 2002/02/25 22:43:58 ram
492: * Move Utility class to icu.impl
493: *
494: * Revision 1.53 2002/02/16 03:06:13 Mohan
495: * ICU4J reorganization
496: *
497: * Revision 1.52 2002/02/07 00:53:54 alan
498: * jitterbug 1234: make output side of RBTs object-oriented; rewrite ID parsers and modularize them; implement &Any-Lower() support
499: *
500: * Revision 1.51 2001/11/29 22:31:18 alan
501: * jitterbug 1560: add source-set methods and TransliteratorUtility class
502: *
503: * Revision 1.50 2001/11/27 22:07:33 alan
504: * jitterbug 1389: incorporate Mark's review comments - comments only
505: *
506: * Revision 1.49 2001/10/10 20:26:27 alan
507: * jitterbug 81: initial implementation of compound filters in IDs and ::ID blocks
508: *
509: * Revision 1.48 2001/10/05 18:15:54 alan
510: * jitterbug 74: finish port of Source-Target/Variant code incl. TransliteratorRegistry and tests
511: *
512: * Revision 1.47 2001/10/03 00:14:22 alan
513: * jitterbug 73: finish quantifier and supplemental char support
514: *
515: * Revision 1.46 2001/09/26 18:00:06 alan
516: * jitterbug 67: sync parser with icu4c, allow unlimited, nested segments
517: *
518: * Revision 1.45 2001/09/24 19:57:17 alan
519: * jitterbug 60: implement toPattern in UnicodeSet; update UnicodeFilter.contains to take an int; update UnicodeSet to support code points to U+10FFFF
520: *
521: * Revision 1.44 2001/09/21 21:24:04 alan
522: * jitterbug 64: allow ::ID blocks in rules
523: *
524: * Revision 1.43 2001/09/19 17:43:37 alan
525: * jitterbug 60: initial implementation of toRules()
526: *
527: * Revision 1.42 2001/02/20 17:59:40 alan4j
528: * Remove backslash-u from log
529: *
530: * Revision 1.41 2001/02/16 18:53:55 alan4j
531: * Handle backslash-u escapes
532: *
533: * Revision 1.40 2001/02/03 00:46:21 alan4j
534: * Load RuleBasedTransliterator files from UTF8 files instead of ResourceBundles
535: *
536: * Revision 1.39 2000/08/31 17:11:42 alan4j
537: * Implement anchors.
538: *
539: * Revision 1.38 2000/08/30 20:40:30 alan4j
540: * Implement anchors.
541: *
542: * Revision 1.37 2000/07/12 16:31:36 alan4j
543: * Simplify loop limit logic
544: *
545: * Revision 1.36 2000/06/29 21:59:23 alan4j
546: * Fix handling of Transliterator.Position fields
547: *
548: * Revision 1.35 2000/06/28 20:49:54 alan4j
549: * Fix handling of Positions fields
550: *
551: * Revision 1.34 2000/06/28 20:36:32 alan4j
552: * Clean up Transliterator::Position - rename temporary names
553: *
554: * Revision 1.33 2000/06/28 20:31:43 alan4j
555: * Clean up Transliterator::Position and rename fields (related to jitterbug 450)
556: *
557: * Revision 1.32 2000/05/24 22:21:00 alan4j
558: * Compact UnicodeSets
559: *
560: * Revision 1.31 2000/05/23 16:48:27 alan4j
561: * Fix doc; remove unused auto
562: *
563: * Revision 1.30 2000/05/18 22:49:51 alan
564: * Update docs
565: *
566: * Revision 1.29 2000/04/28 00:25:42 alan
567: * Improve error reporting
568: *
569: * Revision 1.28 2000/04/25 17:38:00 alan
570: * Minor parser cleanup.
571: *
572: * Revision 1.27 2000/04/25 01:42:58 alan
573: * Allow arbitrary length variable values. Clean up Data API. Update javadocs.
574: *
575: * Revision 1.26 2000/04/22 01:25:10 alan
576: * Add support for cursor positioner '@'; update javadoc
577: *
578: * Revision 1.25 2000/04/22 00:08:43 alan
579: * Narrow range to 21 - 7E for mandatory quoting.
580: *
581: * Revision 1.24 2000/04/22 00:03:54 alan
582: * Disallow unquoted special chars. Report multiple errors at once.
583: *
584: * Revision 1.23 2000/04/21 22:23:40 alan
585: * Clean up parseReference. Previous log should read 'delegate', not 'delete'.
586: *
587: * Revision 1.22 2000/04/21 22:16:29 alan
588: * Delete variable name parsing to SymbolTable interface to consolidate parsing code.
589: *
590: * Revision 1.21 2000/04/21 21:16:40 alan
591: * Modify rule syntax
592: *
593: * Revision 1.20 2000/04/19 17:35:23 alan
594: * Update javadoc; fix compile error
595: *
596: * Revision 1.19 2000/04/19 16:34:18 alan
597: * Add segment support.
598: *
599: * Revision 1.18 2000/04/12 20:17:45 alan
600: * Delegate replace operation to rule object
601: *
602: * Revision 1.17 2000/03/10 04:07:23 johnf
603: * Copyright update
604: *
605: * Revision 1.16 2000/02/24 20:46:49 liu
606: * Add infinite loop check
607: *
608: * Revision 1.15 2000/02/10 07:36:25 johnf
609: * fixed imports for com.ibm.icu.impl.Utility
610: *
611: * Revision 1.14 2000/02/03 18:18:42 Alan
612: * Use array rather than hashtable for char-to-set map
613: *
614: * Revision 1.13 2000/01/27 18:59:19 Alan
615: * Use Position rather than int[] and move all subclass overrides to one method (handleTransliterate)
616: *
617: * Revision 1.12 2000/01/18 17:51:09 Alan
618: * Remove "keyboard" from method names. Make maximum context a field of Transliterator, and have subclasses set it.
619: *
620: * Revision 1.11 2000/01/18 02:30:49 Alan
621: * Add Jamo-Hangul, Hangul-Jamo, fix rules, add compound ID support
622: *
623: * Revision 1.10 2000/01/13 23:53:23 Alan
624: * Fix bugs found during ICU port
625: *
626: * Revision 1.9 2000/01/11 04:12:06 Alan
627: * Cleanup, embellish comments
628: *
629: * Revision 1.8 2000/01/11 02:25:03 Alan
630: * Rewrite UnicodeSet and RBT parsers for better performance and new syntax
631: *
632: * Revision 1.7 2000/01/06 01:36:36 Alan
633: * Allow string arrays in rule resource bundles
634: *
635: * Revision 1.6 2000/01/04 21:43:57 Alan
636: * Add rule indexing, and move masking check to TransliterationRuleSet.
637: *
638: * Revision 1.5 1999/12/22 01:40:54 Alan
639: * Consolidate rule pattern anteContext, key, and postContext into one string.
640: *
641: * Revision 1.4 1999/12/22 01:05:54 Alan
642: * Improve masking checking; turn it off by default, for better performance
643: */
|