Source Code Cross Referenced for RuleBasedTransliterator.java in Internationalization-Localization » icu4j » com » ibm » icu » text



/*
 *******************************************************************************
 * Copyright (C) 1996-2006, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

import java.util.Hashtable;

/**
 * <p><code>RuleBasedTransliterator</code> is a transliterator
 * that reads a set of rules in order to determine how to perform
 * translations. Rule sets are stored in resource bundles indexed by
 * name. Rules within a rule set are separated by semicolons (';').
 * To include a literal semicolon, prefix it with a backslash ('\').
 * Whitespace, as defined by <code>UCharacterProperty.isRuleWhiteSpace()</code>,
 * is ignored. If the first non-blank character on a line is '#',
 * the entire line is ignored as a comment. </p>
 *
 * <p>Each set of rules consists of two groups, one forward, and one
 * reverse. This is a convention that is not enforced; rules for one
 * direction may be omitted, with the result that translations in
 * that direction will not modify the source text. In addition,
 * bidirectional forward-reverse rules may be specified for
 * symmetrical transformations.</p>
 *
 * <p><b>Rule syntax</b> </p>
 *
 * <p>Rule statements take one of the following forms: </p>
 *
 * <dl>
 *     <dt><code>$alefmadda=\u0622;</code></dt>
 *     <dd><strong>Variable definition.</strong> The name on the
 *         left is assigned the text on the right. In this example,
 *         after this statement, instances of the left hand name,
 *         &quot;<code>$alefmadda</code>&quot;, will be replaced by
 *         the Unicode character U+0622. Variable names must begin
 *         with a letter and consist only of letters, digits, and
 *         underscores. Case is significant. Duplicate names cause
 *         an exception to be thrown, that is, variables cannot be
 *         redefined. The right hand side may contain well-formed
 *         text of any length, including no text at all (&quot;<code>$empty=;</code>&quot;).
 *         The right hand side may contain embedded <code>UnicodeSet</code>
 *         patterns, for example, &quot;<code>$softvowel=[eiyEIY]</code>&quot;.</dd>
 *     <dd>&nbsp;</dd>
 *     <dt><code>ai&gt;$alefmadda;</code></dt>
 *     <dd><strong>Forward translation rule.</strong> This rule
 *         states that the string on the left will be changed to the
 *         string on the right when performing forward
 *         transliteration.</dd>
 *     <dt>&nbsp;</dt>
 *     <dt><code>ai&lt;$alefmadda;</code></dt>
 *     <dd><strong>Reverse translation rule.</strong> This rule
 *         states that the string on the right will be changed to
 *         the string on the left when performing reverse
 *         transliteration.</dd>
 * </dl>
 *
 * <dl>
 *     <dt><code>ai&lt;&gt;$alefmadda;</code></dt>
 *     <dd><strong>Bidirectional translation rule.</strong> This
 *         rule states that the string on the left will be changed
 *         to the string on the right when performing forward
 *         transliteration, and vice versa when performing reverse
 *         transliteration.</dd>
 * </dl>
 *
 * <p>Translation rules consist of a <em>match pattern</em> and an <em>output
 * string</em>. The match pattern consists of literal characters,
 * optionally preceded by context, and optionally followed by
 * context. Context characters, like literal pattern characters,
 * must be matched in the text being transliterated. However, unlike
 * literal pattern characters, they are not replaced by the output
 * text. For example, the pattern &quot;<code>abc{def}</code>&quot;
 * indicates the characters &quot;<code>def</code>&quot; must be
 * preceded by &quot;<code>abc</code>&quot; for a successful match.
 * If there is a successful match, &quot;<code>def</code>&quot; will
 * be replaced, but not &quot;<code>abc</code>&quot;. The final '<code>}</code>'
 * is optional, so &quot;<code>abc{def</code>&quot; is equivalent to
 * &quot;<code>abc{def}</code>&quot;. Another example is &quot;<code>{123}456</code>&quot;
 * (or &quot;<code>123}456</code>&quot;) in which the literal
 * pattern &quot;<code>123</code>&quot; must be followed by &quot;<code>456</code>&quot;.
 * </p>
 *
 * <p>The output string of a forward or reverse rule consists of
 * characters to replace the literal pattern characters. If the
 * output string contains the character '<code>|</code>', this is
 * taken to indicate the location of the <em>cursor</em> after
 * replacement. The cursor is the point in the text at which the
 * next replacement, if any, will be applied. The cursor is usually
 * placed within the replacement text; however, it can actually be
 * placed into the preceding or following context by using the
 * special character '<code>@</code>'. Examples:</p>
 *
 * <blockquote>
 *     <p><code>a {foo} z &gt; | @ bar; # foo -&gt; bar, move cursor
 *     before a<br>
 *     {foo} xyz &gt; bar @@|; #&nbsp;foo -&gt; bar, cursor between
 *     y and z</code></p>
 * </blockquote>
 *
 * <p><b>UnicodeSet</b></p>
 *
 * <p><code>UnicodeSet</code> patterns may appear anywhere that
 * makes sense. They may appear in variable definitions.
 * Contrariwise, <code>UnicodeSet</code> patterns may themselves
 * contain variable references, such as &quot;<code>$a=[a-z];$not_a=[^$a]</code>&quot;,
 * or &quot;<code>$range=a-z;$ll=[$range]</code>&quot;.</p>
 *
 * <p><code>UnicodeSet</code> patterns may also be embedded directly
 * into rule strings. Thus, the following two rules are equivalent:</p>
 *
 * <blockquote>
 *     <p><code>$vowel=[aeiou]; $vowel&gt;'*'; # One way to do this<br>
 *     [aeiou]&gt;'*';
 *     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#
 *     Another way</code></p>
 * </blockquote>
 *
 * <p>See {@link UnicodeSet} for more documentation and examples.</p>
 *
 * <p><b>Segments</b></p>
 *
 * <p>Segments of the input string can be matched and copied to the
 * output string. This makes certain sets of rules simpler and more
 * general, and makes reordering possible. For example:</p>
 *
 * <blockquote>
 *     <p><code>([a-z]) &gt; $1 $1;
 *     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#
 *     double lowercase letters<br>
 *     ([:Lu:]) ([:Ll:]) &gt; $2 $1; # reverse order of Lu-Ll pairs</code></p>
 * </blockquote>
 *
 * <p>The segment of the input string to be copied is delimited by
 * &quot;<code>(</code>&quot; and &quot;<code>)</code>&quot;. Up to
 * nine segments may be defined. Segments may not overlap. In the
 * output string, &quot;<code>$1</code>&quot; through &quot;<code>$9</code>&quot;
 * represent the input string segments, in left-to-right order of
 * definition.</p>
 *
 * <p><b>Anchors</b></p>
 *
 * <p>Patterns can be anchored to the beginning or the end of the text. This is done with the
 * special characters '<code>^</code>' and '<code>$</code>'. For example:</p>
 *
 * <blockquote>
 *   <p><code>^ a&nbsp;&nbsp; &gt; 'BEG_A'; &nbsp;&nbsp;# match 'a' at start of text<br>
 *   &nbsp; a&nbsp;&nbsp; &gt; 'A';&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # match other instances
 *   of 'a'<br>
 *   &nbsp; z $ &gt; 'END_Z'; &nbsp;&nbsp;# match 'z' at end of text<br>
 *   &nbsp; z&nbsp;&nbsp; &gt; 'Z';&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # match other instances
 *   of 'z'</code></p>
 * </blockquote>
 *
 * <p>It is also possible to match the beginning or the end of the text using a <code>UnicodeSet</code>.
 * This is done by including a virtual anchor character '<code>$</code>' at the end of the
 * set pattern. Although this is usually the match character for the end anchor, the set will
 * match either the beginning or the end of the text, depending on its placement. For
 * example:</p>
 *
 * <blockquote>
 *   <p><code>$x = [a-z$]; &nbsp;&nbsp;# match 'a' through 'z' OR anchor<br>
 *   $x 1&nbsp;&nbsp;&nbsp; &gt; 2;&nbsp;&nbsp; # match '1' after a-z or at the start<br>
 *   &nbsp;&nbsp; 3 $x &gt; 4; &nbsp;&nbsp;# match '3' before a-z or at the end</code></p>
 * </blockquote>
 *
 * <p><b>Example</b> </p>
 *
 * <p>The following example rules illustrate many of the features of
 * the rule language. </p>
 *
 * <table border="0" cellpadding="4">
 *     <tr>
 *         <td valign="top">Rule 1.</td>
 *         <td valign="top" nowrap><code>abc{def}&gt;x|y</code></td>
 *     </tr>
 *     <tr>
 *         <td valign="top">Rule 2.</td>
 *         <td valign="top" nowrap><code>xyz&gt;r</code></td>
 *     </tr>
 *     <tr>
 *         <td valign="top">Rule 3.</td>
 *         <td valign="top" nowrap><code>yz&gt;q</code></td>
 *     </tr>
 * </table>
 *
 * <p>Applying these rules to the string &quot;<code>adefabcdefz</code>&quot;
 * yields the following results: </p>
 *
 * <table border="0" cellpadding="4">
 *     <tr>
 *         <td valign="top" nowrap><code>|adefabcdefz</code></td>
 *         <td valign="top">Initial state, no rules match. Advance
 *         cursor.</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>a|defabcdefz</code></td>
 *         <td valign="top">Still no match. Rule 1 does not match
 *         because the preceding context is not present.</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>ad|efabcdefz</code></td>
 *         <td valign="top">Still no match. Keep advancing until
 *         there is a match...</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>ade|fabcdefz</code></td>
 *         <td valign="top">...</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adef|abcdefz</code></td>
 *         <td valign="top">...</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adefa|bcdefz</code></td>
 *         <td valign="top">...</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adefab|cdefz</code></td>
 *         <td valign="top">...</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adefabc|defz</code></td>
 *         <td valign="top">Rule 1 matches; replace &quot;<code>def</code>&quot;
 *         with &quot;<code>xy</code>&quot; and back up the cursor
 *         to before the '<code>y</code>'.</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adefabcx|yz</code></td>
 *         <td valign="top">Although &quot;<code>xyz</code>&quot; is
 *         present, rule 2 does not match because the cursor is
 *         before the '<code>y</code>', not before the '<code>x</code>'.
 *         Rule 3 does match. Replace &quot;<code>yz</code>&quot;
 *         with &quot;<code>q</code>&quot;.</td>
 *     </tr>
 *     <tr>
 *         <td valign="top" nowrap><code>adefabcxq|</code></td>
 *         <td valign="top">The cursor is at the end;
 *         transliteration is complete.</td>
 *     </tr>
 * </table>
 *
 * <p>The order of rules is significant. If multiple rules may match
 * at some point, the first matching rule is applied. </p>
 *
 * <p>Forward and reverse rules may have an empty output string.
 * Otherwise, an empty left or right hand side of any statement is a
 * syntax error. </p>
 *
 * <p>Single quotes are used to quote any character other than a
 * digit or letter. To specify a single quote itself, inside or
 * outside of quotes, use two single quotes in a row. For example,
 * the rule &quot;<code>'&gt;'&gt;o''clock</code>&quot; changes the
 * string &quot;<code>&gt;</code>&quot; to the string &quot;<code>o'clock</code>&quot;.
 * </p>
 *
 * <p><b>Notes</b> </p>
 *
 * <p>While a RuleBasedTransliterator is being built, it checks that
 * the rules are added in proper order. For example, if the rule
 * &quot;a&gt;x&quot; is followed by the rule &quot;ab&gt;y&quot;,
 * then the second rule will throw an exception. The reason is that
 * the second rule can never be triggered, since the first rule
 * always matches anything it matches. In other words, the first
 * rule <em>masks</em> the second rule. </p>
 *
 * <p>Copyright (c) IBM Corporation 1999-2000. All rights reserved.</p>
 *
 * @author Alan Liu
 * @internal
 * @deprecated This API is ICU internal only.
 */
public class RuleBasedTransliterator extends Transliterator {

    private Data data;

    private static final String COPYRIGHT = "\u00A9 IBM Corporation 1999. All rights reserved.";

    /**
     * Constructs a new transliterator from the given rules.
     * @param ID the string identifier for this transliterator
     * @param rules rules, separated by ';'
     * @param direction either FORWARD or REVERSE.
     * @param filter the filter passed to the superclass constructor; may be null
     * @exception IllegalArgumentException if rules are malformed
     * or direction is invalid.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public RuleBasedTransliterator(String ID, String rules,
            int direction, UnicodeFilter filter) {
        super(ID, filter);
        if (direction != FORWARD && direction != REVERSE) {
            throw new IllegalArgumentException("Invalid direction");
        }

        TransliteratorParser parser = new TransliteratorParser();
        parser.parse(rules, direction);
        if (parser.idBlockVector.size() != 0
                || parser.compoundFilter != null) {
            throw new IllegalArgumentException(
                    "::ID blocks illegal in RuleBasedTransliterator constructor");
        }

        data = (Data) parser.dataVector.get(0);
        setMaximumContextLength(data.ruleSet.getMaximumContextLength());
    }

    /**
     * Constructs a new transliterator from the given rules in the
     * <code>FORWARD</code> direction.
     * @param ID the string identifier for this transliterator
     * @param rules rules, separated by ';'
     * @exception IllegalArgumentException if rules are malformed.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public RuleBasedTransliterator(String ID, String rules) {
        this(ID, rules, FORWARD, null);
    }

    RuleBasedTransliterator(String ID, Data data, UnicodeFilter filter) {
        super(ID, filter);
        this.data = data;
        setMaximumContextLength(data.ruleSet.getMaximumContextLength());
    }

    /**
     * Implements {@link Transliterator#handleTransliterate}.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    protected synchronized void handleTransliterate(Replaceable text,
            Position index, boolean incremental) {
        /* We keep start and limit fixed the entire time,
         * relative to the text -- limit may move numerically if text is
         * inserted or removed.  The cursor moves from start to limit, with
         * replacements happening under it.
         *
         * Example: rules 1. ab>x|y
         *                2. yc>z
         *
         * |eabcd   start - no match, advance cursor
         * e|abcd   match rule 1 - change text & adjust cursor
         * ex|ycd   match rule 2 - change text & adjust cursor
         * exz|d    no match, advance cursor
         * exzd|    done
         */

        /* A rule like
         *   a>b|a
         * creates an infinite loop. To prevent that, we put an arbitrary
         * limit on the number of iterations that we take, one that is
         * high enough that any reasonable rules are ok, but low enough to
         * prevent a server from hanging.  The limit is 16 times the
         * number of characters n, unless n is so large that 16n overflows
         * to a negative int, in which case the limit is Integer.MAX_VALUE.
         */
        int loopCount = 0;
        int loopLimit = (index.limit - index.start) << 4;
        if (loopLimit < 0) {
            loopLimit = 0x7FFFFFFF;
        }

        while (index.start < index.limit && loopCount <= loopLimit
                && data.ruleSet.transliterate(text, index, incremental)) {
            ++loopCount;
        }
    }

    static class Data {
        public Data() {
            variableNames = new Hashtable();
            ruleSet = new TransliterationRuleSet();
        }

        /**
         * Rule table.  May be empty.
         */
        public TransliterationRuleSet ruleSet;

        /**
         * Map variable name (String) to variable (char[]).  A variable name
         * corresponds to zero or more characters, stored in a char[] array in
         * this hash.  One or more of these chars may also correspond to a
         * UnicodeSet, in which case the character in the char[] in this hash is
         * a stand-in: it is an index for a secondary lookup in
         * data.variables.  The stand-in also represents the UnicodeSet in
         * the stored rules.
         */
        Hashtable variableNames;

        /**
         * Map category variable (Character) to UnicodeMatcher or UnicodeReplacer.
         * Variables that correspond to a set of characters are mapped
         * from variable name to a stand-in character in data.variableNames.
         * The stand-in then serves as a key in this hash to look up the
         * actual UnicodeSet object.  In addition, the stand-in is
         * stored in the rule text to represent the set of characters.
         * variables[i] represents character (variablesBase + i).
         */
        Object[] variables;

        /**
         * The character that represents variables[0].  Characters
         * variablesBase through variablesBase +
         * variables.length - 1 represent UnicodeSet objects.
         */
        char variablesBase;

        /**
         * Return the UnicodeMatcher represented by the given character, or
         * null if none.
         */
        public UnicodeMatcher lookupMatcher(int standIn) {
            int i = standIn - variablesBase;
            return (i >= 0 && i < variables.length) ? (UnicodeMatcher) variables[i]
                    : null;
        }

        /**
         * Return the UnicodeReplacer represented by the given character, or
         * null if none.
         */
        public UnicodeReplacer lookupReplacer(int standIn) {
            int i = standIn - variablesBase;
            return (i >= 0 && i < variables.length) ? (UnicodeReplacer) variables[i]
                    : null;
        }
    }

    /**
     * Return a representation of this transliterator as source rules.
     * These rules will produce an equivalent transliterator if used
     * to construct a new transliterator.
     * @param escapeUnprintable if true, convert unprintable
     * characters to their hex escape representations, \\uxxxx or
     * \\Uxxxxxxxx.  Unprintable characters are those other than
     * U+000A, U+0020..U+007E.
     * @return rules string
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public String toRules(boolean escapeUnprintable) {
        return data.ruleSet.toRules(escapeUnprintable);
    }

    /**
     * Return the set of all characters that may be modified by this
     * Transliterator, ignoring the effect of our filter.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    protected UnicodeSet handleGetSourceSet() {
        return data.ruleSet.getSourceTargetSet(false);
    }

    /**
     * Returns the set of all characters that may be generated as
     * replacement text by this transliterator.
     * @internal
     * @deprecated This API is ICU internal only.
     */
    public UnicodeSet getTargetSet() {
        return data.ruleSet.getSourceTargetSet(true);
    }
}

/**
 * Revision 1.61  2004/02/25 01:26:23  alan
 * jitterbug 3517: make concrete transilterators package private and @internal
 *
 * Revision 1.60  2003/06/03 18:49:35  alan
 * jitterbug 2959: update copyright dates to include 2003
 *
 * Revision 1.59  2003/05/14 19:03:30  rviswanadha
 * jitterbug 2836: fix compiler warnings
 *
 * Revision 1.58  2002/12/03 18:57:36  alan
 * jitterbug 2087: fix @ tags
 *
 * Revision 1.57  2002/07/26 21:12:36  alan
 * jitterbug 1997: use UCharacterProperty.isRuleWhiteSpace() in parsers
 *
 * Revision 1.56  2002/06/28 19:15:52  alan
 * jitterbug 1434: improve method names; minor cleanup
 *
 * Revision 1.55  2002/06/26 18:12:39  alan
 * jitterbug 1434: initial public implementation of getSourceSet and getTargetSet
 *
 * Revision 1.54  2002/02/25 22:43:58  ram
 * Move Utility class to icu.impl
 *
 * Revision 1.53  2002/02/16 03:06:13  Mohan
 * ICU4J reorganization
 *
 * Revision 1.52  2002/02/07 00:53:54  alan
 * jitterbug 1234: make output side of RBTs object-oriented; rewrite ID parsers and modularize them; implement &Any-Lower() support
 *
 * Revision 1.51  2001/11/29 22:31:18  alan
 * jitterbug 1560: add source-set methods and TransliteratorUtility class
 *
 * Revision 1.50  2001/11/27 22:07:33  alan
 * jitterbug 1389: incorporate Mark's review comments - comments only
 *
 * Revision 1.49  2001/10/10 20:26:27  alan
 * jitterbug 81: initial implementation of compound filters in IDs and ::ID blocks
 *
 * Revision 1.48  2001/10/05 18:15:54  alan
 * jitterbug 74: finish port of Source-Target/Variant code incl. TransliteratorRegistry and tests
 *
 * Revision 1.47  2001/10/03 00:14:22  alan
 * jitterbug 73: finish quantifier and supplemental char support
 *
 * Revision 1.46  2001/09/26 18:00:06  alan
 * jitterbug 67: sync parser with icu4c, allow unlimited, nested segments
 *
 * Revision 1.45  2001/09/24 19:57:17  alan
 * jitterbug 60: implement toPattern in UnicodeSet; update UnicodeFilter.contains to take an int; update UnicodeSet to support code points to U+10FFFF
 *
 * Revision 1.44  2001/09/21 21:24:04  alan
 * jitterbug 64: allow ::ID blocks in rules
 *
 * Revision 1.43  2001/09/19 17:43:37  alan
 * jitterbug 60: initial implementation of toRules()
 *
 * Revision 1.42  2001/02/20 17:59:40  alan4j
 * Remove backslash-u from log
 *
 * Revision 1.41  2001/02/16 18:53:55  alan4j
 * Handle backslash-u escapes
 *
 * Revision 1.40  2001/02/03 00:46:21  alan4j
 * Load RuleBasedTransliterator files from UTF8 files instead of ResourceBundles
 *
 * Revision 1.39  2000/08/31 17:11:42  alan4j
 * Implement anchors.
 *
 * Revision 1.38  2000/08/30 20:40:30  alan4j
 * Implement anchors.
 *
 * Revision 1.37  2000/07/12 16:31:36  alan4j
 * Simplify loop limit logic
 *
 * Revision 1.36  2000/06/29 21:59:23  alan4j
 * Fix handling of Transliterator.Position fields
 *
 * Revision 1.35  2000/06/28 20:49:54  alan4j
 * Fix handling of Positions fields
 *
 * Revision 1.34  2000/06/28 20:36:32  alan4j
 * Clean up Transliterator::Position - rename temporary names
 *
 * Revision 1.33  2000/06/28 20:31:43  alan4j
 * Clean up Transliterator::Position and rename fields (related to jitterbug 450)
 *
 * Revision 1.32  2000/05/24 22:21:00  alan4j
 * Compact UnicodeSets
 *
 * Revision 1.31  2000/05/23 16:48:27  alan4j
 * Fix doc; remove unused auto
 *
 * Revision 1.30  2000/05/18 22:49:51  alan
 * Update docs
 *
 * Revision 1.29  2000/04/28 00:25:42  alan
 * Improve error reporting
 *
 * Revision 1.28  2000/04/25 17:38:00  alan
 * Minor parser cleanup.
 *
 * Revision 1.27  2000/04/25 01:42:58  alan
 * Allow arbitrary length variable values. Clean up Data API. Update javadocs.
 *
 * Revision 1.26  2000/04/22 01:25:10  alan
 * Add support for cursor positioner '@'; update javadoc
 *
 * Revision 1.25  2000/04/22 00:08:43  alan
 * Narrow range to 21 - 7E for mandatory quoting.
 *
 * Revision 1.24  2000/04/22 00:03:54  alan
 * Disallow unquoted special chars. Report multiple errors at once.
 *
 * Revision 1.23  2000/04/21 22:23:40  alan
 * Clean up parseReference. Previous log should read 'delegate', not 'delete'.
 *
 * Revision 1.22  2000/04/21 22:16:29  alan
 * Delete variable name parsing to SymbolTable interface to consolidate parsing code.
 *
 * Revision 1.21  2000/04/21 21:16:40  alan
 * Modify rule syntax
 *
 * Revision 1.20  2000/04/19 17:35:23  alan
 * Update javadoc; fix compile error
 *
 * Revision 1.19  2000/04/19 16:34:18  alan
 * Add segment support.
 *
 * Revision 1.18  2000/04/12 20:17:45  alan
 * Delegate replace operation to rule object
 *
 * Revision 1.17  2000/03/10 04:07:23  johnf
 * Copyright update
 *
 * Revision 1.16  2000/02/24 20:46:49  liu
 * Add infinite loop check
 *
 * Revision 1.15  2000/02/10 07:36:25  johnf
 * fixed imports for com.ibm.icu.impl.Utility
 *
 * Revision 1.14  2000/02/03 18:18:42  Alan
 * Use array rather than hashtable for char-to-set map
 *
 * Revision 1.13  2000/01/27 18:59:19  Alan
 * Use Position rather than int[] and move all subclass overrides to one method (handleTransliterate)
 *
 * Revision 1.12  2000/01/18 17:51:09  Alan
 * Remove "keyboard" from method names. Make maximum context a field of Transliterator, and have subclasses set it.
 *
 * Revision 1.11  2000/01/18 02:30:49  Alan
 * Add Jamo-Hangul, Hangul-Jamo, fix rules, add compound ID support
 *
 * Revision 1.10  2000/01/13 23:53:23  Alan
 * Fix bugs found during ICU port
 *
 * Revision 1.9  2000/01/11 04:12:06  Alan
 * Cleanup, embellish comments
 *
 * Revision 1.8  2000/01/11 02:25:03  Alan
 * Rewrite UnicodeSet and RBT parsers for better performance and new syntax
 *
 * Revision 1.7  2000/01/06 01:36:36  Alan
 * Allow string arrays in rule resource bundles
 *
 * Revision 1.6  2000/01/04 21:43:57  Alan
 * Add rule indexing, and move masking check to TransliterationRuleSet.
 *
 * Revision 1.5  1999/12/22 01:40:54  Alan
 * Consolidate rule pattern anteContext, key, and postContext into one string.
 *
 * Revision 1.4  1999/12/22 01:05:54  Alan
 * Improve masking checking; turn it off by default, for better performance
 */
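
The worked example in the class comment (rules `abc{def}>x|y`, `xyz>r`, `yz>q` applied to `adefabcdefz`) can be traced with a small stand-alone model. The sketch below is not the ICU implementation and uses no ICU classes: `CursorRuleSketch`, `Rule`, `apply` and `exampleRules` are hypothetical names chosen for illustration. It assumes contexts and keys are plain literal strings (no `UnicodeSet`, segments, anchors, following context or `@`), and models only the first-match-wins rule order, the `|` cursor marker, and the 16-iterations-per-character loop limit described in `handleTransliterate`.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of cursor-driven rule application; NOT the ICU implementation. */
final class CursorRuleSketch {

    /** One rule: preceding context, key to replace, output with optional '|' marker. */
    static final class Rule {
        final String before;    // preceding context: must match, is not replaced
        final String key;       // literal text that is replaced
        final String output;    // replacement text with the '|' marker removed
        final int cursorOffset; // cursor position inside the output after replacement

        Rule(String before, String key, String outputWithCursor) {
            int bar = outputWithCursor.indexOf('|');
            this.before = before;
            this.key = key;
            this.output = outputWithCursor.replace("|", "");
            // Without a '|' the cursor moves past the whole replacement.
            this.cursorOffset = (bar < 0) ? this.output.length() : bar;
        }

        /** True if the context ends at the cursor and the key starts at the cursor. */
        boolean matches(String text, int cursor) {
            return cursor >= before.length()
                    && text.startsWith(before, cursor - before.length())
                    && text.startsWith(key, cursor);
        }
    }

    /** The three rules from the class comment's example, in order. */
    static List<Rule> exampleRules() {
        return Arrays.asList(
                new Rule("abc", "def", "x|y"), // Rule 1: abc{def}>x|y
                new Rule("", "xyz", "r"),      // Rule 2: xyz>r
                new Rule("", "yz", "q"));      // Rule 3: yz>q
    }

    /** Walks the cursor over the input, applying the first matching rule at each step. */
    static String apply(List<Rule> rules, String input) {
        StringBuilder text = new StringBuilder(input);
        int cursor = 0;
        int loopCount = 0;
        // Same guard as handleTransliterate: at most 16 iterations per character.
        int loopLimit = input.length() << 4;
        if (loopLimit < 0) {
            loopLimit = Integer.MAX_VALUE;
        }
        while (cursor < text.length() && loopCount <= loopLimit) {
            String current = text.toString();
            Rule hit = null;
            for (Rule rule : rules) {
                if (rule.matches(current, cursor)) {
                    hit = rule; // first matching rule wins
                    break;
                }
            }
            if (hit == null) {
                cursor++; // no rule matches here: advance the cursor
            } else {
                text.replace(cursor, cursor + hit.key.length(), hit.output);
                cursor += hit.cursorOffset;
            }
            loopCount++;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply(exampleRules(), "adefabcdefz")); // prints "adefabcxq"
    }
}
```

In this model a self-referential rule such as `a>b|a`, the infinite-loop case named in the `handleTransliterate` comment, still terminates, because the loop stops once the iteration count passes the limit.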