Java Doc for RuleBasedTransliterator.java in Internationalization-Localization » icu4j » com » ibm » icu » text


java.lang.Object

com.ibm.icu.text.Transliterator

com.ibm.icu.text.RuleBasedTransliterator

RuleBasedTransliterator

public class RuleBasedTransliterator extends Transliterator

RuleBasedTransliterator is a transliterator that reads a set of rules in order to determine how to perform translations. Rule sets are stored in resource bundles indexed by name. Rules within a rule set are separated by semicolons (';'). To include a literal semicolon, prefix it with a backslash ('\'). Whitespace, as defined by UCharacterProperty.isRuleWhiteSpace(), is ignored. If the first non-blank character on a line is '#', the entire line is ignored as a comment.

Each set of rules consists of two groups, one forward, and one reverse. This is a convention that is not enforced; rules for one direction may be omitted, with the result that translations in that direction will not modify the source text. In addition, bidirectional forward-reverse rules may be specified for symmetrical transformations.

Rule syntax

Rule statements take one of the following forms:

$alefmadda=\u0622;: Variable definition. The name on the left is assigned the text on the right. In this example, after this statement, instances of the left hand name, "$alefmadda", will be replaced by the Unicode character U+0622. Variable names must begin with a letter and consist only of letters, digits, and underscores. Case is significant. Duplicate names cause an exception to be thrown; that is, variables cannot be redefined. The right hand side may contain well-formed text of any length, including no text at all ("$empty=;"). The right hand side may contain embedded UnicodeSet patterns, for example, "$softvowel=[eiyEIY]".

ai>$alefmadda;: Forward translation rule. This rule states that the string on the left will be changed to the string on the right when performing forward transliteration.

ai<$alefmadda;: Reverse translation rule. This rule states that the string on the right will be changed to the string on the left when performing reverse transliteration.

ai<>$alefmadda;: Bidirectional translation rule. This rule states that the string on the left will be changed to the string on the right when performing forward transliteration, and vice versa when performing reverse transliteration.
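
The variable behavior described above (case-sensitive names, no redefinition, textual substitution into later rules) can be sketched in plain Java. This is only an illustration and not ICU's parser: the class `VariableTableSketch` and its methods `define` and `expand` are hypothetical names, and the sketch handles simple `$name` references only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ICU4J: models variable definitions only.
class VariableTableSketch {
    // A reference is '$' followed by a letter, then letters, digits, or underscores.
    private static final Pattern VAR = Pattern.compile("\\$(\\p{L}[\\p{L}\\p{Nd}_]*)");

    // Keys are compared exactly, so "$A" and "$a" are different variables.
    private final Map<String, String> vars = new HashMap<>();

    /** Records a definition; redefining an existing name is rejected. */
    void define(String name, String value) {
        if (vars.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate variable: $" + name);
        }
        vars.put(name, value); // the value may be empty, as in "$empty=;"
    }

    /** Replaces every $name reference in the text with its defined value. */
    String expand(String text) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: $" + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With `define("alefmadda", "\u0622")`, the call `expand("ai>$alefmadda;")` yields the rule text with U+0622 in place of the variable reference. Segment references such as "$1" and the bare "$" anchor are left untouched because they do not start with a letter.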

Translation rules consist of a match pattern and an output string. The match pattern consists of literal characters, optionally preceded by context, and optionally followed by context. Context characters, like literal pattern characters, must be matched in the text being transliterated. However, unlike literal pattern characters, they are not replaced by the output text. For example, the pattern "abc{def}" indicates the characters "def" must be preceded by "abc" for a successful match. If there is a successful match, "def" will be replaced, but not "abc". The final '}' is optional, so "abc{def" is equivalent to "abc{def}". Another example is "{123}456" (or "123}456") in which the literal pattern "123" must be followed by "456".

The output string of a forward or reverse rule consists of characters to replace the literal pattern characters. If the output string contains the character '|', this is taken to indicate the location of the cursor after replacement. The cursor is the point in the text at which the next replacement, if any, will be applied. The cursor is usually placed within the replacement text; however, it can actually be placed into the preceding or following context by using the special character '@'. Examples:

a {foo} z > | @ bar; # foo -> bar, move cursor before a
{foo} xyz > bar @@|; # foo -> bar, cursor between y and z

UnicodeSet

UnicodeSet patterns may appear anywhere that makes sense. They may appear in variable definitions. Contrariwise, UnicodeSet patterns may themselves contain variable references, such as "$a=[a-z];$not_a=[^$a]", or "$range=a-z;$ll=[$range]".

UnicodeSet patterns may also be embedded directly into rule strings. Thus, the following two rules are equivalent:

$vowel=[aeiou]; $vowel>'*'; # One way to do this
[aeiou]>'*'; # Another way

See UnicodeSet for more documentation and examples.

Segments

Segments of the input string can be matched and copied to the output string. This makes certain sets of rules simpler and more general, and makes reordering possible. For example:

([a-z]) > $1 $1; # double lowercase letters
([:Lu:]) ([:Ll:]) > $2 $1; # reverse order of Lu-Ll pairs

The segment of the input string to be copied is delimited by "(" and ")". Up to nine segments may be defined. Segments may not overlap. In the output string, "$1" through "$9" represent the input string segments, in left-to-right order of definition.
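
For readers more familiar with regular expressions, the two segment rules above behave much like capture-group replacement. The sketch below uses `java.util.regex` rather than a transliterator, so it is an analogy and not the ICU mechanism; the class and method names are hypothetical. Note that whitespace in a rule is ignored, so "$1 $1" produces the letter twice with no space between.

```java
// Analogy only: java.util.regex capture groups standing in for rule segments.
class SegmentAnalogy {
    /** Rough counterpart of the rule "([a-z]) > $1 $1;". */
    static String doubleLowercase(String s) {
        return s.replaceAll("([a-z])", "$1$1");
    }

    /** Rough counterpart of the rule "([:Lu:]) ([:Ll:]) > $2 $1;". */
    static String swapUpperLower(String s) {
        return s.replaceAll("(\\p{Lu})(\\p{Ll})", "$2$1");
    }

    public static void main(String[] args) {
        System.out.println(doubleLowercase("abc"));  // aabbcc
        System.out.println(swapUpperLower("AbCd"));  // bAdC
    }
}
```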

Anchors

Patterns can be anchored to the beginning or the end of the text. This is done with the special characters '^' and '$'. For example:

^ a > 'BEG_A'; # match 'a' at start of text
a > 'A'; # match other instances of 'a'
z $ > 'END_Z'; # match 'z' at end of text
z > 'Z'; # match other instances of 'z'
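
The effect of these four anchored rules can be approximated with a regular expression whose alternatives are tried in rule order. This is an analogy built on `java.util.regex`, not a transliterator, and the class name `AnchorAnalogy` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: ordered regex alternatives standing in for the four rules.
class AnchorAnalogy {
    // Alternatives are listed in rule order, so the anchored forms win when they apply.
    private static final Pattern RULES = Pattern.compile("(^a)|(a)|(z\\z)|(z)");

    static String apply(String s) {
        Matcher m = RULES.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = "BEG_A";   // 'a' at the start of the text
            } else if (m.group(2) != null) {
                replacement = "A";       // any other 'a'
            } else if (m.group(3) != null) {
                replacement = "END_Z";   // 'z' at the end of the text
            } else {
                replacement = "Z";       // any other 'z'
            }
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("azaz")); // BEG_AZAEND_Z
    }
}
```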

It is also possible to match the beginning or the end of the text using a UnicodeSet. This is done by including a virtual anchor character '$' at the end of the set pattern. Although this is usually the match character for the end anchor, the set will match either the beginning or the end of the text, depending on its placement. For example:

$x = [a-z$]; # match 'a' through 'z' OR anchor
$x 1 > 2; # match '1' after a-z or at the start
3 $x > 4; # match '3' before a-z or at the end

Example

The following example rules illustrate many of the features of the rule language.

Rule 1.	`abc{def}>x|y`
Rule 2.	`xyz>r`
Rule 3.	`yz>q`

Applying these rules to the string "adefabcdefz" yields the following results:

`|adefabcdefz`	Initial state, no rules match. Advance cursor.
`a|defabcdefz`	Still no match. Rule 1 does not match because the preceding context is not present.
`ad|efabcdefz`	Still no match. Keep advancing until there is a match...
`ade|fabcdefz`	...
`adef|abcdefz`	...
`adefa|bcdefz`	...
`adefab|cdefz`	...
`adefabc|defz`	Rule 1 matches; replace "`def`" with "`xy`" and back up the cursor to before the '`y`'.
`adefabcx|yz`	Although "`xyz`" is present, rule 2 does not match because the cursor is before the '`y`', not before the '`x`'. Rule 3 does match. Replace "`yz`" with "`q`".
`adefabcxq|`	The cursor is at the end; transliteration is complete.

The order of rules is significant. If multiple rules may match at some point, the first matching rule is applied.
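
The walk-through above can be reproduced with a small simulation. This is a simplified sketch and not ICU's implementation: it supports only literal keys, a literal preceding context, and a cursor offset inside the replacement. The names `CursorRuleSketch`, `Rule`, `exampleRules`, and `apply` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of cursor-driven rule application; not ICU's implementation.
class CursorRuleSketch {
    /** A rule with a literal preceding context, a literal key, and an output. */
    static final class Rule {
        final String before;     // context that must precede the key (not replaced)
        final String key;        // text that is replaced
        final String output;     // replacement text
        final int cursorOffset;  // cursor position within the output after replacing

        Rule(String before, String key, String output, int cursorOffset) {
            this.before = before;
            this.key = key;
            this.output = output;
            this.cursorOffset = cursorOffset;
        }
    }

    /** The three example rules: abc{def}>x|y; xyz>r; yz>q; */
    static List<Rule> exampleRules() {
        return Arrays.asList(
            new Rule("abc", "def", "xy", 1), // '|' puts the cursor before the 'y'
            new Rule("", "xyz", "r", 1),     // default: cursor after the output
            new Rule("", "yz", "q", 1));
    }

    static String apply(List<Rule> rules, String input) {
        StringBuilder text = new StringBuilder(input);
        int cursor = 0;
        while (cursor < text.length()) {
            boolean matched = false;
            for (Rule r : rules) {           // the first matching rule wins
                int ctxStart = cursor - r.before.length();
                if (ctxStart < 0) continue;  // not enough text for the context
                if (!text.substring(ctxStart, cursor).equals(r.before)) continue;
                if (!text.toString().startsWith(r.key, cursor)) continue;
                text.replace(cursor, cursor + r.key.length(), r.output);
                cursor += r.cursorOffset;
                matched = true;
                break;
            }
            if (!matched) cursor++;          // no rule applies here; advance
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply(exampleRules(), "adefabcdefz")); // adefabcxq
    }
}
```

Because the rules are tried in list order at each cursor position, swapping two rules that can match at the same point changes the result, which is the ordering dependence noted above.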

Forward and reverse rules may have an empty output string. Otherwise, an empty left or right hand side of any statement is a syntax error.

Single quotes are used to quote any character other than a digit or letter. To specify a single quote itself, inside or outside of quotes, use two single quotes in a row. For example, the rule "'>'>o''clock" changes the string ">" to the string "o'clock".
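
The quoting convention can be illustrated with a small routine that resolves quotes in one side of a rule. It is a sketch under simplifying assumptions (it ignores backslash escapes and all other rule syntax), and `QuoteSketch.unquote` is a hypothetical name, not an ICU4J method.

```java
// Illustrative only: resolves single-quote quoting, nothing else.
class QuoteSketch {
    static String unquote(String s) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\'') {
                out.append(c);
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '\'') {
                out.append('\''); // two quotes in a row mean one literal quote
                i++;
            } else {
                inQuotes = !inQuotes; // a lone quote opens or closes a quoted run
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quote in: " + s);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two sides of the rule '>'>o''clock
        System.out.println(unquote("'>'"));      // >
        System.out.println(unquote("o''clock")); // o'clock
    }
}
```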

Notes

While a RuleBasedTransliterator is being built, it checks that the rules are added in proper order. For example, if the rule "a>x" is followed by the rule "ab>y", then the second rule will throw an exception. The reason is that the second rule can never be triggered, since the first rule always matches anything it matches. In other words, the first rule masks the second rule.
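
A much-reduced version of such a masking check, for purely literal keys, might look like the following. It assumes rules have no context, sets, or anchors, so it only conveys the idea; `MaskingCheckSketch` and `checkOrder` are hypothetical names and do not reflect how ICU performs the check.

```java
import java.util.Arrays;
import java.util.List;

// Simplified: literal keys only, so "masks" reduces to "is a prefix of".
class MaskingCheckSketch {
    /** Throws if an earlier key would always match before a later key can. */
    static void checkOrder(List<String> keysInRuleOrder) {
        for (int i = 0; i < keysInRuleOrder.size(); i++) {
            for (int j = i + 1; j < keysInRuleOrder.size(); j++) {
                String earlier = keysInRuleOrder.get(i);
                String later = keysInRuleOrder.get(j);
                if (later.startsWith(earlier)) {
                    throw new IllegalArgumentException(
                        "Rule \"" + earlier + "\" masks later rule \"" + later + "\"");
                }
            }
        }
    }

    public static void main(String[] args) {
        checkOrder(Arrays.asList("ab", "a")); // fine: the longer key comes first
        try {
            checkOrder(Arrays.asList("a", "ab"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```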

author:
Alan Liu

Inner Class: static class Data

Constructor Summary
public	RuleBasedTransliterator(String ID, String rules, int direction, UnicodeFilter filter) Constructs a new transliterator from the given rules.
public	RuleBasedTransliterator(String ID, String rules) Constructs a new transliterator from the given rules in the `FORWARD` direction.
	RuleBasedTransliterator(String ID, Data data, UnicodeFilter filter)

Method Summary
public UnicodeSet	getTargetSet() Returns the set of all characters that may be generated as replacement text by this transliterator.
protected UnicodeSet	handleGetSourceSet() Return the set of all characters that may be modified by this Transliterator, ignoring the effect of our filter.
protected synchronized void	handleTransliterate(Replaceable text, Position index, boolean incremental) Implements Transliterator.handleTransliterate .
public String	toRules(boolean escapeUnprintable) Return a representation of this transliterator as source rules. These rules will produce an equivalent transliterator if used to construct a new transliterator. Parameters: escapeUnprintable - if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx.

Constructor Detail

RuleBasedTransliterator
public RuleBasedTransliterator(String ID, String rules, int direction, UnicodeFilter filter)
	Constructs a new transliterator from the given rules. Parameters: rules - rules, separated by ';'. direction - either FORWARD or REVERSE. exception: IllegalArgumentException - if rules are malformed or direction is invalid.

RuleBasedTransliterator
public RuleBasedTransliterator(String ID, String rules)
	Constructs a new transliterator from the given rules in the `FORWARD` direction. Parameters: rules - rules, separated by ';'. exception: IllegalArgumentException - if rules are malformed or direction is invalid.

RuleBasedTransliterator
RuleBasedTransliterator(String ID, Data data, UnicodeFilter filter)

Method Detail

getTargetSet
public UnicodeSet getTargetSet()
	Returns the set of all characters that may be generated as replacement text by this transliterator.

handleGetSourceSet
protected UnicodeSet handleGetSourceSet()
	Return the set of all characters that may be modified by this Transliterator, ignoring the effect of our filter.

handleTransliterate
protected synchronized void handleTransliterate(Replaceable text, Position index, boolean incremental)
	Implements Transliterator.handleTransliterate.

toRules
public String toRules(boolean escapeUnprintable)
	Return a representation of this transliterator as source rules. These rules will produce an equivalent transliterator if used to construct a new transliterator. Parameters: escapeUnprintable - if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E. Returns: rules string
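
The escaping convention described for escapeUnprintable can be sketched as follows. This is an independent illustration of the stated rule (everything other than U+000A and U+0020..U+007E is escaped), not the code ICU4J runs; `EscapeSketch.escapeUnprintable` is a hypothetical helper, and the use of uppercase hex digits is an assumption.

```java
// Illustrative only: applies the printable range stated in the documentation.
class EscapeSketch {
    static String escapeUnprintable(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            boolean printable = cp == 0x000A || (cp >= 0x0020 && cp <= 0x007E);
            if (printable) {
                out.appendCodePoint(cp);
            } else if (cp <= 0xFFFF) {
                out.append(String.format("\\u%04X", cp)); // four hex digits
            } else {
                out.append(String.format("\\U%08X", cp)); // eight hex digits
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnprintable("ai>\u0622;")); // ai>\u0622;
    }
}
```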

Fields inherited from com.ibm.icu.text.Transliterator

final static boolean DEBUG
final public static int FORWARD
final static char ID_DELIM
final static char ID_SEP
final public static int REVERSE
final static char VARIANT_SEP

Methods inherited from com.ibm.icu.text.Transliterator

final protected String baseToRules(boolean escapeUnprintable)
final public static Transliterator createFromRules(String ID, String rules, int dir)
public void filteredTransliterate(Replaceable text, Position index, boolean incremental)
final public void finishTransliteration(Replaceable text, Position index)
final public static Enumeration getAvailableIDs()
final public static Enumeration getAvailableSources()
final public static Enumeration getAvailableTargets(String source)
final public static Enumeration getAvailableVariants(String source, String target)
static Transliterator getBasicInstance(String id, String canonID)
final public static String getDisplayName(String ID)
public static String getDisplayName(String id, Locale inLocale)
public static String getDisplayName(String id, ULocale inLocale)
public Transliterator[] getElements()
final public UnicodeFilter getFilter()
final public String getID()
final public static Transliterator getInstance(String ID)
public static Transliterator getInstance(String ID, int dir)
final public Transliterator getInverse()
final public int getMaximumContextLength()
final public UnicodeSet getSourceSet()
public UnicodeSet getTargetSet()
protected UnicodeSet handleGetSourceSet()
abstract protected void handleTransliterate(Replaceable text, Position pos, boolean incremental)
public static void registerAlias(String aliasID, String realID)
public static void registerClass(String ID, Class transClass, String displayName)
public static void registerFactory(String ID, Factory factory)
public static void registerInstance(Transliterator trans)
static void registerInstance(Transliterator trans, boolean visible)
static void registerSpecialInverse(String target, String inverseTarget, boolean bidirectional)
public void setFilter(UnicodeFilter filter)
final protected void setID(String id)
protected void setMaximumContextLength(int a)
public String toRules(boolean escapeUnprintable)
final public int transliterate(Replaceable text, int start, int limit)
final public void transliterate(Replaceable text)
final public String transliterate(String text)
final public void transliterate(Replaceable text, Position index, String insertion)
final public void transliterate(Replaceable text, Position index, int insertion)
final public void transliterate(Replaceable text, Position index)
public static void unregister(String ID)

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
