Source Code Cross Referenced for RuleBasedCollator.java in » 6.0-JDK-Modules » j2me » java » text » Java Source Code / Java DocumentationJava Source Code and Java Documentation

Java Source Code / Java Documentation
1.	6.0 JDK Core
2.	6.0 JDK Modules
3.	6.0 JDK Modules com.sun
4.	6.0 JDK Modules com.sun.java
5.	6.0 JDK Modules sun
6.	6.0 JDK Platform
7.	Ajax
8.	Apache Harmony Java SE
9.	Aspect oriented
10.	Authentication Authorization
11.	Blogger System
12.	Build
13.	Byte Code
14.	Cache
15.	Chart
16.	Chat
17.	Code Analyzer
18.	Collaboration
19.	Content Management System
20.	Database Client
21.	Database DBMS
22.	Database JDBC Connection Pool
23.	Database ORM
24.	Development
25.	EJB Server geronimo
26.	EJB Server GlassFish
27.	EJB Server JBoss 4.2.1
28.	EJB Server resin 3.1.5
29.	ERP CRM Financial
30.	ESB
31.	Forum
32.	GIS
33.	Graphic Library
34.	Groupware
35.	HTML Parser
36.	IDE
37.	IDE Eclipse
38.	IDE Netbeans
39.	Installer
40.	Internationalization Localization
41.	Inversion of Control
42.	Issue Tracking
43.	J2EE
44.	JBoss
45.	JMS
46.	JMX
47.	Library
48.	Mail Clients
49.	Net
50.	Parser
51.	PDF
52.	Portal
53.	Profiler
54.	Project Management
55.	Report
56.	RSS RDF
57.	Rule Engine
58.	Science
59.	Scripting
60.	Search Engine
61.	Security
62.	Sevlet Container
63.	Source Control
64.	Swing Library
65.	Template Engine
66.	Test Coverage
67.	Testing
68.	UML
69.	Web Crawler
70.	Web Framework
71.	Web Mail
72.	Web Server
73.	Web Services
74.	Web Services apache cxf 2.0.1
75.	Web Services AXIS2
76.	Wiki Engine
77.	Workflow Engines
78.	XML
79.	XML UI
Java
Java Tutorial
Illustrator Tutorials
GIMP Tutorials
C# / C Sharp
C# / CSharp Tutorial
C# / CSharp Open Source
SQL Server / T-SQL Tutorial
Oracle PL / SQL
Oracle PL/SQL Tutorial
Flash / Flex / ActionScript
VBA / Excel / Access / Word
XML
XML Tutorial
Microsoft Office PowerPoint 2007 Tutorial
Microsoft Office Excel 2007 Tutorial
Microsoft Office Word 2007 Tutorial
Java Source Code / Java Documentation » 6.0 JDK Modules » j2me » java.text
Source Cross Referenced Class Diagram Java Document (Java Doc)
001:        /*
002:         *
003:         * @(#)RuleBasedCollator.java	1.36 06/10/03
004:         *
005:         * Portions Copyright  2000-2006 Sun Microsystems, Inc. All Rights
006:         * Reserved.  Use is subject to license terms.
007:         * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER
008:         * 
009:         * This program is free software; you can redistribute it and/or
010:         * modify it under the terms of the GNU General Public License version
011:         * 2 only, as published by the Free Software Foundation.
012:         * 
013:         * This program is distributed in the hope that it will be useful, but
014:         * WITHOUT ANY WARRANTY; without even the implied warranty of
015:         * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
016:         * General Public License version 2 for more details (a copy is
017:         * included at /legal/license.txt).
018:         * 
019:         * You should have received a copy of the GNU General Public License
020:         * version 2 along with this work; if not, write to the Free Software
021:         * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
022:         * 02110-1301 USA
023:         * 
024:         * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa
025:         * Clara, CA 95054 or visit www.sun.com if you need additional
026:         * information or have any questions.
027:         */
028:
029:        /*
030:         * (C) Copyright Taligent, Inc. 1996, 1997 - All Rights Reserved
031:         * (C) Copyright IBM Corp. 1996-1998 - All Rights Reserved
032:         *
033:         *   The original version of this source code and documentation is copyrighted
034:         * and owned by Taligent, Inc., a wholly-owned subsidiary of IBM. These
035:         * materials are provided under terms of a License Agreement between Taligent
036:         * and Sun. This technology is protected by multiple US and International
037:         * patents. This notice and attribution to Taligent may not be removed.
038:         *   Taligent is a registered trademark of Taligent, Inc.
039:         *
040:         */
041:
042:        package java.text;
043:
044:        import java.util.Vector;
045:        import java.util.Locale;
046:        import sun.text.Normalizer;
047:        import sun.text.NormalizerUtilities;
048:
049:        /**
050:         * The <code>RuleBasedCollator</code> class is a concrete subclass of
051:         * <code>Collator</code> that provides a simple, data-driven, table
052:         * collator.  With this class you can create a customized table-based
053:         * <code>Collator</code>.  <code>RuleBasedCollator</code> maps
054:         * characters to sort keys.
055:         *
056:         * <p>
057:         * <code>RuleBasedCollator</code> has the following restrictions
058:         * for efficiency (other subclasses may be used for more complex languages) :
059:         * <ol>
060:         * <li>If a special collation rule controlled by a &lt;modifier&gt; is
061:         specified it applies to the whole collator object.
062:         * <li>All non-mentioned characters are at the end of the
063:         *     collation order.
064:         * </ol>
065:         *
066:         * <p>
067:         * The collation table is composed of a list of collation rules, where each
068:         * rule is of one of three forms:
069:         * <pre>
070:         *    &lt;modifier&gt;
071:         *    &lt;relation&gt; &lt;text-argument&gt;
072:         *    &lt;reset&gt; &lt;text-argument&gt;
073:         * </pre>
074:         * The definitions of the rule elements is as follows:
075:         * <UL Type=disc>
076:         *    <LI><strong>Text-Argument</strong>: A text-argument is any sequence of
077:         *        characters, excluding special characters (that is, common
078:         *        whitespace characters [0009-000D, 0020] and rule syntax characters
079:         *        [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those
080:         *        characters are desired, you can put them in single quotes
081:         *        (e.g. ampersand => '&'). Note that unquoted white space characters
082:         *        are ignored; e.g. <code>b c</code> is treated as <code>bc</code>.
083:         *    <LI><strong>Modifier</strong>: There are currently two modifiers that 
084:         *        turn on special collation rules.
085:         *        <UL Type=square>
086:         *            <LI>'@' : Turns on backwards sorting of accents (secondary
087:         *                      differences), as in French.
088:         *            <LI>'!' : Turns on Thai/Lao vowel-consonant swapping.  If this
089:         *                      rule is in force when a Thai vowel of the range
090:         *                      &#92;U0E40-&#92;U0E44 precedes a Thai consonant of the range
091:         *                      &#92;U0E01-&#92;U0E2E OR a Lao vowel of the range &#92;U0EC0-&#92;U0EC4
092:         *                      precedes a Lao consonant of the range &#92;U0E81-&#92;U0EAE then
093:         *                      the vowel is placed after the consonant for collation
094:         *                      purposes.
095:         *        </UL>
096:         *        <p>'@' : Indicates that accents are sorted backwards, as in French.
097:         *    <LI><strong>Relation</strong>: The relations are the following:
098:         *        <UL Type=square>
099:         *            <LI>'&lt;' : Greater, as a letter difference (primary)
100:         *            <LI>';' : Greater, as an accent difference (secondary)
101:         *            <LI>',' : Greater, as a case difference (tertiary)
102:         *            <LI>'=' : Equal
103:         *        </UL>
104:         *    <LI><strong>Reset</strong>: There is a single reset
105:         *        which is used primarily for contractions and expansions, but which
106:         *        can also be used to add a modification at the end of a set of rules.
107:         *        <p>'&' : Indicates that the next rule follows the position to where
108:         *            the reset text-argument would be sorted.
109:         * </UL>
110:         *
111:         * <p>
112:         * This sounds more complicated than it is in practice. For example, the
113:         * following are equivalent ways of expressing the same thing:
114:         * <blockquote>
115:         * <pre>
116:         * a &lt; b &lt; c
117:         * a &lt; b &amp; b &lt; c
118:         * a &lt; c &amp; a &lt; b
119:         * </pre>
120:         * </blockquote>
121:         * Notice that the order is important, as the subsequent item goes immediately
122:         * after the text-argument. The following are not equivalent:
123:         * <blockquote>
124:         * <pre>
125:         * a &lt; b &amp; a &lt; c
126:         * a &lt; c &amp; a &lt; b
127:         * </pre>
128:         * </blockquote>
129:         * Either the text-argument must already be present in the sequence, or some
130:         * initial substring of the text-argument must be present. (e.g. "a &lt; b &amp; ae &lt; 
131:         * e" is valid since "a" is present in the sequence before "ae" is reset). In
132:         * this latter case, "ae" is not entered and treated as a single character;
133:         * instead, "e" is sorted as if it were expanded to two characters: "a"
134:         * followed by an "e". This difference appears in natural languages: in
135:         * traditional Spanish "ch" is treated as though it contracts to a single
136:         * character (expressed as "c &lt; ch &lt; d"), while in traditional German
137:         * a-umlaut is treated as though it expanded to two characters
138:         * (expressed as "a,A &lt; b,B ... &amp;ae;&#92;u00e3&amp;AE;&#92;u00c3").
139:         * [&#92;u00e3 and &#92;u00c3 are, of course, the escape sequences for a-umlaut.]
140:         * <p>
141:         * <strong>Ignorable Characters</strong>
142:         * <p>
143:         * For ignorable characters, the first rule must start with a relation (the
144:         * examples we have used above are really fragments; "a &lt; b" really should be
145:         * "&lt; a &lt; b"). If, however, the first relation is not "&lt;", then all the all
146:         * text-arguments up to the first "&lt;" are ignorable. For example, ", - &lt; a &lt; b"
147:         * makes "-" an ignorable character, as we saw earlier in the word
148:         * "black-birds". In the samples for different languages, you see that most
149:         * accents are ignorable.
150:         *
151:         * <p><strong>Normalization and Accents</strong>
152:         * <p>
153:         * <code>RuleBasedCollator</code> automatically processes its rule table to
154:         * include both pre-composed and combining-character versions of
155:         * accented characters.  Even if the provided rule string contains only
156:         * base characters and separate combining accent characters, the pre-composed
157:         * accented characters matching all canonical combinations of characters from
158:         * the rule string will be entered in the table.
159:         * <p>
160:         * This allows you to use a RuleBasedCollator to compare accented strings
161:         * even when the collator is set to NO_DECOMPOSITION.  There are two caveats,
162:         * however.  First, if the strings to be collated contain combining
163:         * sequences that may not be in canonical order, you should set the collator to
164:         * CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of
165:         * combining sequences.  Second, if the strings contain characters with
166:         * compatibility decompositions (such as full-width and half-width forms),
167:         * you must use FULL_DECOMPOSITION, since the rule tables only include
168:         * canonical mappings.
169:         *
170:         * <p><strong>Errors</strong>
171:         * <p>
172:         * The following are errors:
173:         * <UL Type=disc>
174:         *     <LI>A text-argument contains unquoted punctuation symbols
175:         *        (e.g. "a &lt; b-c &lt; d").
176:         *     <LI>A relation or reset character not followed by a text-argument
177:         *        (e.g. "a &lt; ,b").
178:         *     <LI>A reset where the text-argument (or an initial substring of the
179:         *         text-argument) is not already in the sequence.
180:         *         (e.g. "a &lt; b &amp; e &lt; f")
181:         * </UL>
182:         * If you produce one of these errors, a <code>RuleBasedCollator</code> throws
183:         * a <code>ParseException</code>.
184:         * 
185:         * <p><strong>Examples</strong>
186:         * <p>Simple:     "&lt; a &lt; b &lt; c &lt; d"
187:         * <p>Norwegian:  "&lt; a,A&lt; b,B&lt; c,C&lt; d,D&lt; e,E&lt; f,F&lt; g,G&lt; h,H&lt; i,I&lt; j,J
188:         *                 &lt; k,K&lt; l,L&lt; m,M&lt; n,N&lt; o,O&lt; p,P&lt; q,Q&lt; r,R&lt; s,S&lt; t,T
189:         *                 &lt; u,U&lt; v,V&lt; w,W&lt; x,X&lt; y,Y&lt; z,Z
190:         *                 &lt; &#92;u00E5=a&#92;u030A,&#92;u00C5=A&#92;u030A
191:         *                 ;aa,AA&lt; &#92;u00E6,&#92;u00C6&lt; &#92;u00F8,&#92;u00D8"
192:         *
193:         * <p>
194:         * Normally, to create a rule-based Collator object, you will use
195:         * <code>Collator</code>'s factory method <code>getInstance</code>.
196:         * However, to create a rule-based Collator object with specialized
197:         * rules tailored to your needs, you construct the <code>RuleBasedCollator</code>
198:         * with the rules contained in a <code>String</code> object. For example:
199:         * <blockquote>
200:         * <pre>
201:         * String Simple = "&lt; a&lt; b&lt; c&lt; d";
202:         * RuleBasedCollator mySimple = new RuleBasedCollator(Simple);
203:         * </pre>
204:         * </blockquote>
205:         * Or:
206:         * <blockquote>
207:         * <pre>
208:         * String Norwegian = "&lt; a,A&lt; b,B&lt; c,C&lt; d,D&lt; e,E&lt; f,F&lt; g,G&lt; h,H&lt; i,I&lt; j,J" +
209:         *                 "&lt; k,K&lt; l,L&lt; m,M&lt; n,N&lt; o,O&lt; p,P&lt; q,Q&lt; r,R&lt; s,S&lt; t,T" +
210:         *                 "&lt; u,U&lt; v,V&lt; w,W&lt; x,X&lt; y,Y&lt; z,Z" +
211:         *                 "&lt; &#92;u00E5=a&#92;u030A,&#92;u00C5=A&#92;u030A" +
212:         *                 ";aa,AA&lt; &#92;u00E6,&#92;u00C6&lt; &#92;u00F8,&#92;u00D8";
213:         * RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
214:         * </pre>
215:         * </blockquote>
216:         *
217:         * <p>
218:         * Combining <code>Collator</code>s is as simple as concatenating strings.
219:         * Here's an example that combines two <code>Collator</code>s from two
220:         * different locales:
221:         * <blockquote>
222:         * <pre>
223:         * // Create an en_US Collator object
224:         * RuleBasedCollator en_USCollator = (RuleBasedCollator)
225:         *     Collator.getInstance(new Locale("en", "US", ""));
226:         * // Create a da_DK Collator object
227:         * RuleBasedCollator da_DKCollator = (RuleBasedCollator)
228:         *     Collator.getInstance(new Locale("da", "DK", ""));
229:         * // Combine the two
230:         * // First, get the collation rules from en_USCollator
231:         * String en_USRules = en_USCollator.getRules();
232:         * // Second, get the collation rules from da_DKCollator
233:         * String da_DKRules = da_DKCollator.getRules();
234:         * RuleBasedCollator newCollator =
235:         *     new RuleBasedCollator(en_USRules + da_DKRules);
236:         * // newCollator has the combined rules
237:         * </pre>
238:         * </blockquote>
239:         *
240:         * <p>
241:         * Another more interesting example would be to make changes on an existing
242:         * table to create a new <code>Collator</code> object.  For example, add
243:         * "&amp;C&lt; ch, cH, Ch, CH" to the <code>en_USCollator</code> object to create
244:         * your own:
245:         * <blockquote>
246:         * <pre>
247:         * // Create a new Collator object with additional rules
248:         * String addRules = "&amp;C&lt; ch, cH, Ch, CH";
249:         * RuleBasedCollator myCollator =
250:         *     new RuleBasedCollator(en_USCollator + addRules);
251:         * // myCollator contains the new rules
252:         * </pre>
253:         * </blockquote>
254:         *
255:         * <p>
256:         * The following example demonstrates how to change the order of
257:         * non-spacing accents,
258:         * <blockquote>
259:         * <pre>
260:         * // old rule
261:         * String oldRules = "=&#92;u0301;&#92;u0300;&#92;u0302;&#92;u0308"    // main accents
262:         *                 + ";&#92;u0327;&#92;u0303;&#92;u0304;&#92;u0305"    // main accents
263:         *                 + ";&#92;u0306;&#92;u0307;&#92;u0309;&#92;u030A"    // main accents
264:         *                 + ";&#92;u030B;&#92;u030C;&#92;u030D;&#92;u030E"    // main accents
265:         *                 + ";&#92;u030F;&#92;u0310;&#92;u0311;&#92;u0312"    // main accents
266:         *                 + "&lt; a , A ; ae, AE ; &#92;u00e6 , &#92;u00c6"
267:         *                 + "&lt; b , B &lt; c, C &lt; e, E & C &lt; d, D";
268:         * // change the order of accent characters
269:         * String addOn = "& &#92;u0300 ; &#92;u0308 ; &#92;u0302";
270:         * RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
271:         * </pre>
272:         * </blockquote>
273:         *
274:         * <p>
275:         * The last example shows how to put new primary ordering in before the
276:         * default setting. For example, in Japanese <code>Collator</code>, you
277:         * can either sort English characters before or after Japanese characters,
278:         * <blockquote>
279:         * <pre>
280:         * // get en_US Collator rules
281:         * RuleBasedCollator en_USCollator = (RuleBasedCollator)Collator.getInstance(Locale.US);
282:         * // add a few Japanese character to sort before English characters
283:         * // suppose the last character before the first base letter 'a' in
284:         * // the English collation rule is &#92;u2212
285:         * String jaString = "& &#92;u2212 &lt; &#92;u3041, &#92;u3042 &lt; &#92;u3043, &#92;u3044";
286:         * RuleBasedCollator myJapaneseCollator = new
287:         *     RuleBasedCollator(en_USCollator.getRules() + jaString);
288:         * </pre>
289:         * </blockquote>
290:         *
291:         * @see        Collator
292:         * @see        CollationElementIterator
293:         * @version    1.25 07/24/98
294:         * @author     Helena Shih, Laura Werner, Richard Gillam
295:         */
296:        public class RuleBasedCollator extends Collator {
297:            // IMPLEMENTATION NOTES:  The implementation of the collation algorithm is
298:            // divided across three classes: RuleBasedCollator, RBCollationTables, and
299:            // CollationElementIterator.  RuleBasedCollator contains the collator's
300:            // transient state and includes the code that uses the other classes to
301:            // implement comparison and sort-key building.  RuleBasedCollator also
302:            // contains the logic to handle French secondary accent sorting.
303:            // A RuleBasedCollator has two CollationElementIterators.  State doesn't
304:            // need to be preserved in these objects between calls to compare() or
305:            // getCollationKey(), but the objects persist anyway to avoid wasting extra
306:            // creation time.  compare() and getCollationKey() are synchronized to ensure
307:            // thread safety with this scheme.  The CollationElementIterator is responsible
308:            // for generating collation elements from strings and returning one element at
309:            // a time (sometimes there's a one-to-many or many-to-one mapping between
310:            // characters and collation elements-- this class handles that).
311:            // CollationElementIterator depends on RBCollationTables, which contains the
312:            // collator's static state.  RBCollationTables contains the actual data
313:            // tables specifying the collation order of characters for a particular locale
314:            // or use.  It also contains the base logic that CollationElementIterator
315:            // uses to map from characters to collation elements.  A single RBCollationTables
316:            // object is shared among all RuleBasedCollators for the same locale, and
317:            // thus by all the CollationElementIterators they create.
318:
319:            /**
320:             * RuleBasedCollator constructor.  This takes the table rules and builds
321:             * a collation table out of them.  Please see RuleBasedCollator class
322:             * description for more details on the collation rule syntax.
323:             * @see java.util.Locale
324:             * @param rules the collation rules to build the collation table from.
325:             * @exception ParseException A format exception
326:             * will be thrown if the build process of the rules fails. For
327:             * example, build rule "a < ? < d" will cause the constructor to
328:             * throw the ParseException because the '?' is not quoted.
329:             */
330:            public RuleBasedCollator(String rules) throws ParseException {
331:                this (rules, Collator.CANONICAL_DECOMPOSITION);
332:            }
333:
334:            /**
335:             * RuleBasedCollator constructor.  This takes the table rules and builds
336:             * a collation table out of them.  Please see RuleBasedCollator class
337:             * description for more details on the collation rule syntax.
338:             * @see java.util.Locale
339:             * @param rules the collation rules to build the collation table from.
340:             * @param decomp the decomposition strength used to build the
341:             * collation table and to perform comparisons.
342:             * @exception ParseException A format exception
343:             * will be thrown if the build process of the rules fails. For
344:             * example, build rule "a < ? < d" will cause the constructor to
345:             * throw the ParseException because the '?' is not quoted.
346:             */
347:            RuleBasedCollator(String rules, int decomp) throws ParseException {
348:                setStrength(Collator.TERTIARY);
349:                setDecomposition(decomp);
350:                tables = new RBCollationTables(rules, decomp);
351:            }
352:
353:            /**
354:             * "Copy constructor."  Used in clone() for performance.
355:             */
356:            private RuleBasedCollator(RuleBasedCollator that) {
357:                setStrength(that.getStrength());
358:                setDecomposition(that.getDecomposition());
359:                tables = that.tables;
360:            }
361:
362:            /**
363:             * Gets the table-based rules for the collation object.
364:             * @return returns the collation rules that the table collation object
365:             * was created from.
366:             */
367:            public String getRules() {
368:                return tables.getRules();
369:            }
370:
371:            /**
372:             * Return a CollationElementIterator for the given String.
373:             * @see java.text.CollationElementIterator
374:             */
375:            public CollationElementIterator getCollationElementIterator(
376:                    String source) {
377:                return new CollationElementIterator(source, this );
378:            }
379:
380:            /**
381:             * Return a CollationElementIterator for the given String.
382:             * @see java.text.CollationElementIterator
383:             * @since 1.2
384:             */
385:            public CollationElementIterator getCollationElementIterator(
386:                    CharacterIterator source) {
387:                return new CollationElementIterator(source, this );
388:            }
389:
390:            /**
391:             * Compares the character data stored in two different strings based on the
392:             * collation rules.  Returns information about whether a string is less
393:             * than, greater than or equal to another string in a language.
394:             * This can be overriden in a subclass.
395:             */
396:            public synchronized int compare(String source, String target) {
397:                // The basic algorithm here is that we use CollationElementIterators
398:                // to step through both the source and target strings.  We compare each
399:                // collation element in the source string against the corresponding one
400:                // in the target, checking for differences.
401:                //
402:                // If a difference is found, we set <result> to LESS or GREATER to
403:                // indicate whether the source string is less or greater than the target.
404:                //
405:                // However, it's not that simple.  If we find a tertiary difference
406:                // (e.g. 'A' vs. 'a') near the beginning of a string, it can be
407:                // overridden by a primary difference (e.g. "A" vs. "B") later in
408:                // the string.  For example, "AA" < "aB", even though 'A' > 'a'.
409:                //
410:                // To keep track of this, we use strengthResult to keep track of the
411:                // strength of the most significant difference that has been found
412:                // so far.  When we find a difference whose strength is greater than
413:                // strengthResult, it overrides the last difference (if any) that
414:                // was found.
415:
416:                int result = Collator.EQUAL;
417:
418:                if (sourceCursor == null) {
419:                    sourceCursor = getCollationElementIterator(source);
420:                } else {
421:                    sourceCursor.setText(source);
422:                }
423:                if (targetCursor == null) {
424:                    targetCursor = getCollationElementIterator(target);
425:                } else {
426:                    targetCursor.setText(target);
427:                }
428:
429:                int sOrder = 0, tOrder = 0;
430:
431:                boolean initialCheckSecTer = getStrength() >= Collator.SECONDARY;
432:                boolean checkSecTer = initialCheckSecTer;
433:                boolean checkTertiary = getStrength() >= Collator.TERTIARY;
434:
435:                boolean gets = true, gett = true;
436:
437:                while (true) {
438:                    // Get the next collation element in each of the strings, unless
439:                    // we've been requested to skip it.
440:                    if (gets)
441:                        sOrder = sourceCursor.next();
442:                    else
443:                        gets = true;
444:                    if (gett)
445:                        tOrder = targetCursor.next();
446:                    else
447:                        gett = true;
448:
449:                    // If we've hit the end of one of the strings, jump out of the loop
450:                    if ((sOrder == CollationElementIterator.NULLORDER)
451:                            || (tOrder == CollationElementIterator.NULLORDER))
452:                        break;
453:
454:                    int pSOrder = CollationElementIterator.primaryOrder(sOrder);
455:                    int pTOrder = CollationElementIterator.primaryOrder(tOrder);
456:
457:                    // If there's no difference at this position, we can skip it
458:                    if (sOrder == tOrder) {
459:                        if (tables.isFrenchSec() && pSOrder != 0) {
460:                            if (!checkSecTer) {
461:                                // in french, a secondary difference more to the right is stronger,
462:                                // so accents have to be checked with each base element
463:                                checkSecTer = initialCheckSecTer;
464:                                // but tertiary differences are less important than the first
465:                                // secondary difference, so checking tertiary remains disabled
466:                                checkTertiary = false;
467:                            }
468:                        }
469:                        continue;
470:                    }
471:
472:                    // Compare primary differences first.
473:                    if (pSOrder != pTOrder) {
474:                        if (sOrder == 0) {
475:                            // The entire source element is ignorable.
476:                            // Skip to the next source element, but don't fetch another target element.
477:                            gett = false;
478:                            continue;
479:                        }
480:                        if (tOrder == 0) {
481:                            gets = false;
482:                            continue;
483:                        }
484:
485:                        // The source and target elements aren't ignorable, but it's still possible
486:                        // for the primary component of one of the elements to be ignorable....
487:
488:                        if (pSOrder == 0) // primary order in source is ignorable
489:                        {
490:                            // The source's primary is ignorable, but the target's isn't.  We treat ignorables
491:                            // as a secondary difference, so remember that we found one.
492:                            if (checkSecTer) {
493:                                result = Collator.GREATER; // (strength is SECONDARY)
494:                                checkSecTer = false;
495:                            }
496:                            // Skip to the next source element, but don't fetch another target element.
497:                            gett = false;
498:                        } else if (pTOrder == 0) {
499:                            // record differences - see the comment above.
500:                            if (checkSecTer) {
501:                                result = Collator.LESS; // (strength is SECONDARY)
502:                                checkSecTer = false;
503:                            }
504:                            // Skip to the next source element, but don't fetch another target element.
505:                            gets = false;
506:                        } else {
507:                            // Neither of the orders is ignorable, and we already know that the primary
508:                            // orders are different because of the (pSOrder != pTOrder) test above.
509:                            // Record the difference and stop the comparison.
510:                            if (pSOrder < pTOrder) {
511:                                return Collator.LESS; // (strength is PRIMARY)
512:                            } else {
513:                                return Collator.GREATER; // (strength is PRIMARY)
514:                            }
515:                        }
516:                    } else { // else of if ( pSOrder != pTOrder )
517:                        // primary order is the same, but complete order is different. So there
518:                        // are no base elements at this point, only ignorables (Since the strings are
519:                        // normalized)
520:
521:                        if (checkSecTer) {
522:                            // a secondary or tertiary difference may still matter
523:                            short secSOrder = CollationElementIterator
524:                                    .secondaryOrder(sOrder);
525:                            short secTOrder = CollationElementIterator
526:                                    .secondaryOrder(tOrder);
527:                            if (secSOrder != secTOrder) {
528:                                // there is a secondary difference
529:                                result = (secSOrder < secTOrder) ? Collator.LESS
530:                                        : Collator.GREATER;
531:                                // (strength is SECONDARY)
532:                                checkSecTer = false;
533:                                // (even in french, only the first secondary difference within
534:                                //  a base character matters)
535:                            } else {
536:                                if (checkTertiary) {
537:                                    // a tertiary difference may still matter
538:                                    short terSOrder = CollationElementIterator
539:                                            .tertiaryOrder(sOrder);
540:                                    short terTOrder = CollationElementIterator
541:                                            .tertiaryOrder(tOrder);
542:                                    if (terSOrder != terTOrder) {
543:                                        // there is a tertiary difference
544:                                        result = (terSOrder < terTOrder) ? Collator.LESS
545:                                                : Collator.GREATER;
546:                                        // (strength is TERTIARY)
547:                                        checkTertiary = false;
548:                                    }
549:                                }
550:                            }
551:                        } // if (checkSecTer)
552:
553:                    } // if ( pSOrder != pTOrder )
554:                } // while()
555:
556:                if (sOrder != CollationElementIterator.NULLORDER) {
557:                    // (tOrder must be CollationElementIterator::NULLORDER,
558:                    //  since this point is only reached when sOrder or tOrder is NULLORDER.)
559:                    // The source string has more elements, but the target string hasn't.
560:                    do {
561:                        if (CollationElementIterator.primaryOrder(sOrder) != 0) {
562:                            // We found an additional non-ignorable base character in the source string.
563:                            // This is a primary difference, so the source is greater
564:                            return Collator.GREATER; // (strength is PRIMARY)
565:                        } else if (CollationElementIterator
566:                                .secondaryOrder(sOrder) != 0) {
567:                            // Additional secondary elements mean the source string is greater
568:                            if (checkSecTer) {
569:                                result = Collator.GREATER; // (strength is SECONDARY)
570:                                checkSecTer = false;
571:                            }
572:                        }
573:                    } while ((sOrder = sourceCursor.next()) != CollationElementIterator.NULLORDER);
574:                } else if (tOrder != CollationElementIterator.NULLORDER) {
575:                    // The target string has more elements, but the source string hasn't.
576:                    do {
577:                        if (CollationElementIterator.primaryOrder(tOrder) != 0)
578:                            // We found an additional non-ignorable base character in the target string.
579:                            // This is a primary difference, so the source is less
580:                            return Collator.LESS; // (strength is PRIMARY)
581:                        else if (CollationElementIterator
582:                                .secondaryOrder(tOrder) != 0) {
583:                            // Additional secondary elements in the target mean the source string is less
584:                            if (checkSecTer) {
585:                                result = Collator.LESS; // (strength is SECONDARY)
586:                                checkSecTer = false;
587:                            }
588:                        }
589:                    } while ((tOrder = targetCursor.next()) != CollationElementIterator.NULLORDER);
590:                }
591:
592:                // For IDENTICAL comparisons, we use a bitwise character comparison
593:                // as a tiebreaker if all else is equal
594:                if (result == 0 && getStrength() == IDENTICAL) {
595:                    Normalizer.Mode mode = NormalizerUtilities
596:                            .toNormalizerMode(getDecomposition());
597:                    String sourceDecomposition = Normalizer.normalize(source,
598:                            mode, 0);
599:                    String targetDecomposition = Normalizer.normalize(target,
600:                            mode, 0);
601:                    result = sourceDecomposition.compareTo(targetDecomposition);
602:                }
603:                return result;
604:            }
605:
606:            /**
607:             * Transforms the string into a series of characters that can be compared
608:             * with CollationKey.compareTo. This overrides java.text.Collator.getCollationKey.
609:             * It can be overriden in a subclass.
610:             */
611:            public synchronized CollationKey getCollationKey(String source) {
612:                //
613:                // The basic algorithm here is to find all of the collation elements for each
614:                // character in the source string, convert them to a char representation,
615:                // and put them into the collation key.  
616:                // Each collation element in a string has three components: primary (A vs B),
617:                // secondary (A vs A-acute), and tertiary (A' vs a); and a primary difference
618:                // at the end of a string takes precedence over a secondary or tertiary
619:                // difference earlier in the string.
620:                //
621:                // To account for this, we put all of the primary orders at the beginning of the
622:                // string, followed by the secondary and tertiary orders, separated by nulls.
623:                //
624:                // Here's a hypothetical example, with the collation element represented as
625:                // a three-digit number, one digit for primary, one for secondary, etc.
626:                //
627:                // String:              A     a     B   \u00e9 <--(e-acute)
628:                // Collation Elements: 101   100   201  510
629:                //
630:                // Collation Key:      1125<null>0001<null>1010
631:                //
632:                // Secondary differences (accent marks) are compared
633:                // starting at the *end* of the string in languages with French secondary ordering.
634:                // But when comparing the accent marks on a single base character, they are compared
635:                // from the beginning.  To handle this, we reverse all of the accents that belong
636:                // to each base character, then we reverse the entire string of secondary orderings
637:                // at the end.  Taking the same example above, a French collator might return
638:                // this instead:
639:                //
640:                // Collation Key:      1125<null>1000<null>1010
641:                //
642:                if (source == null)
643:                    return null;
644:
645:                if (primResult == null) {
646:                    primResult = new StringBuffer();
647:                    secResult = new StringBuffer();
648:                    terResult = new StringBuffer();
649:                } else {
650:                    primResult.setLength(0);
651:                    secResult.setLength(0);
652:                    terResult.setLength(0);
653:                }
654:                int order = 0;
655:                boolean compareSec = (getStrength() >= Collator.SECONDARY);
656:                boolean compareTer = (getStrength() >= Collator.TERTIARY);
657:                int secOrder = CollationElementIterator.NULLORDER;
658:                int terOrder = CollationElementIterator.NULLORDER;
659:                int preSecIgnore = 0;
660:
661:                if (sourceCursor == null) {
662:                    sourceCursor = getCollationElementIterator(source);
663:                } else {
664:                    sourceCursor.setText(source);
665:                }
666:
667:                // walk through each character
668:                while ((order = sourceCursor.next()) != CollationElementIterator.NULLORDER) {
669:                    secOrder = CollationElementIterator.secondaryOrder(order);
670:                    terOrder = CollationElementIterator.tertiaryOrder(order);
671:                    if (!CollationElementIterator.isIgnorable(order)) {
672:                        primResult.append((char) (CollationElementIterator
673:                                .primaryOrder(order) + COLLATIONKEYOFFSET));
674:
675:                        if (compareSec) {
676:                            //
677:                            // accumulate all of the ignorable/secondary characters attached
678:                            // to a given base character
679:                            //
680:                            if (tables.isFrenchSec()
681:                                    && preSecIgnore < secResult.length()) {
682:                                //
683:                                // We're doing reversed secondary ordering and we've hit a base
684:                                // (non-ignorable) character.  Reverse any secondary orderings
685:                                // that applied to the last base character.  (see block comment above.)
686:                                //
687:                                RBCollationTables.reverse(secResult,
688:                                        preSecIgnore, secResult.length());
689:                            }
690:                            // Remember where we are in the secondary orderings - this is how far
691:                            // back to go if we need to reverse them later.
692:                            secResult
693:                                    .append((char) (secOrder + COLLATIONKEYOFFSET));
694:                            preSecIgnore = secResult.length();
695:                        }
696:                        if (compareTer) {
697:                            terResult
698:                                    .append((char) (terOrder + COLLATIONKEYOFFSET));
699:                        }
700:                    } else {
701:                        if (compareSec && secOrder != 0)
702:                            secResult
703:                                    .append((char) (secOrder
704:                                            + tables.getMaxSecOrder() + COLLATIONKEYOFFSET));
705:                        if (compareTer && terOrder != 0)
706:                            terResult
707:                                    .append((char) (terOrder
708:                                            + tables.getMaxTerOrder() + COLLATIONKEYOFFSET));
709:                    }
710:                }
711:                if (tables.isFrenchSec()) {
712:                    if (preSecIgnore < secResult.length()) {
713:                        // If we've accumlated any secondary characters after the last base character,
714:                        // reverse them.
715:                        RBCollationTables.reverse(secResult, preSecIgnore,
716:                                secResult.length());
717:                    }
718:                    // And now reverse the entire secResult to get French secondary ordering.
719:                    RBCollationTables.reverse(secResult, 0, secResult.length());
720:                }
721:                primResult.append((char) 0);
722:                secResult.append((char) 0);
723:                secResult.append(terResult.toString());
724:                primResult.append(secResult.toString());
725:
726:                if (getStrength() == IDENTICAL) {
727:                    primResult.append((char) 0);
728:                    Normalizer.Mode mode = NormalizerUtilities
729:                            .toNormalizerMode(getDecomposition());
730:                    primResult.append(Normalizer.normalize(source, mode, 0));
731:                }
732:                return new CollationKey(source, primResult.toString());
733:            }
734:
735:            /**
736:             * Standard override; no change in semantics.
737:             */
738:            public Object clone() {
739:                // if we know we're not actually a subclass of RuleBasedCollator
740:                // (this class really should have been made final), bypass
741:                // Object.clone() and use our "copy constructor".  This is faster.
742:                if (getClass() == RuleBasedCollator.class) {
743:                    return new RuleBasedCollator(this );
744:                } else {
745:                    RuleBasedCollator result = (RuleBasedCollator) super 
746:                            .clone();
747:                    result.primResult = null;
748:                    result.secResult = null;
749:                    result.terResult = null;
750:                    result.sourceCursor = null;
751:                    result.targetCursor = null;
752:                    return result;
753:                }
754:            }
755:
756:            /**
757:             * Compares the equality of two collation objects.
758:             * @param obj the table-based collation object to be compared with this.
759:             * @return true if the current table-based collation object is the same
760:             * as the table-based collation object obj; false otherwise.
761:             */
762:            public boolean equals(Object obj) {
763:                if (obj == null)
764:                    return false;
765:                if (!super .equals(obj))
766:                    return false; // super does class check
767:                RuleBasedCollator other = (RuleBasedCollator) obj;
768:                // all other non-transient information is also contained in rules.
769:                return (getRules().equals(other.getRules()));
770:            }
771:
772:            /**
773:             * Generates the hash code for the table-based collation object
774:             */
775:            public int hashCode() {
776:                return getRules().hashCode();
777:            }
778:
779:            /**
780:             * Allows CollationElementIterator access to the tables object
781:             */
782:            RBCollationTables getTables() {
783:                return tables;
784:            }
785:
786:            // ==============================================================
787:            // private
788:            // ==============================================================
789:
790:            final static int CHARINDEX = 0x70000000; // need look up in .commit()
791:            final static int EXPANDCHARINDEX = 0x7E000000; // Expand index follows
792:            final static int CONTRACTCHARINDEX = 0x7F000000; // contract indexes follow
793:            final static int UNMAPPED = 0xFFFFFFFF;
794:
795:            private final static int COLLATIONKEYOFFSET = 1;
796:
797:            private RBCollationTables tables = null;
798:
799:            // Internal objects that are cached across calls so that they don't have to
800:            // be created/destroyed on every call to compare() and getCollationKey()
801:            private StringBuffer primResult = null;
802:            private StringBuffer secResult = null;
803:            private StringBuffer terResult = null;
804:            private CollationElementIterator sourceCursor = null;
805:            private CollationElementIterator targetCursor = null;
806:        }
www.java2java.com | Contact Us
All other trademarks are property of their respective owners.