Java Doc for RuleBasedCollator.java in icu4j, package com.ibm.icu.text



java.lang.Object
   com.ibm.icu.text.Collator
      com.ibm.icu.text.RuleBasedCollator

RuleBasedCollator
final public class RuleBasedCollator extends Collator

RuleBasedCollator is a concrete subclass of Collator. It allows customization of the Collator via user-specified rule sets. RuleBasedCollator is designed to be fully compliant with the Unicode Collation Algorithm (UCA) and conforms to ISO 14651.

Users are strongly encouraged to read the users guide for more information about the collation service before using this class.

Create a RuleBasedCollator from a locale by calling the getInstance(Locale) factory method in the base class Collator. Collator.getInstance(Locale) creates a RuleBasedCollator object based on the collation rules defined by the argument locale. If a customized collation ordering or set of attributes is required, use the RuleBasedCollator(String) constructor with the appropriate rules. The customized RuleBasedCollator will base its ordering on UCA, while re-adjusting the attributes and orders of the characters in the specified rule accordingly.

RuleBasedCollator provides correct collation orders for most locales supported in ICU. If specific data for a locale is not available, the order eventually falls back to the UCA collation order.

For information about the collation rule syntax and details about customization, please refer to the Collation customization section of the user's guide.

Note that there are some differences between the Collation rule syntax used in Java and ICU4J:

  • According to the JDK documentation:

    Modifier '!' : Turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel of the range \U0E40-\U0E44 precedes a Thai consonant of the range \U0E01-\U0E2E OR a Lao vowel of the range \U0EC0-\U0EC4 precedes a Lao consonant of the range \U0E81-\U0EAE then the vowel is placed after the consonant for collation purposes.

    If a rule is without the modifier '!', the Thai/Lao vowel-consonant swapping is not turned on.

    ICU4J's RuleBasedCollator does not support turning off the Thai/Lao vowel-consonant swapping, since the UCA clearly states that it has to be supported to ensure a correct sorting order. If a '!' is encountered, it is ignored.

  • As mentioned in the documentation of the base class Collator, compatibility decomposition mode is not supported.
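The vowel-consonant swapping rule quoted above from the JDK documentation can be sketched in plain Java. This is only an illustration of the rule as described, using the code point ranges given in the text; it is not the JDK's or ICU4J's actual implementation, and the class and method names (ThaiLaoSwapSketch, isSwappablePair, swapForCollation) are hypothetical.

```java
// Illustration only: a hand-written version of the swapping rule described above.
final class ThaiLaoSwapSketch {
    private static boolean inRange(char c, int low, int high) {
        return c >= low && c <= high;
    }

    /** True if a Thai vowel (U+0E40..U+0E44) precedes a Thai consonant (U+0E01..U+0E2E),
     *  or a Lao vowel (U+0EC0..U+0EC4) precedes a Lao consonant (U+0E81..U+0EAE). */
    static boolean isSwappablePair(char first, char second) {
        boolean thai = inRange(first, 0x0E40, 0x0E44) && inRange(second, 0x0E01, 0x0E2E);
        boolean lao = inRange(first, 0x0EC0, 0x0EC4) && inRange(second, 0x0E81, 0x0EAE);
        return thai || lao;
    }

    /** Returns the text with each qualifying vowel placed after its consonant. */
    static String swapForCollation(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i + 1 < chars.length; i++) {
            if (isSwappablePair(chars[i], chars[i + 1])) {
                char vowel = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = vowel;
                i++; // skip the vowel that was just moved
            }
        }
        return new String(chars);
    }
}
```

The sketch only reorders characters; a real collator applies the equivalent reordering while producing collation elements.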

Examples

Creating Customized RuleBasedCollators:

 String simple = "& a < b < c < d";
 RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);
 String norwegian = "& a , A < b , B < c , C < d , D < e , E "
 + "< f , F < g , G < h , H < i , I < j , "
 + "J < k , K < l , L < m , M < n , N < "
 + "o , O < p , P < q , Q < r , R < s , S < "
 + "t , T < u , U < v , V < w , W < x , X "
 + "< y , Y < z , Z < \u00E5 = a\u030A "
 + ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
 + ", \u00C6 < \u00F8 , \u00D8";
 RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);
 
Concatenating rules to combine Collators:
 // Create an en_US Collator object
 RuleBasedCollator en_USCollator = (RuleBasedCollator)
 Collator.getInstance(new Locale("en", "US", ""));
 // Create a da_DK Collator object
 RuleBasedCollator da_DKCollator = (RuleBasedCollator)
 Collator.getInstance(new Locale("da", "DK", ""));
 // Combine the two
 // First, get the collation rules from en_USCollator
 String en_USRules = en_USCollator.getRules();
 // Second, get the collation rules from da_DKCollator
 String da_DKRules = da_DKCollator.getRules();
 RuleBasedCollator newCollator =
 new RuleBasedCollator(en_USRules + da_DKRules);
 // newCollator has the combined rules
 
Making changes to an existing RuleBasedCollator to create a new Collator object, by appending changes to the existing rule:
 // Create a new Collator object with additional rules
 String addRules = "& C < ch, cH, Ch, CH";
 RuleBasedCollator myCollator =
 new RuleBasedCollator(en_USCollator.getRules() + addRules);
 // myCollator contains the new rules
 
How to change the order of non-spacing accents:
 // old rule with main accents
 String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
 + "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
 + "; \u0306 ; \u0307 ; \u0309 ; \u030A "
 + "; \u030B ; \u030C ; \u030D ; \u030E "
 + "; \u030F ; \u0310 ; \u0311 ; \u0312 "
 + "< a , A ; ae, AE ; \u00e6 , \u00c6 "
 + "< b , B < c, C < e, E & C < d , D";
 // change the order of accent characters
 String addOn = "& \u0300 ; \u0308 ; \u0302";
 RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
 
Putting in a new primary ordering before the default setting, e.g. sort English characters before or after Japanese characters in the Japanese Collator:
 // get en_US Collator rules
 RuleBasedCollator en_USCollator
 = (RuleBasedCollator)Collator.getInstance(Locale.US);
 // add a few Japanese characters to sort before English characters
 // suppose the last character before the first base letter 'a' in
 // the English collation rule is \u2212
 String jaString = "& \u2212 < \u3041, \u3042 < \u3043, "
 + "\u3044";
 RuleBasedCollator myJapaneseCollator
 = new RuleBasedCollator(en_USCollator.getRules() + jaString);
 

This class is not subclassable


author:
   Syn Wee Quek

Inner Class: static interface AttributeValue
Inner Class: static interface Attribute
Inner Class: static class DataManipulate implements Trie.DataManipulate
Inner Class: final static class UCAConstants

Field Summary
final static byte BYTE_COMMON_
final static byte BYTE_FIRST_TAILORED_
final static byte BYTE_UNSHIFTED_MIN_
final static int CE_CASE_BIT_MASK_
final static int CE_CONTINUATION_MARKER_
final static int CE_PRIMARY_MASK_
final static int CE_PRIMARY_SHIFT_
final static int CE_SECONDARY_MASK_
final static int CE_SECONDARY_SHIFT_
final static int CE_SPECIAL_FLAG_
final static int CE_SURROGATE_TAG_
final static int CE_TAG_MASK_
final static int CE_TAG_SHIFT_
final static int CE_TERTIARY_MASK_
final static byte CODAN_PLACEHOLDER
final static int COMMON_BOTTOM_2_
final static int COMMON_TOP_2_
final static byte SORT_LEVEL_TERMINATOR_
final static RuleBasedCollator UCA_
final static UCAConstants UCA_CONSTANTS_
final static char UCA_CONTRACTIONS_
final static ImplicitCEGenerator impCEGen_
int latinOneCEs_
boolean latinOneFailed_
boolean latinOneRegenTable_
int latinOneTableLen_
boolean latinOneUse_
ContractionInfo m_ContInfo_
VersionInfo m_UCA_version_
VersionInfo m_UCD_version_
int m_caseFirst_
int m_contractionCE_
byte m_contractionEnd_
char m_contractionIndex_
int m_contractionOffset_
     Size of collator raw data headers, options and expansions before contraction data.
int m_defaultCaseFirst_
int m_defaultDecomposition_
boolean m_defaultIsAlternateHandlingShifted_
boolean m_defaultIsCaseLevel_
boolean m_defaultIsFrenchCollation_
boolean m_defaultIsHiragana4_
boolean m_defaultIsNumericCollation_
int m_defaultStrength_
int m_defaultVariableTopValue_
byte m_expansionEndCEMaxSize_
     Table to store the maximum size of any expansions that end with the corresponding collation element in m_expansionEndCE_.
int m_expansionEndCE_
     Table to store all collation elements that are the last element of an expansion.
int m_expansionOffset_
     Size of collator raw data headers and options before the expansion data.
int m_expansion_
boolean m_isHiragana4_
boolean m_isJamoSpecial_
boolean m_isNumericCollation_
char m_minContractionEnd_
char m_minUnsafe_
String m_rules_
IntTrie m_trie_
byte m_unsafe_
     Heuristic table to store information on whether a char character is considered "unsafe".
int m_variableTopValue_
VersionInfo m_version_
final static int maxImplicitPrimary
final static int maxRegularPrimary
final static int minImplicitPrimary

Constructor Summary
public RuleBasedCollator(String rules)
     Constructor that takes the argument rules for customization.
RuleBasedCollator()
     Private constructor for use by subclasses.
RuleBasedCollator(ULocale locale)
     Constructs a RuleBasedCollator from the argument locale.

Method Summary
public Object clone()
public int compare(String source, String target)
     Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String.
public boolean equals(Object obj)
     Compares the equality of two RuleBasedCollator objects. RuleBasedCollator objects are equal if they have the same collation rules and the same attributes.
Parameters:
  obj - the RuleBasedCollator to be compared to.
public CollationElementIterator getCollationElementIterator(String source)
     Return a CollationElementIterator for the given String.
public CollationElementIterator getCollationElementIterator(CharacterIterator source)
     Return a CollationElementIterator for the given CharacterIterator.
public CollationElementIterator getCollationElementIterator(UCharacterIterator source)
     Return a CollationElementIterator for the given UCharacterIterator.
public CollationKey getCollationKey(String source)

Get a Collation key for the argument String source from this RuleBasedCollator.

General recommendation:
If comparisons are to be done on the same String multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If each String is compared only once, using the method RuleBasedCollator.compare(String, String) will perform better.

See the class documentation for an explanation about CollationKeys.

Parameters:
  source - the text String to be transformed into a collation key.
public void getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes)
public boolean getNumericCollation()
     Method to retrieve the numeric collation value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits.
public RawCollationKey getRawCollationKey(String source, RawCollationKey key)
     Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key.
public String getRules()
     Gets the collation rules for this RuleBasedCollator.
public String getRules(boolean fullrules)
     Returns current rules.
static int getTag(int ce)
public UnicodeSet getTailoredSet()
     Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
Returns:
  a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.
exception:
  ParseException - thrown when argument rules have an invalid syntax.
public VersionInfo getUCAVersion()
     Get the UCA version of this collator object.
public int getVariableTop()
     Gets the variable top value of a Collator.
public VersionInfo getVersion()
     Get the version of this collator object.
public int hashCode()
     Generates a unique hash code for this RuleBasedCollator.
public boolean isAlternateHandlingShifted()
     Checks if the alternate handling behaviour is the UCA defined SHIFTED or NON_IGNORABLE. If return value is true, then the alternate handling attribute for the Collator is SHIFTED.
public boolean isCaseLevel()
     Checks if case level is set to true.
final static boolean isContinuation(int ce)
final boolean isContractionEnd(char ch)
     Approximate determination if a char character is at a contraction end.
public boolean isFrenchCollation()
     Checks if French Collation is set to true.
public boolean isHiraganaQuaternary()
     Checks if the Hiragana Quaternary mode is set on.
public boolean isLowerCaseFirst()
     Return true if a lowercase character is sorted before the corresponding uppercase character.
static boolean isSpecial(int ce)
final boolean isUnsafe(char ch)
     Test whether a char character is potentially "unsafe" for use as a collation starting point.
public boolean isUpperCaseFirst()
     Return true if an uppercase character is sorted before the corresponding lowercase character.
public void setAlternateHandlingDefault()
     Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator.
public void setAlternateHandlingShifted(boolean shifted)
     Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. See the UCA definition on Alternate Weighting. This attribute will only be effective when QUATERNARY strength is set. The default value for this mode is false, corresponding to the NON_IGNORABLE mode in UCA.
final public void setCaseFirstDefault()
     Sets the case first mode to the initial mode set during construction of the RuleBasedCollator.
public void setCaseLevel(boolean flag)

When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known as the case level. The case level is used to distinguish large and small Japanese Kana characters.

public void setCaseLevelDefault()
     Sets the case level mode to the initial mode set during construction of the RuleBasedCollator.
public void setDecompositionDefault()
     Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator.
public void setFrenchCollation(boolean flag)
     Sets the mode for the direction of SECONDARY weights to be used in French collation.
public void setFrenchCollationDefault()
     Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator.
public void setHiraganaQuaternary(boolean flag)
     Sets the Hiragana Quaternary mode to be on or off. When the Hiragana Quaternary mode is turned on, the collator positions Hiragana characters before all non-ignorable characters in QUATERNARY strength.
public void setHiraganaQuaternaryDefault()
     Sets the Hiragana Quaternary mode to the initial mode set during construction of the RuleBasedCollator.
public void setLowerCaseFirst(boolean lowerfirst)
     Sets lowercase characters to sort before uppercase characters, at TERTIARY strength.
public void setNumericCollation(boolean flag)
     When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits.
public void setNumericCollationDefault()
     Method to set numeric collation to its default value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits.
public void setStrength(int newStrength)

Sets this Collator's strength property.

public void setStrengthDefault()
     Sets the collation strength to the initial mode set during the construction of the RuleBasedCollator.
public void setUpperCaseFirst(boolean upperfirst)
     Sets whether uppercase characters sort before lowercase characters or vice versa, at TERTIARY strength.
public int setVariableTop(String varTop)

Variable top is a two byte primary value which causes all the codepoints with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to a collation element value of a string supplied.

Parameters:
  varTop - one or more (if contraction) characters to which the variable top should be set
Returns:
  an int value containing the value of the variable top in the upper 16 bits.
public void setVariableTop(int varTop)
     Sets the variable top to a collation element value supplied. Variable top is set to the upper 16 bits.
final void setWithUCAData()
     Sets this collator to use all the options and tables in UCA.
final void setWithUCATables()
     Sets this collator to use the tables in UCA.

Field Detail
BYTE_COMMON_
final static byte BYTE_COMMON_

BYTE_FIRST_TAILORED_
final static byte BYTE_FIRST_TAILORED_

BYTE_UNSHIFTED_MIN_
final static byte BYTE_UNSHIFTED_MIN_

CE_CASE_BIT_MASK_
final static int CE_CASE_BIT_MASK_
Case strength mask

CE_CONTINUATION_MARKER_
final static int CE_CONTINUATION_MARKER_
Continuation marker

CE_PRIMARY_MASK_
final static int CE_PRIMARY_MASK_
Mask to get the primary strength of the collation element

CE_PRIMARY_SHIFT_
final static int CE_PRIMARY_SHIFT_
Primary strength shift

CE_SECONDARY_MASK_
final static int CE_SECONDARY_MASK_
Mask to get the secondary strength of the collation element

CE_SECONDARY_SHIFT_
final static int CE_SECONDARY_SHIFT_
Secondary strength shift

CE_SPECIAL_FLAG_
final static int CE_SPECIAL_FLAG_

CE_SURROGATE_TAG_
final static int CE_SURROGATE_TAG_
Lead surrogate that is tailored and doesn't start a contraction

CE_TAG_MASK_
final static int CE_TAG_MASK_

CE_TAG_SHIFT_
final static int CE_TAG_SHIFT_

CE_TERTIARY_MASK_
final static int CE_TERTIARY_MASK_
Mask to get the tertiary strength of the collation element

CODAN_PLACEHOLDER
final static byte CODAN_PLACEHOLDER

COMMON_BOTTOM_2_
final static int COMMON_BOTTOM_2_

COMMON_TOP_2_
final static int COMMON_TOP_2_

SORT_LEVEL_TERMINATOR_
final static byte SORT_LEVEL_TERMINATOR_

UCA_
final static RuleBasedCollator UCA_
UnicodeData.txt property object

UCA_CONSTANTS_
final static UCAConstants UCA_CONSTANTS_
UCA Constants

UCA_CONTRACTIONS_
final static char UCA_CONTRACTIONS_
Table for UCA and builder use

impCEGen_
final static ImplicitCEGenerator impCEGen_
Implicit generator

latinOneCEs_
int latinOneCEs_

latinOneFailed_
boolean latinOneFailed_

latinOneRegenTable_
boolean latinOneRegenTable_

latinOneTableLen_
int latinOneTableLen_

latinOneUse_
boolean latinOneUse_

m_ContInfo_
ContractionInfo m_ContInfo_

m_UCA_version_
VersionInfo m_UCA_version_
UCA version

m_UCD_version_
VersionInfo m_UCD_version_
UCD version

m_caseFirst_
int m_caseFirst_
Case sorting customization

m_contractionCE_
int m_contractionCE_
Contraction CE table

m_contractionEnd_
byte m_contractionEnd_
Table to store information on whether a codepoint can occur as the last character in a contraction

m_contractionIndex_
char m_contractionIndex_
Contraction index table

m_contractionOffset_
int m_contractionOffset_
Size of collator raw data headers, options and expansions before contraction data. This is used when contraction CEs are to be retrieved. ICU4C uses a contraction offset starting from UCollator.UColHeader, hence ICU4J has to subtract that to get the right contraction CE offset. Measured in number of chars.

m_defaultCaseFirst_
int m_defaultCaseFirst_

m_defaultDecomposition_
int m_defaultDecomposition_

m_defaultIsAlternateHandlingShifted_
boolean m_defaultIsAlternateHandlingShifted_

m_defaultIsCaseLevel_
boolean m_defaultIsCaseLevel_

m_defaultIsFrenchCollation_
boolean m_defaultIsFrenchCollation_

m_defaultIsHiragana4_
boolean m_defaultIsHiragana4_

m_defaultIsNumericCollation_
boolean m_defaultIsNumericCollation_

m_defaultStrength_
int m_defaultStrength_

m_defaultVariableTopValue_
int m_defaultVariableTopValue_

m_expansionEndCEMaxSize_
byte m_expansionEndCEMaxSize_
Table to store the maximum size of any expansions that end with the corresponding collation element in m_expansionEndCE_. This is also for use in StringSearch.

m_expansionEndCE_
int m_expansionEndCE_
Table to store all collation elements that are the last element of an expansion. This is for use in StringSearch.

m_expansionOffset_
int m_expansionOffset_
Size of collator raw data headers and options before the expansion data. This is used when expansion CEs are to be retrieved. ICU4C uses the expansion offset starting from UCollator.UColHeader, hence ICU4J has to subtract that to get the right expansion CE offset. Measured in number of ints.

m_expansion_
int m_expansion_
Expansion table

m_isHiragana4_
boolean m_isHiragana4_
Attribute for special Hiragana

m_isJamoSpecial_
boolean m_isJamoSpecial_
Flag indicating whether Jamo is special

m_isNumericCollation_
boolean m_isNumericCollation_
Numeric collation option

m_minContractionEnd_
char m_minContractionEnd_
The smallest codepoint that could be the end of a contraction

m_minUnsafe_
char m_minUnsafe_
The smallest "unsafe" codepoint

m_rules_
String m_rules_
Original collation rules

m_trie_
IntTrie m_trie_
Data trie

m_unsafe_
byte m_unsafe_
Heuristic table to store information on whether a char character is considered "unsafe". "Unsafe" characters are combining marks or those belonging to some contraction sequence from offset 1 onwards. E.g. if "ABC" is the only contraction, then 'B' and 'C' are considered unsafe. If we have another contraction "ZA" in addition to the one above, then 'A', 'B', 'C' are "unsafe" but 'Z' is not.
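The contraction part of the "unsafe" definition can be sketched as follows. This is a simplified illustration, not the table-building code used by ICU4J: it ignores combining marks and simply collects every character found at offset 1 or later in a list of contractions. The class and method names (UnsafeCharSketch, unsafeChars) are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: derives "unsafe" characters from contractions, ignoring combining marks.
final class UnsafeCharSketch {
    /** Returns every character that occurs at offset 1 or later in any contraction. */
    static Set<Character> unsafeChars(List<String> contractions) {
        Set<Character> unsafe = new TreeSet<>();
        for (String contraction : contractions) {
            for (int i = 1; i < contraction.length(); i++) {
                unsafe.add(contraction.charAt(i));
            }
        }
        return unsafe;
    }
}
```

With the contractions "ABC" and "ZA" from the description, the result contains 'A', 'B' and 'C' but not 'Z', because 'Z' only ever appears at offset 0.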



m_variableTopValue_
int m_variableTopValue_
Value of the variable top

m_version_
VersionInfo m_version_
General version of the collator

maxImplicitPrimary
final static int maxImplicitPrimary

maxRegularPrimary
final static int maxRegularPrimary

minImplicitPrimary
final static int minImplicitPrimary




Constructor Detail
RuleBasedCollator
public RuleBasedCollator(String rules) throws Exception

Constructor that takes the argument rules for customization. The collator will be based on UCA, with the attributes and re-ordering of the characters specified in the argument rules.

See the user guide's section on Collation Customization for details on the rule syntax.

Parameters:
  rules - the collation rules to build the collation table from.
exception:
  ParseException - thrown when argument rules have an invalid syntax.
  IOException - thrown when an error occurred while reading internal data.



RuleBasedCollator
RuleBasedCollator()

Private constructor for use by subclasses. Public access to creating Collators is handled by the API Collator.getInstance() or RuleBasedCollator(String rules).

This constructor constructs the UCA collator internally.

RuleBasedCollator
RuleBasedCollator(ULocale locale)
Constructs a RuleBasedCollator from the argument locale. If no resource bundle is associated with the locale, UCA is used instead.
Parameters:
  locale -




Method Detail
clone
public Object clone() throws CloneNotSupportedException
Clones the RuleBasedCollator.
Returns:
  a new instance of this RuleBasedCollator object



compare
public int compare(String source, String target)
Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

General recommendation:
If comparisons are to be done on the same String multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If speed is critical and object instantiation is to be reduced, further optimization may be achieved by generating a simpler key of the form RawCollationKey and reusing this RawCollationKey object with the method RuleBasedCollator.getRawCollationKey. The internal byte representation can be accessed directly via RawCollationKey and stored for future use. Like CollationKey, RawCollationKey provides a method RawCollationKey.compareTo for key comparisons. If each String is compared only once, using the method RuleBasedCollator.compare(String, String) will perform better.

Parameters:
  source - the source text String.
  target - the target text String.
Returns:
  an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
See Also:   CollationKey
See Also:   RuleBasedCollator.getCollationKey
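A minimal sketch of calling compare at different strengths. To stay runnable without the ICU4J jar, it uses the JDK's java.text.Collator as a stand-in, which exposes the same compare and setStrength calls; with ICU4J on the classpath the import would be com.ibm.icu.text.Collator instead. The class and method names (CompareSketch, compareAt) are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in: java.text.Collator. With ICU4J, import com.ibm.icu.text.Collator instead.
final class CompareSketch {
    /** Compares two strings with an en_US collator set to the given strength. */
    static int compareAt(int strength, String source, String target) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(source, target);
    }

    public static void main(String[] args) {
        // Collation puts "a" before "B"; raw code unit comparison does not.
        System.out.println(compareAt(Collator.TERTIARY, "a", "B") < 0);
        System.out.println("a".compareTo("B") < 0);
        // At PRIMARY strength, case differences are ignored.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC") == 0);
    }
}
```

Only the sign of the returned value is meaningful, so callers should test for less than, equal to or greater than zero rather than for specific values.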



equals
public boolean equals(Object obj)
Compares the equality of two RuleBasedCollator objects. RuleBasedCollator objects are equal if they have the same collation rules and the same attributes.
Parameters:
  obj - the RuleBasedCollator to be compared to.
Returns:
  true if this RuleBasedCollator has exactly the same collation behaviour as obj, false otherwise.
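The effect of attributes on equality can be sketched as below. The example uses the JDK's java.text.Collator as a stand-in so that it runs without the ICU4J jar; it is an assumption that ICU4J behaves the same way for this case, and note that the ICU4J clone() documented on this page declares CloneNotSupportedException while the JDK one does not. The names EqualsSketch and cloneEquals are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in: java.text.Collator. With ICU4J, import com.ibm.icu.text.Collator instead
// and handle the CloneNotSupportedException declared by its clone().
final class EqualsSketch {
    /** Clones an en_US collator, optionally changes the clone's strength, and compares the two. */
    static boolean cloneEquals(boolean changeStrength) {
        Collator original = Collator.getInstance(Locale.US);
        original.setStrength(Collator.TERTIARY);
        Collator copy = (Collator) original.clone();
        if (changeStrength) {
            copy.setStrength(Collator.PRIMARY); // same rules, different attribute
        }
        return original.equals(copy);
    }
}
```

An unmodified clone compares equal to the original; changing only the strength attribute on the clone makes the two collators unequal even though their rules are identical.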



getCollationElementIterator
public CollationElementIterator getCollationElementIterator(String source)
Return a CollationElementIterator for the given String.
See Also:   CollationElementIterator

getCollationElementIterator
public CollationElementIterator getCollationElementIterator(CharacterIterator source)
Return a CollationElementIterator for the given CharacterIterator. The source iterator's integrity will be preserved since a new copy will be created for use.
See Also:   CollationElementIterator

getCollationElementIterator
public CollationElementIterator getCollationElementIterator(UCharacterIterator source)
Return a CollationElementIterator for the given UCharacterIterator. The source iterator's integrity will be preserved since a new copy will be created for use.
See Also:   CollationElementIterator
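A short sketch of walking the collation elements of a String. It uses the JDK's java.text classes as a stand-in so that it runs without the ICU4J jar; the ICU4J classes of the same names in com.ibm.icu.text offer the same next(), NULLORDER and primaryOrder(int) members. The names ElementIteratorSketch and primaryOrders are hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in: java.text classes. With ICU4J, import the com.ibm.icu.text equivalents instead.
final class ElementIteratorSketch {
    /** Collects the primary order of every collation element produced for the text. */
    static List<Integer> primaryOrders(String text) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollationElementIterator elements = collator.getCollationElementIterator(text);
        List<Integer> primaries = new ArrayList<>();
        // next() returns NULLORDER once the end of the text is reached.
        for (int ce = elements.next(); ce != CollationElementIterator.NULLORDER; ce = elements.next()) {
            primaries.add(CollationElementIterator.primaryOrder(ce));
        }
        return primaries;
    }
}
```

Characters that differ only in case, such as "a" and "A", share a primary order and are told apart at a lower level.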



getCollationKey
public CollationKey getCollationKey(String source)

Get a Collation key for the argument String source from this RuleBasedCollator.

General recommendation:
If comparisons are to be done on the same String multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If each String is compared only once, using the method RuleBasedCollator.compare(String, String) will perform better.

See the class documentation for an explanation about CollationKeys.

Parameters:
  source - the text String to be transformed into a collation key.
Returns:
  the CollationKey for the given String based on this RuleBasedCollator's collation rules. If the source String is null, a null CollationKey is returned.
See Also:   CollationKey
See Also:   RuleBasedCollator.compare(String,String)
See Also:   RuleBasedCollator.getRawCollationKey
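The recommendation above can be sketched as a sort that builds one key per String and then compares keys only. The example uses the JDK's java.text classes as a stand-in so that it runs without the ICU4J jar; com.ibm.icu.text.CollationKey offers the same compareTo and getSourceString calls. The names CollationKeySortSketch and sortWithKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Stand-in: java.text classes. With ICU4J, import the com.ibm.icu.text equivalents instead.
final class CollationKeySortSketch {
    /** Sorts words by collation key, so each word is transformed only once. */
    static List<String> sortWithKeys(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        CollationKey[] keys = new CollationKey[words.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = collator.getCollationKey(words.get(i));
        }
        Arrays.sort(keys); // uses CollationKey.compareTo
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }
}
```

During a sort each String takes part in many comparisons, which is the case where paying the one-time cost of key generation is expected to be worthwhile.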



getContractionsAndExpansions
public void getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes) throws Exception
Gets Unicode sets containing the contractions and/or expansions of a collator.
Parameters:
  contractions - if not null, set to contain contractions
  expansions - if not null, set to contain expansions
  addPrefixes - add the prefix contextual elements to contractions
throws:
  Exception -



getNumericCollation
public boolean getNumericCollation()
Method to retrieve the numeric collation value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.
Returns:
  true if numeric collation is turned on, false otherwise
See Also:   RuleBasedCollator.setNumericCollation
See Also:   RuleBasedCollator.setNumericCollationDefault
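The ordering effect described above can be illustrated with a small hand-written comparator. This is not how ICU4J implements numeric collation (which works on collation keys); it is only a sketch of the idea that runs of ASCII digits are compared by numeric value rather than character by character. The class and method names (NumericOrderSketch, compareNumeric) are hypothetical.

```java
// Illustration only: compares ASCII digit runs by value, other characters by code unit.
final class NumericOrderSketch {
    static int compareNumeric(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = runEnd(a, i), endB = runEnd(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts after the other one.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static int runEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    /** Compares two digit runs by value: fewer significant digits means a smaller number. */
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') k++;
        return digits.substring(k);
    }
}
```

Under plain String comparison "100" sorts before "2" because '1' is less than '2'; the sketch reverses that by comparing the digit runs as numbers.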



getRawCollationKey
public RawCollationKey getRawCollationKey(String source, RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. If key has an internal byte array whose length is too small for the result, the internal byte array will be grown to the exact required size.
Parameters:
  source - the text String to be transformed into a RawCollationKey
  key - output RawCollationKey to store results
Returns:
  If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.
See Also:   RuleBasedCollator.getCollationKey
See Also:   RuleBasedCollator.compare(String,String)
See Also:   RawCollationKey



getRules
public String getRules()
Gets the collation rules for this RuleBasedCollator. Equivalent to String getRules(RuleOption.FULL_RULES).
Returns:
  the collation rules
See Also:   RuleBasedCollator.getRules(boolean)



getRules
public String getRules(boolean fullrules)
Returns current rules. The argument defines whether the full rules (UCA + tailored) are returned or just the tailoring.
Parameters:
  fullrules - true if the rules that define the full set of collation order are required, otherwise false for returning only the tailored rules
Returns:
  the current rules that define this Collator.
See Also:   RuleBasedCollator.getRules()



getTag
static int getTag(int ce)
Retrieve the tag of a special ce
Parameters:
  ce - ce to test
Returns:
  tag of ce



getTailoredSet
public UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
Returns:
  a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.
exception:
  ParseException - thrown when argument rules have an invalid syntax.
  IOException



getUCAVersion
public VersionInfo getUCAVersion()(Code)
Gets the UCA version of this collator object.
Returns:
  the version object associated with this collator



getVariableTop
public int getVariableTop()(Code)
Gets the variable top value of a Collator. Lower 16 bits are undefined and should be ignored.
Returns:
  the variable top value of a Collator.
See Also:   RuleBasedCollator.setVariableTop



getVersion
public VersionInfo getVersion()(Code)
Gets the version of this collator object.
Returns:
  the version object associated with this collator



hashCode
public int hashCode()(Code)
Generates a hash code for this RuleBasedCollator.
Returns:
  the hash code for this Collator



isAlternateHandlingShifted
public boolean isAlternateHandlingShifted()(Code)
Checks if the alternate handling behaviour is the UCA defined SHIFTED or NON_IGNORABLE. If the return value is true, then the alternate handling attribute for the Collator is SHIFTED. Otherwise, if the return value is false, then the alternate handling attribute for the Collator is NON_IGNORABLE. See setAlternateHandlingShifted(boolean) for more details.
Returns:
  true if the alternate handling is SHIFTED, false if it is NON_IGNORABLE
See Also:   RuleBasedCollator.setAlternateHandlingShifted(boolean)
See Also:   RuleBasedCollator.setAlternateHandlingDefault



isCaseLevel
public boolean isCaseLevel()(Code)
Checks if case level is set to true. See setCaseLevel(boolean) for details.
Returns:
  the case level mode
See Also:   RuleBasedCollator.setCaseLevelDefault
See Also:   RuleBasedCollator.isCaseLevel
See Also:   RuleBasedCollator.setCaseLevel(boolean)



isContinuation
final static boolean isContinuation(int ce)(Code)
Checks if the argument ce is a continuation.
Parameters:
  ce - collation element to test
Returns:
  true if ce is a continuation



isContractionEnd
final boolean isContractionEnd(char ch)(Code)
Approximately determines whether a char character is at the end of a contraction. Guaranteed to return true if the character is at the end of a contraction; otherwise the result is not deterministic.
Parameters:
  ch - character to be determined



isFrenchCollation
public boolean isFrenchCollation()(Code)
Checks if French Collation is set to true. See setFrenchCollation(boolean) for details.
Returns:
  true if French Collation is set to true, false otherwise
See Also:   RuleBasedCollator.setFrenchCollation(boolean)
See Also:   RuleBasedCollator.setFrenchCollationDefault



isHiraganaQuaternary
public boolean isHiraganaQuaternary()(Code)
Checks if the Hiragana Quaternary mode is set on. See setHiraganaQuaternary(boolean) for more details.
Returns:
  true if Hiragana Quaternary mode is on, false otherwise
See Also:   RuleBasedCollator.setHiraganaQuaternaryDefault
See Also:   RuleBasedCollator.setHiraganaQuaternary(boolean)



isLowerCaseFirst
public boolean isLowerCaseFirst()(Code)
Return true if a lowercase character is sorted before the corresponding uppercase character. See setCaseFirst(boolean) for details.
Returns:
  true if lower cased characters are sorted before upper cased characters, false otherwise
See Also:   RuleBasedCollator.setUpperCaseFirst
See Also:   RuleBasedCollator.setLowerCaseFirst
See Also:   RuleBasedCollator.isUpperCaseFirst
See Also:   RuleBasedCollator.setCaseFirstDefault



isSpecial
static boolean isSpecial(int ce)(Code)
Checks if ce is special.
Parameters:
  ce - collation element to check
Returns:
  true if ce is special



isUnsafe
final boolean isUnsafe(char ch)(Code)
Test whether a char character is potentially "unsafe" for use as a collation starting point. "Unsafe" characters are combining marks or those belonging to some contraction sequence from the offset 1 onwards. E.g. if "ABC" is the only contraction, then 'B' and 'C' are considered unsafe. If we have another contraction "ZA" with the one above, then 'A', 'B', 'C' are "unsafe" but 'Z' is not.
Parameters:
  ch - character to test
Returns:
  true if ch is unsafe, false otherwise
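
The contraction part of this rule can be illustrated with a small plain-Java sketch. It is not ICU's implementation, which consults the collator's internal tables and also reports combining marks as unsafe; the class `UnsafeSketch` and the method `unsafeChars` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UnsafeSketch {
    // Hypothetical helper: collects every character that appears at
    // offset 1 or later in any contraction. Combining marks, which the
    // documented isUnsafe also covers, are deliberately left out.
    static Set<Character> unsafeChars(List<String> contractions) {
        Set<Character> unsafe = new TreeSet<>();
        for (String contraction : contractions) {
            for (int i = 1; i < contraction.length(); i++) {
                unsafe.add(contraction.charAt(i));
            }
        }
        return unsafe;
    }

    public static void main(String[] args) {
        // Only "ABC" is a contraction: 'B' and 'C' are unsafe.
        System.out.println(unsafeChars(Arrays.asList("ABC")));       // [B, C]
        // Adding "ZA" makes 'A' unsafe as well, but not 'Z'.
        System.out.println(unsafeChars(Arrays.asList("ABC", "ZA"))); // [A, B, C]
    }
}
```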



isUpperCaseFirst
public boolean isUpperCaseFirst()(Code)
Return true if an uppercase character is sorted before the corresponding lowercase character. See setCaseFirst(boolean) for details.
Returns:
  true if upper cased characters are sorted before lower cased characters, false otherwise
See Also:   RuleBasedCollator.setUpperCaseFirst
See Also:   RuleBasedCollator.setLowerCaseFirst
See Also:   RuleBasedCollator.isLowerCaseFirst
See Also:   RuleBasedCollator.setCaseFirstDefault



setAlternateHandlingDefault
public void setAlternateHandlingDefault()(Code)
Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator. See setAlternateHandlingShifted(boolean) for more details.
See Also:   RuleBasedCollator.setAlternateHandlingShifted(boolean)
See Also:   RuleBasedCollator.isAlternateHandlingShifted()



setAlternateHandlingShifted
public void setAlternateHandlingShifted(boolean shifted)(Code)
Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. See the UCA definition on Alternate Weighting. This attribute will only be effective when QUATERNARY strength is set. The default value for this mode is false, corresponding to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all the codepoints with non-ignorable primary weights in the same way. If the mode is set to true, the behaviour corresponds to SHIFTED as defined in UCA; this causes codepoints with PRIMARY orders that are equal to or below the variable top value to be ignored in PRIMARY order and moved to the QUATERNARY order.
Parameters:
  shifted - true if SHIFTED behaviour for alternate handling is desired, false for the NON_IGNORABLE behaviour.
See Also:   RuleBasedCollator.isAlternateHandlingShifted
See Also:   RuleBasedCollator.setAlternateHandlingDefault



setCaseFirstDefault
final public void setCaseFirstDefault()(Code)
Sets the case first mode to the initial mode set during construction of the RuleBasedCollator. See setUpperCaseFirst(boolean) and setLowerCaseFirst(boolean) for more details.
See Also:   RuleBasedCollator.isLowerCaseFirst
See Also:   RuleBasedCollator.isUpperCaseFirst
See Also:   RuleBasedCollator.setLowerCaseFirst(boolean)
See Also:   RuleBasedCollator.setUpperCaseFirst(boolean)



setCaseLevel
public void setCaseLevel(boolean flag)(Code)

When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known as the case level. The case level is used to distinguish large and small Japanese Kana characters. Case level could also be used in other situations. For example to distinguish certain Pinyin characters. The default value is false, which means the case level is not generated. The contents of the case level are affected by the case first mode. A simple way to ignore accent differences in a string is to set the strength to PRIMARY and enable case level.

See the section on case level for more information.


Parameters:
  flag - true if case level sorting is required, false otherwise
See Also:   RuleBasedCollator.setCaseLevelDefault
See Also:   RuleBasedCollator.isCaseLevel



setCaseLevelDefault
public void setCaseLevelDefault()(Code)
Sets the case level mode to the initial mode set during construction of the RuleBasedCollator. See setCaseLevel(boolean) for more details.
See Also:   RuleBasedCollator.setCaseLevel(boolean)
See Also:   RuleBasedCollator.isCaseLevel



setDecompositionDefault
public void setDecompositionDefault()(Code)
Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator. See setDecomposition(int) for more details.
See Also:   RuleBasedCollator.getDecomposition
See Also:   RuleBasedCollator.setDecomposition(int)



setFrenchCollation
public void setFrenchCollation(boolean flag)(Code)
Sets the mode for the direction of SECONDARY weights to be used in French collation. The default value is false, which treats SECONDARY weights in the order they appear. If set to true, the SECONDARY weights will be sorted backwards. See the section on French collation for more information.
Parameters:
  flag - true to set the French collation on, false to set it off
See Also:   RuleBasedCollator.isFrenchCollation
See Also:   RuleBasedCollator.setFrenchCollationDefault
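
The effect of reading SECONDARY weights backwards can be seen in a toy model. The sketch below is plain Java and does not call ICU; it assumes a simplified scheme in which each base letter carries one secondary weight (0 when unaccented, otherwise the code point of its combining mark). The class `FrenchSecondarySketch` and its methods are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class FrenchSecondarySketch {
    // Primary level of the toy model: the string with combining marks removed.
    static String baseLetters(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    // Secondary level of the toy model: one weight per base letter.
    static int[] secondaryWeights(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        List<Integer> weights = new ArrayList<>();
        for (int i = 0; i < nfd.length(); i++) {
            char c = nfd.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                // Attach the mark to the preceding base letter, if any.
                if (!weights.isEmpty()) {
                    weights.set(weights.size() - 1, (int) c);
                }
            } else {
                weights.add(0);
            }
        }
        return weights.stream().mapToInt(Integer::intValue).toArray();
    }

    // Compares base letters first; on a tie, compares secondary weights
    // from the start (french == false) or from the end (french == true).
    static int compare(String a, String b, boolean french) {
        int primary = baseLetters(a).compareTo(baseLetters(b));
        if (primary != 0) {
            return primary;
        }
        int[] wa = secondaryWeights(a);
        int[] wb = secondaryWeights(b);
        int n = wa.length; // same length as wb, because the base letters match
        for (int k = 0; k < n; k++) {
            int i = french ? n - 1 - k : k;
            if (wa[i] != wb[i]) {
                return Integer.compare(wa[i], wb[i]);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String cote1 = "c\u00f4te"; // accent on the second letter
        String cote2 = "cot\u00e9"; // accent on the last letter
        System.out.println(Integer.signum(compare(cote1, cote2, false))); // 1
        System.out.println(Integer.signum(compare(cote1, cote2, true)));  // -1
    }
}
```

In the forward direction the first accent decides, so the word accented on its second letter sorts after the one accented on its last letter; in the backward direction the last accent decides and the order flips.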



setFrenchCollationDefault
public void setFrenchCollationDefault()(Code)
Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator. See setFrenchCollation(boolean) for more details.
See Also:   RuleBasedCollator.isFrenchCollation
See Also:   RuleBasedCollator.setFrenchCollation(boolean)



setHiraganaQuaternary
public void setHiraganaQuaternary(boolean flag)(Code)
Sets the Hiragana Quaternary mode to be on or off. When the Hiragana Quaternary mode is turned on, the collator positions Hiragana characters before all non-ignorable characters in QUATERNARY strength. This is to produce a correct JIS collation order, distinguishing between Katakana and Hiragana characters.
Parameters:
  flag - true if Hiragana Quaternary mode is to be on, false otherwise
See Also:   RuleBasedCollator.setHiraganaQuaternaryDefault
See Also:   RuleBasedCollator.isHiraganaQuaternary



setHiraganaQuaternaryDefault
public void setHiraganaQuaternaryDefault()(Code)
Sets the Hiragana Quaternary mode to the initial mode set during construction of the RuleBasedCollator. See setHiraganaQuaternary(boolean) for more details.
See Also:   RuleBasedCollator.setHiraganaQuaternary(boolean)
See Also:   RuleBasedCollator.isHiraganaQuaternary



setLowerCaseFirst
public void setLowerCaseFirst(boolean lowerfirst)(Code)
Sets whether lower cased characters sort before upper cased characters, in strength TERTIARY. The default mode is false. If true is set, the RuleBasedCollator will sort lower cased characters before the upper cased ones. Otherwise, if false is set, the RuleBasedCollator will ignore case preferences.
Parameters:
  lowerfirst - true for sorting lower cased characters before upper cased characters, false to ignore case preferences.
See Also:   RuleBasedCollator.isLowerCaseFirst
See Also:   RuleBasedCollator.isUpperCaseFirst
See Also:   RuleBasedCollator.setUpperCaseFirst
See Also:   RuleBasedCollator.setCaseFirstDefault



setNumericCollation
public void setNumericCollation(boolean flag)(Code)
When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.
Parameters:
  flag - true to turn numeric collation on and false to turn it off
See Also:   RuleBasedCollator.getNumericCollation
See Also:   RuleBasedCollator.setNumericCollationDefault
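
The ordering that numeric collation aims for can be sketched without ICU. The comparator below is an illustration only: it compares runs of ASCII digits by numeric value and everything else by char value, whereas the Collator described here builds collation keys. `NumericOrderSketch` and its methods are hypothetical names.

```java
import java.math.BigInteger;

public class NumericOrderSketch {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index just past the run of digits that starts at 'from'.
    static int endOfDigits(String s, int from) {
        int i = from;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Compares two strings, treating each maximal run of digits as a number.
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int ie = endOfDigits(a, i);
                int je = endOfDigits(b, j);
                BigInteger na = new BigInteger(a.substring(i, ie));
                BigInteger nb = new BigInteger(b.substring(j, je));
                int c = na.compareTo(nb);
                if (c != 0) {
                    return c;
                }
                i = ie;
                j = je;
            } else {
                if (ca != cb) {
                    return Character.compare(ca, cb);
                }
                i++;
                j++;
            }
        }
        // The string with text left over sorts after the one that ran out.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        System.out.println("2".compareTo("100") > 0); // true: plain char order puts "100" first
        System.out.println(compare("2", "100") < 0);  // true: numeric order puts "2" first
    }
}
```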



setNumericCollationDefault
public void setNumericCollationDefault()(Code)
Sets numeric collation to its default value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.
See Also:   RuleBasedCollator.getNumericCollation
See Also:   RuleBasedCollator.setNumericCollation



setStrength
public void setStrength(int newStrength)(Code)

Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

See the Collator class description for an example of use.


Parameters:
  newStrength - the new strength value.
See Also:   RuleBasedCollator.getStrength
See Also:   RuleBasedCollator.setStrengthDefault
See Also:   RuleBasedCollator.PRIMARY
See Also:   RuleBasedCollator.SECONDARY
See Also:   RuleBasedCollator.TERTIARY
See Also:   RuleBasedCollator.QUATERNARY
See Also:   RuleBasedCollator.IDENTICAL
exception:
  IllegalArgumentException - If the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
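
A minimal sketch of how strength changes a comparison follows. To stay self-contained it uses the JDK's `java.text.Collator` as a stand-in, on the assumption that its `setStrength`, `PRIMARY` and `TERTIARY` behave analogously to the ICU class documented here; the JDK class has no QUATERNARY level. `StrengthSketch` and `compareAt` are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthSketch {
    // Compares a and b with a fresh collator set to the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // At PRIMARY strength a case difference is not significant.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC") == 0);  // true
        // At TERTIARY strength the same pair is distinguished.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") != 0); // true

        // A value that is not one of the strength constants is rejected.
        try {
            compareAt(99, "abc", "ABC");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid strength");
        }
    }
}
```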



setStrengthDefault
public void setStrengthDefault()(Code)
Sets the collation strength to the initial mode set during the construction of the RuleBasedCollator. See setStrength(int) for more details.
See Also:   RuleBasedCollator.setStrength(int)
See Also:   RuleBasedCollator.getStrength



setUpperCaseFirst
public void setUpperCaseFirst(boolean upperfirst)(Code)
Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY. The default mode is false, and so lowercase characters sort before uppercase characters. If true, uppercase characters sort first.
Parameters:
  upperfirst - true to sort uppercase characters before lowercase characters, false to sort lowercase characters before uppercase characters
See Also:   RuleBasedCollator.isLowerCaseFirst
See Also:   RuleBasedCollator.isUpperCaseFirst
See Also:   RuleBasedCollator.setLowerCaseFirst
See Also:   RuleBasedCollator.setCaseFirstDefault



setVariableTop
public int setVariableTop(String varTop)(Code)

Variable top is a two byte primary value which causes all the codepoints with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to a collation element value of a string supplied.


Parameters:
  varTop - one or more (if contraction) characters to which the variable top should be set
Returns:
  an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.
exception:
  IllegalArgumentException - is thrown if the varTop argument is not a valid variable top element. A variable top element is invalid when:
  • it is a contraction that does not exist in the collation order
  • the PRIMARY strength collation element for the variable top has more than two bytes
  • the varTop argument is null or zero in length.

See Also:   RuleBasedCollator.getVariableTop
See Also:   RuleBasedCollator.setAlternateHandlingShifted



setVariableTop
public void setVariableTop(int varTop)(Code)
Sets the variable top to a collation element value supplied. Variable top is set to the upper 16 bits. Lower 16 bits are ignored.
Parameters:
  varTop - Collation element value, as returned by setVariableTop or getVariableTop
See Also:   RuleBasedCollator.getVariableTop
See Also:   RuleBasedCollator.setVariableTop(String)
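
Because only the upper 16 bits of the int carry the variable top, code that stores or compares these values may want to mask off the lower half first. The helpers below are a hypothetical sketch in plain Java, not part of the ICU API.

```java
public class VariableTopSketch {
    // Hypothetical helper: extracts the upper 16 bits, which carry the
    // variable top; the lower 16 bits are discarded.
    static int upper16(int varTop) {
        return varTop >>> 16;
    }

    // Hypothetical helper: two values denote the same variable top
    // when their upper 16 bits match, whatever the lower bits hold.
    static boolean sameVariableTop(int a, int b) {
        return (a & 0xFFFF0000) == (b & 0xFFFF0000);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(upper16(0x12345678)));  // 1234
        System.out.println(sameVariableTop(0x12345678, 0x1234FFFF));   // true
        System.out.println(sameVariableTop(0x12345678, 0x12355678));   // false
    }
}
```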



setWithUCAData
final void setWithUCAData()(Code)
Sets this collator to use all the options and tables in UCA.



setWithUCATables
final void setWithUCATables()(Code)
Sets this collator to use the tables in UCA. Note that options are not taken care of here.



Fields inherited from com.ibm.icu.text.Collator
final public static int CANONICAL_DECOMPOSITION(Code)(Java Doc)
final public static int FULL_DECOMPOSITION(Code)(Java Doc)
final public static int IDENTICAL(Code)(Java Doc)
final public static int NO_DECOMPOSITION(Code)(Java Doc)
final public static int PRIMARY(Code)(Java Doc)
final public static int QUATERNARY(Code)(Java Doc)
final public static int SECONDARY(Code)(Java Doc)
final public static int TERTIARY(Code)(Java Doc)

Methods inherited from com.ibm.icu.text.Collator
public Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public int compare(Object source, Object target)(Code)(Java Doc)
abstract public int compare(String source, String target)(Code)(Java Doc)
public boolean equals(String source, String target)(Code)(Java Doc)
public static Locale[] getAvailableLocales()(Code)(Java Doc)
final public static ULocale[] getAvailableULocales()(Code)(Java Doc)
abstract public CollationKey getCollationKey(String source)(Code)(Java Doc)
public int getDecomposition()(Code)(Java Doc)
public static String getDisplayName(Locale objectLocale, Locale displayLocale)(Code)(Java Doc)
public static String getDisplayName(ULocale objectLocale, ULocale displayLocale)(Code)(Java Doc)
public static String getDisplayName(Locale objectLocale)(Code)(Java Doc)
public static String getDisplayName(ULocale objectLocale)(Code)(Java Doc)
final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean isAvailable)(Code)(Java Doc)
final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID)(Code)(Java Doc)
final public static Collator getInstance()(Code)(Java Doc)
final public static Collator getInstance(ULocale locale)(Code)(Java Doc)
final public static Collator getInstance(Locale locale)(Code)(Java Doc)
final public static String[] getKeywordValues(String keyword)(Code)(Java Doc)
final public static String[] getKeywords()(Code)(Java Doc)
final public ULocale getLocale(ULocale.Type type)(Code)(Java Doc)
abstract public RawCollationKey getRawCollationKey(String source, RawCollationKey key)(Code)(Java Doc)
public int getStrength()(Code)(Java Doc)
public UnicodeSet getTailoredSet()(Code)(Java Doc)
abstract public VersionInfo getUCAVersion()(Code)(Java Doc)
abstract public int getVariableTop()(Code)(Java Doc)
abstract public VersionInfo getVersion()(Code)(Java Doc)
final public static Object registerFactory(CollatorFactory factory)(Code)(Java Doc)
final public static Object registerInstance(Collator collator, ULocale locale)(Code)(Java Doc)
public void setDecomposition(int decomposition)(Code)(Java Doc)
final void setLocale(ULocale valid, ULocale actual)(Code)(Java Doc)
public void setStrength(int newStrength)(Code)(Java Doc)
abstract public int setVariableTop(String varTop)(Code)(Java Doc)
abstract public void setVariableTop(int varTop)(Code)(Java Doc)
final public static boolean unregister(Object registryKey)(Code)(Java Doc)

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
