| java.lang.Object com.ibm.icu.text.Collator com.ibm.icu.text.RuleBasedCollator
RuleBasedCollator | final public class RuleBasedCollator extends Collator (Code) | | RuleBasedCollator is a concrete subclass of Collator. It allows
customization of the Collator via user-specified rule sets.
RuleBasedCollator is designed to be fully compliant with the Unicode
Collation Algorithm (UCA) and conforms to ISO 14651.
Users are strongly encouraged to read
the user's guide for more information about the collation
service before using this class.
Create a RuleBasedCollator from a locale by calling the
getInstance(Locale) factory method in the base class Collator.
Collator.getInstance(Locale) creates a RuleBasedCollator object
based on the collation rules defined by the argument locale. If a
customized collation ordering or attributes are required, use the
RuleBasedCollator(String) constructor with the appropriate
rules. The customized RuleBasedCollator will base its ordering on
UCA, while re-adjusting the attributes and orders of the characters
in the specified rule accordingly.
RuleBasedCollator provides correct collation orders for most
locales supported in ICU. If specific data for a locale is not
available, the order eventually falls back to the UCA collation
order.
For information about the collation rule syntax and details
about customization, please refer to the
Collation customization section of the user's guide.
Note that there are some differences between
the Collation rule syntax used in Java and ICU4J:
- According to the JDK documentation:
Modifier '!' : Turns on Thai/Lao vowel-consonant swapping. If this rule
is in force when a Thai vowel of the range \U0E40-\U0E44 precedes a
Thai consonant of the range \U0E01-\U0E2E OR a Lao vowel of the
range \U0EC0-\U0EC4 precedes a Lao consonant of the range
\U0E81-\U0EAE then the
vowel is placed after the consonant for collation purposes.
If a rule is without the modifier '!', the Thai/Lao vowel-consonant
swapping is not turned on.
ICU4J's RuleBasedCollator does not support turning off the Thai/Lao
vowel-consonant swapping, since the UCA clearly states that it has to be
supported to ensure a correct sorting order. If a '!' is encountered, it is
ignored.
- As mentioned in the documentation of the base class Collator,
compatibility decomposition mode is not supported.
Examples
Creating Customized RuleBasedCollators:
String simple = "& a < b < c < d";
RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);
String norwegian = "& a , A < b , B < c , C < d , D < e , E "
+ "< f , F < g , G < h , H < i , I < j , "
+ "J < k , K < l , L < m , M < n , N < "
+ "o , O < p , P < q , Q < r , R < s , S < "
+ "t , T < u , U < v , V < w , W < x , X "
+ "< y , Y < z , Z < \u00E5 = a\u030A "
+ ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
+ ", \u00C6 < \u00F8 , \u00D8";
RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);
Concatenating rules to combine Collators:
// Create an en_US Collator object
RuleBasedCollator en_USCollator = (RuleBasedCollator)
Collator.getInstance(new Locale("en", "US", ""));
// Create a da_DK Collator object
RuleBasedCollator da_DKCollator = (RuleBasedCollator)
Collator.getInstance(new Locale("da", "DK", ""));
// Combine the two
// First, get the collation rules from en_USCollator
String en_USRules = en_USCollator.getRules();
// Second, get the collation rules from da_DKCollator
String da_DKRules = da_DKCollator.getRules();
RuleBasedCollator newCollator =
new RuleBasedCollator(en_USRules + da_DKRules);
// newCollator has the combined rules
Making changes to an existing RuleBasedCollator to create a new
Collator object, by appending changes to the existing rule:
// Create a new Collator object with additional rules
String addRules = "& C < ch, cH, Ch, CH";
RuleBasedCollator myCollator =
new RuleBasedCollator(en_USCollator.getRules() + addRules);
// myCollator contains the new rules
How to change the order of non-spacing accents:
// old rule with main accents
String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
+ "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
+ "; \u0306 ; \u0307 ; \u0309 ; \u030A "
+ "; \u030B ; \u030C ; \u030D ; \u030E "
+ "; \u030F ; \u0310 ; \u0311 ; \u0312 "
+ "< a , A ; ae, AE ; \u00e6 , \u00c6 "
+ "< b , B < c, C < e, E & C < d , D";
// change the order of accent characters
String addOn = "& \u0300 ; \u0308 ; \u0302";
RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
Putting in a new primary ordering before the default setting,
e.g. sorting English characters before or after Japanese characters in the Japanese
Collator:
// get en_US Collator rules
RuleBasedCollator en_USCollator
= (RuleBasedCollator)Collator.getInstance(Locale.US);
// add a few Japanese characters to sort before English characters
// suppose the last character before the first base letter 'a' in
// the English collation rule is \u2212
String jaString = "& \u2212 < \u3041, \u3042 < \u3043, "
+ "\u3044";
RuleBasedCollator myJapaneseCollator
= new RuleBasedCollator(en_USCollator.getRules() + jaString);
This class is not subclassable.
author: Syn Wee Quek |
Inner Class :static interface AttributeValue | |
Inner Class :static interface Attribute | |
Inner Class :final static class UCAConstants | |
Method Summary | |
public Object | clone() | public int | compare(String source, String target) Compares the source text String to the target text String according to
the collation rules, strength and decomposition mode for this
RuleBasedCollator.
Returns an integer less than,
equal to or greater than zero depending on whether the source String is
less than, equal to or greater than the target String. | public boolean | equals(Object obj) Compares the equality of two RuleBasedCollator objects.
RuleBasedCollator objects are equal if they have the same collation
rules and the same attributes.
Parameters: obj - the RuleBasedCollator to be compared to. | public CollationElementIterator | getCollationElementIterator(String source) Return a CollationElementIterator for the given String. | public CollationElementIterator | getCollationElementIterator(CharacterIterator source) Return a CollationElementIterator for the given CharacterIterator. | public CollationElementIterator | getCollationElementIterator(UCharacterIterator source) Return a CollationElementIterator for the given UCharacterIterator. | public CollationKey | getCollationKey(String source)
Get a Collation key for the argument String source from this
RuleBasedCollator.
General recommendation:
If comparisons are to be done on the same String multiple times, it would
be more efficient to generate CollationKeys for the Strings and use
CollationKey.compareTo(CollationKey) for the comparisons.
If each String is compared only once, using the method
RuleBasedCollator.compare(String, String) will perform better.
See the class documentation for an explanation about CollationKeys.
Parameters: source - the text String to be transformed into a collation key. | public void | getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes) | public boolean | getNumericCollation() Method to retrieve the numeric collation value.
When numeric collation is turned on, this Collator generates a collation
key for the numeric value of substrings of digits. | public RawCollationKey | getRawCollationKey(String source, RawCollationKey key) Gets the simpler form of a CollationKey for the String source following
the rules of this Collator and stores the result into the user provided
argument key. | public String | getRules() Gets the collation rules for this RuleBasedCollator. | public String | getRules(boolean fullrules) Returns current rules. | static int | getTag(int ce) | public UnicodeSet | getTailoredSet() Get a UnicodeSet that contains all the characters and sequences
tailored in this collator.
a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA. exception: ParseException - thrown when argument rules have an invalid syntax. | public VersionInfo | getUCAVersion() Get the UCA version of this collator object. | public int | getVariableTop() Gets the variable top value of a Collator. | public VersionInfo | getVersion() Get the version of this collator object. | public int | hashCode() Generates a hash code for this RuleBasedCollator. | public boolean | isAlternateHandlingShifted() Checks if the alternate handling behaviour is the UCA defined SHIFTED or
NON_IGNORABLE.
If return value is true, then the alternate handling attribute for the
Collator is SHIFTED. | public boolean | isCaseLevel() Checks if case level is set to true. | final static boolean | isContinuation(int ce) | final boolean | isContractionEnd(char ch) Approximate determination if a char character is at a contraction end. | public boolean | isFrenchCollation() Checks if French Collation is set to true. | public boolean | isHiraganaQuaternary() Checks if the Hiragana Quaternary mode is set on. | public boolean | isLowerCaseFirst() Return true if a lowercase character is sorted before the corresponding uppercase character. | static boolean | isSpecial(int ce) | final boolean | isUnsafe(char ch) Test whether a char character is potentially "unsafe" for use as a
collation starting point. | public boolean | isUpperCaseFirst() Return true if an uppercase character is sorted before the corresponding lowercase character. | public void | setAlternateHandlingDefault() Sets the alternate handling mode to the initial mode set during
construction of the RuleBasedCollator. | public void | setAlternateHandlingShifted(boolean shifted) Sets the alternate handling for QUATERNARY strength to be either
shifted or non-ignorable.
See the UCA definition on
Alternate Weighting.
This attribute will only be effective when QUATERNARY strength is set.
The default value for this mode is false, corresponding to the
NON_IGNORABLE mode in UCA. | final public void | setCaseFirstDefault() Sets the case first mode to the initial mode set during
construction of the RuleBasedCollator. | public void | setCaseLevel(boolean flag)
When case level is set to true, an additional weight is formed
between the SECONDARY and TERTIARY weight, known as the case level.
The case level is used to distinguish large and small Japanese Kana
characters. | public void | setCaseLevelDefault() Sets the case level mode to the initial mode set during
construction of the RuleBasedCollator. | public void | setDecompositionDefault() Sets the decomposition mode to the initial mode set during construction
of the RuleBasedCollator. | public void | setFrenchCollation(boolean flag) Sets the mode for the direction of SECONDARY weights to be used in
French collation. | public void | setFrenchCollationDefault() Sets the French collation mode to the initial mode set during
construction of the RuleBasedCollator. | public void | setHiraganaQuaternary(boolean flag) Sets the Hiragana Quaternary mode to be on or off.
When the Hiragana Quaternary mode is turned on, the collator
positions Hiragana characters before all non-ignorable characters in
QUATERNARY strength. | public void | setHiraganaQuaternaryDefault() Sets the Hiragana Quaternary mode to the initial mode set during
construction of the RuleBasedCollator. | public void | setLowerCaseFirst(boolean lowerfirst) Sets the orders of lower cased characters to sort before upper cased
characters, in strength TERTIARY. | public void | setNumericCollation(boolean flag) When numeric collation is turned on, this Collator generates a collation
key for the numeric value of substrings of digits. | public void | setNumericCollationDefault() Method to set numeric collation to its default value.
When numeric collation is turned on, this Collator generates a collation
key for the numeric value of substrings of digits. | public void | setStrength(int newStrength)
Sets this Collator's strength property. | public void | setStrengthDefault() Sets the collation strength to the initial mode set during the
construction of the RuleBasedCollator. | public void | setUpperCaseFirst(boolean upperfirst) Sets whether uppercase characters sort before lowercase
characters or vice versa, in strength TERTIARY. | public int | setVariableTop(String varTop)
Variable top is a two byte primary value which causes all the codepoints
with primary values that are less than or equal to the variable top to be
shifted when alternate handling is set to SHIFTED.
Sets the variable top to the collation element value of the supplied string.
Parameters: varTop - one or more (if contraction) characters to which the variable top should be set. Returns: an int value containing the value of the variable top in the upper 16 bits. | public void | setVariableTop(int varTop) Sets the variable top to a collation element value supplied.
Variable top is set to the upper 16 bits. | final void | setWithUCAData() Sets this collator to use all the options and tables in UCA. | final void | setWithUCATables() Sets this collator to use the tables in UCA. |
BYTE_COMMON_ | final static byte BYTE_COMMON_(Code) | | |
BYTE_FIRST_TAILORED_ | final static byte BYTE_FIRST_TAILORED_(Code) | | |
BYTE_UNSHIFTED_MIN_ | final static byte BYTE_UNSHIFTED_MIN_(Code) | | |
CE_CASE_BIT_MASK_ | final static int CE_CASE_BIT_MASK_(Code) | | Case strength mask
|
CE_CONTINUATION_MARKER_ | final static int CE_CONTINUATION_MARKER_(Code) | | Continuation marker
|
CE_PRIMARY_MASK_ | final static int CE_PRIMARY_MASK_(Code) | | Mask to get the primary strength of the collation element
|
CE_PRIMARY_SHIFT_ | final static int CE_PRIMARY_SHIFT_(Code) | | Primary strength shift
|
CE_SECONDARY_MASK_ | final static int CE_SECONDARY_MASK_(Code) | | Mask to get the secondary strength of the collation element
|
CE_SECONDARY_SHIFT_ | final static int CE_SECONDARY_SHIFT_(Code) | | Secondary strength shift
|
CE_SPECIAL_FLAG_ | final static int CE_SPECIAL_FLAG_(Code) | | |
CE_SURROGATE_TAG_ | final static int CE_SURROGATE_TAG_(Code) | | Lead surrogate that is tailored and doesn't start a contraction
|
CE_TAG_MASK_ | final static int CE_TAG_MASK_(Code) | | |
CE_TAG_SHIFT_ | final static int CE_TAG_SHIFT_(Code) | | |
CE_TERTIARY_MASK_ | final static int CE_TERTIARY_MASK_(Code) | | Mask to get the tertiary strength of the collation element
|
CODAN_PLACEHOLDER | final static byte CODAN_PLACEHOLDER(Code) | | |
COMMON_BOTTOM_2_ | final static int COMMON_BOTTOM_2_(Code) | | |
COMMON_TOP_2_ | final static int COMMON_TOP_2_(Code) | | |
SORT_LEVEL_TERMINATOR_ | final static byte SORT_LEVEL_TERMINATOR_(Code) | | |
UCA_CONSTANTS_ | final static UCAConstants UCA_CONSTANTS_(Code) | | UCA Constants
|
UCA_CONTRACTIONS_ | final static char UCA_CONTRACTIONS_(Code) | | Table for UCA and builder use
|
latinOneCEs_ | int latinOneCEs_(Code) | | |
latinOneFailed_ | boolean latinOneFailed_(Code) | | |
latinOneRegenTable_ | boolean latinOneRegenTable_(Code) | | |
latinOneTableLen_ | int latinOneTableLen_(Code) | | |
latinOneUse_ | boolean latinOneUse_(Code) | | |
m_ContInfo_ | ContractionInfo m_ContInfo_(Code) | | |
m_caseFirst_ | int m_caseFirst_(Code) | | Case sorting customization
|
m_contractionCE_ | int m_contractionCE_(Code) | | Contraction CE table
|
m_contractionEnd_ | byte m_contractionEnd_(Code) | | Table to store information on whether a codepoint can occur as the last
character in a contraction
|
m_contractionIndex_ | char m_contractionIndex_(Code) | | Contraction index table
|
m_contractionOffset_ | int m_contractionOffset_(Code) | | Size of collator raw data headers, options and expansions before
contraction data. This is used when contraction ces are to be retrieved.
ICU4C uses the contraction offset starting from UCollator.UColHeader, hence
ICU4J has to subtract that to get the right contraction ce
offset. In number of chars.
|
m_defaultCaseFirst_ | int m_defaultCaseFirst_(Code) | | |
m_defaultDecomposition_ | int m_defaultDecomposition_(Code) | | |
m_defaultIsAlternateHandlingShifted_ | boolean m_defaultIsAlternateHandlingShifted_(Code) | | |
m_defaultIsCaseLevel_ | boolean m_defaultIsCaseLevel_(Code) | | |
m_defaultIsFrenchCollation_ | boolean m_defaultIsFrenchCollation_(Code) | | |
m_defaultIsHiragana4_ | boolean m_defaultIsHiragana4_(Code) | | |
m_defaultIsNumericCollation_ | boolean m_defaultIsNumericCollation_(Code) | | |
m_defaultStrength_ | int m_defaultStrength_(Code) | | |
m_defaultVariableTopValue_ | int m_defaultVariableTopValue_(Code) | | |
m_expansionEndCEMaxSize_ | byte m_expansionEndCEMaxSize_(Code) | | Table to store the maximum size of any expansions that end with the
corresponding collation element in m_expansionEndCE_. For use in
StringSearch too
|
m_expansionEndCE_ | int m_expansionEndCE_(Code) | | Table to store all collation elements that are the last element of an
expansion. This is for use in StringSearch.
|
m_expansionOffset_ | int m_expansionOffset_(Code) | | Size of collator raw data headers and options before the expansion
data. This is used when expansion ces are to be retrieved. ICU4C uses
the expansion offset starting from UCollator.UColHeader, hence ICU4J
has to subtract that to get the right expansion ce offset. In
number of ints.
|
m_expansion_ | int m_expansion_(Code) | | Expansion table
|
m_isHiragana4_ | boolean m_isHiragana4_(Code) | | Attribute for special Hiragana
|
m_isJamoSpecial_ | boolean m_isJamoSpecial_(Code) | | Flag indicator if Jamo is special
|
m_isNumericCollation_ | boolean m_isNumericCollation_(Code) | | Numeric collation option
|
m_minContractionEnd_ | char m_minContractionEnd_(Code) | | The smallest codepoint that could be the end of a contraction
|
m_minUnsafe_ | char m_minUnsafe_(Code) | | The smallest "unsafe" codepoint
|
m_unsafe_ | byte m_unsafe_(Code) | | Heuristic table to store information on whether a char character is
considered "unsafe". "Unsafe" characters are combining marks or those
belonging to some contraction sequence from the offset 1 onwards.
E.g. if "ABC" is the only contraction, then 'B' and 'C' are considered
unsafe. If we have another contraction "ZA" with the one above, then
'A', 'B', 'C' are "unsafe" but 'Z' is not.
|
m_variableTopValue_ | int m_variableTopValue_(Code) | | Value of the variable top
|
maxImplicitPrimary | final static int maxImplicitPrimary(Code) | | |
maxRegularPrimary | final static int maxRegularPrimary(Code) | | |
minImplicitPrimary | final static int minImplicitPrimary(Code) | | |
RuleBasedCollator | public RuleBasedCollator(String rules) throws Exception(Code) | |
Constructor that takes the argument rules for
customization. The collator will be based on UCA,
with the attributes and re-ordering of the characters specified in the
argument rules.
See the user guide's section on
Collation Customization for details on the rule syntax.
Parameters: rules - the collation rules to build the collation table from. exception: ParseException - thrown when the argument rules have an invalid syntax. exception: IOException - thrown when an error occurred while reading internal data. |
RuleBasedCollator | RuleBasedCollator()(Code) | | Private constructor for use by subclasses.
Public access to creating Collators is handled by the API
Collator.getInstance() or RuleBasedCollator(String rules).
This constructor constructs the UCA collator internally.
|
RuleBasedCollator | RuleBasedCollator(ULocale locale)(Code) | | Constructs a RuleBasedCollator from the argument locale.
If no resource bundle is associated with the locale, UCA is used
instead.
Parameters: locale - |
compare | public int compare(String source, String target)(Code) | | Compares the source text String to the target text String according to
the collation rules, strength and decomposition mode for this
RuleBasedCollator.
Returns an integer less than,
equal to or greater than zero depending on whether the source String is
less than, equal to or greater than the target String. See the Collator
class description for an example of use.
General recommendation:
If comparisons are to be done on the same String multiple times, it would
be more efficient to generate CollationKeys for the Strings and use
CollationKey.compareTo(CollationKey) for the comparisons.
If speed performance is critical and object instantiation is to be
reduced, further optimization may be achieved by generating a simpler
key of the form RawCollationKey and reusing this RawCollationKey
object with the method RuleBasedCollator.getRawCollationKey. Internal
byte representation can be directly accessed via RawCollationKey and
stored for future use. Like CollationKey, RawCollationKey provides a
method RawCollationKey.compareTo for key comparisons.
If each String is compared only once, using the method
RuleBasedCollator.compare(String, String) will perform better.
Parameters: source - the source text String. Parameters: target - the target text String. Returns: an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target. See Also: CollationKey See Also: RuleBasedCollator.getCollationKey |
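The recommendation above (keys for repeated comparisons, compare for one-off comparisons) can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK classes java.text.RuleBasedCollator and java.text.CollationKey as stand-ins for the ICU4J classes of the same names, the rule string follows JDK rule syntax, and the class CollationKeySketch with its helper methods is hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class CollationKeySketch {

    // Hypothetical helper: a tiny collator that orders d < c < b < a.
    // Assumption: JDK stand-in, not com.ibm.icu.text.RuleBasedCollator.
    static RuleBasedCollator reversedCollator() {
        try {
            return new RuleBasedCollator("< d < c < b < a");
        } catch (ParseException e) {
            throw new IllegalStateException("rule string failed to parse", e);
        }
    }

    // Repeated comparisons: build one key per String, then sort the keys.
    // Each String is transformed once, however often it is compared.
    static String[] sortWithKeys(Collator collator, String[] input) {
        CollationKey[] keys = new CollationKey[input.length];
        for (int i = 0; i < input.length; i++) {
            keys[i] = collator.getCollationKey(input[i]);
        }
        Arrays.sort(keys); // CollationKey is Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    // One-off comparison: call compare directly, no key is built.
    static boolean sortsBefore(Collator collator, String source, String target) {
        return collator.compare(source, target) < 0;
    }

    public static void main(String[] args) {
        Collator collator = reversedCollator();
        String[] words = {"a", "d", "b", "c"};
        System.out.println(Arrays.toString(sortWithKeys(collator, words)));
        System.out.println(sortsBefore(collator, "d", "a"));
    }
}
```

With ICU4J on the classpath, the same shape should apply after switching the imports to com.ibm.icu.text and using ICU rule syntax, but that has not been verified here.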
equals | public boolean equals(Object obj)(Code) | | Compares the equality of two RuleBasedCollator objects.
RuleBasedCollator objects are equal if they have the same collation
rules and the same attributes.
Parameters: obj - the RuleBasedCollator to be compared to. Returns: true if this RuleBasedCollator has exactly the same collation behaviour as obj, false otherwise. |
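The equality rule (same rules and same attributes) can be illustrated with a small sketch. It assumes the JDK's java.text.RuleBasedCollator as a stand-in, whose equals also takes rules and strength into account; ICU4J's implementation may compare additional attributes. The class CollatorEqualsSketch and its methods are hypothetical names.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class CollatorEqualsSketch {

    // Hypothetical helper that hides the checked ParseException.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules); // JDK stand-in class
        } catch (ParseException e) {
            throw new IllegalStateException("rule string failed to parse", e);
        }
    }

    // Same rules, same attributes: expected to be equal.
    static boolean sameRulesEqual() {
        return build("< a < b < c").equals(build("< a < b < c"));
    }

    // Same rules, different strength attribute: expected to differ.
    static boolean differentStrengthEqual() {
        RuleBasedCollator first = build("< a < b < c");
        RuleBasedCollator second = build("< a < b < c");
        second.setStrength(Collator.PRIMARY);
        return first.equals(second);
    }

    // Different rules, same attributes: expected to differ.
    static boolean differentRulesEqual() {
        return build("< a < b < c").equals(build("< c < b < a"));
    }

    public static void main(String[] args) {
        System.out.println(sameRulesEqual());
        System.out.println(differentStrengthEqual());
        System.out.println(differentRulesEqual());
    }
}
```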
getCollationKey | public CollationKey getCollationKey(String source)(Code) | |
Get a Collation key for the argument String source from this
RuleBasedCollator.
General recommendation:
If comparisons are to be done on the same String multiple times, it would
be more efficient to generate CollationKeys for the Strings and use
CollationKey.compareTo(CollationKey) for the comparisons.
If each String is compared only once, using the method
RuleBasedCollator.compare(String, String) will perform better.
See the class documentation for an explanation about CollationKeys.
Parameters: source - the text String to be transformed into a collation key. Returns: the CollationKey for the given String based on this RuleBasedCollator's collation rules. If the source String is null, a null CollationKey is returned. See Also: CollationKey See Also: RuleBasedCollator.compare(String,String) See Also: RuleBasedCollator.getRawCollationKey |
getContractionsAndExpansions | public void getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes) throws Exception(Code) | | Gets Unicode sets containing the contractions and/or expansions of a collator.
Parameters: contractions - if not null, set to contain contractions Parameters: expansions - if not null, set to contain expansions Parameters: addPrefixes - add the prefix contextual elements to contractions throws: Exception - |
getNumericCollation | public boolean getNumericCollation()(Code) | | Method to retrieve the numeric collation value.
When numeric collation is turned on, this Collator generates a collation
key for the numeric value of substrings of digits. This is a way to get
'100' to sort AFTER '2'.
See Also: RuleBasedCollator.setNumericCollation See Also: RuleBasedCollator.setNumericCollationDefault Returns: true if numeric collation is turned on, false otherwise |
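The ordering effect described here, '100' sorting after '2', can be illustrated without the collator. The sketch below is not ICU4J's implementation: it is a plain comparator that compares runs of ASCII digits by numeric value and all other characters by code unit, and the class NumericOrderSketch and its methods are hypothetical.

```java
import java.math.BigInteger;

public class NumericOrderSketch {

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index just past the digit run that starts at 'start'.
    static int endOfDigits(String s, int start) {
        int end = start;
        while (end < s.length() && isAsciiDigit(s.charAt(end))) {
            end++;
        }
        return end;
    }

    // Compares digit runs by numeric value, everything else char by char.
    static int compareNumeric(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isAsciiDigit(ca) && isAsciiDigit(cb)) {
                int iEnd = endOfDigits(a, i);
                int jEnd = endOfDigits(b, j);
                BigInteger left = new BigInteger(a.substring(i, iEnd));
                BigInteger right = new BigInteger(b.substring(j, jEnd));
                int cmp = left.compareTo(right);
                if (cmp != 0) {
                    return cmp;
                }
                i = iEnd;
                j = jEnd;
            } else if (ca != cb) {
                return Character.compare(ca, cb);
            } else {
                i++;
                j++;
            }
        }
        // The string with text left over sorts after the exhausted one.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        System.out.println("plain:   " + "2".compareTo("100"));
        System.out.println("numeric: " + compareNumeric("2", "100"));
    }
}
```

A plain code-unit comparison places "100" before "2" because '1' precedes '2'; the numeric comparison reverses that, which is the behaviour the numeric collation attribute is described as providing.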
getRawCollationKey | public RawCollationKey getRawCollationKey(String source, RawCollationKey key)(Code) | | Gets the simpler form of a CollationKey for the String source following
the rules of this Collator and stores the result into the user provided
argument key.
If key has an internal byte array that is too small for the
result, the internal byte array will be grown to the exact required
size.
Parameters: source - the text String to be transformed into a RawCollationKey Parameters: key - output RawCollationKey to store results. Returns: if key is null, a new instance of RawCollationKey will be created and returned; otherwise the user-provided key will be returned. See Also: RuleBasedCollator.getCollationKey See Also: RuleBasedCollator.compare(String,String) See Also: RawCollationKey |
getRules | public String getRules(boolean fullrules)(Code) | | Returns current rules. The argument defines whether the full rules
(UCA + tailored) are returned or just the tailoring.
Parameters: fullrules - true if the rules that define the full collation order are required, false to return only the tailored rules. Returns: the current rules that define this Collator. See Also: RuleBasedCollator.getRules() |
getTag | static int getTag(int ce)(Code) | | Retrieves the tag of a special ce.
Parameters: ce - ce to test. Returns: tag of ce |
getTailoredSet | public UnicodeSet getTailoredSet()(Code) | | Get a UnicodeSet that contains all the characters and sequences
tailored in this collator.
Returns: a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA. exception: ParseException - thrown when argument rules have an invalid syntax. IOException |
getUCAVersion | public VersionInfo getUCAVersion()(Code) | | Get the UCA version of this collator object.
the version object associated with this collator |
getVariableTop | public int getVariableTop()(Code) | | Gets the variable top value of a Collator.
Lower 16 bits are undefined and should be ignored.
the variable top value of a Collator. See Also: RuleBasedCollator.setVariableTop |
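Because only the upper 16 bits of the returned int are meaningful, two variable top values should be compared after discarding the lower half. A minimal sketch of that bit handling follows; the class VariableTopSketch, its methods, and the sample values are hypothetical, and the code does not call the collator itself.

```java
public class VariableTopSketch {

    // Extracts the meaningful upper 16 bits of a variable top value.
    // The unsigned shift keeps the result non-negative.
    static int primaryOf(int variableTop) {
        return variableTop >>> 16;
    }

    // Two values denote the same variable top if their upper halves match;
    // the undefined lower 16 bits are ignored.
    static boolean sameVariableTop(int first, int second) {
        return primaryOf(first) == primaryOf(second);
    }

    public static void main(String[] args) {
        int a = 0x12340000; // hypothetical value, as if from getVariableTop()
        int b = 0x1234ABCD; // same upper half, different undefined lower half
        System.out.println(Integer.toHexString(primaryOf(b)));
        System.out.println(sameVariableTop(a, b));
    }
}
```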
getVersion | public VersionInfo getVersion()(Code) | | Get the version of this collator object.
the version object associated with this collator |
hashCode | public int hashCode()(Code) | | Generates a hash code for this RuleBasedCollator.
Returns: the hash code for this Collator |
isAlternateHandlingShifted | public boolean isAlternateHandlingShifted()(Code) | | Checks if the alternate handling behaviour is the UCA defined SHIFTED or
NON_IGNORABLE.
If return value is true, then the alternate handling attribute for the
Collator is SHIFTED. Otherwise if return value is false, then the
alternate handling attribute for the Collator is NON_IGNORABLE.
See setAlternateHandlingShifted(boolean) for more details.
Returns: true if alternate handling is SHIFTED, false if it is NON_IGNORABLE See Also: RuleBasedCollator.setAlternateHandlingShifted(boolean) See Also: RuleBasedCollator.setAlternateHandlingDefault |
isContinuation | final static boolean isContinuation(int ce)(Code) | | Checks if the argument ce is a continuation.
Parameters: ce - collation element to test. Returns: true if ce is a continuation |
isContractionEnd | final boolean isContractionEnd(char ch)(Code) | | Approximate determination if a char character is at a contraction end.
Guaranteed to be true if a character is at the end of a contraction,
otherwise it is not deterministic.
Parameters: ch - character to be determined |
isSpecial | static boolean isSpecial(int ce)(Code) | | Checks if ce is special.
Parameters: ce - collation element to check. Returns: true if ce is special |
isUnsafe | final boolean isUnsafe(char ch)(Code) | | Test whether a char character is potentially "unsafe" for use as a
collation starting point. "Unsafe" characters are combining marks or
those belonging to some contraction sequence from the offset 1 onwards.
E.g. if "ABC" is the only contraction, then 'B' and
'C' are considered unsafe. If we have another contraction "ZA" with
the one above, then 'A', 'B', 'C' are "unsafe" but 'Z' is not.
Parameters: ch - character to test. Returns: true if ch is unsafe, false otherwise |
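The "ABC" and "ZA" example can be reproduced with a small sketch that derives the unsafe set from a list of contractions. This is an illustration of the definition given above, not the heuristic table the collator actually builds; the class UnsafeCharSketch and its methods are hypothetical, and the combining mark test via Character.getType is an assumption about what counts as a combining mark.

```java
import java.util.Set;
import java.util.TreeSet;

public class UnsafeCharSketch {

    // Collects every char that appears at offset 1 or later in a contraction.
    static Set<Character> unsafeFromContractions(String... contractions) {
        Set<Character> unsafe = new TreeSet<>();
        for (String contraction : contractions) {
            for (int i = 1; i < contraction.length(); i++) {
                unsafe.add(contraction.charAt(i));
            }
        }
        return unsafe;
    }

    // A char is unsafe if it is a combining mark or in the contraction set.
    static boolean isUnsafe(char ch, Set<Character> unsafe) {
        int type = Character.getType(ch);
        boolean combining = type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
        return combining || unsafe.contains(ch);
    }

    public static void main(String[] args) {
        System.out.println(unsafeFromContractions("ABC"));
        System.out.println(unsafeFromContractions("ABC", "ZA"));
    }
}
```

Adding "ZA" makes 'A' unsafe because it now occurs at offset 1 of a contraction, while 'Z' only ever occurs at offset 0 and stays safe.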
setAlternateHandlingShifted | public void setAlternateHandlingShifted(boolean shifted)(Code) | | Sets the alternate handling for QUATERNARY strength to be either
shifted or non-ignorable.
See the UCA definition on
Alternate Weighting.
This attribute will only be effective when QUATERNARY strength is set.
The default value for this mode is false, corresponding to the
NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the
RuleBasedCollator treats all the codepoints with non-ignorable
primary weights in the same way.
If the mode is set to true, the behaviour corresponds to SHIFTED as defined
in UCA; this causes codepoints with PRIMARY orders that are equal to or
below the variable top value to be ignored in PRIMARY order and
moved to the QUATERNARY order.
Parameters: shifted - true if SHIFTED behaviour for alternate handling is desired, false for the NON_IGNORABLE behaviour. See Also: RuleBasedCollator.isAlternateHandlingShifted See Also: RuleBasedCollator.setAlternateHandlingDefault |
setCaseLevel | public void setCaseLevel(boolean flag)(Code) | |
When case level is set to true, an additional weight is formed
between the SECONDARY and TERTIARY weight, known as the case level.
The case level is used to distinguish large and small Japanese Kana
characters. Case level could also be used in other situations,
for example to distinguish certain Pinyin characters.
The default value is false, which means the case level is not generated.
The contents of the case level are affected by the case first
mode. A simple way to ignore accent differences in a string is to set
the strength to PRIMARY and enable case level.
See the section on
case level for more information.
Parameters: flag - true if case level sorting is required, false otherwise See Also: RuleBasedCollator.setCaseLevelDefault See Also: RuleBasedCollator.isCaseLevel |
setFrenchCollation | public void setFrenchCollation(boolean flag)(Code) | | Sets the mode for the direction of SECONDARY weights to be used in
French collation.
The default value is false, which treats SECONDARY weights in the order
they appear.
If set to true, the SECONDARY weights will be sorted backwards.
See the section on
French collation for more information.
Parameters: flag - true to set the French collation on, false to set it off See Also: RuleBasedCollator.isFrenchCollation See Also: RuleBasedCollator.setFrenchCollationDefault |
setHiraganaQuaternary | public void setHiraganaQuaternary(boolean flag)(Code) | | Sets the Hiragana Quaternary mode to be on or off.
When the Hiragana Quaternary mode is turned on, the collator
positions Hiragana characters before all non-ignorable characters in
QUATERNARY strength. This is to produce a correct JIS collation order,
distinguishing between Katakana and Hiragana characters.
Parameters: flag - true if Hiragana Quaternary mode is to be on, false otherwise See Also: RuleBasedCollator.setHiraganaQuaternaryDefault See Also: RuleBasedCollator.isHiraganaQuaternary |
setVariableTop | public int setVariableTop(String varTop)(Code) | |
Variable top is a two byte primary value which causes all the codepoints
with primary values that are less than or equal to the variable top to be
shifted when alternate handling is set to SHIFTED.
Sets the variable top to the collation element value of the supplied string.
Parameters: varTop - one or more (if contraction) characters to which the variable top should be set. Returns: an int value containing the value of the variable top in the upper 16 bits. Lower 16 bits are undefined. exception: IllegalArgumentException - is thrown if the varTop argument is not a valid variable top element. A variable top element is invalid when
- it is a contraction that does not exist in the Collation order
- the PRIMARY strength collation element for the variable top has more than two bytes
- the varTop argument is null or zero in length.
See Also: RuleBasedCollator.getVariableTop See Also: RuleBasedCollator.setAlternateHandlingShifted |
setWithUCAData | final void setWithUCAData()(Code) | | Sets this collator to use all the options and tables in UCA.
|
setWithUCATables | final void setWithUCATables()(Code) | | Sets this collator to use the tables in UCA. Note that options are not taken
care of here.
|
Methods inherited from com.ibm.icu.text.Collator | public Object clone() throws CloneNotSupportedException(Code)(Java Doc) public int compare(Object source, Object target)(Code)(Java Doc) abstract public int compare(String source, String target)(Code)(Java Doc) public boolean equals(String source, String target)(Code)(Java Doc) public static Locale[] getAvailableLocales()(Code)(Java Doc) final public static ULocale[] getAvailableULocales()(Code)(Java Doc) abstract public CollationKey getCollationKey(String source)(Code)(Java Doc) public int getDecomposition()(Code)(Java Doc) public static String getDisplayName(Locale objectLocale, Locale displayLocale)(Code)(Java Doc) public static String getDisplayName(ULocale objectLocale, ULocale displayLocale)(Code)(Java Doc) public static String getDisplayName(Locale objectLocale)(Code)(Java Doc) public static String getDisplayName(ULocale objectLocale)(Code)(Java Doc) final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean isAvailable)(Code)(Java Doc) final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID)(Code)(Java Doc) final public static Collator getInstance()(Code)(Java Doc) final public static Collator getInstance(ULocale locale)(Code)(Java Doc) final public static Collator getInstance(Locale locale)(Code)(Java Doc) final public static String[] getKeywordValues(String keyword)(Code)(Java Doc) final public static String[] getKeywords()(Code)(Java Doc) final public ULocale getLocale(ULocale.Type type)(Code)(Java Doc) abstract public RawCollationKey getRawCollationKey(String source, RawCollationKey key)(Code)(Java Doc) public int getStrength()(Code)(Java Doc) public UnicodeSet getTailoredSet()(Code)(Java Doc) abstract public VersionInfo getUCAVersion()(Code)(Java Doc) abstract public int getVariableTop()(Code)(Java Doc) abstract public VersionInfo getVersion()(Code)(Java Doc) final public static Object registerFactory(CollatorFactory factory)(Code)(Java Doc) 
final public static Object registerInstance(Collator collator, ULocale locale)(Code)(Java Doc) public void setDecomposition(int decomposition)(Code)(Java Doc) final void setLocale(ULocale valid, ULocale actual)(Code)(Java Doc) public void setStrength(int newStrength)(Code)(Java Doc) abstract public int setVariableTop(String varTop)(Code)(Java Doc) abstract public void setVariableTop(int varTop)(Code)(Java Doc) final public static boolean unregister(Object registryKey)(Code)(Java Doc)