| java.lang.Object com.ibm.icu.text.Collator
All known Subclasses: com.ibm.icu.text.CollatorServiceShim, com.ibm.icu.text.RuleBasedCollator
Collator | abstract public class Collator implements Comparator,Cloneable(Code) | | Collator performs locale-sensitive string comparison. A concrete
subclass, RuleBasedCollator, allows customization of the collation
ordering by the use of rule sets.
Following the Unicode Consortium's specifications for the Unicode Collation
Algorithm (UCA), there are 5 different levels of strength used
in comparisons:
- PRIMARY strength: Typically, this is used to denote differences between
base characters (for example, "a" < "b").
It is the strongest difference. For example, dictionaries are divided
into different sections by base character.
- SECONDARY strength: Accents in the characters are considered secondary
differences (for example, "as" < "às" < "at"). Other
differences
between letters can also be considered secondary differences, depending
on the language. A secondary difference is ignored when there is a
primary difference anywhere in the strings.
- TERTIARY strength: Upper and lower case differences in characters are
distinguished at tertiary strength (for example, "ao" < "Ao" <
"aò"). In addition, a variant of a letter differs from the base
form on the tertiary strength (such as "A" and "Ⓐ"). Another
example is the
difference between large and small Kana. A tertiary difference is ignored
when there is a primary or secondary difference anywhere in the strings.
- QUATERNARY strength: When punctuation is ignored
(see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY
strength, an additional strength level can
be used to distinguish words with and without punctuation (for example,
"ab" < "a-b" < "aB").
This difference is ignored when there is a PRIMARY, SECONDARY or TERTIARY
difference. The QUATERNARY strength should only be used if ignoring
punctuation is required.
- IDENTICAL strength:
When all other strengths are equal, the IDENTICAL strength is used as a
tiebreaker. The Unicode code point values of the NFD form of each string
are compared, in case no difference was found at the other levels.
For example, Hebrew cantillation marks are only distinguished at this
strength. This strength should be used sparingly, as differences that
show up only in code point values are extremely rare.
Using this strength substantially decreases the performance for both
comparison and collation key generation APIs. This strength also
increases the size of the collation key.
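The level rules above can be checked directly. The following is a minimal sketch, assuming an ICU4J jar on the classpath; the class StrengthLevels and its helper signAt are hypothetical names used only for illustration. It compares the same pairs with the root (UCA) collator at increasing strengths, showing that an accent is ignored at PRIMARY, that a secondary difference loses to a primary one, and that case only matters from TERTIARY on.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: shows which differences each strength level notices.
class StrengthLevels {
    // Returns -1, 0 or 1: the sign of comparing a with b using the root (UCA)
    // collator at the given strength.
    static int signAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(ULocale.ROOT);
        coll.setStrength(strength);
        return Integer.signum(coll.compare(a, b));
    }

    public static void main(String[] args) {
        String accented = "\u00E0s"; // "as" with a grave accent on the "a"
        // PRIMARY: only base letters count, so the accent is ignored.
        System.out.println(signAt(Collator.PRIMARY, "as", accented));   // 0
        // SECONDARY: the accent now counts ...
        System.out.println(signAt(Collator.SECONDARY, "as", accented)); // -1
        // ... but only when there is no primary difference ("s" < "t").
        System.out.println(signAt(Collator.SECONDARY, accented, "at")); // -1
        // Case is a tertiary difference: ignored at SECONDARY, seen at TERTIARY.
        System.out.println(signAt(Collator.SECONDARY, "ao", "Ao"));     // 0
        System.out.println(signAt(Collator.TERTIARY, "ao", "Ao"));      // -1
    }
}
```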
Unlike the JDK, ICU4J's Collator deals only with 2 decomposition modes,
the canonical decomposition mode and one that does not use any decomposition.
The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION,
is not supported here. If the canonical
decomposition mode is set, the Collator handles un-normalized text properly,
producing the same results as if the text were normalized in NFD. If
canonical decomposition is turned off, it is the user's responsibility to
ensure that all text is already in the appropriate form before performing
a comparison or before getting a CollationKey.
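The guarantee for canonical decomposition (un-normalized text collates as if it had first been normalized to NFD) can be spot-checked. This is a minimal sketch, assuming a recent ICU4J release that provides Normalizer2.getNFDInstance(); the class CanonicalDecompositionCheck and its method sameAsNfd are hypothetical. The pair used is U+00E0 U+0325 versus a U+0325 U+0300, which share the same NFD form.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: checks that CANONICAL_DECOMPOSITION behaves like NFD input.
class CanonicalDecompositionCheck {
    // True if comparing the raw strings gives the same sign as comparing
    // their NFD forms.
    static boolean sameAsNfd(Collator coll, String a, String b) {
        Normalizer2 nfd = Normalizer2.getNFDInstance();
        int raw = Integer.signum(coll.compare(a, b));
        int normalized = Integer.signum(coll.compare(nfd.normalize(a), nfd.normalize(b)));
        return raw == normalized;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ROOT);
        coll.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        // U+00E0 followed by U+0325 is not in NFD; its NFD form is a + U+0325 + U+0300.
        String precomposed = "\u00E0\u0325";
        String decomposed = "a\u0325\u0300";
        System.out.println(coll.compare(precomposed, decomposed)); // 0
        System.out.println(sameAsNfd(coll, precomposed, "ab"));    // true
    }
}
```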
For more information about the collation service, see the
user guide.
Examples of use
import com.ibm.icu.text.Collator;
import java.util.Locale;

// Get the Collator for US English and set its strength to PRIMARY
Collator usCollator = Collator.getInstance(Locale.US);
usCollator.setStrength(Collator.PRIMARY);
if (usCollator.compare("abc", "ABC") == 0) {
    System.out.println("Strings are equivalent");
}
The following example shows how to compare two strings using the
Collator for the default locale.
import com.ibm.icu.text.Collator;

// Compare two strings in the default locale. "\u00E0\u0325" and
// "a\u0325\u0300" are canonically equivalent (same NFD form).
Collator myCollator = Collator.getInstance();
myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
if (myCollator.compare("\u00E0\u0325", "a\u0325\u0300") != 0) {
    System.out.println("\u00E0\u0325 is not equal to a\u0325\u0300 without decomposition");
    myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
    if (myCollator.compare("\u00E0\u0325", "a\u0325\u0300") != 0) {
        System.out.println("Error: \u00E0\u0325 should be equal to a\u0325\u0300 with decomposition");
    }
    else {
        System.out.println("\u00E0\u0325 is equal to a\u0325\u0300 with decomposition");
    }
}
else {
    System.out.println("Error: \u00E0\u0325 should not be equal to a\u0325\u0300 without decomposition");
}
See Also: RuleBasedCollator See Also: CollationKey author: Syn Wee Quek |
Inner Class :abstract public static class CollatorFactory | |
Inner Class :abstract static class ServiceShim | |
Field Summary | |
final public static int | CANONICAL_DECOMPOSITION Decomposition mode value. |
final public static int | FULL_DECOMPOSITION This is for backwards compatibility with Java APIs only. |
final public static int | IDENTICAL Smallest Collator strength value. |
final public static int | NO_DECOMPOSITION Decomposition mode value. |
final public static int | PRIMARY Strongest collator strength value. |
final public static int | QUATERNARY Fourth level collator strength value. |
final public static int | SECONDARY Second level collator strength value. |
final public static int | TERTIARY Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. |
Constructor Summary | |
protected | Collator() |
Method Summary | |
public Object | clone() Clone the collator. |
public int | compare(Object source, Object target) Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. |
abstract public int | compare(String source, String target) Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. |
public boolean | equals(String source, String target) Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode. Parameters: source - the source string to be compared. Parameters: target - the target string to be compared. |
public static Locale[] | getAvailableLocales() Get the set of locales, as Locale objects, for which collators are installed. |
final public static ULocale[] | getAvailableULocales() Get the set of locales, as ULocale objects, for which collators are installed. |
abstract public CollationKey | getCollationKey(String source) Transforms the String into a CollationKey suitable for efficient repeated comparison. |
public int | getDecomposition() Get the decomposition mode of this Collator. |
public static String | getDisplayName(Locale objectLocale, Locale displayLocale) Get the name of the collator for the objectLocale, localized for the displayLocale. |
public static String | getDisplayName(ULocale objectLocale, ULocale displayLocale) Get the name of the collator for the objectLocale, localized for the displayLocale. |
public static String | getDisplayName(Locale objectLocale) Get the name of the collator for the objectLocale, localized for the current locale. |
public static String | getDisplayName(ULocale objectLocale) Get the name of the collator for the objectLocale, localized for the current locale. |
final public static ULocale | getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable) Return the functionally equivalent locale for the given requested locale, with respect to the given keyword, for the collation service. |
final public static ULocale | getFunctionalEquivalent(String keyword, ULocale locID) Return the functionally equivalent locale for the given requested locale, with respect to the given keyword, for the collation service. |
final public static Collator | getInstance() Gets the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault(). Returns the Collator for the default locale (for example, en_US) if it is created successfully. |
final public static Collator | getInstance(ULocale locale) Gets the Collator for the desired locale. Parameters: locale - the desired locale. |
final public static Collator | getInstance(Locale locale) Gets the Collator for the desired locale. Parameters: locale - the desired locale. |
final public static String[] | getKeywordValues(String keyword) Given a keyword, return an array of all values for that keyword that are currently in use. |
final public static String[] | getKeywords() Return an array of all possible keywords that are relevant to collation. |
final public ULocale | getLocale(ULocale.Type type) Return the locale that was used to create this object, or null. This may differ from the locale requested at the time of this object's creation. |
abstract public RawCollationKey | getRawCollationKey(String source, RawCollationKey key) Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user-provided argument key. |
public int | getStrength() Returns this Collator's strength property. |
public UnicodeSet | getTailoredSet() Get a UnicodeSet that contains all the characters and sequences tailored in this collator. |
abstract public VersionInfo | getUCAVersion() Get the UCA version of this collator object. |
abstract public int | getVariableTop() Gets the variable top value of a Collator. |
abstract public VersionInfo | getVersion() Get the version of this collator object. |
final public static Object | registerFactory(CollatorFactory factory) Register a collator factory. |
final public static Object | registerInstance(Collator collator, ULocale locale) Register a collator as the default collator for the provided locale. |
public void | setDecomposition(int decomposition) Set the decomposition mode of this Collator. |
final void | setLocale(ULocale valid, ULocale actual) Set information about the locales that were used to create this object. |
public void | setStrength(int newStrength) Sets this Collator's strength property. |
abstract public int | setVariableTop(String varTop) Variable top is a two-byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED. Sets the variable top to a collation element value of a string supplied. Parameters: varTop - one or more (if contraction) characters to which the variable top should be set. Returns an int value containing the value of the variable top in upper 16 bits. |
abstract public void | setVariableTop(int varTop) Sets the variable top to a collation element value supplied. Variable top is set to the upper 16 bits. |
final public static boolean | unregister(Object registryKey) Unregister a collator previously registered using registerInstance. Parameters: registryKey - the object previously returned by registerInstance. |
FULL_DECOMPOSITION | final public static int FULL_DECOMPOSITION(Code) | | This is for backwards compatibility with Java APIs only. It
should not be used; IDENTICAL should be used instead. ICU's
collation does not support Java's FULL_DECOMPOSITION mode.
|
IDENTICAL | final public static int IDENTICAL(Code) | |
Smallest Collator strength value. When all other strengths are equal,
the IDENTICAL strength is used as a tiebreaker. The Unicode code point
values of the NFD form of each string are compared, in case no difference
was found at the other levels.
See class documentation for more explanation.
Note that this value is different from the JDK's.
|
PRIMARY | final public static int PRIMARY(Code) | | Strongest collator strength value. Typically used to denote differences
between base characters. See class documentation for more explanation.
See Also: Collator.setStrength See Also: Collator.getStrength |
SECONDARY | final public static int SECONDARY(Code) | | Second level collator strength value.
Accents in the characters are considered secondary differences.
Other differences between letters can also be considered secondary
differences, depending on the language.
See class documentation for more explanation.
See Also: Collator.setStrength See Also: Collator.getStrength |
TERTIARY | final public static int TERTIARY(Code) | | Third level collator strength value.
Upper and lower case differences in characters are distinguished at this
strength level. In addition, a variant of a letter differs from the base
form on the tertiary level.
See class documentation for more explanation.
See Also: Collator.setStrength See Also: Collator.getStrength |
Collator | protected Collator()(Code) | | Empty default constructor to make javadocs happy
|
compare | public int compare(Object source, Object target)(Code) | |
Compares the source text String to the target text String according to
this Collator's rules, strength and decomposition mode.
Returns an integer less than,
equal to or greater than zero depending on whether the source String is
less than, equal to or greater than the target String. See the Collator
class description for an example of use.
Parameters: source - the source String. Parameters: target - the target String. Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target. See Also: CollationKey See Also: Collator.getCollationKey exception: NullPointerException - thrown if either argument is null. IllegalArgumentException - thrown if either source or target is not of the class String. |
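Because Collator implements java.util.Comparator, an instance can be handed to any API that expects a Comparator, such as List.sort. A minimal sketch, assuming ICU4J on the classpath; SortWithCollator is a hypothetical name. For the English collator the expected order is Ab, as, às, at: the base letters b < s < t decide first, and the accent only separates "as" from "às".

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: sorts strings with a Collator used as a Comparator.
class SortWithCollator {
    // Returns a sorted copy; the input list is left untouched.
    static List<String> sorted(List<String> words, Collator coll) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(coll); // Collator is a java.util.Comparator
        return copy;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        List<String> words = Arrays.asList("at", "\u00E0s", "as", "Ab");
        System.out.println(sorted(words, coll)); // [Ab, as, às, at]
    }
}
```

Compared with String.compareTo, which orders by UTF-16 code units, the collator keeps "às" next to "as" instead of pushing it after every ASCII word.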
compare | abstract public int compare(String source, String target)(Code) | |
Compares the source text String to the target text String according to
this Collator's rules, strength and decomposition mode.
Returns an integer less than,
equal to or greater than zero depending on whether the source String is
less than, equal to or greater than the target String. See the Collator
class description for an example of use.
Parameters: source - the source String. Parameters: target - the target String. Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target. See Also: CollationKey See Also: Collator.getCollationKey exception: NullPointerException - thrown if either argument is null. |
equals | public boolean equals(String source, String target)(Code) | | Convenience method for comparing the equality of two text Strings using
this Collator's rules, strength and decomposition mode.
Parameters: source - the source string to be compared. Parameters: target - the target string to be compared. Returns true if the strings are equal according to the collation rules, otherwise false. See Also: Collator.compare exception: NullPointerException - thrown if either argument is null. |
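Combined with a low strength, equals(String, String) gives a simple case- and accent-insensitive match. A minimal sketch, assuming ICU4J on the classpath; LooseEquality and looselyEqual are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: equality that ignores case and accents.
class LooseEquality {
    // At PRIMARY strength only base-letter differences count, so
    // equals(String, String) ignores accent and case differences.
    static boolean looselyEqual(String a, String b) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        coll.setStrength(Collator.PRIMARY);
        return coll.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(looselyEqual("Resume", "r\u00E9sum\u00E9")); // true
        System.out.println(looselyEqual("resume", "resumes"));          // false
    }
}
```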
getAvailableLocales | public static Locale[] getAvailableLocales()(Code) | | Get the set of locales, as Locale objects, for which collators
are installed. Note that Locale objects do not support RFC 3066.
Returns the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J. |
getAvailableULocales | final public static ULocale[] getAvailableULocales()(Code) | | Get the set of locales, as ULocale objects, for which collators
are installed. ULocale objects support RFC 3066.
Returns the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J. |
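The available-locale list can be used to check whether collation data exists for an exact locale before relying on it. A minimal sketch, assuming ICU4J on the classpath; CollatorLocales and hasCollatorFor are hypothetical names. Note that getInstance still falls back to a more general collator when this check returns false.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical helper: checks whether collation data is installed or
// registered for exactly the given locale (no fallback is applied).
class CollatorLocales {
    static boolean hasCollatorFor(ULocale locale) {
        for (ULocale available : Collator.getAvailableULocales()) {
            if (available.equals(locale)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ULocale[] locales = Collator.getAvailableULocales();
        System.out.println(locales.length + " locales have collators");
        System.out.println("German: " + hasCollatorFor(ULocale.GERMAN));
    }
}
```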
getCollationKey | abstract public CollationKey getCollationKey(String source)(Code) | |
Transforms the String into a CollationKey suitable for efficient
repeated comparison. The resulting key depends on the collator's
rules, strength and decomposition mode.
See the CollationKey class documentation for more information.
Parameters: source - the string to be transformed into a CollationKey. Returns the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned. See Also: CollationKey See Also: Collator.compare(String,String) See Also: Collator.getRawCollationKey |
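The usual reason to create CollationKeys is sorting: each string is processed by the collator once, and the sort then compares keys. A minimal sketch, assuming ICU4J on the classpath; SortByKeys and sortWithKeys are hypothetical names. It should produce the same order as sorting with compare(String, String).

```java
import com.ibm.icu.text.CollationKey;
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: sorts by precomputed collation keys.
class SortByKeys {
    // Builds one CollationKey per string up front, so each of the
    // O(n log n) comparisons made while sorting is a cheap key comparison
    // instead of a full Collator.compare call.
    static List<String> sortWithKeys(List<String> words, Collator coll) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(coll.getCollationKey(word));
        }
        Collections.sort(keys); // CollationKey is Comparable
        List<String> result = new ArrayList<>();
        for (CollationKey key : keys) {
            result.add(key.getSourceString());
        }
        return result;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        List<String> words = Arrays.asList("at", "\u00E0s", "as", "Ab");
        System.out.println(sortWithKeys(words, coll)); // [Ab, as, às, at]
    }
}
```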
getDisplayName | public static String getDisplayName(Locale objectLocale, Locale displayLocale)(Code) | | Get the name of the collator for the objectLocale, localized for the displayLocale.
Parameters: objectLocale - the locale of the collator. Parameters: displayLocale - the locale for the collator's display name. Returns the display name. |
getDisplayName | public static String getDisplayName(ULocale objectLocale, ULocale displayLocale)(Code) | | Get the name of the collator for the objectLocale, localized for the displayLocale.
Parameters: objectLocale - the locale of the collator. Parameters: displayLocale - the locale for the collator's display name. Returns the display name. |
getDisplayName | public static String getDisplayName(Locale objectLocale)(Code) | | Get the name of the collator for the objectLocale, localized for the current locale.
Parameters: objectLocale - the locale of the collator. Returns the display name. |
getDisplayName | public static String getDisplayName(ULocale objectLocale)(Code) | | Get the name of the collator for the objectLocale, localized for the current locale.
Parameters: objectLocale - the locale of the collator. Returns the display name. |
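The two-argument overloads let an application label a collator in the user's own language, for example in a sort-order picker. A minimal sketch, assuming ICU4J on the classpath; CollatorNames and nameFor are hypothetical names. The exact strings returned depend on the installed locale data, so no expected output is shown.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: names a collator for a language picker.
class CollatorNames {
    // Name of the collator for collatorLocale, localized for userLocale.
    static String nameFor(ULocale collatorLocale, ULocale userLocale) {
        return Collator.getDisplayName(collatorLocale, userLocale);
    }

    public static void main(String[] args) {
        ULocale german = ULocale.GERMAN;
        // The German collator's name, shown to an English-speaking user ...
        System.out.println(nameFor(german, ULocale.ENGLISH));
        // ... and to a German-speaking user.
        System.out.println(nameFor(german, german));
    }
}
```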
getFunctionalEquivalent | final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)(Code) | | Return the functionally equivalent locale for the given
requested locale, with respect to the given keyword, for the
collation service. If two locales return the same result, then
collators instantiated for these locales will behave
equivalently. The converse is not always true; two collators
may in fact be equivalent, but return different results, due to
internal details. The return result has no other meaning than
that stated above, and implies nothing as to the relationship
between the two locales. This is intended for use by
applications that wish to cache collators, or otherwise reuse
collators when possible. The functional equivalent may change
over time. For more information, please see the
Locales and Services section of the ICU User Guide.
Parameters: keyword - a particular keyword as enumerated by getKeywords. Parameters: locID - the requested locale. Parameters: isAvailable - if non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. The locale is defined as 'available' if it physically exists within the collation locale data. If non-null, isAvailable must have length >= 1. Returns the locale. |
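The caching use case mentioned above can be sketched as a map keyed by the functional equivalent, so that requested locales which collate identically share one instance. This is a minimal sketch, not ICU's own mechanism: CollatorCache is a hypothetical class, and freezing the shared collator assumes ICU4J 4.8 or later, where Collator implements Freezable.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: one frozen Collator per functionally equivalent locale,
// so requests for locales that collate identically share an instance.
class CollatorCache {
    private final Map<ULocale, Collator> cache = new HashMap<>();

    synchronized Collator get(ULocale requested) {
        boolean[] isAvailable = new boolean[1]; // receives the "available" flag
        ULocale key = Collator.getFunctionalEquivalent("collation", requested, isAvailable);
        // freeze() makes the shared instance immutable, so callers cannot
        // change its strength and it can be used from several threads.
        return cache.computeIfAbsent(key, k -> Collator.getInstance(requested).freeze());
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        CollatorCache cache = new CollatorCache();
        Collator first = cache.get(ULocale.ENGLISH);
        Collator second = cache.get(ULocale.ENGLISH);
        System.out.println(first == second); // true: same cached instance
    }
}
```

Since the functional equivalent may change between ICU versions, such a cache is best kept in memory rather than persisted.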
getFunctionalEquivalent | final public static ULocale getFunctionalEquivalent(String keyword, ULocale locID)(Code) | | Return the functionally equivalent locale for the given
requested locale, with respect to given keyword, for the
collation service.
Parameters: keyword - a particular keyword as enumerated by getKeywords. Parameters: locID - the requested locale. Returns the locale. See Also: Collator.getFunctionalEquivalent(String,ULocale,boolean[]) |
getInstance | final public static Collator getInstance()(Code) | | Gets the Collator for the current default locale.
The default locale is determined by java.util.Locale.getDefault().
Returns the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise, if there is no Collator associated with the current locale, the default UCA collator will be returned. See Also: java.util.Locale.getDefault See Also: Collator.getInstance(Locale) |
getKeywordValues | final public static String[] getKeywordValues(String keyword)(Code) | | Given a keyword, return an array of all values for
that keyword that are currently in use.
Parameters: keyword - one of the keywords returned by getKeywords. See Also: Collator.getKeywords |
getKeywords | final public static String[] getKeywords()(Code) | | Return an array of all possible keywords that are relevant to
collation. At this point, the only recognized keyword for this
service is "collation".
Returns an array of valid collation keywords. See Also: Collator.getKeywordValues |
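Keyword values are selected through the locale ID, using the "@collation=" syntax of ULocale. A minimal sketch, assuming ICU4J on the classpath; CollationKeywords is a hypothetical name, and it assumes "phonebook" is one of the values available for German. The printed lists depend on the installed data, so no expected output is given.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: lists collation keywords and their values.
class CollationKeywords {
    public static void main(String[] args) {
        for (String keyword : Collator.getKeywords()) {
            String[] values = Collator.getKeywordValues(keyword);
            System.out.println(keyword + ": " + String.join(", ", values));
        }
        // Request a keyword value through the locale ID ("phonebook" assumed).
        Collator phonebook = Collator.getInstance(new ULocale("de@collation=phonebook"));
        phonebook.setStrength(Collator.PRIMARY);
        System.out.println(phonebook.compare("M\u00FCller", "Mueller"));
    }
}
```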
getRawCollationKey | abstract public RawCollationKey getRawCollationKey(String source, RawCollationKey key)(Code) | | Gets the simpler form of a CollationKey for the String source following
the rules of this Collator and stores the result into the user provided
argument key.
If key has an internal byte array that is too small for the
result, the internal byte array will be grown to the exact required
size.
Parameters: source - the text String to be transformed into a RawCollationKey. Returns the key: if key is null, a new instance of RawCollationKey will be created and returned; otherwise the user-provided key will be returned. See Also: Collator.compare(String,String) See Also: Collator.getCollationKey See Also: RawCollationKey |
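Passing in existing RawCollationKey objects lets a loop reuse the same buffers instead of allocating a new key per string. A minimal sketch, assuming ICU4J on the classpath; RawKeyCompare and compareViaRawKeys are hypothetical names. The sign of the raw-key comparison should match the sign from compare(String, String).

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.util.ULocale;

// Hypothetical helper: compares strings through raw sort keys while reusing
// two caller-owned RawCollationKey buffers across calls.
class RawKeyCompare {
    static int compareViaRawKeys(Collator coll, String a, String b,
                                 RawCollationKey keyA, RawCollationKey keyB) {
        coll.getRawCollationKey(a, keyA); // fills keyA, growing it if needed
        coll.getRawCollationKey(b, keyB);
        return Integer.signum(keyA.compareTo(keyB));
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        RawCollationKey keyA = new RawCollationKey();
        RawCollationKey keyB = new RawCollationKey();
        String[][] pairs = { {"as", "\u00E0s"}, {"abc", "ABC"}, {"b", "a"} };
        for (String[] p : pairs) {
            // Raw-key sign, then compare() sign: the two should agree.
            System.out.println(compareViaRawKeys(coll, p[0], p[1], keyA, keyB)
                    + " " + Integer.signum(coll.compare(p[0], p[1])));
        }
    }
}
```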
getTailoredSet | public UnicodeSet getTailoredSet()(Code) | | Get a UnicodeSet that contains all the characters and sequences
tailored in this collator.
Returns a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA. |
getUCAVersion | abstract public VersionInfo getUCAVersion()(Code) | | Get the UCA version of this collator object.
the version object associated with this collator |
getVariableTop | abstract public int getVariableTop()(Code) | | Gets the variable top value of a Collator.
Lower 16 bits are undefined and should be ignored.
the variable top value of a Collator. See Also: Collator.setVariableTop |
getVersion | abstract public VersionInfo getVersion()(Code) | | Get the version of this collator object.
the version object associated with this collator |
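The version accessors matter when sort keys are stored: keys produced by different collator versions are not guaranteed to be comparable, so an application can persist getVersion() next to its keys and rebuild them when it changes. A minimal sketch, assuming ICU4J on the classpath; CollatorVersions and storedKeysStillValid are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.util.VersionInfo;

// Hypothetical helper: decides whether stored sort keys can still be trusted
// by comparing the version saved with them to the current collator version.
class CollatorVersions {
    static boolean storedKeysStillValid(VersionInfo storedVersion, Collator coll) {
        return storedVersion.compareTo(coll.getVersion()) == 0;
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(ULocale.ENGLISH);
        System.out.println("collator version: " + coll.getVersion());
        System.out.println("UCA version: " + coll.getUCAVersion());
        System.out.println(storedKeysStillValid(coll.getVersion(), coll)); // true
    }
}
```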
registerFactory | final public static Object registerFactory(CollatorFactory factory)(Code) | | Register a collator factory.
Parameters: factory - the factory to register. Returns an object that can be used to unregister the registered factory. |
registerInstance | final public static Object registerInstance(Collator collator, ULocale locale)(Code) | | Register a collator as the default collator for the provided locale. The
collator should not be modified after it is registered.
Parameters: collator - the collator to register. Parameters: locale - the locale for which this is the default collator. Returns an object that can be used to unregister the registered collator. |
setDecomposition | public void setDecomposition(int decomposition)(Code) | | Set the decomposition mode of this Collator. Setting this
decomposition property with CANONICAL_DECOMPOSITION allows the
Collator to handle un-normalized text properly, producing the
same results as if the text were normalized. If
NO_DECOMPOSITION is set, it is the user's responsibility to
ensure that all text is already in the appropriate form before
a comparison or before getting a CollationKey. Adjusting
decomposition mode allows the user to select between faster and
more complete collation behavior.
Since a great many of the world's languages do not require
text normalization, most locales set NO_DECOMPOSITION as the
default decomposition mode.
The default decomposition mode for the Collator is
NO_DECOMPOSITION, unless specified otherwise by the locale used
to create the Collator.
See getDecomposition for a description of decomposition
mode.
Parameters: decomposition - the new decomposition mode See Also: Collator.getDecomposition See Also: Collator.NO_DECOMPOSITION See Also: Collator.CANONICAL_DECOMPOSITION exception: IllegalArgumentException - if the given value is not a valid decomposition mode. |
setLocale | final void setLocale(ULocale valid, ULocale actual)(Code) | | Set information about the locales that were used to create this
object. If the object was not constructed from locale data,
both arguments should be set to null. Otherwise, neither
should be null. The actual locale must be at the same level or
less specific than the valid locale. This method is intended
for use by factories or other entities that create objects of
this class.
Parameters: valid - the most specific locale containing any resource data, or null. Parameters: actual - the locale containing data used to construct this object, or null. See Also: com.ibm.icu.util.ULocale See Also: com.ibm.icu.util.ULocale.VALID_LOCALE See Also: com.ibm.icu.util.ULocale.ACTUAL_LOCALE |
setStrength | public void setStrength(int newStrength)(Code) | | Sets this Collator's strength property. The strength property
determines the minimum level of difference considered significant
during comparison.
The default strength for the Collator is TERTIARY, unless specified
otherwise by the locale used to create the Collator.
See the Collator class description for an example of use.
Parameters: newStrength - the new strength value. See Also: Collator.getStrength See Also: Collator.PRIMARY See Also: Collator.SECONDARY See Also: Collator.TERTIARY See Also: Collator.QUATERNARY See Also: Collator.IDENTICAL exception: IllegalArgumentException - if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL. |
setVariableTop | abstract public int setVariableTop(String varTop)(Code) | |
Variable top is a two-byte primary value which causes all the code points
with primary values that are less than or equal to the variable top to be
shifted when alternate handling is set to SHIFTED.
Sets the variable top to a collation element value of a string supplied.
Parameters: varTop - one or more (if contraction) characters to which the variable top should be set. Returns an int value containing the value of the variable top in upper 16 bits. Lower 16 bits are undefined. exception: IllegalArgumentException - thrown if the varTop argument is not a valid variable top element. A variable top element is invalid when it is a contraction that does not exist in the Collation order or when the PRIMARY strength collation element for the variable top has more than two bytes. See Also: Collator.getVariableTop See Also: RuleBasedCollator.setAlternateHandlingShifted |
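Turning on shifted alternate handling is what makes punctuation ignorable below QUATERNARY strength, as described for the QUATERNARY level in the class description. A minimal sketch, assuming ICU4J on the classpath and that getInstance returns a RuleBasedCollator (true for the standard locales); ShiftedPunctuation is a hypothetical name. The checks only state whether "ab" and "a-b" differ, not which one sorts first.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: punctuation handling with alternate = shifted.
class ShiftedPunctuation {
    // Root collator with variable characters (spaces, punctuation) shifted.
    static RuleBasedCollator shifted(int strength) {
        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        coll.setAlternateHandlingShifted(true);
        coll.setStrength(strength);
        return coll;
    }

    public static void main(String[] args) {
        // Up to TERTIARY the hyphen is ignored entirely.
        System.out.println(shifted(Collator.TERTIARY).compare("ab", "a-b"));        // 0
        // At QUATERNARY the hyphen distinguishes the two strings again ...
        System.out.println(shifted(Collator.QUATERNARY).compare("ab", "a-b") != 0); // true
        // ... but a tertiary (case) difference still takes precedence.
        System.out.println(shifted(Collator.QUATERNARY).compare("a-b", "aB") < 0);  // true
    }
}
```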
setVariableTop | abstract public void setVariableTop(int varTop)(Code) | | Sets the variable top to a collation element value supplied.
Variable top is set to the upper 16 bits.
Lower 16 bits are ignored.
Parameters: varTop - Collation element value, as returned by setVariableTop or getVariableTop See Also: Collator.getVariableTop See Also: Collator.setVariableTop |
unregister | final public static boolean unregister(Object registryKey)(Code) | | Unregister a collator previously registered using registerInstance.
Parameters: registryKey - the object previously returned by registerInstance. Returns true if the collator was successfully unregistered. |
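registerInstance and unregister are meant to be used as a pair, with the returned registry key kept for the later unregister call. A minimal sketch, assuming ICU4J on the classpath; RegisterCollator and registerLookUpAndRemove are hypothetical names, and the locale "xx_YY" is a made-up ID chosen so that no installed collator is shadowed. The registered collator is not modified after registration, as the documentation requires.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Hypothetical demo class: temporarily registers a custom default collator.
class RegisterCollator {
    // Registers a case-insensitive (SECONDARY) English collator for the given
    // locale, looks it up, then unregisters it. Returns unregister's result.
    static boolean registerLookUpAndRemove(ULocale locale) {
        Collator caseBlind = Collator.getInstance(ULocale.ENGLISH);
        caseBlind.setStrength(Collator.SECONDARY);
        Object registryKey = Collator.registerInstance(caseBlind, locale);
        boolean removed;
        try {
            // caseBlind must not be modified from here on: it is registered.
            Collator fromRegistry = Collator.getInstance(locale);
            System.out.println("abc vs ABC: " + fromRegistry.compare("abc", "ABC"));
        } finally {
            removed = Collator.unregister(registryKey);
        }
        return removed;
    }

    public static void main(String[] args) {
        System.out.println(registerLookUpAndRemove(new ULocale("xx_YY"))); // true
    }
}
```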