| java.lang.Object com.ibm.icu.lang.UCharacter
UCharacter | final public class UCharacter implements ECharacterCategory, ECharacterDirection | |
The UCharacter class provides extensions to the
java.lang.Character class. These extensions provide support for
more Unicode properties and together with the UTF16
class, provide support for supplementary characters (those with code
points above U+FFFF).
Each ICU release supports the latest version of Unicode available at that time.
Code points are represented in these APIs using ints. While it would be
more convenient in Java to have a separate primitive datatype for them,
ints suffice in the meantime.
To use this class, add the jar file icu4j.jar to the
class path, since it contains the data files that supply the information used
by this class.
E.g. in Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Otherwise, another method would be to copy the files uprops.dat and
unames.icu from the icu4j source subdirectory
$ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory
$ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode
properties, the main differences between UCharacter and Character are:
- UCharacter is not designed to be a char wrapper and does not have
APIs that involve managing that single char.
These include:
- char charValue(),
- int compareTo(java.lang.Character, java.lang.Character), etc.
- UCharacter does not include Character APIs that are deprecated, nor
does it include the Java-specific character information, such as
boolean isJavaIdentifierPart(char ch).
- Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric
values '10' - '35'. UCharacter also does this in digit and
getNumericValue, to adhere to the Java semantics of these
methods. The new methods unicodeDigit and
getUnicodeNumericValue do not treat the above code points
as having numeric values. This is a semantic change from ICU4J 1.3.1.
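The Java semantics mentioned in the last item can be illustrated with java.lang.Character alone. The following is a minimal sketch, not ICU code: the class name LetterNumericValueSketch and its helper are hypothetical, and it assumes (per the text above) that UCharacter.getNumericValue follows the same behavior for these letters.

```java
// Sketch using only java.lang.Character; the class and method names are
// hypothetical and chosen for illustration.
final class LetterNumericValueSketch {

    /** Returns the Java numeric value of a code point, or -1 if it has none. */
    static int javaNumericValue(int codePoint) {
        return Character.getNumericValue(codePoint);
    }

    public static void main(String[] args) {
        // Latin letters map to 10..35 regardless of case.
        System.out.println(javaNumericValue('A')); // 10
        System.out.println(javaNumericValue('z')); // 35
        // Decimal digits map to their value; punctuation has no numeric value.
        System.out.println(javaNumericValue('7')); // 7
        System.out.println(javaNumericValue('!')); // -1
    }
}
```

According to the description above, getUnicodeNumericValue would not report a numeric value for 'A' or 'z'; that call is not shown here because it requires the ICU4J jar.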
Further detailed differences can be determined using the program
com.ibm.icu.dev.test.lang.UCharacterCompare.
In addition to Java compatibility functions, which calculate derived properties,
this API provides low-level access to the Unicode Character Database.
Unicode assigns each code point (not just assigned characters) values for
many properties.
Most of them are simple boolean flags, or constants from a small enumerated list.
For some properties, values are strings or other relatively more complex types.
For more information see
"About the Unicode Character Database" (http://www.unicode.org/ucd/)
and the ICU User Guide chapter on Properties (http://icu.sourceforge.net/userguide/properties.html).
There are also functions that provide easy migration from C/POSIX functions
like isblank(). Their use is generally discouraged because the C/POSIX
standards do not define their semantics beyond the ASCII range, which means
that different implementations exhibit very different behavior.
Instead, Unicode properties should be used directly.
There are also only a few, broad C/POSIX character classes, and they tend
to be used for conflicting purposes. For example, the "isalpha()" class
is sometimes used to determine word boundaries, while a more sophisticated
approach would at least distinguish initial letters from continuation
characters (the latter including combining marks).
(In ICU, BreakIterator is the most sophisticated API for word boundaries.)
Another example: There is no "istitle()" class for titlecase characters.
ICU 3.4 and later provide API access for all twelve C/POSIX character classes.
ICU implements them according to the Standard Recommendations in
Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
(http://www.unicode.org/reports/tr18/#Compatibility_Properties).
API access for C/POSIX character classes is as follows:
- alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
- lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
- upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
- punct: ((1<
The C/POSIX character classes are also available in UnicodeSet patterns,
using patterns like [:graph:] or \p{graph}.
Note: There are several ICU (and Java) whitespace functions.
Comparison:
- isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
most of general categories "Z" (separators) + most whitespace ISO controls
(including no-break spaces, but excluding IS1..IS4 and ZWSP)
- isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
- isSpaceChar: just Z (including no-break spaces)
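The difference between the last two rows of the comparison can be observed with the java.lang.Character methods of the same names. This is a small sketch under that assumption; the class WhitespaceComparisonSketch and its classify helper are hypothetical, and isUWhiteSpace is left out because it needs the ICU4J jar.

```java
// Sketch using only java.lang.Character; names are hypothetical.
final class WhitespaceComparisonSketch {

    /** Formats the Java isWhitespace / isSpaceChar answers for one code point. */
    static String classify(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // SPACE, NO-BREAK SPACE, HORIZONTAL TABULATION, LINE SEPARATOR
        int[] samples = {0x0020, 0x00A0, 0x0009, 0x2028};
        for (int cp : samples) {
            System.out.println(classify(cp));
        }
    }
}
```

The no-break space is a separator (Z) but is excluded by isWhitespace, while the tab is a whitespace ISO control but not a separator, so the two methods disagree on both.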
This class is not subclassable.
author: Syn Wee Quek
See Also: com.ibm.icu.lang.UCharacterEnums |
Inner Class :public static interface EastAsianWidth | |
Inner Class :public static interface DecompositionType | |
Inner Class :public static interface JoiningType | |
Inner Class :public static interface JoiningGroup | |
Inner Class :public static interface GraphemeClusterBreak | |
Inner Class :public static interface WordBreak | |
Inner Class :public static interface SentenceBreak | |
Inner Class :public static interface LineBreak | |
Inner Class :public static interface NumericType | |
Inner Class :public static interface HangulSyllableType | |
Field Summary | |
final public static int | FOLD_CASE_DEFAULT Option value for case folding: use default mappings defined in CaseFolding.txt. |
final public static int | FOLD_CASE_EXCLUDE_SPECIAL_I Option value for case folding: exclude the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt. |
final public static int | MAX_CODE_POINT Cover the JDK 1.5 API, for convenience. |
final public static char | MAX_HIGH_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static char | MAX_LOW_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static int | MAX_RADIX Compatibility constant for Java Character's MAX_RADIX. |
final public static char | MAX_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static int | MAX_VALUE The highest Unicode code point value (scalar value) according to the Unicode Standard. |
final public static int | MIN_CODE_POINT Cover the JDK 1.5 API, for convenience. |
final public static char | MIN_HIGH_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static char | MIN_LOW_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static int | MIN_RADIX Compatibility constant for Java Character's MIN_RADIX. |
final public static int | MIN_SUPPLEMENTARY_CODE_POINT Cover the JDK 1.5 API, for convenience. |
final public static char | MIN_SURROGATE Cover the JDK 1.5 API, for convenience. |
final public static int | MIN_VALUE The lowest Unicode code point value. |
static UCharacterName | NAME_ |
final public static double | NO_NUMERIC_VALUE Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point. |
static UPropertyAliases | PNAMES_ |
final public static int | REPLACEMENT_CHAR Unicode value used when translating into Unicode encoding form and there is no existing character. |
final public static int | SUPPLEMENTARY_MIN_VALUE |
Method Summary | |
public static int | charCount(int cp) Cover the JDK 1.5 API, for convenience. | final public static int | codePointAt(CharSequence seq, int index) Cover the JDK 1.5 API, for convenience. | final public static int | codePointAt(char[] text, int index) Cover the JDK 1.5 API, for convenience. | final public static int | codePointAt(char[] text, int index, int limit) Cover the JDK 1.5 API, for convenience. | final public static int | codePointBefore(CharSequence seq, int index) Cover the JDK 1.5 API, for convenience. | final public static int | codePointBefore(char[] text, int index) Cover the JDK 1.5 API, for convenience. | final public static int | codePointBefore(char[] text, int index, int limit) Cover the JDK 1.5 API, for convenience. | public static int | codePointCount(CharSequence text, int start, int limit) Cover the JDK API, for convenience. | public static int | codePointCount(char[] text, int start, int limit) Cover the JDK API, for convenience. | public static int | digit(int ch, int radix) Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of
java.lang.Character.digit() . | public static int | digit(int ch) Retrieves the numeric value of a decimal digit code point.
This is a convenience overload of digit(int, int)
that provides a decimal radix.
Semantic Change: In release 1.3.1 and prior, this
treated numeric letters and other numbers as digits. | public static int | foldCase(int ch, boolean defaultmapping) The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They can map to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - the character to be converted
Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped. |
public static String | foldCase(String str, boolean defaultmapping) The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted
Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped. |
public static int | foldCase(int ch, int options) The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They can map to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - the character to be converted Parameters: options - A bit set for special processing. | final public static String | foldCase(String str, int options) The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted Parameters: options - A bit set for special processing. | public static char | forDigit(int digit, int radix) Provide the java.lang.Character forDigit API, for convenience. | public static VersionInfo | getAge(int ch) Get the "age" of the code point.
The "age" is the Unicode version when the code point was first
designated (as a non-character or for Private Use) or assigned a
character.
This can be useful to avoid emitting code points to receiving
processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
Parameters: ch - The code point. | public static int | getCharFromExtendedName(String name) Find a Unicode character by its name or extended name and return its code
point value. | public static int | getCharFromName(String name) Find a Unicode code point by its most current Unicode name and
return its code point value. | public static int | getCharFromName1_0(String name) Find a Unicode character by its version 1.0 Unicode name and return
its code point value. | public static int | getCodePoint(char lead, char trail) Returns a code point corresponding to the two UTF16 characters. | public static int | getCodePoint(char char16) Returns the code point corresponding to the UTF16 character. | public static int | getCombiningClass(int ch) | public static int | getDirection(int ch) Returns the Bidirection property of a code point. | public static byte | getDirectionality(int cp) Cover the JDK API, for convenience. | public static String | getExtendedName(int ch) Retrieves a name for a valid codepoint. | public static ValueIterator | getExtendedNameIterator() Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names. | public static int | getHanNumericValue(int ch) Return numeric value of Han code points.
This returns the value of Han 'numeric' code points,
including those for zero, ten, hundred, thousand, ten thousand,
and hundred million.
This includes both the standard and 'checkwriting'
characters, the 'big circle' zero character, and the standard
zero character.
Parameters: ch - code point to query value if it is a Han 'numeric character,' otherwise return -1. | public static String | getISOComment(int ch) Get the ISO 10646 comment for a character.
The ISO 10646 comment is an informative field in the Unicode Character
Database (UnicodeData.txt field 11) and is from the ISO 10646 names list.
Parameters: ch - The code point for which to get the ISO comment. It must be 0<=ch<=0x10ffff. | public static int | getIntPropertyMaxValue(int type) Get the maximum value for an integer/binary Unicode property.
Can be used together with UCharacter.getIntPropertyMinValue(int)
to allocate arrays of com.ibm.icu.text.UnicodeSet or similar.
Examples for min/max values (for Unicode 3.2):
- UProperty.BIDI_CLASS: 0/18 (UCharacterDirection.LEFT_TO_RIGHT/UCharacterDirection.BOUNDARY_NEUTRAL)
- UProperty.SCRIPT: 0/45 (UScript.COMMON/UScript.TAGBANWA)
- UProperty.IDEOGRAPHIC: 0/1 (false/true)
For undefined UProperty constant values, min/max values will be 0/-1.
Parameters: type - UProperty selector constant, identifies which binary property to check. | public static int | getIntPropertyMinValue(int type) Get the minimum value for an integer/binary Unicode property type.
Can be used together with UCharacter.getIntPropertyMaxValue(int)
to allocate arrays of com.ibm.icu.text.UnicodeSet or similar.
Parameters: type - UProperty selector constant, identifies which binary property to check. | public static int | getIntPropertyValue(int ch, int type) Gets the property value for an Unicode property type of a code point. | public static int | getMirror(int ch) Maps the specified code point to a "mirror-image" code point. | public static String | getName(int ch) Retrieve the most current Unicode name of the argument code point, or
null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. | public static String | getName(String s, String separator) | public static String | getName1_0(int ch) Retrieve the earlier version 1.0 Unicode name of the argument code
point, or null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. | public static ValueIterator | getName1_0Iterator() Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the older 1.0 Unicode names. | public static ValueIterator | getNameIterator() Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date
Unicode names. | public static int | getNumericValue(int ch) Returns the numeric value of the code point as a nonnegative
integer.
If the code point does not have a numeric value, then -1 is returned. | public static int | getPropertyEnum(String propertyAlias) Return the UProperty selector for a given property name, as
specified in the Unicode database file PropertyAliases.txt.
Short, long, and any other variants are recognized.
In addition, this function maps the synthetic names "gcm" /
"General_Category_Mask" to the property
UProperty.GENERAL_CATEGORY_MASK. | public static String | getPropertyName(int property, int nameChoice) Return the Unicode name for a given property, as given in the
Unicode database file PropertyAliases.txt. | public static int | getPropertyValueEnum(int property, String valueAlias) Return the property value integer for a given value name, as
specified in the Unicode database file PropertyValueAliases.txt.
Short, long, and any other variants are recognized.
Note: Some of the names in PropertyValueAliases.txt will only be
recognized with UProperty.GENERAL_CATEGORY_MASK, not
UProperty.GENERAL_CATEGORY. | public static String | getPropertyValueName(int property, int value, int nameChoice) Return the Unicode name for a given property value, as given in
the Unicode database file PropertyValueAliases.txt. | public static String | getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice) Returns a string version of the property value. | public static int | getType(int ch) Returns a value indicating a code point's Unicode category.
Up-to-date Unicode implementation of java.lang.Character.getType()
except for the above mentioned code points that had their category
changed.
Return results are constants from the interface
UCharacterCategory
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. | public static RangeValueIterator | getTypeIterator() | public static double | getUnicodeNumericValue(int ch) Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
A "double" return type is necessary because some numeric values are
fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character
Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API has a
return type int and returns -1 when the argument ch does not have a
corresponding numeric value. | public static VersionInfo | getUnicodeVersion() Gets the version of Unicode data used. | public static boolean | hasBinaryProperty(int ch, int property) Check a binary Unicode property for a code point.
Unicode, especially in version 3.2, defines many more properties
than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in
the Unicode Character Database (UCD) and Unicode Technical Reports
(UTR).
For details about the properties see
http://www.unicode.org/.
For names of Unicode properties see the UCD file
PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions
below 3.2, then properties marked with "new" are not or
not fully available.
Parameters: ch - code point to test. Parameters: property - selector constant from com.ibm.icu.lang.UProperty, identifies which binary property to check. | public static boolean | isBMP(int ch) Determines if the code point is in the BMP plane. | public static boolean | isBaseForm(int ch) Determines whether the specified code point is of base form. | public static boolean | isDefined(int ch) Determines if a code point has a defined meaning in the up-to-date
Unicode standard.
E.g. | public static boolean | isDigit(int ch) Determines if a code point is a Java digit.
This method observes the semantics of
java.lang.Character.isDigit() . | public static boolean | isHighSurrogate(char ch) Cover the JDK 1.5 API, for convenience. | public static boolean | isISOControl(int ch) Determines if the specified code point is an ISO control character. | public static boolean | isIdentifierIgnorable(int ch) Determines if the specified code point should be regarded as an
ignorable character in a Unicode identifier.
A character is ignorable in the Unicode standard if it is of the type
Cf, Formatting code.
Up-to-date Unicode implementation of
java.lang.Character.isIdentifierIgnorable().
See UTR #8.
Parameters: ch - code point to be determined if it can be ignored in a Unicode identifier. | public static boolean | isJavaIdentifierPart(int cp) Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierPart. | public static boolean | isJavaIdentifierStart(int cp) Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierStart. | public static boolean | isJavaLetter(int cp) Compatibility override of Java deprecated method. | public static boolean | isJavaLetterOrDigit(int cp) Compatibility override of Java deprecated method. | public static boolean | isLegal(int ch) A code point is illegal if and only if
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0x xxFFFF or 0x xxFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
Parameters: ch - code point to determine if it is a legal code point by itself true if and only if legal. | public static boolean | isLegal(String str) A string is legal iff all its code points are legal.
A code point is illegal if and only if
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0x xxFFFF or 0x xxFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
Parameters: str - containing code points to examine true if and only if legal. |
public static boolean | isLetter(int ch) Determines if the specified code point is a letter. |
public static boolean | isLetterOrDigit(int ch) Determines if the specified code point is a letter or digit. |
public static boolean | isLowSurrogate(char ch) Cover the JDK 1.5 API, for convenience. |
public static boolean | isLowerCase(int ch) Determines if the specified code point is a lowercase character. |
public static boolean | isMirrored(int ch) Determines whether the code point has the "mirrored" property. |
public static boolean | isPrintable(int ch) Determines whether the specified code point is a printable character
according to the Unicode standard. | public static boolean | isSpace(int ch) Compatibility override of Java deprecated method. | public static boolean | isSpaceChar(int ch) Determines if the specified code point is a Unicode specified space
character, i.e. | public static boolean | isSupplementary(int ch) Determines if the code point is a supplementary character. | final public static boolean | isSupplementaryCodePoint(int cp) Cover the JDK 1.5 API, for convenience. | final public static boolean | isSurrogatePair(char high, char low) Cover the JDK 1.5 API, for convenience. | public static boolean | isTitleCase(int ch) Determines if the specified code point is a titlecase character. | public static boolean | isUAlphabetic(int ch) | public static boolean | isULowercase(int ch) | public static boolean | isUUppercase(int ch) | public static boolean | isUWhiteSpace(int ch) | public static boolean | isUnicodeIdentifierPart(int ch) Determines if the specified code point may be any part of a Unicode
identifier other than the starting character. | public static boolean | isUnicodeIdentifierStart(int ch) Determines if the specified code point is permissible as the first
character in a Unicode identifier. | public static boolean | isUpperCase(int ch) Determines if the specified code point is an uppercase character.
UnicodeData only contains case mappings for code point where they are
one-to-one mappings; it also omits information about context-sensitive
case mappings.
For language specific case conversion behavior, use
toUpperCase(locale, str). | final public static boolean | isValidCodePoint(int cp) Cover the JDK 1.5 API, for convenience. | public static boolean | isWhitespace(int ch) Determines if the specified code point is a white space character.
A code point is considered to be a whitespace character if and only
if it satisfies one of the following criteria:
- It is a Unicode space separator (category "Zs"), but is not
a no-break space (\u00A0 or \u202F or \uFEFF).
- It is a Unicode line separator (category "Zl").
- It is a Unicode paragraph separator (category "Zp").
- It is \u0009, HORIZONTAL TABULATION.
| public static int | offsetByCodePoints(CharSequence text, int index, int codePointOffset) Cover the JDK API, for convenience. | public static int | offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset) Cover the JDK API, for convenience. | final public static int | toChars(int cp, char[] dst, int dstIndex) Cover the JDK 1.5 API, for convenience. | final public static char[] | toChars(int cp) Cover the JDK 1.5 API, for convenience. | final public static int | toCodePoint(char high, char low) Cover the JDK 1.5 API, for convenience. | public static int | toLowerCase(int ch) The given code point is mapped to its lowercase equivalent; if the code
point has no lowercase equivalent, the code point itself is returned. | public static String | toLowerCase(String str) Gets lowercase version of the argument string. | public static String | toLowerCase(Locale locale, String str) Gets lowercase version of the argument string. | public static String | toLowerCase(ULocale locale, String str) Gets lowercase version of the argument string. | public static String | toString(int ch) Converts argument code point and returns a String object representing
the code point's value in UTF16 format. | public static int | toTitleCase(int ch) Converts the code point argument to titlecase.
If no titlecase is available, the uppercase is returned. | public static String | toTitleCase(String str, BreakIterator breakiter) Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break
iterator, hence the user can customize the break iterator for
a specialized titlecasing. | public static String | toTitleCase(Locale locale, String str, BreakIterator breakiter) Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break
iterator, hence the user can customize the break iterator for
a specialized titlecasing. | public static String | toTitleCase(ULocale locale, String str, BreakIterator titleIter) Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break
iterator, hence the user can customize the break iterator for
a specialized titlecasing. | public static int | toUpperCase(int ch) Converts the character argument to uppercase. | public static String | toUpperCase(String str) Gets uppercase version of the argument string. | public static String | toUpperCase(Locale locale, String str) Gets uppercase version of the argument string. | public static String | toUpperCase(ULocale locale, String str) Gets uppercase version of the argument string. |
FOLD_CASE_DEFAULT | final public static int FOLD_CASE_DEFAULT | | Option value for case folding: use default mappings defined in CaseFolding.txt.
|
FOLD_CASE_EXCLUDE_SPECIAL_I | final public static int FOLD_CASE_EXCLUDE_SPECIAL_I | | Option value for case folding: exclude the mappings for dotted I
and dotless i marked with 'I' in CaseFolding.txt.
|
MAX_RADIX | final public static int MAX_RADIX | | Compatibility constant for Java Character's MAX_RADIX.
|
MAX_VALUE | final public static int MAX_VALUE | | The highest Unicode code point value (scalar value) according to the
Unicode Standard.
This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE.
|
MIN_RADIX | final public static int MIN_RADIX | | Compatibility constant for Java Character's MIN_RADIX.
|
MIN_VALUE | final public static int MIN_VALUE | | The lowest Unicode code point value.
|
NO_NUMERIC_VALUE | final public static double NO_NUMERIC_VALUE | | Special value that is returned by getUnicodeNumericValue(int) when no
numeric value is defined for a code point.
See Also: UCharacter.getUnicodeNumericValue |
PNAMES_ | static UPropertyAliases PNAMES_ | | Singleton object encapsulating the imported pnames.icu property aliases.
|
REPLACEMENT_CHAR | final public static int REPLACEMENT_CHAR | | Unicode value used when translating into Unicode encoding form and there
is no existing character.
|
SUPPLEMENTARY_MIN_VALUE | final public static int SUPPLEMENTARY_MIN_VALUE | | The minimum value for supplementary code points.
|
charCount | public static int charCount(int cp) | | Cover the JDK 1.5 API, for convenience. Return the number of chars needed
to represent the code point. This does not check the
code point for validity.
Parameters: cp - the code point to check the number of chars needed to represent the code point See Also: UTF16.getCharCount |
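Because this method is described as a cover for the JDK 1.5 API, its behavior can be sketched with java.lang.Character.charCount. The example below is an illustration under that assumption; the class CharCountSketch and its helper are hypothetical names, and the ICU method itself is not called.

```java
// Sketch using only java.lang.Character; names are hypothetical.
final class CharCountSketch {

    /** Number of UTF-16 chars needed for the code point; no validity check is made. */
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded(0x41));     // BMP code point: 1
        System.out.println(charsNeeded(0x1F600));  // supplementary code point: 2
        // Not a valid code point, but the JDK method does not validate its
        // argument and still reports 2.
        System.out.println(charsNeeded(0x110000));
        // The count matches the length of the UTF-16 encoding.
        System.out.println(Character.toChars(0x1F600).length); // 2
    }
}
```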
codePointAt | final public static int codePointAt(CharSequence seq, int index) | | Cover the JDK 1.5 API, for convenience. Return the code point at index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index and index+1.
Parameters: seq - the characters to check Parameters: index - the index of the first or only char forming the code point the code point at the index |
codePointAt | final public static int codePointAt(char[] text, int index) | | Cover the JDK 1.5 API, for convenience. Return the code point at index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index and index+1.
Parameters: text - the characters to check Parameters: index - the index of the first or only char forming the code point the code point at the index |
codePointAt | final public static int codePointAt(char[] text, int index, int limit) | | Cover the JDK 1.5 API, for convenience. Return the code point at index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index and index+1.
Parameters: text - the characters to check Parameters: index - the index of the first or only char forming the code point Parameters: limit - the limit of the valid text the code point at the index |
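The "index and index+1" behavior described for the codePointAt overloads can be sketched with the java.lang.Character methods they cover. This is an illustration only: CodePointAtSketch, TEXT, at and atWithLimit are hypothetical names, and the sketch assumes the ICU covers behave like the JDK methods.

```java
// Sketch using only java.lang.Character; names are hypothetical.
final class CodePointAtSketch {

    // 'a', U+1F600 encoded as the surrogate pair D83D DE00, then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    /** Code point starting at the given char index. */
    static int at(int index) {
        return Character.codePointAt(TEXT, index);
    }

    /** Same, but chars at or beyond limit are not examined. */
    static int atWithLimit(int index, int limit) {
        return Character.codePointAt(TEXT.toCharArray(), index, limit);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(at(0)));             // 61
        System.out.println(Integer.toHexString(at(1)));             // 1f600 (lead + trail)
        System.out.println(Integer.toHexString(at(2)));             // de00 (trail alone)
        System.out.println(Integer.toHexString(atWithLimit(1, 2))); // d83d (trail cut off)
    }
}
```

Starting in the middle of a pair returns the trail surrogate by itself, because only the chars at index and index+1 are looked at and no backward scan is made.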
codePointBefore | final public static int codePointBefore(CharSequence seq, int index) | | Cover the JDK 1.5 API, for convenience. Return the code point before index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index-1 and index-2.
Parameters: seq - the characters to check Parameters: index - the index after the last or only char forming the code point the code point before the index |
codePointBefore | final public static int codePointBefore(char[] text, int index) | | Cover the JDK 1.5 API, for convenience. Return the code point before index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index-1 and index-2.
Parameters: text - the characters to check Parameters: index - the index after the last or only char forming the code point the code point before the index |
codePointBefore | final public static int codePointBefore(char[] text, int index, int limit) | | Cover the JDK 1.5 API, for convenience. Return the code point before index.
Note: the semantics of this API are different from the related UTF16
API. This examines only the characters at index-1 and index-2.
Parameters: text - the characters to check Parameters: index - the index after the last or only char forming the code point Parameters: limit - the start of the valid text the code point before the index |
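The backward lookup ("index-1 and index-2") mirrors the forward one. A minimal sketch, assuming the third argument marks the start of the valid text as described above; the class name CodePointBeforeSketch and the parameter name start are hypothetical and this is not ICU4J's implementation.

```java
class CodePointBeforeSketch {
    // Hypothetical helper: returns the code point that ends just before index.
    // Only text[index - 1] and, when it is a trail surrogate, text[index - 2]
    // are examined. Nothing before start is read.
    static int codePointBefore(char[] text, int index, int start) {
        if (start < 0 || index <= start || index > text.length) {
            throw new IndexOutOfBoundsException("index " + index + ", start " + start);
        }
        char last = text[index - 1];
        if (Character.isLowSurrogate(last) && index - 2 >= start) {
            char previous = text[index - 2];
            if (Character.isHighSurrogate(previous)) {
                // A well-formed surrogate pair ends at index: combine it.
                return Character.toCodePoint(previous, last);
            }
        }
        // A BMP char, or a trail surrogate without a lead inside the valid range.
        return last;
    }
}
```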
codePointCount | public static int codePointCount(CharSequence text, int start, int limit)(Code) | | Cover the JDK API, for convenience. Count the number of code points in the range of text.
Parameters: text - the characters to check Parameters: start - the start of the range Parameters: limit - the limit of the range the number of code points in the range |
codePointCount | public static int codePointCount(char[] text, int start, int limit)(Code) | | Cover the JDK API, for convenience. Count the number of code points in the range of text.
Parameters: text - the characters to check Parameters: start - the start of the range Parameters: limit - the limit of the range the number of code points in the range |
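Counting code points in a range amounts to walking the chars and treating each well-formed surrogate pair as one unit. The following is a rough sketch of that idea, not ICU4J's code; the class name CodePointCountSketch is hypothetical, and unpaired surrogates are assumed to count as one code point each, as in java.lang.Character.codePointCount.

```java
class CodePointCountSketch {
    // Hypothetical helper: counts code points in text[start, limit).
    static int codePointCount(CharSequence text, int start, int limit) {
        if (start < 0 || limit < start || limit > text.length()) {
            throw new IndexOutOfBoundsException("start " + start + ", limit " + limit);
        }
        int count = 0;
        int i = start;
        while (i < limit) {
            char c = text.charAt(i++);
            // Skip the trail surrogate when a complete pair lies inside the range.
            if (Character.isHighSurrogate(c) && i < limit
                    && Character.isLowSurrogate(text.charAt(i))) {
                i++;
            }
            count++;
        }
        return count;
    }
}
```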
digit | public static int digit(int ch, int radix)(Code) | | Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of
java.lang.Character.digit() . Note that this
will return positive values for code points for which isDigit
returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and
prior, this did not treat the European letters as having a
digit value, and also treated numeric letters and other numbers as
digits.
This has been changed to conform to the Java semantics.
A code point is a valid digit if and only if:
- ch is a decimal digit or one of the European letters, and
- the value of ch is less than the specified radix.
Parameters: ch - the code point to query Parameters: radix - the radix the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix |
digit | public static int digit(int ch)(Code) | | Retrieves the numeric value of a decimal digit code point.
This is a convenience overload of digit(int, int)
that provides a decimal radix.
Semantic Change: In release 1.3.1 and prior, this
treated numeric letters and other numbers as digits. This has
been changed to conform to the Java semantics.
Parameters: ch - the code point to query the numeric value represented by the code point, or -1 if the code point is not a decimal digit or if its value is too large for a decimal radix |
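The two-part validity rule for digit (a decimal digit or European letter, with a value below the radix) can be illustrated with an ASCII-only sketch. The class name DigitSketch is hypothetical; the real method also accepts non-ASCII decimal digits, which this simplification deliberately leaves out, and no radix range check is attempted.

```java
class DigitSketch {
    // Hypothetical, ASCII-only illustration of the rule:
    //   1. ch must be a decimal digit or a European letter (A-Z, a-z), and
    //   2. its value must be less than the radix.
    static int digit(int ch, int radix) {
        int value;
        if (ch >= '0' && ch <= '9') {
            value = ch - '0';
        } else if (ch >= 'a' && ch <= 'z') {
            value = ch - 'a' + 10;   // 'a' counts as 10, 'z' as 35
        } else if (ch >= 'A' && ch <= 'Z') {
            value = ch - 'A' + 10;   // same values as the lowercase letters
        } else {
            return -1;               // not a digit or letter covered by this sketch
        }
        return value < radix ? value : -1;
    }

    // Convenience overload with a decimal radix, mirroring digit(int).
    static int digit(int ch) {
        return digit(ch, 10);
    }
}
```

With radix 16 the letter 'f' yields 15 while 'g' is rejected; with the decimal overload every letter is rejected because its value is at least 10.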
foldCase | public static int foldCase(int ch, boolean defaultmapping)(Code) | | The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They can map to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - the character to be converted Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt are skipped. the case folding equivalent of the character, if any; otherwise the character itself. See Also: UCharacter.foldCase(String,boolean) |
foldCase | public static String foldCase(String str, boolean defaultmapping)(Code) | | The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt are skipped. the case folding equivalent of the string, if any; otherwise the string itself. See Also: UCharacter.foldCase(int,boolean) |
foldCase | public static int foldCase(int ch, int options)(Code) | | The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They can map to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - the character to be converted Parameters: options - A bit set for special processing. Currently the recognised options are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT the case folding equivalent of the character, if any; otherwise the character itself. See Also: UCharacter.foldCase(String,boolean) |
foldCase | final public static String foldCase(String str, int options)(Code) | | The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted Parameters: options - A bit set for special processing. Currently the recognised options are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT the case folding equivalent of the string, if any; otherwise the string itself. See Also: UCharacter.foldCase(int,boolean) |
forDigit | public static char forDigit(int digit, int radix)(Code) | | Provide the java.lang.Character forDigit API, for convenience.
|
getAge | public static VersionInfo getAge(int ch)(Code) | | Get the "age" of the code point.
The "age" is the Unicode version when the code point was first
designated (as a non-character or for Private Use) or assigned a
character.
This can be useful to avoid emitting code points to receiving
processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
Parameters: ch - The code point. the Unicode version number |
getCharFromExtendedName | public static int getCharFromExtendedName(String name)(Code) | | Find a Unicode character by any of its names and return its code
point value. All Unicode names are in uppercase.
Extended names are all lowercase except for numbers and are contained
within angle brackets.
The names are searched in the following order
- Most current Unicode name if there is any
- Unicode 1.0 name if there is any
- Extended name in the form of
"". E.g.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: name - codepoint name code point associated with the name or -1 if the name is not found. |
getCharFromName | public static int getCharFromName(String name)(Code) | | Find a Unicode code point by its most current Unicode name and
return its code point value. All Unicode names are in uppercase.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: name - most current Unicode character name whose code point is to be returned code point or -1 if name is not found |
getCharFromName1_0 | public static int getCharFromName1_0(String name)(Code) | | Find a Unicode character by its version 1.0 Unicode name and return
its code point value. All Unicode names are in uppercase.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: name - Unicode 1.0 code point name whose code point is to be returned code point or -1 if name is not found |
getCodePoint | public static int getCodePoint(char lead, char trail)(Code) | | Returns a code point corresponding to the two UTF16 characters.
Parameters: lead - the lead char Parameters: trail - the trail char code point if surrogate characters are valid. exception: IllegalArgumentException - thrown when argument characters do not form a valid codepoint |
getCodePoint | public static int getCodePoint(char char16)(Code) | | Returns the code point corresponding to the UTF16 character.
Parameters: char16 - the UTF16 character code point if argument is a valid character. exception: IllegalArgumentException - thrown when char16 is not a valid codepoint |
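Combining a lead and trail char into a code point, with the documented IllegalArgumentException for invalid input, might look roughly like the sketch below. It is not ICU4J's implementation: the class name GetCodePointSketch and the exception message are hypothetical, and validity is assumed to mean "a high surrogate followed by a low surrogate".

```java
class GetCodePointSketch {
    // Hypothetical helper: combines a surrogate pair into one code point.
    static int getCodePoint(char lead, char trail) {
        if (Character.isHighSurrogate(lead) && Character.isLowSurrogate(trail)) {
            // Standard UTF-16 decoding of a surrogate pair.
            return Character.toCodePoint(lead, trail);
        }
        // Assumed failure mode, following the documented exception type.
        throw new IllegalArgumentException("Illegal surrogate characters");
    }
}
```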
getCombiningClass | public static int getCombiningClass(int ch)(Code) | | Gets the combining class of the argument codepoint.
Parameters: ch - code point whose combining class is to be retrieved the combining class of the codepoint |
getDirection | public static int getDirection(int ch)(Code) | | Returns the bidirectional property of a code point.
For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional
property.
The result belongs to the interface
UCharacterDirection.
Parameters: ch - the code point whose direction is to be determined direction constant from UCharacterDirection. |
getDirectionality | public static byte getDirectionality(int cp)(Code) | | Cover the JDK API, for convenience. Return a byte representing the directionality of
the character.
Note: Unlike the JDK, this returns DIRECTIONALITY_LEFT_TO_RIGHT for undefined or
out-of-bounds characters. Note: The return value must be
tested using the constants defined in
UCharacterEnums.ECharacterDirection since the values are different from the ones defined by java.lang.Character .
Parameters: cp - the code point to check the directionality of the code point See Also: UCharacter.getDirection |
getExtendedName | public static String getExtendedName(int ch)(Code) | | Retrieves a name for a valid codepoint. Unlike getName(int) and
getName1_0(int), this method will return a name even for codepoints that
are not assigned a name in UnicodeData.txt.
The names are returned in the following order.
- Most current Unicode name if there is any
- Unicode 1.0 name if there is any
- Extended name in the form of
"". E.g.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: ch - the code point for which to get the name a name for the argument codepoint |
getExtendedNameIterator | public static ValueIterator getExtendedNameIterator()(Code) | | Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names.
For modern, most up-to-date Unicode names use getNameIterator() or
for older 1.0 Unicode names use getName1_0Iterator().
Example of use:
ValueIterator iterator = UCharacter.getExtendedNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
System.out.println("Codepoint \\u" +
Integer.toHexString(element.codepoint) +
" has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from
UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
an iterator |
getHanNumericValue | public static int getHanNumericValue(int ch)(Code) | | Return numeric value of Han code points.
This returns the value of Han 'numeric' code points,
including those for zero, ten, hundred, thousand, ten thousand,
and hundred million.
This includes both the standard and 'checkwriting'
characters, the 'big circle' zero character, and the standard
zero character.
Parameters: ch - code point to query the value if it is a Han 'numeric' character, otherwise -1 |
getISOComment | public static String getISOComment(int ch)(Code) | | Get the ISO 10646 comment for a character.
The ISO 10646 comment is an informative field in the Unicode Character
Database (UnicodeData.txt field 11) and is from the ISO 10646 names list.
Parameters: ch - The code point for which to get the ISO comment. It must be 0<=ch<=0x10ffff. The ISO comment, or null if there is no comment for this character. |
getIntPropertyMaxValue | public static int getIntPropertyMaxValue(int type)(Code) | | Get the maximum value for an integer/binary Unicode property.
Can be used together with UCharacter.getIntPropertyMinValue(int)
to allocate arrays of com.ibm.icu.text.UnicodeSet or similar.
Examples for min/max values (for Unicode 3.2):
- UProperty.BIDI_CLASS: 0/18 (UCharacterDirection.LEFT_TO_RIGHT/UCharacterDirection.BOUNDARY_NEUTRAL)
- UProperty.SCRIPT: 0/45 (UScript.COMMON/UScript.TAGBANWA)
- UProperty.IDEOGRAPHIC: 0/1 (false/true)
For undefined UProperty constant values, min/max values will be 0/-1.
Parameters: type - UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT. Maximum value returned by UCharacter.getIntPropertyValue for a Unicode property. <= 0 if the property selector 'type' is out of range. See Also: UProperty See Also: UCharacter.hasBinaryProperty See Also: UCharacter.getUnicodeVersion See Also: UCharacter.getIntPropertyMinValue See Also: UCharacter.getIntPropertyValue |
getIntPropertyMinValue | public static int getIntPropertyMinValue(int type)(Code) | | Get the minimum value for an integer/binary Unicode property type.
Can be used together with UCharacter.getIntPropertyMaxValue(int)
to allocate arrays of com.ibm.icu.text.UnicodeSet or similar.
Parameters: type - UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT. Minimum value returned by UCharacter.getIntPropertyValue(int) for a Unicode property. 0 if the property selector 'type' is out of range. See Also: UProperty See Also: UCharacter.hasBinaryProperty See Also: UCharacter.getUnicodeVersion See Also: UCharacter.getIntPropertyMaxValue See Also: UCharacter.getIntPropertyValue |
getIntPropertyValue | public static int getIntPropertyValue(int ch, int type)(Code) | | Gets the property value for a Unicode property type of a code point.
Also returns binary and mask property values.
Unicode, especially in version 3.2, defines many more properties than
the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as
defined in the Unicode Character Database (UCD) and Unicode Technical
Reports (UTR). For details about the properties see
http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
boolean b = (ideo == 1);
Parameters: ch - code point to test. Parameters: type - UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT. numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to enum type if necessary). Returns 0 or 1 (for false / true) for binary Unicode properties. Returns a bit-mask for mask properties. Returns 0 if 'type' is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point. See Also: UProperty See Also: UCharacter.hasBinaryProperty See Also: UCharacter.getIntPropertyMinValue See Also: UCharacter.getIntPropertyMaxValue See Also: UCharacter.getUnicodeVersion |
getMirror | public static int getMirror(int ch)(Code) | | Maps the specified code point to a "mirror-image" code point.
For code points with the "mirrored" property, implementations sometimes
need a "poor man's" mapping to another code point such that the default
glyph may serve as the mirror-image of the default glyph of the
specified code point.
This is useful for text conversion to and from codepages with visual
order, and for displays without glyph selection capabilities.
Parameters: ch - code point whose mirror is to be retrieved another code point that may serve as a mirror-image substitute, or ch itself if there is no such mapping or ch does not have the "mirrored" property |
getName | public static String getName(int ch)(Code) | | Retrieve the most current Unicode name of the argument code point, or
null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: ch - the code point for which to get the name most current Unicode name |
getName | public static String getName(String s, String separator)(Code) | | Gets the names for each of the characters in a string
Parameters: s - string to format Parameters: separator - string to go between names string of names |
getName1_0 | public static String getName1_0(int ch)(Code) | | Retrieve the earlier version 1.0 Unicode name of the argument code
point, or null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. get*Name*()
incurs a one-time initialisation cost to construct the name tables.
Parameters: ch - the code point for which to get the name version 1.0 Unicode name |
getName1_0Iterator | public static ValueIterator getName1_0Iterator()(Code) | | Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the older 1.0 Unicode names.
For modern, most up-to-date Unicode names use getNameIterator() or
for extended names use getExtendedNameIterator().
Example of use:
ValueIterator iterator = UCharacter.getName1_0Iterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
System.out.println("Codepoint \\u" +
Integer.toHexString(element.codepoint) +
" has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from
UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
an iterator |
getNameIterator | public static ValueIterator getNameIterator()(Code) | | Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date
Unicode names. For older 1.0 Unicode names use getName1_0Iterator() or
for extended names use getExtendedNameIterator().
Example of use:
ValueIterator iterator = UCharacter.getNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
System.out.println("Codepoint \\u" +
Integer.toHexString(element.codepoint) +
" has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from
UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
an iterator |
getNumericValue | public static int getNumericValue(int ch)(Code) | | Returns the numeric value of the code point as a nonnegative
integer.
If the code point does not have a numeric value, then -1 is returned.
If the code point has a numeric value that cannot be represented as a
nonnegative integer (for example, a fractional value), then -2 is
returned.
Parameters: ch - the code point to query the numeric value of the code point, or -1 if it has no numeric value, or -2 if it has a numeric value that cannot be represented as a nonnegative integer |
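The three-way result of getNumericValue (a nonnegative value, -1, or -2) is easy to misread as a plain number. A small sketch of how a caller might branch on it follows; it uses java.lang.Character.getNumericValue as a stand-in, on the assumption that the two methods share these semantics as the class description states, and the class name NumericValueSketch and its messages are hypothetical.

```java
class NumericValueSketch {
    // Hypothetical helper: turns the sentinel-based result into a description.
    // java.lang.Character.getNumericValue stands in for UCharacter.getNumericValue.
    static String describe(int ch) {
        int value = Character.getNumericValue(ch);
        if (value == -1) {
            return "no numeric value";
        }
        if (value == -2) {
            // For example a fractional value, which cannot be a nonnegative int.
            return "not representable as a nonnegative integer";
        }
        return Integer.toString(value);
    }
}
```

Checking for the two sentinels before using the result avoids treating -1 or -2 as a real numeric value.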
getPropertyEnum | public static int getPropertyEnum(String propertyAlias)(Code) | | Return the UProperty selector for a given property name, as
specified in the Unicode database file PropertyAliases.txt.
Short, long, and any other variants are recognized.
In addition, this function maps the synthetic names "gcm" /
"General_Category_Mask" to the property
UProperty.GENERAL_CATEGORY_MASK. These names are not in
PropertyAliases.txt.
Parameters: propertyAlias - the property name to be matched. The name is compared using "loose matching" as described in PropertyAliases.txt. a UProperty enum. exception: IllegalArgumentException - thrown if propertyAlias is not recognized. See Also: UProperty |
getPropertyName | public static String getPropertyName(int property, int nameChoice)(Code) | | Return the Unicode name for a given property, as given in the
Unicode database file PropertyAliases.txt. Most properties
have more than one name. The nameChoice determines which one
is returned.
In addition, this function maps the property
UProperty.GENERAL_CATEGORY_MASK to the synthetic names "gcm" /
"General_Category_Mask". These names are not in
PropertyAliases.txt.
Parameters: property - UProperty selector. Parameters: nameChoice - UProperty.NameChoice selector for which name to get. All properties have a long name. Most have a short name, but some do not. Unicode allows for additional names; if present these will be returned by UProperty.NameChoice.LONG + i, where i=1, 2,... a name, or null if Unicode explicitly defines no name ("n/a") for a given property/nameChoice. If a given nameChoice throws an exception, then all larger values of nameChoice will throw an exception. If null is returned for a given nameChoice, then other nameChoice values may return non-null results. exception: IllegalArgumentException - thrown if property or nameChoice are invalid. See Also: UProperty See Also: UProperty.NameChoice |
getPropertyValueEnum | public static int getPropertyValueEnum(int property, String valueAlias)(Code) | | Return the property value integer for a given value name, as
specified in the Unicode database file PropertyValueAliases.txt.
Short, long, and any other variants are recognized.
Note: Some of the names in PropertyValueAliases.txt will only be
recognized with UProperty.GENERAL_CATEGORY_MASK, not
UProperty.GENERAL_CATEGORY. These include: "C" / "Other", "L" /
"Letter", "LC" / "Cased_Letter", "M" / "Mark", "N" / "Number", "P"
/ "Punctuation", "S" / "Symbol", and "Z" / "Separator".
Parameters: property - UProperty selector constant. UProperty.INT_START <= property < UProperty.INT_LIMIT or UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or UProperty.MASK_START <= property < UProperty.MASK_LIMIT. Only these properties can be enumerated. Parameters: valueAlias - the value name to be matched. The name is compared using "loose matching" as described in PropertyValueAliases.txt. a value integer. Note: UProperty.GENERAL_CATEGORY values are mask values produced by left-shifting 1 by UCharacter.getType(). This allows grouped categories such as [:L:] to be represented. See Also: UProperty throws: IllegalArgumentException - if property is not a valid UProperty selector |
getPropertyValueName | public static String getPropertyValueName(int property, int value, int nameChoice)(Code) | | Return the Unicode name for a given property value, as given in
the Unicode database file PropertyValueAliases.txt. Most
values have more than one name. The nameChoice determines
which one is returned.
Note: Some of the names in PropertyValueAliases.txt can only be
retrieved using UProperty.GENERAL_CATEGORY_MASK, not
UProperty.GENERAL_CATEGORY. These include: "C" / "Other", "L" /
"Letter", "LC" / "Cased_Letter", "M" / "Mark", "N" / "Number", "P"
/ "Punctuation", "S" / "Symbol", and "Z" / "Separator".
Parameters: property - UProperty selector constant. UProperty.INT_START <= property < UProperty.INT_LIMIT or UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or UProperty.MASK_START <= property < UProperty.MASK_LIMIT. If out of range, null is returned. Parameters: value - selector for a value for the given property. In general, valid values range from 0 up to some maximum. There are a few exceptions: (1.) UProperty.BLOCK values begin at the non-zero value BASIC_LATIN.getID(). (2.) UProperty.CANONICAL_COMBINING_CLASS values are not contiguous and range from 0..240. (3.) UProperty.GENERAL_CATEGORY_MASK values are mask values produced by left-shifting 1 by UCharacter.getType(). This allows grouped categories such as [:L:] to be represented. Mask values are non-contiguous. Parameters: nameChoice - UProperty.NameChoice selector for which name to get. All values have a long name. Most have a short name, but some do not. Unicode allows for additional names; if present these will be returned by UProperty.NameChoice.LONG + i, where i=1, 2,... a name, or null if Unicode explicitly defines no name ("n/a") for a given property/value/nameChoice. If a given nameChoice throws an exception, then all larger values of nameChoice will throw an exception. If null is returned for a given nameChoice, then other nameChoice values may return non-null results. exception: IllegalArgumentException - thrown if property, value, or nameChoice are invalid. See Also: UProperty See Also: UProperty.NameChoice |
getStringPropertyValue | public static String getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice)(Code) | | Returns a string version of the property value.
Parameters: propertyEnum - Parameters: codepoint - Parameters: nameChoice - value as string |
getType | public static int getType(int ch)(Code) | | Returns a value indicating a code point's Unicode category.
Up-to-date Unicode implementation of java.lang.Character.getType()
except for the above mentioned code points that had their category
changed.
Return results are constants from the interface
UCharacterCategory
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. UCharacterCategory values
match the ones used in ICU4C, while java.lang.Character type
values, though similar, skip the value 17.
Parameters: ch - code point whose type is to be determined category which is a value of UCharacterCategory |
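Because the two numberings are not interchangeable, code that mixes java.lang.Character.getType results with UCharacterCategory constants needs an explicit conversion. The sketch below is only an illustration of the gap described in the note: it assumes the numberings agree up to 16 and differ solely because java.lang.Character leaves 17 unused. The class TypeValueSketch and both method names are hypothetical, and the assumption should be checked against the actual constants before relying on it.

```java
class TypeValueSketch {
    // Assumption (not confirmed by the text above beyond "skip the value 17"):
    // values 0..16 are identical, and Java values above 17 sit one higher
    // than the corresponding contiguous values.
    static int toContiguous(int javaType) {
        if (javaType == 17) {
            throw new IllegalArgumentException("17 is unused by java.lang.Character");
        }
        return javaType > 17 ? javaType - 1 : javaType;
    }

    // Inverse mapping: reopen the gap at 17.
    static int toJavaType(int contiguousType) {
        return contiguousType >= 17 ? contiguousType + 1 : contiguousType;
    }
}
```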
getTypeIterator | public static RangeValueIterator getTypeIterator()(Code) | | Gets an iterator for character types, iterating over codepoints.
Example of use:
RangeValueIterator iterator = UCharacter.getTypeIterator();
RangeValueIterator.Element element = new RangeValueIterator.Element();
while (iterator.next(element)) {
System.out.println("Codepoint \\u" +
Integer.toHexString(element.start) +
" to codepoint \\u" +
Integer.toHexString(element.limit - 1) +
" has the character type " +
element.value);
}
an iterator |
getUnicodeNumericValue | public static double getUnicodeNumericValue(int ch)(Code) | | Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
A "double" return type is necessary because some numeric values are
fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character
Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API has a
return type int and returns -1 when the argument ch does not have a
corresponding numeric value. This has been changed to synch with ICU4C.
This corresponds to the ICU4C function u_getNumericValue.
Parameters: ch - Code point to get the numeric value for. numeric value of ch, or NO_NUMERIC_VALUE if none is defined. |
getUnicodeVersion | public static VersionInfo getUnicodeVersion()(Code) | | Gets the version of Unicode data used.
the Unicode version number used |
hasBinaryProperty | public static boolean hasBinaryProperty(int ch, int property)(Code) | | Check a binary Unicode property for a code point.
Unicode, especially in version 3.2, defines many more properties
than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in
the Unicode Character Database (UCD) and Unicode Technical Reports
(UTR).
For details about the properties see
http://www.unicode.org/.
For names of Unicode properties see the UCD file
PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions
below 3.2, then properties marked with "new" are unavailable or
only partially available.
Parameters: ch - code point to test. Parameters: property - selector constant from com.ibm.icu.lang.UProperty, identifies which binary property to check. true or false according to the binary Unicode property value for ch. Also false if property is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point. See Also: com.ibm.icu.lang.UProperty |
isBMP | public static boolean isBMP(int ch)(Code) | | Determines if the code point is in the BMP (Basic Multilingual Plane).
Parameters: ch - code point to test true if code point is not a supplementary character |
isBaseForm | public static boolean isBaseForm(int ch)(Code) | | Determines whether the specified code point is of base form.
A code point of base form does not graphically combine with preceding
characters, and is neither a control nor a format character.
Parameters: ch - code point to be determined if it is of base form true if the code point is of base form |
isDefined | public static boolean isDefined(int ch)(Code) | | Determines if a code point has a defined meaning in the up-to-date
Unicode standard.
For example, a supplementary code point may have space allocated for it
without yet being defined in Unicode.
Up-to-date Unicode implementation of java.lang.Character.isDefined().
Parameters: ch - code point to be determined if it is defined in the most current version of Unicode true if this code point is defined in Unicode |
isDigit | public static boolean isDigit(int ch)(Code) | | Determines if a code point is a Java digit.
This method observes the semantics of
java.lang.Character.isDigit() . It returns true for decimal
digits only.
Semantic Change: In release 1.3.1 and prior, this treated
numeric letters and other numbers as digits.
This has been changed to conform to the java semantics.
Parameters: ch - code point to query true if this code point is a digit |
isHighSurrogate | public static boolean isHighSurrogate(char ch)(Code) | | Cover the JDK 1.5 API, for convenience.
Parameters: ch - the char to check true if ch is a high (lead) surrogate |
isISOControl | public static boolean isISOControl(int ch)(Code) | | Determines if the specified code point is an ISO control character.
A code point is considered to be an ISO control character if it is in
the range \u0000 through \u001F or in the range \u007F through
\u009F.
Up-to-date Unicode implementation of java.lang.Character.isISOControl().
Parameters: ch - code point to determine if it is an ISO control character true if code point is an ISO control character |
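The two ranges named above translate directly into a pair of comparisons. A minimal sketch, with the hypothetical class name IsoControlSketch; it restates the documented ranges rather than reproducing ICU4J's code.

```java
class IsoControlSketch {
    // True for the C0 controls (U+0000..U+001F) and for DEL plus the
    // C1 controls (U+007F..U+009F), the two ranges listed in the description.
    static boolean isISOControl(int ch) {
        return (ch >= 0x0000 && ch <= 0x001F)
            || (ch >= 0x007F && ch <= 0x009F);
    }
}
```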
isIdentifierIgnorable | public static boolean isIdentifierIgnorable(int ch)(Code) | | Determines if the specified code point should be regarded as an
ignorable character in a Unicode identifier.
A character is ignorable in the Unicode standard if it is of the type
Cf, Formatting code.
Up-to-date Unicode implementation of
java.lang.Character.isIdentifierIgnorable().
See UTR #8.
Parameters: ch - code point to be determined if it can be ignored in a Unicode identifier. true if the code point is ignorable |
isJavaIdentifierPart | public static boolean isJavaIdentifierPart(int cp)(Code) | | Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierPart.
Parameters: cp - the code point true if the code point can continue a java identifier. |
isJavaIdentifierStart | public static boolean isJavaIdentifierStart(int cp)(Code) | | Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierStart.
Parameters: cp - the code point true if the code point can start a java identifier. |
isJavaLetter | public static boolean isJavaLetter(int cp)(Code) | | Compatibility override of Java deprecated method. This
method will always remain deprecated. Delegates to
java.lang.Character.isJavaIdentifierStart.
Parameters: cp - the code point true if the code point can start a java identifier. |
isJavaLetterOrDigit | public static boolean isJavaLetterOrDigit(int cp)(Code) | | Compatibility override of Java deprecated method. This
method will always remain deprecated. Delegates to
java.lang.Character.isJavaIdentifierPart.
Parameters: cp - the code point true if the code point can continue a java identifier. |
isLegal | public static boolean isLegal(int ch)(Code) | | A code point is illegal if and only if
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0xXXFFFF or 0xXXFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
Parameters: ch - code point to determine if it is a legal code point by itself true if and only if legal. |
isLegal | public static boolean isLegal(String str)(Code) | | A string is legal iff all its code points are legal.
A code point is illegal if and only if
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0xXXFFFF or 0xXXFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
Parameters: str - containing code points to examine true if and only if legal. |
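The three conditions listed for isLegal can be sketched in plain Java as follows. This follows only the rules stated above and is not the library's source; the class name, the helper names, and the local MAX_VALUE constant (assumed to be 0x10FFFF, the top of the Unicode code space) are illustrative.

```java
class LegalCodePointSketch {
    // Assumption: mirrors UCharacter.MAX_VALUE, the largest Unicode code point.
    static final int MAX_VALUE = 0x10FFFF;

    // Hypothetical helper applying the three documented conditions in order.
    static boolean isLegalCodePoint(int cp) {
        if (cp < 0 || cp > MAX_VALUE) {
            return false;                      // out of bounds
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                      // surrogate value
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;                      // low 16 bits are FFFE or FFFF
        }
        return true;
    }

    // A string is legal if every code point in it is legal.
    static boolean isLegalString(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);       // an unpaired surrogate is returned as itself
            if (!isLegalCodePoint(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegalCodePoint(0x1F600));
        System.out.println(isLegalString("a\uD800"));
    }
}
```

Iterating with codePointAt and charCount keeps well-formed surrogate pairs together, so only unpaired surrogates are rejected by the string check.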
isLetter | public static boolean isLetter(int ch)(Code) | | Determines if the specified code point is a letter.
Up-to-date Unicode implementation of java.lang.Character.isLetter()
Parameters: ch - code point to determine if it is a letter true if code point is a letter |
isLetterOrDigit | public static boolean isLetterOrDigit(int ch)(Code) | | Determines if the specified code point is a letter or digit.
Note that this method, unlike java.lang.Character, does not regard the ASCII
characters 'A' - 'Z' and 'a' - 'z' as digits.
Parameters: ch - code point to determine if it is a letter or a digit true if code point is a letter or a digit |
isLowSurrogate | public static boolean isLowSurrogate(char ch)(Code) | | Cover the JDK 1.5 API, for convenience.
Parameters: ch - the char to check true if ch is a low (trail) surrogate |
isLowerCase | public static boolean isLowerCase(int ch)(Code) | | Determines if the specified code point is a lowercase character.
UnicodeData only contains case mappings for code points where they are
one-to-one mappings; it also omits information about context-sensitive
case mappings. For more information about Unicode case mapping
please refer to the Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isLowerCase()
Parameters: ch - code point to determine if it is in lowercase true if code point is a lowercase character |
isMirrored | public static boolean isMirrored(int ch)(Code) | | Determines whether the code point has the "mirrored" property.
This property is set for characters that are commonly used in
Right-To-Left contexts and need to be displayed with a "mirrored"
glyph.
Parameters: ch - code point whose mirror is to be determined true if the code point has the "mirrored" property |
isPrintable | public static boolean isPrintable(int ch)(Code) | | Determines whether the specified code point is a printable character
according to the Unicode standard.
Parameters: ch - code point to be determined if it is printable true if the code point is a printable character |
isSpace | public static boolean isSpace(int ch)(Code) | | Compatibility override of Java deprecated method. This
method will always remain deprecated. Delegates to
java.lang.Character.isSpace.
Parameters: ch - the code point true if the code point is a space character as defined by java.lang.Character.isSpace. |
isSpaceChar | public static boolean isSpaceChar(int ch)(Code) | | Determines if the specified code point is a Unicode specified space
character, i.e. if the code point is in category Zs, Zl, or Zp.
Up-to-date Unicode implementation of java.lang.Character.isSpaceChar().
Parameters: ch - code point to determine if it is a space true if the specified code point is a space character |
isSupplementary | public static boolean isSupplementary(int ch)(Code) | | Determines if the code point is a supplementary character.
A code point is a supplementary character if and only if it is greater
than or equal to SUPPLEMENTARY_MIN_VALUE
Parameters: ch - code point to be determined if it is in the supplementary plane true if code point is a supplementary character |
isSupplementaryCodePoint | final public static boolean isSupplementaryCodePoint(int cp)(Code) | | Cover the JDK 1.5 API, for convenience.
Parameters: cp - the code point to check true if cp is a supplementary code point |
isSurrogatePair | final public static boolean isSurrogatePair(char high, char low)(Code) | | Cover the JDK 1.5 API, for convenience. Return true if the chars
form a valid surrogate pair.
Parameters: high - the high (lead) char Parameters: low - the low (trail) char true if high, low form a surrogate pair |
isTitleCase | public static boolean isTitleCase(int ch)(Code) | | Determines if the specified code point is a titlecase character.
UnicodeData only contains case mappings for code points where they are
one-to-one mappings; it also omits information about context-sensitive
case mappings.
For more information about Unicode case mapping please refer to the
Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isTitleCase().
Parameters: ch - code point to determine if it is in title case true if the specified code point is a titlecase character |
isUAlphabetic | public static boolean isUAlphabetic(int ch)(Code) | | Check if a code point has the Alphabetic Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).
Different from UCharacter.isLetter(ch)!
Parameters: ch - codepoint to be tested |
isULowercase | public static boolean isULowercase(int ch)(Code) | | Check if a code point has the Lowercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).
This is different from UCharacter.isLowerCase(ch)!
Parameters: ch - codepoint to be tested |
isUUppercase | public static boolean isUUppercase(int ch)(Code) | | Check if a code point has the Uppercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).
This is different from UCharacter.isUpperCase(ch)!
Parameters: ch - codepoint to be tested |
isUWhiteSpace | public static boolean isUWhiteSpace(int ch)(Code) | | Check if a code point has the White_Space Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).
This is different from both UCharacter.isSpace(ch) and
UCharacter.isWhitespace(ch)!
Parameters: ch - codepoint to be tested |
isUnicodeIdentifierPart | public static boolean isUnicodeIdentifierPart(int ch)(Code) | | Determines if the specified code point may be any part of a Unicode
identifier other than the starting character.
A code point may be part of a Unicode identifier if and only if it is
one of the following:
- Lu Uppercase letter
- Ll Lowercase letter
- Lt Titlecase letter
- Lm Modifier letter
- Lo Other letter
- Nl Letter number
- Pc Connecting punctuation character
- Nd decimal number
- Mc Spacing combining mark
- Mn Non-spacing mark
- Cf formatting code
Up-to-date Unicode implementation of
java.lang.Character.isUnicodeIdentifierPart().
See UTR #8.
Parameters: ch - code point to determine if it can be part of a Unicode identifier true if code point is any character belonging to a Unicode identifier suffix after the first character |
isUnicodeIdentifierStart | public static boolean isUnicodeIdentifierStart(int ch)(Code) | | Determines if the specified code point is permissible as the first
character in a Unicode identifier.
A code point may start a Unicode identifier if it is of type either
- Lu Uppercase letter
- Ll Lowercase letter
- Lt Titlecase letter
- Lm Modifier letter
- Lo Other letter
- Nl Letter number
Up-to-date Unicode implementation of
java.lang.Character.isUnicodeIdentifierStart().
See UTR #8.
Parameters: ch - code point to determine if it can start a Unicode identifier true if code point is the first character belonging to a Unicode identifier |
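The start and part rules combine naturally into a whole-identifier check. The sketch below uses the java.lang.Character methods that these two entries are described as up-to-date implementations of, so it runs on a plain JDK; results for recently assigned characters may differ from UCharacter. The class and method names are hypothetical.

```java
class UnicodeIdentifierCheck {
    // Hypothetical helper: first code point must be a start character,
    // every following code point must be a part character.
    static boolean isUnicodeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("abc1"));
        System.out.println(isUnicodeIdentifier("1abc"));
        System.out.println(isUnicodeIdentifier("a_b"));
    }
}
```

The underscore is connecting punctuation (Pc), so it may continue an identifier but may not start one.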
isUpperCase | public static boolean isUpperCase(int ch)(Code) | | Determines if the specified code point is an uppercase character.
UnicodeData only contains case mappings for code points where they are
one-to-one mappings; it also omits information about context-sensitive
case mappings.
For language specific case conversion behavior, use
toUpperCase(locale, str).
For example, the case conversion for dot-less i and dotted I in Turkish,
or for final sigma in Greek.
For more information about Unicode case mapping please refer to the
Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isUpperCase().
Parameters: ch - code point to determine if it is in uppercase true if the code point is an uppercase character |
isValidCodePoint | final public static boolean isValidCodePoint(int cp)(Code) | | Cover the JDK 1.5 API, for convenience.
Parameters: cp - the code point to check true if cp is a valid code point |
isWhitespace | public static boolean isWhitespace(int ch)(Code) | | Determines if the specified code point is a white space character.
A code point is considered to be a whitespace character if and only
if it satisfies one of the following criteria:
- It is a Unicode space separator (category "Zs"), but is not
a no-break space (\u00A0 or \u202F or \uFEFF).
- It is a Unicode line separator (category "Zl").
- It is a Unicode paragraph separator (category "Zp").
- It is \u0009, HORIZONTAL TABULATION.
- It is \u000A, LINE FEED.
- It is \u000B, VERTICAL TABULATION.
- It is \u000C, FORM FEED.
- It is \u000D, CARRIAGE RETURN.
- It is \u001C, FILE SEPARATOR.
- It is \u001D, GROUP SEPARATOR.
- It is \u001E, RECORD SEPARATOR.
- It is \u001F, UNIT SEPARATOR.
This API aims to match the semantics of the Java API,
java.lang.Character.isWhitespace().
Parameters: ch - code point to determine if it is a white space true if the specified code point is a white space character |
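Since this method is described as following java.lang.Character.isWhitespace(), the difference between "whitespace" and "space character" can be shown with the JDK methods alone. This is an illustrative sketch; the class name and the classify helper are not part of any library.

```java
class WhitespaceVsSpaceChar {
    // Hypothetical helper: reports which of the two JDK predicates accept a code point.
    static String classify(int cp) {
        boolean whitespace = Character.isWhitespace(cp);
        boolean spaceChar = Character.isSpaceChar(cp);
        if (whitespace && spaceChar) {
            return "both";
        }
        if (whitespace) {
            return "whitespace only";
        }
        if (spaceChar) {
            return "space char only";
        }
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(classify(' '));      // SPACE
        System.out.println(classify('\t'));     // HORIZONTAL TABULATION
        System.out.println(classify(0x00A0));   // NO-BREAK SPACE
        System.out.println(classify(0x001C));   // FILE SEPARATOR
    }
}
```

Control characters such as tab are whitespace without being in a Z category, while the no-break space is in category Zs but is excluded from whitespace.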
offsetByCodePoints | public static int offsetByCodePoints(CharSequence text, int index, int codePointOffset)(Code) | | Cover the JDK API, for convenience. Adjust the char index by a code point offset.
Parameters: text - the characters to check Parameters: index - the index to adjust Parameters: codePointOffset - the number of code points by which to offset the index the adjusted index |
offsetByCodePoints | public static int offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset)(Code) | | Cover the JDK API, for convenience. Adjust the char index by a code point offset.
Parameters: text - the characters to check Parameters: start - the start of the range to check Parameters: count - the length of the range to check Parameters: index - the index to adjust Parameters: codePointOffset - the number of code points by which to offset the index the adjusted index |
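Both overloads are covers for JDK methods, so their behavior can be sketched with java.lang.Character.offsetByCodePoints directly. The example string and the class and helper names below are illustrative only.

```java
class CodePointOffsets {
    // Hypothetical helper: char index at which the n-th code point (0-based) starts.
    static int charIndexOfNthCodePoint(CharSequence text, int n) {
        return Character.offsetByCodePoints(text, 0, n);
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 as a surrogate pair (two chars), then 'b'.
        String text = "a\uD83D\uDE00b";
        for (int n = 0; n <= 3; n++) {
            System.out.println(n + " -> " + charIndexOfNthCodePoint(text, n));
        }
        // The char[] overload restricts the scan to a subrange of the array.
        char[] chars = text.toCharArray();
        System.out.println(Character.offsetByCodePoints(chars, 0, chars.length, 0, 2));
    }
}
```

Stepping over the supplementary character advances the char index by two, which is why code point offsets and char offsets diverge after it.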
toChars | final public static int toChars(int cp, char[] dst, int dstIndex)(Code) | | Cover the JDK 1.5 API, for convenience. Writes the chars representing the
code point into the destination at the given index.
Parameters: cp - the code point to convert Parameters: dst - the destination array into which to put the char(s) representing the code point Parameters: dstIndex - the index at which to put the first (or only) char the count of the number of chars written (1 or 2) throws: IllegalArgumentException - if cp is not a valid code point |
toChars | final public static char[] toChars(int cp)(Code) | | Cover the JDK 1.5 API, for convenience. Returns a char array
representing the code point.
Parameters: cp - the code point to convert an array containing the char(s) representing the code point throws: IllegalArgumentException - if cp is not a valid code point |
toCodePoint | final public static int toCodePoint(char high, char low)(Code) | | Cover the JDK 1.5 API, for convenience. Return the code point represented by
the characters. This does not check the surrogate pair for validity.
Parameters: high - the high (lead) surrogate Parameters: low - the low (trail) surrogate the code point formed by the surrogate pair |
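The toChars, isSurrogatePair and toCodePoint entries are covers for JDK 1.5 methods, so a round trip can be sketched with java.lang.Character. The manual formula is the standard UTF-16 decoding arithmetic, shown for comparison; like toCodePoint, it does not validate its arguments. Class and method names are hypothetical.

```java
class SurrogateRoundTrip {
    // Hypothetical helper: encode a code point to UTF-16 chars and decode it again.
    static int roundTrip(int cp) {
        char[] units = Character.toChars(cp);   // 1 char for the BMP, 2 otherwise
        if (units.length == 1) {
            return units[0];
        }
        if (!Character.isSurrogatePair(units[0], units[1])) {
            throw new IllegalStateException("expected a lead/trail surrogate pair");
        }
        return Character.toCodePoint(units[0], units[1]);
    }

    // Standard UTF-16 decoding arithmetic; no validity check is performed.
    static int manualToCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(roundTrip(0x1F600)));
        System.out.println(Integer.toHexString(manualToCodePoint('\uD83D', '\uDE00')));
    }
}
```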
toLowerCase | public static int toLowerCase(int ch)(Code) | | The given code point is mapped to its lowercase equivalent; if the code
point has no lowercase equivalent, the code point itself is returned.
Up-to-date Unicode implementation of java.lang.Character.toLowerCase()
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They take into account the string context and the language and can map
to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - code point whose lowercase equivalent is to be retrieved the lowercase equivalent code point |
toLowerCase | public static String toLowerCase(String str)(Code) | | Gets lowercase version of the argument string.
Casing is dependent on the default locale and context-sensitive.
Parameters: str - source string to convert lowercase version of the argument string |
toLowerCase | public static String toLowerCase(Locale locale, String str)(Code) | | Gets lowercase version of the argument string.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert lowercase version of the argument string |
toLowerCase | public static String toLowerCase(ULocale locale, String str)(Code) | | Gets lowercase version of the argument string.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert lowercase version of the argument string |
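Locale-dependent lowercasing is easiest to see with Turkish, where capital I lowercases to dotless i. The sketch below uses the JDK's String.toLowerCase(Locale) as a stand-in so that it runs without icu4j.jar; it is assumed, not verified here, that the UCharacter string overloads give the same result for this input. The class and helper names are illustrative.

```java
import java.util.Locale;

class LocaleSensitiveLowercase {
    // Hypothetical helper: lowercase a string under a given BCP 47 language tag.
    static String lower(String languageTag, String str) {
        return str.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        System.out.println(lower("en", "TITLE"));   // ASCII i
        System.out.println(lower("tr", "TITLE"));   // dotless i (U+0131) in place of i
    }
}
```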
toString | public static String toString(int ch)(Code) | | Converts argument code point and returns a String object representing
the code point's value in UTF16 format.
The result is a string whose length is 1 for non-supplementary code
points, 2 otherwise.
com.ibm.icu.text.UTF16 can be used to parse Strings generated by this
function.
Up-to-date Unicode implementation of java.lang.Character.toString()
Parameters: ch - code point string representation of the code point, null if code point is not defined in Unicode |
toTitleCase | public static int toTitleCase(int ch)(Code) | | Converts the code point argument to titlecase.
If no titlecase is available, the uppercase is returned. If no uppercase
is available, the code point itself is returned.
Up-to-date Unicode implementation of java.lang.Character.toTitleCase()
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They take into account the string context and the language and can map
to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - code point whose title case is to be retrieved titlecase code point |
toTitleCase | public static String toTitleCase(String str, BreakIterator breakiter)(Code) | | Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break
iterator, so the caller can customize the break iterator for
specialized titlecasing. In this case only forward iteration
needs to be implemented.
If the break iterator passed in is null, the default Unicode algorithm
will be used to determine the titlecase positions.
Only positions returned by the break iterator will be titlecased;
characters in between the positions will all be lowercased.
Casing is dependent on the default locale and context-sensitive.
Parameters: str - source string to convert Parameters: breakiter - break iterator to determine the positions at which the character should be titlecased. titlecase version of the argument string |
toTitleCase | public static String toTitleCase(Locale locale, String str, BreakIterator breakiter)(Code) | | Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break
iterator, so the caller can customize the break iterator for
specialized titlecasing. In this case only forward iteration
needs to be implemented.
If the break iterator passed in is null, the default Unicode algorithm
will be used to determine the titlecase positions.
Only positions returned by the break iterator will be titlecased;
characters in between the positions will all be lowercased.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert Parameters: breakiter - break iterator to determine the positions at which the character should be titlecased. titlecase version of the argument string |
toTitleCase | public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter)(Code) | | Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break
iterator, so the caller can customize the break iterator for
specialized titlecasing. In this case only forward iteration
needs to be implemented.
If the break iterator passed in is null, the default Unicode algorithm
will be used to determine the titlecase positions.
Only positions returned by the break iterator will be titlecased;
characters in between the positions will all be lowercased.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert Parameters: titleIter - break iterator to determine the positions at which the character should be titlecased. titlecase version of the argument string |
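The rule described for the toTitleCase string overloads (titlecase at each position the break iterator returns, lowercase everything in between) can be approximated with JDK classes only. This is a rough sketch, not the ICU implementation: it uses java.text.BreakIterator instead of the ICU break iterator, applies only the simple single-code-point titlecase mapping at each boundary, and the class and method names are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

class TitleCaseSketch {
    // Hypothetical helper: titlecase the code point at each boundary, lowercase the rest.
    static String titleCase(Locale locale, String str, BreakIterator iter) {
        iter.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int start = iter.first();
        // Only forward iteration (first/next) is used.
        for (int end = iter.next(); end != BreakIterator.DONE; start = end, end = iter.next()) {
            int first = str.codePointAt(start);
            out.appendCodePoint(Character.toTitleCase(first));
            int restStart = start + Character.charCount(first);
            out.append(str.substring(restStart, end).toLowerCase(locale));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(titleCase(Locale.ROOT, "hello WORLD", words));
    }
}
```

Passing a different iterator, for example a sentence instance, changes which positions are titlecased without touching the casing logic.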
toUpperCase | public static int toUpperCase(int ch)(Code) | | Converts the character argument to uppercase.
If no uppercase is available, the character itself is returned.
Up-to-date Unicode implementation of java.lang.Character.toUpperCase()
This function only returns the simple, single-code point case mapping.
Full case mappings should be used whenever possible because they produce
better results by working on whole strings.
They take into account the string context and the language and can map
to a result string with a different length as appropriate.
Full case mappings are applied by the case mapping functions
that take String parameters rather than code points (int).
See also the User Guide chapter on C/POSIX migration:
http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch - code point whose uppercase is to be retrieved uppercase code point |
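The difference between the simple, single-code-point mapping and a full string mapping shows up with LATIN SMALL LETTER SHARP S (U+00DF), which has no single-code-point uppercase. The sketch uses JDK methods (Character.toUpperCase and String.toUpperCase) as stand-ins so that it runs without icu4j.jar; the class and helper names are hypothetical.

```java
import java.util.Locale;

class SimpleVsFullCaseMapping {
    // Simple mapping: one code point in, one code point out.
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    // Full mapping: works on the whole string and may change its length.
    static String fullUpper(String str) {
        return str.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(simpleUpper(0x00DF)));  // unchanged code point
        System.out.println(fullUpper("stra\u00DFe"));                  // sharp s expands to SS
    }
}
```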
toUpperCase | public static String toUpperCase(String str)(Code) | | Gets uppercase version of the argument string.
Casing is dependent on the default locale and context-sensitive.
Parameters: str - source string to convert uppercase version of the argument string |
toUpperCase | public static String toUpperCase(Locale locale, String str)(Code) | | Gets uppercase version of the argument string.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert uppercase version of the argument string |
toUpperCase | public static String toUpperCase(ULocale locale, String str)(Code) | | Gets uppercase version of the argument string.
Casing is dependent on the argument locale and context-sensitive.
Parameters: locale - locale in which the string is to be converted Parameters: str - source string to convert uppercase version of the argument string |