| java.lang.Object sun.text.normalizer.UCharacter
UCharacter | final public class UCharacter (Code) | |
The UCharacter class provides extensions to the
java.lang.Character class. These extensions provide support for
Unicode 3.2 properties and together with the UTF16
class, provide support for supplementary characters (those with code
points above U+FFFF).
Code points are represented in these APIs as ints. While it would be
more convenient in Java to have a separate primitive datatype for them,
ints suffice in the meantime.
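As a rough illustration of code points carried in ints, the sketch below uses only java.lang.Character and java.lang.String (Java 5 and later), not UCharacter itself. The class name CodePointDemo and the helper countCodePoints are hypothetical names chosen for this example.

```java
final class CodePointDemo {
    /** Counts Unicode code points (not UTF-16 chars) in a string. */
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so UTF-16 stores it as two chars (a surrogate pair).
        int cp = 0x1F600;
        String s = "a" + new String(Character.toChars(cp)) + "b";

        System.out.println(s.length());          // number of UTF-16 chars
        System.out.println(countCodePoints(s));  // number of code points
        // Reading the int back from index 1 yields the full supplementary code point.
        System.out.println(Character.isSupplementaryCodePoint(s.codePointAt(1)));
    }
}
```

The int returned by codePointAt can be passed to any API that accepts a code point, which is the same convention the methods on this page follow.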
To use this class, add the jar file icu4j.jar to the class path, since it
contains the data files that supply the information used by this class.
For example, on Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
Alternatively, copy the files uprops.dat and unames.icu from the icu4j
source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class
directory $ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode 3.1
properties, the main differences between UCharacter and Character are:
- UCharacter is not designed to be a char wrapper and does not have
APIs that manage a single wrapped char.
These include:
- char charValue(),
- int compareTo(java.lang.Character, java.lang.Character), etc.
- UCharacter does not include Character APIs that are deprecated, nor
does it include the Java-specific character information, such as
boolean isJavaIdentifierPart(char ch).
- Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric
values 10 - 35. UCharacter also does this in digit and
getNumericValue, to adhere to the Java semantics of these
methods. The new methods unicodeDigit and
getUnicodeNumericValue do not treat the above code points
as having numeric values. This is a semantic change from ICU4J 1.3.1.
Further detailed differences can be determined from the program
com.ibm.icu.dev.test.lang.UCharacterCompare.
This class is not subclassable.
author: Syn Wee Quek
See Also: com.ibm.icu.lang.UCharacterEnums |
Inner Class :public static interface NumericType | |
Inner Class :public static interface HangulSyllableType | |
Inner Class :public static interface ECharacterCategory | |
Field Summary | |
final public static int | MAX_VALUE The highest Unicode code point value (scalar value) according to the Unicode Standard. |
final public static int | MIN_VALUE The lowest Unicode code point value. |
final public static double | NO_NUMERIC_VALUE Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point. |
final public static int | SUPPLEMENTARY_MIN_VALUE |
Method Summary | |
public static int | digit(int ch, int radix) Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of
java.lang.Character.digit(). |
public static String | foldCase(String str, boolean defaultmapping) The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted
Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped. |
public static VersionInfo | getAge(int ch) Get the "age" of the code point.
The "age" is the Unicode version when the code point was first
designated (as a non-character or for Private Use) or assigned a
character.
This can be useful to avoid emitting code points to receiving
processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
Parameters: ch - The code point. |
public static int | getCodePoint(char lead, char trail) Returns a code point corresponding to the two UTF16 characters. |
public static int | getDirection(int ch) Returns the bidirectional property of a code point. |
public static int | getIntPropertyValue(int ch, int type) Gets the property value for a Unicode property type of a code point. |
public static int | getType(int ch) Returns a value indicating a code point's Unicode category.
Up-to-date Unicode implementation of java.lang.Character.getType()
except for the above mentioned code points that had their category
changed.
Return results are constants from the interface
UCharacterCategory
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. |
public static double | getUnicodeNumericValue(int ch) Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
A "double" return type is necessary because some numeric values are
fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character
Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API had a
return type of int and returned -1 when the argument ch did not have a
corresponding numeric value. |
MAX_VALUE | final public static int MAX_VALUE(Code) | | The highest Unicode code point value (scalar value) according to the
Unicode Standard.
This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
|
MIN_VALUE | final public static int MIN_VALUE(Code) | | The lowest Unicode code point value.
|
NO_NUMERIC_VALUE | final public static double NO_NUMERIC_VALUE(Code) | | Special value that is returned by getUnicodeNumericValue(int) when no
numeric value is defined for a code point.
See Also: UCharacter.getUnicodeNumericValue |
SUPPLEMENTARY_MIN_VALUE | final public static int SUPPLEMENTARY_MIN_VALUE(Code) | | The minimum value for Supplementary code points
|
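The range constants above can be checked against their java.lang.Character counterparts. The sketch below assumes (it is not stated on this page) that MAX_VALUE, MIN_VALUE and SUPPLEMENTARY_MIN_VALUE hold the same values as Character.MAX_CODE_POINT, Character.MIN_CODE_POINT and Character.MIN_SUPPLEMENTARY_CODE_POINT. The class name CodePointRange and the helper bitsNeeded are hypothetical.

```java
final class CodePointRange {
    /** Number of bits needed to represent a positive int value. */
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        // Standard-library counterparts of the constants documented above (assumed equal).
        System.out.println(Integer.toHexString(Character.MIN_CODE_POINT));
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));
        System.out.println(Integer.toHexString(Character.MIN_SUPPLEMENTARY_CODE_POINT));

        // Confirms the "21-bit value" remark for the highest code point.
        System.out.println(bitsNeeded(Character.MAX_CODE_POINT));
    }
}
```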
digit | public static int digit(int ch, int radix)(Code) | | Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of
java.lang.Character.digit(). Note that this
will return positive values for code points for which isDigit
returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and
prior, this did not treat the European letters as having a
digit value, and also treated numeric letters and other numbers as
digits.
This has been changed to conform to the java semantics.
A code point is a valid digit if and only if:
- ch is a decimal digit or one of the European letters, and
- the value of ch is less than the specified radix.
Parameters: ch - the code point to query
Parameters: radix - the radix
Returns: the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix |
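Since digit is documented as following the semantics of java.lang.Character.digit, the two rules above can be tried out with the standard-library method. This is only a sketch against java.lang.Character, not a call into UCharacter; the class name DigitDemo and the helper digitOf are hypothetical.

```java
final class DigitDemo {
    /** Thin wrapper over the code point overload of Character.digit. */
    static int digitOf(int codePoint, int radix) {
        return Character.digit(codePoint, radix);
    }

    public static void main(String[] args) {
        System.out.println(digitOf('7', 10));  // decimal digit below the radix
        System.out.println(digitOf('A', 16));  // letter treated as a digit in radix 16
        System.out.println(digitOf('z', 36));  // highest letter value in radix 36
        System.out.println(digitOf('A', 10));  // value 10 is too large for radix 10
        // The letter has a digit value even though isDigit reports false.
        System.out.println(Character.isDigit('A'));
    }
}
```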
foldCase | public static String foldCase(String str, boolean defaultmapping)(Code) | | The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
"Full", multiple-code point case folding mappings are returned here.
For "simple" single-code point mappings use the API
foldCase(int ch, boolean defaultmapping).
Parameters: str - the String to be converted
Parameters: defaultmapping - Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped.
Returns: the case folding equivalent of the character, if any; otherwise the character itself.
See Also: UCharacter.foldCase(int,boolean) |
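The Java standard library has no case folding method, so the following is only an approximation of what a "full" multi-code-point folding looks like, written with String.toUpperCase and String.toLowerCase under the root locale. It is not the algorithm used by foldCase, it does not read CaseFolding.txt, and it has no equivalent of the defaultmapping option. The names FoldSketch and approximateFold are hypothetical.

```java
final class FoldSketch {
    /**
     * Approximates full case folding by upper-casing and then lower-casing
     * with the locale-neutral root locale. This is a stand-in, not true
     * Unicode case folding.
     */
    static String approximateFold(String s) {
        return s.toUpperCase(java.util.Locale.ROOT).toLowerCase(java.util.Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) expands to two letters, so the result is longer than the input.
        System.out.println(approximateFold("Stra\u00DFe"));
        System.out.println(approximateFold("HELLO"));
    }
}
```

The sharp s case shows why the String overload exists alongside the int overload: a single-code-point mapping cannot return a two-letter result.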
getAge | public static VersionInfo getAge(int ch)(Code) | | Get the "age" of the code point.
The "age" is the Unicode version when the code point was first
designated (as a non-character or for Private Use) or assigned a
character.
This can be useful to avoid emitting code points to receiving
processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
Parameters: ch - The code point.
Returns: the Unicode version number |
getCodePoint | public static int getCodePoint(char lead, char trail)(Code) | | Returns a code point corresponding to the two UTF16 characters.
Parameters: lead - the lead char
Parameters: trail - the trail char
Returns: code point if surrogate characters are valid.
exception: IllegalArgumentException - thrown when the argument characters do not form a valid code point |
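The documented behaviour (combine a lead and trail surrogate, reject invalid pairs) can be sketched with java.lang.Character. Character.toCodePoint does not validate its arguments, so the hypothetical helper codePointOf below adds the check itself; it illustrates the contract and is not the UCharacter implementation.

```java
final class SurrogateDemo {
    /**
     * Combines a lead (high) and trail (low) surrogate into one code point.
     * Throws IllegalArgumentException if the two chars are not a valid pair.
     */
    static int codePointOf(char lead, char trail) {
        if (!Character.isSurrogatePair(lead, trail)) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        return Character.toCodePoint(lead, trail);
    }

    public static void main(String[] args) {
        char lead = (char) 0xD83D;
        char trail = (char) 0xDE00;
        // Prints the combined code point in hexadecimal.
        System.out.println(Integer.toHexString(codePointOf(lead, trail)));
    }
}
```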
getDirection | public static int getDirection(int ch)(Code) | | Returns the bidirectional property of a code point.
For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional
property.
The result returned belongs to the interface
UCharacterDirection.
Parameters: ch - the code point whose direction is to be determined
Returns: direction constant from UCharacterDirection. |
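For comparison, java.lang.Character exposes the same kind of lookup through getDirectionality. The sketch below uses the Character.DIRECTIONALITY_* constants, which are not claimed here to match the UCharacterDirection values; DirectionDemo and its helpers are hypothetical names.

```java
final class DirectionDemo {
    /** True if the code point has the strong left-to-right directionality. */
    static boolean isLeftToRight(int codePoint) {
        return Character.getDirectionality(codePoint) == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    /** True if the code point has a strong right-to-left directionality. */
    static boolean isRightToLeft(int codePoint) {
        byte d = Character.getDirectionality(codePoint);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(isLeftToRight(0x0041)); // LATIN CAPITAL LETTER A
        System.out.println(isRightToLeft(0x05D0)); // HEBREW LETTER ALEF
        System.out.println(isRightToLeft(0x0627)); // ARABIC LETTER ALEF
    }
}
```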
getIntPropertyValue | public static int getIntPropertyValue(int ch, int type)(Code) | | Gets the property value for a Unicode property type of a code point.
Also returns binary and mask property values.
Unicode, especially in version 3.2, defines many more properties than
the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as
defined in the Unicode Character Database (UCD) and Unicode Technical
Reports (UTR). For details about the properties see
http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
boolean b = (ideo == 1);
Parameters: ch - code point to test.
Parameters: type - UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT.
Returns: numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to enum type if necessary). Returns 0 or 1 (for false / true) for binary Unicode properties. Returns a bit-mask for mask properties. Returns 0 if 'type' is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.
See Also: UProperty
See Also: UCharacter.hasBinaryProperty
See Also: UCharacter.getIntPropertyMinValue
See Also: UCharacter.getIntPropertyMaxValue
See Also: UCharacter.getUnicodeVersion |
getType | public static int getType(int ch)(Code) | | Returns a value indicating a code point's Unicode category.
Up-to-date Unicode implementation of java.lang.Character.getType()
except for the above mentioned code points that had their category
changed.
Return results are constants from the interface
UCharacterCategory
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. UCharacterCategory values
match the ones used in ICU4C, while java.lang.Character type
values, though similar, skip the value 17.
Parameters: ch - code point whose type is to be determined
Returns: category which is a value of UCharacterCategory |
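The remark about java.lang.Character skipping the value 17 can be observed directly from the standard-library constants. The sketch below only inspects java.lang.Character; it does not show UCharacterCategory values. TypeDemo and javaSkipsSeventeen are hypothetical names.

```java
final class TypeDemo {
    /**
     * True if the java.lang.Character category constants jump from 16 (FORMAT)
     * to 18 (PRIVATE_USE), leaving 17 unused.
     */
    static boolean javaSkipsSeventeen() {
        return Character.FORMAT == 16 && Character.PRIVATE_USE == 18;
    }

    public static void main(String[] args) {
        // Category lookups with the code point overload of Character.getType.
        System.out.println(Character.getType((int) 'A') == Character.UPPERCASE_LETTER);
        System.out.println(Character.getType((int) '5') == Character.DECIMAL_DIGIT_NUMBER);
        System.out.println(javaSkipsSeventeen());
    }
}
```

Because of that gap, a category value obtained from one API should not be compared against constants from the other.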
getUnicodeNumericValue | public static double getUnicodeNumericValue(int ch)(Code) | | Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
A "double" return type is necessary because some numeric values are
fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character
Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API had a
return type of int and returned -1 when the argument ch did not have a
corresponding numeric value. This has been changed to synch with ICU4C.
This corresponds to the ICU4C function u_getNumericValue.
Parameters: ch - Code point to get the numeric value for.
Returns: numeric value of ch, or NO_NUMERIC_VALUE if none is defined. |
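To see why an int return type is too narrow, it helps to look at what the int-returning java.lang.Character.getNumericValue does with a fraction. The sketch below exercises only the standard-library method, which reports -2 for a value it cannot express as a non-negative int and -1 for no value at all. NumericValueDemo and javaNumericValue are hypothetical names.

```java
final class NumericValueDemo {
    /** Thin wrapper over the code point overload of Character.getNumericValue. */
    static int javaNumericValue(int codePoint) {
        return Character.getNumericValue(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(javaNumericValue('7'));     // ordinary decimal digit
        System.out.println(javaNumericValue(0x216C));  // ROMAN NUMERAL FIFTY
        System.out.println(javaNumericValue(0x00BD));  // VULGAR FRACTION ONE HALF: not an int
        System.out.println(javaNumericValue('%'));     // no numeric value
    }
}
```

A double-returning method can report one half as 0.5 instead of collapsing it into an error code, which is the motivation given above.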