| java.lang.Object com.ibm.icu.impl.UCharacterName
UCharacterName | final public class UCharacterName (Code) | | Internal class to manage character names.
Since the name data is stored
in an array of char, indexes used in this class refer by default to
a 2-byte (char) count, unless otherwise stated. Where an index refers
to a byte count, the index is halved and, depending on whether the index is
even or odd, the MSB or LSB of the char at the halved index is
returned. For an index into an array of int, the index is multiplied by 2, and
the char at the multiplied index and its following char are returned as an
int.
UCharacter acts as a public facade for this class.
Note: 0 - 0x1F are control characters without names in Unicode 3.0.
author: Syn Wee Quek since: nov0700 |
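The indexing convention described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the names CharArrayIndexing, byteAt and intAt are hypothetical, and placing the first char in the high 16 bits of the int is an assumption that the description does not state.

```java
final class CharArrayIndexing {
    // Reads one byte from data stored as char: the byte index is halved,
    // an even index selects the MSB of that char and an odd index the LSB.
    static int byteAt(char[] data, int byteIndex) {
        char c = data[byteIndex >> 1];
        return (byteIndex & 1) == 0 ? (c >> 8) : (c & 0xFF);
    }

    // Reads one int from data stored as char: the int index is doubled and
    // the char there is combined with the following char.
    // Assumption: the first char holds the high 16 bits.
    static int intAt(char[] data, int intIndex) {
        int i = intIndex << 1;
        return (data[i] << 16) | data[i + 1];
    }

    public static void main(String[] args) {
        char[] data = { 0x1234, 0x5678 };
        System.out.println(Integer.toHexString(byteAt(data, 0))); // 12
        System.out.println(Integer.toHexString(byteAt(data, 1))); // 34
        System.out.println(Integer.toHexString(intAt(data, 0)));  // 12345678
    }
}
```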
Inner Class :final static class AlgorithmName | |
Method Summary | |
public int | getAlgorithmEnd(int index) | public int | getAlgorithmLength() | public String | getAlgorithmName(int index, int codepoint) | public int | getAlgorithmStart(int index) | public int | getCharFromName(int choice, String name) | public void | getCharNameCharacters(UnicodeSet set) Fills set with characters that are used in Unicode character names.
Equivalent to uprv_getCharNameCharacters.
Parameters: set - USet to receive characters. | public static int | getCodepointMSB(int codepoint) | public String | getExtendedName(int ch) | public String | getExtendedOr10Name(int ch) | public int | getGroup(int codepoint) Gets the group index for the codepoint, or the group before it. | public int | getGroupLengths(int index, char[] offsets, char[] lengths) Reads a block of compressed lengths of 32 strings and expands them into
offsets and lengths for each string. | public static int | getGroupLimit(int msb) | public int | getGroupMSB(int gindex) | public static int | getGroupMin(int msb) | public static int | getGroupMinFromCodepoint(int codepoint) | public String | getGroupName(int index, int length, int choice) Gets the name of the argument group index.
UnicodeData.txt uses ';' as a field separator, so no field can contain
';' as part of its contents. | public String | getGroupName(int ch, int choice) | public static int | getGroupOffset(int codepoint) | public void | getISOCommentCharacters(UnicodeSet set) Fills set with characters that are used in ISO comments.
Equivalent to uprv_getISOCommentCharacters.
Parameters: set - USet to receive characters. | public static UCharacterName | getInstance() | public int | getMaxCharNameLength() Gets the maximum length of any codepoint name. | public int | getMaxISOCommentLength() Gets the maximum length of any ISO comment. | public String | getName(int ch, int choice) Retrieve the name of a Unicode code point.
Depending on choice, the character name returned
is the "modern" name or the name that was defined in Unicode
version 1.0.
The name contains only "invariant" characters
like A-Z, 0-9, space, and '-'.
Parameters: ch - the code point for which to get the name. Parameters: choice - Selector for which name to get. | boolean | setAlgorithm(AlgorithmName[] alg) | boolean | setGroup(char[] group, byte[] groupstring) | boolean | setGroupCountSize(int count, int size) | boolean | setToken(char[] token, byte[] tokenstring) |
EXTENDED_CATEGORY_ | final static int EXTENDED_CATEGORY_(Code) | | Extended category count
|
LINES_PER_GROUP_ | final public static int LINES_PER_GROUP_(Code) | | Number of lines per group
1 << GROUP_SHIFT_
|
m_groupcount_ | public int m_groupcount_(Code) | | Maximum number of groups
|
m_groupsize_ | int m_groupsize_(Code) | | Size of each group
|
getAlgorithmEnd | public int getAlgorithmEnd(int index)(Code) | | Gets the end of the range
Parameters: index - algorithm index Returns: algorithm range end |
getAlgorithmLength | public int getAlgorithmLength()(Code) | | Gets the algorithm range length
Returns: algorithm range length |
getAlgorithmName | public String getAlgorithmName(int index, int codepoint)(Code) | | Gets the Algorithmic name of the codepoint
Parameters: index - algorithmic range index Parameters: codepoint - the codepoint Returns: algorithmic name of codepoint |
getAlgorithmStart | public int getAlgorithmStart(int index)(Code) | | Gets the start of the range
Parameters: index - algorithm index Returns: algorithm range start |
getCharFromName | public int getCharFromName(int choice, String name)(Code) | | Find a character by its name and return its code point value
Parameters: choice - selector to indicate whether the argument name is a Unicode 1.0 name or the most current version Parameters: name - the name to search for Returns: code point |
getCharNameCharacters | public void getCharNameCharacters(UnicodeSet set)(Code) | | Fills set with characters that are used in Unicode character names.
Equivalent to uprv_getCharNameCharacters.
Parameters: set - USet to receive characters. Existing contents are deleted. |
getCodepointMSB | public static int getCodepointMSB(int codepoint)(Code) | | Gets the MSB of the codepoint
Parameters: codepoint - the codepoint Returns: the MSB of the codepoint |
getExtendedName | public String getExtendedName(int ch)(Code) | | Retrieves the extended name
|
getExtendedOr10Name | public String getExtendedOr10Name(int ch)(Code) | | Gets the extended or Unicode 1.0 name when the lookup of the most current Unicode name
fails
Parameters: ch - codepoint Returns: name of codepoint, extended or 1.0 |
getGroup | public int getGroup(int codepoint)(Code) | | Gets the group index for the codepoint, or the group before it.
Parameters: codepoint - the codepoint Returns: group index containing codepoint or the group before it. |
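The "group containing the codepoint, or the group before it" lookup behaves like a floor search over sorted group keys. The sketch below is a hedged illustration only: GroupSearch and findGroup are hypothetical names, the sorted int array of group MSBs stands in for the real group table, and returning 0 when the key precedes every group is an assumption.

```java
final class GroupSearch {
    // Binary search over group MSBs sorted in ascending order.
    // Returns the index of the group whose MSB equals msb, or of the
    // closest group before it.
    // Assumption: index 0 is returned when msb precedes every group.
    static int findGroup(int[] groupMsbs, int msb) {
        int start = 0;
        int limit = groupMsbs.length;
        while (start < limit - 1) {
            int mid = (start + limit) >>> 1;
            if (msb < groupMsbs[mid]) {
                limit = mid;   // target lies before mid
            } else {
                start = mid;   // mid is a candidate floor
            }
        }
        return start;
    }

    public static void main(String[] args) {
        int[] groupMsbs = { 0, 1, 2, 5, 9 };
        System.out.println(findGroup(groupMsbs, 5)); // 3, exact match
        System.out.println(findGroup(groupMsbs, 7)); // 3, group before
    }
}
```

A caller would typically compare the MSB of the returned group with the requested one to tell an exact hit from a "group before" result.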
getGroupLengths | public int getGroupLengths(int index, char[] offsets, char[] lengths)(Code) | | Reads a block of compressed lengths of 32 strings and expands them into
offsets and lengths for each string. Lengths are stored with a
variable-width encoding in consecutive nibbles:
If a nibble < 0xc, then it is the length itself (0 = empty string).
If a nibble >= 0xc, then it forms a length value with the following
nibble.
The offsets and lengths arrays must be at least 33 (one more) long
because there is no check at the end for whether the last nibble is still
used.
Parameters: index - index of the group string object in the array Parameters: offsets - array to store the string offsets Parameters: lengths - array to store the string lengths Returns: next index of the data string immediately after the lengths, in terms of byte address |
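The nibble encoding above can be sketched as follows. This is an illustration under stated assumptions rather than the class's implementation: NibbleLengths and expand are hypothetical names, the high nibble of each byte is assumed to come first, and the rule for combining two nibbles, ((n - 12) << 4 | next) + 12, is an assumption, since the description only says the two nibbles form one length.

```java
final class NibbleLengths {
    static final int LINES_PER_GROUP = 32;

    // Expands packed nibble lengths starting at bytes[index] into offsets
    // and lengths, and returns the index of the first byte after them.
    // Both output arrays need 33 entries: the low nibble of the last byte
    // is decoded without checking whether it is still needed.
    static int expand(byte[] bytes, int index, char[] offsets, char[] lengths) {
        int count = 0;
        int offset = 0;
        int pending = -1; // high part of a two-nibble length, -1 if none
        while (count < LINES_PER_GROUP) {
            int b = bytes[index++] & 0xFF;
            for (int shift = 4; shift >= 0; shift -= 4) {
                int nibble = (b >> shift) & 0xF;
                int length;
                if (pending >= 0) {
                    // Assumed combination rule for a two-nibble length.
                    length = ((pending << 4) | nibble) + 12;
                    pending = -1;
                } else if (nibble >= 0xC) {
                    pending = nibble - 12; // wait for the following nibble
                    continue;
                } else {
                    length = nibble;       // the nibble is the length itself
                }
                offsets[count] = (char) offset;
                lengths[count] = (char) length;
                offset += length;
                count++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        byte[] packed = new byte[17];
        packed[0] = 0x35;        // lengths 3 and 5
        packed[1] = (byte) 0xC1; // one two-nibble length
        char[] offsets = new char[33];
        char[] lengths = new char[33];
        int next = expand(packed, 0, offsets, lengths);
        System.out.println(next + " " + (int) lengths[2]);
    }
}
```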
getGroupLimit | public static int getGroupLimit(int msb)(Code) | | Gets the maximum codepoint + 1 of the group
Parameters: msb - most significant byte of the group Returns: limit codepoint of the group |
getGroupMSB | public int getGroupMSB(int gindex)(Code) | | Gets the MSB from the group index
Parameters: gindex - group index Returns: the MSB of the group if gindex is valid, -1 otherwise |
getGroupMin | public static int getGroupMin(int msb)(Code) | | Gets the minimum codepoint of the group
Parameters: msb - most significant byte of the group Returns: minimum codepoint of the group |
getGroupMinFromCodepoint | public static int getGroupMinFromCodepoint(int codepoint)(Code) | | Gets the minimum codepoint of a group
Parameters: codepoint - the codepoint Returns: minimum codepoint in the group which codepoint belongs to |
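The static group helpers reduce to shifts and masks once a group size is fixed. The sketch below assumes GROUP_SHIFT is 5, which matches the 32 strings per block mentioned for getGroupLengths and LINES_PER_GROUP_ = 1 << GROUP_SHIFT_, but the shift value and all names in the GroupMath class are assumptions made for illustration.

```java
final class GroupMath {
    static final int GROUP_SHIFT = 5;                      // assumption: 32 lines per group
    static final int LINES_PER_GROUP = 1 << GROUP_SHIFT;   // 32
    static final int GROUP_MASK = LINES_PER_GROUP - 1;     // 0x1F

    // Group key ("MSB") of a codepoint: the bits above the in-group offset.
    static int codepointMSB(int codepoint) {
        return codepoint >> GROUP_SHIFT;
    }

    // First codepoint of the group with the given key.
    static int groupMin(int msb) {
        return msb << GROUP_SHIFT;
    }

    // Last codepoint of the group plus one.
    static int groupLimit(int msb) {
        return (msb << GROUP_SHIFT) + LINES_PER_GROUP;
    }

    // First codepoint of the group that contains the codepoint.
    static int groupMinFromCodepoint(int codepoint) {
        return codepoint & ~GROUP_MASK;
    }

    // Position of the codepoint inside its group.
    static int groupOffset(int codepoint) {
        return codepoint & GROUP_MASK;
    }

    public static void main(String[] args) {
        int cp = 0x41; // LATIN CAPITAL LETTER A
        System.out.println(codepointMSB(cp));                      // 2
        System.out.println(Integer.toHexString(groupMin(2)));      // 40
        System.out.println(Integer.toHexString(groupLimit(2)));    // 60
        System.out.println(groupOffset(cp));                       // 1
    }
}
```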
getGroupName | public String getGroupName(int index, int length, int choice)(Code) | | Gets the name of the argument group index.
UnicodeData.txt uses ';' as a field separator, so no field can contain
';' as part of its contents. In unames.icu, it is marked as
token[';'] == -1 only if the semicolon is used in the data file - which
is iff we have Unicode 1.0 names or ISO comments.
So, it will be token[';'] == -1 if we store U1.0 names/ISO comments
although we know that it will never be part of a name.
Equivalent to ICU4C's expandName.
Parameters: index - index of the group name string in byte count Parameters: length - length of the group name string Parameters: choice - choice of Unicode 1.0 name or the most current name Returns: name of the group |
getGroupName | public String getGroupName(int ch, int choice)(Code) | | Gets the group name of the character
Parameters: ch - character to get the group name Parameters: choice - name choice selector to choose a unicode 1.0 or newer name |
getGroupOffset | public static int getGroupOffset(int codepoint)(Code) | | Gets the offset to a group
Parameters: codepoint - the codepoint Returns: offset to a group |
getISOCommentCharacters | public void getISOCommentCharacters(UnicodeSet set)(Code) | | Fills set with characters that are used in ISO comments.
Equivalent to uprv_getISOCommentCharacters.
Parameters: set - USet to receive characters. Existing contents are deleted. |
getMaxCharNameLength | public int getMaxCharNameLength()(Code) | | Gets the maximum length of any codepoint name.
Equivalent to uprv_getMaxCharNameLength.
Returns: the maximum length of any codepoint name |
getMaxISOCommentLength | public int getMaxISOCommentLength()(Code) | | Gets the maximum length of any ISO comment.
Equivalent to uprv_getMaxISOCommentLength.
Returns: the maximum length of any ISO comment |
getName | public String getName(int ch, int choice)(Code) | | Retrieve the name of a Unicode code point.
Depending on choice, the character name returned
is the "modern" name or the name that was defined in Unicode
version 1.0.
The name contains only "invariant" characters
like A-Z, 0-9, space, and '-'.
Parameters: ch - the code point for which to get the name. Parameters: choice - Selector for which name to get. Returns: the name, or null if the code point is above 0x1fff |
setAlgorithm | boolean setAlgorithm(AlgorithmName[] alg)(Code) | | Sets the algorithm name information array
Parameters: alg - Algorithm information array Returns: true if the group string offset has been set correctly |
setGroup | boolean setGroup(char[] group, byte[] groupstring)(Code) | | Sets the group name data
Parameters: group - index information array Parameters: groupstring - name information array Returns: false if there is a data error |
setGroupCountSize | boolean setGroupCountSize(int count, int size)(Code) | | Sets the number of groups and the size of each group in number of char
Parameters: count - number of groups Parameters: size - size of group in char Returns: true if group size is set correctly |
setToken | boolean setToken(char[] token, byte[] tokenstring)(Code) | | Sets the token data
Parameters: token - array of tokens Parameters: tokenstring - array of string values of the tokens Returns: false if there is a data error |