| java.lang.Object com.ibm.icu.dev.tool.translit.UnicodeSetCloseOver
UnicodeSetCloseOver | class UnicodeSetCloseOver (Code) | | This class produces the data tables used by the closeOver() method
of UnicodeSet.
Whenever the Unicode database changes, this tool must be re-run
(AFTER the data file(s) underlying ICU4J are updated).
The output of this tool should then be pasted into the appropriate
files:
ICU4J: com.ibm.icu.text.UnicodeSet.java
ICU4C: /icu/source/common/uniset.cpp
|
DEFAULT_CASE_MAP | final static boolean DEFAULT_CASE_MAP(Code) | | |
JAVA_CHARPROP_OUT | final static String JAVA_CHARPROP_OUT(Code) | | |
analyzeCaseData | static void analyzeCaseData(Map equivClasses, StringBuffer pairs, Vector nonpairs, Vector lengths)(Code) | | Analyze the case fold equivalency classes. Break them into two
groups: 'pairs', and 'nonpairs'. Create a tally of the length
configurations of the nonpairs.
Length configurations of equivalency classes, as of Unicode
3.2. Most of the classes (83%) have two single codepoints.
Here "112:28" means there are 28 equivalency classes with 2
single codepoints and one string of length 2.
11:656
111:16
1111:3
112:28
113:2
12:31
13:12
22:38
Note: This method does not count the frequencies of the
different length configurations (as shown above after ':'); it
merely records which configurations occur.
Parameters: pairs - Accumulate equivalency classes that consist of exactly two codepoints here. This is 83+% of the classes. E.g., {"a", "A"}.
Parameters: nonpairs - Accumulate other equivalency classes here, as lists of strings. E.g., {"st", "\uFB05", "\uFB06"}.
Parameters: lengths - Accumulate a list of unique length structures, not including pairs. Each length structure is represented by a string of digits. The digit string "12" means the equivalency class contains a single code point and a string of length 2. Typical contents of 'lengths': { "111", "1111", "112", "113", "12", "13", "22" }. Note the absence of "11". |
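The pair/nonpair split and the digit-string length structures described above can be illustrated with a minimal sketch. This is not the tool's actual code: the class name `CaseClassAnalysisSketch`, the helper `lengthSignature`, and the use of generic `List`/`Set` collections in place of `Vector` are assumptions made for illustration. Lengths are counted in code points and are assumed to be below 10, so that each one fits in a single digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; names and collection types are not taken from the tool.
class CaseClassAnalysisSketch {

    /** Builds the digit string for one class: member lengths in code points, sorted ascending. */
    static String lengthSignature(Set<String> equivClass) {
        char[] digits = new char[equivClass.size()];
        int i = 0;
        for (String s : equivClass) {
            // Assumes every member is shorter than 10 code points.
            digits[i++] = (char) ('0' + s.codePointCount(0, s.length()));
        }
        Arrays.sort(digits);
        return new String(digits);
    }

    /** Splits classes into pairs ("11") and nonpairs, recording each distinct nonpair signature once. */
    static void analyze(Iterable<Set<String>> classes, StringBuilder pairs,
                        List<Set<String>> nonpairs, Set<String> lengths) {
        for (Set<String> c : classes) {
            String sig = lengthSignature(c);
            if (sig.equals("11")) {
                for (String s : c) {
                    pairs.append(s);   // two adjacent code points per class
                }
            } else {
                nonpairs.add(c);
                lengths.add(sig);      // a set, so frequencies are not counted
            }
        }
    }

    public static void main(String[] args) {
        List<Set<String>> classes = new ArrayList<>();
        classes.add(new TreeSet<>(Arrays.asList("a", "A")));
        classes.add(new TreeSet<>(Arrays.asList("st", "\uFB05", "\uFB06")));
        StringBuilder pairs = new StringBuilder();
        List<Set<String>> nonpairs = new ArrayList<>();
        Set<String> lengths = new TreeSet<>();
        analyze(classes, pairs, nonpairs, lengths);
        System.out.println(pairs + " " + nonpairs + " " + lengths);
    }
}
```

Because `lengths` is a set in this sketch, a structure such as "112" is recorded once no matter how many classes share it, which matches the note that frequencies are not counted.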
createCaseFoldEquivalencyClasses | static Map createCaseFoldEquivalencyClasses()(Code) | | Create a map of String => Set. The String in this case is a
folded string for which
UCharacter.foldCase(folded, DEFAULT_CASE_MAP).equals(folded).
The Set contains all single-character strings x for which
UCharacter.foldCase(x, DEFAULT_CASE_MAP).equals(folded), as
well as folded itself.
|
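As a rough illustration of the String => Set map described above, the sketch below groups single-code-point strings by their folded form. It is not the tool's implementation. To stay runnable without ICU4J it takes the folding function as a parameter and demonstrates it with `standInFold`, a JDK-only approximation (upper-case, then lower-case) that is NOT equivalent to `UCharacter.foldCase`. The class name, the method names, and the choice to skip code points that fold to themselves are all assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Hypothetical illustration; the real tool folds with ICU4J's UCharacter.foldCase.
class FoldClassesSketch {

    /** Maps each folded string to the set holding itself and every code point in [start, end] that folds to it. */
    static Map<String, Set<String>> build(int start, int end, UnaryOperator<String> fold) {
        Map<String, Set<String>> classes = new TreeMap<>();
        for (int cp = start; cp <= end; cp++) {
            String x = new String(Character.toChars(cp));
            String folded = fold.apply(x);
            if (folded.equals(x)) {
                continue; // assumption: x only joins a class as the folded form of something else
            }
            Set<String> members = classes.computeIfAbsent(folded, k -> new TreeSet<>());
            members.add(folded); // the folded string itself belongs to the class
            members.add(x);
        }
        return classes;
    }

    /** JDK-only stand-in for case folding; an approximation used for this sketch only. */
    static String standInFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(build('A', 'Z', FoldClassesSketch::standInFold));
    }
}
```

With ICU4J on the classpath, the `fold` argument could instead delegate to `UCharacter.foldCase(x, DEFAULT_CASE_MAP)`, as the description above states.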
emitRangesString | static void emitRangesString(PrintStream out, UnicodeSet set, String id)(Code) | | Given a UnicodeSet, emit it as a Java string. The most economical
format is not the pattern, but instead a pairs list, with each
range pair represented as two adjacent characters.
|
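The pairs-list format described above can be sketched as follows: each range contributes its start and end as two adjacent characters of a Java string literal. This is an illustration under stated assumptions, not the tool's output routine. The class and method names are hypothetical, ranges are passed as plain `int[][]` rather than a `UnicodeSet`, only BMP code points are accepted, and every character is written as a backslash-u escape (the real tool may format the literal differently).

```java
// Hypothetical illustration; names and the exact literal format are assumptions.
class RangesStringSketch {

    /** Returns a Java string literal in which each range is two adjacent escaped characters. */
    static String toJavaLiteral(int[][] ranges) {
        StringBuilder sb = new StringBuilder("\"");
        for (int[] range : ranges) {
            appendEscaped(sb, range[0]); // range start
            appendEscaped(sb, range[1]); // range end (inclusive)
        }
        return sb.append('"').toString();
    }

    /** Appends one BMP code point as a four-digit backslash-u escape. */
    static void appendEscaped(StringBuilder sb, int cp) {
        if (cp < 0 || cp > 0xFFFF) {
            // Assumption of this sketch: supplementary code points are out of scope.
            throw new IllegalArgumentException("not a BMP code point: " + cp);
        }
        sb.append(String.format("\\u%04X", cp));
    }

    public static void main(String[] args) {
        int[][] ranges = { { 'A', 'Z' }, { 'a', 'z' } };
        System.out.println("String ranges = " + toJavaLiteral(ranges) + ";");
    }
}
```

In ICU4J, the ranges of a `UnicodeSet` can be read with `getRangeCount()`, `getRangeStart(i)`, and `getRangeEnd(i)`, so a caller could fill the `int[][]` from those.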
emitUCharRangesArray | static void emitUCharRangesArray(PrintStream out, UnicodeSet set, String id)(Code) | | Given a UnicodeSet, emit it as an array of UChar pairs. Each
pair will be the start/end of a range. Code points >= U+10000
will be represented as surrogate pairs.
|
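To show how start/end pairs and surrogate pairs interact, here is a minimal sketch that renders ranges as the text of a C array of 16-bit units. The class name, the `static const UChar id[] = {...};` layout, and the `int[][]` input are assumptions for illustration; the tool's actual formatting may differ. `Character.toChars` performs the split into a high and a low surrogate for code points at or above U+10000.

```java
// Hypothetical illustration; the declaration layout is an assumption.
class UCharRangesSketch {

    /** Renders each range's start and end as UTF-16 code units in a C array initializer. */
    static String toUCharArray(String id, int[][] ranges) {
        StringBuilder sb = new StringBuilder();
        sb.append("static const UChar ").append(id).append("[] = {");
        String sep = "";
        for (int[] range : ranges) {
            for (int cp : range) { // range[0] = start, range[1] = end
                // One unit for BMP code points, two (a surrogate pair) above U+FFFF.
                for (char unit : Character.toChars(cp)) {
                    sb.append(sep).append(String.format("0x%04X", (int) unit));
                    sep = ", ";
                }
            }
        }
        return sb.append("};").toString();
    }

    public static void main(String[] args) {
        int[][] ranges = { { 0x41, 0x5A }, { 0x10400, 0x1044F } };
        System.out.println(toUCharArray("ranges", ranges));
        // prints: static const UChar ranges[] = {0x0041, 0x005A, 0xD801, 0xDC00, 0xD801, 0xDC4F};
    }
}
```

Note that in this layout a supplementary range occupies four array slots instead of two, so a reader of the array has to decode surrogates before pairing starts with ends.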
getCaseSensitive | static UnicodeSet getCaseSensitive()(Code) | | Create the set of case-sensitive characters. These are characters
that participate in any case mapping operation as a source or
as a member of a target string.
|
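The "source or member of a target string" definition above can be approximated with JDK case mappings alone, as in the sketch below. This is only an illustration: the real method returns a `UnicodeSet` and presumably uses ICU case data, whereas this sketch returns a `java.util.BitSet`, considers only the lower-case and upper-case mappings of `String` (title-casing and case folding are left out), and uses hypothetical class and method names.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical illustration using JDK mappings as a stand-in for ICU case data.
class CaseSensitiveSketch {

    /** Marks every code point in [start, end] that changes under a case mapping, plus all code points in its targets. */
    static BitSet caseSensitive(int start, int end) {
        BitSet set = new BitSet();
        for (int cp = start; cp <= end; cp++) {
            String source = new String(Character.toChars(cp));
            String[] targets = {
                source.toLowerCase(Locale.ROOT),
                source.toUpperCase(Locale.ROOT)
            };
            for (String target : targets) {
                if (!target.equals(source)) {
                    set.set(cp);                           // participates as a source
                    target.codePoints().forEach(set::set); // members of the target string
                }
            }
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet ascii = caseSensitive(0x20, 0x7E);
        System.out.println(ascii.cardinality()); // prints 52 (A-Z and a-z)
    }
}
```

Targets may lie outside the scanned range: scanning only U+00DF, for example, also marks 'S' because the upper-case mapping of that character is "SS".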
|
|