| java.lang.Object com.ibm.icu.impl.TrieBuilder
All known subclasses: com.ibm.icu.impl.IntTrieBuilder
TrieBuilder | public class TrieBuilder (Code) | | Builder class to manipulate and generate a trie.
This is useful for ICU data in primitive types.
Provides a compact way to store information that is indexed by Unicode
values, such as character properties, types, keyboard values, etc. This is
very useful when you have a block of Unicode data that contains significant
values while the rest of the Unicode data is unused in the application or
when you have a lot of redundancy, such as where all 21,000 Han ideographs
have the same value. Lookup is also much faster than with a hash table.
A trie of any primitive data type serves two purposes:
- Fast access of the indexed values.
- Smaller memory footprint.
This is a direct port from the ICU4C version
author: Syn Wee Quek |
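The two-stage lookup described above can be illustrated with a minimal sketch. This is not the ICU4J implementation: the class name `TwoStageTrieSketch`, its methods, and the value `SHIFT = 5` are assumptions chosen for illustration. It shows how an index (stage 1) entry selects a data (stage 2) block, and how all unused ranges can share one zero block.

```java
// Illustrative two-stage trie; names and values are hypothetical, not ICU4J's.
final class TwoStageTrieSketch {
    static final int SHIFT = 5;                      // assumed: 32 values per data block
    static final int DATA_BLOCK_LENGTH = 1 << SHIFT;
    static final int MASK = DATA_BLOCK_LENGTH - 1;

    private final int[] index; // stage 1: one entry per block of code points
    private final int[] data;  // stage 2: block at offset 0 is the shared zero block

    TwoStageTrieSketch(int[] index, int[] data) {
        this.index = index;
        this.data = data;
    }

    /** Returns the value stored for code point c. */
    int get(int c) {
        int blockStart = index[c >> SHIFT];    // stage 1: locate the data block
        return data[blockStart + (c & MASK)];  // stage 2: offset inside the block
    }

    /** Builds a trie over U+0000..U+00FF in which only 'A'..'Z' carry the value 1. */
    static TwoStageTrieSketch upperCaseFlags() {
        int[] index = new int[0x100 >> SHIFT];        // 8 entries, all pointing at the zero block
        int[] data = new int[2 * DATA_BLOCK_LENGTH];  // zero block plus one real block
        index['A' >> SHIFT] = DATA_BLOCK_LENGTH;      // U+0040..U+005F gets its own block
        for (int c = 'A'; c <= 'Z'; c++) {
            data[DATA_BLOCK_LENGTH + (c & MASK)] = 1;
        }
        return new TwoStageTrieSketch(index, data);
    }
}
```

In this sketch seven of the eight index entries share the zero block, which is where the memory saving comes from; every lookup costs two array reads regardless of the code point.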
Inner Class: public static interface DataManipulate | |
Field Summary | |
final protected static int | BMP_INDEX_LENGTH_ Length of the BMP portion of the index (stage 1) array. | final public static int | DATA_BLOCK_LENGTH Number of data values in a stage 2 (data array) block. | final protected static int | DATA_GRANULARITY_ The alignment size of a stage 2 data block. | final protected static int | INDEX_SHIFT_ Shift size for shifting left the index array values. | final protected static int | MASK_ Mask for getting the lower bits from the input index. | final protected static int | MAX_DATA_LENGTH_ Maximum length of the runtime data (stage 2) array. | final protected static int | MAX_INDEX_LENGTH_ Length of the index (stage 1) array before folding. | final protected static int | OPTIONS_DATA_IS_32_BIT_ If set, then the data (stage 2) array is 32 bits wide. | final protected static int | OPTIONS_INDEX_SHIFT_ | final protected static int | OPTIONS_LATIN1_IS_LINEAR_ If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data
(stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH. | final protected static int | SHIFT_ Shift size for shifting right the input index. | final protected static int | SURROGATE_BLOCK_COUNT_ Number of index (stage 1) entries per lead surrogate.
Same as the number of index entries for 1024 trail surrogates,
== 0x400 >> SHIFT_ (UTRIE_SHIFT in ICU4C).
10 - SHIFT_ == number of bits of a trail surrogate that are used in
index table lookups. | protected int | m_dataCapacity_ | protected int | m_dataLength_ | protected int | m_indexLength_ | protected int | m_index_ Index values at build-time are 32 bits wide for easier processing. | protected boolean | m_isCompacted_ | protected boolean | m_isLatin1Linear_ | protected int | m_map_ Map of adjusted indexes, used in utrie_compact(). |
Method Summary | |
final protected static boolean | equal_int(int[] array, int start1, int start2, int length) Compare two sections of an array for equality. | final protected static int | findSameIndexBlock(int index, int indexLength, int otherBlock) | protected void | findUnusedBlocks() Set a value in the trie index map to indicate which data block
is referenced and which one is not. | public boolean | isInZeroBlock(int ch) |
BMP_INDEX_LENGTH_ | final protected static int BMP_INDEX_LENGTH_(Code) | | Length of the BMP portion of the index (stage 1) array.
|
DATA_BLOCK_LENGTH | final public static int DATA_BLOCK_LENGTH(Code) | | Number of data values in a stage 2 (data array) block.
Possible values: 2, 4, 8, ..., 0x200.
|
DATA_GRANULARITY_ | final protected static int DATA_GRANULARITY_(Code) | | The alignment size of a stage 2 data block. Also the granularity for
compaction.
|
INDEX_SHIFT_ | final protected static int INDEX_SHIFT_(Code) | | Shift size for shifting left the index array values.
Increases possible data size with 16-bit index values at the cost
of compactability.
This requires blocks of stage 2 data to be aligned by DATA_GRANULARITY_ (UTRIE_DATA_GRANULARITY in ICU4C).
Valid range: 0..SHIFT_ (UTRIE_SHIFT in ICU4C).
|
MASK_ | final protected static int MASK_(Code) | | Mask for getting the lower bits from the input index.
DATA_BLOCK_LENGTH - 1.
|
MAX_DATA_LENGTH_ | final protected static int MAX_DATA_LENGTH_(Code) | | Maximum length of the runtime data (stage 2) array.
Limited by 16-bit index values that are left-shifted by INDEX_SHIFT_.
|
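The relationship between the 16-bit index values, the left shift, and the data alignment can be sketched as follows. The class `IndexShiftSketch`, its method names, and the value `INDEX_SHIFT = 2` are assumptions for illustration; the formulas for the granularity and the maximum data length follow the ICU4C definitions this class is ported from, and are not stated on this page.

```java
// Hypothetical helper showing how a shifted 16-bit index entry addresses stage 2 data.
final class IndexShiftSketch {
    static final int INDEX_SHIFT = 2;                           // assumed value
    static final int DATA_GRANULARITY = 1 << INDEX_SHIFT;       // alignment implied by the shift
    static final int MAX_DATA_LENGTH = 0x10000 << INDEX_SHIFT;  // 16-bit entry, shifted left

    /** Packs an aligned data offset into a 16-bit index entry. */
    static char encode(int dataOffset) {
        if (dataOffset < 0 || dataOffset >= MAX_DATA_LENGTH) {
            throw new IllegalArgumentException("offset out of range");
        }
        if ((dataOffset & (DATA_GRANULARITY - 1)) != 0) {
            throw new IllegalArgumentException("offset not aligned to the granularity");
        }
        return (char) (dataOffset >> INDEX_SHIFT);
    }

    /** Recovers the data offset from a 16-bit index entry. */
    static int decode(char indexEntry) {
        return indexEntry << INDEX_SHIFT;
    }
}
```

A larger shift lets 16-bit entries address more data, but blocks can then only start on coarser boundaries, which is the cost in compactability mentioned under INDEX_SHIFT_.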
MAX_INDEX_LENGTH_ | final protected static int MAX_INDEX_LENGTH_(Code) | | Length of the index (stage 1) array before folding.
Maximum number of Unicode code points (0x110000) shifted right by
SHIFT.
|
OPTIONS_DATA_IS_32_BIT_ | final protected static int OPTIONS_DATA_IS_32_BIT_(Code) | | If set, then the data (stage 2) array is 32 bits wide.
|
OPTIONS_INDEX_SHIFT_ | final protected static int OPTIONS_INDEX_SHIFT_(Code) | | Shifting to position the index value in options
|
OPTIONS_LATIN1_IS_LINEAR_ | final protected static int OPTIONS_LATIN1_IS_LINEAR_(Code) | | If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data
(stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.
|
SHIFT_ | final protected static int SHIFT_(Code) | | Shift size for shifting right the input index. Valid range: 1..9.
|
SURROGATE_BLOCK_COUNT_ | final protected static int SURROGATE_BLOCK_COUNT_(Code) | | Number of index (stage 1) entries per lead surrogate.
Same as the number of index entries for 1024 trail surrogates,
== 0x400 >> SHIFT_ (UTRIE_SHIFT in ICU4C).
10 - SHIFT_ == number of bits of a trail surrogate that are used in
index table lookups.
|
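Several of the constants above are derived from the shift size. The sketch below spells out those relationships as functions of a shift value. The class `TrieConstantsSketch` and its method names are hypothetical, and the formula for the BMP index length (0x10000 shifted right) is an assumption based on the ICU4C original rather than something this page states.

```java
// Hypothetical helper: derives block and index sizes from a shift value in 1..9.
final class TrieConstantsSketch {
    /** Number of data values in one stage 2 block. */
    static int dataBlockLength(int shift) { return 1 << shift; }

    /** Mask for the lower bits of the input index: DATA_BLOCK_LENGTH - 1. */
    static int mask(int shift) { return dataBlockLength(shift) - 1; }

    /** Index length before folding: all 0x110000 code points, shifted right. */
    static int maxIndexLength(int shift) { return 0x110000 >> shift; }

    /** Assumed: index entries covering the BMP (U+0000..U+FFFF). */
    static int bmpIndexLength(int shift) { return 0x10000 >> shift; }

    /** Index entries per lead surrogate, i.e. per 1024 trail surrogates. */
    static int surrogateBlockCount(int shift) { return 0x400 >> shift; }
}
```

For a shift of 5, for example, a block holds 32 values, the mask is 31, and each lead surrogate covers 32 index entries, which equals 1 << (10 - 5).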
m_dataCapacity_ | protected int m_dataCapacity_(Code) | | |
m_dataLength_ | protected int m_dataLength_(Code) | | |
m_indexLength_ | protected int m_indexLength_(Code) | | |
m_index_ | protected int m_index_(Code) | | Index values at build-time are 32 bits wide for easier processing.
Bit 31 is set if the data block is used by multiple index values
(from setRange()).
|
m_isCompacted_ | protected boolean m_isCompacted_(Code) | | |
m_isLatin1Linear_ | protected boolean m_isLatin1Linear_(Code) | | |
m_map_ | protected int m_map_(Code) | | Map of adjusted indexes, used in utrie_compact().
Maps from original indexes to new ones.
|
TrieBuilder | protected TrieBuilder()(Code) | | |
equal_int | final protected static boolean equal_int(int[] array, int start1, int start2, int length)(Code) | | Compare two sections of an array for equality.
|
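A comparison with this signature might look like the following sketch. It is a stand-in written for illustration (the class `EqualIntSketch` and method `equalInt` are hypothetical), not the ICU4J source.

```java
// Hypothetical stand-in for a section comparison within one array.
final class EqualIntSketch {
    /**
     * Returns true if array[start1 .. start1+length) equals
     * array[start2 .. start2+length), element by element.
     */
    static boolean equalInt(int[] array, int start1, int start2, int length) {
        for (int i = 0; i < length; i++) {
            if (array[start1 + i] != array[start2 + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing sections of the same array is what a compaction step needs in order to detect duplicate blocks that can be merged.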
findSameIndexBlock | final protected static int findSameIndexBlock(int index, int indexLength, int otherBlock)(Code) | | Finds an index block that is identical to otherBlock.
Parameters: index - the index array; indexLength - size of the index; otherBlock - the index block to match |
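One plausible reading of this search is sketched below. Everything here is an assumption for illustration: the class `FindSameIndexBlockSketch`, the constant values, the choice to start after the BMP portion of the index, the step of one surrogate block, and the convention of returning `indexLength` when nothing matches. The parameter description calls `index` an array, so the sketch takes an `int[]`.

```java
// Hypothetical sketch: scan supplementary index blocks for a duplicate of otherBlock.
final class FindSameIndexBlockSketch {
    static final int BMP_INDEX_LENGTH = 0x800;   // assumed: 0x10000 >> 5
    static final int SURROGATE_BLOCK_COUNT = 32; // assumed: 0x400 >> 5

    /**
     * Returns the start of the first block in [BMP_INDEX_LENGTH, indexLength)
     * whose entries equal those of the block starting at otherBlock,
     * or indexLength if no such block exists.
     */
    static int findSameIndexBlock(int[] index, int indexLength, int otherBlock) {
        for (int block = BMP_INDEX_LENGTH; block < indexLength;
                block += SURROGATE_BLOCK_COUNT) {
            if (sameBlock(index, block, otherBlock)) {
                return block;
            }
        }
        return indexLength;
    }

    // Compares two blocks of SURROGATE_BLOCK_COUNT entries within the same array.
    private static boolean sameBlock(int[] index, int a, int b) {
        for (int i = 0; i < SURROGATE_BLOCK_COUNT; i++) {
            if (index[a + i] != index[b + i]) {
                return false;
            }
        }
        return true;
    }
}
```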
findUnusedBlocks | protected void findUnusedBlocks()(Code) | | Set a value in the trie index map to indicate which data block
is referenced and which one is not.
utrie_compact() will remove data blocks that are not used at all.
Each map entry is set to:
- 0 if the data block is used
- -1 if the data block is not used
|
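The marking step can be sketched as below. This is an illustration under stated assumptions, not the ICU4J code: the class `UnusedBlocksSketch`, the parameter list, and `SHIFT = 5` are hypothetical, and index entries are assumed to be plain non-negative data offsets (the sketch ignores the bit 31 flag described for m_index_).

```java
import java.util.Arrays;

// Hypothetical sketch: flag which stage 2 data blocks are referenced by the index.
final class UnusedBlocksSketch {
    static final int SHIFT = 5; // assumed: 32 values per data block

    /**
     * Returns a map with one entry per data block:
     * 0 if some index entry references the block, -1 otherwise.
     */
    static int[] findUnusedBlocks(int[] index, int indexLength, int dataBlockCount) {
        int[] map = new int[dataBlockCount];
        Arrays.fill(map, -1);                  // start with every block unused
        for (int i = 0; i < indexLength; i++) {
            map[index[i] >> SHIFT] = 0;        // data offset -> block number
        }
        return map;
    }
}
```

A later compaction pass can then skip every block whose map entry is still -1.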
isInZeroBlock | public boolean isInZeroBlock(int ch)(Code) | | Checks whether the character belongs to a zero block in the trie.
Parameters: ch - code point whose data is to be retrieved Returns: true if ch is in the zero block |
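A check of this kind can be sketched as follows, assuming the zero block sits at data offset 0 so that an index entry of 0 means "no data stored here". The class `ZeroBlockSketch`, `SHIFT = 5`, and the treatment of out-of-range code points are assumptions for illustration, not the ICU4J behavior.

```java
// Hypothetical sketch: a code point is in the zero block if its index entry is 0.
final class ZeroBlockSketch {
    static final int SHIFT = 5;  // assumed: 32 values per data block
    private final int[] index;   // stage 1 array, one entry per block of code points

    ZeroBlockSketch(int[] index) {
        this.index = index;
    }

    boolean isInZeroBlock(int ch) {
        if (ch < 0 || ch > 0x10FFFF) {
            return true;                 // assumed: nothing is stored outside Unicode
        }
        return index[ch >> SHIFT] == 0;  // entry 0 points at the shared zero block
    }
}
```

A caller can use such a check to skip whole ranges that hold only the initial value without reading stage 2 data.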
|
|