| java.lang.Object com.ibm.icu.impl.Trie
All known Subclasses: com.ibm.icu.impl.CharTrie, com.ibm.icu.impl.IntTrie
Trie | abstract public class Trie (Code) | | A trie is a kind of compressed, serializable table of values
associated with Unicode code points (0..0x10ffff).
This class defines the basic structure of a trie and provides methods
to retrieve the offsets to the actual data.
Data will be in the form of an array of basic types, char or int.
The actual data format will have to be specified by the user in the
inner static interface com.ibm.icu.impl.Trie.DataManipulate.
This trie implementation is optimized for getting offsets while walking
forward through a UTF-16 string.
Therefore, the simplest and fastest access macros are the
fromLead() and fromOffsetTrail() methods.
The fromBMP() methods are a little more complicated; they get offsets even
for lead surrogate code points, while the fromLead() methods get special
"folded" offsets for lead surrogate code units if there is relevant data
associated with them.
From such a folded offset, an offset needs to be extracted to supply
to the fromOffsetTrail() methods.
To handle such supplementary code points, some offset information is kept
in the data.
Methods in com.ibm.icu.impl.Trie.DataManipulate are called to retrieve
that offset from the folded value for the lead surrogate unit.
For examples of use, see com.ibm.icu.impl.CharTrie or
com.ibm.icu.impl.IntTrie.
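The forward walk described above can be sketched as follows. This is a minimal illustration and not ICU code: `TrieWalkSketch` and `describe` are hypothetical names, and the returned labels only name which kind of lookup would apply to each code unit, without touching any trie data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: which lookup path each UTF-16 code unit would take. */
final class TrieWalkSketch {
    static List<String> describe(String s) {
        List<String> steps = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (Character.isHighSurrogate(c) && i < s.length()
                    && Character.isLowSurrogate(s.charAt(i))) {
                // Lead unit followed by a trail unit: the lead's data would
                // supply a folded offset, which then locates the trail's data.
                i++;
                steps.add("lead+trail");
            } else {
                // Any other code unit, including an unpaired surrogate,
                // would be resolved with a single lookup.
                steps.add("single");
            }
        }
        return steps;
    }
}
```

For the string "a", followed by one supplementary character, followed by "b", the sketch reports a single lookup, a lead-plus-trail lookup, and another single lookup.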
author: synwee See Also: com.ibm.icu.impl.CharTrie See Also: com.ibm.icu.impl.IntTrie since: release 2.1, Jan 01 2002 |
Inner Class :public static interface DataManipulate | |
Constructor Summary | |
protected | Trie(InputStream inputStream, DataManipulate dataManipulate) Trie constructor for CharTrie use. | protected | Trie(char[] index, int options, DataManipulate dataManipulate) |
Method Summary | |
public boolean | equals(Object other) Checks if the argument Trie has the same data as this Trie. | final protected int | getBMPOffset(char ch) Gets the offset to data which the BMP character points to.
Treats a lead surrogate as a normal code point. | final protected int | getCodePointOffset(int ch) Internal trie getter from a code point. | abstract protected int | getInitialValue() | final protected int | getLeadOffset(char ch) Gets the offset to the data which this lead surrogate character points
to. | final protected int | getRawOffset(int offset, char ch) Gets the offset to the data which the index entry for ch, counted from the
variable offset, points to.
Note that to locate the data offset of a non-supplementary character, calling
getRawOffset(0, ch);
will do. | public int | getSerializedDataSize() Gets the serialized data file size of the Trie. | abstract protected int | getSurrogateOffset(char lead, char trail) Gets the offset to the data which the surrogate pair points to. | abstract protected int | getValue(int index) | final protected boolean | isCharTrie() | final protected boolean | isIntTrie() | final public boolean | isLatin1Linear() | protected void | unserialize(InputStream inputStream) Parses the inputstream and creates the trie index with it.
This is overridden by the child classes. |
BMP_INDEX_LENGTH | final protected static int BMP_INDEX_LENGTH(Code) | | Length of the BMP portion of the index (stage 1) array.
|
DATA_BLOCK_LENGTH | final protected static int DATA_BLOCK_LENGTH(Code) | | Number of data values in a stage 2 (data array) block.
|
HEADER_LENGTH_ | final protected static int HEADER_LENGTH_(Code) | | Size of Trie header in bytes
|
HEADER_OPTIONS_DATA_IS_32_BIT_ | final protected static int HEADER_OPTIONS_DATA_IS_32_BIT_(Code) | | |
HEADER_OPTIONS_INDEX_SHIFT_ | final protected static int HEADER_OPTIONS_INDEX_SHIFT_(Code) | | |
HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_ | final protected static int HEADER_OPTIONS_LATIN1_IS_LINEAR_MASK_(Code) | | Latin 1 option mask
|
HEADER_SIGNATURE_ | final protected static int HEADER_SIGNATURE_(Code) | | Constant number to authenticate the byte block
|
INDEX_STAGE_1_SHIFT_ | final protected static int INDEX_STAGE_1_SHIFT_(Code) | | Shift size for shifting right the input index. Valid range: 1..9
|
INDEX_STAGE_2_SHIFT_ | final protected static int INDEX_STAGE_2_SHIFT_(Code) | | Shift size for shifting left the index array values.
Increases possible data size with 16-bit index values at the cost
of compactness.
This requires blocks of stage 2 data to be aligned by
DATA_GRANULARITY.
Valid range: 0..INDEX_STAGE_1_SHIFT_
|
INDEX_STAGE_3_MASK_ | final protected static int INDEX_STAGE_3_MASK_(Code) | | Mask for getting the lower bits from the input index.
DATA_BLOCK_LENGTH - 1.
|
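As a rough illustration of how the shift and mask constants relate, the sketch below splits a code unit into a block number and a position inside that block. The shift value of 5 is an assumption (this page only gives the range 1..9), as is deriving the block length as `1 << shift`; `TrieBlockSketch` and its method names are hypothetical.

```java
/** Hypothetical sketch of the shift/mask arithmetic; constant values are assumed. */
final class TrieBlockSketch {
    // Assumed value: the page only states that the shift lies in 1..9.
    static final int INDEX_STAGE_1_SHIFT = 5;
    // Assumed relation: one data block covers 2^shift consecutive code units.
    static final int DATA_BLOCK_LENGTH = 1 << INDEX_STAGE_1_SHIFT;
    // Stated on this page: the mask is DATA_BLOCK_LENGTH - 1.
    static final int INDEX_STAGE_3_MASK = DATA_BLOCK_LENGTH - 1;

    /** Upper bits of the code unit select an index entry. */
    static int blockNumber(char ch) {
        return ch >> INDEX_STAGE_1_SHIFT;
    }

    /** Lower bits of the code unit select a position within the data block. */
    static int positionInBlock(char ch) {
        return ch & INDEX_STAGE_3_MASK;
    }
}
```

Under these assumptions, 'A' (0x41) falls in block 2 at position 1, since 0x41 = 2 * 32 + 1.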
LEAD_INDEX_OFFSET_ | final protected static int LEAD_INDEX_OFFSET_(Code) | | Lead surrogate code points' index displacement in the index array.
0x10000 - 0xd800 = 0x2800
LEAD_INDEX_OFFSET_ = 0x2800 >> INDEX_STAGE_1_SHIFT_
|
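The displacement arithmetic can be checked with a small sketch. `LeadIndexOffsetSketch` is a hypothetical name and the shift of 5 used in the usage note is an assumed value, not one stated on this page.

```java
/** Hypothetical sketch of the LEAD_INDEX_OFFSET_ arithmetic. */
final class LeadIndexOffsetSketch {
    /** (0x10000 - 0xd800) >> shift, as given in the field description. */
    static int leadIndexOffset(int stage1Shift) {
        return (0x10000 - 0xD800) >> stage1Shift;
    }
}
```

With an assumed shift of 5 this gives 0x140. Adding it to the index slot of the first lead surrogate, 0xd800 >> 5, yields 0x10000 >> 5, which is the first slot past those of the BMP code units.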
SURROGATE_BLOCK_BITS | final protected static int SURROGATE_BLOCK_BITS(Code) | | Number of bits of a trail surrogate that are used in index table lookups.
|
SURROGATE_BLOCK_COUNT | final protected static int SURROGATE_BLOCK_COUNT(Code) | | Number of index (stage 1) entries per lead surrogate.
Same as number of index entries for 1024 trail surrogates,
that is, 0x400 >> INDEX_STAGE_1_SHIFT_
|
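A short sketch of the surrogate block arithmetic follows. The count formula comes from the description above; the formula for the number of bits (10 trail bits minus the stage 1 shift) is an assumption that happens to be consistent with it, and `SurrogateBlockSketch` is a hypothetical name.

```java
/** Hypothetical sketch of the surrogate block constants. */
final class SurrogateBlockSketch {
    /** Index entries needed for the 1024 trail surrogates of one lead. */
    static int surrogateBlockCount(int stage1Shift) {
        return 0x400 >> stage1Shift;
    }

    /** Assumed: a trail surrogate carries 10 payload bits; the low ones pick a slot in a block. */
    static int surrogateBlockBits(int stage1Shift) {
        return 10 - stage1Shift;
    }
}
```

For an assumed shift of 5, both formulas agree: 32 index entries per lead surrogate, addressed by 5 bits.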
SURROGATE_MASK_ | final protected static int SURROGATE_MASK_(Code) | | Surrogate mask to use when shifting offset to retrieve supplementary
values
|
m_dataLength_ | protected int m_dataLength_(Code) | | Length of the data array
|
m_dataManipulate_ | protected DataManipulate m_dataManipulate_(Code) | | Internal TrieValue which handles the parsing of the data value.
This class is to be implemented by the user
|
m_dataOffset_ | protected int m_dataOffset_(Code) | | Start index of the data portion of the trie. CharTrie combines
index and data into a char array, so this is used to indicate the
initial offset to the data portion.
Note this index always points to the initial value.
|
m_index_ | protected char[] m_index_(Code) | | Index or UTF16 characters
|
Trie | protected Trie(InputStream inputStream, DataManipulate dataManipulate) throws IOException(Code) | | Trie constructor for CharTrie use.
Parameters: inputStream - ICU data file input stream which contains the trie Parameters: dataManipulate - object containing the information to parse the trie data throws: IOException - thrown when input stream does not have the right header. |
Trie | protected Trie(char[] index, int options, DataManipulate dataManipulate)(Code) | | Trie constructor
Parameters: index - array to be used for index Parameters: options - used by the trie Parameters: dataManipulate - object containing the information to parse the trie data |
equals | public boolean equals(Object other)(Code) | | Checks if the argument Trie has the same data as this Trie.
Attributes are checked but not the index data.
Parameters: other - Trie to check Returns: true if the argument Trie has the same data as this Trie, false otherwise |
getBMPOffset | final protected int getBMPOffset(char ch)(Code) | | Gets the offset to data which the BMP character points to.
Treats a lead surrogate as a normal code point.
Parameters: ch - BMP character Returns: offset to data |
getCodePointOffset | final protected int getCodePointOffset(int ch)(Code) | | Internal trie getter from a code point.
Gets the offset to the data which the code point points to.
Implementation note: could be faster(?) but longer with a fast path such as
if((c32)<=0xd7ff) { (result)=_TRIE_GET_RAW(trie, data, 0, c32); }
Parameters: ch - code point Returns: offset to data |
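One plausible way to arrange the dispatch behind a code point getter is sketched below. It only classifies the code point and names the method of this class that would presumably handle it; the actual implementation is not shown on this page, so the branch order and the `Path` enum are assumptions, and `CodePointDispatchSketch` is a hypothetical name.

```java
/** Hypothetical sketch: choosing a lookup path for a code point. */
final class CodePointDispatchSketch {
    enum Path { INVALID, RAW, BMP, SURROGATE_PAIR }

    static Path pathFor(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            return Path.INVALID;            // outside the Unicode range
        }
        if (cp < 0xD800) {
            return Path.RAW;                // fast path: getRawOffset(0, ch)
        }
        if (cp < 0x10000) {
            return Path.BMP;                // getBMPOffset(ch), handles lead surrogates
        }
        return Path.SURROGATE_PAIR;         // getSurrogateOffset(lead, trail)
    }

    /** Lead surrogate of a supplementary code point (standard JDK helper). */
    static char lead(int cp) {
        return Character.highSurrogate(cp);
    }

    /** Trail surrogate of a supplementary code point (standard JDK helper). */
    static char trail(int cp) {
        return Character.lowSurrogate(cp);
    }
}
```

The first branch mirrors the fast path hinted at in the implementation note: code points at or below 0xd7ff never need surrogate handling.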
getInitialValue | abstract protected int getInitialValue()(Code) | | Gets the default initial value.
Returns: 32 bit value |
getLeadOffset | final protected int getLeadOffset(char ch)(Code) | | Gets the offset to the data which this lead surrogate character points
to.
Data at the returned offset may contain folding offset information for
the next trailing surrogate character.
Parameters: ch - lead surrogate character Returns: offset to data |
getRawOffset | final protected int getRawOffset(int offset, char ch)(Code) | | Gets the offset to the data which the index entry for ch, counted from the
variable offset, points to.
Note that to locate the data offset of a non-supplementary character, calling
getRawOffset(0, ch);
will do. Otherwise, if it is a supplementary character formed by
surrogates lead and trail, getRawOffset() has to be called
with getFoldingIndexOffset(). See getSurrogateOffset().
Parameters: offset - index offset which ch is to start from Parameters: ch - index to be used after offset Returns: offset to the data |
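The raw lookup can be sketched from the constant descriptions on this page: shift the input right to pick an index entry, shift that entry left to get the start of a data block, then add the low bits of the input. The shift values 5 and 2 are assumptions, and `RawOffsetSketch` is a hypothetical stand-alone class rather than the ICU implementation.

```java
/** Hypothetical sketch of a two-stage raw offset lookup; shift values are assumed. */
final class RawOffsetSketch {
    static final int STAGE_1_SHIFT = 5;                       // assumed
    static final int STAGE_2_SHIFT = 2;                       // assumed
    static final int STAGE_3_MASK = (1 << STAGE_1_SHIFT) - 1; // block length - 1

    /**
     * @param index  stage 1 index array
     * @param offset index offset that ch starts from (0 for BMP lookups)
     * @param ch     code unit used to select the entry after offset
     * @return offset into the data array
     */
    static int rawOffset(char[] index, int offset, char ch) {
        int blockStart = index[offset + (ch >> STAGE_1_SHIFT)] << STAGE_2_SHIFT;
        return blockStart + (ch & STAGE_3_MASK);
    }
}
```

For example, if the index entry for block 2 holds 8, the data block starts at 8 << 2 = 32, and 'A' (position 1 in block 2) maps to data offset 33.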
getSerializedDataSize | public int getSerializedDataSize()(Code) | | Gets the serialized data file size of the Trie. This is used during
trie data reading for size checking purposes.
Returns: size of the serialized trie data file, in number of bytes |
getSurrogateOffset | abstract protected int getSurrogateOffset(char lead, char trail)(Code) | | Gets the offset to the data which the surrogate pair points to.
Parameters: lead - lead surrogate Parameters: trail - trailing surrogate Returns: offset to data |
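Since this method is abstract, the sketch below shows only one way a subclass might implement it, following the class description: read the value stored for the lead surrogate, extract a folded index offset from it, and use that offset for the trail lookup. Everything here is an assumption for illustration: the shift values, the 0x3ff surrogate mask, the -1 result for "no data", and the `FoldingOffsetReader` interface, which is a hypothetical stand-in for DataManipulate.

```java
/** Hypothetical sketch of a surrogate pair lookup via a folded offset. */
final class SurrogateOffsetSketch {
    static final int STAGE_1_SHIFT = 5;                       // assumed
    static final int STAGE_2_SHIFT = 2;                       // assumed
    static final int STAGE_3_MASK = (1 << STAGE_1_SHIFT) - 1;
    static final int SURROGATE_MASK = 0x3FF;                  // assumed: low 10 bits of the trail

    /** Hypothetical stand-in for DataManipulate: extracts the folded index offset. */
    interface FoldingOffsetReader {
        int foldingOffset(int leadValue);
    }

    static int rawOffset(char[] index, int offset, char ch) {
        return (index[offset + (ch >> STAGE_1_SHIFT)] << STAGE_2_SHIFT)
                + (ch & STAGE_3_MASK);
    }

    static int surrogateOffset(char[] index, int[] data, FoldingOffsetReader reader,
                               char lead, char trail) {
        // Step 1: value stored for the lead surrogate code unit.
        int leadValue = data[rawOffset(index, 0, lead)];
        // Step 2: the user-supplied reader turns that value into an index offset.
        int folded = reader.foldingOffset(leadValue);
        if (folded <= 0) {
            return -1; // assumed convention: no data for this lead surrogate
        }
        // Step 3: look up the trail's low 10 bits starting at the folded offset.
        return rawOffset(index, folded, (char) (trail & SURROGATE_MASK));
    }
}
```

The reader is kept as a separate interface because, as the class description says, the format of the data values is left to the user.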
getValue | abstract protected int getValue(int index)(Code) | | Gets the value at the argument index
Parameters: index - value at index will be retrieved Returns: 32 bit value |
isCharTrie | final protected boolean isCharTrie()(Code) | | Determines if this is a 16 bit trie
Returns: true if this is a 16 bit trie |
isIntTrie | final protected boolean isIntTrie()(Code) | | Determines if this is a 32 bit trie
Returns: true if options specifies this is a 32 bit trie |
isLatin1Linear | final public boolean isLatin1Linear()(Code) | | Determines if this trie has a linear latin 1 array
Returns: true if this trie has a linear latin 1 array, false otherwise |
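The three boolean queries above presumably test bits of the options word passed to the constructor. The sketch below shows that pattern. The bit values 0x200 and 0x100 are assumptions, since this page names the HEADER_OPTIONS_ constants without giving their values, and `TrieOptionsSketch` is a hypothetical name.

```java
/** Hypothetical sketch of decoding trie header options; bit values are assumed. */
final class TrieOptionsSketch {
    // Assumed values: the page lists these constants but not their bits.
    static final int LATIN1_IS_LINEAR_MASK = 0x200;
    static final int DATA_IS_32_BIT = 0x100;

    static boolean isLatin1Linear(int options) {
        return (options & LATIN1_IS_LINEAR_MASK) != 0;
    }

    static boolean isIntTrie(int options) {
        return (options & DATA_IS_32_BIT) != 0;
    }

    /** Assumed: a trie that is not flagged as 32 bit holds 16 bit data. */
    static boolean isCharTrie(int options) {
        return (options & DATA_IS_32_BIT) == 0;
    }
}
```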
unserialize | protected void unserialize(InputStream inputStream) throws IOException(Code) | | Parses the inputstream and creates the trie index with it.
This is overridden by the child classes.
Parameters: inputStream - input stream containing the trie information exception: IOException - thrown when data reading fails. |