| java.lang.Object com.ibm.icu.charset.UConverterDataReader
UConverterDataReader | final class UConverterDataReader implements ICUBinary.Authenticate | | ucnvmbcs.h
ICU conversion (.cnv) data file structure, following the usual UDataInfo
header.
Format version: 6.2
struct UConverterStaticData -- struct containing the converter name, IBM CCSID,
min/max bytes per character, etc.
see ucnv_bld.h
--------------------
The static data is followed by conversionType-specific data structures.
At the moment, there are only variations of MBCS converters. They all have
the same toUnicode structures, while the fromUnicode structures for SBCS
differ from those for other MBCS-style converters.
_MBCSHeader.version 4.2 adds an optional conversion extension data structure.
If it is present, then an ICU version reading header versions 4.0 or 4.1
will be able to use the base table and ignore the extension.
The unicodeMask in the static data is part of the base table data structure.
In particular, the UCNV_HAS_SUPPLEMENTARY flag determines the length of the
fromUnicode stage 1 array.
The static data unicodeMask refers only to the base table's properties if
a base table is included.
In an extension-only file, the static data unicodeMask is 0.
The extension data indexes have a separate field with the unicodeMask flags.
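
As a minimal sketch of how the flag selects the stage 1 length: the two array sizes (0x440 and 0x40) are the ones given in the layout for this format, and they are consistent with one stage 1 entry per 0x400 code points. The flag value 1, the class name Stage1Length, and the method name stage1Length are assumptions made for illustration and are not part of UConverterDataReader.

```java
public class Stage1Length {
    // Assumed value for illustration only; see ucnv_bld.h for the real definition.
    static final int UCNV_HAS_SUPPLEMENTARY = 1;

    // Returns the number of uint16_t entries in the fromUnicode stage 1 table.
    static int stage1Length(int unicodeMask) {
        // Full Unicode range: 0x110000 / 0x400 = 0x440 entries.
        // BMP only:           0x10000  / 0x400 = 0x40 entries.
        return (unicodeMask & UCNV_HAS_SUPPLEMENTARY) != 0 ? 0x440 : 0x40;
    }

    public static void main(String[] args) {
        System.out.println(stage1Length(0));
        System.out.println(stage1Length(UCNV_HAS_SUPPLEMENTARY));
    }
}
```

For an extension-only file the static data unicodeMask is 0, so this helper would only be meaningful when a base table is present.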
MBCS-style data structure following the static data.
Offsets are counted in bytes from the beginning of the MBCS header structure.
Details about usage in comments in ucnvmbcs.c.
struct _MBCSHeader (see the definition in this header file below)
contains 32-bit fields as follows:
8 values:
0 uint8_t[4] MBCS version in UVersionInfo format (currently 4.2.0.0)
1 uint32_t countStates
2 uint32_t countToUFallbacks
3 uint32_t offsetToUCodeUnits
4 uint32_t offsetFromUTable
5 uint32_t offsetFromUBytes
6 uint32_t flags, bits:
31.. 8 offsetExtension -- _MBCSHeader.version 4.2 (ICU 2.8) and higher
0 for older versions and if
there is no extension structure
7.. 0 outputType
7 uint32_t fromUBytesLength -- _MBCSHeader.version 4.1 (ICU 2.4) and higher
counts bytes in fromUBytes[]
if(outputType==MBCS_OUTPUT_EXT_ONLY) {
-- base table name for extension-only table
char baseTableName[variable]; -- with NUL plus padding for 4-alignment
-- all _MBCSHeader fields except for version and flags are 0
} else {
-- normal base table with optional extension
int32_t stateTable[countStates][256];
struct _MBCSToUFallback { (fallbacks are sorted by offset)
uint32_t offset;
UChar32 codePoint;
} toUFallbacks[countToUFallbacks];
uint16_t unicodeCodeUnits[(offsetFromUTable-offsetToUCodeUnits)/2];
(padded to an even number of units)
-- stage 1 tables
if(staticData.unicodeMask&UCNV_HAS_SUPPLEMENTARY) {
-- stage 1 table for all of Unicode
uint16_t fromUTable[0x440]; (32-bit-aligned)
} else {
-- BMP-only tables have a smaller stage 1 table
uint16_t fromUTable[0x40]; (32-bit-aligned)
}
-- stage 2 tables
length determined by top of stage 1 and bottom of stage 3 tables
if(outputType==MBCS_OUTPUT_1) {
-- SBCS: pure indexes
uint16_t stage 2 indexes[?];
} else {
-- DBCS, MBCS, EBCDIC_STATEFUL, ...: roundtrip flags and indexes
uint32_t stage 2 flags and indexes[?];
}
-- stage 3 tables with byte results
if(outputType==MBCS_OUTPUT_1) {
-- SBCS: each 16-bit result contains flags and the result byte, see ucnvmbcs.c
uint16_t fromUBytes[fromUBytesLength/2];
} else {
-- DBCS, MBCS, EBCDIC_STATEFUL, ... 2/3/4 bytes result, see ucnvmbcs.c
uint8_t fromUBytes[fromUBytesLength]; or
uint16_t fromUBytes[fromUBytesLength/2]; or
uint32_t fromUBytes[fromUBytesLength/4];
}
}
-- extension table, for details see ucnv_ext.h
int32_t indexes[>=32]; ...
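
The eight header fields above can be decoded roughly as follows. This is a sketch under stated assumptions, not the code used by UConverterDataReader: the class name MBCSHeaderSketch and its members are hypothetical, the input is assumed to be big-endian, and the header is assumed to be version 4.1 or higher so that field 7 (fromUBytesLength) is present.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the eight 32-bit _MBCSHeader fields.
public class MBCSHeaderSketch {
    final int[] version = new int[4];   // field 0: UVersionInfo bytes
    final int countStates, countToUFallbacks;
    final int offsetToUCodeUnits, offsetFromUTable, offsetFromUBytes;
    final int offsetExtension, outputType;   // both unpacked from field 6 (flags)
    final int fromUBytesLength;              // field 7, version 4.1 and higher

    MBCSHeaderSketch(ByteBuffer buf) {
        for (int i = 0; i < 4; i++) {
            version[i] = buf.get() & 0xff;
        }
        countStates = buf.getInt();
        countToUFallbacks = buf.getInt();
        offsetToUCodeUnits = buf.getInt();
        offsetFromUTable = buf.getInt();
        offsetFromUBytes = buf.getInt();
        int flags = buf.getInt();
        offsetExtension = flags >>> 8;   // bits 31..8
        outputType = flags & 0xff;       // bits 7..0
        fromUBytesLength = buf.getInt();
    }

    // Number of uint16_t units in unicodeCodeUnits[], as given in the layout.
    int unicodeCodeUnitsLength() {
        return (offsetFromUTable - offsetToUCodeUnits) / 2;
    }

    // Total number of int32_t entries in stateTable[countStates][256].
    int stateTableEntries() {
        return countStates * 256;
    }

    // offsetExtension is 0 when there is no extension structure.
    boolean hasExtension() {
        return offsetExtension != 0;
    }

    // Assumes the 32 header bytes are stored big-endian.
    static MBCSHeaderSketch parse(byte[] headerBytes) {
        return new MBCSHeaderSketch(
            ByteBuffer.wrap(headerBytes).order(ByteOrder.BIG_ENDIAN));
    }

    public static void main(String[] args) {
        // Build a made-up header purely to exercise the parser.
        ByteBuffer b = ByteBuffer.allocate(32);
        b.put(new byte[] {4, 2, 0, 0});
        b.putInt(3).putInt(0).putInt(0x0C20).putInt(0x0E20).putInt(0x1000);
        b.putInt((0x2000 << 8) | 1).putInt(64);
        MBCSHeaderSketch h = parse(b.array());
        System.out.println("outputType=" + h.outputType
            + " offsetExtension=" + h.offsetExtension
            + " codeUnits=" + h.unicodeCodeUnitsLength());
    }
}
```

The values in main are invented test data, not taken from a real .cnv file. Because all offsets are counted in bytes from the beginning of the MBCS header, a reader built on this sketch would add them to the position at which the header starts.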
|
UConverterDataReader | protected UConverterDataReader(InputStream inputStream) throws IOException | | Protected constructor.
Parameters: inputStream - ICU converter (.cnv) data file input stream. Throws: IOException - if the data file fails authentication |
getDataFormatVersion | byte[] getDataFormatVersion() | | |
getUnicodeVersion | byte[] getUnicodeVersion() | | |
isDataVersionAcceptable | public boolean isDataVersionAcceptable(byte version) | | Inherited method
|
readMBCSTable | protected void readMBCSTable(int[][] stateTableArray, CharsetMBCS.MBCSToUFallback[] toUFallbacksArray, char[] unicodeCodeUnitsArray, char[] fromUnicodeTableArray, byte[] fromUnicodeBytesArray) throws IOException | | |
|
|