| java.lang.Object com.ibm.icu.text.CharsetRecognizer com.ibm.icu.text.CharsetRecog_mbcs
CharsetRecog_mbcs | abstract class CharsetRecog_mbcs extends CharsetRecognizer | | CharsetRecognizer implementation for Asian double-byte or multi-byte charsets.
A match is determined mostly by whether the input data adheres to the
encoding scheme for the charset and, optionally, by the
frequency of occurrence of characters.
Instances of this class are singletons, one per encoding
being recognized. They are created in the main
CharsetDetector class and kept in the global list of available
encodings to be checked. The specific encoding being recognized
is determined by the subclass.
|
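The one-instance-per-encoding arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: `RecognizerSketch`, `SjisRecognizerSketch`, `Big5RecognizerSketch`, and `DetectorSketch` are hypothetical stand-ins, not ICU classes, and the real list kept by CharsetDetector may be built differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the recognizer base class (not the ICU class).
abstract class RecognizerSketch {
    /** Returns the IANA name of the charset this recognizer handles. */
    abstract String getName();
}

// One subclass per encoding; the subclass fixes which encoding is recognized.
final class SjisRecognizerSketch extends RecognizerSketch {
    @Override
    String getName() { return "Shift_JIS"; }
}

final class Big5RecognizerSketch extends RecognizerSketch {
    @Override
    String getName() { return "Big5"; }
}

// Hypothetical stand-in for the detector that owns the global recognizer list.
final class DetectorSketch {
    // Each recognizer is created exactly once and shared by all detections.
    private static final List<RecognizerSketch> RECOGNIZERS =
        Collections.unmodifiableList(Arrays.<RecognizerSketch>asList(
            new SjisRecognizerSketch(),
            new Big5RecognizerSketch()));

    private DetectorSketch() {}

    /** Lists the names of all encodings that would be checked, in order. */
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (RecognizerSketch r : RECOGNIZERS) {
            result.add(r.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```

Because the recognizers in this sketch hold no per-input state, sharing a single instance per encoding is safe; the input being examined is passed in on each call instead.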
Inner Class: static class iteratedChar | |
Method Summary | |
abstract String | getName() Get the IANA name of this charset. | int | match(CharsetDetector det, int[] commonChars) Test the match of this charset with the input text data,
which is obtained via the CharsetDetector object.
Parameters: det - The CharsetDetector, which contains the input text to be checked for being in this charset. | abstract boolean | nextChar(iteratedChar it, CharsetDetector det) Get the next character (however many bytes it is) from the input data.
Subclasses for specific charset encodings must implement this function
to get characters according to the rules of their encoding scheme.
This function is not a method of class iteratedChar only because
that would require a lot of extra derived classes, which is awkward.
Parameters: it - The iteratedChar "struct" into which the returned char is placed. det - The charset detector, which is needed to get at the input byte data being iterated over. |
getName | abstract String getName() | | Get the IANA name of this charset.
Returns: the charset name. |
match | int match(CharsetDetector det, int[] commonChars) | | Test the match of this charset with the input text data,
which is obtained via the CharsetDetector object.
Parameters: det - The CharsetDetector, which contains the input text to be checked for being in this charset. Returns: Two values packed into one int. Bits 0-7: the match confidence, ranging from 0 to 100. Bits 8-15: the match reason, an enum-like value. |
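The packed return value described for match can be illustrated with a small helper. This is a minimal sketch that assumes only the bit layout stated above (confidence in bits 0-7, reason in bits 8-15); the class `MatchResultSketch` and its method names are hypothetical and not part of ICU.

```java
// Hypothetical helper for the packed int layout; not an ICU class.
final class MatchResultSketch {
    private MatchResultSketch() {}

    /** Packs confidence (0-100) into bits 0-7 and reason (0-255) into bits 8-15. */
    static int pack(int confidence, int reason) {
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("confidence must be in 0..100");
        }
        if (reason < 0 || reason > 0xFF) {
            throw new IllegalArgumentException("reason must fit in 8 bits");
        }
        return (reason << 8) | confidence;
    }

    /** Extracts the match confidence from bits 0-7. */
    static int confidence(int packed) {
        return packed & 0xFF;
    }

    /** Extracts the match reason from bits 8-15. */
    static int reason(int packed) {
        return (packed >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        int packed = pack(85, 2);
        System.out.println("confidence=" + confidence(packed)
            + " reason=" + reason(packed));
    }
}
```

Masking with `0xFF` after the shift keeps the reason confined to its own byte, so any higher bits in the int do not leak into the result.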
nextChar | abstract boolean nextChar(iteratedChar it, CharsetDetector det) | | Get the next character (however many bytes it is) from the input data.
Subclasses for specific charset encodings must implement this function
to get characters according to the rules of their encoding scheme.
This function is not a method of class iteratedChar only because
that would require a lot of extra derived classes, which is awkward.
Parameters: it - The iteratedChar "struct" into which the returned char is placed. det - The charset detector, which is needed to get at the input byte data being iterated over. Returns: true if a character was returned, false at end of input. |
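To make the nextChar contract concrete, here is a rough, self-contained sketch of how a subclass might decode characters for a made-up double-byte scheme. Everything in it is an assumption for illustration: `IteratedCharSketch` stands in for the iteratedChar "struct" (its field names are guesses), a plain `byte[]` replaces the CharsetDetector, and the encoding rules (bytes up to 0x7F are single-byte, anything else leads a two-byte character whose trail byte must be at least 0x40) do not describe any real charset.

```java
// Hypothetical stand-in for the iteratedChar "struct"; field names are assumptions.
final class IteratedCharSketch {
    int charValue;   // value of the most recently decoded character
    int nextIndex;   // index of the next unread input byte
    boolean error;   // true if the last character was malformed

    /** Returns the next input byte as an unsigned value, or -1 at end of input. */
    int nextByte(byte[] input) {
        if (nextIndex >= input.length) {
            return -1;
        }
        return input[nextIndex++] & 0xFF;
    }
}

// Hypothetical recognizer for a made-up double-byte encoding.
final class DoubleByteSketch {
    private DoubleByteSketch() {}

    /** Reads one character; returns false once the input is exhausted. */
    static boolean nextChar(IteratedCharSketch it, byte[] input) {
        it.error = false;
        int first = it.nextByte(input);
        if (first < 0) {
            return false;              // clean end of input
        }
        it.charValue = first;
        if (first <= 0x7F) {
            return true;               // single-byte character
        }
        int second = it.nextByte(input);
        if (second < 0) {
            it.error = true;           // lead byte with no trail byte
            return false;
        }
        it.charValue = (first << 8) | second;
        if (second < 0x40) {
            it.error = true;           // trail byte outside the assumed legal range
        }
        return true;
    }

    /** Returns {total characters, malformed characters} for the input. */
    static int[] count(byte[] input) {
        IteratedCharSketch it = new IteratedCharSketch();
        int total = 0;
        int bad = 0;
        while (nextChar(it, input)) {
            total++;
            if (it.error) {
                bad++;
            }
        }
        return new int[] {total, bad};
    }

    public static void main(String[] args) {
        byte[] input = {0x41, (byte) 0x88, (byte) 0x9F, 0x42, (byte) 0x88, 0x20};
        int[] counts = count(input);
        System.out.println("total=" + counts[0] + " malformed=" + counts[1]);
    }
}
```

A match routine could then turn counts like these into a confidence value, for example by lowering confidence as the share of malformed characters grows; how the real class weighs them is not specified here.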