| java.lang.Object com.ibm.icu.text.CharsetRecognizer
All known Subclasses: com.ibm.icu.text.CharsetRecog_2022, com.ibm.icu.text.CharsetRecog_mbcs, com.ibm.icu.text.CharsetRecog_Unicode, com.ibm.icu.text.CharsetRecog_UTF8, com.ibm.icu.text.CharsetRecog_sbcs
CharsetRecognizer | abstract class CharsetRecognizer | | Abstract class for recognizing a single charset.
Part of the implementation of ICU's CharsetDetector.
Each specific charset that can be recognized will have an instance
of some subclass of this class. All interaction between the overall
CharsetDetector and the logic specific to an individual charset happens
via the interface provided here.
Instances of CharsetRecognizer DO NOT have or maintain
state pertaining to a specific match or detect operation.
They WILL be shared by multiple instances of CharsetDetector.
They encapsulate constant, charset-specific information.
|
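The sharing contract can be illustrated with a minimal, self-contained sketch. It does not use ICU's actual classes: `Detector`, `Recognizer`, `AsciiRecognizer`, and the "every byte is 7-bit" heuristic are hypothetical stand-ins, assumed here only to show one stateless recognizer instance serving several detectors that each own their input.

```java
public final class SharedRecognizerSketch {

    /** Hypothetical stand-in for a detector: it owns the per-detection state (the input). */
    public static final class Detector {
        private final byte[] input;

        public Detector(byte[] input) {
            this.input = input.clone();
        }

        public byte[] input() {
            return input;
        }
    }

    /** Hypothetical stand-in mirroring the three methods documented on this page. */
    public abstract static class Recognizer {
        public abstract String getName();

        public abstract int match(Detector det);

        /** Returns null when the language cannot be determined. */
        public String getLanguage() {
            return null;
        }
    }

    /** Illustrative recognizer with no instance fields, so it is safe to share. */
    public static final class AsciiRecognizer extends Recognizer {
        @Override
        public String getName() {
            return "US-ASCII";
        }

        @Override
        public int match(Detector det) {
            // Assumed heuristic: any byte with the high bit set rules out ASCII.
            for (byte b : det.input()) {
                if (b < 0) {
                    return 0;
                }
            }
            // Confidence 100 in the low byte, reason 0 in the next byte.
            return 100;
        }
    }

    /** One shared instance; all per-call state arrives through the Detector argument. */
    public static final Recognizer ASCII = new AsciiRecognizer();

    public static void main(String[] args) {
        Detector plain = new Detector(new byte[] {104, 105});
        Detector nonAscii = new Detector(new byte[] {(byte) 0xC3, (byte) 0xA9});
        System.out.println(ASCII.getName() + " on plain input: " + ASCII.match(plain));
        System.out.println(ASCII.getName() + " on non-ASCII input: " + ASCII.match(nonAscii));
    }
}
```

Because the recognizer keeps nothing between calls, the order in which detectors use it does not affect the result.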
Method Summary | |
public String | getLanguage() Get the ISO language code for this charset. |
abstract String | getName() Get the IANA name of this charset. |
abstract int | match(CharsetDetector det) Test the match of this charset with the input text data, which is obtained via the CharsetDetector object.
Parameters: det - The CharsetDetector, which contains the input text to be checked for being in this charset. |
getLanguage | public String getLanguage() | | Get the ISO language code for this charset.
Returns: the language code, or null if the language cannot be determined. |
getName | abstract String getName() | | Get the IANA name of this charset.
Returns: the charset name. |
match | abstract int match(CharsetDetector det) | | Test the match of this charset with the input text data,
which is obtained via the CharsetDetector object.
Parameters: det - The CharsetDetector, which contains the input text to be checked for being in this charset.
Returns: Two values packed into one int. Bits 0-7: the match confidence, ranging from 0 to 100. Bits 8-15: the match reason, an enum-like value. |
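The packed return value of match can be decoded with a mask and a shift. The following is a minimal sketch based only on the bit layout described above; the class `PackedMatch` and its methods `pack`, `confidence`, and `reason` are hypothetical helpers and are not part of ICU.

```java
public final class PackedMatch {

    private PackedMatch() {
    }

    /** Packs a confidence (0-100) into bits 0-7 and a reason code (0-255) into bits 8-15. */
    public static int pack(int confidence, int reason) {
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("confidence must be in 0..100");
        }
        if (reason < 0 || reason > 0xFF) {
            throw new IllegalArgumentException("reason must fit in 8 bits");
        }
        return (reason << 8) | confidence;
    }

    /** Extracts bits 0-7: the match confidence. */
    public static int confidence(int packed) {
        return packed & 0xFF;
    }

    /** Extracts bits 8-15: the match reason. */
    public static int reason(int packed) {
        return (packed >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        int packed = pack(85, 3);
        System.out.println("confidence = " + confidence(packed)); // prints "confidence = 85"
        System.out.println("reason = " + reason(packed));         // prints "reason = 3"
    }
}
```

A caller that only ranks charsets by confidence can ignore the reason byte and use `confidence(packed)` alone.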