| java.lang.Object com.ibm.icu.text.CharsetMatch
CharsetMatch | public class CharsetMatch implements Comparable | | This class represents a charset that has been identified by a CharsetDetector
as a possible encoding for a set of input data. From an instance of this
class, you can ask for a confidence level in the charset identification,
or for a Java Reader or String that gives access to the original byte data in Unicode form.
Instances of this class are created only by CharsetDetectors.
Note: this class has a natural ordering that is inconsistent with equals.
The natural ordering is based on the match confidence value.
|
Field Summary | |
final public static int | BOM Bit flag indicating the match is based on the presence of a BOM. | final public static int | DECLARED_ENCODING Bit flag indicating the match is based on the declared encoding. | final public static int | ENCODING_SCHEME Bit flag indicating the match is based on the encoding scheme. | final public static int | LANG_STATISTICS Bit flag indicating the match is based on language statistics. |
Method Summary | |
public int | compareTo(Object o) Compare to other CharsetMatch objects.
Comparison is based on the match confidence value, which
allows CharsetDetector.detectAll() to order its results. | public int | getConfidence() Get an indication of the confidence in the charset detected. | public String | getLanguage() Get the ISO code for the language of the detected charset. | public int | getMatchType() Return flags indicating what it was about the input data
that caused this charset to be considered as a possible match. | public String | getName() Get the name of the detected charset. | public Reader | getReader() Create a java.io.Reader for reading the Unicode character data corresponding
to the original byte data supplied to the Charset detect operation.
CAUTION: if the source of the byte data was an InputStream, a Reader
can be created for only one matching charset using this method. | public String | getString() Create a Java String from Unicode character data corresponding
to the original byte data supplied to the Charset detect operation. | public String | getString(int maxLength) Create a Java String from Unicode character data corresponding
to the original byte data supplied to the Charset detect operation.
The length of the returned string is limited to the specified size;
the string will be truncated to this length if necessary. |
DECLARED_ENCODING | final public static int DECLARED_ENCODING | | Bit flag indicating the match is based on the declared encoding.
See Also: CharsetMatch.getMatchType |
ENCODING_SCHEME | final public static int ENCODING_SCHEME | | Bit flag indicating the match is based on the encoding scheme.
See Also: CharsetMatch.getMatchType |
LANG_STATISTICS | final public static int LANG_STATISTICS | | Bit flag indicating the match is based on language statistics.
See Also: CharsetMatch.getMatchType |
compareTo | public int compareTo(Object o) | | Compare to other CharsetMatch objects.
Comparison is based on the match confidence value, which
allows CharsetDetector.detectAll() to order its results.
Parameters: o - the CharsetMatch object to compare against. Returns: a negative integer, zero, or a positive integer as the confidence level of this CharsetMatch is less than, equal to, or greater than that of the argument. Throws: ClassCastException - if the argument is not a CharsetMatch. |
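The confidence-based ordering described above can be illustrated with a small stand-in. This is a minimal sketch, not the ICU implementation: `Candidate` and `bestFirst` are hypothetical names introduced here, and the class carries only a name and a confidence value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for CharsetMatch: only a name and a confidence. */
final class Candidate implements Comparable<Candidate> {
    final String name;
    final int confidence;

    Candidate(String name, int confidence) {
        this.name = name;
        this.confidence = confidence;
    }

    /** Orders by confidence only, mirroring the documented compareTo contract. */
    @Override
    public int compareTo(Candidate other) {
        return Integer.compare(confidence, other.confidence);
    }

    /** Returns the candidate names sorted from highest to lowest confidence. */
    static List<String> bestFirst(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Collections.reverseOrder()); // reverse of natural ordering
        List<String> names = new ArrayList<>();
        for (Candidate c : sorted) {
            names.add(c.name);
        }
        return names;
    }
}
```

Because the stand-in does not override `equals`, two candidates with the same confidence compare as 0 yet are not equal. That is what "inconsistent with equals" means, and it is why such objects are a poor fit for sorted sets or sorted map keys.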
getConfidence | public int getConfidence() | | Get an indication of the confidence in the charset detected.
Confidence values range from 0 to 100, with larger numbers indicating
a better match of the input data to the characteristics of the
charset.
Returns: the confidence in the charset match |
getLanguage | public String getLanguage() | | Get the ISO code for the language of the detected charset.
Returns: The ISO code for the language, or null if the language cannot be determined. |
getMatchType | public int getMatchType() | | Return flags indicating what it was about the input data
that caused this charset to be considered as a possible match.
The result is a bitfield containing zero or more of the flags
ENCODING_SCHEME, BOM, DECLARED_ENCODING, and LANG_STATISTICS.
A result of zero means no information is available.
Note: currently, this method always returns zero.
Returns: the type of match found for this charset. |
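A bitfield of this kind is normally decoded by masking each flag in turn. The sketch below shows the pattern under stated assumptions: `MatchTypeFlags` and `describe` are hypothetical, and the numeric values are placeholders chosen for illustration, so real code should reference the constants declared on CharsetMatch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that decodes a match-type bitfield into flag names. */
final class MatchTypeFlags {
    // Placeholder bit values for this sketch only; in real code use the
    // constants declared on CharsetMatch rather than these numbers.
    static final int ENCODING_SCHEME = 1;
    static final int BOM = 2;
    static final int DECLARED_ENCODING = 4;
    static final int LANG_STATISTICS = 8;

    /** Lists the names of the flags set in the given bitfield. */
    static List<String> describe(int matchType) {
        List<String> reasons = new ArrayList<>();
        if ((matchType & ENCODING_SCHEME) != 0) reasons.add("ENCODING_SCHEME");
        if ((matchType & BOM) != 0) reasons.add("BOM");
        if ((matchType & DECLARED_ENCODING) != 0) reasons.add("DECLARED_ENCODING");
        if ((matchType & LANG_STATISTICS) != 0) reasons.add("LANG_STATISTICS");
        return reasons; // empty when matchType is zero (no information)
    }
}
```

Since the method is documented as currently always returning zero, callers should expect the empty result and treat it as "no information" rather than as an error.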
getName | public String getName() | | Get the name of the detected charset.
The name will be one that can be used with other APIs on the
platform that accept charset names. It is the "Canonical name"
as defined by the class java.nio.charset.Charset; for
charsets that are registered with the IANA charset registry,
this is the MIME-preferred registered name.
See Also: java.nio.charset.Charset See Also: java.io.InputStreamReader Returns: The name of the charset. |
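Because the returned name is a canonical charset name, it can be passed to the standard JDK charset APIs. A minimal sketch, using only JDK classes: `CharsetNameUse` and `decode` are hypothetical names, and the `charsetName` argument stands in for a value obtained from `getName()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: decode bytes using a charset identified by name. */
final class CharsetNameUse {
    static String decode(byte[] data, String charsetName) {
        // Throws UnsupportedCharsetException if this JVM lacks the charset.
        Charset charset = Charset.forName(charsetName);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

A detector may report a charset that the running JVM does not support, so it can be worth checking `Charset.isSupported(name)` before decoding.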
getReader | public Reader getReader() | | Create a java.io.Reader for reading the Unicode character data corresponding
to the original byte data supplied to the Charset detect operation.
CAUTION: if the source of the byte data was an InputStream, a Reader
can be created for only one matching charset using this method. If more
than one charset needs to be tried, the caller will need to reset
the InputStream and create InputStreamReaders itself, based on the charset name.
Returns: the Reader for the Unicode character data. |
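The workaround in the caution above, resetting the stream and building an InputStreamReader per charset name, might look like the following. This is a sketch under assumptions: `RetryCharsets` and `decodeWithEach` are hypothetical, the stream must support mark/reset, and `readLimit` must be at least as large as the number of bytes in the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decode one stream several times, once per charset. */
final class RetryCharsets {
    static Map<String, String> decodeWithEach(InputStream in, int readLimit, List<String> names) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        Map<String, String> results = new LinkedHashMap<>();
        try {
            for (String name : names) {
                in.mark(readLimit); // remember the current position
                // The reader is deliberately not closed: closing it would
                // also close the underlying stream.
                Reader reader = new InputStreamReader(in, Charset.forName(name));
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[1024];
                int n;
                while ((n = reader.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
                results.put(name, sb.toString());
                in.reset(); // rewind so the next charset sees the same bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```

For streams that do not support mark/reset, wrapping them in a `BufferedInputStream` with a sufficiently large mark limit is one option.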
getString | public String getString() throws java.io.IOException | | Create a Java String from Unicode character data corresponding
to the original byte data supplied to the Charset detect operation.
Returns: a String created from the converted input data. |
getString | public String getString(int maxLength) throws java.io.IOException | | Create a Java String from Unicode character data corresponding
to the original byte data supplied to the Charset detect operation.
The length of the returned string is limited to the specified size;
the string will be truncated to this length if necessary. A limit value of
zero or less is ignored, and treated as no limit.
Parameters: maxLength - The maximum length of the String to be created when the source of the data is an input stream, or -1 for unlimited length. Returns: a String created from the converted input data. |
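The documented limit semantics (truncate to `maxLength`, with zero or negative values meaning no limit) can be sketched with plain JDK classes. This illustrates the described behaviour only and is not the ICU implementation: `LimitedDecode` and `decode` are hypothetical names, and the limit is counted in Java `char` units, so a surrogate pair could be split at the boundary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: decode a stream into a String of bounded length. */
final class LimitedDecode {
    static String decode(InputStream in, String charsetName, int maxLength) {
        // A limit of zero or less is treated as "no limit".
        int max = maxLength <= 0 ? Integer.MAX_VALUE : maxLength;
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
            while (sb.length() < max) {
                // Never request more chars than the remaining budget allows.
                int want = Math.min(buf.length, max - sb.length());
                int n = reader.read(buf, 0, want);
                if (n == -1) {
                    break; // end of stream
                }
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Capping the length this way is useful when only a preview of a large stream is needed.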