| java.lang.Object org.w3c.tidy.EncodingUtils
EncodingUtils | final public class EncodingUtils (Code) | | author: Fabrizio Giustina version: $Revision: 1.7 $ ($Author: fgiust $) |
Inner Class :static interface GetBytes | |
Inner Class :static interface PutBytes | |
Method Summary | |
protected static int | decodeMacRoman(int c) Function to convert from MacRoman to Unicode. | static int | decodeSymbolFont(int c) Function to convert from Symbol Font chars to Unicode. | static boolean | decodeUTF8BytesToChar(int[] c, int firstByte, byte[] successorBytes, GetBytes getter, int[] count, int startInSuccessorBytesArray) Decodes an array of bytes to a char. | protected static int | decodeWin1252(int c) Function for conversion from Windows-1252 to Unicode. | static boolean | encodeCharToUTF8Bytes(int c, byte[] encodebuf, PutBytes putter, int[] count) Encode a char to an array of bytes. |
FSM_ASCII | final public static int FSM_ASCII(Code) | | States for ISO 2022. A document in an ISO-2022 based encoding uses ESC sequences called "designators" to switch
character sets. The designators defined and used in ISO-2022-JP are: "ESC" + "(" + ? for ISO646 variants; "ESC" +
"$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets. State ASCII.
|
FSM_ESC | final public static int FSM_ESC(Code) | | state ESC.
|
FSM_ESCD | final public static int FSM_ESCD(Code) | | state ESCD.
|
FSM_ESCDP | final public static int FSM_ESCDP(Code) | | state ESCDP.
|
FSM_ESCP | final public static int FSM_ESCP(Code) | | state ESCP.
|
FSM_NONASCII | final public static int FSM_NONASCII(Code) | | state NONASCII.
|
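The designator description above implies a small state machine. The following is a minimal sketch of how such states could be driven, not JTidy's implementation: the class `Iso2022Sketch`, its `State` enum, and the transition rules are assumptions derived only from the designator forms listed above ("ESC" + "(" + ?, "ESC" + "$" + ?, "ESC" + "$" + "(" + ?). The actual values of the `FSM_*` constants are not shown on this page.

```java
// Hypothetical sketch: names and transitions are assumptions, not JTidy code.
final class Iso2022Sketch {
    // Mirrors the documented state names FSM_ASCII .. FSM_NONASCII.
    enum State { ASCII, ESC, ESCD, ESCDP, ESCP, NONASCII }

    /** Returns the state reached after reading one byte value c in state s. */
    static State next(State s, int c) {
        switch (s) {
            case ESC:
                if (c == '$') return State.ESCD;   // start of a multibyte designator
                if (c == '(') return State.ESCP;   // start of an ISO646 designator
                return State.ASCII;                // not a designator handled here
            case ESCD:
                // "ESC $ (" needs one more final byte; "ESC $ ?" is complete.
                return c == '(' ? State.ESCDP : State.NONASCII;
            case ESCDP:
                return State.NONASCII;             // final byte of "ESC $ ( ?"
            case ESCP:
                return State.ASCII;                // final byte of "ESC ( ?"
            default:
                // ASCII or NONASCII: only ESC (0x1B) starts a designator.
                return c == 0x1B ? State.ESC : s;
        }
    }

    /** Runs the machine over a byte sequence, starting in ASCII. */
    static State run(byte[] in) {
        State s = State.ASCII;
        for (byte b : in) {
            s = next(s, b & 0xFF);
        }
        return s;
    }
}
```

For example, feeding the bytes ESC, `$`, `B` leaves the sketch in `NONASCII`, and a following ESC, `(`, `B` returns it to `ASCII`.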
HIGH_UTF16_SURROGATE | final public static int HIGH_UTF16_SURROGATE(Code) | | UTF-16 high surrogate.
|
LOW_UTF16_SURROGATE | final public static int LOW_UTF16_SURROGATE(Code) | | UTF-16 low surrogate.
|
MAX_UTF16_FROM_UCS4 | final public static int MAX_UTF16_FROM_UCS4(Code) | | Max UTF-16 value.
|
MAX_UTF8_FROM_UCS4 | final public static int MAX_UTF8_FROM_UCS4(Code) | | Max UTF-8 valid char value.
|
UNICODE_BOM | final public static int UNICODE_BOM(Code) | | the default (big-endian) UNICODE BOM.
|
UNICODE_BOM_BE | final public static int UNICODE_BOM_BE(Code) | | the big-endian (default) UNICODE BOM.
|
UNICODE_BOM_LE | final public static int UNICODE_BOM_LE(Code) | | the little-endian UNICODE BOM.
|
UNICODE_BOM_UTF8 | final public static int UNICODE_BOM_UTF8(Code) | | the UTF-8 UNICODE BOM.
|
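As an illustration of what the BOM constants above distinguish, here is a minimal sketch that inspects the first bytes of a buffer. The class `BomSketch` and method `detectBom` are hypothetical names. The byte patterns (EF BB BF for UTF-8, FE FF for big-endian, FF FE for little-endian) come from the Unicode standard; this page does not list the values of the `UNICODE_BOM*` constants, so the sketch does not reference them.

```java
// Hypothetical sketch: not part of EncodingUtils.
final class BomSketch {
    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "none" for the leading bytes. */
    static String detectBom(byte[] data) {
        // UTF-8 BOM: EF BB BF
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2) {
            // Combine the first two bytes in stream order.
            int first2 = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            if (first2 == 0xFEFF) {
                return "UTF-16BE"; // U+FEFF written big-endian
            }
            if (first2 == 0xFFFE) {
                return "UTF-16LE"; // U+FEFF written little-endian
            }
        }
        return "none";
    }
}
```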
UTF16_HIGH_SURROGATE_BEGIN | final public static int UTF16_HIGH_SURROGATE_BEGIN(Code) | | UTF-16 surrogate pair areas: high surrogates begin.
|
UTF16_HIGH_SURROGATE_END | final public static int UTF16_HIGH_SURROGATE_END(Code) | | UTF-16 surrogate pair areas: high surrogates end.
|
UTF16_LOW_SURROGATE_BEGIN | final public static int UTF16_LOW_SURROGATE_BEGIN(Code) | | UTF-16 surrogate pair areas: low surrogates begin.
|
UTF16_LOW_SURROGATE_END | final public static int UTF16_LOW_SURROGATE_END(Code) | | UTF-16 surrogate pair areas: low surrogates end.
|
UTF16_SURROGATES_BEGIN | final public static int UTF16_SURROGATES_BEGIN(Code) | | UTF-16 surrogates begin.
|
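The surrogate range constants above relate to how UTF-16 represents code points beyond U+FFFF. The sketch below shows the standard arithmetic, assuming the usual Unicode ranges (high surrogates 0xD800 to 0xDBFF, low surrogates 0xDC00 to 0xDFFF). The class `SurrogateSketch` and its constant names are hypothetical, and this page does not confirm that the JTidy constants hold exactly these values.

```java
// Hypothetical sketch: ranges are the standard Unicode ones, not read from JTidy.
final class SurrogateSketch {
    static final int HIGH_BEGIN = 0xD800;
    static final int HIGH_END = 0xDBFF;
    static final int LOW_BEGIN = 0xDC00;
    static final int LOW_END = 0xDFFF;

    /** Splits a supplementary code point into {high, low} surrogates. */
    static int[] split(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;          // 20 significant bits remain
        return new int[] {
            HIGH_BEGIN + (v >> 10),           // top 10 bits
            LOW_BEGIN + (v & 0x3FF)           // bottom 10 bits
        };
    }

    /** Combines a high and a low surrogate back into a code point. */
    static int combine(int high, int low) {
        if (high < HIGH_BEGIN || high > HIGH_END || low < LOW_BEGIN || low > LOW_END) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return 0x10000 + ((high - HIGH_BEGIN) << 10) + (low - LOW_BEGIN);
    }
}
```

For U+1F600 this yields the pair 0xD83D, 0xDE00, which matches what `Character.highSurrogate` and `Character.lowSurrogate` return in the JDK.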
decodeMacRoman | protected static int decodeMacRoman(int c)(Code) | | Function to convert from MacRoman to Unicode.
Parameters: c - char to decode Returns: decoded char |
decodeSymbolFont | static int decodeSymbolFont(int c)(Code) | | Function to convert from Symbol Font chars to Unicode.
Parameters: c - char to decode Returns: decoded char |
decodeUTF8BytesToChar | static boolean decodeUTF8BytesToChar(int[] c, int firstByte, byte[] successorBytes, GetBytes getter, int[] count, int startInSuccessorBytesArray)(Code) | | Decodes an array of bytes to a char.
Parameters: c - will contain the decoded char Parameters: firstByte - first input byte Parameters: successorBytes - array containing successor bytes (can be null if a getter is provided) Parameters: getter - callback used to get new bytes if successorBytes doesn't contain enough bytes Parameters: count - will contain the number of bytes read Parameters: startInSuccessorBytesArray - starting offset for bytes in successorBytes Returns: true if error |
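To make the out-parameter convention concrete (result in `c[0]`, bytes consumed in `count[0]`, return value `true` on error), here is a simplified sketch of UTF-8 decoding from a plain byte array. It is not JTidy's code: the class `Utf8DecodeSketch` and its `decode` signature are hypothetical, it omits the `GetBytes` callback entirely, and the validity checks (overlong forms, surrogates, values above 0x10FFFF) are assumptions based on the UTF-8 definition.

```java
// Hypothetical sketch: simplified signature, no GetBytes callback.
final class Utf8DecodeSketch {
    /**
     * Decodes one code point from in[start..].
     * On success stores it in c[0], stores the bytes consumed in count[0],
     * and returns false. Returns true on error, like the documented method.
     */
    static boolean decode(int[] c, byte[] in, int start, int[] count) {
        count[0] = 0;
        if (start < 0 || start >= in.length) {
            return true;
        }
        int first = in[start] & 0xFF;
        int n;    // number of continuation bytes expected
        int cp;   // code point being assembled
        int min;  // smallest value allowed for this length (rejects overlong forms)
        if (first < 0x80) {
            n = 0; cp = first; min = 0;
        } else if ((first & 0xE0) == 0xC0) {
            n = 1; cp = first & 0x1F; min = 0x80;
        } else if ((first & 0xF0) == 0xE0) {
            n = 2; cp = first & 0x0F; min = 0x800;
        } else if ((first & 0xF8) == 0xF0) {
            n = 3; cp = first & 0x07; min = 0x10000;
        } else {
            return true; // stray continuation byte or invalid lead byte
        }
        if (start + n >= in.length) {
            return true; // truncated sequence
        }
        for (int i = 1; i <= n; i++) {
            int b = in[start + i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                return true; // continuation bytes must look like 10xxxxxx
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return true; // overlong, out of range, or a surrogate
        }
        c[0] = cp;
        count[0] = n + 1;
        return false;
    }
}
```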
decodeWin1252 | protected static int decodeWin1252(int c)(Code) | | Function for conversion from Windows-1252 to Unicode.
Parameters: c - char to decode Returns: decoded char |
encodeCharToUTF8Bytes | static boolean encodeCharToUTF8Bytes(int c, byte[] encodebuf, PutBytes putter, int[] count)(Code) | | Encode a char to an array of bytes.
Parameters: c - char to encode Parameters: encodebuf - will contain the encoded bytes Parameters: putter - if not null it will be called to write bytes to out Parameters: count - number of bytes written Returns: false = ok, true = error |
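The encoding direction can be sketched the same way. The following is a simplified illustration, not JTidy's code: the class `Utf8EncodeSketch` and its `encode` signature are hypothetical, the `PutBytes` callback is left out, and the upper bound of 0x10FFFF and the rejection of surrogates are assumptions based on the UTF-8 definition. It keeps the documented convention of returning `false` on success and `true` on error.

```java
// Hypothetical sketch: simplified signature, no PutBytes callback.
final class Utf8EncodeSketch {
    /**
     * Encodes code point c into buf, stores the byte count in count[0],
     * and returns false. Returns true on error.
     */
    static boolean encode(int c, byte[] buf, int[] count) {
        count[0] = 0;
        if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            return true; // not an encodable Unicode scalar value
        }
        int n = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (buf.length < n) {
            return true; // buffer too small
        }
        switch (n) {
            case 1:
                buf[0] = (byte) c;
                break;
            case 2:
                buf[0] = (byte) (0xC0 | (c >> 6));
                buf[1] = (byte) (0x80 | (c & 0x3F));
                break;
            case 3:
                buf[0] = (byte) (0xE0 | (c >> 12));
                buf[1] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[2] = (byte) (0x80 | (c & 0x3F));
                break;
            default:
                buf[0] = (byte) (0xF0 | (c >> 18));
                buf[1] = (byte) (0x80 | ((c >> 12) & 0x3F));
                buf[2] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[3] = (byte) (0x80 | (c & 0x3F));
                break;
        }
        count[0] = n;
        return false;
    }
}
```

The output can be cross-checked against the JDK by comparing with `new String(Character.toChars(c)).getBytes(StandardCharsets.UTF_8)`.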
|
|