Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian) or UTF-8 files
with a byte order mark (BOM) are detected reliably. For UTF-8 files with no BOM, the charset
should also be detected if the buffer is large enough.
A byte buffer of 4 KB is usually sufficient to guess the encoding.
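The BOM check mentioned above can be illustrated with a minimal sketch. This is not the toolkit's actual implementation: the class name BomSniffer and the method detectBom are hypothetical, introduced only for illustration, and the sketch assumes the first bytes of the file have already been read into a buffer. It covers only the UTF-8 and UTF-16 marks and ignores UTF-32.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CharsetToolkit: maps a leading BOM to a charset.
final class BomSniffer {

    /** Returns the charset indicated by a leading BOM, or null if no BOM is found. */
    static Charset detectBom(byte[] buf) {
        // UTF-8 BOM: EF BB BF
        if (buf.length >= 3
                && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB
                && (buf[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 big endian BOM: FE FF
        if (buf.length >= 2 && (buf[0] & 0xFF) == 0xFE && (buf[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // UTF-16 little endian BOM: FF FE
        if (buf.length >= 2 && (buf[0] & 0xFF) == 0xFF && (buf[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        // No BOM: the caller has to fall back to a content-based guess.
        return null;
    }
}
```

A null result corresponds to the "no BOM" case described above, where the guess has to rely on the buffer contents instead.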
Usage:
// guess the encoding
Charset guessedCharset = CharsetToolkit.guessEncoding(file, 4096);
// create a reader with the correct charset
CharsetToolkit toolkit = new CharsetToolkit(file);
BufferedReader reader = toolkit.getReader();
// read the file content
String line;
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
}
author: Guillaume Laforge