Source Code Cross Referenced for Charset.java in 6.0-JDK-Core » io-nio » java » nio » charset
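The static lookup methods documented in the listing below (forName, isSupported, availableCharsets, defaultCharset) can be exercised with a short program. The following is a minimal usage sketch, not part of the JDK source: the class name CharsetLookupSketch and the helper describe are hypothetical names chosen for illustration, and the charset names passed in are arbitrary examples.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class; not part of java.nio.charset.
class CharsetLookupSketch {

    // Resolves a name or alias to the canonical charset name, or reports
    // which of the two documented failure modes of forName occurred.
    static String describe(String requested) {
        try {
            return Charset.forName(requested).name();
        } catch (IllegalCharsetNameException e) {
            return "illegal name";
        } catch (UnsupportedCharsetException e) {
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        // Names are compared without regard to case.
        System.out.println(describe("utf-8"));               // UTF-8
        // Legal characters, but no such charset is installed.
        System.out.println(describe("no-such-charset-xyz")); // unsupported
        // A name must begin with a letter or a digit.
        System.out.println(describe("-bad"));                // illegal name
        System.out.println(Charset.isSupported("US-ASCII")); // true
        System.out.println(
                Charset.availableCharsets().containsKey("UTF-16LE")); // true
        // Platform dependent, so no expected output is given here.
        System.out.println(Charset.defaultCharset().name());
    }
}
```

The helper catches the two exception types separately only to make the distinction visible; both extend IllegalArgumentException, so a caller that does not care which failure occurred could catch that instead.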
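The class comment in the listing below describes how the UTF-16 family of charsets treats byte-order marks. The following sketch, which is not part of the JDK source, makes that behavior observable by encoding and decoding a one-character string. The class name ByteOrderMarkSketch and its helper methods are hypothetical, and the sketch assumes a runtime that provides String.getBytes(Charset) and the String(byte[], Charset) constructor (Java 6 or later).

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of java.nio.charset.
class ByteOrderMarkSketch {

    // Formats bytes as upper-case hexadecimal, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the named charset and returns the bytes as hex.
    static String encodeHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    // Decodes the bytes with the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // UTF-16 encodes big-endian and writes a big-endian byte-order mark.
        System.out.println(encodeHex("A", "UTF-16"));   // FEFF0041
        // The explicit-order variants write no byte-order mark.
        System.out.println(encodeHex("A", "UTF-16BE")); // 0041
        System.out.println(encodeHex("A", "UTF-16LE")); // 4100

        // UTF-16 decoding honors a little-endian mark and omits it
        // from the decoded characters.
        byte[] littleEndianWithMark = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(decode(littleEndianWithMark, "UTF-16")); // A

        // Without a mark, UTF-16 decoding defaults to big-endian.
        byte[] bigEndianNoMark = {0x00, 0x41};
        System.out.println(decode(bigEndianNoMark, "UTF-16")); // A
    }
}
```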
/*
 * Copyright 2000-2006 Sun Microsystems, Inc.  All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.
 */

package java.nio.charset;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.spi.CharsetProvider;
import java.security.AccessController;
import java.security.AccessControlException;
import java.security.PrivilegedAction;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.ServiceLoader;
import java.util.ServiceConfigurationError;
import java.util.SortedMap;
import java.util.TreeMap;
import sun.misc.ASCIICaseInsensitiveComparator;
import sun.nio.cs.StandardCharsets;
import sun.nio.cs.ThreadLocalCoders;
import sun.security.action.GetPropertyAction;

/**
 * A named mapping between sequences of sixteen-bit Unicode <a
 * href="../../lang/Character.html#unicode">code units</a> and sequences of
 * bytes.  This class defines methods for creating decoders and encoders and
 * for retrieving the various names associated with a charset.  Instances of
 * this class are immutable.
 *
 * <p> This class also defines static methods for testing whether a particular
 * charset is supported, for locating charset instances by name, and for
 * constructing a map that contains every charset for which support is
 * available in the current Java virtual machine.  Support for new charsets can
 * be added via the service-provider interface defined in the {@link
 * java.nio.charset.spi.CharsetProvider} class.
 *
 * <p> All of the methods defined in this class are safe for use by multiple
 * concurrent threads.
 *
 *
 * <a name="names"></a><a name="charenc"></a>
 * <h4>Charset names</h4>
 *
 * <p> Charsets are named by strings composed of the following characters:
 *
 * <ul>
 *
 *   <li> The uppercase letters <tt>'A'</tt> through <tt>'Z'</tt>
 *        (<tt>'&#92;u0041'</tt>&nbsp;through&nbsp;<tt>'&#92;u005a'</tt>),
 *
 *   <li> The lowercase letters <tt>'a'</tt> through <tt>'z'</tt>
 *        (<tt>'&#92;u0061'</tt>&nbsp;through&nbsp;<tt>'&#92;u007a'</tt>),
 *
 *   <li> The digits <tt>'0'</tt> through <tt>'9'</tt>
 *        (<tt>'&#92;u0030'</tt>&nbsp;through&nbsp;<tt>'&#92;u0039'</tt>),
 *
 *   <li> The dash character <tt>'-'</tt>
 *        (<tt>'&#92;u002d'</tt>,&nbsp;<small>HYPHEN-MINUS</small>),
 *
 *   <li> The period character <tt>'.'</tt>
 *        (<tt>'&#92;u002e'</tt>,&nbsp;<small>FULL STOP</small>),
 *
 *   <li> The colon character <tt>':'</tt>
 *        (<tt>'&#92;u003a'</tt>,&nbsp;<small>COLON</small>), and
 *
 *   <li> The underscore character <tt>'_'</tt>
 *        (<tt>'&#92;u005f'</tt>,&nbsp;<small>LOW&nbsp;LINE</small>).
 *
 * </ul>
 *
 * A charset name must begin with either a letter or a digit.  The empty string
 * is not a legal charset name.  Charset names are not case-sensitive; that is,
 * case is always ignored when comparing charset names.  Charset names
 * generally follow the conventions documented in <a
 * href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC&nbsp;2278:&nbsp;IANA Charset
 * Registration Procedures</i></a>.
 *
 * <p> Every charset has a <i>canonical name</i> and may also have one or more
 * <i>aliases</i>.  The canonical name is returned by the {@link #name() name} method
 * of this class.  Canonical names are, by convention, usually in upper case.
 * The aliases of a charset are returned by the {@link #aliases() aliases}
 * method.
 *
 * <a name="hn"></a>
 *
 * <p> Some charsets have an <i>historical name</i> that is defined for
 * compatibility with previous versions of the Java platform.  A charset's
 * historical name is either its canonical name or one of its aliases.  The
 * historical name is returned by the <tt>getEncoding()</tt> methods of the
 * {@link java.io.InputStreamReader#getEncoding InputStreamReader} and {@link
 * java.io.OutputStreamWriter#getEncoding OutputStreamWriter} classes.
 *
 * <a name="iana"></a>
 *
 * <p> If a charset listed in the <a
 * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
 * Registry</i></a> is supported by an implementation of the Java platform then
 * its canonical name must be the name listed in the registry.  Many charsets
 * are given more than one name in the registry, in which case the registry
 * identifies one of the names as <i>MIME-preferred</i>.  If a charset has more
 * than one registry name then its canonical name must be the MIME-preferred
 * name and the other names in the registry must be valid aliases.  If a
 * supported charset is not listed in the IANA registry then its canonical name
 * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
 *
 * <p> The IANA charset registry does change over time, and so the canonical
 * name and the aliases of a particular charset may also change over time.  To
 * ensure compatibility it is recommended that no alias ever be removed from a
 * charset, and that if the canonical name of a charset is changed then its
 * previous canonical name be made into an alias.
 *
 *
 * <h4>Standard charsets</h4>
 *
 * <p> Every implementation of the Java platform is required to support the
 * following standard charsets.  Consult the release documentation for your
 * implementation to see if any other charsets are supported.  The behavior
 * of such optional charsets may differ between implementations.
 *
 * <blockquote><table width="80%" summary="Description of standard charsets">
 * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
 * <tr><td valign=top><tt>US-ASCII</tt></td>
 *     <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
 *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 * <tr><td valign=top><tt>ISO-8859-1&nbsp;&nbsp;</tt></td>
 *     <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
 * <tr><td valign=top><tt>UTF-8</tt></td>
 *     <td>Eight-bit UCS Transformation Format</td></tr>
 * <tr><td valign=top><tt>UTF-16BE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         big-endian byte&nbsp;order</td></tr>
 * <tr><td valign=top><tt>UTF-16LE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         little-endian byte&nbsp;order</td></tr>
 * <tr><td valign=top><tt>UTF-16</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         byte&nbsp;order identified by an optional byte-order mark</td></tr>
 * </table></blockquote>
 *
 * <p> The <tt>UTF-8</tt> charset is specified by <a
 * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC&nbsp;2279</i></a>; the
 * transformation format upon which it is based is specified in
 * Amendment&nbsp;2 of ISO&nbsp;10646-1 and is also described in the <a
 * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The <tt>UTF-16</tt> charsets are specified by <a
 * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC&nbsp;2781</i></a>; the
 * transformation formats upon which they are based are specified in
 * Amendment&nbsp;1 of ISO&nbsp;10646-1 and are also described in the <a
 * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 * Standard</i></a>.
 *
 * <p> The <tt>UTF-16</tt> charsets use sixteen-bit quantities and are
 * therefore sensitive to byte order.  In these encodings the byte order of a
 * stream may be indicated by an initial <i>byte-order mark</i> represented by
 * the Unicode character <tt>'&#92;uFEFF'</tt>.  Byte-order marks are handled
 * as follows:
 *
 * <ul>
 *
 *   <li><p> When decoding, the <tt>UTF-16BE</tt> and <tt>UTF-16LE</tt>
 *   charsets ignore byte-order marks; when encoding, they do not write
 *   byte-order marks. </p></li>
 *
 *   <li><p> When decoding, the <tt>UTF-16</tt> charset interprets a byte-order
 *   mark to indicate the byte order of the stream but defaults to big-endian
 *   if there is no byte-order mark; when encoding, it uses big-endian byte
 *   order and writes a big-endian byte-order mark. </p></li>
 *
 * </ul>
 *
 * In any case, when a byte-order mark is read at the beginning of a decoding
 * operation it is omitted from the resulting sequence of characters.  Byte-order
 * marks occurring after the first element of an input sequence are not
 * omitted since the same code is used to represent <small>ZERO-WIDTH
 * NON-BREAKING SPACE</small>.
 *
 * <p> Every instance of the Java virtual machine has a default charset, which
 * may or may not be one of the standard charsets.  The default charset is
 * determined during virtual-machine startup and typically depends upon the
 * locale and charset being used by the underlying operating system. </p>
 *
 *
 * <h4>Terminology</h4>
 *
 * <p> The name of this class is taken from the terms used in <a
 * href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC&nbsp;2278</i></a>.  In that
 * document a <i>charset</i> is defined as the combination of a coded character
 * set and a character-encoding scheme.
 *
 * <p> A <i>coded character set</i> is a mapping between a set of abstract
 * characters and a set of integers.  US-ASCII, ISO&nbsp;8859-1,
 * JIS&nbsp;X&nbsp;0201, and full Unicode, which is the same as
 * ISO&nbsp;10646-1, are examples of coded character sets.
 *
 * <p> A <i>character-encoding scheme</i> is a mapping between a coded
 * character set and a set of octet (eight-bit byte) sequences.  UTF-8, UCS-2,
 * UTF-16, ISO&nbsp;2022, and EUC are examples of character-encoding schemes.
 * Encoding schemes are often associated with a particular coded character set;
 * UTF-8, for example, is used only to encode Unicode.  Some schemes, however,
 * are associated with multiple character sets; EUC, for example, can be used
 * to encode characters in a variety of Asian character sets.
 *
 * <p> When a coded character set is used exclusively with a single
 * character-encoding scheme then the corresponding charset is usually named
 * for the character set; otherwise a charset is usually named for the encoding
 * scheme and, possibly, the locale of the character sets that it supports.
 * Hence <tt>US-ASCII</tt> is the name of the charset for US-ASCII while
 * <tt>EUC-JP</tt> is the name of the charset that encodes the
 * JIS&nbsp;X&nbsp;0201, JIS&nbsp;X&nbsp;0208, and JIS&nbsp;X&nbsp;0212
 * character sets.
 *
 * <p> The native character encoding of the Java programming language is
 * UTF-16.  A charset in the Java platform therefore defines a mapping between
 * sequences of sixteen-bit UTF-16 code units and sequences of bytes. </p>
 *
 *
 * @author Mark Reinhold
 * @author JSR-51 Expert Group
 * @version 1.61, 07/05/19
 * @since 1.4
 *
 * @see CharsetDecoder
 * @see CharsetEncoder
 * @see java.nio.charset.spi.CharsetProvider
 * @see java.lang.Character
 */
public abstract class Charset implements Comparable<Charset> {

    /* -- Static methods -- */

    private static String bugLevel = null;

    static boolean atBugLevel(String bl) { // package-private
        if (bugLevel == null) {
            if (!sun.misc.VM.isBooted())
                return false;
            java.security.PrivilegedAction pa = new GetPropertyAction(
                    "sun.nio.cs.bugLevel");
            bugLevel = (String) AccessController.doPrivileged(pa);
            if (bugLevel == null)
                bugLevel = "";
        }
        return (bugLevel != null) && bugLevel.equals(bl);
    }

    /**
     * Checks that the given string is a legal charset name. </p>
     *
     * @param  s
     *         A purported charset name
     *
     * @throws  IllegalCharsetNameException
     *          If the given name is not a legal charset name
     */
    private static void checkName(String s) {
        int n = s.length();
        if (!atBugLevel("1.4")) {
            if (n == 0)
                throw new IllegalCharsetNameException(s);
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z')
                continue;
            if (c >= 'a' && c <= 'z')
                continue;
            if (c >= '0' && c <= '9')
                continue;
            if (c == '-' && i != 0)
                continue;
            if (c == ':' && i != 0)
                continue;
            if (c == '_' && i != 0)
                continue;
            if (c == '.' && i != 0)
                continue;
            throw new IllegalCharsetNameException(s);
        }
    }

    /* The standard set of charsets */
    private static CharsetProvider standardProvider = new StandardCharsets();

    // Cache of the most-recently-returned charsets,
    // along with the names that were used to find them
    //
    private static volatile Object[] cache1 = null; // "Level 1" cache
    private static volatile Object[] cache2 = null; // "Level 2" cache

    private static void cache(String charsetName, Charset cs) {
        cache2 = cache1;
        cache1 = new Object[] { charsetName, cs };
    }

    // Creates an iterator that walks over the available providers, ignoring
    // those whose lookup or instantiation causes a security exception to be
    // thrown.  Should be invoked with full privileges.
    //
    private static Iterator providers() {
        return new Iterator() {

            ClassLoader cl = ClassLoader.getSystemClassLoader();
            ServiceLoader<CharsetProvider> sl = ServiceLoader.load(
                    CharsetProvider.class, cl);
            Iterator<CharsetProvider> i = sl.iterator();

            Object next = null;

            private boolean getNext() {
                while (next == null) {
                    try {
                        if (!i.hasNext())
                            return false;
                        next = i.next();
                    } catch (ServiceConfigurationError sce) {
                        if (sce.getCause() instanceof SecurityException) {
                            // Ignore security exceptions
                            continue;
                        }
                        throw sce;
                    }
                }
                return true;
            }

            public boolean hasNext() {
                return getNext();
            }

            public Object next() {
                if (!getNext())
                    throw new NoSuchElementException();
                Object n = next;
                next = null;
                return n;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }

        };
    }

    // Thread-local gate to prevent recursive provider lookups
    private static ThreadLocal gate = new ThreadLocal();

    private static Charset lookupViaProviders(final String charsetName) {

        // The runtime startup sequence looks up standard charsets as a
        // consequence of the VM's invocation of System.initializeSystemClass
        // in order to, e.g., set system properties and encode filenames.  At
        // that point the application class loader has not been initialized,
        // however, so we can't look for providers because doing so will cause
        // that loader to be prematurely initialized with incomplete
        // information.
        //
        if (!sun.misc.VM.isBooted())
            return null;

        if (gate.get() != null)
            // Avoid recursive provider lookups
            return null;
        try {
            gate.set(gate);

            return (Charset) AccessController
                    .doPrivileged(new PrivilegedAction() {
                        public Object run() {
                            for (Iterator i = providers(); i.hasNext();) {
                                CharsetProvider cp = (CharsetProvider) i
                                        .next();
                                Charset cs = cp
                                        .charsetForName(charsetName);
                                if (cs != null)
                                    return cs;
                            }
                            return null;
                        }
                    });

        } finally {
            gate.set(null);
        }
    }

    /* The extended set of charsets */
    private static Object extendedProviderLock = new Object();
    private static boolean extendedProviderProbed = false;
    private static CharsetProvider extendedProvider = null;

    private static void probeExtendedProvider() {
        AccessController.doPrivileged(new PrivilegedAction() {
            public Object run() {
                try {
                    Class epc = Class
                            .forName("sun.nio.cs.ext.ExtendedCharsets");
                    extendedProvider = (CharsetProvider) epc
                            .newInstance();
                } catch (ClassNotFoundException x) {
                    // Extended charsets not available
                    // (charsets.jar not present)
                } catch (InstantiationException x) {
                    throw new Error(x);
                } catch (IllegalAccessException x) {
                    throw new Error(x);
                }
                return null;
            }
        });
    }

    private static Charset lookupExtendedCharset(String charsetName) {
        CharsetProvider ecp = null;
        synchronized (extendedProviderLock) {
            if (!extendedProviderProbed) {
                probeExtendedProvider();
                extendedProviderProbed = true;
            }
            ecp = extendedProvider;
        }
        return (ecp != null) ? ecp.charsetForName(charsetName) : null;
    }

    private static Charset lookup(String charsetName) {
        if (charsetName == null)
            throw new IllegalArgumentException("Null charset name");

        Object[] a;
        if ((a = cache1) != null && charsetName.equals(a[0]))
            return (Charset) a[1];
        // We expect most programs to use one Charset repeatedly.
        // We convey a hint to this effect to the VM by putting the
        // level 1 cache miss code in a separate method.
        return lookup2(charsetName);
    }

    private static Charset lookup2(String charsetName) {
        Object[] a;
        if ((a = cache2) != null && charsetName.equals(a[0])) {
            cache2 = cache1;
            cache1 = a;
            return (Charset) a[1];
        }

        Charset cs;
        if ((cs = standardProvider.charsetForName(charsetName)) != null
                || (cs = lookupExtendedCharset(charsetName)) != null
                || (cs = lookupViaProviders(charsetName)) != null) {
            cache(charsetName, cs);
            return cs;
        }

        /* Only need to check the name if we didn't find a charset for it */
        checkName(charsetName);
        return null;
    }

    /**
     * Tells whether the named charset is supported. </p>
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  <tt>true</tt> if, and only if, support for the named charset
     *          is available in the current Java virtual machine
     *
     * @throws IllegalCharsetNameException
     *         If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given <tt>charsetName</tt> is null
     */
    public static boolean isSupported(String charsetName) {
        return (lookup(charsetName) != null);
    }

    /**
     * Returns a charset object for the named charset. </p>
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  A charset object for the named charset
     *
     * @throws  IllegalCharsetNameException
     *          If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given <tt>charsetName</tt> is null
     *
     * @throws  UnsupportedCharsetException
     *          If no support for the named charset is available
     *          in this instance of the Java virtual machine
     */
    public static Charset forName(String charsetName) {
        Charset cs = lookup(charsetName);
        if (cs != null)
            return cs;
        throw new UnsupportedCharsetException(charsetName);
    }

    // Fold charsets from the given iterator into the given map, ignoring
    // charsets whose names already have entries in the map.
    //
    private static void put(Iterator i, Map m) {
        while (i.hasNext()) {
            Charset cs = (Charset) i.next();
            if (!m.containsKey(cs.name()))
                m.put(cs.name(), cs);
        }
    }

    /**
     * Constructs a sorted map from canonical charset names to charset objects.
     *
     * <p> The map returned by this method will have one entry for each charset
     * for which support is available in the current Java virtual machine.  If
     * two or more supported charsets have the same canonical name then the
     * resulting map will contain just one of them; which one it will contain
     * is not specified. </p>
     *
     * <p> The invocation of this method, and the subsequent use of the
     * resulting map, may cause time-consuming disk or network I/O operations
     * to occur.  This method is provided for applications that need to
     * enumerate all of the available charsets, for example to allow user
     * charset selection.  This method is not used by the {@link #forName
     * forName} method, which instead employs an efficient incremental lookup
     * algorithm.
     *
     * <p> This method may return different results at different times if new
     * charset providers are dynamically made available to the current Java
     * virtual machine.  In the absence of such changes, the charsets returned
     * by this method are exactly those that can be retrieved via the {@link
     * #forName forName} method.  </p>
     *
     * @return An immutable, case-insensitive map from canonical charset names
     *         to charset objects
     */
    public static SortedMap<String, Charset> availableCharsets() {
        return (SortedMap) AccessController
                .doPrivileged(new PrivilegedAction() {
                    public Object run() {
                        TreeMap m = new TreeMap(
                                ASCIICaseInsensitiveComparator.CASE_INSENSITIVE_ORDER);
                        put(standardProvider.charsets(), m);
                        for (Iterator i = providers(); i.hasNext();) {
                            CharsetProvider cp = (CharsetProvider) i
                                    .next();
                            put(cp.charsets(), m);
                        }
                        return Collections.unmodifiableSortedMap(m);
                    }
                });
    }

    private static volatile Charset defaultCharset;

    /**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     *
     * @since 1.5
     */
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                java.security.PrivilegedAction pa = new GetPropertyAction(
                        "file.encoding");
                String csn = (String) AccessController.doPrivileged(pa);
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }

    /* -- Instance fields and methods -- */

    private final String name; // tickles a bug in oldjavac
    private final String[] aliases; // tickles a bug in oldjavac
    private Set aliasSet = null;

    /**
     * Initializes a new charset with the given canonical name and alias
     * set. </p>
     *
     * @param  canonicalName
     *         The canonical name of this charset
     *
     * @param  aliases
     *         An array of this charset's aliases, or null if it has no aliases
     *
     * @throws IllegalCharsetNameException
     *         If the canonical name or any of the aliases are illegal
     */
    protected Charset(String canonicalName, String[] aliases) {
638                checkName(canonicalName);
639                String[] as = (aliases == null) ? new String[0] : aliases;
640                for (int i = 0; i < as.length; i++)
641                    checkName(as[i]);
642                this.name = canonicalName;
643                this.aliases = as;
644            }
645
646            /**
647             * Returns this charset's canonical name. </p>
648             *
649             * @return  The canonical name of this charset
650             */
651            public final String name() {
652                return name;
653            }
654
655            /**
656             * Returns a set containing this charset's aliases. </p>
657             *
658             * @return  An immutable set of this charset's aliases
659             */
660            public final Set<String> aliases() {
661                if (aliasSet != null)
662                    return aliasSet;
663                int n = aliases.length;
664                HashSet hs = new HashSet(n);
665                for (int i = 0; i < n; i++)
666                    hs.add(aliases[i]);
667                aliasSet = Collections.unmodifiableSet(hs);
668                return aliasSet;
669            }
670
671            /**
672             * Returns this charset's human-readable name for the default locale.
673             *
674             * <p> The default implementation of this method simply returns this
675             * charset's canonical name.  Concrete subclasses of this class may
676             * override this method in order to provide a localized display name. </p>
677             *
678             * @return  The display name of this charset in the default locale
679             */
680            public String displayName() {
681                return name;
682            }
683
684            /**
685             * Tells whether or not this charset is registered in the <a
686             * href="http://www.iana.org/assignments/character-sets">IANA Charset
687             * Registry</a>.  </p>
688             *
689             * @return  <tt>true</tt> if, and only if, this charset is known by its
690             *          implementor to be registered with the IANA
691             */
692            public final boolean isRegistered() {
693                return !name.startsWith("X-") && !name.startsWith("x-");
694            }
695
696            /**
697             * Returns this charset's human-readable name for the given locale.
698             *
699             * <p> The default implementation of this method simply returns this
700             * charset's canonical name.  Concrete subclasses of this class may
701             * override this method in order to provide a localized display name. </p>
702             *
703             * @param  locale
704             *         The locale for which the display name is to be retrieved
705             *
706             * @return  The display name of this charset in the given locale
707             */
708            public String displayName(Locale locale) {
709                return name;
710            }
711
712            /**
713             * Tells whether or not this charset contains the given charset.
714             *
715             * <p> A charset <i>C</i> is said to <i>contain</i> a charset <i>D</i> if,
716             * and only if, every character representable in <i>D</i> is also
717             * representable in <i>C</i>.  If this relationship holds then it is
718             * guaranteed that every string that can be encoded in <i>D</i> can also be
719             * encoded in <i>C</i> without performing any replacements.
720             *
721             * <p> That <i>C</i> contains <i>D</i> does not imply that each character
722             * representable in <i>C</i> by a particular byte sequence is represented
723             * in <i>D</i> by the same byte sequence, although sometimes this is the
724             * case.
725             *
726             * <p> Every charset contains itself.
727             *
728             * <p> This method computes an approximation of the containment relation:
729             * If it returns <tt>true</tt> then the given charset is known to be
730             * contained by this charset; if it returns <tt>false</tt>, however, then
731             * it is not necessarily the case that the given charset is not contained
732             * in this charset.
733             *
734             * @return  <tt>true</tt> if the given charset is contained in this charset
735             */
736            public abstract boolean contains(Charset cs);
737
738            /**
739             * Constructs a new decoder for this charset. </p>
740             *
741             * @return  A new decoder for this charset
742             */
743            public abstract CharsetDecoder newDecoder();
744
745            /**
746             * Constructs a new encoder for this charset. </p>
747             *
748             * @return  A new encoder for this charset
749             *
750             * @throws  UnsupportedOperationException
751             *          If this charset does not support encoding
752             */
753            public abstract CharsetEncoder newEncoder();
754
755            /**
756             * Tells whether or not this charset supports encoding.
757             *
758             * <p> Nearly all charsets support encoding.  The primary exceptions are
759             * special-purpose <i>auto-detect</i> charsets whose decoders can determine
760             * which of several possible encoding schemes is in use by examining the
761             * input byte sequence.  Such charsets do not support encoding because
762             * there is no way to determine which encoding should be used on output.
763             * Implementations of such charsets should override this method to return
764             * <tt>false</tt>. </p>
765             *
766             * @return  <tt>true</tt> if, and only if, this charset supports encoding
767             */
768            public boolean canEncode() {
769                return true;
770            }
771
772            /**
773             * Convenience method that decodes bytes in this charset into Unicode
774             * characters.
775             *
776             * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
777             * same result as the expression
778             *
779             * <pre>
780             *     cs.newDecoder()
781             *       .onMalformedInput(CodingErrorAction.REPLACE)
782             *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
783             *       .decode(bb); </pre>
784             *
785             * except that it is potentially more efficient because it can cache
786             * decoders between successive invocations.
787             *
788             * <p> This method always replaces malformed-input and unmappable-character
789             * sequences with this charset's default replacement string.  In order to
790             * detect such sequences, use the {@link
791             * CharsetDecoder#decode(java.nio.ByteBuffer)} method directly.  </p>
792             *
793             * @param  bb  The byte buffer to be decoded
794             *
795             * @return  A char buffer containing the decoded characters
796             */
797            public final CharBuffer decode(ByteBuffer bb) {
798                try {
799                    return ThreadLocalCoders.decoderFor(this).onMalformedInput(
800                            CodingErrorAction.REPLACE).onUnmappableCharacter(
801                            CodingErrorAction.REPLACE).decode(bb);
802                } catch (CharacterCodingException x) {
803                    throw new Error(x); // Can't happen
804                }
805            }
806
807            /**
808             * Convenience method that encodes Unicode characters into bytes in this
809             * charset.
810             *
811             * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
812             * same result as the expression
813             *
814             * <pre>
815             *     cs.newEncoder()
816             *       .onMalformedInput(CodingErrorAction.REPLACE)
817             *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
818             *       .encode(cb); </pre>
819             *
820             * except that it is potentially more efficient because it can cache
821             * encoders between successive invocations.
822             *
823             * <p> This method always replaces malformed-input and unmappable-character
824             * sequences with this charset's default replacement byte array.  In order
825             * to detect such sequences, use the {@link
826             * CharsetEncoder#encode(java.nio.CharBuffer)} method directly.  </p>
827             *
828             * @param  cb  The char buffer to be encoded
829             *
830             * @return  A byte buffer containing the encoded characters
831             */
832            public final ByteBuffer encode(CharBuffer cb) {
833                try {
834                    return ThreadLocalCoders.encoderFor(this).onMalformedInput(
835                            CodingErrorAction.REPLACE).onUnmappableCharacter(
836                            CodingErrorAction.REPLACE).encode(cb);
837                } catch (CharacterCodingException x) {
838                    throw new Error(x); // Can't happen
839                }
840            }
841
842            /**
843             * Convenience method that encodes a string into bytes in this charset.
844             *
845             * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
846             * same result as the expression
847             *
848             * <pre>
849             *     cs.encode(CharBuffer.wrap(str)); </pre>
850             *
851             * @param  str  The string to be encoded
852             *
853             * @return  A byte buffer containing the encoded characters
854             */
855            public final ByteBuffer encode(String str) {
856                return encode(CharBuffer.wrap(str));
857            }
858
859            /**
860             * Compares this charset to another.
861             *
862             * <p> Charsets are ordered by their canonical names, without regard to
863             * case. </p>
864             *
865             * @param  that
866             *         The charset to which this charset is to be compared
867             *
868             * @return A negative integer, zero, or a positive integer as this charset
869             *         is less than, equal to, or greater than the specified charset
870             */
871            public final int compareTo(Charset that) {
872                return (name().compareToIgnoreCase(that.name()));
873            }
874
875            /**
876             * Computes a hashcode for this charset. </p>
877             *
878             * @return  An integer hashcode
879             */
880            public final int hashCode() {
881                return name().hashCode();
882            }
883
884            /**
885             * Tells whether or not this object is equal to another.
886             *
887             * <p> Two charsets are equal if, and only if, they have the same canonical
888             * names.  A charset is never equal to any other type of object.  </p>
889             *
890             * @return  <tt>true</tt> if, and only if, this charset is equal to the
891             *          given object
892             */
893            public final boolean equals(Object ob) {
894                if (!(ob instanceof Charset))
895                    return false;
896                if (this == ob)
897                    return true;
898                return name.equals(((Charset) ob).name());
899            }
900
901            /**
902             * Returns a string describing this charset. </p>
903             *
904             * @return  A string describing this charset
905             */
906            public final String toString() {
907                return name();
908            }
909
910        }
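
Usage sketch (not part of Charset.java): the convenience methods encode(String) and decode(ByteBuffer) documented above always substitute a replacement for anything they cannot map, so they never report a failure. The example below contrasts that lenient path with a freshly created encoder configured to REPORT, which is the approach the Javadoc recommends for detecting such sequences. The class name CharsetRoundTrip and both helper methods are hypothetical names chosen for illustration; only the java.nio.charset calls come from the API itself.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class; not part of the JDK.
class CharsetRoundTrip {

    // Lenient path: Charset.encode/decode silently replace anything
    // that cannot be mapped, so this method never throws.
    public static String lenientRoundTrip(Charset cs, String text) {
        ByteBuffer bytes = cs.encode(text);
        return cs.decode(bytes).toString();
    }

    // Strict path: a new encoder set to REPORT throws a
    // CharacterCodingException instead of substituting a replacement.
    public static boolean encodesCleanly(Charset cs, String text) {
        try {
            cs.newEncoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        Charset utf8 = Charset.forName("UTF-8");
        String text = "caf\u00e9"; // 'e' with acute accent is not in US-ASCII

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(lenientRoundTrip(utf8, text).equals(text));   // true

        // US-ASCII cannot map the accented character; the encoder's
        // default replacement byte ('?') is substituted.
        System.out.println(lenientRoundTrip(ascii, text));               // caf?

        // The strict encoder reports the same problem instead of hiding it.
        System.out.println(encodesCleanly(ascii, text));                 // false

        // contains() is an approximation; for this pair it is known to hold.
        System.out.println(utf8.contains(ascii));                        // true
    }
}
```

The lenient methods suit display or logging code where a '?' is acceptable. Code that must not lose data silently would more plausibly use the strict form and handle the exception.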