/*
 * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/**
 * A named mapping between sequences of sixteen-bit Unicode code units and
 * sequences of bytes.  This class defines methods for creating decoders and
 * encoders and for retrieving the various names associated with a charset.
 * Instances of this class are immutable.
 *
 * <p> This class also defines static methods for testing whether a particular
 * charset is supported, for locating charset instances by name, and for
 * constructing a map that contains every charset for which support is
 * available in the current Java virtual machine.  Support for new charsets can
 * be added via the service-provider interface defined in the {@link
 * java.nio.charset.spi.CharsetProvider} class.
 *
 * <p> All of the methods defined in this class are safe for use by multiple
 * concurrent threads.
 *
 *
 * <a name="names"></a><a name="charenc"></a>
 * <h4>Charset names</h4>
 *
 * <p> Charsets are named by strings composed of the following characters:
 *
 * <ul>
 *
 *   <li> The uppercase letters <tt>'A'</tt> through <tt>'Z'</tt>
 *        (<tt>'&#92;u0041'</tt> through <tt>'&#92;u005a'</tt>),
 *
 *   <li> The lowercase letters <tt>'a'</tt> through <tt>'z'</tt>
 *        (<tt>'&#92;u0061'</tt> through <tt>'&#92;u007a'</tt>),
 *
 *   <li> The digits <tt>'0'</tt> through <tt>'9'</tt>
 *        (<tt>'&#92;u0030'</tt> through <tt>'&#92;u0039'</tt>),
 *
 *   <li> The dash character <tt>'-'</tt>
 *        (<tt>'&#92;u002d'</tt>, <small>HYPHEN-MINUS</small>),
 *
 *   <li> The plus character <tt>'+'</tt>
 *        (<tt>'&#92;u002b'</tt>, <small>PLUS SIGN</small>),
 *
 *   <li> The period character <tt>'.'</tt>
 *        (<tt>'&#92;u002e'</tt>, <small>FULL STOP</small>),
 *
 *   <li> The colon character <tt>':'</tt>
 *        (<tt>'&#92;u003a'</tt>, <small>COLON</small>), and
 *
 *   <li> The underscore character <tt>'_'</tt>
 *        (<tt>'&#92;u005f'</tt>, <small>LOW LINE</small>).
 *
 * </ul>
 *
 * A charset name must begin with either a letter or a digit.  The empty string
 * is not a legal charset name.  Charset names are not case-sensitive; that is,
 * case is always ignored when comparing charset names.  Charset names
 * generally follow the conventions documented in <i>RFC 2278: IANA Charset
 * Registration Procedures</i>.
 *
 * <p> Every charset has a <i>canonical name</i> and may also have one or more
 * <i>aliases</i>.  The canonical name is returned by the {@link #name() name} method
 * of this class.  Canonical names are, by convention, usually in upper case.
 * The aliases of a charset are returned by the {@link #aliases() aliases}
 * method.
 *
 * <p> Some charsets have an <i>historical name</i> that is defined for
 * compatibility with previous versions of the Java platform.  A charset's
 * historical name is either its canonical name or one of its aliases.  The
 * historical name is returned by the <tt>getEncoding()</tt> methods of the
 * {@link java.io.InputStreamReader#getEncoding InputStreamReader} and {@link
 * java.io.OutputStreamWriter#getEncoding OutputStreamWriter} classes.
 *
 * <p> If a charset listed in the <i>IANA Charset Registry</i> is supported by
 * an implementation of the Java platform then its canonical name must be the
 * name listed in the registry.  Many charsets are given more than one name in
 * the registry, in which case the registry identifies one of the names as
 * <i>MIME-preferred</i>.  If a charset has more than one registry name then
 * its canonical name must be the MIME-preferred name and the other names in
 * the registry must be valid aliases.  If a supported charset is not listed
 * in the IANA registry then its canonical name must begin with one of the
 * strings <tt>"X-"</tt> or <tt>"x-"</tt>.
 *
 * <p> The IANA charset registry does change over time, and so the canonical
 * name and the aliases of a particular charset may also change over time.  To
 * ensure compatibility it is recommended that no alias ever be removed from a
 * charset, and that if the canonical name of a charset is changed then its
 * previous canonical name be made into an alias.
 *
 *
 * <h4>Standard charsets</h4>
 *
 * <p> Every implementation of the Java platform is required to support the
 * following standard charsets.  Consult the release documentation for your
 * implementation to see if any other charsets are supported.  The behavior
 * of such optional charsets may differ between implementations.
 *
 * <blockquote><table width="80%" summary="Description of standard charsets">
 * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
 * <tr><td valign=top><tt>US-ASCII</tt></td>
 *     <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
 *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 * <tr><td valign=top><tt>ISO-8859-1</tt></td>
 *     <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
 * <tr><td valign=top><tt>UTF-8</tt></td>
 *     <td>Eight-bit UCS Transformation Format</td></tr>
 * <tr><td valign=top><tt>UTF-16BE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         big-endian byte order</td></tr>
 * <tr><td valign=top><tt>UTF-16LE</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         little-endian byte order</td></tr>
 * <tr><td valign=top><tt>UTF-16</tt></td>
 *     <td>Sixteen-bit UCS Transformation Format,
 *         byte order identified by an optional byte-order mark</td></tr>
 * </table></blockquote>
 *
 * <p> The <tt>UTF-8</tt> charset is specified by <i>RFC 2279</i>; the
 * transformation format upon which it is based is specified in
 * Amendment 2 of ISO 10646-1 and is also described in the <i>Unicode
 * Standard</i>.
 *
 * <p> The <tt>UTF-16</tt> charsets are specified by <i>RFC 2781</i>; the
 * transformation formats upon which they are based are specified in
 * Amendment 1 of ISO 10646-1 and are also described in the <i>Unicode
 * Standard</i>.
 *
 * <p> The <tt>UTF-16</tt> charsets use sixteen-bit quantities and are
 * therefore sensitive to byte order.  In these encodings the byte order of a
 * stream may be indicated by an initial <i>byte-order mark</i> represented by
 * the Unicode character <tt>'&#92;uFEFF'</tt>.  Byte-order marks are handled
 * as follows:
 *
 * <ul>
 *
 *   <li><p> When decoding, the <tt>UTF-16BE</tt> and <tt>UTF-16LE</tt>
 *   charsets interpret the initial byte-order marks as a <small>ZERO-WIDTH
 *   NON-BREAKING SPACE</small>; when encoding, they do not write
 *   byte-order marks. </p></li>
 *
 *   <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
 *   byte-order mark at the beginning of the input stream to indicate the
 *   byte-order of the stream but defaults to big-endian if there is no
 *   byte-order mark; when encoding, it uses big-endian byte order and writes
 *   a big-endian byte-order mark. </p></li>
 *
 * </ul>
 *
 * In any case, byte order marks occurring after the first element of an
 * input sequence are not omitted since the same code is used to represent
 * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
 *
 * <p> Every instance of the Java virtual machine has a default charset, which
 * may or may not be one of the standard charsets.  The default charset is
 * determined during virtual-machine startup and typically depends upon the
 * locale and charset being used by the underlying operating system. </p>
 *
 * <p>The {@link StandardCharsets} class defines constants for each of the
 * standard charsets.
 *
 * <h4>Terminology</h4>
 *
 * <p> The name of this class is taken from the terms used in
 * <i>RFC 2278</i>.  In that document a <i>charset</i> is defined as the
 * combination of one or more coded character sets and a character-encoding
 * scheme.  (This definition is confusing; some other software systems define
 * <i>charset</i> as a synonym for <i>coded character set</i>.)
 *
 * <p> A <i>coded character set</i> is a mapping between a set of abstract
 * characters and a set of integers.  US-ASCII, ISO 8859-1,
 * JIS X 0201, and Unicode are examples of coded character sets.
 *
 * <p> Some standards have defined a <i>character set</i> to be simply a
 * set of abstract characters without an associated assigned numbering.
 * An alphabet is an example of such a character set.  However, the subtle
 * distinction between <i>character set</i> and <i>coded character set</i>
 * is rarely used in practice; the former has become a short form for the
 * latter, including in the Java API specification.
 *
 * <p> A <i>character-encoding scheme</i> is a mapping between one or more
 * coded character sets and a set of octet (eight-bit byte) sequences.
 * UTF-8, UTF-16, ISO 2022, and EUC are examples of
 * character-encoding schemes.  Encoding schemes are often associated with
 * a particular coded character set; UTF-8, for example, is used only to
 * encode Unicode.  Some schemes, however, are associated with multiple
 * coded character sets; EUC, for example, can be used to encode
 * characters in a variety of Asian coded character sets.
 *
 * <p> When a coded character set is used exclusively with a single
 * character-encoding scheme then the corresponding charset is usually
 * named for the coded character set; otherwise a charset is usually named
 * for the encoding scheme and, possibly, the locale of the coded
 * character sets that it supports.  Hence <tt>US-ASCII</tt> is both the
 * name of a coded character set and of the charset that encodes it, while
 * <tt>EUC-JP</tt> is the name of the charset that encodes the
 * JIS X 0201, JIS X 0208, and JIS X 0212
 * coded character sets for the Japanese language.
 *
 * <p> The native character encoding of the Java programming language is
 * UTF-16.  A charset in the Java platform therefore defines a mapping
 * between sequences of sixteen-bit UTF-16 code units (that is, sequences
 * of chars) and sequences of bytes. </p>
 *
 *
 * @author Mark Reinhold
 * @author JSR-51 Expert Group
 *
 * @see CharsetDecoder
 * @see CharsetEncoder
 * @see java.nio.charset.spi.CharsetProvider
 * @see java.lang.Character
 */

    /* -- Static methods -- */

    /**
     * Checks that the given string is a legal charset name. </p>
     *
     * @param  s
     *         A purported charset name
     *
     * @throws  IllegalCharsetNameException
     *          If the given name is not a legal charset name
     */
    private static void checkName(String s) {
        // Assumed: the signature, the length and character lookups, and the
        // throw statements are not in this listing; they follow the Javadoc.
        int n = s.length();
        if (n == 0) throw new IllegalCharsetNameException(s);
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') continue;
            if (c >= 'a' && c <= 'z') continue;
            if (c >= '0' && c <= '9') continue;
            if (c == '-' && i != 0) continue;
            if (c == '+' && i != 0) continue;
            if (c == ':' && i != 0) continue;
            if (c == '_' && i != 0) continue;
            if (c == '.' && i != 0) continue;
            throw new IllegalCharsetNameException(s);
        }
    }
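
// The naming rules checked above can also be observed through the public API.
// The following is a minimal sketch, not part of this class: CharsetNameDemo
// and isLegalName are hypothetical names, and the sketch assumes that
// Charset.isSupported rejects an illegal name with
// IllegalCharsetNameException, as its documentation states.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetNameDemo {

    /** Returns true if the public API accepts the string as a legal charset name. */
    static boolean isLegalName(String name) {
        try {
            // The boolean result (supported or not) is ignored on purpose;
            // only the legality of the name matters here.
            Charset.isSupported(name);
            return true;
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Legal: letters, digits, and the punctuation listed in the class comment.
        System.out.println(isLegalName("UTF-8"));
        System.out.println(isLegalName("x-my_charset:1.0+b"));

        // Illegal: a leading dash, and a space (not in the allowed set).
        System.out.println(isLegalName("-utf8"));
        System.out.println(isLegalName("utf 8"));

        // Names are compared without regard to case.
        System.out.println(Charset.forName("utf-8").equals(Charset.forName("UTF-8")));
    }
}
```

// A legal name is not necessarily a supported one: "x-my_charset:1.0+b" is
// well formed, but no charset of that name is expected to be installed.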

    /* The standard set of charsets */

    // Cache of the most-recently-returned charsets,
    // along with the names that were used to find them

    // Creates an iterator that walks over the available providers, ignoring
    // those whose lookup or instantiation causes a security exception to be
    // thrown.  Should be invoked with full privileges.

    // Ignore security exceptions

    // Thread-local gate to prevent recursive provider lookups

    // The runtime startup sequence looks up standard charsets as a
    // consequence of the VM's invocation of System.initializeSystemClass
    // in order to, e.g., set system properties and encode filenames.  At
    // that point the application class loader has not been initialized,
    // however, so we can't look for providers because doing so will cause
    // that loader to be prematurely initialized with incomplete
    // information.

    // Avoid recursive provider lookups

    /* The extended set of charsets */

    // Extended charsets not available
    // (charsets.jar not present)

    // We expect most programs to use one Charset repeatedly.
    // We convey a hint to this effect to the VM by putting the
    // level 1 cache miss code in a separate method.

    /* Only need to check the name if we didn't find a charset for it */

    /**
     * Tells whether the named charset is supported. </p>
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  <tt>true</tt> if, and only if, support for the named charset
     *          is available in the current Java virtual machine
     *
     * @throws IllegalCharsetNameException
     *         If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given <tt>charsetName</tt> is null
     */

    /**
     * Returns a charset object for the named charset. </p>
     *
     * @param  charsetName
     *         The name of the requested charset; may be either
     *         a canonical name or an alias
     *
     * @return  A charset object for the named charset
     *
     * @throws  IllegalCharsetNameException
     *          If the given charset name is illegal
     *
     * @throws  IllegalArgumentException
     *          If the given <tt>charsetName</tt> is null
     *
     * @throws  UnsupportedCharsetException
     *          If no support for the named charset is available
     *          in this instance of the Java virtual machine
     */

    // Fold charsets from the given iterator into the given map, ignoring
    // charsets whose names already have entries in the map.

    /**
     * Constructs a sorted map from canonical charset names to charset objects.
     *
     * <p> The map returned by this method will have one entry for each charset
     * for which support is available in the current Java virtual machine.  If
     * two or more supported charsets have the same canonical name then the
     * resulting map will contain just one of them; which one it will contain
     * is not specified. </p>
     *
     * <p> The invocation of this method, and the subsequent use of the
     * resulting map, may cause time-consuming disk or network I/O operations
     * to occur.  This method is provided for applications that need to
     * enumerate all of the available charsets, for example to allow user
     * charset selection.  This method is not used by the {@link #forName
     * forName} method, which instead employs an efficient incremental lookup
     * algorithm.
     *
     * <p> This method may return different results at different times if new
     * charset providers are dynamically made available to the current Java
     * virtual machine.  In the absence of such changes, the charsets returned
     * by this method are exactly those that can be retrieved via the {@link
     * #forName forName} method.  </p>
     *
     * @return An immutable, case-insensitive map from canonical charset names
     *         to charset objects
     */

    /**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     */

    /* -- Instance fields and methods -- */

    /**
     * Initializes a new charset with the given canonical name and alias
     * set. </p>
     *
     * @param  canonicalName
     *         The canonical name of this charset
     *
     * @param  aliases
     *         An array of this charset's aliases, or null if it has no aliases
     *
     * @throws IllegalCharsetNameException
     *         If the canonical name or any of the aliases are illegal
     */

    /**
     * Returns this charset's canonical name. </p>
     *
     * @return  The canonical name of this charset
     */

    /**
     * Returns a set containing this charset's aliases. </p>
     *
     * @return  An immutable set of this charset's aliases
     */
        for (int i = 0; i < n; i++)
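
// The relationship between canonical names and aliases described above can be
// sketched as follows. CharsetAliasDemo and canonicalNameOf are hypothetical
// names, and the sketch assumes that the running Java platform registers
// "latin1" as an alias of ISO-8859-1 (OpenJDK does, and the IANA registry
// lists it).

```java
import java.nio.charset.Charset;
import java.util.Set;

class CharsetAliasDemo {

    /** Resolves any canonical name or alias to the charset's canonical name. */
    static String canonicalNameOf(String anyName) {
        return Charset.forName(anyName).name();
    }

    /** Returns true if the alias set refuses modification, as documented. */
    static boolean aliasesAreImmutable(Charset cs) {
        Set<String> aliases = cs.aliases();
        try {
            aliases.add("x-bogus-alias");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Lookup by alias returns the same charset as lookup by canonical name.
        Charset byAlias = Charset.forName("latin1");
        Charset byName = Charset.forName("ISO-8859-1");
        System.out.println(byAlias.equals(byName));

        // The canonical name is what name() reports, whichever name was used.
        System.out.println(canonicalNameOf("latin1"));

        // The alias set differs between implementations, so it is only printed.
        System.out.println(byAlias.aliases());
        System.out.println(aliasesAreImmutable(byAlias));
    }
}
```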

    /**
     * Returns this charset's human-readable name for the default locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name.  Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @return  The display name of this charset in the default locale
     */

    /**
     * Tells whether or not this charset is registered in the IANA Charset
     * Registry.  </p>
     *
     * @return  <tt>true</tt> if, and only if, this charset is known by its
     *          implementor to be registered with the IANA
     */

    /**
     * Returns this charset's human-readable name for the given locale.
     *
     * <p> The default implementation of this method simply returns this
     * charset's canonical name.  Concrete subclasses of this class may
     * override this method in order to provide a localized display name. </p>
     *
     * @param  locale
     *         The locale for which the display name is to be retrieved
     *
     * @return  The display name of this charset in the given locale
     */

    /**
     * Tells whether or not this charset contains the given charset.
     *
     * <p> A charset <i>C</i> is said to <i>contain</i> a charset <i>D</i> if,
     * and only if, every character representable in <i>D</i> is also
     * representable in <i>C</i>.  If this relationship holds then it is
     * guaranteed that every string that can be encoded in <i>D</i> can also be
     * encoded in <i>C</i> without performing any replacements.
     *
     * <p> That <i>C</i> contains <i>D</i> does not imply that each character
     * representable in <i>C</i> by a particular byte sequence is represented
     * in <i>D</i> by the same byte sequence, although sometimes this is the
     * case.
     *
     * <p> Every charset contains itself.
     *
     * <p> This method computes an approximation of the containment relation:
     * If it returns <tt>true</tt> then the given charset is known to be
     * contained by this charset; if it returns <tt>false</tt>, however, then
     * it is not necessarily the case that the given charset is not contained
     * in this charset.
     *
     * @return  <tt>true</tt> if the given charset is contained in this charset
     */

    /**
     * Constructs a new decoder for this charset. </p>
     *
     * @return  A new decoder for this charset
     */

    /**
     * Constructs a new encoder for this charset. </p>
     *
     * @return  A new encoder for this charset
     *
     * @throws  UnsupportedOperationException
     *          If this charset does not support encoding
     */

    /**
     * Tells whether or not this charset supports encoding.
     *
     * <p> Nearly all charsets support encoding.  The primary exceptions are
     * special-purpose <i>auto-detect</i> charsets whose decoders can determine
     * which of several possible encoding schemes is in use by examining the
     * input byte sequence.  Such charsets do not support encoding because
     * there is no way to determine which encoding should be used on output.
     * Implementations of such charsets should override this method to return
     * <tt>false</tt>. </p>
     *
     * @return  <tt>true</tt> if, and only if, this charset supports encoding
     */

    /**
     * Convenience method that decodes bytes in this charset into Unicode
     * characters.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newDecoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .decode(bb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * decoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  In order
     * to detect such sequences, use the {@link
     * CharsetDecoder#decode(java.nio.ByteBuffer)} method directly.  </p>
     *
     * @param  bb  The byte buffer to be decoded
     *
     * @return  A char buffer containing the decoded characters
     */

    /**
     * Convenience method that encodes Unicode characters into bytes in this
     * charset.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.newEncoder()
     *       .onMalformedInput(CodingErrorAction.REPLACE)
     *       .onUnmappableCharacter(CodingErrorAction.REPLACE)
     *       .encode(cb); </pre>
     *
     * except that it is potentially more efficient because it can cache
     * encoders between successive invocations.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement byte array.  In order
     * to detect such sequences, use the {@link
     * CharsetEncoder#encode(java.nio.CharBuffer)} method directly.  </p>
     *
     * @param  cb  The char buffer to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */

    /**
     * Convenience method that encodes a string into bytes in this charset.
     *
     * <p> An invocation of this method upon a charset <tt>cs</tt> returns the
     * same result as the expression
     *
     * <pre>
     *     cs.encode(CharBuffer.wrap(str)); </pre>
     *
     * @param  str  The string to be encoded
     *
     * @return  A byte buffer containing the encoded characters
     */

    /**
     * Compares this charset to another.
     *
     * <p> Charsets are ordered by their canonical names, without regard to
     * case. </p>
     *
     * @param  that
     *         The charset to which this charset is to be compared
     *
     * @return A negative integer, zero, or a positive integer as this charset
     *         is less than, equal to, or greater than the specified charset
     */

    /**
     * Computes a hashcode for this charset. </p>
     *
     * @return  An integer hashcode
     */

    /**
     * Tells whether or not this object is equal to another.
     *
     * <p> Two charsets are equal if, and only if, they have the same canonical
     * names.  A charset is never equal to any other type of object.  </p>
     *
     * @return  <tt>true</tt> if, and only if, this charset is equal to the
     *          given object
     */

    /**
     * Returns a string describing this charset. </p>
     *
     * @return  A string describing this charset
     */
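
// The difference between the replacing convenience methods and a coder
// configured to report errors can be sketched as follows. CharsetCodingDemo
// and its helper methods are hypothetical names; the sketch assumes the
// usual '?' replacement byte for US-ASCII encoding and the U+FFFD
// replacement string for UTF-8 decoding, as in OpenJDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetCodingDemo {

    /** Encodes with the convenience method, which silently replaces unmappable characters. */
    static byte[] encodeLenient(Charset cs, String s) {
        ByteBuffer bb = cs.encode(s);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    /** Decodes with the convenience method, which silently replaces malformed input. */
    static String decodeLenient(Charset cs, byte[] bytes) {
        return cs.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Returns true if s can be encoded in cs without any replacement. */
    static boolean encodesCleanly(Charset cs, String s) {
        try {
            cs.newEncoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";   // the last character is outside US-ASCII

        // The convenience method substitutes the replacement byte.
        byte[] lenient = encodeLenient(StandardCharsets.US_ASCII, text);
        System.out.println(new String(lenient, StandardCharsets.US_ASCII));

        // A reporting encoder detects the same problem instead of hiding it.
        System.out.println(encodesCleanly(StandardCharsets.US_ASCII, text));
        System.out.println(encodesCleanly(StandardCharsets.UTF_8, text));

        // 0xFF is never valid in UTF-8, so lenient decoding substitutes.
        System.out.println(decodeLenient(StandardCharsets.UTF_8, new byte[] {(byte) 0xFF}));
    }
}
```

// Use the convenience methods only when lossy conversion is acceptable; where
// data loss must be noticed, configure the encoder or decoder explicitly.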