 * Copyright (c) 2005, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.

 * Provides methods to convert internationalized domain names (IDNs) between
 * a normal Unicode representation and an ASCII Compatible Encoding (ACE) representation.
 * Internationalized domain names can use characters from the entire range of
 * Unicode, while traditional domain names are restricted to ASCII characters.
 * ACE is an encoding of Unicode strings that uses only ASCII characters and
 * can be used with software (such as the Domain Name System) that only
 * understands traditional domain names.
 *
 * <p>RFC 3490 defines two operations: ToASCII and ToUnicode. These two operations employ
 * the Nameprep algorithm, which is a profile of Stringprep, and the Punycode
 * algorithm to convert domain name strings back and forth.
 *
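 * <p>A minimal usage sketch of the two operations described above, using the
 * public <tt>java.net.IDN</tt> entry points. The class name <tt>IdnRoundTrip</tt>,
 * its helper methods, and the sample domain are illustrative assumptions and
 * are not part of this file.

```java
import java.net.IDN;

// Hypothetical demo class: round-trips a domain name through ToASCII and ToUnicode.
class IdnRoundTrip {
    /** Converts a Unicode domain name to its ACE form using the default flags. */
    static String toAce(String unicodeName) {
        return IDN.toASCII(unicodeName);
    }

    /** Converts an ACE domain name back to its Unicode form. */
    static String toUnicodeName(String aceName) {
        return IDN.toUnicode(aceName);
    }

    public static void main(String[] args) {
        // "bücher.example", written with an escape to stay encoding-independent.
        String ace = toAce("b\u00FCcher.example");
        System.out.println(ace); // xn--bcher-kva.example

        // Only the label with the ACE prefix is decoded; "example" is left as-is.
        String back = toUnicodeName(ace);
        System.out.println(back.equals("b\u00FCcher.example"));
    }
}
```

 * Labels that are already plain ASCII pass through unchanged, so only the
 * non-ASCII label acquires the ACE prefix in this sketch.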
 * <p>The behavior of the aforementioned conversion process can be adjusted by various flags:
 * <ul>
 * <li>If the ALLOW_UNASSIGNED flag is used, the domain name string to be converted
 * can contain code points that are unassigned in Unicode 3.2, which is the
 * Unicode version on which IDN conversion is based. If the flag is not used,
 * the presence of such unassigned code points is treated as an error.
 * <li>If the USE_STD3_ASCII_RULES flag is used, ASCII strings are checked against
 * RFC 1122 and RFC 1123. It is an error if they don't meet the requirements.
 * </ul>
 * These flags can be logically OR'ed together.
 *
 * <p>Security is an important consideration with respect to internationalized
 * domain name support. For example, English domain names may be <i>homographed</i>,
 * that is, maliciously misspelled by substituting non-Latin letters.
 * Unicode Technical Report #36
 * discusses security issues of IDN support as well as possible solutions.
 * Applications are responsible for taking adequate security measures when using
 * internationalized domain names.
 *
 * @author Edward Wang

 * Flag to allow processing of unassigned code points

 * Flag to turn on the check against STD-3 ASCII rules

 * Translates a string from Unicode to ASCII Compatible Encoding (ACE),
 * as defined by the ToASCII operation of RFC 3490.
 *
 * <p>The ToASCII operation can fail: it fails if any of its steps fails.
 * If the ToASCII operation fails, an IllegalArgumentException is thrown.
 * In this case, the input string should not be used in an internationalized domain name.
 *
 * <p> A label is an individual part of a domain name. The original ToASCII operation,
 * as defined in RFC 3490, only operates on a single label. This method can handle
 * both a single label and an entire domain name, by assuming that labels in a domain name are
 * always separated by dots. The following characters are recognized as dots:
 * \u002E (full stop), \u3002 (ideographic full stop), \uFF0E (fullwidth full stop),
 * and \uFF61 (halfwidth ideographic full stop).
 * If dots are used as label separators, this method also changes all of them
 * to \u002E (full stop) in the translated output string.
 *
 * @param input     the string to be processed
 * @param flag      process flag; can be 0 or any logical OR of possible flags
 *
 * @return          the translated <tt>String</tt>
 *
 * @throws IllegalArgumentException   if the input string doesn't conform to the RFC 3490 specification

 * Translates a string from Unicode to ASCII Compatible Encoding (ACE),
 * as defined by the ToASCII operation of RFC 3490.
 *
 * <p> This convenience method works as if by invoking the
 * two-argument counterpart as follows:
 * <blockquote><tt>
 * {@link #toASCII(String, int) toASCII}(input, 0);
 * </tt></blockquote>
 *
 * @param input     the string to be processed
 *
 * @return          the translated <tt>String</tt>
 *
 * @throws IllegalArgumentException   if the input string doesn't conform to the RFC 3490 specification

 * Translates a string from ASCII Compatible Encoding (ACE) to Unicode,
 * as defined by the ToUnicode operation of RFC 3490.
 *
 * <p>ToUnicode never fails. In case of any error, the input string is returned unmodified.
 *
 * <p> A label is an individual part of a domain name. The original ToUnicode operation,
 * as defined in RFC 3490, only operates on a single label. This method can handle
 * both a single label and an entire domain name, by assuming that labels in a domain name are
 * always separated by dots. The following characters are recognized as dots:
 * \u002E (full stop), \u3002 (ideographic full stop), \uFF0E (fullwidth full stop),
 * and \uFF61 (halfwidth ideographic full stop).
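 * <p>The contracts described above (ToASCII may throw, alternative dots are
 * normalized, and flags tighten the checks) can be exercised with a short
 * sketch. The class <tt>IdnContracts</tt> and its <tt>rejects</tt> helper are
 * hypothetical names introduced only for illustration; the sample inputs are
 * assumptions as well.

```java
import java.net.IDN;

// Hypothetical demo class: probes when IDN.toASCII accepts or rejects an input.
class IdnContracts {
    /** Returns true if IDN.toASCII throws IllegalArgumentException for the input. */
    static boolean rejects(String input, int flags) {
        try {
            IDN.toASCII(input, flags);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The ideographic full stop (U+3002) is treated as a label separator
        // and is rewritten to a plain full stop in the output.
        System.out.println(IDN.toASCII("example\u3002com")); // example.com

        // An underscore is a non-LDH ASCII character: accepted with flag 0,
        // rejected once USE_STD3_ASCII_RULES is set.
        System.out.println(rejects("a_b.example", 0));                        // false
        System.out.println(rejects("a_b.example", IDN.USE_STD3_ASCII_RULES)); // true
    }
}
```

 * Wrapping the call in a boolean helper keeps the exception handling in one
 * place; callers that need the failure reason would catch the exception
 * themselves instead.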
 * @param input     the string to be processed
 * @param flag      process flag; can be 0 or any logical OR of possible flags
 *
 * @return          the translated <tt>String</tt>

 * Translates a string from ASCII Compatible Encoding (ACE) to Unicode,
 * as defined by the ToUnicode operation of RFC 3490.
 *
 * <p> This convenience method works as if by invoking the
 * two-argument counterpart as follows:
 * <blockquote><tt>
 * {@link #toUnicode(String, int) toUnicode}(input, 0);
 * </tt></blockquote>
 *
 * @param input     the string to be processed
 *
 * @return          the translated <tt>String</tt>

/* ---------------- Private members -------------- */

// ACE Prefix is "xn--"

// single instance of nameprep

// should never reach here

/* ---------------- Private operations -------------- */

// to suppress the default zero-argument constructor

// toASCII operation; should only apply to a single label

// Check if the string contains code points outside the ASCII range 0..0x7F.

// perform the nameprep operation; flag ALLOW_UNASSIGNED is used here

// Verify the absence of non-LDH ASCII code points
//   0..0x2c, 0x2e..0x2f, 0x3a..0x40, 0x5b..0x60, 0x7b..0x7f
// Verify the absence of leading and trailing hyphen

// If all code points are inside 0..0x7f, skip to step 8

// verify the sequence does not begin with ACE prefix

// encode the sequence with punycode

// prepend the ACE prefix

// the length must be inside 1..63 (step 8)

// toUnicode operation; should only apply to a single label

// find out if all the code points in input are ASCII

// perform the nameprep operation; flag ALLOW_UNASSIGNED is used here

// toUnicode never fails; if any step fails, return the input string

// verify ACE Prefix

// Remove the ACE Prefix

// Decode using punycode (step 5)

// return output of step 5

// just return the input

// LDH stands for "letter/digit/hyphen", with characters restricted to the
// 26-letter Latin alphabet <A-Z a-z>, the digits <0-9>, and the hyphen
// non-LDH = 0..0x2C, 0x2E..0x2F, 0x3A..0x40,
//           0x5B..0x60, 0x7B..0x7F
// LDH = ['-' '0'..'9' 'A'..'Z' 'a'..'z']

// search for a dot in a string and return the index of that character;
// or, if there are no dots, return the length of the input string
// dots might be: \u002E (full stop), \u3002 (ideographic full stop), \uFF0E (fullwidth full stop),
// and \uFF61 (halfwidth ideographic full stop).
if (c == '.' || c == '\u3002' || c == '\uFF0E' || c == '\uFF61') {
// to check if a string only contains US-ASCII code points

// to check if a string starts with the ACE prefix

return (char)(ch + 'a' - 'A');
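
// The helper comments above (dot search, non-LDH check, ASCII lowercasing) can
// be sketched as a small standalone class. This is an illustrative
// reconstruction under stated assumptions, not the code of this file: the
// class name LabelHelpers and the method names and signatures are
// hypothetical, and only the dot test and the lowercasing expression are taken
// from the fragments shown above.

```java
// Hypothetical helper class sketching the private operations described above.
class LabelHelpers {
    /** Returns the index of the first dot at or after start, or s.length() if there is none. */
    static int searchDots(String s, int start) {
        int i;
        for (i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            // full stop, ideographic, fullwidth, and halfwidth ideographic full stop
            if (c == '.' || c == '\u3002' || c == '\uFF0E' || c == '\uFF61') {
                break;
            }
        }
        return i;
    }

    /** True for ASCII code points that are not a letter, a digit, or the hyphen. */
    static boolean isNonLdhAscii(int ch) {
        return (0x0000 <= ch && ch <= 0x002C)
            || (0x002E <= ch && ch <= 0x002F)
            || (0x003A <= ch && ch <= 0x0040)
            || (0x005B <= ch && ch <= 0x0060)
            || (0x007B <= ch && ch <= 0x007F);
    }

    /** Lowercases ASCII 'A'..'Z'; every other character is returned unchanged. */
    static char toAsciiLower(char ch) {
        if ('A' <= ch && ch <= 'Z') {
            return (char) (ch + 'a' - 'A');
        }
        return ch;
    }

    public static void main(String[] args) {
        System.out.println(searchDots("www\uFF0Eexample", 0)); // 3
        System.out.println(isNonLdhAscii('_'));                 // true
        System.out.println(toAsciiLower('Q'));                  // q
    }
}
```

// Returning the string length when no dot is found lets a caller split labels
// with substring(start, index) without a special case for the last label.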