 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.

 *******************************************************************************
 * Copyright (C) 2003-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************

// 2005-05-19 Edward Wang
// - move from package com.ibm.icu.text to package sun.net.idn
// - use ParseException instead of StringPrepParseException
// - change 'Normalizer.getUnicodeVersion()' to 'NormalizerImpl.getUnicodeVersion()'
// - remove all @deprecated tags to make compiler happy
// 2007-08-14 Martin Buchholz
// - remove redundant casts

 * StringPrep API implements the StringPrep framework as described by RFC 3454.
 * StringPrep prepares Unicode strings for use in network protocols.
 * Profiles of StringPrep are sets of rules and data according to which the
 * Unicode strings are prepared. Each profile contains tables which describe
 * how a code point should be treated. The tables are broadly classified into:
 * <ul>
 *     <li> Unassigned Table: Contains code points that are unassigned
 *          in the Unicode version supported by StringPrep. Currently
 *          RFC 3454 supports Unicode 3.2. </li>
 *     <li> Prohibited Table: Contains code points that are prohibited from
 *          the output of the StringPrep processing function. </li>
 *     <li> Mapping Table: Contains code points that are deleted from the output or case mapped. </li>
 * </ul>
 *
 * The procedure for preparing Unicode strings:
 * <ol>
 *     <li> Map: For each character in the input, check if it has a mapping
 *          and, if so, replace it with its mapping. </li>
 *     <li> Normalize: Possibly normalize the result of step 1 using Unicode
 *          normalization. </li>
 *     <li> Prohibit: Check for any characters that are not allowed in the
 *          output. If any are found, return an error.</li>
 *     <li> Check bidi: Possibly check for right-to-left characters, and if
 *          any are found, make sure that the whole string satisfies the
 *          requirements for bidirectional strings. If the string does not
 *          satisfy the requirements for bidirectional strings, return an
 *          error.</li>
 * </ol>
 * @author Ram Viswanadha

 * Option to prohibit processing of unassigned code points in the input

 * Option to allow processing of unassigned code points in the input

    private static final int MAP =
0x0001;
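// Illustration only: a minimal sketch of the four steps listed in the class
// comment (Map, Normalize, Prohibit, Check bidi), using the RandALCat/LCat
// rules of RFC 3454. It is not the implementation in this file.
// Assumptions: the class name StringPrepSketch and the helpers isDeleted and
// isProhibited are hypothetical stand-ins for a profile's mapping and
// prohibited tables; java.text.Normalizer uses the JDK's current Unicode data
// rather than Unicode 3.2; and IllegalArgumentException is used here in place
// of ParseException to keep the sketch short.

```java
import java.text.Normalizer;

/** Toy sketch of the Map / Normalize / Prohibit / Check-bidi steps. */
final class StringPrepSketch {

    // ASSUMPTION: stand-in for a mapping table; only SOFT HYPHEN is deleted.
    static boolean isDeleted(int cp) {
        return cp == 0x00AD;
    }

    // ASSUMPTION: stand-in for a prohibited table; ASCII space and ISO controls only.
    static boolean isProhibited(int cp) {
        return cp == ' ' || Character.isISOControl(cp);
    }

    // "RandALCat": a code point whose bidirectional category is R or AL.
    static boolean isRandALCat(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    // "LCat": a code point whose bidirectional category is L.
    static boolean isLCat(int cp) {
        return Character.getDirectionality(cp) == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    static String prepare(String input) {
        // 1) Map: drop deleted code points and case-map the rest.
        StringBuilder mapped = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (!isDeleted(cp)) {
                mapped.appendCodePoint(Character.toLowerCase(cp));
            }
        }

        // 2) Normalize: apply NFKC to the result of step 1.
        String out = Normalizer.normalize(mapped, Normalizer.Form.NFKC);

        // 3) Prohibit: reject disallowed code points, collecting bidi facts on the way.
        boolean sawRandAL = false;
        boolean sawL = false;
        int first = -1;
        int last = -1;
        for (int i = 0; i < out.length(); ) {
            int cp = out.codePointAt(i);
            if (isProhibited(cp)) {
                throw new IllegalArgumentException("Prohibited code point at index " + i);
            }
            sawRandAL |= isRandALCat(cp);
            sawL |= isLCat(cp);
            if (first < 0) {
                first = cp;
            }
            last = cp;
            i += Character.charCount(cp);
        }

        // 4) Check bidi: with any RandALCat present, no LCat is allowed and
        //    both the first and the last code point must be RandALCat.
        if (sawRandAL && (sawL || !isRandALCat(first) || !isRandALCat(last))) {
            throw new IllegalArgumentException(
                "The input does not conform to the rules for BiDi code points.");
        }
        return out;
    }
}
```

// Usage: StringPrepSketch.prepare(someLabel) returns the prepared string or
// throws if a prohibited code point or a bidi violation is found.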
    /* indexes[] value names */
    private static final int OPTIONS =
7;
/* Bit set of options to turn on in the profile */
    private static final int INDEX_TOP =
16;
/* changing this requires a new formatVersion */

 * Default buffer size of datafile

    /* Wrappers for Trie implementations */

 * Called by com.ibm.icu.util.Trie to extract from a lead surrogate's
 * data the index array offset of the indexes for that lead surrogate.
 * @param property data value for a surrogate from the trie, including
 *        the folding offset
 * @return data offset or 0 if there is no data for the lead surrogate

    // CharTrie implementation for reading the trie data
    // Indexes read from the data file
    // mapping data read from the data file
    // format version of the data file
    // the version of Unicode supported by the data file
    // the Unicode version of last entry in the
    // Option to turn on Normalization
    // Option to turn on checking for BiDi rules

 * Creates a StringPrep object after reading the input stream.
 * The object does not hold a reference to the input stream, so the stream can be
 * closed after the method returns.
 *
 * @param inputStream The stream for reading the StringPrep profile binary
 * @throws IOException

        // indexes[INDEX_MAPPING_DATA_SIZE] stores the size of mappingData in bytes
        // load the rest of the data and initialize the data members
        // get the data format version
            throw new IOException(
"Normalization Correction version not supported");
 * Initial value stored in the mapping table
 * just return TYPE_LIMIT .. so that
 * the source codepoint is copied to the destination

    /* ascertain if the value is index or delta */
    // check if the source codepoint is unassigned
    /* copy mapping to destination */
    // just consume the codepoint and continue
    // copy the source into destination

 * Option UNORM_BEFORE_PRI_29:
 * IDNA as interpreted by IETF members (see unicode mailing list 2004H1)
 * requires strict adherence to Unicode 3.2 normalization,
 * including buggy composition from before fixing Public Review Issue #29.
 * Note that this causes some valid but nonsensical text to be
 * either corrupted or rejected, depending on the text.
 * See unorm.cpp and cnormtst.c

    boolean isLabelSeparator(int ch){
        int result = getCodePointValue(ch);
        if( (result & 0x07) == LABEL_SEPARATOR){

   1) Map -- For each character in the input, check if it has a mapping
      and, if so, replace it with its mapping.

   2) Normalize -- Possibly normalize the result of step 1 using Unicode
      normalization.

   3) Prohibit -- Check for any characters that are not allowed in the
      output. If any are found, return an error.

   4) Check bidi -- Possibly check for right-to-left characters, and if
      any are found, make sure that the whole string satisfies the
      requirements for bidirectional strings. If the string does not
      satisfy the requirements for bidirectional strings, return an
      error.

   [Unicode3.2] defines several bidirectional categories; each character
   has one bidirectional category assigned to it. For the purposes of
   the requirements below, an "RandALCat character" is a character that
   has Unicode bidirectional categories "R" or "AL"; an "LCat character"
   is a character that has Unicode bidirectional category "L".
   Note that there are many characters which fall in neither of the above
   definitions; Latin digits (<U+0030> through <U+0039>) are examples of
   this because they have bidirectional category "EN".

   In any profile that specifies bidirectional character handling, all
   three of the following requirements MUST be met:

   1) The characters in section 5.8 MUST be prohibited.

   2) If a string contains any RandALCat character, the string MUST NOT
      contain any LCat character.

   3) If a string contains any RandALCat character, a RandALCat
      character MUST be the first character of the string, and a
      RandALCat character MUST be the last character of the string.

 * Prepare the input buffer for use in applications with the given profile.
 * This operation maps, normalizes (NFKC), and checks for prohibited and BiDi
 * characters in the order defined by RFC 3454, depending on the options
 * specified in the profile.
 *
 * @param src     A UCharacterIterator object containing the source string
 * @param options A bit set of options:
 *   - StringPrep.NONE             Prohibit processing of unassigned code points in the input
 *   - StringPrep.ALLOW_UNASSIGNED Treat unassigned code points in the input
 *                                 as normal Unicode code points.
 * @return StringBuffer A StringBuffer containing the output
 * @throws ParseException

            throw new ParseException(
"The input does not conform to the rules for BiDi code points." +
            throw new ParseException(
"The input does not conform to the rules for BiDi code points." +