RBTableBuilder.java revision 2362
/*
 * Copyright (c) 1999, 2005, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/*
 * (C) Copyright Taligent, Inc. 1996, 1997 - All Rights Reserved
 * (C) Copyright IBM Corp. 1996-1998 - All Rights Reserved
 *
 * The original version of this source code and documentation is copyrighted
 * and owned by Taligent, Inc., a wholly-owned subsidiary of IBM. These
 * materials are provided under terms of a License Agreement between Taligent
 * and Sun. This technology is protected by multiple US and International
 * patents. This notice and attribution to Taligent may not be removed.
 * Taligent is a registered trademark of Taligent, Inc.
 */

/**
 * This class contains all the code to parse a RuleBasedCollator pattern
 * and build a RBCollationTables object from it. A particular instance
 * of this class exists only during the actual build process; once an
 * RBCollationTables object has been built, the RBTableBuilder object
 * goes away. This object carries all of the state which is only needed
 * during the build process, plus a "shadow" copy of all of the state
 * that will go into the tables object itself. This object communicates
 * with RBCollationTables through a separate class, RBCollationTables.BuildAPI,
 * which is an inner class of RBCollationTables and provides a separate
 * private API for communication with RBTableBuilder.
 * This class isn't just an inner class of RBCollationTables itself because
 * of its large size. For source-code readability, it seemed better for the
 * builder to have its own source file.
 */

/**
 * Create a table-based collation object with the given rules.
 * This is the main function that actually builds the tables and
 * stores them back in the RBCollationTables object. It is called
 * ONLY by the RBCollationTables constructor.
 * @see java.text.RuleBasedCollator#RuleBasedCollator
 * @exception ParseException If the rules format is incorrect.
 */

// This array maps Unicode characters to their collation ordering

// Normalize the build rules. Find occurrences of all decomposed characters
// and normalize the rules before feeding into the builder. By "normalize",
// we mean that all precomposed Unicode characters must be converted into
// a base character and one or more combining characters (such as accents).
// When there are multiple combining characters attached to a base character,
// the combining characters must be in their canonical order

// (1) decmp will be NO_DECOMPOSITION only in the ko locale, to prevent decomposing
// Hangul syllables to jamos, so we can actually just call decompose with
// the normalizer's IGNORE_HANGUL option turned on
// (2) just call the "special version" in NormalizerImpl directly
// pattern = Normalizer.decompose(pattern, false, Normalizer.IGNORE_HANGUL, true);

// Normalizer.Mode mode = CollatorUtilities.toNormalizerMode(decmp);
// pattern = Normalizer.normalize(pattern, mode, 0, true);

// Build the merged collation entries
// Since rules can be specified in any order in the string
// (e.g. "c , C < d , D < e , E .... C < CH")
// this splits all of the rules in the string out into separate
// objects and then sorts them. In the above example, it merges the
// "C < CH" rule in just before the "C < D" rule.

// Now walk through each entry and add it to my own tables

System.out.println("mappingSize=" + mapping.getKSize());
for (int j = 0; j < 0xffff; j++) {
    int value = mapping.elementAt(j);
    if (value != RBCollationTables.UNMAPPED)
        System.out.println("index=" + Integer.toString(j, 16)
                           + ", value=" + Integer.toString(value, 16));
}

/**
 * Add expanding entries for pre-composed Unicode characters so that this
 * collator can be used reasonably well with decomposition turned off.
 */

// Iterate through all of the pre-composed characters in Unicode

// We don't already have an ordering for this pre-composed character.
// First, see if the decomposed string is already in our
// tables as a single contracting-string ordering.
// If so, just map the precomposed character to that order.
// TODO: What we should really be doing here is trying to find the
// longest initial substring of the decomposition that is present
// in the tables as a contracting character sequence, and find its
// ordering. Then do this recursively with the remaining chars
// so that we build a list of orderings, and add that list to
// the expansion table.
// That would be more correct but also significantly slower, so
// I'm not totally sure it's worth doing.

// The only thing we need to do is check whether this decomposed character
// has an entry in our order table. That order does not have to be
// a contraction order. If it does have one, add an entry
// for the precomposed character using the same order; the
// previous implementation unnecessarily added a single-character expansion

// We don't have a contracting ordering for the entire string
// that results from the decomposition, but if we have orders
// for each individual character, we can add an expanding
// table entry for the pre-composed character

/**
 * Look up unmapped values in the expanded character table.
 *
 * When the expanding character tables are built by addExpandOrder,
 * it doesn't know what the final ordering of each character
 * in the expansion will be. Instead, it just puts the raw character
 * code into the table, adding CHARINDEX as a flag. Now that we've
 * finished building the mapping table, we can go back and look up
 * that character to see what its real collation order is and
 * stick that into the expansion table. That lets us avoid doing
 * a two-stage lookup later.
 */

// found an expanding character that isn't filled in yet

// Get the real values for the non-filled entry

// The real value is still unmapped, maybe it's ignorable

// just fill in the value

/**
 * Increment of the last order based on the comparison level.
 */
// increment primary order and mask off secondary and tertiary difference

// increment secondary order and mask off tertiary difference

// record max # of ignorable chars with secondary difference

// increment tertiary order

// record max # of ignorable chars with tertiary difference

/**
 * Adds a character and its designated order into the collation table.
 */

// See if the char already has an order in the mapping table

// There's already an entry for this character that points to a contracting
// character table. Rather than adding the character directly to the mapping
// table, we must add it to the contract table.

// add the entry to the mapping table;
// a later entry for the same character replaces the previous one

/**
 * Adds the contracting string into the collation table.
 */

char ch0 = groupChars.charAt(0);
int ch = Character.isHighSurrogate(ch0)
        ? Character.toCodePoint(ch0, groupChars.charAt(1)) : ch0;

// See if the initial character of the string already has a contract table.

// We need to create a new table of contract entries for this base char

// Add the initial character's current ordering first, then
// update its mapping to point to this contract table

// Now add (or replace) this string in the table

// NOTE: This little bit of logic is here to speed CollationElementIterator
// .nextContractChar(). This code ensures that the longest sequence in
// this list is always the _last_ one in the list. This keeps
// nextContractChar() from having to search the entire list for the longest

// If this was a forward mapping for a contracting string, also add a
// reverse mapping for it, so that CollationElementIterator.previous

/**
 * If the given string has been specified as a contracting string
 * in this collation table, return its ordering.
 * Otherwise return UNMAPPED.
 */
char ch0 = groupChars.charAt(0);
int ch = Character.isHighSurrogate(ch0)
        ? Character.toCodePoint(ch0, groupChars.charAt(1)) : ch0;

/**
 * Get the entry of hash table of the contracting string in the collation
 * table.
 * @param ch the starting character of the contracting string
 */

/**
 * Adds the expanding string into the collation table.
 */

// Create an expansion table entry

// And add its index into the main mapping table

// only add into table when it is a legal surrogate

/**
 * Create a new entry in the expansion table that contains the orderings
 * for the given characters. If anOrder is valid, it is added to the
 * beginning of the expanded list of orders.
 */

// If anOrder is valid, we want to add it at the beginning of the list

// either we are missing the low surrogate or the next char
// is not a legal low surrogate, so stop loop

// can't find it in the table, will be filled in by commit().

// we had at least one supplementary character, so valueList
// is bigger than it really needs to be...

// Add the expanding char list into the expansion table.

for (int i = 0; i < len; i++) {
// ==============================================================
// ==============================================================
static final int CHARINDEX = 0x70000000;  // needs a lookup in commit()

// ==============================================================
// instance variables
// ==============================================================

// variables used by the build process

// "shadow" copies of the instance variables in RBCollationTables
// (the values in these variables are copied back into RBCollationTables
// at the end of the build process)
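
The comments above describe a two-pass scheme: while expansions are being built, a character whose final order is not yet known is stored as its raw character code plus the CHARINDEX flag, and commit() later swaps in the real order. The following is a minimal sketch of that idea only, not the actual RBTableBuilder code. The class name `CharIndexSketch`, the `HashMap`-based `mapping`, the BMP-only handling, and the choice of 0 for a still-unmapped character are all assumptions made for illustration. It also assumes real orders never fall in the range `CHARINDEX` to `CHARINDEX + 0xFFFF`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the JDK sources.
class CharIndexSketch {
    // Same flag value as the CHARINDEX constant in the listing.
    static final int CHARINDEX = 0x70000000;

    // Assumed stand-in for the mapping table: character code -> collation order.
    final Map<Integer, Integer> mapping = new HashMap<>();
    // Each entry holds the list of orders produced by one expanding character.
    final List<int[]> expandTable = new ArrayList<>();

    /** First pass: characters with no order yet are stored as CHARINDEX + char. */
    int addExpansion(String expandChars) {
        int[] values = new int[expandChars.length()];
        for (int i = 0; i < values.length; i++) {
            char ch = expandChars.charAt(i); // BMP only in this sketch
            Integer order = mapping.get((int) ch);
            values[i] = (order != null) ? order : CHARINDEX + ch;
        }
        expandTable.add(values);
        return expandTable.size() - 1;
    }

    /** Second pass: replace every placeholder with the order that is now known. */
    void commit() {
        for (int[] values : expandTable) {
            for (int i = 0; i < values.length; i++) {
                boolean placeholder =
                        values[i] >= CHARINDEX && values[i] <= CHARINDEX + 0xFFFF;
                if (placeholder) {
                    int ch = values[i] - CHARINDEX;
                    Integer order = mapping.get(ch);
                    // Still unmapped: this sketch treats it as ignorable (0).
                    values[i] = (order != null) ? order : 0;
                }
            }
        }
    }
}
```

Resolving placeholders once at the end keeps later lookups single-stage, which is the benefit the original comment names.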