MergeCollation.java revision 0
 * Copyright 1996-1999 Sun Microsystems, Inc. All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.

 * (C) Copyright Taligent, Inc. 1996, 1997 - All Rights Reserved
 * (C) Copyright IBM Corp. 1996, 1997 - All Rights Reserved
 *
 * The original version of this source code and documentation is copyrighted
 * and owned by Taligent, Inc., a wholly-owned subsidiary of IBM. These
 * materials are provided under terms of a License Agreement between Taligent
 * and Sun. This technology is protected by multiple US and International
 * patents. This notice and attribution to Taligent may not be removed.
 * Taligent is a registered trademark of Taligent, Inc.

 * Utility class for normalizing and merging patterns for collation.
 * Patterns are strings of the form <entry>*, where <entry> has the
 * form:
 * <pattern> := <entry>*
 * <entry> := <separator><chars>{"/"<extension>}
 * <separator> := "=", ",", ";", "<", "&"
 * <chars> and <extension> are both arbitrary strings.
 * Unquoted whitespace is ignored.
 * 'xxx' can be used to quote characters.
 * One difference from Collator is that & is used to reset to a current
 * point. In other words, it introduces a new sequence which is to
 * be added to the old.
 * That is: "a < b < c < d" is the same as "a < b & b < c & c < d" OR
 * "a < b < d & b < c"
 * XXX: make '' be a single quote.
 * @author Mark Davis, Helena Shih

 * Creates from a pattern.
 * @exception ParseException If the input pattern is incorrect.

 * Recovers the current pattern.

 * Recovers the current pattern.
 * @param withWhiteSpace puts spacing around the entries, and \n

for (--i; i >= 0; --i) {
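The class comment's claim that a reset with & yields the same ordering as a single chain can be checked from outside the package. The sketch below is not part of this file: it assumes that `java.text.RuleBasedCollator` hands its rule string to this class for merging, and it uses a reversed alphabet so the result cannot come from default ordering. The class name `ResetEquivalenceDemo` and the helper `sortWith` are hypothetical. Note that the grammar above requires every entry, including the first, to begin with a separator.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of MergeCollation.java.
class ResetEquivalenceDemo {
    // Sorts a fixed sample with a collator built from the given rules.
    static List<String> sortWith(String rules) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> letters = new ArrayList<>(Arrays.asList("a", "c", "d", "b"));
            letters.sort(collator);
            return letters;
        } catch (ParseException e) {
            // Malformed rules are reported as an unchecked exception in this sketch.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // One chain, a chain split by resets, and a late insertion via reset.
        System.out.println(sortWith("< d < c < b < a"));
        System.out.println(sortWith("< d < c & c < b & b < a"));
        System.out.println(sortWith("< d < c < a & c < b"));
    }
}
```

If the assumption holds, all three rule strings should sort the sample into the same order.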
 * Emits the pattern for the collation builder.
 * @return the string in the format understandable to the collation builder

 * Emits the pattern for the collation builder.
 * @param withWhiteSpace puts spacing around the entries, and \n
 * @return the string in the format understandable to the collation builder

 * Adds a pattern to the current one.
 * @param pattern the new pattern to be added

 * Gets the count of separate entries.
 * @return the size of pattern entries

 * Gets the pattern entry at the given index.
 * @param index the offset of the desired pattern entry
 * @return the requested pattern entry

//============================================================
//============================================================

// This is really used as a local variable inside fixEntry, but we cache
// it here to avoid newing it up every time the method is called.

// When building a MergeCollation, we need to do lots of searches to see
// whether a given entry is already in the table. Since we're using an
// array, this would make the algorithm O(N*N). To speed things up, we
// use this bit array to remember whether the array contains any entries
// starting with each Unicode character. If not, we can avoid the search.
// Using BitSet would make this easier, but it's significantly slower.

  If the strength is RESET, then just change the lastEntry to
  be the current. (If the current is not in patterns, signal an error.)
  If not, then remove the current entry, and add it after lastEntry
  (which is usually at the end).

// check to see whether the new entry has the same characters as the previous
// entry did (this can happen when a pattern declaring a difference between two
// strings that are canonically equivalent is normalized). If so, and the strength
// is anything other than IDENTICAL or RESET, throw an exception (you can't
// declare a string to be unequal to itself). --rtg 5/24/99

+ newEntry + " are adjacent in the rules, but have conflicting "
+ "strengths: A character can't be unequal to itself.", -1);
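The RESET and non-RESET cases described in the comments above can be sketched as a small list-based model. This is a simplified illustration, not the code of this file: entries are reduced to plain strings with no strengths or extensions, and the class name `MergeSketch` and its methods are hypothetical. Where the real class throws for a repeated entry with a conflicting strength, the sketch simply ignores the repeat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the merge step; not the real fixEntry.
class MergeSketch {
    private final List<String> entries = new ArrayList<>();
    private String lastEntry = null; // insertion point; null means "start of list"

    // RESET: only move the insertion point. The entry must already exist.
    void reset(String chars) {
        if (!entries.contains(chars)) {
            throw new IllegalArgumentException("couldn't find: " + chars);
        }
        lastEntry = chars;
    }

    // Any other strength: drop an older copy, then insert right after lastEntry.
    void add(String chars) {
        if (chars.equals(lastEntry)) {
            return; // same characters as the previous entry: skipped in this sketch
        }
        entries.remove(chars);
        int at = (lastEntry == null) ? 0 : entries.indexOf(lastEntry) + 1;
        entries.add(at, chars);
        lastEntry = chars;
    }

    // Returns a copy of the current order.
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```

Adding a, b, d, then resetting to b and adding c, leaves the model with the order a, b, c, d, which mirrors the "a < b < d & b < c" example in the class comment.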
// otherwise, just skip this entry and behave as though you never saw it

// We're going to add an element that starts with this
// character, so go ahead and set its bit.

// Search backwards for string that contains this one;
// most likely entry is last one
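The bit array mentioned in the comments (one bit per character, consulted before any list search) might look roughly like the following. This is a minimal sketch under assumptions: the class name `FirstCharBits`, its method names, and the choice of 8 bits per byte over 65536 UTF-16 code units are illustrative and are not taken from this file.

```java
// Hypothetical sketch of a per-character presence bitmap; not the real field.
class FirstCharBits {
    // One bit per UTF-16 code unit: 65536 bits packed into 8192 bytes.
    private final byte[] bits = new byte[8192];

    // Records that some entry starts with this character.
    void mark(char c) {
        bits[c >> 3] |= (byte) (1 << (c & 7));
    }

    // False means no entry for c was ever marked, so the list search can be skipped.
    boolean mayContain(char c) {
        return (bits[c >> 3] & (1 << (c & 7))) != 0;
    }
}
```

A caller would check `mayContain(c)` first and fall back to a full search only when it returns true. A true result guarantees only that the bit was set at some point, not that the entry is still present.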