/*
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/*
 *******************************************************************************
 *
 *   Copyright (C) 1999-2003, International Business Machines
 *   Corporation and others.  All Rights Reserved.
 *
 *******************************************************************************
 */

/**
 * <code>ScriptRun</code> is used to find runs of characters in
 * the same script, as defined in the <code>Script</code> class.
 * It implements a simple iterator over an array of characters.
 * The iterator will assign <code>COMMON</code> and <code>INHERITED</code>
 * characters to the same script as the preceding characters. If the
 * COMMON and INHERITED characters are first, they will be assigned to
 * the same script as the following characters.
 *
 * The iterator will try to match paired punctuation. If it sees an
 * opening punctuation character, it will remember the script that
 * was assigned to that character, and assign the same script to the
 * matching closing punctuation.
 *
 * No attempt is made to combine related scripts into a single run. In
 * particular, Hiragana, Katakana, and Han characters will appear in separate
 * runs.
 *
 * Here is an example of how to iterate over script runs:
 * <pre>
 * void printScriptRuns(char[] text)
 * {
 *     ScriptRun scriptRun = new ScriptRun(text, 0, text.length);
 *
 *     while (scriptRun.next()) {
 *         int start  = scriptRun.getScriptStart();
 *         int limit  = scriptRun.getScriptLimit();
 *         int script = scriptRun.getScriptCode();
 *
 *         System.out.println("Script \"" + Script.getName(script) + "\" from " +
 *                            start + " to " + limit + ".");
 *     }
 * }
 * </pre>
 */
    private char[] text;   // fixed once set by constructor
    private int stack[];
    // stack used to handle paired punctuation if encountered

    // must call init later or we die.

    /**
     * Construct a <code>ScriptRun</code> object which iterates over a subrange
     * of the given characters.
     *
     * @param chars the array of characters over which to iterate.
     * @param start the index of the first character over which to iterate
     * @param count the number of characters over which to iterate
     */

    /**
     * Get the starting index of the current script run.
     *
     * @return the index of the first character in the current script run.
     */

    /**
     * Get the index of the first character after the current script run.
     *
     * @return the index of the first character after the current script run.
     */

    /**
     * Get the script code for the script of the current script run.
     *
     * @return the script code for the script of the current script run.
     */

    /**
     * Find the next script run. Returns <code>false</code> if there
     * isn't another run, returns <code>true</code> if there is.
     *
     * @return <code>false</code> if there isn't another run, <code>true</code> if there is.
     */

        // if we've fallen off the end of the text, we're done

        // Paired character handling:
        //
        // if it's an open character, push it onto the stack.
        // if it's a close character, find the matching open on the
        // stack, and use that script code. Any non-matching open
        // characters above it on the stack will be popped.

        // now that we have a final script code, fix any open
        // characters we pushed before we knew the script code.

        // if this character is a close paired character,
        // pop it from the stack

        // We've just seen the first character of
        // the next run. Back over it so we'll see
        // it again the next time.

    /**
     * Compare two script codes to see if they are in the same script. If one script is
     * a strong script, and the other is INHERITED or COMMON, it will compare equal.
     *
     * @param scriptOne one of the script codes.
     * @param scriptTwo the other script code.
     * @return <code>true</code> if the two scripts are the same.
     * @see com.ibm.icu.lang.Script
     */

    /**
     * Find the highest bit that's set in a word. Uses a binary search through
     * the bits.
     *
     * @param n the word in which to find the highest bit that's set.
     * @return the bit number (counting from the low order bit) of the highest bit.
     */

    /**
     * Search the pairedChars array for the given character.
     *
     * @param ch the character for which to search.
     * @return the index of the character in the table, or -1 if it's not there.
     */

        0x0028, 0x0029, // ascii paired punctuation // common
        0x003c, 0x003e, // common
        0x005b, 0x005d, // common
        0x007b, 0x007d, // common
        0x00ab, 0x00bb, // guillemets // common
        0x2018, 0x2019, // general punctuation // common
        0x201c, 0x201d, // common
        0x2039, 0x203a, // common
        0x3008, 0x3009, // chinese paired punctuation // common