 * Copyright (c) 2005, 2009, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any

 *******************************************************************************
 * (C) Copyright IBM Corp. and others, 1996-2009 - All Rights Reserved         *
 * The original version of this source code and documentation is copyrighted   *
 * and owned by IBM, These materials are provided under terms of a License     *
 * Agreement between IBM and Sun. This technology is protected by multiple     *
 * US and International patents. This notice and attribution to IBM may not    *
 *******************************************************************************

* <p>Internal class used for Unicode character property database.</p>
* <p>This class stores binary data read from uprops.icu.
* It does not have the capability to parse the data into more high-level
* information. It only returns bytes of information when required.</p>
* <p>Due to the form most commonly used for retrieval, an array of char is used
* to store the binary data.</p>
* <p>UCharacterPropertyDB also contains information on accessing indexes to
* significant points in the binary data.</p>
* <p>Responsibility for molding the binary data into more meaningful form lies on
* @author Syn Wee Quek
* @since release 2.1, February 1st 2002

    // public data members -----------------------------------------------

     * CharTrie index array
     * CharTrie data array
     * CharTrie data offset

    // uprops.h enum UPropertySource --------------------------------------- ***

    /** One more than the highest UPropertySource (SRC_) constant. */

    // public methods ----------------------------------------------------

     * Java friends implementation

     * Gets the property value at the index.
     * This is optimized.
     * Note this is a little different from CharTrie: the index m_trieData_
     * is never negative.
     * @param ch code point whose property value is to be retrieved
     * @return property value of code point
        // BMP codepoint 0000..D7FF or DC00..FFFF
        try {
            // using try for ch < 0 is faster than using an if statement
            // lead surrogate D800..DBFF
            // supplementary code point 10000..10FFFF
            // look at the construction of supplementary characters
            // trail forms the ends of it.
            // return m_dataOffset_ if there is an error, in this case we return
            // the default value: m_initialValue_
            // we cannot assume that m_initialValue_ is at offset 0
            // this is for optimization.
            // this all is an inlined form of return m_trie_.getCodePointValue(ch);

     * Getting the unsigned numeric value of a character embedded in the property
     * @param prop the character
     * @return unsigned numeric value

     * Gets the unicode additional properties.
     * C version getUnicodeProperties.
     * @param codepoint codepoint whose additional properties is to be
     * @return unicode properties

     * <p>Get the "age" of the code point.</p>
     * <p>The "age" is the Unicode version when the code point was first
     * designated (as a non-character or for Private Use) or assigned a
     * <p>This can be useful to avoid emitting code points to receiving
     * processes that do not accept newer characters.</p>
     * <p>This API does not check the validity of the codepoint.</p>
     * @param codepoint The code point.
     * @return the Unicode version number

     * Forms a supplementary code point from the argument character<br>
     * Note this is for internal use hence no checks for the validity of the
     * surrogate characters are done
     * @param lead lead surrogate character
     * @param trail trailing surrogate character
     * @return code point of the supplementary character

     * Loads the property data and initializes the UCharacterProperty instance.
     * @throws MissingResourceException when data is missing or data has been corrupted

     * Checks if the argument c is to be treated as a white space in ICU
     * rules. Usually ICU rule white spaces are ignored unless quoted.
     * Equivalent to test for Pattern_White_Space Unicode property.
     * Stable set of characters, won't change.
     * @param c codepoint to check
     * @return true if c is an ICU white space
        /* "white space" in the sense of ICU rule parsers
           This is a FIXED LIST that is NOT DEPENDENT ON UNICODE PROPERTIES.
           U+0009..U+000D, U+0020, U+0085, U+200E..U+200F, and U+2028..U+2029
           Equivalent to test for Pattern_White_Space Unicode property.
        */
        return (c >= 0x0009 && c <= 0x2029 &&
                (c <= 0x000D || c == 0x0020 || c == 0x0085 ||
                 c == 0x200E || c == 0x200F || c >= 0x2028));
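
// Illustration only, not part of the original class: a minimal, self-contained
// sketch that exercises the fixed rule-white-space list documented above.
// The class name RuleWhiteSpaceSketch and the helper countRuleWhiteSpace are
// hypothetical; only the boolean expression is taken from this file.

```java
// Hypothetical wrapper class, used only to make the check runnable on its own.
final class RuleWhiteSpaceSketch {

    // Same range test as the return statement above:
    // U+0009..U+000D, U+0020, U+0085, U+200E..U+200F, U+2028..U+2029.
    static boolean isRuleWhiteSpace(int c) {
        return (c >= 0x0009 && c <= 0x2029 &&
                (c <= 0x000D || c == 0x0020 || c == 0x0085 ||
                 c == 0x200E || c == 0x200F || c >= 0x2028));
    }

    // Hypothetical helper: counts how many code points in the whole
    // Unicode range satisfy the test, to confirm the list is closed.
    static int countRuleWhiteSpace() {
        int count = 0;
        for (int c = 0; c <= 0x10FFFF; c++) {
            if (isRuleWhiteSpace(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("U+0020: " + isRuleWhiteSpace(0x0020));
        System.out.println("U+00A0: " + isRuleWhiteSpace(0x00A0));
        System.out.println("total:  " + countRuleWhiteSpace());
    }
}
```

// The outer bounds check (0x0009..0x2029) rejects most code points with two
// comparisons before the finer tests run, which is presumably why the
// expression is written in this nested form.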

    // protected variables -----------------------------------------------

     * Extra property trie
     * Extra property vectors, 1st column for age and second for binary
     * Number of additional columns
     * Maximum values for block, bits used as in vector word
     * Maximum values for script, bits used as in vector word

    // private variables -------------------------------------------------

     * Default name of the datafile
     * Default buffer size of datafile
     * Numeric value shift
     * Mask to be applied after shifting to obtain an unsigned numeric value
     * Shift value for lead surrogate to form a supplementary character.
     * Offset to add to combined surrogate pair to avoid masking.

    // additional properties ----------------------------------------------

     * First nibble shift
     * Second nibble mask

    // private constructors --------------------------------------------------

     * @exception IOException thrown when data reading fails or data corrupted
        /* add the start code point of each same-value range of the properties vectors trie */
        /* if m_additionalColumnsCount_==0 then the properties vectors trie may not be there at all */
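
// Illustration only, not part of the original class: a minimal sketch of how a
// lead-surrogate shift and a precomputed offset can combine a surrogate pair
// into a supplementary code point without masking, as the private-variable
// comments above describe. The class name SurrogateSketch, the method name
// getRawSupplementary, and the constant names and values are assumptions
// derived from the UTF-16 encoding, not copied from this listing.

```java
// Hypothetical class, used only to make the arithmetic runnable on its own.
final class SurrogateSketch {

    // Assumed value: each lead surrogate contributes the high 10 bits.
    private static final int LEAD_SURROGATE_SHIFT = 10;

    // Assumed value: folding the surrogate bases (0xD800, 0xDC00) and the
    // 0x10000 supplementary base into one constant lets the pair be combined
    // with a shift and two additions, with no masking of either unit.
    private static final int SURROGATE_OFFSET =
            0x10000 - (0xD800 << LEAD_SURROGATE_SHIFT) - 0xDC00;

    // No validity checks are done on the arguments, matching the
    // "internal use" note in the documentation above.
    static int getRawSupplementary(char lead, char trail) {
        return (lead << LEAD_SURROGATE_SHIFT) + trail + SURROGATE_OFFSET;
    }

    public static void main(String[] args) {
        char lead = (char) 0xD83D;
        char trail = (char) 0xDE00;
        int cp = getRawSupplementary(lead, trail);
        // Cross-check against the standard library's own combination.
        System.out.println(Integer.toHexString(cp));
        System.out.println(cp == Character.toCodePoint(lead, trail));
    }
}
```

// Because the method skips validation, passing anything other than a real
// lead/trail pair yields a meaningless value; callers are expected to have
// checked the surrogate ranges already.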