 * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any

 * The <code>NumericShaper</code> class is used to convert Latin-1 (European)
 * digits to other Unicode decimal digits.  Users of this class will
 * primarily be people who wish to present data using
 * national digit shapes, but find it more convenient to represent the
 * data internally using Latin-1 (European) digits.  This does not
 * interpret the deprecated numeric shape selector character (U+206E).
 * Instances of <code>NumericShaper</code> are typically applied
 * as attributes to text with the
 * {@link TextAttribute#NUMERIC_SHAPING NUMERIC_SHAPING} attribute
 * of the <code>TextAttribute</code> class.
 * For example, this code snippet causes a <code>TextLayout</code> to
 * shape European digits to Arabic in an Arabic context:<br>
 * <blockquote><pre>
 * Map map = new HashMap();
 * map.put(TextAttribute.NUMERIC_SHAPING,
 *     NumericShaper.getContextualShaper(NumericShaper.ARABIC));
 * FontRenderContext frc = ...;
 * TextLayout layout = new TextLayout(text, map, frc);
 * layout.draw(g2d, x, y);
 * </pre></blockquote>
 * It is also possible to perform numeric shaping explicitly using instances
 * of <code>NumericShaper</code>, as this code snippet demonstrates:<br>
 * <blockquote><pre>
 * // shape all EUROPEAN digits (except zero) to ARABIC digits
 * NumericShaper shaper = NumericShaper.getShaper(NumericShaper.ARABIC);
 * shaper.shape(text, start, count);
 *
 * // shape European digits to ARABIC digits if preceding text is Arabic, or
 * // shape European digits to TAMIL digits if preceding text is Tamil, or
 * // leave European digits alone if there is no preceding text, or
 * // preceding text is neither Arabic nor Tamil
 * shaper =
 *     NumericShaper.getContextualShaper(NumericShaper.ARABIC |
 *                                         NumericShaper.TAMIL,
 *                                       NumericShaper.EUROPEAN);
 * shaper.shape(text, start, count);
 * </pre></blockquote>
 *
 * <p><b>Bit mask- and enum-based Unicode ranges</b></p>
 *
 * <p>This class supports two different programming interfaces to
 * represent Unicode ranges for script-specific digits: bit
 * mask-based ones, such as {@link #ARABIC NumericShaper.ARABIC}, and
 * enum-based ones, such as {@link NumericShaper.Range#ARABIC}.
 * Multiple ranges can be specified by ORing bit mask-based constants,
 * such as:
 * <blockquote><pre>
 * NumericShaper.ARABIC | NumericShaper.TAMIL
 * </pre></blockquote>
 * or creating a {@code Set} with the {@link NumericShaper.Range}
 * constants, such as:
 * <blockquote><pre>
 * EnumSet.of(NumericShaper.Range.ARABIC, NumericShaper.Range.TAMIL)
 * </pre></blockquote>
 * The enum-based ranges are a superset of the bit mask-based ones.
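The explicit-shaping snippets above leave `text`, `start` and `count` undefined. The following is a minimal runnable sketch of the same two calls, assuming the whole array is shaped. The class name `ExplicitShapingSketch` and its helper methods are illustrative only and are not part of `NumericShaper`.

```java
import java.awt.font.NumericShaper;

final class ExplicitShapingSketch {
    /** Shapes every European digit in s to Arabic-Indic digits, regardless of context. */
    static String shapeAllToArabic(String s) {
        char[] text = s.toCharArray();
        NumericShaper shaper = NumericShaper.getShaper(NumericShaper.ARABIC);
        shaper.shape(text, 0, text.length); // the array is modified in place
        return new String(text);
    }

    /** Shapes digits only when the preceding strong text is Arabic or Tamil. */
    static String shapeByContext(String s) {
        char[] text = s.toCharArray();
        NumericShaper shaper = NumericShaper.getContextualShaper(
                NumericShaper.ARABIC | NumericShaper.TAMIL,
                NumericShaper.EUROPEAN); // starting context: leave digits alone
        shaper.shape(text, 0, text.length);
        return new String(text);
    }

    /** Prints code points, since a console may not render the shaped digits. */
    static void dump(String s) {
        s.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }

    public static void main(String[] args) {
        dump(shapeAllToArabic("42"));
        dump(shapeByContext("\u0627 42")); // ARABIC LETTER ALEF, then digits
        dump(shapeByContext("abc 42"));    // Latin context: digits stay European
    }
}
```

Because `shape` mutates the array it is given, the helpers copy the input with `toCharArray()` first; callers that own a `char[]` can shape it directly.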
 * <p>If the two interfaces are mixed (including serialization),
 * Unicode range values are mapped to their counterparts where such
 * mapping is possible, such as {@code NumericShaper.Range.ARABIC}
 * from/to {@code NumericShaper.ARABIC}.  If any unmappable range
 * values are specified, such as {@code NumericShaper.Range.BALINESE},
 * those ranges are ignored.
 *
 * <p><b>Decimal Digits Precedence</b></p>
 *
 * <p>A Unicode range may have more than one set of decimal digits. If
 * multiple decimal digits sets are specified for the same Unicode
 * range, one of the sets will take precedence as follows.
 *
 * <table border=1 cellspacing=3 cellpadding=0 summary="NumericShaper constants precedence.">
 *    <tr>
 *       <th class="TableHeadingColor">Unicode Range</th>
 *       <th class="TableHeadingColor"><code>NumericShaper</code> Constants</th>
 *       <th class="TableHeadingColor">Precedence</th>
 *    </tr>
 *    <tr>
 *       <td rowspan="2">Arabic</td>
 *       <td>{@link NumericShaper#ARABIC NumericShaper.ARABIC}<br>
 *           {@link NumericShaper#EASTERN_ARABIC NumericShaper.EASTERN_ARABIC}</td>
 *       <td>{@link NumericShaper#EASTERN_ARABIC NumericShaper.EASTERN_ARABIC}</td>
 *    </tr>
 *    <tr>
 *       <td>{@link NumericShaper.Range#ARABIC}<br>
 *           {@link NumericShaper.Range#EASTERN_ARABIC}</td>
 *       <td>{@link NumericShaper.Range#EASTERN_ARABIC}</td>
 *    </tr>
 *    <tr>
 *       <td>Tai Tham</td>
 *       <td>{@link NumericShaper.Range#TAI_THAM_HORA}<br>
 *           {@link NumericShaper.Range#TAI_THAM_THAM}</td>
 *       <td>{@link NumericShaper.Range#TAI_THAM_THAM}</td>
 *    </tr>
 * </table>

     * A {@code NumericShaper.Range} represents a Unicode range of a
     * script having its own decimal digits. For example, the {@link
     * NumericShaper.Range#THAI} range has the Thai digits, THAI DIGIT
     * ZERO (U+0E50) to THAI DIGIT NINE (U+0E59).
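The precedence rule in the table above can be observed with the enum-based interface. This is a small sketch, assuming an Arabic letter as the preceding context. `PrecedenceSketch` and its `shape` helper are hypothetical names introduced for illustration.

```java
import java.awt.font.NumericShaper;
import java.awt.font.NumericShaper.Range;
import java.util.EnumSet;
import java.util.Set;

final class PrecedenceSketch {
    /** Contextually shapes s using the given enum-based ranges. */
    static String shape(Set<Range> ranges, String s) {
        char[] text = s.toCharArray();
        NumericShaper.getContextualShaper(ranges).shape(text, 0, text.length);
        return new String(text);
    }

    public static void main(String[] args) {
        String input = "\u0627 7"; // ARABIC LETTER ALEF, space, digit seven

        // Only ARABIC requested: the digit becomes an Arabic-Indic digit.
        String arabicOnly = shape(EnumSet.of(Range.ARABIC), input);

        // Both Arabic digit sets requested: EASTERN_ARABIC takes precedence.
        String both = shape(EnumSet.of(Range.ARABIC, Range.EASTERN_ARABIC), input);

        System.out.printf("U+%04X%n", (int) arabicOnly.charAt(2));
        System.out.printf("U+%04X%n", (int) both.charAt(2));
    }
}
```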
     * <p>The <code>Range</code> enum replaces the traditional bit
     * mask-based values (e.g., {@link NumericShaper#ARABIC}), and
     * supports more Unicode ranges than the bit mask-based ones. For
     * example, the following code using the bit mask:
     * <blockquote><pre>
     * NumericShaper.getContextualShaper(NumericShaper.ARABIC |
     *                                     NumericShaper.TAMIL,
     *                                   NumericShaper.EUROPEAN);
     * </pre></blockquote>
     * can be written using this enum as:
     * <blockquote><pre>
     * NumericShaper.getContextualShaper(EnumSet.of(
     *                                     NumericShaper.Range.ARABIC,
     *                                     NumericShaper.Range.TAMIL),
     *                                   NumericShaper.Range.EUROPEAN);
     * </pre></blockquote>

        // The order of EUROPEAN to MONGOLIAN must be consistent
        // with the bitmask-based constants.

         * The Latin (European) range with the Latin (ASCII) digits.

         * The Arabic range with the Arabic-Indic digits.

         * The Arabic range with the Eastern Arabic-Indic digits.

         * The Devanagari range with the Devanagari digits.

         * The Bengali range with the Bengali digits.

         * The Gurmukhi range with the Gurmukhi digits.

         * The Gujarati range with the Gujarati digits.

         * The Oriya range with the Oriya digits.

         * The Tamil range with the Tamil digits.

         * The Telugu range with the Telugu digits.

         * The Kannada range with the Kannada digits.

         * The Malayalam range with the Malayalam digits.

         * The Thai range with the Thai digits.

         * The Lao range with the Lao digits.

         * The Tibetan range with the Tibetan digits.

         * The Myanmar range with the Myanmar digits.

         * The Ethiopic range with the Ethiopic digits. Ethiopic
         * does not have a decimal digit 0 so Latin (European) 0 is
         * used.

         * The Khmer range with the Khmer digits.

         * The Mongolian range with the Mongolian digits.

        // The order of EUROPEAN to MONGOLIAN must be consistent
        // with the bitmask-based constants.

         * The N'Ko range with the N'Ko digits.

         * The Myanmar range with the Myanmar Shan digits.
         * The Limbu range with the Limbu digits.

         * The New Tai Lue range with the New Tai Lue digits.

         * The Balinese range with the Balinese digits.

         * The Sundanese range with the Sundanese digits.

         * The Lepcha range with the Lepcha digits.

         * The Ol Chiki range with the Ol Chiki digits.

         * The Vai range with the Vai digits.

         * The Saurashtra range with the Saurashtra digits.

         * The Kayah Li range with the Kayah Li digits.

         * The Cham range with the Cham digits.

         * The Tai Tham Hora range with the Tai Tham Hora digits.

         * The Tai Tham Tham range with the Tai Tham Tham digits.

         * The Javanese range with the Javanese digits.

         * The Meetei Mayek range with the Meetei Mayek digits.

        // base character of range digits

    /** index of context for contextual shaping - values range from 0 to 18 */

    /** flag indicating whether to shape contextually (high bit) and which
     *  digit ranges to shape (bits 0-18)
     */

     * The context {@code Range} for contextual shaping or the {@code
     * Range} for non-contextual shaping. {@code null} for the bit
     * mask-based API.

     * {@code Set<Range>} indicating which Unicode ranges to
     * shape. {@code null} for the bit mask-based API.

     * rangeSet.toArray() value. Sorted by Range.base when the number
     * of elements is greater than BSEARCH_THRESHOLD.

     * If more than BSEARCH_THRESHOLD ranges are specified, binary search is used.

    /** Identifies the Latin-1 (European) and extended range, and
     *  Latin-1 (European) decimal base.
     */

    /** Identifies the ARABIC range and decimal base. */

    /** Identifies the ARABIC range and ARABIC_EXTENDED decimal base. */

    /** Identifies the DEVANAGARI range and decimal base. */

    /** Identifies the BENGALI range and decimal base. */

    /** Identifies the GURMUKHI range and decimal base. */

    /** Identifies the GUJARATI range and decimal base.
     */

    /** Identifies the ORIYA range and decimal base. */

    /** Identifies the TAMIL range and decimal base. */
    // TAMIL DIGIT ZERO was added in Unicode 4.1

    /** Identifies the TELUGU range and decimal base. */

    /** Identifies the KANNADA range and decimal base. */

    /** Identifies the MALAYALAM range and decimal base. */

    /** Identifies the THAI range and decimal base. */
    public static final int THAI = 1 << 12;

    /** Identifies the LAO range and decimal base. */
    public static final int LAO = 1 << 13;
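Each bit mask-based constant occupies a single bit, so ranges can be combined with `|` and tested with `&`. The sketch below illustrates that arithmetic using the two constants whose values are visible above. `BitMaskSketch` and `includes` are hypothetical names, not part of the class.

```java
import java.awt.font.NumericShaper;

final class BitMaskSketch {
    /** Returns true if mask includes the given single-range constant. */
    static boolean includes(int mask, int singleRange) {
        return (mask & singleRange) != 0;
    }

    public static void main(String[] args) {
        // Combine two single-bit constants into one mask.
        int mask = NumericShaper.THAI | NumericShaper.LAO;

        System.out.println(Integer.toBinaryString(mask));
        System.out.println(includes(mask, NumericShaper.THAI));
        System.out.println(includes(mask, NumericShaper.ARABIC));
    }
}
```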
    /** Identifies the TIBETAN range and decimal base. */

    /** Identifies the MYANMAR range and decimal base. */

    /** Identifies the ETHIOPIC range and decimal base. */

    /** Identifies the KHMER range and decimal base. */

    /** Identifies the MONGOLIAN range and decimal base. */

    /** Identifies all ranges, for full contextual shaping.
     *
     * <p>This constant specifies all of the bit mask-based
     * ranges. Use {@code EnumSet.allOf(NumericShaper.Range.class)} to
     * specify all of the enum-based ranges.
     */

        '\u0030' - '\u0030', // EUROPEAN
        '\u0660' - '\u0030', // ARABIC-INDIC
        '\u06f0' - '\u0030', // EXTENDED ARABIC-INDIC (EASTERN_ARABIC)
        '\u0966' - '\u0030', // DEVANAGARI
        '\u09e6' - '\u0030', // BENGALI
        '\u0a66' - '\u0030', // GURMUKHI
        '\u0ae6' - '\u0030', // GUJARATI
        '\u0b66' - '\u0030', // ORIYA
        '\u0be6' - '\u0030', // TAMIL - zero was added in Unicode 4.1
        '\u0c66' - '\u0030', // TELUGU
        '\u0ce6' - '\u0030', // KANNADA
        '\u0d66' - '\u0030', // MALAYALAM
        '\u0e50' - '\u0030', // THAI
        '\u0ed0' - '\u0030', // LAO
        '\u0f20' - '\u0030', // TIBETAN
        '\u1040' - '\u0030', // MYANMAR
        '\u1369' - '\u0031', // ETHIOPIC - no zero
        '\u17e0' - '\u0030', // KHMER
        '\u1810' - '\u0030', // MONGOLIAN

    // some ranges adjoin or overlap, rethink if we want to do a binary search on this
        '\u0000',
        '\u0300', // 'EUROPEAN' (really latin-1 and extended)
        '\u0600', '\u0780', // ARABIC
        '\u0600', '\u0780', // EASTERN_ARABIC -- note overlap with arabic
        '\u0900', '\u0980', // DEVANAGARI
        '\u0980', '\u0a00', // BENGALI
        '\u0a00', '\u0a80', // GURMUKHI
        '\u0a80', '\u0b00', // GUJARATI
        '\u0b00', '\u0b80', // ORIYA
        '\u0b80', '\u0c00', // TAMIL
        '\u0c00', '\u0c80', // TELUGU
        '\u0c80', '\u0d00', // KANNADA
        '\u0d00', '\u0d80', // MALAYALAM
        '\u0e00', '\u0e80', // THAI
        '\u0e80', '\u0f00', // LAO
        '\u0f00', '\u1000', // TIBETAN
        '\u1000', '\u1080', // MYANMAR
        '\u1200', '\u1380', // ETHIOPIC - note missing zero
        '\u1780', '\u1800', // KHMER
        '\u1800', '\u1900', // MONGOLIAN

    // assume most characters are near each other so probing the cache is infrequent,
    // and a linear probe is ok.

    // warning, synchronize access to this as it modifies state

        // if we're not in a known range, then return EUROPEAN as the range key

    // cache for the NumericShaper.Range version

     * A range table of strong directional characters (types L, R, AL).
     * Even (left) indexes are starts of ranges of non-strong-directional (or undefined)
     * characters, odd (right) indexes are starts of ranges of strong directional
     * characters.

        0x10fffe,
        0x10ffff // sentinel

    // use a binary search with a cache

     * Returns a shaper for the provided Unicode range.  All
     * Latin-1 (EUROPEAN) digits are converted
     * to the corresponding decimal Unicode digits.
     * @param singleRange the specified Unicode range
     * @return a non-contextual numeric shaper
     * @throws IllegalArgumentException if the range is not a single range

     * Returns a shaper for the provided Unicode
     * range. All Latin-1 (EUROPEAN) digits are converted to the
     * corresponding decimal digits of the specified Unicode range.
     * @param singleRange the Unicode range given by a {@link
     *                    NumericShaper.Range} constant.
     * @return a non-contextual {@code NumericShaper}.
     * @throws NullPointerException if {@code singleRange} is {@code null}

     * Returns a contextual shaper for the provided Unicode range(s).
     * Latin-1 (EUROPEAN) digits are converted to the decimal digits
     * corresponding to the range of the preceding text, if the
     * range is one of the provided ranges.  Multiple ranges are
     * represented by or-ing the values together, such as,
     * <code>NumericShaper.ARABIC | NumericShaper.THAI</code>.  The
     * shaper assumes EUROPEAN as the starting context, that is, if
     * EUROPEAN digits are encountered before any strong directional
     * text in the string, the context is presumed to be EUROPEAN, and
     * so the digits will not shape.
     * @param ranges the specified Unicode ranges
     * @return a shaper for the specified ranges

     * Returns a contextual shaper for the provided Unicode
     * range(s). The Latin-1 (EUROPEAN) digits are converted to the
     * decimal digits corresponding to the range of the preceding
     * text, if the range is one of the provided ranges.
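Non-contextual shaping amounts to adding a fixed offset (the target script's digit zero minus `'0'`) to each European digit. The sketch below compares a hand-written version of that arithmetic with `getShaper`, and shows the Ethiopic exception, where `'0'` is left alone. `DigitOffsetSketch`, `shiftDigit`, `shape` and `isAcceptedByGetShaper` are hypothetical helpers; `shiftDigit` is an illustration of the idea, not the class's actual implementation.

```java
import java.awt.font.NumericShaper;

final class DigitOffsetSketch {
    /** Hand-written offset arithmetic: zeroDigit is the target script's digit zero. */
    static char shiftDigit(char c, char zeroDigit) {
        return (c >= '0' && c <= '9') ? (char) (c + (zeroDigit - '0')) : c;
    }

    /** Shapes s with a non-contextual shaper for one bit mask-based range. */
    static String shape(int singleRange, String s) {
        char[] text = s.toCharArray();
        NumericShaper.getShaper(singleRange).shape(text, 0, text.length);
        return new String(text);
    }

    /** getShaper is documented to reject anything that is not a single range. */
    static boolean isAcceptedByGetShaper(int ranges) {
        try {
            NumericShaper.getShaper(ranges);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // THAI DIGIT ZERO is U+0E50, so '5' maps to U+0E55.
        System.out.printf("U+%04X%n", (int) shiftDigit('5', '\u0e50'));
        System.out.printf("U+%04X%n", (int) shape(NumericShaper.THAI, "5").charAt(0));

        // Ethiopic has no digit zero: '1' is shaped, '0' stays European.
        shape(NumericShaper.ETHIOPIC, "10").chars()
                .forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();

        System.out.println(isAcceptedByGetShaper(NumericShaper.ARABIC | NumericShaper.THAI));
    }
}
```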
     * <p>The shaper assumes EUROPEAN as the starting context, that
     * is, if EUROPEAN digits are encountered before any strong
     * directional text in the string, the context is presumed to be
     * EUROPEAN, and so the digits will not shape.
     * @param ranges the specified Unicode ranges
     * @return a contextual shaper for the specified ranges
     * @throws NullPointerException if {@code ranges} is {@code null}.

     * Returns a contextual shaper for the provided Unicode range(s).
     * Latin-1 (EUROPEAN) digits will be converted to the decimal digits
     * corresponding to the range of the preceding text, if the
     * range is one of the provided ranges.  Multiple ranges are
     * represented by or-ing the values together, for example,
     * <code>NumericShaper.ARABIC | NumericShaper.THAI</code>.  The
     * shaper uses defaultContext as the starting context.
     * @param ranges the specified Unicode ranges
     * @param defaultContext the starting context, such as
     * <code>NumericShaper.EUROPEAN</code>
     * @return a shaper for the specified Unicode ranges.
     * @throws IllegalArgumentException if the specified
     * <code>defaultContext</code> is not a single valid range.

     * Returns a contextual shaper for the provided Unicode range(s).
     * The Latin-1 (EUROPEAN) digits will be converted to the decimal
     * digits corresponding to the range of the preceding text, if the
     * range is one of the provided ranges. The shaper uses {@code
     * defaultContext} as the starting context.
     * @param ranges the specified Unicode ranges
     * @param defaultContext the starting context, such as
     *                       {@code NumericShaper.Range.EUROPEAN}
     * @return a contextual shaper for the specified Unicode ranges.
     * @throws NullPointerException
     *         if {@code ranges} or {@code defaultContext} is {@code null}

     * Private constructor.
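The effect of the starting context described above shows up when digits appear before any strong directional text. The following sketch assumes the bit mask-based API and a shaper that handles only ARABIC. `DefaultContextSketch` and its helpers are hypothetical names for illustration.

```java
import java.awt.font.NumericShaper;

final class DefaultContextSketch {
    /** Shapes s with an ARABIC contextual shaper that starts in defaultContext. */
    static String shapeWithDefault(int defaultContext, String s) {
        char[] text = s.toCharArray();
        NumericShaper shaper =
                NumericShaper.getContextualShaper(NumericShaper.ARABIC, defaultContext);
        shaper.shape(text, 0, text.length);
        return new String(text);
    }

    /** Same shaper, but the starting context is supplied per call to shape. */
    static String shapeWithCallContext(int context, String s) {
        char[] text = s.toCharArray();
        NumericShaper shaper = NumericShaper.getContextualShaper(NumericShaper.ARABIC);
        shaper.shape(text, 0, text.length, context);
        return new String(text);
    }

    public static void main(String[] args) {
        // Leading digits follow the starting context until strong text appears.
        System.out.println(shapeWithDefault(NumericShaper.EUROPEAN, "12").equals("12"));
        System.out.println(shapeWithDefault(NumericShaper.ARABIC, "12").equals("12"));
        // A Latin letter switches the context back to EUROPEAN.
        System.out.println(shapeWithDefault(NumericShaper.ARABIC, "a12").equals("a12"));
    }
}
```

Passing the context per call is useful when one shaper instance is reused across paragraphs whose surrounding script differs.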
        // Give precedence to EASTERN_ARABIC if both ARABIC and
        // EASTERN_ARABIC are specified.

        // As well as the above case, give precedence to TAI_THAM_THAM if both
        // TAI_THAM_HORA and TAI_THAM_THAM are specified.

        // sort rangeArray for binary search

     * Converts the digits in the text that occur between start and
     * start + count.
     * @param text an array of characters to convert
     * @param start the index into <code>text</code> to start
     *        converting
     * @param count the number of characters in <code>text</code>
     *        to convert
     * @throws IndexOutOfBoundsException if start or start + count is
     *        out of bounds
     * @throws NullPointerException if text is null

     * Converts the digits in the text that occur between start and
     * start + count, using the provided context.
     * Context is ignored if the shaper is not a contextual shaper.
     * @param text an array of characters
     * @param start the index into <code>text</code> to start
     *        converting
     * @param count the number of characters in <code>text</code>
     *        to convert
     * @param context the context to which to convert the
     *        characters, such as <code>NumericShaper.EUROPEAN</code>
     * @throws IndexOutOfBoundsException if start or start + count is
     *        out of bounds
     * @throws NullPointerException if text is null
     * @throws IllegalArgumentException if this is a contextual shaper
     * and the specified <code>context</code> is not a single valid
     * range.

     * Converts the digits in the text that occur between {@code
     * start} and {@code start + count}, using the provided {@code
     * context}.
     * {@code Context} is ignored if the shaper is not a
     * contextual shaper.
     * @param text  a {@code char} array
     * @param start the index into {@code text} to start converting
     * @param count the number of {@code char}s in {@code text}
     *              to convert
     * @param context the context to which to convert the characters,
     *                such as {@code NumericShaper.Range.EUROPEAN}
     * @throws IndexOutOfBoundsException
     *         if {@code start} or {@code start + count} is out of bounds
     * @throws NullPointerException
     *         if {@code text} or {@code context} is null

     * Returns a <code>boolean</code> indicating whether or not
     * this shaper shapes contextually.
     * @return <code>true</code> if this shaper is contextual;
     *         <code>false</code> otherwise.

     * Returns an <code>int</code> that ORs together the values for
     * all the ranges that will be shaped.
     * For example, to check if a shaper shapes to Arabic, you would use the
     * following:
     * <code>if ((shaper.getRanges() & shaper.ARABIC) != 0) { ... </code>
     * <p>Note that this method supports only the bit mask-based
     * ranges. Call {@link #getRangeSet()} for the enum-based ranges.
     * @return the values for all the ranges to be shaped.

     * Returns a {@code Set} representing all the Unicode ranges in
     * this {@code NumericShaper} that will be shaped.
     * @return all the Unicode ranges to be shaped.

     * Perform non-contextual shaping.

     * Perform contextual shaping.
     * Synchronized to protect caches used in getContextKey.

        // if we don't support this context, then don't shape

        // if we don't support the specified context, then don't shape.

     * Returns a hash code for this shaper.
     * @return this shaper's hash code.
     * @see java.lang.Object#hashCode

            // Use the CONTEXTUAL_MASK bit only for the enum-based
            // NumericShaper.
            // A deserialized NumericShaper might have
            // bit masks.

     * Returns {@code true} if the specified object is an instance of
     * <code>NumericShaper</code> and shapes identically to this one,
     * regardless of the range representations, the bit mask or the
     * enum. For example, the following code produces {@code "true"}.
     * <blockquote><pre>
     * NumericShaper ns1 = NumericShaper.getShaper(NumericShaper.ARABIC);
     * NumericShaper ns2 = NumericShaper.getShaper(NumericShaper.Range.ARABIC);
     * System.out.println(ns1.equals(ns2));
     * </pre></blockquote>
     * @param o the specified object to compare to this
     *          <code>NumericShaper</code>
     * @return <code>true</code> if <code>o</code> is an instance
     *         of <code>NumericShaper</code> and shapes in the same way;
     *         <code>false</code> otherwise.
     * @see java.lang.Object#equals(java.lang.Object)

     * Returns a <code>String</code> that describes this shaper. This method
     * is used for debugging purposes only.
     * @return a <code>String</code> describing this shaper.

     * Returns the index of the high bit in value (assuming le, actually
     * power of 2 >= value). value must be positive.

     * fast binary search over subrange of array.

     * Converts the {@code NumericShaper.Range} enum-based parameters,
     * if any, to the bit mask-based counterparts and writes this
     * object to the {@code stream}. Any enum constants that have no
     * bit mask-based counterparts are ignored in the conversion.
     * @param stream the output stream to write to
     * @throws IOException if an I/O error occurs while writing to {@code stream}
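The query methods and `equals` documented above can be exercised together. This is a brief sketch, assuming a bit mask-based contextual shaper for ARABIC and THAI. `ShaperInspectionSketch` and `shapesArabic` are hypothetical names used only for illustration.

```java
import java.awt.font.NumericShaper;
import java.awt.font.NumericShaper.Range;

final class ShaperInspectionSketch {
    /** Works for both bit mask-based and enum-based shapers via getRangeSet(). */
    static boolean shapesArabic(NumericShaper shaper) {
        return shaper.getRangeSet().contains(Range.ARABIC);
    }

    public static void main(String[] args) {
        NumericShaper contextual =
                NumericShaper.getContextualShaper(NumericShaper.ARABIC | NumericShaper.THAI);

        System.out.println(contextual.isContextual());
        // Bit test on getRanges(), as suggested in the getRanges() documentation.
        System.out.println((contextual.getRanges() & NumericShaper.ARABIC) != 0);
        System.out.println(contextual.getRangeSet());

        // Equality ignores whether the range was given as a bit mask or an enum.
        NumericShaper byMask = NumericShaper.getShaper(NumericShaper.ARABIC);
        NumericShaper byEnum = NumericShaper.getShaper(Range.ARABIC);
        System.out.println(byMask.equals(byEnum));
        System.out.println(shapesArabic(byEnum));
    }
}
```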