Character.java revision 4138
 * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.

 * The {@code Character} class wraps a value of the primitive
 * type {@code char} in an object. An object of type
 * {@code Character} contains a single field whose type is
 * {@code char}.
 *
 * In addition, this class provides several methods for determining
 * a character's category (lowercase letter, digit, etc.) and for converting
 * characters from uppercase to lowercase and vice versa.
 *
 * Character information is based on the Unicode Standard, version 6.0.0.
 *
 * The methods and data of class {@code Character} are defined by
 * the information in the <i>UnicodeData</i> file that is part of the
 * Unicode Character Database maintained by the Unicode
 * Consortium. This file specifies various properties including name
 * and general category for every defined Unicode code point or
 * character range.
 *
 * The file and its description are available from the Unicode Consortium at:
 *
 * <h4><a name="unicode">Unicode Character Representations</a></h4>
 *
 * <p>The {@code char} data type (and therefore the value that a
 * {@code Character} object encapsulates) are based on the
 * original Unicode specification, which defined characters as
 * fixed-width 16-bit entities. The Unicode Standard has since been
 * changed to allow for characters whose representation requires more
 * than 16 bits. The range of legal <em>code point</em>s is now
 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
 * (Refer to the <i>definition</i> of the U+<i>n</i> notation in the Unicode
 * Standard.)
 *
 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF</a> is
 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
 * <a name="supplementary">Characters</a> whose code points are greater
 * than U+FFFF are called <em>supplementary character</em>s. The Java
 * platform uses the UTF-16 representation in {@code char} arrays and
 * in the {@code String} and {@code StringBuffer} classes. In
 * this representation, supplementary characters are represented as a pair
 * of {@code char} values, the first from the <em>high-surrogates</em>
 * range, (\uD800-\uDBFF), the second from the
 * <em>low-surrogates</em> range (\uDC00-\uDFFF).
 *
 * <p>A {@code char} value, therefore, represents Basic
 * Multilingual Plane (BMP) code points, including the surrogate
 * code points, or code units of the UTF-16 encoding. An
 * {@code int} value represents all Unicode code points,
 * including supplementary code points. The lower (least significant)
 * 21 bits of {@code int} are used to represent Unicode code
 * points and the upper (most significant) 11 bits must be zero.
 * Unless otherwise specified, the behavior with respect to
 * supplementary characters and surrogate {@code char} values is
 * as follows:
 *
 * <ul>
 * <li>The methods that only accept a {@code char} value cannot support
 * supplementary characters. They treat {@code char} values from the
 * surrogate ranges as undefined characters. For example,
 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though
 * this specific value if followed by any low-surrogate value in a string
 * would represent a letter.
 *
 * <li>The methods that accept an {@code int} value support all
 * Unicode characters, including supplementary characters. For
 * example, {@code Character.isLetter(0x2F81A)} returns
 * {@code true} because the code point value represents a letter.
 * </ul>
 *
 * <p>In the Java SE API documentation, <em>Unicode code point</em> is
 * used for character values in the range between U+0000 and U+10FFFF,
 * and <em>Unicode code unit</em> is used for 16-bit
 * {@code char} values that are code units of the <em>UTF-16</em>
 * encoding. For more information on Unicode terminology, refer to the
 * Unicode Glossary.
 *
 * @author Martin Buchholz

     * The minimum radix available for conversion to and from strings.
     * The constant value of this field is the smallest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)

     * The maximum radix available for conversion to and from strings.
     * The constant value of this field is the largest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)

     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
    public static final char MIN_VALUE = '\u0000';
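The radix bounds documented above can be exercised with a small sketch. It assumes only the public `Character.MIN_RADIX`, `Character.MAX_RADIX`, `Character.digit` and `Character.forDigit` members; the class `RadixDemo` and its helper `toRadixString` are hypothetical names introduced for illustration and are not part of this file.

```java
public class RadixDemo {

    /**
     * Formats a non-negative value in the given radix, one digit at a time,
     * using Character.forDigit. Hypothetical helper, not part of Character.
     */
    static String toRadixString(int value, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("radix out of range: " + radix);
        }
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            // forDigit maps 0..radix-1 to '0'..'9', then 'a'..'z'
            sb.append(Character.forDigit(value % radix, radix));
            value /= radix;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(Character.MIN_RADIX);        // 2
        System.out.println(Character.MAX_RADIX);        // 36
        System.out.println(Character.digit('z', 36));   // 35
        System.out.println(Character.digit('z', 37));   // -1, radix is out of range
        System.out.println(toRadixString(255, 16));     // ff
    }
}
```

Checking the radix up front mirrors how `digit` and `forDigit` reject out-of-range radixes themselves, although they signal it with a sentinel return value rather than an exception.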
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
    public static final char MAX_VALUE = '\uFFFF';
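Because a `char` tops out at U+FFFF, the class comment's distinction between `char` code units and `int` code points matters in practice. The following is a minimal sketch of that distinction, using the same code point (0x2F81A) as the class comment. The class `CodePointDemo` and its helper `countLetters` are hypothetical names; the `Character` and `String` methods called are standard public API.

```java
public class CodePointDemo {

    /**
     * Counts letters by code point, so that a surrogate pair is tested once
     * as a whole character rather than twice as two undefined char values.
     */
    static int countLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                count++;
            }
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return count;
    }

    public static void main(String[] args) {
        int cp = 0x2F81A;                        // supplementary code point
        char[] units = Character.toChars(cp);    // its UTF-16 code units

        System.out.println(units.length);                         // 2
        System.out.println(Character.isHighSurrogate(units[0]));  // true
        System.out.println(Character.isLowSurrogate(units[1]));   // true
        System.out.println(Character.isLetter(units[0]));         // false (char overload)
        System.out.println(Character.isLetter(cp));               // true  (int overload)

        String s = "a" + new String(units) + "b";
        System.out.println(s.length());          // 4 code units
        System.out.println(countLetters(s));     // 3 letters
    }
}
```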
     * The {@code Class} instance representing the primitive type
     * {@code char}.

     * Normative general types

     * General character types

     * General category "Cn" in the Unicode specification.
     * General category "Lu" in the Unicode specification.
     * General category "Ll" in the Unicode specification.
     * General category "Lt" in the Unicode specification.
     * General category "Lm" in the Unicode specification.
     * General category "Lo" in the Unicode specification.
     * General category "Mn" in the Unicode specification.
     * General category "Me" in the Unicode specification.
     * General category "Mc" in the Unicode specification.
     * General category "Nd" in the Unicode specification.
     * General category "Nl" in the Unicode specification.
     * General category "No" in the Unicode specification.
     * General category "Zs" in the Unicode specification.
     * General category "Zl" in the Unicode specification.
     * General category "Zp" in the Unicode specification.

     * General category "Cc" in the Unicode specification.
    public static final byte CONTROL = 15;

     * General category "Cf" in the Unicode specification.
    public static final byte FORMAT = 16;
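The general-category constants are the values returned by `Character.getType`. The sketch below maps a few of them back to their two-letter Unicode abbreviations. Only `CONTROL` and `FORMAT` have their declarations visible in this listing; the other constant names used here (`UPPERCASE_LETTER`, `LOWERCASE_LETTER`, `DECIMAL_DIGIT_NUMBER`, `SPACE_SEPARATOR`) come from the public `java.lang.Character` API, and `CategoryDemo.describe` is a hypothetical helper.

```java
public class CategoryDemo {

    /** Returns the Unicode general-category abbreviation for a few common types. */
    static String describe(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:     return "Lu";
            case Character.LOWERCASE_LETTER:     return "Ll";
            case Character.DECIMAL_DIGIT_NUMBER: return "Nd";
            case Character.SPACE_SEPARATOR:      return "Zs";
            case Character.CONTROL:              return "Cc";
            case Character.FORMAT:               return "Cf";
            default:                             return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe('A'));     // Lu
        System.out.println(describe('a'));     // Ll
        System.out.println(describe('7'));     // Nd
        System.out.println(describe(' '));     // Zs
        System.out.println(describe('\t'));    // Cc
        System.out.println(describe(0x00AD));  // Cf (SOFT HYPHEN)
    }
}
```

The constants are declared as `byte`, but `getType` returns an `int`; the `switch` works because the constants are compile-time values that widen to `int`.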
     * General category "Co" in the Unicode specification.
     * General category "Cs" in the Unicode specification.
     * General category "Pd" in the Unicode specification.
     * General category "Ps" in the Unicode specification.
     * General category "Pe" in the Unicode specification.
     * General category "Pc" in the Unicode specification.
     * General category "Po" in the Unicode specification.
     * General category "Sm" in the Unicode specification.
     * General category "Sc" in the Unicode specification.
     * General category "Sk" in the Unicode specification.
     * General category "So" in the Unicode specification.
     * General category "Pi" in the Unicode specification.
     * General category "Pf" in the Unicode specification.

     * Error flag. Use int (code point) to avoid confusion with U+FFFF.
    static final int ERROR = 0xFFFFFFFF;
     * Undefined bidirectional character type. Undefined {@code char}
     * values have undefined directionality in the Unicode specification.

     * Strong bidirectional character type "L" in the Unicode specification.
     * Strong bidirectional character type "R" in the Unicode specification.
     * Strong bidirectional character type "AL" in the Unicode specification.
     * Weak bidirectional character type "EN" in the Unicode specification.
     * Weak bidirectional character type "ES" in the Unicode specification.
     * Weak bidirectional character type "ET" in the Unicode specification.
     * Weak bidirectional character type "AN" in the Unicode specification.
     * Weak bidirectional character type "CS" in the Unicode specification.
     * Weak bidirectional character type "NSM" in the Unicode specification.
     * Weak bidirectional character type "BN" in the Unicode specification.
     * Neutral bidirectional character type "B" in the Unicode specification.
     * Neutral bidirectional character type "S" in the Unicode specification.
     * Neutral bidirectional character type "WS" in the Unicode specification.
     * Neutral bidirectional character type "ON" in the Unicode specification.
     * Strong bidirectional character type "LRE" in the Unicode specification.
     * Strong bidirectional character type "LRO" in the Unicode specification.
     * Strong bidirectional character type "RLE" in the Unicode specification.
     * Strong bidirectional character type "RLO" in the Unicode specification.
     * Weak bidirectional character type "PDF" in the Unicode specification.

     * The minimum value of a Unicode high-surrogate code unit
     * in the UTF-16 encoding, constant {@code '\u005CuD800'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.

     * The maximum value of a Unicode high-surrogate code unit
     * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.

     * The minimum value of a Unicode low-surrogate code unit
     * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.

     * The maximum value of a Unicode low-surrogate code unit
     * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.

     * The minimum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuD800'}.

     * The maximum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuDFFF'}.

     * The minimum value of a Unicode supplementary code point, constant {@code U+10000}.

     * The minimum value of a Unicode code point, constant {@code U+0000}.

     * The maximum value of a Unicode code point, constant {@code U+10FFFF}.

     * Instances of this class represent particular subsets of the Unicode
     * character set. The only family of subsets defined in the
     * {@code Character} class is {@link Character.UnicodeBlock}.
     * Other portions of the Java API may define other subsets for their
     * own purposes.

         * Constructs a new {@code Subset} instance.
         *
         * @param name The name of this subset
         * @exception NullPointerException if name is {@code null}

         * Compares two {@code Subset} objects for equality.
         * This method returns {@code true} if and only if
         * {@code this} and the argument refer to the same
         * object; since this method is {@code final}, this
         * guarantee holds for all subclasses.

         * Returns the standard hash code as defined by the
         * {@link Object#hashCode} method. This method
         * is {@code final} in order to ensure that the
         * {@code equals} and {@code hashCode} methods will
         * be consistent in all subclasses.

         * Returns the name of this subset.

    // for the latest specification of Unicode Blocks.

     * A family of character subsets representing the character blocks in the
     * Unicode specification. Character blocks generally define characters
     * used for a specific script or purpose. A character is contained by
     * at most one Unicode block.

         * Creates a UnicodeBlock with the given identifier name.
         * This name must be the same as the block identifier.
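Since a character is contained by at most one block and each block carries an identifier name, both directions of lookup can be sketched with the public `Character.UnicodeBlock.of` and `Character.UnicodeBlock.forName` methods. The class `UnicodeBlockDemo` and its helper `allResolveTo` are hypothetical. The name forms passed to `forName` follow the identifier and alias strings that the constructors in this file register, and the sketch assumes a Java 7 or later runtime where `EMOTICONS` exists.

```java
public class UnicodeBlockDemo {

    /** Returns true if every given name resolves to the expected block. */
    static boolean allResolveTo(Character.UnicodeBlock expected, String... names) {
        for (String name : names) {
            if (Character.UnicodeBlock.forName(name) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Code point to block. Subset.toString() returns the identifier name.
        System.out.println(Character.UnicodeBlock.of('A'));      // BASIC_LATIN
        System.out.println(Character.UnicodeBlock.of(0x1F600));  // EMOTICONS
        System.out.println(Character.UnicodeBlock.of(0xD800));   // HIGH_SURROGATES
        System.out.println(Character.UnicodeBlock.of(0xDB80));   // HIGH_PRIVATE_USE_SURROGATES
        System.out.println(Character.UnicodeBlock.of(0xDC00));   // LOW_SURROGATES

        // Name to block. Lookup is case-insensitive and accepts the
        // identifier, the block name, and the block name without spaces.
        System.out.println(allResolveTo(
                Character.UnicodeBlock.BASIC_LATIN,
                "BASIC_LATIN", "Basic Latin", "BasicLatin"));    // true
    }
}
```

Comparing blocks with `==` is safe here because `Subset.equals` is final and defined as reference identity.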
         * Creates a UnicodeBlock with the given identifier name and
         * alias name.

         * Creates a UnicodeBlock with the given identifier name and
         * alias names.

         * Constant for the "Basic Latin" Unicode character block.
         * Constant for the "Latin-1 Supplement" Unicode character block.
         * Constant for the "Latin Extended-A" Unicode character block.
         * Constant for the "Latin Extended-B" Unicode character block.
         * Constant for the "IPA Extensions" Unicode character block.

         * Constant for the "Spacing Modifier Letters" Unicode character block.
                "SPACING MODIFIER LETTERS",
"SPACINGMODIFIERLETTERS");
         * Constant for the "Combining Diacritical Marks" Unicode character block.
                "COMBINING DIACRITICAL MARKS",
                "COMBININGDIACRITICALMARKS");

         * Constant for the "Greek and Coptic" Unicode character block.
         * This block was previously known as the "Greek" block.

         * Constant for the "Cyrillic" Unicode character block.
         * Constant for the "Armenian" Unicode character block.
         * Constant for the "Hebrew" Unicode character block.
         * Constant for the "Arabic" Unicode character block.
         * Constant for the "Devanagari" Unicode character block.
         * Constant for the "Bengali" Unicode character block.
         * Constant for the "Gurmukhi" Unicode character block.
         * Constant for the "Gujarati" Unicode character block.
         * Constant for the "Oriya" Unicode character block.
         * Constant for the "Tamil" Unicode character block.
         * Constant for the "Telugu" Unicode character block.
         * Constant for the "Kannada" Unicode character block.
         * Constant for the "Malayalam" Unicode character block.
         * Constant for the "Thai" Unicode character block.
         * Constant for the "Lao" Unicode character block.
         * Constant for the "Tibetan" Unicode character block.
         * Constant for the "Georgian" Unicode character block.
         * Constant for the "Hangul Jamo" Unicode character block.

         * Constant for the "Latin Extended Additional" Unicode character block.
                "LATIN EXTENDED ADDITIONAL",
                "LATINEXTENDEDADDITIONAL");

         * Constant for the "Greek Extended" Unicode character block.
         * Constant for the "General Punctuation" Unicode character block.

         * Constant for the "Superscripts and Subscripts" Unicode character block.
                "SUPERSCRIPTS AND SUBSCRIPTS",
                "SUPERSCRIPTSANDSUBSCRIPTS");

         * Constant for the "Currency Symbols" Unicode character block.

         * Constant for the "Combining Diacritical Marks for Symbols" Unicode character block.
         * This block was previously known as "Combining Marks for Symbols".
                "COMBINING DIACRITICAL MARKS FOR SYMBOLS",
                "COMBININGDIACRITICALMARKSFORSYMBOLS",
                "COMBINING MARKS FOR SYMBOLS",
                "COMBININGMARKSFORSYMBOLS");
         * Constant for the "Letterlike Symbols" Unicode character block.
         * Constant for the "Number Forms" Unicode character block.
         * Constant for the "Arrows" Unicode character block.

         * Constant for the "Mathematical Operators" Unicode character block.
                "MATHEMATICAL OPERATORS",
                "MATHEMATICALOPERATORS");

         * Constant for the "Miscellaneous Technical" Unicode character block.
                "MISCELLANEOUS TECHNICAL",
                "MISCELLANEOUSTECHNICAL");

         * Constant for the "Control Pictures" Unicode character block.

         * Constant for the "Optical Character Recognition" Unicode character block.
                "OPTICAL CHARACTER RECOGNITION",
                "OPTICALCHARACTERRECOGNITION");

         * Constant for the "Enclosed Alphanumerics" Unicode character block.
                "ENCLOSED ALPHANUMERICS",
                "ENCLOSEDALPHANUMERICS");

         * Constant for the "Box Drawing" Unicode character block.
         * Constant for the "Block Elements" Unicode character block.
         * Constant for the "Geometric Shapes" Unicode character block.
         * Constant for the "Miscellaneous Symbols" Unicode character block.
         * Constant for the "Dingbats" Unicode character block.

         * Constant for the "CJK Symbols and Punctuation" Unicode character block.
                "CJK SYMBOLS AND PUNCTUATION",
                "CJKSYMBOLSANDPUNCTUATION");

         * Constant for the "Hiragana" Unicode character block.
         * Constant for the "Katakana" Unicode character block.
         * Constant for the "Bopomofo" Unicode character block.

         * Constant for the "Hangul Compatibility Jamo" Unicode character block.
                "HANGUL COMPATIBILITY JAMO",
                "HANGULCOMPATIBILITYJAMO");

         * Constant for the "Kanbun" Unicode character block.

         * Constant for the "Enclosed CJK Letters and Months" Unicode character block.
                "ENCLOSED CJK LETTERS AND MONTHS",
                "ENCLOSEDCJKLETTERSANDMONTHS");

         * Constant for the "CJK Compatibility" Unicode character block.

         * Constant for the "CJK Unified Ideographs" Unicode character block.
                "CJK UNIFIED IDEOGRAPHS",

         * Constant for the "Hangul Syllables" Unicode character block.
         * Constant for the "Private Use Area" Unicode character block.

         * Constant for the "CJK Compatibility Ideographs" Unicode character block.
                "CJK COMPATIBILITY IDEOGRAPHS",
                "CJKCOMPATIBILITYIDEOGRAPHS");

         * Constant for the "Alphabetic Presentation Forms" Unicode character block.
                "ALPHABETIC PRESENTATION FORMS",
                "ALPHABETICPRESENTATIONFORMS");

         * Constant for the "Arabic Presentation Forms-A" Unicode character block.
                "ARABIC PRESENTATION FORMS-A",
                "ARABICPRESENTATIONFORMS-A");

         * Constant for the "Combining Half Marks" Unicode character block.

         * Constant for the "CJK Compatibility Forms" Unicode character block.
                "CJK COMPATIBILITY FORMS",
                "CJKCOMPATIBILITYFORMS");

         * Constant for the "Small Form Variants" Unicode character block.

         * Constant for the "Arabic Presentation Forms-B" Unicode character block.
                "ARABIC PRESENTATION FORMS-B",
                "ARABICPRESENTATIONFORMS-B");

         * Constant for the "Halfwidth and Fullwidth Forms" Unicode character block.
                "HALFWIDTH AND FULLWIDTH FORMS",
                "HALFWIDTHANDFULLWIDTHFORMS");
         * Constant for the "Specials" Unicode character block.

         * @deprecated As of J2SE 5, use {@link #HIGH_SURROGATES},
         *             {@link #HIGH_PRIVATE_USE_SURROGATES}, and
         *             {@link #LOW_SURROGATES}. These new constants match
         *             the block definitions of the Unicode Standard.
         *             The {@link #of(char)} and {@link #of(int)} methods
         *             return the new constants, not SURROGATES_AREA.

         * Constant for the "Syriac" Unicode character block.
         * Constant for the "Thaana" Unicode character block.
         * Constant for the "Sinhala" Unicode character block.
         * Constant for the "Myanmar" Unicode character block.
         * Constant for the "Ethiopic" Unicode character block.
         * Constant for the "Cherokee" Unicode character block.

         * Constant for the "Unified Canadian Aboriginal Syllabics" Unicode character block.
                "UNIFIED CANADIAN ABORIGINAL SYLLABICS",
                "UNIFIEDCANADIANABORIGINALSYLLABICS");

         * Constant for the "Ogham" Unicode character block.
         * Constant for the "Runic" Unicode character block.
         * Constant for the "Khmer" Unicode character block.
         * Constant for the "Mongolian" Unicode character block.
         * Constant for the "Braille Patterns" Unicode character block.

         * Constant for the "CJK Radicals Supplement" Unicode character block.
                "CJK RADICALS SUPPLEMENT",
                "CJKRADICALSSUPPLEMENT");

         * Constant for the "Kangxi Radicals" Unicode character block.

         * Constant for the "Ideographic Description Characters" Unicode character block.
                "IDEOGRAPHIC DESCRIPTION CHARACTERS",
                "IDEOGRAPHICDESCRIPTIONCHARACTERS");

         * Constant for the "Bopomofo Extended" Unicode character block.

         * Constant for the "CJK Unified Ideographs Extension A" Unicode character block.
                "CJK UNIFIED IDEOGRAPHS EXTENSION A",
                "CJKUNIFIEDIDEOGRAPHSEXTENSIONA");

         * Constant for the "Yi Syllables" Unicode character block.
         * Constant for the "Yi Radicals" Unicode character block.

         * Constant for the "Cyrillic Supplementary" Unicode character block.
                "CYRILLIC SUPPLEMENTARY",

         * Constant for the "Tagalog" Unicode character block.
         * Constant for the "Hanunoo" Unicode character block.
         * Constant for the "Buhid" Unicode character block.
         * Constant for the "Tagbanwa" Unicode character block.
         * Constant for the "Limbu" Unicode character block.
         * Constant for the "Tai Le" Unicode character block.
         * Constant for the "Khmer Symbols" Unicode character block.
         * Constant for the "Phonetic Extensions" Unicode character block.

         * Constant for the "Miscellaneous Mathematical Symbols-A" Unicode character block.
                "MISCELLANEOUS MATHEMATICAL SYMBOLS-A",
                "MISCELLANEOUSMATHEMATICALSYMBOLS-A");

         * Constant for the "Supplemental Arrows-A" Unicode character block.
         * Constant for the "Supplemental Arrows-B" Unicode character block.

         * Constant for the "Miscellaneous Mathematical Symbols-B" Unicode character block.
                "MISCELLANEOUS MATHEMATICAL SYMBOLS-B",
                "MISCELLANEOUSMATHEMATICALSYMBOLS-B");

         * Constant for the "Supplemental Mathematical Operators" Unicode character block.
                "SUPPLEMENTAL MATHEMATICAL OPERATORS",
                "SUPPLEMENTALMATHEMATICALOPERATORS");

         * Constant for the "Miscellaneous Symbols and Arrows" Unicode character block.
                "MISCELLANEOUS SYMBOLS AND ARROWS",
                "MISCELLANEOUSSYMBOLSANDARROWS");

         * Constant for the "Katakana Phonetic Extensions" Unicode character block.
                "KATAKANA PHONETIC EXTENSIONS",
                "KATAKANAPHONETICEXTENSIONS");

         * Constant for the "Yijing Hexagram Symbols" Unicode character block.
                "YIJING HEXAGRAM SYMBOLS",
                "YIJINGHEXAGRAMSYMBOLS");

         * Constant for the "Variation Selectors" Unicode character block.
         * Constant for the "Linear B Syllabary" Unicode character block.
         * Constant for the "Linear B Ideograms" Unicode character block.
         * Constant for the "Aegean Numbers" Unicode character block.
         * Constant for the "Old Italic" Unicode character block.
         * Constant for the "Gothic" Unicode character block.
         * Constant for the "Ugaritic" Unicode character block.
         * Constant for the "Deseret" Unicode character block.
         * Constant for the "Shavian" Unicode character block.
         * Constant for the "Osmanya" Unicode character block.
         * Constant for the "Cypriot Syllabary" Unicode character block.

         * Constant for the "Byzantine Musical Symbols" Unicode character block.
                "BYZANTINE MUSICAL SYMBOLS",
                "BYZANTINEMUSICALSYMBOLS");

         * Constant for the "Musical Symbols" Unicode character block.
         * Constant for the "Tai Xuan Jing Symbols" Unicode character block.

         * Constant for the "Mathematical Alphanumeric Symbols" Unicode character block.
                "MATHEMATICAL ALPHANUMERIC SYMBOLS",
                "MATHEMATICALALPHANUMERICSYMBOLS");

         * Constant for the "CJK Unified Ideographs Extension B" Unicode character block.
                "CJK UNIFIED IDEOGRAPHS EXTENSION B",
                "CJKUNIFIEDIDEOGRAPHSEXTENSIONB");

         * Constant for the "CJK Compatibility Ideographs Supplement" Unicode character block.
                "CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT",
                "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT");

         * Constant for the "Tags" Unicode character block.

         * Constant for the "Variation Selectors Supplement" Unicode character block.
                "VARIATION SELECTORS SUPPLEMENT",
                "VARIATIONSELECTORSSUPPLEMENT");

         * Constant for the "Supplementary Private Use Area-A" Unicode character block.
                "SUPPLEMENTARY PRIVATE USE AREA-A",
                "SUPPLEMENTARYPRIVATEUSEAREA-A");

         * Constant for the "Supplementary Private Use Area-B" Unicode character block.
                "SUPPLEMENTARY PRIVATE USE AREA-B",
                "SUPPLEMENTARYPRIVATEUSEAREA-B");

         * Constant for the "High Surrogates" Unicode character block.
         * This block represents codepoint values in the high surrogate
         * range: U+D800 through U+DB7F

         * Constant for the "High Private Use Surrogates" Unicode character block.
         * This block represents codepoint values in the private use high
         * surrogate range: U+DB80 through U+DBFF
                "HIGH PRIVATE USE SURROGATES",
                "HIGHPRIVATEUSESURROGATES");
         * Constant for the "Low Surrogates" Unicode character block.
         * This block represents codepoint values in the low surrogate
         * range: U+DC00 through U+DFFF

         * Constant for the "Arabic Supplement" Unicode character block.
         * Constant for the "NKo" Unicode character block.
         * Constant for the "Samaritan" Unicode character block.
         * Constant for the "Mandaic" Unicode character block.
         * Constant for the "Ethiopic Supplement" Unicode character block.

         * Constant for the "Unified Canadian Aboriginal Syllabics Extended"
         * Unicode character block.
            new UnicodeBlock(
                "UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED",
                "UNIFIED CANADIAN ABORIGINAL SYLLABICS EXTENDED",
                "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED");

         * Constant for the "New Tai Lue" Unicode character block.
         * Constant for the "Buginese" Unicode character block.
         * Constant for the "Tai Tham" Unicode character block.
         * Constant for the "Balinese" Unicode character block.
         * Constant for the "Sundanese" Unicode character block.
         * Constant for the "Batak" Unicode character block.
         * Constant for the "Lepcha" Unicode character block.
         * Constant for the "Ol Chiki" Unicode character block.
         * Constant for the "Vedic Extensions" Unicode character block.

         * Constant for the "Phonetic Extensions Supplement" Unicode character block.
                "PHONETIC EXTENSIONS SUPPLEMENT",
                "PHONETICEXTENSIONSSUPPLEMENT");

         * Constant for the "Combining Diacritical Marks Supplement" Unicode character block.
                "COMBINING DIACRITICAL MARKS SUPPLEMENT",
                "COMBININGDIACRITICALMARKSSUPPLEMENT");

         * Constant for the "Glagolitic" Unicode character block.
         * Constant for the "Latin Extended-C" Unicode character block.
         * Constant for the "Coptic" Unicode character block.
         * Constant for the "Georgian Supplement" Unicode character block.
         * Constant for the "Tifinagh" Unicode character block.
         * Constant for the "Ethiopic Extended" Unicode character block.
         * Constant for the "Cyrillic Extended-A" Unicode character block.

         * Constant for the "Supplemental Punctuation" Unicode character block.
                "SUPPLEMENTAL PUNCTUATION",
                "SUPPLEMENTALPUNCTUATION");

         * Constant for the "CJK Strokes" Unicode character block.
         * Constant for the "Lisu" Unicode character block.
         * Constant for the "Vai" Unicode character block.
         * Constant for the "Cyrillic Extended-B" Unicode character block.
         * Constant for the "Bamum" Unicode character block.
         * Constant for the "Modifier Tone Letters" Unicode character block.
         * Constant for the "Latin Extended-D" Unicode character block.
         * Constant for the "Syloti Nagri" Unicode character block.

         * Constant for the "Common Indic Number Forms" Unicode character block.
                "COMMON INDIC NUMBER FORMS",
                "COMMONINDICNUMBERFORMS");

         * Constant for the "Phags-pa" Unicode character block.
         * Constant for the "Saurashtra" Unicode character block.
         * Constant for the "Devanagari Extended" Unicode character block.
         * Constant for the "Kayah Li" Unicode character block.
         * Constant for the "Rejang" Unicode character block.

         * Constant for the "Hangul Jamo Extended-A" Unicode character block.
                "HANGUL JAMO EXTENDED-A",

         * Constant for the "Javanese" Unicode character block.
         * Constant for the "Cham" Unicode character block.
         * Constant for the "Myanmar Extended-A" Unicode character block.
         * Constant for the "Tai Viet" Unicode character block.
         * Constant for the "Ethiopic Extended-A" Unicode character block.
         * Constant for the "Meetei Mayek" Unicode character block.

         * Constant for the "Hangul Jamo Extended-B" Unicode character block.
                "HANGUL JAMO EXTENDED-B",

         * Constant for the "Vertical Forms" Unicode character block.
         * Constant for the "Ancient Greek Numbers" Unicode character block.
         * Constant for the "Ancient Symbols" Unicode character block.
         * Constant for the "Phaistos Disc" Unicode character block.
         * Constant for the "Lycian" Unicode character block.
         * Constant for the "Carian" Unicode character block.
         * Constant for the "Old Persian" Unicode character block.
         * Constant for the "Imperial Aramaic" Unicode character block.
         * Constant for the "Phoenician" Unicode character block.
         * Constant for the "Lydian" Unicode character block.
         * Constant for the "Kharoshthi" Unicode character block.
         * Constant for the "Old South Arabian" Unicode character block.
         * Constant for the "Avestan" Unicode character block.

         * Constant for the "Inscriptional Parthian" Unicode character block.
                "INSCRIPTIONAL PARTHIAN",
                "INSCRIPTIONALPARTHIAN");

         * Constant for the "Inscriptional Pahlavi" Unicode character block.
         * Constant for the "Old Turkic" Unicode character block.
         * Constant for the "Rumi Numeral Symbols" Unicode character block.
         * Constant for the "Brahmi" Unicode character block.
         * Constant for the "Kaithi" Unicode character block.
         * Constant for the "Cuneiform" Unicode character block.

         * Constant for the "Cuneiform Numbers and Punctuation" Unicode character block.
                "CUNEIFORM NUMBERS AND PUNCTUATION",
                "CUNEIFORMNUMBERSANDPUNCTUATION");

         * Constant for the "Egyptian Hieroglyphs" Unicode character block.
         * Constant for the "Bamum Supplement" Unicode character block.
         * Constant for the "Kana Supplement" Unicode character block.

         * Constant for the "Ancient Greek Musical Notation" Unicode character block.
                "ANCIENT GREEK MUSICAL NOTATION",
                "ANCIENTGREEKMUSICALNOTATION");

         * Constant for the "Counting Rod Numerals" Unicode character block.
         * Constant for the "Mahjong Tiles" Unicode character block.
         * Constant for the "Domino Tiles" Unicode character block.
         * Constant for the "Playing Cards" Unicode character block.

         * Constant for the "Enclosed Alphanumeric Supplement" Unicode character block.
                "ENCLOSED ALPHANUMERIC SUPPLEMENT",
                "ENCLOSEDALPHANUMERICSUPPLEMENT");

         * Constant for the "Enclosed Ideographic Supplement" Unicode character block.
                "ENCLOSED IDEOGRAPHIC SUPPLEMENT",
                "ENCLOSEDIDEOGRAPHICSUPPLEMENT");

         * Constant for the "Miscellaneous Symbols And Pictographs" Unicode character block.
                "MISCELLANEOUS SYMBOLS AND PICTOGRAPHS",
                "MISCELLANEOUSSYMBOLSANDPICTOGRAPHS");

         * Constant for the "Emoticons" Unicode character block.

         * Constant for the "Transport And Map Symbols" Unicode character block.
                "TRANSPORT AND MAP SYMBOLS",
                "TRANSPORTANDMAPSYMBOLS");

         * Constant for the "Alchemical Symbols" Unicode character block.

         * Constant for the "CJK Unified Ideographs Extension C" Unicode character block.
                "CJK UNIFIED IDEOGRAPHS EXTENSION C",
                "CJKUNIFIEDIDEOGRAPHSEXTENSIONC");

         * Constant for the "CJK Unified Ideographs Extension D" Unicode character block.
                "CJK UNIFIED IDEOGRAPHS EXTENSION D",
                "CJKUNIFIEDIDEOGRAPHSEXTENSIOND");
            0x0000, // 0000..007F; Basic Latin
            0x0080, // 0080..00FF; Latin-1 Supplement
            0x0100, // 0100..017F; Latin Extended-A
            0x0180, // 0180..024F; Latin Extended-B
            0x0250, // 0250..02AF; IPA Extensions
            0x02B0, // 02B0..02FF; Spacing Modifier Letters
            0x0300, // 0300..036F; Combining Diacritical Marks
            0x0370, // 0370..03FF; Greek and Coptic
            0x0400, // 0400..04FF; Cyrillic
            0x0500, // 0500..052F; Cyrillic Supplement
            0x0530, // 0530..058F; Armenian
            0x0590, // 0590..05FF; Hebrew
            0x0600, // 0600..06FF; Arabic
            0x0700, // 0700..074F; Syriac
            0x0750, // 0750..077F; Arabic Supplement
            0x0780, // 0780..07BF; Thaana
            0x07C0, // 07C0..07FF; NKo
            0x0800, // 0800..083F; Samaritan
            0x0840, // 0840..085F; Mandaic
            0x0900, // 0900..097F; Devanagari
            0x0980, // 0980..09FF; Bengali
            0x0A00, // 0A00..0A7F; Gurmukhi
            0x0A80, // 0A80..0AFF; Gujarati
            0x0B00, // 0B00..0B7F; Oriya
            0x0B80, // 0B80..0BFF; Tamil
            0x0C00, // 0C00..0C7F; Telugu
            0x0C80, // 0C80..0CFF; Kannada
            0x0D00, // 0D00..0D7F; Malayalam
            0x0D80, // 0D80..0DFF; Sinhala
            0x0E00, // 0E00..0E7F; Thai
            0x0E80, // 0E80..0EFF; Lao
            0x0F00, // 0F00..0FFF; Tibetan
            0x1000, // 1000..109F; Myanmar
            0x10A0, // 10A0..10FF; Georgian
            0x1100, // 1100..11FF; Hangul Jamo
            0x1200, // 1200..137F; Ethiopic
            0x1380, // 1380..139F; Ethiopic Supplement
            0x13A0, // 13A0..13FF; Cherokee
            0x1400, // 1400..167F; Unified Canadian Aboriginal Syllabics
            0x1680, // 1680..169F; Ogham
            0x16A0, // 16A0..16FF; Runic
            0x1700, // 1700..171F; Tagalog
            0x1720, // 1720..173F; Hanunoo
            0x1740, // 1740..175F; Buhid
            0x1760, // 1760..177F; Tagbanwa
            0x1780, // 1780..17FF; Khmer
            0x1800, // 1800..18AF; Mongolian
            0x18B0, // 18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
            0x1900, // 1900..194F; Limbu
            0x1950, // 1950..197F; Tai Le
            0x1980, // 1980..19DF; New Tai Lue
            0x19E0, // 19E0..19FF; Khmer Symbols
            0x1A00, // 1A00..1A1F; Buginese
            0x1A20, // 1A20..1AAF; Tai Tham
            0x1B00, // 1B00..1B7F; Balinese
            0x1B80, // 1B80..1BBF; Sundanese
            0x1BC0, // 1BC0..1BFF; Batak
            0x1C00, // 1C00..1C4F; Lepcha
            0x1C50, // 1C50..1C7F; Ol Chiki
            0x1CD0, // 1CD0..1CFF; Vedic Extensions
            0x1D00, // 1D00..1D7F; Phonetic Extensions
            0x1D80, // 1D80..1DBF; Phonetic Extensions Supplement
            0x1DC0, // 1DC0..1DFF; Combining Diacritical Marks Supplement
            0x1E00, // 1E00..1EFF; Latin Extended Additional
            0x1F00, // 1F00..1FFF; Greek Extended
            0x2000, // 2000..206F; General Punctuation
            0x2070, // 2070..209F; Superscripts and Subscripts
            0x20A0, // 20A0..20CF; Currency Symbols
            0x20D0, // 20D0..20FF; Combining Diacritical Marks for Symbols
            0x2100, // 2100..214F; Letterlike Symbols
            0x2150, // 2150..218F; Number Forms
            0x2190, // 2190..21FF; Arrows
            0x2200, // 2200..22FF; Mathematical Operators
            0x2300, // 2300..23FF; Miscellaneous Technical
            0x2400, // 2400..243F; Control Pictures
            0x2440, // 2440..245F; Optical Character Recognition
            0x2460, // 2460..24FF; Enclosed Alphanumerics
            0x2500, // 2500..257F; Box Drawing
            0x2580, // 2580..259F; Block Elements
            0x25A0, // 25A0..25FF; Geometric Shapes
            0x2600, // 2600..26FF; Miscellaneous Symbols
            0x2700, // 2700..27BF; Dingbats
            0x27C0, // 27C0..27EF; Miscellaneous Mathematical Symbols-A
            0x27F0, // 27F0..27FF; Supplemental Arrows-A
            0x2800, // 2800..28FF; Braille Patterns
            0x2900, // 2900..297F; Supplemental Arrows-B
            0x2980, // 2980..29FF; Miscellaneous Mathematical Symbols-B
            0x2A00, // 2A00..2AFF; Supplemental Mathematical Operators
            0x2B00, // 2B00..2BFF; Miscellaneous Symbols and Arrows
            0x2C00, // 2C00..2C5F; Glagolitic
            0x2C60, // 2C60..2C7F; Latin Extended-C
            0x2C80, // 2C80..2CFF; Coptic
            0x2D00, // 2D00..2D2F; Georgian Supplement
            0x2D30, // 2D30..2D7F; Tifinagh
            0x2D80, // 2D80..2DDF; Ethiopic Extended
            0x2DE0, // 2DE0..2DFF; Cyrillic Extended-A
            0x2E00, // 2E00..2E7F; Supplemental Punctuation
            0x2E80, // 2E80..2EFF; CJK Radicals Supplement
            0x2F00, // 2F00..2FDF; Kangxi Radicals
            0x2FF0, // 2FF0..2FFF; Ideographic Description Characters
            0x3000, // 3000..303F; CJK Symbols and Punctuation
            0x3040, // 3040..309F; Hiragana
            0x30A0, // 30A0..30FF; Katakana
            0x3100, // 3100..312F; Bopomofo
            0x3130, // 3130..318F; Hangul Compatibility Jamo
            0x3190, // 3190..319F; Kanbun
            0x31A0, // 31A0..31BF; Bopomofo Extended
            0x31C0, // 31C0..31EF; CJK Strokes
            0x31F0, // 31F0..31FF; Katakana Phonetic Extensions
            0x3200, // 3200..32FF; Enclosed CJK Letters and Months
            0x3300, // 3300..33FF; CJK Compatibility
            0x3400, // 3400..4DBF; CJK Unified Ideographs Extension A
            0x4DC0, // 4DC0..4DFF; Yijing Hexagram Symbols
            0x4E00, // 4E00..9FFF; CJK Unified Ideographs
            0xA000, // A000..A48F; Yi Syllables
            0xA490, // A490..A4CF; Yi Radicals
            0xA4D0,
// A4D0..A4FF; Lisu 0xA500,
// A500..A63F; Vai 0xA640,
// A640..A69F; Cyrillic Extended-B 0xA6A0,
// A6A0..A6FF; Bamum 0xA700,
// A700..A71F; Modifier Tone Letters 0xA720,
// A720..A7FF; Latin Extended-D 0xA800,
// A800..A82F; Syloti Nagri 0xA830,
// A830..A83F; Common Indic Number Forms 0xA840,
// A840..A87F; Phags-pa 0xA880,
// A880..A8DF; Saurashtra 0xA8E0,
// A8E0..A8FF; Devanagari Extended 0xA900,
// A900..A92F; Kayah Li 0xA930,
// A930..A95F; Rejang 0xA960,
// A960..A97F; Hangul Jamo Extended-A 0xA980,
// A980..A9DF; Javanese 0xAA00,
// AA00..AA5F; Cham 0xAA60,
// AA60..AA7F; Myanmar Extended-A 0xAA80,
// AA80..AADF; Tai Viet 0xAB00,
// AB00..AB2F; Ethiopic Extended-A 0xABC0,
// ABC0..ABFF; Meetei Mayek 0xAC00,
// AC00..D7AF; Hangul Syllables 0xD7B0,
// D7B0..D7FF; Hangul Jamo Extended-B 0xD800,
// D800..DB7F; High Surrogates 0xDB80,
// DB80..DBFF; High Private Use Surrogates 0xDC00,
// DC00..DFFF; Low Surrogates 0xE000,
// E000..F8FF; Private Use Area 0xF900,
// F900..FAFF; CJK Compatibility Ideographs 0xFB00,
// FB00..FB4F; Alphabetic Presentation Forms 0xFB50,
// FB50..FDFF; Arabic Presentation Forms-A 0xFE00,
// FE00..FE0F; Variation Selectors 0xFE10,
// FE10..FE1F; Vertical Forms 0xFE20,
// FE20..FE2F; Combining Half Marks 0xFE30,
// FE30..FE4F; CJK Compatibility Forms 0xFE50,
// FE50..FE6F; Small Form Variants 0xFE70,
// FE70..FEFF; Arabic Presentation Forms-B 0xFF00,
// FF00..FFEF; Halfwidth and Fullwidth Forms 0xFFF0,
// FFF0..FFFF; Specials 0x10000,
// 10000..1007F; Linear B Syllabary 0x10080,
// 10080..100FF; Linear B Ideograms 0x10100,
// 10100..1013F; Aegean Numbers 0x10140,
// 10140..1018F; Ancient Greek Numbers 0x10190,
// 10190..101CF; Ancient Symbols 0x101D0,
// 101D0..101FF; Phaistos Disc 0x10280,
// 10280..1029F; Lycian 0x102A0,
// 102A0..102DF; Carian 0x10300,
// 10300..1032F; Old Italic 0x10330,
// 10330..1034F; Gothic 0x10380,
// 10380..1039F; Ugaritic 0x103A0,
// 103A0..103DF; Old Persian 0x10400,
// 10400..1044F; Deseret 0x10450,
// 10450..1047F; Shavian 0x10480,
// 10480..104AF; Osmanya 0x10800,
// 10800..1083F; Cypriot Syllabary 0x10840,
// 10840..1085F; Imperial Aramaic 0x10900,
// 10900..1091F; Phoenician 0x10920,
// 10920..1093F; Lydian 0x10A00,
// 10A00..10A5F; Kharoshthi 0x10A60,
// 10A60..10A7F; Old South Arabian 0x10B00,
// 10B00..10B3F; Avestan 0x10B40,
// 10B40..10B5F; Inscriptional Parthian 0x10B60,
// 10B60..10B7F; Inscriptional Pahlavi 0x10C00,
// 10C00..10C4F; Old Turkic 0x10E60,
// 10E60..10E7F; Rumi Numeral Symbols 0x11000,
// 11000..1107F; Brahmi 0x11080,
// 11080..110CF; Kaithi 0x12000,
// 12000..123FF; Cuneiform 0x12400,
// 12400..1247F; Cuneiform Numbers and Punctuation 0x13000,
// 13000..1342F; Egyptian Hieroglyphs 0x16800,
// 16800..16A3F; Bamum Supplement 0x1B000,
// 1B000..1B0FF; Kana Supplement 0x1D000,
// 1D000..1D0FF; Byzantine Musical Symbols 0x1D100,
// 1D100..1D1FF; Musical Symbols 0x1D200,
// 1D200..1D24F; Ancient Greek Musical Notation 0x1D300,
// 1D300..1D35F; Tai Xuan Jing Symbols 0x1D360,
// 1D360..1D37F; Counting Rod Numerals 0x1D400,
// 1D400..1D7FF; Mathematical Alphanumeric Symbols 0x1F000,
// 1F000..1F02F; Mahjong Tiles 0x1F030,
// 1F030..1F09F; Domino Tiles 0x1F0A0,
// 1F0A0..1F0FF; Playing Cards 0x1F100,
// 1F100..1F1FF; Enclosed Alphanumeric Supplement 0x1F200,
// 1F200..1F2FF; Enclosed Ideographic Supplement 0x1F300,
// 1F300..1F5FF; Miscellaneous Symbols And Pictographs 0x1F600,
// 1F600..1F64F; Emoticons 0x1F680,
// 1F680..1F6FF; Transport And Map Symbols 0x1F700,
// 1F700..1F77F; Alchemical Symbols 0x20000,
// 20000..2A6DF; CJK Unified Ideographs Extension B 0x2A700,
// 2A700..2B73F; CJK Unified Ideographs Extension C 0x2B740,
// 2B740..2B81F; CJK Unified Ideographs Extension D 0x2F800,
// 2F800..2FA1F; CJK Compatibility Ideographs Supplement 0xE0000,
// E0000..E007F; Tags 0xE0100,
// E0100..E01EF; Variation Selectors Supplement 0xF0000,
// F0000..FFFFF; Supplementary Private Use Area-A 0x100000
// 100000..10FFFF; Supplementary Private Use Area-B

 * Returns the object representing the Unicode block containing the
 * given character, or {@code null} if the character is not a
 * member of a defined block.
 *
 * <p><b>Note:</b> This method cannot handle supplementary
 * characters. To support all Unicode characters, including
 * supplementary characters, use the {@link #of(int)} method.
 *
 * @param c The character in question
 * @return The {@code UnicodeBlock} instance representing the
 *         Unicode block of which this character is a member, or
 *         {@code null} if the character is not a member of any
 *         Unicode block

 * Returns the object representing the Unicode block
 * containing the given character (Unicode code point), or
 * {@code null} if the character is not a member of a
 * defined block.
 *
 * @param codePoint the character (Unicode code point) in question.
 * @return The {@code UnicodeBlock} instance representing the
 *         Unicode block of which this character is a member, or
 *         {@code null} if the character is not a member of any
 *         Unicode block
 * @exception IllegalArgumentException if the specified
 *            {@code codePoint} is an invalid Unicode code point.
 * @see Character#isValidCodePoint(int)

// invariant: top > current >= bottom && codePoint >= unicodeBlockStarts[bottom]

 * Returns the UnicodeBlock with the given name. Block
 * names are determined by The Unicode Standard. The file
 * Blocks-&lt;version&gt;.txt defines blocks for a particular
 * version of the standard. The {@link Character} class specifies
 * the version of the standard that it supports.
 *
 * This method accepts block names in the following forms:
 * <ol>
 * <li>Canonical block names as defined by the Unicode Standard.
 * For example, the standard defines a "Basic Latin" block. Therefore, this
 * method accepts "Basic Latin" as a valid block name. The documentation of
 * each UnicodeBlock provides the canonical name.
 * <li>Canonical block names with all spaces removed. For example, "BasicLatin"
 * is a valid block name for the "Basic Latin" block.
 * <li>The text representation of each constant UnicodeBlock identifier.
 * For example, this method will return the {@link #BASIC_LATIN} block if
 * provided with the "BASIC_LATIN" name. This form replaces all spaces and
 * hyphens in the canonical name with underscores.
 * </ol>
 * Finally, character case is ignored for all of the valid block name forms.
 * For example, "BASIC_LATIN" and "basic_latin" are both valid block names.
 * The en_US locale's case mapping rules are used to provide case-insensitive
 * string comparisons for block name validation.
 *
 * If the Unicode Standard changes block names, both the previous and
 * current names will be accepted.
 *
 * @param blockName A {@code UnicodeBlock} name.
 * @return The {@code UnicodeBlock} instance identified
 *         by {@code blockName}
 * @throws IllegalArgumentException if {@code blockName} is an
 *         invalid name
 * @throws NullPointerException if {@code blockName} is null

 * A family of character subsets representing the character scripts
 * defined in <i>Unicode Standard Annex #24: Script Names</i>. Every Unicode
 * character is assigned to a single Unicode script, either a specific
 * script, such as {@link Character.UnicodeScript#LATIN Latin}, or
 * one of the following three special values,
 * {@link Character.UnicodeScript#INHERITED Inherited},
 * {@link Character.UnicodeScript#COMMON Common} or
 * {@link Character.UnicodeScript#UNKNOWN Unknown}.

 * Unicode script "Common".
 * Unicode script "Latin".
 * Unicode script "Greek".
 * Unicode script "Cyrillic".
 * Unicode script "Armenian".
 * Unicode script "Hebrew".
 * Unicode script "Arabic".
 * Unicode script "Syriac".
 * Unicode script "Thaana".
 * Unicode script "Devanagari".
 * Unicode script "Bengali".
 * Unicode script "Gurmukhi".
 * Unicode script "Gujarati".
 * Unicode script "Oriya".
 * Unicode script "Tamil".
 * Unicode script "Telugu".
 * Unicode script "Kannada".
 * Unicode script "Malayalam".
 * Unicode script "Sinhala".
 * Unicode script "Tibetan".
 * Unicode script "Myanmar".
 * Unicode script "Georgian".
 * Unicode script "Hangul".
 * Unicode script "Ethiopic".
 * Unicode script "Cherokee".
 * Unicode script "Canadian_Aboriginal".
 * Unicode script "Ogham".
 * Unicode script "Runic".
 * Unicode script "Khmer".
 * Unicode script "Mongolian".
 * Unicode script "Hiragana".
 * Unicode script "Katakana".
 * Unicode script "Bopomofo".
 * Unicode script "Old_Italic".
 * Unicode script "Gothic".
 * Unicode script "Deseret".
 * Unicode script "Inherited".
 * Unicode script "Tagalog".
 * Unicode script "Hanunoo".
 * Unicode script "Buhid".
 * Unicode script "Tagbanwa".
 * Unicode script "Limbu".
 * Unicode script "Tai_Le".
 * Unicode script "Linear_B".
 * Unicode script "Ugaritic".
 * Unicode script "Shavian".
 * Unicode script "Osmanya".
 * Unicode script "Cypriot".
 * Unicode script "Braille".
 * Unicode script "Buginese".
 * Unicode script "Coptic".
 * Unicode script "New_Tai_Lue".
 * Unicode script "Glagolitic".
 * Unicode script "Tifinagh".
 * Unicode script "Syloti_Nagri".
 * Unicode script "Old_Persian".
 * Unicode script "Kharoshthi".
 * Unicode script "Balinese".
 * Unicode script "Cuneiform".
 * Unicode script "Phoenician".
 * Unicode script "Phags_Pa".
 * Unicode script "Sundanese".
 * Unicode script "Batak".
 * Unicode script "Lepcha".
 * Unicode script "Ol_Chiki".
 * Unicode script "Saurashtra".
 * Unicode script "Kayah_Li".
 * Unicode script "Rejang".
 * Unicode script "Lycian".
 * Unicode script "Carian".
 * Unicode script "Lydian".
 * Unicode script "Tai_Tham".
 * Unicode script "Tai_Viet".
 * Unicode script "Avestan".
 * Unicode script "Egyptian_Hieroglyphs".
 * Unicode script "Samaritan".
 * Unicode script "Mandaic".
 * Unicode script "Bamum".
 * Unicode script "Javanese".
 * Unicode script "Meetei_Mayek".
 * Unicode script "Imperial_Aramaic".
 * Unicode script "Old_South_Arabian".
 * Unicode script "Inscriptional_Parthian".
 * Unicode script "Inscriptional_Pahlavi".
 * Unicode script "Old_Turkic".
 * Unicode script "Brahmi".
 * Unicode script "Kaithi".
 * Unicode script "Unknown".

0x0000,
// 0000..0040; COMMON 0x0041,
// 0041..005A; LATIN 0x005B,
// 005B..0060; COMMON 0x0061,
// 0061..007A; LATIN 0x007B,
// 007B..00A9; COMMON 0x00AA,
// 00AA..00AA; LATIN 0x00AB,
// 00AB..00B9; COMMON 0x00BA,
// 00BA..00BA; LATIN 0x00BB,
// 00BB..00BF; COMMON 0x00C0,
// 00C0..00D6; LATIN 0x00D7,
// 00D7..00D7; COMMON 0x00D8,
// 00D8..00F6; LATIN 0x00F7,
// 00F7..00F7; COMMON 0x00F8,
// 00F8..02B8; LATIN 0x02B9,
// 02B9..02DF; COMMON 0x02E0,
// 02E0..02E4; LATIN 0x02E5,
// 02E5..02E9; COMMON 0x02EA,
// 02EA..02EB; BOPOMOFO 0x02EC,
// 02EC..02FF; COMMON 0x0300,
// 0300..036F; INHERITED 0x0370,
// 0370..0373; GREEK 0x0374,
// 0374..0374; COMMON 0x0375,
// 0375..037D; GREEK 0x037E,
// 037E..0383; COMMON 0x0384,
// 0384..0384; GREEK 0x0385,
// 0385..0385; COMMON 0x0386,
// 0386..0386; GREEK 0x0387,
// 0387..0387; COMMON 0x0388,
// 0388..03E1; GREEK 0x03E2,
// 03E2..03EF; COPTIC 0x03F0,
// 03F0..03FF; GREEK 0x0400,
// 0400..0484; CYRILLIC 0x0485,
// 0485..0486; INHERITED 0x0487,
// 0487..0530; CYRILLIC 0x0531,
// 0531..0588; ARMENIAN 0x0589,
// 0589..0589; COMMON 0x058A,
// 058A..0590; ARMENIAN 0x0591,
// 0591..05FF; HEBREW 0x0600,
// 0600..060B; ARABIC 0x060C,
// 060C..060C; COMMON 0x060D,
// 060D..061A; ARABIC 0x061B,
// 061B..061D; COMMON 0x061E,
// 061E..061E; ARABIC 0x061F,
// 061F..061F; COMMON 0x0620,
// 0620..063F; ARABIC 0x0640,
// 0640..0640; COMMON 0x0641,
// 0641..064A; ARABIC 0x064B,
// 064B..0655; INHERITED 0x0656,
// 0656..065E; ARABIC 0x065F,
// 065F..065F; INHERITED 0x0660,
// 0660..0669; COMMON 0x066A,
// 066A..066F; ARABIC 0x0670,
// 0670..0670; INHERITED 0x0671,
// 0671..06DC; ARABIC 0x06DD,
// 06DD..06DD; COMMON 0x06DE,
// 06DE..06FF; ARABIC 0x0700,
// 0700..074F; SYRIAC 0x0750,
// 0750..077F; ARABIC 0x0780,
// 0780..07BF; THAANA 0x07C0,
// 07C0..07FF; NKO 0x0800,
// 0800..083F; SAMARITAN 0x0840,
// 0840..08FF; MANDAIC 0x0900,
// 0900..0950; DEVANAGARI 0x0951,
// 0951..0952; INHERITED 0x0953,
// 0953..0963; DEVANAGARI 0x0964,
// 0964..0965; COMMON 0x0966,
// 0966..096F; DEVANAGARI 0x0970,
// 0970..0970; COMMON 0x0971,
// 0971..0980; DEVANAGARI 0x0981,
// 0981..0A00; BENGALI 0x0A01,
// 0A01..0A80; GURMUKHI 0x0A81,
// 0A81..0B00; GUJARATI 0x0B01,
// 0B01..0B81; ORIYA 0x0B82,
// 0B82..0C00; TAMIL 0x0C01,
// 0C01..0C81; TELUGU 0x0C82,
// 0C82..0CF0; KANNADA 0x0D02,
// 0D02..0D81; MALAYALAM 0x0D82,
// 0D82..0E00; SINHALA 0x0E01,
// 0E01..0E3E; THAI 0x0E3F,
// 0E3F..0E3F; COMMON 0x0E40,
// 0E40..0E80; THAI 0x0E81,
// 0E81..0EFF; LAO 0x0F00,
// 0F00..0FD4; TIBETAN 0x0FD5,
// 0FD5..0FD8; COMMON 0x0FD9,
// 0FD9..0FFF; TIBETAN 0x1000,
// 1000..109F; MYANMAR 0x10A0,
// 10A0..10FA; GEORGIAN 0x10FB,
// 10FB..10FB; COMMON 0x10FC,
// 10FC..10FF; GEORGIAN 0x1100,
// 1100..11FF; HANGUL 0x1200,
// 1200..139F; ETHIOPIC 0x13A0,
// 13A0..13FF; CHEROKEE 0x1400,
// 1400..167F; CANADIAN_ABORIGINAL 0x1680,
// 1680..169F; OGHAM 0x16A0,
// 16A0..16EA; RUNIC 0x16EB,
// 16EB..16ED; COMMON 0x16EE,
// 16EE..16FF; RUNIC 0x1700,
// 1700..171F; TAGALOG 0x1720,
// 1720..1734; HANUNOO 0x1735,
// 1735..173F; COMMON 0x1740,
// 1740..175F; BUHID 0x1760,
// 1760..177F; TAGBANWA 0x1780,
// 1780..17FF; KHMER 0x1800,
// 1800..1801; MONGOLIAN 0x1802,
// 1802..1803; COMMON 0x1804,
// 1804..1804; MONGOLIAN 0x1805,
// 1805..1805; COMMON 0x1806,
// 1806..18AF; MONGOLIAN 0x18B0,
// 18B0..18FF; CANADIAN_ABORIGINAL 0x1900,
// 1900..194F; LIMBU 0x1950,
// 1950..197F; TAI_LE 0x1980,
// 1980..19DF; NEW_TAI_LUE 0x19E0,
// 19E0..19FF; KHMER 0x1A00,
// 1A00..1A1F; BUGINESE 0x1A20,
// 1A20..1AFF; TAI_THAM 0x1B00,
// 1B00..1B7F; BALINESE 0x1B80,
// 1B80..1BBF; SUNDANESE 0x1BC0,
// 1BC0..1BFF; BATAK 0x1C00,
// 1C00..1C4F; LEPCHA 0x1C50,
// 1C50..1CCF; OL_CHIKI 0x1CD0,
// 1CD0..1CD2; INHERITED 0x1CD3,
// 1CD3..1CD3; COMMON 0x1CD4,
// 1CD4..1CE0; INHERITED 0x1CE1,
// 1CE1..1CE1; COMMON 0x1CE2,
// 1CE2..1CE8; INHERITED 0x1CE9,
// 1CE9..1CEC; COMMON 0x1CED,
// 1CED..1CED; INHERITED 0x1CEE,
// 1CEE..1CFF; COMMON 0x1D00,
// 1D00..1D25; LATIN 0x1D26,
// 1D26..1D2A; GREEK 0x1D2B,
// 1D2B..1D2B; CYRILLIC 0x1D2C,
// 1D2C..1D5C; LATIN 0x1D5D,
// 1D5D..1D61; GREEK 0x1D62,
// 1D62..1D65; LATIN 0x1D66,
// 1D66..1D6A; GREEK 0x1D6B,
// 1D6B..1D77; LATIN 0x1D78,
// 1D78..1D78; CYRILLIC 0x1D79,
// 1D79..1DBE; LATIN 0x1DBF,
// 1DBF..1DBF; GREEK 0x1DC0,
// 1DC0..1DFF; INHERITED 0x1E00,
// 1E00..1EFF; LATIN 0x1F00,
// 1F00..1FFF; GREEK 0x2000,
// 2000..200B; COMMON 0x200C,
// 200C..200D; INHERITED 0x200E,
// 200E..2070; COMMON 0x2071,
// 2071..2073; LATIN 0x2074,
// 2074..207E; COMMON 0x207F,
// 207F..207F; LATIN 0x2080,
// 2080..208F; COMMON 0x2090,
// 2090..209F; LATIN 0x20A0,
// 20A0..20CF; COMMON 0x20D0,
// 20D0..20FF; INHERITED 0x2100,
// 2100..2125; COMMON 0x2126,
// 2126..2126; GREEK 0x2127,
// 2127..2129; COMMON 0x212A,
// 212A..212B; LATIN 0x212C,
// 212C..2131; COMMON 0x2132,
// 2132..2132; LATIN 0x2133,
// 2133..214D; COMMON 0x214E,
// 214E..214E; LATIN 0x214F,
// 214F..215F; COMMON 0x2160,
// 2160..2188; LATIN 0x2189,
// 2189..27FF; COMMON 0x2800,
// 2800..28FF; BRAILLE 0x2900,
// 2900..2BFF; COMMON 0x2C00,
// 2C00..2C5F; GLAGOLITIC 0x2C60,
// 2C60..2C7F; LATIN 0x2C80,
// 2C80..2CFF; COPTIC 0x2D00,
// 2D00..2D2F; GEORGIAN 0x2D30,
// 2D30..2D7F; TIFINAGH 0x2D80,
// 2D80..2DDF; ETHIOPIC 0x2DE0,
// 2DE0..2DFF; CYRILLIC 0x2E00,
// 2E00..2E7F; COMMON 0x2E80,
// 2E80..2FEF; HAN 0x2FF0,
// 2FF0..3004; COMMON 0x3005,
// 3005..3005; HAN 0x3006,
// 3006..3006; COMMON 0x3007,
// 3007..3007; HAN 0x3008,
// 3008..3020; COMMON 0x3021,
// 3021..3029; HAN 0x302A,
// 302A..302D; INHERITED 0x302E,
// 302E..302F; HANGUL 0x3030,
// 3030..3037; COMMON 0x3038,
// 3038..303B; HAN 0x303C,
// 303C..3040; COMMON 0x3041,
// 3041..3098; HIRAGANA 0x3099,
// 3099..309A; INHERITED 0x309B,
// 309B..309C; COMMON 0x309D,
// 309D..309F; HIRAGANA 0x30A0,
// 30A0..30A0; COMMON 0x30A1,
// 30A1..30FA; KATAKANA 0x30FB,
// 30FB..30FC; COMMON 0x30FD,
// 30FD..3104; KATAKANA 0x3105,
// 3105..3130; BOPOMOFO 0x3131,
// 3131..318F; HANGUL 0x3190,
// 3190..319F; COMMON 0x31A0,
// 31A0..31BF; BOPOMOFO 0x31C0,
// 31C0..31EF; COMMON 0x31F0,
// 31F0..31FF; KATAKANA 0x3200,
// 3200..321F; HANGUL 0x3220,
// 3220..325F; COMMON 0x3260,
// 3260..327E; HANGUL 0x327F,
// 327F..32CF; COMMON 0x32D0,
// 32D0..3357; KATAKANA 0x3358,
// 3358..33FF; COMMON 0x3400,
// 3400..4DBF; HAN 0x4DC0,
// 4DC0..4DFF; COMMON 0x4E00,
// 4E00..9FFF; HAN 0xA000,
// A000..A4CF; YI 0xA4D0,
// A4D0..A4FF; LISU 0xA500,
// A500..A63F; VAI 0xA640,
// A640..A69F; CYRILLIC 0xA6A0,
// A6A0..A6FF; BAMUM 0xA700,
// A700..A721; COMMON 0xA722,
// A722..A787; LATIN 0xA788,
// A788..A78A; COMMON 0xA78B,
// A78B..A7FF; LATIN 0xA800,
// A800..A82F; SYLOTI_NAGRI 0xA830,
// A830..A83F; COMMON 0xA840,
// A840..A87F; PHAGS_PA 0xA880,
// A880..A8DF; SAURASHTRA 0xA8E0,
// A8E0..A8FF; DEVANAGARI 0xA900,
// A900..A92F; KAYAH_LI 0xA930,
// A930..A95F; REJANG 0xA960,
// A960..A97F; HANGUL 0xA980,
// A980..A9FF; JAVANESE 0xAA00,
// AA00..AA5F; CHAM 0xAA60,
// AA60..AA7F; MYANMAR 0xAA80,
// AA80..AB00; TAI_VIET 0xAB01,
// AB01..ABBF; ETHIOPIC 0xABC0,
// ABC0..ABFF; MEETEI_MAYEK 0xAC00,
// AC00..D7FB; HANGUL 0xD7FC,
// D7FC..F8FF; UNKNOWN 0xF900,
// F900..FAFF; HAN 0xFB00,
// FB00..FB12; LATIN 0xFB13,
// FB13..FB1C; ARMENIAN 0xFB1D,
// FB1D..FB4F; HEBREW 0xFB50,
// FB50..FD3D; ARABIC 0xFD3E,
// FD3E..FD4F; COMMON 0xFD50,
// FD50..FDFC; ARABIC 0xFDFD,
// FDFD..FDFF; COMMON 0xFE00,
// FE00..FE0F; INHERITED 0xFE10,
// FE10..FE1F; COMMON 0xFE20,
// FE20..FE2F; INHERITED 0xFE30,
// FE30..FE6F; COMMON 0xFE70,
// FE70..FEFE; ARABIC 0xFEFF,
// FEFF..FF20; COMMON 0xFF21,
// FF21..FF3A; LATIN 0xFF3B,
// FF3B..FF40; COMMON 0xFF41,
// FF41..FF5A; LATIN 0xFF5B,
// FF5B..FF65; COMMON 0xFF66,
// FF66..FF6F; KATAKANA 0xFF70,
// FF70..FF70; COMMON 0xFF71,
// FF71..FF9D; KATAKANA 0xFF9E,
// FF9E..FF9F; COMMON 0xFFA0,
// FFA0..FFDF; HANGUL 0xFFE0,
// FFE0..FFFF; COMMON 0x10000,
// 10000..100FF; LINEAR_B 0x10100,
// 10100..1013F; COMMON 0x10140,
// 10140..1018F; GREEK 0x10190,
// 10190..101FC; COMMON 0x101FD,
// 101FD..1027F; INHERITED 0x10280,
// 10280..1029F; LYCIAN 0x102A0,
// 102A0..102FF; CARIAN 0x10300,
// 10300..1032F; OLD_ITALIC 0x10330,
// 10330..1037F; GOTHIC 0x10380,
// 10380..1039F; UGARITIC 0x103A0,
// 103A0..103FF; OLD_PERSIAN 0x10400,
// 10400..1044F; DESERET 0x10450,
// 10450..1047F; SHAVIAN 0x10480,
// 10480..107FF; OSMANYA 0x10800,
// 10800..1083F; CYPRIOT 0x10840,
// 10840..108FF; IMPERIAL_ARAMAIC 0x10900,
// 10900..1091F; PHOENICIAN 0x10920,
// 10920..109FF; LYDIAN 0x10A00,
// 10A00..10A5F; KHAROSHTHI 0x10A60,
// 10A60..10AFF; OLD_SOUTH_ARABIAN 0x10B00,
// 10B00..10B3F; AVESTAN 0x10B40,
// 10B40..10B5F; INSCRIPTIONAL_PARTHIAN 0x10B60,
// 10B60..10BFF; INSCRIPTIONAL_PAHLAVI 0x10C00,
// 10C00..10E5F; OLD_TURKIC 0x10E60,
// 10E60..10FFF; ARABIC 0x11000,
// 11000..1107F; BRAHMI 0x11080,
// 11080..11FFF; KAITHI 0x12000,
// 12000..12FFF; CUNEIFORM 0x13000,
// 13000..167FF; EGYPTIAN_HIEROGLYPHS 0x16800,
// 16800..16A38; BAMUM 0x1B000,
// 1B000..1B000; KATAKANA 0x1B001,
// 1B001..1CFFF; HIRAGANA 0x1D000,
// 1D000..1D166; COMMON 0x1D167,
// 1D167..1D169; INHERITED 0x1D16A,
// 1D16A..1D17A; COMMON 0x1D17B,
// 1D17B..1D182; INHERITED 0x1D183,
// 1D183..1D184; COMMON 0x1D185,
// 1D185..1D18B; INHERITED 0x1D18C,
// 1D18C..1D1A9; COMMON 0x1D1AA,
// 1D1AA..1D1AD; INHERITED 0x1D1AE,
// 1D1AE..1D1FF; COMMON 0x1D200,
// 1D200..1D2FF; GREEK 0x1D300,
// 1D300..1F1FF; COMMON 0x1F200,
// 1F200..1F200; HIRAGANA 0x1F201,
// 1F201..1FFFF; COMMON 0x20000,
// 20000..E0000; HAN 0xE0001,
// E0001..E00FF; COMMON 0xE0100,
// E0100..E01EF; INHERITED 0xE01F0
// E01F0..10FFFF; UNKNOWN

// it appears we don't have the KATAKANA_OR_HIRAGANA
//aliases.put("HRKT", KATAKANA_OR_HIRAGANA);

 * Returns the enum constant representing the Unicode script to which
 * the given character (Unicode code point) is assigned.
 *
 * @param codePoint the character (Unicode code point) in question.
 * @return The {@code UnicodeScript} constant representing the
 *         Unicode script to which this character is assigned.
 * @exception IllegalArgumentException if the specified
 *            {@code codePoint} is an invalid Unicode code point.
 * @see Character#isValidCodePoint(int)

// leave SURROGATE and PRIVATE_USE for table lookup

 * Returns the UnicodeScript constant with the given Unicode script
 * name or the script name alias. Script names and their aliases are
 * determined by The Unicode Standard. The files Scripts&lt;version&gt;.txt
 * and PropertyValueAliases&lt;version&gt;.txt define script names
 * and the script name aliases for a particular version of the
 * standard. The {@link Character} class specifies the version of
 * the standard that it supports.
 *
 * Character case is ignored for all of the valid script names.
 * The en_US locale's case mapping rules are used to provide
 * case-insensitive string comparisons for script name validation.
 *
 * @param scriptName A {@code UnicodeScript} name.
 * @return The {@code UnicodeScript} constant identified
 *         by {@code scriptName}
 * @throws IllegalArgumentException if {@code scriptName} is an
 *         invalid name
 * @throws NullPointerException if {@code scriptName} is null

 * The value of the {@code Character}.
private final char value;
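As a rough illustration of the two `UnicodeScript` lookups documented above, the sketch below calls `UnicodeScript.of(int)` and `UnicodeScript.forName(String)`. The class name `ScriptLookupDemo` and its helper methods are hypothetical and exist only for this example. The results noted in the comments are read off the range table above (for instance 0041..005A is LATIN) and the documented case-insensitive name matching.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical demo class; not part of Character.java.
class ScriptLookupDemo {

    // Thin wrapper around the documented lookup by code point.
    static UnicodeScript scriptOf(int codePoint) {
        return UnicodeScript.of(codePoint);
    }

    // Returns true when forName rejects the name with IllegalArgumentException.
    static boolean isRejectedName(String name) {
        try {
            UnicodeScript.forName(name);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(scriptOf('A'));     // 0041..005A is LATIN in the table
        System.out.println(scriptOf('1'));     // 0000..0040 is COMMON
        System.out.println(scriptOf(0x0301));  // 0300..036F is INHERITED
        // Names are matched without regard to case; "Latn" is the script alias.
        System.out.println(UnicodeScript.forName("latin"));
        System.out.println(UnicodeScript.forName("Latn"));
        System.out.println(isRejectedName("NoSuchScript"));
    }
}
```

Passing a value above `Character.MAX_CODE_POINT` to `of` would raise `IllegalArgumentException`, as the `@exception` clause states.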
/** use serialVersionUID from JDK 1.0.2 for interoperability */

 * Constructs a newly allocated {@code Character} object that
 * represents the specified {@code char} value.
 *
 * @param value the value to be represented by the
 *        {@code Character} object.

 * Returns a <tt>Character</tt> instance representing the specified
 * <tt>char</tt> value.
 * If a new <tt>Character</tt> instance is not required, this method
 * should generally be used in preference to the constructor
 * {@link #Character(char)}, as this method is likely to yield
 * significantly better space and time performance by caching
 * frequently requested values.
 *
 * This method will always cache values in the range {@code
 * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may
 * cache other values outside of this range.
 *
 * @return a <tt>Character</tt> instance representing <tt>c</tt>.

if (c <= 127) { // must cache

 * Returns the value of this {@code Character} object.
 * @return the primitive {@code char} value represented by
 *         this object.

 * Returns a hash code for this {@code Character}; equal to the result
 * of invoking {@code charValue()}.
 *
 * @return a hash code value for this {@code Character}

 * Compares this object against the specified object.
 * The result is {@code true} if and only if the argument is not
 * {@code null} and is a {@code Character} object that
 * represents the same {@code char} value as this object.
 *
 * @param obj the object to compare with.
 * @return {@code true} if the objects are the same;
 *         {@code false} otherwise.

 * Returns a {@code String} object representing this
 * {@code Character}'s value. The result is a string of
 * length 1 whose sole component is the primitive
 * {@code char} value represented by this
 * {@code Character} object.
 *
 * @return a string representation of this object.

 * Returns a {@code String} object representing the
 * specified {@code char}. The result is a string of length
 * 1 consisting solely of the specified {@code char}.
 *
 * @param c the {@code char} to be converted
 * @return the string representation of the specified {@code char}

 * Determines whether the specified code point is a valid
 * Unicode code point value.
 *
 * @param codePoint the Unicode code point to be tested
 * @return {@code true} if the specified code point value is between
 *         {@link #MIN_CODE_POINT} and
 *         {@link #MAX_CODE_POINT} inclusive;
 *         {@code false} otherwise.

// codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT

 * Determines whether the specified character (Unicode code point)
 * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
 * Such code points can be represented using a single {@code char}.
 *
 * @param codePoint the character (Unicode code point) to be tested
 * @return {@code true} if the specified code point is between
 *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
 *         {@code false} otherwise.
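The caching guarantee of `valueOf` and the two range predicates documented above can be exercised with a short sketch. The class `CodePointRangeDemo` and its helpers are hypothetical. In particular, `isValidByShift` is only one possible shift-based way to express the commented range check `codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT`; it is an assumption for illustration, not a statement of how the method body here is written.

```java
// Hypothetical demo class; not part of Character.java.
class CodePointRangeDemo {

    // True when two valueOf calls hand back the very same object.
    // The documentation guarantees this only for '\u0000'..'\u007F'.
    static boolean sameCachedInstance(char c) {
        return Character.valueOf(c) == Character.valueOf(c);
    }

    // Assumed shift-based form of the range check: valid code points
    // occupy planes 0..16, so the plane number must be below 17.
    // A negative input shifts to a large positive plane and is rejected.
    static boolean isValidByShift(int codePoint) {
        return (codePoint >>> 16) < ((Character.MAX_CODE_POINT + 1) >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(sameCachedInstance('A'));            // within the always-cached range
        System.out.println(Character.isValidCodePoint(0x10FFFF)); // MAX_CODE_POINT
        System.out.println(Character.isValidCodePoint(0x110000)); // one past the maximum
        System.out.println(Character.isBmpCodePoint(0xFFFF));     // last BMP code point
        System.out.println(Character.isBmpCodePoint(0x10000));    // first supplementary code point
        System.out.println(isValidByShift(-1));
    }
}
```

Comparing boxed `Character` objects with `==` is done here only to observe the cache; ordinary code should compare with `equals` or unbox first.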
// codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
// We consistently use logical shift (>>>) to facilitate
// additional runtime optimizations.

 * Determines whether the specified character (Unicode code point)
 * is in the <a href="#supplementary">supplementary character</a> range.
 *
 * @param codePoint the character (Unicode code point) to be tested
 * @return {@code true} if the specified code point is between
 *         {@link #MIN_SUPPLEMENTARY_CODE_POINT} and
 *         {@link #MAX_CODE_POINT} inclusive;
 *         {@code false} otherwise.

 * Determines if the given {@code char} value is a
 * Unicode high-surrogate code unit
 * (also known as <i>leading-surrogate code unit</i>).
 *
 * <p>Such values do not represent characters by themselves,
 * but are used in the representation of
 * <a href="#supplementary">supplementary characters</a>
 * in the UTF-16 encoding.
 *
 * @param ch the {@code char} value to be tested.
 * @return {@code true} if the {@code char} value is between
 *         {@link #MIN_HIGH_SURROGATE} and
 *         {@link #MAX_HIGH_SURROGATE} inclusive;
 *         {@code false} otherwise.
 * @see Character#isLowSurrogate(char)
 * @see Character.UnicodeBlock#of(int)

// Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE

 * Determines if the given {@code char} value is a
 * Unicode low-surrogate code unit
 * (also known as <i>trailing-surrogate code unit</i>).
 *
 * <p>Such values do not represent characters by themselves,
 * but are used in the representation of
 * <a href="#supplementary">supplementary characters</a>
 * in the UTF-16 encoding.
 *
 * @param ch the {@code char} value to be tested.
 * @return {@code true} if the {@code char} value is between
 *         {@link #MIN_LOW_SURROGATE} and
 *         {@link #MAX_LOW_SURROGATE} inclusive;
 *         {@code false} otherwise.
 * @see Character#isHighSurrogate(char)

 * Determines if the given {@code char} value is a Unicode
 * <i>surrogate code unit</i>.
 *
 * <p>Such values do not represent characters by themselves,
 * but are used in the representation of
 * <a href="#supplementary">supplementary characters</a>
 * in the UTF-16 encoding.
 *
 * <p>A char value is a surrogate code unit if and only if it is either
 * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or
 * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.
 *
 * @param ch the {@code char} value to be tested.
 * @return {@code true} if the {@code char} value is between
 *         {@link #MIN_SURROGATE} and
 *         {@link #MAX_SURROGATE} inclusive;
 *         {@code false} otherwise.

 * Determines whether the specified pair of {@code char}
 * values is a valid Unicode surrogate pair.
 *
 * <p>This method is equivalent to the expression:
 * {@code isHighSurrogate(high) && isLowSurrogate(low)}
 *
 * @param high the high-surrogate code value to be tested
 * @param low the low-surrogate code value to be tested
 * @return {@code true} if the specified high and
 *         low-surrogate code values represent a valid surrogate pair;
 *         {@code false} otherwise.

 * Determines the number of {@code char} values needed to
 * represent the specified character (Unicode code point). If the
 * specified character is equal to or greater than 0x10000, then
 * the method returns 2. Otherwise, the method returns 1.
 *
 * <p>This method doesn't validate the specified character to be a
 * valid Unicode code point. The caller must validate the
 * character value using {@link #isValidCodePoint(int) isValidCodePoint}
 * if necessary.
 *
 * @param codePoint the character (Unicode code point) to be tested.
 * @return 2 if the character is a valid supplementary character; 1 otherwise.
 * @see Character#isSupplementaryCodePoint(int)

 * Converts the specified surrogate pair to its supplementary code
 * point value. This method does not validate the specified
 * surrogate pair. The caller must validate it using {@link
 * #isSurrogatePair(char, char) isSurrogatePair} if necessary.
 *
 * @param high the high-surrogate code unit
 * @param low the low-surrogate code unit
 * @return the supplementary code point composed from the
 *         specified surrogate pair.

// return ((high - MIN_HIGH_SURROGATE) << 10)
//         + (low - MIN_LOW_SURROGATE)
//         + MIN_SUPPLEMENTARY_CODE_POINT;

 * Returns the code point at the given index of the
 * {@code CharSequence}. If the {@code char} value at
 * the given index in the {@code CharSequence} is in the
 * high-surrogate range, the following index is less than the
 * length of the {@code CharSequence}, and the
 * {@code char} value at the following index is in the
 * low-surrogate range, then the supplementary code point
 * corresponding to this surrogate pair is returned. Otherwise,
 * the {@code char} value at the given index is returned.
 *
 * @param seq a sequence of {@code char} values (Unicode code
 *        units)
 * @param index the index to the {@code char} values (Unicode
 *        code units) in {@code seq} to be converted
 * @return the Unicode code point at the given index
 * @exception NullPointerException if {@code seq} is null.
 * @exception IndexOutOfBoundsException if the value
 *            {@code index} is negative or not less than
 *            {@link CharSequence#length() seq.length()}.

 * Returns the code point at the given index of the
 * {@code char} array. If the {@code char} value at
 * the given index in the {@code char} array is in the
 * high-surrogate range, the following index is less than the
 * length of the {@code char} array, and the
 * {@code char} value at the following index is in the
 * low-surrogate range, then the supplementary code point
 * corresponding to this surrogate pair is returned. Otherwise,
 * the {@code char} value at the given index is returned.
 *
 * @param a the {@code char} array
 * @param index the index to the {@code char} values (Unicode
 *        code units) in the {@code char} array to be converted
 * @return the Unicode code point at the given index
 * @exception NullPointerException if {@code a} is null.
 * @exception IndexOutOfBoundsException if the value
 *            {@code index} is negative or not less than
 *            the length of the {@code char} array.

 * Returns the code point at the given index of the
 * {@code char} array, where only array elements with
 * {@code index} less than {@code limit} can be used. If
 * the {@code char} value at the given index in the
 * {@code char} array is in the high-surrogate range, the
 * following index is less than the {@code limit}, and the
 * {@code char} value at the following index is in the
 * low-surrogate range, then the supplementary code point
 * corresponding to this surrogate pair is returned. Otherwise,
 * the {@code char} value at the given index is returned.
 *
 * @param a the {@code char} array
 * @param index the index to the {@code char} values (Unicode
 *        code units) in the {@code char} array to be converted
 * @param limit the index after the last array element that
 *        can be used in the {@code char} array
 * @return the Unicode code point at the given index
 * @exception NullPointerException if {@code a} is null.
 * @exception IndexOutOfBoundsException if the {@code index}
 *            argument is negative or not less than the {@code limit}
 *            argument, or if the {@code limit} argument is negative or
 *            greater than the length of the {@code char} array.

// throws ArrayIndexOutOfBoundsException if index out of bounds

 * Returns the code point preceding the given index of the
 * {@code CharSequence}. If the {@code char} value at
 * {@code (index - 1)} in the {@code CharSequence} is in
 * the low-surrogate range, {@code (index - 2)} is not
 * negative, and the {@code char} value at {@code (index - 2)}
 * in the {@code CharSequence} is in the
 * high-surrogate range, then the supplementary code point
 * corresponding to this surrogate pair is returned.
Otherwise, * the {@code char} value at {@code (index - 1)} is * @param seq the {@code CharSequence} instance * @param index the index following the code point that should be returned * @return the Unicode code point value before the given index. * @exception NullPointerException if {@code seq} is null. * @exception IndexOutOfBoundsException if the {@code index} * argument is less than 1 or greater than {@link * CharSequence#length() seq.length()}. * Returns the code point preceding the given index of the * {@code char} array. If the {@code char} value at * {@code (index - 1)} in the {@code char} array is in * the low-surrogate range, {@code (index - 2)} is not * negative, and the {@code char} value at {@code (index - 2)} * in the {@code char} array is in the * high-surrogate range, then the supplementary code point * corresponding to this surrogate pair is returned. Otherwise, * the {@code char} value at {@code (index - 1)} is * @param a the {@code char} array * @param index the index following the code point that should be returned * @return the Unicode code point value before the given index. * @exception NullPointerException if {@code a} is null. * @exception IndexOutOfBoundsException if the {@code index} * argument is less than 1 or greater than the length of the * Returns the code point preceding the given index of the * {@code char} array, where only array elements with * {@code index} greater than or equal to {@code start} * can be used. If the {@code char} value at {@code (index - 1)} * in the {@code char} array is in the * low-surrogate range, {@code (index - 2)} is not less than * {@code start}, and the {@code char} value at * {@code (index - 2)} in the {@code char} array is in * the high-surrogate range, then the supplementary code point * corresponding to this surrogate pair is returned. 
Otherwise, * the {@code char} value at {@code (index - 1)} is * @param a the {@code char} array * @param index the index following the code point that should be returned * @param start the index of the first array element in the * @return the Unicode code point value before the given index. * @exception NullPointerException if {@code a} is null. * @exception IndexOutOfBoundsException if the {@code index} * argument is not greater than the {@code start} argument or * is greater than the length of the {@code char} array, or * if the {@code start} argument is negative or not less than * the length of the {@code char} array. // throws ArrayIndexOutofBoundsException if index-1 out of bounds * Returns the leading surrogate (a * high surrogate code unit</a>) of the * representing the specified supplementary character (Unicode * code point) in the UTF-16 encoding. If the specified character * an unspecified {@code char} is returned. * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)} * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x} * are also always {@code true}. * @param codePoint a supplementary character (Unicode code point) * @return the leading surrogate code unit used to represent the * character in the UTF-16 encoding * Returns the trailing surrogate (a * low surrogate code unit</a>) of the * representing the specified supplementary character (Unicode * code point) in the UTF-16 encoding. If the specified character * an unspecified {@code char} is returned. * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)} * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x} * are also always {@code true}. 
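The identities stated above for supplementary code points can be checked with a small sketch. The class name {@code SurrogateRoundTrip} and the helper {@code roundTrip} are hypothetical names introduced only for this illustration; the {@code Character} methods are the ones documented here.

```java
class SurrogateRoundTrip {
    // Hypothetical helper: split a supplementary code point into its
    // UTF-16 surrogate pair, then recombine the pair.
    static int roundTrip(int codePoint) {
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        int x = 0x1F600; // a supplementary code point chosen for illustration
        char high = Character.highSurrogate(x);
        char low = Character.lowSurrogate(x);

        System.out.println(Character.isSupplementaryCodePoint(x));
        System.out.println(Character.isHighSurrogate(high));
        System.out.println(Character.isLowSurrogate(low));
        System.out.println(roundTrip(x) == x);

        // toChars yields the same pair as the two surrogate methods.
        char[] pair = Character.toChars(x);
        System.out.println(pair.length == 2 && pair[0] == high && pair[1] == low);
    }
}
```

For a code point that is not supplementary, the two surrogate methods return an unspecified {@code char}, so a caller would normally test {@code isSupplementaryCodePoint} first.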
* @param codePoint a supplementary character (Unicode code point) * @return the trailing surrogate code unit used to represent the * character in the UTF-16 encoding * Converts the specified character (Unicode code point) to its * UTF-16 representation. If the specified code point is a BMP * (Basic Multilingual Plane or Plane 0) value, the same value is * stored in {@code dst[dstIndex]}, and 1 is returned. If the * specified code point is a supplementary character, its * surrogate values are stored in {@code dst[dstIndex]} * (high-surrogate) and {@code dst[dstIndex+1]} * (low-surrogate), and 2 is returned. * @param codePoint the character (Unicode code point) to be converted. * @param dst an array of {@code char} in which the * {@code codePoint}'s UTF-16 value is stored. * @param dstIndex the start index into the {@code dst} * array where the converted value is stored. * @return 1 if the code point is a BMP code point, 2 if the * code point is a supplementary code point. * @exception IllegalArgumentException if the specified * {@code codePoint} is not a valid Unicode code point. * @exception NullPointerException if the specified {@code dst} is null. * @exception IndexOutOfBoundsException if {@code dstIndex} * is negative or not less than {@code dst.length}, or if * {@code dst} at {@code dstIndex} doesn't have enough * array element(s) to store the resulting {@code char} * value(s). (If {@code dstIndex} is equal to * {@code dst.length-1} and the specified * {@code codePoint} is a supplementary character, the * high-surrogate value is not stored in * {@code dst[dstIndex]}.) * Converts the specified character (Unicode code point) to its * UTF-16 representation stored in a {@code char} array. If * the specified code point is a BMP (Basic Multilingual Plane or * Plane 0) value, the resulting {@code char} array has * the same value as {@code codePoint}. 
If the specified code * point is a supplementary code point, the resulting * {@code char} array has the corresponding surrogate pair. * @param codePoint a Unicode code point * @return a {@code char} array having * {@code codePoint}'s UTF-16 representation. * @exception IllegalArgumentException if the specified * {@code codePoint} is not a valid Unicode code point. // We write elements "backwards" to guarantee all-or-nothing * Returns the number of Unicode code points in the text range of * the specified char sequence. The text range begins at the * specified {@code beginIndex} and extends to the * {@code char} at index {@code endIndex - 1}. Thus the * length (in {@code char}s) of the text range is * {@code endIndex-beginIndex}. Unpaired surrogates within * the text range count as one code point each. * @param seq the char sequence * @param beginIndex the index to the first {@code char} of * @param endIndex the index after the last {@code char} of * @return the number of Unicode code points in the specified text * @exception NullPointerException if {@code seq} is null. * @exception IndexOutOfBoundsException if the * {@code beginIndex} is negative, or {@code endIndex} * is larger than the length of the given sequence, or * {@code beginIndex} is larger than {@code endIndex}. * Returns the number of Unicode code points in a subarray of the * {@code char} array argument. The {@code offset} * argument is the index of the first {@code char} of the * subarray and the {@code count} argument specifies the * length of the subarray in {@code char}s. Unpaired * surrogates within the subarray count as one code point each. * @param a the {@code char} array * @param offset the index of the first {@code char} in the * given {@code char} array * @param count the length of the subarray in {@code char}s * @return the number of Unicode code points in the specified subarray * @exception NullPointerException if {@code a} is null. 
* @exception IndexOutOfBoundsException if {@code offset} or * {@code count} is negative, or if {@code offset + * count} is larger than the length of the given array. * Returns the index within the given char sequence that is offset * from the given {@code index} by {@code codePointOffset} * code points. Unpaired surrogates within the text range given by * {@code index} and {@code codePointOffset} count as * one code point each. * @param seq the char sequence * @param index the index to be offset * @param codePointOffset the offset in code points * @return the index within the char sequence * @exception NullPointerException if {@code seq} is null. * @exception IndexOutOfBoundsException if {@code index} * is negative or larger than the length of the char sequence, * or if {@code codePointOffset} is positive and the * subsequence starting with {@code index} has fewer than * {@code codePointOffset} code points, or if * {@code codePointOffset} is negative and the subsequence * before {@code index} has fewer than the absolute value * of {@code codePointOffset} code points. * Returns the index within the given {@code char} subarray * that is offset from the given {@code index} by * {@code codePointOffset} code points. The * {@code start} and {@code count} arguments specify a * subarray of the {@code char} array. Unpaired surrogates * within the text range given by {@code index} and * {@code codePointOffset} count as one code point each. * @param a the {@code char} array * @param start the index of the first {@code char} of the * subarray * @param count the length of the subarray in {@code char}s * @param index the index to be offset * @param codePointOffset the offset in code points * @return the index within the subarray * @exception NullPointerException if {@code a} is null.
* @exception IndexOutOfBoundsException * if {@code start} or {@code count} is negative, * or if {@code start + count} is larger than the length of * the given array, * or if {@code index} is less than {@code start} or * larger than {@code start + count}, * or if {@code codePointOffset} is positive and the text range * starting with {@code index} and ending with {@code start + count - 1} * has fewer than {@code codePointOffset} code * points, * or if {@code codePointOffset} is negative and the text range * starting with {@code start} and ending with {@code index - 1} * has fewer than the absolute value of * {@code codePointOffset} code points. * Determines if the specified character is a lowercase character. * A character is lowercase if its general category type, provided * by {@code Character.getType(ch)}, is * {@code LOWERCASE_LETTER}, or it has contributory property * Other_Lowercase as defined by the Unicode Standard. * The following are examples of lowercase characters: * a b c d e f g h i j k l m n o p q r s t u v w x y z * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6' * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE' * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6' * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF' * <p> Many other Unicode characters are lowercase too. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isLowerCase(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is lowercase; * {@code false} otherwise.
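The note about supplementary characters can be illustrated with a short sketch that counts lowercase characters two ways. The class {@code LowerCaseCheck} and both helper methods are hypothetical names used only for this example; U+1D41A (MATHEMATICAL BOLD SMALL A) is assumed here as a sample lowercase letter outside the BMP.

```java
class LowerCaseCheck {
    // Hypothetical helper: walks the text by code point, so a surrogate
    // pair is tested as the single supplementary character it encodes.
    static int countLowerCase(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            if (Character.isLowerCase(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    // Hypothetical helper: tests each char on its own, so the two halves
    // of a surrogate pair are examined separately.
    static int countLowerCaseByChar(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLowerCase(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x1D41A)) + "B";
        System.out.println(countLowerCase(text));       // by code point
        System.out.println(countLowerCaseByChar(text)); // by char
    }
}
```

The per-{@code char} count misses the supplementary letter because neither surrogate code unit is itself a lowercase letter.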
* @see Character#isLowerCase(char) * @see Character#isTitleCase(char) * @see Character#toLowerCase(char) * @see Character#getType(char) * Determines if the specified character (Unicode code point) is a * A character is lowercase if its general category type, provided * by {@link Character#getType getType(codePoint)}, is * {@code LOWERCASE_LETTER}, or it has contributory property * Other_Lowercase as defined by the Unicode Standard. * The following are examples of lowercase characters: * a b c d e f g h i j k l m n o p q r s t u v w x y z * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6' * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE' * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6' * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF' * <p> Many other Unicode characters are lowercase too. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is lowercase; * {@code false} otherwise. * @see Character#isLowerCase(int) * @see Character#isTitleCase(int) * @see Character#toLowerCase(int) * @see Character#getType(int) * Determines if the specified character is an uppercase character. * A character is uppercase if its general category type, provided by * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER}. * or it has contributory property Other_Uppercase as defined by the Unicode Standard. 
* The following are examples of uppercase characters: * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7' * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF' * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8' * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE' * <p> Many other Unicode characters are uppercase too. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isUpperCase(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is uppercase; * {@code false} otherwise. * @see Character#isLowerCase(char) * @see Character#isTitleCase(char) * @see Character#toUpperCase(char) * @see Character#getType(char) * Determines if the specified character (Unicode code point) is an uppercase character. * A character is uppercase if its general category type, provided by * {@link Character#getType(int) getType(codePoint)}, is {@code UPPERCASE_LETTER}, * or it has contributory property Other_Uppercase as defined by the Unicode Standard. * The following are examples of uppercase characters: * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7' * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF' * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8' * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE' * <p> Many other Unicode characters are uppercase too. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is uppercase; * {@code false} otherwise.
* @see Character#isLowerCase(int) * @see Character#isTitleCase(int) * @see Character#toUpperCase(int) * @see Character#getType(int) * Determines if the specified character is a titlecase character. * A character is a titlecase character if its general * category type, provided by {@code Character.getType(ch)}, * is {@code TITLECASE_LETTER}. * Some characters look like pairs of Latin letters. For example, there * is an uppercase letter that looks like "LJ" and has a corresponding * lowercase letter that looks like "lj". A third form, which looks like "Lj", * is the appropriate form to use when rendering a word in lowercase * with initial capitals, as for a book title. * These are some of the Unicode characters for which this method returns * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON} * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J} * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J} * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z} * <p> Many other Unicode characters are titlecase too.<p> * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isTitleCase(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is titlecase; * {@code false} otherwise. * @see Character#isLowerCase(char) * @see Character#isUpperCase(char) * @see Character#toTitleCase(char) * @see Character#getType(char) * Determines if the specified character (Unicode code point) is a titlecase character. * A character is a titlecase character if its general * category type, provided by {@link Character#getType(int) getType(codePoint)}, * is {@code TITLECASE_LETTER}. * Some characters look like pairs of Latin letters. For example, there * is an uppercase letter that looks like "LJ" and has a corresponding * lowercase letter that looks like "lj". 
A third form, which looks like "Lj", * is the appropriate form to use when rendering a word in lowercase * with initial capitals, as for a book title. * These are some of the Unicode characters for which this method returns * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON} * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J} * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J} * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z} * <p> Many other Unicode characters are titlecase too.<p> * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is titlecase; * {@code false} otherwise. * @see Character#isLowerCase(int) * @see Character#isUpperCase(int) * @see Character#toTitleCase(int) * @see Character#getType(int) * Determines if the specified character is a digit. * A character is a digit if its general category type, provided * by {@code Character.getType(ch)}, is * {@code DECIMAL_DIGIT_NUMBER}. * Some Unicode character ranges that contain digits: * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'}, * ISO-LATIN-1 digits ({@code '0'} through {@code '9'}) * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'}, * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'}, * Extended Arabic-Indic digits * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'}, * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'}, * Many other character ranges contain digits as well. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isDigit(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is a digit; * {@code false} otherwise. * @see Character#digit(char, int) * @see Character#forDigit(int, int) * @see Character#getType(char) * Determines if the specified character (Unicode code point) is a digit. 
* A character is a digit if its general category type, provided * by {@link Character#getType(int) getType(codePoint)}, is * {@code DECIMAL_DIGIT_NUMBER}. * Some Unicode character ranges that contain digits: * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'}, * ISO-LATIN-1 digits ({@code '0'} through {@code '9'}) * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'}, * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'}, * Extended Arabic-Indic digits * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'}, * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'}, * Many other character ranges contain digits as well. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is a digit; * {@code false} otherwise. * @see Character#forDigit(int, int) * @see Character#getType(int) * Determines if a character is defined in Unicode. * A character is defined if at least one of the following is true: * <li>It has an entry in the UnicodeData file. * <li>It has a value in a range defined by the UnicodeData file. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isDefined(int)} method. * @param ch the character to be tested * @return {@code true} if the character has a defined meaning * in Unicode; {@code false} otherwise. * @see Character#isDigit(char) * @see Character#isLetter(char) * @see Character#isLetterOrDigit(char) * @see Character#isLowerCase(char) * @see Character#isTitleCase(char) * @see Character#isUpperCase(char) * Determines if a character (Unicode code point) is defined in Unicode. * A character is defined if at least one of the following is true: * <li>It has an entry in the UnicodeData file. * <li>It has a value in a range defined by the UnicodeData file. * @param codePoint the character (Unicode code point) to be tested. 
* @return {@code true} if the character has a defined meaning * in Unicode; {@code false} otherwise. * @see Character#isDigit(int) * @see Character#isLetter(int) * @see Character#isLetterOrDigit(int) * @see Character#isLowerCase(int) * @see Character#isTitleCase(int) * @see Character#isUpperCase(int) * Determines if the specified character is a letter. * A character is considered to be a letter if its general * category type, provided by {@code Character.getType(ch)}, * is any of the following: * <li> {@code UPPERCASE_LETTER} * <li> {@code LOWERCASE_LETTER} * <li> {@code TITLECASE_LETTER} * <li> {@code MODIFIER_LETTER} * <li> {@code OTHER_LETTER} * Not all letters have case. Many characters are * letters but are neither uppercase nor lowercase nor titlecase. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isLetter(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is a letter; * {@code false} otherwise. * @see Character#isDigit(char) * @see Character#isJavaIdentifierStart(char) * @see Character#isJavaLetter(char) * @see Character#isJavaLetterOrDigit(char) * @see Character#isLetterOrDigit(char) * @see Character#isLowerCase(char) * @see Character#isTitleCase(char) * @see Character#isUnicodeIdentifierStart(char) * @see Character#isUpperCase(char) * Determines if the specified character (Unicode code point) is a letter. * A character is considered to be a letter if its general * category type, provided by {@link Character#getType(int) getType(codePoint)}, * is any of the following: * <li> {@code UPPERCASE_LETTER} * <li> {@code LOWERCASE_LETTER} * <li> {@code TITLECASE_LETTER} * <li> {@code MODIFIER_LETTER} * <li> {@code OTHER_LETTER} * Not all letters have case. Many characters are * letters but are neither uppercase nor lowercase nor titlecase. 
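The remark that not all letters have case can be made concrete with a minimal sketch. The class {@code CaselessLetter} and the method {@code isCaselessLetter} are hypothetical names introduced for this illustration, and U+4E2D is assumed as a sample ideograph.

```java
class CaselessLetter {
    // Hypothetical helper: true for a letter that is neither uppercase,
    // lowercase, nor titlecase.
    static boolean isCaselessLetter(int codePoint) {
        return Character.isLetter(codePoint)
                && !Character.isUpperCase(codePoint)
                && !Character.isLowerCase(codePoint)
                && !Character.isTitleCase(codePoint);
    }

    public static void main(String[] args) {
        int ideograph = 0x4E2D; // a CJK ideograph, general category OTHER_LETTER
        System.out.println(Character.getType(ideograph) == Character.OTHER_LETTER);
        System.out.println(isCaselessLetter(ideograph));
        System.out.println(isCaselessLetter('a')); // a cased letter
        System.out.println(isCaselessLetter('1')); // not a letter at all
    }
}
```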
* @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is a letter; * {@code false} otherwise. * @see Character#isDigit(int) * @see Character#isJavaIdentifierStart(int) * @see Character#isLetterOrDigit(int) * @see Character#isLowerCase(int) * @see Character#isTitleCase(int) * @see Character#isUnicodeIdentifierStart(int) * @see Character#isUpperCase(int) * Determines if the specified character is a letter or digit. * A character is considered to be a letter or digit if either * {@code Character.isLetter(char ch)} or * {@code Character.isDigit(char ch)} returns * {@code true} for the character. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isLetterOrDigit(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is a letter or digit; * {@code false} otherwise. * @see Character#isDigit(char) * @see Character#isJavaIdentifierPart(char) * @see Character#isJavaLetter(char) * @see Character#isJavaLetterOrDigit(char) * @see Character#isLetter(char) * @see Character#isUnicodeIdentifierPart(char) * Determines if the specified character (Unicode code point) is a letter or digit. * A character is considered to be a letter or digit if either * {@link #isLetter(int) isLetter(codePoint)} or * {@link #isDigit(int) isDigit(codePoint)} returns * {@code true} for the character. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is a letter or digit; * {@code false} otherwise. * @see Character#isDigit(int) * @see Character#isJavaIdentifierPart(int) * @see Character#isLetter(int) * @see Character#isUnicodeIdentifierPart(int) * Determines if the specified character is permissible as the first * character in a Java identifier. 
* A character may start a Java identifier if and only if * one of the following is true: * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER} * <li> {@code ch} is a currency symbol (such as {@code '$'}) * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}). * @param ch the character to be tested. * @return {@code true} if the character may start a Java * identifier; {@code false} otherwise. * @see Character#isJavaLetterOrDigit(char) * @see Character#isJavaIdentifierStart(char) * @see Character#isJavaIdentifierPart(char) * @see Character#isLetter(char) * @see Character#isLetterOrDigit(char) * @see Character#isUnicodeIdentifierStart(char) * @deprecated Replaced by isJavaIdentifierStart(char). * Determines if the specified character may be part of a Java * identifier as other than the first character. * A character may be part of a Java identifier if and only if any * of the following are true: * <li> it is a currency symbol (such as {@code '$'}) * <li> it is a connecting punctuation character (such as {@code '_'}) * <li> it is a numeric letter (such as a Roman numeral character) * <li> it is a combining mark * <li> it is a non-spacing mark * <li> {@code isIdentifierIgnorable} returns * {@code true} for the character. * @param ch the character to be tested. * @return {@code true} if the character may be part of a * Java identifier; {@code false} otherwise. * @see Character#isJavaLetter(char) * @see Character#isJavaIdentifierStart(char) * @see Character#isJavaIdentifierPart(char) * @see Character#isLetter(char) * @see Character#isLetterOrDigit(char) * @see Character#isUnicodeIdentifierPart(char) * @see Character#isIdentifierIgnorable(char) * @deprecated Replaced by isJavaIdentifierPart(char). * Determines if the specified character (Unicode code point) is an alphabet. 
* A character is considered to be alphabetic if its general category type, * provided by {@link Character#getType(int) getType(codePoint)}, is any of * <li> <code>UPPERCASE_LETTER</code> * <li> <code>LOWERCASE_LETTER</code> * <li> <code>TITLECASE_LETTER</code> * <li> <code>MODIFIER_LETTER</code> * <li> <code>OTHER_LETTER</code> * <li> <code>LETTER_NUMBER</code> * or it has contributory property Other_Alphabetic as defined by the * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is a Unicode alphabet * character, <code>false</code> otherwise. * Determines if the specified character (Unicode code point) is a CJKV * (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is a Unicode ideograph * character, <code>false</code> otherwise. * Determines if the specified character is * permissible as the first character in a Java identifier. * A character may start a Java identifier if and only if * one of the following conditions is true: * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER} * <li> {@code ch} is a currency symbol (such as {@code '$'}) * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}). * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isJavaIdentifierStart(int)} method. * @param ch the character to be tested. * @return {@code true} if the character may start a Java identifier; * {@code false} otherwise. 
* @see Character#isJavaIdentifierPart(char) * @see Character#isLetter(char) * @see Character#isUnicodeIdentifierStart(char) * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) * Determines if the character (Unicode code point) is * permissible as the first character in a Java identifier. * A character may start a Java identifier if and only if * one of the following conditions is true: * <li> {@link #isLetter(int) isLetter(codePoint)} * <li> {@link #getType(int) getType(codePoint)} * returns {@code LETTER_NUMBER} * <li> the referenced character is a currency symbol (such as {@code '$'}) * <li> the referenced character is a connecting punctuation character * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character may start a Java identifier; * {@code false} otherwise. * @see Character#isJavaIdentifierPart(int) * @see Character#isLetter(int) * @see Character#isUnicodeIdentifierStart(int) * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) * Determines if the specified character may be part of a Java * identifier as other than the first character. * A character may be part of a Java identifier if any of the following * <li> it is a currency symbol (such as {@code '$'}) * <li> it is a connecting punctuation character (such as {@code '_'}) * <li> it is a numeric letter (such as a Roman numeral character) * <li> it is a combining mark * <li> it is a non-spacing mark * <li> {@code isIdentifierIgnorable} returns * {@code true} for the character * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isJavaIdentifierPart(int)} method. * @param ch the character to be tested. * @return {@code true} if the character may be part of a * Java identifier; {@code false} otherwise. 
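The start and part rules described above are typically applied together when scanning a candidate name. The following is a minimal sketch under stated assumptions: the class {@code IdentifierCheck} and the method {@code hasIdentifierShape} are hypothetical names, and the check deliberately ignores reserved words and literals, which a real identifier test would also need to reject.

```java
class IdentifierCheck {
    // Hypothetical helper: true if the first code point may start a Java
    // identifier and every later code point may be part of one.
    // Reserved words such as "class" are NOT rejected by this sketch.
    static boolean hasIdentifierShape(CharSequence name) {
        if (name.length() == 0) {
            return false;
        }
        int first = Character.codePointAt(name, 0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = Character.codePointAt(name, i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasIdentifierShape("_count$1"));
        System.out.println(hasIdentifierShape("1abc"));
        System.out.println(hasIdentifierShape("a-b"));
    }
}
```

Iterating by code point and calling the {@code int} overloads keeps the check valid for supplementary characters, which the {@code char} overloads cannot handle.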
* @see Character#isIdentifierIgnorable(char) * @see Character#isJavaIdentifierStart(char) * @see Character#isLetterOrDigit(char) * @see Character#isUnicodeIdentifierPart(char) * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) * Determines if the character (Unicode code point) may be part of a Java * identifier as other than the first character. * A character may be part of a Java identifier if any of the following * <li> it is a currency symbol (such as {@code '$'}) * <li> it is a connecting punctuation character (such as {@code '_'}) * <li> it is a numeric letter (such as a Roman numeral character) * <li> it is a combining mark * <li> it is a non-spacing mark * <li> {@link #isIdentifierIgnorable(int) * isIdentifierIgnorable(codePoint)} returns {@code true} for * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character may be part of a * Java identifier; {@code false} otherwise. * @see Character#isIdentifierIgnorable(int) * @see Character#isJavaIdentifierStart(int) * @see Character#isLetterOrDigit(int) * @see Character#isUnicodeIdentifierPart(int) * @see javax.lang.model.SourceVersion#isIdentifier(CharSequence) * Determines if the specified character is permissible as the * first character in a Unicode identifier. * A character may start a Unicode identifier if and only if * one of the following conditions is true: * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true} * <li> {@link #getType(char) getType(ch)} returns * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isUnicodeIdentifierStart(int)} method. * @param ch the character to be tested. * @return {@code true} if the character may start a Unicode * identifier; {@code false} otherwise. 
* @see Character#isJavaIdentifierStart(char) * @see Character#isLetter(char) * @see Character#isUnicodeIdentifierPart(char) * Determines if the specified character (Unicode code point) is permissible as the * first character in a Unicode identifier. * A character may start a Unicode identifier if and only if * one of the following conditions is true: * <li> {@link #isLetter(int) isLetter(codePoint)} * <li> {@link #getType(int) getType(codePoint)} * returns {@code LETTER_NUMBER}. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character may start a Unicode * identifier; {@code false} otherwise. * @see Character#isJavaIdentifierStart(int) * @see Character#isLetter(int) * @see Character#isUnicodeIdentifierPart(int) * Determines if the specified character may be part of a Unicode * identifier as other than the first character. * A character may be part of a Unicode identifier if and only if * one of the following statements is true: * <li> it is a connecting punctuation character (such as {@code '_'}) * <li> it is a numeric letter (such as a Roman numeral character) * <li> it is a combining mark * <li> it is a non-spacing mark * <li> {@code isIdentifierIgnorable} returns * {@code true} for this character. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isUnicodeIdentifierPart(int)} method. * @param ch the character to be tested. * @return {@code true} if the character may be part of a * Unicode identifier; {@code false} otherwise. * @see Character#isIdentifierIgnorable(char) * @see Character#isJavaIdentifierPart(char) * @see Character#isLetterOrDigit(char) * @see Character#isUnicodeIdentifierStart(char) * Determines if the specified character (Unicode code point) may be part of a Unicode * identifier as other than the first character. 
* A character may be part of a Unicode identifier if and only if * one of the following statements is true: * <li> it is a connecting punctuation character (such as {@code '_'}) * <li> it is a numeric letter (such as a Roman numeral character) * <li> it is a combining mark * <li> it is a non-spacing mark * <li> {@code isIdentifierIgnorable} returns * {@code true} for this character. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character may be part of a * Unicode identifier; {@code false} otherwise. * @see Character#isIdentifierIgnorable(int) * @see Character#isJavaIdentifierPart(int) * @see Character#isLetterOrDigit(int) * @see Character#isUnicodeIdentifierStart(int) * Determines if the specified character should be regarded as * an ignorable character in a Java identifier or a Unicode identifier. * The following Unicode characters are ignorable in a Java identifier * or a Unicode identifier: * <li>ISO control characters that are not whitespace * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'} * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'} * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'} * <li>all characters that have the {@code FORMAT} general * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isIdentifierIgnorable(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is an ignorable control * character that may be part of a Java or Unicode identifier; * {@code false} otherwise. * @see Character#isJavaIdentifierPart(char) * @see Character#isUnicodeIdentifierPart(char) * Determines if the specified character (Unicode code point) should be regarded as * an ignorable character in a Java identifier or a Unicode identifier. 
* The following Unicode characters are ignorable in a Java identifier * or a Unicode identifier: * <li>ISO control characters that are not whitespace * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'} * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'} * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'} * <li>all characters that have the {@code FORMAT} general * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is an ignorable control * character that may be part of a Java or Unicode identifier; * {@code false} otherwise. * @see Character#isJavaIdentifierPart(int) * @see Character#isUnicodeIdentifierPart(int) * Converts the character argument to lowercase using case * mapping information from the UnicodeData file. * {@code Character.isLowerCase(Character.toLowerCase(ch))} * does not always return {@code true} for some ranges of * characters, particularly those that are symbols or ideographs. * <p>In general, {@link String#toLowerCase()} should be used to map * characters to lowercase. {@code String} case mapping methods * have several benefits over {@code Character} case mapping methods. * {@code String} case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas * the {@code Character} case mapping methods cannot. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #toLowerCase(int)} method. * @param ch the character to be converted. * @return the lowercase equivalent of the character, if any; * otherwise, the character itself. 
* @see Character#isLowerCase(char) * @see String#toLowerCase() * Converts the character (Unicode code point) argument to * lowercase using case mapping information from the UnicodeData * {@code Character.isLowerCase(Character.toLowerCase(codePoint))} * does not always return {@code true} for some ranges of * characters, particularly those that are symbols or ideographs. * <p>In general, {@link String#toLowerCase()} should be used to map * characters to lowercase. {@code String} case mapping methods * have several benefits over {@code Character} case mapping methods. * {@code String} case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas * the {@code Character} case mapping methods cannot. * @param codePoint the character (Unicode code point) to be converted. * @return the lowercase equivalent of the character (Unicode code * point), if any; otherwise, the character itself. * @see Character#isLowerCase(int) * @see String#toLowerCase() * Converts the character argument to uppercase using case mapping * information from the UnicodeData file. * {@code Character.isUpperCase(Character.toUpperCase(ch))} * does not always return {@code true} for some ranges of * characters, particularly those that are symbols or ideographs. * <p>In general, {@link String#toUpperCase()} should be used to map * characters to uppercase. {@code String} case mapping methods * have several benefits over {@code Character} case mapping methods. * {@code String} case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas * the {@code Character} case mapping methods cannot. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #toUpperCase(int)} method. * @param ch the character to be converted. 
* @return the uppercase equivalent of the character, if any; * otherwise, the character itself. * @see Character#isUpperCase(char) * @see String#toUpperCase() * Converts the character (Unicode code point) argument to * uppercase using case mapping information from the UnicodeData * {@code Character.isUpperCase(Character.toUpperCase(codePoint))} * does not always return {@code true} for some ranges of * characters, particularly those that are symbols or ideographs. * <p>In general, {@link String#toUpperCase()} should be used to map * characters to uppercase. {@code String} case mapping methods * have several benefits over {@code Character} case mapping methods. * {@code String} case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas * the {@code Character} case mapping methods cannot. * @param codePoint the character (Unicode code point) to be converted. * @return the uppercase equivalent of the character, if any; * otherwise, the character itself. * @see Character#isUpperCase(int) * @see String#toUpperCase() * Converts the character argument to titlecase using case mapping * information from the UnicodeData file. If a character has no * explicit titlecase mapping and is not itself a titlecase char * according to UnicodeData, then the uppercase mapping is * returned as an equivalent titlecase mapping. If the * {@code char} argument is already a titlecase * {@code char}, the same {@code char} value will be * {@code Character.isTitleCase(Character.toTitleCase(ch))} * does not always return {@code true} for some ranges of * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #toTitleCase(int)} method. * @param ch the character to be converted. * @return the titlecase equivalent of the character, if any; * otherwise, the character itself. 
* @see Character#isTitleCase(char) * @see Character#toLowerCase(char) * @see Character#toUpperCase(char) * Converts the character (Unicode code point) argument to titlecase using case mapping * information from the UnicodeData file. If a character has no * explicit titlecase mapping and is not itself a titlecase char * according to UnicodeData, then the uppercase mapping is * returned as an equivalent titlecase mapping. If the * character argument is already a titlecase * character, the same character value will be * {@code Character.isTitleCase(Character.toTitleCase(codePoint))} * does not always return {@code true} for some ranges of * @param codePoint the character (Unicode code point) to be converted. * @return the titlecase equivalent of the character, if any; * otherwise, the character itself. * @see Character#isTitleCase(int) * @see Character#toLowerCase(int) * @see Character#toUpperCase(int) * Returns the numeric value of the character {@code ch} in the * If the radix is not in the range {@code MIN_RADIX} ≤ * {@code radix} ≤ {@code MAX_RADIX} or if the * value of {@code ch} is not a valid digit in the specified * radix, {@code -1} is returned. A character is a valid digit * if at least one of the following is true: * <li>The method {@code isDigit} is {@code true} of the character * and the Unicode decimal digit value of the character (or its * single-character decomposition) is less than the specified radix. * In this case the decimal digit value is returned. * <li>The character is one of the uppercase Latin letters * {@code 'A'} through {@code 'Z'} and its code is less than * {@code radix + 'A' - 10}. * In this case, {@code ch - 'A' + 10} * <li>The character is one of the lowercase Latin letters * {@code 'a'} through {@code 'z'} and its code is less than * {@code radix + 'a' - 10}. 
* In this case, {@code ch - 'a' + 10} * <li>The character is one of the fullwidth uppercase Latin letters A * ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'}) * and its code is less than * {@code radix + '\u005CuFF21' - 10}. * In this case, {@code ch - '\u005CuFF21' + 10} * <li>The character is one of the fullwidth lowercase Latin letters a * ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'}) * and its code is less than * {@code radix + '\u005CuFF41' - 10}. * In this case, {@code ch - '\u005CuFF41' + 10} * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #digit(int, int)} method. * @param ch the character to be converted. * @param radix the radix. * @return the numeric value represented by the character in the * @see Character#forDigit(int, int) * @see Character#isDigit(char) * Returns the numeric value of the specified character (Unicode * code point) in the specified radix. * <p>If the radix is not in the range {@code MIN_RADIX} ≤ * {@code radix} ≤ {@code MAX_RADIX} or if the * character is not a valid digit in the specified * radix, {@code -1} is returned. A character is a valid digit * if at least one of the following is true: * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character * and the Unicode decimal digit value of the character (or its * single-character decomposition) is less than the specified radix. * In this case the decimal digit value is returned. * <li>The character is one of the uppercase Latin letters * {@code 'A'} through {@code 'Z'} and its code is less than * {@code radix + 'A' - 10}. * In this case, {@code codePoint - 'A' + 10} * <li>The character is one of the lowercase Latin letters * {@code 'a'} through {@code 'z'} and its code is less than * {@code radix + 'a' - 10}. 
* In this case, {@code codePoint - 'a' + 10} * <li>The character is one of the fullwidth uppercase Latin letters A * ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'}) * and its code is less than * {@code radix + '\u005CuFF21' - 10}. * {@code codePoint - '\u005CuFF21' + 10} * <li>The character is one of the fullwidth lowercase Latin letters a * ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'}) * and its code is less than * {@code radix + '\u005CuFF41' - 10}. * {@code codePoint - '\u005CuFF41' + 10} * @param codePoint the character (Unicode code point) to be converted. * @param radix the radix. * @return the numeric value represented by the character in the * @see Character#forDigit(int, int) * @see Character#isDigit(int) * Returns the {@code int} value that the specified Unicode * character represents. For example, the character * {@code '\u005Cu216C'} (the Roman numeral fifty) will return * an {@code int} with a value of 50. * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through * {@code '\u005Cu005A'}), lowercase * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and * full width variant ({@code '\u005CuFF21'} through * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through * {@code '\u005CuFF5A'}) forms have numeric values from 10 * through 35. This is independent of the Unicode specification, * which does not assign numeric values to these {@code char} * If the character does not have a numeric value, then -1 is returned. * If the character has a numeric value that cannot be represented as a * nonnegative integer (for example, a fractional value), then -2 * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #getNumericValue(int)} method. * @param ch the character to be converted.
* @return the numeric value of the character, as a nonnegative {@code int} * value; -2 if the character has a numeric value that is not a * nonnegative integer; -1 if the character has no numeric value. * @see Character#forDigit(int, int) * @see Character#isDigit(char) * Returns the {@code int} value that the specified * character (Unicode code point) represents. For example, the character * {@code '\u005Cu216C'} (the Roman numeral fifty) will return * an {@code int} with a value of 50. * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through * {@code '\u005Cu005A'}), lowercase * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and * full width variant ({@code '\u005CuFF21'} through * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through * {@code '\u005CuFF5A'}) forms have numeric values from 10 * through 35. This is independent of the Unicode specification, * which does not assign numeric values to these {@code char} * If the character does not have a numeric value, then -1 is returned. * If the character has a numeric value that cannot be represented as a * nonnegative integer (for example, a fractional value), then -2 * @param codePoint the character (Unicode code point) to be converted. * @return the numeric value of the character, as a nonnegative {@code int} * value; -2 if the character has a numeric value that is not a * nonnegative integer; -1 if the character has no numeric value. * @see Character#forDigit(int, int) * @see Character#isDigit(int) * Determines if the specified character is ISO-LATIN-1 white space. 
* This method returns {@code true} for the following five * <tr><td>{@code '\t'}</td> <td>{@code U+0009}</td> * <td>{@code HORIZONTAL TABULATION}</td></tr> * <tr><td>{@code '\n'}</td> <td>{@code U+000A}</td> * <td>{@code NEW LINE}</td></tr> * <tr><td>{@code '\f'}</td> <td>{@code U+000C}</td> * <td>{@code FORM FEED}</td></tr> * <tr><td>{@code '\r'}</td> <td>{@code U+000D}</td> * <td>{@code CARRIAGE RETURN}</td></tr> * <tr><td>{@code ' '}</td> <td>{@code U+0020}</td> * <td>{@code SPACE}</td></tr> * @param ch the character to be tested. * @return {@code true} if the character is ISO-LATIN-1 white * space; {@code false} otherwise. * @see Character#isSpaceChar(char) * @see Character#isWhitespace(char) * @deprecated Replaced by isWhitespace(char). (
1L << 0x0020)) >> ch) & 1L) != 0);
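The expression tail above is the end of a bitmask membership test. The following is a minimal sketch of that technique, assuming the mask holds one bit for each of the five characters listed in the table. The class name `Latin1SpaceMaskSketch` and method name `isLatin1Space` are hypothetical and are not part of `Character`.

```java
// Hypothetical helper for illustration; not part of java.lang.Character.
final class Latin1SpaceMaskSketch {

    // One bit per character from the table: \t, \n, \f, \r and space.
    private static final long MASK =
          (1L << 0x0009)
        | (1L << 0x000A)
        | (1L << 0x000C)
        | (1L << 0x000D)
        | (1L << 0x0020);

    // The range check runs first. Java uses only the low six bits of the
    // shift distance for a long, so without it a value such as 'I' (0x49)
    // would be tested against bit 9 and wrongly match.
    static boolean isLatin1Space(char ch) {
        return ch <= 0x0020 && ((MASK >> ch) & 1L) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isLatin1Space(' '));       // true
        System.out.println(isLatin1Space('\u000B'));  // false
    }
}
```

U+000B VERTICAL TABULATION is not in the table, so this test rejects it, whereas the `isWhitespace` documentation lists it as Java whitespace. That difference is one reason to prefer `isWhitespace` over the deprecated method.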
* Determines if the specified character is a Unicode space character. * A character is considered to be a space character if and only if * it is specified to be a space character by the Unicode Standard. This * method returns true if the character's general category type is any of * <li> {@code SPACE_SEPARATOR} * <li> {@code LINE_SEPARATOR} * <li> {@code PARAGRAPH_SEPARATOR} * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isSpaceChar(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is a space character; * {@code false} otherwise. * @see Character#isWhitespace(char) * Determines if the specified character (Unicode code point) is a * Unicode space character. A character is considered to be a * space character if and only if it is specified to be a space * character by the Unicode Standard. This method returns true if * the character's general category type is any of the following: * <li> {@link #SPACE_SEPARATOR} * <li> {@link #LINE_SEPARATOR} * <li> {@link #PARAGRAPH_SEPARATOR} * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is a space character; * {@code false} otherwise. * @see Character#isWhitespace(int) * Determines if the specified character is white space according to Java. * A character is a Java whitespace character if and only if it satisfies * one of the following criteria: * <li> It is a Unicode space character ({@code SPACE_SEPARATOR}, * {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR}) * but is not also a non-breaking space ({@code '\u005Cu00A0'}, * {@code '\u005Cu2007'}, {@code '\u005Cu202F'}). * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION. * <li> It is {@code '\u005Cn'}, U+000A LINE FEED. * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION. 
* <li> It is {@code '\u005Cf'}, U+000C FORM FEED. * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN. * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR. * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR. * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR. * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isWhitespace(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is a Java whitespace * character; {@code false} otherwise. * @see Character#isSpaceChar(char) * Determines if the specified character (Unicode code point) is * white space according to Java. A character is a Java * whitespace character if and only if it satisfies one of the * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR}, * {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR}) * but is not also a non-breaking space ({@code '\u005Cu00A0'}, * {@code '\u005Cu2007'}, {@code '\u005Cu202F'}). * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION. * <li> It is {@code '\u005Cn'}, U+000A LINE FEED. * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION. * <li> It is {@code '\u005Cf'}, U+000C FORM FEED. * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN. * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR. * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR. * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR. * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is a Java whitespace * character; {@code false} otherwise. * @see Character#isSpaceChar(int) * Determines if the specified character is an ISO control * character. 
A character is considered to be an ISO control * character if its code is in the range {@code '\u005Cu0000'} * through {@code '\u005Cu001F'} or in the range * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isISOControl(int)} method. * @param ch the character to be tested. * @return {@code true} if the character is an ISO control character; * {@code false} otherwise. * @see Character#isSpaceChar(char) * @see Character#isWhitespace(char) * Determines if the referenced character (Unicode code point) is an ISO control * character. A character is considered to be an ISO control * character if its code is in the range {@code '\u005Cu0000'} * through {@code '\u005Cu001F'} or in the range * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}. * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is an ISO control character; * {@code false} otherwise. * @see Character#isSpaceChar(int) * @see Character#isWhitespace(int) // (codePoint >= 0x00 && codePoint <= 0x1F) || // (codePoint >= 0x7F && codePoint <= 0x9F); * Returns a value indicating a character's general category. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #getType(int)} method. * @param ch the character to be tested. * @return a value of type {@code int} representing the * character's general category. 
* @see Character#COMBINING_SPACING_MARK * @see Character#CONNECTOR_PUNCTUATION * @see Character#CURRENCY_SYMBOL * @see Character#DASH_PUNCTUATION * @see Character#DECIMAL_DIGIT_NUMBER * @see Character#ENCLOSING_MARK * @see Character#END_PUNCTUATION * @see Character#FINAL_QUOTE_PUNCTUATION * @see Character#INITIAL_QUOTE_PUNCTUATION * @see Character#LETTER_NUMBER * @see Character#LINE_SEPARATOR * @see Character#LOWERCASE_LETTER * @see Character#MATH_SYMBOL * @see Character#MODIFIER_LETTER * @see Character#MODIFIER_SYMBOL * @see Character#NON_SPACING_MARK * @see Character#OTHER_LETTER * @see Character#OTHER_NUMBER * @see Character#OTHER_PUNCTUATION * @see Character#OTHER_SYMBOL * @see Character#PARAGRAPH_SEPARATOR * @see Character#PRIVATE_USE * @see Character#SPACE_SEPARATOR * @see Character#START_PUNCTUATION * @see Character#SURROGATE * @see Character#TITLECASE_LETTER * @see Character#UNASSIGNED * @see Character#UPPERCASE_LETTER * Returns a value indicating a character's general category. * @param codePoint the character (Unicode code point) to be tested. * @return a value of type {@code int} representing the * character's general category. 
* @see Character#COMBINING_SPACING_MARK COMBINING_SPACING_MARK * @see Character#CONNECTOR_PUNCTUATION CONNECTOR_PUNCTUATION * @see Character#CONTROL CONTROL * @see Character#CURRENCY_SYMBOL CURRENCY_SYMBOL * @see Character#DASH_PUNCTUATION DASH_PUNCTUATION * @see Character#DECIMAL_DIGIT_NUMBER DECIMAL_DIGIT_NUMBER * @see Character#ENCLOSING_MARK ENCLOSING_MARK * @see Character#END_PUNCTUATION END_PUNCTUATION * @see Character#FINAL_QUOTE_PUNCTUATION FINAL_QUOTE_PUNCTUATION * @see Character#FORMAT FORMAT * @see Character#INITIAL_QUOTE_PUNCTUATION INITIAL_QUOTE_PUNCTUATION * @see Character#LETTER_NUMBER LETTER_NUMBER * @see Character#LINE_SEPARATOR LINE_SEPARATOR * @see Character#LOWERCASE_LETTER LOWERCASE_LETTER * @see Character#MATH_SYMBOL MATH_SYMBOL * @see Character#MODIFIER_LETTER MODIFIER_LETTER * @see Character#MODIFIER_SYMBOL MODIFIER_SYMBOL * @see Character#NON_SPACING_MARK NON_SPACING_MARK * @see Character#OTHER_LETTER OTHER_LETTER * @see Character#OTHER_NUMBER OTHER_NUMBER * @see Character#OTHER_PUNCTUATION OTHER_PUNCTUATION * @see Character#OTHER_SYMBOL OTHER_SYMBOL * @see Character#PARAGRAPH_SEPARATOR PARAGRAPH_SEPARATOR * @see Character#PRIVATE_USE PRIVATE_USE * @see Character#SPACE_SEPARATOR SPACE_SEPARATOR * @see Character#START_PUNCTUATION START_PUNCTUATION * @see Character#SURROGATE SURROGATE * @see Character#TITLECASE_LETTER TITLECASE_LETTER * @see Character#UNASSIGNED UNASSIGNED * @see Character#UPPERCASE_LETTER UPPERCASE_LETTER * Determines the character representation for a specific digit in * the specified radix. If the value of {@code radix} is not a * valid radix, or the value of {@code digit} is not a valid * digit in the specified radix, the null character * ({@code '\u005Cu0000'}) is returned. * The {@code radix} argument is valid if it is greater than or * equal to {@code MIN_RADIX} and less than or equal to * {@code MAX_RADIX}. The {@code digit} argument is valid if * {@code 0 <= digit < radix}. 
* If the digit is less than 10, then * {@code '0' + digit} is returned. Otherwise, the value * {@code 'a' + digit - 10} is returned. * @param digit the number to convert to a character. * @param radix the radix. * @return the {@code char} representation of the specified digit * in the specified radix. * @see Character#MIN_RADIX * @see Character#MAX_RADIX * @see Character#digit(char, int) return (
char)('0' + digit);
return (char)('a' - 10 + digit);
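The documented rules for `forDigit` can be restated as a short sketch. The class `ForDigitSketch` and method `forDigitSketch` below are hypothetical names that re-implement only what the comment describes; the method is compared against the real `Character.forDigit` rather than offered as the library's implementation.

```java
// Hypothetical re-statement of the documented rules; not the JDK source.
final class ForDigitSketch {

    static char forDigitSketch(int digit, int radix) {
        // An invalid radix yields the null character.
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            return '\u0000';
        }
        // A digit outside 0 <= digit < radix also yields the null character.
        if (digit < 0 || digit >= radix) {
            return '\u0000';
        }
        // Digits below 10 map to '0'..'9', the rest to 'a'..'z'.
        return digit < 10 ? (char) ('0' + digit) : (char) ('a' - 10 + digit);
    }

    public static void main(String[] args) {
        System.out.println(forDigitSketch(11, 16));                 // b
        System.out.println((int) forDigitSketch(16, 16));           // 0
        System.out.println(Character.digit(Character.forDigit(11, 16), 16)); // 11
    }
}
```

Because an invalid argument produces `'\u0000'` instead of an exception, callers that cannot guarantee valid input may want to check the result before appending it to output.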
* Returns the Unicode directionality property for the given * character. Character directionality is used to calculate the * visual ordering of text. The directionality value of undefined * {@code char} values is {@code DIRECTIONALITY_UNDEFINED}. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #getDirectionality(int)} method. * @param ch {@code char} for which the directionality property * @return the directionality property of the {@code char} value. * @see Character#DIRECTIONALITY_UNDEFINED * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR * @see Character#DIRECTIONALITY_ARABIC_NUMBER * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR * @see Character#DIRECTIONALITY_NONSPACING_MARK * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR * @see Character#DIRECTIONALITY_WHITESPACE * @see Character#DIRECTIONALITY_OTHER_NEUTRALS * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT * Returns the Unicode directionality property for the given * character (Unicode code point). Character directionality is * used to calculate the visual ordering of text. The * directionality value of undefined characters is {@link * #DIRECTIONALITY_UNDEFINED}. * @param codePoint the character (Unicode code point) for which * the directionality property is requested.
* @return the directionality property of the character. * @see Character#DIRECTIONALITY_UNDEFINED DIRECTIONALITY_UNDEFINED * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT DIRECTIONALITY_LEFT_TO_RIGHT * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT DIRECTIONALITY_RIGHT_TO_LEFT * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER DIRECTIONALITY_EUROPEAN_NUMBER * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR * @see Character#DIRECTIONALITY_ARABIC_NUMBER DIRECTIONALITY_ARABIC_NUMBER * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR DIRECTIONALITY_COMMON_NUMBER_SEPARATOR * @see Character#DIRECTIONALITY_NONSPACING_MARK DIRECTIONALITY_NONSPACING_MARK * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL DIRECTIONALITY_BOUNDARY_NEUTRAL * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR DIRECTIONALITY_PARAGRAPH_SEPARATOR * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR DIRECTIONALITY_SEGMENT_SEPARATOR * @see Character#DIRECTIONALITY_WHITESPACE DIRECTIONALITY_WHITESPACE * @see Character#DIRECTIONALITY_OTHER_NEUTRALS DIRECTIONALITY_OTHER_NEUTRALS * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT DIRECTIONALITY_POP_DIRECTIONAL_FORMAT * Determines whether the character is mirrored according to the * Unicode specification. Mirrored characters should have their * glyphs horizontally mirrored when displayed in text that is * right-to-left. 
For example, {@code '\u005Cu0028'} LEFT * PARENTHESIS is semantically defined to be an <i>opening * parenthesis</i>. This will appear as a "(" in text that is * left-to-right but as a ")" in text that is right-to-left. * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #isMirrored(int)} method. * @param ch {@code char} for which the mirrored property is requested * @return {@code true} if the char is mirrored, {@code false} * if the {@code char} is not mirrored or is not defined. * Determines whether the specified character (Unicode code point) * is mirrored according to the Unicode specification. Mirrored * characters should have their glyphs horizontally mirrored when * displayed in text that is right-to-left. For example, * {@code '\u005Cu0028'} LEFT PARENTHESIS is semantically * defined to be an <i>opening parenthesis</i>. This will appear * as a "(" in text that is left-to-right but as a ")" in text * @param codePoint the character (Unicode code point) to be tested. * @return {@code true} if the character is mirrored, {@code false} * if the character is not mirrored or is not defined. * Compares two {@code Character} objects numerically. * @param anotherCharacter the {@code Character} to be compared. * @return the value {@code 0} if the argument {@code Character} * is equal to this {@code Character}; a value less than * {@code 0} if this {@code Character} is numerically less * than the {@code Character} argument; and a value greater than * {@code 0} if this {@code Character} is numerically greater * than the {@code Character} argument (unsigned comparison). * Note that this is strictly a numerical comparison; it is not * Compares two {@code char} values numerically. 
* The value returned is identical to what would be returned by: * Character.valueOf(x).compareTo(Character.valueOf(y)) * @param x the first {@code char} to compare * @param y the second {@code char} to compare * @return the value {@code 0} if {@code x == y}; * a value less than {@code 0} if {@code x < y}; and * a value greater than {@code 0} if {@code x > y} public static int compare(
char x, char y) {
* Converts the character (Unicode code point) argument to uppercase using * information from the UnicodeData file. * @param codePoint the character (Unicode code point) to be converted. * @return either the uppercase equivalent of the character, if * any, or an error flag ({@code Character.ERROR}) * that indicates that a 1:M {@code char} mapping exists. * @see Character#isLowerCase(char) * @see Character#isUpperCase(char) * @see Character#toLowerCase(char) * @see Character#toTitleCase(char) * Converts the character (Unicode code point) argument to uppercase using case * mapping information from the SpecialCasing file in the Unicode * specification. If a character has no explicit uppercase * mapping, then the {@code char} itself is returned in the * @param codePoint the character (Unicode code point) to be converted. * @return a {@code char[]} with the uppercased character. // As of Unicode 6.0, 1:M uppercasings only happen in the BMP. * The number of bits used to represent a <tt>char</tt> value in unsigned * binary form, constant {@code 16}. public static final int SIZE =
16;
* Returns the value obtained by reversing the order of the bytes in the * specified <tt>char</tt> value. * @return the value obtained by reversing (or, equivalently, swapping) * the bytes in the specified <tt>char</tt> value. return (
char) (((ch & 0xFF00) >> 8) | (ch << 8));
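A small sketch of the byte swap described above, using the same expression. The class `ReverseBytesSketch` and method `swapBytes` are hypothetical names introduced for illustration; the result is checked against `Character.reverseBytes`.

```java
// Hypothetical illustration of the byte swap; not the JDK source.
final class ReverseBytesSketch {

    static char swapBytes(char ch) {
        // The high byte moves down and the low byte moves up. The shift is
        // done on an int, so the cast to char drops everything above bit 15.
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }

    public static void main(String[] args) {
        char swapped = swapBytes((char) 0x1234);
        System.out.println(Integer.toHexString(swapped)); // 3412
    }
}
```

Swapping twice returns the original value, which makes the operation convenient for converting 16-bit code units between big-endian and little-endian byte order.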
* Returns the Unicode name of the specified character * {@code codePoint}, or null if the code point is * {@link #UNASSIGNED unassigned}. * Note: if the specified character is not assigned a name by * the <i>UnicodeData</i> file (part of the Unicode Character * Database maintained by the Unicode Consortium), the returned * name is the same as the result of the expression: * Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ') * + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH); * @param codePoint the character (Unicode code point) * @return the Unicode name of the specified character, or null if * the code point is unassigned. * @exception IllegalArgumentException if the specified * {@code codePoint} is not a valid Unicode // should never come here