 * Copyright (c) 2007, 2011, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.

 * Utility routines for dealing with bytecode-level names.
 * Includes universal mangling rules for the JVM.
 *
 * <h3>Avoiding Dangerous Characters </h3>
 * The JVM defines a very small set of characters which are illegal
 * in name spellings. We will slightly extend and regularize this set
 * into a group of <cite>dangerous characters</cite>.
 * These characters will then be replaced, in mangled names, by escape sequences.
 * In addition, accidental escape sequences must be further escaped.
 * Finally, a special prefix will be applied if and only if
 * the mangling would otherwise fail to begin with the escape character.
 * This happens to cover the corner case of the null string,
 * and also clearly marks symbols which need demangling.
 *
 * Dangerous characters are the union of all characters forbidden
 * or otherwise restricted by the JVM specification,
 * plus their mates, if they are brackets
 * (<code><big><b>[</b></big></code> and <code><big><b>]</b></big></code>,
 * <code><big><b>&lt;</b></big></code> and <code><big><b>&gt;</b></big></code>),
 * plus, arbitrarily, the colon character <code><big><b>:</b></big></code>.
 * There is no distinction between type, method, and field names.
 * This makes it easier to convert between mangled names of different
 * types, since they do not need to be decoded (demangled).
 *
 * The escape character is backslash <code><big><b>\</b></big></code>
 * (also known as reverse solidus).
 * This character is, until now, unheard of in bytecode names,
 * but traditional in the proposed role.
 *
 * <h3> Replacement Characters </h3>
 * Every escape sequence is two characters
 * (in fact, two UTF8 bytes) beginning with
 * the escape character and followed by a
 * <cite>replacement character</cite>.
 * (Since the replacement character is never a backslash,
 * iterated manglings do not double in size.)
 *
 * Each dangerous character has some rough visual similarity
 * to its corresponding replacement character.
 * This makes mangled symbols easier to recognize by sight.
 * The dangerous characters are
 * <code><big><b>/</b></big></code> (forward slash, used to delimit package components),
 * <code><big><b>.</b></big></code> (dot, also a package delimiter),
 * <code><big><b>;</b></big></code> (semicolon, used in signatures),
 * <code><big><b>$</b></big></code> (dollar, used in inner classes and synthetic members),
 * <code><big><b>&lt;</b></big></code> (left angle),
 * <code><big><b>&gt;</b></big></code> (right angle),
 * <code><big><b>[</b></big></code> (left square bracket, used in array types),
 * <code><big><b>]</b></big></code> (right square bracket, reserved in this scheme for language use),
 * and <code><big><b>:</b></big></code> (colon, reserved in this scheme for language use).
 * Their replacements are, respectively,
 * <code><big><b>|</b></big></code> (vertical bar),
 * <code><big><b>,</b></big></code> (comma),
 * <code><big><b>?</b></big></code> (question mark),
 * <code><big><b>%</b></big></code> (percent),
 * <code><big><b>^</b></big></code> (caret),
 * <code><big><b>_</b></big></code> (underscore),
 * <code><big><b>{</b></big></code> (left curly bracket),
 * <code><big><b>}</b></big></code> (right curly bracket), and
 * <code><big><b>!</b></big></code> (exclamation mark).
 * In addition, the replacement character for the escape character itself is
 * <code><big><b>-</b></big></code> (hyphen),
 * and the replacement character for the null prefix is
 * <code><big><b>=</b></big></code> (equal sign).
 *
 * An escape character <code><big><b>\</b></big></code>
 * followed by any of these replacement characters
 * is an escape sequence, and there are no other escape sequences.
 * An equal sign is only part of an escape sequence
 * if it is the second character in the whole string, following a backslash.
 * Two consecutive backslashes do <em>not</em> form an escape sequence.
 *
 * Each escape sequence replaces a so-called <cite>original character</cite>
 * which is either one of the dangerous characters or the escape character.
 * A null prefix replaces an initial null string, not a character.
 *
 * All this implies that escape sequences cannot overlap and may be
 * determined all at once for a whole string. Note that a spelling
 * string can contain <cite>accidental escapes</cite>, apparent escape
 * sequences which must not be interpreted as manglings.
 * These are disabled by replacing their leading backslash with an
 * escape sequence (<code><big><b>\-</b></big></code>). To mangle a string, three logical steps
 * are required, though they may be carried out in one pass:
 * <ol>
 * <li>In each accidental escape, replace the backslash with an escape sequence
 * (<code><big><b>\-</b></big></code>).</li>
 * <li>Replace each dangerous character with an escape sequence
 * (<code><big><b>\|</b></big></code> for <code><big><b>/</b></big></code>, etc.).</li>
 * <li>If the first two steps introduced any change, <em>and</em>
 * if the string does not already begin with a backslash, prepend a null prefix (<code><big><b>\=</b></big></code>).</li>
 * </ol>
 *
 * To demangle a mangled string that begins with an escape,
 * remove any null prefix, and then replace (in parallel)
 * each escape sequence by its original character.
 * <p>Spelling strings which contain accidental
 * escapes <em>must</em> have them replaced, even if those
 * strings do not contain dangerous characters.
 * This restriction means that mangling a string always
 * requires a scan of the string for escapes.
 * But then, a scan would be required anyway,
 * to check for dangerous characters.
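The three mangling steps and the demangling rule described above can be sketched in one pass each. This is a minimal illustration written from the prose rules, not the file's actual implementation; the class name `NameMangler` and its members are hypothetical names chosen for the example.

```java
final class NameMangler {
    // Parallel tables (hypothetical names): DANGEROUS.charAt(i) is escaped as
    // a backslash followed by REPLACEMENT.charAt(i). Index 0 pairs the escape
    // character itself with its replacement, the hyphen.
    private static final String DANGEROUS   = "\\/.;$<>[]:";
    private static final String REPLACEMENT = "-|,?%^_{}!";
    private static final String NULL_PREFIX = "\\=";

    static String mangle(String s) {
        if (s.isEmpty()) return NULL_PREFIX;   // the null string becomes the bare prefix
        StringBuilder sb = new StringBuilder(s.length() + 2);
        boolean changed = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean escape;
            if (c == '\\') {
                // Step 1: escape a backslash only if it begins an accidental escape,
                // i.e. it precedes a replacement character, or precedes '=' at index 0.
                boolean hasNext = i + 1 < s.length();
                char next = hasNext ? s.charAt(i + 1) : ' ';
                escape = hasNext
                        && (REPLACEMENT.indexOf(next) >= 0 || (i == 0 && next == '='));
            } else {
                // Step 2: escape every dangerous character.
                escape = DANGEROUS.indexOf(c) >= 0;
            }
            if (escape) {
                sb.append('\\').append(REPLACEMENT.charAt(DANGEROUS.indexOf(c)));
                changed = true;
            } else {
                sb.append(c);
            }
        }
        if (!changed) return s;                // self-mangling: nothing to do
        // Step 3: a changed name must begin with a backslash.
        return sb.charAt(0) == '\\' ? sb.toString() : NULL_PREFIX + sb;
    }

    static String demangle(String s) {
        if (!s.startsWith("\\")) return s;     // no leading escape: demangles to itself
        int start = s.startsWith(NULL_PREFIX) ? 2 : 0;   // drop any null prefix
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                int k = REPLACEMENT.indexOf(s.charAt(i + 1));
                if (k >= 0) {                  // a real escape sequence
                    c = DANGEROUS.charAt(k);
                    i++;                       // consume both characters
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Under these assumptions, `NameMangler.mangle("phase.1")` yields the string `\=phase\,1`, and a spelling without dangerous characters or accidental escapes is returned unchanged.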
 *
 * <h3> Nice Properties </h3>
 * If a bytecode name does not contain any escape sequence,
 * demangling is a no-op: The string demangles to itself.
 * Such a string is called <cite>self-mangling</cite>.
 * Almost all strings are self-mangling.
 * In practice, to demangle almost any name “found in nature”,
 * simply verify that it does not begin with a backslash.
 *
 * Mangling is a one-to-one function, while demangling
 * is a many-to-one function.
 * A mangled string is defined as <cite>validly mangled</cite> if
 * it is in fact the unique mangling of its spelling string.
 * Three examples of invalidly mangled strings are <code><big><b>\=foo</b></big></code>,
 * <code><big><b>\-bar</b></big></code>, and <code><big><b>baz\!</b></big></code>, which demangle to <code><big><b>foo</b></big></code>, <code><big><b>\bar</b></big></code>, and
 * <code><big><b>baz\!</b></big></code>, but then remangle to <code><big><b>foo</b></big></code>, <code><big><b>\bar</b></big></code>, and <code><big><b>\=baz\-!</b></big></code>.
 * If a language back-end or runtime is using mangled names,
 * it should never present an invalidly mangled bytecode
 * name to the JVM. If the runtime encounters one,
 * it should also report an error, since such an occurrence
 * probably indicates a bug in name encoding which
 * will lead to errors in linkage.
 * However, this note does not propose that the JVM verifier
 * detect invalidly mangled names.
 *
 * As a result of these rules, it is a simple matter to
 * compute validly mangled substrings and concatenations
 * of validly mangled strings, and (with a little care)
 * these correspond to the same operations on their
 * spellings:
 * <ul>
 * <li>Any prefix of a validly mangled string is also validly mangled,
 * although a null prefix may need to be removed.</li>
 * <li>Any suffix of a validly mangled string is also validly mangled,
 * although a null prefix may need to be added.</li>
 * <li>Two validly mangled strings, when concatenated,
 * are also validly mangled, although any null prefix
 * must be removed from the second string,
 * and a trailing backslash on the first string may need escaping,
 * if it would participate in an accidental escape when followed
 * by the first character of the second string.</li>
 * </ul>
 * <p>If languages that include non-Java symbol spellings use this
 * mangling convention, they will enjoy the following advantages:
 * <ul>
 * <li>They can interoperate via symbols they share in common.</li>
 * <li>Low-level tools, such as backtrace printers, will have readable displays.</li>
 * <li>Future JVM and language extensions can safely use the dangerous characters
 * for structuring symbols, but will never interfere with valid spellings.</li>
 * <li>Runtimes and compilers can use standard libraries for mangling and demangling.</li>
 * <li>Occasional transliterations and name composition will be simple and regular,
 * for classes, methods, and fields.</li>
 * <li>Bytecode names will continue to be compact.
 * When mangled, spellings will at most double in length, either in
 * UTF8 or UTF16 format, and most will not change at all.</li>
 * </ul>
 *
 * <h3> Suggestions for Human Readable Presentations </h3>
 * For human readable displays of symbols,
 * it will be better to present a string-like quoted
 * representation of the spelling, because JVM users
 * are generally familiar with such tokens.
 * We suggest using single or double quotes before and after
 * mangled symbols which are not valid Java identifiers,
 * with quotes, backslashes, and non-printing characters
 * escaped as if for literals in the Java language.
 *
 * For example, an HTML-like spelling
 * <code><big><b>&lt;pre&gt;</b></big></code> mangles to
 * <code><big><b>\^pre\_</b></big></code> and could
 * be displayed as
 * <code><big><b>'&lt;pre&gt;'</b></big></code>,
 * with the quotes included.
 * Such string-like conventions are <em>not</em> suitable
 * for mangled bytecode names, in part because
 * dangerous characters must be eliminated, rather
 * than just quoted. Otherwise internally structured
 * strings like package prefixes and method signatures
 * could not be reliably parsed.
 *
 * In such human-readable displays, invalidly mangled
 * names should <em>not</em> be demangled and quoted,
 * for this would be misleading. Likewise, JVM symbols
 * which contain dangerous characters (like dots in field
 * names or brackets in method names) should not be
 * simply quoted. The bytecode names
 * <code><big><b>\=phase\,1</b></big></code> and
 * <code><big><b>phase.1</b></big></code> are distinct,
 * and in demangled displays they should be presented as
 * <code><big><b>'phase.1'</b></big></code> and something like
 * <code><big><b>'phase'.1</b></big></code>, respectively.
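The quoting suggestion above can be illustrated with a small helper. This is a sketch under stated assumptions: the class `DisplayQuoting` and its methods are hypothetical, the identifier test uses `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` without excluding Java keywords, and non-printing characters are left unescaped for brevity.

```java
final class DisplayQuoting {
    /** True if s is non-empty and consists only of Java identifier characters. */
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the spelling unchanged if it looks like a Java identifier;
     * otherwise wraps it in single quotes, prefixing embedded single quotes
     * and backslashes with a backslash.
     */
    static String quoteForDisplay(String spelling) {
        if (looksLikeIdentifier(spelling)) {
            return spelling;
        }
        StringBuilder sb = new StringBuilder(spelling.length() + 2);
        sb.append('\'');
        for (int i = 0; i < spelling.length(); i++) {
            char c = spelling.charAt(i);
            if (c == '\'' || c == '\\') {
                sb.append('\\');   // escape as in a Java character literal
            }
            sb.append(c);
        }
        return sb.append('\'').toString();
    }
}
```

The helper takes an already demangled spelling as input. Deciding whether a bytecode name is validly mangled, and therefore safe to demangle for display, is a separate step that this sketch does not cover.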
    /** Given a source name, produce the corresponding bytecode name.
     * The source name should not be qualified, because any syntactic
     * markers (dots, slashes, dollar signs, colons, etc.) will be mangled.
     * @return a valid bytecode name which represents the source name
     */

    /** Given an unqualified bytecode name, produce the corresponding source name.
     * The bytecode name must not contain dangerous characters.
     * In particular, it must not be qualified or segmented by colon {@code ':'}.
     * @param s the bytecode name
     * @return the source name, which may possibly have unsafe characters
     * @throws IllegalArgumentException if the bytecode name is not {@link #isSafeBytecodeName safe}
     * @see #isSafeBytecodeName(java.lang.String)
     */

     * Given a bytecode name from a classfile, separate it into
     * components delimited by dangerous characters.
     * Each resulting array element will be either a dangerous character,
     * or else a safe bytecode name.
     * (The safe name might possibly be mangled to hide further dangerous characters.)
     * For example, the name {@code java/lang/String}
     * will be parsed into the array {@code {"java", '/', "lang", '/', "String"}}.
     * The name {@code <init>} will be parsed into {@code {'<', "init", '>'}}.
     * The name {@code foo/bar$:baz} will be parsed into
     * {@code {"foo", '/', "bar", '$', ':', "baz"}}.
     * The name {@code ::\=:foo:\=bar\!baz} will be parsed into
     * {@code {':', ':', "", ':', "foo", ':', "bar:baz"}}.

        // got to end of string or next dangerous char
        // between passes, build the result array

     * Given a series of components, create a bytecode name for a classfile.
     * This is the inverse of {@link #parseBytecodeName(java.lang.String)}.
     * Each component must either be an interned one-character string of
     * a dangerous character, or else a safe bytecode name.
     * @param components a series of name components
     * @return the concatenation of all components
     * @throws IllegalArgumentException if any component contains an unsafe
     *          character, and is not an interned one-character string
     * @throws NullPointerException if any component is null

     * Given a bytecode name, produce the corresponding display name.
     * This is the source name, plus quotes if needed.
     * If the bytecode name contains dangerous characters,
     * assume that they are being used as punctuation,
     * and pass them through unchanged.
     * Non-empty runs of non-dangerous characters are demangled
     * if necessary, and the resulting names are quoted if
     * they are not already valid Java identifiers, or if
     * they contain a dangerous character (i.e., dollar sign "$").
     * Single quotes are used when quoting.
     * Within quoted names, embedded single quotes and backslashes
     * are further escaped by prepended backslashes.
     * @param s the original bytecode name (which may be qualified)
     * @return a human-readable presentation

        // note that the name is already demangled!
        // TO DO: Replace weird characters in s by C-style escapes.

     * Report whether a simple name is safe as a bytecode name.
     * Such names are acceptable in class files as class, method, and field names.
     * Additionally, they are free of "dangerous" characters, even if those
     * characters are legal in some (or all) names in class files.
     * @param s the proposed bytecode name
     * @return true if the name is non-empty and all of its characters are safe

        // check occurrences of each DANGEROUS char

     * Report whether a character is safe in a bytecode name.
     * This is true of any unicode character except the following
     * <em>dangerous characters</em>: {@code ".;:$[]<>/"}.
     * @param s the proposed character
     * @return true if the character is safe to use in classfiles

        // build this lazily, when we first need an escape:
        // build sb if this is the first escape
        // mangled names must begin with a backslash:
        // append the string so far, which is unremarkable:
        // rewrite \ to \-, / to \|, etc.

        // build this lazily, when we first meet an escape:
        // might be an escape sequence
        // build sb if this is the first escape
        // append the string so far, which is unremarkable:
        ++i;  // skip both characters

        // empty escape sequence to avoid a null name or illegal prefix
        //System.out.println("SPECIAL = "+SPECIAL);
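The safety checks and the component parsing documented earlier (splitting a name at each dangerous character) can be sketched as follows. This is a simplified illustration: the class `NameSplitter` and its methods are hypothetical, components are returned as strings rather than as a mixed array of characters and names, and the safe runs are not demangled.

```java
import java.util.ArrayList;
import java.util.List;

final class NameSplitter {
    // The dangerous characters listed in the documentation above.
    private static final String DANGEROUS = ".;:$[]<>/";

    /** True for any character that is not one of the dangerous characters. */
    static boolean isSafeChar(char c) {
        return DANGEROUS.indexOf(c) < 0;
    }

    /** True if the name is non-empty and all of its characters are safe. */
    static boolean isSafeName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isSafeChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Splits a name into one-character strings for each dangerous character
     * and the non-empty runs of safe characters between them.
     */
    static List<String> split(String name) {
        List<String> parts = new ArrayList<>();
        int runStart = 0;
        for (int i = 0; i <= name.length(); i++) {
            boolean atEnd = (i == name.length());
            // Got to end of string or next dangerous char.
            if (atEnd || !isSafeChar(name.charAt(i))) {
                if (i > runStart) {
                    parts.add(name.substring(runStart, i));
                }
                if (!atEnd) {
                    parts.add(String.valueOf(name.charAt(i)));
                }
                runStart = i + 1;
            }
        }
        return parts;
    }
}
```

With this sketch, `NameSplitter.split("java/lang/String")` produces the five components `java`, `/`, `lang`, `/`, `String`. Note that the backslash counts as a safe character here, which is what allows mangled runs to pass through intact.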