Scanner.java revision 408
/*
 * Copyright 1999-2008 Sun Microsystems, Inc. All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.
 */

/** The lexical analyzer maps an input stream consisting of
 *  ASCII characters and Unicode escapes into a token sequence.
 *
 *  <p><b>This is NOT part of any API supported by Sun Microsystems. If
 *  you write code that depends on this, you do so at your own risk.
 *  This code and its internal interfaces are subject to change or
 *  deletion without notice.</b>
 */

/** A factory for creating scanners. */
/** The context key for the scanner factory. */
/** Get the Factory instance for this context. */
/** Create a new scanner factory. */

/* Output variables; set by nextToken(): */

/** The token, set by nextToken(). */
/** Allow hex floating-point literals. */
/** Allow underscores in literals. */
/** The source language setting. */
/** The token's position, 0-based offset from beginning of text. */
/** Character position just after the last character of the token. */
/** The last character position of the previous token. */
/** The position where a lexical error occurred; */
/** The name of an identifier or token: */
/** The radix of a numeric literal token. */

/** Has a @deprecated tag been encountered in the last doc comment?
 *  This needs to be reset by the client.
 */

/** A character buffer for literals. */

/** The input buffer, index of next character to be read,
 *  index of one past last character in buffer.
 */

/** The buffer index of the last converted unicode character. */
/** The log to be used for error reporting. */
/** Common code for constructors. */

/** Create a scanner from the input buffer. The buffer must implement
 *  array() and compact(), and remaining() must be less than limit().
 */

/**
 * Create a scanner from the input array. This method might
 * modify the array. To avoid copying the input array, ensure
 * that {@code inputLength < input.length} or
 * {@code input[input.length -1]} is a white space character.
 *
 * @param fac the factory which created this Scanner
 * @param input the input, might be modified
 * @param inputLength the size of the input.
 * Must be positive and less than or equal to input.length.
 */

/** Report an error at the given position using the provided arguments. */

/** Report an error at the current token position using the provided
 *  arguments.
 */

/** Convert an ASCII digit from its base (8, 10, or 16)
 *  to its value.
 */

/** Convert unicode escape; bp points to initial '\' character. */
/** Read next character in comment, skipping over double '\' characters. */
/** Append a character to sbuf. */
/** For debugging purposes: print character. */
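The digit-conversion and Unicode-escape steps described above can be illustrated with a small stand-alone sketch. It assumes the JLS 3.3 rule that a backslash may begin an escape only when it is preceded by an even number of contiguous raw backslashes, and that one or more 'u' characters plus exactly four hex digits must follow. UnicodeEscapes, digit and translate are hypothetical names, not the Scanner's own methods; malformed escapes are simply copied through here, whereas a compiler would report them as errors.

```java
/**
 * Hypothetical sketch (not javac's implementation) of Unicode escape
 * translation following JLS 3.3.
 */
final class UnicodeEscapes {

    /** Value of an ASCII digit in the given base (8, 10 or 16), or -1 if it is not one. */
    static int digit(char c, int base) {
        int d;
        if ('0' <= c && c <= '9') {
            d = c - '0';
        } else if ('a' <= c && c <= 'f') {
            d = c - 'a' + 10;
        } else if ('A' <= c && c <= 'F') {
            d = c - 'A' + 10;
        } else {
            return -1;
        }
        return d < base ? d : -1;
    }

    /** Translate every eligible backslash-u escape in src; everything else is copied through. */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashRun = 0; // raw backslashes immediately before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // A backslash is eligible only after an even number of raw backslashes.
            if (c == '\\' && backslashRun % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                if (j > i + 1 && j + 4 <= src.length()) {
                    int value = 0;
                    boolean ok = true;
                    for (int k = j; k < j + 4 && ok; k++) {
                        int d = digit(src.charAt(k), 16);
                        ok = d >= 0;
                        value = value * 16 + d;
                    }
                    if (ok) {
                        out.append((char) value);
                        i = j + 4;
                        backslashRun = 0; // assumption: a translated char does not count as raw
                        continue;
                    }
                }
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("\\u0048i")); // prints Hi
    }
}
```

Counting raw backslashes, rather than just looking one character back, is what keeps a doubled backslash followed by u0041 from being treated as an escape.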
/** Read next character in character or string literal and copy into sbuf. */
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
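The case labels '0' through '7' above handle octal escapes in character and string literals. As a sketch of the rule from JLS 3.10.6 (not the Scanner's code), a leading digit 0-3 allows up to three octal digits and 4-7 allows at most two, so the value never exceeds 0377. OctalEscape and decode are hypothetical names, and the sketch assumes the backslash has already been consumed.

```java
/**
 * Hypothetical sketch of octal escape decoding (JLS 3.10.6), assuming the
 * backslash has already been consumed and s.charAt(start) is an octal digit.
 */
final class OctalEscape {

    /** Returns {charValue, indexAfterEscape}. */
    static int[] decode(CharSequence s, int start) {
        char lead = s.charAt(start);
        if (lead < '0' || lead > '7') {
            throw new IllegalArgumentException("not an octal digit: " + lead);
        }
        int value = lead - '0';
        // Leading 0-3 allows up to three digits; 4-7 allows two, so the value stays <= 0377.
        int maxDigits = (lead <= '3') ? 3 : 2;
        int i = start + 1;
        for (int n = 1; n < maxDigits && i < s.length(); n++) {
            char c = s.charAt(i);
            if (c < '0' || c > '7') {
                break; // the escape ends at the first non-octal character
            }
            value = value * 8 + (c - '0');
            i++;
        }
        return new int[] {value, i};
    }

    public static void main(String[] args) {
        int[] r = decode("101 rest", 0);
        System.out.println((char) r[0] + " ends at " + r[1]); // prints "A ends at 3"
    }
}
```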
case '|':
case ',':
case '?':
case '%':
case '^':
case '_':
case '{':
case '}':
case '!':
case '-':
case '=':
// Accept escape sequences for dangerous bytecode chars.
// This is illegal in normal Java string or character literals.
// Note that the escape sequence itself is passed through.

/** Read next character in an exotic name #"foo" */
// reject any "dangerous" char which is illegal somewhere in the JVM spec
case '/':
case '.':
case ';':
// illegal everywhere
case '<':
case '>':
// illegal in methods, dangerous in classes
case '[':
// illegal in classes

/** Read fractional part of hexadecimal floating point number. */

/** Read fractional part of floating point number. */
if ('0' <= ch && ch <= '9') {
if (ch == 'e' || ch == 'E') {
if (ch == '+' || ch == '-') {
if ('0' <= ch && ch <= '9') {
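The four checks above (fraction digits, 'e' or 'E', an optional sign, then at least one digit) can be mirrored on a plain string. This is a hypothetical sketch, not the Scanner's method: FractionScan and scanFraction are illustrative names, the input index is assumed to sit just past the decimal point, underscores are ignored, and -1 stands in for the point where a scanner would report a malformed floating-point literal.

```java
/**
 * Hypothetical sketch of the fraction/exponent checks: digits, then an
 * optional exponent made of 'e' or 'E', an optional sign, and at least one digit.
 */
final class FractionScan {

    static boolean isDigit(char c) {
        return '0' <= c && c <= '9';
    }

    /**
     * Scans from i (just past the '.') and returns the index after the fraction
     * and exponent, or -1 where a scanner would report a malformed literal.
     */
    static int scanFraction(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++; // fraction digits
        }
        if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            i++;
            if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
                i++; // optional exponent sign
            }
            if (i < s.length() && isDigit(s.charAt(i))) {
                while (i < s.length() && isDigit(s.charAt(i))) {
                    i++; // exponent digits
                }
            } else {
                return -1; // exponent without digits, e.g. "1.5e" or "1.5e+"
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(scanFraction("1.25e-3;", 2)); // prints 7
    }
}
```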
/** Read fractional part and 'd' or 'f' suffix of floating point number. */
if (ch == 'f' || ch == 'F') {
if (ch == 'd' || ch == 'D') {
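The 'f'/'F' and 'd'/'D' checks above decide between a float and a double token. A hypothetical sketch of the same classification on an already validated literal (FpSuffix, Kind, kindOf and valueOf are illustrative names; underscores are assumed to be removed already): a trailing 'f' or 'F' means float, anything else means double. Hexadecimal floating literals require a 'p' exponent, so a trailing 'f' is a suffix there too. Float.valueOf and Double.valueOf accept the same optional suffix letters, which makes them convenient for the value.

```java
/**
 * Hypothetical sketch: classify an already-validated floating-point literal by
 * its suffix. 'f' or 'F' gives float; 'd', 'D' or no suffix gives double.
 */
final class FpSuffix {

    enum Kind { FLOAT, DOUBLE }

    static Kind kindOf(String literal) {
        char last = literal.charAt(literal.length() - 1);
        // A valid literal ends in a digit or a suffix letter.
        return (last == 'f' || last == 'F') ? Kind.FLOAT : Kind.DOUBLE;
    }

    /** Parses the literal as Float or Double; both valueOf methods accept f/F/d/D suffixes. */
    static Number valueOf(String literal) {
        // if/else rather than ?: so a Float result is not promoted to Double.
        if (kindOf(literal) == Kind.FLOAT) {
            return Float.valueOf(literal);
        }
        return Double.valueOf(literal);
    }

    public static void main(String[] args) {
        System.out.println(valueOf("0x1p3f")); // prints 8.0
    }
}
```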
/** Read fractional part and 'd' or 'f' suffix of floating point number. */

* @param radix The radix of the number; one of 2, 8, 10, 16.
// for octal, allow base-10 digit in case it's a float literal
(ch == 'e' || ch == 'E' || ch == 'f' || ch == 'F' || ch == 'd' || ch == 'D')) {
if (ch == 'l' || ch == 'L') {
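The number-scanning fragment above takes a radix, scans octal digits as base 10 in case the text turns out to be a float literal, and treats a trailing 'l' or 'L' as a long suffix. Consistent with the later note that leading 0x and 'l' suffixes are suppressed from a literal's value, this hypothetical sketch keeps only the digits. NumberScan, Kind, Result and scanNumber are illustrative names, not the Scanner's API; the sketch does not reject 8 or 9 in an octal integer and hands off to fraction scanning only by returning FLOATING.

```java
/**
 * Hypothetical sketch of scanning the digits of a number whose radix is
 * already known, then classifying it by what follows the digits.
 */
final class NumberScan {

    enum Kind { INT, LONG, FLOATING }

    static final class Result {
        final Kind kind;
        final String value; // digits only: no radix prefix, no 'l'/'L' suffix
        final int end;      // index just past the scanned text

        Result(Kind kind, String value, int end) {
            this.kind = kind;
            this.value = value;
            this.end = end;
        }
    }

    /** Value of an ASCII digit in the given radix, or -1. */
    static int digit(char c, int radix) {
        return c < 128 ? Character.digit(c, radix) : -1;
    }

    static Result scanNumber(String s, int start, int radix) {
        // For octal, accept base-10 digits in case this is a float literal such as 09.5.
        int digitRadix = (radix == 8) ? 10 : radix;
        int i = start;
        while (i < s.length() && digit(s.charAt(i), digitRadix) >= 0) {
            i++;
        }
        String digits = s.substring(start, i);
        if (digitRadix == 10 && i < s.length() && "eEfFdD.".indexOf(s.charAt(i)) >= 0) {
            return new Result(Kind.FLOATING, digits, i); // fraction/suffix scanning would take over
        }
        if (i < s.length() && (s.charAt(i) == 'l' || s.charAt(i) == 'L')) {
            return new Result(Kind.LONG, digits, i + 1); // suffix consumed but kept out of value
        }
        return new Result(Kind.INT, digits, i);
    }

    public static void main(String[] args) {
        Result r = scanNumber("ffL", 0, 16);
        System.out.println(r.kind + " " + Long.parseLong(r.value, 16)); // prints "LONG 255"
    }
}
```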
// optimization, was: putChar(ch);
case 'A':
case 'B':
case 'C':
case 'D':
case 'E':
case 'F':
case 'G':
case 'H':
case 'I':
case 'J':
case 'K':
case 'L':
case 'M':
case 'N':
case 'O':
case 'P':
case 'Q':
case 'R':
case 'S':
case 'T':
case 'U':
case 'V':
case 'W':
case 'X':
case 'Y':
case 'a':
case 'b':
case 'c':
case 'd':
case 'e':
case 'f':
case 'g':
case 'h':
case 'i':
case 'j':
case 'k':
case 'l':
case 'm':
case 'n':
case 'o':
case 'p':
case 'q':
case 'r':
case 's':
case 't':
case 'u':
case 'v':
case 'w':
case 'x':
case 'y':
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
case '\u0000':
case '\u0001':
case '\u0002':
case '\u0003':
case '\u0004':
case '\u0005':
case '\u0006':
case '\u0007':
case '\u0008':
case '\u000E':
case '\u000F':
case '\u0010':
case '\u0011':
case '\u0012':
case '\u0013':
case '\u0014':
case '\u0015':
case '\u0016':
case '\u0017':
case '\u0018':
case '\u0019':
case '\u001B':
case '\u001A':
// EOI is also a legal identifier part

// all ASCII range chars already handled, above

/** Are surrogates supported? */

/** Scan surrogate pairs. If 'ch' is a high surrogate and
 *  the next character is a low surrogate, then put the low
 *  surrogate in 'ch', and return the high surrogate.
 *  Otherwise, just return 0.
 */

/** Return true if ch can be part of an operator. */
case '!':
case '%':
case '&':
case '*':
case '?':
case '+':
case '-':
case ':':
case '<':
case '=':
case '>':
case '^':
case '|':
case '~':
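The case labels above list the characters that can be part of an operator. This hypothetical sketch pairs that predicate with a greedy scan that keeps extending the candidate while the longer text is still a known operator. OperatorScan, OPERATORS and scanOperator are illustrative names, the operator table is an assumption (only operators spelled entirely with the listed characters; '/' is not in the list), and javac's real lookup goes through its own name and keyword tables.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class OperatorScan {

    /** True for exactly the characters in the case labels above. */
    static boolean isSpecial(char c) {
        switch (c) {
            case '!': case '%': case '&': case '*': case '?':
            case '+': case '-': case ':': case '<': case '=':
            case '>': case '^': case '|': case '~':
                return true;
            default:
                return false;
        }
    }

    // Hypothetical table: operators spelled only with isSpecial characters.
    private static final Set<String> OPERATORS = new HashSet<>(Arrays.asList(
        "!", "%", "&", "*", "?", "+", "-", ":", "<", "=", ">", "^", "|", "~",
        "==", "<=", ">=", "!=", "&&", "||", "++", "--",
        "+=", "-=", "*=", "&=", "|=", "^=", "%=",
        "<<", ">>", ">>>", "<<=", ">>=", ">>>="));

    /** Operator starting at s[start], extended while the result is still an operator; null if none. */
    static String scanOperator(String s, int start) {
        int end = start;
        String op = null;
        while (end < s.length() && isSpecial(s.charAt(end))) {
            String candidate = s.substring(start, end + 1);
            if (!OPERATORS.contains(candidate)) {
                break; // the longer text is not an operator; keep the previous match
            }
            op = candidate;
            end++;
        }
        return op;
    }

    public static void main(String[] args) {
        System.out.println(scanOperator("x >>>= 2", 2)); // prints >>>=
    }
}
```

Stopping at the first extension that is not an operator is why "a-->b" scans as "--" followed by ">".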
/** Read longest possible sequence of special characters and convert
 *  to token.
 */

/**
 * Scan a documentation comment; determine if a deprecated tag is present.
 * Called once the initial '/' and '*' have been skipped, positioned at the
 * second '*' (which is treated as the beginning of the first line).
 * Stops positioned at the closing '/'.
 */
// Skip optional WhiteSpace at beginning of line
// Skip optional consecutive Stars
// Skip optional WhiteSpace after Stars
// At beginning of line in the JavaDoc sense.
/* fall through to LF case */

/** The value of a literal token, recorded as a string.
 *  For integers, leading 0x and 'l' suffixes are suppressed.
 */
}
while (ch == ' ' || ch == '\t' || ch == FF);
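The documentation-comment scan described earlier (skip whitespace at the start of a line, skip consecutive stars, skip whitespace again, then look for the tag at the JavaDoc beginning of line) can be sketched on a string. DeprecatedTag, isBlank and hasDeprecatedTag are hypothetical names; the body is assumed to start at the second '*' of the opening delimiter and to exclude the closing delimiter, and the tag is assumed to be followed by whitespace or the end of the comment.

```java
/**
 * Hypothetical sketch of deprecated-tag detection in a doc comment body
 * that starts at the second '*' of the opening delimiter.
 */
final class DeprecatedTag {

    private static final char FF = '\f';
    private static final String TAG = "@deprecated";

    static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == FF;
    }

    static boolean hasDeprecatedTag(String body) {
        int n = body.length();
        int i = 0;
        while (i < n) {
            while (i < n && isBlank(body.charAt(i))) {
                i++; // skip optional whitespace at beginning of line
            }
            while (i < n && body.charAt(i) == '*') {
                i++; // skip optional consecutive stars
            }
            while (i < n && isBlank(body.charAt(i))) {
                i++; // skip optional whitespace after stars
            }
            // Now at the beginning of a line in the JavaDoc sense.
            if (body.startsWith(TAG, i)) {
                int after = i + TAG.length();
                if (after == n || Character.isWhitespace(body.charAt(after))) {
                    return true;
                }
            }
            while (i < n && body.charAt(i) != '\n' && body.charAt(i) != '\r') {
                i++; // skip the rest of the line
            }
            if (i < n && body.charAt(i) == '\r') {
                i++;
            }
            if (i < n && body.charAt(i) == '\n') {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDeprecatedTag("*\n * @deprecated use bar()\n ")); // prints true
    }
}
```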
case 'A':
case 'B':
case 'C':
case 'D':
case 'E':
case 'F':
case 'G':
case 'H':
case 'I':
case 'J':
case 'K':
case 'L':
case 'M':
case 'N':
case 'O':
case 'P':
case 'Q':
case 'R':
case 'S':
case 'T':
case 'U':
case 'V':
case 'W':
case 'X':
case 'Y':
case 'a':
case 'b':
case 'c':
case 'd':
case 'e':
case 'f':
case 'g':
case 'h':
case 'i':
case 'j':
case 'k':
case 'l':
case 'm':
case 'n':
case 'o':
case 'p':
case 'q':
case 'r':
case 's':
case 't':
case 'u':
case 'v':
case 'w':
case 'x':
case 'y':
if (ch == 'x' || ch == 'X') {
} else if (digit(16) < 0) {
} else if (ch == 'b' || ch == 'B') {
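The prefix checks above pick the radix of a literal that starts with '0': 0x or 0X means hexadecimal and needs at least one hex digit after it, 0b or 0B means binary, and any other leading zero goes on as octal. A hypothetical sketch of that decision on a string; RadixPrefix, detect and afterPrefix are illustrative names, and for simplicity it rejects hexadecimal literals that start with a point (such as 0x.8p1) even though Java accepts them.

```java
/**
 * Hypothetical sketch of choosing the radix from a numeric literal's prefix.
 * Returns {radix, indexOfFirstDigit}.
 */
final class RadixPrefix {

    private static boolean isDigitOf(char c, int radix) {
        if (radix == 2) {
            return c == '0' || c == '1';
        }
        // radix 16
        return ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F');
    }

    static int[] detect(String s) {
        if (s.isEmpty() || s.charAt(0) < '0' || s.charAt(0) > '9') {
            throw new IllegalArgumentException("not a numeric literal: " + s);
        }
        if (s.charAt(0) != '0') {
            return new int[] {10, 0};
        }
        if (s.length() > 1) {
            char c = s.charAt(1);
            if (c == 'x' || c == 'X') {
                return afterPrefix(s, 16);
            }
            if (c == 'b' || c == 'B') {
                return afterPrefix(s, 2);
            }
        }
        // Plain leading zero: octal, keeping the '0' (this also covers "0" and "0.5").
        return new int[] {8, 0};
    }

    private static int[] afterPrefix(String s, int radix) {
        // At least one digit of the radix must follow the prefix.
        if (s.length() < 3 || !isDigitOf(s.charAt(2), radix)) {
            throw new IllegalArgumentException(
                "invalid " + (radix == 16 ? "hex" : "binary") + " number: " + s);
        }
        return new int[] {radix, 2};
    }

    public static void main(String[] args) {
        int[] r = detect("0x1F");
        System.out.println(Integer.parseInt("0x1F".substring(r[1]), r[0])); // prints 31
    }
}
```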
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
if ('0' <= ch && ch <= '9') {
// all ASCII range chars already handled, above

/** Return the current token, set by nextToken(). */

/** Sets the current token. */

/** Return the current token's position: a 0-based
 *  offset from beginning of the raw input stream
 *  (before unicode translation).
 */

/** Return the last character position of the current token. */

/** Return the last character position of the previous token. */

/** Return the position where a lexical error occurred; */

/** Set the position where a lexical error occurred; */

/** Return the name of an identifier or token for the current token. */

/** Return the radix of a numeric literal token. */

/** Has a @deprecated tag been encountered in the last doc comment?
 *  This needs to be reset by the client with resetDeprecatedFlag.
 */

/**
 * Returns the documentation string of the current token.
 */

/**
 * Returns a copy of the input buffer, up to its inputLength.
 * Unicode escape sequences are not translated.
 */

/**
 * Returns a copy of a character array subset of the input buffer.
 * The returned array begins at the <code>beginIndex</code> and
 * extends to the character at index <code>endIndex - 1</code>.
 * Thus the length of the substring is <code>endIndex-beginIndex</code>.
 * This behavior is like
 * <code>String.substring(beginIndex, endIndex)</code>.
 * Unicode escape sequences are not translated.
 *
 * @param beginIndex the beginning index, inclusive.
 * @param endIndex the ending index, exclusive.
 * @throws IndexOutOfBoundsException if either offset is outside of the
 *         array bounds.
 */

/**
 * Called when a complete comment has been scanned. pos and endPos
 * will mark the comment boundary.
 */

/**
 * Called when a complete whitespace run has been scanned. pos and endPos
 * will mark the whitespace boundary.
 */

/**
 * Called when a line terminator has been processed.
 */

/** Build a map for translating between line numbers and
 *  positions in the input.
 */
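One common way to build a map between line numbers and input positions is to record the offset at which each line starts and answer queries with a binary search. The sketch below assumes that approach; LineMap, lineOf and columnOf are hypothetical names, not javac's line-map API. It treats CR, LF and CR LF as line terminators, with CR LF counted once, and numbers lines and columns from 1.

```java
import java.util.Arrays;

/** Hypothetical line map: lineStarts[k] is the offset of the first char of line k + 1. */
final class LineMap {

    private final int[] lineStarts;

    LineMap(char[] buf, int len) {
        int[] starts = new int[16];
        int count = 0;
        starts[count++] = 0;
        for (int i = 0; i < len; i++) {
            char c = buf[i];
            if (c == '\r' && i + 1 < len && buf[i + 1] == '\n') {
                continue; // CR LF counts once, at the LF
            }
            if (c == '\r' || c == '\n') {
                if (count == starts.length) {
                    starts = Arrays.copyOf(starts, count * 2);
                }
                starts[count++] = i + 1;
            }
        }
        lineStarts = Arrays.copyOf(starts, count);
    }

    /** 1-based line containing pos. */
    int lineOf(int pos) {
        int idx = Arrays.binarySearch(lineStarts, pos);
        // Not found: -(insertionPoint) - 1, and the insertion point is the 1-based line.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    /** 1-based column of pos within its line. */
    int columnOf(int pos) {
        return pos - lineStarts[lineOf(pos) - 1] + 1;
    }

    public static void main(String[] args) {
        String text = "ab\ncd\r\nef";
        LineMap m = new LineMap(text.toCharArray(), text.length());
        System.out.println(m.lineOf(8) + ":" + m.columnOf(8)); // prints 3:2
    }
}
```

A sorted array of line starts keeps the map compact and makes each position lookup logarithmic in the number of lines.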