 * reserved comment block
 * DO NOT REMOVE OR ALTER!
 * Copyright 1999-2002,2004,2005 The Apache Software Foundation.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * A regular expression matching engine using a Non-deterministic Finite Automaton (NFA).
 * This engine does not conform to the POSIX regular expression standard.
 * RegularExpression re = new RegularExpression(<var>regex</var>);
 * if (re.matches(text)) { ... }
 * <dt>B. Capturing groups
 * RegularExpression re = new RegularExpression(<var>regex</var>);
 * Match match = new Match();
 * if (re.matches(text, match)) {
 *     ... // You can refer to the captured text with methods of the <code>Match</code> class.
 * }
 * <h4>Case-insensitive matching</h4>
 * RegularExpression re = new RegularExpression(<var>regex</var>, "i");
 * if (re.matches(text)) { ... }
 * <p>You can specify options to <a href="#RegularExpression(java.lang.String, java.lang.String)"><code>RegularExpression(</code><var>regex</var><code>, </code><var>options</var><code>)</code></a>
 * or <a href="#setPattern(java.lang.String, java.lang.String)"><code>setPattern(</code><var>regex</var><code>, </code><var>options</var><code>)</code></a>.
 * The <var>options</var> parameter consists of the following characters.
 * <dt><a name="I_OPTION"><code>"i"</code></a>
 * <dd>This option indicates case-insensitive matching.
 * <dt><a name="M_OPTION"><code>"m"</code></a>
 * <dd class="REGEX"><kbd>^</kbd> and <kbd>$</kbd> consider the EOL characters within the text.
 * <dt><a name="S_OPTION"><code>"s"</code></a>
 * <dd class="REGEX"><kbd>.</kbd> matches any one character.
 * <dt><a name="U_OPTION"><code>"u"</code></a>
 * <dd class="REGEX">Redefines <kbd>\d \D \w \W \s \S \b \B \< \></kbd> to conform to Unicode.
 * <dt><a name="W_OPTION"><code>"w"</code></a>
 * <dd class="REGEX">With this option, <kbd>\b \B \< \></kbd> are processed with the method of
 * 'Unicode Regular Expression Guidelines' Revision 4.
 * When "w" and "u" are specified at the same time,
 * <kbd>\b \B \< \></kbd> are processed according to the "w" option.
 * <dt><a name="COMMA_OPTION"><code>","</code></a>
 * <dd>The parser treats a comma in a character class as a range separator.
 * <kbd class="REGEX">[a,b]</kbd> matches <kbd>a</kbd> or <kbd>,</kbd> or <kbd>b</kbd> without this option.
 * <kbd class="REGEX">[a,b]</kbd> matches <kbd>a</kbd> or <kbd>b</kbd> with this option.
 * <dt><a name="X_OPTION"><code>"X"</code></a>
 * <dd>The <code>matches()</code> method does not do substring matching
 * but entire-string matching.
 * <table border="1" bgcolor="#ddeeff">
 * <h4>Differences from the Perl 5 regular expression</h4>
 * <li>Supports a 6-digit hexadecimal character representation (<kbd>\u005cv</kbd><var>HHHHHH</var>).
 * <li>Supports subtraction, union, and intersection operations for character classes.
 * <li>Not supported: <kbd>\</kbd><var>ooo</var> (octal character representations),
 * <kbd>\G</kbd>, <kbd>\C</kbd>, <kbd>\l</kbd><var>c</var>,
 * <kbd>\u005c u</kbd><var>c</var>, <kbd>\L</kbd>, <kbd>\U</kbd>,
 * <kbd>\E</kbd>, <kbd>\Q</kbd>, <kbd>\N{</kbd><var>name</var><kbd>}</kbd>,
 * <kbd>(?{</kbd><var>code</var><kbd>})</kbd>, <kbd>(??{</kbd><var>code</var><kbd>})</kbd>
 * <p>Meta characters are '<kbd>. * + ? { [ ( ) | \ ^ $</kbd>'.</p>
 * <dt class="REGEX"><kbd>.</kbd> (A period)
 * <dd>Matches any one character except the following characters.
 * <dd>LINE FEED (U+000A), CARRIAGE RETURN (U+000D),
 * PARAGRAPH SEPARATOR (U+2029), LINE SEPARATOR (U+2028)
 * <dd>This expression matches one code point in Unicode. It can match a pair of surrogates.
 * <dd>When <a href="#S_OPTION">the "s" option</a> is specified,
 * it matches any character, including the above four characters.
 * <dt class="REGEX"><kbd>\e \f \n \r \t</kbd>
 * <dd>Matches ESCAPE (U+001B), FORM FEED (U+000C), LINE FEED (U+000A),
 * CARRIAGE RETURN (U+000D), HORIZONTAL TABULATION (U+0009)
 * <dt class="REGEX"><kbd>\c</kbd><var>C</var>
 * <dd>Matches a control character.
 * The <var>C</var> must be one of '<kbd>@</kbd>', '<kbd>A</kbd>'-'<kbd>Z</kbd>',
 * '<kbd>[</kbd>', '<kbd>\u005c</kbd>', '<kbd>]</kbd>', '<kbd>^</kbd>', '<kbd>_</kbd>'.
 * It matches the control character whose character code is 0x0040 less than
 * the character code of <var>C</var>.
 * <dd class="REGEX">For example, <kbd>\cJ</kbd> matches LINE FEED (U+000A),
 * and <kbd>\c[</kbd> matches ESCAPE (U+001B).
 * <dt class="REGEX">a non-meta character
 * <dd>Matches the character.
 * <dt class="REGEX"><kbd>\</kbd> + a meta character
 * <dd>Matches the meta character.
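The period and <kbd>\c</kbd><var>C</var> rules above are simple enough to restate directly in code. The following is a minimal sketch; the class CharRules and its methods are hypothetical helpers written for illustration and are not part of the RegularExpression API. dotMatches takes a code point (so a surrogate pair counts as one character) and a boolean standing in for the "s" option.

```java
/**
 * Hypothetical helpers (not part of the RegularExpression API) restating two
 * single-character rules from the documentation.
 */
final class CharRules {

    private CharRules() {
    }

    /**
     * Returns true if '.' would match the given code point.
     * singleLine stands in for the "s" option.
     */
    static boolean dotMatches(int codePoint, boolean singleLine) {
        if (singleLine) {
            return true; // with "s", '.' matches every character
        }
        // Without "s", the four EOL characters are excluded.
        return codePoint != 0x000A      // LINE FEED
            && codePoint != 0x000D      // CARRIAGE RETURN
            && codePoint != 0x2028      // LINE SEPARATOR
            && codePoint != 0x2029;     // PARAGRAPH SEPARATOR
    }

    /** Returns the control character denoted by \cC, that is, C minus 0x40. */
    static char controlChar(char c) {
        // C must lie in '@' (0x40) .. '_' (0x5F): '@', 'A'-'Z', '[', '\', ']', '^', '_'.
        if (c < '@' || c > '_') {
            throw new IllegalArgumentException("Invalid control escape: \\c" + c);
        }
        return (char) (c - 0x40);
    }
}
```

For example, CharRules.controlChar('J') yields LINE FEED and CharRules.controlChar('[') yields ESCAPE, as in the examples above.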
 * <dt class="REGEX"><kbd>\u005cx</kbd><var>HH</var> <kbd>\u005cx{</kbd><var>HHHH</var><kbd>}</kbd>
 * <dd>Matches the character whose Unicode code point is <var>HH</var> (hexadecimal).
 * <kbd>\u005cx</kbd><var>HH</var> takes exactly 2 digits, and
 * <kbd>\u005cx{</kbd><var>HHHH</var><kbd>}</kbd> takes a variable number of digits.
 * <dt class="REGEX"><kbd>\u005c u</kbd><var>HHHH</var>
 * <dd>Matches the character whose Unicode code point is <var>HHHH</var> (hexadecimal).
 * <dt class="REGEX"><kbd>\u005cv</kbd><var>HHHHHH</var>
 * <dd>Matches the character whose Unicode code point is <var>HHHHHH</var> (hexadecimal).
 * <dt class="REGEX"><kbd>\g</kbd>
 * <dd>Matches a grapheme.
 * <dd class="REGEX">It is equivalent to <kbd>(?[\p{ASSIGNED}]-[\p{M}\p{C}])?(?:\p{M}|[\x{094D}\x{09CD}\x{0A4D}\x{0ACD}\x{0B3D}\x{0BCD}\x{0C4D}\x{0CCD}\x{0D4D}\x{0E3A}\x{0F84}]\p{L}|[\x{1160}-\x{11A7}]|[\x{11A8}-\x{11FF}]|[\x{FF9E}\x{FF9F}])*</kbd>
 * <dt class="REGEX"><kbd>\X</kbd>
 * <dd class="REGEX">Matches a combining character sequence.
 * It is equivalent to <kbd>(?:\PM\pM*)</kbd>
 * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without the <a href="#COMMA_OPTION">"," option</a>)
 * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with the <a href="#COMMA_OPTION">"," option</a>)
 * <dd>Positive character class. It matches a character in any of the ranges.
 * <dd><var>R<sub>n</sub></var>:
 * <li class="REGEX">A character (including <kbd>\e \f \n \r \t</kbd> <kbd>\u005cx</kbd><var>HH</var> <kbd>\u005cx{</kbd><var>HHHH</var><kbd>}</kbd> <!--kbd>\u005c u</kbd><var>HHHH</var--> <kbd>\u005cv</kbd><var>HHHHHH</var>)
 * <p>This range matches the character.
 * <li class="REGEX"><var>C<sub>1</sub></var><kbd>-</kbd><var>C<sub>2</sub></var>
 * <p>This range matches a character whose code point is >= <var>C<sub>1</sub></var>'s code point and <= <var>C<sub>2</sub></var>'s code point.
 * <li class="REGEX">A POSIX character class: <kbd>[:alpha:] [:alnum:] [:ascii:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]</kbd>,
 * and Perl-style negated POSIX character classes such as <kbd>[:^alpha:]</kbd>
 * <li class="REGEX"><kbd>\d \D \s \S \w \W \p{</kbd><var>name</var><kbd>} \P{</kbd><var>name</var><kbd>}</kbd>
 * <p>These expressions specify the same ranges as the following expressions.
 * <p class="REGEX">Enumerated ranges are merged (union operation).
 * For example, <kbd>[a-ec-z]</kbd> is equivalent to <kbd>[a-z]</kbd>.
 * <dt class="REGEX"><kbd>[^</kbd><var>R<sub>1</sub></var><var>R<sub>2</sub></var><var>...</var><var>R<sub>n</sub></var><kbd>]</kbd> (without the <a href="#COMMA_OPTION">"," option</a>)
 * <dt class="REGEX"><kbd>[^</kbd><var>R<sub>1</sub></var><kbd>,</kbd><var>R<sub>2</sub></var><kbd>,</kbd><var>...</var><kbd>,</kbd><var>R<sub>n</sub></var><kbd>]</kbd> (with the <a href="#COMMA_OPTION">"," option</a>)
 * <dd>Negative character class. It matches a character that is not in any of the ranges.
 * <dt class="REGEX"><kbd>(?[</kbd><var>ranges</var><kbd>]</kbd><var>op</var><kbd>[</kbd><var>ranges</var><kbd>]</kbd><var>op</var><kbd>[</kbd><var>ranges</var><kbd>]</kbd> ... <kbd>)</kbd>
 * (<var>op</var> is <kbd>-</kbd> or <kbd>+</kbd> or <kbd>&</kbd>.)
 * <dd>Subtraction, union, or intersection of character classes.
 * <dd class="REGEX">For example, <kbd>(?[A-Z]-[CF])</kbd> is equivalent to <kbd>[A-BD-EG-Z]</kbd>, and <kbd>(?[\x00-\x7f]-[K]&[\p{Lu}])</kbd> is equivalent to <kbd>[A-JL-Z]</kbd>.
 * <dd>The result of these operations is a <u>positive character class</u>
 * even if an expression includes negative character classes.
 * Take care with this in case-insensitive matching.
 * For instance, <kbd>(?[^b])</kbd> is equivalent to <kbd>[\x00-ac-\x{10ffff}]</kbd>,
 * which is equivalent to <kbd>[^b]</kbd> in case-sensitive matching.
 * However, in case-insensitive matching, <kbd>(?[^b])</kbd> matches any character because
 * it includes '<kbd>B</kbd>' and '<kbd>B</kbd>' matches '<kbd>b</kbd>',
 * whereas <kbd>[^b]</kbd> is processed as <kbd>[^Bb]</kbd>.
 * <dt class="REGEX"><kbd>[</kbd><var>R<sub>1</sub>R<sub>2</sub>...</var><kbd>-[</kbd><var>R<sub>n</sub>R<sub>n+1</sub>...</var><kbd>]]</kbd> (with the <a href="#X_OPTION">"X" option</a>)</dt>
 * <dd>Character class subtraction for XML Schema.
 * You can use this syntax when you specify the <a href="#X_OPTION">"X" option</a>.
 * <dt class="REGEX"><kbd>\d</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[0-9]</kbd>.
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>\p{Nd}</kbd></span>.
 * <dt class="REGEX"><kbd>\D</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[^0-9]</kbd>
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>\P{Nd}</kbd></span>.
 * <dt class="REGEX"><kbd>\s</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[ \f\n\r\t]</kbd>
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>[ \f\n\r\t\p{Z}]</kbd></span>.
 * <dt class="REGEX"><kbd>\S</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[^ \f\n\r\t]</kbd>
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>[^ \f\n\r\t\p{Z}]</kbd></span>.
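The union (merging) and subtraction rules for character classes can be illustrated with a sorted list of inclusive code-point intervals. This is a minimal sketch under that assumption; RangeSet is a hypothetical class, not the engine's own character-class implementation, and it covers only union (add) and subtraction, not intersection or case folding.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical range set: sorted, disjoint, non-adjacent inclusive
 * [start, end] code-point intervals.
 */
final class RangeSet {
    private final List<int[]> ranges = new ArrayList<>();

    /** Union: adds [start, end] and merges overlapping or adjacent intervals. */
    RangeSet add(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start > end");
        }
        List<int[]> result = new ArrayList<>();
        int s = start;
        int e = end;
        for (int[] r : ranges) {
            if (r[1] + 1 < s || e + 1 < r[0]) {
                result.add(r);          // disjoint and not adjacent: keep as is
            } else {
                s = Math.min(s, r[0]);  // overlapping or adjacent: absorb it
                e = Math.max(e, r[1]);
            }
        }
        result.add(new int[] {s, e});
        result.sort((a, b) -> Integer.compare(a[0], b[0]));
        ranges.clear();
        ranges.addAll(result);
        return this;
    }

    /** Subtraction: removes every code point of other from this set. */
    RangeSet subtract(RangeSet other) {
        List<int[]> result = new ArrayList<>();
        for (int[] r : ranges) {
            int s = r[0];
            int e = r[1];
            for (int[] o : other.ranges) {
                if (o[1] < s || o[0] > e) {
                    continue;                           // no overlap with what is left
                }
                if (o[0] > s) {
                    result.add(new int[] {s, o[0] - 1}); // keep the part before o
                }
                s = o[1] + 1;                           // resume after o
                if (s > e) {
                    break;
                }
            }
            if (s <= e) {
                result.add(new int[] {s, e});
            }
        }
        ranges.clear();
        ranges.addAll(result);
        return this;
    }

    /** Renders the set in character-class notation, e.g. [A-BD-EG-Z]. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int[] r : ranges) {
            sb.appendCodePoint(r[0]);
            if (r[1] != r[0]) {
                sb.append('-').appendCodePoint(r[1]);
            }
        }
        return sb.append(']').toString();
    }
}
```

With this sketch, adding a-e and then c-z renders as [a-z], and [A-Z] minus {C, F} renders as [A-BD-EG-Z], matching the examples above.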
 * <dt class="REGEX"><kbd>\w</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[a-zA-Z0-9_]</kbd>
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>[\p{Lu}\p{Ll}\p{Lo}\p{Nd}_]</kbd></span>.
 * <dt class="REGEX"><kbd>\W</kbd>
 * <dd class="REGEX">Equivalent to <kbd>[^a-zA-Z0-9_]</kbd>
 * <dd>When <a href="#U_OPTION">the "u" option</a> is set, it is equivalent to
 * <span class="REGEX"><kbd>[^\p{Lu}\p{Ll}\p{Lo}\p{Nd}_]</kbd></span>.
 * <dt class="REGEX"><kbd>\p{</kbd><var>name</var><kbd>}</kbd>
 * <dd>Matches one character in the specified General Category or the specified Block.
 * The following names are available:
 * <dt>Unicode General Categories:
 * L, M, N, Z, C, P, S, Lu, Ll, Lt, Lm, Lo, Mn, Me, Mc, Nd, Nl, No, Zs, Zl, Zp,
 * Cc, Cf, Cn, Co, Cs, Pd, Ps, Pe, Pc, Po, Sm, Sc, Sk, So
 * <dd>(Currently the Cn category includes U+10000-U+10FFFF characters)
 * <dt>Unicode Blocks:
 * Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B,
 * IPA Extensions, Spacing Modifier Letters, Combining Diacritical Marks, Greek,
 * Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati,
 * Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Georgian,
 * Hangul Jamo, Latin Extended Additional, Greek Extended, General Punctuation,
 * Superscripts and Subscripts, Currency Symbols, Combining Marks for Symbols,
 * Letterlike Symbols, Number Forms, Arrows, Mathematical Operators,
 * Miscellaneous Technical, Control Pictures, Optical Character Recognition,
 * Enclosed Alphanumerics, Box Drawing, Block Elements, Geometric Shapes,
 * Miscellaneous Symbols, Dingbats, CJK Symbols and Punctuation, Hiragana,
 * Katakana, Bopomofo, Hangul Compatibility Jamo, Kanbun,
 * Enclosed CJK Letters and Months, CJK Compatibility, CJK Unified Ideographs,
 * Hangul Syllables, High Surrogates, High Private Use Surrogates, Low Surrogates,
 * Private Use,
 * CJK Compatibility Ideographs, Alphabetic Presentation Forms,
 * Arabic Presentation Forms-A, Combining Half Marks, CJK Compatibility Forms,
 * Small Form Variants, Arabic Presentation Forms-B, Specials,
 * Halfwidth and Fullwidth Forms
 * <dt>Other properties:
 * <dd><kbd>ALL</kbd> (Equivalent to <kbd>[\u005cu0000-\u005cv10FFFF]</kbd>)
 * <dd><kbd>ASSIGNED</kbd> (<kbd>\p{ASSIGNED}</kbd> is equivalent to <kbd>\P{Cn}</kbd>)
 * <dd><kbd>UNASSIGNED</kbd>
 * (<kbd>\p{UNASSIGNED}</kbd> is equivalent to <kbd>\p{Cn}</kbd>)
 * <dt class="REGEX"><kbd>\P{</kbd><var>name</var><kbd>}</kbd>
 * <dd>Matches one character not in the specified General Category or the specified Block.
 * <li>Selection and Quantifier
 * <dt class="REGEX"><var>X</var><kbd>|</kbd><var>Y</var>
 * <dt class="REGEX"><var>X</var><kbd>*</kbd>
 * <dd>Matches 0 or more <var>X</var>.
 * <dt class="REGEX"><var>X</var><kbd>+</kbd>
 * <dd>Matches 1 or more <var>X</var>.
 * <dt class="REGEX"><var>X</var><kbd>?</kbd>
 * <dd>Matches 0 or 1 <var>X</var>.
 * <dt class="REGEX"><var>X</var><kbd>{</kbd><var>number</var><kbd>}</kbd>
 * <dd>Matches <var>X</var> exactly <var>number</var> times.
 * <dt class="REGEX"><var>X</var><kbd>{</kbd><var>min</var><kbd>,}</kbd>
 * <dt class="REGEX"><var>X</var><kbd>{</kbd><var>min</var><kbd>,</kbd><var>max</var><kbd>}</kbd>
 * <dt class="REGEX"><var>X</var><kbd>*?</kbd>
 * <dt class="REGEX"><var>X</var><kbd>+?</kbd>
 * <dt class="REGEX"><var>X</var><kbd>??</kbd>
 * <dt class="REGEX"><var>X</var><kbd>{</kbd><var>min</var><kbd>,}?</kbd>
 * <dt class="REGEX"><var>X</var><kbd>{</kbd><var>min</var><kbd>,</kbd><var>max</var><kbd>}?</kbd>
 * <dd>Non-greedy matching.
 * <li>Grouping, Capturing, and Back-reference
 * <dt class="REGEX"><kbd>(?:</kbd><var>X</var><kbd>)</kbd>
 * <dd>Grouping. "<kbd>foo+</kbd>" matches "<kbd>foo</kbd>" or "<kbd>foooo</kbd>".
 * To match "<kbd>foofoo</kbd>" or "<kbd>foofoofoo</kbd>",
 * write "<kbd>(?:foo)+</kbd>".
 * <dt class="REGEX"><kbd>(</kbd><var>X</var><kbd>)</kbd>
 * <dd>Grouping with capturing.
 * It makes a group, and applications can find out
 * where in the target text a group matched by using methods of a <code>Match</code> instance
 * after <code><a href="#matches(java.lang.String, com.sun.org.apache.xerces.internal.utils.regex.Match)">matches(String,Match)</a></code>.
 * Group 0 is the whole regular expression.
 * The <var>N</var>th group is the subexpression inside the <var>N</var>th capturing left parenthesis.
 * <p>For instance, if the regular expression is
 * "<FONT color=blue><KBD> *([^<:]*) +<([^>]*)> *</KBD></FONT>"
 * and the target text is
 * "<FONT color=red><KBD>From: TAMURA Kent <kent@trl.ibm.co.jp></KBD></FONT>":
 * <li><code>Match.getCapturedText(0)</code>:
 * "<FONT color=red><KBD> TAMURA Kent <kent@trl.ibm.co.jp></KBD></FONT>"
 * <li><code>Match.getCapturedText(1)</code>: "<FONT color=red><KBD>TAMURA Kent</KBD></FONT>"
 * <li><code>Match.getCapturedText(2)</code>: "<FONT color=red><KBD>kent@trl.ibm.co.jp</KBD></FONT>"
 * <dt class="REGEX"><kbd>\1 \2 \3 \4 \5 \6 \7 \8 \9</kbd>
 * <dt class="REGEX"><kbd>(?></kbd><var>X</var><kbd>)</kbd>
 * <dd>Independent expression group. ................
 * <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>:</kbd><var>X</var><kbd>)</kbd>
 * <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>-</kbd><var>options2</var><kbd>:</kbd><var>X</var><kbd>)</kbd>
 * <dd>............................
 * <dd><var>options</var> and <var>options2</var> consist of 'i', 'm', 's', and 'w'.
 * Note that they cannot contain 'u'.
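For comparison, the capturing-group example above can be reproduced with the JDK's standard java.util.regex package instead of this engine. This is only an analogue: Matcher.find() performs a "contains" search comparable to matches(String, Match) here, and Matcher.group(n) plays roughly the role of Match.getCapturedText(n). The class CaptureExample and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reproduces the capturing-group example with java.util.regex. */
final class CaptureExample {

    private CaptureExample() {
    }

    /**
     * Returns groups 0..n of the first match found anywhere in text,
     * or null when the pattern does not occur.
     */
    static String[] captures(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) { // find() searches for the pattern anywhere in text
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // group(0) is the whole match
        }
        return groups;
    }
}
```

Called with the pattern and target text above, it returns " TAMURA Kent <kent@trl.ibm.co.jp>", "TAMURA Kent", and "kent@trl.ibm.co.jp", the same values listed for getCapturedText(0) through getCapturedText(2).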
 * <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>)</kbd>
 * <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>-</kbd><var>options2</var><kbd>)</kbd>
 * <dd>These expressions must be at the beginning of a group.
 * <dt class="REGEX"><kbd>\A</kbd>
 * <dd>Matches the beginning of the text.
 * <dt class="REGEX"><kbd>\Z</kbd>
 * <dd>Matches the end of the text, or the position before an EOL character at the end of the text,
 * or before a CARRIAGE RETURN + LINE FEED pair at the end of the text.
 * <dt class="REGEX"><kbd>\z</kbd>
 * <dd>Matches the end of the text.
 * <dt class="REGEX"><kbd>^</kbd>
 * <dd>Matches the beginning of the text. It is equivalent to <span class="REGEX"><kbd>\A</kbd></span>.
 * <dd>When <a href="#M_OPTION">the "m" option</a> is set,
 * it matches the beginning of the text, or the position after any EOL character
 * (LINE FEED (U+000A), CARRIAGE RETURN (U+000D), LINE SEPARATOR (U+2028),
 * PARAGRAPH SEPARATOR (U+2029)).
 * <dt class="REGEX"><kbd>$</kbd>
 * <dd>Matches the end of the text, or the position before an EOL character at the end of the text,
 * or before a CARRIAGE RETURN + LINE FEED pair at the end of the text.
 * <dd>When <a href="#M_OPTION">the "m" option</a> is set,
 * it matches the end of the text, or the position before any EOL character.
 * <dt class="REGEX"><kbd>\b</kbd>
 * <dd>Matches a word boundary.
 * (See <a href="#W_OPTION">the "w" option</a>)
 * <dt class="REGEX"><kbd>\B</kbd>
 * <dd>Matches a non-word boundary.
 * (See <a href="#W_OPTION">the "w" option</a>)
 * <dt class="REGEX"><kbd>\<</kbd>
 * <dd>Matches the beginning of a word.
 * (See <a href="#W_OPTION">the "w" option</a>)
 * <dt class="REGEX"><kbd>\></kbd>
 * <dd>Matches the end of a word.
 * (See <a href="#W_OPTION">the "w" option</a>)
 * <li>Lookahead and lookbehind
 * <dt class="REGEX"><kbd>(?=</kbd><var>X</var><kbd>)</kbd>
 * <dd>Lookahead.
 * <dt class="REGEX"><kbd>(?!</kbd><var>X</var><kbd>)</kbd>
 * <dd>Negative lookahead.
 * <dt class="REGEX"><kbd>(?<=</kbd><var>X</var><kbd>)</kbd>
 * <dd>Lookbehind.
 * <dd>(Note for text capturing......)
 * <dt class="REGEX"><kbd>(?<!</kbd><var>X</var><kbd>)</kbd>
 * <dd>Negative lookbehind.
 * <dt class="REGEX"><kbd>(?(</kbd><var>condition</var><kbd>)</kbd><var>yes-pattern</var><kbd>|</kbd><var>no-pattern</var><kbd>)</kbd>
 * <dt class="REGEX"><kbd>(?(</kbd><var>condition</var><kbd>)</kbd><var>yes-pattern</var><kbd>)</kbd>
 * <dt class="REGEX"><kbd>(?#</kbd><var>comment</var><kbd>)</kbd>
 * <dd>Comment. A comment string consists of any characters except '<kbd>)</kbd>'.
 * Comments cannot be written inside character classes or before quantifiers.
 * <h3>BNF for the regular expression</h3>
 * regex ::= ('(?' options ')')? term ('|' term)*
 * factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
 * minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
 * atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
 *          | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
 *          | '(?>' regex ')' | '(?' options ':' regex ')'
 *          | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
 * options ::= [imsw]* ('-' [imsw]+)?
 * anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\<' | '\>'
 * looks ::= '(?=' regex ')' | '(?!' regex ')'
 *           | '(?<=' regex ')' | '(?<!' regex ')'
 * char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
 * category-block ::= '\' [pP] category-symbol-1
 *                    | ('\p{' | '\P{') (category-symbol | block-name
 *                    | other-properties) '}'
 * category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
 * category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
 *                     | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
 *                     | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
 *                     | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
 *                     | 'Sm' | 'Sc' | 'Sk' | 'So'
 * block-name ::= (See above)
 * other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
 * character-1 ::= (any character except meta-characters)
 * char-class ::= '[' ranges ']'
 *                | '(?[' ranges ']' ([-+&] '[' ranges ']')* ')'
 * ranges ::= '^'? (range <a href="#COMMA_OPTION">','?</a>)+
 * range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
 *           | range-char | range-char '-' range-char
 * range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
 * code-point ::= '\x' hex-char hex-char
 *                | '\x{' hex-char+ '}'
 * <!--           | '\u005c u' hex-char hex-char hex-char hex-char
 * -->            | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
 * hex-char ::= [0-9a-fA-F]
 * character-2 ::= (any character except \[]-,)
 * <li>2.4 Canonical Equivalents
 * <li>Parsing performance
 * @author TAMURA Kent <kent@trl.ibm.co.jp>
 * Compiles a token tree into an operation flow.
 * Converts a token to an operation.
// X{2,6} -> XX(X(X(XX?)?)?)?
}
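The comment X{2,6} -> XX(X(X(XX?)?)?)? describes how a bounded repetition is unrolled: min mandatory copies, followed by max - min optional copies, each nested inside the previous one. A string-level sketch of that rewrite follows. MinMaxExpansion is a hypothetical helper that works on pattern text, whereas the compiler described here converts tokens into operations; it also assumes X is a single-character atom and max is finite.

```java
/** Hypothetical helper that unrolls X{min,max} as the compiler comment describes. */
final class MinMaxExpansion {

    private MinMaxExpansion() {
    }

    /** Rewrites atom{min,max} as min copies followed by nested optional copies. */
    static String expand(String atom, int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("require 0 <= min <= max");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < min; i++) {
            sb.append(atom); // mandatory copies
        }
        if (max > min) {
            sb.append(nested(atom, max - min)); // optional copies
        }
        return sb.toString();
    }

    // nested(X, 1) = "X?"; nested(X, k) = "(X" + nested(X, k - 1) + ")?"
    private static String nested(String atom, int k) {
        if (k == 1) {
            return atom + "?";
        }
        return "(" + atom + nested(atom, k - 1) + ")?";
    }
}
```

Nesting the optional copies means each later copy can only match if the previous one did, so the expansion accepts between min and max occurrences and nothing else.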
else {
// Token.CLOSURE
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern
 * in the specified range or not.
 * @param start Start offset of the range.
 * @param end End offset of the range (exclusive: the last index + 1).
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @param match A Match instance for storing the matching result.
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern
 * in the specified range or not.
 * @param start Start offset of the range.
 * @param end End offset of the range (exclusive: the last index + 1).
 * @param match A Match instance for storing the matching result.
 * @return true if the target is matched to this regular expression.
// No need to call setSource() because
// the caller cannot access this match instance.
//System.err.println("DEBUG: matchEnd="+matchEnd);
 * The pattern consists only of a fixed string.
 * The engine uses Boyer-Moore.
//System.err.println("DEBUG: fixed-only: "+this.fixedString);
 * The pattern contains a fixed string.
 * The engine checks with Boyer-Moore whether the text contains the fixed string or not.
 * If it does not, the method returns false.
//System.err.println("Non-match in fixed-string search.");
 * Checks whether the expression starts with ".*".
 * Optimization based on the first character.
//System.err.println("DEBUG: with firstchar-matching: "+this.firstChar);
 * Straightforward matching.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern
 * in the specified range or not.
 * @param start Start offset of the range.
 * @param end End offset of the range (exclusive: the last index + 1).
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @param match A Match instance for storing the matching result.
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern
 * in the specified range or not.
 * @param start Start offset of the range.
 * @param end End offset of the range (exclusive: the last index + 1).
 * @param match A Match instance for storing the matching result.
 * @return true if the target is matched to this regular expression.
// No need to call setSource() because
// the caller cannot access this match instance.
 * The pattern consists only of a fixed string.
 * The engine uses Boyer-Moore.
//System.err.println("DEBUG: fixed-only: "+this.fixedString);
 * The pattern contains a fixed string.
 * The engine checks with Boyer-Moore whether the text contains the fixed string or not.
 * If it does not, the method returns false.
//System.err.println("Non-match in fixed-string search.");
 * Checks whether the expression starts with ".*".
 * Optimization based on the first character.
//System.err.println("DEBUG: with firstchar-matching: "+this.firstChar);
 * Straightforward matching.
 * @return -1 if there is no match; otherwise, the offset of the end of the matched string.
// dx value is either 1 or -1
// Saves current position to avoid zero-width repeats.
// handle recursive operations
// exhausted all the operations
case '@':
// Internal use only.
// The @ always matches line beginnings.
}
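The fixed-string optimizations documented in these comments rely on Boyer-Moore. As a self-contained illustration, the sketch below implements Boyer-Moore-Horspool, a simplified variant that uses only the bad-character shift table; the engine's own search code may differ. HorspoolSearch and its signature are hypothetical; start and end follow the [start, end) convention of the @param start and @param end tags in these comments.

```java
import java.util.HashMap;
import java.util.Map;

/** Boyer-Moore-Horspool search (bad-character rule only); a hypothetical helper. */
final class HorspoolSearch {

    private HorspoolSearch() {
    }

    /** Returns the first index of pattern within text[start, end), or -1 if absent. */
    static int indexOf(String text, String pattern, int start, int end) {
        int m = pattern.length();
        if (m == 0) {
            return start;
        }
        // For each char except the last, how far the window may shift when that
        // char is aligned with the pattern's last position.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }
        int pos = start;
        while (pos + m <= end) {
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--; // compare right to left
            }
            if (j < 0) {
                return pos; // every character matched
            }
            // Characters absent from the table allow a full pattern-length shift.
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return -1;
    }
}
```

Because a mismatch can skip up to the full pattern length, a fixed-string check like this can cheaply reject text that cannot match before any general matching is attempted.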
// switch anchor type
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @return true if the target is matched to this regular expression.
 * Checks whether the <var>target</var> text <strong>contains</strong> this pattern or not.
 * @param match A Match instance for storing the matching result.
 * @return true if the target is matched to this regular expression.
// No need to call setSource() because
// the caller cannot access this match instance.
//System.err.println("DEBUG: matchEnd="+matchEnd);
 * The pattern consists only of a fixed string.
 * The engine uses Boyer-Moore.
//System.err.println("DEBUG: fixed-only: "+this.fixedString);
 * The pattern contains a fixed string.
 * The engine checks with Boyer-Moore whether the text contains the fixed string or not.
 * If it does not, the method returns false.
//System.err.println("Non-match in fixed-string search.");
 * Checks whether the expression starts with ".*".
 * Optimization based on the first character.
//System.err.println("DEBUG: with firstchar-matching: "+this.firstChar);
 * Straightforward matching.
// ================================================================
 * A regular expression.
 * The number of parentheses in the regular expression.
 * Internal representation of the regular expression.
// We do not check for duplicates; the caller is responsible for that.
 * Prepares for matching. This method is called just before matching starts.
if (this.operations.type == Op.CLOSURE && this.operations.getChild().type == Op.DOT) { // .*
    Op anchor = Op.createAnchor(isSet(this.options, SINGLE_LINE) ? 'A' : '@');
    anchor.next = this.operations;
    this.operations = anchor;
// This pattern has a fixed string whose length is greater than one.
+
"/" //+this.fixedString
 * If you specify this option, <span class="REGEX"><kbd>(</kbd><var>X</var><kbd>)</kbd></span>
 * captures matched text, and <span class="REGEX"><kbd>(?:</kbd><var>X</var><kbd>)</kbd></span>
 * does not capture.
 * @see #RegularExpression(java.lang.String,int)
 * @see #setPattern(java.lang.String,int)
static final int MARK_PARENS = 1<<0;
 * This option redefines <span class="REGEX"><kbd>\d \D \w \W \s \S</kbd></span>.
 * @see #RegularExpression(java.lang.String,int)
 * @see #setPattern(java.lang.String,int)
 * @see #UNICODE_WORD_BOUNDARY
 * This option enables locale-independent word-boundary processing for <span class="REGEX"><kbd>\b \B \< \></kbd></span>.
 * <p>By default, the engine considers a position between a word character
 * (<span class="REGEX"><kbd>\w</kbd></span>) and a non-word character
 * to be a word boundary.
 * <p>With this option, the engine checks word boundaries with the method of
 * 'Unicode Regular Expression Guidelines' Revision 4.
 * @see #RegularExpression(java.lang.String,int)
 * @see #setPattern(java.lang.String,int)
 * "X". XML Schema mode.
 * Creates a new RegularExpression instance.
 * @param regex A regular expression
 * @exception org.apache.xerces.utils.regex.ParseException <var>regex</var> does not conform to the syntax.
 * Creates a new RegularExpression instance with options.
 * @param regex A regular expression
 * @param options A String consisting of any of "i" "m" "s" "u" "w" "," "X"
 * @exception org.apache.xerces.utils.regex.ParseException <var>regex</var> does not conform to the syntax.
 * Creates a new RegularExpression instance with options.
 * @param regex A regular expression
 * @param options A String consisting of any of "i" "m" "s" "u" "w" "," "X"
 * @exception org.apache.xerces.utils.regex.ParseException <var>regex</var> does not conform to the syntax.
 * Returns a string representation of this instance.
 * Returns an option string.
 * The order of letters in it may be different from a string specified
 * in a constructor or <code>setPattern()</code>.
 * @see #RegularExpression(java.lang.String,java.lang.String)
 * @see #setPattern(java.lang.String,java.lang.String)
 * Returns true if the patterns are the same and the options are equivalent.
 * Returns the number of regular expression groups.
 * This method returns 1 when the regular expression has no capturing parentheses.
// ================================================================
// ================================================================
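The constructors accept an options string made of "i", "m", "s", "u", "w", ",", and "X", and the option-string accessor may return the letters in a different order from the one supplied. A minimal sketch of turning such a string into bit flags and back follows. OptionFlags is a hypothetical class; the names SINGLE_LINE and UNICODE_WORD_BOUNDARY appear in the source, but the other names and every bit value here are assumptions (starting at 1 << 1 because MARK_PARENS already uses 1 << 0), and rejecting unknown letters with IllegalArgumentException is likewise an assumed error policy.

```java
/**
 * Hypothetical option-string parser. Bit values are illustrative only and
 * are not taken from the engine.
 */
final class OptionFlags {
    static final int IGNORE_CASE = 1 << 1;           // "i"
    static final int MULTIPLE_LINES = 1 << 2;        // "m"
    static final int SINGLE_LINE = 1 << 3;           // "s"
    static final int UNICODE_CATEGORIES = 1 << 4;    // "u"
    static final int UNICODE_WORD_BOUNDARY = 1 << 5; // "w"
    static final int COMMA_SEPARATOR = 1 << 6;       // ","
    static final int XML_SCHEMA_MODE = 1 << 7;       // "X"

    // Canonical letter order; index i corresponds to FLAGS[i].
    private static final String LETTERS = "imsuw,X";
    private static final int[] FLAGS = {
        IGNORE_CASE, MULTIPLE_LINES, SINGLE_LINE, UNICODE_CATEGORIES,
        UNICODE_WORD_BOUNDARY, COMMA_SEPARATOR, XML_SCHEMA_MODE
    };

    private OptionFlags() {
    }

    /** Converts an options string such as "im" into a bit set. */
    static int parse(String options) {
        int flags = 0;
        for (int i = 0; i < options.length(); i++) {
            int index = LETTERS.indexOf(options.charAt(i));
            if (index < 0) {
                throw new IllegalArgumentException("Unknown option: " + options.charAt(i));
            }
            flags |= FLAGS[index];
        }
        return flags;
    }

    /** Converts a bit set back to letters, always in the canonical order. */
    static String toOptionString(int flags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < FLAGS.length; i++) {
            if ((flags & FLAGS[i]) != 0) {
                sb.append(LETTERS.charAt(i));
            }
        }
        return sb.toString();
    }
}
```

Storing options as bits and regenerating the string in a fixed order is one simple way to get the documented behavior: "mi" and "im" compare as equivalent options, and both come back as "im".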