/*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation. Oracle designates this
* particular file as subject to the "Classpath" exception as provided
* by Oracle in the LICENSE file that accompanied this code.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/
/*
*******************************************************************************
* (C) Copyright IBM Corp. and others, 1996-2009 - All Rights Reserved *
* *
* The original version of this source code and documentation is copyrighted *
* and owned by IBM, These materials are provided under terms of a License *
* Agreement between IBM and Sun. This technology is protected by multiple *
* US and International patents. This notice and attribution to IBM may not *
* to removed. *
*******************************************************************************
*/
/**
* Unicode Normalization
*
* <h2>Unicode normalization API</h2>
*
* <code>normalize</code> transforms Unicode text into an equivalent composed or
* decomposed form, allowing for easier sorting and searching of text.
* <code>normalize</code> supports the standard normalization forms described in
* <a href="http://www.unicode.org/unicode/reports/tr15/" target="unicode">
* Unicode Standard Annex #15 — Unicode Normalization Forms</a>.
*
* Characters with accents or other adornments can be encoded in
* several different ways in Unicode. For example, take the character A-acute.
* In Unicode, this can be encoded as a single character (the
* "composed" form):
*
* <p>
* 00C1 LATIN CAPITAL LETTER A WITH ACUTE
* </p>
*
* or as two separate characters (the "decomposed" form):
*
* <p>
* 0041 LATIN CAPITAL LETTER A
* 0301 COMBINING ACUTE ACCENT
* </p>
*
* To a user of your program, however, both of these sequences should be
* treated as the same "user-level" character "A with acute accent". When you
* are searching or comparing text, you must ensure that these two sequences are
* treated equivalently. In addition, you must handle characters with more than
* one accent. Sometimes the order of a character's combining accents is
* significant, while in other cases accent sequences in different orders are
* really equivalent.
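*
* As a minimal sketch of the equivalence described above, the example below
* compares strings by their canonical decompositions. It uses the public
* <code>java.text.Normalizer</code> API as a stand-in for this internal class;
* the class name <code>CanonicalEquivalenceSketch</code> and the helper
* <code>canonicallyEqual</code> are hypothetical names chosen for illustration.
*
```java
import java.text.Normalizer;

final class CanonicalEquivalenceSketch {
    // Two strings are treated as canonically equivalent when their NFD forms match.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301"; // A followed by COMBINING ACUTE ACCENT

        // A raw comparison sees two different code unit sequences.
        System.out.println(composed.equals(decomposed));
        // A normalized comparison treats them as the same user-level character.
        System.out.println(canonicallyEqual(composed, decomposed));

        // Acute (above) and dot below have different combining classes,
        // so either order normalizes to the same sequence.
        System.out.println(canonicallyEqual("a\u0301\u0323", "a\u0323\u0301"));
        // Acute and grave share a combining class, so their order is significant.
        System.out.println(canonicallyEqual("a\u0301\u0300", "a\u0300\u0301"));
    }
}
```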
*
* Similarly, the string "ffi" can be encoded as three separate letters:
*
* <p>
* 0066 LATIN SMALL LETTER F
* 0066 LATIN SMALL LETTER F
* 0069 LATIN SMALL LETTER I
* </p>
*
* or as the single character
*
* <p>
* FB03 LATIN SMALL LIGATURE FFI
* </p>
*
* The ffi ligature is not a distinct semantic character, and strictly speaking
* it shouldn't be in Unicode at all, but it was included for compatibility
* with existing character sets that already provided it. The Unicode standard
* identifies such characters by giving them "compatibility" decompositions
* into the corresponding semantic characters. When sorting and searching, you
* will often want to use these mappings.
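*
* The difference between canonical and compatibility mappings can be sketched
* as follows. This again assumes the public <code>java.text.Normalizer</code>
* API rather than this class; <code>CompatibilitySketch</code> and
* <code>foldCompatibility</code> are hypothetical names.
*
```java
import java.text.Normalizer;

final class CompatibilitySketch {
    // Fold compatibility characters (such as ligatures) into their semantic equivalents.
    static String foldCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String ligature = "\uFB03"; // LATIN SMALL LIGATURE FFI

        // Canonical normalization leaves the ligature untouched.
        System.out.println(
            Normalizer.normalize(ligature, Normalizer.Form.NFC).equals(ligature));
        // Compatibility normalization maps it to the three letters f, f, i,
        // so a search for "office" can match text that contains the ligature.
        System.out.println(foldCompatibility("o" + ligature + "ce").equals("office"));
    }
}
```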
*
* <code>normalize</code> helps solve these problems by transforming text into
* the canonical composed and decomposed forms as shown in the first example
* above. In addition, you can have it perform compatibility decompositions so
* that you can treat compatibility characters the same as their equivalents.
* Finally, <code>normalize</code> rearranges accents into the proper canonical
* order, so that you do not have to worry about accent rearrangement on your
* own.
*
* Form FCD, "Fast C or D", is also designed for collation.
* It allows an algorithm that works under "canonical closure" (as collation
* does) to process strings that are not necessarily normalized, i.e., the
* algorithm treats precomposed characters and their decomposed equivalents
* the same.
*
* It is not a normalization form because it does not provide for uniqueness of
* representation. Multiple strings may be canonically equivalent (their NFDs
* are identical) and may all conform to FCD without being identical themselves.
*
* The form is defined such that the "raw decomposition", the recursive
* canonical decomposition of each character, results in a string that is
* canonically ordered. This means that precomposed characters are allowed as
* long as their decompositions do not need canonical reordering.
*
* Its advantage for a process like collation is that all NFD and most NFC texts
* - and many unnormalized texts - already conform to FCD and do not need to be
* normalized (NFD) for such a process. The FCD quick check will return YES for
* most strings in practice.
*
* normalize(FCD) may be implemented with NFD.
*
* For more details on FCD see the collation design document:
*
* ICU collation performs either NFD or FCD normalization automatically if
* normalization is turned on for the collator object. Beyond collation and
* string search, normalized strings may be useful for string equivalence
* comparisons, transliteration/transcription, unique representations, etc.
*
* The W3C generally recommends exchanging text in NFC.
* Note also that most legacy character encodings use only precomposed forms and
* often do not encode any combining marks by themselves. For conversion to such
* character encodings the Unicode text needs to be normalized to NFC.
* For more usage examples, see the Unicode Standard Annex.
* @stable ICU 2.8
*/
//-------------------------------------------------------------------------
// Private data
//-------------------------------------------------------------------------
// The input text and our position in it
private int currentIndex;
private int nextIndex;
/**
* Options bit set value to select Unicode 3.2 normalization
* (except NormalizationCorrections).
* At most one Unicode version can be selected at a time.
* @stable ICU 2.6
*/
/**
* Constant indicating that the end of the iteration has been reached.
* This is guaranteed to have the same value as {@link UCharacterIterator#DONE}.
* @stable ICU 2.8
*/
/**
* Constants for normalization modes.
* @stable ICU 2.8
*/
public static class Mode {
private int modeValue;
}
/**
* This method is used for method dispatch
* @stable ICU 2.6
*/
UnicodeSet nx) {
return srcLen;
}
return srcLen;
}
/**
* This method is used for method dispatch
* @stable ICU 2.6
*/
int options) {
);
}
/**
* This method is used for method dispatch
* @stable ICU 2.6
*/
return src;
}
/**
* This method is used for method dispatch
* @stable ICU 2.8
*/
protected int getMinC() {
return -1;
}
/**
* This method is used for method dispatch
* @stable ICU 2.8
*/
protected int getMask() {
return -1;
}
/**
* This method is used for method dispatch
* @stable ICU 2.8
*/
return null;
}
/**
* This method is used for method dispatch
* @stable ICU 2.8
*/
return null;
}
/**
* This method is used for method dispatch
* @stable ICU 2.6
*/
if(allowMaybe) {
return MAYBE;
}
return NO;
}
/**
* This method is used for method dispatch
* @stable ICU 2.8
*/
protected boolean isNFSkippable(int c) {
return true;
}
}
/**
* No decomposition/composition.
* @stable ICU 2.8
*/
/**
* Canonical decomposition.
* @stable ICU 2.8
*/
super(value);
}
UnicodeSet nx) {
int[] trailCC = new int[1];
}
}
protected int getMinC() {
return NormalizerImpl.MIN_WITH_LEAD_CC;
}
return new IsPrevNFDSafe();
}
return new IsNextNFDSafe();
}
protected int getMask() {
}
int limit,boolean allowMaybe,
UnicodeSet nx) {
return NormalizerImpl.quickCheck(
),
0,
);
}
protected boolean isNFSkippable(int c) {
return NormalizerImpl.isNFSkippable(c,this,
);
}
}
/**
* Compatibility decomposition.
* @stable ICU 2.8
*/
super(value);
}
UnicodeSet nx) {
int[] trailCC = new int[1];
}
}
protected int getMinC() {
return NormalizerImpl.MIN_WITH_LEAD_CC;
}
return new IsPrevNFDSafe();
}
return new IsNextNFDSafe();
}
protected int getMask() {
}
int limit,boolean allowMaybe,
UnicodeSet nx) {
return NormalizerImpl.quickCheck(
),
);
}
protected boolean isNFSkippable(int c) {
return NormalizerImpl.isNFSkippable(c, this,
);
}
}
/**
* Canonical decomposition followed by canonical composition.
* @stable ICU 2.8
*/
super(value);
}
UnicodeSet nx) {
0, nx);
}
}
protected int getMinC() {
return NormalizerImpl.getFromIndexesArr(
);
}
return new IsPrevTrueStarter();
}
return new IsNextTrueStarter();
}
protected int getMask() {
}
int limit,boolean allowMaybe,
UnicodeSet nx) {
return NormalizerImpl.quickCheck(
),
0,
);
}
protected boolean isNFSkippable(int c) {
return NormalizerImpl.isNFSkippable(c,this,
)
);
}
};
/**
* Compatibility decomposition followed by canonical composition.
* @stable ICU 2.8
*/
super(value);
}
UnicodeSet nx) {
}
}
protected int getMinC() {
return NormalizerImpl.getFromIndexesArr(
);
}
return new IsPrevTrueStarter();
}
return new IsNextTrueStarter();
}
protected int getMask() {
}
int limit,boolean allowMaybe,
UnicodeSet nx) {
return NormalizerImpl.quickCheck(
),
);
}
protected boolean isNFSkippable(int c) {
return NormalizerImpl.isNFSkippable(c, this,
)
);
}
};
/**
* Result values for quickCheck().
* For details see Unicode Technical Report 15.
* @stable ICU 2.8
*/
public static final class QuickCheckResult{
private int resultValue;
}
}
/**
* Indicates that string is not in the normalized format
* @stable ICU 2.8
*/
/**
* Indicates that string is in the normalized format
* @stable ICU 2.8
*/
/**
* Indicates it cannot be determined if string is in the normalized
* format without further thorough checks.
* @stable ICU 2.8
*/
//-------------------------------------------------------------------------
// Constructors
//-------------------------------------------------------------------------
/**
* Creates a new <tt>Normalizer</tt> object for iterating over the
* normalized form of a given string.
* <p>
* The <tt>options</tt> parameter specifies which optional
* <tt>Normalizer</tt> features are to be enabled for this object.
* <p>
* @param str The string to be normalized. The normalization
* will start at the beginning of the string.
*
* @param mode The normalization mode.
*
* @param opt Any optional features to be enabled.
* Currently the only available option is {@link #UNICODE_3_2}.
* If you want the default behavior corresponding to one of the
* standard Unicode Normalization Forms, use 0 for this argument.
* @stable ICU 2.6
*/
}
/**
* Creates a new <tt>Normalizer</tt> object for iterating over the
* normalized form of the given text.
* <p>
* @param iter The input text to be normalized. The normalization
* will start at the beginning of the string.
*
* @param mode The normalization mode.
*/
}
/**
* Creates a new <tt>Normalizer</tt> object for iterating over the
* normalized form of the given text.
* <p>
* @param iter The input text to be normalized. The normalization
* will start at the beginning of the string.
*
* @param mode The normalization mode.
*
* @param opt Any optional features to be enabled.
* Currently the only available option is {@link #UNICODE_3_2}.
* If you want the default behavior corresponding to one of the
* standard Unicode Normalization Forms, use 0 for this argument.
* @stable ICU 2.6
*/
);
}
/**
* Clones this <tt>Normalizer</tt> object. All properties of this
* object are duplicated in the new object, including the cloning of any
* {@link CharacterIterator} that was passed in to the constructor
* or to {@link #setText(CharacterIterator) setText}.
* However, the text storage underlying
* the <tt>CharacterIterator</tt> is not duplicated unless the
* iterator's <tt>clone</tt> method does so.
* @stable ICU 2.8
*/
try {
//clone the internal buffer
}
return copy;
}
catch (CloneNotSupportedException e) {
throw new InternalError(e.toString());
}
}
//--------------------------------------------------------------------------
// Static Utility methods
//--------------------------------------------------------------------------
/**
* Compose a string.
* The string will be composed according to the specified mode.
* @param str The string to compose.
* @param compat If true the string will be composed according to
* NFKC rules and if false will be composed according to
* NFC rules.
* @param options The only recognized option is UNICODE_3_2
* @return String The composed string
* @stable ICU 2.6
*/
if (options == UNICODE_3_2_0_ORIGINAL) {
} else {
}
int destSize=0;
/* reset options bits that should only be set here or inside compose() */
options&=~(NormalizerImpl.OPTIONS_SETS_MASK|NormalizerImpl.OPTIONS_COMPAT|NormalizerImpl.OPTIONS_COMPOSE_CONTIGUOUS);
if(compat) {
}
for(;;) {
nx);
} else {
}
}
}
/**
* Decompose a string.
* The string will be decomposed according to the specified mode.
* @param str The string to decompose.
* @param compat If true the string will be decomposed according to NFKD
* rules and if false will be decomposed according to NFD
* rules.
* @return String The decomposed string
* @stable ICU 2.8
*/
}
/**
* Decompose a string.
* The string will be decomposed according to the specified mode.
* @param str The string to decompose.
* @param compat If true the string will be decomposed according to NFKD
* rules and if false will be decomposed according to NFD
* rules.
* @param options The normalization options, ORed together (0 for no options).
* @return String The decomposed string
* @stable ICU 2.6
*/
int[] trailCC = new int[1];
int destSize=0;
char[] dest;
if (options == UNICODE_3_2_0_ORIGINAL) {
for(;;) {
} else {
}
}
} else {
for(;;) {
} else {
}
}
}
}
/**
* Normalize a string.
* The string will be normalized according to the specified normalization
* mode and options.
* @param src The char array to normalize.
* @param srcStart Start index of the source
* @param srcLimit Limit index of the source
* @param dest The char buffer to fill in
* @param destStart Start index of the destination buffer
* @param destLimit End index of the destination buffer
* @param mode The normalization mode; one of Normalizer.NONE,
* Normalizer.NFD, Normalizer.NFC, Normalizer.NFKC,
* Normalizer.NFKD, Normalizer.DEFAULT
* @param options The normalization options, ORed together (0 for no options).
* @return int The total buffer size needed; if greater than length of
* result, the output was truncated.
* @exception IndexOutOfBoundsException if the target capacity is
* less than the required length
* @stable ICU 2.6
*/
return length;
} else {
}
}
//-------------------------------------------------------------------------
// Iteration API
//-------------------------------------------------------------------------
/**
* Return the current character in the normalized text.
* @return The codepoint as an int
* @stable ICU 2.8
*/
public int current() {
return getCodePointAt(bufferPos);
} else {
return DONE;
}
}
/**
* Return the next character in the normalized text and advance
* the iteration position by one. If the end
* of the text has already been reached, {@link #DONE} is returned.
* @return The codepoint as an int
* @stable ICU 2.8
*/
public int next() {
int c=getCodePointAt(bufferPos);
return c;
} else {
return DONE;
}
}
/**
* Return the previous character in the normalized text and decrement
* the iteration position by one. If the beginning
* of the text has already been reached, {@link #DONE} is returned.
* @return The codepoint as an int
* @stable ICU 2.8
*/
public int previous() {
return c;
} else {
return DONE;
}
}
/**
* Reset the index to the beginning of the text.
* This is equivalent to setIndexOnly(startIndex).
* @stable ICU 2.8
*/
public void reset() {
clearBuffer();
}
/**
* Set the iteration position in the input text that is being normalized,
* without any immediate normalization.
* After setIndexOnly(), getIndex() will return the same index that is
* specified here.
*
* @param index the desired index in the input text.
* @stable ICU 2.8
*/
clearBuffer();
}
/**
* Set the iteration position in the input text that is being normalized
* and return the first normalized character at that position.
* <p>
* <b>Note:</b> This method sets the position in the <em>input</em> text,
* while {@link #next} and {@link #previous} iterate through characters
* in the normalized <em>output</em>. This means that there is not
* necessarily a one-to-one correspondence between characters returned
* by <tt>next</tt> and <tt>previous</tt> and the indices passed to and
* returned from <tt>setIndex</tt> and {@link #getIndex}.
* <p>
* @param index the desired index in the input text.
*
* @return the first normalized character (the code point as an int) that is
* the result of iterating forward starting at the given index.
*
* @throws IllegalArgumentException if the given index is less than
* {@link #getBeginIndex} or greater than {@link #getEndIndex}.
* @deprecated ICU 3.2
* @obsolete ICU 3.2
*/
return current();
}
/**
* Retrieve the index of the start of the input text. This is the begin
* index of the <tt>CharacterIterator</tt> or the start (i.e. 0) of the
* <tt>String</tt> over which this <tt>Normalizer</tt> is iterating.
* @deprecated ICU 2.2. Use startIndex() instead.
* @return The index of the start of the input text
* @see #startIndex
public int getBeginIndex() {
return 0;
}
/**
* Retrieve the index of the end of the input text. This is the end index
* of the <tt>CharacterIterator</tt> or the length of the <tt>String</tt>
* over which this <tt>Normalizer</tt> is iterating.
* @deprecated ICU 2.2. Use endIndex() instead.
* @return The index of the end of the input text
* @see #endIndex
public int getEndIndex() {
return endIndex();
}
/**
* Retrieve the current iteration position in the input text that is
* being normalized. This method is useful in applications such as
* searching, where you need to be able to determine the position in
* the input text that corresponds to a given normalized output character.
* <p>
* <b>Note:</b> This method returns a position in the <em>input</em>, while
* {@link #next} and {@link #previous} iterate through characters in the
* <em>output</em>. This means that there is not necessarily a one-to-one
* correspondence between characters returned by <tt>next</tt> and
* <tt>previous</tt> and the indices passed to and returned from
* <tt>setIndex</tt> and {@link #getIndex}.
* @return The current iteration position
* @stable ICU 2.8
*/
public int getIndex() {
if(bufferPos<bufferLimit) {
return currentIndex;
} else {
return nextIndex;
}
}
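/*
* The lack of a one-to-one correspondence between input indices and normalized
* output characters, noted above, can be seen with a small sketch. It assumes
* the public java.text.Normalizer API rather than this iterator, and the names
* IndexMappingSketch and decomposedLength are hypothetical.
*
```java
import java.text.Normalizer;

final class IndexMappingSketch {
    // Number of UTF-16 code units in the NFD form of s.
    static int decomposedLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).length();
    }

    public static void main(String[] args) {
        String input = "\u00C1b"; // A WITH ACUTE, then b: two input code units
        // The decomposed output has one more code unit than the input, so an
        // output position cannot be used directly as an input index.
        System.out.println(input.length() + " -> " + decomposedLength(input));
    }
}
```
*
* This is why a searching application asks the iterator for the input position
* instead of counting the characters it has read.
*/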
/**
* Retrieve the index of the end of the input text. This is the end index
* of the <tt>CharacterIterator</tt> or the length of the <tt>String</tt>
* over which this <tt>Normalizer</tt> is iterating.
* @return The index of the end of the input text
* @stable ICU 2.8
*/
public int endIndex() {
}
//-------------------------------------------------------------------------
// Property access methods
//-------------------------------------------------------------------------
/**
* Set the normalization mode for this object.
* <p>
* <b>Note:</b> If the normalization mode is changed while iterating
* over a string, calls to {@link #next} and {@link #previous} may
* return previously buffered characters in the old normalization mode
* until the iteration is able to re-sync at the next base character.
* It is safest to call {@link #setText setText()}, {@link #first},
* {@link #last}, etc. after calling <tt>setMode</tt>.
* <p>
* @param newMode the new mode for this <tt>Normalizer</tt>.
* The supported modes are:
* <ul>
* <li>{@link #COMPOSE} - Unicode canonical decomposition
* followed by canonical composition.
* <li>{@link #COMPOSE_COMPAT} - Unicode compatibility decomposition
* followed by canonical composition.
* <li>{@link #DECOMP} - Unicode canonical decomposition.
* <li>{@link #DECOMP_COMPAT} - Unicode compatibility decomposition.
* <li>{@link #NO_OP} - Do nothing but return characters
* from the underlying input text.
* </ul>
*
* @see #getMode
* @stable ICU 2.8
*/
}
/**
* Return the basic operation performed by this <tt>Normalizer</tt>
*
* @see #setMode
* @stable ICU 2.8
*/
return mode;
}
/**
* Set the input text over which this <tt>Normalizer</tt> will iterate.
* The iteration position is set to the beginning of the input text.
* @param newText The new string to be normalized.
* @stable ICU 2.8
*/
throw new InternalError("Could not create a new UCharacterIterator");
}
reset();
}
/**
* Set the input text over which this <tt>Normalizer</tt> will iterate.
* The iteration position is set to the beginning of the input text.
* @param newText The new string to be normalized.
* @stable ICU 2.8
*/
throw new InternalError("Could not create a new UCharacterIterator");
}
clearBuffer();
}
//-------------------------------------------------------------------------
// Private utility methods
//-------------------------------------------------------------------------
/* backward iteration --------------------------------------------------- */
/*
* read backwards and get norm32
* return 0 if the character is <minC
* if c2!=0 then (c2, c) is a surrogate pair (reversed - c2 is first
* surrogate but read second!)
*/
int/*unsigned*/ minC,
int/*unsigned*/ mask,
char[] chars) {
long norm32;
int ch=0;
/* need src.hasPrevious() */
return 0;
}
/* check for a surrogate before getting norm32 to see if we need to
* predecrement further */
return 0;
/* unpaired surrogate */
return 0;
/* all surrogate pairs with this lead surrogate have irrelevant
* data */
return 0;
} else {
/* norm32 must be a surrogate special */
}
} else {
/* unpaired second surrogate, undo the c2=src.previous() movement */
return 0;
}
}
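/*
* The surrogate handling described in the comment above (the trail surrogate is
* read first, then the lead surrogate that precedes it) can be sketched in
* isolation. This is only the pairing step under simplified assumptions, not
* the norm32 lookup performed by this class; PreviousCodePointSketch and
* previousCodePoint are hypothetical names, and the -1 return value for "no
* previous character" is an assumption of the sketch.
*
```java
final class PreviousCodePointSketch {
    // Reads the code point that ends just before index pos in text.
    // Returns -1 when pos is at the start of the text.
    static int previousCodePoint(char[] text, int pos) {
        if (pos <= 0) {
            return -1;
        }
        char c = text[pos - 1];
        if (Character.isLowSurrogate(c) && pos >= 2) {
            // c2 comes first in the text but is read second.
            char c2 = text[pos - 2];
            if (Character.isHighSurrogate(c2)) {
                return Character.toCodePoint(c2, c);
            }
        }
        // A BMP character, or an unpaired surrogate returned as-is.
        return c;
    }

    public static void main(String[] args) {
        char[] text = "a\uD835\uDD0A".toCharArray(); // 'a' followed by U+1D50A
        System.out.println(Integer.toHexString(previousCodePoint(text, 3)));
    }
}
```
*/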
private interface IsPrevBoundary{
int/*unsigned*/ minC,
int/*unsigned*/ mask,
char[] chars);
}
/*
* for NF*D:
* read backwards and check if the lead combining class is 0
* if c2!=0 then (c2, c) is a surrogate pair (reversed - c2 is first
* surrogate but read second!)
*/
int/*unsigned*/ minC,
int/*unsigned*/ ccOrQCMask,
char[] chars) {
ccOrQCMask, chars),
}
}
/*
* read backwards and check if the character is (or its decomposition
* begins with) a "true starter" (cc==0 and NF*C_YES)
* if c2!=0 then (c2, c) is a surrogate pair (reversed - c2 is first
* surrogate but read second!)
*/
int/*unsigned*/ minC,
int/*unsigned*/ ccOrQCMask,
char[] chars) {
long norm32;
int/*unsigned*/ decompQCMask;
}
}
int/*unsigned*/ minC,
int/*mask*/ mask,
char[] buffer,
int[] startIndex) {
char[] chars=new char[2];
boolean isBoundary;
/* fill the buffer from the end backwards */
/* always write this character to the front of the buffer */
/* make sure there is enough space in the buffer */
// grow the buffer
/* move the current buffer contents up */
//adjust the startIndex
}
}
/* stop if this just-copied character is a boundary */
if(isBoundary) {
break;
}
}
/* return the length of the buffer contents */
}
boolean doNormalize,
boolean[] pNeededToNormalize,
int options) {
int destLength, bufferLength;
int/*unsigned*/ mask;
int c,c2;
char minC;
destLength=0;
if(pNeededToNormalize!=null) {
pNeededToNormalize[0]=false;
}
if(isPreviousBoundary==null) {
destLength=0;
destLength=1;
if(UTF16.isTrailSurrogate((char)c)) {
if(destCapacity>=2) {
destLength=2;
}
// lead surrogate to be written below
c=c2;
} else {
}
}
}
if(destCapacity>0) {
dest[0]=(char)c;
}
}
return destLength;
}
char[] buffer = new char[100];
int[] startIndex= new int[1];
if(bufferLength>0) {
if(doNormalize) {
if(pNeededToNormalize!=null) {
));
}
} else {
/* just copy the source characters */
if(destCapacity>0) {
);
}
}
}
return destLength;
}
/* forward iteration ---------------------------------------------------- */
/*
* read forward and check if the character is a next-iteration boundary
* if c2!=0 then (c, c2) is a surrogate pair
*/
private interface IsNextBoundary{
int/*unsigned*/ minC,
int/*unsigned*/ mask,
int[] chars);
}
/*
* read forward and get norm32
* return 0 if the character is <minC
* if c2!=0 then (c, c2) is a surrogate pair
* always reads complete characters
*/
int/*unsigned*/ minC,
int/*unsigned*/ mask,
int[] chars) {
long norm32;
/* need src.hasNext() to be true */
return 0;
}
/* irrelevant data */
return 0;
} else {
/* norm32 must be a surrogate special */
}
} else {
/* unmatched surrogate */
return 0;
}
}
return norm32;
}
/*
* for NF*D:
* read forward and check if the lead combining class is 0
* if c2!=0 then (c, c2) is a surrogate pair
*/
int/*unsigned*/ minC,
int/*unsigned*/ ccOrQCMask,
int[] chars) {
}
}
/*
* for NF*C:
* read forward and check if the character is (or its decomposition begins
* with) a "true starter" (cc==0 and NF*C_YES)
* if c2!=0 then (c, c2) is a surrogate pair
*/
int/*unsigned*/ minC,
int/*unsigned*/ ccOrQCMask,
int[] chars) {
long norm32;
int/*unsigned*/ decompQCMask;
}
}
int/*unsigned*/ minC,
int/*unsigned*/ mask,
char[] buffer) {
return 0;
}
/* get one character and ignore its properties */
int[] chars = new int[2];
int bufferIndex = 1;
} else {
}
}
/* get all following characters until we see a boundary */
/* checking hasNext() instead of c!=DONE on the off-chance that U+ffff
* is part of the string */
/* back out the latest movement to stop at the boundary */
break;
} else {
}
} else {
}
}
}
}
/* return the length of the buffer contents */
return bufferIndex;
}
boolean doNormalize,
boolean[] pNeededToNormalize,
int options) {
int /*unsigned*/ mask;
int /*unsigned*/ bufferLength;
int c,c2;
char minC;
int destLength = 0;
if(pNeededToNormalize!=null) {
pNeededToNormalize[0]=false;
}
if(isNextBoundary==null) {
destLength=0;
if(c!=UCharacterIterator.DONE) {
destLength=1;
if(UTF16.isLeadSurrogate((char)c)) {
if(destCapacity>=2) {
destLength=2;
}
// lead surrogate to be written below
} else {
}
}
}
if(destCapacity>0) {
dest[0]=(char)c;
}
}
return destLength;
}
char[] buffer=new char[100];
int[] startIndex = new int[1];
buffer);
if(bufferLength>0) {
if(doNormalize) {
if(pNeededToNormalize!=null) {
destLength));
}
} else {
/* just copy the source characters */
if(destCapacity>0) {
);
}
}
}
return destLength;
}
private void clearBuffer() {
}
private boolean nextNormalize() {
clearBuffer();
return (bufferLimit>0);
}
private boolean previousNormalize() {
clearBuffer();
return bufferLimit>0;
}
);
}
);
}
}
}
}
/**
* Internal API
* @internal
*/
return mode.isNFSkippable(c);
}
//
// Options
//
/*
* Default option for Unicode 3.2.0 normalization.
* Corrigendum 4 was fixed in Unicode 3.2.0 but isn't supported in
* IDNA/StringPrep.
* The public review issue #29 was fixed in Unicode 4.1.0. Corrigendum 5
* allowed Unicode 3.2 to 4.0.1 to apply the fix for PRI #29, but, like
* Corrigendum 4, it isn't supported by IDNA/StringPrep.
*/
public static final int UNICODE_3_2_0_ORIGINAL =
/*
* Default option for the latest Unicode normalization. This option is
* provided mainly for testing.
* The value zero means that normalization is done with the fixes for
* - Corrigendum 4 (Five CJK Canonical Mapping Errors)
* - Corrigendum 5 (Normalization Idempotency)
*/
//
// public constructor and methods for java.text.Normalizer and
// sun.text.Normalizer
//
/**
* Creates a new <tt>Normalizer</tt> object for iterating over the
* normalized form of a given string.
*
* @param str The string to be normalized. The normalization
* will start at the beginning of the string.
*
* @param mode The normalization mode.
*/
}
/**
* Normalizes a <code>String</code> using the given normalization form.
*
* @param str the input string to be normalized.
* @param form the normalization form
*/
}
/**
* Normalizes a <code>String</code> using the given normalization form.
*
* @param str the input string to be normalized.
* @param form the normalization form
* @param options the optional features to be enabled.
*/
boolean asciiOnly = true;
if (len < 80) {
for (int i = 0; i < len; i++) {
asciiOnly = false;
break;
}
}
} else {
char[] a = str.toCharArray();
for (int i = 0; i < len; i++) {
if (a[i] > 127) {
asciiOnly = false;
break;
}
}
}
switch (form) {
case NFC :
case NFD :
case NFKC :
case NFKD :
}
throw new IllegalArgumentException("Unexpected normalization form: " +
form);
}
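/*
* The ASCII pre-scan in the method above relies on the fact that text made up
* only of ASCII characters is unchanged by NFC, NFD, NFKC and NFKD, so it can
* be returned without running the normalizer. A minimal sketch of that idea,
* assuming the public java.text.Normalizer API for the slow path; the names
* AsciiFastPathSketch, isAscii and normalize are hypothetical.
*
```java
import java.text.Normalizer;

final class AsciiFastPathSketch {
    // True when every UTF-16 code unit is in the ASCII range (0..127).
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // Returns the input itself for ASCII-only text, otherwise normalizes it.
    static String normalize(String s, Normalizer.Form form) {
        return isAscii(s) ? s : Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String plain = "plain text";
        System.out.println(normalize(plain, Normalizer.Form.NFKD) == plain);
        System.out.println(normalize("\u00E9", Normalizer.Form.NFD).length());
    }
}
```
*/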
/**
* Test if a string is in a given normalization form.
* This is semantically equivalent to source.equals(normalize(source, mode)).
*
* Unlike quickCheck(), this function returns a definitive result,
* never a "maybe".
* For NFD, NFKD, and FCD, both functions work exactly the same.
* For NFC and NFKC where quickCheck may return "maybe", this function will
* perform further tests to arrive at a true/false result.
* @param str the input string to be checked to see if it is normalized
* @param form the normalization form
* @param options the optional features to be enabled.
*/
}
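/*
* A minimal sketch of how a caller might use a definitive normalization check,
* as described above, to avoid unnecessary work. It assumes the public
* java.text.Normalizer API as a stand-in for this internal class; the names
* IsNormalizedSketch and normalizeIfNeeded are hypothetical.
*
```java
import java.text.Normalizer;

final class IsNormalizedSketch {
    // Normalizes only when the check reports that the text is not already in the form.
    static String normalizeIfNeeded(String s, Normalizer.Form form) {
        return Normalizer.isNormalized(s, form) ? s : Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // already in NFC
        String decomposed = "A\u0301"; // in NFD, not in NFC

        System.out.println(Normalizer.isNormalized(composed, Normalizer.Form.NFC));
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC));
        // The already-normalized string is returned as the same object.
        System.out.println(normalizeIfNeeded(composed, Normalizer.Form.NFC) == composed);
    }
}
```
*/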
/**
* Test if a string is in a given normalization form.
* This is semantically equivalent to source.equals(normalize(source, mode)).
*
* Unlike quickCheck(), this function returns a definitive result,
* never a "maybe".
* For NFD, NFKD, and FCD, both functions work exactly the same.
* For NFC and NFKC where quickCheck may return "maybe", this function will
* perform further tests to arrive at a true/false result.
* @param str the input string to be checked to see if it is normalized
* @param form the normalization form
* @param options the optional features to be enabled.
*/
switch (form) {
case NFC:
case NFD:
case NFKC:
return (NFKC.quickCheck(str.toCharArray(),0,str.length(),false,NormalizerImpl.getNX(options))==YES);
case NFKD:
return (NFKD.quickCheck(str.toCharArray(),0,str.length(),false,NormalizerImpl.getNX(options))==YES);
}
throw new IllegalArgumentException("Unexpected normalization form: " +
form);
}
}