 * reserved comment block
 * DO NOT REMOVE OR ALTER!

 * Copyright 1999-2004 The Apache Software Foundation.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.

 * Bare-bones, unsafe, fast string buffer. No thread-safety, no
 * parameter range checking, exposed fields. Note that in typical
 * applications, thread-safety of a StringBuffer is a somewhat
 * dubious concept in any case.
 * Note that Stree and DTM used a single FastStringBuffer as a string pool,
 * by recording start and length indices within this single buffer. This
 * minimizes heap overhead, but of course requires more work when retrieving
 * FastStringBuffer operates as a "chunked buffer". Doing so
 * reduces the need to recopy existing information when an append
 * exceeds the space available; we just allocate another chunk and
 * flow across to it. (The array of chunks may need to grow,
 * admittedly, but that's a much smaller object.) Some excess
 * recopying may arise when we extract Strings which cross chunk
 * boundaries; larger chunks make that less frequent.
 * The size values are parameterized, to allow tuning this code. In
 * theory, Result Tree Fragments might want to be tuned differently
 * from the main document's text.
 * %REVIEW% An experiment in self-tuning is
 * included in the code (using nested FastStringBuffers to achieve
 * variation in chunk sizes), but this implementation has proven to
 * be problematic when data may be being copied from the FSB into itself.
 * We should either re-architect that to make this safe (if possible)

// If nonzero, forces the initial chunk size.

// %BUG% %REVIEW% *****PROBLEM SUSPECTED: If data from an FSB is being copied
// back into the same FSB (variable set from previous variable, for example)
// and blocksize changes in mid-copy... there's risk of severe malfunction in
// the read process, due to how the resizing code re-jiggers storage. Arggh.
// If we want to retain the variable-size-block feature, we need to reconsider
// that issue. For now, I have forced us into fixed-size mode.

/** Manifest constant: Suppress leading whitespace.
 * This should be used when normalize-to-SAX is called for the first chunk of a
 * multi-chunk output, or one following unsuppressed whitespace in a previous
 * @see #sendNormalizedSAXcharacters(org.xml.sax.ContentHandler,int,int)

/** Manifest constant: Suppress trailing whitespace.
 * This should be used when normalize-to-SAX is called for the last chunk of a
 * multi-chunk output; it may have to be or'ed with SUPPRESS_LEADING_WS.

/** Manifest constant: Suppress both leading and trailing whitespace.
 * This should be used when normalize-to-SAX is called for a complete string.
 * (I'm not wild about the name of this one. Ideas welcome.)
 * @see #sendNormalizedSAXcharacters(org.xml.sax.ContentHandler,int,int)

/** Manifest constant: Carry trailing whitespace of one chunk as leading
 * whitespace of the next chunk. Used internally; I don't see any reason
 * to make it public right now.
 * Field m_chunkBits sets our chunking strategy, by saying how many
 * bits of index can be used within a single chunk before flowing over
 * to the next chunk. For example, if m_chunkBits is set to 15, each
 * chunk can contain up to 2^15 (32K) characters

 * Field m_maxChunkBits affects our chunk-growth strategy, by saying what
 * the largest permissible chunk size is in this particular FastStringBuffer

 * Field m_rechunkBits affects our chunk-growth strategy, by saying how
 * many chunks should be allocated at one size before we encapsulate them
 * into the first chunk of the next size up. For example, if m_rechunkBits
 * is set to 3, then after 8 chunks at a given size we will rebundle
 * them as the first element of a FastStringBuffer using a chunk size
 * 8 times larger (chunk size shifted left three bits).

 * Field m_chunkSize establishes the maximum size of one chunk of the array
 * as 2**m_chunkBits characters.
 * (Which may also be the minimum size if we aren't tuning for storage)

 * Field m_chunkMask is m_chunkSize-1 -- in other words, m_chunkBits
 * worth of low-order '1' bits, useful for shift-and-mask addressing

 * Field m_array holds the string buffer's text contents, using an
 * array-of-arrays. Note that this array, and the arrays it contains, may be
 * reallocated when necessary in order to allow the buffer to grow;
 * references to them should be considered to be invalidated after any
 * append. However, the only time these arrays are directly exposed
 * is in the sendSAXcharacters call.

 * Field m_lastChunk is an index into m_array[], pointing to the last
 * chunk of the Chunked Array currently in use. Note that additional
 * chunks may actually be allocated, eg if the FastStringBuffer had
 * previously been truncated or if someone issued an ensureSpace request.
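The shift-and-mask addressing described for m_chunkBits, m_chunkSize and m_chunkMask can be sketched as follows. This is a minimal illustration only: the class ChunkAddressingSketch and its method names are hypothetical and do not appear in the original source; only the relationships between the three fields are taken from the comments above.

```java
// Hypothetical illustration class; not part of FastStringBuffer itself.
final class ChunkAddressingSketch {
    final int chunkBits; // bits of index usable within one chunk
    final int chunkSize; // 2**chunkBits characters per chunk
    final int chunkMask; // chunkSize-1: chunkBits worth of low-order '1' bits

    ChunkAddressingSketch(int chunkBits) {
        this.chunkBits = chunkBits;
        this.chunkSize = 1 << chunkBits;
        this.chunkMask = chunkSize - 1;
    }

    // Which chunk of the array-of-arrays holds character position pos.
    int chunkIndex(int pos) {
        return pos >>> chunkBits;
    }

    // Offset of character position pos within its chunk.
    int offsetInChunk(int pos) {
        return pos & chunkMask;
    }

    // Inverse mapping: rebuild the flat position from (chunk, offset).
    int position(int chunk, int offset) {
        return (chunk << chunkBits) + offset;
    }

    public static void main(String[] args) {
        ChunkAddressingSketch a = new ChunkAddressingSketch(15);
        int pos = 40000;
        System.out.println("chunk=" + a.chunkIndex(pos)
            + " offset=" + a.offsetInChunk(pos));
    }
}
```

Because the chunk size is a power of two, both lookups reduce to a single shift or mask, which is why the buffer expresses its sizes in bits rather than in character counts.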
 * The insertion point for append operations is addressed by the combination
 * of m_lastChunk and m_firstFree.

 * Field m_firstFree is an index into m_array[m_lastChunk][], pointing to
 * the first character in the Chunked Array which is not part of the
 * FastStringBuffer's current content. Since m_array[][] is zero-based,
 * the length of that content can be calculated as
 * (m_lastChunk<<m_chunkBits) + m_firstFree

 * Field m_innerFSB, when non-null, is a FastStringBuffer whose total
 * length equals m_chunkSize, and which replaces m_array[0]. This allows
 * building a hierarchy of FastStringBuffers, where early appends use
 * a smaller chunkSize (for less wasted memory overhead) but later
 * ones use a larger chunkSize (for less heap activity overhead).

 * Construct a FastStringBuffer, with allocation policy as per parameters.
 * For coding convenience, I've expressed both allocation sizes in terms of
 * a number of bits. That's needed for the final size of a chunk,
 * to permit fast and efficient shift-and-mask addressing. It's less critical
 * for the initial size, and may be reconsidered.
 * An alternative would be to accept integer sizes and round to powers of two;
 * that really doesn't seem to buy us much, if anything.
 * @param initChunkBits Length in characters of the initial allocation
 * of a chunk, expressed in log-base-2. (That is, 10 means allocate 1024
 * characters.) Later chunks will use larger allocation units, to trade off
 * allocation speed of large document against storage efficiency of small
 * @param maxChunkBits Number of character-offset bits that should be used for
 * addressing within a chunk.
Maximum length of a chunk is 2^chunkBits
 * @param rebundleBits Number of character-offset bits that addressing should
 * advance before we attempt to take a step from initChunkBits to maxChunkBits

// Should this force to larger value, or smaller? Smaller less efficient, but if
// someone requested variable mode it's because they care about storage space.
// On the other hand, given the other changes I'm making, odds are that we should
// adopt the larger size. Dither, dither, dither... This is just stopgap workaround
// anyway; we need a permanent solution.
//if(DEBUG_FORCE_FIXED_CHUNKSIZE) initChunkBits=maxChunkBits;

// Don't bite off more than we're prepared to swallow!

 * Construct a FastStringBuffer, using a default rebundleBits value.
 * NEEDSDOC @param initChunkBits
 * NEEDSDOC @param maxChunkBits

 * Construct a FastStringBuffer, using default maxChunkBits and
 * ISSUE: Should this call assert initial size, or fixed size?
 * Now configured as initial, with a default for fixed.
 * NEEDSDOC @param initChunkBits

 * Construct a FastStringBuffer, using a default allocation policy.

// 10 bits is 1K. 15 bits is 32K. Remember that these are character
// counts, so actual memory allocation unit is doubled for UTF-16 chars.
// For reference: In the original FastStringBuffer, we simply
// overallocated by blocksize (default 1KB) on each buffer-growth.

 * Get the length of the list. Synonym for length().
 * @return the number of characters in the FastStringBuffer's content.

 * Get the length of the list. Synonym for size().
 * @return the number of characters in the FastStringBuffer's content.

 * Discard the content of the FastStringBuffer, and most of the memory
 * that was allocated by it, restoring the initial state. Note that this
 * may eventually be different from setLength(0), which see.
// Recover the original chunk size
// Discard the hierarchy

 * Directly set how much of the FastStringBuffer's storage is to be
 * considered part of its content. This is a fast but hazardous
 * operation. It is not protected against negative values, or values
 * greater than the amount of storage currently available... and even
 * if additional storage does exist, its contents are unpredictable.
 * The only safe use for our setLength() is to truncate the FastStringBuffer
 * @param l New length. If l<0 or l>=getLength(), this operation will
 * not report an error but future operations will almost certainly fail.

// Replace this FSB with the appropriate inner FSB, truncated
// There's an edge case if l is an exact multiple of m_chunkSize, which risks leaving
// us pointing at the start of a chunk which has not yet been allocated. Rather than
// pay the cost of dealing with that in the append loops (more scattered and more
// inner-loop), we correct it here by moving to the safe side of that
// line -- as we would have left the indexes had we appended up to that point.

 * Subroutine for the public setLength() method. Deals with the fact
 * that truncation may require restoring one of the innerFSBs
 * NEEDSDOC @param rootFSB

// Undo encapsulation -- pop the innerFSB data back up to root.
// Inefficient, but attempts to keep the code simple.
// Finally, truncate this sucker.

 * Note that this operation has been somewhat deoptimized by the shift to a
 * chunked array, as there is no factory method to produce a String object
 * directly from an array of arrays and hence a double copy is needed.
 * By using ensureCapacity we hope to minimize the heap overhead of building
 * the intermediate StringBuffer.
 * (It really is a pity that Java didn't design String as a final subclass
 * of MutableString, rather than having StringBuffer be a separate hierarchy.
 * We'd avoid a <strong>lot</strong> of double-buffering.)
 * @return the contents of the FastStringBuffer as a standard Java string.

 * Append a single character onto the FastStringBuffer, growing the
 * storage if necessary.
 * NOTE THAT after calling append(), previously obtained
 * references to m_array[][] may no longer be valid....
 * though in fact they should be in this instance.
 * @param value character to be appended.

// We may have preallocated chunks. If so, all but last should
// Hierarchical encapsulation
// Should do all the work of both encapsulating
// Space exists in the chunk. Append the character.

 * Append the contents of a String onto the FastStringBuffer,
 * growing the storage if necessary.
 * NOTE THAT after calling append(), previously obtained
 * references to m_array[] may no longer be valid.
 * @param value String whose contents are to be appended.

// Repeat while data remains to be copied
// If there's more left, allocate another chunk and continue
// Hierarchical encapsulation
// Should do all the work of both encapsulating
// Adjust the insert point in the last chunk, when we've reached it.

 * Append the contents of a StringBuffer onto the FastStringBuffer,
 * growing the storage if necessary.
 * NOTE THAT after calling append(), previously obtained
 * references to m_array[] may no longer be valid.
 * @param value StringBuffer whose contents are to be appended.
// Repeat while data remains to be copied
// If there's more left, allocate another chunk and continue
// Hierarchical encapsulation
// Should do all the work of both encapsulating
// Adjust the insert point in the last chunk, when we've reached it.

 * Append part of the contents of a Character Array onto the
 * FastStringBuffer, growing the storage if necessary.
 * NOTE THAT after calling append(), previously obtained
 * references to m_array[] may no longer be valid.
 * @param chars character array from which data is to be copied
 * @param start offset in chars of first character to be copied,
 * @param length number of characters to be copied

// Repeat while data remains to be copied
// If there's more left, allocate another chunk and continue
// Hierarchical encapsulation
// Should do all the work of both encapsulating
// Adjust the insert point in the last chunk, when we've reached it.

 * Append the contents of another FastStringBuffer onto
 * this FastStringBuffer, growing the storage if necessary.
 * NOTE THAT after calling append(), previously obtained
 * references to m_array[] may no longer be valid.
 * @param value FastStringBuffer whose contents are

// Complicating factor here is that the two buffers may use
// different chunk sizes, and even if they're the same we're
// probably on a different alignment due to previously appended
// data. We have to work through the source in bite-sized chunks.
// Repeat while data remains to be copied
// If there's more left, allocate another chunk and continue
// Hierarchical encapsulation
// Should do all the work of both encapsulating
// Adjust the insert point in the last chunk, when we've reached it.
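The append loops described above ("repeat while data remains", "allocate another chunk and continue", "adjust the insert point in the last chunk") might look roughly like the sketch below. It is a simplified, fixed-chunk-size illustration under stated assumptions: the class ChunkedAppendSketch and all of its members are hypothetical, and the hierarchical encapsulation step (m_innerFSB) is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical fixed-chunk-size buffer; omits the m_innerFSB hierarchy.
final class ChunkedAppendSketch {
    private final int chunkSize;
    private char[][] chunks = new char[4][]; // array of chunks, grown on demand
    private int lastChunk = 0;               // index of the chunk being filled
    private int firstFree = 0;               // first unused slot in that chunk

    ChunkedAppendSketch(int chunkBits) {
        chunkSize = 1 << chunkBits;
        chunks[0] = new char[chunkSize];
    }

    void append(char[] src, int start, int length) {
        int copyFrom = start;
        int remaining = length;
        // Repeat while data remains to be copied.
        while (remaining > 0) {
            if (firstFree == chunkSize) {
                // Current chunk is full: step to the next one, growing the
                // (much smaller) array of chunks if needed.
                lastChunk++;
                if (lastChunk == chunks.length) {
                    chunks = Arrays.copyOf(chunks, chunks.length * 2);
                }
                if (chunks[lastChunk] == null) {
                    chunks[lastChunk] = new char[chunkSize];
                }
                firstFree = 0;
            }
            int n = Math.min(remaining, chunkSize - firstFree);
            System.arraycopy(src, copyFrom, chunks[lastChunk], firstFree, n);
            copyFrom += n;
            remaining -= n;
            firstFree += n; // adjust the insert point in the last chunk
        }
    }

    int length() {
        return lastChunk * chunkSize + firstFree;
    }

    String contents() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < lastChunk; i++) {
            sb.append(chunks[i], 0, chunkSize);
        }
        sb.append(chunks[lastChunk], 0, firstFree);
        return sb.toString();
    }

    public static void main(String[] args) {
        ChunkedAppendSketch b = new ChunkedAppendSketch(2); // 4 chars per chunk
        char[] text = "hello world".toCharArray();
        b.append(text, 0, text.length);
        System.out.println(b.length() + ": " + b.contents());
    }
}
```

Existing chunks are never recopied when the buffer grows; only the outer array of chunk references is reallocated. A full chunk leaves the insert point at the end of that chunk rather than at the start of an unallocated one, and the next chunk is created only when more data arrives.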
 * @return true if the specified range of characters are all whitespace,
 * as defined by XMLCharacterRecognizer.
 * CURRENTLY DOES NOT CHECK FOR OUT-OF-RANGE.
 * @param start Offset of first character in the range.
 * @param length Number of characters to send.

 * @param start Offset of first character in the range.
 * @param length Number of characters to send.
 * @return a new String object initialized from the specified range of

 * @param sb StringBuffer to be appended to
 * @param start Offset of first character in the range.
 * @param length Number of characters to send.
 * @return sb with the requested text appended to it

 * Internal support for toString() and getString().
 * PLEASE NOTE SIGNATURE CHANGE from earlier versions; it now appends into
 * and returns a StringBuffer supplied by the caller. This simplifies
 * Note that this operation has been somewhat deoptimized by the shift to a
 * chunked array, as there is no factory method to produce a String object
 * directly from an array of arrays and hence a double copy is needed.
 * By presetting length we hope to minimize the heap overhead of building
 * the intermediate StringBuffer.
 * (It really is a pity that Java didn't design String as a final subclass
 * of MutableString, rather than having StringBuffer be a separate hierarchy.
 * We'd avoid a <strong>lot</strong> of double-buffering.)
 * @return the contents of the FastStringBuffer as a standard Java string.

//StringBuffer sb=new StringBuffer(length);

 * Get a single character from the string buffer.
 * @param pos character position requested.
 * @return A character from the requested position.
 * Sends the specified range of characters as one or more SAX characters()
 * Note that the buffer reference passed to the ContentHandler may be
 * invalidated if the FastStringBuffer is edited; it's the user's
 * responsibility to manage access to the FastStringBuffer to prevent this
 * problem from arising.
 * Note too that there is no promise that the output will be sent as a
 * single call. As is always true in SAX, one logical string may be split
 * across multiple blocks of memory and hence delivered as several
 * @param ch SAX ContentHandler object to receive the event.
 * @param start Offset of first character in the range.
 * @param length Number of characters to send.
 * @exception org.xml.sax.SAXException may be thrown by handler's

// Last, or only, chunk

 * Sends the specified range of characters as one or more SAX characters()
 * events, normalizing the characters according to XSLT rules.
 * @param ch SAX ContentHandler object to receive the event.
 * @param start Offset of first character in the range.
 * @param length Number of characters to send.
 * @return normalization status to apply to next chunk (because we may
 * have been called recursively to process an inner FSB):
 * <dd>if this output did not end in retained whitespace, and thus whitespace
 * at the start of the following chunk (if any) should be converted to a
 * <dt>SUPPRESS_LEADING_WS</dt>
 * <dd>if this output ended in retained whitespace, and thus whitespace
 * at the start of the following chunk (if any) should be completely
 * @exception org.xml.sax.SAXException may be thrown by handler's

// This call always starts at the beginning of the
// string being written out, either because it was called directly or
// because it was an m_innerFSB recursion.
This is important since
// it gives us a well-known initial state for this flag:
// Last, or only, chunk

 * Internal method to directly normalize and dispatch the character array.
 * This version is aware of the fact that it may be called several times
 * in succession if the data is made up of multiple "chunks", and thus
 * must actively manage the handling of leading and trailing whitespace.
 * Note: The recursion is due to the possible recursion of inner FSBs.
 * @param ch The characters from the XML document.
 * @param start The start position in the array.
 * @param length The number of characters to read from the array.
 * @param handler SAX ContentHandler object to receive the event.
 * This is a bitfield containing two flags, bitwise-ORed together:
 * <dt>SUPPRESS_LEADING_WS</dt>
 * <dd>When false, causes leading whitespace to be converted to a single
 * space; when true, causes it to be discarded entirely.
 * Should be set TRUE for the first chunk, and (in multi-chunk output)
 * whenever the previous chunk ended in retained whitespace.</dd>
 * <dt>SUPPRESS_TRAILING_WS</dt>
 * <dd>When false, causes trailing whitespace to be converted to a single
 * space; when true, causes it to be discarded entirely.
 * Should be set TRUE for the last or only chunk.
 * @return normalization status, as in the edgeTreatmentFlags parameter:
 * <dd>if this output did not end in retained whitespace, and thus whitespace
 * at the start of the following chunk (if any) should be converted to a
 * <dt>SUPPRESS_LEADING_WS</dt>
 * <dd>if this output ended in retained whitespace, and thus whitespace
 * at the start of the following chunk (if any) should be completely
 * @exception org.xml.sax.SAXException Any SAX exception, possibly
 * wrapping another exception.
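The way the edge-treatment flags carry whitespace state from one chunk to the next can be sketched as below. This is an illustrative approximation, not the original implementation: the class NormalizeSketch, the numeric flag values, the isXmlWhitespace helper (standing in for XMLCharacterRecognizer) and the use of a StringBuilder in place of a SAX ContentHandler are all assumptions made to keep the example self-contained.

```java
// Hypothetical sketch of chunk-aware whitespace normalization.
final class NormalizeSketch {
    // Flag values are assumed for illustration.
    static final int SUPPRESS_LEADING_WS = 0x01;
    static final int SUPPRESS_TRAILING_WS = 0x02;
    static final int SUPPRESS_BOTH = SUPPRESS_LEADING_WS | SUPPRESS_TRAILING_WS;

    // Stand-in for XMLCharacterRecognizer: the four XML whitespace characters.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Normalizes one chunk into out and returns the status for the next chunk.
    static int normalizeChunk(char[] ch, int start, int length,
                              StringBuilder out, int edgeTreatmentFlags) {
        int end = start + length;
        int i = start;
        boolean sawText = false;

        // Strip any leading whitespace; remember it as one pending space
        // unless the caller asked for it to be discarded.
        while (i < end && isXmlWhitespace(ch[i])) i++;
        boolean pendingSpace =
            i > start && (edgeTreatmentFlags & SUPPRESS_LEADING_WS) == 0;

        while (i < end) {
            // Grab a run of non-whitespace characters and emit it, preceded
            // by a single space for any whitespace seen before it.
            int runStart = i;
            while (i < end && !isXmlWhitespace(ch[i])) i++;
            if (pendingSpace) out.append(' ');
            out.append(ch, runStart, i - runStart);
            sawText = true;

            // Consume any whitespace that follows the run.
            int wsStart = i;
            while (i < end && isXmlWhitespace(ch[i])) i++;
            pendingSpace = i > wsStart;
        }

        if (pendingSpace && (edgeTreatmentFlags & SUPPRESS_TRAILING_WS) == 0) {
            out.append(' '); // trailing whitespace retained as one space
            return SUPPRESS_LEADING_WS;
        }
        // Only whitespace (or nothing) seen: the state remains unchanged.
        if (!sawText) return edgeTreatmentFlags & SUPPRESS_LEADING_WS;
        return 0;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        char[] a = "  hello  ".toCharArray();
        char[] b = "  world  ".toCharArray();
        int state = normalizeChunk(a, 0, a.length, out, SUPPRESS_LEADING_WS);
        normalizeChunk(b, 0, b.length, out, state | SUPPRESS_TRAILING_WS);
        System.out.println("[" + out + "]");
    }
}
```

The caller passes SUPPRESS_LEADING_WS for the first chunk, feeds each returned status into the next call, and ORs in SUPPRESS_TRAILING_WS for the last chunk, so whitespace that straddles a chunk boundary still collapses to a single space.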
// Strip any leading spaces first, if required
// If we've only encountered leading spaces, the
// current state remains unchanged
// If we get here, there are no more leading spaces to strip
// Grab a chunk of non-whitespace characters
// Non-whitespace seen - emit them, along with a single
// space for any preceding whitespace characters
// Consume any whitespace characters

 * Directly normalize and dispatch the character array.
 * @param ch The characters from the XML document.
 * @param start The start position in the array.
 * @param length The number of characters to read from the array.
 * @param handler SAX ContentHandler object to receive the event.
 * @exception org.xml.sax.SAXException Any SAX exception, possibly
 * wrapping another exception.

 * Sends the specified range of characters as a SAX comment.
 * Note that, unlike sendSAXcharacters, this has to be done as a single
 * call to LexicalHandler#comment.
 * @param ch SAX LexicalHandler object to receive the event.
 * @param start Offset of first character in the range.
 * @param length Number of characters to send.
 * @exception org.xml.sax.SAXException may be thrown by handler's

// %OPT% Do it this way for now...

 * Copies characters from this string into the destination character
 * @param srcBegin index of the first character in the string
 * @param srcEnd index after the last character in the string
 * @param dst the destination array.
 * @param dstBegin the start offset in the destination array.
 * @exception IndexOutOfBoundsException If any of the following
 * <ul><li><code>srcBegin</code> is negative.
 * <li><code>srcBegin</code> is greater than <code>srcEnd</code>
 * <li><code>srcEnd</code> is greater than the length of this
 * <li><code>dstBegin</code> is negative
 * <li><code>dstBegin+(srcEnd-srcBegin)</code> is larger than
 * <code>dst.length</code></ul>
 * @exception NullPointerException if <code>dst</code> is <code>null</code>

// %TBD% Joe needs to write this function. Make public when implemented.

 * Encapsulation c'tor. After this is called, the source FastStringBuffer
 * will be reset to use the new object as its m_innerFSB, and will have
 * had its chunk size reset appropriately. IT SHOULD NEVER BE CALLED
 * EXCEPT WHEN source.length()==1<<(source.m_chunkBits+source.m_rebundleBits)
 * NEEDSDOC @param source

// Copy existing information into new encapsulation
// These have to be adjusted because we're calling just at the time
// when we would be about to allocate another chunk
// Since we encapsulated just as we were about to append another
// chunk, return ready to create the chunk after the innerFSB