 * reserved comment block
 * DO NOT REMOVE OR ALTER!
 * Copyright 1999-2004 The Apache Software Foundation.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.

/** <p>IncrementalSAXSource_Filter implements IncrementalSAXSource, using a
 * standard SAX2 event source as its input and parcelling out those
 * events gradually in response to deliverMoreNodes() requests. Output from the
 * filter will be passed along to a SAX handler registered as our
 * listener, but those callbacks will pass through a counting stage
 * which periodically yields control back to the controller coroutine.
 * <p>%REVIEW%: This filter is not currently intended to be reusable
 * making it resettable at some point in the future. But it's a
 * small object, so that'd be mostly a convenience issue; the cost
 * of allocating each time is trivial compared to the cost of processing
 * any nontrivial stream.</p>
 * <p>For a brief usage example, see the unit-test main() method.</p>
 * <p>This is a simplification of the old CoroutineSAXParser, focusing
 * specifically on filtering.
 * The resulting controller protocol is _far_
 * simpler and less error-prone; the only controller operation is deliverMoreNodes(),
 * and the only requirement is that deliverMoreNodes(false) be called if you want to
 * discard the rest of the stream and the previous deliverMoreNodes() didn't return
 * This class is final and package private for security reasons. Please
 * see CR 6537912 for further details.

  // Flag indicating that no more events should be delivered -- either
  // because input stream ran to completion (endDocument), or because
  // the user requested an early stop via deliverMoreNodes(false).

  // Support for startParse()

  /** Create an IncrementalSAXSource_Filter which is not yet bound to a specific

  /** Bind our input streams to an XMLReader.
   * Just a convenience routine; obviously you can explicitly register
   * this as a listener with the same effect.

    // Not supported by all SAX2 filters:
    // Nothing we can do about it
    // Nothing we can do about it
    // Should we also bind as other varieties of handler?
    // (DTDHandler and so on)

  // Register a content handler for us to output to

  // Register a DTD handler for us to output to

  // Register a lexical handler for us to output to
    // Not all filters support this...
    // ??? Should we register directly on the filter?
    // NOTE NAME -- subclassing issue in the Xerces version

  // Register an error handler for us to output to
    // NOTE NAME -- subclassing issue in the Xerces version

  // Set the number of events between resumes of our coroutine
  // Immediately resets number of events before _next_ resume as well.

  // ContentHandler methods
  // These pass the data to our client ContentHandler...
  // but they also count the number of events passing through,
  // and resume our coroutine each time that counter hits zero and
  // Note that for everything except endDocument and fatalError, we do the count-and-yield
  // BEFORE passing the call along. I'm hoping that this will encourage JIT
  // compilers to realize that these are tail-calls, reducing the expense of
  // the additional layer of data flow.
  // %REVIEW% Glenn suggests that pausing after endElement, endDocument,
  // and characters may be sufficient. I actually may not want to
  // stop after characters, since in our application these wind up being
  // concatenated before they're processed... but that risks huge blocks of
  // text causing greater than usual readahead. (Unlikely? Consider the
  // possibility of a large base-64 block in a SOAP stream.)

    // EXCEPTION: In this case we need to run the event BEFORE we yield.

    // This can cause a hang. -sb
    // Otherwise, begin normal event delivery

  // LexicalHandler support. Not all SAX2 filters support these events
  // but we may want to pass them through when they exist...
  // %REVIEW% These do NOT currently affect the eventcounter; I'm asserting
  // that they're rare enough that it makes little or no sense to
  // pause after them. As such, it may make more sense for folks who
  // actually want to use them to register directly with the filter.
  // But I want 'em here for now, to remind us to recheck this assertion!

  // ErrorHandler support.
  // PROBLEM: Xerces is apparently _not_ calling the ErrorHandler for
  // exceptions thrown by the ContentHandler, which prevents us from
  // handling this properly when running in filtering mode with Xerces
  // as our event source. It's unclear whether this is a Xerces bug
  // or a SAX design flaw.
  // %REVIEW% Current solution: In filtering mode, it is REQUIRED that
  // event source make sure this method is invoked if the event stream
  // abends before endDocument is delivered. If that means explicitly calling
  // us in the exception handling code because it won't be delivered as part
  // of the normal SAX ErrorHandler stream, that's fine; Not Our Problem.

    // EXCEPTION: In this case we need to run the event BEFORE we yield --
    // just as with endDocument, this terminates the event stream.

  /** @return the CoroutineManager this CoroutineFilter object is bound to.
   * If you're using the do...() methods, applications should only
   * need to talk to the CoroutineManager once, to obtain the
   * application's Coroutine ID.

  /** <p>In the SAX delegation code, I've inlined the count-down in
   * the hope of encouraging compilers to deliver better
   * performance. However, if we subclass (eg to directly connect the
   * output to a DTM builder), that would require calling super in
   * order to run that logic... which seems inelegant. Hence this
   * routine for the convenience of subclasses: every [frequency]
   * invocations, issue a co_yield.</p>
   * @param moreExepected Should always be true unless this is being called
   * at the end of endDocument() handling.

   * co_entry_pause is called in startDocument() before anything else
   * happens. It causes the filter to wait for a "go ahead" request
   * from the controller before delivering any events. Note that
   * the very first thing the controller tells us may be "I don't
   * need events after all"!

    // Nobody called init()? Do it now...
    // Coroutine system says we haven't registered. That's an
    // application coding error, and is unrecoverable.

   * Co_Yield handles coroutine interactions while a parse is in progress.
   * When moreRemains==true, we are pausing after delivering events, to
   * ask if more are needed. We will resume the controller thread with
   * co_resume(Boolean.TRUE, ...)
   * When control is passed back it may indicate
   * Boolean.TRUE indication to continue delivering events
   * Boolean.FALSE indication to discontinue events and shut down.
   * When moreRemains==false, we shut down immediately without asking the
   * controller's permission. Normally this means end of document has been
   * Shutting down an IncrementalSAXSource_Filter requires terminating the incoming
   * SAX event stream. If we are in control of that stream (if it came
   * from an XMLReader passed to our startReader() method), we can do so
   * very quickly by throwing a reserved exception to it. If the stream is
   * coming from another source, we can't do that because its caller may
   * not be prepared for this "normal abnormal exit", and instead we put
   * ourselves in a "spin" mode where events are discarded.

    // Horrendous kluge to run filter to completion. See below.
    try // Coroutine manager might throw no-such.
        // Yield control, resume parsing when done
        // If we're at end of document or were told to stop early
        // Yield control. We do NOT expect anyone to ever ask us again.
        // Shouldn't happen unless we've miscoded our coroutine logic
        // "Shut down the garbage smashers on the detention level!"

  // Convenience: Run an XMLReader in a thread

  /** Launch a thread that will run an XMLReader's parse() operation within
   * a thread, feeding events to this IncrementalSAXSource_Filter. Mostly a convenience
   * routine, but has the advantage that -- since we invoked parse() --
   * we can halt parsing quickly via a StopException rather than waiting
   * for the SAX stream to end by itself.
   * @throws SAXException if parse thread is already in progress
   * or parsing cannot be started.

    // Xalan thread pooling...
    // com.sun.org.apache.xalan.internal.transformer.TransformerImpl.runTransformThread(this);

  /* Thread logic to support startParseThread()

    // Guard against direct invocation of start().
    // Initially assume we'll run successfully.
    // For the duration of this operation, all coroutine handshaking
    // will occur in the co_yield method. That's the nice thing about
    // coroutines; they give us a way to hand off control from the
    // middle of a synchronous method.
        // Expected and harmless
        // Expected and harmless
        // Unexpected malfunction
    // Mark as no longer running in thread.
    // Mark as done and yield control to the controller coroutine
        // Shouldn't happen unless we've miscoded our coroutine logic
        // "CPO, shut down the garbage smashers on the detention level!"

  /** Used to quickly terminate parse when running under a
      startParse() thread. Only its type is important. */

  /** deliverMoreNodes() is a simple API which tells the coroutine
   * parser that we need more nodes. This is intended to be called
   * from one of our partner routines, and serves to encapsulate the
   * details of how incremental parsing has been achieved.
   * @param parsemore If true, tells the incremental filter to generate
   * another chunk of output. If false, tells the filter that we're
   * satisfied and it can terminate parsing of this document.
   * @return Boolean.TRUE if there may be more events available by invoking
   * deliverMoreNodes() again. Boolean.FALSE if parsing has run to completion (or been
   * terminated by deliverMoreNodes(false)). Or an exception object if something
   * malfunctioned.
   * %REVIEW% We _could_ actually throw the exception, but
   * that would require running deliverMoreNodes() in a try/catch... and for many
   * applications, exceptions will simply be treated as "not TRUE" in

    // If parsing is already done, we can immediately say so
        // SHOULD NEVER OCCUR, since the coroutine number and coroutine manager
        // are those previously established for this IncrementalSAXSource_Filter...
        // So I'm just going to return it as a parsing exception, for now.

  //================================================================
  /** Simple unit test. Attempt coroutine parsing of document indicated
   * by first argument (as a URI), report progress.
  public static void _main(String args[])
    System.out.println("Starting...");
      new com.sun.org.apache.xerces.internal.parsers.SAXParser();
    for(int arg=0;arg<args.length;++arg)
        // The filter is not currently designed to be restartable
        // after a parse has ended. Generate a new one each time.
        IncrementalSAXSource_Filter filter=
          new IncrementalSAXSource_Filter();
        // Use a serializer as our sample output
        filter.setContentHandler(trace);
        filter.setLexicalHandler(trace);
        InputSource source = new InputSource(args[arg]);
        // init not issued; we _should_ automagically Do The Right Thing
        // Bind parser, kick off parsing in a thread
        filter.setXMLReader(theSAXParser);
        filter.startParse(source);
        for(result = filter.deliverMoreNodes(more);
            (result instanceof Boolean && ((Boolean)result)==Boolean.TRUE);
            result = filter.deliverMoreNodes(more))
            System.out.println("\nSome parsing successful, trying more.\n");
            // Special test: Terminate parsing early.
            if(arg+1<args.length && "!".equals(args[arg+1]))
        if (result instanceof Boolean && ((Boolean)result)==Boolean.FALSE)
            System.out.println("\nFilter ended (EOF or on request).\n");
        else if (result == null) {
            System.out.println("\nUNEXPECTED: Filter says shut down prematurely.\n");
        else if (result instanceof Exception) {
            System.out.println("\nFilter threw exception:");
            ((Exception)result).printStackTrace();
}
// class IncrementalSAXSource_Filter
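
// The count-and-yield handshake described in the comments above can be sketched with
// plain java.util.concurrent hand-offs. This is only an illustration under stated
// assumptions, not the filter's actual implementation: CountAndYieldSketch, its two
// SynchronousQueue fields, start() and delivered() are hypothetical names, and the real
// class coordinates through a CoroutineManager rather than queues. Only the
// deliverMoreNodes(boolean) name and its TRUE/FALSE contract are taken from the comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical sketch of a count-and-yield producer; not the Xalan class. */
final class CountAndYieldSketch {
    // Controller -> producer: TRUE means "deliver more", FALSE means "stop early".
    private final SynchronousQueue<Boolean> toProducer = new SynchronousQueue<>();
    // Producer -> controller: TRUE means "more may remain", FALSE means "finished".
    private final SynchronousQueue<Boolean> toController = new SynchronousQueue<>();
    private final List<String> delivered = new ArrayList<>();
    private final int frequency;
    private int countdown;
    private boolean done;

    CountAndYieldSketch(int frequency) {
        this.frequency = frequency;
        this.countdown = frequency;
    }

    /** Starts the producer thread; it delivers nothing until the first go-ahead. */
    void start(List<String> events) {
        Thread producer = new Thread(() -> {
            try {
                // Entry pause: the very first request may already be "stop".
                if (!toProducer.take()) {
                    toController.put(Boolean.FALSE);
                    return;
                }
                for (String event : events) {
                    // Count and yield BEFORE passing the event along.
                    if (--countdown <= 0) {
                        countdown = frequency;
                        toController.put(Boolean.TRUE);
                        if (!toProducer.take()) {
                            toController.put(Boolean.FALSE);
                            return;
                        }
                    }
                    delivered.add(event);
                }
                toController.put(Boolean.FALSE); // end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Asks for another chunk (true) or an early stop (false). */
    Boolean deliverMoreNodes(boolean parsemore) {
        if (done) {
            return Boolean.FALSE; // already finished; answer immediately
        }
        try {
            toProducer.put(parsemore);
            Boolean more = toController.take();
            done = !more;
            return more;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            done = true;
            return Boolean.FALSE;
        }
    }

    /** Snapshot of events delivered so far; call only between deliverMoreNodes() calls. */
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        CountAndYieldSketch sketch = new CountAndYieldSketch(2);
        sketch.start(Arrays.asList("a", "b", "c", "d", "e"));
        while (sketch.deliverMoreNodes(true)) {
            System.out.println("so far: " + sketch.delivered());
        }
        System.out.println("final: " + sketch.delivered());
    }
}
```

// In this sketch the controller never needs a try/catch: a finished or stopped producer
// simply answers Boolean.FALSE, which mirrors the "not TRUE means stop asking" convention
// described for deliverMoreNodes(). Reporting failures as a returned exception object, as
// the original comments describe, is left out here to keep the example short.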