XMLDTDProcessor.java revision 286
 * reserved comment block
 * DO NOT REMOVE OR ALTER!
 * The Apache Software License, Version 1.1
 * Copyright (c) 1999-2002 The Apache Software Foundation.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *    "This product includes software developed by the
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 * 4. The names "Xerces" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache@apache.org.
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * ====================================================================
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation and was
 * originally based on software copyright (c) 1999, International
 * information on the Apache Software Foundation, please see

 * The DTD processor. The processor implements a DTD
 * filter: receiving DTD events from the DTD scanner; validating
 * the content and structure; building a grammar, if applicable;
 * and notifying the DTDHandler of the information resulting from the
 * This component requires the following features and properties from the
 * component manager that uses it:
 * @author Neil Graham, IBM

/** Top level scope (-1). */
/** Feature identifier: validation. */
/** Feature identifier: notify character references. */
/** Feature identifier: warn on duplicate attdef */
/** Feature identifier: warn on undeclared element referenced in content model. */
/** Property identifier: symbol table. */
/** Property identifier: error reporter. */
/** Property identifier: grammar pool. */
/** Property identifier: validator. */
// recognized features and properties
/** Recognized features. */
/** Recognized properties. */
/** Property defaults.
*/
/** Validation against only DTD */
/** warn on duplicate attribute definition, this feature works only when validation is true */
/** warn on undeclared element referenced in content model, this feature only works when validation is true */
// the validator to which we look for our grammar bucket (the
// validator needs to hold the bucket so that it can initialize
// the grammar with details like whether it's for a standalone document...
// the grammar pool we'll try to add the grammar to:
/** DTD content model handler. */
/** DTD content model source. */
/** Perform validation. */
/** True if in an ignore conditional section of the DTD. */
// information regarding the current element
/** Temporary entity declaration. */
/** Notation declaration hash. */
/** DTD element declaration name. */
/** Mixed element type "hash". */
/** Element declarations in DTD. */
// to check for duplicate ID or NOTATION attribute declared in
/** ID attribute names. */
/** NOTATION attribute names. */
/** NOTATION enumeration values. */
/** Default constructor. */
 * Resets the component. The component can query the component manager
 * about any features and properties that affect the operation of the
 * @param componentManager The component manager.
 * @throws SAXException Thrown by component on initialization error.
 *                      For example, if a feature or property is
 *                      required for the operation of the component, the
 *                      component manager may throw a
 *                      SAXNotRecognizedException or a
 *                      SAXNotSupportedException.
// parser settings have not been changed
// we get our grammarBucket from the validator...
}
// reset(XMLComponentManager) * Returns a list of feature identifiers that are recognized by * this component. This method may return null if no features * are recognized by this component. }
// getRecognizedFeatures():String[] * Sets the state of a feature. This method is called by the component * manager any time after reset when a feature changes state. * <strong>Note:</strong> Components should silently ignore features * that do not affect the operation of the component. * @param featureId The feature identifier. * @param state The state of the feature. * @throws SAXNotRecognizedException The component should not throw * @throws SAXNotSupportedException The component should not throw }
// setFeature(String,boolean) * Returns a list of property identifiers that are recognized by * this component. This method may return null if no properties * are recognized by this component. }
// getRecognizedProperties():String[] * Sets the value of a property. This method is called by the component * manager any time after reset when a property changes value. * <strong>Note:</strong> Components should silently ignore properties * that do not affect the operation of the component. * @param propertyId The property identifier. * @param value The value of the property. * @throws SAXNotRecognizedException The component should not throw * @throws SAXNotSupportedException The component should not throw }
// setProperty(String,Object) * Returns the default state for a feature, or null if this * component does not want to report a default value for this * @param featureId The feature identifier. }
// getFeatureDefault(String):Boolean * Returns the default state for a property, or null if this * component does not want to report a default value for this * @param propertyId The property identifier. }
// getPropertyDefault(String):Object * @param dtdHandler The DTD handler. }
// setDTDHandler(XMLDTDHandler) * Returns the DTD handler. * @return The DTD handler. }
// getDTDHandler(): XMLDTDHandler // XMLDTDContentModelSource methods * Sets the DTD content model handler. * @param dtdContentModelHandler The DTD content model handler. }
// setDTDContentModelHandler(XMLDTDContentModelHandler) * Gets the DTD content model handler. * @return dtdContentModelHandler The DTD content model handler. }
// getDTDContentModelHandler(): XMLDTDContentModelHandler
// XMLDTDContentModelHandler and XMLDTDHandler methods
 * The start of the DTD external subset.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * The end of the DTD external subset.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * Check standalone entity reference.
 * Made static so that it can be shared between the validator and the loader.
 * @param grammar grammar to which entity belongs
 * @param tempEntityDecl empty entity declaration to put results in
 * @param errorReporter error reporter to send errors to
 * @throws XNIException Thrown by application to signal an error.
// check VC: Standalone Document Declaration, entity references appear in the document.
"MSG_REFERENCE_TO_EXTERNALLY_DECLARED_ENTITY_WHEN_STANDALONE",
* @param text The text in the comment. * @param augs Additional information that may include infoset augmentations * @throws XNIException Thrown by application to signal an error. * A processing instruction. Processing instructions consist of a * target name and, optionally, text data. The data is only meaningful * Typically, a processing instruction's data will contain a series * of pseudo-attributes. These pseudo-attributes follow the form of * element attributes but are <strong>not</strong> parsed or presented * to the application as anything other than text. The application is * responsible for parsing the data. * @param target The target. * @param data The data or null if none specified. * @param augs Additional information that may include infoset augmentations * @throws XNIException Thrown by handler to signal an error. }
// processingInstruction(String,XMLString) * @param locator The document locator, or null if the document * location cannot be reported during the parsing of * the document DTD. However, it is <em>strongly</em> * recommended that a locator be supplied that can * at least report the base system identifier of the * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. // the grammar bucket's DTDGrammar will now be the // one we want, whether we're constructing it or not. // if we're not constructing it, then we should not have a reference }
// startDTD(XMLLocator)
 * Characters within an IGNORE conditional section.
 * @param text The ignored text.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
// ignored characters in DTD
 * Notifies of the presence of a TextDecl line in an entity. If present,
 * this method will be called immediately following the startParameterEntity call.
 * <strong>Note:</strong> This method is only called for external
 * parameter entities referenced in the DTD.
 * @param version The XML version, or null if not specified.
 * @param encoding The IANA encoding name of the entity.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * This method notifies of the start of a parameter entity. Parameter
 * entity names start with a '%' character.
 * @param name The name of the parameter entity.
 * @param identifier The resource identifier.
 * @param encoding The auto-detected IANA encoding name of the entity
 *                 stream. This value will be null in those situations
 *                 where the entity encoding is not auto-detected (e.g.
 *                 internal parameter entities).
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * This method notifies of the end of a parameter entity. Parameter entity
 * names begin with a '%' character.
 * @param name The name of the parameter entity.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * An element declaration.
 * @param name The name of the element.
 * @param contentModel The element content model.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
// check VC: Unique Element Declaration
"MSG_ELEMENT_ALREADY_DECLARED",
}
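The Unique Element Declaration check above can be observed from outside the processor through a validating parser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser (whose DTD handling descends from Xerces) and a hypothetical helper class `DuplicateElementDeclDemo`; it collects recoverable validation errors rather than asserting any particular message text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of XMLDTDProcessor.
public final class DuplicateElementDeclDemo {

    private DuplicateElementDeclDemo() {}

    /** Parses the document with DTD validation on and returns the recoverable errors reported. */
    public static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness or configuration failures are outside the scope of this sketch.
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```

Passing a document whose internal subset declares the same element type twice should yield at least one entry in the returned list, while a document with a single declaration should yield none.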
// elementDecl(String,String) * The start of an attribute list. * @param elementName The name of the element that this attribute * list is associated with. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. }
// startAttlist(String) * An attribute declaration. * @param elementName The name of the element that this attribute * @param attributeName The name of the attribute. * @param type The attribute type. This value will be one of * the following: "CDATA", "ENTITY", "ENTITIES", * "ENUMERATION", "ID", "IDREF", "IDREFS", * "NMTOKEN", "NMTOKENS", or "NOTATION". * @param enumeration If the type has the value "ENUMERATION" or * "NOTATION", this array holds the allowed attribute * values; otherwise, this array is null. * @param defaultType The attribute default type. This value will be * one of the following: "#FIXED", "#IMPLIED", * @param defaultValue The attribute default value, or null if no * default value is specified. * @param nonNormalizedDefaultValue The attribute default value with no normalization * performed, or null if no default value is specified. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. //Get Grammar index to grammar array //more than one attribute definition is provided for the same attribute of a given element type. //this feature works only when validation is true. "MSG_DUPLICATE_ATTRIBUTE_DEFINITION",
// a) VC: One ID per Element Type, if duplicate ID attribute
// b) VC: ID attribute Default. If there is a declared attribute
//    default for ID it should be of type #IMPLIED or #REQUIRED
// We should not report an error when there is a duplicate attribute definition for a given element type.
// According to the XML 1.0 spec, when more than one definition is provided for the same attribute of a given
// element type, the first declaration is binding and later declarations are *ignored*. So the processor should
// ignore the second declaration; however, an application would be warned of the duplicate attribute definition.
// One typical case where this could be a problem is when an XML file
// provides the ID type information through the internal subset so that it is available to a parser which reads
// only the internal subset. That attribute declaration (ID type) can then appear again in an external parsed entity
// that is referenced. If the parser doesn't make this distinction, it will throw an error for VC One ID per
// Element Type on the second definition, which actually should be ignored. Application behavior may differ on the
// basis of error or warning thrown. - nb.
"MSG_MORE_THAN_ONE_ID_ATTRIBUTE",
// VC: One Notation Per Element Type, should check if there is a
// duplicate NOTATION attribute
// VC: Notation Attributes: all notation names in the
// (attribute) declaration must be declared.
// We should not report an error when there is a duplicate attribute definition for a given element type.
// According to the XML 1.0 spec, when more than one definition is provided for the same attribute of a given
// element type, the first declaration is binding and later declarations are *ignored*. So the processor should
// ignore the second declaration; however, an application would be warned of the duplicate attribute definition.
"MSG_MORE_THAN_ONE_NOTATION_ATTRIBUTE",
// VC: No Duplicate Tokens // XML 1.0 SE Errata - E2 // Only report the first uniqueness violation. There could be others, // but additional overhead would be incurred tracking unique tokens // that have already been encountered. -- mrglavas ?
"MSG_DISTINCT_TOKENS_IN_ENUMERATION" :
"MSG_DISTINCT_NOTATION_IN_ENUMERATION",
// VC: Attribute Default Legal "MSG_ATT_DEFAULT_INVALID",
}
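The No Duplicate Tokens check described above reports only the first uniqueness violation and avoids keeping a set of tokens already seen. A minimal sketch of that idea, assuming a hypothetical helper `EnumerationChecker.firstDuplicate` that is not taken from the source:

```java
// Hypothetical helper illustrating "report only the first uniqueness violation".
public final class EnumerationChecker {

    private EnumerationChecker() {}

    /**
     * Returns the first token that occurs more than once in the enumeration,
     * or null if all tokens are distinct. A nested scan is used so that no
     * extra bookkeeping structure is allocated for the (usually short) list.
     */
    public static String firstDuplicate(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            for (int j = i + 1; j < tokens.length; j++) {
                if (tokens[i].equals(tokens[j])) {
                    return tokens[i]; // stop at the first violation
                }
            }
        }
        return null;
    }
}
```

A caller would report one error for the returned token and then continue, accepting that later duplicates in the same enumeration go unreported.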
// attributeDecl(String,String,String,String[],String,XMLString, XMLString, Augmentations)
 * The end of an attribute list.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
 * An internal entity declaration.
 * @param name The name of the entity. Parameter entity names start with
 *             '%', whereas the name of a general entity is just the
 * @param text The value of the entity.
 * @param nonNormalizedText The non-normalized value of the entity. This
 *             value contains the same sequence of characters that was in
 *             the internal entity declaration, without any entity
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
// If the same entity is declared more than once, the first declaration
// encountered is binding; SAX requires only the effective (first) declaration
// to be reported to the application
// REVISIT: Does it make sense to pass duplicate entity information across
// it's a new entity and hasn't been declared.
// store internal entity declaration in grammar
}
// internalEntityDecl(String,XMLString,XMLString)
 * An external entity declaration.
 * @param name The name of the entity. Parameter entity names start
 *             with '%', whereas the name of a general entity is just
 * @param identifier An object containing all location information
 *                   pertinent to this external entity.
 * @param augs Additional information that may include infoset
 * @throws XNIException Thrown by handler to signal an error.
// If the same entity is declared more than once, the first declaration
// encountered is binding; SAX requires only the effective (first) declaration
// to be reported to the application
// REVISIT: Does it make sense to pass duplicate entity information across
// it's a new entity and hasn't been declared.
// store external entity declaration in grammar
}
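The "first declaration is binding" rule used for both internal and external entities can be sketched as a table that refuses to overwrite an existing entry. The class `EntityTable` and its methods are hypothetical names for illustration only; the actual processor stores declarations in its DTD grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the grammar's entity storage.
public final class EntityTable {

    private final Map<String, String> entities = new LinkedHashMap<>();

    /**
     * Records an entity declaration unless the name is already declared.
     * Returns true if this declaration became the binding one, false if it
     * was ignored as a duplicate.
     */
    public boolean declare(String name, String value) {
        if (entities.containsKey(name)) {
            return false; // later declarations are ignored
        }
        entities.put(name, value);
        return true;
    }

    /** Returns the binding value for the entity, or null if it was never declared. */
    public String valueOf(String name) {
        return entities.get(name);
    }
}
```

A handler built on this would forward the declaration downstream only when `declare` returns true, which matches the comment that only the effective declaration needs to be reported.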
// externalEntityDecl(String,XMLResourceIdentifier, Augmentations) * An unparsed entity declaration. * @param name The name of the entity. * @param identifier An object containing all location information * pertinent to this entity. * @param notation The name of the notation. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. // VC: Notation declared, in the production of NDataDecl }
// unparsedEntityDecl(String,XMLResourceIdentifier,String,Augmentations) * @param name The name of the notation. * @param identifier An object containing all location information * pertinent to this notation. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. // VC: Unique Notation Name }
// notationDecl(String,XMLResourceIdentifier, Augmentations) * The start of a conditional section. * @param type The type of the conditional section. This value will * either be CONDITIONAL_INCLUDE or CONDITIONAL_IGNORE. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * @see #CONDITIONAL_INCLUDE * @see #CONDITIONAL_IGNORE }
// startConditional(short) * The end of a conditional section. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. // VC : Notation Declared. for external entity declaration [Production 76]. "MSG_NOTATION_NOT_DECLARED_FOR_UNPARSED_ENTITYDECL",
// VC: Notation Attributes: // all notation names in the (attribute) declaration must be declared. "MSG_NOTATION_NOT_DECLARED_FOR_NOTATIONTYPE_ATTRIBUTE",
// VC: No Notation on Empty Element // An attribute of type NOTATION must not be declared on an element declared EMPTY. "NoNotationOnEmptyElement",
// should be safe to release these references // check whether each element referenced in a content model is declared // sets the XMLDTDSource of this handler }
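The Notation Declared checks run at the end of the DTD because a notation may legally be declared after the entity or attribute that names it. A minimal sketch of that deferred check, assuming a hypothetical class `NotationChecker` with simplified method signatures (the real handler methods take additional arguments):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the deferred notation check.
public final class NotationChecker {

    private final Set<String> declaredNotations = new HashSet<>();
    // unparsed entity name -> notation name it refers to
    private final Map<String, String> unparsedEntities = new LinkedHashMap<>();

    public void notationDecl(String name) {
        declaredNotations.add(name);
    }

    public void unparsedEntityDecl(String entityName, String notation) {
        // Keep only the first declaration of an entity, as elsewhere in the DTD.
        if (!unparsedEntities.containsKey(entityName)) {
            unparsedEntities.put(entityName, notation);
        }
    }

    /** Called once the whole DTD has been seen; returns entities whose notation was never declared. */
    public List<String> endDTD() {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, String> entry : unparsedEntities.entrySet()) {
            if (!declaredNotations.contains(entry.getValue())) {
                offenders.add(entry.getKey());
            }
        }
        return offenders;
    }
}
```

Checking inside `unparsedEntityDecl` instead would wrongly flag an entity whose notation is declared a few lines later in the same DTD.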
// setDTDSource(XMLDTDSource) // returns the XMLDTDSource of this handler }
// getDTDSource(): XMLDTDSource
// XMLDTDContentModelHandler methods
// sets the XMLDTDContentModelSource of this handler
}
// setDTDContentModelSource(XMLDTDContentModelSource)
// returns the XMLDTDContentModelSource of this handler
}
// getDTDContentModelSource(): XMLDTDContentModelSource * The start of a content model. Depending on the type of the content * model, specific methods may be called between the call to the * startContentModel method and the call to the endContentModel method. * @param elementName The name of the element. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. }
// startContentModel(String) * A content model of ANY. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * A content model of EMPTY. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * A start of either a mixed or children content model. A mixed * content model will immediately be followed by a call to the * <code>pcdata()</code> method. A children content model will * contain additional groups and/or elements. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * The appearance of "#PCDATA" within a group signifying a * mixed content model. This method will be the first called * following the content model's <code>startGroup()</code>. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * A referenced element in a mixed or children content model. * @param elementName The name of the referenced element. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. // check VC: No duplicate Types, in a single mixed-content declaration "DuplicateTypeInMixedContent",
}
// childrenElement(String) * The separator between choices or sequences of a mixed or children * @param separator The type of children separator. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * @see #SEPARATOR_SEQUENCE * The occurrence count for a child in a children content model or * for the mixed content model group. * @param occurrence The occurrence count for the last element * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * @see #OCCURS_ZERO_OR_ONE * @see #OCCURS_ZERO_OR_MORE * @see #OCCURS_ONE_OR_MORE * The end of a group for mixed or children content models. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * The end of a content model. * @param augs Additional information that may include infoset * @throws XNIException Thrown by handler to signal an error. * Normalize the attribute value of a non CDATA default attribute * collapsing sequences of space characters (x20) * @param value The value to normalize * @return Whether the value was changed or not. boolean skipSpace =
true;
// skip leading spaces // take the first whitespace as a space and skip the others // simply shift non space chars if needed // if we finished on a space trim it // set the new value length }
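The normalization steps listed in the comments above (skip leading spaces, keep one space per run, shift the remaining characters down, trim a trailing space, set the new length) can be sketched as an in-place pass over a character buffer. The class `DefaultAttrValueNormalizer` is a hypothetical illustration and works on a plain `char[]` rather than on the processor's own string type.

```java
// Hypothetical illustration of normalizing a non-CDATA default attribute value.
public final class DefaultAttrValueNormalizer {

    private DefaultAttrValueNormalizer() {}

    /**
     * Collapses runs of space characters (0x20) to a single space and removes
     * leading and trailing spaces, rewriting the buffer in place.
     * Returns the new length of the value starting at offset.
     */
    public static int normalize(char[] ch, int offset, int length) {
        int end = offset + length;
        int out = offset;          // next write position
        boolean skipSpace = true;  // true at the start and right after an emitted space
        for (int in = offset; in < end; in++) {
            char c = ch[in];
            if (c == ' ') {
                if (!skipSpace) {
                    ch[out++] = ' '; // take the first space of a run, skip the others
                    skipSpace = true;
                }
            } else {
                ch[out++] = c;       // shift non-space characters down if needed
                skipSpace = false;
            }
        }
        if (out > offset && ch[out - 1] == ' ') {
            out--;                   // the value ended on a space: trim it
        }
        return out - offset;
    }

    /** Convenience wrapper for String input. */
    public static String normalize(String value) {
        char[] ch = value.toCharArray();
        int newLength = normalize(ch, 0, ch.length);
        return new String(ch, 0, newLength);
    }
}
```

Comparing the returned length with the original length tells the caller whether the value was changed, which is the information the documented return value conveys.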
// isValidNmtoken(String): boolean }
// isValidName(String): boolean * Checks that all elements referenced in content models have * been declared. This method calls out to the error handler * Does a recursive (if necessary) check on the specified element's * content spec to make sure that all children refer to declared "UndeclaredElementInContentSpec",
// It's not a leaf, so we have to recurse its left and maybe right // nodes. Save both values before we recurse and trash the node. // Recurse on both children. }
// class XMLDTDProcessor
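The recursive check that every element referenced in a content spec has been declared can be sketched over a small binary tree. The `Node` type and `findUndeclared` method below are hypothetical simplifications; the real processor walks content spec nodes stored in its DTD grammar and reports each problem through the error reporter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified content spec tree and checker.
public final class ContentSpecChecker {

    private ContentSpecChecker() {}

    /** A leaf names an element; an inner node has a left child and possibly a right child. */
    public static final class Node {
        final String elementName; // non-null only for leaves
        final Node left;
        final Node right;         // may be null for unary operators such as '*'

        private Node(String elementName, Node left, Node right) {
            this.elementName = elementName;
            this.left = left;
            this.right = right;
        }

        public static Node leaf(String elementName) {
            return new Node(elementName, null, null);
        }

        public static Node group(Node left, Node right) {
            return new Node(null, left, right);
        }
    }

    /** Returns the names of referenced elements that are not in the declared set, in tree order. */
    public static List<String> findUndeclared(Node spec, Set<String> declared) {
        List<String> undeclared = new ArrayList<>();
        collect(spec, declared, undeclared);
        return undeclared;
    }

    private static void collect(Node node, Set<String> declared, List<String> out) {
        if (node == null) {
            return;
        }
        if (node.elementName != null) {
            if (!declared.contains(node.elementName)) {
                out.add(node.elementName);
            }
            return;
        }
        // Not a leaf: recurse on the left child and, if present, the right child.
        collect(node.left, declared, out);
        collect(node.right, declared, out);
    }
}
```

Because this sketch uses immutable nodes, it does not need to save the child values before recursing, as the comment in the original notes is necessary when a shared node object is reused.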