 * reserved comment block
 * DO NOT REMOVE OR ALTER!
 * Copyright 1999-2002,2004, 2005 The Apache Software Foundation.
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.

 * This class adds implementation for normalizeDocument method.
 * It acts as if the document was going through a save and load cycle, putting
 * the document in a "normal" form. The actual result depends on the features being set
 * and governing what operations actually take place. See setNormalizationFeature for details.
 * Noticeably this method normalizes Text nodes, makes the document "namespace wellformed",
 * according to the algorithm described below in pseudo code, by adding missing namespace
 * declaration attributes and adding or changing namespace prefixes, updates the replacement
 * tree of EntityReference nodes, normalizes attribute values, etc.
 * Mutation events, when supported, are generated to reflect the changes occurring on the
 * See Namespace normalization for details on how namespace declaration attributes and prefixes
 * NOTE: There is an initial support for DOM revalidation with XML Schema as a grammar.
 * The tree might not be validated correctly if entityReferences, CDATA sections are
 * present in the tree. The PSVI information is not exposed, normalized data (including element
 * default content) is not available.
 * @author Elena Litani, IBM
 * @author Neeraj Bajaj, Sun Microsystems, inc.

/** Debug normalize document*/
/** Debug namespace fix up algorithm*/
/** Debug document handler events */
/** prefix added by namespace fixup algorithm should follow a pattern "NS" + index*/
/** Validation handler represents validator instance. */
/** error handler. may be null. */
 * Cached {@link DOMError} impl.
 * The same object is re-used to report multiple errors.
// Validation against namespace aware grammar
// Update PSVI information in the tree
/** The namespace context of this document: stores namespaces in scope */
/** Stores all namespace bindings on the current element */
/** list of attributes */
/** DOM Locator - for namespace fixup algorithm */
/** for setting the PSVI */
// attribute value normalization
 * If the user stops the process, this exception will be thrown.
//Check if element content is all "ignorable whitespace"
 * Note: reset() must be called before this method.
// initialize and reset DOMNormalizer component
// reset namespace context
// report fatal error on DOM Level 1 nodes
// check if we need to fill in PSVI
// reset schema validator
return; // processing aborted by the user
throw e; // otherwise re-throw.

 * This method acts as if the document was going through a save
 * and load cycle, putting the document in a "normal" form. The actual result
 * depends on the features being set and governing what operations actually
 * take place. See setNormalizationFeature for details. Noticeably this method
 * normalizes Text nodes, makes the document "namespace wellformed",
 * according to the algorithm described below in pseudo code, by adding missing
 * namespace declaration attributes and adding or changing namespace prefixes, updates
 * the replacement tree of EntityReference nodes, normalizes attribute values, etc.
 * @param node Modified node or null. If node is returned, we need
 * to normalize again starting on the node returned.
 * @return the normalized Node
//REVISIT: well-formedness encoding info
//do the name check only when version of the document was changed &
//application has set the value of well-formed features to true
"wf-invalid-character-in-node-name",
"wf-invalid-character-in-node-name");
// push namespace context
// fix namespaces and remove default attributes
// normalize attribute values
// remove default attributes
//removeDefault(attr, attributes);
"wf-invalid-character-in-node-name",
"wf-invalid-character-in-node-name");
// REVISIT: possible solutions to discard default content are:
// either we pass some flag to XML Schema validator
// or rely on the PSVI information.
// set error node in the dom error wrapper
// so if error occurs we can report an error node
// call re-validation handler
// REVISIT: possible solutions to discard default content are:
// either we pass some flag to XML Schema validator
// or rely on the PSVI information.
// set error node in the dom error wrapper
// so if error occurs we can report an error node
// call re-validation handler
// set error node in the dom error wrapper
// so if error occurs we can report an error node
// set error node in the dom error wrapper
// so if error occurs we can report an error node
// pop namespace context
// remove the comment node
} //if comment node need not be removed
// check comments for invalid xml character as per the version
} //end-else if comment node is not to be removed.
// The list of children #text -> &ent;
// and entity has a first child as a text
// we should not advance
// REVISIT: traverse entity reference and send appropriate calls to the validator
// (no normalization should be performed for the children).
// convert CDATA to TEXT nodes
// send characters call for CDATA
// set error node in the dom error wrapper
// so if error occurs we can report an error node
// set error node in the dom error wrapper
// so if error occurs we can report an error node
"cdata-sections-splitted",
"cdata-sections-splitted");
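// Illustrative sketch, not part of the original class: the "cdata-sections-splitted"
// warning above is reported when a CDATA section's text contains the terminator "]]>".
// The helper below shows one way such text could be cut so that no part contains "]]>".
// The class name CDataSplitSketch and the method splitCData are hypothetical, and the
// sketch works on plain strings rather than DOM nodes.

```java
import java.util.ArrayList;
import java.util.List;

final class CDataSplitSketch {

    /**
     * Splits CDATA text at every "]]>" so that each part can be stored in its
     * own CDATA section: "]]" stays at the end of one part, ">" starts the next.
     */
    static List<String> splitCData(String value) {
        List<String> parts = new ArrayList<>();
        int index;
        while ((index = value.indexOf("]]>")) >= 0) {
            parts.add(value.substring(0, index + 2)); // keep "]]" in this part
            value = value.substring(index + 2);        // remainder starts with ">"
        }
        parts.add(value);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCData("a]]>b"));
    }
}
```

// Because the terminator is divided between two parts, no single part can contain
// "]]>" at the cut point.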
// check well-formedness
// If node is a text node, we need to check for one of two
// 1) There is an adjacent text node
// 2) There is no adjacent text node, but node is
// If an adjacent text node, merge it with this node
// We don't need to check well-formedness here since we are not yet
// done with this node.
// If kid is empty, remove it
// validator.characters() call and well-formedness
// Don't send characters or check well-formedness in the following cases:
// 1. entities is false, next child is entity reference: expand tree first
// 2. comments is false, and next child is comment
// 3. cdata is false, and next child is cdata
//check the PI target name and data for well-formedness when the application has set the well-formed feature to true
//1.check PI target name
"wf-invalid-character-in-node-name",
"wf-invalid-character-in-node-name");
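// Illustrative sketch, not part of the original class: the text-node merging and
// comment removal described in the comments above can be observed through the standard
// DOM Level 3 API (Document.getDomConfig, DOMConfiguration.setParameter,
// Document.normalizeDocument). The class name NormalizeDocumentDemo is hypothetical,
// and the sketch assumes the JDK's default DOM implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NormalizeDocumentDemo {

    /** Builds text + comment + text under one element, then normalizes the document. */
    static Element buildAndNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // createElementNS (rather than createElement) avoids creating a DOM Level 1 node.
        Element root = doc.createElementNS(null, "root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createComment("drop me"));
        root.appendChild(doc.createTextNode("world"));

        // "comments" is a standard DOMConfiguration parameter; false discards comments.
        doc.getDomConfig().setParameter("comments", Boolean.FALSE);
        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) {
        Element root = buildAndNormalize();
        System.out.println(root.getChildNodes().getLength() + " child node(s): "
                + root.getTextContent());
    }
}
```

// With the comment gone, the two text nodes become adjacent and are merged into one.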
//processing instruction data may have certain characters
//which may not be valid XML character
} //end case Node.PROCESSING_INSTRUCTION_NODE
// normalize attribute values
// remove default attributes
// check attribute names if the version of the document changed.
// ------------------------------------
// pick up local namespace declarations
// <!-- add the following via DOM
// ------------------------------------
// Record all valid local declarations
//do the name check only when version of the document was changed &
//application has set the value of well-formed features to true
//checkQName does checking based on the version of the document
// "namespace-declarations" == false; Discard all namespace declaration attributes
// Check for invalid namespace declaration:
//A null value for locale is passed to formatMessage,
//which means that the default locale will be used
// XML 1.0 Attribute value normalization
// value = normalizeAttributeValue(value, attr);
// REVISIT: issue error on invalid declarations
//removeDefault (attr, attributes);
} else { // (localpart == fXmlnsSymbol && prefix == fEmptySymbol) -- xmlns
// empty prefix is always bound ("" or some string)
//removeDefault (attr, attributes);
} // end-else: valid declaration
} // end-if: namespace attribute
// ---------------------------------------------------------
// Fix up namespaces for element: per DOM L3
// Need to consider the following cases:
// as well as namespace attribute rebinding xsl to another namespace.
// Need to make sure that the new namespace decl value is changed to
// ---------------------------------------------------------
// ---------------------------------------------------------
// "namespace-declarations" == false? Discard all namespace declaration attributes
// no namespace declaration == no namespace URI, semantics are to keep prefix
// The xmlns:prefix=namespace or xmlns="default" was declared at parent.
// The binder always stores mapping of empty prefix to "".
// the prefix is either undeclared
// conflict: the prefix is bound to another URI
} else { // Element has no namespace
// Error: DOM Level 1 node!
"NullLocalElementName");
"NullLocalElementName");
} else { // uri=null and no colon (DOM L2 node)
// undeclare default namespace declaration (before that element
// bound to non-zero length URI), but adding xmlns="" decl
// -----------------------------------------
// Fix up namespaces for attributes: per DOM L3
// -----------------------------------------
// clone content of the attributes
// normalize attribute value
// make sure that value is never null.
// ---------------------------------------
// skip namespace declarations
// ---------------------------------------
// REVISIT: can we assume that "uri" is from some symbol
// table, and compare by reference? -SG
//---------------------------------------
// check if value of the attribute is namespace well-formed
//---------------------------------------
"wf-invalid-character-in-node-name",
"wf-invalid-character-in-node-name");
// ---------------------------------------
// remove default attributes
// ---------------------------------------
if (removeDefault(attr, attributes)) {
// XML 1.0 Attribute value normalization
//value = normalizeAttributeValue(value, attr);
// find if for this prefix a URI was already declared
// attribute has no prefix (default namespace decl does not apply to attributes)
// attribute prefix is not declared
// conflict: attribute has a prefix that conflicts with a binding
// already active in scope
// Find if any prefix for attributes namespace URI is available
// use the prefix that was found (declared previously for this URI)
// the current prefix is not null and it has no in scope declaration
// find a prefix following the pattern "NS" + index (starting at 1)
// make sure this prefix is not declared in the current scope.
// add declaration for the new prefix
// change prefix for this attribute
} else { // attribute uri == null
// XML 1.0 Attribute value normalization
//value = normalizeAttributeValue(value, attr);
// It is an error if document has DOM L1 nodes.
// uri=null and no colon
// no fix up is needed: default namespace decl does not
// ---------------------------------------
// remove default attributes
// ---------------------------------------
// removeDefault(attr, attributes);
}
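// Illustrative sketch, not part of the original class: the comments above say a generated
// prefix follows the pattern "NS" + index, starting at 1, and must not already be declared
// in the current scope. The helper below shows that search on a plain set of prefix names.
// The names NsPrefixSketch, nextFreePrefix and declaredInScope are hypothetical; the real
// code consults its namespace binder and symbol table instead of a Set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class NsPrefixSketch {

    static final String PREFIX = "NS";

    /** Returns the first "NS" + index (index starting at 1) that is not yet declared. */
    static String nextFreePrefix(Set<String> declaredInScope) {
        int counter = 1;
        String prefix = PREFIX + counter++;
        while (declaredInScope.contains(prefix)) {
            prefix = PREFIX + counter++;
        }
        return prefix;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("NS1", "NS2", "xsl"));
        System.out.println(nextFreePrefix(declared));
    }
}
```

// After choosing the prefix, the normalizer adds a declaration for it and changes the
// attribute's prefix, as the last two comments before the else branch describe.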
// end loop for attributes

 * Adds a namespace attribute or replaces the value of existing namespace
 * attribute with the given prefix and value for URI.
 * In case prefix is empty will add/update default namespace declaration.
 * @exception IOException

// Methods for well-formedness checking

 * Check if CDATA section is well-formed
 * @param isXML11Version = true if XML 1.1
// version of the document is XML 1.1
// we need to check all characters as per production rules of XML11
// check if this is a supplementary character
"wf-invalid-character");
} // version of the document is XML 1.0
// we need to check all characters as per production rules of XML 1.0
// check if this is a supplementary character
// is being used to obtain the message and DOM error type
// "wf-invalid-character" is used. Also per DOM it is error but
// as per XML spec. it is fatal error
} // end-else fDocument.isXMLVersion()

 * NON-DOM: check for valid XML characters as per the XML version
 * @param isXML11Version = true if XML 1.1
// version of the document is XML 1.1
//we need to check all characters as per production rules of XML11
// check if this is a supplementary character
"wf-invalid-character");
} // version of the document is XML 1.0
// we need to check all characters as per production rules of XML 1.0
// check if this is a supplementary character
"wf-invalid-character");
} // end-else fDocument.isXMLVersion()

 * NON-DOM: check if value of the comment is well-formed
 * @param isXML11Version = true if XML 1.1
// version of the document is XML 1.1
// we need to check all characters as per production rules of XML11
// check if this is a supplementary character
"InvalidCharInComment",
// invalid: '--' in comment
} // version of the document is XML 1.0
// we need to check all characters as per production rules of XML 1.0
// check if this is a supplementary character
// invalid: '--' in comment
} // end-else fDocument.isXMLVersion()

/** NON-DOM: check if attribute value is well-formed
//check each child node of the attribute's value
//If the attribute's child is an entity reference
//search for the entity in the docType
//of the attribute's ownerDocument
//If the entity was not found issue a fatal error
"UndeclaredEntRefInAttrValue");
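// Illustrative sketch, not part of the original class: the comment check described above
// rejects '--' and any character outside the XML character production, treating surrogate
// pairs as one supplementary character. The helper below shows the XML 1.0 case only.
// The names CommentCheckSketch, isXml10Char and isWellFormedComment are hypothetical; the
// real code reports a DOM error instead of returning a boolean.

```java
final class CommentCheckSketch {

    /** XML 1.0 Char production: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF. */
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns false if the comment text contains "--" or a character that is not an XML 1.0 Char. */
    static boolean isWellFormedComment(String data) {
        if (data.contains("--")) {
            return false; // invalid: '--' in comment
        }
        for (int i = 0; i < data.length(); ) {
            // codePointAt combines a valid surrogate pair into one supplementary code point;
            // an unpaired surrogate is returned as-is and fails the range check.
            int codePoint = data.codePointAt(i);
            if (!isXml10Char(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedComment("fine - really"));
        System.out.println(isWellFormedComment("not -- fine"));
    }
}
```

// An XML 1.1 variant would swap in that version's character ranges; the loop stays the same.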
 * Reports a DOM error to the user handler.
 * If the error is fatal, the processing will be always aborted.
// and in the namespaceFixup. Should reduce number of calls to symbol table.

/* REVISIT: remove this method if DOM does not change spec.
 * Performs partial XML 1.0 attribute value normalization and replaces
 * attribute value if the value is changed after the normalization.
 * DOM defines that normalizeDocument acts as if the document was going
 * through a save and load cycle, given that serializer will not escape
 * any '\n' or '\r' characters on load those will be normalized.
 * Thus during normalize document we need to do the following:
 * - perform "2.11 End-of-Line Handling"
 * - replace #xD, #xA, #x9 with #x20 (white space).
 * Note: This alg. won't attempt to resolve entity references or character entity
 * references, since '&' will be escaped during serialization and during loading
 * this won't be recognized as entity reference, i.e. attribute value "&foo;" will
 * be serialized as "&amp;foo;" and thus after loading will be "&foo;" again.
 * @param value current attribute value
 * @param attr current attribute
 * @return String the value (could be original if normalization did not change
// specified attributes should already have a normalized form
// since those were added by validator
if (c == 0x0009 || c == 0x000A) {
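// Illustrative sketch, not part of the original class: the two steps listed in the comment
// above (end-of-line handling, then replacing #xD, #xA and #x9 with #x20) can be done in one
// pass over the value. The names AttrValueNormalizationSketch and normalize are hypothetical,
// and the sketch returns a new string instead of updating an attribute node.

```java
final class AttrValueNormalizationSketch {

    /** Replaces TAB, LF and CR with a space; a CR LF pair yields a single space. */
    static String normalize(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == 0x0009 || c == 0x000A) {
                out.append(' ');
            } else if (c == 0x000D) {
                out.append(' ');
                // End-of-line handling: CR LF counts as one line break, so skip the LF.
                if (i + 1 < value.length() && value.charAt(i + 1) == 0x000A) {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("a\r\nb\tc"));
    }
}
```

// Entity and character references are deliberately left alone, for the reason given in the
// note above.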
// REVISIT: this implementation does not store any value in augmentations
// and basically not keeping augs in parallel to attributes map
// until all attributes are added (default attributes)

 * This method adds default declarations
// add defaults to the tree
// the default attribute was removed by a user and needed to
// REVISIT: the following should also update ID table
// default attribute is in the tree
// we don't need to do anything since prefix was already fixed
// at the namespace fixup time and value must be same value, otherwise
// attribute will be treated as specified and we will never reach
//return fAttributes.item(index).ge);
// REVISIT: is this desired behaviour?
// The values are updated in the case datatype-normalization is turned on
// in this case we need to make sure that specified attributes stay specified

 * Sets the augmentations of the attribute at the specified index.
 * @param attrIndex The attribute index.
 * @param augs The augmentations.

// XMLDocumentHandler methods

 * The start of the document.
 * @param locator The document locator, or null if the document
 * location cannot be reported during the parsing
 * of this document. However, it is <em>strongly</em>
 * recommended that a locator be supplied that can
 * at least report the system identifier of the
 * @param encoding The auto-detected IANA encoding name of the entity
 * stream. This value will be null in those situations
 * where the entity encoding is not auto-detected (e.g.
 * internal entities or a document entity that is
 * parsed from a java.io.Reader).
 * @param namespaceContext
 * The namespace context in effect at the
 * start of this document.
 * This object represents the current context.
 * Implementors of this class are responsible
 * for copying the namespace bindings from
 * the current context (and its parent contexts)
 * if that information is important.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * Notifies of the presence of an XMLDecl line in the document. If
 * present, this method will be called immediately following the
 * @param version The XML version.
 * @param encoding The IANA encoding name of the document, or null if
 * @param standalone The standalone value, or null if not specified.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * Notifies of the presence of the DOCTYPE line in the document.
 * The name of the root element.
 * @param publicId The public identifier if an external DTD or null
 * if the external DTD is specified using SYSTEM.
 * @param systemId The system identifier if an external DTD, null
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * @param text The text in the comment.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by application to signal an error.

 * A processing instruction. Processing instructions consist of a
 * target name and, optionally, text data. The data is only meaningful
 * Typically, a processing instruction's data will contain a series
 * of pseudo-attributes. These pseudo-attributes follow the form of
 * element attributes but are <strong>not</strong> parsed or presented
 * to the application as anything other than text. The application is
 * responsible for parsing the data.
 * @param target The target.
 * @param data The data or null if none specified.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * The start of an element.
 * @param element The name of the element.
 * @param attributes The element attributes.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.
//REVISIT: instead we should be using augmentations:
// datatype-normalization
// NOTE: The specified value MUST be set after we set
// the node value because that turns the "specified"
// flag to "true" which may overwrite a "false"
// value from the attribute list.

 * @param element The name of the element.
 * @param attributes The element attributes.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * This method notifies the start of a general entity.
 * <strong>Note:</strong> This method is not called for entity references
 * appearing as part of attribute values.
 * @param name The name of the general entity.
 * @param identifier The resource identifier.
 * @param encoding The auto-detected IANA encoding name of the entity
 * stream. This value will be null in those situations
 * where the entity encoding is not auto-detected (e.g.
 * internal entities or a document entity that is
 * parsed from a java.io.Reader).
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException Thrown by handler to signal an error.

 * Notifies of the presence of a TextDecl line in an entity. If present,
 * this method will be called immediately following the startEntity call.
 * <strong>Note:</strong> This method will never be called for the
 * document entity; it is only called for external general entities
 * referenced in document content.
 * <strong>Note:</strong> This method is not called for entity references
 * appearing as part of attribute values.
 * @param version The XML version, or null if not specified.
 * @param encoding The IANA encoding name of the entity.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * This method notifies the end of a general entity.
 * <strong>Note:</strong> This method is not called for entity references
 * appearing as part of attribute values.
 * @param name The name of the entity.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * @param text The content.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * Ignorable whitespace. For this method to be called, the document
 * source must have some way of determining that the text containing
 * only whitespace characters should be considered ignorable. For
 * example, the validator can determine if a length of whitespace
 * characters in the document are ignorable based on the element
 * @param text The ignorable whitespace.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * The end of an element.
 * @param element The name of the element.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.
// include element default content (if one is available)
// NOTE: this is a hack: it is possible that DOM had an empty element
// and validator sent default value using characters(), which we don't
// implement. Thus, here we attempt to add the default value.
// default content could be provided

 * The start of a CDATA section.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * The end of a CDATA section.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

 * The end of the document.
 * @param augs Additional information that may include infoset augmentations
 * @exception XNIException
 * Thrown by handler to signal an error.

/** Sets the document source. */
/** Returns the document source. */
} // DOMNormalizer class