124907 HTML parse buffer problem when parsing large in-memory docs
124110 DTD validation && wrong namespace
123564 xmllint --html --format

           TODO for the XML parser and stuff:
           ==================================

      $Id: TODO,v 1.44 2005/01/07 13:56:19 veillard Exp $

    this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load for example a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libRSVG)
- fix the C code prototypes to bring doc/libxml-undocumented.txt back
  to a reasonable level
- Computation of base when HTTP redirect occurs, might affect HTTP
  interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX-like functions to save a document, i.e. call a
  function to open a new element with given attributes, write character
  data, close last element, etc.
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has an encoding parameter which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten the UTF-8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value
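
For the UTF-8 conformance item above, a minimal sketch of what a strict
check has to reject: overlong forms, UTF-16 surrogates, code points above
U+10FFFF, stray continuation bytes and truncated sequences. The function
name utf8_is_strict is hypothetical and this is not libxml2 code, only an
illustration of the rules.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Hypothetical helper for illustration; not a libxml2 function.
 */
int utf8_is_strict(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp, min;
        size_t extra, k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (len - i <= extra)
            return 0;                   /* sequence cut short */
        for (k = 1; k <= extra; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min)
            return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond Unicode range */
        i += extra + 1;
    }
    return 1;
}
```

The "wrong" files referenced above could be run through such a check one
buffer at a time; every one of them should yield 0.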

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add XPointer recognition/API

- Add XLink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     it's crap :-(

- Implement XSchemas
  => Really needs to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
   - edit
   - load/save
   - mv (yum, yum, but it's harder because directories are ordered in
     our case, mvup and mvdown would be required)


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and xhtml1-dtd package

- Add output to XHTML
  => XML serializer automatically recognizes the DTD and applies the
     specific rules.

- Fix output of <tst val="x&#xA;y"/>
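
Why the output item above matters: on parsing, a literal line break inside
an attribute value is normalized to a space, so a newline that came from a
character reference only survives a save/reload cycle if it is written
back as a character reference. A minimal sketch of such an escaping step,
assuming that is what the item refers to; escape_attr_value is a
hypothetical name and the choice of &#xA; over the decimal form is
arbitrary, not a statement about what libxml2 emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Escape an attribute value for output between double quotes.
 * Writes a NUL-terminated string to out (capacity outsz) and returns
 * its length, or -1 if out is too small.
 * Hypothetical helper for illustration; not a libxml2 function.
 */
int escape_attr_value(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        size_t n;

        switch (*in) {
        case '\n': rep = "&#xA;";  break;   /* would be read back as ' ' */
        case '\r': rep = "&#xD;";  break;
        case '\t': rep = "&#x9;";  break;
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '"':  rep = "&quot;"; break;
        default:   break;
        }
        n = rep ? strlen(rep) : 1;
        if (o + n >= outsz)
            return -1;                      /* keep room for the final NUL */
        if (rep)
            memcpy(out + o, rep, n);
        else
            out[o] = *in;
        o += n;
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

With the value from the item above, the three characters x, newline, y
come out as x&#xA;y, which parses back to the same three characters.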

- compliance to XML-Namespace checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content model.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done it's actually xmlRemoveProp xmlUnsetProp xmlUnsetNsProp

- HTML: handling of Script and style data elements, need special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are no problem since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF8 saving to a given encoding doesn't
  work => fix to force UTF8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue warning when using non-absolute namespaces URI.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like <hr noshade>
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checking (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_alloc()/g_free()
  if the programmer wants it:
    - use xmlMemSetup() to reset the routines used.
- Check attribute normalization especially xmlGetProp()
- Validity checking problems for NOTATIONS attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
   - checking done.
   - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and report WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap be really implemented ???
  => Yep OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html
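
On the newline normalization item in the list above: XML 1.0 (section
2.11, End-of-Line Handling) requires that a CR LF pair and any lone CR be
passed to the application as a single LF. A minimal in-place sketch of
that rule follows; normalize_newlines is a hypothetical name, not a
libxml2 function, and the buffer is assumed to hold the whole input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Normalize line ends in place: "\r\n" and a lone "\r" both become "\n".
 * Returns the new length of the data in buf.
 * Hypothetical helper for illustration; not a libxml2 function.
 */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t r, w = 0;

    assert(buf != NULL || len == 0);
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            if (r + 1 < len && buf[r + 1] == '\n')
                r++;                /* swallow the LF of a CR LF pair */
        } else {
            buf[w++] = buf[r];
        }
    }
    return w;
}
```

A parser reading its input in pieces also has to remember a CR that falls
on the last byte of one piece, since the matching LF may be the first byte
of the next; the sketch leaves that case out.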

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML and accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Duerst
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
     this is slightly more costly but more compact, and recent processors'
     efficiency is cache related. The key to good performance is keeping
     the data set small, so will I.
  => the new progressive reading routines call the detection code; it
     is enabled, tested the ISO->UTF-8 stuff
- External entities loading:
   - allow override by client code
   - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks for it being an
  ID and removes it from the table.
- push mode parsing i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and html counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated tests to use both and check results are similar.
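
To make "non-blocking state based parser" concrete, here is a toy sketch
of the push idea: all parsing state lives in a struct, so the caller can
hand over input in arbitrary pieces and get control back after each one.
It only counts start tags and knows nothing about quoted attribute values,
comments or CDATA. The push_counter names are hypothetical; this is not
how libxml2 implements xmlParseChunk(), only an illustration of the
principle.

```c
#include <assert.h>
#include <stddef.h>

enum push_state { PS_TEXT, PS_TAG_OPEN, PS_IN_TAG };

struct push_counter {
    enum push_state state;      /* survives between chunks */
    unsigned long start_tags;   /* number of start tags seen so far */
};

void push_counter_init(struct push_counter *pc)
{
    assert(pc != NULL);
    pc->state = PS_TEXT;
    pc->start_tags = 0;
}

/* Consume one chunk; may be called any number of times. */
void push_counter_feed(struct push_counter *pc, const char *chunk, size_t len)
{
    size_t i;

    assert(pc != NULL && (chunk != NULL || len == 0));
    for (i = 0; i < len; i++) {
        char c = chunk[i];

        switch (pc->state) {
        case PS_TEXT:
            if (c == '<')
                pc->state = PS_TAG_OPEN;
            break;
        case PS_TAG_OPEN:
            if (c == '>') {
                pc->state = PS_TEXT;        /* "<>" is not counted */
                break;
            }
            /* '/', '!' and '?' start end tags, declarations and PIs */
            if (c != '/' && c != '!' && c != '?')
                pc->start_tags++;
            pc->state = PS_IN_TAG;
            break;
        case PS_IN_TAG:
            if (c == '>')
                pc->state = PS_TEXT;
            break;
        }
    }
}
```

Because nothing is kept on the call stack between calls, a chunk boundary
may fall anywhere, even between '<' and the element name. The same
property is what lets a caller feed xmlParseChunk() from a network socket
as data arrives.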

- Most of XPath, still see some troubles and occasional memleaks.
- an XML shell, allowing to traverse/manipulate an XML document with
  a shell like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
   PBM: maintain the table coherency
   PBM: how to detect ID types in absence of DTD !
- Use it for XPath ID support
- Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions

- External Parsed entities, either XML or external Subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
  most cases !

- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack, moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built but
  => it should be possible to define that only as a set of SAX callbacks
     and remove the tree creation from the parser code.
     DONE

- DOM support, instead of using a proprietary in memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better using RPC's
  the parser can actually build the document in another
  program.
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.

- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).