Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and
xmlAddNextSibling would only try to merge text nodes with one of its
new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml
and possibly other downstream code depend on text nodes not being
merged.
To avoid breaking downstream code while still having somewhat
consistent API behavior, it's probably best to make these functions
never coalesce text nodes.
This allows to report the reason why opening a file failed to the parser
context and improve error messages. Now we can also remove the stat call
before opening a file.
Also make text inclusions work with memory buffers, for example when
using a custom entity loader, and fix a memory leak in case of invalid
characters.
Fixes#483.
Avoids resource exhaustion if the maximum recursion depth was exceeded.
Note that the XInclude engine offers no protection against other
"billion laughs"-style amplification attacks as long as they stay below
the maximum depth.
When using xmlreader, XPointer expressions in XIncludes simply cannot
work. Expressions can reference nodes which weren't parsed yet or which
were already deleted.
After fixing nested XIncludes, we reference includes which were parsed
previously. When streaming, these nodes could have been deleted, leading
to use-after-free errors.
Disallow XPointer expressions and truncate the include table in
streaming mode.
Don't create subcontext in xmlXIncludeRecurseDoc. Save and restore 'doc'
and 'incTab' instead.
Make xmlXIncludeLoadFallback call xmlXIncludeCopyNode which seems safer
than xmlXIncludeDoProcess since the latter may modify the document.
This should also be more performant since we need to copy the whole
fallback subtree anyway. Also make sure to avoid replacements in
fallback elements in xmlXIncludeDoProcess.
The reader interface with XIncludes is somewhat broken and can generate
different error messages. Start to move tests which are sketchy with
reader to a separate directory.
Commit 7618a3b1 didn't account for coalesced text nodes.
I think it would be better if xmlStaticCopyNode didn't try to coalesce
text nodes at all. This code path can only be triggered if some other
code doesn't coalesce text nodes properly. In this case, OSS-Fuzz found
such behavior in xinclude.c.
If a document is parsed with XML_PARSE_DTDVALID and without
XML_PARSE_NOENT, the value of ID attributes has to be normalized after
potentially expanding entities in xmlRemoveID. Otherwise, later calls
to xmlGetID can return a pointer to previously freed memory.
ID attributes which are empty or contain only whitespace after
entity expansion are affected in a similar way. This is fixed by
not storing such attributes in the ID table.
The test to detect streaming mode when validating against a DTD was
broken. In connection with the defects above, this could result in a
use-after-free when using the xmlReader interface with validation.
Fix detection of streaming mode to avoid similar issues. (This changes
the expected result of a test case. But as far as I can tell, using the
XML reader with XIncludes referencing the root document never worked
properly, anyway.)
All of these issues can result in denial of service. Using xmlReader
with validation could result in disclosure of memory via the error
channel, typically stderr. The security impact of xmlGetID returning
a pointer to freed memory depends on the application. The typical use
case of calling xmlGetID on an unmodified document is not affected.
The code wasn't dead after all, but I can see no reason in delaying
the XPointer evaluation. This could lead to nodes included earlier
appearing in XPointer results.
When creating XML_XINCLUDE_START nodes, the children of the original
xi:include node must be freed, otherwise fallback content is copied
twice, doubling runtime and memory consumption for each nested
xi:fallback/xi:include pair.
Found with libFuzzer.
* uri.c: fixed a problem when base path was "./xxx"
* result/XInclude/*: 5 test results changed by above.
* Makefile.am: fixed a couple of spots where a new
result file used different flags that the testing one.
* uri.c, include/libxml/uri.h: added a new routine
xmlBuildRelativeURI needed for enhancement of xinclude.c
* xinclude.c: changed handling of xml:base (bug 135864)
* result/XInclude/*: results of 5 tests changed as a result
of the above change
* xinclude.c xmllint.c xmlreader.c include/libxml/xinclude.h
include/libxml/xmlerror.h: augmented the XInclude API
to be able to pass XML parser flags down to the Inclusion
process. Also resynchronized with the Last Call W3C Working
Draft 10 November 2003 for the xpointer attribute.
* Makefile.am test/XInclude/docs/nodes[23].xml
result/XInclude/*: augmented the tests for the new namespace and
testing the xpointer attribute, changed the way error messages
are tested
* doc/*: regenerated the documentation
Daniel
* xinclude.c xmlreader.c include/libxml/xinclude.h: adding XInclude
support to the reader interface. Lot of testing of the walker,
various bug fixes.
* xmllint.c: added --walker and made sure --xinclude --stream --debug
works as expected
* Makefile.am result/dtd11.rdr result/ent6.rdr test/dtd11 test/ent6
result/XInclude/*.rdr: added regression tests for the walker and
XInclude xmlReader support, had to slightly change a couple of tests
because the walker can't distinguish <foo/> from <foo></foo>
Daniel
* xinclude.c parserInternals.c encoding.c: fixed#99082
for xi:include encoding="..." support on text includes.
* result/XInclude/tstencoding.xml test/XInclude/docs/tstencoding.xml
test/XInclude/ents/isolatin.txt : added a specific regression test
* python/generator.py python/libxml2class.txt: fixed the generator
the new set of comments generated for doc/libxml2-api.xml were
breaking the python generation.
Daniel
* xinclude.c: quick but apparently working implementation of
xi:fallback, should close bug #89684
* Makefile.am test/XInclude/docs/fallback.xml
result/XInclude/fallback.xml: added a basic test for fallback,
and run with --nowarning to avoid a spurious warning
* configure.in: applied patch from Frederic Crozat for python
bindings on AMD 64bits machines.
Daniel
* tree.c valid.c xinclude.c: fix#68882, cleanup the XInclude
copying of node, merge back IDs in the target document.
* result/XInclude/docids.xml test/XInclude/docs/docids.xml
test/XInclude/ents/ids.xml: test case
* result/VC/ElementValid4: output changed due to a typo fix
Daniel
- result/XInclude/recursive.xml test/XInclude/docs/recursive.xml
test/XInclude/ents/inc.txt test/XInclude/ents/sub-inc.ent:
added specific regression test
- parser.h: preparing for the XSLT mode where DTD inherited
attributes are added to the tree.
Daniel
- nanoftp.c: fixed gcc 2.95 new warnings
- SAX.c: fixed a stupid bug
- tree.c: fixed a formatting problem when round-tripping
from/to memory
- xinclude.c: chased memleak, fixed a base problem
- xpointer.c: added xmlXPtrBuildRangeNodeList(), finished ?
xmlXPtrBuildNodeList()
- TODO: updated
- Makefile.am test/XInclude/docs test/XInclude/ents result/XInclude:
adding a first small set of regression tests for XInclude
Daniel