Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.
Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.
Normalize attribute values in a single pass while expanding entities.
Be more lenient in recovery mode.
If no entity substitution was requested, validate entities without
expanding. Fixes#596.
Also fixes#655.
Use in ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.
Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
This regressed in commit e0dd330b.
Also fixes a long-standing issue where namespaces from default
attributes weren't added if they match an existing namespace.
Fixes#643.
Covered by: test/VC/ElementValid5
This only affects XML Reader API with LIBXML_REGEXP_ENABLED and
LIBXML_VALID_ENABLED turned on.
* result/VC/ElementValid5.rdr:
- Update result to add missing error message.
* python/tests/reader2.py:
* result/VC/ElementValid6.rdr:
* result/VC/ElementValid7.rdr:
* result/valid/781333.xml.err.rdr:
- Update result to fix grammar issue.
* valid.c:
(xmlValidatePopElement):
- Check return value of xmlRegExecPushString() to handle -1, and
assign 'ret = 0;' to return 0 from xmlValidatePopElement().
This change affects xmlTextReaderValidatePop() from
xmlreader.c.
- Fix grammar of error message by changing 'child' to
'children'.
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.
Found with libFuzzer.
Remove all the parts of the old test suite which are covered by
runtest.c for quite some time.
The following test programs are removed:
- testC14N
- testHTML
- testReader
- testRelax
- testSAX
- testSchemas
- testURI
- testXPath
This also removes a few results of unimportant tests only run by the old
test suite.
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes#217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
Make sure to finish all entities in the internal subset. Nevertheless,
readd a sanity check in xmlParseStartTag2 that was lost in my previous
commit. Also add a sanity check in xmlPopInput. Popping an input
unexpectedly was the source of many recent memory bugs. The check
doesn't mitigate such issues but helps with diagnosis.
Always base entity boundary checks on the input ID, not the input
pointer. The pointer could have been reallocated to the old address.
Always throw a well-formedness error if a boundary check fails. In a
few places, a validity error was thrown.
Fix a few error codes and improve indentation.
xmlSnprintfElementContent failed to correctly check the available
buffer space in two locations.
Fixes bug 781333 (CVE-2017-9047) and bug 781701 (CVE-2017-9048).
Thanks to Marcel Böhme and Thuan Pham for the report.
There were two bugs where parameter-entity references could lead to an
unexpected change of the input buffer in xmlParseNameComplex and
xmlDictLookup being called with an invalid pointer.
Percent sign in DTD Names
=========================
The NEXTL macro used to call xmlParserHandlePEReference. When parsing
"complex" names inside the DTD, this could result in entity expansion
which created a new input buffer. The fix is to simply remove the call
to xmlParserHandlePEReference from the NEXTL macro. This is safe because
no users of the macro require expansion of parameter entities.
- xmlParseNameComplex
- xmlParseNCNameComplex
- xmlParseNmtoken
The percent sign is not allowed in names, which are grammatical tokens.
- xmlParseEntityValue
Parameter-entity references in entity values are expanded but this
happens in a separate step in this function.
- xmlParseSystemLiteral
Parameter-entity references are ignored in the system literal.
- xmlParseAttValueComplex
- xmlParseCharDataComplex
- xmlParseCommentComplex
- xmlParsePI
- xmlParseCDSect
Parameter-entity references are ignored outside the DTD.
- xmlLoadEntityContent
This function is only called from xmlStringLenDecodeEntities and
entities are replaced in a separate step immediately after the function
call.
This bug could also be triggered with an internal subset and double
entity expansion.
This fixes bug 766956 initially reported by Wei Lei and independently by
Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone
involved.
xmlParseNameComplex with XML_PARSE_OLD10
========================================
When parsing Names inside an expanded parameter entity with the
XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the
GROW macro if the input buffer was exhausted. At the end of the
parameter entity's replacement text, this function would then call
xmlPopInput which invalidated the input buffer.
There should be no need to invoke GROW in this situation because the
buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and,
at least for UTF-8, in xmlCurrentChar. This also matches the code path
executed when XML_PARSE_OLD10 is not set.
This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050).
Thanks to Marcel Böhme and Thuan Pham for the report.
Additional hardening
====================
A separate check was added in xmlParseNameComplex to validate the
buffer size.
For https://bugzilla.gnome.org/show_bug.cgi?id=759671
when the end of the internal subset isn't properly detected
xmlParseInternalSubset should just return instead of trying
to process input further.
For https://bugzilla.gnome.org/show_bug.cgi?id=737840
the fix for 724903 introduced a regression on external entities carrying
IDs, revert that patch in part and add a specific test to avoid readding it
Use the tree node to provide the error context instead
of the parser input which is not relevant anymore,
based on a suggestion by François Delyon <f.delyon@satimage.fr>
* valid.c: fix a bug when valdating mixed content lists and some
name use namespaces prefixes.
* result/valid/notes.xml* test/valid/dtds/notes.dtd * test/valid/notes.xml:
add the test case to the regression suite
* parser.c: fix a problem reported by Ashwin for system parameter
entities referenced from entities in external subset, add a
specific loading routine.
* test/valid/dtds/external.ent test/valid/dtds/external2.ent
test/valid/t11.xml result/valid/t11.xml*: added the test to
the regression suite
Daniel
svn path=/trunk/; revision=3713
* encoding.c: poblem with encoding detection for UTF-16 reported by
Ashwin and found by Bill
* test/valid/dtds/utf16b.ent test/valid/dtds/utf16l.ent
test/valid/UTF16Entity.xml result/valid/UTF16Entity.xml*: added
the example to the regression tests
Daniel
svn path=/trunk/; revision=3700
* parser.c: fixed bug #170489 reported by Jirka Kosek
* test/valid/objednavka.xml test/valid/dtds/objednavka.dtd
result/valid/objednavka*: added the test to the regression suite.
Daniel
* valid.c: implemented bugfix from Massimo Morara for DTD
dumping problem.
* test/valid/t10.xml, result/valid/t10.*: added regression
for above
* configure.in: small change for my profile settings
* entities.c: fixed#127877, never output " in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
result/valid/index.xml result/valid/xlink.xml: this changes the
output of a few tests
Daniel
* parser.c: swapped the attribute defaulting and attribute checking
parts of parsing a new element start, fixes bug #127772
* result/valid/127772.* test/valid/127772.xml
test/valid/dtds/127772.dtd: added the example in the regression tests
Daniel
* HTMLparser.c: fixed to not send NULL to %s printing
* python/tests/error.py result/HTML/doc3.htm.err
result/HTML/test3.html.err result/HTML/wired.html.err
result/valid/t8.xml.err result/valid/t8a.xml.err: cleaning
up some of the regression tests error
Daniel
* error.c include/libxml/xmlerror.h include/libxml/xpath.h
include/libxml/xpathInternals.h xpath.c: cleaning up XPath
error reporting that time.
* threads.c: applied the two patches for TLS threads
on Windows from Jesse Pelton
* parser.c: tiny safety patch for xmlStrPrintf() make sure the
return is always zero terminated. Should also help detecting
passing wrong buffer size easilly.
* result/VC/* result/valid/rss.xml.err result/valid/xlink.xml.err:
updated the results to follow the errors string generated by
last commit.
Daniel
* Makefile.am: more cleanup in make tests
* error.c valid.c parser.c include/libxml/xmlerror.h: more work
in the transition to the new error reporting strategy.
* python/tests/reader2.py result/VC/* result/valid/*:
few changes in the strings generated by the validation output
Daniel
* Makefile.am: changed 'make tests' to use a concise output,
scrolling to see where thing broke wasn't pleasant
* configure.in: some beta4 preparation, but not ready yet
* error.c globals.c include/libxml/globals.h include/libxml/xmlerror.h:
new error handling code, last error informations are stored
in the parsing context or a global variable, new APIs to
handle the xmlErrorPtr type.
* parser.c parserInternals.c valid.c : started migrating to the
new error handling code, it's a royal pain.
* include/libxml/parser.h include/libxml/parserInternals.h:
moved the definition of xmlNewParserCtxt()
* parser.c: small potential buffer access problem in push code
provided by Justin Fletcher
* result/*.sax result/VC/PENesting* result/namespaces/*
result/valid/*.err: some error messages were sligthly changed.
Daniel
* SAX2.c: fixing namespace DTD validations
* result/valid/ns2.xml result/valid/ns.xml: the output of defaulted
namespaces is slightly different now.
* Makefile.am: report the memory used in Timingtests (as well as time)
Daniel
* parser.c include/libxml/xmlerror.h: factoring of more
error handling code, serious size reduction and more lisibility
of the resulting code.
* parserInternals.c parser.c include/libxml/parserInternals.h
include/libxml/parser.h: changing the way VC:Proper Group/PE Nesting
checks are done, use a counter for entities. Entities where freed and
reallocated at the same address failing the check.
* tree.c: avoid a warning
* result/valid/* result/VC/*: this slightly changes some validation
error messages.
Daniel
* valid.c: fixed bug #118712 about mixed content, and namespaced
element names.
* test/valid/mixed_ns.xml result/valid/mixed_ns*: added a check
in the regression tests
Daniel
* SAX.c test/valid/ns* test/result/ns*: applied the patch
provided by Brent Hendricks fixing #105992 and integrated the
examples in the testsuite.
Daniel
* parser.c: fixing bug #108976 get the ID/REFs to reference
the ID in the document content and not in the entity copy
* SAX.c include/libxml/parser.h: more checking of the ID/REF
stuff, better solution for #107208
* xmlregexp.c: removed a direct printf, dohhh
* xmlreader.c: fixed a bug on streaming validation of empty
elements in entities
* result/VC/ElementValid8 test/VCM/v20.xml result/valid/xhtml1.xhtml:
cleanup of the validation tests
* test/valid/id* test/valid/dtds/destfoo.ent result/valid/id*:
added more ID/IDREF tests to the suite
Daniel
* tree.c include/libxml/tree.h: modified the existing APIs
to handle XHTML1 serialization rules automatically, also add
xmlIsXHTML() to libxml2 API. Some tweaking to make sure
libxslt serialization uses it when needed without changing
the library API.
* test/xhtml1 result/noent/xhtml1 result/valid/xhtml1.xhtml
result/xhtml1: added a new test specifically for xhtml1 output
and updated the result of one XHTML1 test
Daniel
* SAX.c valid.c include/libxml/valid.h: fixed bug #92518 validation
error were not covering namespace declarations.
* result/valid/dia.xml test/valid/dia.xml: the test wasn't valid,
it was missing the attribute declaration for the namespace
* result/VC/NS3: the fix now report breakages in that test
Daniel
* error.c valid.c: working on better error reporting of validity
errors, especially providing an accurate context.
* result/valid/xlink.xml.err result/valid/rss.xml.err: better
error reports in those cases.
Daniel
* parser.c include/libxml/parser.h: adding a new API for Christian
Glahn: xmlParseBalancedChunkMemoryRecover
* valid.c: patch from Rick Jones for some grammar cleanup in
validation messages
* result/VC/* result/valid/*: this slightly change some of the
regression tests outputs
Daniel
* parser.c: applied a couple of patches from Peter Jacobi to start
to get rid of ctxt->token, with a possible significant speed
improvement to be gained once done. Better compliance with PE
references constructs in DTDs too.
* test/valid/t[0-9]* result/valid/t[0-9]*: added a set of tests
from Peter too
Daniel