From ac297930c24964cfa1bde996d837081a74f57eea Mon Sep 17 00:00:00 2001 From: Daniel Veillard Date: Thu, 17 Apr 2003 12:55:35 +0000 Subject: [PATCH] some cleanups extended the document to cover RelaxNG and tree operations * relaxng.c: some cleanups * doc/xmlreader.html: extended the document to cover RelaxNG and tree operations * python/tests/Makefile.am python/tests/reader[46].py: added some xmlReader example/regression tests * result/relaxng/tutor*.err: updated the output of a number of tests Daniel --- ChangeLog | 9 +++ doc/xmlreader.html | 82 +++++++++++++++++++++-- python/tests/Makefile.am | 3 + python/tests/reader4.py | 45 +++++++++++++ python/tests/reader6.py | 118 +++++++++++++++++++++++++++++++++ relaxng.c | 43 ++++++------ result/relaxng/tutor10_7_3.err | 2 +- result/relaxng/tutor10_8_3.err | 2 +- result/relaxng/tutor3_2_1.err | 4 +- result/relaxng/tutor3_5_2.err | 6 +- result/relaxng/tutor9_5_2.err | 4 +- result/relaxng/tutor9_5_3.err | 2 +- result/relaxng/tutor9_6_2.err | 2 +- result/relaxng/tutor9_6_3.err | 2 +- 14 files changed, 286 insertions(+), 38 deletions(-) create mode 100755 python/tests/reader4.py create mode 100755 python/tests/reader6.py diff --git a/ChangeLog b/ChangeLog index ce1a6b6b..64ed432e 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,12 @@ +Thu Apr 17 14:51:57 CEST 2003 Daniel Veillard + + * relaxng.c: some cleanups + * doc/xmlreader.html: extended the document to cover RelaxNG and + tree operations + * python/tests/Makefile.am python/tests/reader[46].py: added some + xmlReader example/regression tests + * result/relaxng/tutor*.err: updated the output of a number of tests + Thu Apr 17 11:35:37 CEST 2003 Daniel Veillard * relaxng.c: valgrind pointed out an uninitialized variable error. 
diff --git a/doc/xmlreader.html b/doc/xmlreader.html index 7b4ab994..fd956466 100644 --- a/doc/xmlreader.html +++ b/doc/xmlreader.html @@ -13,6 +13,8 @@ H3 {font-family: Verdana,Arial,Helvetica} A:link, A:visited, A:active { text-decoration: underline }--> + + Libxml2 XmlTextReader Interface tutorial @@ -42,6 +44,9 @@ examples using both C and the Python bindings:

attributes
  • Validating a document
  • Entities substitution
  • +
  • Relax-NG Validation
  • +
  • Mixing the reader and tree or XPath + operations
  • @@ -147,8 +152,7 @@ def streamFile(filename): ret = reader.Read() if ret != 0: - print "%s : failed to parse" % (filename) - + print "%s : failed to parse" % (filename)

    The only things worth adding are that the xmlTextReader @@ -390,9 +394,79 @@ the validation feature is just:

    Entities substitution

    -

    @@TODO@@

    +

    By default the xmlReader will report entities as such and not replace them +with their content. This default behaviour can however be overridden using:

    -

    +

    reader.SetParserProp(libxml2.PARSER_SUBST_ENTITIES,1)

    + +

    Relax-NG Validation

    + +

    Introduced in version 2.5.7

    + +

    Libxml2 can now validate the document being read with the xmlReader against +Relax-NG schemas. While the Relax NG validator can't always work in a +streamable mode, only subsets which cannot be reduced to regular expressions +need to have their subtree expanded for validation. In practice it means +that, unless the schema for the top level element content is not expressible +as a regexp, only chunks of the document need to be parsed while +validating.

    + +

    The steps to do so are:

    +
      +
    • create a reader working on a document as usual
    • +
    • before any call to Read(), associate it with a Relax NG schema, either the + preparsed schema or the URL of the schema to use
    • +
    • errors will be reported the usual way, and the validity status can be + obtained using the IsValid() interface of the reader like for DTDs.
    • +
    + +

    Example, assuming the reader has already been created and that the schema +string contains the Relax-NG schema:

    + +

    rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
    +rngs = rngp.relaxNGParse()
    +reader.RelaxNGSetSchema(rngs)
    +ret = reader.Read()
    +while ret == 1:
    + ret = reader.Read()
    +if ret != 0:
    + print "Error parsing the document"
    +if reader.IsValid() != 1:
    + print "Document failed to validate"

    +See reader6.py in the sources or documentation for a complete +example.

    + +

    Mixing the reader and tree or XPath operations

    + +

    Introduced in version 2.5.7

    + +

    While the reader is a streaming interface, its underlying implementation +is based on the DOM builder of libxml2. As a result it is relatively simple +to mix operations based on both models under some constraints. To do so the +reader has an Expand() operation which grows the subtree under the +current node. It returns a pointer to a standard node which can be manipulated +in the usual ways. The node will get all its ancestors and the full subtree +available. Usual operations like XPath queries can be used on that reduced +view of the document. Here is an example extracted from reader5.py in the +sources which extracts and prints the bibliography entry for the "Dragon" compiler +book from the XML 1.0 recommendation:

    +
    f = open('../../test/valid/REC-xml-19980210.xml')
    +input = libxml2.inputBuffer(f)
    +reader = input.newTextReader("REC")
    +res=""
    +while reader.Read():
    +    while reader.Name() == 'bibl':
    +        node = reader.Expand()            # expand the subtree
    +        if node.xpathEval("@id = 'Aho'"): # use XPath on it
    +            res = res + node.serialize()
    +        if reader.Next() != 1:            # skip the subtree
    +            break
    + +

    Note however that the node instance returned by the Expand() call is only +valid until the next Read() operation. The Expand() operation does not +affect the Read() ones. However, once processed, the full subtree is usually +no longer useful, and the Next() operation allows skipping it completely and +proceeding to the successor; it returns 0 if the end of the document is reached.
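
    The stream-then-expand pattern described above can be tried without libxml2 installed. The sketch below is only an analogue: it uses xml.dom.pulldom from the Python standard library, not the xmlTextReader API, and the sample document and the extract_bibl() helper are made up for illustration. One difference to keep in mind is that pulldom's expandNode() consumes the subtree from the stream, so it behaves roughly like Expand() followed by Next().

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<biblist>
  <bibl id="Aho">Aho, Sethi, Ullman. Compilers.</bibl>
  <bibl id="Knuth">Knuth. The Art of Computer Programming.</bibl>
</biblist>"""


def extract_bibl(xml_text, wanted_id):
    """Stream over xml_text and serialize the <bibl> whose id matches."""
    res = ""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "bibl":
            # Build the full subtree for this element only; the rest of
            # the document is still processed as a stream of events.
            events.expandNode(node)
            if node.getAttribute("id") == wanted_id:
                res += node.toxml()
    return res


result = extract_bibl(SAMPLE, "Aho")
print(result)
```

    Only the matching bibl element is ever materialized as a tree; every other part of the input is seen as events and discarded.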

    Daniel Veillard

    diff --git a/python/tests/Makefile.am b/python/tests/Makefile.am index 761046a5..0c16acf0 100644 --- a/python/tests/Makefile.am +++ b/python/tests/Makefile.am @@ -23,6 +23,9 @@ PYTESTS= \ reader.py \ reader2.py \ reader3.py \ + reader4.py \ + reader5.py \ + reader6.py \ ctxterror.py\ readererr.py\ relaxng.py diff --git a/python/tests/reader4.py b/python/tests/reader4.py new file mode 100755 index 00000000..0269cb0c --- /dev/null +++ b/python/tests/reader4.py @@ -0,0 +1,45 @@ +#!/usr/bin/python -u +# +# this tests the basic APIs of the XmlTextReader interface +# +import libxml2 +import StringIO +import sys + +# Memory debug specific +libxml2.debugMemory(1) + +def tst_reader(s): + f = StringIO.StringIO(s) + input = libxml2.inputBuffer(f) + reader = input.newTextReader("tst") + res = "" + while reader.Read(): + res=res + "%s (%s) [%s] %d\n" % (reader.NodeType(),reader.Name(), + reader.Value(), reader.IsEmptyElement()) + if reader.NodeType() == 1: # Element + while reader.MoveToNextAttribute(): + res = res + "-- %s (%s) [%s]\n" % (reader.NodeType(), + reader.Name(),reader.Value()) + return res + +expect="""1 (test) [None] 0 +1 (b) [None] 1 +1 (c) [None] 1 +15 (test) [None] 0 +""" + +res = tst_reader("""""") + +if res != expect: + print "Did not get the expected error message:" + print res + sys.exit(1) + +# Memory debug specific +libxml2.cleanupParser() +if libxml2.debugMemory(1) == 0: + print "OK" +else: + print "Memory leak %d bytes" % (libxml2.debugMemory(1)) + libxml2.dumpMemory() diff --git a/python/tests/reader6.py b/python/tests/reader6.py new file mode 100755 index 00000000..fe22079f --- /dev/null +++ b/python/tests/reader6.py @@ -0,0 +1,118 @@ +#!/usr/bin/python -u +# +# this tests the entities substitutions with the XmlTextReader interface +# +import sys +import StringIO +import libxml2 + +schema=""" + + + + + + + + + + + + + + +""" +# Memory debug specific +libxml2.debugMemory(1) + +# +# Parse the Relax NG Schemas +# +rngp = 
libxml2.relaxNGNewMemParserCtxt(schema, len(schema)) +rngs = rngp.relaxNGParse() +del rngp + +# +# Parse and validate the correct document +# +docstr=""" + +100 +""" + +f = StringIO.StringIO(docstr) +input = libxml2.inputBuffer(f) +reader = input.newTextReader("correct") +reader.RelaxNGSetSchema(rngs) +ret = reader.Read() +while ret == 1: + ret = reader.Read() + +if ret != 0: + print "Error parsing the document" + sys.exit(1) + +if reader.IsValid() != 1: + print "Document failed to validate" + sys.exit(1) + +# +# Parse and validate the incorrect document +# +docstr=""" + +1000 +""" + +err="" +expect="""RNG validity error: file error line 3 element text +Type byte doesn't allow value '1000' +RNG validity error: file error line 3 element text +Error validating datatype byte +RNG validity error: file error line 3 element text +Element item failed to validate content +""" + +def callback(ctx, str): + global err + err = err + "%s" % (str) +libxml2.registerErrorHandler(callback, "") + +f = StringIO.StringIO(docstr) +input = libxml2.inputBuffer(f) +reader = input.newTextReader("error") +reader.RelaxNGSetSchema(rngs) +ret = reader.Read() +while ret == 1: + ret = reader.Read() + +if ret != 0: + print "Error parsing the document" + sys.exit(1) + +if reader.IsValid() != 0: + print "Document failed to detect the validation error" + sys.exit(1) + +if err != expect: + print "Did not get the expected error message:" + print err + sys.exit(1) + +# +# cleanup +# +del f +del input +del reader +del rngs +libxml2.relaxNGCleanupTypes() + +# Memory debug specific +libxml2.cleanupParser() +if libxml2.debugMemory(1) == 0: + print "OK" +else: + print "Memory leak %d bytes" % (libxml2.debugMemory(1)) + libxml2.dumpMemory() diff --git a/relaxng.c b/relaxng.c index c98e04e2..d453b93e 100644 --- a/relaxng.c +++ b/relaxng.c @@ -8,11 +8,9 @@ /** * TODO: - * - error reporting - * - handle namespace declarations as attributes. 
* - add support for DTD compatibility spec * http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html - * - report better mem allocations at runtime and abort immediately. + * - report better mem allocations pbms at runtime and abort immediately. */ #define IN_LIBXML @@ -836,7 +834,6 @@ xmlRelaxNGFreeDefine(xmlRelaxNGDefinePtr define) * @size: the default size for the container * * Allocate a new RelaxNG validation state container - * TODO: keep a pool in the ctxt * * Returns the newly allocated structure or NULL in case or error */ @@ -1989,7 +1986,7 @@ xmlRelaxNGGetErrorString(xmlRelaxNGValidErr err, const xmlChar *arg1, case XML_RELAXNG_ERR_EXTRADATA: return(xmlCharStrdup("Extra data in the document")); default: - TODO + return(xmlCharStrdup("Unknown error !")); } if (msg[0] == 0) { snprintf(msg, 1000, "Unknown error code %d", err); @@ -2279,12 +2276,6 @@ xmlRelaxNGSchemaTypeCheck(void *data ATTRIBUTE_UNUSED, xmlSchemaTypePtr typ; int ret; - /* - * TODO: the type should be cached ab provided back, interface subject - * to changes. - * TODO: handle facets, may require an additional interface and keep - * the value returned from the validation. 
- */ if ((type == NULL) || (value == NULL)) return(-1); typ = xmlSchemaGetPredefinedType(type, @@ -2956,9 +2947,9 @@ xmlRelaxNGCompile(xmlRelaxNGParserCtxtPtr ctxt, xmlRelaxNGDefinePtr def) { case XML_RELAXNG_LIST: case XML_RELAXNG_PARAM: case XML_RELAXNG_VALUE: - TODO /* This should not happen and generate an internal error */ - printf("trying to compile %s\n", xmlRelaxNGDefName(def)); - + /* This should not happen and generate an internal error */ + fprintf(stderr, "RNG internal error trying to compile %s\n", + xmlRelaxNGDefName(def)); break; } return(ret); @@ -3302,7 +3293,6 @@ xmlRelaxNGParseValue(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) { } } } - /* TODO check ahead of time that the value is okay per the type */ return(def); } @@ -4878,10 +4868,9 @@ xmlRelaxNGParseAttribute(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) { ctxt->nbErrors++; break; case XML_RELAXNG_NOOP: - TODO if (ctxt->error != NULL) ctxt->error(ctxt->userData, - "Internal error, noop found\n"); + "RNG Internal error, noop found in attribute\n"); ctxt->nbErrors++; break; } @@ -5199,16 +5188,27 @@ xmlRelaxNGParseElement(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) { ret->attrs = cur; break; case XML_RELAXNG_START: + if (ctxt->error != NULL) + ctxt->error(ctxt->userData, + "RNG Internal error, start found in element\n"); + ctxt->nbErrors++; + break; case XML_RELAXNG_PARAM: + if (ctxt->error != NULL) + ctxt->error(ctxt->userData, + "RNG Internal error, param found in element\n"); + ctxt->nbErrors++; + break; case XML_RELAXNG_EXCEPT: - TODO + if (ctxt->error != NULL) + ctxt->error(ctxt->userData, + "RNG Internal error, except found in element\n"); ctxt->nbErrors++; break; case XML_RELAXNG_NOOP: - TODO if (ctxt->error != NULL) ctxt->error(ctxt->userData, - "Internal error, noop found\n"); + "RNG Internal error, noop found in element\n"); ctxt->nbErrors++; break; } @@ -5438,9 +5438,6 @@ xmlRelaxNGCheckReference(xmlRelaxNGDefinePtr ref, name); ctxt->nbErrors++; } - /* - * TODO: make a closure 
and verify there is no loop ! - */ } /** diff --git a/result/relaxng/tutor10_7_3.err b/result/relaxng/tutor10_7_3.err index ebbc9aa4..bc3d6acd 100644 --- a/result/relaxng/tutor10_7_3.err +++ b/result/relaxng/tutor10_7_3.err @@ -1,2 +1,2 @@ RNG validity error: file ./test/relaxng/tutor10_7_3.xml line 2 element card -Element addressBook has extra content: card +Element card failed to validate attributes diff --git a/result/relaxng/tutor10_8_3.err b/result/relaxng/tutor10_8_3.err index 34eb5e94..06229bf1 100644 --- a/result/relaxng/tutor10_8_3.err +++ b/result/relaxng/tutor10_8_3.err @@ -1,2 +1,2 @@ RNG validity error: file ./test/relaxng/tutor10_8_3.xml line 2 element card -Element addressBook has extra content: card +Element card failed to validate attributes diff --git a/result/relaxng/tutor3_2_1.err b/result/relaxng/tutor3_2_1.err index 83e9a57c..73577fcb 100644 --- a/result/relaxng/tutor3_2_1.err +++ b/result/relaxng/tutor3_2_1.err @@ -1,4 +1,2 @@ RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email -Expecting element name, got email -RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email -Element card failed to validate content +Did not expect element email there diff --git a/result/relaxng/tutor3_5_2.err b/result/relaxng/tutor3_5_2.err index ed09a330..80acb18f 100644 --- a/result/relaxng/tutor3_5_2.err +++ b/result/relaxng/tutor3_5_2.err @@ -1,2 +1,4 @@ -RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element card -Element addressBook has extra content: card +RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email +Expecting element name, got email +RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email +Element card failed to validate content diff --git a/result/relaxng/tutor9_5_2.err b/result/relaxng/tutor9_5_2.err index 650ca981..ede3b450 100644 --- a/result/relaxng/tutor9_5_2.err +++ b/result/relaxng/tutor9_5_2.err @@ -1,2 +1,4 @@ RNG validity error: file 
./test/relaxng/tutor9_5_2.xml line 2 element card -Element addressBook has extra content: card +Invalid sequence in interleave +RNG validity error: file ./test/relaxng/tutor9_5_2.xml line 2 element card +Element card failed to validate attributes diff --git a/result/relaxng/tutor9_5_3.err b/result/relaxng/tutor9_5_3.err index eee06c7c..4566bccb 100644 --- a/result/relaxng/tutor9_5_3.err +++ b/result/relaxng/tutor9_5_3.err @@ -1,2 +1,2 @@ RNG validity error: file ./test/relaxng/tutor9_5_3.xml line 2 element card -Element addressBook has extra content: card +Invalid attribute error for element card diff --git a/result/relaxng/tutor9_6_2.err b/result/relaxng/tutor9_6_2.err index 259cb073..1a10f1b6 100644 --- a/result/relaxng/tutor9_6_2.err +++ b/result/relaxng/tutor9_6_2.err @@ -1,2 +1,2 @@ RNG validity error: file ./test/relaxng/tutor9_6_2.xml line 2 element card -Element addressBook has extra content: card +Element card failed to validate attributes diff --git a/result/relaxng/tutor9_6_3.err b/result/relaxng/tutor9_6_3.err index 2157e524..e92c5f1a 100644 --- a/result/relaxng/tutor9_6_3.err +++ b/result/relaxng/tutor9_6_3.err @@ -1,2 +1,2 @@ RNG validity error: file ./test/relaxng/tutor9_6_3.xml line 2 element card -Element addressBook has extra content: card +Invalid attribute error for element card