From 300f7d6d00eb8f957be42b4e32a0b14050b26538 Mon Sep 17 00:00:00 2001 From: Daniel Veillard Date: Fri, 24 Nov 2000 13:04:04 +0000 Subject: [PATCH] Added a small DTD related page following the IRC help needed by maciej on the topic, Daniel --- ChangeLog | 6 ++ doc/xml.html | 8 +- doc/xmldtd.html | 191 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 203 insertions(+), 2 deletions(-) create mode 100644 doc/xmldtd.html diff --git a/ChangeLog b/ChangeLog index afbf1952..7a3a23ab 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,9 @@ +Fri Nov 24 14:01:44 CET 2000 Daniel Veillard + + * doc/xmldtd.html doc/xml.html: following a short step by step + guidance on IRC to help maciej with DTDs I started a small + page on the subject. + Fri Nov 17 17:28:06 CET 2000 Daniel Veillard * HTMLparser.c: fixed handling of broken charrefs diff --git a/doc/xml.html b/doc/xml.html index 58794e50..e46d5f6d 100644 --- a/doc/xml.html +++ b/doc/xml.html @@ -1,7 +1,9 @@ + The XML C library for Gnome - + @@ -52,6 +54,8 @@ alt="W3C Logo">

  • libxml Internationalization support
  • libxml Input/Output interfaces
  • libxml Memory interfaces
  • +
  • a short introduction about DTDs and libxml
  • Introduction

    @@ -1374,6 +1378,6 @@ Gnome CVS base under gnome-xml/example

    Daniel Veillard

    -

    $Id: xml.html,v 1.57 2000/10/25 13:32:38 veillard Exp $

    +

    $Id: xml.html,v 1.58 2000/11/13 18:22:47 veillard Exp $

    diff --git a/doc/xmldtd.html b/doc/xmldtd.html new file mode 100644 index 00000000..b526c63c --- /dev/null +++ b/doc/xmldtd.html @@ -0,0 +1,191 @@ + + + Libxml Input/Output handling + + + + + +

    Libxml DTD support

    + +

    Location: http://xmlsoft.org/xmldtd.html

    + +

    Libxml home page: http://xmlsoft.org/

    + +

    Mailing-list archive: http://xmlsoft.org/messages/

    + +

    Version: $Revision$

    + +

    Table of Contents:

    +
      +
    1. General overview
    2. +
    3. The definition
    4. +
    5. Simple rules +
        +
      1. How to reference a DTD from a document
      2. +
      3. Declaring elements
      4. +
      5. Declaring attributes
      6. +
      +
    6. +
    7. Some examples
    8. +
    9. How to validate
    10. +
    11. Other resources
    12. +
    + +

    General overview

    + +

    DTD is the acronym for Document Type Definition. It is a description of the content for a family of XML files. DTDs are part of the XML 1.0 specification, and allow one to describe and check that a given document instance conforms to a set of rules detailing its structure and content.

    + +

    The definition

    + +

    The W3C XML Recommendation (Tim Bray's annotated version of Rev1):

    + + +

    Unfortunately, all of this is inherited from the SGML world, and the syntax is ancient.

    + +

    Simple rules

    + +

    Writing DTDs can be done in multiple ways, and the rules for building them can be radically different depending on whether you need something fixed or something which can evolve over time. Really complex DTDs like the Docbook ones are flexible but considerably harder to design. I will focus only on DTDs for formats with a fixed, simple structure. What follows is just a set of basic rules, which is neither exhaustive nor usable for complex DTD design.

    + +

    How to reference a DTD from a document:

    + +

    Assuming the top element of the document is spec and the DTD is placed in the file mydtd in the subdirectory dtds of the directory from which the document was loaded:

    + +

    <!DOCTYPE spec SYSTEM "dtds/mydtd">

    + +

    Notes:

    +
      +
    • the system string is actually a URI-Reference (as defined in RFC 2396), so you can use a full URL string indicating the location of your DTD on the Web. This is a really good thing to do if you want others to be able to validate your document
    • +
    • it is also possible to associate a PUBLIC identifier (a magic string) so that the DTD is looked up in catalogs on the client side without having to locate it on the Web
    • +
    • a DTD contains a set of element and attribute declarations, but they do not define what the root of the document should be. This is explicitly told to the parser/validator by the name given first in the DOCTYPE declaration.
    • +
    + +
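As an illustration of the first two notes, here is a minimal sketch of a document prolog using a PUBLIC identifier followed by a full URL as the system identifier. Both the public identifier and the URL are made-up placeholders, not real resources; only the root element name spec comes from the text above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical public identifier and URL, for illustration only -->
<!DOCTYPE spec PUBLIC "-//Example//DTD Spec Sketch//EN"
                      "http://www.example.org/dtds/mydtd">
<spec>
  <!-- content as allowed by the DTD goes here -->
</spec>
```

In the PUBLIC form the system identifier still follows the public one, so a parser that cannot resolve the public identifier through a catalog can fall back to fetching the DTD from the URL.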

    Declaring elements:

    + +

    The following declares an element spec:

    + +

    <!ELEMENT spec (front, body, back?)>

    + +

    This also expresses that the spec element contains one front, one body and one optional back, in this order. The declaration of an element and the description of its content are done in a single declaration. Similarly, the following declares div1 elements:

    + +

    <!ELEMENT div1 (head, (p | list | note)*, div2*)>

    + +

    This means that div1 contains one head, then any number (possibly zero) of p, list and note elements in any order, and then zero or more div2 elements. Last but not least, an element can contain text:

    + +

    <!ELEMENT b (#PCDATA)>

    + +

    Here b contains text. An element can also have mixed content (text and elements in no particular order):

    + +

    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

    + +

    p can contain text or a, ul, b, i or em elements in no particular order.
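To see these declarations working together, the following is a small sketch of a complete document carrying its DTD in the internal subset. The spec and b declarations are the ones shown above; the content models chosen for front, body and back, and the reduced model for p, are assumptions made only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE spec [
  <!ELEMENT spec (front, body, back?)>
  <!-- Assumed content models, chosen only to keep the sketch small -->
  <!ELEMENT front (#PCDATA)>
  <!ELEMENT body (p)*>
  <!ELEMENT back (#PCDATA)>
  <!-- Mixed content: text and b elements in any order -->
  <!ELEMENT p (#PCDATA|b)*>
  <!ELEMENT b (#PCDATA)>
]>
<spec>
  <front>Title</front>
  <body>
    <p>Some <b>bold</b> text.</p>
  </body>
</spec>
```

The optional back element is simply left out here. Swapping front and body, or adding a second front, would make the document invalid while leaving it well-formed.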

    + +

    Declaring attributes:

    + +

    Again, the declaration of an attribute includes the definition of its content:

    + +

    <!ATTLIST termdef name CDATA #IMPLIED>

    + +

    This means that the element termdef can have a name attribute containing text (CDATA) and which is optional (#IMPLIED). The attribute value can also be defined within a set:

    + +

    <!ATTLIST list type (bullets|ordered|glossary) "ordered">

    + +

    This means that list elements have a type attribute with 3 allowed values, "bullets", "ordered" or "glossary", and which defaults to "ordered" if the attribute is not explicitly specified.

    + +

    The content type of an attribute can be text (CDATA), anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES) or name(s) (NMTOKEN/NMTOKENS). The following defines that a chapter element can have an optional id attribute of type ID, usable for reference from attributes of type IDREF:

    + +

    <!ATTLIST chapter id ID #IMPLIED>

    + +

    The last value of an attribute definition can be #REQUIRED, meaning that the attribute has to be given, #IMPLIED, meaning that it is optional, or the default value (possibly prefixed by #FIXED if it is the only value allowed).
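The attribute rules above can be put together in one small document. This is only a sketch: the chapter and list attribute declarations are the ones from the text, while the book and xref elements, the target and version attributes, and all content models are hypothetical names introduced for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Hypothetical elements and content models -->
  <!ELEMENT book (chapter+)>
  <!ELEMENT chapter (list | xref)*>
  <!ELEMENT list (#PCDATA)>
  <!ELEMENT xref EMPTY>
  <!-- Optional ID, usable as the target of an IDREF -->
  <!ATTLIST chapter id ID #IMPLIED>
  <!-- Enumerated set with a default value -->
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
  <!-- Hypothetical: mandatory reference to some ID in the document -->
  <!ATTLIST xref target IDREF #REQUIRED>
  <!-- Hypothetical: the only value allowed is the default one -->
  <!ATTLIST book version CDATA #FIXED "1.0">
]>
<book>
  <chapter id="intro">
    <list>no type given, so the default "ordered" applies</list>
    <list type="bullets">explicit value taken from the set</list>
  </chapter>
  <chapter>
    <xref target="intro"/>
  </chapter>
</book>
```

A validating parser is expected to reject, for instance, a type value outside the declared set, an xref without a target, or a target that matches no id in the document.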

    + +

    Some examples

    + +

    The directory test/valid/dtds/ in the libxml distribution contains some complex DTD examples. The test/valid/dia.xml example shows an XML file where the simple DTD is directly included within the document.

    + +

    How to validate

    + +

    The simplest way is to use the xmllint program which comes with libxml. The --valid option turns on validation of the files given as input; for example, the following validates a copy of the first revision of the XML 1.0 specification:

    + +

    xmllint --valid --noout test/valid/REC-xml-19980210.xml

    + +

    The --noout option is used so that the resulting tree is not output.
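To try the --valid option on something smaller than the specification, the following sketch is a document which is well-formed but deliberately invalid: body appears before front, which violates the content model of spec. The content models given to front, body and back are assumptions made for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE spec [
  <!ELEMENT spec (front, body, back?)>
  <!-- Assumed content models, chosen only to keep the sketch small -->
  <!ELEMENT front (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ELEMENT back (#PCDATA)>
]>
<spec>
  <body>body placed before front: this breaks the declared order</body>
  <front>front</front>
</spec>
```

Saved under any file name and passed to xmllint with --valid --noout, this should produce a validity error about the content of spec, while the same file parsed without --valid should be accepted since it is well-formed. The exact wording of the message depends on the libxml version.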

    + +

    The --dtdvalid dtd option validates the document(s) against a given DTD.

    + +

    Libxml exports an API to handle DTDs and validation; check the associated description.

    + +

    Other resources

    + +

    DTDs are as old as SGML, so there are a number of examples on-line. I will just list one for now; other pointers are welcome:

    + + +

    + +

    Daniel Veillard

    + +

    $Id$

    + +