Mon Sep 3 10:07:03 MDT 2001 John Fleck <jfleck@inkstain.net> *doc/catalog.html - add link to the html version of the man page, other linguistic cleanups
399 lines
18 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Libxml Catalog support</title>
  <meta name="GENERATOR" content="amaya V5.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml Catalog support</h1>

<p>Location: <a
href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>

<p>Version: $Revision: 1.4 $</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definitions</a></li>
  <li><a href="#Simple">Using catalogs</a></li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#reference">How to tune catalog usage</a></li>
  <li><a href="#validate">How to debug catalog processing</a></li>
  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
  <li><a href="#implemento">The implementor's corner: a quick review of the
    API</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>What is a catalog? Basically it's a lookup mechanism used when
an entity (a file or a remote resource) references another entity. The catalog
lookup is inserted between the moment the reference is recognized by the
software (XML parser, stylesheet processing, or even images referenced for
inclusion in a rendering) and the moment the loading of that resource
actually starts.</p>

<p>It is basically used for 3 things:</p>
<ul>
  <li>mapping from "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI). For example it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
  </li>
  <li>remapping from a given URL to another one, like an HTTP indirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
    <p>should really be looked up at</p>
    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
  </li>
  <li>providing a local cache mechanism, allowing the entities associated
    with public identifiers or remote resources to be loaded locally. This is
    a really important feature for any significant deployment of XML or SGML
    since it avoids the hazards and delays associated with fetching remote
    resources.</li>
</ul>

<h2><a name="definition">The definitions</a></h2>

<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
<ul>
  <li>the older SGML catalogs; the official spec is SGML Open Technical
    Resolution TR9401:1997, but it is better understood by reading <a
    href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
    James Clark. This format is relatively old and is not the preferred mode
    of operation of libxml.</li>
  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a>
    are far more flexible, more recent, use an XML syntax and should scale
    much better. This is the default option of libxml.</li>
</ul>

<p></p>

<h2><a name="Simple">Using catalogs</a></h2>

<p>In a normal environment libxml will by default check for the presence of a
catalog in /etc/xml/catalog, and assuming it has been correctly populated,
the processing is completely transparent to the document user. To take a
concrete example, suppose you are authoring a DocBook document which
starts with the following DOCTYPE definition:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;</pre>

<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman Walsh//DTD
DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>

<p style="font-size: 10pt"><strong>Note</strong>: Really, don't use this
DOCTYPE yourself: it's a really old version, but it is fine as an example.</p>

<p>Libxml will check the catalog each time that it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc. If
your system is correctly configured, all the authoring and processing phases
should use only local files, while your document stays portable because it
uses the canonical public and system IDs referencing the remote document.</p>

<h2><a name="Some">Some examples:</a></h2>

<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>

<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements,
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping, which associates a Public
Identifier with a URI.</p>
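<p>To make the lookup concrete, here is a minimal sketch in Python (standard
library only) of how a <code>public</code> entry can be matched. It only
illustrates the idea and is not how libxml implements it; the
<code>lookup_public</code> helper and the embedded catalog string are
assumptions made for the example.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Catalog elements, as quoted in the text above.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A tiny catalog holding the single <public> entry discussed above.
CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
"""


def lookup_public(catalog_xml, public_id):
    """Return the uri of the first <public> entry matching public_id, or None."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter("{%s}public" % CATALOG_NS):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


print(lookup_public(CATALOG_XML, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
```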
<pre>...
    &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                   rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>

<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI,
constructed by replacing the prefix with a new one. In effect this acts like
a cache system for a full area of the Web. In practice it is extremely useful
with a file prefix if you have installed a copy of those resources on your
local system.</p>
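<p>The prefix replacement can be sketched as follows in Python. This is an
illustration only, not libxml code: the <code>rewrite_system</code> helper is
hypothetical, and picking the longest matching prefix when several rules
apply follows my reading of the XML Catalogs specification.</p>

```python
def rewrite_system(system_id, rules):
    """Apply the longest matching (start_string, new_prefix) rule, or return None."""
    best = None
    for start, new_prefix in rules:
        if system_id.startswith(start):
            if best is None or len(start) > len(best[0]):
                best = (start, new_prefix)
    if best is None:
        return None
    # Keep everything after the matched prefix and graft it onto the new one.
    return best[1] + system_id[len(best[0]):]


# The single rule from the catalog fragment above.
RULES = [("http://www.oasis-open.org/docbook/", "file:///usr/share/xml/docbook/")]

print(rewrite_system("http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd", RULES))
```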
<pre>...
    &lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                    catalog="file:///usr/share/xml/docbook.xml"/&gt;
    &lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                    catalog="file:///usr/share/xml/docbook.xml"/&gt;
    &lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                    catalog="file:///usr/share/xml/docbook.xml"/&gt;
    &lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                    catalog="file:///usr/share/xml/docbook.xml"/&gt;
    &lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                 catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>

<p>Delegation is the core feature which allows building a tree (a hierarchy)
of catalogs, easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource. The set of entries presented
should be sufficient to redirect the resolution of all DocBook references to
the specific catalog in <code>/usr/share/xml/docbook.xml</code>; this one in
turn could delegate all references for DocBook 4.1.2 to a specific catalog
installed at the same time as the DocBook resources on the local machine.</p>

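<p>Here is a small Python sketch of how a resolver might pick the catalogs to
delegate to. It is only an illustration under stated assumptions: the
<code>delegates_for</code> helper is hypothetical, the
<code>oasis.xml</code> catalog is invented for the example, the ordering
(longest matching prefix first) follows my reading of the XML Catalogs
specification, and dropping duplicate catalogs is a simplification of
mine.</p>

```python
def delegates_for(identifier, entries):
    """Return the catalogs to consult for identifier, longest prefix first."""
    matches = [(prefix, catalog) for prefix, catalog in entries
               if identifier.startswith(prefix)]
    matches.sort(key=lambda match: len(match[0]), reverse=True)
    ordered = []
    for _, catalog in matches:
        if catalog not in ordered:  # assumption: consult each catalog only once
            ordered.append(catalog)
    return ordered


ENTRIES = [
    # Two delegatePublic entries taken from the fragment above.
    ("-//OASIS//ENTITIES DocBook XML", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//DTD DocBook XML", "file:///usr/share/xml/docbook.xml"),
    # Hypothetical broader entry, added only to show the ordering.
    ("-//OASIS//", "file:///usr/share/xml/oasis.xml"),
]

print(delegates_for("-//OASIS//DTD DocBook XML V4.1.2//EN", ENTRIES))
```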
<h2><a name="reference">How to tune catalog usage:</a></h2>

<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
empty one should deactivate loading the default
<code>/etc/xml/catalog</code> catalog.</p>

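<p>The selection rule can be summarized with a short Python sketch. The
<code>catalog_list</code> helper is hypothetical, and treating the variable
as a whitespace-separated list is an assumption on my side, so check the
behaviour of your libxml version before relying on it.</p>

```python
DEFAULT_CATALOG = "/etc/xml/catalog"


def catalog_list(environ):
    """Return the catalogs to load for a given environment mapping."""
    value = environ.get("XML_CATALOG_FILES")
    if value is None:
        # Variable not set: fall back to the default catalog.
        return [DEFAULT_CATALOG]
    # Variable set: use only the listed catalogs (possibly none at all).
    return value.split()


print(catalog_list({}))
print(catalog_list({"XML_CATALOG_FILES": ""}))
print(catalog_list({"XML_CATALOG_FILES": "test/catalogs/docbook.xml tst.xml"}))
```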
<p>@@More options are likely to be provided in the future@@</p>

<h2><a name="validate">How to debug catalog processing:</a></h2>

<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
make libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>The test/ent2 file references an entity; running the parser from memory
makes the base URI unavailable and the "title.xml" entity cannot be loaded.
Setting the debug environment variable reveals that an attempt is
made to load <code>/etc/xml/catalog</code>, but since it's not present the
resolution fails.</p>

<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2; it can load
catalogs and make resolution queries to see what is going on. This is also
used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>

<p>To debug what is going on, adding one -v flag increases the verbosity
level to indicate the processing done (adding a second flag also indicates
what elements are recognized at parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
        public PublicID: make a PUBLIC identifier lookup
        system SystemID: make a SYSTEM identifier lookup
        resolve PublicID SystemID: do a full resolver lookup
        add 'type' 'orig' 'replace' : add an entry
        del 'values' : remove values
        dump: print the current catalog state
        debug: increase the verbosity level
        quiet: decrease the verbosity level
        exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>

<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>

<h2><a name="Declaring">How to create and maintain catalogs:</a></h2>

<p>Basically XML Catalogs are XML files; you can either use XML tools to
manage them or use <strong>xmlcatalog</strong> for this. The basic step is
to create a catalog; the <code>--create</code> option provides this
facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>By default xmlcatalog does not overwrite the original catalog and saves the
result on the standard output; this can be overridden using the
<code>--noout</code> option. The <code>--add</code> command adds entries to
the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>

<p>The <code>--add</code> option always takes 3 parameters, even though some
of the XML Catalog constructs (like nextCatalog) have only a single
argument; in that case just pass an empty string as the third parameter, it
will be ignored.</p>

<p>Similarly the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>The catalog is now empty. Note that the matching of <code>--del</code> is
exact, and that it would have worked in a similar fashion with the Public ID
string.</p>

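<p>A rough Python model of that exact matching is given below. It is a
sketch and not the xmlcatalog implementation: the
<code>delete_entries</code> helper is hypothetical, and the
<code>orig</code> and <code>replace</code> keys simply reuse the names shown
by the shell <code>help</code> output.</p>

```python
def delete_entries(entries, value):
    """Drop every entry whose identifier or target equals value exactly."""
    return [entry for entry in entries
            if value not in (entry["orig"], entry["replace"])]


ENTRIES = [{
    "type": "public",
    "orig": "-//OASIS//DTD DocBook XML V4.1.2//EN",
    "replace": "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
}]

# Deleting by URI or by Public ID both empty the catalog...
print(delete_entries(ENTRIES, "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"))
print(delete_entries(ENTRIES, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
# ...but a mere prefix does not match, so the entry is kept.
print(len(delete_entries(ENTRIES, "-//OASIS//DTD DocBook XML")))
```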
<p>This is rudimentary but should be sufficient to manage a not too complex
catalog tree of resources.</p>

<h2><a name="implemento">The implementor's corner: a quick review of the
API</a></h2>

<p>First, as for every other module of libxml, there is an automatically
generated <a href="html/libxml-catalog.html">API page for catalog
support</a>.</p>

<p>The header for the catalog interfaces should be included as:</p>
<pre>#include &lt;libxml/catalog.h&gt;</pre>

<p>The API is deliberately kept very simple. First, it is not obvious that
applications really need access to it, since catalog support is the default
behaviour of libxml (Note: it is possible to completely override libxml's
default catalog handling by using
<a href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to
plug in an application-specific resolver).</p>

<p>Basically libxml supports 2 catalog lists:</p>
<ul>
  <li>the default one, global and shared by the whole application</li>
  <li>a per-document catalog list, built if the document uses the
    <code>oasis-xml-catalog</code> PIs to specify its own catalog list; it is
    associated with the parser context and destroyed when the parsing context
    is destroyed.</li>
</ul>

<p>The document one will be used first if it exists.</p>

<h3>Initialization routines:</h3>

<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
used at startup to initialize the catalog. If the catalog should be
initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs()
should be called before xmlInitializeCatalog(), which would otherwise do a
default initialization first.</p>

<p>The xmlCatalogAddLocal() call is used by the parser to grow the document's
own catalog list if needed.</p>

<h3>Preferences setup:</h3>

<p>The XML Catalog spec requires the possibility to select default
preferences between public and system delegation;
xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
xmlCatalogGetDefaults() control whether XML Catalog resolution should
be forbidden, allowed for the global catalog, for the document catalog, or
for both; the default is to allow both.</p>

<p>And of course xmlCatalogSetDebug() enables the generation of debug messages
(through the xmlGenericError() mechanism).</p>

<h3>Querying routines:</h3>

<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
and xmlCatalogResolveURI() are relatively explicit if you have read the XML
Catalog specification: they correspond to the section 7 algorithms. They
should also work, with simplified semantics, if you have loaded an SGML
catalog.</p>

<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
operate on the document catalog list.</p>

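<p>The overall lookup order can be modelled with a few lines of Python. This
is a deliberately simplified sketch and not the libxml algorithm: catalogs
are plain dictionaries, the <code>resolve</code> and
<code>resolve_with_document</code> helpers are hypothetical, the
<code>prefer</code> setting, rewriting and delegation are ignored, and the
fallback from the document list to the global list is my understanding of
the behaviour described above.</p>

```python
def resolve(catalog, public_id=None, system_id=None):
    """Look up one catalog: system identifier first, then public identifier."""
    if system_id is not None and system_id in catalog["system"]:
        return catalog["system"][system_id]
    if public_id is not None and public_id in catalog["public"]:
        return catalog["public"][public_id]
    return None


def resolve_with_document(document_catalog, global_catalog,
                          public_id=None, system_id=None):
    """Try the per-document catalog first, then fall back to the global one."""
    for catalog in (document_catalog, global_catalog):
        if catalog is None:
            continue
        uri = resolve(catalog, public_id, system_id)
        if uri is not None:
            return uri
    return None


PUBLIC_ID = "-//OASIS//DTD DocBook XML V4.1.2//EN"
SYSTEM_ID = "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"

GLOBAL = {"system": {}, "public": {PUBLIC_ID: SYSTEM_ID}}
# Hypothetical document catalog pointing at a local copy of the DTD.
DOCUMENT = {
    "system": {SYSTEM_ID: "file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd"},
    "public": {},
}

print(resolve_with_document(DOCUMENT, GLOBAL, PUBLIC_ID, SYSTEM_ID))
print(resolve_with_document(None, GLOBAL, PUBLIC_ID, SYSTEM_ID))
```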
<h3>Cleanup and Miscellaneous:</h3>

<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
the per-document equivalent.</p>

<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
first catalog in the global list, and xmlCatalogDump() dumps a
catalog's state. Those routines are primarily designed for xmlcatalog; I'm not
sure that exposing more complex interfaces (like navigation ones) would be
really useful.</p>

<p>xmlParseCatalogFile() is a function used to load XML Catalog files;
it's similar to xmlParseFile() except that it bypasses all catalog lookups.
It's provided because this functionality may be useful for client tools.</p>

<h3>Threaded environments:</h3>

<p>Since the catalog tree is built progressively, some care has been taken to
try to avoid trouble in multithreaded environments, but without a
test-and-set routine accessible from C this can't be fully guaranteed. The
best approach is therefore to retrieve the current loader with
xmlGetExternalEntityLoader() and set the entity loader routine (with
xmlSetExternalEntityLoader()) to one of your own doing the
synchronization.</p>

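<p>The wrapping pattern itself is language independent; here is a minimal
Python sketch of it. The <code>synchronized_loader</code> helper and the
stand-in <code>default_loader</code> are hypothetical; in C the same idea
would wrap the loader returned by xmlGetExternalEntityLoader() and install
the wrapper with xmlSetExternalEntityLoader(), using whatever mutex your
threading library provides.</p>

```python
import threading


def synchronized_loader(loader):
    """Wrap an entity loader so that only one thread runs it at a time."""
    lock = threading.Lock()

    def wrapper(url, public_id):
        with lock:  # serialize every catalog lookup and load
            return loader(url, public_id)

    return wrapper


def default_loader(url, public_id):
    """Stand-in for the previously installed loader (hypothetical)."""
    return "loaded:" + url


safe_loader = synchronized_loader(default_loader)
print(safe_loader("title.xml", None))
```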
<p></p>

<h2><a name="Other">Other resources</a></h2>

<p>The XML Catalog specification is relatively recent so there isn't much
literature to point at:</p>
<ul>
  <li>You can find a good rant from Norm Walsh about <a
    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
    need for catalogs</a>; it provides a lot of context information even if
    I don't agree with everything presented.</li>
  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
    catalog proposal</a> from John Cowan</li>
  <li>The <a href="http://www.rddl.org/">Resource Directory Description
    Language</a> (RDDL), another catalog system but more oriented toward
    providing metadata for XML namespaces.</li>
  <li>The page from the OASIS Technical <a
    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
    Resolution</a>, which maintains XML Catalog; you will find pointers to the
    specification updates, some background and pointers to other tools
    providing XML Catalog support</li>
  <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
    small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems to
    work fine for me</li>
  <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog manual page</a></li>
</ul>

<p>If you have suggestions for corrections or additions, simply contact
me:</p>

<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>

<p>$Id: catalog.html,v 1.4 2001/08/24 12:14:55 veillard Exp $</p>
</body>
</html>