hdf5/doc/html/TechNotes/H4-H5Compat.html

272 lines
12 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Backward/Forward Compatability</title>
</head>
<body>
<h1>Backward/Forward Compatability</h1>
<p>The HDF5 development must proceed in such a manner as to
satisfy the following conditions:
<ol type=A>
<li>HDF5 applications can produce data that HDF5
applications can read and write and HDF4 applications can produce
data that HDF4 applications can read and write. The situation
that demands this condition is obvious.</li>
<li>HDF5 applications are able to produce data that HDF4 applications
can read and HDF4 applications can subsequently modify the
file subject to certain constraints depending on the
implementation. This condition is for the temporary
situation where a consumer has neither been relinked with a new
HDF4 API built on top of the HDF5 API nor recompiled with the
HDF5 API.</li>
<li>HDF5 applications can read existing HDF4 files and subsequently
modify the file subject to certain constraints depending on
the implementation. This is condition is for the temporary
situation in which the producer has neither been relinked with a
new HDF4 API built on top of the HDF5 API nor recompiled with
the HDF5 API, or the permanent situation of HDF5 consumers
reading archived HDF4 files.</li>
</ul>
<p>There's at least one invarient: new object features introduced
in the HDF5 file format (like 2-d arrays of structs) might be
impossible to "translate" to a format that an old HDF4
application can understand either because the HDF4 file format
or the HDF4 API has no mechanism to describe the object.
<p>What follows is one possible implementation based on how
Condition B was solved in the AIO/PDB world. It also attempts
to satisfy these goals:
<ol type=1>
<li>The main HDF5 library contains as little extra baggage as
possible by either relying on external programs to take care
of compatability issues or by incorporating the logic of such
programs as optional modules in the HDF5 library. Conditions B
and C are separate programs/modules.</li>
<li>No extra baggage not only means the library proper is small,
but also means it can be implemented (rather than migrated
from HDF4 source) from the ground up with minimal regard for
HDF4 thus keeping the logic straight forward.</li>
<li>Compatability issues are handled behind the scenes when
necessary (and possible) but can be carried out explicitly
during things like data migration.</li>
</ol>
<hr>
<h2>Wrappers</h2>
<p>The proposed implementation uses <i>wrappers</i> to handle
compatability issues. A Format-X file is <i>wrapped</i> in a
Format-Y file by creating a Format-Y skeleton that replicates
the Format-X meta data. The Format-Y skeleton points to the raw
data stored in Format-X without moving the raw data. The
restriction is that raw data storage methods in Format-Y is a
superset of raw data storage methods in Format-X (otherwise the
raw data must be copied to Format-Y). We're assuming that meta
data is small wrt the entire file.
<p>The wrapper can be a separate file that has pointers into the
first file or it can be contained within the first file. If
contained in a single file, the file can appear as a Format-Y
file or simultaneously a Format-Y and Format-X file.
<p>The Format-X meta-data can be thought of as the original
wrapper around raw data and Format-Y is a second wrapper around
the same data. The wrappers are independend of one another;
modifying the meta-data in one wrapper causes the other to
become out of date. Modification of raw data doesn't invalidate
either view as long as the meta data that describes its storage
isn't modifed. For instance, an array element can change values
if storage is already allocated for the element, but if storage
isn't allocated then the meta data describing the storage must
change, invalidating all wrappers but one.
<p>It's perfectly legal to modify the meta data of one wrapper
without modifying the meta data in the other wrapper(s). The
illegal part is accessing the raw data through a wrapper which
is out of date.
<p>If raw data is wrapped by more than one internal wrapper
(<i>internal</i> means that the wrapper is in the same file as
the raw data) then access to that file must assume that
unreferenced parts of that file contain meta data for another
wrapper and cannot be reclaimed as free memory.
<hr>
<h2>Implementation of Condition B</h2>
<p>Since this is a temporary situation which can't be
automatically detected by the HDF5 library, we must rely
on the application to notify the HDF5 library whether or not it
must satisfy Condition B. (Even if we don't rely on the
application, at some point someone is going to remove the
Condition B constraint from the library.) So the module that
handles Condition B is conditionally compiled and then enabled
on a per-file basis.
<p>If the application desires to produce an HDF4 file (determined
by arguments to <code>H5Fopen</code>), and the Condition B
module is compiled into the library, then <code>H5Fclose</code>
calls the module to traverse the HDF5 wrapper and generate an
additional internal or external HDF4 wrapper (wrapper specifics
are described below). If Condition B is implemented as a module
then it can benefit from the metadata already cached by the main
library.
<p>An internal HDF4 wrapper would be used if the HDF5 file is
writable and the user doesn't mind that the HDF5 file is
modified. An external wrapper would be used if the file isn't
writable or if the user wants the data file to be primarily HDF5
but a few applications need an HDF4 view of the data.
<p>Modifying through the HDF5 library an HDF5 file that has
internal HDF4 wrapper should invalidate the HDF4 wrapper (and
optionally regenerate it when <code>H5Fclose</code> is
called). The HDF5 library must understand how wrappers work, but
not necessarily anything about the HDF4 file format.
<p>Modifying through the HDF5 library an HDF5 file that has an
external HDF4 wrapper will cause the HDF4 wrapper to become out
of date (but possibly regenerated during <code>H5Fclose</code>).
<b>Note: Perhaps the next release of the HDF4 library should
insure that the HDF4 wrapper file has a more recent modification
time than the raw data file (the HDF5 file) to which it
points(?)</b>
<p>Modifying through the HDF4 library an HDF5 file that has an
internal or external HDF4 wrapper will cause the HDF5 wrapper to
become out of date. However, there is now way for the old HDF4
library to notify the HDF5 wrapper that it's out of date.
Therefore the HDF5 library must be able to detect when the HDF5
wrapper is out of date and be able to fix it. If the HDF4
wrapper is complete then the easy way is to ignore the original
HDF5 wrapper and generate a new one from the HDF4 wrapper. The
other approach is to compare the HDF4 and HDF5 wrappers and
assume that if they differ HDF4 is the right one, if HDF4 omits
data then it was because HDF4 is a partial wrapper (rather than
assume HDF4 deleted the data), and if HDF4 has new data then
copy the new meta data to the HDF5 wrapper. On the other hand,
perhaps we don't need to allow these situations (modifying an
HDF5 file with the old HDF4 library and then accessing it with
the HDF5 library is either disallowed or causes HDF5 objects
that can't be described by HDF4 to be lost).
<p>To convert an HDF5 file to an HDF4 file on demand, one simply
opens the file with the HDF4 flag and closes it. This is also
how AIO implemented backward compatability with PDB in its file
format.
<hr>
<h2>Implementation of Condition C</h2>
<p>This condition must be satisfied for all time because there
will always be archived HDF4 files. If a pure HDF4 file (that
is, one without HDF5 meta data) is opened with an HDF5 library,
the <code>H5Fopen</code> builds an internal or external HDF5
wrapper and then accesses the raw data through that wrapper. If
the HDF5 library modifies the file then the HDF4 wrapper becomes
out of date. However, since the HDF5 library hasn't been
released, we can at least implement it to disable and/or reclaim
the HDF4 wrapper.
<p>If an external and temporary HDF5 wrapper is desired, the
wrapper is created through the cache like all other HDF5 files.
The data appears on disk only if a particular cached datum is
preempted. Instead of calling <code>H5Fclose</code> on the HDF5
wrapper file we call <code>H5Fabort</code> which immediately
releases all file resources without updating the file, and then
we unlink the file from Unix.
<hr>
<h2>What do wrappers look like?</h2>
<p>External wrappers are quite obvious: they contain only things
from the format specs for the wrapper and nothing from the
format specs of the format which they wrap.
<p>An internal HDF4 wrapper is added to an HDF5 file in such a way
that the file appears to be both an HDF4 file and an HDF5
file. HDF4 requires an HDF4 file header at file offset zero. If
a user block is present then we just move the user block down a
bit (and truncate it) and insert the minimum HDF4 signature.
The HDF4 <code>dd</code> list and any other data it needs are
appended to the end of the file and the HDF5 signature uses the
logical file length field to determine the beginning of the
trailing part of the wrapper.
<p>
<center>
<table border width="60%">
<tr>
<td>HDF4 minimal file header. Its main job is to point to
the <code>dd</code> list at the end of the file.</td>
</tr>
<tr>
<td>User-defined block which is truncated by the size of the
HDF4 file header so that the HDF5 boot block file address
doesn't change.</td>
</tr>
<tr>
<td>The HDF5 boot block and data, unmodified by adding the
HDF4 wrapper.</td>
</tr>
<tr>
<td>The main part of the HDF4 wrapper. The <code>dd</code>
list will have entries for all parts of the file so
hdpack(?) doesn't (re)move anything.</td>
</tr>
</table>
</center>
<p>When such a file is opened by the HDF5 library for
modification it shifts the user block back down to address zero
and fills with zeros, then truncates the file at the end of the
HDF5 data or adds the trailing HDF4 wrapper to the free
list. This prevents HDF4 applications from reading the file with
an out of date wrapper.
<p>If there is no user block then we have a problem. The HDF5
boot block must be moved to make room for the HDF4 file header.
But moving just the boot block causes problems because all file
addresses stored in the file are relative to the boot block
address. The only option is to shift the entire file contents
by 512 bytes to open up a user block (too bad we don't have
hooks into the Unix i-node stuff so we could shift the entire
file contents by the size of a file system page without ever
performing I/O on the file :-)
<p>Is it possible to place an HDF5 wrapper in an HDF4 file? I
don't know enough about the HDF4 format, but I would suspect it
might be possible to open a hole at file address 512 (and
possibly before) by moving some things to the end of the file
to make room for the HDF5 signature. The remainder of the HDF5
wrapper goes at the end of the file and entries are added to the
HDF4 <code>dd</code> list to mark the location(s) of the HDF5
wrapper.
<hr>
<h2>Other Thoughts</h2>
<p>Conversion programs that copy an entire HDF4 file to a separate,
self-contained HDF5 file and vice versa might be useful.
<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Oct 3 11:52:31 EST 1997 -->
<!-- hhmts start -->
Last modified: Wed Oct 8 12:34:42 EST 1997
<!-- hhmts end -->
</body>
</html>