<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Backward/Forward Compatibility</title>
</head>

<body>
<h1>Backward/Forward Compatibility</h1>

<p>The HDF5 development must proceed in such a manner as to
satisfy the following conditions:

<ol type=A>
<li>HDF5 applications can produce data that HDF5
applications can read and write and HDF4 applications can produce
data that HDF4 applications can read and write. The situation
that demands this condition is obvious.</li>

<li>HDF5 applications are able to produce data that HDF4 applications
can read and HDF4 applications can subsequently modify the
file subject to certain constraints depending on the
implementation. This condition is for the temporary
situation where a consumer has neither been relinked with a new
HDF4 API built on top of the HDF5 API nor recompiled with the
HDF5 API.</li>

<li>HDF5 applications can read existing HDF4 files and subsequently
modify the file subject to certain constraints depending on
the implementation. This condition is for the temporary
situation in which the producer has neither been relinked with a
new HDF4 API built on top of the HDF5 API nor recompiled with
the HDF5 API, or the permanent situation of HDF5 consumers
reading archived HDF4 files.</li>
</ol>

<p>There's at least one invariant: new object features introduced
in the HDF5 file format (like 2-d arrays of structs) might be
impossible to "translate" to a format that an old HDF4
application can understand, because either the HDF4 file format
or the HDF4 API has no mechanism to describe the object.

<p>What follows is one possible implementation based on how
Condition B was solved in the AIO/PDB world. It also attempts
to satisfy these goals:

<ol type=1>
<li>The main HDF5 library contains as little extra baggage as
possible by either relying on external programs to take care
of compatibility issues or by incorporating the logic of such
programs as optional modules in the HDF5 library. Conditions B
and C are handled by separate programs/modules.</li>

<li>No extra baggage not only means the library proper is small,
but also means it can be implemented (rather than migrated
from HDF4 source) from the ground up with minimal regard for
HDF4, thus keeping the logic straightforward.</li>

<li>Compatibility issues are handled behind the scenes when
necessary (and possible) but can be carried out explicitly
during things like data migration.</li>
</ol>

<hr>
<h2>Wrappers</h2>

<p>The proposed implementation uses <i>wrappers</i> to handle
compatibility issues. A Format-X file is <i>wrapped</i> in a
Format-Y file by creating a Format-Y skeleton that replicates
the Format-X meta data. The Format-Y skeleton points to the raw
data stored in Format-X without moving the raw data. The
restriction is that the raw data storage methods in Format-Y must
be a superset of the raw data storage methods in Format-X
(otherwise the raw data must be copied to Format-Y). We're
assuming that meta data is small with respect to the entire file.

<p>The wrapper can be a separate file that has pointers into the
first file or it can be contained within the first file. If
contained in a single file, the file can appear as a Format-Y
file or simultaneously a Format-Y and Format-X file.

<p>The Format-X meta data can be thought of as the original
wrapper around raw data and Format-Y is a second wrapper around
the same data. The wrappers are independent of one another;
modifying the meta data in one wrapper causes the other to
become out of date. Modification of raw data doesn't invalidate
either view as long as the meta data that describes its storage
isn't modified. For instance, an array element can change values
if storage is already allocated for the element, but if storage
isn't allocated then the meta data describing the storage must
change, invalidating all wrappers but one.

<p>It's perfectly legal to modify the meta data of one wrapper
without modifying the meta data in the other wrapper(s). The
illegal part is accessing the raw data through a wrapper which
is out of date.
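<p>The staleness rules above can be sketched as a small Python model.
This is only an illustration: the names <code>RawStore</code>,
<code>Wrapper</code> and the layout version counter are hypothetical
and are not part of any HDF4 or HDF5 API. The sketch assumes that each
wrapper remembers which version of the storage layout it describes, so
that a write into already allocated storage leaves every wrapper valid
while a write that allocates new storage leaves only the writing
wrapper up to date.

```python
class StaleWrapperError(RuntimeError):
    """Raised when raw data is accessed through an out-of-date wrapper."""


class RawStore:
    """Raw data shared by every wrapper (hypothetical model)."""

    def __init__(self):
        self.cells = {}          # allocated storage: element index -> value
        self.layout_version = 0  # bumped whenever storage meta data changes


class Wrapper:
    """One meta data view (for example HDF4 or HDF5) over a shared RawStore."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.seen_version = store.layout_version

    def is_current(self):
        return self.seen_version == self.store.layout_version

    def _require_current(self):
        if not self.is_current():
            raise StaleWrapperError(f"{self.name} wrapper is out of date")

    def read(self, index):
        self._require_current()
        return self.store.cells[index]

    def write(self, index, value):
        self._require_current()
        if index not in self.store.cells:
            # New storage: this wrapper's meta data changes, so every
            # other wrapper over the same store becomes out of date.
            self.store.layout_version += 1
            self.seen_version = self.store.layout_version
        self.store.cells[index] = value

    def regenerate(self):
        """Rebuild this wrapper's meta data from the current store layout."""
        self.seen_version = self.store.layout_version
```

<p>In this model, creating two wrappers over one store and writing a
new element through the first makes <code>read</code> on the second
raise <code>StaleWrapperError</code> until <code>regenerate</code> is
called on it; overwriting an existing element invalidates nothing.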
<p>If raw data is wrapped by more than one internal wrapper
(<i>internal</i> means that the wrapper is in the same file as
the raw data) then access to that file must assume that
unreferenced parts of that file contain meta data for another
wrapper and cannot be reclaimed as free space.

<hr>
<h2>Implementation of Condition B</h2>

<p>Since this is a temporary situation which can't be
automatically detected by the HDF5 library, we must rely
on the application to notify the HDF5 library whether or not it
must satisfy Condition B. (Even if we don't rely on the
application, at some point someone is going to remove the
Condition B constraint from the library.) So the module that
handles Condition B is conditionally compiled and then enabled
on a per-file basis.

<p>If the application desires to produce an HDF4 file (determined
by arguments to <code>H5Fopen</code>), and the Condition B
module is compiled into the library, then <code>H5Fclose</code>
calls the module to traverse the HDF5 wrapper and generate an
additional internal or external HDF4 wrapper (wrapper specifics
are described below). If Condition B is implemented as a module
then it can benefit from the metadata already cached by the main
library.

<p>An internal HDF4 wrapper would be used if the HDF5 file is
writable and the user doesn't mind that the HDF5 file is
modified. An external wrapper would be used if the file isn't
writable or if the user wants the data file to be primarily HDF5
but a few applications need an HDF4 view of the data.

<p>Modifying through the HDF5 library an HDF5 file that has an
internal HDF4 wrapper should invalidate the HDF4 wrapper (and
optionally regenerate it when <code>H5Fclose</code> is
called). The HDF5 library must understand how wrappers work, but
not necessarily anything about the HDF4 file format.

<p>Modifying through the HDF5 library an HDF5 file that has an
external HDF4 wrapper will cause the HDF4 wrapper to become out
of date (but it may be regenerated during <code>H5Fclose</code>).
<b>Note: Perhaps the next release of the HDF4 library should
ensure that the HDF4 wrapper file has a more recent modification
time than the raw data file (the HDF5 file) to which it
points(?)</b>
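<p>The modification-time idea in the note could be checked along the
following lines. This is a minimal sketch in Python, assuming that
whoever writes the wrapper always leaves it with a strictly newer
modification time than the data file, as the note proposes. The
function name and the file names are made up for the example, and a
timestamp comparison is only a heuristic, not a guarantee.

```python
import os
import tempfile


def external_wrapper_is_stale(wrapper_path, data_path):
    """Return True unless the wrapper is strictly newer than the data file."""
    wrapper_mtime = os.stat(wrapper_path).st_mtime_ns
    data_mtime = os.stat(data_path).st_mtime_ns
    # Equal timestamps are treated as stale, since the convention asks
    # for the wrapper to be *more recent* than the raw data file.
    return data_mtime >= wrapper_mtime


def demo():
    """Build two placeholder files and check them before and after a change."""
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data.h5")      # hypothetical file names
        wrapper = os.path.join(tmp, "data.hdf")
        for path in (data, wrapper):
            with open(path, "wb") as f:
                f.write(b"placeholder")
        # Use explicit timestamps (in seconds) rather than relying on
        # the resolution of the file system clock.
        os.utime(data, (1000, 1000))
        os.utime(wrapper, (2000, 2000))
        before = external_wrapper_is_stale(wrapper, data)
        os.utime(data, (3000, 3000))  # simulate a later change to the raw data
        after = external_wrapper_is_stale(wrapper, data)
        return before, after
```

<p>Treating equal timestamps as stale is a deliberately conservative
choice; it costs an unnecessary regeneration at worst.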
<p>Modifying through the HDF4 library an HDF5 file that has an
internal or external HDF4 wrapper will cause the HDF5 wrapper to
become out of date. However, there is no way for the old HDF4
library to notify the HDF5 wrapper that it's out of date.
Therefore the HDF5 library must be able to detect when the HDF5
wrapper is out of date and be able to fix it. If the HDF4
wrapper is complete then the easy way is to ignore the original
HDF5 wrapper and generate a new one from the HDF4 wrapper. The
other approach is to compare the HDF4 and HDF5 wrappers and
apply three rules: if they differ, the HDF4 wrapper is assumed
to be the right one; if the HDF4 wrapper omits data, that is
assumed to be because it is a partial wrapper (rather than
because HDF4 deleted the data); and if the HDF4 wrapper has new
data, the new meta data is copied to the HDF5 wrapper. On the
other hand, perhaps we don't need to allow these situations
(modifying an HDF5 file with the old HDF4 library and then
accessing it with the HDF5 library is either disallowed or
causes HDF5 objects that can't be described by HDF4 to be lost).
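<p>The comparison approach can be sketched as a merge of two tables.
The following Python fragment is an illustration only: it assumes each
wrapper can be summarized as a mapping from object name to a meta data
description, which is a simplification invented for this example, and
the function name <code>reconcile</code> is hypothetical.

```python
def reconcile(hdf5_meta, hdf4_meta):
    """Return updated HDF5 meta data after an HDF4 library modified the file.

    Both arguments map object names to meta data descriptions. Neither
    input is modified.
    """
    merged = dict(hdf5_meta)  # start from the (possibly stale) HDF5 view
    for name, meta in hdf4_meta.items():
        # Entries that differ, and entries that are new, are taken from
        # the HDF4 wrapper, which is assumed to be the right one.
        merged[name] = meta
    # Objects missing from hdf4_meta are kept: the HDF4 wrapper is
    # assumed to be partial rather than to have deleted them.
    return merged


hdf5_view = {
    "temperature": {"dims": (10,)},
    "records": {"dims": (4, 4), "type": "struct"},  # not describable in HDF4
}
hdf4_view = {
    "temperature": {"dims": (20,)},  # extended through the HDF4 library
    "pressure": {"dims": (5,)},      # created through the HDF4 library
}
updated = reconcile(hdf5_view, hdf4_view)
```

<p>Under these rules a deletion made through the HDF4 library can never
be detected, which is one reason the paragraph above considers
disallowing the situation altogether.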
<p>To convert an HDF5 file to an HDF4 file on demand, one simply
opens the file with the HDF4 flag and closes it. This is also
how AIO implemented backward compatibility with PDB in its file
format.

<hr>
<h2>Implementation of Condition C</h2>

<p>This condition must be satisfied for all time because there
will always be archived HDF4 files. If a pure HDF4 file (that
is, one without HDF5 meta data) is opened with an HDF5 library,
then <code>H5Fopen</code> builds an internal or external HDF5
wrapper and then accesses the raw data through that wrapper. If
the HDF5 library modifies the file then the HDF4 wrapper becomes
out of date. However, since the HDF5 library hasn't been
released, we can at least implement it to disable and/or reclaim
the HDF4 wrapper.

<p>If an external and temporary HDF5 wrapper is desired, the
wrapper is created through the cache like all other HDF5 files.
The data appears on disk only if a particular cached datum is
preempted. Instead of calling <code>H5Fclose</code> on the HDF5
wrapper file we call <code>H5Fabort</code> which immediately
releases all file resources without updating the file, and then
we unlink the file from Unix.

<hr>
<h2>What do wrappers look like?</h2>

<p>External wrappers are quite obvious: they contain only things
from the format specs for the wrapper and nothing from the
format specs of the format which they wrap.

<p>An internal HDF4 wrapper is added to an HDF5 file in such a way
that the file appears to be both an HDF4 file and an HDF5
file. HDF4 requires an HDF4 file header at file offset zero. If
a user block is present then we just move the user block down a
bit (and truncate it) and insert the minimum HDF4 signature.
The HDF4 <code>dd</code> list and any other data it needs are
appended to the end of the file and the HDF5 signature uses the
logical file length field to determine the beginning of the
trailing part of the wrapper.

<p>
<center>
<table border width="60%">
<tr>
<td>HDF4 minimal file header. Its main job is to point to
the <code>dd</code> list at the end of the file.</td>
</tr>
<tr>
<td>User-defined block which is truncated by the size of the
HDF4 file header so that the HDF5 boot block file address
doesn't change.</td>
</tr>
<tr>
<td>The HDF5 boot block and data, unmodified by adding the
HDF4 wrapper.</td>
</tr>
<tr>
<td>The main part of the HDF4 wrapper. The <code>dd</code>
list will have entries for all parts of the file so
hdpack(?) doesn't (re)move anything.</td>
</tr>
</table>
</center>
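<p>The layout in the table can be expressed as a list of byte ranges.
The Python sketch below is illustrative only: the function and field
names are hypothetical, the sizes passed in are placeholders rather
than values taken from the HDF4 or HDF5 format specifications, and it
assumes that <code>hdf5_size</code> counts bytes from the boot block
to the logical end of the HDF5 data.

```python
from collections import namedtuple

Region = namedtuple("Region", "name offset length")


def internal_wrapper_layout(user_block_size, hdf4_header_size, hdf5_size, dd_size):
    """Describe a file that carries HDF5 data plus an internal HDF4 wrapper.

    user_block_size  -- size of the user block before it is truncated
    hdf4_header_size -- size of the minimal HDF4 header (assumed value)
    hdf5_size        -- bytes from the HDF5 boot block to the logical end
    dd_size          -- size of the trailing HDF4 dd list and related data
    """
    if hdf4_header_size > user_block_size:
        raise ValueError("user block is too small to hold the HDF4 header")
    boot = user_block_size  # the boot block address does not change
    return [
        Region("HDF4 header", 0, hdf4_header_size),
        Region("user block (truncated)", hdf4_header_size,
               user_block_size - hdf4_header_size),
        Region("HDF5 boot block and data", boot, hdf5_size),
        Region("HDF4 dd list", boot + hdf5_size, dd_size),
    ]


layout = internal_wrapper_layout(512, 32, 4096, 200)
```

<p>The point of the check at the top is the case discussed next: with
no user block there is nowhere to put the HDF4 header without moving
the boot block.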
<p>When such a file is opened by the HDF5 library for
modification, the library shifts the user block back down to
address zero and fills the vacated bytes with zeros, then either
truncates the file at the end of the HDF5 data or adds the
trailing HDF4 wrapper to the free list. This prevents HDF4
applications from reading the file with an out-of-date wrapper.

<p>If there is no user block then we have a problem. The HDF5
boot block must be moved to make room for the HDF4 file header.
But moving just the boot block causes problems because all file
addresses stored in the file are relative to the boot block
address. The only option is to shift the entire file contents
by 512 bytes to open up a user block (too bad we don't have
hooks into the Unix i-node stuff so we could shift the entire
file contents by the size of a file system page without ever
performing I/O on the file :-)
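<p>Shifting the entire file contents can be done in place by copying
from the end of the file toward the beginning. The following Python
sketch shows the idea on an ordinary file; the function name
<code>open_user_block</code> and the example file name are invented
for the illustration, and nothing here parses or updates HDF5
structures.

```python
import os
import tempfile


def open_user_block(path, shift=512, chunk=4096):
    """Move every byte of `path` up by `shift` bytes and zero the gap.

    Returns the new file size.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        pos = size
        # Work backward so that no byte is overwritten before it is read.
        while pos > 0:
            n = min(chunk, pos)
            pos -= n
            f.seek(pos)
            data = f.read(n)
            f.seek(pos + shift)
            f.write(data)
        f.seek(0)
        f.write(b"\0" * shift)  # the newly opened user block
    return size + shift


def demo():
    """Shift a stand-in file and confirm the contents moved intact."""
    original = b"BOOT" + bytes(range(256)) * 40  # stand-in for an HDF5 file
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.h5")   # hypothetical file name
        with open(path, "wb") as f:
            f.write(original)
        new_size = open_user_block(path, shift=512, chunk=1000)
        with open(path, "rb") as f:
            shifted = f.read()
    return new_size == len(shifted) and shifted == b"\0" * 512 + original
```

<p>Because stored addresses are relative to the boot block, moving the
whole file as one unit leaves them valid; the cost is that every byte
of the file is read and written once.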
<p>Is it possible to place an HDF5 wrapper in an HDF4 file? I
don't know enough about the HDF4 format, but I would suspect it
might be possible to open a hole at file address 512 (and
possibly before) by moving some things to the end of the file
to make room for the HDF5 signature. The remainder of the HDF5
wrapper goes at the end of the file and entries are added to the
HDF4 <code>dd</code> list to mark the location(s) of the HDF5
wrapper.

<hr>
<h2>Other Thoughts</h2>

<p>Conversion programs that copy an entire HDF4 file to a separate,
self-contained HDF5 file and vice versa might be useful.

<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Oct 3 11:52:31 EST 1997 -->
<!-- hhmts start -->
Last modified: Wed Oct 8 12:34:42 EST 1997
<!-- hhmts end -->
</body>
</html>