<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>External Files in HDF5</title>
  </head>

  <body>
    <center><h1>External Files in HDF5</h1></center>

<h3>Overview of Layers</h3>

<p>This table shows some of the layers of HDF5. Each layer calls
  functions at the same or lower layers and never calls functions at
  higher layers. An object identifier (OID) takes various forms
  at the various layers: at layer 0 an OID is an absolute physical
  file address; at layers 1 and 2 it's an absolute virtual file
  address. At layers 3 through 6 it's a relative address, and at
  layers 7 and above it's an object handle.

<p><center>
  <table border cellpadding=4 width="60%">
    <tr align=center>
      <td>Layer-7</td>
      <td>Groups</td>
      <td>Datasets</td>
    </tr>
    <tr align=center>
      <td>Layer-6</td>
      <td>Indirect Storage</td>
      <td>Symbol Tables</td>
    </tr>
    <tr align=center>
      <td>Layer-5</td>
      <td>B-trees</td>
      <td>Object Hdrs</td>
      <td>Heaps</td>
    </tr>
    <tr align=center>
      <td>Layer-4</td>
      <td>Caching</td>
    </tr>
    <tr align=center>
      <td>Layer-3</td>
      <td>H5F chunk I/O</td>
    </tr>
    <tr align=center>
      <td>Layer-2</td>
      <td>H5F low</td>
    </tr>
    <tr align=center>
      <td>Layer-1</td>
      <td>File Family</td>
      <td>Split Meta/Raw</td>
    </tr>
    <tr align=center>
      <td>Layer-0</td>
      <td>Section-2 I/O</td>
      <td>Standard I/O</td>
      <td>Malloc/Free</td>
    </tr>
  </table>
</center>

<h3>Single Address Space</h3>

<p>The simplest form of HDF5 file is a single file containing only
  HDF5 data. The file begins with the super block, which is
  followed until the end of the file by HDF5 data. The next most
  complicated file allows non-HDF5 data (user-defined data or
  internal wrappers) to appear before the super block and after the
  end of the HDF5 data. The HDF5 data is treated as a single
  linear address space in both cases.

<p>The next level of complexity comes when non-HDF5 data is
  interspersed with the HDF5 data. We handle that by including
  the interspersed non-HDF5 data in the HDF5 address space and
  simply not referencing it (eventually we might add those
  addresses to a "do-not-disturb" list using the same mechanism as
  the HDF5 free list, but it's not absolutely necessary). This is
  implemented except for the "do-not-disturb" list.

<p>The most complicated single address space HDF5 file is one where
  the address space is split among multiple physical
  files. For instance, a &gt;2GB file can be split into smaller
  chunks and transferred to a 32-bit machine, then accessed as a
  single logical HDF5 file. The library already supports &gt;32-bit
  addresses, so at layer 1 we split a 64-bit address into a 32-bit
  file number and a 32-bit offset (the 64 and 32 are
  arbitrary). The rest of the library still operates with a linear
  address space.
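
<p>A minimal sketch of the layer 1 split, assuming the 32/32 division
  of a 64-bit address. The names <code>family_loc_t</code>,
  <code>family_split</code>, <code>family_join</code>, and
  <code>OFFSET_BITS</code> are hypothetical and only illustrate the idea;
  they are not taken from the library:

```c
#include <stdint.h>

/* Hypothetical constant: number of low-order bits used for the offset.
 * The text says the 64/32 choice is arbitrary; 32 is assumed here. */
#define OFFSET_BITS 32u

/* Hypothetical result type: where a linear address lands in the family. */
typedef struct {
    uint32_t member;  /* which physical file of the family */
    uint32_t offset;  /* byte offset inside that file */
} family_loc_t;

/* Split a linear 64-bit address into (file number, offset). */
family_loc_t family_split(uint64_t addr)
{
    family_loc_t loc;
    loc.member = (uint32_t)(addr >> OFFSET_BITS);
    loc.offset = (uint32_t)(addr & ((UINT64_C(1) << OFFSET_BITS) - 1));
    return loc;
}

/* Inverse mapping: rebuild the linear address seen by the upper layers. */
uint64_t family_join(family_loc_t loc)
{
    return ((uint64_t)loc.member << OFFSET_BITS) | loc.offset;
}
```

<p>Only layer 1 would ever see the pair; everything above it keeps
  passing the joined 64-bit value around.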

<p>Another variation might be a family of two files where all the
  meta data is stored in one file and all the raw data is stored
  in another file, which allows the HDF5 wrapper to be easily replaced
  with some other wrapper.

<p>The <code>H5Fcreate</code> and <code>H5Fopen</code> functions
  would need to be modified to pass file-type info down to layer 2
  so the correct drivers can be called and parameters passed to
  the drivers to initialize them.

<h4>Implementation</h4>

<p>I've implemented fixed-size family members. The entire HDF5
  file is partitioned into members where each member is the same
  size. The family scheme is used if one passes a name to
  <code>H5F_open</code> (which is called by <code>H5Fopen</code>
  and <code>H5Fcreate</code>) that contains a
  <code>printf(3c)</code>-style integer format specifier.
  Currently, the default low-level file driver is used for all
  family members (<code>H5F_LOW_DFLT</code>, usually set to be Section 2 I/O or
  Section 3 stdio), but we'll probably eventually want to pass
  that as a parameter of the file access property list, which
  hasn't been implemented yet. When creating a family, a default
  family member size is used (defined at the top of <code>H5Ffamily.c</code>,
  currently 64MB) but that also should be settable in the file
  access property list. When opening an existing family, the size
  of the first member is used to determine the member size
  (flushing/closing a family ensures that the first member is the
  correct size) but the other family members don't have to be that
  large (the local address space, however, is logically the same
  size for all members).
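
<p>As a rough illustration of the fixed-size scheme, the sketch below
  builds a member file name from a <code>printf</code>-style template and
  maps a linear address to a member number and local offset. The function
  names and the example template are hypothetical, and the template is
  assumed to be trusted and to contain exactly one integer specifier:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: expand a family name template such as
 * "data-%05d.h5" for one member. Returns 0 on success and -1 if the
 * result did not fit in buf. The template must be trusted input with
 * a single integer conversion. */
int family_member_name(char *buf, size_t bufsz, const char *tmpl, int member)
{
    int n = snprintf(buf, bufsz, tmpl, member);
    return (n >= 0 && (size_t)n < bufsz) ? 0 : -1;
}

/* Hypothetical helper: with fixed-size members, the member number is
 * the quotient and the local offset is the remainder. */
void family_locate(uint64_t addr, uint64_t member_size,
                   int *member, uint64_t *offset)
{
    *member = (int)(addr / member_size);
    *offset = addr % member_size;
}
```

<p>For example, with a member size of 64MB (taken here as 64 * 1024 * 1024
  bytes, which is an assumption), an address seven bytes past the end of
  the second member maps to member 2 at offset 7.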

<p>I haven't implemented a split meta/raw family yet but am rather
  curious to see how it would perform. I was planning to use the
  <code>.h5</code> extension for the meta data file and <code>.raw</code> for the raw
  data file. The high-order bit in the address would determine
  whether the address refers to meta data or raw data. If the user
  passes a name that ends with <code>.raw</code> to <code>H5F_open</code>
  then we'll choose the split family and use the default low-level
  driver for each of the two family members. Eventually we'll
  want to pass these kinds of things through the file access
  property list instead of relying on naming conventions.
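
<p>Since the split family isn't implemented, the following is only a
  sketch of how the high-order bit and the naming convention might be
  tested. It assumes 64-bit addresses and that a set bit means raw data
  (the text doesn't say which way round it goes); all names are
  hypothetical:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumption: bit 63 set means the address refers to raw data. */
#define RAW_BIT (UINT64_C(1) << 63)

/* Nonzero if the address belongs to the raw data file. */
int addr_is_raw(uint64_t addr)
{
    return (addr & RAW_BIT) != 0;
}

/* Offset within whichever of the two files the address selects. */
uint64_t addr_local(uint64_t addr)
{
    return addr & ~RAW_BIT;
}

/* Nonzero if the file name ends with ".raw", the convention that
 * would select the split family. */
int name_selects_split(const char *name)
{
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, ".raw") == 0;
}
```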

<h3>External Raw Data</h3>

<p>We also need the ability to point to raw data that isn't in the
  HDF5 linear address space. For instance, a dataset might be
  striped across several raw data files.

<p>Fortunately, the only two packages that need to be aware of
  this are the packages for reading/writing contiguous raw data
  and discontiguous raw data. Since contiguous raw data is a
  special case, I'll discuss how to implement external raw data in
  the discontiguous case.

<p>Discontiguous data is stored as a B-tree whose keys are the
  chunk indices and whose leaf nodes point to the raw data by
  storing a file address. So what we need is some way to name the
  external files, and a way to efficiently store the external file
  name for each chunk.

<p>I propose adding to the object header an <em>External File
  List</em> message that is a 1-origin array of file names.
  Then, in the B-tree, each key has an index into the External
  File List (or zero for the HDF5 file) for the file where the
  chunk can be found. The external file index is only used at
  the leaf nodes to get to the raw data (the entire B-tree is in
  the HDF5 file) but because of the way keys are copied among
  the B-tree nodes, it's much easier to store the index with
  every key.
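
<p>The lookup this implies can be sketched as follows. The types
  <code>efl_t</code> and <code>chunk_key_t</code> and the function
  <code>efl_file_for_key</code> are hypothetical stand-ins for the
  proposed message and B-tree key, not existing library structures:

```c
#include <stddef.h>

/* Hypothetical in-memory form of the External File List message.
 * The list is 1-origin, so names[0] holds entry number 1. */
typedef struct {
    size_t       nfiles;  /* number of external files */
    const char **names;   /* file names, entry i stored at names[i-1] */
} efl_t;

/* Hypothetical B-tree key fragment: every key carries the file index. */
typedef struct {
    unsigned           file_index;  /* 0 = the HDF5 file, else EFL entry */
    unsigned long long addr;        /* address of the chunk in that file */
} chunk_key_t;

/* Return the name of the file holding the chunk, or NULL if the key's
 * index is past the end of the list. */
const char *efl_file_for_key(const efl_t *efl, const chunk_key_t *key,
                             const char *hdf5_name)
{
    if (key->file_index == 0)
        return hdf5_name;                    /* chunk lives in the HDF5 file */
    if (key->file_index > efl->nfiles)
        return NULL;                         /* corrupt or stale index */
    return efl->names[key->file_index - 1];  /* convert 1-origin to 0-origin */
}
```

<p>Reserving zero for the HDF5 file means that keys written without any
  external file information would still resolve to the main file.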

<h3>Multiple HDF5 Files</h3>

<p>One might also want to combine two or more HDF5 files in a
  manner similar to mounting file systems in Unix. That is, the
  group structure and meta data from a second file appear as though
  they exist in the first file. One opens File-A, and then
  <em>mounts</em> File-B at some point in File-A, the <em>mount
  point</em>, so that traversing into the mount point actually
  causes one to enter the root object of File-B. File-A and
  File-B are each complete HDF5 files and can be accessed
  individually without mounting them.

<p>We need a couple of additional pieces of machinery to make this
  work. First, an <code>haddr_t</code> type (a file address) doesn't contain
  any info about which HDF5 file's address space the address
  belongs to. But since <code>haddr_t</code> is an opaque type except at
  layers 2 and below, it should be quite easy to add a pointer to
  the HDF5 file. This would also remove the <code>H5F_t</code> argument from
  most of the low-level functions since it would be part of the
  OID.

<p>The other thing we need is a table of mount points and some
  functions that understand them. We would add the following
  table to each <code>H5F_t</code> struct:

<p><code><pre>
struct H5F_mount_t {
   H5F_t *parent;         /* Parent HDF5 file if any */
   struct {
      H5F_t *f;           /* File which is mounted */
      haddr_t where;      /* Address of mount point */
   } *mount;              /* Array sorted by mount point */
   intn nmounts;          /* Number of mounted files */
   intn alloc;            /* Size of mount table */
};
</pre></code>

<p>The <code>H5Fmount</code> function takes the ID of an open
  file or group, the name of a to-be-mounted file, the name of the mount
  point, and a file access property list (like <code>H5Fopen</code>).
  It opens the new file and adds a record to the parent's mount
  table. The <code>H5Funmount</code> function takes the parent
  file or group ID and the name of the mount point and disassociates
  the mounted file from the mount point. It does not close the
  mounted file. The <code>H5Fclose</code>
  function closes/unmounts files recursively.

<p>The <code>H5G_iname</code> function, which translates a name to
  a file address (<code>haddr_t</code>), looks at the mount table
  at each step in the translation and switches files where
  appropriate. All name-to-address translations occur through
  this function.
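
<p>The per-step check could look something like the sketch below, which
  does a binary search of a mount table sorted by mount point address.
  The types <code>addr_t</code>, <code>file_t</code>, and
  <code>mount_entry_t</code> are simplified stand-ins for
  <code>haddr_t</code>, <code>H5F_t</code>, and the table entries, and
  <code>mount_lookup</code> is a hypothetical helper, not a library
  function:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t addr_t;       /* stand-in for haddr_t */
typedef struct file_t file_t;  /* stand-in for H5F_t */

typedef struct {
    file_t *f;      /* file which is mounted */
    addr_t  where;  /* address of the mount point in the parent */
} mount_entry_t;

struct file_t {
    const char    *name;     /* only for illustration */
    file_t        *parent;   /* parent file, if any */
    mount_entry_t *mount;    /* array sorted by 'where' */
    int            nmounts;  /* number of mounted files */
};

/* If 'where' is a mount point in 'parent', return the mounted file so
 * that name translation can continue at its root object. Otherwise
 * return NULL and translation stays in the current file. */
file_t *mount_lookup(const file_t *parent, addr_t where)
{
    int lo = 0;
    int hi = parent->nmounts - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (parent->mount[mid].where == where)
            return parent->mount[mid].f;
        if (parent->mount[mid].where < where)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;
}
```

<p>Keeping the table sorted makes the check logarithmic in the number of
  mounts, which matters because it runs once per name component.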

<h3>How Long?</h3>

<p>I'm expecting to be able to implement the two new flavors of
  single linear address space in about two days. It took two hours
  to implement the malloc/free file driver at layer 0 and I
  don't expect this to be much more work.

<p>I'm expecting three days to implement the external raw data for
  discontiguous arrays. Adding the file index to the B-tree is
  quite trivial; adding the external file list message shouldn't
  be too hard since the object header message class from which this
  message derives is fully implemented; and changing
  <code>H5F_istore_read</code> should be trivial. Most of the
  time will be spent designing a way to cache Unix file
  descriptors efficiently since the total number of open files
  allowed per process could be much smaller than the total number
  of HDF5 files and external raw data files.

<p>I'm expecting four days to implement being able to mount one
  HDF5 file on another. I was originally planning a lot more, but
  making <code>haddr_t</code> opaque turned out to be much easier
  than I planned (I did it last Friday). Most of the work will
  probably be removing the redundant <code>H5F_t</code> arguments from lots of
  functions.

<h3>Conclusion</h3>

<p>The external raw data could be implemented as a single linear
  address space, but doing so would require one to allocate large
  enough file addresses throughout the file (&gt;32 bits) before the
  file was created. It would also make mixing an HDF5 file family with
  external raw data, or with an external HDF5 wrapper around an HDF4 file,
  a more difficult process. So I consider the implementation of
  external raw data files as a single HDF5 linear address space a
  kludge.

<p>The ability to mount one HDF5 file on another might not be a
  very important feature, especially since each HDF5 file must be a
  complete file by itself. It's not possible to stripe an array
  over multiple HDF5 files because the B-tree wouldn't be complete
  in any one file, so the only choice is to stripe the array
  across multiple raw data files and store the B-tree in the HDF5
  file. On the other hand, it might be useful if one file
  contains some public data which can be mounted by other files
  (e.g., a mesh topology shared among collaborators and mounted by
  files that contain other fields defined on the mesh). Of course
  the applications can open the two files separately, but it might
  be more portable if we support it in the library.

<p>So we're looking at about two weeks to implement all three
  versions. I didn't get a chance to do any of them in AIO
  although we had long-term plans for the first two with a
  possibility of the third. They'll be much easier to implement in
  HDF5 than AIO since I've been keeping these in mind from the
  start.

<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Sat Nov 8 18:08:52 EST 1997 -->
<!-- hhmts start -->
Last modified: Tue Sep 8 14:43:32 EDT 1998
<!-- hhmts end -->
</body>
</html>