hdf5/doc/html/storage.html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Raw Data Storage in HDF5</title>
  </head>

  <body>
    <h1>Raw Data Storage in HDF5</h1>

    <p>This document describes the various ways that raw data is
      stored in an HDF5 file and the object header messages which
      contain the parameters for the storage.

    <p>Raw data storage has three components: the mapping from some
      logical multi-dimensional element space to the linear address
      space of a file, compression of the raw data on disk, and
      striping of raw data across multiple files.  These components
      are orthogonal.

    <p>Some goals of the storage mechanism are to be able to
      efficently store data which is:

    <dl>
      <dt>Small
      <dd>Small pieces of raw data can be treated as meta data and
	stored in the object header.  This will be achieved by storing
	the raw data in the object header with message 0x0006.
	Compression and striping are not supported in this case.

      <dt>Complete Large
      <dd>The library should be able to store large arrays
	contiguously in the file provided the user knows the final
	array size a priori.  The array can then be read/written in a
	single I/O request.  This is accomplished by describing the
	storage with object header message 0x0005. Compression and
	striping are not supported in this case.

      <dt>Sparse Large
      <dd>A large sparse raw data array should be stored in a manner
	that is space-efficient but one in which any element can still
	be accessed in a reasonable amount of time. Implementation
	details are below.

      <dt>Dynamic Size
      <dd>One often doesn't have prior knowledge of the size of an
	array. It would be nice to allow arrays to grow dynamically in
	any dimension. It might also be nice to allow the array to
	grow in the negative dimension directions if convenient to
	implement. Implementation details are below.

      <dt>Subslab Access
      <dd>Some multi-dimensional arrays are almost always accessed by
	subslabs. For instance, a 2-d array of pixels might always be
	accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k
	index values.  We should be able to store the array in such a
	way that striding though the entire array is not necessary.
	Subslab access might also be useful with compression
	algorithms where each storage slab can be compressed
	independently of the others. Implementation details are below.

      <dt>Compressed
      <dd>Various compression algorithms can be applied to the entire
	array. We're not planning to support separate algorithms (or a
	single algorithm with separate parameters) for each chunk
	although it would be possible to implement that in a manner
	similar to the way striping across files is
	implemented.

      <dt>Striped Across Files
      <dd>The array access functions should support arrays stored
	discontiguously across a set of files.
    </dl>

    <h1>Implementation of Indexed Storage</h1>

    <p>The Sparse Large, Dynamic Size, and Subslab Access methods
      share so much code that they can be described with a single
      message.  The new Indexed Storage Message (<code>0x0008</code>)
      will replace the old Chunked Object (<code>0x0009</code>) and
      Sparse Object (<code>0x000A</code>) Messages.

    <p>
      <center>
	<table border cellpadding=4 width="60%">
	  <caption align=bottom>
	    <b>The Format of the Indexed Storage Message</b>
	  </caption>
	  <tr align=center>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	  </tr>

	  <tr align=center>
	    <td colspan=4><br>Address of B-tree<br><br></td>
	  </tr>
	  <tr align=center>
	    <td>Number of Dimensions</td>
	    <td>Reserved</td>
	    <td>Reserved</td>
	    <td>Reserved</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Reserved (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Alignment for Dimension 0 (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Alignment for Dimension 1 (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>...</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Alignment for Dimension N (4 bytes)</td>
	  </tr>
	</table>
      </center>

    <p>The alignment fields indicate the alignment in logical space to
      use when allocating new storage areas on disk.  For instance,
      writing every other element of a 100-element one-dimensional
      array (using one HDF5 I/O partial write operation per element)
      that has unit storage alignment would result in 50
      single-element, discontiguous storage segments.  However, using
      an alignment of 25 would result in only four discontiguous
      segments.  The size of the message varies with the number of
      dimensions.

    <p>A B-tree is used to point to the discontiguous portions of
      storage which has been allocated for the object.  All keys of a
      particular B-tree are the same size and are a function of the
      number of dimensions. It is therefore not possible to change the
      dimensionality of an indexed storage array after its B-tree is
      created.

    <p>
      <center>
	<table border cellpadding=4 width="60%">
	  <caption align=bottom>
	    <b>The Format of a B-Tree Key</b>
	  </caption>
	  <tr align=center>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	  </tr>

	  <tr align=center>
	    <td colspan=4>External File Number or Zero (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>...</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td>
	  </tr>
	</table>
      </center>

    <p>The keys within a B-tree obey an ordering based on the chunk
      offsets.  If the offsets in dimension-0 are equal, then
      dimension-1 is used, etc. The External File Number field
      contains a 1-origin offset into the External File List message
      which contains the name of the external file in which that chunk
      is stored.

    <h1>Implementation of Striping</h1>

    <p>The indexed storage will support arbitrary striping at the
      chunk level; each chunk can be stored in any file.  This is
      accomplished by using the External File Number field of an
      indexed storage B-tree key as a 1-origin offset into an External
      File List Message (0x0009) which takes the form:

    <p>
      <center>
	<table border cellpadding=4 width="60%">
	  <caption align=bottom>
	    <b>The Format of the External File List Message</b>
	  </caption>
	  <tr align=center>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	    <th width="25%">byte</th>
	  </tr>

	  <tr align=center>
	    <td colspan=4><br>Name Heap Address<br><br></td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Number of Slots Allocated (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Number of File Names (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4>...</td>
	  </tr>
	  <tr align=center>
	    <td colspan=4><br>Unused Slot(s)<br><br></td>
	  </tr>
	</table>
      </center>

    <p>Each indexed storage array that has all or part of its data
      stored in external files will contain a single external file
      list message.  The size of the messages is determined when the
      message is created, but it may be possible to enlarge the
      message on demand by moving it.  At this time, it's not possible
      for multiple arrays to share a single external file list
      message.

    <dl>
      <dt><code>
	  H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn
	  nslots_hint, intn heap_size_hint)
	</code>
      <dd>Adds a new, empty external file list message to an object
	header and returns a pointer to that message.  The message
	acts as a cache for file descriptors of external files that
	are open.

      <p><dt><code>
	  intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
	</code>
      <dd>Gets the external file index number for a particular file name.
	If the name isn't in the external file list then it's added to
	the H5O_efl_t struct and immediately written to the object
	header to which the external file list message belongs. Name
	comparison is textual.  Each name should be relative to the
	directory which contains the HDF5 file.

      <p><dt><code>
	  H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
	</code>
      <dd>Gets a low-level file descriptor for an external file.  The
	external file list caches file descriptors because we might
	have many more external files than there are file descriptors
	available to this process.  The caller should not close this file.

      <p><dt><code>
	  herr_t H5O_efl_release (H5O_efl_t *efl)
	</code>
      <dd>Releases an external file list, closes all files
	associated with that list, and if the list has been modified
	since the call to <code>H5O_efl_new</code> flushes the message
	to disk.
    </dl>

    <hr>
    <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address>
<!-- Created: Fri Oct  3 09:52:32 EST 1997 -->
<!-- hhmts start -->
Last modified: Tue Nov 25 12:36:50 EST 1997
<!-- hhmts end -->
  </body>
</html>