[svn-r4193] Purpose:

New section -- "Freespace Management"
Description:
    Added "Freespace Management" section.
    Minor formatting.
Platforms tested:
    IE 5
Frank Baker 2001-07-11 17:01:45 -05:00
parent 4b218c6a58
commit 7c706d9d14


@ -58,12 +58,100 @@
<h2>2. Dataset Chunking</h2>
Appropriate dataset chunking can make a significant difference
in HDF5 performance. This topic is discussed in
<a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
in this <cite>User's Guide</cite>.
Appropriate dataset chunking can make a significant difference
in HDF5 performance. This topic is discussed in
<a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
in this <cite>User's Guide</cite>.
<h2>3. Use of the Pablo Instrumentation of HDF5</h2>
<h2>3. Freespace Management</h2>
<p>HDF5 does not yet manage freespace as effectively as it might.
While a file is opened, the library actively tracks and re-uses
<em>freespace</em>, i.e., space that is freed (or released)
during the run.
But the library does not yet manage freespace across the
closing and reopening of a file; when a file is closed,
all knowledge of available freespace is lost.
What was freespace becomes an unusable <em>hole</em> in the file.
<p>There are several circumstances that can result in freespace
in an HDF5 file:
<ul>
<li>Reading then rewriting a dataset or compressed dataset
chunk.<sup><a href="#footcchunk">1</a></sup>
<ul>
<li>If the rewritten dataset or compressed chunk is the same
size as or smaller than the original, it will be written
to the same file location.
<li>If, however, the rewritten dataset or compressed chunk is
larger than the original, it will be written contiguously
elsewhere in the file, leaving freespace at the original location.
<li>If the rewritten dataset or compressed chunk is
substantially smaller than the original, the remaining
space will be released and identified as freespace.
</ul>
<li>Deleting (or unlinking) a dataset or group.
<ul>
<li>If an object, such as a dataset, group, or named datatype,
is deleted (normally with <code>H5Gunlink</code>),
the space previously occupied by the object is released
and identified as freespace.
</ul>
</ul>
<p>As stated above, freespace is not managed across the
closing and reopening of an HDF5 file; file space that was
known freespace while the file remained open becomes an
inaccessible hole when the file is closed.
Thus, if a file is often closed and reopened, datasets
frequently rewritten, or groups and/or datasets frequently
added and deleted, that file can develop large numbers of
holes and grow unnecessarily large. This can, in turn,
seriously impair application or library performance
as the file ages.
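<p>The effect described above can be illustrated with a toy model.
The sketch below is not HDF5 code and does not reflect the library's
actual allocator; the <code>ToyFile</code> class, its first-fit
allocation policy, and all of its method names are assumptions made
purely for illustration. It shows only that a free list kept in
memory allows space to be re-used while the file is open, and that
discarding the list at close turns the same space into a hole.

```python
class ToyFile:
    """Toy model (hypothetical) of a file whose free list lives only in memory."""

    def __init__(self):
        self.size = 0       # current end-of-file offset, in bytes
        self.objects = {}   # name -> (offset, length) of live objects
        self.free = []      # in-memory list of (offset, length) free blocks

    def _allocate(self, length):
        # First-fit search of the in-memory free list (an assumed policy).
        for i, (offset, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[i]
                else:
                    self.free[i] = (offset + length, free_len - length)
                return offset
        # Nothing suitable is tracked as free: extend the file.
        offset = self.size
        self.size += length
        return offset

    def write(self, name, length):
        self.objects[name] = (self._allocate(length), length)

    def delete(self, name):
        # The released block is remembered only while the file stays open.
        self.free.append(self.objects.pop(name))

    def close_and_reopen(self):
        # The free list is not stored in the file, so it is lost here.
        self.free = []

    def holes(self):
        # Bytes that are neither used by an object nor tracked as free.
        used = sum(length for _, length in self.objects.values())
        tracked = sum(length for _, length in self.free)
        return self.size - used - tracked


def demo():
    f = ToyFile()
    f.write("a", 100)
    f.delete("a")
    f.write("b", 100)        # re-uses the freed block; the file does not grow
    size_after_reuse = f.size
    f.delete("b")
    f.close_and_reopen()     # knowledge of the freed block is lost
    f.write("c", 100)        # must be appended; the old block is now a hole
    return size_after_reuse, f.size, f.holes()
```

<p>In this model, <code>demo()</code> returns <code>(100, 200, 100)</code>:
the file stays at 100 bytes while the freed block is re-used within one
session, then doubles in size once the free list has been discarded.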
<p>An <code>h5pack</code> utility would enable <em>packing</em>
a file to remove the holes, but writing such a utility to
universally pack the file correctly is a complex task, and the
HDF5 development team has not yet had the resources to
complete it.
<p>For application developers or researchers who find themselves
working with files that become bloated in this manner, there
are, at this time, two remedies:
<ul>
<li><code>H5view</code>, an HDF5 Java tool, allows the user
to open a file and, using the <code>Save As...</code> feature,
save the file under a new filename. The new file can then
be closed and will be a packed version of the original file.
This approach is reasonably reliable, but with two caveats:
<ul>
<li>It is not automated.
<li>This ability is a side-effect of the tool's design;
it was not designed for this purpose and this approach
to file packing has not been exhaustively tested.
</ul>
<li>An application developer or researcher can write a utility
that is tuned to their data and file structures. This
utility can then read in a file, copy the structures and
datasets to a new file, and write the new file to storage.
This will eliminate the holes, making the new file a
fully-packed version of the original file.
</ul>
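<p>Both remedies rest on the same idea: only the live objects are
copied to a new file, so the holes are simply never written. The
sketch below illustrates that idea on a toy layout and is not HDF5
code; the <code>pack</code> function and the
<code>name -&gt; (offset, length)</code> layout dictionary are
hypothetical. A real utility would copy groups, datasets, and
attributes through the HDF5 API rather than rearranging byte offsets.

```python
def pack(objects):
    """Lay out live objects contiguously, preserving their original order.

    `objects` maps a name to an (offset, length) pair describing where the
    object sits in the old file. Returns the packed layout and the size of
    the new file. Purely illustrative; not part of any HDF5 API.
    """
    packed = {}
    next_offset = 0
    # Visit objects in order of their offset in the old file.
    for name, (_, length) in sorted(objects.items(), key=lambda kv: kv[1][0]):
        packed[name] = (next_offset, length)
        next_offset += length
    return packed, next_offset


# Old file: 400 bytes, of which only 150 are occupied by live objects.
old_layout = {"c": (100, 100), "d": (350, 50)}
new_layout, new_size = pack(old_layout)
```

<p>Here <code>new_layout</code> is
<code>{"c": (0, 100), "d": (100, 50)}</code> and <code>new_size</code>
is 150, so the 250 bytes of holes in the old layout are gone.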
<p><a name="footcchunk"></a>
<sup>1</sup>
<font size=-1>
This is a problem only with compressed chunks.
The compression ratio of data is highly dependent on the data
itself; regardless of whether the <em>size</em> of the data
changes, the size of the compressed data can change substantially
as the data changes. Uncompressed chunks do not vary in size,
so this issue does not arise.
</font>
<h2>4. Use of the Pablo Instrumentation of HDF5</h2>
Pablo HDF5 Trace software provides a means of measuring the
performance of programs using HDF5.
@ -147,7 +235,7 @@
<!-- Created: Thu Oct 14 16:46:00 CDT 1999 -->
<!-- hhmts start -->
Last modified: 14 October 1999
Last modified: 11 July 2001
<!-- hhmts end -->
<br>