hdf5/doc/html/Files.html
Frank Baker ac784dcef4 [svn-r648] Make subheading numbering consistent and fix error in level 3 header
numbering sequence.

Add internal named link to "File Families" section (for references
from Reference Manual).
1998-09-01 16:18:42 -05:00

539 lines
24 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>HDF5 Files</title>
</head>
<body>
<h1>Files</h1>
<h2>1. Introduction</h2>
<p>HDF5 files are composed of a "boot block" describing information
required to portably access files on multiple platforms, followed
by information about the groups in a file and the datasets in the
file. The boot block contains information about the size of offsets
and lengths of objects, the number of entries in symbol tables
(used to store groups) and additional version information for the
file.
<h2>2. File access modes</h2>
<p>The HDF5 library assumes that all files are implicitly opened for read
access at all times. Passing the <code>H5F_ACC_RDWR</code>
parameter to <code>H5Fopen()</code> allows write access to a
file also. <code>H5Fcreate()</code> assumes write access as
well as read access, passing <code>H5F_ACC_TRUNC</code> forces
the truncation of an existing file, otherwise H5Fcreate will
fail to overwrite an existing file.
<h2>3. Creating, Opening, and Closing Files</h2>
<p>Files are created with the <code>H5Fcreate()</code> function,
and existing files can be accessed with <code>H5Fopen()</code>. Both
functions return an object ID which should be eventually released by
calling <code>H5Fclose()</code>.
<dl>
<dt><code>hid_t H5Fcreate (const char *<em>name</em>, uintn
<em>flags</em>, hid_t <em>create_properties</em>, hid_t
<em>access_properties</em>)</code>
<dd>This function creates a new file with the specified name in
the current directory. The file is opened with read and write
permission, and if the <code>H5F_ACC_TRUNC</code> flag is set,
any current file is truncated when the new file is created.
If a file of the same name exists and the
<code>H5F_ACC_TRUNC</code> flag is not set (or the
<code>H5F_ACC_EXCL</code> bit is set), this function will
fail. Passing <code>H5P_DEFAULT</code> for the creation
and/or access property lists uses the library's default
values for those properties. Creating and changing the
values of a property list is documented further below. The
return value is an ID for the open file and it should be
closed by calling <code>H5Fclose()</code> when it's no longer
needed. A negative value is returned for failure.
<br><br>
<dt><code>hid_t H5Fopen (const char *<em>name</em>, uintn
<em>flags</em>, hid_t <em>access_properties</em>)</code>
<dd>This function opens an existing file with read permission
and write permission if the <code>H5F_ACC_RDWR</code> flag is
set. The <em>access_properties</em> is a file access property
list ID or <code>H5P_DEFAULT</code> for the default I/O access
parameters. Creating and changing the parameters for access
templates is documented further below. Files which are opened
more than once return a unique identifier for each
<code>H5Fopen()</code> call and can be accessed through all
file IDs. The return value is an ID for the open file and it
should be closed by calling <code>H5Fclose()</code> when it's
no longer needed. A negative value is returned for failure.
<br><br>
<dt><code>herr_t H5Fclose (hid_t <em>file_id</em>)</code>
<dd>This function releases resources used by a file which was
opened by <code>H5Fcreate()</code> or <code>H5Fopen()</code>. After
closing a file the <em>file_id</em> should not be used again. This
function returns zero for success or a negative value for failure.
<br><br>
<dt><code>herr_t H5Fflush (hid_t <em>object_id</em>)</code>
<dd>This function will cause all buffers associated with a file
to be immediately flushed to the file. The <em>object_id</em>
can be any object which is associated with a file, including
the file itself.
</dl>
<h2>4. File Property Lists</h2>
<p>Additional parameters to <code>H5Fcreate()</code> or
<code>H5Fopen()</code> are passed through property list
objects, which are created with the <code>H5Pcreate()</code>
function. These objects allow many parameters of a file's
creation or access to be changed from the default values.
Property lists are used as a portable and extensible method of
modifying multiple parameter values with simple API functions.
There are two kinds of file-related property lists,
namely file creation properties and file access properties.
<h3>4.1. File Creation Properties</h3>
<P>File creation property lists apply to <code>H5Fcreate()</code> only
and are used to control the file meta-data which is maintained
in the boot block of the file. The parameters which can be
modified are:
<dl>
<dt>User-Block Size <dd>The "user-block" is a fixed length block of
data located at the beginning of the file which is ignored by the
HDF5 library and may be used to store any data information found
to be useful to applications. This value may be set to any power
of two equal to 512 or greater (i.e. 512, 1024, 2048, etc). This
parameter is set and queried with the
<code>H5Pset_userblock()</code> and
<code>H5Pget_userblock()</code> calls.
<br><br>
<dt>Offset and Length Sizes
<dd>The number of bytes used to store the offset and length of
objects in the HDF5 file can be controlled with this
parameter. Values of 2, 4 and 8 bytes are currently
supported to allow 16-bit, 32-bit and 64-bit files to
be addressed. These parameters are set and queried
with the <code>H5Pset_sizes()</code> and
<code>H5Pget_sizes()</code> calls.
<br><br>
<dt>Symbol Table Parameters
<dd>The size of symbol table B-trees can be controlled by setting
the 1/2 rank and 1/2 node size parameters of the B-tree. These
parameters are set and queried with the
<code>H5Pset_sym_k()</code> and <code>H5Pget_sym_k()</code> calls.
<br><br>
<dt>Indexed Storage Parameters
<dd>The size of indexed storage B-trees can be controlled by
setting the 1/2 rank and 1/2 node size parameters of the B-tree.
These parameters are set and queried with the
<code>H5Pset_istore_k()</code> and <code>H5Pget_istore_k()</code>
calls.
</dl>
<h3>4.2. File Access Property Lists</h3>
<p>File access property lists apply to <code>H5Fcreate()</code> or
<code>H5Fopen()</code> and are used to control different methods of
performing I/O on files.
<dl>
<dt>Unbuffered I/O
<dd>Local permanent files can be accessed with the functions described
in Section 2 of the Posix manual, namely <code>open()</code>,
<code>lseek()</code>, <code>read()</code>, <code>write()</code>, and
<code>close()</code>. The <code>lseek64()</code> function is used
on operating systems that support it. This driver is enabled and
configured with <code>H5Pset_sec2()</code>, and queried with
<code>H5Pget_sec2()</code>.
<br><br>
<dt>Buffered I/O
<dd>Local permanent files can be accessed with the functions declared
in the <code>stdio.h</code> header file, namely
<code>fopen()</code>, <code>fseek()</code>, <code>fread()</code>,
<code>fwrite()</code>, and <code>fclose()</code>. The
<code>fseek64()</code> function is used on operating systems that
support it. This driver is enabled and configured with
<code>H5Pset_stdio()</code>, and queried with
<code>H5Pget_stdio()</code>.
<br><br>
<dt>Memory I/O
<dd>Local temporary files can be created and accessed directly from
memory without ever creating permanent storage. The library uses
<code>malloc()</code> and <code>free()</code> to create storage
space for the file. The total size of the file must be small enough
to fit in virtual memory. The name supplied to
<code>H5Fcreate()</code> is irrelevant, and <code>H5Fopen()</code>
will always fail.
<br><br>
<dt>Parallel Files using MPI I/O
<dd>This driver allows parallel access to a file through the MPI I/O
library. The parameters which can be modified are the MPI
communicator, the info object, and the access mode.
The communicator and info object are saved and then
passed to <code>MPI_File_open()</code> during file creation or open.
The access_mode controls the kind of parallel access the application
intends. (Note that it is likely that the next API revision will
remove the access_mode parameter and have access control specified
via the raw data transfer property list of <code>H5Dread()</code>
and <code>H5Dwrite()</code>.) These parameters are set and queried
with the <code>H5Pset_mpi()</code> and <code>H5Pget_mpi()</code>
calls.
<br><br>
<dt>Data Alignment
<dd>Sometimes file access is faster if certain things are
aligned on file blocks. This can be controlled by setting
alignment properties of a file access property list with the
<code>H5Pset_alignment()</code> function. Any allocation
request at least as large as some threshold will be aligned on
an address which is a multiple of some number.
</dl> </ul>
<h2>5. Examples of using file templates</h2>
<h3>5.1. Example of using file creation templates</h3>
<p>This following example shows how to create a file with 64-bit object
offsets and lengths:<br>
<pre>
hid_t create_template;
hid_t file_id;
create_template = H5Pcreate(H5P_FILE_CREATE);
H5Pset_sizes(create_template, 8, 8);
file_id = H5Fcreate("test.h5", H5F_ACC_TRUNC,
create_template, H5P_DEFAULT);
.
.
.
H5Fclose(file_id);
</pre>
<h3>5.2. Example of using file creation templates</h3>
<p>This following example shows how to open an existing file for
independent datasets access by MPI parallel I/O:<br>
<pre>
hid_t access_template;
hid_t file_id;
access_template = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_mpi(access_template, MPI_COMM_WORLD, MPI_INFO_NULL);
/* H5Fopen must be called collectively */
file_id = H5Fopen("test.h5", H5F_ACC_RDWR, access_template);
.
.
.
/* H5Fclose must be called collectively */
H5Fclose(file_id);
</pre>
<h2>6. Low-level File Drivers</h2>
<p>HDF5 is able to access its address space through various types of
low-level <em>file drivers</em>. For instance, an address space might
correspond to a single file on a Unix file system, multiple files on a
Unix file system, multiple files on a parallel file system, or a block
of memory within the application. Generally, an HDF5 address space is
referred to as an "HDF5 file" regardless of how the space is organized
at the storage level.
<h3>6.1. Unbuffered Permanent Files</h3>
<p>The <em>sec2</em> driver uses functions from section 2 of the
Posix manual to access files stored on a local file system. These are
the <code>open()</code>, <code>close()</code>, <code>read()</code>,
<code>write()</code>, and <code>lseek()</code> functions. If the
operating system supports <code>lseek64()</code> then it is used instead
of <code>lseek()</code>. The library buffers meta data regardless of
the low-level driver, but using this driver prevents data from being
buffered again by the lowest layers of the HDF5 library.
<dl>
<dt><code>H5F_driver_t H5Pget_driver (hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_SEC2</code> if the
<em>sec2</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_sec2 (hid_t <em>access_properties</em>)</code>
<dd>The file access properties are set to use the <em>sec2</em>
driver. Any previously defined driver properties are erased from the
property list. Additional parameters may be added to this function in
the future.
<br><br>
<dt><code>herr_t H5Pget_sec2 (hid_t <em>access_properties</em>)</code>
<dd>If the file access property list is set to the <em>sec2</em> driver
then this function returns zero; otherwise it returns a negative
value. In the future, additional arguments may be added to this
function to match those added to <code>H5Pset_sec2()</code>.
</dl>
<h3>6.2. Buffered Permanent Files</h3>
<p>The <em>stdio</em> driver uses the functions declared in the
<code>stdio.h</code> header file to access permanent files in a local
file system. These are the <code>fopen()</code>, <code>fclose()</code>,
<code>fread()</code>, <code>fwrite()</code>, and <code>fseek()</code>
functions. If the operating system supports <code>fseek64()</code> then
it is used instead of <code>fseek()</code>. Use of this driver
introduces an additional layer of buffering beneath the HDF5 library.
<dl>
<dt><code>H5F_driver_t H5Pget_driver(hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_STDIO</code> if the
<em>stdio</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_stdio (hid_t <em>access_properties</em>)</code>
<dd>The file access properties are set to use the <em>stdio</em>
driver. Any previously defined driver properties are erased from the
property list. Additional parameters may be added to this function in
the future.
<br><br>
<dt><code>herr_t H5Pget_stdio (hid_t <em>access_properties</em>)</code>
<dd>If the file access property list is set to the <em>stdio</em> driver
then this function returns zero; otherwise it returns a negative
value. In the future, additional arguments may be added to this
function to match those added to <code>H5Pset_stdio()</code>.
</dl>
<h3>6.3. Buffered Temporary Files</h3>
<p>The <em>core</em> driver uses <code>malloc()</code> and
<code>free()</code> to allocated space for a file in the heap. Reading
and writing to a file of this type results in mem-to-mem copies instead
of disk I/O and as a result is somewhat faster. However, the total file
size must not exceed the amount of available virtual memory, and only
one HDF5 file handle can access the file (because the name of such a
file is insignificant and <code>H5Fopen()</code> always fails).
<dl>
<dt><code>H5F_driver_t H5Pget_driver (hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_CORE</code> if the
<em>core</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_core (hid_t <em>access_properties</em>, size_t
<em>block_size</em>)</code>
<dd>The file access properties are set to use the <em>core</em>
driver and any previously defined driver properties are erased from
the property list. Memory for the file will always be allocated in
units of the specified <em>block_size</em>. Additional parameters may
be added to this function in the future.
<br><br>
<dt><code>herr_t H5Pget_core (hid_t <em>access_properties</em>, size_t
*<em>block_size</em>)</code>
<dd>If the file access property list is set to the <em>core</em> driver
then this function returns zero and <em>block_size</em> is set to the
block size used for the file; otherwise it returns a negative
value. In the future, additional arguments may be added to this
function to match those added to <code>H5Pset_core()</code>.
</dl>
<h3>6.4. Parallel Files</h3>
<p>This driver uses MPI I/O to provide parallel access to a file.
<dl>
<dt><code>H5F_driver_t H5Pget_driver (hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_MPI</code> if the
<em>mpi</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_mpi (hid_t <em>access_properties</em>, MPI_Comm
<em>comm</em>, MPI_info <em>info</em>)</code>
<dd>The file access properties are set to use the <em>mpi</em>
driver and any previously defined driver properties are erased from
the property list. Additional parameters may be added to this
function in the future.
<br><br>
<dt><code>herr_t H5Pget_mpi (hid_t <em>access_properties</em>, MPI_Comm
*<em>comm</em>, MPI_info *<em>info</em>)</code>
<dd>If the file access property list is set to the <em>mpi</em> driver
then this function returns zero and <em>comm</em>, and <em>info</em>
are set to the values stored in the property
list; otherwise the function returns a negative value. In the future,
additional arguments may be added to this function to match those
added to <code>H5Pset_mpi()</code>.
</dl>
<a name="Files_Families">
<h3>6.5. File Families</h3>
</a>
<p>A single HDF5 address space may be split into multiple files which,
together, form a file family. Each member of the family must be the
same logical size although the size and disk storage reported by
<code>ls</code>(1) may be substantially smaller. The name passed to
<code>H5Fcreate()</code> or <code>H5Fopen()</code> should include a
<code>printf(3c)</code> style integer format specifier which will be
replaced with the family member number (the first family member is
zero).
<p>Any HDF5 file can be split into a family of files by running
the file through <code>split</code>(1) and numbering the output
files. However, because HDF5 is lazy about extending the size
of family members, a valid file cannot generally be created by
concatenation of the family members. Additionally,
<code>split</code> and <code>cat</code> don't attempt to
generate files with holes. The <code>h5repart</code> program
can be used to repartition an HDF5 file or family into another
file or family and preserves holes in the files.
<dl>
<dt><code>h5repart</code> [<code>-v</code>] [<code>-b</code>
<em>block_size</em>[<em>suffix</em>]] [<code>-m</code>
<em>member_size</em>[<em>suffix</em>]] <em>source
destination</em>
<dd>This program repartitions an HDF5 file by copying the source
file or family to the destination file or family preserving
holes in the underlying Unix files. Families are used for the
source and/or destination if the name includes a
<code>printf</code>-style integer format such as "%d". The
<code>-v</code> switch prints input and output file names on
the standard error stream for progress monitoring,
<code>-b</code> sets the I/O block size (the default is 1kB),
and <code>-m</code> sets the output member size if the
destination is a family name (the default is 1GB). The block
and member sizes may be suffixed with the letters
<code>g</code>, <code>m</code>, or <code>k</code> for GB, MB,
or kB respectively.
<br><br>
<dt><code>H5F_driver_t H5Pget_driver (hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_FAMILY</code> if
the <em>family</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_family (hid_t <em>access_properties</em>,
hsize_t <em>memb_size</em>, hid_t <em>member_properties</em>)</code>
<dd>The file access properties are set to use the <em>family</em>
driver and any previously defined driver properties are erased
from the property list. Each member of the file family will
use <em>member_properties</em> as its file access property
list. The <em>memb_size</em> argument gives the logical size
in bytes of each family member but the actual size could be
smaller depending on whether the file contains holes. The
member size is only used when creating a new file or
truncating an existing file; otherwise the member size comes
from the size of the first member of the family being
opened. Note: if the size of the <code>off_t</code> type is
four bytes then the maximum family member size is usually
2^31-1 because the byte at offset 2,147,483,647 is generally
inaccessable. Additional parameters may be added to this
function in the future.
<br><br>
<dt><code>herr_t H5Pget_family (hid_t <em>access_properties</em>,
hsize_t *<em>memb_size</em>, hid_t
*<em>member_properties</em>)</code>
<dd>If the file access property list is set to the <em>family</em>
driver then this function returns zero; otherwise the function
returns a negative value. On successful return,
<em>access_properties</em> will point to a copy of the member
access property list which should be closed by calling
<code>H5Pclose()</em> when the application is finished with
it. If <em>memb_size</em> is non-null then it will contain
the logical size in bytes of each family member. In the
future, additional arguments may be added to this function to
match those added to <code>H5Pset_family()</code>.
</dl>
<h3>6.6. Split Meta/Raw Files</h3>
<p>On occasion, it might be useful to separate meta data from raw
data. The <em>split</em> driver does this by creating two files: one for
meta data and another for raw data. The application provides a base
file name to <code>H5Fcreate()</code> or <code>H5Fopen()</code> and this
driver appends a file extension which defaults to ".meta" for the meta
data file and ".raw" for the raw data file. Each file can have its own
file access property list which allows, for instance, a split file with
meta data stored with the <em>core</em> driver and raw data stored with
the <em>sec2</em> driver.
<dl>
<dt><code>H5F_driver_t H5Pget_driver (hid_t
<em>access_properties</em>)</code>
<dd>This function returns the constant <code>H5F_LOW_SPLIT</code> if
the <em>split</em> driver is defined as the low-level driver for the
specified access property list.
<br><br>
<dt><code>herr_t H5Pset_split (hid_t <em>access_properties</em>,
const char *<em>meta_extension</em>, hid_t
<em>meta_properties</em>, const char *<em>raw_extension</em>, hid_t
<em>raw_properties</em>)</code>
<dd>The file access properties are set to use the <em>split</em>
driver and any previously defined driver properties are erased from
the property list. The meta file will have a name which is formed by
adding <em>meta_extension</em> (or ".meta") to the end of the base
name and will be accessed according to the
<em>meta_properties</em>. The raw file will have a name which is
formed by appending <em>raw_extension</em> (or ".raw") to the base
name and will be accessed according to the <em>raw_properties</em>.
Additional parameters may be added to this function in the future.
<br><br>
<dt><code>herr_t H5Pget_split (hid_t <em>access_properties</em>,
size_t <em>meta_ext_size</em>, const char *<em>meta_extension</em>,
hid_t <em>meta_properties</em>, size_t <em>raw_ext_size</em>, const
char *<em>raw_extension</em>, hid_t *<em>raw_properties</em>)</code>
<dd>If the file access property list is set to the <em>split</em>
driver then this function returns zero; otherwise the function
returns a negative value. On successful return,
<em>meta_properties</em> and <em>raw_properties</em> will
point to copies of the meta and raw access property lists
which should be closed by calling <code>H5Pclose()</em> when
the application is finished with them, but if the meta and/or
raw file has no property list then a negative value is
returned for that property list handle. Also, if
<em>meta_extension</em> and/or <em>raw_extension</em> are
non-null pointers, at most <em>meta_ext_size</em> or
<em>raw_ext_size</em> characters of the meta or raw file name
extension will be copied to the specified buffer. If the
actual name is longer than what was requested then the result
will not be null terminated (similar to
<code>strncpy()</code>). In the future, additional arguments
may be added to this function to match those added to
<code>H5Pset_split()</code>.
</dl>
<hr>
<address><a href="mailto:koziol@ncsa.uiuc.edu">Quincey Koziol</a></address>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Tue Jan 27 09:11:27 EST 1998 -->
<!-- hhmts start -->
Last modified: Thu Aug 6 16:17:08 EDT 1998
<!-- hhmts end -->
</body>
</html>