/** @page LBGrpCreate Creating a Group
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
<hr>
\section secLBGrpCreate Creating a Group
An HDF5 group is a structure containing zero or more HDF5 objects. The two primary HDF5 objects are groups and datasets. To create a group, the calling program must:
<ol>
<li>Obtain the location identifier where the group is to be created.</li>
<li>Create the group.</li>
<li>Close the group.</li>
</ol>
The group is created with #H5Gcreate and closed with #H5Gclose. Closing the group is mandatory.
For example:
<em>C</em>
\code
group_id = H5Gcreate(file_id, "/MyGroup", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
status = H5Gclose (group_id);
\endcode
<em>Fortran</em>
\code
CALL h5gcreate_f (loc_id, name, group_id, error)
CALL h5gclose_f (group_id, error)
\endcode
\section secLBGrpCreateRWEx Programming Example
\subsection secLBGrpCreateRWExDesc Description
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
The example shows how to create and close a group. It creates a file called <code style="background-color:whitesmoke;">group.h5</code> in C
(<code style="background-color:whitesmoke;">groupf.h5</code> for FORTRAN), creates a group called MyGroup in the root group, and then closes the group and file.
For details on compiling an HDF5 application:
[ \ref LBCompiling ]
\subsection secLBGrpCreateRWExCont File Contents
Shown below are the contents of <code style="background-color:whitesmoke;">group.h5</code> (created by the C program) and the definition of its group in DDL.
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupf.h5</code> and the resulting DDL shows the filename
<code style="background-color:whitesmoke;">groupf.h5</code> in the first line.)
<table>
<caption>The Contents of group.h5.</caption>
<tr>
<td>
\image html imggrpcreate.gif
</td>
</tr>
</table>
<em>group.h5 in DDL</em>
\code
HDF5 "group.h5" {
GROUP "/" {
GROUP "MyGroup" {
}
}
}
\endcode
<hr>
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
@page LBGrpCreateNames Creating Groups using Absolute and Relative Names
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
<hr>
Recall that to create an HDF5 object, we have to specify the location where the object is to be created.
This location is determined by the identifier of an HDF5 object and the name of the object to be created.
The name of the created object can be either an absolute name or a name relative to the specified identifier.
In the previous example, we used the file identifier and the absolute name <code style="background-color:whitesmoke;">/MyGroup</code> to create a group.
In this section, we discuss HDF5 names and show how to use absolute and relative names.
\section secLBGrpCreateNames Names
HDF5 object names are a slash-separated list of components. There are few restrictions on names: component
names may be any length except zero and may contain any character except slash (<code style="background-color:whitesmoke;">/</code>) and the null terminator.
A full name may be composed of any number of component names separated by slashes, with any of the component
names being the special name <code style="background-color:whitesmoke;">.</code> (a dot or period). A name which begins with a slash is an <em>absolute name</em> which
is accessed beginning with the root group of the file; all other names are <em>relative names</em> and the named
object is accessed beginning with the specified group. A special case is the name <code style="background-color:whitesmoke;">/</code> (or equivalent) which
refers to the root group.
Functions which operate on names generally take a location identifier, which can be either a file identifier
or a group identifier, and perform the lookup with respect to that location. Several possibilities are
described in the following table:
<table>
<tr>
<th><strong>Location Type</strong></th>
<th><strong>Object Name</strong></th>
<th><strong>Description</strong></th>
</tr>
<tr>
<th><strong>File identifier</strong></th>
<td>/foo/bar</td>
<td>The object bar in group foo in the root group.</td>
</tr>
<tr>
<th><strong>Group identifier</strong></th>
<td>/foo/bar</td>
<td>The object bar in group foo in the root group of the file containing the specified group.
In other words, the group identifier's only purpose is to specify a file.</td>
</tr>
<tr>
<th><strong>File identifier</strong></th>
<td>/</td>
<td>The root group of the specified file.</td>
</tr>
<tr>
<th><strong>Group identifier</strong></th>
<td>/</td>
<td>The root group of the file containing the specified group.</td>
</tr>
<tr>
<th><strong>Group identifier</strong></th>
<td>foo/bar</td>
<td>The object bar in group foo in the specified group.</td>
</tr>
<tr>
<th><strong>File identifier</strong></th>
<td>.</td>
<td>The root group of the file.</td>
</tr>
<tr>
<th><strong>Group identifier</strong></th>
<td>.</td>
<td>The specified group.</td>
</tr>
<tr>
<th><strong>Other identifier</strong></th>
<td>.</td>
<td>The specified object.</td>
</tr>
</table>
\section secLBGrpCreateNamesEx Programming Example
\subsection secLBGrpCreateNamesExDesc Description
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
The example code shows how to create groups using absolute and relative names. It creates three groups: the first two groups are created using
the file identifier and the group absolute names while the third group is created using a group identifier and a name relative to the specified group.
For details on compiling an HDF5 application:
[ \ref LBCompiling ]
\subsection secLBGrpCreateNamesExRem Remarks
#H5Gcreate creates a group at the location specified by a location identifier and a name. The location identifier
can be a file identifier or a group identifier and the name can be relative or absolute.
The first #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">MyGroup</code> in the root group of the specified file.
The second #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">Group_A</code> in the group <code style="background-color:whitesmoke;">MyGroup</code> in the root group of the specified
file. Note that the parent group (<code style="background-color:whitesmoke;">MyGroup</code>) already exists.
The third #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">Group_B</code> in the specified group.
\subsection secLBGrpCreateNamesExCont File Contents
Shown below are the contents of <code style="background-color:whitesmoke;">groups.h5</code> (created by the C program) and the definition of its groups in DDL.
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupsf.h5</code> and the resulting DDL shows the filename
<code style="background-color:whitesmoke;">groupsf.h5</code> in the first line.)
<table>
<caption>The Contents of groups.h5.</caption>
<tr>
<td>
\image html imggrps.gif
</td>
</tr>
</table>
<em>groups.h5 in DDL</em>
\code
HDF5 "groups.h5" {
GROUP "/" {
GROUP "MyGroup" {
GROUP "Group_A" {
}
GROUP "Group_B" {
}
}
}
}
\endcode
<hr>
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
@page LBGrpDset Creating Datasets in Groups
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
<hr>
\section secLBGrpDset Datasets in Groups
We have shown how to create groups, datasets, and attributes. In this section, we show how to create
datasets in groups. Recall that #H5Dcreate creates a dataset at the location specified by a location
identifier and a name. Similar to #H5Gcreate, the location identifier can be a file identifier or a
group identifier and the name can be relative or absolute. The location identifier and the name
together determine the location where the dataset is to be created. If the location identifier and
name refer to a group, then the dataset is created in that group.
\section secLBGrpDsetEx Programming Example
\subsection secLBGrpDsetExDesc Description
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
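The example shows how to create a dataset in a particular group. It opens the file created in the previous example and creates two datasets: <code style="background-color:whitesmoke;">dset1</code> in the group <code style="background-color:whitesmoke;">MyGroup</code> and <code style="background-color:whitesmoke;">dset2</code> in the group <code style="background-color:whitesmoke;">MyGroup/Group_A</code>.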
For details on compiling an HDF5 application:
[ \ref LBCompiling ]
\subsection secLBGrpDsetExCont File Contents
Shown below are the contents of <code style="background-color:whitesmoke;">groups.h5</code> (created by the C program) and the definition of its groups and datasets in DDL.
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupsf.h5</code> and the resulting DDL shows the filename
<code style="background-color:whitesmoke;">groupsf.h5</code> in the first line.)
<table>
<caption>The contents of the file groups.h5 (groupsf.h5 for FORTRAN)</caption>
<tr>
<td>
\image html imggrpdsets.gif
</td>
</tr>
</table>
<em>groups.h5 in DDL</em>
\code
HDF5 "groups.h5" {
GROUP "/" {
GROUP "MyGroup" {
GROUP "Group_A" {
DATASET "dset2" {
DATATYPE { H5T_STD_I32BE }
DATASPACE { SIMPLE ( 2, 10 ) / ( 2, 10 ) }
DATA {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
}
}
}
GROUP "Group_B" {
}
DATASET "dset1" {
DATATYPE { H5T_STD_I32BE }
DATASPACE { SIMPLE ( 3, 3 ) / ( 3, 3 ) }
DATA {
1, 2, 3,
1, 2, 3,
1, 2, 3
}
}
}
}
}
\endcode
<em>groupsf.h5 in DDL</em>
\code
HDF5 "groupsf.h5" {
GROUP "/" {
GROUP "MyGroup" {
GROUP "Group_A" {
DATASET "dset2" {
DATATYPE { H5T_STD_I32BE }
DATASPACE { SIMPLE ( 10, 2 ) / ( 10, 2 ) }
DATA {
1, 1,
2, 2,
3, 3,
4, 4,
5, 5,
6, 6,
7, 7,
8, 8,
9, 9,
10, 10
}
}
}
GROUP "Group_B" {
}
DATASET "dset1" {
DATATYPE { H5T_STD_I32BE }
DATASPACE { SIMPLE ( 3, 3 ) / ( 3, 3 ) }
DATA {
1, 1, 1,
2, 2, 2,
3, 3, 3
}
}
}
}
}
\endcode
<hr>
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
@page LBDsetSubRW Reading From or Writing To a Subset of a Dataset
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
<hr>
\section secLBDsetSubRW Dataset Subsets
There are two ways that you can select a subset in an HDF5 dataset and read or write to it:
<ul><li>
<strong>Hyperslab Selection</strong>: The #H5Sselect_hyperslab call selects a logically contiguous
collection of points in a dataspace, or a regular pattern of points or blocks in a dataspace.
</li><li>
<strong>Element Selection</strong>: The #H5Sselect_elements call selects elements in an array.
</li></ul>
HDF5 allows you to read from or write to a portion or subset of a dataset by:
\li Selecting a Subset of the Dataset's Dataspace,
\li Selecting a Memory Dataspace,
\li Reading From or Writing to a Dataset Subset.
\section secLBDsetSubRWSel Selecting a Subset of the Dataset's Dataspace
First you must obtain the dataspace of a dataset in a file by calling #H5Dget_space.
Then select a subset of that dataspace by calling #H5Sselect_hyperslab. The <em>offset</em>, <em>count</em>, <em>stride</em>
and <em>block</em> parameters of this API define the shape and size of the selection. Each must be an array
with one element per dimension of the dataset's dataspace, that is, with as many elements as its rank. These arrays <strong>ALL</strong> work
together to define a selection. A change to one of these arrays can affect the others.
\li \em offset: An array that specifies the offset of the starting element of the specified hyperslab.
\li \em count: An array that determines how many blocks to select from the dataspace in each dimension. If the block
size for a dimension is one then the count is the number of elements along that dimension.
\li \em stride: An array that allows you to sample elements along a dimension. For example, a stride of one (or NULL)
will select every element along a dimension, a stride of two will select every other element, and a stride of three
will select an element after every two elements.
\li \em block: An array that determines the size of the element block selected from a dataspace. If the block size
is one or NULL then the block size is a single element in that dimension.
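The way the four arrays interact can be worked out with plain arithmetic. Along one dimension, the last index touched by a selection is offset + (count - 1) * stride + block - 1, and the number of selected elements is count * block. The sketch below is illustrative only and does not call HDF5: the function name `hyperslab_npoints` is hypothetical, and the rule that blocks may not overlap (stride at least as large as block when count is greater than one) is stated here as an assumption about the library's validation.

```c
#include <stddef.h>

// Illustrative arithmetic only; HDF5 performs its own checks internally.
// Returns the number of elements selected, or 0 if the selection is empty,
// has overlapping blocks, or runs past the extent given by dims.
// A NULL stride or block is treated as all ones.
unsigned long long hyperslab_npoints(size_t rank,
                                     const unsigned long long *dims,
                                     const unsigned long long *offset,
                                     const unsigned long long *stride,
                                     const unsigned long long *count,
                                     const unsigned long long *block)
{
    unsigned long long npoints = 1;
    for (size_t i = 0; i < rank; i++) {
        unsigned long long s = stride ? stride[i] : 1;
        unsigned long long b = block ? block[i] : 1;
        if (count[i] == 0 || s == 0 || b == 0)
            return 0;
        if (count[i] > 1 && s < b)
            return 0; // consecutive blocks would overlap
        // Last index covered in this dimension.
        unsigned long long last = offset[i] + (count[i] - 1) * s + b - 1;
        if (last >= dims[i])
            return 0; // selection is not within the extent
        npoints *= count[i] * b;
    }
    return npoints;
}
```

For an 8 x 10 dataspace with offset {1, 2}, a count of {3, 4} gives 12 elements and a count of {7, 4} gives 28, while a count of {8, 4} reaches index 8 in dimension 0 and does not fit.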
\section secLBDsetSubRWMem Selecting a Memory Dataspace
You must select a memory dataspace in addition to a file dataspace before you can read a subset from or write a subset
to a dataset. A memory dataspace can be specified by calling #H5Screate_simple.
The memory dataspace passed to the read or write call must contain the same number of elements as the file dataspace.
The number of elements in a dataspace selection can be determined with the #H5Sget_select_npoints API.
\section secLBDsetSubRWSub Reading From or Writing To a Dataset Subset
To read from or write to a dataset subset, the #H5Dread and #H5Dwrite routines are used. The memory and file dataspace
identifiers from the selections that were made are passed into the read or write call. For example (C):
\code
status = H5Dwrite (.., .., memspace_id, dataspace_id, .., ..);
\endcode
\section secLBDsetSubRWProg Programming Example
\subsection subsecLBDsetSubRWProgDesc Description
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
The example creates an 8 x 10 integer dataset in an HDF5 file. It then selects and writes to a 3 x 4 subset
of the dataset created with the dimensions offset by 1 x 2. (If using Fortran, the dimensions will be swapped.
The dataset will be 10 x 8, the subset will be 4 x 3, and the offset will be 2 x 1.)
PLEASE NOTE that the examples and images below were created using C.
The following image shows the dataset that gets written originally, and the subset of data that gets modified
afterwards. Dimension 0 is vertical and Dimension 1 is horizontal as shown below:
<table>
<tr>
<td>
\image html LBDsetSubRWProg.png
</td>
</tr>
</table>
The subset on the right above is created using these values for offset, count, stride, and block:
\code
offset = {1, 2}
count = {3, 4}
stride = {1, 1}
block = {1, 1}
\endcode
\subsection subsecLBDsetSubRWProgExper Experiments with Different Selections
Following are examples of changes that can be made to the example code provided to better understand
how to make selections.
\subsubsection subsubsecLBDsetSubRWProgExperOne Example 1
By default the example code will select and write to a 3 x 4 subset. You can modify the count
parameter in the example code to select a different subset, by changing the value of
DIM0_SUB (C, C++) / dim0_sub (Fortran) near the top. Change its value to 7 to create a 7 x 4 subset:
<table>
<tr>
<td>
\image html imgLBDsetSubRW11.png
</td>
</tr>
</table>
If you were to change the subset to 8 x 4, the selection would be beyond the extent of the dimension:
<table>
<tr>
<td>
\image html imgLBDsetSubRW12.png
</td>
</tr>
</table>
The write will fail with the error: "<strong>file selection+offset not within extent</strong>"
\subsubsection subsubsecLBDsetSubRWProgExperTwo Example 2
In the example code provided, the memory and file dataspaces passed to the H5Dwrite call have the
same size, 3 x 4 (DIM0_SUB x DIM1_SUB). Change the size of the memory dataspace to be 4 x 4 so that
they do not match, and then compile:
\code
dimsm[0] = DIM0_SUB + 1;
dimsm[1] = DIM1_SUB;
memspace_id = H5Screate_simple (RANK, dimsm, NULL);
\endcode
The code will fail with the error: "<strong>src and dest data spaces have different sizes</strong>"
How many elements are in the memory and file dataspaces that were specified above? Add these lines:
\code
hssize_t size;
/* Just before H5Dwrite call the following */
size = H5Sget_select_npoints (memspace_id);
printf ("\nmemspace_id size: %lld\n", (long long)size);
size = H5Sget_select_npoints (dataspace_id);
printf ("dataspace_id size: %lld\n", (long long)size);
\endcode
You should see these lines followed by the error:
\code
memspace_id size: 16
dataspace_id size: 12
\endcode
\subsubsection subsubsecLBDsetSubRWProgExperThree Example 3
This example shows the selection that results from changing the values of the <em>offset</em>, <em>count</em>,
<em>stride</em> and <em>block</em> parameters in the example code.
The values shown below select two blocks. The <em>count</em> array specifies the number of blocks. The <em>block</em> array
specifies the size of a block. The <em>stride</em> must be increased to accommodate the block size, because the blocks in a selection may not overlap.
<table>
<tr>
<td>
\image html imgLBDsetSubRW31.png
</td>
</tr>
</table>
Now try modifying the count as shown below. The write will fail because the selection goes beyond the extent of the dimension:
<table>
<tr>
<td>
\image html imgLBDsetSubRW32.png
</td>
</tr>
</table>
If the offset were 1x1 (instead of 1x2), then the selection could be made:
<table>
<tr>
<td>
\image html imgLBDsetSubRW33.png
</td>
</tr>
</table>
The selections above were tested with the
<a href="https://support.hdfgroup.org/ftp/HDF5/examples/howto/subset/h5_subsetbk.c">h5_subsetbk.c</a>
example code. The memory dataspace was defined as one-dimensional.
\subsection subsecLBDsetSubRWProgRem Remarks
\li In addition to #H5Sselect_hyperslab, this example introduces the #H5Dget_space call to obtain the dataspace of a dataset.
\li If using the default values for the stride and block parameters of #H5Sselect_hyperslab, then, for C you can specify NULL
for these parameters, rather than passing in an array for each, and for Fortran 90 you can omit these parameters.
<hr>
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
@page LBDatatypes Datatype Basics
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
<hr>
\section secLBDtype What is a Datatype?
A datatype is a collection of datatype properties which provide complete information for data conversion to or from that datatype.
Datatypes in HDF5 can be grouped as follows:
\li <strong>Pre-Defined Datatypes</strong>: These are datatypes that are created by HDF5. They are actually opened
(and closed) by HDF5, and can have a different value from one HDF5 session to the next.
\li <strong>Derived Datatypes</strong>: These are datatypes that are created or derived from the pre-defined datatypes.
Although created from pre-defined types, they represent a category unto themselves. An example of a commonly used derived
datatype is a string of more than one character.
\section secLBDtypePre Pre-defined Datatypes
The properties of pre-defined datatypes are:
\li Pre-defined datatypes are opened and closed by HDF5.
\li A pre-defined datatype is a handle and is NOT PERSISTENT. Its value can be different from one HDF5 session to the next.
\li Pre-defined datatypes are Read-Only.
\li As mentioned, other datatypes can be derived from pre-defined datatypes.
There are two types of pre-defined datatypes, standard (file) and native.
<h4>Standard</h4>
A standard (or file) datatype can be:
<ul>
<li><strong>Atomic</strong>: A datatype which cannot be decomposed into smaller datatype units at the API level.
The atomic datatypes are:
<ul>
<li>integer</li>
<li>float</li>
<li>string (1-character)</li>
<li>date and time</li>
<li>bitfield</li>
<li>reference</li>
<li>opaque</li>
</ul>
</li>
<li><strong>Composite</strong>: An aggregation of one or more datatypes.
Composite datatypes include:
<ul>
<li>array</li>
<li>variable length</li>
<li>enumeration</li>
<li>compound datatypes</li>
</ul>
Array, variable length, and enumeration datatypes are defined in terms of a single atomic datatype,
whereas a compound datatype is a datatype composed of a sequence of datatypes.
</li>
</ul>
<table>
<tr>
<th><strong>Notes</strong></th>
</tr>
<tr>
<td>
\li Standard pre-defined datatypes are the SAME on all platforms.
\li They are the datatypes that you see in an HDF5 file.
\li They are typically used when creating a dataset.
</td>
</tr>
</table>
<h4>Native</h4>
Native pre-defined datatypes are used for memory operations, such as reading and writing. They are
NOT THE SAME on different platforms. They are similar to C type names, and are aliased to the
appropriate HDF5 standard pre-defined datatype for a given platform.
For example, when on an Intel based PC, #H5T_NATIVE_INT is aliased to the standard pre-defined type,
#H5T_STD_I32LE. On a MIPS machine, it is aliased to #H5T_STD_I32BE.
<table>
<tr>
<th><strong>Notes</strong></th>
</tr>
<tr>
<td>
\li Native datatypes are NOT THE SAME on all platforms.
\li Native datatypes simplify memory operations (read/write). The HDF5 library automatically converts as needed.
\li Native datatypes are NOT in an HDF5 File. The standard pre-defined datatype that a native datatype corresponds
to is what you will see in the file.
</td>
</tr>
</table>
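To make the aliasing concrete, the following sketch builds the name of the standard integer type that a native `int` would be expected to correspond to on the machine where it runs. It is illustrative only: it does not query the HDF5 library, the function name `standard_name_for_native_int` is hypothetical, and it merely mimics the naming scheme (bit width followed by LE or BE).

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

// Illustrative only: derives a name such as "H5T_STD_I32LE" from the size
// and byte order of int on this platform. Writes into buf and returns buf.
const char *standard_name_for_native_int(char *buf, size_t buflen)
{
    const unsigned int probe = 1;
    // On a little-endian machine the least significant byte is stored first.
    const int little_endian = (*(const unsigned char *)&probe == 1);

    snprintf(buf, buflen, "H5T_STD_I%u%s",
             (unsigned)(sizeof(int) * CHAR_BIT),
             little_endian ? "LE" : "BE");
    return buf;
}
```

Within an HDF5 program, the same information can be obtained from the library itself, for example with `H5Tget_size` and `H5Tget_order` applied to `H5T_NATIVE_INT`.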
<h4>Pre-Defined</h4>
The following table shows the native types and the standard pre-defined datatypes they correspond
to. (Keep in mind that HDF5 can convert between datatypes, so you can specify a buffer of a larger
type for a dataset of a given type. For example, you can read a dataset that has a short datatype
into a long integer buffer.)
<table>
<caption>Some HDF5 pre-defined native datatypes and corresponding standard (file) type</caption>
<tr>
<th><strong>C Type</strong></th>
<th><strong>HDF5 Memory Type</strong></th>
<th><strong>HDF5 File Type*</strong></th>
</tr>
<tr>
<th colspan="3"><strong>Integer</strong></th>
</tr>
<tr>
<td>int</td>
<td>#H5T_NATIVE_INT</td>
<td>#H5T_STD_I32BE or #H5T_STD_I32LE</td>
</tr>
<tr>
<td>short</td>
<td>#H5T_NATIVE_SHORT</td>
<td>#H5T_STD_I16BE or #H5T_STD_I16LE</td>
</tr>
<tr>
<td>long</td>
<td>#H5T_NATIVE_LONG</td>
<td>#H5T_STD_I32BE, #H5T_STD_I32LE,
#H5T_STD_I64BE or #H5T_STD_I64LE</td>
</tr>
<tr>
<td>long long</td>
<td>#H5T_NATIVE_LLONG</td>
<td>#H5T_STD_I64BE or #H5T_STD_I64LE</td>
</tr>
<tr>
<td>unsigned int</td>
<td>#H5T_NATIVE_UINT</td>
<td>#H5T_STD_U32BE or #H5T_STD_U32LE</td>
</tr>
<tr>
<td>unsigned short</td>
<td>#H5T_NATIVE_USHORT</td>
<td>#H5T_STD_U16BE or #H5T_STD_U16LE</td>
</tr>
<tr>
<td>unsigned long</td>
<td>#H5T_NATIVE_ULONG</td>
<td>#H5T_STD_U32BE, #H5T_STD_U32LE,
#H5T_STD_U64BE or #H5T_STD_U64LE</td>
</tr>
<tr>
<td>unsigned long long</td>
<td>#H5T_NATIVE_ULLONG</td>
<td>#H5T_STD_U64BE or #H5T_STD_U64LE</td>
</tr>
<tr>
<th colspan="3"><strong>Float</strong></th>
</tr>
<tr>
<td>float</td>
<td>#H5T_NATIVE_FLOAT</td>
<td>#H5T_IEEE_F32BE or #H5T_IEEE_F32LE</td>
</tr>
<tr>
<td>double</td>
<td>#H5T_NATIVE_DOUBLE</td>
<td>#H5T_IEEE_F64BE or #H5T_IEEE_F64LE</td>
</tr>
</table>
<table>
<caption>Some HDF5 pre-defined native datatypes for Fortran 90 and corresponding standard (file) type</caption>
<tr>
<th><strong>F90 Type</strong></th>
<th><strong>HDF5 Memory Type</strong></th>
<th><strong>HDF5 File Type*</strong></th>
</tr>
<tr>
<td>integer</td>
<td>H5T_NATIVE_INTEGER</td>
<td>#H5T_STD_I32BE(8,16) or #H5T_STD_I32LE(8,16)</td>
</tr>
<tr>
<td>real</td>
<td>H5T_NATIVE_REAL</td>
<td>#H5T_IEEE_F32BE or #H5T_IEEE_F32LE</td>
</tr>
<tr>
<td>double-precision</td>
<td>#H5T_NATIVE_DOUBLE</td>
<td>#H5T_IEEE_F64BE or #H5T_IEEE_F64LE</td>
</tr>
</table>
<table>
<tr>
<td>* Note that the HDF5 File Types listed are those that are most commonly created.
The file type created depends on the compiler switches and platforms being
used. For example, on the Cray an integer is 64-bit, and using #H5T_NATIVE_INT (C)
or H5T_NATIVE_INTEGER (F90) would result in an #H5T_STD_I64BE file type.</td>
</tr>
</table>
The following code is an example of when you would use standard pre-defined datatypes vs. native types:
\code
#include "hdf5.h"
int main(void) {
    hid_t   file_id, dataset_id, dataspace_id;
    herr_t  status;
    hsize_t dims[2] = {4, 6};
    int     i, j, dset_data[4][6];
    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;
    file_id = H5Fcreate ("dtypes.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    dataspace_id = H5Screate_simple (2, dims, NULL);
    /* Standard (file) type: fixes how the data is stored in the file. */
    dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    /* Native (memory) type: describes dset_data on this platform. */
    status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                       H5P_DEFAULT, dset_data);
    status = H5Dclose (dataset_id);
    status = H5Sclose (dataspace_id);
    status = H5Fclose (file_id);
    return (status < 0) ? 1 : 0;
}
\endcode
By using the native types when reading and writing, the code that reads from or writes to a dataset
can be the same for different platforms.
Can native types also be used when creating a dataset? Yes. However, just be aware that the resulting
datatype in the file will be one of the standard pre-defined types and may be different than expected.
What happens if you do not use the correct native datatype for a standard (file) datatype? Your data
may be incorrect or not what you expect.
\section secLBDtypeDer Derived Datatypes
ANY pre-defined datatype can be used to derive user-defined datatypes.
To create a datatype derived from a pre-defined type:
<ol>
<li>Make a copy of the pre-defined datatype:
\code
tid = H5Tcopy (H5T_STD_I32BE);
\endcode
</li>
<li>Change the datatype.</li>
</ol>
There are numerous datatype functions that allow a user to alter a pre-defined datatype. See
\ref subsecLBDtypeSpecStr below for a simple example.
Refer to the \ref H5T in the \ref RM. Example functions are #H5Tset_size and #H5Tset_precision.
\section secLBDtypeSpec Specific Datatypes
On the <a href="https://portal.hdfgroup.org/display/HDF5/Examples+by+API">Examples by API</a>
page under <a href="https://confluence.hdfgroup.org/display/HDF5/Examples+by+API#ExamplesbyAPI-datatypes">Datatypes</a>
you will find many example programs for creating and reading datasets with different datatypes.
Below is additional information on some of the datatypes. See
the <a href="https://portal.hdfgroup.org/display/HDF5/Examples+by+API">Examples by API</a>
page for examples of these datatypes.
\subsection subsecLBDtypeSpec Array Datatype vs Array Dataspace
#H5T_ARRAY is a datatype, and it should not be confused with the dataspace of a dataset. The dataspace
of a dataset can consist of a regular array of elements. For example, the datatype for a dataset
could be an atomic datatype like integer, and the dataset could be an N-dimensional appendable array,
as specified by the dataspace. See #H5Screate and #H5Screate_simple for details.
Unlimited dimensions and subsetting are not supported when using the #H5T_ARRAY datatype.
The #H5T_ARRAY datatype was primarily created to address the simple case of a compound datatype
when all members of the compound datatype are of the same type and there is no need to subset by
compound datatype members. Creation of such a datatype is more efficient and I/O also requires
less work, because there is no alignment involved.
\subsection subsecLBDtypeSpecArr Array Datatype
The array class of datatypes, #H5T_ARRAY, allows the construction of true, homogeneous,
multi-dimensional arrays. Since these are homogeneous arrays, each element of the array
will be of the same datatype, designated at the time the array is created.
Users may be confused by this datatype, as opposed to a dataset with a simple atomic
datatype (e.g., integer) that is an array. See \ref subsecLBDtypeSpec for more information.
Arrays can be nested. Not only is an array datatype used as an element of an HDF5 dataset,
but the elements of an array datatype may be of any datatype, including another array datatype.
Array datatypes <strong>cannot be subdivided for I/O</strong>; the entire array must be transferred from one
dataset to another.
Within certain limitations, outlined in the next paragraph, array datatypes may be N-dimensional
and of any dimension size. <strong>Unlimited dimensions, however, are not supported</strong>. Functionality similar
to unlimited dimension arrays is available through the use of variable-length datatypes.
The maximum number of dimensions, i.e., the maximum rank, of an array datatype is specified by
the HDF5 library constant #H5S_MAX_RANK. The minimum rank is 1 (one). All dimension sizes must
be greater than 0 (zero).
One array datatype may only be converted to another array datatype if the number of dimensions
and the sizes of the dimensions are equal and the datatype of the first array's elements can be
converted to the datatype of the second array's elements.
\subsubsection subsubsecLBDtypeSpecArrAPI Array Datatype APIs
There are three functions that are specific to array datatypes: one, #H5Tarray_create, for creating
an array datatype, and two, #H5Tget_array_ndims and #H5Tget_array_dims
for working with existing array datatypes.
<h4>Creating</h4>
The function #H5Tarray_create creates a new array datatype object. Parameters specify
\li the base datatype of each element of the array,
\li the rank of the array, i.e., the number of dimensions,
\li the size of each dimension, and
\li in the older #H5Tarray_create1 form only, the dimension permutation of the array, i.e., whether the elements of the array are listed in C or FORTRAN order.
<h4>Working with existing array datatypes</h4>
When working with existing arrays, one must first determine the rank, or number of dimensions, of the array.
The function #H5Tget_array_ndims returns the rank of a specified array datatype.
In many instances, one needs further information. The function #H5Tget_array_dims retrieves the
size of each dimension (and, in its older form, the permutation of the array).
\subsection subsecLBDtypeSpecCmpd Compound
\subsubsection subsubsecLBDtypeSpecCmpdProp Properties of compound datatypes
A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of
one or more atomic types or small arrays of such types. To create and use a compound datatype,
you need to refer to various properties of the compound datatype:
\li It is of class compound.
\li It has a fixed total size, in bytes.
\li It consists of zero or more members (defined in any order) with unique names and which occupy non-overlapping regions within the datum.
\li Each member has its own datatype.
\li Each member is referenced by an index number between zero and N-1, where N is the number of members in the compound datatype.
\li Each member has a name which is unique among its siblings in a compound datatype.
\li Each member has a fixed byte offset, which is the first byte (smallest byte address) of that member in a compound datatype.
\li Each member can be a small array of up to four dimensions.
Properties of members of a compound datatype are defined when the member is added to the compound type and cannot be subsequently modified.
\subsubsection subsubsecLBDtypeSpecCmpdDef Defining compound datatypes
Compound datatypes must be built out of other datatypes. First, one creates an empty compound
datatype and specifies its total size. Then members are added to the compound datatype in any order.
Member names. Each member must have a descriptive name, which is the key used to uniquely identify
the member within the compound datatype. A member name in an HDF5 datatype does not necessarily
have to be the same as the name of the corresponding member in the C struct in memory, although
this is often the case. Nor does one need to define all members of the C struct in the HDF5
compound datatype (or vice versa).
Offsets. Usually a C struct will be defined to hold a data point in memory, and the offsets of the
members in memory will be the offsets of the struct members from the beginning of an instance of the
struct. The library defines the following macro to compute the offset of a member within a struct:
\code
HOFFSET(s,m)
\endcode
This macro computes the offset of member m within a struct type s.
Here is an example in which a compound datatype is created to describe complex numbers whose type
is defined by the complex_t struct.
\code
typedef struct {
    double re; /*real part */
    double im; /*imaginary part */
} complex_t;
hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof (complex_t));
H5Tinsert (complex_id, "real", HOFFSET(complex_t,re), H5T_NATIVE_DOUBLE);
H5Tinsert (complex_id, "imaginary", HOFFSET(complex_t,im), H5T_NATIVE_DOUBLE);
\endcode
\subsection subsecLBDtypeSpecRef Reference
There are two types of Reference datatypes in HDF5:
\li \ref subsubsecLBDtypeSpecRefObj
\li \ref subsubsecLBDtypeSpecRefDset
\subsubsection subsubsecLBDtypeSpecRefObj Reference to objects
In HDF5, objects (i.e. groups, datasets, and named datatypes) are usually accessed by name.
There is another way to access stored objects -- by reference.
An object reference is based on the relative file address of the object header in the file
and is constant for the life of the object. Once a reference to an object is created and
stored in a dataset in the file, it can be used to dereference the object it points to.
References are handy for creating a file index or for grouping related objects by storing
references to them in one dataset.
<h4>Creating and storing references to objects</h4>
The following steps are involved in creating and storing file references to objects:
<ol>
<li>Create the objects or open them if they already exist in the file.</li>
<li>Create a dataset to store the objects' references by specifying #H5T_STD_REF_OBJ as the datatype.</li>
<li>Create and store references to the objects in a buffer, using #H5Rcreate.</li>
<li>Write the buffer of references to the dataset, using #H5Dwrite with the #H5T_STD_REF_OBJ datatype.</li>
</ol>
<h4>Reading references and accessing objects using references</h4>
The following steps are involved:
<ol>
<li>Open the dataset with the references and read them. The #H5T_STD_REF_OBJ datatype must be used to describe the memory datatype.</li>
<li>Use the read reference to obtain the identifier of the object the reference points to using #H5Rdereference.</li>
<li>Open the dereferenced object and perform the desired operations.</li>
<li>Close all objects when the task is complete.</li>
</ol>
\subsubsection subsubsecLBDtypeSpecRefDset Reference to a dataset region
A dataset region reference points to a dataset selection in another dataset.
A reference to the dataset selection (region) is constant for the life of the dataset.
<h4>Creating and storing references to dataset regions</h4>
The following steps are involved in creating and storing references to a dataset region:
\li Create a dataset to store the dataset region references, by passing in #H5T_STD_REF_DSETREG for the datatype when calling #H5Dcreate.
\li Create selection(s) in existing dataset(s) using #H5Sselect_hyperslab and/or #H5Sselect_elements.
\li Create reference(s) to the selection(s) using #H5Rcreate and store them in a buffer.
\li Write the references to the dataset regions in the file.
\li Close all objects.
<h4>Reading references to dataset regions</h4>
The following steps are involved in reading references to dataset regions and the referenced dataset regions (selections):
<ol>
<li>Open and read the dataset containing references to the dataset regions.
The datatype #H5T_STD_REF_DSETREG must be used during the read operation.</li>
<li>Use #H5Rdereference to obtain the dataset identifier from the read dataset region reference.
OR
Use #H5Rget_region to obtain the dataspace identifier for the dataset containing the selection from the read dataset region reference.
</li>
<li>With the dataspace identifier, the \ref H5S interface functions, H5Sget_select_*,
can be used to obtain information about the selection.</li>
<li>Close all objects when they are no longer needed.</li>
</ol>
The dataset containing the region references is read by #H5Dread with the #H5T_STD_REF_DSETREG datatype specified.
A reference that has been read can be used to obtain the dataset identifier by calling #H5Rdereference, or to
obtain spatial information (dataspace and selection) by calling #H5Rget_region.
The reference to the dataset region has information for both the dataset itself and its selection. In both functions:
\li The first parameter is an identifier of the dataset with the region references.
\li The second parameter specifies the type of reference stored. In this example, a reference to the dataset region is stored.
\li The third parameter is a buffer containing the reference of the specified type.
This example introduces several H5Sget_select_* functions used to obtain information about selections:
<table>
<caption>H5Sget_select_* functions for querying selections</caption>
<tr>
<th><strong>Function</strong></th>
<th><strong>Description</strong></th>
</tr>
<tr>
<td>#H5Sget_select_npoints</td>
<td>Returns the number of elements in the selection</td>
</tr>
<tr>
<td>#H5Sget_select_hyper_nblocks</td>
<td>Returns the number of blocks in the hyperslab</td>
</tr>
<tr>
<td>#H5Sget_select_hyper_blocklist</td>
<td>Returns the "lower left" and "upper right" coordinates of the blocks in the hyperslab selection</td>
</tr>
<tr>
<td>#H5Sget_select_bounds</td>
<td>Returns the coordinates of the "minimal" block containing a hyperslab selection</td>
</tr>
<tr>
<td>#H5Sget_select_elem_npoints</td>
<td>Returns the number of points in the element selection</td>
</tr>
<tr>
<td>#H5Sget_select_elem_pointlist</td>
<td>Returns the coordinates of points in the element selection</td>
</tr>
</table>
\subsection subsecLBDtypeSpecStr String
A simple example of creating a derived datatype is using the string datatype,
#H5T_C_S1 (#H5T_FORTRAN_S1) to create strings of more than one character. Strings
can be stored as either fixed or variable length, and may have different rules
for padding of unused storage.
\subsubsection subsecLBDtypeSpecStrFix Fixed Length 5-character String Datatype
\code
hid_t strtype; /* Datatype ID */
herr_t status;
strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, 5); /* create string of length 5 */
\endcode
\subsubsection subsecLBDtypeSpecStrVar Variable Length String Datatype
\code
strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, H5T_VARIABLE);
\endcode
The ability to derive datatypes from pre-defined types allows users to create any number of datatypes,
from simple to very complex.
As the term implies, variable length strings are strings of varying lengths. They are stored internally
in a heap, potentially impacting efficiency in the following ways:
\li Heap storage requires more space than regular raw data storage.
\li Heap access generally reduces I/O efficiency because it requires individual read or write operations
for each data element rather than one read or write per dataset or per data selection.
\li A variable length dataset consists of pointers to the heaps of data, not the actual data. Chunking
and filters, including compression, are not available for heaps.
See \ref subsubsec_datatype_other_strings in the \ref UG, for more information on how fixed and variable
length strings are stored.
\subsection subsecLBDtypeSpecVL Variable Length
Variable-length (VL) datatypes are sequences of an existing datatype (atomic, VL, or compound)
which are not fixed in length from one dataset location to another. In essence, they are similar
to C character strings -- a sequence of a type which is pointed to by a particular type of
pointer -- although they are implemented more closely to FORTRAN strings by including an explicit
length in the pointer instead of using a particular value to terminate the sequence.
VL datatypes are useful to the scientific community in many different ways, some of which are listed below:
<ul>
<li>Ragged arrays: Multi-dimensional ragged arrays can be implemented with the last (fastest changing)
dimension being ragged by using a VL datatype as the type of the element stored. (Or as a field in a compound datatype.)
</li>
<li>Fractal arrays: If a compound datatype has a VL field of another compound type with VL fields
(a nested VL datatype), this can be used to implement ragged arrays of ragged arrays, to whatever
nesting depth is required for the user.
</li>
<li>Polygon lists: A common storage requirement is to efficiently store arrays of polygons with
different numbers of vertices. VL datatypes can be used to efficiently and succinctly describe an
array of polygons with different numbers of vertices.
</li>
<li>Character strings: Perhaps the most common use of VL datatypes will be to store C-like VL character
strings in dataset elements or as attributes of objects.
</li>
<li>Indices: An array of VL object references could be used as an index to all the objects in a file
which contain a particular sequence of dataset values. Such an array might look like the following:
\code
Value1: Object1, Object3, Object9
Value2: Object0, Object12, Object14, Object21, Object22
Value3: Object2
Value4: <none>
Value5: Object1, Object10, Object12
.
.
\endcode
</li>
<li>Object Tracking: An array of VL dataset region references can be used as a method of tracking
objects or features appearing in a sequence of datasets. Such an array might look like the following:
\code
Feature1: Dataset1:Region, Dataset3:Region, Dataset9:Region
Feature2: Dataset0:Region, Dataset12:Region, Dataset14:Region,
Dataset21:Region, Dataset22:Region
Feature3: Dataset2:Region
Feature4: <none>
Feature5: Dataset1:Region, Dataset10:Region, Dataset12:Region
.
.
\endcode
</li>
</ul>
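As a rough illustration of how such a ragged array is laid out in memory, the sketch below represents each element as a length plus a pointer. The <code>vl_elem_t</code> struct is a hypothetical stand-in with the same two fields as HDF5's <code>hvl_t</code>, so the example compiles without the HDF5 headers; a real program would use <code>hvl_t</code> directly. Plain integers stand in for the object references of the "Indices" example.

```c
#include <stddef.h> /* size_t, NULL */

/* Stand-in with the same two fields as HDF5's hvl_t. */
typedef struct {
    size_t len; /* number of base-type elements in this sequence */
    void  *p;   /* pointer to the sequence data */
} vl_elem_t;

/* Total number of base-type elements across all rows of a ragged array. */
size_t ragged_total(const vl_elem_t *rows, size_t nrows)
{
    size_t total = 0;
    for (size_t i = 0; i < nrows; i++)
        total += rows[i].len;
    return total;
}

/* Builds the "Indices" table from the text: five values, each with a
 * different number of object numbers (Value4 has none). */
size_t example_index_total(void)
{
    static int v1[] = {1, 3, 9};
    static int v2[] = {0, 12, 14, 21, 22};
    static int v3[] = {2};
    static int v5[] = {1, 10, 12};
    vl_elem_t  rows[5] = {
        {3, v1}, {5, v2}, {1, v3}, {0, NULL}, {3, v5}
    };
    return ragged_total(rows, 5);
}
```

Because each row carries its own length, an empty row needs no sentinel value: it is simply a zero length with a null pointer.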
\subsubsection subsubsecLBDtypeSpecVLMem Variable-length datatype memory management
With each element possibly being of different sequence lengths for a dataset with a VL datatype,
the memory for the VL datatype must be dynamically allocated. Currently there are two methods
of managing the memory for VL datatypes: the standard C malloc/free memory allocation routines
or a method of calling user-defined memory management routines to allocate or free memory. Since
the memory allocated when reading (or writing) may be complicated to release, an HDF5 routine is
provided to traverse a memory buffer and free the VL datatype information without leaking memory.
\subsubsection subsubsecLBDtypeSpecVLDiv Variable-length datatypes cannot be divided
VL datatypes are designed so that they cannot be subdivided by the library with selections, etc.
This design was chosen due to the complexities in specifying selections on each VL element of a
dataset through a selection API that is easy to understand. Also, the selection APIs work on
dataspaces, not on datatypes. At some point in time, we may want to create a way for dataspaces
to have VL components to them and we would need to allow selections of those VL regions, but
that is beyond the scope of this document.
\subsubsection subsubsecLBDtypeSpecVLErr What happens if the library runs out of memory while reading?
It is possible for a call to #H5Dread to fail while reading in VL datatype information if the memory
required exceeds that which is available. In this case, the #H5Dread call will fail gracefully and any
VL data which has been allocated prior to the memory shortage will be returned to the system via the
memory management routines detailed below. It may be possible to design a partial read API function
at a later date, if demand for such a function warrants.
\subsubsection subsubsecLBDtypeSpecVLStr Strings as variable-length datatypes
Since character strings are a special case of VL data that is implemented in many different ways on
different machines and in different programming languages, they are handled somewhat differently from
other VL datatypes in HDF5.
HDF5 has native VL strings for each language API, which are stored the same way on disk, but are
exported through each language API in a natural way for that language. When retrieving VL strings
from a dataset, users may choose to have them stored in memory as a native VL string or in HDF5's
#hvl_t struct for VL datatypes.
VL strings may be created in one of two ways: either by creating a VL datatype with a character base type
such as #H5T_C_S1, or by creating a string datatype and setting its size to #H5T_VARIABLE. The second
method is used to access native VL strings in memory. The library will convert between the two types,
but they are stored on disk using different datatypes and have different memory representations.
Multi-byte character representations, such as \em UNICODE or \em wide characters in C/C++, will need the
appropriate character and string datatypes created so that they can be described properly through
the datatype API. Additional conversions between these types and the current ASCII characters
will also be required.
Variable-width character strings (which might be compressed data or some other encoding) are not
currently handled by this design. We will evaluate how to implement them based on user feedback.
\subsubsection subsubsecLBDtypeSpecVLAPIs Variable-length datatype APIs
<h4>Creation</h4>
VL datatypes are created with the #H5Tvlen_create function as follows:
\code
hid_t H5Tvlen_create(hid_t base_type_id);
\endcode
The base datatype is the datatype that the sequence is composed of: characters for character
strings, vertex coordinates for polygon lists, and so on. The base datatype specified for the VL datatype
can be any HDF5 datatype, including another VL datatype, a compound datatype, or an atomic datatype.
<h4>Querying base datatype of VL datatype</h4>
It may be necessary to know the base datatype of a VL datatype before memory is allocated, etc.
The base datatype is queried with the #H5Tget_super function, described in the \ref H5T documentation.
<h4>Querying minimum memory required for VL information</h4>
In order to predict the memory usage that #H5Dread may need to allocate to store VL data while
reading the data, the #H5Dvlen_get_buf_size function is provided:
\code
herr_t H5Dvlen_get_buf_size(hid_t dataset_id, hid_t type_id, hid_t space_id, hsize_t *size);
\endcode
This routine checks the number of bytes required to store the VL data from the dataset, using
the \em space_id for the selection in the dataset on disk and the \em type_id for the memory representation
of the VL data in memory. The value pointed to by \em size is set to the number of bytes required
to store the VL data in memory.
<h4>Specifying how to manage memory for the VL datatype</h4>
The memory management method is determined by the dataset transfer property list passed into the
#H5Dread and #H5Dwrite functions.
Default memory management is set by using #H5P_DEFAULT for the dataset transfer
property list identifier. If #H5P_DEFAULT is used with #H5Dread, the system \em malloc and \em free
calls will be used for allocating and freeing memory. In such a case, #H5P_DEFAULT should
also be passed as the property list identifier to #H5Dvlen_reclaim.
The rest of this subsection is relevant only to those who choose not to use default memory management.
The user can choose whether to use the system \em malloc and \em free calls or user-defined, or custom,
memory management functions. If user-defined memory management functions are to be used, the
memory allocation and free routines must be defined via #H5Pset_vlen_mem_manager(), as follows:
\code
herr_t H5Pset_vlen_mem_manager(hid_t plist_id, H5MM_allocate_t alloc, void *alloc_info, H5MM_free_t free, void *free_info);
\endcode
The \em alloc and \em free parameters identify the memory management routines to be used. If the user
has defined custom memory management routines, \em alloc and/or \em free should be set to those
routines (i.e., the name of the routine is used as the value of the parameter); if the user
prefers to use the system's \em malloc and/or \em free, the \em alloc and \em free parameters, respectively,
should be set to \em NULL.
The prototypes for the user-defined functions would appear as follows:
\code
typedef void *(*H5MM_allocate_t)(size_t size, void *alloc_info);
typedef void (*H5MM_free_t)(void *mem, void *free_info);
\endcode
The \em alloc_info and \em free_info parameters can be used to pass along any required information to
the user's memory management routines.
In summary, if the user has defined custom memory management routines, the name(s) of the routines
are passed in the \em alloc and \em free parameters and the custom routines' parameters are passed in the
\em alloc_info and \em free_info parameters. If the user wishes to use the system \em malloc and \em free functions,
the \em alloc and/or \em free parameters are set to \em NULL and the \em alloc_info and \em free_info parameters are ignored.
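As a minimal sketch of what such custom routines might look like, the example below implements a pair of callbacks that count allocations. The local typedefs have the same shape as the <code>H5MM_allocate_t</code> and <code>H5MM_free_t</code> prototypes quoted above so that the sketch compiles without the HDF5 headers. The names <code>vl_alloc_fn</code>, <code>vl_free_fn</code>, <code>vl_mem_stats_t</code>, <code>counting_alloc</code>, <code>counting_free</code>, and <code>run_counting_demo</code> are hypothetical.

```c
#include <stdlib.h> /* malloc, free, size_t */

/* Same shape as the H5MM_allocate_t and H5MM_free_t prototypes. */
typedef void *(*vl_alloc_fn)(size_t size, void *alloc_info);
typedef void (*vl_free_fn)(void *mem, void *free_info);

/* Hypothetical bookkeeping passed through alloc_info and free_info. */
typedef struct {
    size_t live_blocks; /* blocks allocated and not yet freed */
    size_t peak_blocks; /* highest value live_blocks has reached */
    size_t total_bytes; /* sum of all requested sizes */
} vl_mem_stats_t;

void *counting_alloc(size_t size, void *alloc_info)
{
    vl_mem_stats_t *stats = alloc_info;
    void           *mem   = malloc(size);
    if (mem != NULL && stats != NULL) {
        stats->live_blocks++;
        stats->total_bytes += size;
        if (stats->live_blocks > stats->peak_blocks)
            stats->peak_blocks = stats->live_blocks;
    }
    return mem;
}

void counting_free(void *mem, void *free_info)
{
    vl_mem_stats_t *stats = free_info;
    if (mem != NULL && stats != NULL)
        stats->live_blocks--;
    free(mem);
}

/* Drives the callbacks the way a library might: two allocations, two frees. */
vl_mem_stats_t run_counting_demo(void)
{
    vl_mem_stats_t stats   = {0, 0, 0};
    vl_alloc_fn    alloc   = counting_alloc;
    vl_free_fn     release = counting_free;
    void          *a       = alloc(16, &stats);
    void          *b       = alloc(32, &stats);
    release(a, &stats);
    release(b, &stats);
    return stats;
}
```

With HDF5, the same two functions could be registered on a dataset transfer property list with <code>H5Pset_vlen_mem_manager(plist_id, counting_alloc, &stats, counting_free, &stats)</code>, and that property list would then be passed to both #H5Dread and #H5Dvlen_reclaim.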
<h4>Recovering memory from VL buffers read in</h4>
The complex memory buffers created for a VL datatype may be reclaimed with the #H5Dvlen_reclaim
function call, as follows:
\code
herr_t H5Dvlen_reclaim(hid_t type_id, hid_t space_id, hid_t plist_id, void *buf);
\endcode
The \em type_id must be the datatype stored in the buffer, \em space_id describes the selection for the
memory buffer to free the VL datatypes within, \em plist_id is the dataset transfer property list
which was used for the I/O transfer to create the buffer, and \em buf is the pointer to the buffer
to free the VL memory within. The VL structures (#hvl_t) in the user's buffer are modified to zero
out the VL information after it has been freed.
If nested VL datatypes were used to create the buffer, this routine frees them from the bottom up,
releasing all the memory without creating memory leaks.
<hr>
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
*/