mirror of
https://github.com/HDFGroup/hdf5.git
synced 2024-12-21 07:51:46 +08:00
/** @page LBGrpCreate Creating a Group
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
<hr>
|
|
|
|
\section secLBGrpCreate Creating a Group
|
|
An HDF5 group is a structure containing zero or more HDF5 objects. The two primary HDF5 objects are groups and datasets. To create a group, the calling program must:
|
|
<ol>
|
|
<li>Obtain the location identifier where the group is to be created.</li>
|
|
<li>Create the group.</li>
|
|
<li>Close the group.</li>
|
|
</ol>
|
|
|
|
To create a group, the calling program must call #H5Gcreate.
|
|
To close the group, #H5Gclose must be called. The close call is mandatory.
|
|
|
|
For example:
|
|
|
|
<em>C</em>
|
|
\code
|
|
group_id = H5Gcreate(file_id, "/MyGroup", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
|
|
status = H5Gclose (group_id);
|
|
\endcode
|
|
|
|
<em>Fortran</em>
|
|
\code
|
|
CALL h5gcreate_f (loc_id, name, group_id, error)
|
|
CALL h5gclose_f (group_id, error)
|
|
\endcode
|
|
|
|
\section secLBGrpCreateRWEx Programming Example
|
|
|
|
\subsection secLBGrpCreateRWExDesc Description
|
|
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
|
|
|
|
The example shows how to create and close a group. It creates a file called <code style="background-color:whitesmoke;">group.h5</code> in C
|
|
(<code style="background-color:whitesmoke;">groupf.h5</code> for FORTRAN), creates a group called MyGroup in the root group, and then closes the group and file.
|
|
|
|
For details on compiling an HDF5 application:
|
|
[ \ref LBCompiling ]
|
|
|
|
\subsection secLBGrpCreateRWExCont File Contents
|
|
|
|
Shown below are the contents of <code style="background-color:whitesmoke;">group.h5</code> (created by the C program) and the definition of the group it contains.
|
|
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupf.h5</code> and the resulting DDL shows the filename
|
|
<code style="background-color:whitesmoke;">groupf.h5</code> in the first line.)
|
|
<table>
|
|
<caption>The Contents of group.h5.</caption>
|
|
<tr>
|
|
<td>
|
|
\image html imggrpcreate.gif
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<em>group.h5 in DDL</em>
|
|
\code
|
|
HDF5 "group.h5" {
|
|
GROUP "/" {
|
|
GROUP "MyGroup" {
|
|
}
|
|
}
|
|
}
|
|
\endcode
|
|
|
|
<hr>
|
|
Previous Chapter \ref LBAttrCreate - Next Chapter \ref LBGrpCreateNames
|
|
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
|
|
@page LBGrpCreateNames Creating Groups using Absolute and Relative Names
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
<hr>
|
|
|
|
Recall that to create an HDF5 object, we have to specify the location where the object is to be created.
|
|
This location is determined by the identifier of an HDF5 object and the name of the object to be created.
|
|
The name of the created object can be either an absolute name or a name relative to the specified identifier.
|
|
In the previous example, we used the file identifier and the absolute name <code style="background-color:whitesmoke;">/MyGroup</code> to create a group.
|
|
|
|
In this section, we discuss HDF5 names and show how to use absolute and relative names.
|
|
|
|
\section secLBGrpCreateNames Names
|
|
HDF5 object names are a slash-separated list of components. There are few restrictions on names: component
|
|
names may be any length except zero and may contain any character except slash (<code style="background-color:whitesmoke;">/</code>) and the null terminator.
|
|
A full name may be composed of any number of component names separated by slashes, with any of the component
|
|
names being the special name <code style="background-color:whitesmoke;">.</code> (a dot or period). A name which begins with a slash is an <em>absolute name</em> which
|
|
is accessed beginning with the root group of the file; all other names are <em>relative names</em> and the named
|
|
object is accessed beginning with the specified group. A special case is the name <code style="background-color:whitesmoke;">/</code> (or equivalent) which
|
|
refers to the root group.
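The naming rules above can be sketched in plain C, without calling the HDF5 library. The helper names <code style="background-color:whitesmoke;">name_is_absolute</code> and <code style="background-color:whitesmoke;">name_component_count</code> are hypothetical and are not part of the HDF5 API. The sketch also assumes that a run of consecutive slashes acts as a single separator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Returns true when the name starts at the root group of the file.
bool name_is_absolute(const char *name)
{
    assert(name != NULL);
    return name[0] == '/';
}

// Counts the components of a slash-separated name. The special
// component "." names the current location and is not counted.
size_t name_component_count(const char *name)
{
    assert(name != NULL);
    size_t count = 0;
    const char *p = name;

    while (*p != '\0') {
        // Skip a run of separators (assumed to act as one slash).
        while (*p == '/')
            p++;
        // Length of the next component, up to the next slash or the end.
        size_t len = strcspn(p, "/");
        if (len == 0)
            break;
        if (!(len == 1 && p[0] == '.'))
            count++;
        p += len;
    }
    return count;
}
```

With these helpers, <code style="background-color:whitesmoke;">/foo/bar</code> is an absolute name with two components, while <code style="background-color:whitesmoke;">/</code> and <code style="background-color:whitesmoke;">.</code> have none because they refer to a location rather than to an object inside it.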
|
|
|
|
Functions which operate on names generally take a location identifier, which can be either a file identifier
|
|
or a group identifier, and perform the lookup with respect to that location. Several possibilities are
|
|
described in the following table:
|
|
|
|
<table>
|
|
<tr>
|
|
<th><strong>Location Type</strong></th>
|
|
<th><strong>Object Name</strong></th>
|
|
<th><strong>Description</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>File identifier</strong></th>
|
|
<td>/foo/bar</td>
|
|
<td>The object bar in group foo in the root group.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>Group identifier</strong></th>
|
|
<td>/foo/bar</td>
|
|
<td>The object bar in group foo in the root group of the file containing the specified group.
|
|
In other words, the group identifier's only purpose is to specify a file.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>File identifier</strong></th>
|
|
<td>/</td>
|
|
<td>The root group of the specified file.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>Group identifier</strong></th>
|
|
<td>/</td>
|
|
<td>The root group of the file containing the specified group.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>Group identifier</strong></th>
|
|
<td>foo/bar</td>
|
|
<td>The object bar in group foo in the specified group.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>File identifier</strong></th>
|
|
<td>.</td>
|
|
<td>The root group of the file.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>Group identifier</strong></th>
|
|
<td>.</td>
|
|
<td>The specified group.</td>
|
|
</tr>
|
|
<tr>
|
|
<th><strong>Other identifier</strong></th>
|
|
<td>.</td>
|
|
<td>The specified object.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
\section secLBGrpCreateNamesEx Programming Example
|
|
|
|
\subsection secLBGrpCreateNamesExDesc Description
|
|
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
|
|
|
|
The example code shows how to create groups using absolute and relative names. It creates three groups: the first two groups are created using
|
|
the file identifier and the group absolute names while the third group is created using a group identifier and a name relative to the specified group.
|
|
|
|
For details on compiling an HDF5 application:
|
|
[ \ref LBCompiling ]
|
|
|
|
\subsection secLBGrpCreateNamesExRem Remarks
|
|
#H5Gcreate creates a group at the location specified by a location identifier and a name. The location identifier
|
|
can be a file identifier or a group identifier and the name can be relative or absolute.
|
|
|
|
The first #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">MyGroup</code> in the root group of the specified file.
|
|
|
|
The second #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">Group_A</code> in the group <code style="background-color:whitesmoke;">MyGroup</code> in the root group of the specified
|
|
file. Note that the parent group (<code style="background-color:whitesmoke;">MyGroup</code>) already exists.
|
|
|
|
The third #H5Gcreate/h5gcreate_f creates the group <code style="background-color:whitesmoke;">Group_B</code> in the specified group.
|
|
|
|
\subsection secLBGrpCreateNamesExCont File Contents
|
|
|
|
Shown below are the contents of <code style="background-color:whitesmoke;">groups.h5</code> (created by the C program) and the definition of its groups.
|
|
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupsf.h5</code> and the resulting DDL shows the filename
|
|
<code style="background-color:whitesmoke;">groupsf.h5</code> in the first line.)
|
|
<table>
|
|
<caption>The Contents of groups.h5.</caption>
|
|
<tr>
|
|
<td>
|
|
\image html imggrps.gif
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<em>groups.h5 in DDL</em>
|
|
\code
|
|
HDF5 "groups.h5" {
|
|
GROUP "/" {
|
|
GROUP "MyGroup" {
|
|
GROUP "Group_A" {
|
|
}
|
|
GROUP "Group_B" {
|
|
}
|
|
}
|
|
}
|
|
}
|
|
\endcode
|
|
|
|
<hr>
|
|
Previous Chapter \ref LBGrpCreate - Next Chapter \ref LBGrpDset
|
|
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
|
|
@page LBGrpDset Creating Datasets in Groups
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
<hr>
|
|
|
|
\section secLBGrpDset Datasets in Groups
|
|
We have shown how to create groups, datasets, and attributes. In this section, we show how to create
|
|
datasets in groups. Recall that #H5Dcreate creates a dataset at the location specified by a location
|
|
identifier and a name. Similar to #H5Gcreate, the location identifier can be a file identifier or a
|
|
group identifier and the name can be relative or absolute. The location identifier and the name
|
|
together determine the location where the dataset is to be created. If the location identifier and
|
|
name refer to a group, then the dataset is created in that group.
|
|
|
|
\section secLBGrpDsetEx Programming Example
|
|
|
|
\subsection secLBGrpDsetExDesc Description
|
|
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
|
|
|
|
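The example shows how to create a dataset in a particular group. It opens the file created in the previous example and creates two datasets: <code style="background-color:whitesmoke;">dset1</code> in the group <code style="background-color:whitesmoke;">MyGroup</code> and <code style="background-color:whitesmoke;">dset2</code> in the group <code style="background-color:whitesmoke;">MyGroup/Group_A</code>.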
|
|
|
|
For details on compiling an HDF5 application:
|
|
[ \ref LBCompiling ]
|
|
|
|
\subsection secLBGrpDsetExCont File Contents
|
|
|
|
Shown below are the contents of <code style="background-color:whitesmoke;">groups.h5</code> (created by the C program) and the definition of its groups and datasets.
|
|
(The FORTRAN program creates the HDF5 file <code style="background-color:whitesmoke;">groupsf.h5</code> and the resulting DDL shows the filename
|
|
<code style="background-color:whitesmoke;">groupsf.h5</code> in the first line.)
|
|
<table>
|
|
<caption>The contents of the file groups.h5 (groupsf.h5 for FORTRAN)</caption>
|
|
<tr>
|
|
<td>
|
|
\image html imggrpdsets.gif
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<em>groups.h5 in DDL</em>
|
|
\code
|
|
HDF5 "groups.h5" {
|
|
GROUP "/" {
|
|
GROUP "MyGroup" {
|
|
GROUP "Group_A" {
|
|
DATASET "dset2" {
|
|
DATATYPE { H5T_STD_I32BE }
|
|
DATASPACE { SIMPLE ( 2, 10 ) / ( 2, 10 ) }
|
|
DATA {
|
|
1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
|
|
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
|
|
}
|
|
}
|
|
}
|
|
GROUP "Group_B" {
|
|
}
|
|
DATASET "dset1" {
|
|
DATATYPE { H5T_STD_I32BE }
|
|
DATASPACE { SIMPLE ( 3, 3 ) / ( 3, 3 ) }
|
|
DATA {
|
|
1, 2, 3,
|
|
1, 2, 3,
|
|
1, 2, 3
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
\endcode
|
|
|
|
<em>groupsf.h5 in DDL</em>
|
|
\code
|
|
HDF5 "groupsf.h5" {
|
|
GROUP "/" {
|
|
GROUP "MyGroup" {
|
|
GROUP "Group_A" {
|
|
DATASET "dset2" {
|
|
DATATYPE { H5T_STD_I32BE }
|
|
DATASPACE { SIMPLE ( 10, 2 ) / ( 10, 2 ) }
|
|
DATA {
|
|
1, 1,
|
|
2, 2,
|
|
3, 3,
|
|
4, 4,
|
|
5, 5,
|
|
6, 6,
|
|
7, 7,
|
|
8, 8,
|
|
9, 9,
|
|
10, 10
|
|
}
|
|
}
|
|
}
|
|
GROUP "Group_B" {
|
|
}
|
|
DATASET "dset1" {
|
|
DATATYPE { H5T_STD_I32BE }
|
|
DATASPACE { SIMPLE ( 3, 3 ) / ( 3, 3 ) }
|
|
DATA {
|
|
1, 1, 1,
|
|
2, 2, 2,
|
|
3, 3, 3
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
\endcode
|
|
|
|
<hr>
|
|
Previous Chapter \ref LBGrpCreateNames - Next Chapter \ref LBDsetSubRW
|
|
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
|
|
@page LBDsetSubRW Reading From or Writing To a Subset of a Dataset
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
<hr>
|
|
|
|
\section secLBDsetSubRW Dataset Subsets
|
|
There are two ways that you can select a subset in an HDF5 dataset and read or write to it:
|
|
<ul><li>
|
|
<strong>Hyperslab Selection</strong>: The #H5Sselect_hyperslab call selects a logically contiguous
|
|
collection of points in a dataspace, or a regular pattern of points or blocks in a dataspace.
|
|
</li><li>
|
|
<strong>Element Selection</strong>: The #H5Sselect_elements call selects elements in an array.
|
|
</li></ul>
|
|
|
|
HDF5 allows you to read from or write to a portion or subset of a dataset by:
|
|
\li Selecting a Subset of the Dataset's Dataspace,
|
|
\li Selecting a Memory Dataspace,
|
|
\li Reading From or Writing to a Dataset Subset.
|
|
|
|
\section secLBDsetSubRWSel Selecting a Subset of the Dataset's Dataspace
|
|
First you must obtain the dataspace of a dataset in a file by calling #H5Dget_space.
|
|
|
|
Then select a subset of that dataspace by calling #H5Sselect_hyperslab. The <em>offset</em>, <em>count</em>, <em>stride</em>
|
|
and <em>block</em> parameters of this API define the shape and size of the selection. They must be arrays
|
|
with the same number of dimensions as the rank of the dataset's dataspace. These arrays <strong>ALL</strong> work
|
|
together to define a selection. A change to one of these arrays can affect the others.
|
|
\li \em offset: An array that specifies the offset of the starting element of the specified hyperslab.
|
|
\li \em count: An array that determines how many blocks to select from the dataspace in each dimension. If the block
|
|
size for a dimension is one then the count is the number of elements along that dimension.
|
|
\li \em stride: An array that allows you to sample elements along a dimension. For example, a stride of one (or NULL)
|
|
will select every element along a dimension, a stride of two will select every other element, and a stride of three
|
|
will select an element after every two elements.
|
|
\li \em block: An array that determines the size of the element block selected from a dataspace. If the block size
|
|
is one or NULL then the block size is a single element in that dimension.
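How the four arrays combine can be sketched with a little arithmetic in plain C. The following is an illustration rather than library code: the type <code style="background-color:whitesmoke;">extent_t</code> stands in for hsize_t, and the three function names are hypothetical. Along each dimension the selection ends at offset + (count - 1) * stride + block, and it holds count * block elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long extent_t; // stand-in for hsize_t in this sketch

// One past the last index the selection touches along a single dimension.
extent_t hyperslab_end(extent_t offset, extent_t count,
                       extent_t stride, extent_t block)
{
    assert(count >= 1 && stride >= 1 && block >= 1);
    return offset + (count - 1) * stride + block;
}

// Total number of selected elements: the product of count * block
// over all dimensions.
extent_t hyperslab_npoints(size_t rank, const extent_t count[],
                           const extent_t block[])
{
    extent_t total = 1;
    for (size_t i = 0; i < rank; i++)
        total *= count[i] * block[i];
    return total;
}

// True when the selection stays inside dims[] in every dimension.
bool hyperslab_fits(size_t rank, const extent_t dims[],
                    const extent_t offset[], const extent_t count[],
                    const extent_t stride[], const extent_t block[])
{
    for (size_t i = 0; i < rank; i++)
        if (hyperslab_end(offset[i], count[i], stride[i], block[i]) > dims[i])
            return false;
    return true;
}
```

For an 8 x 10 dataspace with an offset of 1 x 2 and stride and block of one, a count of 7 x 4 ends exactly at the edge of dimension 0, while a count of 8 x 4 would end one element past it.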
|
|
|
|
\section secLBDsetSubRWMem Selecting a Memory Dataspace
|
|
You must select a memory dataspace in addition to a file dataspace before you can read a subset from or write a subset
|
|
to a dataset. A memory dataspace can be specified by calling #H5Screate_simple.
|
|
|
|
The selection in the memory dataspace passed to the read or write call must contain the same number of elements as the selection in the file dataspace.
|
|
The number of elements in a dataspace selection can be determined with the #H5Sget_select_npoints API.
|
|
|
|
\section secLBDsetSubRWSub Reading From or Writing To a Dataset Subset
|
|
To read from or write to a dataset subset, the #H5Dread and #H5Dwrite routines are used. The memory and file dataspace
|
|
identifiers from the selections that were made are passed into the read or write call. For example (C):
|
|
\code
|
|
status = H5Dwrite (.., .., memspace_id, dataspace_id, .., ..);
|
|
\endcode
|
|
|
|
\section secLBDsetSubRWProg Programming Example
|
|
|
|
\subsection subsecLBDsetSubRWProgDesc Description
|
|
See \ref LBExamples for the examples used in the \ref LearnBasics tutorial.
|
|
|
|
The example creates an 8 x 10 integer dataset in an HDF5 file. It then selects and writes to a 3 x 4 subset
|
|
of the dataset created with the dimensions offset by 1 x 2. (If using Fortran, the dimensions will be swapped.
|
|
The dataset will be 10 x 8, the subset will be 4 x 3, and the offset will be 2 x 1.)
|
|
|
|
PLEASE NOTE that the examples and images below were created using C.
|
|
|
|
The following image shows the dataset that gets written originally, and the subset of data that gets modified
|
|
afterwards. Dimension 0 is vertical and Dimension 1 is horizontal as shown below:
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html LBDsetSubRWProg.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
The subset on the right above is created using these values for offset, count, stride, and block:
|
|
\code
|
|
offset = {1, 2}
|
|
|
|
count = {3, 4}
|
|
|
|
stride = {1, 1}
|
|
|
|
block = {1, 1}
|
|
\endcode
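To see which cells a given set of values selects, the selection can be enumerated on a small in-memory grid. The sketch below does not use the HDF5 library. The function name <code style="background-color:whitesmoke;">mark_hyperslab</code> and the fixed 8 x 10 grid are assumptions chosen to match this example.

```c
#include <assert.h>
#include <string.h>

#define GRID_DIM0 8
#define GRID_DIM1 10

// Sets grid cells to 1 where a two-dimensional hyperslab selects them and
// to 0 elsewhere. Returns the number of selected cells, or -1 when the
// selection would extend past the grid.
int mark_hyperslab(int grid[GRID_DIM0][GRID_DIM1],
                   const int offset[2], const int count[2],
                   const int stride[2], const int block[2])
{
    for (int d = 0; d < 2; d++)
        assert(offset[d] >= 0 && count[d] >= 1 &&
               stride[d] >= 1 && block[d] >= 1);

    memset(grid, 0, sizeof(int) * GRID_DIM0 * GRID_DIM1);

    // Reject a selection that leaves the grid in either dimension.
    if (offset[0] + (count[0] - 1) * stride[0] + block[0] > GRID_DIM0 ||
        offset[1] + (count[1] - 1) * stride[1] + block[1] > GRID_DIM1)
        return -1;

    int marked = 0;
    // Visit every block, then every element inside that block.
    for (int ci = 0; ci < count[0]; ci++)
        for (int bi = 0; bi < block[0]; bi++)
            for (int cj = 0; cj < count[1]; cj++)
                for (int bj = 0; bj < block[1]; bj++) {
                    int row = offset[0] + ci * stride[0] + bi;
                    int col = offset[1] + cj * stride[1] + bj;
                    if (grid[row][col] == 0) {
                        grid[row][col] = 1;
                        marked++;
                    }
                }
    return marked;
}
```

With the values listed above, the function marks rows 1 through 3 and columns 2 through 5, which is the 3 x 4 subset that the example program writes.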
|
|
|
|
\subsection subsecLBDsetSubRWProgExper Experiments with Different Selections
|
|
Following are examples of changes that can be made to the example code provided to better understand
|
|
how to make selections.
|
|
|
|
\subsubsection subsubsecLBDsetSubRWProgExperOne Example 1
|
|
By default the example code will select and write to a 3 x 4 subset. You can modify the count
|
|
parameter in the example code to select a different subset, by changing the value of
|
|
DIM0_SUB (C, C++) / dim0_sub (Fortran) near the top. Change its value to 7 to create a 7 x 4 subset:
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html imgLBDsetSubRW11.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
If you were to change the subset to 8 x 4, the selection would be beyond the extent of the dimension:
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html imgLBDsetSubRW12.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
The write will fail with the error: "<strong>file selection+offset not within extent</strong>"
|
|
|
|
\subsubsection subsubsecLBDsetSubRWProgExperTwo Example 2
|
|
In the example code provided, the memory and file dataspaces passed to the H5Dwrite call have the
|
|
same size, 3 x 4 (DIM0_SUB x DIM1_SUB). Change the size of the memory dataspace to be 4 x 4 so that
|
|
they do not match, and then compile:
|
|
\code
|
|
dimsm[0] = DIM0_SUB + 1;
|
|
dimsm[1] = DIM1_SUB;
|
|
memspace_id = H5Screate_simple (RANK, dimsm, NULL);
|
|
\endcode
|
|
The code will fail with the error: "<strong>src and dest data spaces have different sizes</strong>"
|
|
|
|
How many elements are in the memory and file dataspaces that were specified above? Add these lines:
|
|
\code
|
|
hssize_t size;
|
|
|
|
/* Just before H5Dwrite call the following */
|
|
size = H5Sget_select_npoints (memspace_id);
|
|
printf ("\nmemspace_id size: %lld\n", (long long)size);
|
|
size = H5Sget_select_npoints (dataspace_id);
|
|
printf ("dataspace_id size: %lld\n", (long long)size);
|
|
\endcode
|
|
|
|
You should see these lines followed by the error:
|
|
\code
|
|
memspace_id size: 16
|
|
dataspace_id size: 12
|
|
\endcode
|
|
|
|
\subsubsection subsubsecLBDsetSubRWProgExperThree Example 3
|
|
This example shows the selection that results when you change the values of the <em>offset</em>, <em>count</em>,
|
|
<em>stride</em> and <em>block</em> parameters in the example code.
|
|
|
|
This will select two blocks. The <em>count</em> array specifies the number of blocks. The <em>block</em> array
|
|
specifies the size of a block. The <em>stride</em> must be modified to accommodate the block <em>size</em>.
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html imgLBDsetSubRW31.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
Now try modifying the count as shown below. The write will fail because the selection goes beyond the extent of the dimension:
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html imgLBDsetSubRW32.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
If the offset were 1 x 1 (instead of 1 x 2), then the selection could be made:
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
\image html imgLBDsetSubRW33.png
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
The selections above were tested with the
|
|
<a href="https://\AEXURL/howto/subset/h5_subsetbk.c">h5_subsetbk.c</a>
|
|
example code. The memory dataspace was defined as one-dimensional.
|
|
|
|
\subsection subsecLBDsetSubRWProgRem Remarks
|
|
\li In addition to #H5Sselect_hyperslab, this example introduces the #H5Dget_space call to obtain the dataspace of a dataset.
|
|
\li If using the default values for the stride and block parameters of #H5Sselect_hyperslab, then, for C you can specify NULL
|
|
for these parameters, rather than passing in an array for each, and for Fortran 90 you can omit these parameters.
|
|
|
|
<hr>
|
|
Previous Chapter \ref LBGrpDset - Next Chapter \ref LBDatatypes
|
|
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
|
|
@page LBDatatypes Datatype Basics
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
<hr>
|
|
|
|
\section secLBDtype What is a Datatype?
|
|
A datatype is a collection of datatype properties which provide complete information for data conversion to or from that datatype.
|
|
|
|
Datatypes in HDF5 can be grouped as follows:
|
|
\li <strong>Pre-Defined Datatypes</strong>: These are datatypes that are created by HDF5. They are actually opened
|
|
(and closed) by HDF5, and can have a different value from one HDF5 session to the next.
|
|
\li <strong>Derived Datatypes</strong>: These are datatypes that are created or derived from the pre-defined datatypes.
|
|
Although created from pre-defined types, they represent a category unto themselves. An example of a commonly used derived
|
|
datatype is a string of more than one character.
|
|
|
|
\section secLBDtypePre Pre-defined Datatypes
|
|
The properties of pre-defined datatypes are:
|
|
\li Pre-defined datatypes are opened and closed by HDF5.
|
|
\li A pre-defined datatype is a handle and is NOT PERSISTENT. Its value can be different from one HDF5 session to the next.
|
|
\li Pre-defined datatypes are Read-Only.
|
|
\li As mentioned, other datatypes can be derived from pre-defined datatypes.
|
|
|
|
There are two types of pre-defined datatypes, standard (file) and native.
|
|
|
|
<h4>Standard</h4>
|
|
A standard (or file) datatype can be:
|
|
<ul>
|
|
<li><strong>Atomic</strong>: A datatype which cannot be decomposed into smaller datatype units at the API level.
|
|
The atomic datatypes are:
|
|
<ul>
|
|
<li>integer</li>
|
|
<li>float</li>
|
|
<li>string (1-character)</li>
|
|
<li>date and time</li>
|
|
<li>bitfield</li>
|
|
<li>reference</li>
|
|
<li>opaque</li>
|
|
</ul>
|
|
</li>
|
|
<li><strong>Composite</strong>: An aggregation of one or more datatypes.
|
|
Composite datatypes include:
|
|
<ul>
|
|
<li>array</li>
|
|
<li>variable length</li>
|
|
<li>enumeration</li>
|
|
<li>compound datatypes</li>
|
|
</ul>
|
|
Array, variable length, and enumeration datatypes are defined in terms of a single atomic datatype,
|
|
whereas a compound datatype is a datatype composed of a sequence of datatypes.
|
|
</li>
|
|
</ul>
|
|
|
|
<table>
|
|
<tr>
|
|
<th><strong>Notes</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
\li Standard pre-defined datatypes are the SAME on all platforms.
|
|
\li They are the datatypes that you see in an HDF5 file.
|
|
\li They are typically used when creating a dataset.
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h4>Native</h4>
|
|
Native pre-defined datatypes are used for memory operations, such as reading and writing. They are
|
|
NOT THE SAME on different platforms. They are similar to C type names, and are aliased to the
|
|
appropriate HDF5 standard pre-defined datatype for a given platform.
|
|
|
|
For example, on an Intel-based PC, #H5T_NATIVE_INT is aliased to the standard pre-defined type,
|
|
#H5T_STD_I32LE. On a MIPS machine, it is aliased to #H5T_STD_I32BE.
|
|
<table>
|
|
<tr>
|
|
<th><strong>Notes</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
\li Native datatypes are NOT THE SAME on all platforms.
|
|
\li Native datatypes simplify memory operations (read/write). The HDF5 library automatically converts as needed.
|
|
\li Native datatypes are NOT in an HDF5 File. The standard pre-defined datatype that a native datatype corresponds
|
|
to is what you will see in the file.
|
|
</td>
|
|
</tr>
|
|
</table>
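The dependence on the platform can be illustrated in plain C by inspecting the width and byte order of <code style="background-color:whitesmoke;">int</code> on the machine running the code. The sketch below does not query the HDF5 library. It only returns the name of the standard type that a native <code style="background-color:whitesmoke;">int</code> would be expected to match, and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Returns true when this machine stores the least significant byte of
// an integer first (little-endian).
bool host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Returns the name of the standard (file) datatype that a native int is
// expected to correspond to here, or NULL for a width not covered.
const char *standard_name_for_native_int(void)
{
    // The sketch assumes 8-bit bytes.
    assert(CHAR_BIT == 8);

    size_t bits = sizeof(int) * CHAR_BIT;
    bool little = host_is_little_endian();

    if (bits == 16)
        return little ? "H5T_STD_I16LE" : "H5T_STD_I16BE";
    if (bits == 32)
        return little ? "H5T_STD_I32LE" : "H5T_STD_I32BE";
    if (bits == 64)
        return little ? "H5T_STD_I64LE" : "H5T_STD_I64BE";
    return NULL;
}
```

The result differs between machines, which is the point: code that names the native type stays the same everywhere, while the matching standard type changes with the platform.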
|
|
|
|
<h4>Pre-Defined</h4>
|
|
The following table shows the native types and the standard pre-defined datatypes they correspond
|
|
to. (Keep in mind that HDF5 can convert between datatypes, so you can specify a buffer of a larger
|
|
type for a dataset of a given type. For example, you can read a dataset that has a short datatype
|
|
into a long integer buffer.)
|
|
|
|
<table>
|
|
<caption>Some HDF5 pre-defined native datatypes and corresponding standard (file) type</caption>
|
|
<tr>
|
|
<th><strong>C Type</strong></th>
|
|
<th><strong>HDF5 Memory Type</strong></th>
|
|
<th><strong>HDF5 File Type*</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<th colspan="3"><strong>Integer</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>int</td>
|
|
<td>#H5T_NATIVE_INT</td>
|
|
<td>#H5T_STD_I32BE or #H5T_STD_I32LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>short</td>
|
|
<td>#H5T_NATIVE_SHORT</td>
|
|
<td>#H5T_STD_I16BE or #H5T_STD_I16LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>long</td>
|
|
<td>#H5T_NATIVE_LONG</td>
|
|
<td>#H5T_STD_I32BE, #H5T_STD_I32LE,
|
|
#H5T_STD_I64BE or #H5T_STD_I64LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>long long</td>
|
|
<td>#H5T_NATIVE_LLONG</td>
|
|
<td>#H5T_STD_I64BE or #H5T_STD_I64LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>unsigned int</td>
|
|
<td>#H5T_NATIVE_UINT</td>
|
|
<td>#H5T_STD_U32BE or #H5T_STD_U32LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>unsigned short</td>
|
|
<td>#H5T_NATIVE_USHORT</td>
|
|
<td>#H5T_STD_U16BE or #H5T_STD_U16LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>unsigned long</td>
|
|
<td>#H5T_NATIVE_ULONG</td>
|
|
<td>#H5T_STD_U32BE, #H5T_STD_U32LE,
|
|
#H5T_STD_U64BE or #H5T_STD_U64LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>unsigned long long</td>
|
|
<td>#H5T_NATIVE_ULLONG</td>
|
|
<td>#H5T_STD_U64BE or #H5T_STD_U64LE</td>
|
|
</tr>
|
|
<tr>
|
|
<th colspan="3"><strong>Float</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>_Float16</td>
|
|
<td>#H5T_NATIVE_FLOAT16</td>
|
|
<td>#H5T_IEEE_F16BE or #H5T_IEEE_F16LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>float</td>
|
|
<td>#H5T_NATIVE_FLOAT</td>
|
|
<td>#H5T_IEEE_F32BE or #H5T_IEEE_F32LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>double</td>
|
|
<td>#H5T_NATIVE_DOUBLE</td>
|
|
<td>#H5T_IEEE_F64BE or #H5T_IEEE_F64LE</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<table>
|
|
<caption>Some HDF5 pre-defined native datatypes and corresponding standard (file) type</caption>
|
|
<tr>
|
|
<th><strong>F90 Type</strong></th>
|
|
<th><strong>HDF5 Memory Type</strong></th>
|
|
<th><strong>HDF5 File Type*</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>integer</td>
|
|
<td>H5T_NATIVE_INTEGER</td>
|
|
<td>#H5T_STD_I32BE(8,16) or #H5T_STD_I32LE(8,16)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>real</td>
|
|
<td>H5T_NATIVE_REAL</td>
|
|
<td>#H5T_IEEE_F32BE or #H5T_IEEE_F32LE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>double-precision</td>
|
|
<td>#H5T_NATIVE_DOUBLE</td>
|
|
<td>#H5T_IEEE_F64BE or #H5T_IEEE_F64LE</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<table>
|
|
<tr>
|
|
<td>* Note that the HDF5 File Types listed are those that are most commonly created.
|
|
The file type created depends on the compiler switches and platforms being
|
|
used. For example, on the Cray an integer is 64-bit, and using #H5T_NATIVE_INT (C)
|
|
or H5T_NATIVE_INTEGER (F90) would result in an #H5T_STD_I64BE file type.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
The following code is an example of when you would use standard pre-defined datatypes vs. native types:
|
|
\code
|
|
#include "hdf5.h"
|
|
|
|
int main(void) {
|
|
|
|
hid_t file_id, dataset_id, dataspace_id;
|
|
herr_t status;
|
|
hsize_t dims[2]={4,6};
|
|
int i, j, dset_data[4][6];
|
|
|
|
for (i = 0; i < 4; i++)
|
|
for (j = 0; j < 6; j++)
|
|
dset_data[i][j] = i * 6 + j + 1;
|
|
|
|
file_id = H5Fcreate ("dtypes.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
|
|
|
|
dataspace_id = H5Screate_simple (2, dims, NULL);
|
|
|
|
dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
|
|
H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
|
|
|
|
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
|
|
H5P_DEFAULT, dset_data);
|
|
|
|
status = H5Dclose (dataset_id);
|
|
|
|
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);
return 0;
|
|
}
|
|
\endcode
|
|
By using the native types when reading and writing, the code that reads from or writes to a dataset
|
|
can be the same for different platforms.
|
|
|
|
Can native types also be used when creating a dataset? Yes. However, just be aware that the resulting
|
|
datatype in the file will be one of the standard pre-defined types and may be different than expected.
|
|
|
|
What happens if you do not use the correct native datatype for a standard (file) datatype? Your data
|
|
may be incorrect or not what you expect.
|
|
|
|
\section secLBDtypeDer Derived Datatypes
|
|
ANY pre-defined datatype can be used to derive user-defined datatypes.
|
|
|
|
To create a datatype derived from a pre-defined type:
|
|
<ol>
|
|
<li>Make a copy of the pre-defined datatype:
|
|
\code
|
|
tid = H5Tcopy (H5T_STD_I32BE);
|
|
\endcode
|
|
</li>
|
|
<li>Change the datatype.</li>
|
|
</ol>
|
|
|
|
There are numerous datatype functions that allow a user to alter a pre-defined datatype. See
|
|
\ref subsecLBDtypeSpecStr below for a simple example.
|
|
|
|
Refer to the \ref H5T in the \ref RM. Example functions are #H5Tset_size and #H5Tset_precision.
|
|
|
|
\section secLBDtypeSpec Specific Datatypes
|
|
On the \ref ExAPI
|
|
page under \ref sec_exapi_dtypes
|
|
you will find many example programs for creating and reading datasets with different datatypes.
|
|
|
|
Below is additional information on some of the datatypes. See
|
|
the \ref ExAPI
|
|
page for examples of these datatypes.
|
|
|
|
\subsection subsecLBDtypeSpec Array Datatype vs Array Dataspace
|
|
#H5T_ARRAY is a datatype, and it should not be confused with the dataspace of a dataset. The dataspace
|
|
of a dataset can consist of a regular array of elements. For example, the datatype for a dataset
|
|
could be an atomic datatype like integer, and the dataset could be an N-dimensional appendable array,
|
|
as specified by the dataspace. See #H5Screate and #H5Screate_simple for details.
|
|
|
|
Unlimited dimensions and subsetting are not supported when using the #H5T_ARRAY datatype.
|
|
|
|
The #H5T_ARRAY datatype was primarily created to address the simple case of a compound datatype
|
|
when all members of the compound datatype are of the same type and there is no need to subset by
|
|
compound datatype members. Creation of such a datatype is more efficient and I/O also requires
|
|
less work, because there is no alignment involved.
|
|
|
|
\subsection subsecLBDtypeSpecArr Array Datatype
|
|
The array class of datatypes, #H5T_ARRAY, allows the construction of true, homogeneous,
|
|
multi-dimensional arrays. Since these are homogeneous arrays, each element of the array
|
|
will be of the same datatype, designated at the time the array is created.
|
|
|
|
Users may be confused by this datatype, as opposed to a dataset with a simple atomic
|
|
datatype (e.g., integer) that is an array. See \ref subsecLBDtypeSpec for more information.
|
|
|
|
Arrays can be nested. Not only is an array datatype used as an element of an HDF5 dataset,
|
|
but the elements of an array datatype may be of any datatype, including another array datatype.
|
|
|
|
Array datatypes <strong>cannot be subdivided for I/O</strong>; the entire array must be transferred from one
|
|
dataset to another.
|
|
|
|
Within certain limitations, outlined in the next paragraph, array datatypes may be N-dimensional
|
|
and of any dimension size. <strong>Unlimited dimensions, however, are not supported</strong>. Functionality similar
|
|
to unlimited dimension arrays is available through the use of variable-length datatypes.
|
|
|
|
The maximum number of dimensions, i.e., the maximum rank, of an array datatype is specified by
|
|
the HDF5 library constant #H5S_MAX_RANK. The minimum rank is 1 (one). All dimension sizes must
|
|
be greater than 0 (zero).
|
|
|
|
One array datatype may only be converted to another array datatype if the number of dimensions
|
|
and the sizes of the dimensions are equal and the datatype of the first array's elements can be
|
|
converted to the datatype of the second array's elements.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecArrAPI Array Datatype APIs
|
|
There are three functions that are specific to array datatypes: one, #H5Tarray_create, for creating
|
|
an array datatype, and two, #H5Tget_array_ndims and #H5Tget_array_dims
|
|
for working with existing array datatypes.
|
|
|
|
<h4>Creating</h4>
|
|
The function #H5Tarray_create creates a new array datatype object. Parameters specify
|
|
\li the base datatype of each element of the array,
|
|
\li the rank of the array, i.e., the number of dimensions,
|
|
\li the size of each dimension, and
|
|
\li the dimension permutation of the array, i.e., whether the elements of the array are listed in C or FORTRAN order.
|
|
|
|
<h4>Working with existing array datatypes</h4>
|
|
When working with existing arrays, one must first determine the rank, or number of dimensions, of the array.
|
|
|
|
The function #H5Tget_array_ndims returns the rank of a specified array datatype.
|
|
|
|
In many instances, one needs further information. The function #H5Tget_array_dims retrieves the
|
|
permutation of the array and the size of each dimension.
|
|
|
|
\subsection subsecLBDtypeSpecCmpd Compound
|
|
|
|
\subsubsection subsubsecLBDtypeSpecCmpdProp Properties of compound datatypes
|
|
A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of
|
|
one or more atomic types or small arrays of such types. To create and use a compound datatype,
you need to refer to various properties of the compound datatype:
|
|
\li It is of class compound.
|
|
\li It has a fixed total size, in bytes.
|
|
\li It consists of zero or more members (defined in any order) with unique names and which occupy non-overlapping regions within the datum.
|
|
\li Each member has its own datatype.
|
|
\li Each member is referenced by an index number between zero and N-1, where N is the number of members in the compound datatype.
|
|
\li Each member has a name which is unique among its siblings in a compound datatype.
|
|
\li Each member has a fixed byte offset, which is the first byte (smallest byte address) of that member in a compound datatype.
|
|
\li Each member can be a small array of up to four dimensions.
|
|
|
|
Properties of members of a compound datatype are defined when the member is added to the compound type and cannot be subsequently modified.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecCmpdDef Defining compound datatypes
|
|
Compound datatypes must be built out of other datatypes. First, one creates an empty compound
|
|
datatype and specifies its total size. Then members are added to the compound datatype in any order.
|
|
|
|
Member names. Each member must have a descriptive name, which is the key used to uniquely identify
|
|
the member within the compound datatype. A member name in an HDF5 datatype does not necessarily
|
|
have to be the same as the name of the corresponding member in the C struct in memory, although
|
|
this is often the case. Nor does one need to define all members of the C struct in the HDF5
|
|
compound datatype (or vice versa).
|
|
|
|
Offsets. Usually a C struct will be defined to hold a data point in memory, and the offsets of the
|
|
members in memory will be the offsets of the struct members from the beginning of an instance of the
|
|
struct. The library defines the HOFFSET macro to compute the offset of a member within a struct:
|
|
\code
|
|
HOFFSET(s,m)
|
|
\endcode
|
|
This macro computes the offset of member m within a struct type s.
|
|
|
|
Here is an example in which a compound datatype is created to describe complex numbers whose type
|
|
is defined by the complex_t struct.
|
|
\code
|
|
typedef struct {
|
|
double re; /*real part */
|
|
double im; /*imaginary part */
|
|
} complex_t;
|
|
|
|
hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof(complex_t));
H5Tinsert (complex_id, "real", HOFFSET(complex_t,re), H5T_NATIVE_DOUBLE);
H5Tinsert (complex_id, "imaginary", HOFFSET(complex_t,im), H5T_NATIVE_DOUBLE);
|
|
\endcode
|
|
|
|
\subsection subsecLBDtypeSpecRef Reference
|
|
There are three types of Reference datatypes in HDF5:
|
|
\li \ref subsubsecLBDtypeSpecRefStd
|
|
\li \ref subsubsecLBDtypeSpecRefObj
|
|
\li \ref subsubsecLBDtypeSpecRefDset
|
|
|
|
\subsubsection subsubsecLBDtypeSpecRefStd Standard Reference
|
|
HDF5 references allow users to reference existing HDF5 objects as well as selections within datasets. The
|
|
original API, now deprecated, was extended in order to add the ability to reference attributes as well as objects in
|
|
external files.
|
|
|
|
The newer API introduced a single opaque reference type, which not only has the advantage of hiding the internal
|
|
representation of references, but it also allows for future extensions to be added more seamlessly. The newer API
|
|
introduces a single abstract #H5R_ref_t type as well as attribute references and external references
|
|
(i.e., references to objects in an external file).
|
|
|
|
A file, group, dataset, named datatype, or attribute may be the target of an object reference.
|
|
The object reference is created by
|
|
#H5Rcreate_object with the name of an object which may be a file, group, dataset, named datatype, or attribute
|
|
and the reference type #H5R_OBJECT. The object does not have to be open to create a reference to it.
|
|
|
|
An object reference may also refer to a region (selection) of a dataset. The reference is created
|
|
with #H5Rcreate_region. The dataspace for the region can be retrieved with a call to #H5Ropen_region.
|
|
|
|
An object reference may also refer to an attribute. The reference is created
with #H5Rcreate_attr. #H5Ropen_attr can be used to open the attribute by returning an identifier
to the attribute just as if #H5Aopen had been called.
|
|
|
|
An object reference can be accessed by a call to #H5Ropen_object.
|
|
|
|
When the reference is to a dataset or dataset region, the #H5Ropen_object call returns an
identifier to the dataset just as if #H5Dopen had been called.
When the reference is to an attribute, the #H5Ropen_object call returns an
identifier to the attribute just as if #H5Aopen had been called.
|
|
|
|
The reference buffer from the #H5Rcreate_object call must be released by
using #H5Rdestroy to avoid resource leaks and possible HDF5 library shutdown issues. Any identifiers
returned by #H5Ropen_object must also be closed with the appropriate close call.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecRefObj Reference to objects - Deprecated
|
|
In HDF5, objects (i.e. groups, datasets, and named datatypes) are usually accessed by name.
|
|
There is another way to access stored objects -- by reference.
|
|
|
|
An object reference is based on the relative file address of the object header in the file
|
|
and is constant for the life of the object. Once a reference to an object is created and
|
|
stored in a dataset in the file, it can be used to dereference the object it points to.
|
|
References are handy for creating a file index or for grouping related objects by storing
|
|
references to them in one dataset.
|
|
|
|
<h4>Creating and storing references to objects</h4>
|
|
The following steps are involved in creating and storing file references to objects:
|
|
<ol>
|
|
<li>Create the objects or open them if they already exist in the file.</li>
|
|
<li>Create a dataset to store the objects' references, by specifying #H5T_STD_REF_OBJ as the datatype.</li>
|
|
<li>Create and store references to the objects in a buffer, using #H5Rcreate.</li>
|
|
<li>Write a buffer with the references to the dataset, using #H5Dwrite with the #H5T_STD_REF_OBJ datatype.</li>
|
|
</ol>
|
|
|
|
<h4>Reading references and accessing objects using references</h4>
|
|
The following steps are involved:
|
|
<ol>
|
|
<li>Open the dataset with the references and read them. The #H5T_STD_REF_OBJ datatype must be used to describe the memory datatype.</li>
|
|
<li>Use the read reference to obtain the identifier of the object the reference points to using #H5Rdereference.</li>
|
|
<li>Open the dereferenced object and perform the desired operations.</li>
|
|
<li>Close all objects when the task is complete.</li>
|
|
</ol>
|
|
|
|
\subsubsection subsubsecLBDtypeSpecRefDset Reference to a dataset region - Deprecated
|
|
A dataset region reference points to a dataset selection in another dataset.
|
|
A reference to the dataset selection (region) is constant for the life of the dataset.
|
|
|
|
<h4>Creating and storing references to dataset regions</h4>
|
|
The following steps are involved in creating and storing references to a dataset region:
|
|
\li Create a dataset to store the dataset region (selection), by passing in #H5T_STD_REF_DSETREG for the datatype when calling #H5Dcreate.
|
|
\li Create selection(s) in existing dataset(s) using #H5Sselect_hyperslab and/or #H5Sselect_elements.
|
|
\li Create reference(s) to the selection(s) using #H5Rcreate and store them in a buffer.
|
|
\li Write the references to the dataset regions in the file.
|
|
\li Close all objects.
|
|
|
|
<h4>Reading references to dataset regions</h4>
|
|
The following steps are involved in reading references to dataset regions and referenced dataset regions (selections).
|
|
<ol>
|
|
<li>Open and read the dataset containing references to the dataset regions.
|
|
The datatype #H5T_STD_REF_DSETREG must be used during read operation.</li>
|
|
<li>Use #H5Rdereference to obtain the dataset identifier from the read dataset region reference.
|
|
OR
|
|
Use #H5Rget_region to obtain the dataspace identifier for the dataset containing the selection from the read dataset region reference.
|
|
</li>
|
|
<li>With the dataspace identifier, the \ref H5S interface functions, H5Sget_select_*,
|
|
can be used to obtain information about the selection.</li>
|
|
<li>Close all objects when they are no longer needed.</li>
|
|
</ol>
|
|
|
|
The dataset with the region references was read by #H5Dread with the #H5T_STD_REF_DSETREG datatype specified.
|
|
|
|
The read reference can be used to obtain the dataset identifier by calling #H5Rdereference, or to
obtain spatial information (dataspace and selection) by calling #H5Rget_region.
|
|
|
|
The reference to the dataset region has information for both the dataset itself and its selection. In both functions:
|
|
\li The first parameter is an identifier of the dataset with the region references.
|
|
\li The second parameter specifies the type of reference stored. In this example, a reference to the dataset region is stored.
|
|
\li The third parameter is a buffer containing the reference of the specified type.
|
|
|
|
This example introduces several H5Sget_select_* functions used to obtain information about selections:
|
|
<table>
|
|
<caption>H5Sget_select_* functions for querying selections</caption>
|
|
<tr>
|
|
<th><strong>Function</strong></th>
|
|
<th><strong>Description</strong></th>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_npoints</td>
|
|
<td>Returns the number of elements in the hyperslab</td>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_hyper_nblocks</td>
|
|
<td>Returns the number of blocks in the hyperslab</td>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_hyper_blocklist</td>
|
|
<td>Returns the "lower left" and "upper right" coordinates of the blocks in the hyperslab selection</td>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_bounds</td>
|
|
<td>Returns the coordinates of the "minimal" block containing a hyperslab selection</td>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_elem_npoints</td>
|
|
<td>Returns the number of points in the element selection</td>
|
|
</tr>
|
|
<tr>
|
|
<td>#H5Sget_select_elem_pointlist</td>
|
|
<td>Returns the coordinates of points in the element selection</td>
|
|
</tr>
|
|
</table>
|
|
|
|
\subsection subsecLBDtypeSpecStr String
|
|
A simple example of creating a derived datatype is using the string datatype,
|
|
#H5T_C_S1 (#H5T_FORTRAN_S1) to create strings of more than one character. Strings
|
|
can be stored as either fixed or variable length, and may have different rules
|
|
for padding of unused storage.
|
|
|
|
\subsubsection subsecLBDtypeSpecStrFix Fixed Length 5-character String Datatype
|
|
\code
|
|
hid_t strtype; /* Datatype ID */
|
|
herr_t status;
|
|
|
|
strtype = H5Tcopy (H5T_C_S1);
|
|
status = H5Tset_size (strtype, 5); /* create string of length 5 */
|
|
\endcode
|
|
|
|
\subsubsection subsecLBDtypeSpecStrVar Variable Length String Datatype
|
|
\code
|
|
strtype = H5Tcopy (H5T_C_S1);
|
|
status = H5Tset_size (strtype, H5T_VARIABLE);
|
|
\endcode
|
|
|
|
The ability to derive datatypes from pre-defined types allows users to create any number of datatypes,
|
|
from simple to very complex.
|
|
|
|
As the term implies, variable length strings are strings of varying lengths. They are stored internally
|
|
in a heap, potentially impacting efficiency in the following ways:
|
|
\li Heap storage requires more space than regular raw data storage.
|
|
\li Heap access generally reduces I/O efficiency because it requires individual read or write operations
|
|
for each data element rather than one read or write per dataset or per data selection.
|
|
\li A variable length dataset consists of pointers to the heaps of data, not the actual data. Chunking
|
|
and filters, including compression, are not available for heaps.
|
|
|
|
See \ref subsubsec_datatype_other_strings in the \ref UG, for more information on how fixed and variable
|
|
length strings are stored.
|
|
|
|
\subsection subsecLBDtypeSpecVL Variable Length
|
|
Variable-length (VL) datatypes are sequences of an existing datatype (atomic, VL, or compound)
|
|
which are not fixed in length from one dataset location to another. In essence, they are similar
|
|
to C character strings -- a sequence of a type which is pointed to by a particular type of
|
|
pointer -- although they are implemented more closely to FORTRAN strings by including an explicit
|
|
length in the pointer instead of using a particular value to terminate the sequence.
|
|
|
|
VL datatypes are useful to the scientific community in many different ways, some of which are listed below:
|
|
<ul>
|
|
<li>Ragged arrays: Multi-dimensional ragged arrays can be implemented with the last (fastest changing)
|
|
dimension being ragged by using a VL datatype as the type of the element stored. (Or as a field in a compound datatype.)
|
|
</li>
|
|
<li>Fractal arrays: If a compound datatype has a VL field of another compound type with VL fields
|
|
(a nested VL datatype), this can be used to implement ragged arrays of ragged arrays, to whatever
|
|
nesting depth is required for the user.
|
|
</li>
|
|
<li>Polygon lists: A common storage requirement is to efficiently store arrays of polygons with
|
|
different numbers of vertices. VL datatypes can be used to efficiently and succinctly describe an
|
|
array of polygons with different numbers of vertices.
|
|
</li>
|
|
<li>Character strings: Perhaps the most common use of VL datatypes will be to store C-like VL character
|
|
strings in dataset elements or as attributes of objects.
|
|
</li>
|
|
<li>Indices: An array of VL object references could be used as an index to all the objects in a file
|
|
which contain a particular sequence of dataset values. Perhaps an array something like the following:
|
|
\code
|
|
Value1: Object1, Object3, Object9
|
|
Value2: Object0, Object12, Object14, Object21, Object22
|
|
Value3: Object2
|
|
Value4: <none>
|
|
Value5: Object1, Object10, Object12
|
|
.
|
|
.
|
|
\endcode
|
|
</li>
|
|
<li>Object Tracking: An array of VL dataset region references can be used as a method of tracking
|
|
objects or features appearing in a sequence of datasets. Perhaps an array of them would look like:
|
|
\code
|
|
Feature1: Dataset1:Region, Dataset3:Region, Dataset9:Region
|
|
Feature2: Dataset0:Region, Dataset12:Region, Dataset14:Region,
|
|
Dataset21:Region, Dataset22:Region
|
|
Feature3: Dataset2:Region
|
|
Feature4: <none>
|
|
Feature5: Dataset1:Region, Dataset10:Region, Dataset12:Region
|
|
.
|
|
.
|
|
\endcode
|
|
</li>
|
|
</ul>
|
|
|
|
\subsubsection subsubsecLBDtypeSpecVLMem Variable-length datatype memory management
|
|
With each element possibly being of different sequence lengths for a dataset with a VL datatype,
|
|
the memory for the VL datatype must be dynamically allocated. Currently there are two methods
|
|
of managing the memory for VL datatypes: the standard C malloc/free memory allocation routines
|
|
or a method of calling user-defined memory management routines to allocate or free memory. Since
|
|
the memory allocated when reading (or writing) may be complicated to release, an HDF5 routine is
|
|
provided to traverse a memory buffer and free the VL datatype information without leaking memory.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecVLDiv Variable-length datatypes cannot be divided
|
|
VL datatypes are designed so that they cannot be subdivided by the library with selections, etc.
|
|
This design was chosen due to the complexities in specifying selections on each VL element of a
|
|
dataset through a selection API that is easy to understand. Also, the selection APIs work on
|
|
dataspaces, not on datatypes. At some point in time, we may want to create a way for dataspaces
|
|
to have VL components to them and we would need to allow selections of those VL regions, but
|
|
that is beyond the scope of this document.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecVLErr What happens if the library runs out of memory while reading?
|
|
It is possible for a call to #H5Dread to fail while reading in VL datatype information if the memory
|
|
required exceeds that which is available. In this case, the #H5Dread call will fail gracefully and any
|
|
VL data which has been allocated prior to the memory shortage will be returned to the system via the
|
|
memory management routines detailed below. It may be possible to design a partial read API function
|
|
at a later date, if demand for such a function warrants.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecVLStr Strings as variable-length datatypes
|
|
Since character strings are a special case of VL data that is implemented in many different ways on
|
|
different machines and in different programming languages, they are handled somewhat differently from
|
|
other VL datatypes in HDF5.
|
|
|
|
HDF5 has native VL strings for each language API, which are stored the same way on disk, but are
|
|
exported through each language API in a natural way for that language. When retrieving VL strings
|
|
from a dataset, users may choose to have them stored in memory as a native VL string or in HDF5's
|
|
#hvl_t struct for VL datatypes.
|
|
|
|
VL strings may be created in one of two ways: by creating a VL datatype with a character base type,
or by creating a string datatype (for example, a copy of #H5T_C_S1) and setting its length to #H5T_VARIABLE.
The second method is used to access native VL strings in memory. The
library will convert between the two types, but they are stored on disk using different datatypes
and have different memory representations.
|
|
|
|
Multi-byte character representations, such as \em UNICODE or \em wide characters in C/C++, will need the
|
|
appropriate character and string datatypes created so that they can be described properly through
|
|
the datatype API. Additional conversions between these types and the current ASCII characters
|
|
will also be required.
|
|
|
|
Variable-width character strings (which might be compressed data or some other encoding) are not
|
|
currently handled by this design. We will evaluate how to implement them based on user feedback.
|
|
|
|
\subsubsection subsubsecLBDtypeSpecVLAPIs Variable-length datatype APIs
|
|
|
|
<h4>Creation</h4>
|
|
VL datatypes are created with the #H5Tvlen_create function as follows:
|
|
\code
|
|
type_id = H5Tvlen_create(hid_t base_type_id);
|
|
\endcode
|
|
The base datatype will be the datatype that the sequence is composed of, characters for character
|
|
strings, vertex coordinates for polygon lists, etc. The base datatype specified for the VL datatype
|
|
can be of any HDF5 datatype, including another VL datatype, a compound datatype, or an atomic datatype.
|
|
|
|
<h4>Querying base datatype of VL datatype</h4>
|
|
It may be necessary to know the base datatype of a VL datatype before memory is allocated, etc.
|
|
The base datatype is queried with the #H5Tget_super function, described in the \ref H5T documentation.
|
|
|
|
<h4>Querying minimum memory required for VL information</h4>
|
|
In order to predict the memory usage that #H5Dread may need to allocate to store VL data while
|
|
reading the data, the #H5Dvlen_get_buf_size function is provided:
|
|
\code
|
|
herr_t H5Dvlen_get_buf_size(hid_t dataset_id, hid_t type_id, hid_t space_id, hsize_t *size)
|
|
\endcode
|
|
This routine checks the number of bytes required to store the VL data from the dataset, using
|
|
the \em space_id for the selection in the dataset on disk and the \em type_id for the memory representation
|
|
of the VL data in memory. The *\em size value is modified according to how many bytes are required
|
|
to store the VL data in memory.
|
|
|
|
<h4>Specifying how to manage memory for the VL datatype</h4>
|
|
The memory management method is determined by dataset transfer properties passed into the
|
|
#H5Dread and #H5Dwrite functions with the dataset transfer property list.
|
|
|
|
Default memory management is set by using #H5P_DEFAULT for the dataset transfer
|
|
property list identifier. If #H5P_DEFAULT is used with #H5Dread, the system \em malloc and \em free
|
|
calls will be used for allocating and freeing memory. In such a case, #H5P_DEFAULT should
|
|
also be passed as the property list identifier to #H5Dvlen_reclaim.
|
|
|
|
The rest of this subsection is relevant only to those who choose not to use default memory management.
|
|
|
|
The user can choose whether to use the system \em malloc and \em free calls or user-defined, or custom,
|
|
memory management functions. If user-defined memory management functions are to be used, the
|
|
memory allocation and free routines must be defined via #H5Pset_vlen_mem_manager(), as follows:
|
|
\code
|
|
herr_t H5Pset_vlen_mem_manager(hid_t plist_id, H5MM_allocate_t alloc, void *alloc_info, H5MM_free_t free, void *free_info)
|
|
\endcode
|
|
The \em alloc and \em free parameters identify the memory management routines to be used. If the user
|
|
has defined custom memory management routines, \em alloc and/or \em free should be set to make those
|
|
routine calls (i.e., the name of the routine is used as the value of the parameter); if the user
|
|
prefers to use the system's \em malloc and/or \em free, the \em alloc and \em free parameters, respectively, should be set to \em NULL.
|
|
|
|
The prototypes for the user-defined functions would appear as follows:
|
|
\code
|
|
typedef void *(*H5MM_allocate_t)(size_t size, void *info);
typedef void (*H5MM_free_t)(void *mem, void *free_info);
|
|
\endcode
|
|
The \em alloc_info and \em free_info parameters can be used to pass along any required information to
|
|
the user's memory management routines.
|
|
|
|
In summary, if the user has defined custom memory management routines, the name(s) of the routines
|
|
are passed in the \em alloc and \em free parameters and the custom routines' parameters are passed in the
|
|
\em alloc_info and \em free_info parameters. If the user wishes to use the system \em malloc and \em free functions,
|
|
the \em alloc and/or \em free parameters are set to \em NULL and the \em alloc_info and \em free_info parameters are ignored.
|
|
|
|
<h4>Recovering memory from VL buffers read in</h4>
|
|
The complex memory buffers created for a VL datatype may be reclaimed with the #H5Dvlen_reclaim
|
|
function call, as follows:
|
|
\code
|
|
herr_t H5Dvlen_reclaim(hid_t type_id, hid_t space_id, hid_t plist_id, void *buf);
|
|
\endcode
|
|
|
|
The \em type_id must be the datatype stored in the buffer, \em space_id describes the selection for the
|
|
memory buffer to free the VL datatypes within, \em plist_id is the dataset transfer property list
|
|
which was used for the I/O transfer to create the buffer, and \em buf is the pointer to the buffer
|
|
to free the VL memory within. The VL structures (#hvl_t) in the user's buffer are modified to zero
|
|
out the VL information after it has been freed.
|
|
|
|
If nested VL datatypes were used to create the buffer, this routine frees them from the bottom up,
|
|
releasing all the memory without creating memory leaks.
|
|
|
|
<hr>
|
|
Previous Chapter \ref LBDsetSubRW - Next Chapter \ref LBPropsList
|
|
|
|
Navigate back: \ref index "Main" / \ref GettingStarted / \ref LearnBasics
|
|
|
|
*/
|