</FONT><FONT FACE="Times"><P>This is an introduction to the HDF5 data model and programming model. Being a <I>Getting Started</I> or <I>QuickStart</I> document, this </FONT><I>Introduction to HDF5</I><FONT FACE="Times"> is intended to provide enough information for you to develop a basic understanding of how HDF5 works and is meant to be used. Knowledge of the current version of HDF will make it easier to follow the text, but it is not required. More complete information of the sort you will need to actually use HDF5 is available in the HDF5 documentation at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/"><FONT FACE="Times">http://hdf.ncsa.uiuc.edu/HDF5/</FONT></A><FONT FACE="Times">. Available documents include the following:
</FONT><LI><I>HDF5 User’s Guide</I> at <A HREF="http://hdf.ncsa.uiuc.edu/HDF5/H5.user.html">http://hdf.ncsa.uiuc.edu/HDF5/H5.user.html</A>. Where appropriate, this <I>Introduction</I> will refer to specific sections of the <I>User’s Guide</I>.
</FONT><LI>The directory <FONT FACE="Courier" SIZE=2>hdf5/examples</FONT> contains the examples used in this document.
<LI>The directory <FONT FACE="Courier" SIZE=2>hdf5/test</FONT> contains the development tests used by the HDF5 developers. Since these codes are intended to fully exercise the system, they provide more diverse and sophisticated examples of what HDF5 can do.</UL>
<FONT FACE="Times"><P>HDF5 is a new, experimental version of HDF that is designed to address some of the limitations of the current version of HDF (HDF4.x) and to meet current and anticipated requirements of modern systems and applications.
<P>We urge you to look at this new version of HDF and give us feedback on what you like or do not like about it, and what features you would like to see added to it.
<B><P>Why HDF5?</B> The development of HDF5 is motivated by a number of limitations in the current HDF format, as well as limitations in the library. Some of these limitations are:
</FONT><LI>A new file format designed to address some of the deficiencies of HDF4.x, particularly the need to store larger files and more objects per file.
<LI>A simpler, more comprehensive data model that includes only two basic structures: a multidimensional array of record structures, and a grouping structure.
<LI>A simpler, better-engineered library and API, with improved support for parallel I/O, threads, and other requirements imposed by modern systems and applications.</UL>
<H2><A NAME="_Toc429885300">Limitations of the current release</A></H2>
<FONT FACE="Times"><P>The beta release includes most of the basic functionality that is planned for the HDF5 library. However, the library does not implement all of the features detailed in the format and API specifications. Here is a listing of some of the limitations of the current release:
<H2><A NAME="_Toc429885301">Changes in the current release</A></H2>
<P>A detailed listing of changes in HDF5 since the last release (HDF5 1.0 alpha 2.0) can be found in the file <CODE>hdf5/RELEASE</CODE> in the beta code installation. Important changes include:
<LI>Improvements have been made in the Dataspace API.
<LI>The library has been changed to accommodate raw data filters provided by application-defined modules. Filters implemented so far include a GZIP data compression module, a checksumming module, and a very simple encryption module.
<LI>All integer and floating point formats of supported machines have been implemented, including the <CODE>long double</CODE> type where applicable.
<LI>A string datatype has been added.
<LI>All number type conversions have been implemented except conversions between integer and floating point.
<LI>New performance-enhancing features have been implemented.</UL>
<H2><A NAME="_Toc429885302">HDF5 file organization and data model</A></H2>
<FONT FACE="Times"><P>Working with groups and group members is similar in many ways to working with directories and files in UNIX. As with UNIX directories and files, objects in an HDF5 file are often described by giving their full path names.
<FONT FACE="Times"><P>Any HDF5 group or dataset may have an associated <I>attribute list.</I> An HDF5 <I>attribute</I> is a user-defined HDF5 structure that provides extra information about an HDF5 object. Attributes are described in more detail below.
<FONT FACE="Times"><P>A dataset is stored in a file in two parts: a header and a data array.
<P>The header contains information that is needed to interpret the array portion of the dataset, as well as metadata (or pointers to metadata) that describes or annotates the dataset. Header information includes the name of the object, its dimensionality, its number-type, information about how the data itself is stored on disk, and other information used by the library to speed up access to the dataset or maintain the file's integrity.
<P>There are four essential classes of information in any header: <I>name</I>, <I>datatype</I>, <I>dataspace</I>, and <I>storage layout</I>:
</FONT><B><DFN><P>Name.</B></DFN><FONT FACE="Times"> A dataset <I>name</I> is a sequence of alphanumeric ASCII characters.
</FONT><B><DFN><P>Datatype.</B></DFN><FONT FACE="Times"> HDF5 allows one to define many different kinds of datatypes. There are two categories of datatypes: <I>atomic</I> datatypes and <I>compound</I> datatypes. Atomic datatypes are those that are not decomposed at the datatype interface level, such as integers and floats. <I><CODE>NATIVE</CODE></I> datatypes are system-specific instances of atomic datatypes. Compound datatypes are made up of atomic datatypes. <I>Named</I> datatypes are either atomic or compound datatypes that have been specifically designated to be shared across datasets.
<I><P>Atomic datatypes</I> include integers and floating-point numbers. Each atomic type belongs to a particular class and has several properties: size, order, precision, and offset. In this introduction, we consider only a few of these properties.
<P>Atomic datatypes include integer, float, date and time, string, bit field, and opaque. <I>(Note: Only integer, float and string classes are available in the current implementation.)
</I><P>Properties of integer types include size, order (endian-ness), and signed-ness (signed/unsigned).
<P>Properties of float types include the size and location of the exponent and mantissa, and the location of the sign bit.
<P>The datatypes that are supported in the current implementation are:
<em><code>NATIVE</code> datatypes.</em> Although it is possible to describe nearly any kind of atomic data type, most applications will use predefined datatypes that are supported by their compiler. In HDF5 these are called <i>native</i> datatypes. <CODE>NATIVE</CODE> datatypes are C-like datatypes that are generally supported by the hardware of the machine on which the library was compiled. In order to be portable, applications should almost always use the <CODE>NATIVE </CODE>designation to describe data values in memory.
<P>The <CODE>NATIVE</CODE> architecture has base names which do not follow the same rules as the others. Instead, native type names are similar to the C type names. Here are some examples:
<FONT FACE="Times"><P>See <I>Datatypes</I> at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/Datatypes.html">http://hdf.ncsa.uiuc.edu/HDF5/Datatypes.html</A><FONT FACE="Times"> in the <I>HDF5 User’s Guide</I> for further information.
<FONT FACE="Times"><P>A <I>compound datatype</I> is one in which a collection of simple datatypes is represented as a single unit, similar to a <I>struct</I> in C. The parts of a compound datatype are called <I>members.</I> The members of a compound datatype may be of any datatype, including another compound datatype. It is possible to read members from a compound type without reading the whole type.
</FONT><I><P>Named datatypes.</I> Normally each dataset has its own datatype, but sometimes we may want to share a datatype among several datasets. This can be done using a <I>named</I> datatype. A named datatype is stored in the file independently of any dataset, and referenced by all datasets that have that datatype. Named datatypes may have an associated attribute list.
<P>See <I>Datatypes</I> at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/Datatypes.html">http://hdf.ncsa.uiuc.edu/HDF5/Datatypes.html</A><FONT FACE="Times"> in the <I>HDF5 User’s Guide</I> for further information.
<B><DFN><P>Dataspace.</B></DFN> A dataset <I>dataspace</I> describes the dimensionality of the dataset. The dimensions of a dataset can be fixed (unchanging), or they may be <I>unlimited</I>, which means that they are extendible (i.e., they can grow larger).
<P>Properties of a dataspace consist of the <I>rank</I> (number of dimensions) of the data array, the <I>actual sizes of the dimensions</I> of the array, and the <I>maximum sizes of the dimensions</I> of the array. For a fixed-dimension dataset, the actual size is the same as the maximum size of a dimension. When a dimension is unlimited, the maximum size is set to the </FONT>value <CODE>H5S_UNLIMITED</CODE>.<FONT FACE="Times"> (An example below shows how to create extendible datasets.)
<P>A dataspace can also describe portions of a dataset, making it possible to do partial I/O operations on <I>selections</I>. <I>Selection</I> is supported by the dataspace interface (H5S). Given an n-dimensional dataset, there are currently three ways to do partial selection:
<OL>
</FONT><LI>Select a logically contiguous n-dimensional hyperslab.
<LI>Select a non-contiguous hyperslab consisting of elements or blocks of elements (hyperslabs) that are equally spaced.
<LI>Select a list of independent points. </OL>
<FONT FACE="Times"><P>Since I/O operations have two end-points, the raw data transfer functions require two dataspace arguments: one describes the application memory dataspace or subset thereof, and the other describes the file dataspace or subset thereof.
<P>See <I>Dataspaces</I> at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/Dataspaces.html">http://hdf.ncsa.uiuc.edu/HDF5/Dataspaces.html</A><FONT FACE="Times"> in the <I>HDF5 User’s Guide</I> for further information.
</FONT><B><DFN><P>Storage layout.</B></DFN><FONT FACE="Times"> The HDF5 format makes it possible to store data in a variety of ways. The default storage layout format is <I>contiguous</I>, meaning that data is stored in the same linear way that it is organized in memory. Two other storage layout formats are currently defined for HDF5: <I>compact</I> and <I>chunked</I>. In the future, other storage layouts may be added.<I>
<P>Compact</I> storage is used when the amount of data is small and can be stored directly in the object header. <I>(Note: Compact storage is not supported in this release.)</I>
<I><P>Chunked</I> storage involves dividing the dataset into equal-sized "chunks" that are stored separately. Chunking has three important benefits.
<OL>
<LI>It makes it possible to achieve good performance when accessing subsets of the dataset, even when the subset to be chosen is orthogonal to the normal storage order of the dataset.
<LI>It makes it possible to compress large datasets and still achieve good performance when accessing subsets of the dataset.
<LI>It makes it possible to extend the dimensions of a dataset efficiently in any direction.</OL>
<P>See <I>Datasets</I> at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/Datasets.html">http://hdf.ncsa.uiuc.edu/HDF5/Datasets.html</A><FONT FACE="Times"> in the <I>HDF5 User’s Guide</I> for further information.
<I>Attributes </I>are small named datasets that are attached to primary datasets, groups, or named datatypes. Attributes can be used to describe the nature and/or the intended usage of a dataset or group. An attribute has two parts: (1) a <I>name</I> and (2) a <I>value</I>. The value part contains one or more data entries of the same data type.
<FONT FACE="Times"><P>The Attribute API (H5A) is used to read or write attribute information. An attribute can be identified either by name or by an <I>index value</I>. The use of an index value makes it possible to iterate through all of the attributes associated with a given object.
<P>The HDF5 format and I/O library are designed with the assumption that attributes are small datasets. They are always stored in the object header of the object they are attached to. Because of this, large datasets should not be stored as attributes. How large is "large" is not defined by the library and is up to the user's interpretation. (Large datasets with metadata can be stored as supplemental datasets in a group with the primary dataset.)
<P>See <I>Attributes</I> at </FONT><A HREF="http://hdf.ncsa.uiuc.edu/HDF5/Attributes.html">http://hdf.ncsa.uiuc.edu/HDF5/Attributes.html</A><FONT FACE="Times"> in the <I>HDF5 User’s Guide</I> for further information.
<FONT FACE="Times"><P>The current HDF5 API is implemented only in C. The API provides routines for creating HDF5 files, creating and writing groups, datasets, and their attributes to HDF5 files, and reading groups, datasets and their attributes from HDF5 files.
<FONT FACE="Times"><P>All C routines in the HDF5 library begin with a prefix of the form <B>H5*</B>, where <B>*</B> is a single letter indicating the object on which the operation is to be performed:
Example: <CODE>H5Fopen</CODE>, which opens an HDF5 file.
<B><LI>H5G</B>: <B>G</B>roup functions, for creating and operating on groups of objects. <BR>
Example: <CODE>H5Gset</CODE><FONT FACE="Courier">,</FONT> which sets the working group to the specified group.
<B><LI>H5T: </B>Data<B>T</B>ype functions, for creating and operating on simple and compound datatypes to be used as the elements in data arrays.<B><BR>
</B>Example: <CODE>H5Tcopy</CODE><FONT FACE="Courier">,</FONT> which creates a copy of an existing datatype.
<B><LI>H5S: </B>Data<B>S</B>pace functions, which create and manipulate the dataspace in which the elements of a data array are stored.<BR>
Example: <CODE>H5Screate_simple</CODE>, which creates simple dataspaces.
<B><LI>H5D: D</B>ataset functions, which manipulate the data within datasets and determine how the data is to be stored in the file. <BR>
Example: <CODE>H5Dread</CODE>, which reads all or part of a dataset into a buffer in memory.
<B><LI>H5P</B>: <B>P</B>roperty list functions, for manipulating object creation and access properties. <BR>
Example: <CODE>H5Pset_chunk</CODE>, which sets the number of dimensions and the size of a chunk.
<B><LI>H5A</B>: <B>A</B>ttribute access and manipulation routines. <BR>
Example: <CODE>H5Aget_name</CODE>, which retrieves the name of an attribute.
Example: <CODE>H5Eprint</CODE>, which prints the current error stack.</UL>
<H3><A NAME="_Toc429885308">Include files</A></H3>
<FONT FACE="Times"><P>There are a number of definitions and declarations that should be included with any HDF5 program. These definitions and declarations are contained in several <I>include</I> files. The main include </FONT>file is <CODE>hdf5.h</CODE>. This file<FONT FACE="Times"> includes all of the other files that your program is likely to need. <I>Be sure to include </I><CODE>hdf5.h</CODE><I> in any program that uses the HDF5 library.</I></FONT>
<P>The following code fragment implements the specified model. If there is a possibility that the file already exists, the user must add the flag <CODE>H5F_ACC_TRUNC</CODE> to the access mode to overwrite the previous file's information.
<P>Recall that datatypes and dimensionality (dataspace) are independent objects, which are created separately from any dataset that they might be attached to. Because of this the creation of a dataset requires, at a minimum, separate definitions of datatype, dimensionality, and dataset. Hence, to create a dataset the following steps need to be taken:
</CODE><H4><A NAME="_Toc429885313">How to discard objects when they are no longer needed</A></H4>
<FONT FACE="Times"><P>The datatype, dataspace and dataset objects should be released once they are no longer needed by a program. Since each is an independent object, they must be released (or <I>closed</I>) separately. The following lines of code close the datatype, dataspace, and dataset that were created in the preceding section.
</FONT><CODE><P>H5Tclose(datatype);
<P>H5Dclose(dataset);
<P>H5Sclose(dataspace);
</CODE><H4><A NAME="_Toc429885314">How to write a dataset to a new file</A></H4>
<FONT FACE="Times"><P>Having defined the datatype, dataset, and dataspace parameters, you write out the data with a call to </FONT><CODE>H5Dwrite</CODE><FONT FACE="Courier">.
</CODE><FONT FACE="Times"><P>The third and fourth parameters of </FONT><CODE>H5Dwrite</CODE><FONT FACE="Times"> in the example describe the dataspaces in memory and in the file, respectively. They are set to the value </FONT><CODE>H5S_ALL</CODE><FONT FACE="Times"> to indicate that an entire dataset is to be written. In a later section we look at how we would access a portion of a dataset.
</FONT><P><A HREF="#CreateExample"><FONT FACE="Times">Example 1</FONT></A><FONT FACE="Times"> contains a program that creates a file and a dataset, and writes the dataset to the file.
<P>Reading is analogous to writing. If, in the previous example, we wish to read an entire dataset, we would use the same basic calls with the same parameters. Of course, the routine </FONT><CODE>H5Dread</CODE><FONT FACE="Times"> would replace </FONT><CODE>H5Dwrite</CODE><FONT FACE="Courier">.</FONT><FONT FACE="Times">
</FONT><H4><A NAME="_Toc429885315">Getting information about a dataset</A></H4>
<FONT FACE="Times"><P>Although reading is analogous to writing, it is often necessary to query a file to obtain information about a dataset. For instance, we often need to know about the datatype associated with a dataset, as well as dataspace information (e.g., rank and dimensions). There are several "get" routines for obtaining this information. The following code segment illustrates how we would get this kind of information:
</CODE><H4><A NAME="_Toc429885316">Reading and writing a portion of a dataset</A></H4>
<P>In the previous discussion, we described how to access an entire dataset with one write (or read) operation. HDF5 also supports access to portions (or selections) of a dataset in one read/write operation. Currently selections are limited to hyperslabs and lists of independent points. Both types of selection will be discussed in the following sections. Several sample cases of selection reading/writing are shown in the following figure.
</B><P>In example (a) a single hyperslab is read from the midst of a two-dimensional array in a file and stored in the corner of a smaller two-dimensional array in memory. In (b) a regular series of blocks is read from a two-dimensional array in the file and stored as a contiguous sequence of values at a certain offset in a one-dimensional array in memory. In (c) a sequence of points with no regular pattern is read from a two-dimensional array in a file and stored as a sequence of points with no regular pattern in a three-dimensional array in memory.
<P>As these examples illustrate, whenever we perform partial read/write operations on the data, the following information must be provided: file dataspace, file dataspace selection, memory dataspace and memory dataspace selection. After the required information is specified, the actual read/write operation on the portion of data is done in a single call to the HDF5 read/write functions <CODE>H5Dread</CODE> or <CODE>H5Dwrite</CODE>.
<FONT FACE="Times"><P>Hyperslabs are portions of datasets. A hyperslab selection can be a logically contiguous collection of points in a dataspace, or it can be a regular pattern of points or blocks in a dataspace. The following picture illustrates a selection of regularly spaced 3x2 blocks in an 8x12 dataspace.</FONT>
<FONT FACE="Times"><P>Four parameters are required to describe a completely general hyperslab. Each parameter is an array with one entry per dimension of the dataspace:
</FONT><CODE><LI>start</CODE>: a starting location for the hyperslab. In the example <CODE>start</CODE> is (0,1).
<CODE><LI>stride</CODE>: the number of elements to separate each element or block to be selected. In the example <CODE>stride</CODE> is (4,3). If the stride parameter is set to NULL, the stride size defaults to 1 in each dimension.
<CODE><LI>count</CODE>: the number of elements or blocks to select along each dimension. In the example, <CODE>count</CODE> is (2,4).
<CODE><LI>block</CODE>: the size of the block selected from the dataspace. In the example, <CODE>block</CODE> is (3,2). If the block parameter is set to NULL, the block size defaults to a single element in each dimension, as if the block array were set to all 1s.</UL>
<B><P>In what order is data copied? </B>When actual I/O is performed, data values are copied by default from one dataspace to another in so-called row-major, or C, order. That is, it is assumed that the first dimension varies slowest, the second next slowest, and so forth.
<P><B>Example without strides or blocks.</B> Suppose we want to read a 3x4 hyperslab from a dataset in a file beginning at the element <CODE>&lt;1,2&gt;</CODE><FONT FACE="Times"> in the dataset. In order to do this, we must create a dataspace that describes the overall rank and dimensions of the dataset in the file, as well as the position and size of the hyperslab that we are extracting from that dataset. The following code illustrates the selection of the hyperslab in the file dataspace.
</CODE><FONT FACE="Times"><P>This describes the dataspace from which we wish to read. We need to define the dataspace in memory analogously. Suppose, for instance, that we have in memory a three-dimensional 7x7x3 array into which we wish to read the 3x4 hyperslab described above beginning at the element </FONT><CODE>&lt;3,0,0&gt;</CODE><FONT FACE="Times">. Since the in-memory dataspace has three dimensions, we have to describe the hyperslab as an array with three dimensions, with the last dimension being 1: </FONT><CODE>&lt;3,4,1&gt;</CODE><FONT FACE="Times">.
<P>Notice that we must describe two things: the dimensions of the in-memory array, and the size and position of the hyperslab that we wish to read in. The following code illustrates how this would be done.
</CODE><P><A HREF="#CheckAndReadExample"><FONT FACE="Times">Example 2</FONT></A><FONT FACE="Times"> contains a complete program that performs these operations.
<B><P>Example with strides and blocks</B>. Consider the 8x12 dataspace described above, in which we selected eight 3x2 blocks. Suppose we wish to fill these eight blocks. </FONT>
</CODE><FONT FACE="Times"><P>Suppose that the source dataspace in memory is this 50-element one-dimensional array called </FONT><CODE>vector</CODE><FONT FACE="Times">:</FONT>
<FONT FACE="Times"><P>The following code will write 48 elements from </FONT><CODE>vector</CODE> to our file dataset, starting with the second element in <CODE>vector</CODE>.
<pre>
/* Select hyperslab for the dataset in the file, using 3x2 blocks, a (4,3) stride,
 * and a (2,4) count, starting at the position (0,1).
 */
start[0]  = 0; start[1]  = 1;
stride[0] = 4; stride[1] = 3;
count[0]  = 2; count[1]  = 4;
block[0]  = 3; block[1]  = 2;
ret = H5Sselect_hyperslab(fid, H5S_SELECT_SET, start, stride, count, block);
</pre>
<P>A hyperslab specifies a regular pattern of elements in a dataset. It is also possible to specify a list of independent elements to read or write using the function <CODE>H5Sselect_elements</CODE>. Suppose, for example, that we wish to write the values 53, 59, 61, 67 to the following elements of the 8x12 array used in the previous example: (0,0), (3,3), (3,5), and (5,6). The following code selects the points and writes them to the dataset:
<P><A HREF="#WriteSelected"><FONT FACE="Times">Example 3</FONT></A><FONT FACE="Times"> contains a complete program that performs these subsetting operations.
<B><P>Properties of compound datatypes. </B>A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of one or more atomic types or small arrays of such types. To create and use a compound datatype you need to refer to various <I>properties</I> of the compound datatype:
<FONT FACE="Times"><P>Properties of members of a compound data type are defined when the member is added to the compound type and cannot be subsequently modified.
<B><P>Defining compound datatypes. </B>Compound datatypes must be built out of other datatypes. First, one creates an empty compound data type and specifies its total size. Then members are added to the compound data type in any order.
<I><P>Member names. </I>Each member must have a descriptive name, which is the key used to uniquely identify the member within the compound data type. A member name in an HDF5 data type does not necessarily have to be the same as the name of the corresponding member in the C struct in memory, although this is often the case. Nor does one need to define all members of the C struct in the HDF5 compound data type (or vice versa).
<I><P>Offsets. </I>Usually a C struct will be defined to hold a data point in memory, and the offsets of the members in memory will be the offsets of the struct members from the beginning of an instance of the struct. The library defines a macro to compute the offset of a member within a struct:
<BR><FONT FACE="Times">This macro computes the offset of member </FONT><FONT FACE="Courier"><EM>m</EM></FONT><FONT FACE="Times"> within a struct variable <EM>s</EM>.
<P>Here is an example in which a compound data type is created to describe complex numbers whose type is defined by the </FONT><CODE>complex_t</CODE><FONT FACE="Times" SIZE=2></FONT><FONT FACE="Times"> struct.
</CODE><P><A HREF="#Compound">Example 4</A><FONT FACE="Times"> shows how to create a compound data type, write an array that has the compound data type to the file, and read back subsets of the members.
</FONT><H4><A NAME="_Toc429885320">Creating and writing extendible datasets</A></H4>
<FONT FACE="Times"><P>An <I>extendible</I> dataset is one whose dimensions can grow. In HDF5, it is possible to define a dataset to have certain initial dimensions, then later to increase the size of any of the initial dimensions.
<P>For example, you can create and store the following 3x3 HDF5 dataset:
<FONT FACE="Times"><P>The current version of HDF5 requires you to use <I>chunking</I> in order to define extendible datasets. Chunking makes it possible to extend datasets efficiently, without having to reorganize storage excessively.
<P>For example, suppose we wish to create a dataset similar to the one shown above. We want to start with a 3x3 dataset, then later extend it in both directions.
<B><P>Declaring unlimited dimensions. </B>We could declare the dataspace to have unlimited dimensions with the following code, which uses the predefined constant </FONT><CODE>H5S_UNLIMITED</CODE><FONT FACE="Times"> to specify unlimited dimensions.
<B><P>Enabling chunking. </B>We can then set the dataset storage layout properties to enable chunking. We do this using the routine <CODE>H5Pset_chunk</CODE><FONT SIZE=4>:
<B><P>Extending dataset size. </B>Finally, when we want to extend the size of the dataset, we invoke <CODE>H5Dextend</CODE>. In the following example, we extend the dataset along the first dimension by seven rows, so that the new dimensions are <CODE>&lt;10,3&gt;</CODE>:
</FONT><P><A HREF="#CreateExtendWrite">Example 5</A> shows how to create a 3x3 extendible dataset, write the dataset, extend the dataset to 10x3, write the dataset again, extend it again to 10x5, and write the dataset again.
<P><A HREF="#ReadExtended">Example 6</A> shows how to read the data written by Example 5.
<P>Groups provide a mechanism for organizing meaningful and extendible sets of datasets within an HDF5 file. The H5G API contains routines for working with groups.
<B><P>Creating a group. </B>To create a group, use <CODE>H5Gcreate</CODE>. For example, the following code creates two groups that are members of the root group. They are called <CODE>/IntData</CODE> and <CODE>/FloatData</CODE>. The return value <CODE>dir</CODE> is the group identifier.
</CODE><P>The third parameter in <CODE>H5Gcreate</CODE> optionally specifies how much file space to reserve to store the names that will appear in this group. If a non-positive value is supplied then a default size is chosen.
<CODE><P>H5Gclose</CODE> closes the group and releases the group identifier.
<P>
<B><P>Creating an object in a particular group. </B>Except for single-object HDF5 files, every object in an HDF5 file must belong to a group, and hence has a path name. Hence, we put an object in a particular group by giving its path name when we create it. For example, the following code creates a dataset <CODE>IntArray</CODE> in the group <CODE>/IntData</CODE>:
</CODE><B><P>Changing the current group. </B>The HDF5 Group API supports the idea of a <i>current group</i>. This is analogous to the <i>current working directory</i> idea in UNIX. You can set the current group in HDF5 with the routine <CODE>H5Gset</CODE>. The following code shows how to set a current group, then create a certain dataset, <CODE>FloatData</CODE>, in that group.
<H4><A NAME="_Toc429885322">Working with attributes</A></H4>
<P>Think of an attribute as a small dataset that is attached to a normal dataset or group. The H5A API contains routines for working with attributes. Since attributes share many of the characteristics of datasets, the programming model for working with attributes is analogous in many ways to the model for working with datasets. The primary differences are that an attribute must be attached to a dataset or a group, and subsetting operations cannot be performed on attributes.
<B><P>To create an attribute </B>belonging to a particular dataset or group<B>, </B>first create a dataspace for the attribute with a call to <CODE>H5Screate</CODE>, then create the attribute using <CODE>H5Acreate</CODE>. For example, the following code creates an attribute called <CODE>Integer_attribute</CODE> that is a member of a dataset whose identifier is <CODE>dataset</CODE>. The attribute identifier is <CODE>attr2</CODE>. <CODE>H5Awrite</CODE> then sets the value of the attribute to that of the integer variable <CODE>point</CODE>. <CODE>H5Aclose</CODE><FONT FACE="Times"> then releases the attribute identifier.
</CODE><B><P>To read a scalar attribute whose name and datatype are known</B>, first open the attribute using <CODE>H5Aopen_name</CODE>, then use <CODE>H5Aread</CODE> to get its value. For example, the following reads a scalar attribute called <CODE>Integer_attribute</CODE> whose datatype is a native integer, and whose parent dataset has the identifier <CODE>dataset</CODE>.
</FONT><B><P>Reading an attribute whose characteristics are not known. </B>It may be necessary to query a<FONT FACE="Times"> file to obtain information about an attribute, namely its name, data type, rank and dimensions. The following code opens an attribute by its index value using </FONT><CODE>H5Aopen_idx</CODE><FONT FACE="Times">, then reads in information about its datatype.
<pre>
/*
 * Attach to the string attribute using its index, then read and display the value.
 */
attr = H5Aopen_idx(dataset, 2);
atype = H5Tcopy(H5T_C_S1);
H5Tset_size(atype, 4);
ret = H5Aread(attr, atype, string_out);
printf("The value of the attribute with the index 2 is %s \n", string_out);
</pre>
<code>
</CODE><P>In practice, if the characteristics of attributes are not known, the code involved in accessing and processing the attribute can be quite complex. For this reason, HDF5 includes a function called <CODE>H5Aiterate</CODE>, which applies a user-supplied function to each of a set of attributes. The user-supplied function can contain the code that interprets, accesses and processes each attribute.
<P><A HREF="#ReadWriteAttributes">Example 8</A><A NAME="_Toc429885323"> illustrates the use of the <CODE>H5Aiterate</CODE> function, as well as the other attribute examples described above.</A>
<H4><A NAME="CreateExample"><A NAME="_Toc429885325">Example 1: How to create a homogeneous multi-dimensional dataset</A> and write it to a file.</A></H4>
<P>This example creates a 2-dimensional HDF5 dataset of little-endian 32-bit integers.
<P>This example shows how to create a compound data type, write an array which has the compound data type to the file, and read back subsets of fields.