mirror of
https://github.com/HDFGroup/hdf5.git
synced 2024-12-21 07:51:46 +08:00
/** \page TN Technical Notes

\li \ref api-compat-macros
\li \ref APPDBG
\li \ref FMTDISC
\li \ref FILEIMGOPS
\li \ref subsubsec_dataset_transfer_filter
\li \ref IOFLOW
\li \ref TNMDC
\li \ref thread-safe-lib
\li \ref SWMR
\li \ref VDS
\li \ref RELVERSION
\li \ref VFL
\li <a href="https://github.com/HDFGroup/arch-doc/blob/main/An_Overview_of_the_HDF5_Library_Architecture.v2.pdf">HDF5 Library Architecture Overview</a>
\li \ref VOL_Connector

*/

/** \page IOFLOW HDF5 Raw I/O Flow Notes

\htmlinclude IOFlow.html

*/

/** \page RELVERSION HDF5 Library Release Version Numbers

\htmlinclude LibraryReleaseVersionNumbers.html

*/

/** \page VFL HDF5 Virtual File Layer

\htmlinclude VFL.html

*/

/** \page FMTDISC HDF5 File Format Discussion

\htmlinclude FileFormat.html

*/

/** \page FILEIMGOPS HDF5 File Image Operations

\htmlinclude FileImageOps.html

*/

/** \page APPDBG Debugging HDF5 Applications

\htmlinclude DebuggingHDF5Applications.html

*/

/** \page SWMR Introduction to Single-Writer/Multiple-Reader (SWMR)

\section sec_swmr_intro Introduction to SWMR
The Single-Writer / Multiple-Reader (SWMR) feature enables multiple processes to read an HDF5 file
while it is being written to (by a single process) without using locks or requiring communication between processes.
<img src=tutr-swmr1.png alt="tutr-swmr1.png" width=500>

All communication between processes must be performed via the HDF5 file. The HDF5 file under SWMR access must
reside on a system that complies with POSIX write() semantics.

The basic engineering challenge for this to work was to ensure that the readers of an HDF5 file always
see a coherent (though possibly not up to date) HDF5 file.

The issue is that when writing data there is information in the metadata cache in addition to the physical file on disk:
<img src=tutr-swmr2.png alt="tutr-swmr2.png" width=500>

However, the readers can only see the state contained in the physical file:
<img src=tutr-swmr3.png alt="tutr-swmr3.png" width=500>

The SWMR solution implements dependencies on when the metadata can be flushed to the file. This ensures that metadata cache
flush operations occur in the proper order, so that there will never be internal file pointers in the physical file
that point to invalid (unflushed) file addresses.
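
The ordering rule can be illustrated with a small, self-contained C sketch. It is a toy model and not the library's metadata cache: the `entry_t` type and the `flush_all` function are hypothetical names introduced here, and the sketch assumes that the references between entries contain no cycles. Each entry is flushed only after the entry it points to, so a reader never finds a pointer to an unflushed address.

```c
#include <stddef.h>

// Toy model of a metadata cache entry (hypothetical, not an HDF5 type).
typedef struct {
    int child;    // index of the entry this one points to, or -1 for none
    int flushed;  // 1 once the entry has been written to the file
} entry_t;

// Flush entry i, first flushing the entry it points to.
// Appends the flushed indices to order and returns the new length.
static size_t flush_entry(entry_t *cache, int i, int *order, size_t n)
{
    if (cache[i].flushed)
        return n;
    if (cache[i].child >= 0)
        n = flush_entry(cache, cache[i].child, order, n);
    cache[i].flushed = 1;
    order[n++] = i;
    return n;
}

// Flush every entry so that no flushed entry points to an unflushed one.
// Assumes the child references are acyclic.
size_t flush_all(entry_t *cache, size_t count, int *order)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        n = flush_entry(cache, (int)i, order, n);
    return n;
}
```

For a chain in which entry 0 points to entry 1 and entry 1 points to entry 2, the resulting flush order is 2, 1, 0.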

A beneficial side effect of using SWMR access is better fault tolerance. It is more difficult to corrupt a file when using SWMR.

\subsection subsec_swmr_doc Documentation
\subsubsection subsubsec_swmr_doc_guide User Guide
<a href="https://docs.hdfgroup.org/documentation/HDF5/features/SWMR/HDF5_SWMR_Users_Guide.pdf">SWMR User Guide</a>

\subsubsection subsubsec_swmr_doc_apis HDF5 Library APIs
<ul>
<li>#H5Fstart_swmr_write — Enables SWMR writing mode for a file</li>
<li>#H5DOappend — Appends data to a dataset along a specified dimension</li>
<li>#H5Pset_object_flush_cb — Sets a callback function to invoke when an object flush occurs in the file</li>
<li>#H5Pget_object_flush_cb — Retrieves the object flush property values from the file access property list</li>
<li>#H5Odisable_mdc_flushes — Prevents metadata entries for an HDF5 object from being flushed from the metadata cache to storage</li>
<li>#H5Oenable_mdc_flushes — Enables flushing of dirty metadata entries from a file’s metadata cache</li>
<li>#H5Oare_mdc_flushes_disabled — Determines if an HDF5 object has had flushes of metadata entries disabled</li>
</ul>

\subsubsection subsubsec_swmr_doc_tools Tools
\li h5watch — Outputs new records appended to a dataset as the dataset grows
\li h5format_convert — Converts the layout format version and chunked indexing types of datasets created with
    HDF5-1.10 so that applications built with HDF5-1.8 can access them
\li h5clear — Clears superblock status_flags field, removes metadata cache image, prints EOA and EOF, or sets EOA of a file

\subsubsection subsubsec_swmr_doc_design Design Documents

\subsection subsec_swmr_model Programming Model
Please be aware that the SWMR feature requires that an HDF5 file be created with the latest file format. See
#H5Pset_libver_bounds for more information.

To use SWMR, follow the general programming model for creating and accessing HDF5 files and objects along with the steps described below.

\subsubsection subsubsec_swmr_model_writer SWMR Writer
The SWMR writer either opens an existing file and objects or creates them as follows.

Open an existing file:
\li Call #H5Fopen using the #H5F_ACC_SWMR_WRITE flag.
\li Begin writing datasets.
\li Periodically flush data.

Create a new file:
\li Call #H5Fcreate using the latest file format.
\li Create groups, datasets and attributes, and then close the attributes.
\li Call #H5Fstart_swmr_write to start SWMR access to the file.
\li Periodically flush data.

<h4 id="example-code">Example Code:</h4>
Create the file using the latest file format property:
\code
fapl = H5Pcreate (H5P_FILE_ACCESS);
status = H5Pset_libver_bounds (fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
fid = H5Fcreate (filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
// Create objects (groups, datasets, ...).
// Close any attributes and named datatype objects.
// Groups and datasets may remain open before starting SWMR access to them.

// Start SWMR access to the file:
status = H5Fstart_swmr_write (fid);

// Reopen the datasets and then start writing, periodically flushing data:
status = H5Dwrite (dset_id, ...);
status = H5Dflush (dset_id);
\endcode

\subsubsection subsubsec_swmr_model_reader SWMR Reader
The SWMR reader must continually poll for new data:

\li Call #H5Fopen using the #H5F_ACC_SWMR_READ flag.
\li Poll, checking the size of the dataset to see if there is new data available for reading.
\li Read new data, if any.

<h4 id="example-code-1">Example Code:</h4>
\code
// Open the file using the SWMR read flag:
fid = H5Fopen (filename, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);
// Open the dataset and then repeatedly poll the dataset, by getting the dimensions, reading new data, and refreshing:
dset_id = H5Dopen (...);
space_id = H5Dget_space (...);
while (...) {
    status = H5Dread (dset_id, ...);
    status = H5Drefresh (dset_id);
    space_id = H5Dget_space (...);
}
\endcode
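
The polling step comes down to comparing the extent seen at the previous poll with the current one. The sketch below shows only that bookkeeping in plain C. The `new_range_t` type and the `poll_new_data` function are hypothetical helpers, not HDF5 functions, and the two extents are assumed to be the dataset sizes along the appended dimension, obtained from the dataspace after each refresh.

```c
#include <stddef.h>

// Range of elements that became visible since the previous poll.
typedef struct {
    size_t start;  // index of the first unread element
    size_t count;  // number of unread elements (0 if nothing is new)
} new_range_t;

// Compare the extent seen at the previous poll with the current extent
// along the appended dimension and report what still has to be read.
new_range_t poll_new_data(size_t seen, size_t current)
{
    new_range_t r = { seen, 0 };
    if (current > seen)
        r.count = current - seen;
    return r;
}
```

A reader that had seen 10 elements and now finds 14 would read 4 elements starting at index 10, then record 14 as the new baseline for the next poll.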

\subsection subsec_swmr_scope Limitations and Scope
An HDF5 file under SWMR access must reside on a system that complies with POSIX write()
semantics. SWMR access is also limited in scope as follows.

The writer process is only allowed to modify raw data of existing datasets by:
\li Appending data along any unlimited dimension.
\li Modifying existing data.

The following operations are not allowed (and the corresponding HDF5 calls will fail):
\li The writer cannot add new objects to the file.
\li The writer cannot delete objects in the file.
\li The writer cannot modify or append data with variable length, string or region reference datatypes.
\li File space recycling is not allowed. As a result the size of a file modified by a SWMR writer may be larger than a file modified by a non-SWMR writer.

\subsection subsec_swmr_tools Tools for Working with SWMR
Two new tools, h5watch and h5clear, are available for use with SWMR. The other HDF5 utilities have also been modified to recognize SWMR:
\li The h5watch tool allows a user to monitor the growth of a dataset.
\li The h5clear tool clears the status flags in the superblock of an HDF5 file.
\li The rest of the HDF5 tools will exit gracefully, but otherwise do not work with SWMR.

\subsection subsec_swmr_example Programming Example
A good example of using SWMR is included with the HDF5 tests in the source code. You can run it while reading
the file it creates. If you then interrupt the application and reader and look at the resulting file, you will
see that the file is still valid. Follow these steps:
\li Download the HDF5 source code to a local directory on a filesystem (that complies with POSIX write() semantics).
    Build the software. No special configuration options are needed to use SWMR.
\li Open two terminal windows. In one window go into the bin directory of the built binaries.
    In the other window go into the test directory of the HDF5-1.10 source code that was just built.
\li In the window in the test directory compile and run use_append_chunk.c. The example writes a three
    dimensional dataset by planes (with chunks of size 1 x 256 x 256).
\li In the other window (in the bin directory) run h5watch on the file created by
    use_append_chunk.c (use_append_chunk.h5). Run it while use_append_chunk is executing and you
    will see valid data displayed with h5watch.
\li Interrupt use_append_chunk while it is running, and stop h5watch.
\li Use h5clear to clear the status flags in the superblock of the HDF5 file (use_append_chunk.h5).
\li View the file with h5dump. You will see that it is a valid file even though the application did not
    close properly. It will contain data up to the point that it was interrupted.

*/

/** \page VDS Introduction to the Virtual Dataset - VDS

\section sec_vds_intro Introduction to VDS
The HDF5 Virtual Dataset (VDS) feature enables users to access data in a collection of HDF5 files as a
single HDF5 dataset and to use the HDF5 APIs to work with that dataset.

For example, your data may be collected into four files:
<img src="tutrvds-multimgs.png" alt="tutrvds-multimgs.png" width=750>

You can map the datasets in the four files into a single VDS that can be accessed just like any other dataset:
<img src="tutrvds-snglimg.png" alt="tutrvds-snglimg.png" width=500>

The mapping between a VDS and the HDF5 source datasets is persistent and transparent to an application. If a source
file is missing, the fill value will be displayed.

See the Virtual (VDS) Documentation for complete details regarding the VDS feature.

The VDS feature was implemented using hyperslab selection (#H5Sselect_hyperslab). See the tutorial on
Reading From or Writing to a Subset of a Dataset for more information on selecting hyperslabs.
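
As a reminder of what a hyperslab describes, the following plain C sketch enumerates the indices selected by a one-dimensional hyperslab given its start, stride, count and block values. The `slab1d_t` type and the `slab_indices` function are hypothetical names used only for illustration. The library itself works on dataspaces with one such set of values per dimension.

```c
#include <stddef.h>

// One-dimensional hyperslab: count blocks of block elements each,
// the first starting at start, consecutive blocks stride apart.
typedef struct {
    size_t start, stride, count, block;
} slab1d_t;

// Write the selected indices to out (capacity cap) and return the
// total number of elements in the selection.
size_t slab_indices(slab1d_t s, size_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t c = 0; c < s.count; c++) {
        for (size_t b = 0; b < s.block; b++) {
            if (n < cap)
                out[n] = s.start + c * s.stride + b;
            n++;
        }
    }
    return n;
}
```

With start 1, stride 3, count 2 and block 2, the selection contains the indices 1, 2, 4 and 5.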

\subsection subsec_vds_intro_model Programming Model
To create a Virtual Dataset you simply follow the HDF5 programming model and add a few additional API calls
to map the source datasets to the VDS.

Following are the steps for creating a Virtual Dataset:
<ul>
<li>Create the source datasets that will comprise the VDS</li>
<li>Create the VDS:
  <ul>
  <li>Define a datatype and dataspace (can be unlimited)</li>
  <li>Define the dataset creation property list (including fill value)</li>
  <li>(Repeat for each source dataset) Map elements from the source dataset to elements of the VDS:
    <ul>
    <li>Select elements in the source dataset (source selection)</li>
    <li>Select elements in the virtual dataset (destination selection)</li>
    <li>Map destination selections to source selections (see Functions for Working with a VDS)</li>
    </ul>
  </li>
  <li>Call H5Dcreate using the properties defined above</li>
  </ul>
</li>
<li>Access the VDS as a regular HDF5 dataset</li>
<li>Close the VDS when finished</li>
</ul>

<h4>Functions for Working with a VDS</h4>
The #H5Pset_virtual API sets the mapping between virtual and source datasets. This is a dataset creation property.
Using this API will change the layout of the dataset to #H5D_VIRTUAL. As with specifying any dataset creation property
list, an instance of the property list is created, modified, passed into the dataset creation call and then closed:
\code
dcpl = H5Pcreate (H5P_DATASET_CREATE);
src_space = H5Screate_simple ...
status = H5Sselect_hyperslab (space, ...
status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space);
dset = H5Dcreate2 (file, DATASET, H5T_NATIVE_INT, space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
status = H5Pclose (dcpl);
\endcode

There are several other APIs introduced with Virtual Datasets, including query functions. For details
see the complete list of HDF5 library APIs that support Virtual Datasets.

<h4>Limitations</h4>
This feature was introduced in HDF5-1.10.

The number of source datasets is unlimited. However, there is a limit on the size of each source dataset.

\subsection subsec_vds_intro_examples Programming Examples
<em>Example 1</em>
This example creates three HDF5 files, each with a one-dimensional dataset of 6 elements. The datasets in these files
are the source datasets that are then used to create a 4 x 6 Virtual Dataset with a fill value of -1. The first three
rows of the VDS are mapped to the data from the three source datasets as shown below:
<img src="tutrvds-ex.png" alt="tutrvds-ex.png" width=500>
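
The resulting layout can also be modeled without the library. The plain C sketch below builds the same 4 x 6 view from three in-memory arrays. It is an illustration of the mapping only: `build_vds` and the macros are hypothetical names, and a `NULL` pointer stands in for a source file that cannot be found, in which case the row keeps the fill value.

```c
#include <stddef.h>

#define VDS_ROWS   4
#define VDS_COLS   6
#define N_SOURCES  3
#define FILL_VALUE (-1)

// Build the 4 x 6 virtual view: row i comes from source i when that
// source is available (non-NULL), everything else keeps the fill value.
void build_vds(const int *sources[N_SOURCES], int vds[VDS_ROWS][VDS_COLS])
{
    for (size_t r = 0; r < VDS_ROWS; r++)
        for (size_t c = 0; c < VDS_COLS; c++)
            vds[r][c] = FILL_VALUE;

    for (size_t i = 0; i < N_SOURCES; i++) {
        if (sources[i] == NULL)
            continue;  // missing source: the row stays at the fill value
        for (size_t c = 0; c < VDS_COLS; c++)
            vds[i][c] = sources[i][c];
    }
}
```

The fourth row has no source mapped to it, so it always holds the fill value, exactly like a row whose source is missing.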

In this example the three source datasets are mapped to the VDS with this code:
\code
src_space = H5Screate_simple (RANK1, dims, NULL);
for (i = 0; i < 3; i++) {
    start[0] = (hsize_t)i;
    // Select i-th row in the virtual dataset; selection in the source datasets is the same.
    status = H5Sselect_hyperslab (space, H5S_SELECT_SET, start, NULL, count, block);
    status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space);
}
\endcode

After the VDS is created and closed, it is reopened. The property list is then queried to determine the
layout of the dataset and its mappings, and the data in the VDS is read and printed.

This example is in the HDF5 source code and can be obtained from here:
<h4>C Example</h4>
For details on compiling an HDF5 application: [ Compiling HDF5 Applications ]

<h4>Example 2</h4>
This example shows how to use a C-style printf format string to specify multiple source datasets as one virtual
dataset. Only one mapping is required. In other words, only one #H5Pset_virtual call is needed to map multiple datasets.
It creates a 2-dimensional unlimited VDS. Then it re-opens the file, makes queries, and reads the virtual dataset.

The source datasets are specified as A-0, A-1, A-2, and A-3. These are mapped to the virtual dataset with one call:
\code
status = H5Pset_virtual (dcpl, vspace, SRCFILE, "A-%b", src_space);
\endcode

The %b indicates that the block count of the selection in the dimension should be used.
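
To make the substitution concrete, here is a minimal C sketch of how a name pattern could be expanded for a given block number. It is not the library's implementation: `expand_source_name` is a hypothetical helper, and it assumes that `"%b"` is replaced by the block number and `"%%"` by a literal percent sign, rejecting any other specifier.

```c
#include <stdio.h>
#include <string.h>

// Expand a printf-style source name: "%b" becomes the block number and
// "%%" a literal '%'. Returns 0 on success, -1 if out is too small or
// the pattern contains an unsupported specifier.
int expand_source_name(const char *pattern, unsigned long block,
                       char *out, size_t outsz)
{
    size_t n = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p != '%') {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *p;
        } else if (p[1] == 'b') {
            int w = snprintf(out + n, outsz - n, "%lu", block);
            if (w < 0 || (size_t)w >= outsz - n)
                return -1;
            n += (size_t)w;
            p++;  // skip the 'b'
        } else if (p[1] == '%') {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = '%';
            p++;  // skip the second '%'
        } else {
            return -1;
        }
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Expanding the pattern `"A-%b"` for block numbers 0 through 3 yields the names A-0, A-1, A-2 and A-3 used in this example.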

<h4>C Example</h4>
For details on compiling an HDF5 application: [ Compiling HDF5 Applications ]

<h4>Using h5dump with a VDS</h4>
The h5dump utility can be used to view a VDS. The h5dump output for a VDS looks exactly like that for any other dataset.
If h5dump cannot find a source dataset, the fill value will be displayed.

You can determine that a dataset is a VDS by looking at its properties with
\code
h5dump -p
\endcode
It will display each source dataset mapping, beginning with Mapping 0. Below is an excerpt of the output of
\code
h5dump -p
\endcode
on the vds.h5 file created in Example 1. You can see that the entire source file a.h5 is mapped to the first row of the VDS dataset.

<img src="tutrvds-map.png" alt="tutrvds-map.png" width=650>

*/