hdf5/test/SWMR_UseCase_UG.txt

224 lines
8.7 KiB
Plaintext
Raw Normal View History

1. Title:
User Guide for SWMR Use Case Programs
2. Purpose:
This is a User Guide of the SWMR Use Case programs. It describes the use
case program and explain how to run them.
2.1. Author and Dates:
Version 2: By Albert Cheng, 2013/06/18.
Version 1: By Albert Cheng, 2013/06/01.
%%%%Use Case 1.7%%%%
2020-04-21 07:12:00 +08:00
3. Use Case [1.7]:
Appending a single chunk
3.1. Program name:
use_append_chunk
3.2. Description:
Appending a single chunk of raw data to a dataset along an unlimited
dimension within a pre-created file and reading the new data back.
It first creates one 3d dataset using chunked storage, each chunk
is a (1, chunksize, chunksize) square. The dataset is (unlimited,
chunksize, chunksize). Data type is 2 bytes integer. It starts out
"empty", i.e., first dimension is 0.
The writer then appends planes, each of (1,chunksize,chunksize)
to the dataset. Fills each plan with plane number and then writes
it at the nth plane. Increases the plane number and repeats till
the end of dataset, when it reaches chunksize long. End product is
a chunksize^3 cube.
The reader is a separated process, running in parallel with
the writer. It reads planes from the dataset. It expects the
dataset is being changed (growing). It checks the unlimited dimension
(dimension[0]). When it increases, it will read in the new planes, one
by one, and verify the data correctness. (The nth plan should contain
all "n".) When the unlimited dimension grows to the chunksize (it
becomes a cube), that is the expected end of data, the reader exits.
3.3. How to run the program:
Simplest way is
$ use_append_chunk
It creates a skeleton dataset (0,256,256) of shorts. Then fork off
a process, which becomes the reader process to read planes from the
dataset, while the original process continues as the writer process
to append planes onto the dataset.
Other possible options:
1. -z option: different chunksize. Default is 256.
$ use_append_chunk -z 1024
It uses (1,1024,1024) chunks to produce a 1024^3 cube, about 2GB big.
2. -f filename: different dataset file name
$ use_append_chunk -f /gpfs/tmp/append_data.h5
The data file is /gpfs/tmp/append_data.h5. This allows two independent
processes in separated compute nodes to access the datafile on the
shared /gpfs file system.
3. -l option: launch only the reader or writer process.
$ use_append_chunk -f /gpfs/tmp/append_data.h5 -l w # in node X
$ use_append_chunk -f /gpfs/tmp/append_data.h5 -l r # in node Y
In node X, launch the writer process, which creates the data file
and appends to it.
In node Y, launch the read process to read the data file.
Note that you need to time the read process to start AFTER the write
process has created the skeleton data file. Otherwise, the reader
will encounter errors such as data file not found.
4. -n option: number of planes to write/read. Default is same as the
chunk size as specified by option -z.
$ use_append_chunk -n 1000 # 1000 planes are writtern and read.
5. -s option: use SWMR file access mode or not. Default is yes.
$ use_append_chunk -s 0
It opens the HDF5 data file without the SWMR access mode (0 means
off). This likely will result in error. This option is provided for
users to see the effect of the needed SWMR access mode for concurrent
access.
3.4. Test Shell Script:
The Use Case program is installed in the test/ directory and is
compiled as part of the make process. A test script (test_usecases.sh)
is installed in the same directory to test the use case programs. The
test script is rather basic and is more for demonstrating how to
use the program.
%%%%Use Case 1.8%%%%
4. Use Case [1.8]:
Appending a hyperslab of multiple chunks.
4.1. Program name:
use_append_mchunks
4.2. Description:
Appending a hyperslab that spans several chunks of a dataset with
unlimited dimensions within a pre-created file and reading the new
data back.
It first creates one 3d dataset using chunked storage, each chunk is a (1,
chunksize, chunksize) square. The dataset is (unlimited, 2*chunksize,
2*chunksize). Data type is 2 bytes integer. Therefore, each plane
consists of 4 chunks. It starts out "empty", i.e., first dimension is 0.
The writer then appends planes, each of (1,2*chunksize,2*chunksize)
to the dataset. Fills each plan with plane number and then writes
it at the nth plane. Increases the plane number and repeats till
the end of dataset, when it reaches chunksize long. End product is
a (2*chunksize)^3 cube.
The reader is a separated process, running in parallel with
the writer. It reads planes from the dataset. It expects the
dataset is being changed (growing). It checks the unlimited dimension
(dimension[0]). When it increases, it will read in the new planes, one
by one, and verify the data correctness. (The nth plan should contain
all "n".) When the unlimited dimension grows to the 2*chunksize (it
becomes a cube), that is the expected end of data, the reader exits.
4.3. How to run the program:
Simplest way is
$ use_append_mchunks
It creates a skeleton dataset (0,512,512) of shorts. Then fork off
a process, which becomes the reader process to read planes from the
dataset, while the original process continues as the writer process
to append planes onto the dataset.
Other possible options:
1. -z option: different chunksize. Default is 256.
$ use_append_mchunks -z 512
It uses (1,512,512) chunks to produce a 1024^3 cube, about 2GB big.
2. -f filename: different dataset file name
$ use_append_mchunks -f /gpfs/tmp/append_data.h5
The data file is /gpfs/tmp/append_data.h5. This allows two independent
processes in separated compute nodes to access the datafile on the
shared /gpfs file system.
3. -l option: launch only the reader or writer process.
$ use_append_mchunks -f /gpfs/tmp/append_data.h5 -l w # in node X
$ use_append_mchunks -f /gpfs/tmp/append_data.h5 -l r # in node Y
In node X, launch the writer process, which creates the data file
and appends to it.
In node Y, launch the read process to read the data file.
Note that you need to time the read process to start AFTER the write
process has created the skeleton data file. Otherwise, the reader
will encounter errors such as data file not found.
4. -n option: number of planes to write/read. Default is same as the
chunk size as specified by option -z.
$ use_append_mchunks -n 1000 # 1000 planes are writtern and read.
5. -s option: use SWMR file access mode or not. Default is yes.
$ use_append_mchunks -s 0
It opens the HDF5 data file without the SWMR access mode (0 means
off). This likely will result in error. This option is provided for
users to see the effect of the needed SWMR access mode for concurrent
access.
4.4. Test Shell Script:
The Use Case program is installed in the test/ directory and is
compiled as part of the make process. A test script (test_usecases.sh)
is installed in the same directory to test the use case programs. The
test script is rather basic and is more for demonstrating how to
use the program.
%%%%Use Case 1.9%%%%
5. Use Case [1.9]:
Appending n-1 dimensional planes
5.1. Program names:
use_append_chunk and use_append_mchunks
5.2. Description:
Appending n-1 dimensional planes or regions to a chunked dataset where
the data does not fill the chunk.
This means the chunks have multiple planes and when a plane is written,
only one of the planes in each chunk is written. This use case is
achieved by extending the previous use cases 1.7 and 1.8 by defining the
chunks to have more than 1 plane. The -y option is implemented for both
use_append_chunk and use_append_mchunks.
5.3. How to run the program:
Simplest way is
$ use_append_mchunks -y 5
It creates a skeleton dataset (0,512,512), with storage chunks (5,512,512)
of shorts. It then proceeds like use case 1.8 by forking off a reader
process. The original process continues as the writer process that
writes 1 plane at a time, updating parts of the chunks involved. The
reader reads 1 plane at a time, retrieving data from partial chunks.
The other possible options will work just like the two use cases.
5.4. Test Shell Script:
Commands are added with -y options to demonstrate how the two use case
programs can be used as for this use case.