Purpose:
Remove H5_MPI_SPECIAL_COLLECTIVE_IO_WORKS and
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS #defines from source.
Description:
Two advanced parallel functionalities, special collective IO and
complex derived datatypes, are not supported by older
implementations of MPI, and thus our code limits the use of these
features with #ifdefs and has checks in configure to set them (or
not). Unfortunately, configure can't actually run a parallel check
to see if these features are working (or not), so it resorts to
looking in the config files, where they are explicitly enabled or
disabled based on versions of MPI, systems being built on, or for
no documented reason at all (i.e. just set to on or off as some
'default'). Overriding these settings is easy if need be, provided
it is known that it needs to be done to get improved performance,
and oftentimes it is not.
Most new MPI implementations successfully handle the functionality
requested when these #defines are set, and many of the "turn these
features off" cases in the config files are for old (> 5 years)
versions of MPI and retired systems (such as NCSA's tungsten).
Therefore, the decision has been made to remove the support for
these old versions of MPI and systems that cannot handle these
behaviors. The #ifdefs and supporting setup in the config/ files
and configure script have been removed, and the code executed when
these options were not set has been removed from the source.
In passing, this commit also cleans up some whitespace issues in
both t_mpi.c and H5Dmpio.c. Furthermore, in t_mpi.c, the special
collective IO test was not getting regularly run due to it being
written to work only with four processes (we regularly test with
six, previously with three), and thus it failed when actually run
due to an out of bounds data buffer assignment. It has been
modified to run at any number of processes greater than four, and
the memory problem has been fixed so the test passes.
Tested:
jam, h5committest, ember
error and wanted to exit the test program. This was not good since if only a
subset of processes called MPI_Finalize(), the other processes would likely
hang. That happened on AIX, where the processes waited until the alarm signal
killed them. Definitely a waste of time.
Solution: Changed it to call MPI_Abort.
That showed another problem. HDF5 has set up atexit post-processing to try to
close unclosed objects, release resources, etc. But if the MPI processes have
encountered an error and have been aborted, it is not likely that any more MPI
calls can function properly. E.g., it would attempt to free some communicators
in the HDF5 MPIO file handle and would hang again.
Solution: call H5dont_atexit() to disable any atexit post-processing.
This must be done early, e.g. before calling H5open. This is added to each
parallel test main program.
testphdf5.h:
Changed macros VRFY and MESG. Added comments too.
testphdf5.c:
t_mpi.c:
t_cache.c:
t_shapesame.c:
Added H5dont_atexit.
Tested: h5committest.
t_file.c: replace the old variable, color, with a more meaningful name,
is_old.
t_mpi.c: use the official MPI_File_delete, instead of remove, to delete a file
in an MPI environment.
Tested:
Only jam in parallel as changes are trivial.
Remove another call to H5E_clear_stack() from within the library.
Clean up lots of compiler warnings.
Tested on:
Mac OS X/32 10.5.6 (amazon)
(followup on other platforms forthcoming)
Remove trailing whitespace from C/C++ source files, with the following
script:
foreach f (*.[ch] *.cpp)
sed 's/[[:blank:]]*$//' $f > sed.out && mv sed.out $f
end
Tested on:
Mac OS X/32 10.5.5 (amazon)
No need for h5committest, just whitespace changes...
file size from MPI_File_get_size. Bypass this problem by replacing it with
stat. Add an option --disable-mpi-size in configure to indicate this function
doesn't work properly. Add a test in testpar/t_mpi.c, too. If it returns the
wrong file size, print out a warning.
Tested on kagiso (parallel) only, because the same change to v1.6 was already
tested on several platforms (kagiso, cobalt, copper, and sol).
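The stat-based workaround described above can be sketched in plain C. This is only an illustration under stated assumptions: the helper names file_size_via_stat and size_matches_stat are hypothetical, and the "reported" value stands in for whatever MPI_File_get_size returned. It is not the code that was committed.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the size of `path` in bytes as reported by stat(), or -1 on failure. */
static long long file_size_via_stat(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long long)sb.st_size;
}

/* Compare a size reported by another source (for example MPI_File_get_size)
 * with the size from stat(). Returns 1 if they agree; otherwise prints a
 * warning and returns 0. Hypothetical helper, for illustration only. */
static int size_matches_stat(const char *path, long long reported)
{
    long long actual = file_size_via_stat(path);

    if (actual != reported) {
        fprintf(stderr,
                "Warning: reported file size %lld differs from stat() size %lld\n",
                reported, actual);
        return 0;
    }
    return 1;
}
```

A test would call size_matches_stat() after writing a file of known length and treat a mismatch as a warning rather than a failure.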
Tested platform:
Kagiso only, since it is only a comment block change. If it works on one
machine, it should work on all, I hope. Still need to check the parallel
build on copper.
Code cleanup
Description:
Trim trailing whitespace in Makefile.am and C/C++ source files to make
diffing changes easier.
Platforms tested:
None necessary, whitespace only change
Add new tests
Description:
Collective IO doesn't work for some platforms/mpio packages when more than
one process has no contributions to IO.
Solution:
1. Add a collective IO test to verify the correctness of the library when
more than one process has no contributions to IO.
2. Add a similar MPI-IO test to t_mpi to help with maintenance on more platforms.
Platforms tested:
heping, mir, copper
Misc. update:
bug fix.
Description:
The complex derived datatype test assumed that the fill value would be 0. This is not
the case on all systems.
Solution:
Modified the test to check against a known value in the outbuf array, instead of the fill value.
Platforms tested:
heping and MCR.
Misc. update:
bug fix.
Description:
Fixed typo in a comment. The word "file" was supposed to be "fill." The explanation of
how the complex derived datatype test works is much clearer now.
Solution:
Platforms tested:
minor change.
Misc. update:
A bug fix
Description:
MPI_STATUS_IGNORE is treated as a NULL pointer by mpich 1.2.4 and similar MPI packages.
This caused a segmentation fault in the MPI derived datatype test.
Solution:
Define MPI_Status status,
and pass &status into MPI_File_read and MPI_File_write.
Platforms tested:
too trivial to test.
Misc. update:
Improvement.
Description:
The test may hang if a system failure leaves some processors
not working.
Solution:
Added the ALARM calls so that all tests must finish within the default
alarm time. So, even if a process is hanging, the ALARM signal
will terminate the process.
Platforms tested:
tg-login2 of NCSA.
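The alarm-based time limit described above rests on the POSIX alarm() call, whose default SIGALRM action terminates the process. Below is a minimal sketch; the helper names and the 1200-second value are assumptions made for illustration and are not taken from the test library.

```c
#include <unistd.h>

/* Hypothetical default time limit in seconds (an assumption, not from the source). */
#define SKETCH_ALARM_SECONDS 1200u

/* Arm the timer. If the process is still running when it expires, it receives
 * SIGALRM, whose default action terminates the process.
 * Returns the seconds that were left on any previously armed alarm. */
static unsigned int sketch_alarm_on(unsigned int seconds)
{
    return alarm(seconds);
}

/* Cancel the pending alarm once the tests have finished.
 * Returns the seconds that were still remaining. */
static unsigned int sketch_alarm_off(void)
{
    return alarm(0);
}
```

A test main would call sketch_alarm_on(SKETCH_ALARM_SECONDS) before running the tests and sketch_alarm_off() after the last one completes.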
Misc. update:
"bug fix"
Description:
The test_mpio_derived_dtype() often hangs when it fails. So it was
not run by default to avoid hanging the daily tests or confusing
users. But then, when a new system or new code in collective chunking
fails, one can't tell for sure whether it is because of the complicated
derived type failures or something else.
Solution:
Changed the logic so it is skipped only if it is known that
the complicated MPI derived type does not work. (This is
indicated by macro H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS NOT
defined.)
Platforms tested:
heping pp (where it is tested by default).
modi4 pp (where it is SKIPPED by default.)
I also forced modi4 to run the test, and modi4 said it actually
is working, so should we change the setting of
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS to working?!!
Misc. update:
Code cleaning.
Description:
The block of code that is conditioned by the H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS
and manipulates the return code of test_mpio_derived_dtype, does not really
belong in main. If return code of test_mpio_derived_dtype needs to be
adjusted, it should be done in test_mpio_derived_dtype.
Solution:
Moved that block of code into test_mpio_derived_dtype.
Platforms tested:
heping PP.
Misc. update:
Code cleanup
Description:
Trim trailing whitespace, which is making 'diff'ing the two branches
difficult.
Solution:
Ran this script in each directory:
foreach f (*.[ch] *.cpp)
sed 's/[[:blank:]]*$//' $f > sed.out && mv sed.out $f
end
Platforms tested:
FreeBSD 4.11 (sleipnir)
Too minor to require h5committest
Improvement.
Description:
The derived datatype test often hangs when it fails. This blocks
the daily tests and automatic builds. Run it only when hi verbose mode
is used.
Platforms tested:
Tested in eirene pp.
Misc. update:
Add detailed comments for MPI derived data type test.
Description:
Solution:
Platforms tested:
heping (only comments and printf messages were added, no need to test
on all platforms.)
Misc. update:
Provide a way to warn users about the use of collective IO because of
potential MPI-IO bugs on some platforms.
Also provide a way for users to give us feedback if the vendor has
already fixed the problem so that we can turn off the hard-coded macro
in our configure description file.
Description:
See above.
Solution:
Use a simple program with a complicated MPI derived datatype to check
whether derived datatypes work for this MPI-IO package.
Print out some messages to warn users if it does not work as we expected.
Platforms tested:
AIX 5.1(copper) and Linux 2.4(heping)
Misc. update:
Feature--to provide a standalone mode for t_mpi.c so that it can
be built outside of PHDF5 environment.
Description:
Move definitions that are common to all parallel test programs
to a new header file called testpar.h.
Leave only the definitions related to Parallel HDF5 tests in testphdf5.h.
Platforms tested:
heping (pp) and modi4(PP). Copper was down.
Misc. update:
Bug Fix/Code Cleanup/Doc Cleanup/Optimization/Branch Sync :-)
Description:
Generally speaking, this is the "signed->unsigned" change to selections.
However, in the process of merging code back, things got stickier and stickier
until I ended up doing a big "sync the two branches up" operation. So... I
brought back all the "infrastructure" fixes from the development branch to the
release branch (which I think were actually making some improvement in
performance) as well as fixed several bugs which had been fixed in one branch,
but not the other.
I've also tagged the repository before making this checkin with the label
"before_signed_unsigned_changes".
Platforms tested:
FreeBSD 4.10 (sleipnir) w/parallel & fphdf5
FreeBSD 4.10 (sleipnir) w/threadsafe
FreeBSD 4.10 (sleipnir) w/backward compatibility
Solaris 2.7 (arabica) w/"purify options"
Solaris 2.8 (sol) w/FORTRAN & C++
AIX 5.x (copper) w/parallel & FORTRAN
IRIX64 6.5 (modi4) w/FORTRAN
Linux 2.4 (heping) w/FORTRAN & C++
Misc. update:
Improvement.
Description:
Made all processes print hostname() by default so that it is easier
to spot problems.
Platforms tested:
Tested in copper only. It is a trivial small change.
Misc. update:
bug fix.
Description:
The test routines print error messages, but not all of them
return the number of errors detected to the main routine, which
always exits with a 0 status. Thus make or shell commands could
not detect that there were errors.
Solution:
Changed the test routines to return the appropriate number of
errors to the main routine, which in turn exits with the appropriate
exit code if errors were found.
Platforms tested:
Tested in Sol and eirene (pp).
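The error-propagation pattern described above can be sketched as follows. All names here (test_fn, run_tests, exit_status_for, and the two sample tests) are hypothetical and only illustrate the idea of summing per-test error counts and mapping the total to a process exit status.

```c
#include <stdio.h>
#include <stdlib.h>

/* Each test routine returns the number of errors it detected. */
typedef int (*test_fn)(void);

/* Run every test and return the total number of errors. */
static int run_tests(const test_fn *tests, size_t ntests)
{
    int nerrors = 0;

    for (size_t i = 0; i < ntests; i++)
        nerrors += tests[i]();
    return nerrors;
}

/* Map an error count to an exit status that make or a shell can check. */
static int exit_status_for(int nerrors)
{
    return nerrors > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

/* Hypothetical sample tests used only to exercise the sketch. */
static int passing_test(void)
{
    return 0;
}

static int failing_test(void)
{
    printf("sample test: 2 errors detected\n");
    return 2;
}
```

Returning EXIT_FAILURE rather than the raw count is a deliberate choice in this sketch: only the low 8 bits of an exit status reach the parent process, so a count of 256 would otherwise look like success.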
Misc. update:
feature
Description:
Another revamp of the test interface.
TestInit: is used to register Test Program name, test program specific
Usage and option parsing routines.
TestUsage: will invoke extra usage routine if provided.
TestParseCmdLine: will invoke extra option parsing routine if provided.
GetTestSummary() and GetTestCleanup() replace the previous Summary and
CleanUp arguments of TestParseCmdLine.
test/testhdf5, test/ttsafe.c, testpar/t_mpi.c, testpar/testphdf5.c:
All have been updated to use the new Test Routines.
testpar/t_mpi.c:
Also works around a compiler optimization bug seen when pgcc on Linux is
used to compile it. Changed buf[] and expected to unsigned char
type to avoid a bug that failed to do sign-extension.
Platforms tested:
"h5committested"
Also tested thread-safe option in eirene.
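The unsigned char change in t_mpi.c mentioned above can be illustrated with a small sketch. With unsigned char, every byte value stays in 0..255 when promoted to int, so the comparison never depends on sign extension. The function names and the fill pattern below are assumptions for illustration, not the test's actual code.

```c
#include <stddef.h>

/* Fill buf with a byte pattern that starts at seed and wraps modulo 256. */
static void fill_pattern(unsigned char *buf, size_t n, unsigned char seed)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(seed + i);
}

/* Count the bytes in buf that differ from the expected pattern.
 * Both sides of the comparison are unsigned char, so values of 0x80 and
 * above compare as 128..255 instead of as negative numbers. */
static size_t count_mismatches(const unsigned char *buf, size_t n,
                               unsigned char seed)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char expected = (unsigned char)(seed + i);

        if (buf[i] != expected)
            bad++;
    }
    return bad;
}
```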
Improvement
Description:
The MPI atomicity and file_sync tests may hang if a filesystem
is not able to support the operation. This blocks the
whole test run. PHDF5 does not use either feature, so they were
removed from the default tests.
Platforms tested:
Only in eirene using pp. Copper is still down.
Code cleanup
Description:
Clean up lots of warnings based on those reported from the SGI compilers
as well as gcc.
Platforms tested:
SGI O3900, IRIX64 6.5 (Cheryl's SGI machine)
FreeBSD 4.9 (sleipnir) w/ & w/o parallel
h5committest
Improvement
Description:
Changed parsing of verbose level by the common test library routine.
Change t_mpi.c to use the Verbose control better.
Platforms tested:
verena (pp).
Misc. update:
Improvement.
Description:
Complete change of the verbose control to use the routines provided by
the test/libh5test.a.
Also put in a temporary fix for the H5Eset_auto() and H5Eget_auto()
so that the Compat code is isolated in one place rather than all over
the source file.
Platforms tested:
Tested in Eirene (parallel).
Misc. update:
new feature.
Description:
Added tests of 1wMr with options to apply Atomicity or File_sync.
Platforms tested:
only tested in eirene and Teragrid as this is just an MPI test.
Misc. update:
new test.
Description:
Added the test_mpio_1wMr test, which verifies whether the file system can
support one process writing while many processes read.
Platforms tested:
h5committested.
Misc. update:
bug fix
Description:
Added a barrier to ensure all processes have finished using
the file before cleaning it away.
Added H5close() to ensure everything in HDF5 is closed before
calling MPI_Finalize.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}? Yes
Tidy up
Description:
The old version showed tons of output even if MPI_Offset was too small
to support multiple-GB files and the tests were destined to fail.
The output was pretty confusing.
Solution:
Prints the signedness and size of MPI_Offset for information.
Skips tests if MPI_Offset is not big enough to support the file
sizes.
Platforms tested:
modi4, eirene, burrwhite (all parallel).
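The signedness and size check described above can be sketched in plain C. Because this sketch does not include mpi.h, it uses a hypothetical typedef, sketch_offset_t, as a stand-in for MPI_Offset; the macro and function names are likewise assumptions.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for MPI_Offset so the sketch compiles without MPI. */
typedef long long sketch_offset_t;

/* True if integer type T is signed: -1 converted to T stays below zero. */
#define TYPE_IS_SIGNED(T) ((T)-1 < (T)0)

/* Number of bits available for non-negative offset values. */
static int offset_value_bits(void)
{
    int bits = (int)(sizeof(sketch_offset_t) * CHAR_BIT);

    return TYPE_IS_SIGNED(sketch_offset_t) ? bits - 1 : bits;
}

/* Return 1 if an offset of 2^log2_bytes is representable, so a test that
 * needs files of that size can run; return 0 if it should be skipped. */
static int offset_can_hold(int log2_bytes)
{
    return offset_value_bits() > log2_bytes;
}

/* Print the signedness and size for information. */
static void print_offset_info(void)
{
    printf("offset type is %s, %d bytes\n",
           TYPE_IS_SIGNED(sketch_offset_t) ? "signed" : "unsigned",
           (int)sizeof(sketch_offset_t));
}
```

In this sketch a 4GB test would be guarded by offset_can_hold(32), and a 2GB test by offset_can_hold(31).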
Users were alarmed by the OFFSET overflow and GB file size tests.
Those tests only check the limits of the MPI implementation; a failure
there is not really an error.
Solution:
Changed the VRFY macro to indicate it is an "ERROR".
Modified the INFO macro to print messages as "REMARK (not an error)"
so that users would not be alarmed.
Added an explanation string in the GB file size write/read.
Platforms tested:
eirene and modi4 (parallel)
Code cleanup
Description:
This was "thrown" together in a quick way to test MPIO functionality.
Cleaned out some embarrassingly useless declarations to reduce compiler
warnings.
Platforms tested:
modi4-pp and eirene-pp.
Bug fix
Description:
The t_mpi test used to fail and exit if any error was detected.
That aborted other processes in a "make check" situation.
Solution:
Introduced a new error verification macro, INFO. INFO is for
information only. It does not increase the nerrors count.
The program always exits with 0.
Platforms tested:
eirene with mpich.
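The difference between an error check that counts and one that only informs can be sketched with two macros. These are simplified, hypothetical versions (SKETCH_VRFY and SKETCH_INFO) written for illustration; they are not the definitions used in the test suite.

```c
#include <stdio.h>

/* Running count of real errors in this sketch. */
static int nerrors = 0;

/* Verification that counts: a failed condition is reported as an error
 * and increments nerrors. */
#define SKETCH_VRFY(cond, mesg)                          \
    do {                                                 \
        if (!(cond)) {                                   \
            printf("*** ERROR *** %s\n", (mesg));        \
            nerrors++;                                   \
        }                                                \
    } while (0)

/* Information only: a failed condition is reported as a remark and
 * leaves nerrors unchanged. */
#define SKETCH_INFO(cond, mesg)                          \
    do {                                                 \
        if (!(cond))                                     \
            printf("REMARK (not an error): %s\n", (mesg)); \
    } while (0)
```

Checks that only probe the limits of the MPI implementation would use the informational form, so their outcome never changes the exit status.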
Bug fix
Description:
Added a barrier before removing and opening the file to prevent a race
condition.
Platforms tested:
modi4,pp
Bug fix and clean up.
Description:
The part that should test 4GB was actually testing 2GB due to
a typo.
Solution:
Corrected the typo to use the 4GB constant. Rearranged the code
so that the 2GB and 4GB tests are each grouped on their own. Removed
some duplicated testing code.
Platforms tested:
modi4.
new test
Description:
Added two new tests.
test_mpio_offset:
Verify that MPI_Offset exceeding 2**31 can be computed correctly.
test_mpio_gb_file
Test whether MPIO can write a file from under 2GB to over 2GB and then
from under 4GB to over 4GB.
Platforms tested:
modi4(-64), tflops.
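The arithmetic behind the two tests above can be sketched without MPI by using int64_t as a stand-in for MPI_Offset. The constants and helper names are hypothetical. The point illustrated is that the operand must be widened before the multiplication, because 2049 * 1048576 does not fit in a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical size constants, kept in 64-bit arithmetic throughout. */
#define SKETCH_MB      INT64_C(1048576)
#define SKETCH_TWO_GB  (INT64_C(2048) * SKETCH_MB)
#define SKETCH_FOUR_GB (INT64_C(4096) * SKETCH_MB)

/* Byte offset of the start of the mb_index-th megabyte.
 * The cast widens mb_index before multiplying, so the product is computed
 * in 64 bits and cannot overflow a 32-bit int. */
static int64_t mb_to_offset(int mb_index)
{
    return (int64_t)mb_index * SKETCH_MB;
}

/* Return 1 if a write of len bytes starting at start begins below the
 * boundary and ends above it, i.e. the write crosses the boundary. */
static int crosses(int64_t start, int64_t len, int64_t boundary)
{
    return start < boundary && start + len > boundary;
}
```

A write of two megabytes starting one megabyte below SKETCH_TWO_GB crosses the 2GB boundary; the same write placed below SKETCH_FOUR_GB exercises the 4GB case.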
Features, kind of.
Description:
Separated the MPI features test into its own independent
program so that it can be tested on its own without too
much HDF5 stuff involved.
Added automatic removal of temporary test files after
the tests completed.
Reduced the size of the dataset dimensions to avoid tripping
the SGI MPI problems of running out of internal mpi type entries.
Platforms tested:
O2K -64
Makefile.in:
Added test/ as one of the -I directories to search for header files.
Needed because <h5test.h> is used.
t_file.c t_mpi.c testphdf5.c testphdf5.h:
Added FILENAME to meet the assumption in h5test.h. (May use
CLEANUP in the future.) Moved the prefix setting to the
h5_fixname().
Changed it to skip the test instead of aborting when there are not
enough processes to do the test. Also corrected an error in the
error reporting printf statement.
t_dset.c:
testphdf5.c:
testphdf5.h:
Added option for specifying chunk dimensions.