Check in "actual I/O mode" feature to trunk. Will merge back to 1.8 branch
after it bakes over the weekend.
Tested on:
FreeBSD/32 8.2 (loyalty) w/gcc4.6, w/C++ & FORTRAN, in debug mode
FreeBSD/64 8.2 (freedom) w/gcc4.6, w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (koala) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (ember) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.8 (amazon) in debug mode
Mac OS X/32 10.6.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
fraction of the subtests depending on the current express test level. Also
added code to display fraction of subtests skipped.
The current table controlling the fraction of tests skipped as a function
of express test level is a guess at what will be needed. It will be necessary
to tune this table against the express test targets and our worst case system.
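
As a rough illustration of the mechanism described above, the sketch below
maps an express test level to the fraction of subtests that run and reports
how many were skipped. The table values, the function names, and the
"run every Nth subtest" rule are assumptions made for illustration; they are
not the values or code in this commit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical table: at express level L, run one subtest out of every
 * run_one_in[L]. Level 0 runs everything. The values are placeholders. */
#define MAX_EXPRESS_LEVEL 3
static const int run_one_in[MAX_EXPRESS_LEVEL + 1] = {1, 4, 16, 64};

/* Return nonzero if subtest number `index` should be skipped at `level`. */
int skip_subtest(int level, int index)
{
    assert(index >= 0);
    if (level < 0)
        level = 0;
    if (level > MAX_EXPRESS_LEVEL)
        level = MAX_EXPRESS_LEVEL;
    return (index % run_one_in[level]) != 0;
}

/* Count the skipped subtests out of `total` and print the fraction. */
int report_skipped(int level, int total)
{
    int skipped = 0;

    for (int i = 0; i < total; i++)
        if (skip_subtest(level, i))
            skipped++;
    printf("%d of %d subtests skipped at express level %d\n",
           skipped, total, level);
    return skipped;
}
```

Tuning then amounts to adjusting the entries of the table until the slowest
system meets the express test time target.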
Initially commit tested on Jam, Koala, and Heiwa, but ran into an unrelated
failure on Heiwa (bug reported). Replaced Heiwa with Linew and got a clean
h5commit test.
Also tested parallel on Koala. Initially got very bad results (test timed out
roughly 1/3 to 1/2 the way through). Discussed matters with Matthew, and moved
the build to the solid state drive on Koala. This dealt with the performance
issues completely.
Ran bin/reconfigure to update the Makefile.in files in directories not part of the fortran directory check-in. The Makefile.in files changed because of updates to configure.in for the Fortran 2003 additions.
Tested on all platforms run under daily tests.
Purpose:
Remove H5_MPI_SPECIAL_COLLECTIVE_IO_WORKS and
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS #defines from source.
Description:
Two advanced parallel functionalities, special collective IO and
complex derived datatypes, are not supported by older
implementations of mpi, and thus our code limits the use of these
features with #ifdefs and has checks in configure to set them (or
not). Unfortunately, configure can't actually run a parallel check
to see if these features are working (or not) so it resorts to
looking in the config files where they are explicitly enabled or
disabled based on versions of mpi, systems being built on, or for
no documented reason at all (i.e. just set to on or off as some
'default'). Overriding these settings is easy if need be, provided
it is known that it needs to be done to get improved performance,
and oftentimes it is not.
Most new MPI implementations successfully handle the functionality
requested when these #defines are set, and many of the "turn these
features off" cases in the config files are for old (> 5 years)
versions of MPI and retired systems (such as NCSA's tungsten).
Therefore, the decision has been made to remove the support for
these old versions of MPI and systems that cannot handle these
behaviors. The #ifdefs and supporting setup in the config/ files
and configure script have been removed, and the code executed when
these options were not set has been removed from the source.
In passing, this commit also cleans up some whitespace issues in
both t_mpi.c and H5Dmpio.c. Furthermore, in t_mpi.c, the special
collective IO test was not getting regularly run due to it being
written to work only with four processes (we regularly test with
six, previously with three), and thus it failed when actually run
due to an out of bounds data buffer assignment. It has been
modified to run at any number of processes greater than four, and
the memory problem has been fixed so the test passes.
Tested:
jam, h5committest, ember
General shared library improvements for CYGWIN / AIX
Description:
Shared libraries are disabled on both CYGWIN and AIX due
to inability to build them correctly. Part of the problem
in both of these situations is the lack of the libtool
flag -no-undefined, which tells libtool that all needed
symbols are defined at link time (a requirement on these
systems) and that it's okay to build shared libraries.
Another problem is the lack of dependencies between the wrapper
libraries and the core C HDF5 library.
This patch addresses both of these by fixing configure to
add the -no-undefined flag for libtool during linking and
by adding automake dependencies in the Makefile.am files.
After testing, both CYGWIN and AIX now generate shared
libraries, but there are still some test failures in each.
(cache_api, dt_arith, and testerror.sh on CYGWIN, and
fortran tests on AIX).
Even though the shared libraries are not quite perfect,
this is a general improvement to what we had before, so
I'm applying the patch anyway. Note that the default behavior
has NOT changed: shared libraries remain disabled on these
systems, and --enable-unsupported is required to attempt to
build them.
We will need to address the test failures in each
architecture prior to formally supporting shared
libraries on each.
Tested:
h5committested & CYGWIN tested (on bangan)
(AIX tested by Albert on bp-login2)
Add "silent make" mode configure option.
Description:
Automake 1.11 has a new option available that allows for a
silent make mode. This functionality needs to be explicitly
enabled in configure.in via the use of the automake macro
AM_SILENT_RULES, which is what this commit is adding.
This introduces a new configure option:
--{en|dis}able-silent-rules
This option is on by default, and simplifies compile and link
line outputs when building the library. Disabling this option
will print full "verbose" output (i.e., full compile and
linking lines for each target).
Tested:
This was tested on jam & h5committested
- Revise shared Fortran library disabling scenarios in configure
- Improve configure output summary
Description:
Shared Fortran libraries are not supported on Mac, but were being
disabled by configure in a way that also forced the C libraries
to be static-only. This has been fixed, so now only shared Fortran
is disabled while shared C can remain.
This prompted two additional changes:
1. While working on the check that addresses whether or not
shared Fortran libraries are allowed, removed old and no
longer needed check(s) that disable shared Fortran
libraries with HP, Intel 8, PGI, and Absoft compilers.
(Essentially, Mac is the only situation in which shared
Fortran libraries are disabled by configure.)
2. Having two different states of libraries (i.e. shared C
library with static-only Fortran library) was not apparent
in the configure summary, which labeled all libraries as
either shared and/or static. I've added lines to both the
C++ and Fortran output sections to list shared/static-ness
of these libraries specifically.
Additionally, I've made sure that the new --enable-unsupported
configure option correctly overrides configure if it tries to
disable a shared library.
Tested:
jam, fred, & h5committest
Clean up MPI resource leaks in parallel tests, along with a bunch of
compiler warnings.
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (amani) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.6 (amazon) in debug mode
Mac OS X/32 10.6.6 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
When $HDF5ExpressTest is NOT set, or when it is set to 1 or 0, no tests
are skipped.
When $HDF5ExpressTest is set to any other value, tests may be skipped.
The following message is printed:
Test skipped
when some tests are really skipped.
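
The rule above can be sketched as a small helper. This is only an
illustration: the names may_skip_tests and run_unless_skipped are
hypothetical, the exact string comparison is an assumption about how the
variable is parsed, and a real test would skip only selected subtests rather
than the whole test.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* `value` is what getenv("HDF5ExpressTest") returned (NULL when unset).
 * Return nonzero if tests may be skipped for this setting. */
int may_skip_tests(const char *value)
{
    if (value == NULL)
        return 0; /* not set: run everything */
    if (strcmp(value, "0") == 0 || strcmp(value, "1") == 0)
        return 0; /* 0 or 1: run everything */
    return 1;     /* any other value: skipping is allowed */
}

/* Run `test` or print the skip message. Returns 1 if the test ran. */
int run_unless_skipped(void (*test)(void))
{
    if (may_skip_tests(getenv("HDF5ExpressTest"))) {
        puts("Test skipped");
        return 0;
    }
    test();
    return 1;
}
```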
This is a temporary patch so that v186 can be tested. A more permanent fix
is needed, later.
Tested: h5committest.
This continues the previous work and this one breaks the
checker_board_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest plus jam serial.
The shape same tests ran too long. Break them into smaller sub-tests
so that each sub-test can finish in a shorter time. This makes it easier
to tell which sub-test is taking too much time and/or in which sub-test
errors occur.
This one breaks the contig_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest
error and wanted to exit the test program. This was not good, since if only a
subset of processes calls MPI_Finalize(), the other processes will likely
hang. This happened on AIX, where the processes waited until the alarm signal
killed them. Definitely a waste of time.
Solution: Changed it to call MPI_Abort.
That showed another problem. HDF5 has set up atexit post-processing to try to
close unclosed objects, release resources, etc. But if the MPI processes have
encountered an error and have been aborted, it is not likely that any more MPI
calls can function properly. E.g., it would attempt to free some communicators
in the HDF5 MPIO file handle. It would again hang.
Solution: need to call H5dont_atexit() to disable any atexit post-processing.
This must be done early, like before calling H5open. This is added to each
parallel test main program.
testphdf5.h:
Changed macros VRFY and MESG. Added comments too.
testphdf5.c:
t_mpi.c:
t_cache.c:
t_shapesame.c:
Added H5dont_atexit.
Tested: h5committest.
not find t_shapesame in daily test. It turned out that the mpiexec launcher
works like a real shell and the daily test signon (hdftest) does not have
"." in its $PATH. So, it could not automatically look for executables in
the current directory.
Solution:
Change the executable to an explicit ./t_shapesame. Now mpiexec can "find"
it.
Tested by hand in Amani.
Moved the two shape same tests from testphdf5 to a separate executable,
named t_shapesame. The shape same tests run too long for testphdf5.
In a separate executable, it will be easier to separate any errors in
testphdf5 sub-tests from the shape same tests.
t_shapesame.c:
Contains the shape same tests (cloned from t_rank_projection.c) plus
a duplicate of "testphdf5.c" for now. After verifying it is correct, more
cleanup is needed.
testphdf5.c:
Removed the two shape same tests (chsssdrpio & cbhsssdrpio).
Makefile.am:
Makefile.in:
Added t_shapesame as a new test executable.
Removed t_rank_projections.c from part of testphdf5.
testph5.sh.in:
Temporarily added the "t_shapesame -p" test for testing shape same tests
with MPIO-Posix VFD.
Tested: h5committested, plus serial jam.
Checked in fix for failure in shape same tests that appeared after
Quincey's recent massage of the test code. The problem was a race
condition created when Quincey re-worked the code selecting either
collective or independent I/O.
Previously, when independent I/O was selected in the test, I had
used H5Pset_dxpl_mpio() and H5Pset_dxpl_mpio_collective_opt() to
select collective semantics with independent I/O going on under
the hood. Quincey modified this to call H5Pset_dxpl_mpio() when
collective I/O was selected, and do nothing in the independent I/O
case. As a result, processes were able to race ahead and
modify the initial values of the data set before some processes
had verified that the initialization was correct.
Solved the problem by adding barriers, and making all barriers
dependent on independent I/O being selected.
Tested parallel on amani and phoenix. h5committested.
Note that parallel on amani and h5committest on heiwa failed
several times before I got a clean pass without code changes.
The failures on amani seemed to be time outs caused by contention
for the machine -- worryingly, they occurred in the shape same
tests. However, given subsequent passes and passes on jam and
phoenix, I am going ahead with the commit.
The failure on heiwa was in the fheap test. I don't see how
this can be related to changes in testpar, and in any case, it
went away on the second try.
Correct tests to use native datatypes consistently, and also to use
"normal" methods for performing collective I/O. Also, minor cleanups for
zeroing out buffers, etc.
Tested on:
AIX/64 6.? (bp) w/parallel
metadata confusion test that appeared after Albert modified the test.
Cursory commit test. No test on Abe as that system is down, the
fix is very minor, and it seems to work in the 1.8.6 branch.
John Mainzer fixed the bug and added a test which wrote a file and flushed it
a few times, closed the file, then opened it serially and read a simple
structure. I changed the test into two parts running in parallel, ..._writer
and ..._reader, and have the reader verify the file after every flush by the
writer.
Tested: parallel in Jam and Amani.
Replaced calls to H5Dcreate() and H5Acreate() with calls to H5Dcreate2()
and H5Acreate2() respectively in t_mdset.c.
This was done to repair a compile failure that occurred on a build
with the --with-default-api-version=v16 config option.
Cursory commit test
Modified test code in t_mdset to use H5Dopen2() instead of H5Dopen1().
This should fix the compile failure when we used --disable-deprecated-symbols
Cursory commit test
of the H5Ocache.c code to update its image of the on disk representation
of the object header on a call to the clear callback.
This wasn't an issue as long as all flushes of the object header were
made from the same process, but if an object header is modified, and
then flushed on one process and cleared on the rest, the changes were
not reflected in the images of the on disk representation on all
processes where the object header was cleared rather than flushed.
If one of these processes did the next flush, the changes were lost in
the on disk representation.
Fixed this by causing all dirty messages to be written to the copy
of the on disk image maintained by the object header code on both flush
and clear.
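
The failure mode and the fix can be illustrated with a toy model. The sketch
below is not the H5Ocache.c code: the toy_ohdr_t type, the toy_* functions,
and the integer "messages" are all hypothetical stand-ins, and "disk" is just
an array shared by two simulated processes.

```c
#include <assert.h>
#include <string.h>

#define NMESG 4 /* hypothetical number of header messages */

/* Toy stand-in for one process's cached object header. */
typedef struct {
    int mesg[NMESG];  /* current in-memory message values */
    int dirty[NMESG]; /* nonzero if mesg[i] is not yet in image[] */
    int image[NMESG]; /* this process's copy of the on-disk image */
} toy_ohdr_t;

/* Modify one message in memory and mark it dirty. */
void toy_set(toy_ohdr_t *oh, int i, int value)
{
    assert(i >= 0 && i < NMESG);
    oh->mesg[i]  = value;
    oh->dirty[i] = 1;
}

/* Copy every dirty message into the image and mark it clean. */
void toy_sync_image(toy_ohdr_t *oh)
{
    for (int i = 0; i < NMESG; i++) {
        if (oh->dirty[i]) {
            oh->image[i] = oh->mesg[i];
            oh->dirty[i] = 0;
        }
    }
}

/* Flush: update the image, then write it to "disk". */
void toy_flush(toy_ohdr_t *oh, int *disk)
{
    toy_sync_image(oh);
    memcpy(disk, oh->image, sizeof oh->image);
}

/* Clear: nothing is written, but the image is still brought up to date. */
void toy_clear(toy_ohdr_t *oh)
{
    toy_sync_image(oh);
}

/* Two processes apply the same changes; a different one flushes each time.
 * Returns 1 if both changes reach the disk image. */
int toy_two_process_scenario(void)
{
    toy_ohdr_t p0 = {{0}, {0}, {0}};
    toy_ohdr_t p1 = {{0}, {0}, {0}};
    int        disk[NMESG] = {0};

    toy_set(&p0, 2, 7);
    toy_set(&p1, 2, 7);
    toy_flush(&p0, disk); /* process 0 writes */
    toy_clear(&p1);       /* process 1 only clears */

    toy_set(&p0, 0, 9);
    toy_set(&p1, 0, 9);
    toy_clear(&p0);
    toy_flush(&p1, disk); /* process 1 writes this time */

    return disk[0] == 9 && disk[2] == 7;
}
```

In this model, if toy_clear only reset the dirty flags, process 1's image
would still hold the old value of message 2, and its later flush would
overwrite the value that process 0 had written.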
Also added associated test code in t_mdset.c.
Also checking in some cache debug code developed while chasing this bug.
Commit tested and tested (parallel) on phoenix.
Problem appears to have been caused by file system contention.
In the chunked dataset case, reshaping the chunks so that only one
process would touch each chunk and setting the alignment equal to the
default Lustre block size more or less dealt with the problem.
For contiguous datasets, the problem was a bit more difficult, as
re-working the test to avoid contention would have been very time
consuming.
Instead, I added code to time one execution of each type of shape
same test, and skip additional tests of that type if the duration
of the test exceeded some threshold.
In all cases, I set up code to turn off the above fixes if express
test is 0.
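
A minimal sketch of the time-and-skip logic described above, under stated
assumptions: the test_timer_t type, the function names, and the use of
time() are hypothetical, and the threshold is whatever the caller supplies.
In an MPI test the ranks would also have to agree on the decision, which
this sketch does not cover.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-test-type bookkeeping. */
typedef struct {
    double threshold_sec;  /* limit for one execution of this test type */
    int    timed;          /* nonzero once one execution has been timed */
    double first_duration; /* seconds taken by that first execution */
} test_timer_t;

/* Return nonzero if further tests of this type should be skipped.
 * Express level 0 turns the skipping off entirely. */
int should_skip(const test_timer_t *t, int express_level)
{
    assert(t != NULL);
    if (express_level == 0)
        return 0;
    return t->timed && t->first_duration > t->threshold_sec;
}

/* Run `test` unless it should be skipped, timing the first execution.
 * Returns 1 if the test ran and 0 if it was skipped. */
int run_timed(test_timer_t *t, int express_level, void (*test)(void))
{
    if (should_skip(t, express_level))
        return 0;

    time_t start = time(NULL);
    test();
    if (!t->timed) {
        t->first_duration = difftime(time(NULL), start);
        t->timed          = 1;
    }
    return 1;
}
```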
Tested on Abe and commit tested. On the commit test, the configure
test failed -- probably because I was running h5committest from heiwa due
to some ssh weirdness. In any case a manual reconfigure run on
jam seemed to work fine.
Also, in h5committest, I ran into some data conversion warnings.
I didn't worry about them as the only code I changed was in testpar.
Corrected use/name of source folder aliases.
Duplicated FindMPI.cmake so that non-c++ compiler is found first (recommended commands did not work).
Tested: local linux with mpich
Bring r19234 from the 1.8 branch to the trunk:
Initialize loop variable that caused failures in certain circumstances.
Also clean up compiler warnings and release MPI datatype.
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (amani) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.4 (amazon) in debug mode
Mac OS X/32 10.6.4 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
Mac OS X/32 10.6.4 (amazon) w/parallel, in debug mode
Rename H5AC_set() to H5AC_insert_entry()
Get rid of H5C_set_skip_flags() & related flags
Tested on:
Mac OS X/32 10.6.4 (amazon) w/debug, production & parallel
(too simple to require h5committest)