HDFFV-8146 - Remove "multi-chunk IO without optimization" sub-feature from MPI I/O optimization for chunked dataset feature
Description:
The “multi-chunk IO without optimization” feature is removed and made the related xfer property (H5FD_MPIO_CHUNK_MULTI_IO) go directly to “multi-chunk-io” feature.
Also update/fix/cleanup testings (chunk collective IO and actual chunk opt mode) accordingly.
Tested:
jam (linux32-LE), koala (linux64-LE), ostrich (linuxppc64-BE), fred (mac64-LE), Windows (32-LE cmake), cmake (jam)
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Fixed Daily test failed from the previous commit r22735. (ember)
Also changed H5Pget_mpio_no_collective_cause() parameter type from
H5D_mpio_no_collective_cause_t to uint32_t due to change to return
combined bitmap value which can be not emun defined value.
Tested:
jam (linux32-LE), koala-pp (linux64-LE), ember, h5committest
Bring generic improvements from encode/decode property list branch to
the trunk. This includes a better version of the property list comparison
routine, cleaned up compiler warnings, and some cleaned up property list
callbacks. Also, started on changes to clean up parallel test output, so that
it doesn't report successful tests from each process.
Tested on:
Mac OSX/64 10.7.4 (amazon) w/debug, GCC 4.7.x, FORTRAN, C++, threadsafe and parallel
Linux 2.6/32 (jam) w/debug
Solaris 2.7/64 (linew) w/debug
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (ember)
Skip tests not to disrupt other tests while finding a solution for ember.
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (ember)
Follow actual_io function to sync before go futher as this is similar
function.
Tested:
jam (linux32-LE), koala-pp (linux64-LE), ember
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (koala , ember)
Fixed failure due to not be able to read external-storage file from external test.
Tested:
jam (linux32-LE), koala-pp (linux64-LE)
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Added H5Pget_mpio_no_collective_cause() function that retrive reasons why the collective I/O was broken during Read/Write IO access.
Reasons to break collective I/O:
- SET_INDEPENDENT
- DATATYPE_CONVERSION
- DATA_TRANSFORMS
- MPIPOSIX
- NOT_SIMPLE_OR_SCALAR_DATASPACES (NULL Space)
- POINT_SELECTIONS
- NOT_CONTIGUOUS_OR_CHUNKED_DATASET (Compact or External-Storage)
- FILTERS
Tested:
jam (linux32-LE), koala (linux64-LE), ostrich (linuxppc64-BE), tejeda (mac32-LE), linew (solaris-BE)
Clean up more FUNC_ENTER/FUNC_LEAVE macros and move H5D & H5T code toward
the final design (as exemplified by the H5EA & H5FA code).
Tested on:
Mac OSX/64 10.7.3 (amazon) w/debug & parallel
Correct use of deprecated API routines in test routine.
Tested on:
FreeBSD/32 8.2 (loyalty) w/gcc4.6, w/C++ & FORTRAN, in debug mode
FreeBSD/64 8.2 (freedom) w/gcc4.6, w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (koala) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (ostrich) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (ember) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Mac OS X/32 10.6.8 (amazon) in debug mode
Mac OS X/32 10.6.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
Check in "actual I/O mode" feature to trunk. Will merge back to 1.8 branch
after it bakes over the weekend.
Tested on:
FreeBSD/32 8.2 (loyalty) w/gcc4.6, w/C++ & FORTRAN, in debug mode
FreeBSD/64 8.2 (freedom) w/gcc4.6, w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (koala) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (ember) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.8 (amazon) in debug mode
Mac OS X/32 10.6.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
fraction of the subtests depending on the current express test level. Also
added code to display fraction of subtests skipped.
The current tables controlling the fraction of tests skipped as a function
of express test level is a guess at what will be needed. It will be necessary
to tune this table against the express test targets and our worst case system.
Initially commit tested on Jam, Koala, and Heiwa, but ran into an unrelated
failure on Heiwa (bug reported). Replaced Heiws with Linew and got a clean
h5commit test.
Also tested parallel on Koala. Initially got very bad results (test timed out
roughly 1/3 to 1/2 the way through). Discussed matters with Matthew, and moved
the build to the solid state drive on Koala. This dealt with the performance
issues completely.
Ran bin/reconfigure to update the Makefile.in in directories not part of the fortran directory check=in. Updates Makefile.in due to changes made in configure.in for the Fortran 2003 additions.
Tested on all platforms run under daily tests.
Purpose:
Remove H5_MPI_SPECIAL_COLLECTIVE_IO_WORKS and
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS #defines from source.
Description:
Two advanced parallel functionalities, special collective IO and
complex derived datatypes, are not supported by older
implementations of mpi, and thus our code limits the use of these
features with #ifdefs and has checks in configure to set them (or
not). Unfortunately, configure can't actually run a parallel check
to see if these features are working (nor not) so it resorts to
looking in the config files where they are explicity enabled or
disabled based on versions of mpi, sytems being built on, or for
no documented reason at all (i.e. just set to on or off as some
'default'). Overriding these settings is easy if need be, provided
it is known that it needs to be done to get improved performance,
and oftentimes it is not.
Most new MPI implementations successfully handle the functionality
requested when these #defines are set, and many of the "turn these
features off" cases in the config files are for old (> 5 years)
versions of MPI and retired systems (such as NCSA's tungsten).
Therefore, the decision has been made to remove the support for
these old versions of MPI and systems that cannot handle these
behaviors. The #ifdefs and supporting setup in the config/ files
and configure script has been removed, and the code executed when
these options were not set removed from the source.
In passing, this commit also cleans up some whitespace issues in
both t_mpi.c and H5Dmpio.c. Furthermore, in t_mpi.c, the special
collective IO test was not getting regularly run due to it being
written to work only with four processes (we regularly test with
six, previously with three), and thus it failed when actually run
due to an out of bounds data buffer assignment. It has been
modified to run at any number of processes greater than four, and
the memory problem has been fixed so the test passes.
Tested:
jam, h5committest, ember
General shared library improvements for CYGWIN / AIX
Description:
Shared libraries are disabled on both CYGWIN and AIX due
to inability to build them correctly. Part of the problem
in both of these situations is the lack of the libtool
flag -no-undefined, which tells libtool that all needed
symbols are defined at link time (a requirement on these
systems) and that it's okay to build shared libraries.
Another problem are lack of dependencies between wrapper
libraries and core C HDF5 library.
This patch addresses both of these by fixing configure to
add in -no-undefined flag for libtool during linking and
adds automake dependencies in the Makefile.am files.
After testing, both CYGWIN and AIX now generate shared
libraries, but there are still some test failures in each.
(cache_api, dt_arith, and testerror.sh on CYGWIN, and
fortran tests on AIX).
Even though the shared libraries are not quite perfect,
this is a general improvement to what we had before, so
I'm applying the patch anyways. Note that default behavior
of shared libraries on these systems being disabled has
NOT been changed and requires the use of the
--enable-unsupported to attempt to build them.
We will need to address the test failures in each
architecture prior to formally supporting shared
libraries on each.
Tested:
h5committested & CYGWIN tested (on bangan)
(AIX tested by Albert on bp-login2)
Add "silent make" mode configure option.
Description:
Automake 1.11 has a new option available that allows for a
silent make mode. This functionality needs to be explicitly
enabled in configure.in via the use of the automake macro
AM_SILENT_RULES, which is what this commit is adding.
This introduces a new configure option:
--{en|dis}able-silent-rules
This option is on by default, and simplies compile and link
line outputs when building the library. Disabling this option
will print full "verbose" output (i.e., full compile and
linking lines for each target).
Tested:
This was tested on jam & h5committested
- Revise shared Fortran library disabling scenarios in configure
- Improve configure output summary
Description:
Shared Fortran libraries are not supported on Mac, but were being
disabled by configure in a way that also forced the C libraries
to be static-only. This has been fixed, so now only shared Fortran
is disabled while shared C can remain.
This prompted two additional changes:
1. While working on the check that addresses whether or not
shared Fortran libraries are allowed, removed old and no
longer needed check(s) that disable shared Fortran
libraries with HP, Intel 8, PGI, and Absoft compilers.
(Essentially, Mac is the only situation in which Fortran
shared are disabled by configure.)
2. Having two different states of libraries (i.e. shared C
library with static-only Fortran library) was not apparent
in the configure summary, which labeled all libraries as
either shared and/or static. I've added lines to both the
C++ and Fortran output sections to list shared/static-ness
of these libraries specifically.
Additionally, I've made sure that the new --enable-unsupported
configure option correctly overrides configure if it tries to
disable a shared library.
Tested:
jam, fred, & h5committest
Clean up MPI resource leaks in parallel tests, along with a bunch of
compiler warnings.
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (amani) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.6 (amazon) in debug mode
Mac OS X/32 10.6.6 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
When $HDF5ExpressTest is NOT set or when it is set to 1 or 0, it does not
skip test.
When $HDF5ExpressTest is set other than values above, it may skip tests.
The following message is printed:
Test skipped
when some tests are really skipped.
This is a temporary patch so that v186 can be tested. A more permanent fix
is needed, later.
Tested: h5committest.
This continues the previous work and this one breaks the
checker_board_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest plus jam serial.
The shape same tests ran too long. Break them into smaller subtests
so that they can finish sub-test in a shorter time. Easier to tell
which one sub-test is taking too much time and/or errors occur in
one fo the sub-tests.
This one breaks the contig_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest
error and wanted to exit the test program. This was not good since if only a
subset of processes called MPI_Finalize(), the other processes will likely
hang. That happened in AIX that it would waited till the alarm signal to kill
the processes. Definitely a waste of time.
Solution: Changed it to call MPI_Abort.
That showed another problem. HDF5 has setup atexit post-process to try to close
unclose objects, release resources, etc. But if the MPI processes have
encountered an error and has been aborted, it is not likely any more MPI calls
can function properly. E.g., it would attempt to free some communicators in
the HDF5 MPIO file handle. It would again hang.
Solution: need to call H5dont_atexit() to disable any atexit post-processing.
This must be done early, like before calling H5open. This is added to each
parallel test main program.
testphdf5.h:
Changed macros VRFY and MESG. Added comments too.
testphdf5.c:
t_mpi.c:
t_cache.c:
t_shapesame.c:
Added H5dont_atexit.
Tested: h5committest.
not find t_shapesame in daily test. Turned out the mpiexec launcher is
working like real shell and the daily test signon (hdftest) does not have
"." in its $PATH. So, it could not automatically look for executables in
the current directory.
Solution:
Change the executable to an explicit ./t_shapesame. Now mpiexec can "find"
it.
Tested by hand in Amani.