Merge 64-bit ID changes from branch to trunk. (Plus a few minor cleanups
that aren't on the branch)
Tested on:
Mac OSX/64 10.9.4 (amazon) w/C++ & FORTRAN
(h5committested on branch already for a week)
ib files. The fix also removed dependencies on libhdf5, etc. when the
--disable-sharedlib-rpath configure option was invoked.
Instead, added the configure variable hardcode_into_lib=no. This removes rpath
from lib files on Linux and Solaris machines.
Tested with h5committest on jam, koala, ostrich and platypus (cmake), and with configure option --disable-sharedlib-rpath on emu, platypus and quail.
Begin process of migrating from using property list IDs internally to the
library to using the internal generic property list data structure.
Tested on:
Mac OSX/64 10.9.2 (amazon) w/C++, FORTRAN & parallel
(h5committest forthcoming)
Remove all traces of MPI-POSIX VFD and GPFS detection/code.
Remove remaining traces of stream VFD.
Remove testpar/t_posix_compliant test (it's not actually verifying anything).
Clean up H5D__mpio_opt_possible() further.
Moved environment variable that disables MPI collective operations into
MPI-IO VFD (instead of it being in src/H5S.c).
A few other small code cleanups.
Tested on:
Mac OSX/64 10.9.2 (amazon) w/parallel & serial
Check in Mohamad's changes to support collective I/O on point selections,
along with some other minor cleanups.
Tested on:
Mac OSX/64 10.9.2 (amazon) w/parallel & serial
(h5committest forthcoming)
view objects. The addition of view objects in the fastforward project
is expected to be brought into the trunk sometime in the future, which
is why we need to make this change.
Tested manually on Jam and Ostrich.
Tested with h5committest - Koala with Intel compilers failed, but the failure was unrelated to these changes.
error on Koala: error while loading shared libraries: libirng.so
or 2 processes.
First bug is in testpar/t_mdset.c, where the test reports an error in
addition to skipping the test if there are fewer than three procs. Fixed
it to just skip the test.
Second bug is in testpar/t_dset.c in the actual_io_mode tests, where an
incorrect expected value for the IO mode was set if the number of procs
running the test is 1.
Tested with h5committest.
exclusively.
Part of the preparation for a fix for HDFFV-8551.
Tested on:
32-bit LE linux (jam) w/ parallel and Fortran.
There are no behavior changes, so testing was minimal.
The daily test failed on koala with parallel and the v16compat API after the
previous commit r22735.
Description:
Changed to use H5Dopen2() instead of H5Dopen().
Tested: koala --enable-parallel --with-default-api-version=v16.
No h5committest test since this is limited to parallel test program.
Stop aliasing property to indicate internal collective metadata operations
with property to perform collective raw data operations from the application.
Tested on:
Mac OSX/64 10.8.3 (amazon) w/parallel
HDFFV-8146 - Remove "multi-chunk IO without optimization" sub-feature from MPI I/O optimization for chunked dataset feature
Description:
The “multi-chunk IO without optimization” feature is removed, and the related xfer property (H5FD_MPIO_CHUNK_MULTI_IO) now goes directly to the “multi-chunk-io” feature.
Also updated, fixed, and cleaned up the tests (chunk collective IO and actual chunk opt mode) accordingly.
Tested:
jam (linux32-LE), koala (linux64-LE), ostrich (linuxppc64-BE), fred (mac64-LE), Windows (32-LE cmake), cmake (jam)
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Fixed the daily test failure from the previous commit r22735. (ember)
Also changed the H5Pget_mpio_no_collective_cause() parameter type from
H5D_mpio_no_collective_cause_t to uint32_t, because the function now returns
a combined bitmap value, which may not be a defined enum value.
Tested:
jam (linux32-LE), koala-pp (linux64-LE), ember, h5committest
Bring generic improvements from encode/decode property list branch to
the trunk. This includes a better version of the property list comparison
routine, cleaned up compiler warnings, and some cleaned up property list
callbacks. Also, started on changes to clean up parallel test output, so that
it doesn't report successful tests from each process.
Tested on:
Mac OSX/64 10.7.4 (amazon) w/debug, GCC 4.7.x, FORTRAN, C++, threadsafe and parallel
Linux 2.6/32 (jam) w/debug
Solaris 2.7/64 (linew) w/debug
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (ember)
Skip the tests so they do not disrupt other tests while a solution for ember is found.
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (ember)
Follow the actual_io function and sync before going further, as this is a
similar function.
Tested:
jam (linux32-LE), koala-pp (linux64-LE), ember
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Daily test failed from the previous commit r22735. (koala , ember)
Fixed a failure caused by not being able to read the external-storage file from the external test.
Tested:
jam (linux32-LE), koala-pp (linux64-LE)
HDFFV-8143 Provide a routine(s) for telling the user why the library broke collective data access
Description:
Added the H5Pget_mpio_no_collective_cause() function, which retrieves the reasons why collective I/O was broken during read/write I/O access.
Reasons to break collective I/O:
- SET_INDEPENDENT
- DATATYPE_CONVERSION
- DATA_TRANSFORMS
- MPIPOSIX
- NOT_SIMPLE_OR_SCALAR_DATASPACES (NULL Space)
- POINT_SELECTIONS
- NOT_CONTIGUOUS_OR_CHUNKED_DATASET (Compact or External-Storage)
- FILTERS
Tested:
jam (linux32-LE), koala (linux64-LE), ostrich (linuxppc64-BE), tejeda (mac32-LE), linew (solaris-BE)
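Since several of the reasons listed above can apply to a single transfer, a caller would presumably test the returned value bit by bit. The following is a minimal, self-contained sketch of that decoding. It does not call HDF5: the CAUSE_* constants and describe_causes() are hypothetical stand-ins invented for illustration, and the real flag names and values come from the HDF5 headers and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. The real constants
 * are defined by the HDF5 headers and may differ. */
enum {
    CAUSE_SET_INDEPENDENT                   = 0x01,
    CAUSE_DATATYPE_CONVERSION               = 0x02,
    CAUSE_DATA_TRANSFORMS                   = 0x04,
    CAUSE_MPIPOSIX                          = 0x08,
    CAUSE_NOT_SIMPLE_OR_SCALAR_DATASPACES   = 0x10,
    CAUSE_POINT_SELECTIONS                  = 0x20,
    CAUSE_NOT_CONTIGUOUS_OR_CHUNKED_DATASET = 0x40,
    CAUSE_FILTERS                           = 0x80
};

typedef struct {
    uint32_t    flag;
    const char *name;
} cause_name_t;

static const cause_name_t cause_names[] = {
    {CAUSE_SET_INDEPENDENT,                   "SET_INDEPENDENT"},
    {CAUSE_DATATYPE_CONVERSION,               "DATATYPE_CONVERSION"},
    {CAUSE_DATA_TRANSFORMS,                   "DATA_TRANSFORMS"},
    {CAUSE_MPIPOSIX,                          "MPIPOSIX"},
    {CAUSE_NOT_SIMPLE_OR_SCALAR_DATASPACES,   "NOT_SIMPLE_OR_SCALAR_DATASPACES"},
    {CAUSE_POINT_SELECTIONS,                  "POINT_SELECTIONS"},
    {CAUSE_NOT_CONTIGUOUS_OR_CHUNKED_DATASET, "NOT_CONTIGUOUS_OR_CHUNKED_DATASET"},
    {CAUSE_FILTERS,                           "FILTERS"}
};

/* Write a '|'-separated list of the reasons set in `causes` into `buf`
 * (truncating if it does not fit) and return the number of reasons found.
 * A value of 0 is reported as "COLLECTIVE", i.e. nothing broke it. */
int describe_causes(uint32_t causes, char *buf, size_t buflen)
{
    int    count = 0;
    size_t i;

    assert(buf != NULL);
    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(cause_names) / sizeof(cause_names[0]); i++) {
        if (causes & cause_names[i].flag) {
            if (count > 0)
                strncat(buf, "|", buflen - strlen(buf) - 1);
            strncat(buf, cause_names[i].name, buflen - strlen(buf) - 1);
            count++;
        }
    }
    if (count == 0)
        strncat(buf, "COLLECTIVE", buflen - strlen(buf) - 1);
    return count;
}
```

A test could pass the value it got back from the library to a helper like this and print the resulting string next to the expected one when they disagree.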
Clean up more FUNC_ENTER/FUNC_LEAVE macros and move H5D & H5T code toward
the final design (as exemplified by the H5EA & H5FA code).
Tested on:
Mac OSX/64 10.7.3 (amazon) w/debug & parallel
Correct use of deprecated API routines in test routine.
Tested on:
FreeBSD/32 8.2 (loyalty) w/gcc4.6, w/C++ & FORTRAN, in debug mode
FreeBSD/64 8.2 (freedom) w/gcc4.6, w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (koala) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (ostrich) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (ember) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Mac OS X/32 10.6.8 (amazon) in debug mode
Mac OS X/32 10.6.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
Check in "actual I/O mode" feature to trunk. Will merge back to 1.8 branch
after it bakes over the weekend.
Tested on:
FreeBSD/32 8.2 (loyalty) w/gcc4.6, w/C++ & FORTRAN, in debug mode
FreeBSD/64 8.2 (freedom) w/gcc4.6, w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (koala) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-ia64 2.6 (ember) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.8 (amazon) in debug mode
Mac OS X/32 10.6.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
fraction of the subtests depending on the current express test level. Also
added code to display fraction of subtests skipped.
The current tables controlling the fraction of tests skipped as a function
of express test level is a guess at what will be needed. It will be necessary
to tune this table against the express test targets and our worst case system.
Initially commit tested on Jam, Koala, and Heiwa, but ran into an unrelated
failure on Heiwa (bug reported). Replaced Heiwa with Linew and got a clean
h5committest run.
Also tested parallel on Koala. Initially got very bad results (test timed out
roughly 1/3 to 1/2 the way through). Discussed matters with Matthew, and moved
the build to the solid state drive on Koala. This dealt with the performance
issues completely.
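As a rough illustration of a table that maps the express test level to a fraction of subtests skipped, here is a minimal sketch. The table values, the stride-based selection, and the names skip_stride, should_run_subtest() and count_skipped() are all assumptions made for this example; they are not the values or code in the commit, which itself describes its table as a guess that still needs tuning.

```c
#include <assert.h>

#define MAX_EXPRESS_LEVEL 3

/* Hypothetical tuning table: at a given express level, run one subtest
 * out of every skip_stride[level]. Level 0 runs everything. */
static const int skip_stride[MAX_EXPRESS_LEVEL + 1] = {1, 2, 4, 8};

/* Return nonzero if subtest number `subtest_index` should run. */
int should_run_subtest(int express_level, int subtest_index)
{
    assert(subtest_index >= 0);
    if (express_level < 0)
        express_level = 0;
    if (express_level > MAX_EXPRESS_LEVEL)
        express_level = MAX_EXPRESS_LEVEL;
    return (subtest_index % skip_stride[express_level]) == 0;
}

/* Count how many of `total_subtests` would be skipped, so the test can
 * display the fraction skipped (skipped / total_subtests). */
int count_skipped(int express_level, int total_subtests)
{
    int i;
    int skipped = 0;

    for (i = 0; i < total_subtests; i++)
        if (!should_run_subtest(express_level, i))
            skipped++;
    return skipped;
}
```

Keeping the strides in one table means tuning against the express test targets only touches the table, not the test loops.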
Ran bin/reconfigure to update the Makefile.in files in directories not part of the fortran directory check-in. This updates Makefile.in due to changes made in configure.in for the Fortran 2003 additions.
Tested on all platforms run under daily tests.
Purpose:
Remove H5_MPI_SPECIAL_COLLECTIVE_IO_WORKS and
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS #defines from source.
Description:
Two advanced parallel functionalities, special collective IO and
complex derived datatypes, are not supported by older
implementations of mpi, and thus our code limits the use of these
features with #ifdefs and has checks in configure to set them (or
not). Unfortunately, configure can't actually run a parallel check
to see if these features are working (or not) so it resorts to
looking in the config files where they are explicitly enabled or
disabled based on versions of mpi, systems being built on, or for
no documented reason at all (i.e. just set to on or off as some
'default'). Overriding these settings is easy if need be, provided
it is known that it needs to be done to get improved performance,
and oftentimes it is not.
Most new MPI implementations successfully handle the functionality
requested when these #defines are set, and many of the "turn these
features off" cases in the config files are for old (> 5 years)
versions of MPI and retired systems (such as NCSA's tungsten).
Therefore, the decision has been made to remove the support for
these old versions of MPI and systems that cannot handle these
behaviors. The #ifdefs and supporting setup in the config/ files
and configure script have been removed, and the code executed when
these options were not set has been removed from the source.
In passing, this commit also cleans up some whitespace issues in
both t_mpi.c and H5Dmpio.c. Furthermore, in t_mpi.c, the special
collective IO test was not getting regularly run due to it being
written to work only with four processes (we regularly test with
six, previously with three), and thus it failed when actually run
due to an out of bounds data buffer assignment. It has been
modified to run at any number of processes greater than four, and
the memory problem has been fixed so the test passes.
Tested:
jam, h5committest, ember
General shared library improvements for CYGWIN / AIX
Description:
Shared libraries are disabled on both CYGWIN and AIX due
to inability to build them correctly. Part of the problem
in both of these situations is the lack of the libtool
flag -no-undefined, which tells libtool that all needed
symbols are defined at link time (a requirement on these
systems) and that it's okay to build shared libraries.
Another problem is the lack of dependencies between the wrapper
libraries and the core C HDF5 library.
This patch addresses both of these by fixing configure to
add the -no-undefined flag for libtool during linking and
adding automake dependencies in the Makefile.am files.
After testing, both CYGWIN and AIX now generate shared
libraries, but there are still some test failures in each.
(cache_api, dt_arith, and testerror.sh on CYGWIN, and
fortran tests on AIX).
Even though the shared libraries are not quite perfect,
this is a general improvement to what we had before, so
I'm applying the patch anyway. Note that the default behavior
of shared libraries being disabled on these systems has
NOT been changed; building them requires the use of the
--enable-unsupported configure option.
We will need to address the test failures in each
architecture prior to formally supporting shared
libraries on each.
Tested:
h5committested & CYGWIN tested (on bangan)
(AIX tested by Albert on bp-login2)
Add "silent make" mode configure option.
Description:
Automake 1.11 has a new option available that allows for a
silent make mode. This functionality needs to be explicitly
enabled in configure.in via the use of the automake macro
AM_SILENT_RULES, which is what this commit is adding.
This introduces a new configure option:
--{en|dis}able-silent-rules
This option is on by default, and simplifies compile and link
line outputs when building the library. Disabling this option
will print full "verbose" output (i.e., full compile and
linking lines for each target).
Tested:
This was tested on jam & h5committested
- Revise shared Fortran library disabling scenarios in configure
- Improve configure output summary
Description:
Shared Fortran libraries are not supported on Mac, but were being
disabled by configure in a way that also forced the C libraries
to be static-only. This has been fixed, so now only shared Fortran
is disabled while shared C can remain.
This prompted two additional changes:
1. While working on the check that addresses whether or not
shared Fortran libraries are allowed, removed old and no
longer needed check(s) that disable shared Fortran
libraries with HP, Intel 8, PGI, and Absoft compilers.
(Essentially, Mac is the only situation in which Fortran
shared are disabled by configure.)
2. Having two different states of libraries (i.e. shared C
library with static-only Fortran library) was not apparent
in the configure summary, which labeled all libraries as
either shared and/or static. I've added lines to both the
C++ and Fortran output sections to list shared/static-ness
of these libraries specifically.
Additionally, I've made sure that the new --enable-unsupported
configure option correctly overrides configure if it tries to
disable a shared library.
Tested:
jam, fred, & h5committest
Clean up MPI resource leaks in parallel tests, along with a bunch of
compiler warnings.
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (amani) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, w/threadsafe, in production mode
Linux/PPC 2.6 (heiwa) w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in debug mode
Mac OS X/32 10.6.6 (amazon) in debug mode
Mac OS X/32 10.6.6 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
When $HDF5ExpressTest is NOT set, or when it is set to 1 or 0, no tests
are skipped.
When $HDF5ExpressTest is set to any other value, tests may be skipped.
The following message is printed:
Test skipped
when some tests are actually skipped.
This is a temporary patch so that v186 can be tested. A more permanent fix
is needed, later.
Tested: h5committest.
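The rule described in this entry can be sketched as a small predicate on the value of the environment variable. This is only an illustration under stated assumptions: express_value_allows_skip(), express_allows_skip() and report_skips() are hypothetical names, and treating any string other than "0" or "1" (including an empty one) as "may skip" is an assumption about how the value is compared.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if the given HDF5ExpressTest value permits skipping
 * tests. `value` is NULL when the variable is not set. */
int express_value_allows_skip(const char *value)
{
    if (value == NULL)
        return 0; /* not set: run everything */
    /* Set to 0 or 1: run everything. Any other value: may skip. */
    return strcmp(value, "0") != 0 && strcmp(value, "1") != 0;
}

/* Convenience wrapper that reads the environment. */
int express_allows_skip(void)
{
    return express_value_allows_skip(getenv("HDF5ExpressTest"));
}

/* Print the message only when some tests were actually skipped.
 * Returns nonzero if the message was printed. */
int report_skips(int nskipped)
{
    assert(nskipped >= 0);
    if (nskipped > 0) {
        puts("Test skipped");
        return 1;
    }
    return 0;
}
```

Separating the string check from the getenv() call keeps the rule testable without changing the process environment.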
This continues the previous work and this one breaks the
checker_board_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest plus jam serial.
The shape same tests ran too long. Break them into smaller sub-tests
so that each sub-test finishes in a shorter time. This also makes it
easier to tell which sub-test is taking too much time and/or whether
errors occur in one of the sub-tests.
This one breaks the contig_hyperslab_dr_pio_test() into 4 smaller
sub-tests.
Tested: h5committest
error and wanted to exit the test program. This was not good, since if only a
subset of processes called MPI_Finalize(), the other processes would likely
hang. That happened on AIX, where the test would wait until the alarm signal
killed the processes. Definitely a waste of time.
Solution: Changed it to call MPI_Abort.
That showed another problem. HDF5 has set up atexit post-processing to try to
close unclosed objects, release resources, etc. But if the MPI processes have
encountered an error and have been aborted, it is not likely that any more MPI
calls can function properly. E.g., it would attempt to free some communicators
in the HDF5 MPIO file handle. It would again hang.
Solution: need to call H5dont_atexit() to disable any atexit post-processing.
This must be done early, i.e. before calling H5open. This is added to each
parallel test main program.
testphdf5.h:
Changed macros VRFY and MESG. Added comments too.
testphdf5.c:
t_mpi.c:
t_cache.c:
t_shapesame.c:
Added H5dont_atexit.
Tested: h5committest.
not find t_shapesame in the daily test. It turned out that the mpiexec launcher
works like a real shell and the daily test sign-on (hdftest) does not have
"." in its $PATH. So, it could not automatically look for executables in
the current directory.
Solution:
Change the executable to an explicit ./t_shapesame. Now mpiexec can "find"
it.
Tested by hand on Amani.
Moved the two shape same tests from testphdf5 to a separate executable,
named t_shapesame. The shape same tests run too long for testphdf5.
In a separate executable, it will be easier to separate any errors in
testphdf5 sub-tests from the shape same tests.
t_shapesame.c:
Contains the shape same tests (cloned from t_rank_projection.c) plus
a duplicate of "testphdf5.c" for now. After verifying it is correct, more
cleanup is needed.
testphdf5.c:
Removed the two shape same tests (chsssdrpio & cbhsssdrpio).
Makefile.am:
Makefile.in:
Added t_shapesame as a new test executable.
Removed t_rank_projections.c from part of testphdf5.
testph5.sh.in:
Temporarily added the "t_shapesame -p" test for testing shape same tests
with MPIO-Posix VFD.
Tested: h5committested, plus serial jam.
Checked in a fix for a failure in the shape same tests that appeared after
Quincey's recent massage of the test code. The problem was a race
condition created when Quincey re-worked the code selecting either
collective or independent I/O.
Previously, when independent I/O was selected in the test, I had
used H5Pset_dxpl_mpio() and H5Pset_dxpl_mpio_collective_opt() to
select collective semantics with independent I/O going on under
the hood. Quincey modified this to call H5Pset_dxpl_mpio() when
collective I/O was selected, and do nothing in the independent I/O
case. As a result, processes were able to race ahead and
modify the initial values of the data set before some processes
had verified that the initialization was correct.
Solved the problem by adding barriers, and making all barriers
dependent on independent I/O being selected.
Tested parallel on amani and phoenix. h5committested.
Note that parallel on amani and h5committest on heiwa failed
several times before I got a clean pass without code changes.
The failures on amani seemed to be time outs caused by contention
for the machine -- worryingly, they occurred in the shape same
tests. However, given subsequent passes and passes on jam and
phoenix, I am going ahead with the commit.
The failure on heiwa was in the fheap test. I don't see how
this can be related to changes in testpar, and in any case, it
went away on the second try.
Correct tests to use native datatypes consistently, and also to use
"normal" methods for performing collective I/O. Also, minor cleanups for
zeroing out buffers, etc.
Tested on:
AIX/64 6.? (bp) w/parallel
metadata confusion test that appeared after Albert modified the test.
Cursory commit test. No test on Abe as that system is down, the
fix is very minor, and it seems to work in the 1.8.6 branch
John Mainzer fixed the bug and added a test which wrote a file and flushed it
a few times, closed the file, then opened it in serial and read a simple
structure. I changed the test into two parts running in parallel, ..._writer
and ..._reader, and have the reader verify the file after every flush by the
writer.
Tested: parallel on Jam and Amani.
Replaced calls to H5Dcreate() and H5Acreate() with calls to H5Dcreate2()
and H5Acreate2() respectively in t_mdset.c.
This was done to repair a compile failure that occurred on a build
with the --with-default-api-version=v16 config option.
Cursory commit test
Modified test code in t_mdset to use H5Dopen2() instead of H5Dopen1().
This should fix the compile failure when we used --disable-deprecated-symbols
Cursory commit test
of the H5Ocache.c code to update its image of the on disk representation
of the object header on a call to the clear callback.
This wasn't an issue as long as all flushes of the object header were
made from the same process, but if an object header was modified, and
then flushed on one process and cleared on the rest, the changes were
not reflected in the images of the on disk representation on the
processes where the object header was cleared rather than flushed.
If one of these processes did the next flush, the changes were lost in
the on disk representation.
Fixed this by causing all dirty messages to be written to the copy
of the on disk image maintained by the object header code on both flush
and clear.
Also added associated test code in t_mdset.c.
Also checking in some cache debug code developed while chasing this bug.
Commit tested and tested (parallel) on phoenix.
Problem appears to have been caused by file system contention.
In the chunked dataset case, reshaping the chunks so that only one
process would touch each chunk and setting the alignment equal to the
default Lustre block size more or less dealt with the problem.
For contiguous datasets, the problem was a bit more difficult, as
re-working the test to avoid contention would have been very time
consuming.
Instead, I added code to time one execution of each type of shape
same test, and skip additional tests of that type if the duration
of the test exceeded some threshold.
In all cases, I set up code to turn off the above fixes if express
test is 0.
Tested on Abe and commit tested. On the commit test, the configure
test failed -- probably because I was running h5committest from heiwa
due to some ssh weirdness. In any case, a manual reconfigure run on
jam seemed to work fine.
Also, in h5committest, I ran into some data conversion warnings.
I didn't worry about them as the only code I changed was in testpar.
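The time-and-skip approach in this entry can be sketched as a small piece of state kept per test type. This is an illustrative sketch only: skip_state_t and its helper functions are hypothetical names, the threshold is whatever the test chooses, and the elapsed time is assumed to be measured by the caller (in a parallel test, for example, as the difference of two MPI_Wtime() readings).

```c
#include <assert.h>
#include <stddef.h>

/* State for one type of shape same test (hypothetical structure). */
typedef struct {
    double threshold_secs; /* skip further tests once a run exceeds this */
    int    express_level;  /* 0 means run everything, never skip */
    int    too_slow;       /* set once a timed run exceeded the threshold */
} skip_state_t;

void skip_state_init(skip_state_t *s, double threshold_secs, int express_level)
{
    assert(s != NULL);
    s->threshold_secs = threshold_secs;
    s->express_level  = express_level;
    s->too_slow       = 0;
}

/* Record how long one execution of this test type took. */
void record_duration(skip_state_t *s, double elapsed_secs)
{
    assert(s != NULL);
    if (elapsed_secs > s->threshold_secs)
        s->too_slow = 1;
}

/* Return nonzero if the remaining tests of this type should be skipped.
 * The skip is turned off entirely when the express level is 0. */
int should_skip_remaining(const skip_state_t *s)
{
    assert(s != NULL);
    if (s->express_level == 0)
        return 0;
    return s->too_slow;
}
```

In a parallel test every process would need to reach the same skip decision, for instance by having one rank time the run and share the result, otherwise some ranks could enter a collective call that others skip.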