Task for HDFFV-7862 - Select data by chunk direction to improve performance in h5repack
Description:
h5repack sometimes became very slow when handling big chunked datasets
whose chunk boundaries do not match the hyperslab boundaries.
The main issue was that the hyperslab used to read from and write to such
datasets was computed without considering chunk boundaries.
The update computes a better hyperslab unit that takes chunk boundaries
into account, which improves performance for such cases.
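A minimal sketch of the idea, not the actual h5repack code: choose a hyperslab whose extent in each dimension is a whole number of chunks, so that a chunk is not split across several reads and writes. The function name `chunk_aligned_hyperslab`, its parameters, and the 1 MiB default buffer are assumptions made for illustration only.

```python
def chunk_aligned_hyperslab(dims, chunk, elem_size, buf_size=1024 * 1024):
    """Return hyperslab extents that are multiples of the chunk extents.

    Hypothetical helper: dims and chunk are per-dimension sizes in
    elements, elem_size and buf_size are in bytes.
    """
    # Number of elements that fit in the buffer (never less than one).
    budget = max(buf_size // elem_size, 1)
    slab = [1] * len(dims)
    # Walk from the fastest-varying (last) dimension to the slowest.
    for k in reversed(range(len(dims))):
        want = min(dims[k], budget)
        if want >= chunk[k]:
            # Round down to a chunk boundary unless the whole
            # dimension already fits.
            if want < dims[k]:
                want -= want % chunk[k]
        else:
            # Never go below one chunk, even if that exceeds the budget.
            want = min(chunk[k], dims[k])
        slab[k] = want
        budget = max(budget // want, 1)
    return slab


if __name__ == "__main__":
    print(chunk_aligned_hyperslab((1000, 1000), (100, 100), 8))
```

In this sketch the hyperslab may be larger than the buffer when a single chunk does not fit in it. That is a trade-off of the illustration, not a statement about how h5repack sizes its buffers.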
Tested:
jam (linux32-LE), koala (linux64-LE), ostrich (linuxppc64-BE), tejeda (mac32-LE), linew (solaris-BE), Windows
- h5repack: h5repack failed to copy dataset if the layout is changed from chunked with
unlimited dims to contiguous. (PC -- 2011/07/15)
- h5diff: the "--delta" option treats two NaNs of the same type as different, which is wrong
according to http://www.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Diff. (PC -- 2011/07/15)
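The expected behavior in the h5diff item above can be sketched as follows. This is an illustration only, not the h5diff source: the name `differs` is hypothetical, and the handling of a NaN compared against a regular number is an assumption.

```python
import math


def differs(a, b, delta):
    """Return True if a and b should be reported as different.

    Hypothetical helper modelling a "--delta" style comparison,
    where a difference is reported when |a - b| > delta.
    """
    if math.isnan(a) and math.isnan(b):
        # Two NaNs of the same type are treated as equal.
        return False
    if math.isnan(a) or math.isnan(b):
        # Assumption: a NaN never matches a regular number.
        return True
    return abs(a - b) > delta


if __name__ == "__main__":
    nan = float("nan")
    print(differs(nan, nan, 0.1), differs(1.0, 1.2, 0.1))
```

The explicit NaN checks are needed because `abs(a - b) > delta` is always False when either operand is NaN, so the comparison alone cannot tell the two NaN cases apart.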
Change to use HDxxx macros.
Description:
Originally this started as a fix for incorrect pointer usage, but that got
fixed through the Coverity merge. So this mainly changes code to use HDxxx
macros and cleans up some related code.
Tested:
jam (linux32-LE), amani (linux64-LE), heiwa (linuxppc64-BE), tejeda (mac32-LE), linew (solaris-BE)
Clean up Coverity warnings, and fix some style issues:
r19735:
Fix for memory leak in test/mf found by valgrind.
r19736:
Fix memory leak in h5repack. The buffer in copy_objects, when copying the
entire dataset at once, was not checked for the presence of a vlen, and vlen
storage was never reclaimed. Added check and call to H5D_vlen_reclaim().
r19772:
Change H5assert() to
    if (H5T_VLEN != src->shared->type || H5T_VLEN != dst->shared->type)
        HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not a H5T_VLEN datatype")
r19774:
removed unused priv.
r19775:
removed unused variables
r19778:
Fix memory leak when comparing variable-length data types.
r19834:
Fixed memory leaks found by valgrind. Memory errors remain for another day.
Tested on:
Mac OS X/32 10.6.6 (amazon) w/debug & production
(h5committested on branch)
Additional fix for Bug 1821 - h5repack outputs compression information where it is not supposed to
Description:
Remove unnecessary looping code, which only lowers performance.
Tested:
jam, amani, heiwa
Fix for Bug1896 h5repack - changing layout to COMPACT does not work
Description:
Make h5repack able to convert a layout to COMPACT for small datasets by default. Also add verification of layout changes to our test script.
Tested:
jam, amani, heiwa
Bring changes from Coverity branch back to trunk:
r19079 & 19080:
[BZ1942] When h5dump -u is used to generate XML, it does not respect the -m option.
xml version of dump_data function didn't check for use of fp_format variable.
Added new test expected file for committed bug 1942
r19103, 19104 & 19105:
[BZ1821] h5repack -v did not display correct output for a selected compression. Needed new test for comparing output of -v option.
Added new test file for solution to BZ1821
BZ1821 - Bring test changes from the shell script actually used.
Tested on:
Mac OS X/32 10.6.4 (amazon) debug & production
(h5committested on branch)
Enable tools lib to be built as a dll on windows. Added two get/set functions for progname and d_status.
Also add windows import/export declarations to functions.
Updated error_mesg() and warn_mesg() to remove progname argument and use get functions
Tested:
Windows, linux
Fix for the bug1726 - NPOESS: h5repack loses attributes for datasets of
type H5T_REFERENCE.
Description:
Include test cases.
Also include test cases for attributes with object and region references.
Tested:
jam, amani, linew
Bring changes from file free space branch back to the trunk. *yay!*
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (jam) w/PGI compilers, w/default API=1.8.x,
w/C++ & FORTRAN, w/threadsafe, in debug mode
Linux/64-amd64 2.6 (smirom) w/Intel compilers, w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in debug mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in production mode
Mac OS X/32 10.5.8 (amazon) in debug mode
Mac OS X/32 10.5.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
Bring back various minor code cleanups from the file free space branch
Tested on:
FreeBSD/32 6.3 (duty) in debug mode
FreeBSD/64 6.3 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/Intel compilers w/default API=1.6.x,
w/C++ & FORTRAN, in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
Linux/64-amd64 2.6 (abe) w/parallel, w/FORTRAN, in production mode
Mac OS X/32 10.5.8 (amazon) in debug mode
Mac OS X/32 10.5.8 (amazon) w/C++ & FORTRAN, w/threadsafe,
in production mode
Description:
h5repack previously would not take named datatypes into consideration when copying
datasets and attributes. This would cause extra anonymous datatypes in the target file
at best, and cause errors halfway through the repacking at worst. h5repack should now
always handle named datatypes correctly. Named datatypes are also now converted to the
native type when -n is given.
Tested: jam, linew, smirom (h5committest)
ISSUE : h5repack does not handle group creation order flags.
ACTION: call H5P(g)(s)et_link_creation_order functions when handling groups, add new groups with these flags to the test generation program, and verify results in the test program.
TEST: in the test program, added code to verify groups to the function that compares property lists
tested: windows, linux, solaris
ISSUE : the tools use the following formula to read by hyperslabs: hyperslab_size[i] = MIN( dim_size[i], H5TOOLS_BUFSIZE / datum_size) where H5TOOLS_BUFSIZE is a constant defined as 1024K. This is OK as long as the datum_size does not exceed 1024K, otherwise we have a hyperslab size of 0 (since 1024K/(greater than 1024K) = 0). This affects h5dump, h5repack, and h5diff.
SOLUTION: add a check for a 0 size and define as 1 if so.
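The formula and the zero-size guard can be illustrated with a short sketch. It mirrors the formula quoted in the ISSUE text. The function name `hyperslab_size` is hypothetical, and Python's `//` stands in for C integer division.

```python
# 1024K, the value quoted for H5TOOLS_BUFSIZE in the ISSUE text.
H5TOOLS_BUFSIZE = 1024 * 1024


def hyperslab_size(dim_size, datum_size, bufsize=H5TOOLS_BUFSIZE):
    """Elements to read along one dimension (hypothetical helper)."""
    # Integer division yields 0 when datum_size exceeds bufsize.
    size = min(dim_size, bufsize // datum_size)
    # The fix: never return a hyperslab size of 0.
    if size == 0:
        size = 1
    return size


if __name__ == "__main__":
    # A 2 MiB datum (e.g. a large array type) no longer gives size 0.
    print(hyperslab_size(10, 2 * 1024 * 1024))
```

With the guard, a datum larger than the buffer is read one element at a time instead of triggering the zero-size assertion.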
TEST FOR H5DUMP: Defined a case of such a type in the h5dump test generator program (an array type of doubles with a large array dimension, which was the case the user reported). Since the written file committed in svn would be around 1024K, opted for not writing the data (the part of the code where the hyperslab is defined is still executed, since h5dump always reads the files). Defined a macro WRITE_ARRAY to enable such writing if needed. Added a run to the h5dump shell script. Added 2 new files to svn: tools/testfiles/tarray8.ddl, tools/testfiles/tarray8.h5. NOTE: while doing this I thought of adding this dataset case to an existing file, but that would add the large array output to those files (the ddls). The issue is that the file list is increasing.
TEST FOR H5DIFF: for h5diff the check for reading by hyperslabs is H5TOOLS_MALLOCSIZE (128 * H5TOOLS_BUFSIZE) or 128 Mb. This makes it not possible to add such a file to svn, so used the same method as h5dump (only write the dataset if WRITE_ARRAY is defined). As opposed to h5dump, the hyperslab code is NOT executed when the dataset is empty (dataset is not read). Added the new dataset to existing files and shell run (tools/h5diff/testfiles/h5diff_dset1.h5 and tools/h5diff/testfiles/h5diff_dset2.h5 and output in tools/h5diff/testfiles/h5diff_80.txt).
TEST FOR H5REPACK: similar issue as h5diff with the difference that the hyperslab code is run. Added a run to the shell script (with a filter, otherwise the code uses H5Ocopy).
tested: linux (h5committest failed; apparently it did not detect the code changes in /tools/lib that fix the bug: the error is an assertion on the hyperslab size of 0. I am sure that running h5committest --distclean will detect the new code, but I don't want to wait 3 more hours :-) )
Remove trailing whitespace from C/C++ source files, with the following
script:
    foreach f (*.[ch] *.cpp)
        sed 's/[[:blank:]]*$//' $f > sed.out && mv sed.out $f
    end
Tested on:
Mac OS X/32 10.5.5 (amazon)
No need for h5committest, just whitespace changes...
-t T, --threshold=T    Threshold value for H5Pset_alignment
-a A, --alignment=A    Alignment value for H5Pset_alignment
2) bug fix
the printing of the dataset name was not done for references (verbose mode)
tested: windows, linux
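For context on the -t and -a options above: H5Pset_alignment is documented as aligning any file object whose size is at least the threshold on an address that is a multiple of the alignment. The sketch below models only that rule. It is not the library's allocator, and the name `aligned_address` is hypothetical.

```python
def aligned_address(addr, size, threshold, alignment):
    """Return the address an object of the given size would be placed at.

    Simplified model: objects smaller than threshold are left where
    they are, larger ones are moved up to the next multiple of alignment.
    """
    if alignment > 1 and size >= threshold:
        remainder = addr % alignment
        if remainder:
            addr += alignment - remainder
    return addr


if __name__ == "__main__":
    # A 4096-byte object at offset 100, threshold 1024, alignment 512.
    print(aligned_address(100, 4096, 1024, 512))
```

In this model a small threshold with a large alignment pads the file more heavily, which is why the two values are exposed as separate options.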
Add a userblock to an HDF5 file during the repack. The user gives
a filename and userblock size as command line parameters to
h5repack, and the contents of that file are stored in the
userblock for the HDF5 file created by h5repack.
New flags to handle this: -u and -b
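As a rough illustration of what filling a userblock involves, the sketch below prepares the bytes that would occupy it. HDF5 requires a userblock size that is a power of two and at least 512 bytes. The zero padding, the error on oversized content, and the name `build_userblock` are assumptions of this sketch, not a description of h5repack's code.

```python
def build_userblock(content: bytes, size: int) -> bytes:
    """Return exactly `size` bytes holding `content` (hypothetical helper)."""
    # HDF5 userblock sizes must be a power of two and at least 512.
    if size < 512 or size & (size - 1) != 0:
        raise ValueError("userblock size must be a power of two >= 512")
    # Assumption: content that does not fit is rejected.
    if len(content) > size:
        raise ValueError("content is larger than the userblock")
    # Assumption: the unused tail is padded with zero bytes.
    return content + b"\0" * (size - len(content))


if __name__ == "__main__":
    block = build_userblock(b"my application header", 512)
    print(len(block))
```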
Tested : windows, linux
Tweak the constants for the shared message flags to be equal to the
other flags used for the underlying messages.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
Now the minus sign shows there was a DECREASE in compression.
The percentage is calculated from
    per = (b-a)/a;
where a = size of dataset before
      b = size after
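A short worked example of the formula above. The function name `size_change` is hypothetical, and formatting the ratio as a signed percentage is an assumption about how the value is displayed.

```python
def size_change(before, after):
    """Relative size change, per = (b - a) / a from the text above."""
    return (after - before) / before


if __name__ == "__main__":
    # 1000 bytes before, 250 bytes after: per = (250 - 1000) / 1000
    per = size_change(1000, 250)
    print(f"{per:+.2%}")
```

With this formula a negative value corresponds to b < a, that is, the dataset is smaller after repacking. A dataset of size 0 before repacking would need separate handling, since the division is undefined.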
tested: windows, linux
Change H5Oget_info -> H5Oget_info_by_name and re-add H5Oget_info in a
simpler form for querying a particular object, to align with other new API
routines.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Changed H5Acreate2 -> H5Acreate_by_name, to be more consistent with
other new API routines.
Re-added simpler form of H5Acreate2, which creates attributes directly
on an object.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Make H5Pget_filter API versioned and switch internal usage to
H5Pget_filter2.
Add regression test for H5Pget_filter1.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Make H5Dopen versioned and change all internal usage to use H5Dopen2
Add simple regression test for H5Dopen1
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Put H5Acreate() under API versioning, with all internal usage shifted
to H5Acreate2().
Add regression tests for H5Acreate1().
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Make H5Topen versioned, and add regression test for H5Topen1.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
AIX/32 5.3 (copper) w/FORTRAN, w/parallel, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Add regression test for h5repack with userblock
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
AIX/32 5.3 (copper) w/FORTRAN, w/parallel, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode