If a corrupt file sets the page buffer size in the superblock to zero,
the library could attempt to divide by zero when allocating space in
the file. The library now checks for valid page buffer sizes when
reading the superblock message.
Fixes oss-fuzz issue 58762
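A minimal sketch of this kind of guard, not the library's actual superblock code. The helper name pages_needed() and its signature are assumptions made for illustration; the point is only that a size read from a file is validated before it is used as a divisor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an HDF5 function). Computes how many pages of
 * `page_size` bytes are needed to hold `nbytes`. Returns 0 on success and
 * -1 if `page_size` is zero, as it could be in a corrupt file. */
static int
pages_needed(size_t nbytes, size_t page_size, size_t *npages)
{
    assert(npages != NULL);

    if (page_size == 0)
        return -1; /* reject the value before it is used as a divisor */

    /* Round up without risking overflow in nbytes + page_size - 1 */
    *npages = nbytes / page_size + (nbytes % page_size != 0 ? 1 : 0);
    return 0;
}
```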
* Fix bug in array conversion with strided background buffer. Convert some
memmove calls to non-overlapping buffers to memcpy.
* Revert inappropriate use of memcpy to memmove in H5T__conv_array
* Add testing
* Add RELEASE.txt note and overwrite test case.
Add configure option to enable or disable extension features in general
Add configure option to enable or disable _Float16 support
Add new config options to various settings files
This API call sets the size of a file's page buffer cache. This call
was extremely strict about matching its parameters to the file strategy
and page size used to create the file, requiring a separate open of the
file to obtain these parameters.
These requirements have been relaxed when using the fapl to open
a previously-created file:
* When opening a file that does not use the H5F_FSPACE_STRATEGY_PAGE
strategy, the setting is ignored and the file will be opened, but
without a page buffer cache. This was previously an error.
* When opening a file that has a page size larger than the desired
page buffer cache size, the page buffer cache size will be increased
to the file's page size. This was previously an error.
The behavior when creating a file using H5Pset_page_buffer_size() is
unchanged.
Fixes GitHub issue #3382
H5PB_read previously did not account for the fact that the size of the
read it's performing could overflow the page buffer pointer, depending
on the calculated offset for the read. This has been fixed by adjusting
the size of the read if it's determined that it would overflow the page.
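The adjustment can be sketched as a simple clamp. This is an illustration under assumed names (clamp_read_to_page() is hypothetical), not the H5PB_read implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the H5PB_read implementation). Given a read that
 * starts `offset` bytes into a page of `page_size` bytes, returns the number
 * of bytes that can be copied without running past the end of the page. */
static size_t
clamp_read_to_page(size_t offset, size_t len, size_t page_size)
{
    assert(offset <= page_size);

    size_t remaining = page_size - offset; /* bytes left in this page */

    return (len > remaining) ? remaining : len;
}
```

For example, a 200-byte read starting 4000 bytes into a 4096-byte page would be cut down to 96 bytes.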
When the H5_addr_overlap macro was updated to use H5_RANGE_OVERLAP,
it failed to take into account that H5_RANGE_OVERLAP expects the
range to be inclusive. This led to an assertion failure in
H5MM_memcpy due to a memcpy operation on overlapping memory.
This has been fixed by subtracting 1 from the calculated high
bound values passed to H5_RANGE_OVERLAP.
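A sketch of the inclusive-bound arithmetic. The macro and function below are illustrative stand-ins with assumed names, not the library's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for an inclusive range test: [lo1, hi1] and
 * [lo2, hi2] overlap unless one starts after the other ends. */
#define RANGE_OVERLAP_INCLUSIVE(lo1, hi1, lo2, hi2) (!((lo1) > (hi2) || (lo2) > (hi1)))

/* Hypothetical helper: do the byte ranges [a1, a1 + len1) and
 * [a2, a2 + len2) share at least one byte? The exclusive end a + len is
 * converted to an inclusive upper bound by subtracting 1. */
static bool
addr_overlap(uint64_t a1, uint64_t len1, uint64_t a2, uint64_t len2)
{
    assert(len1 > 0 && len2 > 0);

    return RANGE_OVERLAP_INCLUSIVE(a1, a1 + len1 - 1, a2, a2 + len2 - 1);
}
```

With the subtraction, two adjacent 10-byte buffers at addresses 0 and 10 are reported as not overlapping. Passing a + len as the high bound would treat the shared boundary as an overlap.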
* Remove an error check regarding large cache objects
In PR#4231 an assert() call was converted to a normal HDF5 error
check. It turns out that the original assert() was added by a
developer as a way of being alerted that large cache objects
existed instead of as a guard against incorrect behavior, making
it unnecessary in either debug or release builds.
The error check has been removed.
* Update RELEASE.txt
Neither H5Dchunk_iter() nor H5Dget_chunk_info(_by_coord)() took
the size of the user block into account when reporting addresses. Since
the #1 use of these functions is to root around in the file for the raw
data, this is kind of a problem.
Fixes GitHub issue #3003
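A sketch of the address adjustment, assuming chunk addresses are stored relative to the end of the user block. The helper name and the ADDR_UNDEF sentinel are hypothetical and only stand in for the library's internals.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_UNDEF UINT64_MAX /* stand-in for an undefined address */

/* Hypothetical helper: converts an address stored relative to the end of the
 * user block into an absolute offset from the start of the file. */
static uint64_t
absolute_chunk_addr(uint64_t rel_addr, uint64_t userblock_size)
{
    if (rel_addr == ADDR_UNDEF)
        return ADDR_UNDEF; /* unallocated chunk: nothing to adjust */

    assert(rel_addr <= UINT64_MAX - userblock_size);

    return rel_addr + userblock_size;
}
```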
If the library tries to load a metadata object that is above the
library's hard-coded limits, the size will trip an assert in debug
builds. In HDF5 1.14.4, this can happen if you create a very large
number of links in an old-style group that uses local heaps.
The library will now emit a normal error when it tries to load a
metadata object that is too large.
Partially addresses GitHub #3762
Changes Autotools testing to use HDF5_TEST_DRIVER environment
variable to avoid running tests that don't work well with several
VFDs
Restores old h5_get_vfd_fapl() testing function to set up a FAPL
with a particular VFD
Adds a macro for the default VFD name
* Fix issue with Subfiling VFD and multiple opens of same file
* Update H5_subfile_fid_to_context to return error value instead of ID
* Add helper routine to initialize open file mapping
The buffers passed to stat-like calls are only partially filled in by
the call, leaving uninitialized memory areas when the stat buffers are
created on the stack.
This change memsets the buffers to 0 before the stat calls, quieting
the -fsanitize=memory complaints.
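The pattern looks roughly like this on a POSIX system. The wrapper name stat_zeroed() is an assumption for illustration, not a function in the library.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical wrapper: zero the whole struct first so that padding and any
 * fields the call does not fill in hold defined values. */
static int
stat_zeroed(const char *path, struct stat *sb)
{
    assert(path != NULL && sb != NULL);

    memset(sb, 0, sizeof(*sb));

    return stat(path, sb);
}
```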
* Fixes detection of various Windows libraries, etc.
* Corrects alarm(2) configure checks
* Uses Win32 threads by default w/ Pthreads override, if desired
* Set _WIN32_WINNT correctly for MinGW
* Fix setenv(3) wrapper for MinGW, which does not have getenv_s()
MinGW Autotools support is still not Amazing, but this at least
allows the library and tools to build and is better about thread-safety
Removes some datatype copying calls that are now unnecessary after
refactoring the datatype conversion code to use pointers internally
rather than IDs
Rewrites the enum conversion function so that it uses cached copies
of the source and destination datatypes in order to avoid modifying
the datatypes passed in
Adds a 'recursive' field to the datatype conversion context which
allows the conversion functions for members of a container datatype
to skip unnecessary repetitive conversion setup code
Changes internal datatype conversion callback functions so that the
source and destination datatype structure pointers are const
Removes some unused and unnecessary internal IDs registered with
H5I_register
Fixed some conversion issues with Clang due to problematic undefined
behavior when casting a negative floating-point value to an integer
Fixed a bug in the library's software integer to floating-point
conversion function where a user's conversion exception function
returning H5T_CONV_UNHANDLED in the case of overflows would result in
incorrect data after conversion
Added configure checks for functions and macros related to _Float16
usage since some compilers expose the datatype but not the functions or
macros
Fixed a dt_arith test failure when H5_WANT_DCONV_EXCEPTION isn't defined
Fixed a few warnings from not explicitly casting some _Float16 variables
upwards
The reference manual states that the offset parameter of H5Soffset_simple()
can be set to NULL to reset the offset of a simple dataspace to 0. This
has never been true, and passing NULL was regarded as an error.
The library will now accept NULL for the offset parameter and will
correctly set the offset to zero.
Fixes HDFFV-9299
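The documented behavior can be sketched with a stand-in function. set_selection_offset() and its use of long long are assumptions for illustration, not the H5Soffset_simple() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: stores a per-dimension offset of length `rank`.
 * A NULL `offset` is accepted and resets every dimension to zero. */
static void
set_selection_offset(long long *stored, const long long *offset, size_t rank)
{
    assert(stored != NULL);

    if (offset == NULL)
        memset(stored, 0, rank * sizeof(*stored)); /* NULL means reset to 0 */
    else
        memcpy(stored, offset, rank * sizeof(*stored));
}
```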
On Windows, HDF5 attempted to convert file paths passed to open() and
remove() to UTF-16 in order to handle Unicode file paths. This scheme
does not work when the system uses code pages to handle non-ASCII
file names.
As suggested in the forum post below, we now also try to see if we
can open the file with open(), which should handle systems where
non-ASCII code pages are in use.
https://forum.hdfgroup.org/t/open-create-hdf5-files-with-non-utf8-chars-such-as-shift-jis/11785
Externally visible:
* The HDF_ENABLE_LARGE_FILE option (advanced) has been removed
* We no longer run a test program to determine if LFS works, which
will help with cross-compiling
* On Linux we now unilaterally set -D_LARGEFILE_SOURCE and
-D_FILE_OFFSET_BITS=64, regardless of 32/64 bit system. CMake
doesn't offer a nice equivalent to AC_SYS_LARGEFILE and since
those options do nothing on 64-bit systems, this seems safe and
covers all our bases. We don't set -D_LARGEFILE64_SOURCE since
we don't use any of the POSIX 64-bit specific API calls like
ftello64, as noted above.
* We didn't test for LFS support on non-Linux platforms. We've added
comments for how LFS should probably be supported on AIX and Solaris,
which seem to be alive, though uncommon. PRs would be appreciated if
anyone wishes to test this.
Internal:
* Drops off64_t size checks since this is unused (as in Autotools)
* Remove HDF_EXTRA_FLAGS, which is now unused
* Remove hack around deprecated LINUX_LFS
Fixes #2395
Makes the datatype conversion context object available during both the
initialization and conversion processes for a datatype conversion
function, allowing the compound, variable-length and array datatype
conversion functions to avoid creating IDs for the datatypes when they
aren't necessary
Adds internal H5CX_pushed routine to determine if an API context is
available to retrieve values from
Also adds error checking to several places in H5T.c and H5Tconv.c where
the code had previously assumed object close operations would succeed
Switches assert() calls to HGOTO_ERROR in H5B__assert() so it can be
used in production mode. Also renames it to H5B__verify_structure()
to better reflect what it checks.
The datatype conversion code previously used IDs for the source and
destination datatypes rather than pointers to the internal structures
for those datatypes. This was mostly due to the need for an ID for these
datatypes that can be passed to an application-registered datatype
conversion function or datatype conversion exception function. However,
using IDs internally caused a lot of unnecessary ID lookups and hurt
performance of datatype conversions in general. This was especially
problematic for compound datatype conversions, where the ID lookups were
occurring on every member of every compound element of a dataset. The
code has now been refactored to use pointers internally and only create
IDs for datatypes when necessary.
Fixed a test issue in dt_arith where a library datatype conversion
function was being cast to an application conversion function. Since the
two have different prototypes, this started failing after the parameters
for a library conversion function changed from hid_t to H5T_t * and an
extra parameter was added. This appears to have worked coincidentally in
the past since the only difference between a library conversion function
and application conversion function was an extra DXPL parameter at the
end of an application conversion function
Fixed an issue where memory wasn't being freed in the h5fc_chk_idx test
program. Even though the program exits quickly after allocating the
memory, it still causes failures when testing with -fsanitize=address
H5Z used the soon-to-be-removed HDEBUG macro to decide if stats
would be dumped and to what stream. This is now handled by a
DUMP_DEBUG_STATS_g variable and the output is always sent to
stdout.
This is an internal change, not normally visible to users.
The H5B (version 1 B-tree) package would add some computationally
expensive integrity checks when H5B_DEBUG was defined. Due to their
negative effects on performance, this option was rarely turned on,
making the H5B__assert() check function stale, if not dead, code.
This change:
* Builds H5B__assert() when NDEBUG is not defined (the function
relies on assert()) so it gets compiled more often.
* Removes some printf debugging statements in the B-tree code
* Removes all H5B "extra debug" checks that are leftover from
past debugging sessions. Maintainers can add H5B__assert()
selectively to perform integrity checks when debugging.
* Removes the HDF5_ENABLE_DEBUG_H5B CMake option
H5B_DEBUG now has no effect
When clearing out datatype conversion paths involving variable-length or reference datatypes
on file close, also check for these datatypes inside compound or array datatypes
* Fixed asserts due to H5Pset_est_link_info() values
If large values for est_num_entries and/or est_name_len were passed
to H5Pset_est_link_info(), the library would attempt to create an
object header NIL message to reserve enough space to hold the links in
compact form (i.e., concatenated), which could exceed allowable object
header message size limits and trip asserts in the library.
This bug only occurred when using the HDF5 1.8 file format or later and
required the product of the two values to be ~64k more than the size
of any links written to the group, which would cause the library to
write out a too-large NIL spacer message to reserve the space for the
unwritten links.
The library now inspects the phase change values to see if the dataset
is likely to be compact and checks the size to ensure any NIL spacer
messages won't be larger than the library allows.
Fixes GitHub #1632
* Fix copy-paste comments
The bin/trace script adds TRACE macros to public API calls in the main
C library. This script had a parsing bug that caused functions that
were annotated with /*out*/, etc. to be labeled as void pointers
instead of typed pointers.
This is mainly a developer feature and not visible to consumers
of the public API.
The bin/trace script now annotates public API calls properly.
Fixes GH #3733
Microsoft has added a new, standards-conformant preprocessor
to MSVC, which can be enabled with /Zc:preprocessor. This
preprocessor trips over our HDopen() function-like variadic
macro since it uses a hack that only works with the legacy
MSVC preprocessor.
This fix adds ifdefs to use the correct HDopen() macro
depending on the MSVC preprocessor selected.
Fixes #2515
H5Tset_fields did not account for any offset in a floating-point datatype,
causing it to fail when a datatype's precision is correctly set such that
it doesn't include the offset bits.
When the H5LT_FILE_IMAGE_DONT_COPY flag is passed to H5LTopen_file_image, the internally-allocated
udata structure gets leaked as the core file driver doesn't have a way to determine when or if it
needs to call the 'udata_free' callback. This has been fixed by freeing the udata structure when
the 'image_free' callback gets made during file close, where the file is holding the last reference
to the udata structure.
H5F_get_access_plist previously did not copy over the file locking settings
from a file into the new File Access Property List that it creates. This would
make it difficult to match the file locking settings between an external file
and its parent file.
* Replaced last sprintf with snprintf
To have the size of the buffer, it was required to change a function signature, and change all users of it.
In most cases, determining the buffer size wasn't trivial and so SIZE_MAX is passed. But at least this improves the infrastructure. Someone can later figure out the correct sizes.
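A sketch of what such a signature change looks like. format_dims() is a hypothetical function, not one of the functions changed in this commit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: the caller passes the destination size so the
 * function can use snprintf() instead of sprintf(). Returns 0 on success and
 * -1 if the output was truncated or an encoding error occurred. */
static int
format_dims(char *buf, size_t buf_size, unsigned rows, unsigned cols)
{
    assert(buf != NULL && buf_size > 0);

    int n = snprintf(buf, buf_size, "%ux%u", rows, cols);

    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return 0;
}
```

Passing the real buffer size lets the callee detect truncation. Passing SIZE_MAX keeps the old unbounded behavior until the correct size is known.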
* Fix for github issue #2414: segfault when copying dataset with attributes.
This also fixes github issue #3241: segfault when copying dataset.
Need to set the location via H5T_set_loc() of the src datatype
when copying dense attributes.
Otherwise the vlen callbacks are not set up, causing a segfault
when doing H5T_convert() -> H5T__conv_vlen().
* Switch warnings as errors to default OFF
* Enable mac docs
* Add doxygen action uses step
* Use html div around snippet
* Allow preset name to be an argument to cmake-ctest.yml
Remove cached datatype conversion path table entries on file close
When performing datatype conversions during I/O, the library
checks to see whether it can re-use a cached datatype conversion
pathway by performing comparisons between the source and destination
datatypes of the current operation and the source and destination
datatypes associated with each cached datatype conversion pathway.
For variable-length and reference datatypes, a comparison is made
between the VOL object for the file associated with these datatypes,
which may change as a file is closed and reopened. In workflows
involving a loop that opens a file, performs I/O on an object with a
variable-length or reference datatype and then closes the file, this
can lead to constant memory usage growth as the library compares the
file VOL objects between the datatypes as different and adds a new
cached conversion pathway entry on each iteration during I/O. This is
now fixed by clearing out any cached conversion pathway entries for
variable-length or reference datatypes associated with a particular
file when that file is closed.
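The cleanup can be pictured as a purge over a table of cached paths. The struct and function below are simplified, hypothetical stand-ins and do not reflect the library's actual path table.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical cache entry */
typedef struct {
    int src_type; /* stand-in for the source datatype identity */
    int dst_type; /* stand-in for the destination datatype identity */
    int file_id;  /* file the datatypes are tied to, or -1 if none */
} conv_path_t;

/* Removes every path tied to `file_id`, compacting the table in place.
 * Returns the new number of entries. */
static size_t
purge_paths_for_file(conv_path_t *paths, size_t npaths, int file_id)
{
    assert(paths != NULL || npaths == 0);

    size_t kept = 0;

    for (size_t i = 0; i < npaths; i++)
        if (paths[i].file_id != file_id)
            paths[kept++] = paths[i];

    return kept;
}
```

Calling such a purge when a file closes keeps the table from growing by one entry per open/close iteration.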
off_t is a 32-bit signed value on Windows, so we should use HDoff_t
(which is __int64 on Windows) internally instead.
Also defines HDftell on Windows to be _ftelli64().
* Add 'warning density' computation to the warnhist script, along with several
cleanups to it. Add "--enable-show-all-warnings" configure (and CMake)
option to disable compiler diagnostic suppression (and therefore show all the
otherwise suppressed compiler diagnostics), disabled by default. Clean up
a bunch of misc. warnings.
Signed-off-by: Quincey Koziol <qkoziol@amazon.com>
Vector I/O requests are now processed within a single
set of I/O call batches, rather than each I/O vector
entry (tuple constructed from the types, addrs, sizes
and bufs arrays) being processed individually. This allows I/O to be
more efficiently parallelized among the I/O concentrator processes
during large I/O requests.
* Fixed some calculations and added test cases for issues spotted from review
* Removed a variable that was compensating for previous miscalculations
* Added missing \since tags to H5D.
* Committing clang-format changes
* Fixed H5T version info.
* Committing clang-format changes
* Added missing version info to H5E.
* Committing clang-format changes
* Added version info to H5F public APIs.
* Committing clang-format changes
* Added missing H5Z public API version info.
* Added missing version info to H5G public APIs
* Added missing version info to H5I public API.
* Added missing version info to H5 public APIs
* Committing clang-format changes
* Added missing version info to H5P public APIs
* Added missing version info to H5R public APIs
* Fix comment error.
* Committing clang-format changes
---------
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
* Changes for ECP-344: Implement selection vector I/O with collective chunk filling.
Also fix a bug in H5FD__mpio_write_vector() to account for fixed size optimization
when computing max address.
* Fixes based on PR review comments:
For H5Dchunk.c: fix H5MM_xfree()
For H5FDmpio.c:
1) Revert the fix to H5FD__mpio_write_vector()
2) Apply the patch from Neil on the proper length of s_sizes reported by H5FD__mpio_vector_build_types()
* Put back the logic of dividing up the work among all the mpi ranks similar to the
original H5D__chunk_collective_fill() routine.
* Add a test to verify the fix for the illegal reference problem in H5FD__mpio_write_vector().
* Make filter callbacks use top-level API functions
When using VOL connectors, H5I_iterate may not provide
valid object pointers to its callback. This change keeps
existing functionality in H5Zunregister() without using
potentially unsafe pointers.
* Filter callbacks use internal API
* Skip MPI work on non-native VOL
The H5T floating-point datatype initialization code can raise exceptions when handling signaling NaNs. This change disables FE_INVALID exceptions during initialization.
Also removes the -ieee=full change for NAG Fortran as that shouldn't be necessary anymore.
Fixes #3831
The parallel compression test code tests for the case where all MPI ranks have no selection in a dataset when writing to it. Add an early exit to the code to avoid attempting to use a NULL pointer due to there being no work to do.
When opening a file with the core VFD and a file image, if the file
already exists, the file check would leak the POSIX file handle.
Fixes GitHub issue #635
Allow H5Pset_evict_on_close to be called regardless of whether a parallel build of HDF5 is being used
Fail during file opens if H5Pset_evict_on_close has been set to true on the given File Access Property List and the size of the MPI communicator being used is greater than 1
Add functions/callbacks for explicit control over chunk index open/close
Add functions/callbacks to check if chunk index is open or not so
that it can be opened if necessary before temporarily disabling
collective metadata reads in the library
Add functions/callbacks for requesting loading of additional chunk
index metadata beyond the chunk index itself
Adds a small cache of the first N bytes of a file opened with the
read-only S3 (ros3) VFD, where N is 4 KiB or the size of the file,
whichever is smaller. This avoids a lot of small I/O operations
on file open.
Addresses GitHub issue #3381
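A sketch of the caching idea, using an in-memory buffer in place of the remote object. All names here are hypothetical and the real VFD code differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAD_CACHE_MAX 4096 /* 4 KiB, per the description above */

typedef struct {
    unsigned char data[HEAD_CACHE_MAX];
    size_t        len; /* min(HEAD_CACHE_MAX, file size) */
} head_cache_t;

/* Fills the cache from the start of `file`, which stands in for the
 * remote object's contents. */
static void
head_cache_init(head_cache_t *c, const unsigned char *file, size_t file_size)
{
    assert(c != NULL && (file != NULL || file_size == 0));

    c->len = (file_size < HEAD_CACHE_MAX) ? file_size : HEAD_CACHE_MAX;
    if (c->len > 0)
        memcpy(c->data, file, c->len);
}

/* Returns 1 and copies into `buf` if [addr, addr + size) lies entirely in
 * the cache. Returns 0 otherwise, meaning the caller must do real I/O. */
static int
head_cache_read(const head_cache_t *c, size_t addr, size_t size, void *buf)
{
    assert(c != NULL && buf != NULL);

    if (addr > c->len || size > c->len - addr)
        return 0;

    memcpy(buf, c->data + addr, size);
    return 1;
}
```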
This function allows the user to determine if the library performed selection I/O, vector I/O, or scalar (legacy) I/O during the last HDF5 operation performed with the provided DXPL. Expanded existing tests to check this functionality.
In the ros3 VFD, passing an empty string parameter to an internal
API call could result in accessing the -1th element of a string.
This would cause failures on big-endian systems like s390x.
This parameter is now checked before writing to the string.
Fixes GitHub #1168
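The general shape of the check, shown on a hypothetical helper that trims a trailing '/'. The real ros3 routine and what it writes are not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes one trailing '/' if present. Without the
 * length check, s[strlen(s) - 1] would be an out-of-bounds access for an
 * empty string. */
static void
strip_trailing_slash(char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);

    if (len == 0)
        return; /* nothing to inspect or modify */

    if (s[len - 1] == '/')
        s[len - 1] = '\0';
}
```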
MPICH defines MPI_STATUSES_IGNORE (a pointer) to 1, which raises warnings
w/ gcc. This is a known issue that the MPICH devs are not going to fix.
See here:
https://github.com/pmodels/mpich/issues/5687
This fix suppresses those issues w/ gcc
This macro was an attempt to quiet warnings about release mode unused
variables that only appear in asserts. It resolves to a void cast, which
doesn't quiet warnings when an assignment has already taken place.
A strncpy call in a path construction call used the size of the src
buffer instead of the dest buffer as the limit n.
This was switched to use the dest size and properly terminate the
string if truncation occurs.
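A sketch of the corrected pattern. copy_path() is a hypothetical name, not the function that was changed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dest`, which holds `dest_size`
 * bytes. Returns 0 if the whole string fit and -1 if it was truncated. */
static int
copy_path(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    strncpy(dest, src, dest_size - 1); /* limit is the dest size, not the src size */
    dest[dest_size - 1] = '\0';        /* strncpy does not terminate on truncation */

    return (strlen(src) >= dest_size) ? -1 : 0;
}
```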