Fix an issue where compound datatype member IDs can be leaked during conversion (#4459) (#4484)

* Fix an issue where compound datatype member IDs can be leaked during conversion (#4459)

Also fixes issues with handling of partially initialized datatypes during library shutdown

* Fix dead documentation links in 4 .md files.
This commit is contained in:
jhendersonHDF 2024-05-13 21:21:56 -05:00 committed by GitHub
parent f0ecc8bc26
commit 56da09d57a
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 274 additions and 40 deletions

View File

@ -122,7 +122,7 @@ Please make sure that you check the items applicable to your pull request:
* [ ] If changes were done to Autotools build, were they added to CMake and vice versa?
* [ ] Is the pull request applicable to any other branches? If yes, which ones? Please document it in the GitHub issue.
* [ ] Is the new code sufficiently documented for future maintenance?
* [ ] Does the new feature require a change to an existing API? See "API Compatibility Macros" document (https://docs.hdfgroup.org/hdf5/v1_14_4/api-compat-macros.html)
* [ ] Does the new feature require a change to an existing API? See "API Compatibility Macros" document (https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/api-compat-macros.html)
* Documentation
* [ ] Was the change described in the release_docs/RELEASE.txt file?
* [ ] Was the new function documented in the corresponding public header file using [Doxygen](https://hdfgroup.github.io/hdf5/v1_14_4/_r_m_t.html)?

View File

@ -21,15 +21,15 @@ DOCUMENTATION
-------------
This release is fully functional for the API described in the documentation.
https://docs.hdfgroup.org/hdf5/v1_14_4/_l_b_a_p_i.html
https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_l_b_a_p_i.html
Full Documentation and Programming Resources for this release can be found at
https://docs.hdfgroup.org/hdf5/v1_14_4/index.html
https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/index.html
The latest doxygen documentation generated on changes to HDF5 1.14.x is available at:
https://hdfgroup.github.io/hdf5/
https://hdfgroup.github.io/hdf5/v1_14
See the [RELEASE.txt](release_docs/RELEASE.txt) file in the [release_docs/](release_docs/) directory for information specific
to the features and updates included in this release of the library.

View File

@ -79,7 +79,7 @@ participate in the collective write call.
## Multi-dataset I/O support
The parallel compression feature is supported when using the
multi-dataset I/O API routines ([H5Dwrite_multi](https://hdfgroup.github.io/hdf5/group___h5_d.html#gaf6213bf3a876c1741810037ff2bb85d8)/[H5Dread_multi](https://hdfgroup.github.io/hdf5/group___h5_d.html#ga8eb1c838aff79a17de385d0707709915)), but the
multi-dataset I/O API routines ([H5Dwrite_multi](https://hdfgroup.github.io/hdf5/v1_14_4/group___h5_d.html#gaf6213bf3a876c1741810037ff2bb85d8)/[H5Dread_multi](https://hdfgroup.github.io/hdf5/v1_14_4/group___h5_d.html#ga8eb1c838aff79a17de385d0707709915)), but the
following should be kept in mind:
- Parallel writes to filtered datasets **must** still be collective,
@ -99,7 +99,7 @@ following should be kept in mind:
## Incremental file space allocation support
HDF5's [file space allocation time](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga85faefca58387bba409b65c470d7d851)
HDF5's [file space allocation time](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga85faefca58387bba409b65c470d7d851)
is a dataset creation property that can have significant effects
on application performance, especially if the application uses
parallel HDF5. In a serial HDF5 application, the default file space
@ -118,7 +118,7 @@ While this strategy has worked in the past, it has some noticeable
drawbacks. For one, the larger the chunked dataset being created,
the more noticeable overhead there will be during dataset creation
as all of the data chunks are being allocated in the HDF5 file.
Further, these data chunks will, by default, be [filled](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga4335bb45b35386daa837b4ff1b9cd4a4)
Further, these data chunks will, by default, be [filled](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga4335bb45b35386daa837b4ff1b9cd4a4)
with HDF5's default fill data value, leading to extraordinary
dataset creation overhead and resulting in pre-filling large
portions of a dataset that the application might have been planning
@ -126,7 +126,7 @@ to overwrite anyway. Even worse, there will be more initial overhead
from compressing that fill data before writing it out, only to have
it read back in, unfiltered and modified the first time a chunk is
written to. In the past, it was typically suggested that parallel
HDF5 applications should use [H5Pset_fill_time](https://hdfgroup.github.io/hdf5/group___d_c_p_l.html#ga6bd822266b31f86551a9a1d79601b6a2)
HDF5 applications should use [H5Pset_fill_time](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_c_p_l.html#ga6bd822266b31f86551a9a1d79601b6a2)
with a value of `H5D_FILL_TIME_NEVER` in order to disable writing of
the fill value to dataset chunks, but this isn't ideal if the
application actually wishes to make use of fill values.
@ -220,14 +220,14 @@ chunks to end up at addresses in the file that do not align
well with the underlying file system, possibly leading to
poor performance. As an example, Lustre performance is generally
good when writes are aligned with the chosen stripe size.
The HDF5 application can use [H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
The HDF5 application can use [H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
to have a bit more control over where objects in the HDF5
file end up. However, do note that setting the alignment
of objects generally wastes space in the file and has the
potential to dramatically increase its resulting size, so
caution should be used when choosing the alignment parameters.
[H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
[H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
has two parameters that control the alignment of objects in
the HDF5 file, the "threshold" value and the alignment
value. The threshold value specifies that any object greater
@ -264,19 +264,19 @@ in a file, this can create significant amounts of free space
in the file over its lifetime and eventually cause performance
issues.
An HDF5 application can use [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
An HDF5 application can use [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
with a value of `H5F_FSPACE_STRATEGY_PAGE` to enable the paged
aggregation feature, which can accumulate metadata and raw
data for dataset data chunks into well-aligned, configurably
sized "pages" for better performance. However, note that using
the paged aggregation feature will cause any setting from
[H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
[H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a)
to be ignored. While an application should be able to get
comparable performance effects by [setting the size of these pages](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#gad012d7f3c2f1e1999eb1770aae3a4963) to be equal to the value that
would have been set for [H5Pset_alignment](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a),
comparable performance effects by [setting the size of these pages](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#gad012d7f3c2f1e1999eb1770aae3a4963) to be equal to the value that
would have been set for [H5Pset_alignment](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gab99d5af749aeb3896fd9e3ceb273677a),
this may not necessarily be the case and should be studied.
Note that [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
Note that [H5Pset_file_space_strategy](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_c_p_l.html#ga167ff65f392ca3b7f1933b1cee1b9f70)
has a `persist` parameter. This determines whether or not the
file free space manager should include extra metadata in the
HDF5 file about free space sections in the file. If this
@ -300,12 +300,12 @@ hid_t file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, fcpl_id, fapl_id);
While the parallel compression feature requires that the HDF5
application set and maintain collective I/O at the application
interface level (via [H5Pset_dxpl_mpio](https://hdfgroup.github.io/hdf5/group___d_x_p_l.html#ga001a22b64f60b815abf5de8b4776f09e)),
interface level (via [H5Pset_dxpl_mpio](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_x_p_l.html#ga001a22b64f60b815abf5de8b4776f09e)),
it does not require that the actual MPI I/O that occurs at
the lowest layers of HDF5 be collective; independent I/O may
perform better depending on the application I/O patterns and
parallel file system performance, among other factors. The
application may use [H5Pset_dxpl_mpio_collective_opt](https://hdfgroup.github.io/hdf5/group___d_x_p_l.html#gacb30d14d1791ec7ff9ee73aa148a51a3)
application may use [H5Pset_dxpl_mpio_collective_opt](https://hdfgroup.github.io/hdf5/v1_14_4/group___d_x_p_l.html#gacb30d14d1791ec7ff9ee73aa148a51a3)
to control this setting and see which I/O method provides the
best performance.
@ -318,7 +318,7 @@ H5Dwrite(..., dxpl_id, ...);
### Runtime HDF5 Library version
An HDF5 application can use the [H5Pset_libver_bounds](https://hdfgroup.github.io/hdf5/group___f_a_p_l.html#gacbe1724e7f70cd17ed687417a1d2a910)
An HDF5 application can use the [H5Pset_libver_bounds](https://hdfgroup.github.io/hdf5/v1_14_4/group___f_a_p_l.html#gacbe1724e7f70cd17ed687417a1d2a910)
routine to set the upper and lower bounds on library versions
to use when creating HDF5 objects. For parallel compression
specifically, setting the library version to the latest available

View File

@ -499,6 +499,23 @@ Bug Fixes since HDF5-1.14.3 release
Library
-------
- Fixed a leak of datatype IDs created internally during datatype conversion
Fixed an issue where the library could leak IDs that it creates internally
for compound datatype members during datatype conversion. When the library's
table of datatype conversion functions is modified (such as when a new
conversion function is registered with the library from within an application),
the compound datatype conversion function has to recalculate data that it
has cached. When recalculating that data, the library was registering new
IDs for each of the members of the source and destination compound datatypes
involved in the conversion process and was overwriting the old cached IDs
without first closing them. This would result in use-after-free issues due
to multiple IDs pointing to the same internal H5T_t structure, as well as
crashes due to the library not gracefully handling partially initialized or
partially freed datatypes on library termination.
Fixes h5py GitHub #2419
- Fixed many (future) CVE issues
A partner organization corrected many potential security issues, which

View File

@ -1643,12 +1643,11 @@ H5T__unlock_cb(void *_dt, hid_t H5_ATTR_UNUSED id, void *_udata)
FUNC_ENTER_PACKAGE_NOERR
assert(dt);
assert(dt->shared);
if (H5T_STATE_IMMUTABLE == dt->shared->state) {
if (dt->shared && (H5T_STATE_IMMUTABLE == dt->shared->state)) {
dt->shared->state = H5T_STATE_RDONLY;
(*n)++;
} /* end if */
}
FUNC_LEAVE_NOAPI(SUCCEED)
} /* end H5T__unlock_cb() */
@ -1862,7 +1861,6 @@ H5T__close_cb(H5T_t *dt, void **request)
/* Sanity check */
assert(dt);
assert(dt->shared);
/* If this datatype is VOL-managed (i.e.: has a VOL object),
* close it through the VOL connector.
@ -4162,10 +4160,10 @@ H5T_close_real(H5T_t *dt)
FUNC_ENTER_NOAPI(FAIL)
/* Sanity check */
assert(dt && dt->shared);
assert(dt);
/* Clean up resources, depending on shared state */
if (dt->shared->state != H5T_STATE_OPEN) {
if (dt->shared && (dt->shared->state != H5T_STATE_OPEN)) {
if (H5T__free(dt) < 0)
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTFREE, FAIL, "unable to free datatype");
@ -4202,10 +4200,9 @@ H5T_close(H5T_t *dt)
/* Sanity check */
assert(dt);
assert(dt->shared);
/* Named datatype cleanups */
if (dt->shared->state == H5T_STATE_OPEN) {
if (dt->shared && (dt->shared->state == H5T_STATE_OPEN)) {
/* Decrement refcount count on open named datatype */
dt->shared->fo_count--;

View File

@ -2241,21 +2241,32 @@ H5T__conv_struct_init(const H5T_t *src, const H5T_t *dst, H5T_cdata_t *cdata, co
if (need_ids) {
hid_t tid;
if ((tid = H5I_register(H5I_DATATYPE, priv->src_memb[i], false)) < 0) {
H5T__conv_struct_free(priv);
cdata->priv = NULL;
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"can't register ID for source compound member datatype");
/* Only register new IDs for the source and destination member datatypes
* if IDs weren't already registered for them. If the cached conversion
* information has to be recalculated (in the case where the library's
* table of conversion functions is modified), the same IDs can be reused
* since the only information that needs to be recalculated is the conversion
* paths used.
*/
if (priv->src_memb_id[i] == H5I_INVALID_HID) {
if ((tid = H5I_register(H5I_DATATYPE, priv->src_memb[i], false)) < 0) {
H5T__conv_struct_free(priv);
cdata->priv = NULL;
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"can't register ID for source compound member datatype");
}
priv->src_memb_id[i] = tid;
}
priv->src_memb_id[i] = tid;
if ((tid = H5I_register(H5I_DATATYPE, priv->dst_memb[src2dst[i]], false)) < 0) {
H5T__conv_struct_free(priv);
cdata->priv = NULL;
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"can't register ID for destination compound member datatype");
if (priv->dst_memb_id[src2dst[i]] == H5I_INVALID_HID) {
if ((tid = H5I_register(H5I_DATATYPE, priv->dst_memb[src2dst[i]], false)) < 0) {
H5T__conv_struct_free(priv);
cdata->priv = NULL;
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"can't register ID for destination compound member datatype");
}
priv->dst_memb_id[src2dst[i]] = tid;
}
priv->dst_memb_id[src2dst[i]] = tid;
}
} /* end if */
} /* end for */

View File

@ -1,7 +1,7 @@
# HDF5 API Tests
This directory contains several test applications that exercise HDF5's
public API and serve as regression tests for HDF5 [VOL Connectors](https://docs.hdfgroup.org/hdf5/v1_14_4/_h5_v_l__u_g.html).
public API and serve as regression tests for HDF5 [VOL Connectors](https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_h5_v_l__u_g.html).
## Build Process and options
@ -42,7 +42,7 @@ Currently unsupported
These API tests currently only support usage with the native HDF5 VOL connector and HDF5 VOL
connectors that can be loaded dynamically as a plugin. For information on how to build a VOL
connector in this manner, refer to section 2.3 of the [HDF5 VOL Connector Author Guide](https://docs.hdfgroup.org/hdf5/v1_14_4/_v_o_l__connector.html).
connector in this manner, refer to section 2.3 of the [HDF5 VOL Connector Author Guide](https://docs.hdfgroup.org/hdf5/v1_14/v1_14_4/_v_o_l__connector.html).
TODO: section on building VOL connectors alongside HDF5 for use with tests

View File

@ -3913,6 +3913,214 @@ error:
return 1;
} /* end test_user_compound_conversion() */
/*-------------------------------------------------------------------------
* Function: test_compound_member_convert_id_leak_func1
*
* Purpose: Datatype conversion function for the
* test_compound_member_convert_id_leak test that just
* converts a float value to a double value with a cast.
*
* Return: Success: 0
* Failure: number of errors
*
*-------------------------------------------------------------------------
*/
static herr_t
test_compound_member_convert_id_leak_func1(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts,
size_t H5_ATTR_UNUSED buf_stride, size_t H5_ATTR_UNUSED bkg_stride,
void *buf, void H5_ATTR_UNUSED *bkg,
hid_t H5_ATTR_UNUSED dset_xfer_plist)
{
float tmp_val;
double tmp_val2;
switch (cdata->command) {
case H5T_CONV_INIT:
if (!H5Tequal(src_id, H5T_NATIVE_FLOAT) || !H5Tequal(dst_id, H5T_NATIVE_DOUBLE))
return FAIL;
break;
case H5T_CONV_CONV:
if (nelmts != 1)
return FAIL;
memcpy(&tmp_val, buf, sizeof(float));
tmp_val2 = (double)tmp_val;
memcpy(buf, &tmp_val2, sizeof(double));
break;
case H5T_CONV_FREE:
break;
default:
return FAIL;
}
return SUCCEED;
}
/*-------------------------------------------------------------------------
* Function: test_compound_member_convert_id_leak_func2
*
* Purpose: Datatype conversion function for the
* test_compound_member_convert_id_leak test that just
* returns the double value 0.1.
*
* Return: Success: 0
* Failure: number of errors
*
*-------------------------------------------------------------------------
*/
static herr_t
test_compound_member_convert_id_leak_func2(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts,
size_t H5_ATTR_UNUSED buf_stride, size_t H5_ATTR_UNUSED bkg_stride,
void *buf, void H5_ATTR_UNUSED *bkg,
hid_t H5_ATTR_UNUSED dset_xfer_plist)
{
double tmp_val = 0.1;
switch (cdata->command) {
case H5T_CONV_INIT:
if (!H5Tequal(src_id, H5T_NATIVE_FLOAT) || !H5Tequal(dst_id, H5T_NATIVE_DOUBLE))
return FAIL;
break;
case H5T_CONV_CONV:
if (nelmts != 1)
return FAIL;
memcpy(buf, &tmp_val, sizeof(double));
break;
case H5T_CONV_FREE:
break;
default:
return FAIL;
}
return SUCCEED;
}
/*-------------------------------------------------------------------------
* Function: test_compound_member_convert_id_leak
*
* Purpose: Tests for an issue where IDs that are registered for
* compound datatype members during datatype conversion were
* leaked when the library's conversion path table is modified
* and the compound conversion path recalculates its cached
* data.
*
* Return: Success: 0
* Failure: number of errors
*
*-------------------------------------------------------------------------
*/
static int
test_compound_member_convert_id_leak(void)
{
int64_t num_dtype_ids = 0;
float val1;
double val2;
double bkg;
hid_t tid1 = H5I_INVALID_HID;
hid_t tid2 = H5I_INVALID_HID;
TESTING("compound conversion member ID leak");
if ((tid1 = H5Tcreate(H5T_COMPOUND, sizeof(float))) < 0)
TEST_ERROR;
if ((tid2 = H5Tcreate(H5T_COMPOUND, sizeof(double))) < 0)
TEST_ERROR;
if (H5Tinsert(tid1, "mem", 0, H5T_NATIVE_FLOAT) < 0)
TEST_ERROR;
if (H5Tinsert(tid2, "mem", 0, H5T_NATIVE_DOUBLE) < 0)
TEST_ERROR;
/* Store the current number of datatype IDs registered */
if ((num_dtype_ids = H5I_nmembers(H5I_DATATYPE)) < 0)
TEST_ERROR;
/* Convert a value from float to double */
val1 = 3.0f;
val2 = 0.0;
memcpy(&val2, &val1, sizeof(float));
if (H5Tconvert(tid1, tid2, 1, &val2, &bkg, H5P_DEFAULT) < 0)
TEST_ERROR;
/* Make sure the number of datatype IDs registered didn't change */
if (num_dtype_ids != H5I_nmembers(H5I_DATATYPE))
TEST_ERROR;
/* Register a custom conversion function from float to double
* and convert the value again
*/
if (H5Tregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func1) < 0)
TEST_ERROR;
val1 = 3.0f;
val2 = 0.0;
memcpy(&val2, &val1, sizeof(float));
if (H5Tconvert(tid1, tid2, 1, &val2, &bkg, H5P_DEFAULT) < 0)
TEST_ERROR;
/* Since an application conversion function was used, two IDs should
* have been registered, one for the source type and one for the
* destination type
*/
num_dtype_ids += 2;
/* Make sure the number of datatype IDs registered is correct */
if (num_dtype_ids != H5I_nmembers(H5I_DATATYPE))
TEST_ERROR;
if (H5Tunregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func1) < 0)
TEST_ERROR;
/* Register a different custom conversion function from float to double
* and convert the value again
*/
if (H5Tregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func2) < 0)
TEST_ERROR;
val1 = 3.0f;
val2 = 0.0;
memcpy(&val2, &val1, sizeof(float));
if (H5Tconvert(tid1, tid2, 1, &val2, &bkg, H5P_DEFAULT) < 0)
TEST_ERROR;
/* Make sure the number of datatype IDs registered didn't change */
if (num_dtype_ids != H5I_nmembers(H5I_DATATYPE))
TEST_ERROR;
if (H5Tunregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func2) < 0)
TEST_ERROR;
if (H5Tclose(tid1) < 0)
TEST_ERROR;
if (H5Tclose(tid2) < 0)
TEST_ERROR;
PASSED();
return 0;
error:
H5E_BEGIN_TRY
{
H5Tunregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func1);
H5Tunregister(H5T_PERS_HARD, "myflttodbl", H5T_NATIVE_FLOAT, H5T_NATIVE_DOUBLE,
test_compound_member_convert_id_leak_func2);
H5Tclose(tid1);
H5Tclose(tid2);
}
H5E_END_TRY;
return 1;
} /* end test_compound_member_convert_id_leak() */
/*-------------------------------------------------------------------------
* Function: test_query
*
@ -10198,6 +10406,7 @@ main(void)
nerrors += test_compound_17();
nerrors += test_compound_18();
nerrors += test_user_compound_conversion();
nerrors += test_compound_member_convert_id_leak();
nerrors += test_conv_enum_1();
nerrors += test_conv_enum_2();
nerrors += test_conv_bitfield();