netcdf-c/libdispatch/dvlen.c

188 lines
5.6 KiB
C
Raw Normal View History

2011-07-14 20:21:03 +08:00
/*! \file
Functions for VLEN Types
Copyright 2018 University Corporation for Atmospheric
2011-07-14 20:21:03 +08:00
Research/Unidata. See \ref copyright file for more info. */
#include "ncdispatch.h"
/** \name Variable Length Array Types
Functions to create and learn about VLEN types. */
/*! \{ */ /* All these functions are part of this named group... */
/**
\ingroup user_types
Free an array of vlens given the number of elements and an array.
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
When you read an array of VLEN typed instances, the library will allocate
the storage space for the data in each VLEN in the array (but not the array itself).
That VLEN data must be freed eventually, so pass the pointer to the array plus
the number of elements in the array to this function when you're done with
the data, and it will free the all the VLEN instances.
The caller is still responsible for free'ing the array itself,
if it was dynamically allocated.
2011-07-14 20:21:03 +08:00
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
WARNING: this function only works if the basetype of the vlen type
is fixed size. This means it is an atomic type except NC_STRING,
or an NC_ENUM, or and NC_OPAQUE, or an NC_COMPOUND where all
the fields of the compound type are themselves fixed size.
Fix more memory leaks in netcdf-c library This is a follow up to PR https://github.com/Unidata/netcdf-c/pull/1173 Sorry that it is so big, but leak suppression can be complex. This PR fixes all remaining memory leaks -- as determined by -fsanitize=address, and with the exceptions noted below. Unfortunately. there remains a significant leak that I cannot solve. It involves vlens, and it is unclear if the leak is occurring in the netcdf-c library or the HDF5 library. I have added a check_PROGRAM to the ncdump directory to show the problem. The program is called tst_vlen_demo.c To exercise it, build the netcdf library with -fsanitize=address enabled. Then go into ncdump and do a "make clean check". This should build tst_vlen_demo without actually executing it. Then do the command "./tst_vlen_demo" to see the output of the memory checker. Note the the lost malloc is deep in the HDF5 library (in H5Tvlen.c). I am temporarily working around this error in the following way. 1. I modified several test scripts to not execute known vlen tests that fail as described above. 2. Added an environment variable called NC_VLEN_NOTEST. If set, then those specific tests are suppressed. This should mean that the --disable-utilities option to ./configure should not need to be set to get a memory leak clean build. This should allow for detection of any new leaks. Note: I used an environment variable rather than a ./configure option to control the vlen tests. This is because it is temporary (I hope) and because it is a bit tricky for shell scripts to access ./configure options. Finally, as before, this only been tested with netcdf-4 and hdf5 support.
2018-11-16 01:00:38 +08:00
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
If you have a more complex VLEN base-type, then it is better to call
the "nc_reclaim_data" function.
Fix more memory leaks in netcdf-c library This is a follow up to PR https://github.com/Unidata/netcdf-c/pull/1173 Sorry that it is so big, but leak suppression can be complex. This PR fixes all remaining memory leaks -- as determined by -fsanitize=address, and with the exceptions noted below. Unfortunately. there remains a significant leak that I cannot solve. It involves vlens, and it is unclear if the leak is occurring in the netcdf-c library or the HDF5 library. I have added a check_PROGRAM to the ncdump directory to show the problem. The program is called tst_vlen_demo.c To exercise it, build the netcdf library with -fsanitize=address enabled. Then go into ncdump and do a "make clean check". This should build tst_vlen_demo without actually executing it. Then do the command "./tst_vlen_demo" to see the output of the memory checker. Note the the lost malloc is deep in the HDF5 library (in H5Tvlen.c). I am temporarily working around this error in the following way. 1. I modified several test scripts to not execute known vlen tests that fail as described above. 2. Added an environment variable called NC_VLEN_NOTEST. If set, then those specific tests are suppressed. This should mean that the --disable-utilities option to ./configure should not need to be set to get a memory leak clean build. This should allow for detection of any new leaks. Note: I used an environment variable rather than a ./configure option to control the vlen tests. This is because it is temporary (I hope) and because it is a bit tricky for shell scripts to access ./configure options. Finally, as before, this only been tested with netcdf-4 and hdf5 support.
2018-11-16 01:00:38 +08:00
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
\param nelems number of elements in the array.
2011-07-14 20:21:03 +08:00
\param vlens pointer to the vlen object.
\returns ::NC_NOERR No error.
*/
int
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
nc_free_vlens(size_t nelems, nc_vlen_t vlens[])
2011-07-14 20:21:03 +08:00
{
int ret;
size_t i;
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
for(i = 0; i < nelems; i++)
2011-07-14 20:21:03 +08:00
if ((ret = nc_free_vlen(&vlens[i])))
return ret;
return NC_NOERR;
}
Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operating at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that used the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible. But the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-21 07:11:25 +08:00
/**
\ingroup user_types
Free memory in a single VLEN object.
This function is equivalent to calling *nc_free_vlens* with nelems == 1.
\param vl pointer to the vlen object.
\returns ::NC_NOERR No error.
*/
int
nc_free_vlen(nc_vlen_t *vl)
{
free(vl->p);
return NC_NOERR;
}
2011-07-14 20:21:03 +08:00
/**
\ingroup user_types
Use this function to define a variable length array type.
\param ncid \ref ncid
\param name \ref object_name of new type.
\param base_typeid The typeid of the base type of the VLEN. For
example, for a VLEN of shorts, the base type is ::NC_SHORT. This can be
a user defined type.
\param xtypep A pointer to an nc_type variable. The typeid of the new
VLEN type will be set here.
\returns ::NC_NOERR No error.
\returns ::NC_EBADID Bad \ref ncid.
\returns ::NC_EBADTYPE Bad type id.
\returns ::NC_ENOTNC4 Not an netCDF-4 file, or classic model enabled.
\returns ::NC_EHDFERR An error was reported by the HDF5 layer.
\returns ::NC_ENAMEINUSE That name is in use.
\returns ::NC_EMAXNAME Name exceeds max length NC_MAX_NAME.
\returns ::NC_EBADNAME Name contains illegal characters.
\returns ::NC_EPERM Attempt to write to a read-only file.
\returns ::NC_ENOTINDEFINE Not in define mode.
*/
int
nc_def_vlen(int ncid, const char *name, nc_type base_typeid, nc_type *xtypep)
{
NC* ncp;
int stat = NC_check_id(ncid,&ncp);
if(stat != NC_NOERR) return stat;
return ncp->dispatch->def_vlen(ncid,name,base_typeid,xtypep);
}
/** \ingroup user_types
Learn about a VLEN type.
\param ncid \ref ncid
\param xtype The type of the VLEN to inquire about.
\param name \ref object_name of the type. \ref ignored_if_null.
\param datum_sizep A pointer to a size_t, this will get the size of
one element of this vlen. \ref ignored_if_null.
\param base_nc_typep Pointer to get the base type of the VLEN. \ref
ignored_if_null.
\returns ::NC_NOERR No error.
\returns ::NC_EBADID Bad \ref ncid.
\returns ::NC_EBADTYPE Bad type id.
\returns ::NC_ENOTNC4 Not an netCDF-4 file, or classic model enabled.
\returns ::NC_EHDFERR An error was reported by the HDF5 layer.
*/
int
nc_inq_vlen(int ncid, nc_type xtype, char *name, size_t *datum_sizep, nc_type *base_nc_typep)
{
int class = 0;
int stat = nc_inq_user_type(ncid,xtype,name,datum_sizep,base_nc_typep,NULL,&class);
if(stat != NC_NOERR) return stat;
if(class != NC_VLEN) stat = NC_EBADTYPE;
return stat;
}
/*! \} */ /* End of named group ...*/
/** \internal
\ingroup user_types
Put a VLEN element. This function writes an element of a VLEN for the
Fortran APIs.
\param ncid \ref ncid
\param typeid1 Typeid of the VLEN.
\param vlen_element Pointer to the element of the VLEN.
2018-04-24 06:38:08 +08:00
\param len Length of the VLEN element.
2011-07-14 20:21:03 +08:00
\param data VLEN data.
\returns ::NC_NOERR No error.
\returns ::NC_EBADID Bad \ref ncid.
\returns ::NC_EBADTYPE Bad type id.
\returns ::NC_ENOTNC4 Not an netCDF-4 file, or classic model enabled.
\returns ::NC_EHDFERR An error was reported by the HDF5 layer.
\returns ::NC_EPERM Attempt to write to a read-only file.
*/
int
nc_put_vlen_element(int ncid, int typeid1, void *vlen_element, size_t len, const void *data)
{
NC* ncp;
int stat = NC_check_id(ncid,&ncp);
if(stat != NC_NOERR) return stat;
return ncp->dispatch->put_vlen_element(ncid,typeid1,vlen_element,len,data);
}
/**
\internal
\ingroup user_types
Get a VLEN element. This function reads an element of a VLEN for the
Fortran APIs.
\param ncid \ref ncid
\param typeid1 Typeid of the VLEN.
\param vlen_element Pointer to the element of the VLEN.
2018-04-24 06:38:08 +08:00
\param len Length of the VLEN element.
2011-07-14 20:21:03 +08:00
\param data VLEN data.
\returns ::NC_NOERR No error.
\returns ::NC_EBADID Bad \ref ncid.
\returns ::NC_EBADTYPE Bad type id.
\returns ::NC_ENOTNC4 Not an netCDF-4 file, or classic model enabled.
\returns ::NC_EHDFERR An error was reported by the HDF5 layer.
*/
int
nc_get_vlen_element(int ncid, int typeid1, const void *vlen_element,
size_t *len, void *data)
{
NC *ncp;
int stat = NC_check_id(ncid,&ncp);
if(stat != NC_NOERR) return stat;
return ncp->dispatch->get_vlen_element(ncid, typeid1, vlen_element,
len, data);
}