netcdf-c/libsrc4/nc4internal.c

2077 lines
56 KiB
C
Raw Normal View History

2018-01-31 23:44:33 +08:00
/* Copyright 2003-2018, University Corporation for Atmospheric
* Research. See the COPYRIGHT file for copying and redistribution
* conditions.
*/
/**
* @file
* @internal
* Internal netcdf-4 functions.
*
* This file contains functions internal to the netcdf4 library. None of
* the functions in this file are exposed in the exetnal API. These
* functions all relate to the manipulation of netcdf-4's in-memory
* buffer of metadata information, i.e. the linked list of NC
* structs.
2017-12-05 03:21:14 +08:00
*
* @author Ed Hartnett, Dennis Heimbigner, Ward Fisher
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
#include "config.h"
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
#include "netcdf.h"
#include "netcdf_filter.h"
2010-06-03 21:24:43 +08:00
#include "nc4internal.h"
#include "nc.h" /* from libsrc */
#include "ncdispatch.h" /* from libdispatch */
#include "ncutf8.h"
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
/** @internal Number of reserved attributes. These attributes are
* hidden from the netcdf user, but exist in the implementation
* datasets to help netcdf read the dataset.
* Moved here from hdf5file.c.
* These tables need to capture all reserved attributes
* across all possible dispatchers
*/
/** @internal List of reserved attributes.
WARNING: This list must be in sorted order for binary search. */
static const NC_reservedatt NC_reserved[] = {
{NC_ATT_CLASS, READONLYFLAG|HIDDENATTRFLAG}, /*CLASS*/
{NC_ATT_DIMENSION_LIST, READONLYFLAG|HIDDENATTRFLAG}, /*DIMENSION_LIST*/
{NC_ATT_NAME, READONLYFLAG|HIDDENATTRFLAG}, /*NAME*/
{NC_ATT_REFERENCE_LIST, READONLYFLAG|HIDDENATTRFLAG}, /*REFERENCE_LIST*/
{NC_XARRAY_DIMS, READONLYFLAG|HIDDENATTRFLAG}, /*_ARRAY_DIMENSIONS*/
Add filter support to NCZarr Filter support has three goals: 1. Use the existing HDF5 filter implementations, 2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr, 3. Allow filters to be used even when HDF5 is disabled Detailed usage directions are define in docs/filters.md. For now, the existing filter API is left in place. So filters are defined using ''nc_def_var_filter'' using the HDF5 style where the id and parameters are unsigned integers. This is a big change since filters affect many parts of the code. In the following, the terms "compressor" and "filter" and "codec" are generally used synonomously. ### Filter-Related Changes: * In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms. * Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h. * Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out. * Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h. * Add a number of new test to test the new nczarr filters. * Let ncgen parse _Codecs attribute, although it is ignored. ### Plugin directory changes: * Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file * Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip * Add a Codec defaulter (see docs/filters.md) for the big four filters. * Make plugins work with windows by properly adding __declspec declaration. ### Misc. Non-Filter Changes * Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5. * Improve support for caching * More fixes for path conversion code * Fix misc. memory leaks * Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath. * Add a number of new test to test the non-filter fixes. * Update the parsers * Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-03 07:04:26 +08:00
{NC_ATT_CODECS, VARFLAG|READONLYFLAG|NAMEONLYFLAG|HIDDENATTRFLAG}, /*_Codecs*/
{NC_ATT_FORMAT, READONLYFLAG}, /*_Format*/
{ISNETCDF4ATT, READONLYFLAG|NAMEONLYFLAG}, /*_IsNetcdf4*/
{NCPROPS, READONLYFLAG|NAMEONLYFLAG|MATERIALIZEDFLAG}, /*_NCProperties*/
{NC_NCZARR_ATTR, READONLYFLAG|HIDDENATTRFLAG}, /*_NCZARR_ATTR*/
{NC_ATT_COORDINATES, READONLYFLAG|HIDDENATTRFLAG|MATERIALIZEDFLAG}, /*_Netcdf4Coordinates*/
{NC_ATT_DIMID_NAME, READONLYFLAG|HIDDENATTRFLAG|MATERIALIZEDFLAG}, /*_Netcdf4Dimid*/
{SUPERBLOCKATT, READONLYFLAG|NAMEONLYFLAG}, /*_SuperblockVersion*/
{NC_ATT_NC3_STRICT_NAME, READONLYFLAG|MATERIALIZEDFLAG}, /*_nc3_strict*/
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
};
#define NRESERVED (sizeof(NC_reserved) / sizeof(NC_reservedatt)) /*|NC_reservedatt|*/
2010-06-03 21:24:43 +08:00
/* These hold the file caching settings for the library. */
size_t nc4_chunk_cache_size = CHUNK_CACHE_SIZE; /**< Default chunk cache size. */
size_t nc4_chunk_cache_nelems = CHUNK_CACHE_NELEMS; /**< Default chunk cache number of elements. */
float nc4_chunk_cache_preemption = CHUNK_CACHE_PREEMPTION; /**< Default chunk cache preemption. */
2010-06-03 21:24:43 +08:00
static int NC4_move_in_NCList(NC* nc, int new_id);
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
2010-06-03 21:24:43 +08:00
#ifdef LOGGING
/* This is the severity level of messages which will be logged. Use
severity 0 for errors, 1 for important log messages, 2 for less
important, etc. */
Primary change: add dap4 support Specific changes: 1. Add dap4 code: libdap4 and dap4_test. Note that until the d4ts server problem is solved, dap4 is turned off. 2. Modify various files to support dap4 flags: configure.ac, Makefile.am, CMakeLists.txt, etc. 3. Add nc_test/test_common.sh. This centralizes the handling of the locations of various things in the build tree: e.g. where is ncgen.exe located. See nc_test/test_common.sh for details. 4. Modify .sh files to use test_common.sh 5. Obsolete separate oc2 by moving it to be part of netcdf-c. This means replacing code with netcdf-c equivalents. 5. Add --with-testserver to configure.ac to allow override of the servers to be used for --enable-dap-remote-tests. 6. There were multiple versions of nctypealignment code. Try to centralize in libdispatch/doffset.c and include/ncoffsets.h 7. Add a unit test for the ncuri code because of its complexity. 8. Move the findserver code out of libdispatch and into a separate, self contained program in ncdap_test and dap4_test. 9. Move the dispatch header files (nc{3,4}dispatch.h) to .../include because they are now shared by modules. 10. Revamp the handling of TOPSRCDIR and TOPBUILDDIR for shell scripts. 11. Make use of MREMAP if available 12. Misc. minor changes e.g. - #include <config.h> -> #include "config.h" - Add some no-install headers to /include - extern -> EXTERNL and vice versa as needed - misc header cleanup - clean up checking for misc. unix vs microsoft functions 13. Change copyright decls in some files to point to LICENSE file. 14. Add notes to RELEASENOTES.md
2017-03-09 08:01:10 +08:00
int nc_log_level = NC_TURN_OFF_LOGGING;
#if HAS_PARALLEL4
/* Reference count for the parallel I/O log file. */
int pio_log_ref_cnt = 0;
/* File pointer for the parallel I/O log file. */
FILE *LOG_FILE = NULL;
/* Rank of this process. */
int my_rank;
#endif /* HAS_PARALLEL4 */
/* This function prints out a message, if the severity of
* the message is lower than the global nc_log_level. To use it, do
* something like this:
*
* nc_log(0, "this computer will explode in %d seconds", i);
*
* After the first arg (the severity), use the rest like a normal
* printf statement. Output will appear on stderr.
*
* Ed Hartnett
*/
void
nc_log(int severity, const char *fmt, ...)
{
va_list argp;
int t;
/* If the severity is greater than the log level, we don' care to
print this message. */
if (severity > nc_log_level)
return;
/* If the severity is zero, this is an error. Otherwise insert that
many tabs before the message. */
if (!severity)
fprintf(stderr, "ERROR: ");
for (t = 0; t < severity; t++)
fprintf(stderr, "\t");
/* Print out the variable list of args with vprintf. */
va_start(argp, fmt);
vfprintf(stderr, fmt, argp);
va_end(argp);
/* Put on a final linefeed. */
fprintf(stderr, "\n");
fflush(stderr);
}
2010-06-03 21:24:43 +08:00
#endif /* LOGGING */
2017-12-03 22:57:21 +08:00
/**
2017-12-05 03:21:14 +08:00
* @internal Check and normalize and name.
2017-12-03 22:57:21 +08:00
*
* @param name Name to normalize.
* @param norm_name The normalized name.
*
* @return ::NC_NOERR No error.
* @return ::NC_EMAXNAME Name too long.
2018-01-18 22:36:52 +08:00
* @return ::NC_EINVAL NULL given for name.
* @return ::NC_ENOMEM Out of memory.
2017-12-03 22:57:21 +08:00
* @author Dennis Heimbigner
*/
2010-06-03 21:24:43 +08:00
int
nc4_check_name(const char *name, char *norm_name)
{
char *temp;
int retval;
assert(norm_name);
/* Check for NULL. */
if (!name)
return NC_EINVAL;
/* Make sure this is a valid netcdf name. This should be done
* before the name is normalized, because it gives better error
* codes for bad utf8 strings. */
if ((retval = NC_check_name(name)))
return retval;
/* Normalize the name. */
if ((retval = nc_utf8_normalize((const unsigned char *)name,
(unsigned char **)&temp)))
return retval;
/* Check length of normalized name. */
if (strlen(temp) > NC_MAX_NAME)
{
free(temp);
return NC_EMAXNAME;
}
/* Copy the normalized name. */
strcpy(norm_name, temp);
free(temp);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
/**
* @internal Add a file to the list of libsrc4 open files. This is
* used by dispatch layers that wish to use the libsrc4 metadata
* model, but don't know about struct NC. This is the same as
* nc4_nc4f_list_add(), except it takes an ncid instead of an NC *,
* and also passes back the dispatchdata pointer.
*
* @param ncid The (already-assigned) ncid of the file (aka ext_ncid).
* @param path The file name of the new file.
* @param mode The mode flag.
* @param dispatchdata Void * that gets pointer to dispatch data,
* which is the NC_FILE_INFO_T struct allocated for this file and its
* metadata. Ignored if NULL. (This is passed as a void to allow
* external user-defined formats to use this function.)
*
* @return ::NC_NOERR No error.
* @return ::NC_EBADID No NC struct with this ext_ncid.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_file_list_add(int ncid, const char *path, int mode, void **dispatchdata)
{
NC *nc;
int ret;
/* Find NC pointer for this file. */
if ((ret = NC_check_id(ncid, &nc)))
return ret;
/* Add necessary structs to hold netcdf-4 file data. This is where
* the NC_FILE_INFO_T struct is allocated for the file. */
if ((ret = nc4_nc4f_list_add(nc, path, mode)))
return ret;
/* If the user wants a pointer to the NC_FILE_INFO_T, then provide
* it. */
if (dispatchdata)
*dispatchdata = nc->dispatchdata;
return NC_NOERR;
}
2019-09-17 01:28:18 +08:00
/**
* @internal Change the ncid of an open file. This is needed for PIO
* integration.
*
* @param ncid The ncid of the file (aka ext_ncid).
* @param new_ncid The new ncid index to use (i.e. the first two bytes
* of the ncid).
*
* @return ::NC_NOERR No error.
* @return ::NC_EBADID No NC struct with this ext_ncid.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_file_change_ncid(int ncid, unsigned short new_ncid_index)
{
NC *nc;
int ret;
LOG((2, "%s: ncid %d new_ncid_index %d", __func__, ncid, new_ncid_index));
/* Find NC pointer for this file. */
if ((ret = NC_check_id(ncid, &nc)))
return ret;
/* Move it in the list. It will faile if list spot is already
* occupied. */
LOG((3, "moving nc->ext_ncid %d nc->ext_ncid >> ID_SHIFT %d",
nc->ext_ncid, nc->ext_ncid >> ID_SHIFT));
if (NC4_move_in_NCList(nc, new_ncid_index))
2019-09-17 01:28:18 +08:00
return NC_EIO;
LOG((3, "moved to new_ncid_index %d new nc->ext_ncid %d", new_ncid_index,
nc->ext_ncid));
return NC_NOERR;
}
/**
* @internal Get info about a file on the list of libsrc4 open
* files. This is used by dispatch layers that wish to use the libsrc4
* metadata model, but don't know about struct NC.
*
* @param ncid The ncid of the file (aka ext_ncid).
2019-09-18 10:27:43 +08:00
* @param path A pointer that gets file name (< NC_MAX_NAME). Ignored
* if NULL.
* @param mode A pointer that gets the mode flag. Ignored if NULL.
* @param dispatchdata Void * that gets pointer to dispatch data,
* which is the NC_FILE_INFO_T struct allocated for this file and its
* metadata. Ignored if NULL. (This is passed as a void to allow
* external user-defined formats to use this function.)
*
* @return ::NC_NOERR No error.
* @return ::NC_EBADID No NC struct with this ext_ncid.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_file_list_get(int ncid, char **path, int *mode, void **dispatchdata)
{
NC *nc;
int ret;
/* Find NC pointer for this file. */
if ((ret = NC_check_id(ncid, &nc)))
return ret;
/* If the user wants path, give it. */
if (path)
strncpy(*path, nc->path, NC_MAX_NAME);
/* If the user wants mode, give it. */
if (mode)
*mode = nc->mode;
/* If the user wants dispatchdata, give it. */
if (dispatchdata)
*dispatchdata = nc->dispatchdata;
return NC_NOERR;
}
2017-12-03 22:57:21 +08:00
/**
* @internal Given an NC pointer, add the necessary stuff for a
* netcdf-4 file. This allocates the NC_FILE_INFO_T struct for the
* file, which is used by libhdf5 and libhdf4 (and perhaps other
* future dispatch layers) to hold the metadata for the file.
2017-12-05 03:21:14 +08:00
*
* @param nc Pointer to file's NC struct.
* @param path The file name of the new file.
* @param mode The mode flag.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett, Dennis Heimbigner
2017-12-03 22:57:21 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_nc4f_list_add(NC *nc, const char *path, int mode)
2010-06-03 21:24:43 +08:00
{
NC_FILE_INFO_T *h5;
int retval;
2010-06-03 21:24:43 +08:00
assert(nc && !NC4_DATA(nc) && path);
2010-06-03 21:24:43 +08:00
/* We need to malloc and initialize the substructure
NC_FILE_INFO_T. */
if (!(h5 = calloc(1, sizeof(NC_FILE_INFO_T))))
return NC_ENOMEM;
nc->dispatchdata = h5;
h5->controller = nc;
2010-06-03 21:24:43 +08:00
h5->hdr.sort = NCFIL;
h5->hdr.name = strdup(path);
h5->hdr.id = nc->ext_ncid;
/* Hang on to cmode, and note that we're in define mode. */
h5->cmode = mode | NC_INDEF;
2010-06-03 21:24:43 +08:00
/* The next_typeid needs to be set beyond the end of our atomic
* types. */
h5->next_typeid = NC_FIRSTUSERTYPEID;
2010-06-03 21:24:43 +08:00
/* Initialize lists for dimensions, types, and groups. */
h5->alldims = nclistnew();
h5->alltypes = nclistnew();
h5->allgroups = nclistnew();
/* There's always at least one open group - the root
* group. Allocate space for one group's worth of information. Set
* its grp id, name, and allocate associated empty lists. */
if ((retval = nc4_grp_list_add(h5, NULL, NC_GROUP_NAME, &h5->root_grp)))
return retval;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Given an ncid, find the relevant group and return a
2018-08-22 01:54:06 +08:00
* pointer to it.
2017-12-05 03:21:14 +08:00
*
* @param ncid File and group ID.
2018-08-22 01:23:12 +08:00
* @param grp Pointer that gets pointer to group info struct. Ignored
* if NULL.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
2017-12-05 03:21:14 +08:00
* @return ::NC_ENOTNC4 Not a netCDF-4 file.
2017-12-03 22:57:21 +08:00
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_find_nc4_grp(int ncid, NC_GRP_INFO_T **grp)
{
return nc4_find_nc_grp_h5(ncid, NULL, grp, NULL);
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Given an ncid, find the relevant group and return a
* pointer to it, also set a pointer to the nc4_info struct of the
2018-08-22 01:23:12 +08:00
* related file.
2017-12-05 03:21:14 +08:00
*
* @param ncid File and group ID.
2018-08-22 01:23:12 +08:00
* @param grp Pointer that gets pointer to group info struct. Ignored
* if NULL.
* @param h5 Pointer that gets pointer to file info struct. Ignored if
* NULL.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
2017-12-05 03:21:14 +08:00
* @return ::NC_EBADID Bad ncid.
2017-12-03 22:57:21 +08:00
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-08-22 01:23:12 +08:00
nc4_find_grp_h5(int ncid, NC_GRP_INFO_T **grp, NC_FILE_INFO_T **h5)
2010-06-03 21:24:43 +08:00
{
return nc4_find_nc_grp_h5(ncid, NULL, grp, h5);
2010-06-03 21:24:43 +08:00
}
/**
2018-08-22 01:23:12 +08:00
* @internal Find info for this file and group, and set pointers.
*
* @param ncid File and group ID.
2018-08-22 01:23:12 +08:00
* @param nc Pointer that gets a pointer to the file's NC
* struct. Ignored if NULL.
* @param grp Pointer that gets a pointer to the group
* struct. Ignored if NULL.
* @param h5 Pointer that gets HDF5 file struct. Ignored if NULL.
*
2017-12-03 22:57:21 +08:00
* @return ::NC_NOERR No error.
* @return ::NC_EBADID Bad ncid.
2018-08-22 01:23:12 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-08-22 01:23:12 +08:00
nc4_find_nc_grp_h5(int ncid, NC **nc, NC_GRP_INFO_T **grp, NC_FILE_INFO_T **h5)
2010-06-03 21:24:43 +08:00
{
NC_GRP_INFO_T *my_grp = NULL;
NC_FILE_INFO_T *my_h5 = NULL;
NC *my_nc;
int retval;
Add filter support to NCZarr Filter support has three goals: 1. Use the existing HDF5 filter implementations, 2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr, 3. Allow filters to be used even when HDF5 is disabled Detailed usage directions are define in docs/filters.md. For now, the existing filter API is left in place. So filters are defined using ''nc_def_var_filter'' using the HDF5 style where the id and parameters are unsigned integers. This is a big change since filters affect many parts of the code. In the following, the terms "compressor" and "filter" and "codec" are generally used synonomously. ### Filter-Related Changes: * In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms. * Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h. * Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out. * Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h. * Add a number of new test to test the new nczarr filters. * Let ncgen parse _Codecs attribute, although it is ignored. ### Plugin directory changes: * Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file * Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip * Add a Codec defaulter (see docs/filters.md) for the big four filters. * Make plugins work with windows by properly adding __declspec declaration. ### Misc. Non-Filter Changes * Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5. * Improve support for caching * More fixes for path conversion code * Fix misc. memory leaks * Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath. * Add a number of new test to test the non-filter fixes. * Update the parsers * Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-03 07:04:26 +08:00
size_t index;
/* Look up file metadata. */
if ((retval = NC_check_id(ncid, &my_nc)))
return retval;
my_h5 = my_nc->dispatchdata;
assert(my_h5 && my_h5->root_grp);
/* If we can't find it, the grp id part of ncid is bad. */
Add filter support to NCZarr Filter support has three goals: 1. Use the existing HDF5 filter implementations, 2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr, 3. Allow filters to be used even when HDF5 is disabled Detailed usage directions are define in docs/filters.md. For now, the existing filter API is left in place. So filters are defined using ''nc_def_var_filter'' using the HDF5 style where the id and parameters are unsigned integers. This is a big change since filters affect many parts of the code. In the following, the terms "compressor" and "filter" and "codec" are generally used synonomously. ### Filter-Related Changes: * In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms. * Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h. * Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out. * Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h. * Add a number of new test to test the new nczarr filters. * Let ncgen parse _Codecs attribute, although it is ignored. ### Plugin directory changes: * Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file * Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip * Add a Codec defaulter (see docs/filters.md) for the big four filters. * Make plugins work with windows by properly adding __declspec declaration. ### Misc. Non-Filter Changes * Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5. * Improve support for caching * More fixes for path conversion code * Fix misc. memory leaks * Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath. * Add a number of new test to test the non-filter fixes. * Update the parsers * Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-03 07:04:26 +08:00
index = (ncid & GRP_ID_MASK);
if (!(my_grp = nclistget(my_h5->allgroups,index)))
return NC_EBADID;
/* Return pointers to caller, if desired. */
if (nc)
*nc = my_nc;
if (h5)
*h5 = my_h5;
if (grp)
*grp = my_grp;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2018-08-09 20:47:45 +08:00
/**
* @internal Given an ncid and varid, get pointers to the group and var
* metadata.
*
* @param ncid File ID.
* @param varid Variable ID.
* @param h5 Pointer that gets pointer to the NC_FILE_INFO_T struct
* for this file. Ignored if NULL.
* @param grp Pointer that gets pointer to group info. Ignored if
* NULL.
* @param var Pointer that gets pointer to var info. Ignored if NULL.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
*/
int
nc4_find_grp_h5_var(int ncid, int varid, NC_FILE_INFO_T **h5, NC_GRP_INFO_T **grp,
NC_VAR_INFO_T **var)
{
NC_FILE_INFO_T *my_h5;
NC_GRP_INFO_T *my_grp;
NC_VAR_INFO_T *my_var;
int retval;
/* Look up file and group metadata. */
if ((retval = nc4_find_grp_h5(ncid, &my_grp, &my_h5)))
return retval;
assert(my_grp && my_h5);
/* Find the var. */
if (!(my_var = (NC_VAR_INFO_T *)ncindexith(my_grp->vars, varid)))
return NC_ENOTVAR;
assert(my_var && my_var->hdr.id == varid);
/* Return pointers that caller wants. */
if (h5)
*h5 = my_h5;
if (grp)
*grp = my_grp;
if (var)
*var = my_var;
return NC_NOERR;
2018-08-09 20:47:45 +08:00
}
/**
2018-12-11 21:25:33 +08:00
* @internal Find a dim in the file.
*
* @param grp Pointer to group info struct.
* @param dimid Dimension ID to find.
* @param dim Pointer that gets pointer to dim info if found.
2018-08-22 20:13:53 +08:00
* @param dim_grp Pointer that gets pointer to group info of group
* that contains dimension. Ignored if NULL.
2017-12-05 03:21:14 +08:00
*
2017-12-03 22:57:21 +08:00
* @return ::NC_NOERR No error.
* @return ::NC_EBADDIM Dimension not found.
2018-12-11 21:25:33 +08:00
* @author Ed Hartnett, Dennis Heimbigner
*/
2010-06-03 21:24:43 +08:00
int
nc4_find_dim(NC_GRP_INFO_T *grp, int dimid, NC_DIM_INFO_T **dim,
2017-12-05 03:21:14 +08:00
NC_GRP_INFO_T **dim_grp)
2010-06-03 21:24:43 +08:00
{
assert(grp && grp->nc4_info && dim);
LOG((4, "%s: dimid %d", __func__, dimid));
2010-06-03 21:24:43 +08:00
/* Find the dim info. */
if (!((*dim) = nclistget(grp->nc4_info->alldims, dimid)))
return NC_EBADDIM;
2010-06-03 21:24:43 +08:00
/* Give the caller the group the dimension is in. */
if (dim_grp)
*dim_grp = (*dim)->container;
2010-06-03 21:24:43 +08:00
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2017-12-05 03:21:14 +08:00
* @internal Find a var (by name) in a grp.
*
* @param grp Pointer to group info.
* @param name Name of var to find.
* @param var Pointer that gets pointer to var info struct, if found.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
int
nc4_find_var(NC_GRP_INFO_T *grp, const char *name, NC_VAR_INFO_T **var)
{
assert(grp && var && name);
/* Find the var info. */
*var = (NC_VAR_INFO_T*)ncindexlookup(grp->vars,name);
return NC_NOERR;
}
2017-12-03 22:57:21 +08:00
/**
* @internal Locate netCDF type by name.
2017-12-03 22:57:21 +08:00
*
2017-12-05 03:21:14 +08:00
* @param start_grp Pointer to starting group info.
* @param name Name of type to find.
*
* @return Pointer to type info, or NULL if not found.
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
NC_TYPE_INFO_T *
nc4_rec_find_named_type(NC_GRP_INFO_T *start_grp, char *name)
2010-06-03 21:24:43 +08:00
{
NC_GRP_INFO_T *g;
NC_TYPE_INFO_T *type, *res;
int i;
assert(start_grp);
/* Does this group have the type we are searching for? */
type = (NC_TYPE_INFO_T*)ncindexlookup(start_grp->type,name);
if(type != NULL)
return type;
/* Search subgroups. */
for(i=0;i<ncindexsize(start_grp->children);i++) {
g = (NC_GRP_INFO_T*)ncindexith(start_grp->children,i);
if(g == NULL) continue;
if ((res = nc4_rec_find_named_type(g, name)))
return res;
}
/* Can't find it. Oh, woe is me! */
return NULL;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Use a netCDF typeid to find a type in a type_list.
*
* @param h5 Pointer to HDF5 file info struct.
* @param typeid The netCDF type ID.
* @param type Pointer to pointer to the list of type info structs.
*
* @return ::NC_NOERR No error.
* @return ::NC_EINVAL Invalid input.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_find_type(const NC_FILE_INFO_T *h5, nc_type typeid, NC_TYPE_INFO_T **type)
2010-06-03 21:24:43 +08:00
{
/* Check inputs. */
assert(h5);
if (typeid < 0 || !type)
return NC_EINVAL;
*type = NULL;
/* Atomic types don't have associated NC_TYPE_INFO_T struct, just
* return NOERR. */
if (typeid <= NC_STRING)
return NC_NOERR;
/* Find the type. */
if (!(*type = nclistget(h5->alltypes,typeid)))
return NC_EBADTYPID;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Given a group, find an att. If name is provided, use that,
* otherwise use the attnum.
2017-12-03 22:57:21 +08:00
*
* @param grp Pointer to group info struct.
* @param varid Variable ID.
* @param name Name to of attribute.
* @param attnum Number of attribute.
* @param att Pointer to pointer that gets attribute info struct.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOTVAR Variable not found.
* @return ::NC_ENOTATT Attribute not found.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_find_grp_att(NC_GRP_INFO_T *grp, int varid, const char *name, int attnum,
2017-12-05 03:21:14 +08:00
NC_ATT_INFO_T **att)
2010-06-03 21:24:43 +08:00
{
NC_VAR_INFO_T *var;
NC_ATT_INFO_T *my_att;
NCindex *attlist = NULL;
assert(grp && grp->hdr.name && att);
LOG((4, "%s: grp->name %s varid %d attnum %d", __func__, grp->hdr.name,
varid, attnum));
/* Get either the global or a variable attribute list. */
if (varid == NC_GLOBAL)
{
attlist = grp->att;
}
else
{
var = (NC_VAR_INFO_T*)ncindexith(grp->vars,varid);
if (!var) return NC_ENOTVAR;
attlist = var->att;
}
assert(attlist);
/* Now find the attribute by name or number. If a name is provided,
* ignore the attnum. */
if (name)
my_att = (NC_ATT_INFO_T *)ncindexlookup(attlist, name);
else
my_att = (NC_ATT_INFO_T *)ncindexith(attlist, attnum);
if (!my_att)
return NC_ENOTATT;
*att = my_att;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Given an ncid, varid, and name or attnum, find and return
* pointer to NC_ATT_INFO_T metadata.
*
* @param ncid File and group ID.
* @param varid Variable ID.
* @param name Name to of attribute.
* @param attnum Number of attribute.
* @param att Pointer to pointer that gets attribute info struct.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOTVAR Variable not found.
* @return ::NC_ENOTATT Attribute not found.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_find_nc_att(int ncid, int varid, const char *name, int attnum,
2017-12-05 03:21:14 +08:00
NC_ATT_INFO_T **att)
2010-06-03 21:24:43 +08:00
{
NC_GRP_INFO_T *grp;
int retval;
2010-06-03 21:24:43 +08:00
LOG((4, "nc4_find_nc_att: ncid 0x%x varid %d name %s attnum %d",
ncid, varid, name, attnum));
2010-06-03 21:24:43 +08:00
/* Find info for this file and group, and set pointer to each. */
if ((retval = nc4_find_grp_h5(ncid, &grp, NULL)))
return retval;
assert(grp);
2010-06-03 21:24:43 +08:00
return nc4_find_grp_att(grp, varid, name, attnum, att);
2010-06-03 21:24:43 +08:00
}
/**
* @internal Add NC_OBJ to allXXX lists in a file
*
* @param file Pointer to the containing file
* @param obj Pointer to object to add.
*
* @author Dennis Heimbigner
*/
static void
obj_track(NC_FILE_INFO_T* file, NC_OBJ* obj)
{
NClist* list = NULL;
/* record the object in the file */
switch (obj->sort) {
case NCDIM: list = file->alldims; break;
case NCTYP: list = file->alltypes; break;
case NCGRP: list = file->allgroups; break;
default:
assert(NC_FALSE);
}
/* Insert at the appropriate point in the list */
nclistset(list,obj->id,obj);
}
2010-06-03 21:24:43 +08:00
2017-12-03 22:57:21 +08:00
/**
2018-07-29 18:57:25 +08:00
* @internal Create a new variable and insert into relevant
* lists. Dimensionality info need not be known.
*
* @param grp the containing group
* @param name the name for the new variable
* @param var Pointer in which to return a pointer to the new var.
2017-12-03 22:57:21 +08:00
*
* @param var Pointer to pointer that gets variable info struct.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-07-29 18:57:25 +08:00
nc4_var_list_add2(NC_GRP_INFO_T *grp, const char *name, NC_VAR_INFO_T **var)
2010-06-03 21:24:43 +08:00
{
NC_VAR_INFO_T *new_var = NULL;
/* Allocate storage for new variable. */
if (!(new_var = calloc(1, sizeof(NC_VAR_INFO_T))))
return NC_ENOMEM;
new_var->hdr.sort = NCVAR;
new_var->container = grp;
/* These are the HDF5-1.8.4 defaults. */
new_var->chunk_cache_size = nc4_chunk_cache_size;
new_var->chunk_cache_nelems = nc4_chunk_cache_nelems;
new_var->chunk_cache_preemption = nc4_chunk_cache_preemption;
/* Now fill in the values in the var info structure. */
new_var->hdr.id = ncindexsize(grp->vars);
if (!(new_var->hdr.name = strdup(name))) {
if(new_var)
free(new_var);
return NC_ENOMEM;
}
/* Create an indexed list for the attributes. */
new_var->att = ncindexnew(0);
/* Officially track it */
ncindexadd(grp->vars, (NC_OBJ *)new_var);
/* Set the var pointer, if one was given */
if (var)
*var = new_var;
return NC_NOERR;
2018-07-29 18:57:25 +08:00
}
/**
* @internal Set the number of dims in an NC_VAR_INFO_T struct.
*
* @param var Pointer to the var.
* @param ndims Number of dimensions for this var.
*
* @param var Pointer to pointer that gets variable info struct.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_var_set_ndims(NC_VAR_INFO_T *var, int ndims)
{
assert(var);
2018-07-29 18:57:25 +08:00
/* Remember the number of dimensions. */
var->ndims = ndims;
2018-07-29 18:57:25 +08:00
/* Allocate space for dimension information. */
if (ndims)
{
if (!(var->dim = calloc(ndims, sizeof(NC_DIM_INFO_T *))))
return NC_ENOMEM;
if (!(var->dimids = calloc(ndims, sizeof(int))))
return NC_ENOMEM;
2018-07-29 18:57:25 +08:00
/* Initialize dimids to illegal values (-1). See the comment
in nc4_rec_match_dimscales(). */
memset(var->dimids, -1, ndims * sizeof(int));
}
return NC_NOERR;
2018-07-29 18:57:25 +08:00
}
/**
* @internal Create a new variable and insert int relevant list.
*
* @param grp the containing group
* @param name the name for the new variable
* @param ndims the rank of the new variable
* @param var Pointer in which to return a pointer to the new var.
*
* @param var Pointer to pointer that gets variable info struct.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_var_list_add(NC_GRP_INFO_T* grp, const char* name, int ndims,
NC_VAR_INFO_T **var)
{
int retval;
2018-07-29 18:57:25 +08:00
if ((retval = nc4_var_list_add2(grp, name, var)))
return retval;
if ((retval = nc4_var_set_ndims(*var, ndims)))
return retval;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2018-07-29 18:57:25 +08:00
* @internal Add a dimension to the dimension list for a group.
2017-12-03 22:57:21 +08:00
*
* @param grp container for the dim
* @param name for the dim
* @param len for the dim
* @param assignedid override dimid if >= 0
2017-12-03 22:57:21 +08:00
* @param dim Pointer to pointer that gets the new dim info struct.
*
* @return ::NC_NOERR No error.
2018-07-29 18:57:25 +08:00
* @return ::NC_ENOMEM Out of memory.
2017-12-03 22:57:21 +08:00
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-07-29 18:57:25 +08:00
nc4_dim_list_add(NC_GRP_INFO_T *grp, const char *name, size_t len,
int assignedid, NC_DIM_INFO_T **dim)
2010-06-03 21:24:43 +08:00
{
NC_DIM_INFO_T *new_dim = NULL;
2018-07-29 18:57:25 +08:00
assert(grp && name);
/* Allocate memory for dim metadata. */
if (!(new_dim = calloc(1, sizeof(NC_DIM_INFO_T))))
return NC_ENOMEM;
2018-07-29 18:57:25 +08:00
new_dim->hdr.sort = NCDIM;
2010-06-03 21:24:43 +08:00
/* Assign the dimension ID. */
if (assignedid >= 0)
new_dim->hdr.id = assignedid;
else
new_dim->hdr.id = grp->nc4_info->next_dimid++;
/* Remember the name and create a hash. */
if (!(new_dim->hdr.name = strdup(name))) {
if(new_dim)
free(new_dim);
return NC_ENOMEM;
}
2018-07-29 18:57:25 +08:00
/* Is dimension unlimited? */
new_dim->len = len;
if (len == NC_UNLIMITED)
new_dim->unlimited = NC_TRUE;
2018-07-29 18:57:25 +08:00
/* Remember the containing group. */
new_dim->container = grp;
/* Add object to dimension list for this group. */
ncindexadd(grp->dim, (NC_OBJ *)new_dim);
obj_track(grp->nc4_info, (NC_OBJ *)new_dim);
/* Set the dim pointer, if one was given */
if (dim)
*dim = new_dim;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2018-08-22 21:03:37 +08:00
* @internal Add to an attribute list.
2017-12-03 22:57:21 +08:00
*
* @param list NCindex of att info structs.
* @param name name of the new attribute
2018-08-22 21:03:37 +08:00
* @param att Pointer to pointer that gets the new att info
* struct. Ignored if NULL.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
2017-12-03 22:57:21 +08:00
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_att_list_add(NCindex *list, const char *name, NC_ATT_INFO_T **att)
2010-06-03 21:24:43 +08:00
{
NC_ATT_INFO_T *new_att = NULL;
LOG((3, "%s: name %s ", __func__, name));
if (!(new_att = calloc(1, sizeof(NC_ATT_INFO_T))))
return NC_ENOMEM;
new_att->hdr.sort = NCATT;
/* Fill in the information we know. */
new_att->hdr.id = ncindexsize(list);
if (!(new_att->hdr.name = strdup(name))) {
if(new_att)
free(new_att);
return NC_ENOMEM;
}
/* Add object to list as specified by its number */
ncindexadd(list, (NC_OBJ *)new_att);
/* Set the attribute pointer, if one was given */
if (att)
*att = new_att;
2010-06-03 21:24:43 +08:00
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Add a group to a group list.
2017-12-03 22:57:21 +08:00
*
* @param h5 Pointer to the file info.
* @param parent Pointer to the parent group. Will be NULL when adding
* the root group.
2017-12-03 22:57:21 +08:00
* @param name Name of the group.
2018-08-22 21:03:37 +08:00
* @param grp Pointer to pointer that gets new group info
* struct. Ignored if NULL.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_grp_list_add(NC_FILE_INFO_T *h5, NC_GRP_INFO_T *parent, char *name,
NC_GRP_INFO_T **grp)
2010-06-03 21:24:43 +08:00
{
NC_GRP_INFO_T *new_grp;
/* Check inputs. */
assert(h5 && name);
LOG((3, "%s: name %s ", __func__, name));
/* Get the memory to store this groups info. */
if (!(new_grp = calloc(1, sizeof(NC_GRP_INFO_T))))
return NC_ENOMEM;
/* Fill in this group's information. */
new_grp->hdr.sort = NCGRP;
new_grp->nc4_info = h5;
new_grp->parent = parent;
/* Assign the group ID. The root group will get id 0. */
new_grp->hdr.id = h5->next_nc_grpid++;
assert(parent || !new_grp->hdr.id);
/* Handle the group name. */
if (!(new_grp->hdr.name = strdup(name)))
{
free(new_grp);
return NC_ENOMEM;
}
/* Set up new indexed lists for stuff this group can contain. */
new_grp->children = ncindexnew(0);
new_grp->dim = ncindexnew(0);
new_grp->att = ncindexnew(0);
new_grp->type = ncindexnew(0);
new_grp->vars = ncindexnew(0);
/* Add object to lists */
if (parent)
ncindexadd(parent->children, (NC_OBJ *)new_grp);
obj_track(h5, (NC_OBJ *)new_grp);
/* Set the group pointer, if one was given */
if (grp)
*grp = new_grp;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Names for groups, variables, and types must not be the
* same. This function checks that a proposed name is not already in
2010-06-03 21:24:43 +08:00
* use. Normalzation of UTF8 strings should happen before this
2017-12-03 22:57:21 +08:00
* function is called.
*
* @param grp Pointer to group info struct.
* @param name Name to check.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENAMEINUSE Name is in use.
2018-08-22 21:03:37 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_check_dup_name(NC_GRP_INFO_T *grp, char *name)
{
NC_TYPE_INFO_T *type;
NC_GRP_INFO_T *g;
NC_VAR_INFO_T *var;
/* Any types of this name? */
type = (NC_TYPE_INFO_T*)ncindexlookup(grp->type,name);
if(type != NULL)
return NC_ENAMEINUSE;
/* Any child groups of this name? */
g = (NC_GRP_INFO_T*)ncindexlookup(grp->children,name);
if(g != NULL)
return NC_ENAMEINUSE;
/* Any variables of this name? */
var = (NC_VAR_INFO_T*)ncindexlookup(grp->vars,name);
if(var != NULL)
return NC_ENAMEINUSE;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Create a type, but do not add to various lists nor
* increment its ref count
2017-12-03 22:57:21 +08:00
*
* @param size Size of type in bytes.
* @param name Name of type.
* @param assignedid if >= 0 then override the default type id.
* @param type Pointer that gets pointer to new type info struct.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett, Ward Fisher
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-11-17 01:07:54 +08:00
nc4_type_new(size_t size, const char *name, int assignedid,
NC_TYPE_INFO_T **type)
2010-06-03 21:24:43 +08:00
{
NC_TYPE_INFO_T *new_type;
LOG((4, "%s: size %d name %s assignedid %d", __func__, size, name, assignedid));
2018-11-17 01:07:54 +08:00
/* Check inputs. */
assert(type);
2010-06-03 21:24:43 +08:00
/* Allocate memory for the type */
if (!(new_type = calloc(1, sizeof(NC_TYPE_INFO_T))))
return NC_ENOMEM;
new_type->hdr.sort = NCTYP;
new_type->hdr.id = assignedid;
2010-06-03 21:24:43 +08:00
/* Remember info about this type. */
new_type->size = size;
if (!(new_type->hdr.name = strdup(name))) {
free(new_type);
return NC_ENOMEM;
}
/* Return a pointer to the new type. */
*type = new_type;
return NC_NOERR;
}
/**
2018-08-22 21:03:37 +08:00
* @internal Add to the type list.
*
* @param grp Pointer to group info struct.
* @param size Size of type in bytes.
* @param name Name of type.
* @param type Pointer that gets pointer to new type info
* struct.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
*/
int
nc4_type_list_add(NC_GRP_INFO_T *grp, size_t size, const char *name,
NC_TYPE_INFO_T **type)
{
NC_TYPE_INFO_T *new_type;
int retval;
/* Check inputs. */
assert(grp && name && type);
LOG((4, "%s: size %d name %s", __func__, size, name));
/* Create the new TYPE_INFO struct. */
if ((retval = nc4_type_new(size, name, grp->nc4_info->next_typeid,
&new_type)))
return retval;
grp->nc4_info->next_typeid++;
/* Increment the ref. count on the type */
new_type->rc++;
/* Add object to lists */
ncindexadd(grp->type, (NC_OBJ *)new_type);
obj_track(grp->nc4_info,(NC_OBJ*)new_type);
/* Return a pointer to the new type. */
*type = new_type;
2010-06-03 21:24:43 +08:00
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2018-08-22 21:03:37 +08:00
* @internal Add to the compound field list.
2017-12-03 22:57:21 +08:00
*
* @param parent parent type
2017-12-03 22:57:21 +08:00
* @param name Name of the field.
* @param offset Offset in bytes.
* @param xtype The netCDF type of the field.
* @param ndims The number of dimensions of the field.
* @param dim_sizesp An array of dim sizes for the field.
2017-12-05 03:21:14 +08:00
*
2017-12-03 22:57:21 +08:00
* @return ::NC_NOERR No error.
2018-08-22 21:03:37 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_field_list_add(NC_TYPE_INFO_T *parent, const char *name,
2018-11-16 23:26:09 +08:00
size_t offset, nc_type xtype, int ndims,
const int *dim_sizesp)
2010-06-03 21:24:43 +08:00
{
NC_FIELD_INFO_T *field;
/* Name has already been checked and UTF8 normalized. */
if (!name)
return NC_EINVAL;
/* Allocate storage for this field information. */
if (!(field = calloc(1, sizeof(NC_FIELD_INFO_T))))
return NC_ENOMEM;
field->hdr.sort = NCFLD;
/* Store the information about this field. */
if (!(field->hdr.name = strdup(name)))
{
free(field);
return NC_ENOMEM;
}
field->nc_typeid = xtype;
field->offset = offset;
field->ndims = ndims;
if (ndims)
{
int i;
if (!(field->dim_size = malloc(ndims * sizeof(int))))
{
free(field->hdr.name);
free(field);
return NC_ENOMEM;
}
for (i = 0; i < ndims; i++)
field->dim_size[i] = dim_sizesp[i];
}
/* Add object to lists */
field->hdr.id = nclistlength(parent->u.c.field);
nclistpush(parent->u.c.field,field);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2017-12-05 03:21:14 +08:00
* @internal Add a member to an enum type.
2017-12-03 22:57:21 +08:00
*
* @param parent Containing NC_TYPE_INFO_T object
2017-12-03 22:57:21 +08:00
* @param size Size in bytes of new member.
* @param name Name of the member.
* @param value Value to associate with member.
*
* @return ::NC_NOERR No error.
* @return ::NC_ENOMEM Out of memory.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_enum_member_add(NC_TYPE_INFO_T *parent, size_t size,
2017-12-05 03:21:14 +08:00
const char *name, const void *value)
2010-06-03 21:24:43 +08:00
{
NC_ENUM_MEMBER_INFO_T *member;
/* Name has already been checked. */
assert(name && size > 0 && value);
LOG((4, "%s: size %d name %s", __func__, size, name));
/* Allocate storage for this field information. */
if (!(member = calloc(1, sizeof(NC_ENUM_MEMBER_INFO_T))))
return NC_ENOMEM;
if (!(member->value = malloc(size))) {
free(member);
return NC_ENOMEM;
}
if (!(member->name = strdup(name))) {
free(member->value);
free(member);
return NC_ENOMEM;
}
/* Store the value for this member. */
memcpy(member->value, value, size);
/* Add object to list */
nclistpush(parent->u.e.enum_member,member);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Free up a field
2017-12-03 22:57:21 +08:00
*
* @param field Pointer to field info of field to delete.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
static void
field_free(NC_FIELD_INFO_T *field)
{
/* Free some stuff. */
if (field->hdr.name)
free(field->hdr.name);
if (field->dim_size)
free(field->dim_size);
/* Nc_Free the memory. */
free(field);
}
2017-12-03 22:57:21 +08:00
/**
2017-12-05 03:21:14 +08:00
* @internal Free allocated space for type information.
2017-12-03 22:57:21 +08:00
*
* @param type Pointer to type info struct.
*
* @return ::NC_NOERR No error.
2018-08-22 21:03:37 +08:00
* @return ::NC_EHDFERR HDF5 error.
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
int
nc4_type_free(NC_TYPE_INFO_T *type)
{
int i;
assert(type && type->rc && type->hdr.name);
/* Decrement the ref. count on the type */
type->rc--;
/* Release the type, if the ref. count drops to zero */
if (type->rc == 0)
{
LOG((4, "%s: deleting type %s", __func__, type->hdr.name));
/* Free the name. */
free(type->hdr.name);
/* Enums and compound types have lists of fields to clean up. */
switch (type->nc_type_class)
{
case NC_COMPOUND:
{
NC_FIELD_INFO_T *field;
/* Delete all the fields in this type (there will be some if its a
* compound). */
for(i=0;i<nclistlength(type->u.c.field);i++) {
field = nclistget(type->u.c.field,i);
field_free(field);
}
nclistfree(type->u.c.field);
}
break;
case NC_ENUM:
{
NC_ENUM_MEMBER_INFO_T *enum_member;
/* Delete all the enum_members, if any. */
for(i=0;i<nclistlength(type->u.e.enum_member);i++) {
enum_member = nclistget(type->u.e.enum_member,i);
free(enum_member->value);
free(enum_member->name);
free(enum_member);
}
nclistfree(type->u.e.enum_member);
}
break;
default:
break;
}
/* Release the memory. */
free(type);
}
return NC_NOERR;
}
2018-10-23 19:39:00 +08:00
/**
* @internal Free memory of an attribute object
*
* @param att Pointer to attribute info struct.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
*/
int
nc4_att_free(NC_ATT_INFO_T *att)
2018-10-23 19:39:00 +08:00
{
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
int stat = NC_NOERR;
assert(att);
LOG((3, "%s: name %s ", __func__, att->hdr.name));
/* Free the name. */
if (att->hdr.name)
free(att->hdr.name);
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
#ifdef SEPDATA
/* Free memory that was malloced to hold data for this
* attribute. */
if (att->data) {
free(att->data);
}
/* If this is a string array attribute, delete all members of the
* string array, then delete the array of pointers to strings. (The
* array was filled with pointers by HDF5 when the att was read,
* and memory for each string was allocated by HDF5. That's why I
* use free and not nc_free, because the netCDF library didn't
* allocate the memory that is being freed.) */
if (att->stdata)
{
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
int i;
for (i = 0; i < att->len; i++)
if(att->stdata[i])
free(att->stdata[i]);
free(att->stdata);
}
/* If this att has vlen data, release it. */
if (att->vldata)
{
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
int i;
for (i = 0; i < att->len; i++)
nc_free_vlen(&att->vldata[i]);
free(att->vldata);
}
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
#else
if (att->data) {
NC_OBJ* parent;
NC_FILE_INFO_T* h5 = NULL;
/* Locate relevant objects */
parent = att->container;
if(parent->sort == NCVAR) parent = (NC_OBJ*)(((NC_VAR_INFO_T*)parent)->container);
assert(parent->sort == NCGRP);
h5 = ((NC_GRP_INFO_T*)parent)->nc4_info;
/* Reclaim the attribute data */
if((stat = nc_reclaim_data(h5->controller->ext_ncid,att->nc_typeid,att->data,att->len))) goto done;
free(att->data); /* reclaim top level */
att->data = NULL;
}
#endif
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
done:
free(att);
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
return stat;
2018-10-23 19:39:00 +08:00
}
2017-12-03 22:57:21 +08:00
/**
2018-10-23 19:07:52 +08:00
* @internal Delete a var, and free the memory. All HDF5 objects for
* the var must be closed before this is called.
2017-12-03 22:57:21 +08:00
*
* @param var Pointer to the var info struct of var to delete.
*
* @return ::NC_NOERR No error.
2018-08-22 21:03:37 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2018-10-23 19:39:00 +08:00
static int
var_free(NC_VAR_INFO_T *var)
2010-06-03 21:24:43 +08:00
{
int i;
int retval;
2010-06-03 21:24:43 +08:00
assert(var);
LOG((4, "%s: deleting var %s", __func__, var->hdr.name));
/* First delete all the attributes attached to this var. */
for (i = 0; i < ncindexsize(var->att); i++)
if ((retval = nc4_att_free((NC_ATT_INFO_T *)ncindexith(var->att, i))))
return retval;
ncindexfree(var->att);
2010-06-03 21:24:43 +08:00
/* Free some things that may be allocated. */
if (var->chunksizes)
free(var->chunksizes);
if (var->alt_name)
free(var->alt_name);
if (var->dimids)
free(var->dimids);
if (var->dim)
free(var->dim);
/* Delete any fill value allocation. */
if (var->fill_value) {
int ncid = var->container->nc4_info->controller->ext_ncid;
int tid = var->type_info->hdr.id;
if((retval = nc_reclaim_data_all(ncid, tid, var->fill_value, 1))) return retval;
var->fill_value = NULL;
}
2010-06-03 21:24:43 +08:00
/* Release type information */
if (var->type_info)
if ((retval = nc4_type_free(var->type_info)))
return retval;
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
/* Do this last because debugging may need it */
if (var->hdr.name)
free(var->hdr.name);
2017-11-15 00:15:13 +08:00
/* Delete the var. */
free(var);
2010-06-03 21:24:43 +08:00
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Delete a var, and free the memory.
2017-12-03 22:57:21 +08:00
*
* @param grp Pointer to the strct for the containing group.
* @param var Pointer to the var info struct of var to delete.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
2018-08-22 21:03:37 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
int
nc4_var_list_del(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var)
2010-06-03 21:24:43 +08:00
{
int i;
assert(var && grp);
/* Remove from lists */
i = ncindexfind(grp->vars, (NC_OBJ *)var);
if (i >= 0)
ncindexidel(grp->vars, i);
return var_free(var);
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Free a dim
2017-12-03 22:57:21 +08:00
*
* @param dim Pointer to dim info struct of type to delete.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett, Ward Fisher
2017-12-05 03:21:14 +08:00
*/
2018-10-23 19:39:00 +08:00
static int
dim_free(NC_DIM_INFO_T *dim)
2010-06-03 21:24:43 +08:00
{
assert(dim);
LOG((4, "%s: deleting dim %s", __func__, dim->hdr.name));
2018-10-23 19:39:00 +08:00
/* Free memory allocated for names. */
if (dim->hdr.name)
free(dim->hdr.name);
free(dim);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Free a dim and unlist it
2017-12-03 22:57:21 +08:00
*
* @param grp Pointer to dim's containing group
* @param dim Pointer to dim info struct of type to delete.
2017-12-03 22:57:21 +08:00
*
* @return ::NC_NOERR No error.
* @author Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
int
2019-07-17 06:02:08 +08:00
nc4_dim_list_del(NC_GRP_INFO_T *grp, NC_DIM_INFO_T *dim)
2010-06-03 21:24:43 +08:00
{
2019-07-17 06:02:08 +08:00
if (grp && dim)
{
int pos = ncindexfind(grp->dim, (NC_OBJ *)dim);
if(pos >= 0)
2019-07-17 06:02:08 +08:00
ncindexidel(grp->dim, pos);
}
2019-07-17 06:02:08 +08:00
return dim_free(dim);
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Recursively delete the data for a group (and everything
* it contains) in our internal metadata store.
*
* @param grp Pointer to group info struct.
*
* @return ::NC_NOERR No error.
2018-08-22 21:03:37 +08:00
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
int
nc4_rec_grp_del(NC_GRP_INFO_T *grp)
2010-06-03 21:24:43 +08:00
{
int i;
int retval;
assert(grp);
LOG((3, "%s: grp->name %s", __func__, grp->hdr.name));
/* Recursively call this function for each child, if any, stopping
* if there is an error. */
for (i = 0; i < ncindexsize(grp->children); i++)
if ((retval = nc4_rec_grp_del((NC_GRP_INFO_T *)ncindexith(grp->children,
i))))
return retval;
ncindexfree(grp->children);
/* Free attributes */
for (i = 0; i < ncindexsize(grp->att); i++)
if ((retval = nc4_att_free((NC_ATT_INFO_T *)ncindexith(grp->att, i))))
return retval;
ncindexfree(grp->att);
/* Delete all vars. */
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
for (i = 0; i < ncindexsize(grp->vars); i++) {
NC_VAR_INFO_T* v = (NC_VAR_INFO_T *)ncindexith(grp->vars, i);
if ((retval = var_free(v)))
return retval;
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
}
ncindexfree(grp->vars);
/* Delete all dims, and free the list of dims. */
for (i = 0; i < ncindexsize(grp->dim); i++)
if ((retval = dim_free((NC_DIM_INFO_T *)ncindexith(grp->dim, i))))
return retval;
ncindexfree(grp->dim);
/* Delete all types. */
for (i = 0; i < ncindexsize(grp->type); i++)
if ((retval = nc4_type_free((NC_TYPE_INFO_T *)ncindexith(grp->type, i))))
return retval;
ncindexfree(grp->type);
/* Free the name. */
free(grp->hdr.name);
/* Free up this group */
free(grp);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
/**
* @internal Recursively delete the data for a group (and everything
* it contains) in our internal metadata store.
*
* @param grp Pointer to group info struct.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett, Dennis Heimbigner
*/
int
nc4_rec_grp_del_att_data(NC_GRP_INFO_T *grp)
{
int i;
int retval;
assert(grp);
LOG((3, "%s: grp->name %s", __func__, grp->hdr.name));
/* Recursively call this function for each child, if any, stopping
* if there is an error. */
for (i = 0; i < ncindexsize(grp->children); i++)
if ((retval = nc4_rec_grp_del_att_data((NC_GRP_INFO_T *)ncindexith(grp->children, i))))
return retval;
/* Free attribute data in this group */
for (i = 0; i < ncindexsize(grp->att); i++) {
NC_ATT_INFO_T * att = (NC_ATT_INFO_T*)ncindexith(grp->att, i);
if((retval = nc_reclaim_data_all(grp->nc4_info->controller->ext_ncid,att->nc_typeid,att->data,att->len)))
return retval;
att->data = NULL;
att->len = 0;
att->dirty = 0;
}
/* Delete att data from all contained vars in this group */
for (i = 0; i < ncindexsize(grp->vars); i++) {
int j;
NC_VAR_INFO_T* v = (NC_VAR_INFO_T *)ncindexith(grp->vars, i);
for(j=0;j<ncindexsize(v->att);j++) {
NC_ATT_INFO_T* att = (NC_ATT_INFO_T*)ncindexith(v->att, j);
if((retval = nc_reclaim_data_all(grp->nc4_info->controller->ext_ncid,att->nc_typeid,att->data,att->len)))
return retval;
att->data = NULL;
att->len = 0;
att->dirty = 0;
}
}
return NC_NOERR;
}
2017-12-03 22:57:21 +08:00
/**
* @internal Remove a NC_ATT_INFO_T from an index.
* This will nc_free the memory too.
2017-12-03 22:57:21 +08:00
*
* @param list Pointer to pointer of list.
* @param att Pointer to attribute info struct.
*
* @return ::NC_NOERR No error.
* @author Dennis Heimbigner
*/
int
nc4_att_list_del(NCindex *list, NC_ATT_INFO_T *att)
{
assert(att && list);
ncindexidel(list, ((NC_OBJ *)att)->id);
return nc4_att_free(att);
2010-06-03 21:24:43 +08:00
}
/**
* @internal Free all resources and memory associated with a
* NC_FILE_INFO_T. This is the same as nc4_nc4f_list_del(), except it
* takes ncid. This function allows external dispatch layers, like
* PIO, to manipulate the file list without needing to know about
* internal netcdf structures.
*
* @param ncid The ncid of the file to release.
*
* @return ::NC_NOERR No error.
* @return ::NC_EBADID Bad ncid.
* @author Ed Hartnett
*/
int
nc4_file_list_del(int ncid)
{
NC_FILE_INFO_T *h5;
int retval;
/* Find our metadata for this file. */
if ((retval = nc4_find_grp_h5(ncid, NULL, &h5)))
return retval;
assert(h5);
/* Delete the file resources. */
if ((retval = nc4_nc4f_list_del(h5)))
return retval;
return NC_NOERR;
}
2019-07-17 06:02:08 +08:00
/**
* @internal Free all resources and memory associated with a
* NC_FILE_INFO_T.
*
* @param h5 Pointer to NC_FILE_INFO_T to be freed.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
*/
int
nc4_nc4f_list_del(NC_FILE_INFO_T *h5)
{
int retval;
2019-07-17 06:02:08 +08:00
assert(h5);
2019-07-17 06:07:21 +08:00
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00
/* Order is important here. We must delete the attribute contents
before deleteing any metadata because nc_reclaim_data depends
on the existence of the type info.
*/
/* Delete all the attribute data contents in each group and variable. */
if ((retval = nc4_rec_grp_del_att_data(h5->root_grp)))
return retval;
/* Delete all the list contents for vars, dims, and atts, in each
* group. */
if ((retval = nc4_rec_grp_del(h5->root_grp)))
return retval;
/* Cleanup these (extra) lists of all dims, groups, and types. */
nclistfree(h5->alldims);
nclistfree(h5->allgroups);
nclistfree(h5->alltypes);
/* Free the NC_FILE_INFO_T struct. */
nullfree(h5->hdr.name);
2019-07-17 06:07:21 +08:00
free(h5);
2019-07-17 06:02:08 +08:00
return NC_NOERR;
}
2017-12-03 22:57:21 +08:00
/**
* @internal Normalize a UTF8 name. Put the result in norm_name, which
* can be NC_MAX_NAME + 1 in size. This function makes sure the free()
* gets called on the return from utf8proc_NFC, and also ensures that
* the name is not too long.
*
* @param name Name to normalize.
* @param norm_name The normalized name.
*
* @return ::NC_NOERR No error.
* @return ::NC_EMAXNAME Name too long.
* @author Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
nc4_normalize_name(const char *name, char *norm_name)
{
char *temp_name;
int stat = nc_utf8_normalize((const unsigned char *)name,(unsigned char **)&temp_name);
if(stat != NC_NOERR)
return stat;
if (strlen(temp_name) > NC_MAX_NAME)
{
free(temp_name);
return NC_EMAXNAME;
}
strcpy(norm_name, temp_name);
free(temp_name);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
#ifdef ENABLE_SET_LOG_LEVEL
/**
* Initialize parallel I/O logging. For parallel I/O builds, open log
* file, if not opened yet, or increment ref count if already open.
*
* @author Ed Hartnett
*/
int
nc4_init_logging(void)
{
int ret = NC_NOERR;
#if LOGGING
#if HAS_PARALLEL4
if (!LOG_FILE && nc_log_level >= 0)
{
char log_filename[NC4_MAX_NAME];
int mpierr;
/* Create a filename with the rank in it. */
if ((mpierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)))
return NC_EMPI;
sprintf(log_filename, "nc4_log_%d.log", my_rank);
/* Open a file for this rank to log messages. */
if (!(LOG_FILE = fopen(log_filename, "w")))
return NC_EINTERNAL;
nc4_log_ref_cnt = 1;
}
else if(LOG_FILE)
{
nc4_log_ref_cnt++;
}
#endif /* HAS_PARALLEL4 */
#endif /* LOGGING */
return ret;
}
/**
* Finalize logging - close parallel I/O log files, if open. This only
* does anything on parallel I/O builds.
*
* @author Ed Hartnett
*/
void
nc4_finalize_logging(void)
{
#if LOGGING
#if HAS_PARALLEL4
nc4_log_ref_cnt -= 1;
if (LOG_FILE)
{
if (nc4_log_ref_cnt == 0)
{
fclose(LOG_FILE);
LOG_FILE = NULL;
}
else
LOG((2, "nc4_finalize_logging, postpone close, ref_cnt = %d",
nc4_log_ref_cnt));
}
#endif /* HAS_PARALLEL4 */
#endif /* LOGGING */
}
2017-12-03 22:57:21 +08:00
/**
* @internal Use this to set the global log level. Set it to
* NC_TURN_OFF_LOGGING (-1) to turn off all logging. Set it to 0 to
* show only errors, and to higher numbers to show more and more
* logging details. If logging is not enabled with --enable-logging at
* configure when building netCDF, this function will do nothing.
* Note that it is possible to set the log level using the environment
* variable named _NETCDF_LOG_LEVEL_ (e.g. _export NETCDF_LOG_LEVEL=4_).
2017-12-03 22:57:21 +08:00
*
* @param new_level The new logging level.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
int
2010-06-03 21:24:43 +08:00
nc_set_log_level(int new_level)
{
#ifdef LOGGING
/* Remember the new level. */
nc_log_level = new_level;
#if HAS_PARALLEL4
if (new_level >= 0)
{
if (!LOG_FILE)
nc4_init_logging();
}
else
nc4_finalize_logging();
#endif /* HAS_PARALLEL4 */
LOG((4, "log_level changed to %d", nc_log_level));
#endif /*LOGGING */
return NC_NOERR;
}
#endif /* ENABLE_SET_LOG_LEVEL */
2010-06-03 21:24:43 +08:00
#ifdef LOGGING
2010-06-03 21:24:43 +08:00
#define MAX_NESTS 10
2017-12-03 22:57:21 +08:00
/**
2017-12-05 03:21:14 +08:00
* @internal Recursively print the metadata of a group.
2017-12-03 22:57:21 +08:00
*
* @param grp Pointer to group info struct.
* @param tab_count Number of tabs.
*
* @return ::NC_NOERR No error.
* @author Ed Hartnett, Dennis Heimbigner
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
static int
rec_print_metadata(NC_GRP_INFO_T *grp, int tab_count)
2010-06-03 21:24:43 +08:00
{
NC_ATT_INFO_T *att;
NC_VAR_INFO_T *var;
NC_DIM_INFO_T *dim;
NC_TYPE_INFO_T *type;
NC_FIELD_INFO_T *field;
char tabs[MAX_NESTS+1] = "";
char temp_string[10];
int t, retval, d, i;
/* Come up with a number of tabs relative to the group. */
for (t = 0; t < tab_count && t < MAX_NESTS; t++)
tabs[t] = '\t';
tabs[t] = '\0';
LOG((2, "%s GROUP - %s nc_grpid: %d nvars: %d natts: %d",
tabs, grp->hdr.name, grp->hdr.id, ncindexsize(grp->vars), ncindexsize(grp->att)));
for (i = 0; i < ncindexsize(grp->att); i++)
{
att = (NC_ATT_INFO_T *)ncindexith(grp->att, i);
assert(att);
LOG((2, "%s GROUP ATTRIBUTE - attnum: %d name: %s type: %d len: %d",
tabs, att->hdr.id, att->hdr.name, att->nc_typeid, att->len));
}
for (i = 0; i < ncindexsize(grp->dim); i++)
{
dim = (NC_DIM_INFO_T *)ncindexith(grp->dim, i);
assert(dim);
LOG((2, "%s DIMENSION - dimid: %d name: %s len: %d unlimited: %d",
tabs, dim->hdr.id, dim->hdr.name, dim->len, dim->unlimited));
}
for (i = 0; i < ncindexsize(grp->vars); i++)
{
int j;
2020-02-28 05:06:45 +08:00
char storage_str[NC_MAX_NAME] = "";
char *dims_string = NULL;
var = (NC_VAR_INFO_T*)ncindexith(grp->vars,i);
assert(var);
if (var->ndims > 0)
{
if (!(dims_string = malloc(sizeof(char) * var->ndims * 4)))
return NC_ENOMEM;
strcpy(dims_string, "");
for (d = 0; d < var->ndims; d++)
{
sprintf(temp_string, " %d", var->dimids[d]);
strcat(dims_string, temp_string);
}
}
2020-02-28 05:06:45 +08:00
if (!var->meta_read)
strcat(storage_str, "unknown");
2020-03-08 21:13:07 +08:00
else if (var->storage == NC_CONTIGUOUS)
2020-02-28 05:06:45 +08:00
strcat(storage_str, "contiguous");
2020-03-08 21:13:07 +08:00
else if (var->storage == NC_COMPACT)
2020-02-28 05:06:45 +08:00
strcat(storage_str, "compact");
else if (var->storage == NC_CHUNKED)
2020-02-28 05:06:45 +08:00
strcat(storage_str, "chunked");
2020-09-03 23:51:46 +08:00
else if (var->storage == NC_VIRTUAL)
strcat(storage_str, "virtual");
else
strcat(storage_str, "unknown");
LOG((2, "%s VARIABLE - varid: %d name: %s ndims: %d "
2020-02-28 05:06:45 +08:00
"dimids:%s storage: %s", tabs, var->hdr.id, var->hdr.name,
var->ndims,
2020-02-28 05:06:45 +08:00
(dims_string ? dims_string : " -"), storage_str));
for (j = 0; j < ncindexsize(var->att); j++)
{
att = (NC_ATT_INFO_T *)ncindexith(var->att, j);
assert(att);
LOG((2, "%s VAR ATTRIBUTE - attnum: %d name: %s type: %d len: %d",
tabs, att->hdr.id, att->hdr.name, att->nc_typeid, att->len));
}
if (dims_string)
free(dims_string);
}
for (i = 0; i < ncindexsize(grp->type); i++)
{
type = (NC_TYPE_INFO_T*)ncindexith(grp->type, i);
assert(type);
LOG((2, "%s TYPE - nc_typeid: %d size: %d committed: %d name: %s",
tabs, type->hdr.id, type->size, (int)type->committed, type->hdr.name));
/* Is this a compound type? */
if (type->nc_type_class == NC_COMPOUND)
{
int j;
LOG((3, "compound type"));
for (j = 0; j < nclistlength(type->u.c.field); j++)
{
field = (NC_FIELD_INFO_T *)nclistget(type->u.c.field, j);
LOG((4, "field %s offset %d nctype %d ndims %d", field->hdr.name,
field->offset, field->nc_typeid, field->ndims));
}
}
else if (type->nc_type_class == NC_VLEN)
{
LOG((3, "VLEN type"));
LOG((4, "base_nc_type: %d", type->u.v.base_nc_typeid));
}
else if (type->nc_type_class == NC_OPAQUE)
LOG((3, "Opaque type"));
else if (type->nc_type_class == NC_ENUM)
{
LOG((3, "Enum type"));
LOG((4, "base_nc_type: %d", type->u.e.base_nc_typeid));
}
else
{
LOG((0, "Unknown class: %d", type->nc_type_class));
return NC_EBADTYPE;
}
}
/* Call self for each child of this group. */
for (i = 0; i < ncindexsize(grp->children); i++)
if ((retval = rec_print_metadata((NC_GRP_INFO_T *)ncindexith(grp->children, i),
tab_count + 1)))
return retval;
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
2017-12-03 22:57:21 +08:00
/**
* @internal Print out the internal metadata for a file. This is
* useful to check that netCDF is working! Nonetheless, this function
2017-12-05 03:21:14 +08:00
* will print nothing if logging is not set to at least two.
2017-12-03 22:57:21 +08:00
*
2018-08-22 20:08:19 +08:00
* @param Pointer to the file info struct.
*
2017-12-03 22:57:21 +08:00
* @return ::NC_NOERR No error.
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
2018-08-22 20:08:19 +08:00
log_metadata_nc(NC_FILE_INFO_T *h5)
2010-06-03 21:24:43 +08:00
{
LOG((2, "*** NetCDF-4 Internal Metadata: int_ncid 0x%x ext_ncid 0x%x",
h5->root_grp->nc4_info->controller->int_ncid,
h5->root_grp->nc4_info->controller->ext_ncid));
if (!h5)
{
LOG((2, "This is a netCDF-3 file."));
return NC_NOERR;
}
LOG((2, "FILE - path: %s cmode: 0x%x parallel: %d redef: %d "
"fill_mode: %d no_write: %d next_nc_grpid: %d", h5->root_grp->nc4_info->controller->path,
h5->cmode, (int)h5->parallel, (int)h5->redef, h5->fill_mode, (int)h5->no_write,
h5->next_nc_grpid));
if(nc_log_level >= 2)
return rec_print_metadata(h5->root_grp, 0);
return NC_NOERR;
2010-06-03 21:24:43 +08:00
}
#endif /*LOGGING */
2017-12-03 22:57:21 +08:00
/**
2018-08-09 20:47:45 +08:00
* @internal Show the in-memory metadata for a netcdf file. This
* function does nothing unless netCDF was built with
* the configure option --enable-logging.
2017-12-03 22:57:21 +08:00
*
2017-12-05 03:21:14 +08:00
* @param ncid File and group ID.
*
2017-12-03 22:57:21 +08:00
* @return ::NC_NOERR No error.
2018-08-09 20:47:45 +08:00
* @return ::NC_EBADID Bad ncid.
2017-12-03 22:57:21 +08:00
* @author Ed Hartnett
2017-12-05 03:21:14 +08:00
*/
2010-06-03 21:24:43 +08:00
int
NC4_show_metadata(int ncid)
{
int retval = NC_NOERR;
2010-06-03 21:24:43 +08:00
#ifdef LOGGING
NC_FILE_INFO_T *h5;
int old_log_level = nc_log_level;
/* Find file metadata. */
if ((retval = nc4_find_grp_h5(ncid, NULL, &h5)))
return retval;
2010-06-03 21:24:43 +08:00
/* Log level must be 2 to see metadata. */
nc_log_level = 2;
retval = log_metadata_nc(h5);
nc_log_level = old_log_level;
2010-06-03 21:24:43 +08:00
#endif /*LOGGING*/
return retval;
2010-06-03 21:24:43 +08:00
}
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
/**
* @internal Define a binary searcher for reserved attributes
* @param name for which to search
* @return pointer to the matching NC_reservedatt structure.
* @return NULL if not found.
* @author Dennis Heimbigner
*/
const NC_reservedatt*
NC_findreserved(const char* name)
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
{
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
int n = NRESERVED;
int L = 0;
int R = (n - 1);
Add filter support to NCZarr Filter support has three goals: 1. Use the existing HDF5 filter implementations, 2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr, 3. Allow filters to be used even when HDF5 is disabled Detailed usage directions are define in docs/filters.md. For now, the existing filter API is left in place. So filters are defined using ''nc_def_var_filter'' using the HDF5 style where the id and parameters are unsigned integers. This is a big change since filters affect many parts of the code. In the following, the terms "compressor" and "filter" and "codec" are generally used synonomously. ### Filter-Related Changes: * In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms. * Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h. * Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out. * Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h. * Add a number of new test to test the new nczarr filters. * Let ncgen parse _Codecs attribute, although it is ignored. ### Plugin directory changes: * Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file * Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip * Add a Codec defaulter (see docs/filters.md) for the big four filters. * Make plugins work with windows by properly adding __declspec declaration. ### Misc. Non-Filter Changes * Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5. * Improve support for caching * More fixes for path conversion code * Fix misc. memory leaks * Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath. * Add a number of new test to test the non-filter fixes. * Update the parsers * Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-03 07:04:26 +08:00
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
for(;;) {
if(L > R) break;
int m = (L + R) / 2;
const NC_reservedatt* p = &NC_reserved[m];
int cmp = strcmp(p->name,name);
Add filter support to NCZarr Filter support has three goals: 1. Use the existing HDF5 filter implementations, 2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr, 3. Allow filters to be used even when HDF5 is disabled Detailed usage directions are define in docs/filters.md. For now, the existing filter API is left in place. So filters are defined using ''nc_def_var_filter'' using the HDF5 style where the id and parameters are unsigned integers. This is a big change since filters affect many parts of the code. In the following, the terms "compressor" and "filter" and "codec" are generally used synonomously. ### Filter-Related Changes: * In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms. * Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h. * Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out. * Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h. * Add a number of new test to test the new nczarr filters. * Let ncgen parse _Codecs attribute, although it is ignored. ### Plugin directory changes: * Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file * Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip * Add a Codec defaulter (see docs/filters.md) for the big four filters. * Make plugins work with windows by properly adding __declspec declaration. ### Misc. Non-Filter Changes * Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5. * Improve support for caching * More fixes for path conversion code * Fix misc. memory leaks * Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath. * Add a number of new test to test the non-filter fixes. * Update the parsers * Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-03 07:04:26 +08:00
if(cmp == 0) return p;
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
if(cmp < 0)
L = (m + 1);
else /*cmp > 0*/
R = (m - 1);
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
}
This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests: Platform | Build System | S3 support ------------------------------------ Linux+gcc | Automake | yes Linux+gcc | CMake | yes Visual Studio | CMake | no Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental reversions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, this was all converted to char*. The old HDF5-specific, unsigned int operations are still supported but they are wrappers around the new, char* based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions" was added to the NCDispatch table and the version bumped. 4. An overly complex set of structs was created to support funnelling all of the filterx operations thru a single dispatch "filter_actions" entry. 5. Move common code to from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeList.txt and configure.ac to support both C and C++ -- this is in support of S3 support via the awd-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration be conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed be unique: e.g. NC_CREAT, NC_INDEF, etc. 3. cleanup include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio including: * Better testing under windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them. 7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00
return NULL;
Add support for multiple filters per variable. re: https://github.com/Unidata/netcdf-c/issues/1584 Support has been added for multiple filters per variable. This affects a number of components in netcdf. The new APIs are documented in NUG/filters.md. The primary changes are: * A set of new functions are provided (see __include/netcdf_filter.h__). - Obtain a list of the filters associated with a variable - Obtain the parameters for a specific filter. * The existing __nc_inq_var_filter__ function now returns info about the first defined filter. * The utilities (ncgen, ncdump, and nccopy) now support an extended format for specifying a sequence of filters. The general form is __<filter>|<filter>..._. * The ncdump **_Filter** attribute now dumps a list of all the filters associated with a variable using the above new format. * Filter specifications can now use a filter name instead of number for filters known to the netcdf library, which in turn is taken from the HDF5 filter registration page. * New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter is returned if an attempt is made to access an unknown filter. * Internally, the dispatch table has been extended to add a function to handle all of the filter functions. * New, filter-related, tests were added to nc_test4. * A new plugin was added to the plugins directory to help with testing. Notes: 1. The shuffle and fletcher32 filters are not part of the multifilter system. Misc. changes: 1. A debug module was added to libhdf5 to help catch error locations.
2020-02-17 03:59:33 +08:00
}
static int
NC4_move_in_NCList(NC* nc, int new_id)
{
int stat = move_in_NCList(nc,new_id);
if(stat == NC_NOERR) {
/* Synchronize header */
if(nc->dispatchdata)
((NC_OBJ*)nc->dispatchdata)->id = nc->ext_ncid;
}
return stat;
}
Fix various problem around VLEN's re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENS, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have function that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. ## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (form nested VLEN or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. # Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top-level. So for nc_reclaim_data_all, the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed is quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA \#define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float(*) row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.
2022-01-09 09:30:00 +08:00