netcdf-c

mirror of https://github.com/Unidata/netcdf-c.git synced 2024-12-27 08:49:16 +08:00

Author	SHA1	Message	Date
Ward Fisher	d281be2333	Merge branch 'main' into open_mem_truncated_file	2022-03-14 15:02:02 -06:00
Dennis Heimbigner	8b9253fef2	Fix various problems around VLENs re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type: an array of VLENs, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have functions that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * freeing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly freed or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 x 3 x 4, this will be stored in memory as a vector of length 2 x 3 x 4 = 24 instances. The primary use cases are these. 
## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (from nested VLENs or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently, functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. ## nc_put_vars Suppose one is writing a vector of instances using nc_put_vars (or nc_put_vara or nc_put_var, etc.). Internally, the provided data is written immediately, so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes to, or reclamation of, the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copying. As a result, one sees effects such as those described in GitHub issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copying. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. 
# Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are declared in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants of each operation. The first pair, nc_reclaim_data() and nc_copy_data(), assumes the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second pair, nc_reclaim_data_all() and nc_copy_data_all(), allows the functions to manage the top level. So for nc_reclaim_data_all(), the top level is assumed to be dynamically allocated and will be freed by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that these functions are needed in many places throughout the netcdf-c library code. Using these functions also allows some simplification of the code, since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently, that code is disabled using the SEPDATA #define macro. Once the remaining bugs are largely fixed, all of this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. 
All related failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN _FillValue.
````
netcdf x {
types:
  float(*) row_of_floats ;
dimensions:
  m = 5 ;
variables:
  row_of_floats ragged_array(m) ;
    row_of_floats ragged_array:_FillValue = {-999} ;
data:
  ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ;
}
````
When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.	2022-01-08 18:30:00 -07:00
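The commit splits each operation into two forms. In nc_reclaim_data() and nc_copy_data(), the caller manages the top-level storage. In the _all variants, the function manages it. Both forms can be illustrated with the one complex type in the CDL example, a VLEN of floats. This is a hypothetical, self-contained illustration, not the library code. vlen_t is assumed to mirror the layout of netCDF's nc_vlen_t, and the sketch_* names are invented. The element type is also hard-coded rather than looked up from an ncid and type id, so the "recursion" stops after one level.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in assumed to share the layout of netCDF's nc_vlen_t. */
typedef struct {
    size_t len; /* number of elements in this VLEN instance */
    void *p;    /* dynamically allocated element storage (floats here) */
} vlen_t;

/* Deep-reclaim a caller-owned vector of VLEN-of-float instances:
   frees every interior block but leaves the top-level vector alone. */
static void sketch_reclaim_data(vlen_t *vec, size_t count) {
    for (size_t i = 0; i < count; i++) {
        free(vec[i].p);
        vec[i].p = NULL;
        vec[i].len = 0;
    }
}

/* Same, but the top-level vector is also assumed to be heap-allocated. */
static void sketch_reclaim_data_all(vlen_t *vec, size_t count) {
    if (vec == NULL) return;
    sketch_reclaim_data(vec, count);
    free(vec);
}

/* Deep-copy into caller-provided top-level storage. Returns 0 on success;
   on failure, any interior blocks already copied are released. */
static int sketch_copy_data(const vlen_t *src, size_t count, vlen_t *dst) {
    for (size_t i = 0; i < count; i++) {
        dst[i].len = src[i].len;
        dst[i].p = NULL;
        if (src[i].len == 0) continue;
        dst[i].p = malloc(src[i].len * sizeof(float));
        if (dst[i].p == NULL) {
            sketch_reclaim_data(dst, i); /* undo the partial copy */
            return -1;
        }
        memcpy(dst[i].p, src[i].p, src[i].len * sizeof(float));
    }
    return 0;
}

/* Deep-copy and also allocate the top level; the result is meant to be
   released later with sketch_reclaim_data_all(). */
static int sketch_copy_data_all(const vlen_t *src, size_t count, vlen_t **copyp) {
    vlen_t *dst = malloc((count ? count : 1) * sizeof(vlen_t));
    if (dst == NULL) return -1;
    if (sketch_copy_data(src, count, dst) != 0) {
        free(dst);
        return -1;
    }
    *copyp = dst;
    return 0;
}
```

For a nested type such as a VLEN of VLENs, the reclaim loop would first recurse into each element's p before freeing it. That walk over the type structure is what the real functions generalize.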
Dennis Heimbigner	f6e25b695e	Fix additional S3 support issues re: https://github.com/Unidata/netcdf-c/issues/2117 re: https://github.com/Unidata/netcdf-c/issues/2119 * Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case. * The aws sdk can sometimes cause problems if the Aws::ShutdownAPI function is not called. So add optional atexit() support to ensure it is called. This is disabled for Windows. * Add documentation to nczarr.md on how to build and use the aws sdk under Windows. Currently it builds, but testing fails. * Switch testing from stratus to the Unidata bucket on S3. * Improve support for the s3: url protocol. * Add an S3-specific utility code file: ds3util.c * Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3. ## Misc. * Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it is now used in libsrc as well as libnczarr. * Add calls to nc_finalize in the utilities in case atexit is disabled. * Add a header-only JSON parser to the distribution rather than as a built source.	2021-10-29 20:06:37 -06:00
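The NC_infermodel bullet depends on recognizing a file from its first few bytes, which is why a small byte-range read is enough. The sketch below shows that check under one assumption: only the standard signatures are recognized. Those are "CDF" followed by 0x01, 0x02 or 0x05 for the netCDF-3 variants, and the 8-byte HDF5 signature. The enum and function names are hypothetical, and this is not the library's NC_infermodel logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
typedef enum {
    FMT_UNKNOWN = 0,
    FMT_CLASSIC,      /* "CDF" 0x01 */
    FMT_64BIT_OFFSET, /* "CDF" 0x02 */
    FMT_CDF5,         /* "CDF" 0x05 */
    FMT_HDF5          /* 0x89 "HDF" "\r\n" 0x1a "\n" */
} sketch_format;

/* Classify the leading bytes of a file, e.g. the result of a byte-range
   request for the start of an object. n is how many bytes are in buf. */
static sketch_format sketch_infer_format(const unsigned char *buf, size_t n) {
    static const unsigned char hdf5_sig[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
    if (n >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return FMT_HDF5;
    if (n >= 4 && memcmp(buf, "CDF", 3) == 0) {
        switch (buf[3]) {
        case 0x01: return FMT_CLASSIC;
        case 0x02: return FMT_64BIT_OFFSET;
        case 0x05: return FMT_CDF5;
        default: break;
        }
    }
    return FMT_UNKNOWN;
}
```

HDF5 also permits the superblock signature at offsets 512, 1024, 2048, and so on, after a user block. This sketch checks offset 0 only.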
Tobias Kölling	b58a3ff07a	allow missing udata when closing file with abort=1 If a memory-backed file is closed due to an aborted opening (i.e. an attempt was made to open a broken file), udata may not be set. In this case, the assertion checking for udata should not be triggered.	2021-08-24 17:11:05 +02:00
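The change amounts to weakening an assertion so that it only applies to a normal close. Below is a hypothetical sketch of a close callback with that shape. sketch_udata, sketch_memfile_close and the abort_flag parameter are illustrative names, not the actual netcdf-c symbols.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-file state for a memory-backed file. */
typedef struct {
    void *image; /* in-memory file image, if any */
} sketch_udata;

/* On a normal close udata must exist. When the close is the clean-up
   after a failed open (abort_flag != 0), udata may never have been
   created, so NULL is tolerated instead of tripping the assertion. */
static int sketch_memfile_close(sketch_udata *udata, int abort_flag) {
    assert(abort_flag || udata != NULL);
    if (udata == NULL)
        return 0; /* nothing was set up, so nothing to release */
    free(udata->image);
    free(udata);
    return 0;
}
```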
Dennis Heimbigner	e1c470683c	Fix merge error from PR https://github.com/Unidata/netcdf-c/pull/1892/files	2020-12-01 20:10:48 -07:00
Ward Fisher	c96722e001	Merge pull request #1890 from gsjaardema/patch-46 Fix undefined struct member access	2020-12-01 14:41:27 -07:00
Dennis Heimbigner	68bcd1122a	Enforce that !ENABLE_BYTERANGE => !ENABLE_HDF5_ROS3 This is a follow-on to PR https://github.com/Unidata/netcdf-c/pull/1890. Modify configure.ac to enforce that !ENABLE_BYTERANGE => !ENABLE_HDF5_ROS3	2020-11-28 13:00:06 -07:00
Greg Sjaardema	db0b84252c	Fix undefined struct member access The `http` field of the hdf5 info struct is not defined unless `ENABLE_HDF5_ROS3` or `ENABLE_BYTERANGE` or `ENABLE_S3_SDK` is defined. Based on a quick look at the code, I think that the `ENABLE_HDF5_ROS3` define is the relevant one here. Maybe a better fix is to check if any of them are defined...	2020-11-24 08:25:29 -07:00
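Two rules can both be expressed with the preprocessor. The first is the alternative suggested above, checking whether any of the three macros is defined. The second is the follow-on rule from 68bcd1122a. This is a sketch under stated assumptions: the struct layout and the SKETCH_HAVE_HTTP and iosp names are hypothetical. The #error line only mirrors, in C, a check that the commit actually places in configure.ac.

```c
/* Compile-time analogue of the configure.ac rule: ROS3 without
   byte-range support is rejected. */
#if defined(ENABLE_HDF5_ROS3) && !defined(ENABLE_BYTERANGE)
#error "ENABLE_HDF5_ROS3 requires ENABLE_BYTERANGE"
#endif

/* One guard shared by the declaration and every use of the field. */
#if defined(ENABLE_HDF5_ROS3) || defined(ENABLE_BYTERANGE) || defined(ENABLE_S3_SDK)
#define SKETCH_HAVE_HTTP 1
#else
#define SKETCH_HAVE_HTTP 0
#endif

/* Hypothetical, simplified HDF5 file-info struct. */
typedef struct {
    int hdfid;
#if SKETCH_HAVE_HTTP
    struct {
        int iosp; /* nonzero when the file is accessed over HTTP */
    } http;
#endif
} sketch_hdf5_info;

/* Accessing the member under the same guard avoids "no member named
   'http'" errors in builds where none of the macros is defined. */
static int sketch_is_remote(const sketch_hdf5_info *h5) {
#if SKETCH_HAVE_HTTP
    return h5->http.iosp != 0;
#else
    (void)h5;
    return 0;
#endif
}
```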
Dennis Heimbigner	eb3d9eb0c9	Provide a number of fixes/improvements to NCZarr. Primary changes: * Add an improved cache system to speed up performance. * Fix NCZarr to properly handle scalar variables. Misc. Related Changes: * Added unit tests for extendible hash and for the generic cache. * Add a config parameter to set the size of the NCZarr cache. * Add initial performance tests but leave them unused. * Add CRC64 support. * Move location of ncdumpchunks utility from /ncgen to /ncdump. * Refactor auth support. Misc. Unrelated Changes: * More cleanup of the S3 support * Add support for S3 authentication in .rc files: HTTP.S3.ACCESSID and HTTP.S3.SECRETKEY. * Remove the hashkey from the struct OBJHDR since it is never used.	2020-11-19 17:01:04 -07:00
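The commit does not say which CRC-64 variant was added. As an illustration only, here is a bitwise CRC-64 using the ECMA-182 polynomial in reflected form, with an all-ones initial value and final XOR. This parameter set is usually called CRC-64/XZ. Whether NCZarr uses this exact variant is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise (table-free) CRC-64 with the reflected ECMA-182 polynomial
   0xC96C5795D7870F42, all-ones init and final XOR (CRC-64/XZ).
   Start with crc = 0; pass a previous result back in to continue. */
static uint64_t sketch_crc64(uint64_t crc, const void *data, size_t len) {
    const unsigned char *p = data;
    crc = ~crc; /* undo the final XOR of a previous call (or init to all ones) */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xC96C5795D7870F42ULL : crc >> 1;
    }
    return ~crc;
}
```

Passing a previous result back in as crc continues a running checksum, so chunked data gives the same value as a single call over the whole buffer.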
Edward Hartnett	832fbf19c8	now don't return an error on second redef call for netcdf/HDF5 files	2020-07-08 11:10:15 -06:00
Dennis Heimbigner	59e04ae071	This PR adds EXPERIMENTAL support for accessing data in the cloud using a variant of the Zarr protocol and storage format. This enhancement is generically referred to as "NCZarr". The data model supported by NCZarr is netcdf-4 minus the user-defined types and the String type. In this sense it is similar to the CDF-5 data model. More detailed information about enabling and using NCZarr is described in the document NUG/nczarr.md and in a [Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in). WARNING: this code has had limited testing, so do not use this version for production work. Also, performance improvements are ongoing. Note especially the following platform matrix of successful tests:

| Platform | Build System | S3 support |
| --- | --- | --- |
| Linux+gcc | Automake | yes |
| Linux+gcc | CMake | yes |
| Visual Studio | CMake | no |

Additionally, and as a consequence of the addition of NCZarr, major changes have been made to the Filter API. NOTE: NCZarr does not yet support filters, but these changes are enablers for that support in the future. Note that it is possible (probable?) that there will be some accidental regressions if the changes here did not correctly mimic the existing filter testing. In any case, previously filter ids and parameters were of type unsigned int. In order to support the more general zarr filter model, these were all converted to character strings. The old HDF5-specific, unsigned int operations are still supported, but they are wrappers around the new, char-based nc_filterx_XXX functions. This entailed at least the following changes: 1. Added the files libdispatch/dfilterx.c and include/ncfilter.h 2. Some filterx utilities have been moved to libdispatch/daux.c 3. A new entry, "filter_actions", was added to the NCDispatch table and the version bumped. 4. 
An overly complex set of structs was created to support funnelling all of the filterx operations through a single dispatch "filter_actions" entry. 5. Move common code from libhdf5 to libsrc4 so that it is accessible to nczarr. Changes directly related to Zarr: 1. Modified CMakeLists.txt and configure.ac to support both C and C++; this is in support of S3 access via the aws-sdk libraries. 2. Define a size64_t type to support nczarr. 3. More reworking of libdispatch/dinfermodel.c to support zarr and to regularize the structure of the fragments section of a URL. Changes not directly related to Zarr: 1. Make client-side filter registration conditional, with default off. 2. Hack include/nc4internal.h to make some flags added by Ed unique: e.g. NC_CREAT, NC_INDEF, etc. 3. Clean up include/nchttp.h and libdispatch/dhttp.c. 4. Misc. changes to support compiling under Visual Studio, including: * Better testing under Windows for dirent.h and opendir and closedir. 5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags and to centralize error reporting. 6. By default, suppress the vlen tests that have unfixed memory leaks; add an option to enable them. 7. Make part of the nc_test/test_byterange.sh test contingent on remotetest.unidata.ucar.edu being accessible. Changes Left TO-DO: 1. Fix provenance code; it is too HDF5 specific.	2020-06-28 18:02:47 -06:00
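The filter API change keeps the old unsigned int operations as wrappers over a string-based form. Below is a minimal sketch of the conversions such a wrapper needs, assuming ids and parameters are rendered as decimal text. sketch_filter_spec and both functions are hypothetical and do not reproduce the real nc_filterx_XXX signatures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXPARAMS 8
#define SKETCH_TEXTLEN 16 /* room for any 32-bit unsigned value in decimal */

/* Hypothetical string-based filter spec, standing in for the char form. */
typedef struct {
    char id[SKETCH_TEXTLEN];
    size_t nparams;
    char params[SKETCH_MAXPARAMS][SKETCH_TEXTLEN];
} sketch_filter_spec;

/* Old-style (HDF5 unsigned int) inputs converted to the string form. */
static int sketch_filter_from_uint(unsigned int id, size_t nparams,
                                   const unsigned int *params,
                                   sketch_filter_spec *spec) {
    if (nparams > SKETCH_MAXPARAMS) return -1;
    snprintf(spec->id, sizeof spec->id, "%u", id);
    spec->nparams = nparams;
    for (size_t i = 0; i < nparams; i++)
        snprintf(spec->params[i], sizeof spec->params[i], "%u", params[i]);
    return 0;
}

/* Parse one decimal field; rejects empty or non-numeric text. */
static int sketch_parse_uint(const char *text, unsigned int *out) {
    char *end;
    unsigned long v = strtoul(text, &end, 10);
    if (end == text || *end != '\0' || v > 0xFFFFFFFFUL) return -1;
    *out = (unsigned int)v;
    return 0;
}

/* And back again, as an unsigned-int wrapper would need on the way out. */
static int sketch_filter_to_uint(const sketch_filter_spec *spec, unsigned int *idp,
                                 size_t *nparamsp, unsigned int *params) {
    if (sketch_parse_uint(spec->id, idp) != 0) return -1;
    for (size_t i = 0; i < spec->nparams; i++)
        if (sketch_parse_uint(spec->params[i], &params[i]) != 0) return -1;
    *nparamsp = spec->nparams;
    return 0;
}
```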
Dennis Heimbigner	b0e0d81aa9	Fix reclamation of the ->format_XXX_info fields. nc4internal.c contains code to free the format_XXX_info fields. Since these are format specific, this code was moved to the dispatch code (libhdf5 and libhdf4 in the current case). Additionally, there are some fields in nc4internal.h (e.g. dimscale fields) that are specific to HDF5 and have been moved to the corresponding HDF5 data structures and code. Misc. other changes: 1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid implying it is necessarily HDF5 specific. 2. Prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency. This also requires wrapping move_in_NCList() to keep hdr.id consistent.	2020-03-29 12:48:59 -06:00
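Prefixing NC_FILE_INFO_T with an NC_OBJ relies on a standard C guarantee: a pointer to a struct, suitably converted, points to its first member, and vice versa. Below is a simplified sketch of the pattern. The sort/name/id fields are modeled loosely on NC_OBJ, and all sketch_* names are hypothetical.

```c
#include <stddef.h>

/* Simplified, hypothetical object header modeled loosely on NC_OBJ. */
typedef enum { SKETCH_FILE, SKETCH_GRP, SKETCH_VAR } sketch_sort;

typedef struct {
    sketch_sort sort; /* which kind of object this header belongs to */
    const char *name;
    size_t id;
} sketch_obj;

/* Each object type starts with the header, so generic code can take a
   pointer to any of them and treat it as a pointer to the header. */
typedef struct {
    sketch_obj hdr; /* must be the first member */
    int mode;
} sketch_file_info;

typedef struct {
    sketch_obj hdr; /* must be the first member */
    int natts;
} sketch_grp_info;

/* Works for any object type because hdr sits at offset 0. */
static const char *sketch_obj_name(const void *anyobj) {
    const sketch_obj *h = anyobj;
    return h->name;
}

/* Downcast from the header, checked against the sort tag. */
static sketch_file_info *sketch_as_file(sketch_obj *h) {
    return (h != NULL && h->sort == SKETCH_FILE) ? (sketch_file_info *)h : NULL;
}
```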
Ed Hartnett	76d6b55eff	moved call to nc4_rec_grp_del() to inside nc4_nc4f_list_del()	2019-07-16 16:29:06 -06:00
Ed Hartnett	4398cad8f5	whitespace cleanup	2019-07-16 16:17:07 -06:00
Ed Hartnett	b8e50c9254	moved freeing of allvars, alldims, alltypes lists to nc4_nc4f_list_del	2019-07-16 16:16:11 -06:00
Ed Hartnett	e9666f7333	moved free(h5) into nc4_nc4f_list_del	2019-07-16 16:07:21 -06:00
Ed Hartnett	d840c1864c	removed unused prototype	2019-07-16 16:02:08 -06:00
Dennis Heimbigner	2eb1a8d8cf	For some reason, the code for this was incorrect. Anyway, I repaired it as follows: 1. Created NC4_write_provenance as parallel to NC4_read_provenance 2. Modified hdf5file.c to use NC4_write_provenance 3. Modified hdf5open.c to use NC4_read_provenance (was NC4_read_ncproperties). 4. The creation of the _NCProperties string was seriously hosed: was using all the wrong fields.	2019-04-18 14:23:20 -06:00
Dennis Heimbigner	88a7a1753c	Simplify libhdf5/nc4info.c to move to lazy parsing re: https://github.com/Unidata/netcdf-c/issues/1352 When nc4info.c encounters an _NCProperties attribute with a version number it does not recognize, it does not show it correctly. The chosen solution is to arrange for accessing the attribute to return the raw value of the attribute from the file. This way, even if the version is unrecognized, it will return something usable. The changes were primarily to never attempt to parse the value of _NCProperties until it is actually required; since the parsed values are currently not used, parsing never occurs. Also modified ncdump/tst_fileinfo.sh to include some extra testing. I tested the original failure by changing the value of NCPROPS to 3. However, there is no way to test this at build time. Misc. Changes * Inlined the provenance info in the NC_FILE_INFO_T structure * Centralized stuff from elsewhere into include/nc_provenance.h Misc. Unrelated Changes * Removed/turned off some misc debug output left on by accident * Fix CPPFLAGS name error in libhdf5/Makefile.am	2019-03-09 20:35:57 -07:00
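The lazy-parsing approach can be sketched as a record that stores only the raw attribute text when the file is opened and parses it on first demand. The struct and function names are hypothetical. The sketch assumes only that the value begins with version=N. The rest of the string is never interpreted, which is what lets an unrecognized version still be returned intact.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical provenance record: only raw text is captured on open. */
typedef struct {
    char raw[256]; /* _NCProperties value exactly as stored in the file */
    int parsed;    /* set once parsing has actually been requested */
    int version;   /* meaningful only when parsed != 0 */
} sketch_provenance;

static void sketch_prov_open(sketch_provenance *p, const char *attr) {
    strncpy(p->raw, attr, sizeof p->raw - 1);
    p->raw[sizeof p->raw - 1] = '\0';
    p->parsed = 0;
    p->version = 0;
}

/* Reading the attribute never parses, so any version comes back as-is. */
static const char *sketch_prov_raw(const sketch_provenance *p) {
    return p->raw;
}

/* Parse only on first demand. Returns the version number, or -1 if the
   text does not start with "version=". */
static int sketch_prov_version(sketch_provenance *p) {
    if (!p->parsed) {
        p->parsed = 1;
        p->version = (strncmp(p->raw, "version=", 8) == 0) ? atoi(p->raw + 8) : -1;
    }
    return p->version;
}
```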
Ed Hartnett	8b1f5a8fad	cleanup of whitespace in HDF5 directory	2019-02-19 05:18:02 -07:00
Ed Hartnett	840d51d035	changed NC_GRP_INFO_T to use atts_read instead of atts_not_read	2019-01-22 08:11:52 -07:00
Ed Hartnett	c6a9948a8e	removed unneeded var, fixed broken log statements that cause segfaults	2019-01-20 09:37:13 -07:00
Ed Hartnett	3c9a141ee3	moved function detect_preserve_dimids and made it static	2018-12-11 06:15:47 -07:00
Ed Hartnett	8d31f5b806	clean up	2018-11-07 14:23:55 -07:00
Ed Hartnett	6f4b4ac80d	moving attribute HDF5 stuff to libhdf5	2018-11-07 14:21:57 -07:00
Ed Hartnett	5f36a3b425	merged master	2018-11-02 10:00:53 -06:00
Dennis Heimbigner	245961de00	re: GitHub issues https://github.com/Unidata/netcdf-c/issues/1168 https://github.com/Unidata/netcdf-c/issues/1163 https://github.com/Unidata/netcdf-c/issues/1162 This PR partially fixes memory leaks in the netcdf-c library, in the ncdump utility, and in some test cases. The netcdf-c library now runs memory clean with the assumption that the --disable-utilities option is used. The primary remaining problem is ncgen. Once that is fixed, I believe the netcdf-c library will run memory clean with no limitations. Notes: 1. Memory checking was performed using gcc -fsanitize=address. Valgrind-based testing has yet to be performed. 2. The pnetcdf, hdf4, and examples code has not been tested. Misc. non-leak changes: 1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162) 2. Fix CMakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF 3. Isolated all my debug scripts into a single top-level directory called debug 4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am	2018-10-30 20:48:12 -06:00
Ed Hartnett	452f75fadd	commented out some tests	2018-10-22 15:04:44 -06:00
Ed Hartnett	958826d0af	merged ejh_mem_check	2018-10-22 13:29:46 -06:00
Dennis Heimbigner	e40eb2e950	switch	2018-10-19 11:11:36 -06:00
Ed Hartnett	695e295734	now calling recursive function to close HDF5 objects in file	2018-10-18 03:29:21 -06:00
Ed Hartnett	fa86d3c488	continuing to separate hdf5/libsrc4 file close code	2018-10-18 03:20:39 -06:00
Ed Hartnett	c90ab24b48	moving towards separating HDF5 file close from netcdf4 file close	2018-10-18 03:17:38 -06:00
Dennis Heimbigner	979873f81d	Fix provenance memory leak	2018-10-17 11:43:14 -06:00
Dennis Heimbigner	4636584d5b	Revert/Improve nc_create + NC_DISKLESS behavior re: https://github.com/Unidata/netcdf-c/issues/1154 Inadvertently, the behavior of NC_DISKLESS with nc_create() was changed in release 4.6.1. Previously, the NC_WRITE flag needed to be explicitly used with NC_DISKLESS in order to cause the created file to be persisted to disk. Additional analysis indicated that the current NC_DISKLESS implementation was seriously flawed. This PR attempts to clean up and regularize the situation with respect to NC_DISKLESS control. One important aspect of diskless operation is that there are two different notions of write. 1. The file is read-write vs read-only when using the netcdf API. 2. The file is persisted or not to disk at nc_close(). Previously, these two were conflated. The rules now are as follows. 1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API. 2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close. 3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2. The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag. NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag. It is also now the case that for netcdf-4, NC_DISKLESS is independent of NC_INMEMORY and in fact it is an error to specify both flags simultaneously. Finally, the MMAP code was fixed to use NC_PERSIST as well. Also marked MMAP as deprecated. Also added a test case to test various combinations of NC_DISKLESS, NC_PERSIST, and NC_WRITE. This PR affects a number of files and especially test cases that used NC_DISKLESS. Misc. Unrelated fixes 1. fixed some warnings in ncdump/dumplib.c	2018-10-10 13:32:17 -06:00
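The three rules can be written as a small decoding function. This is a sketch, not the library code. The SK_* constants are placeholders standing in for NC_WRITE, NC_DISKLESS, NC_PERSIST and NC_INMEMORY from netcdf.h, and sketch_decode_mode is a hypothetical name. It also encodes the netcdf-4 rule that specifying NC_DISKLESS and NC_INMEMORY together is an error.

```c
/* Placeholder flag values for this sketch; real code would use the
   NC_WRITE, NC_DISKLESS, NC_PERSIST and NC_INMEMORY macros. */
#define SK_WRITE    0x0001
#define SK_DISKLESS 0x0008
#define SK_PERSIST  0x4000
#define SK_INMEMORY 0x8000

typedef struct {
    int api_writable; /* rule 1: read/write through the netcdf API */
    int persist;      /* rule 2: written to a disk file at nc_close() */
} sketch_diskless_mode;

/* Decode the two independent notions of "write". Returns 0 on success,
   -1 for the combination rejected for netcdf-4 files. */
static int sketch_decode_mode(int cmode, int is_netcdf4, sketch_diskless_mode *out) {
    int diskless = (cmode & SK_DISKLESS) != 0;
    if (is_netcdf4 && diskless && (cmode & SK_INMEMORY) != 0)
        return -1;
    out->api_writable = (cmode & SK_WRITE) != 0;
    out->persist = diskless && (cmode & SK_PERSIST) != 0;
    return 0;
}
```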
Ed Hartnett	d9ef143d1e	separated cache code from hdf5file.c	2018-09-14 13:33:22 -06:00
Ed Hartnett	b501748f58	fixed merge error	2018-09-14 11:56:10 -06:00
Ed Hartnett	a009dab557	fixed inadvertent move of function	2018-09-14 11:39:57 -06:00
Ed Hartnett	e2839c120f	Merge branch 'master' into ejh_hdf5_sep_next_2	2018-09-14 11:33:59 -06:00
Ed Hartnett	eabb690949	fixing merge issue	2018-09-12 09:36:36 -06:00
Ed Hartnett	abf247de92	made dumpopenobjects static in an attempt to get the AppVeyor build working	2018-09-12 07:53:31 -06:00
Ed Hartnett	08a2dce904	merged master	2018-09-07 12:40:44 -06:00
Ed Hartnett	8390d572ad	Merge branch 'master' into ejh_hdf5_sep_next	2018-09-06 17:30:37 -06:00
Ward Fisher	784d777bff	Merge branch 'master' into provenance.dmh	2018-09-06 15:13:09 -06:00
Ed Hartnett	86e002d794	merged master	2018-09-06 14:01:59 -06:00
Ed Hartnett	4213f10ed6	moved function	2018-09-06 13:57:39 -06:00
Ed Hartnett	80dc5bc0f7	merged master	2018-09-06 12:24:29 -06:00
Ed Hartnett	6c86ad8229	moved functions back	2018-09-06 12:19:17 -06:00
Ward Fisher	fbe0a18b1c	Merge branch 'master' into ejh_loop_cleanup_2	2018-09-05 11:22:55 -06:00
Dennis Heimbigner	d62a9e623c	Fix the NC_INMEMORY code to work in all cases with HDF5 1.10. re: GitHub issue https://github.com/Unidata/netcdf-c/issues/1111 One of the less common use cases for the in-memory feature is apparently failing with HDF5-1.10.x. The fix is complicated and requires significant changes to libhdf5/nc4memcb.c. The current setup is detailed in the file docs/inmeminternal.dox. Additionally, it was discovered that the program nc_test/tst_inmemory.c, which is invoked by nc_test/run_inmemory.sh, was actually failing because of the above problem. But the failure was not detected because the script does not return a non-zero value. Other Changes: 1. Fix nc_test/tst_inmemory to return errors correctly. 2. Make ncdap_test/findtestserver.c and dap4_test/findtestserver4.c be generated from ncdap_test/findtestserver.c.in. 3. Make LOG() print output to stderr instead of stdout to avoid contaminating e.g. ncdump output. 4. Modify the handling of NC_INMEMORY and NC_DISKLESS flags to properly handle that NC_DISKLESS => NC_INMEMORY. This affects a number of code pieces, especially memio.c.	2018-09-04 11:27:47 -06:00


114 Commits