Commit Graph

52 Commits

Author SHA1 Message Date
Dennis Heimbigner
8b9253fef2 Fix various problems around VLENs
re: https://github.com/Unidata/netcdf-c/issues/541
re: https://github.com/Unidata/netcdf-c/issues/1208
re: https://github.com/Unidata/netcdf-c/issues/2078
re: https://github.com/Unidata/netcdf-c/issues/2041
re: https://github.com/Unidata/netcdf-c/issues/2143

For a long time, there have been known problems with the
management of complex types containing VLENs.  This also
involves the string type because it is stored as a VLEN of
chars.

This PR (mostly) fixes this problem. But note that it adds new
functions to netcdf.h (see below) and this may require bumping
the .so number.  These new functions can be removed, if desired,
in favor of functions in netcdf_aux.h, but netcdf.h seems the
better place for them because they are intended as alternatives
to the nc_free_vlen and nc_free_string functions already in
netcdf.h.

The term complex type refers to any type that directly or
transitively references a VLEN type. Examples include an array of
VLENs, a compound with a VLEN field, and so on.

In order to properly handle instances of these complex types, it
is necessary to have functions that can recursively walk
instances of such types to perform various actions on them.  The
term "deep" is also used to mean recursive.

At the moment, the two operations needed by the netcdf library are:
* free'ing an instance of the complex type
* copying an instance of the complex type.

The current library does only shallow free and shallow copy of
complex types. This means that only the top level is properly
free'd or copied, but deep internal blocks in the instance are
not touched.

Note that the term "vector" will be used to mean a contiguous (in
memory) sequence of instances of some type. Given an array with,
say, dimensions 2 X 3 X 4, this will be stored in memory as a
vector of length 2*3*4=24 instances.
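
As a small illustration of the "vector" terminology, the sketch below
computes the vector length of an array from its dimension sizes and the
position of one element within the vector. The helper names
`vector_length` and `vector_offset` are hypothetical and not part of
netcdf-c, and row-major (C) ordering is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of instances in the vector for an array
   with the given dimension sizes (the product of all sizes). */
static size_t vector_length(const size_t *dims, size_t rank)
{
    size_t n = 1;
    for (size_t i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Hypothetical helper: position of element idx[] within the vector,
   assuming row-major order (the last index varies fastest). */
static size_t vector_offset(const size_t *dims, const size_t *idx, size_t rank)
{
    size_t off = 0;
    for (size_t i = 0; i < rank; i++) {
        assert(idx[i] < dims[i]);
        off = off * dims[i] + idx[i];
    }
    return off;
}
```

For dimensions {2, 3, 4}, `vector_length` gives 24, and element
(1, 2, 3) is the last instance, at offset 23.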

The use cases are primarily these.

## nc_get_vars
Suppose one is reading a vector of instances using nc_get_vars
(or nc_get_vara or nc_get_var, etc.).  These functions will
return the vector in the top-level memory provided.  All
interior blocks (from nested VLENs or strings) will have been
dynamically allocated.

After using this vector of instances, it is necessary to free
(aka reclaim) the dynamically allocated memory, otherwise a
memory leak occurs.  So, the recursive reclaim function is used
to walk the returned instance vector and do a deep reclaim of
the data.

Currently functions are defined in netcdf.h that are supposed to
handle this: nc_free_vlen(), nc_free_vlens(), and
nc_free_string().  Unfortunately, these functions only do a
shallow free, so deeply nested instances are not properly
handled by them.
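
The difference between a shallow and a deep reclaim can be sketched
without the library. The code below is an illustration only: `vlen_t`
is a local stand-in with the same shape as nc_vlen_t (a length plus a
pointer), and the function names, the free counter, and the fixed
"VLEN of VLEN of float" nesting are all assumptions made for the
example. The real functions walk arbitrary types.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in with the same shape as netcdf's nc_vlen_t; defined locally
   so the sketch compiles without netcdf.h. */
typedef struct { size_t len; void *p; } vlen_t;

static size_t blocks_freed = 0;   /* instrumentation for the example only */

static void counted_free(void *p)
{
    if (p != NULL) { free(p); blocks_freed++; }
}

/* Shallow reclaim: releases only the block directly owned by each
   top-level VLEN. If the elements are themselves VLENs, their blocks leak. */
static void shallow_reclaim(vlen_t *vec, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        counted_free(vec[i].p);
        vec[i].p = NULL; vec[i].len = 0;
    }
}

/* Deep reclaim for a vector of "VLEN of VLEN of float": release the
   inner float blocks first, then the outer block that held them. */
static void deep_reclaim(vlen_t *vec, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        vlen_t *inner = (vlen_t *)vec[i].p;
        shallow_reclaim(inner, vec[i].len);  /* inner elements are floats */
        counted_free(vec[i].p);
        vec[i].p = NULL; vec[i].len = 0;
    }
}

/* Build one outer VLEN holding n inner VLENs of m floats each (n, m > 0). */
static vlen_t make_nested(size_t n, size_t m)
{
    vlen_t outer = { n, calloc(n, sizeof(vlen_t)) };
    assert(outer.p != NULL);
    vlen_t *inner = (vlen_t *)outer.p;
    for (size_t i = 0; i < n; i++) {
        inner[i].len = m;
        inner[i].p = calloc(m, sizeof(float));
        assert(inner[i].p != NULL);
    }
    return outer;
}
```

With one outer VLEN holding three inner VLENs, `deep_reclaim` releases
four blocks, whereas `shallow_reclaim` on the same vector would release
only the outer block and leak the other three.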

Note that when writing variable data (for example, with nc_put_vars),
the provided data is written immediately, so the library has no need
to copy it. But the caller may need to reclaim the data it passed
into the function.

## nc_put_att
Suppose one is writing a vector of instances as the data of an attribute
using, say, nc_put_att.

Internally, the incoming attribute data must be copied and stored
so that changes/reclamation of the input data will not affect
the attribute.

Again, the code inside the netcdf library does only shallow copying
rather than deep copy. As a result, one sees effects such as described
in Github Issue https://github.com/Unidata/netcdf-c/issues/2143.

Also, after defining the attribute, it may be necessary for the user
to free the data that was provided as input to nc_put_att().

## nc_get_att
Suppose one is reading a vector of instances as the data of an attribute
using, say, nc_get_att.

Internally, the existing attribute data must be copied and returned
to the caller, and the caller is responsible for reclaiming
the returned data.

Again, the code inside the netcdf library does only shallow copying
rather than deep copy. So this can lead to memory leaks and errors
because the deep data is shared between the library and the user.
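
The sharing problem can be reproduced in a few lines. This is a
minimal sketch, not the library code: `vlen_t` is a local stand-in for
nc_vlen_t, the function names are hypothetical, and only a vector of
"VLEN of float" is handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t len; void *p; } vlen_t;  /* stand-in for nc_vlen_t */

/* Shallow copy: duplicates the top-level structs only, so source and
   copy end up pointing at the same float blocks. */
static void shallow_copy(const vlen_t *src, size_t count, vlen_t *dst)
{
    memcpy(dst, src, count * sizeof(vlen_t));
}

/* Deep copy for a vector of "VLEN of float": every float block is
   duplicated. Returns 0 on success, -1 on allocation failure (in which
   case the blocks already copied are released). */
static int deep_copy(const vlen_t *src, size_t count, vlen_t *dst)
{
    for (size_t i = 0; i < count; i++) {
        dst[i].len = src[i].len;
        dst[i].p = NULL;
        if (src[i].len == 0) continue;
        dst[i].p = malloc(src[i].len * sizeof(float));
        if (dst[i].p == NULL) {
            while (i-- > 0) free(dst[i].p);
            return -1;
        }
        memcpy(dst[i].p, src[i].p, src[i].len * sizeof(float));
    }
    return 0;
}
```

After `shallow_copy`, freeing or modifying the data through either
vector affects the other. After `deep_copy`, the two are independent
and each side can reclaim its own blocks.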

# Solution

The solution is to build properly recursive reclaim and copy
functions and use those as needed.
These recursive functions are defined in libdispatch/dinstance.c
and their signatures are defined in include/netcdf.h.
For back compatibility, corresponding "ncaux_XXX" functions
are defined in include/netcdf_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants. The first two, nc_reclaim_data() and
nc_copy_data(), assume the top-level vector is managed by the
caller. For reclaim, this is so the user can use, for example, a
statically allocated vector. For copy, it assumes the user
provides the space into which the copy is stored.

The second two, nc_reclaim_data_all() and
nc_copy_data_all(), allow the functions to manage the
top-level.  So for nc_reclaim_data_all, the top level is
assumed to be dynamically allocated and will be free'd by
nc_reclaim_data_all().  The nc_copy_data_all() function
will allocate the top level and return a pointer to it to the
user. The user can later pass that pointer to
nc_reclaim_data_all() to reclaim the instance(s).
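
The ownership convention behind the two variants can be illustrated
with plain C strings. The sketch below is not the netcdf-c
implementation: the `*_strings` names are hypothetical, the ncid and
xtypeid arguments are dropped, and every string is assumed to be
non-NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Caller owns the top-level vector: only the strings are freed. */
static void reclaim_strings(char **vec, size_t count)
{
    for (size_t i = 0; i < count; i++) { free(vec[i]); vec[i] = NULL; }
}

/* The vector itself must be heap-allocated; it is freed along with
   its strings. */
static void reclaim_strings_all(char **vec, size_t count)
{
    if (vec == NULL) return;
    reclaim_strings(vec, count);
    free(vec);
}

/* Caller supplies space for count pointers in copy[]. Returns 0 or -1. */
static int copy_strings(char *const *src, size_t count, char **copy)
{
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(src[i]) + 1;
        copy[i] = malloc(n);
        if (copy[i] == NULL) { reclaim_strings(copy, i); return -1; }
        memcpy(copy[i], src[i], n);
    }
    return 0;
}

/* Allocates the top-level vector too and returns it through copyp. */
static int copy_strings_all(char *const *src, size_t count, char ***copyp)
{
    char **copy = calloc(count ? count : 1, sizeof(char *));
    if (copy == NULL) return -1;
    if (copy_strings(src, count, copy) != 0) { free(copy); return -1; }
    *copyp = copy;
    return 0;
}
```

A vector filled by `copy_strings` may live on the stack or in static
storage and is released with `reclaim_strings`. A vector returned by
`copy_strings_all` is released with `reclaim_strings_all`, which
parallels the pairing of nc_copy_data_all() with nc_reclaim_data_all().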

# Internal Changes
The netcdf-c library internals are changed to use the proper
reclaim and copy functions.  It turns out that the places where
these functions are needed are quite pervasive in the netcdf-c
library code.  Using these functions also allows some
simplification of the code since the stdata and vldata fields of
NC_ATT_INFO_T are no longer needed.  Currently this code is commented
out using the SEPDATA \#define macro.  Once the remaining bugs are
largely fixed, all this code will be removed.

# Known Bugs

1. There is still one known failure that has not been solved.
   All the failures revolve around some variant of this .cdl file.
   The proximate cause of failure is the use of a VLEN FillValue.
````
        netcdf x {
        types:
          float(*) row_of_floats ;
        dimensions:
          m = 5 ;
        variables:
          row_of_floats ragged_array(m) ;
              row_of_floats ragged_array:_FillValue = {-999} ;
        data:
          ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32},
                         {40, 41}, _ ;
        }
````
When a solution is found, I will either add it to this PR or post a new PR.

# Related Changes

* Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data.
* Remove the --enable-unfixed-memory-leaks option.
* Remove the NC_VLENS_NOTEST code that suppresses some vlen tests.
* Document this change in docs/internal.md
* Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh.
* Mark types as fixed size or not (transitively) to optimize the reclaim
  and copy functions.
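
One way to picture the transitive fixed-size marking is the sketch
below. The `type_desc` structure and `is_fixed_size` function are
hypothetical and much simpler than the library's internal type
metadata. The assumed rule is that a type is fixed size unless it is,
or transitively contains, a VLEN or a string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type descriptor; netcdf-c's internal structures differ. */
typedef enum { T_ATOMIC, T_STRING, T_VLEN, T_COMPOUND } kind_t;

typedef struct type_desc {
    kind_t kind;
    size_t nfields;                    /* T_COMPOUND only */
    struct type_desc *const *fields;   /* T_COMPOUND only */
    int fixed_size;                    /* -1 = not yet computed, else 0 or 1 */
} type_desc;

/* Compute (and cache in the descriptor) whether the type is fixed size. */
static int is_fixed_size(type_desc *t)
{
    if (t->fixed_size >= 0) return t->fixed_size;
    int fixed = 1;
    switch (t->kind) {
    case T_ATOMIC:
        break;
    case T_STRING:
    case T_VLEN:
        fixed = 0;
        break;
    case T_COMPOUND:
        for (size_t i = 0; i < t->nfields && fixed; i++)
            fixed = is_fixed_size(t->fields[i]);
        break;
    }
    t->fixed_size = fixed;
    return fixed;
}
```

With such a flag available, a vector of a fixed-size type has no
interior blocks to walk, so a copy could presumably reduce to a single
memcpy and a reclaim to nothing beyond the top level.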

# Misc. Changes

* Make Doxygen process libdispatch/daux.c
* Make sure the NC_ATT_INFO_T.container field is set.
2022-01-08 18:30:00 -07:00
Dennis Heimbigner
9380790ea8 Support MSYS2/Mingw platform
re:

The current netcdf-c release has some problems with the mingw platform
on windows. Mostly they are path issues.

Changes to support mingw+msys2:
-------------------------------
* Enable the option of looking in the windows registry to find
  the mingw root path, in aid of proper path handling.
* Add mingw+msys as a specific platform in configure.ac and move testing
  of the platform to the front so it is available early.
* Handle mingw X libncpoco (dynamic loader) properly even though
  mingw does not yet support it.
* Handle mingw X plugins properly even though mingw does not yet support it.
* Alias pwd='pwd -W' to better handle paths in shell scripts.
* Plus a number of other minor compile irritations.
* Disallow the use of multiple nc_open's on the same file for windows
  (and mingw) because windows does not seem to handle these properly.
  Not sure why we did not catch this earlier.
* Add mountpoint info to dpathmgr.c to help support mingw.
* Cleanup dpathmgr conversions.

Known problems:
---------------
* I have not been able to get shared libraries to work, so
  plugins/filters must be disabled.
* There is some kind of problem with libcurl that I have not solved,
  so all uses of libcurl (currently DAP+Byterange) must be disabled.

Misc. other fixes:
------------------
* Cleanup the relationship between ENABLE_PLUGINS and various other flags
  in CMakeLists.txt and configure.ac.
* Re-arrange the TESTDIRS order in Makefile.am.
* Add pseudo-breakpoint to nclog.[ch] for debugging.
* Improve the documentation of the path manager code in ncpathmgr.h
* Add better support for relative paths in dpathmgr.c
* Default the mode args to NCfopen to include "b" (binary) for windows.
* Add optional debugging output in various places.
* Make sure that everything builds with plugins disabled.
* Fix numerous (s)printf inconsistencies between the format spec
  and the arguments.
2021-12-23 22:18:56 -07:00
Dennis Heimbigner
73caeb674d Cleanup the CMake inter-test dependencies
The ncdump test set has a number of inter-test dependencies
that are not properly established in ncdump/CMakeLists.txt.

So this PR attempts to:
1. reorder the tests
2. change tests in CMakeLists.txt from build_bin_test_no_prefix to add_bin_test_no_prefix so they get executed

Plus a couple of minor bug fixes.
1. Change ENABLE_NC4 => ENABLE_HDF5 in github action.
2. fix a memory error in findtestserver.c.in
3. fix bug in ncdap_tests/tst_urls.sh
4. fix netcdf file name bug in tst_netcdf4_4.sh
2021-12-20 15:13:08 -07:00
Dennis Heimbigner
11fe00ea05 Add filter support to NCZarr
Filter support has three goals:

1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled

Detailed usage directions are defined in docs/filters.md.

For now, the existing filter API is left in place. So filters
are defined using ''nc_def_var_filter'' using the HDF5 style
where the id and parameters are unsigned integers.
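
To make the relationship between the two metadata styles concrete,
here is a rough sketch of rendering HDF5-style deflate parameters as a
NumCodecs-style JSON codec description. The helper name is
hypothetical and this is not the Codec code in netcdf-c; the exact
JSON the library emits may differ. The filter id 1 is HDF5's deflate
id, and `{"id": "zlib", "level": N}` is the NumCodecs zlib
configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FILTER_ID_DEFLATE 1u   /* matches HDF5's H5Z_FILTER_DEFLATE */

/* Hypothetical helper: render HDF5-style deflate parameters (filter id
   plus one unsigned level) as a NumCodecs-style JSON codec description.
   Returns 0 on success, -1 for an unsupported filter, bad parameters,
   or a buffer that is too small. */
static int deflate_to_codec_json(unsigned int id, size_t nparams,
                                 const unsigned int *params,
                                 char *buf, size_t buflen)
{
    if (id != FILTER_ID_DEFLATE || nparams != 1 || params[0] > 9)
        return -1;
    int n = snprintf(buf, buflen,
                     "{\"id\": \"zlib\", \"level\": %u}", params[0]);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The unsigned-integer form stays the input, as with nc_def_var_filter,
and the JSON form is derived from it for storage in Zarr metadata.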

This is a big change since filters affect many parts of the code.

In the following, the terms "compressor" and "filter" and "codec" are generally
used synonymously.

### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new tests to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.

### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.

### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new tests to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-02 17:04:26 -06:00
Ward Fisher
18086dae10 Correct a typo in the github actions yaml file. 2021-08-25 14:32:01 -06:00
Ward Fisher
7ee0281c74 Test dependency on one-offs updated. 2021-08-25 14:31:09 -06:00
Ward Fisher
48d943864d Syntax debugging. 2021-08-25 14:28:59 -06:00
Ward Fisher
d6f06bf3d6 A bit more twiddling of the github actions workflow. 2021-08-25 14:25:41 -06:00
Ward Fisher
8fc52da9d7 Thanks to newer ctest functionality, if ctest tests fail, re-run just the failed ones with more verbose output. 2021-08-25 12:43:04 -06:00
Ward Fisher
40c3fc2169 Reverse GA workflow. Run the one-off tests first and then run the full test matrix. This should shorten the test/failure cycle. 2021-08-25 11:16:47 -06:00
Ward Fisher
912fd7574c Modify GA workflow a bit. 2021-08-24 15:22:20 -06:00
Ward Fisher
5bcf91f880 Correct spacing issue in GA yaml file. 2021-08-24 15:07:10 -06:00
Ward Fisher
37d6cc9191 Propagate github actions tests into cmake-based stanza. 2021-08-24 14:40:13 -06:00
Ward Fisher
1376067870 Added a one-off stanza for autoconf-based tests on Github Actions. 2021-08-24 14:26:28 -06:00
Edward Hartnett
e8d3198a77 added commas 2021-08-20 05:46:22 -06:00
Edward Hartnett
5c23e72e28 added 1.12.1 to hdf5 versions built by GitHub actions 2021-08-20 05:44:43 -06:00
Dennis Heimbigner
669fd34357 remove push 2021-07-18 13:37:51 -06:00
Ward Fisher
e21ef7bcb0
Merge branch 'master' into dap4fixes2.dmh 2021-06-01 14:11:39 -06:00
Ward Fisher
5b49ee9f3b Temporarily remove distcheck from Github Actions 2021-05-26 13:55:59 -06:00
Ward Fisher
6eaf39f3c3 Revert previous change. 2021-05-26 13:54:40 -06:00
Ward Fisher
a5d7277092 Testing a different theory 2021-05-26 13:44:56 -06:00
Ward Fisher
5f95b7ca9f Temporarily add an ssh-interface when make distcheck fails. 2021-05-26 13:15:12 -06:00
Ward Fisher
ceb9b29e17 Speculating on fix, perhaps a race issue on make distcheck when passed the command to use concurrent processes. 2021-05-26 11:16:22 -06:00
Ward Fisher
e2eb7bb52e Added DISTCHECK_CONFIGURE_FLAGS to Github Actions distcheck stanza. 2021-05-26 10:44:07 -06:00
Ward Fisher
d003f98367 Updated github actions to add make distcheck 2021-05-25 10:55:14 -06:00
Dennis Heimbigner
8ceafa62d4 Improve operation of the DAP4 code and fix bugs
re: e-support EOT-483791

* Add a new set of remote tests based on using the thredds-test server.
* Improve error reporting when server requests fail.
* Fix handling of _NCProperties attribute
2021-05-21 20:46:56 -06:00
Ward Fisher
95719addd2 Added 1.0.1 to test matrix in support of https://github.com/Unidata/netcdf-c/pull/1931#issuecomment-804312933 2021-03-26 14:50:22 -06:00
Dennis Heimbigner
911d0a5deb Enable nczarr testing in github actions.
Changes:
1. add "use_nczarr: [ nczarr_off, nczarr_on ]" to matrix
2. add libzip-dev to the apt installs (might need caching).
3. convert deprecated "--enable-netcdf-4" to "--enable-hdf5" (also for cmake)
2021-01-27 11:30:48 -07:00
Ward Fisher
bb2b864674 Added more updates 2020-12-16 09:34:42 -07:00
Ward Fisher
3e01272cee Added an apt update stanza to the github action script. 2020-12-16 09:19:21 -07:00
Ward Fisher
5fed325180 Cleaned up orphaned github action files, modified GA to run on PR instead of push. 2020-12-07 11:16:04 -07:00
Ward Fisher
4cf290dc0b
Merge pull request #1897 from DennisHeimbigner/oceanunavail.dmh
Disable use of opendap2.oceanbrowser.net
2020-12-04 16:11:47 -07:00
Ward Fisher
7fce4b3482 Clean up, bring over a new action script. This one uses apt and cached builds instead of conda. 2020-12-04 15:38:13 -07:00
Ward Fisher
1814da1f7f Temporarily reducing test matrix. 2020-12-03 12:20:52 -07:00
Ward Fisher
04640abe32 Github Action debugging. 2020-12-03 12:12:07 -07:00
Ward Fisher
9b9ca97712 Explicitly link rt library for github actions. 2020-12-03 12:09:24 -07:00
Ward Fisher
d263a48637 Correct previous 'fix' that wasn't. 2020-12-03 12:03:34 -07:00
Ward Fisher
f48ccb465d Address link issue on github actions platform. 2020-12-03 11:59:25 -07:00
Dennis Heimbigner
c797eb29ad remove workflows and fix ifdef 2020-12-02 16:36:09 -07:00
Dennis Heimbigner
af89295852 ignore 2020-12-01 22:09:13 -07:00
Dennis Heimbigner
5b8842b373
manual.yml 2020-12-01 22:06:16 -07:00
Dennis Heimbigner
62036cbf45
ignore.yml 2020-12-01 22:01:44 -07:00
Dennis Heimbigner
8cf8c31df4
automake-dmh.yml 2020-12-01 21:59:52 -07:00
Dennis Heimbigner
c164650cb6 rename 2020-12-01 21:58:10 -07:00
Dennis Heimbigner
6cdddd7596 update workflows 2020-12-01 21:53:33 -07:00
Dennis Heimbigner
0aa80f91d6 manual workflow 2020-12-01 21:48:39 -07:00
Ward Fisher
1ba45b9e9a Adding github actions to netcdf-c for CI purposes. 2020-11-30 11:21:51 -07:00
Ward Fisher
cca0f8d46e Moved code of conduct and contributing to .github folder. 2019-10-08 10:27:32 -06:00
Ward Fisher
a747922bff Adopted codeowners file from metpy project; this is a technical ownership and is not asserting ownership of code/copyright. Any code that does not have an obvious owner (from the perspective of github) will default to @WardF and @DennisHeimbigner 2018-03-16 14:46:08 -06:00
Ward Fisher
d813448e2b Put CONTRIBUTING.md back in top level of repository. 2016-02-24 10:55:56 -07:00