Commit Graph

114 Commits

Author SHA1 Message Date
Ward Fisher
d281be2333
Merge branch 'main' into open_mem_truncated_file 2022-03-14 15:02:02 -06:00
Dennis Heimbigner
8b9253fef2 Fix various problem around VLEN's
re: https://github.com/Unidata/netcdf-c/issues/541
re: https://github.com/Unidata/netcdf-c/issues/1208
re: https://github.com/Unidata/netcdf-c/issues/2078
re: https://github.com/Unidata/netcdf-c/issues/2041
re: https://github.com/Unidata/netcdf-c/issues/2143

For a long time, there have been known problems with the
management of complex types containing VLENs.  This also
involves the string type because it is stored as a VLEN of
chars.

This PR (mostly) fixes this problem. But note that it adds new
functions to netcdf.h (see below) and this may require bumping
the .so number.  These new functions can be removed, if desired,
in favor of functions in netcdf_aux.h, but netcdf.h seems the
better place for them because they are intended as alternatives
to the nc_free_vlen and nc_free_string functions already in
netcdf.h.

The term complex type refers to any type that directly or
transitively references a VLEN type. So an array of VLENS, a
compound with a VLEN field, and so on.

In order to properly handle instances of these complex types, it
is necessary to have function that can recursively walk
instances of such types to perform various actions on them.  The
term "deep" is also used to mean recursive.

At the moment, the two operations needed by the netcdf library are:
* free'ing an instance of the complex type
* copying an instance of the complex type.

The current library does only shallow free and shallow copy of
complex types. This means that only the top level is properly
free'd or copied, but deep internal blocks in the instance are
not touched.

Note that the term "vector" will be used to mean a contiguous (in
memory) sequence of instances of some type. Given an array with,
say, dimensions 2 X 3 X 4, this will be stored in memory as a
vector of length 2*3*4=24 instances.

The use cases are primarily these.

## nc_get_vars
Suppose one is reading a vector of instances using nc_get_vars
(or nc_get_vara or nc_get_var, etc.).  These functions will
return the vector in the top-level memory provided.  All
interior blocks (form nested VLEN or strings) will have been
dynamically allocated.

After using this vector of instances, it is necessary to free
(aka reclaim) the dynamically allocated memory, otherwise a
memory leak occurs.  So, the recursive reclaim function is used
to walk the returned instance vector and do a deep reclaim of
the data.

Currently functions are defined in netcdf.h that are supposed to
handle this: nc_free_vlen(), nc_free_vlens(), and
nc_free_string().  Unfortunately, these functions only do a
shallow free, so deeply nested instances are not properly
handled by them.

Note that internally, the provided data is immediately written so
there is no need to copy it. But the caller may need to reclaim the
data it passed into the function.

## nc_put_att
Suppose one is writing a vector of instances as the data of an attribute
using, say, nc_put_att.

Internally, the incoming attribute data must be copied and stored
so that changes/reclamation of the input data will not affect
the attribute.

Again, the code inside the netcdf library does only shallow copying
rather than deep copy. As a result, one sees effects such as described
in Github Issue https://github.com/Unidata/netcdf-c/issues/2143.

Also, after defining the attribute, it may be necessary for the user
to free the data that was provided as input to nc_put_att().

## nc_get_att
Suppose one is reading a vector of instances as the data of an attribute
using, say, nc_get_att.

Internally, the existing attribute data must be copied and returned
to the caller, and the caller is responsible for reclaiming
the returned data.

Again, the code inside the netcdf library does only shallow copying
rather than deep copy. So this can lead to memory leaks and errors
because the deep data is shared between the library and the user.

# Solution

The solution is to build properly recursive reclaim and copy
functions and use those as needed.
These recursive functions are defined in libdispatch/dinstance.c
and their signatures are defined in include/netcdf.h.
For back compatibility, corresponding "ncaux_XXX" functions
are defined in include/netcdf_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants. The first two, nc_reclaim_data() and
nc_copy_data(), assume the top-level vector is managed by the
caller. For reclaim, this is so the user can use, for example, a
statically allocated vector. For copy, it assumes the user
provides the space into which the copy is stored.

The second two, nc_reclaim_data_all() and
nc_copy_data_all(), allows the functions to manage the
top-level.  So for nc_reclaim_data_all, the top level is
assumed to be dynamically allocated and will be free'd by
nc_reclaim_data_all().  The nc_copy_data_all() function
will allocate the top level and return a pointer to it to the
user. The user can later pass that pointer to
nc_reclaim_data_all() to reclaim the instance(s).

# Internal Changes
The netcdf-c library internals are changed to use the proper
reclaim and copy functions.  It turns out that the places where
these functions are needed is quite pervasive in the netcdf-c
library code.  Using these functions also allows some
simplification of the code since the stdata and vldata fields of
NC_ATT_INFO are no longer needed.  Currently this is commented
out using the SEPDATA \#define macro.  When any bugs are largely
fixed, all this code will be removed.

# Known Bugs

1. There is still one known failure that has not been solved.
   All the failures revolve around some variant of this .cdl file.
   The proximate cause of failure is the use of a VLEN FillValue.
````
        netcdf x {
        types:
          float(*) row_of_floats ;
        dimensions:
          m = 5 ;
        variables:
          row_of_floats ragged_array(m) ;
              row_of_floats ragged_array:_FillValue = {-999} ;
        data:
          ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32},
                         {40, 41}, _ ;
        }
````
When a solution is found, I will either add it to this PR or post a new PR.

# Related Changes

* Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data.
* Remove the --enable-unfixed-memory-leaks option.
* Remove the NC_VLENS_NOTEST code that suppresses some vlen tests.
* Document this change in docs/internal.md
* Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh.
* Mark types as fixed size or not (transitively) to optimize the reclaim
  and copy functions.

# Misc. Changes

* Make Doxygen process libdispatch/daux.c
* Make sure the NC_ATT_INFO_T.container field is set.
2022-01-08 18:30:00 -07:00
Dennis Heimbigner
f6e25b695e Fix additional S3 support issues
re: https://github.com/Unidata/netcdf-c/issues/2117
re: https://github.com/Unidata/netcdf-c/issues/2119

* Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case.
* The aws sdk can sometimes cause problems if the Awd::ShutdownAPI function is not called. So at optional atexit() support to ensure it is called. This is disabled for Windows.
* Add documentation to nczarr.md on how to build and use the aws sdk under windows. Currently it builds, but testing fails.
* Switch testing from stratus to the Unidata bucket on S3.
* Improve support for the s3: url protocol.
* Add a s3 specific utility code file: ds3util.c
* Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3.

## Misc.

* Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it now used in libsrc as well as libnczarr.
* Add calls to nc_finalize in the utilities in case atexit is disabled.
* Add header only json parser to the distribution rather than as a built source.
2021-10-29 20:06:37 -06:00
Tobias Kölling
b58a3ff07a allow missing udata when closing file with abort=1
If a memory backed file is closed due to an aborted opening (i.e. a
broken file was attempted to be opened), udata may not be set. In this
case, the assertation checking for udata should not be triggered.
2021-08-24 17:11:05 +02:00
Dennis Heimbigner
e1c470683c Fix merge error from PR https://github.com/Unidata/netcdf-c/pull/1892/files 2020-12-01 20:10:48 -07:00
Ward Fisher
c96722e001
Merge pull request #1890 from gsjaardema/patch-46
Fix undefined struct member access
2020-12-01 14:41:27 -07:00
Dennis Heimbigner
68bcd1122a Enforce that !ENABLE_BYTERANGE => !ENABLE_HDF5_ROS3
This is a follow on to PR https://github.com/Unidata/netcdf-c/pull/1890

Modify configure.ac to enforce that
!ENABLE_BYTERANGE => !ENABLE_HDF5_ROS3
2020-11-28 13:00:06 -07:00
Greg Sjaardema
db0b84252c
Fix undefined struct member access
The `http` field of the hdf5 info struct is not defined unless `ENABLE_HDF5_ROS3` or `ENABLE_BYTERANGE` or `ENABLE_S3_SDK` is defined.  Based on a quick look at the code, I think that the `ENABLE_HDF5_ROS3` define is the relavant one here.  Maybe a better fix is to check if any of them are defined...
2020-11-24 08:25:29 -07:00
Dennis Heimbigner
eb3d9eb0c9 Provide a Number of fixes/improvements to NCZarr
Primary changes:
* Add an improved cache system to speed up performance.
* Fix NCZarr to properly handle scalar variables.

Misc. Related Changes:
* Added unit tests for extendible hash and for the generic cache.
* Add config parameter to set size of the NCZarr cache.
* Add initial performance tests but leave them unused.
* Add CRC64 support.
* Move location of ncdumpchunks utility from /ncgen to /ncdump.
* Refactor auth support.

Misc. Unrelated Changes:
* More cleanup of the S3 support
* Add support for S3 authentication in .rc files: HTTP.S3.ACCESSID and HTTP.S3.SECRETKEY.
* Remove the hashkey from the struct OBJHDR since it is never used.
2020-11-19 17:01:04 -07:00
Edward Hartnett
832fbf19c8 now dont return error on second redef call for netcdf/HDF5 files 2020-07-08 11:10:15 -06:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Dennis Heimbigner
b0e0d81aa9 Fix reclamation of the ->format_XXX_info fields
nc4internal.c contains code to free the format_XXX_info
fields. Since these are format specific, this code
was moved to the dispatch code (libhdf5 and libhdf4
in the current case).

Additionally, there are some fields in nc4internal.h (e.g.
dimscale fields) that are specific to HDF5 and have been moved
to the corresponding HDF5 data structures and code.

Misc. other changes:
1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid
   implying it is necessarily HDF5 specific.
2. prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency.
   this also requires wrapping move_in_NCList() to keep
   hdr.id consistent.
2020-03-29 12:48:59 -06:00
Ed Hartnett
76d6b55eff moved call to nc4_rec_grp_del() to inside nc4_nc4f_list_del() 2019-07-16 16:29:06 -06:00
Ed Hartnett
4398cad8f5 whitespace cleanup 2019-07-16 16:17:07 -06:00
Ed Hartnett
b8e50c9254 moved freeing of allvars, alldims, alltypes lists to nc4_nc4f_list_del 2019-07-16 16:16:11 -06:00
Ed Hartnett
e9666f7333 moved free(h5) intonc4_nc4f_list_del 2019-07-16 16:07:21 -06:00
Ed Hartnett
d840c1864c removed unused prototype 2019-07-16 16:02:08 -06:00
Dennis Heimbigner
2eb1a8d8cf For some reason, the code for this was incorrect.
Anyway, I repaired it as follows:
1. Created NC4_write_provenance as parallel to NC4_read_provenance
2. Modified hdf5file.c to use NC4_write_provenance
3. Modified hdf5open.c to use NC4_read_provenance (was NC4_read_ncproperties).
4. The creation of the _NCProperties string was seriously hosed:
   was using all the wrong fields.
2019-04-18 14:23:20 -06:00
Dennis Heimbigner
88a7a1753c Simplify libhdf5/nc5info.c to move to lazy parsing
re: https://github.com/Unidata/netcdf-c/issues/1352

When nc4info.c encounters an _NCProperties attribute
with a version number it does not recognize, it does not
show it correctly.

Solution chosen is to arrange so that accessing the attribute
returns the raw value of the Attribute from the file. This way,
even if the version is unrecognized, it will return something
usable.

The changes were primarily to never attempt to parse the value
of _NCProperties until actually required. Which since they
are currently not used means that parsing never occurs.

Also modified ncdump/tst_fileinfo.sh to include some extra testing

I tested the original failure by changing the value of NCPROPS to 3.
However, there is no way to test this at build time.

Misc. Changes
* Inlined the provenance info in the NC_FILE_INFO_T structure
* Centralized stuff from elsewhere into include/nc_provenance.h

Misc. Unrelated Changes
* Removed/turned off some misc debug output left on by accident
* Fix CPPFLAGS name error in libhdf5/Makefile.am
2019-03-09 20:35:57 -07:00
Ed Hartnett
8b1f5a8fad cleanup of whitespace in HDF5 directory 2019-02-19 05:18:02 -07:00
Ed Hartnett
840d51d035 changed NC_GRP_INFO_T to use atts_read instead of atts_not_read 2019-01-22 08:11:52 -07:00
Ed Hartnett
c6a9948a8e removed unneeded var, fixed broken log statements that cause segfaults 2019-01-20 09:37:13 -07:00
Ed Hartnett
3c9a141ee3 moved function detect_preserve_dimids and made it static 2018-12-11 06:15:47 -07:00
Ed Hartnett
8d31f5b806 clean up 2018-11-07 14:23:55 -07:00
Ed Hartnett
6f4b4ac80d moving attribute HDF5 stuff to libhdf5 2018-11-07 14:21:57 -07:00
Ed Hartnett
5f36a3b425 merged master 2018-11-02 10:00:53 -06:00
Dennis Heimbigner
245961de00 re: github issues
https://github.com/Unidata/netcdf-c/issues/1168
    https://github.com/Unidata/netcdf-c/issues/1163
    https://github.com/Unidata/netcdf-c/issues/1162

This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.

The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.

Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
   Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.

Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
   called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
2018-10-30 20:48:12 -06:00
Ed Hartnett
452f75fadd commented out some tests 2018-10-22 15:04:44 -06:00
Ed Hartnett
958826d0af merged ejh_mem_check 2018-10-22 13:29:46 -06:00
Dennis Heimbigner
e40eb2e950 switch 2018-10-19 11:11:36 -06:00
Ed Hartnett
695e295734 now calling recursive function to close HDF5 objects in file 2018-10-18 03:29:21 -06:00
Ed Hartnett
fa86d3c488 continuing to separate hdf5/libsrc4 file close code 2018-10-18 03:20:39 -06:00
Ed Hartnett
c90ab24b48 moving towards separating HDF5 file close from netcdf4 file close 2018-10-18 03:17:38 -06:00
Dennis Heimbigner
979873f81d Fix provenance memory leak 2018-10-17 11:43:14 -06:00
Dennis Heimbigner
4636584d5b Revert/Improve nc_create + NC_DISKLESS behavior
re: https://github.com/Unidata/netcdf-c/issues/1154

Inadvertently, the behavior of NC_DISKLESS with nc_create() was
changed in release 4.6.1. Previously, the NC_WRITE flag needed
to be explicitly used with NC_DISKLESS in order to cause the
created file to be persisted to disk.

Additional analyis indicated that the current NC_DISKLESS
implementation was seriously flawed.

This PR attempts to clean up and regularize the situation with
respect to NC_DISKLESS control. One important aspect of diskless
operation is that there are two different notions of write.

1. The file is read-write vs read-only when using the netcdf API.
2. The file is persisted or not to disk at nc_close().

Previously, these two were conflated. The rules now are
as follows.

1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API
2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close.
3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2.

The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag.
NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag.

It is also now the case that for netcdf-4, NC_DISKLESS is independent
of NC_INMEMORY and in fact it is an error to specify both flags
simultaneously.

Finally, the MMAP code was fixed to use NC_PERSIST as well.
Also marked MMAP as deprecated.

Also added a test case to test various combinations of NC_DISKLESS,
NC_PERSIST, and NC_WRITE.

This PR affects a number of files and especially test cases
that used NC_DISKLESS.

Misc. Unrelated fixes
1. fixed some warnings in ncdump/dumplib.c
2018-10-10 13:32:17 -06:00
Ed Hartnett
d9ef143d1e separated cache code from hdf5file.c 2018-09-14 13:33:22 -06:00
Ed Hartnett
b501748f58 fixed merge error 2018-09-14 11:56:10 -06:00
Ed Hartnett
a009dab557 fixed inadvertant move of function 2018-09-14 11:39:57 -06:00
Ed Hartnett
e2839c120f
Merge branch 'master' into ejh_hdf5_sep_next_2 2018-09-14 11:33:59 -06:00
Ed Hartnett
eabb690949 fixing merge issue 2018-09-12 09:36:36 -06:00
Ed Hartnett
abf247de92 made dumpopenobjects static in attempt to get appvayor build working 2018-09-12 07:53:31 -06:00
Ed Hartnett
08a2dce904 merged master 2018-09-07 12:40:44 -06:00
Ed Hartnett
8390d572ad
Merge branch 'master' into ejh_hdf5_sep_next 2018-09-06 17:30:37 -06:00
Ward Fisher
784d777bff Merge branch 'master' into provenance.dmh 2018-09-06 15:13:09 -06:00
Ed Hartnett
86e002d794 merged master 2018-09-06 14:01:59 -06:00
Ed Hartnett
4213f10ed6 moved function 2018-09-06 13:57:39 -06:00
Ed Hartnett
80dc5bc0f7 merged master 2018-09-06 12:24:29 -06:00
Ed Hartnett
6c86ad8229 moved functions back 2018-09-06 12:19:17 -06:00
Ward Fisher
fbe0a18b1c
Merge branch 'master' into ejh_loop_cleanup_2 2018-09-05 11:22:55 -06:00
Dennis Heimbigner
d62a9e623c Fix the NC_INMEMORY code to work in all cases with HDF5 1.10.
re: github issue https://github.com/Unidata/netcdf-c/issues/1111

One of the less common use cases for the in-memory feature is
apparently failing with HDF5-1.10.x.  The fix is complicated and
requires significant changes to libhdf5/nc4memcb.c. The current
setup is detailed in the file docs/inmeminternal.dox.

Additionally, it was discovered that the program
nc_test/tst_inmemory.c, which is invoked by
nc_test/run_inmemory.sh, actually was failing because of the
above problem. But the failure is not detected since the script
does not return non-zero value.

Other Changes:
1. Fix nc_test_tst_inmemory to return errors correctly.
2. Make ncdap_tests/findtestserver.c and dap4_tests/findtestserver4.c
   be generated from ncdap_test/findtestserver.c.in.
3. Make LOG() print output to stderr instead of stdout to
   avoid contaminating e.g. ncdump output.
4. Modify the handling of NC_INMEMORY and NC_DISKLESS flags
   to properly handle that NC_DISKLESS => NC_INMEMORY. This
   affects a number of code pieces, especially memio.c.
2018-09-04 11:27:47 -06:00