Commit Graph

506 Commits

Author SHA1 Message Date
Ward Fisher
7d2a646f25 Merge branch 'ejh_fix_redef' of https://github.com/NOAA-GSD/netcdf-c into NOAA-GSD-ejh_fix_redef 2020-07-09 13:55:37 -06:00
Edward Hartnett
3e60a863de fixed warning in hdf5filter.c 2020-07-08 11:24:54 -06:00
Edward Hartnett
832fbf19c8 now dont return error on second redef call for netcdf/HDF5 files 2020-07-08 11:10:15 -06:00
Edward Hartnett
6c112efb8e fixed problem setting szip on var with unlimited dim and added test 2020-07-02 10:55:34 -06:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Greg Sjaardema
edf0ca6c98
Avoid potential integer overrun
It is possible for the values stored to `file_value_size` to overrun the storage capacity of a 32-bit integer.  The value does need to store negative values potentially, so can be `size_t` or `hsize_t`, so use the `hssize_t` which is a signed 64-bit value.  Could also use `ssize_t`, but that is not used in this routine...
2020-06-10 15:42:22 -06:00
Dennis Heimbigner
84c69afca7 Allow redefinition of variable filters
re: Github issue https://github.com/Unidata/netcdf-c/issues/1713

If nc_def_var_filter or nc_def_var_deflate or nc_def_var_szip is
called multiple times with the same filter id, but possibly with
different sets of parameters, then the first invocation is
sticky and later invocations are ignored. The desired behavior
is to have the last invocation be used.

This PR implements that desired behavior, with some special
cases.  If you call nc_def_var_deflate multiple times, then the
last invocation rule applies with respect to deflate. However,
the shuffle filter, if enabled, is always applied just before
applying deflate.

Misc unrelated changes:
1. Make client-side filters be disabled by default
2. Fix the definition of uintptr_t and use in oc2 and libdap4
3. Add some test cases
4. modify filter order tests to use plugin filters rather
   than client-side filters
2020-05-11 09:42:31 -06:00
Edward Hartnett
6aa6eff710 now properly setting HDF5 file cache for files created/opened sequentially on parallel IO builds 2020-05-08 11:00:56 -06:00
Edward Hartnett
e3c9e83ecf adding internal function, plus some documentation 2020-05-08 08:58:42 -06:00
Greg Sjaardema
3e919a568f
Remove line that was missed in original patch 2020-04-30 14:00:18 -06:00
Greg Sjaardema
1db3d07beb
Proof-of-Concept: Avoid N^2 behavior in NC4_inq_dim
The current library seems to have some behavior which is N^2 in the number of vars in a file.

The `NC4_inq_dim` routine calls down to `nc4_find_dim_len` which iterates through each `var` in the file/group and calls `find_var_dim_max_length` on each var and finds the largest length of the dim on each of those vars. This is done only for unlimited vars.

I have a file with 129 dim and 1630 vars.  The unlimited dimension is of length 41.  In my test program, I am reading data from 4 files which have the same dim and var count and reading every 4th time step (unlimited dimension).  If I run a profile, I see that 98.2% of the program time is in the `nc_get_vara_float` call tree and most of that is in `find_var_dim_max_length` (94.8%).

There are 66,142 calls to `nc_get_vara_float` resulting in 107,307,290 calls to `find_var_dim_max_length` with twice that number of calls to `malloc/free` and calls to 5 HDF5 routines.  All of this, at least in my case, to return the same `41` each time.

The proof of concept patch here will check whether the file is read-only (or no_write) and if so, it will cache the value of the dim length the first time it is calculated.   With this change, my example run is sped up by a factor of 60.  The time for `NC4_inq_dim` and below drops from 97.2% down to 2.7%.

I'm not sure whether this is the correct fix, or if there is some behavior that I am overlooking, but my users would definitely like a 10 second run compared to a 10 minute run... 

This is on current Netcdf master branch.

I will try to attach some valgrind/callgrind profiles.
2020-04-30 11:01:10 -06:00
Scot Breitenfeld
7b1b06b5ca Merge remote-tracking branch 'upstream/master' 2020-04-23 15:36:14 -05:00
Dennis Heimbigner
b0e0d81aa9 Fix reclamation of the ->format_XXX_info fields
nc4internal.c contains code to free the format_XXX_info
fields. Since these are format specific, this code
was moved to the dispatch code (libhdf5 and libhdf4
in the current case).

Additionally, there are some fields in nc4internal.h (e.g.
dimscale fields) that are specific to HDF5 and have been moved
to the corresponding HDF5 data structures and code.

Misc. other changes:
1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid
   implying it is necessarily HDF5 specific.
2. prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency.
   this also requires wrapping move_in_NCList() to keep
   hdr.id consistent.
2020-03-29 12:48:59 -06:00
Edward Hartnett
e7b9b1b587 fixed documentation of cache int functions 2020-03-24 15:02:42 -06:00
Scot Breitenfeld
c5d2e99417 Updated to use H5O_info2_t for HDF5 1.12 and the use of H5Oget_info3 instead of H5Gget_objinfo 2020-03-12 15:50:24 +00:00
Edward Hartnett
b29f9f34a0 whitespace cleanup 2020-03-08 09:10:07 -06:00
Edward Hartnett
4c7e162f34 less use of contiguous/compact field 2020-03-08 07:31:21 -06:00
Edward Hartnett
053752440b stop setting contiguous field in nc4hdf5.c 2020-03-08 07:18:52 -06:00
Edward Hartnett
04eafff166 stop setting contiguous field in hdf5filter.c 2020-03-08 07:18:11 -06:00
Edward Hartnett
5574317db7 stop setting contiguous/compact fields at file open 2020-03-08 07:17:01 -06:00
Edward Hartnett
61357cfd4d more use of storage field 2020-03-08 07:09:15 -06:00
Edward Hartnett
1761850795 continuing to switch to storage field 2020-03-08 07:05:51 -06:00
Edward Hartnett
b98a37e0b3 using storage field in nc4var.c 2020-03-08 06:38:44 -06:00
Edward Hartnett
119e8e9465 using storage in hdf5filter.c 2020-03-08 06:31:34 -06:00
Edward Hartnett
8dec9f6c99 now setting storage field when setting var storage 2020-03-08 06:29:49 -06:00
Edward Hartnett
d87a073a34 starting to use storage field when opening file 2020-03-08 06:21:08 -06:00
Edward Hartnett
0c419ec582 removed commented-out code 2020-03-06 09:57:33 -07:00
Edward Hartnett
502336c2c7 now return NC_EINVAL on attempt to set chunking on scalar var 2020-03-03 11:57:16 -07:00
Dennis Heimbigner
73537603e2 Make scalar X filter return an error instead of ignoring it 2020-03-02 15:10:54 -07:00
Dennis Heimbigner
420fdf4625 fix memory allocation failure in hdf5var.c 2020-03-02 11:45:41 -07:00
Dennis Heimbigner
7d1ca9ac85 fix references to var->deflate' 2020-03-02 11:12:30 -07:00
Dennis Heimbigner
e66c727c28 Fix Filters x compact 2020-02-29 15:33:27 -07:00
Dennis Heimbigner
f376c23329 Make utilities support NC_COMPACT
re: https://github.com/Unidata/netcdf-c/issues/1642

Modify ncdump, nccopy, and ncgen to support the NC_COMPACT storage option.
Added test cases and added description to the man pages for the utilities.

1. ncdump: For compact storage variable, print special attribute __Storage_ as
````
    <var>: _Storage = "compact";
````

2. ncgen: parse and implement
````
    <var>: _Storage = "compact";
````
in a .cdl file

3. nccopy: Extend the chunk specification (-c flag) to support
   compact using the forms
````
nccopy ... -c <var>:compact
and
nccopy ... -c <var>:contiguous
````

Misc. other changes
1. cleanup the copy_chunking function in ncdump/nccopy.c
2020-02-29 12:06:21 -07:00
Dennis Heimbigner
10d227fc1b fix parallel filter error discovered by Hartnett 2020-02-28 11:36:58 -07:00
Dennis Heimbigner
a3a3e15cb1 fix bad edit 2020-02-27 15:33:39 -07:00
Dennis Heimbigner
afe5a2998c
Merge branch 'master' into multifilter.dmh 2020-02-27 15:02:27 -07:00
Dennis Heimbigner
b488c272d5 Fix conflicts with master 2020-02-27 14:06:45 -07:00
Edward Hartnett
6f95f655dd
Merge branch 'master' into ejh_dispatch 2020-02-26 16:12:57 -07:00
Edward Hartnett
418e428a05 fixed problem with scalar compact 2020-02-26 09:13:12 -07:00
Edward Hartnett
b31aedcc8e all tests passing but compact storage for scalars not being properly written in file yet 2020-02-26 08:14:06 -07:00
Edward Hartnett
6241a6e7a0 more tests for storage, changed var names to reflect stortage 2020-02-25 15:55:34 -07:00
Edward Hartnett
2ff24bd6fe more tests for compact storage 2020-02-25 13:30:38 -07:00
Dennis Heimbigner
44d0dcaad2 Add support for multiple filters per variable.
re: https://github.com/Unidata/netcdf-c/issues/1584

Support has been added for multiple filters per variable.  This
affects a number of components in netcdf. The new APIs are
documented in NUG/filters.md.

The primary changes are:
* A set of new functions are provided (see __include/netcdf_filter.h__).
    - Obtain a list of the filters associated with a variable
    - Obtain the parameters for a specific filter.
* The existing __nc_inq_var_filter__ function now returns info
  about the first defined filter.
* The utilities (ncgen, ncdump, and nccopy) now support
  an extended format for specifying a sequence of filters.
  The general form is __<filter>|<filter>..._.
* The ncdump **_Filter** attribute now dumps a list of all the
  filters associated with a variable using the above new format.
* Filter specifications can now use a filter name instead of number
  for filters known to the netcdf library, which in turn is taken
  from the HDF5 filter registration page.
* New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter
  is returned if an attempt is made to access an unknown filter.
* Internally, the dispatch table has been extended to add a function
  to handle all of the filter functions.
* New, filter-related, tests were added to nc_test4.
* A new plugin was added to the plugins directory to help with testing.

Notes:
1. The shuffle and fletcher32 filters are not part of the multifilter system.

Misc. changes:
1. A debug module was added to libhdf5 to help catch error locations.
2020-02-16 12:59:33 -07:00
Edward Hartnett
a8684c730c fixed merge conflict in RELEASE_NOTES 2020-02-15 06:42:49 -07:00
Edward Hartnett
05a6ff74b2 merged changes from master 2020-02-11 17:19:53 -07:00
Edward Hartnett
15059a18b7 merged changes from master 2020-02-11 17:19:25 -07:00
Edward Hartnett
a0839a2a7a added version to dispatch table 2020-02-09 13:07:58 -07:00
Edward Hartnett
b7ac19a43f only close non-zero typeids 2020-02-09 12:03:21 -07:00
Edward Hartnett
af6b6787bf fix for memory leak due to HDF5 types 2020-02-09 11:47:13 -07:00
Edward Hartnett
8057a552ef move nc_def_var_szip function so it will appear in the documentation 2020-02-07 09:09:01 -07:00