re: Discussion https://github.com/Unidata/netcdf-c/discussions/2214
The primary change is to support so-called "standard filters".
A standard filter is one that is defined by the following
netcdf-c API:
````
int nc_def_var_XXX(int ncid, int varid, size_t nparams, unsigned* params);
int nc_inq_var_XXXX(int ncid, int varid, int* usefilterp, unsigned* params);
````
So for example, zstandard would be a standard filter by defining
the functions *nc_def_var_zstandard* and *nc_inq_var_zstandard*.
In order to define these functions, we need a new dispatch function:
````
int nc_inq_filter_avail(int ncid, unsigned filterid);
````
This function, combined with the existing filter API can be used
to implement arbitrary standard filters using a simple code pattern.
Note that I would have preferred that this function return a list
of all available filters, but HDF5 does not support that functionality.
So this PR implements the dispatch function and implements
the following standard functions:
+ bzip2
+ zstandard
+ blosc
Specific test cases are also provided for HDF5 and NCZarr.
Over time, other specific standard filters will be defined.
## Primary Changes
* Add nc_inq_filter_avail() to netcdf-c API.
* Add standard filter implementations to test use of *nc_inq_filter_avail*.
* Bump the dispatch table version number and add to all the relevant
dispatch tables (libsrc, libsrcp, etc).
* Create a program to invoke nc_inq_filter_avail so that it is accessible
to shell scripts.
* Cleanup szip support to properly support szip
when HDF5 is disabled. This involves detecting
libsz separately from testing if HDF5 supports szip.
* Integrate shuffle and fletcher32 into the existing
filter API. This means that, for example, nc_def_var_fletcher32
is now a wrapper around nc_def_var_filter.
* Extend the Codec defaulting to allow multiple default shared libraries.
## Misc. Changes
* Modify configure.ac/CMakeLists.txt to look for the relevant
libraries implementing standard filters.
* Modify libnetcdf.settings to list available standard filters
(including deflate and szip).
* Add CMake test modules to locate libbz2 and libzstd.
* Cleanup the HDF5 memory manager function use in the plugins.
* remove unused file include//ncfilter.h
* remove tests for the HDF5 memory operations e.g. H5allocate_memory.
* Add flag to ncdump to force use of _Filter instead of _Deflate
or _Shuffle or _Fletcher32. Used for testing.
re: https://github.com/Unidata/netcdf-c/issues/1836
Revert the internal filter code to simplify it. From the user's
point of view, the only visible changes should be:
1. The functions that convert text to filter specs have had their signature reverted and have been moved to netcdf_aux.h
2. Some filter API functions now return NC_ENOFILTER when inquiry is made about some filter.
Internally,the dispatch table has been modified to get rid of the filter_actions
entry and associated complex structures. It has been replaced with
inq_var_filter_ids and inq_var_filter_info entries and the dispatch table
version has been bumped to 3. Corresponding NOOP and NOTNC4 functions
were added to libdispatch/dnotnc4.c. Also, the filter_action entries
in dispatch tables were replaced for all dispatch code bases (HDF5, DAP2,
etc). This should only impact UDF users.
In the process, it became clear that the form of the filters
field in NC_VAR_INFO_T was format dependent, so I converted it to
be of type void* and pushed its management into the various dispatch
code bases. Specifically libhdf5 and libnczarr now manage the filters
field in their own way.
The auxilliary functions for parsing textual filter specifications
were moved to netcdf_aux.h and were renamed to the following:
* ncaux_h5filterspec_parse
* ncaux_h5filterspec_parselist
* ncaux_h5filterspec_free
* ncaux_h5filter_fix8
Misc. Other Changes:
1. Document NUG/filters.md updated to reflect the changes above.
2. All the old data types (structs and enums)
used by filter_actions actions were deleted.
The exception is the NC_H5_Filterspec because it is needed
by ncaux_h5filterspec_parselist.
3. Clientside filters were removed -- another enhancement
for which no-one ever asked.
4. The ability to remove filters was itself removed.
5. Some functionality needed by nczarr was moved from libhdf5
to libsrc4 e.g. nc4_find_default_chunksizes
6. All the filterx code was removed
7. ncfilter.h and nc4filter.c no longer used
Misc. Unrelated Changes:
1. The nczarr_test makefile clean was leaving some directories; so
add clean-local to take care of them.
disengagement of enable-netcdf4 from enable-hdf5.
That is, with the advent of nczarr, it is possible
to turn off hdf5 but still need netcdf-4 enabled
because nczarr uses libsrc4, but not libhdf5.
This change involves a bunch of things:
1. Modify configure.ac and CMakelist to make enable_hdf5
control if hdf5 support is provided. For back compatibility,
disable-netcdf4 is treated as disable-hdf5. But internally,
netcdf4 support is controlled only by the enabling of formats
that require it.
2. In support of #1, modify .travis.yml to use enable/disable-hdf5
instead of enable/disable-netcdf4.
3. test_common.in is modified to track selected features,
including enable-hdf5 and enable-s3-tests. This is used in
selected tests that mix netcdf-3 and netcdf4 tests.
4. The conflation of USE_HDF5 and USE_NETCDF4 is common in
code, tests, and build files, so all of those had to be weeded out.
5. It turns out that some of the NC4_dim functions really are HDF5 specific,
but are not treated as such. So they are moved from nc4dim.c to
hdf5dim.c or hdf5dispatch.c
6. Some generic functions in libhdf5 can be (and were) moved to libsrc4.
re: https://github.com/Unidata/netcdf-c/issues/1584
Support has been added for multiple filters per variable. This
affects a number of components in netcdf. The new APIs are
documented in NUG/filters.md.
The primary changes are:
* A set of new functions are provided (see __include/netcdf_filter.h__).
- Obtain a list of the filters associated with a variable
- Obtain the parameters for a specific filter.
* The existing __nc_inq_var_filter__ function now returns info
about the first defined filter.
* The utilities (ncgen, ncdump, and nccopy) now support
an extended format for specifying a sequence of filters.
The general form is __<filter>|<filter>..._.
* The ncdump **_Filter** attribute now dumps a list of all the
filters associated with a variable using the above new format.
* Filter specifications can now use a filter name instead of number
for filters known to the netcdf library, which in turn is taken
from the HDF5 filter registration page.
* New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter
is returned if an attempt is made to access an unknown filter.
* Internally, the dispatch table has been extended to add a function
to handle all of the filter functions.
* New, filter-related, tests were added to nc_test4.
* A new plugin was added to the plugins directory to help with testing.
Notes:
1. The shuffle and fletcher32 filters are not part of the multifilter system.
Misc. changes:
1. A debug module was added to libhdf5 to help catch error locations.
Partially address: https://github.com/Unidata/netcdf-c/issues/1056
Currently, some of the entries in the dispatch table
are conditional'd on USE_NETCDF4.
As a step in upgrading the dispatch table for use
with user-defined tables, we remove that conditional.
This means that all dispatch tables must implement the
netcdf-4 specific functions even if only to make them
return NC_ENOTNC4. To simplify this, a set of default
functions are defined in libdispatch/dnotnc4.c to provide this
behavior. The file libdispatch/dnotnc3.c is also relevant to
this.
The primary fix is to modify the various dispatch tables to
remove the conditional and use the functions in
libdispatch/dnotnc4.c as appropriate. In practice, all of the
existing tables are prepared to handle this, so the only
real change is to remove the conditionals.
Misc. Unrelated fixes
1. Fix some annoying warnings in ncvalidator.
Notes:
1. This has not been tested with either pnetcdf or hdf4 enabled.
When those are enabled, it is possible that there are still
some conditionals that need to be fixed.
re: https://github.com/Unidata/netcdf-c/issues/1388
1. Centralize calls to curl_global_init and curl_global_cleanup
to libdispatch/ddispatch.c
2. Make the above calls if options require curl: currently
any of DAP2, DAP4, or byterange.
3. Side issue: Fix obscure bug in mmapio.c involving non-persistent mmap.
re: https://github.com/Unidata/netcdf-c/issues/1373 (partial)
* Mark some global constants be const to indicate to make them easier to track.
* Hide direct access to the ncrc_globalstate behind a function call.
* Convert dispatch tables to constants (except the user defined ones)
This has some consequences in terms of function arguments needing to be marked
as const also.
* Remove some no longer needed global fields
* Aggregate all the globals in nclog.c
* Uniformly replace nc_sizevector{0,1} with NC_coord_{zero,one}
* Uniformly replace nc_ptrdffvector1 with NC_stride_one
* Remove some obsolete code
re: issue https://github.com/Unidata/netcdf-c/issues/1251
Assume that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.
This PR allows the netcdf-c to read that dataset's
contents as a netcdf file using HTTP byte ranges
if the remote server supports byte-range access.
Originally, this PR was set up to access Amazon S3 objects,
but it can also access other remote datasets such as those
provided by a Thredds server via the HTTPServer access protocol.
It may also work for other kinds of servers.
Note that this is not intended as a true production
capability because, as is known, this kind of access to
can be quite slow. In addition, the byte-range IO drivers
do not currently do any sort of optimization or caching.
An additional goal here is to gain some experience with
the Amazon S3 REST protocol.
This architecture and its use documented in
the file docs/byterange.dox.
There are currently two test cases:
1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle
for a remote netcdf-3 file and a remote netcdf-4 file.
2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote
datasets.
This PR also incorporates significantly changed model inference code
(see the superceded PR https://github.com/Unidata/netcdf-c/pull/1259).
1. It centralizes the code that infers the dispatcher.
2. It adds support for byte-range URLs
Other changes:
1. NC_HDF5_finalize was not being properly called by nc_finalize().
2. Fix minor bug in ncgen3.l
3. fix memory leak in nc4info.c
4. add code to walk the .daprc triples and to replace protocol=
fragment tag with a more general mode= tag.
Final Note:
Th inference code is still way too complicated. We need to move
to the validfile() model used by netcdf Java, where each
dispatcher is asked if it can process the file. This decentralizes
the inference code. This will be done after all the major new
dispatchers (PIO, Zarr, etc) have been implemented.