This PR started as an attempt to add unlimited dimensions to NCZarr.
It did that, but this exposed significant problems with test interference.
So this PR is mostly about fixing -- well mitigating anyway -- test
interference.
The problem of test interference is now documented in docs/internal.md.
The solutions implemented here are also described in that document.
The solution is somewhat fragile, but multiple cleanup mechanisms
are provided. Note that this feature requires that the
AWS command line utility be installed.
## Unlimited Dimensions
The existing NCZarr extensions to Zarr are modified to support unlimited dimensions.
NCZarr extends the Zarr meta-data for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as a dictionary named "_nczarr_group".
Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the named dimensions. The value associated with each dimension name has one of two forms.
Form 1 is a special case of form 2, and is kept for backward compatibility. Whenever a new file is written, it uses form 1 if possible, otherwise form 2.
* Form 1: An integer representing the size of the dimension, which is used for simple named dimensions.
* Form 2: A dictionary with the following keys and values:
    - "size" with an integer value representing the (current) size of the dimension.
    - "unlimited" with a value of either "1" or "0" to indicate if this dimension is an unlimited dimension.
For unlimited dimensions, the size is initially zero, and as variables extend the length of that dimension, the size value for the dimension increases.
That dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension.
This is the standard semantics for unlimited dimensions.
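The two forms of a "dims" entry can be illustrated with a minimal sketch. This is not the libnczarr implementation: the sample ".zgroup" text is a trimmed, hypothetical example, the helper names `normalize_dim`, `encode_dim`, and `extend_dim` are invented for illustration, and accepting the "unlimited" flag as either a number or a string is an assumption.

```python
import json

def normalize_dim(value):
    """Return (size, unlimited) for either form of a "dims" entry."""
    if isinstance(value, int):
        # Form 1: a bare integer size, never unlimited.
        return value, False
    # Form 2: a dictionary with "size" and "unlimited".
    # int() accepts both 1 and "1"; which one is stored is an assumption.
    return int(value["size"]), bool(int(value["unlimited"]))

def encode_dim(size, unlimited):
    """Write form 1 when possible, otherwise form 2."""
    if not unlimited:
        return size
    return {"size": size, "unlimited": 1}

def extend_dim(dims, name, new_length):
    """Grow an unlimited dimension; every array using it sees the new size."""
    size, unlimited = dims[name]
    if not unlimited:
        raise ValueError(f"dimension {name!r} is not unlimited")
    dims[name] = (max(size, new_length), True)

# Hypothetical, trimmed ".zgroup" content (real files carry more keys).
zgroup_text = """
{
  "zarr_format": 2,
  "_nczarr_group": {
    "dims": {
      "lat": 180,
      "time": {"size": 0, "unlimited": 1}
    }
  }
}
"""

raw_dims = json.loads(zgroup_text)["_nczarr_group"]["dims"]
dims = {name: normalize_dim(value) for name, value in raw_dims.items()}

# One array writes 12 records along "time"; the shared size grows for all.
extend_dim(dims, "time", 12)
```

Because the size lives in the group-level "dims" dictionary rather than in each array, a single update is enough to make the extension visible to every array that references the dimension.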
Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.
* Did a partial refactor of the slice handling code in zwalk.c to clean it up.
* Added a number of tests for unlimited dimensions derived from the same test in nc_test4.
* Added several NCZarr specific unlimited tests; more are needed.
* Added a test of endianness.
## Misc. Other Changes
* Modify libdispatch/ncs3sdk_aws.cpp to optionally support use of the
AWS Transfer Utility mechanism. This is controlled by the
`#define TRANSFER` command in that file. It defaults to being disabled.
* Parameterize both the standard Unidata S3 bucket (S3TESTBUCKET) and the netcdf-c test data prefix (S3TESTSUBTREE).
* Fixed an obscure memory leak in ncdump.
* Removed some obsolete unit testing code and test cases.
* Uncovered a bug in the netcdf-c handling of big-endian floats and doubles. Have not fixed yet. See tst_h5_endians.c.
* Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.
* Modify the semantics of zmap\#ncsmap_write to only allow total rewrite of objects.
* Modify the semantics of zodom to properly handle stride > 1.
* Add a truncate operation to the libnczarr zmap code.
## Overwriting
I think I solved the file overwrite problem by doing light name
mangling of the shared library names. With this change the probability
is very small that installing our filter wrappers in a directory will
overwrite code produced by others.
## Default Install Location
I have set up the --with-plugin-dir option to default to installing in
the following locations, in order of preference:
1. If HDF5_PLUGIN_PATH is defined (at build time remember), then the last directory in that path will be where the filter wrapper shared libraries will be installed.
2. Otherwise the default is "/usr/local/hdf5/lib/plugin" (on *nix*) or "%ALLUSERSPROFILE%\\hdf5\\lib\\plugin" for Windows or Mingw.
Currently, --with-plugin-dir is disabled by default.
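The preference order above can be sketched as follows. This is an illustration only, not the logic in configure.ac or CMakeLists.txt: the function name `default_plugin_dir` is hypothetical, and splitting HDF5_PLUGIN_PATH on ":" (or ";" on Windows) is an assumption about the path separator.

```python
DEFAULT_NIX = "/usr/local/hdf5/lib/plugin"
DEFAULT_WIN = r"%ALLUSERSPROFILE%\hdf5\lib\plugin"

def default_plugin_dir(env, windows=False):
    """Pick the default install directory for the filter wrappers.

    env is a mapping of build-time environment variables.
    """
    sep = ";" if windows else ":"  # assumed path separators
    path = env.get("HDF5_PLUGIN_PATH", "")
    dirs = [d for d in path.split(sep) if d]
    if dirs:
        # Rule 1: the last directory in HDF5_PLUGIN_PATH wins.
        return dirs[-1]
    # Rule 2: fall back to the platform default.
    return DEFAULT_WIN if windows else DEFAULT_NIX

chosen = default_plugin_dir({"HDF5_PLUGIN_PATH": "/opt/a/plugins:/opt/b/plugins"})
fallback = default_plugin_dir({})
```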
I should note that even if I enable it by default, installing
netcdf-c will still not run "out of the box" because the hypothetical
naive user will not know which compressor libraries need to be
pre-installed before netcdf is installed. Nor will that user have any
way to find out what needs to be installed.
re: https://github.com/Unidata/netcdf-c/issues/2338
re: https://github.com/Unidata/netcdf-c/issues/2294
In issue https://github.com/Unidata/netcdf-c/issues/2338,
Ed Hartnett suggested a better way to install filters to a user
defined location -- for Automake, anyway.
This PR implements that suggestion. It turns out to be more
complicated than it appears, so there are a fair number of changes,
mostly to shell scripts. Most of the change is in plugins/Makefile.am.
NOTE: this PR still does NOT address the use of HDF5_PLUGIN_PATH
as the default; this turns out to be complex when dealing with NCZarr.
So this will be addressed in a subsequent post 4.9.0 PR.
## Misc. Changes
1. Record the occurrences of incomplete codecs in libnczarr so that
they can be included in _Codecs attribute correctly. This allows
users to see what missing filters are referenced in the Zarr file.
Primarily affects libnczarr/zfilter.[ch]. Also required creating a
new no-effect filter: H5Zunknown.c.
2. Move the unknown filter test to a separate test file.
3. Incorporates PR https://github.com/Unidata/netcdf-c/pull/2343
re: https://github.com/Unidata/netcdf-c/issues/2294
Ed Hartnett suggested that the netcdf library installation process
be extended to install the standard filters into a user specified
location. The user can then set HDF5_PLUGIN_PATH to that location.
This PR provides that capability using:
````
configure option: --with-plugin-dir=<absolute directory path>
cmake option: -DPLUGIN_INSTALL_DIR=<absolute directory path>
````
Currently, the following plugins are always installed, if
available: bzip2, zstd, blosc.
If NCZarr is enabled, then additional plugins are installed:
fletcher32, shuffle, deflate, szip.
Additionally, the necessary codec support is installed
for each of the above filters that is installed.
## Changes:
1. Cleanup handling of built-in bzip2.
2. Add documentation to docs/filters.md
3. Re-factor the NCZarr codec libraries
4. Add a test, although it can only be exercised after
the library is installed, so it cannot be used during
normal testing.
5. Cleanup use of HDF5_PLUGIN_PATH in the filter test cases.
re: PR https://github.com/Unidata/netcdf-c/pull/2088
re: PR https://github.com/Unidata/netcdf-c/pull/2130
replaces: https://github.com/Unidata/netcdf-c/pull/2140
Changes:
* Add NCZarr-specific quantize functions to the dispatch table.
* Copy (modified) quantize code from libhdf5 to NCZarr
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits
and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen.
* Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5
and for nczarr.
* Make dap4 properly handle quantize functions in dispatch table.
* Add quantize attribute support to ncgen.
Other changes:
* Caught and fixed some S3 problems
* Fixed some nczarr fillvalue problems.
* Fixed some nczarr cache problems.
* Cleanup some flaws in libdispatch/dinfermodel.c
* Allow byterange requests to S3 to be readable by dinfermodel.c/check_file_type
* Remove the libnczarr ztracedispatch code (big change).
re:
The current netcdf-c release has some problems with the mingw platform
on windows. Mostly they are path issues.
Changes to support mingw+msys2:
-------------------------------
* Enable option of looking into the windows registry to find
the mingw root path. In aid of proper path handling.
* Add mingw+msys as a specific platform in configure.ac and move testing
of the platform to the front so it is available early.
* Handle mingw X libncpoco (dynamic loader) properly even though
mingw does not yet support it.
* Handle mingw X plugins properly even though mingw does not yet support it.
* Alias pwd='pwd -W' to better handle paths in shell scripts.
* Plus a number of other minor compile irritations.
* Disallow the use of multiple nc_open's on the same file for windows
(and mingw) because windows does not seem to handle these properly.
Not sure why we did not catch this earlier.
* Add mountpoint info to dpathmgr.c to help support mingw.
* Cleanup dpathmgr conversions.
Known problems:
---------------
* I have not been able to get shared libraries to work, so
plugins/filters must be disabled.
* There is some kind of problem with libcurl that I have not solved,
so all uses of libcurl (currently DAP+Byterange) must be disabled.
Misc. other fixes:
------------------
* Cleanup the relationship between ENABLE_PLUGINS and various other flags
in CMakeLists.txt and configure.ac.
* Re-arrange the TESTDIRS order in Makefile.am.
* Add pseudo-breakpoint to nclog.[ch] for debugging.
* Improve the documentation of the path manager code in ncpathmgr.h
* Add better support for relative paths in dpathmgr.c
* Default the mode args to NCfopen to include "b" (binary) for windows.
* Add optional debugging output in various places.
* Make sure that everything builds with plugins disabled.
* Fix numerous (s)printf inconsistencies between the format spec
and the arguments.
Filter support has three goals:
1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled
Detailed usage directions are defined in docs/filters.md.
For now, the existing filter API is left in place. So filters
are defined using `nc_def_var_filter` in the HDF5 style,
where the id and parameters are unsigned integers.
This is a big change since filters affect many parts of the code.
In the following, the terms "compressor" and "filter" and "codec" are generally
used synonymously.
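To make the relationship between the two metadata styles concrete, here is a minimal sketch of translating between an HDF5-style filter (an unsigned integer id plus unsigned integer parameters) and a NumCodecs-style JSON codec, using deflate as the example. It is not the netcdf-c codec API: the function names are hypothetical, and it assumes deflate corresponds to the NumCodecs "zlib" codec with a single "level" field. The actual mechanism is described in docs/filters.md.

```python
import json

H5Z_FILTER_DEFLATE = 1  # HDF5's filter id for deflate

def deflate_hdf5_to_codec(params):
    """HDF5-style unsigned parameters -> NumCodecs-style JSON text."""
    (level,) = params  # deflate takes exactly one parameter
    return json.dumps({"id": "zlib", "level": level})

def deflate_codec_to_hdf5(codec_json):
    """NumCodecs-style JSON text -> (HDF5 filter id, parameter list)."""
    codec = json.loads(codec_json)
    if codec["id"] != "zlib":
        raise ValueError(f"not a zlib codec: {codec['id']!r}")
    return H5Z_FILTER_DEFLATE, [int(codec["level"])]

codec_text = deflate_hdf5_to_codec([9])
filter_id, filter_params = deflate_codec_to_hdf5(codec_text)
```

The round trip shows why both representations can coexist: the HDF5 form drives the filter implementation, while the JSON form is what gets stored in the Zarr metadata.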
### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new tests to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.
### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file.
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.
### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new tests to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
There were some irregularities in the flags for handling NCZarr S3 support.
The primary change is to regularize the flags controlling this to the following.
1. Automake: --enable-nczarr-s3 and CMake: ENABLE_NCZARR_S3
2. Automake: --enable-nczarr-s3-tests and CMake: ENABLE_NCZARR_S3_TESTS
Flag 1 indicates that NCZarr should be built with S3 support enabled.
Flag 2 indicates that the NCZarr S3 tests should be run.
These two flags are separate because running the NCZarr S3 tests
requires access to protected S3 resources. Currently, running
these tests is restricted to Unidata personnel. However, users
may want to enable S3 support even if they cannot run the tests.
It is, of course, an error to specify 2 without specifying 1.
Additionally, if the AWS S3 SDK library is not found, then the NCZARR S3
support and testing must be disabled. Otherwise an error is signaled
during the build.
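The constraints on the two flags can be summarized in a small sketch. It is only a model of the rules stated above, not the actual configure.ac or CMakeLists.txt logic; the function name `check_s3_flags` and its arguments are hypothetical.

```python
def check_s3_flags(enable_s3, enable_s3_tests, sdk_found):
    """Validate the S3 flag combination, raising on an invalid one."""
    if enable_s3_tests and not enable_s3:
        # Flag 2 without flag 1 is an error.
        raise ValueError("S3 tests requested without S3 support")
    if (enable_s3 or enable_s3_tests) and not sdk_found:
        # Without the AWS S3 SDK, both must be disabled.
        raise ValueError("AWS S3 SDK not found; disable S3 support and tests")
    return {
        "ENABLE_NCZARR_S3": enable_s3,
        "ENABLE_NCZARR_S3_TESTS": enable_s3_tests,
    }

# S3 support without the protected tests is a valid user configuration.
user_config = check_s3_flags(enable_s3=True, enable_s3_tests=False, sdk_found=True)
```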
Some of these NCZarr and S3 changes are propagated to nc-config.
Misc. Other Changes:
1. Allow testing for CYGWIN or MSVC in shell scripts.
2. Add specific test for HDF5 library version 1.10.6.
This is encoded as "HDF5_UTF8_PATHS" because that is the first
version where HDF5 properly supports it under Windows. This is used
in hdf5internal/nc4_ndf5_ansi_to_utf8.
3. Add an AM Conditional -- AX_IGNORE -- for use in testing
when it is desirable to temporarily suppress Makefile code.
4. Add MULTIFILTER flag to CMakeLists.txt
disengagement of enable-netcdf4 from enable-hdf5.
That is, with the advent of nczarr, it is possible
to turn off hdf5 but still need netcdf-4 enabled
because nczarr uses libsrc4, but not libhdf5.
This change involves a bunch of things:
1. Modify configure.ac and CMakeLists.txt to make enable_hdf5
control if hdf5 support is provided. For back compatibility,
disable-netcdf4 is treated as disable-hdf5. But internally,
netcdf4 support is controlled only by the enabling of formats
that require it.
2. In support of #1, modify .travis.yml to use enable/disable-hdf5
instead of enable/disable-netcdf4.
3. test_common.in is modified to track selected features,
including enable-hdf5 and enable-s3-tests. This is used in
selected tests that mix netcdf-3 and netcdf4 tests.
4. The conflation of USE_HDF5 and USE_NETCDF4 is common in
code, tests, and build files, so all of those had to be weeded out.
5. It turns out that some of the NC4_dim functions really are HDF5 specific,
but are not treated as such. So they are moved from nc4dim.c to
hdf5dim.c or hdf5dispatch.c
6. Some generic functions in libhdf5 can be (and were) moved to libsrc4.
After a long discussion, I implemented the rules at the end of that issue.
They are documented in nccopy.1.
Additionally, I added a new, per-variable, -c flag that allows
for the direct setting of the chunking parameters for a variable.
The form is
-c var:c1,c2,...ck
where var is the name of the variable (possibly a fully qualified name)
and the ci are the chunksizes for that variable. It must be the case
that the rank of the variable is k. If the new form is used as well
as the old form, then the new form overrides the old form for the
specified variable. Note that multiple occurrences of the new form
-c flag may be specified.
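The per-variable form can be illustrated with a short parsing sketch. This is not the nccopy implementation: the function names are hypothetical, the variable ranks are supplied by hand, and the old form of -c is not modeled at all.

```python
def parse_var_chunkspec(spec):
    """Parse 'var:c1,c2,...,ck' into (var, [c1, ..., ck])."""
    var, sep, sizes = spec.partition(":")
    if not sep or not var or not sizes:
        raise ValueError(f"malformed chunk spec: {spec!r}")
    chunks = [int(s) for s in sizes.split(",")]
    if any(c <= 0 for c in chunks):
        raise ValueError(f"chunk sizes must be positive: {spec!r}")
    return var, chunks

def collect_chunkspecs(specs, ranks):
    """Apply several -c occurrences; ranks maps variable name -> rank."""
    result = {}
    for spec in specs:
        var, chunks = parse_var_chunkspec(spec)
        if len(chunks) != ranks[var]:
            # The number of chunk sizes must equal the variable's rank k.
            raise ValueError(f"{var}: expected {ranks[var]} chunk sizes")
        result[var] = chunks
    return result

# Hypothetical variables: one at the root, one with a fully qualified name.
ranks = {"temp": 3, "/g1/pressure": 2}
chunking = collect_chunkspecs(["temp:1,90,180", "/g1/pressure:10,10"], ranks)
```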
Misc. Other fixes
1. Added -M <size> option to nccopy to specify the minimum
allowable chunksize.
2. Removed the unused variables from bigmeta.c
(Issue https://github.com/Unidata/netcdf-c/issues/1079)
3. Fixed failure of nc_test4/tst_filter.sh by using the new -M
flag (#1) to allow filter test on a small chunk size.
I took Ed's advice and moved the plugin stuff to its own
top-level directory. This is an attempt to solve the problem of
copying files that we have experienced. In any case, it will
serve as a place to stick additional plugins.
2. Fixed plugin building (nc_test4/hdf5plugins)
to be done properly by cmake and automake.
4. Duplicated part of the nc_test4 filter test code
in examples/C
An incomplete and untested set of hooks exist
for OS-X in nc_test4/findplugins.in. They need testing.