re: https://github.com/Unidata/netcdf-c/issues/2119
H/T to [Egbert Eich](https://github.com/e4t) and [Bas Couwenberg](https://github.com/sebastic) for this PR.
It is undesirable to make netcdf dependent on the availability
of libxml2, but it is desirable to allow its use if available.
In order to do this, a wrapper API (include/ncxml.h) was constructed
that supports either ezxml or libxml2 as the implementation.
Additionally, the xml support code was moved to a new directory
netcdf-c/libncxml.
Primary changes:
* Create a new sub-directory named netcdf-c/libncxml to hold all the xml implementation code.
* Move ezxml.c and ezxml.h to libncxml
* Create a wrapper API -- include/ncxml.h
* Create an implementation, ncxml_ezxml.c to support use of ezxml.
* Create an implementation, ncxml_xml2.c to support use of libxml2.
* Add a check for libxml2 in configure.ac and CMakeLists.txt
* Modify libdap to use the wrapper API instead of ezxml directly.
Misc. Other Changes:
* Change include/netcdf_json.h from built source to be part of the distribution.
re: https://github.com/Unidata/netcdf-c/issues/2117
re: https://github.com/Unidata/netcdf-c/issues/2119
* Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case.
* The aws sdk can sometimes cause problems if the Aws::ShutdownAPI function is not called, so add optional atexit() support to ensure it is called. This is disabled for Windows.
* Add documentation to nczarr.md on how to build and use the aws sdk under Windows. Currently it builds, but testing fails.
* Switch testing from stratus to the Unidata bucket on S3.
* Improve support for the s3: url protocol.
* Add a s3 specific utility code file: ds3util.c
* Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3.
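
The optional atexit() support described in the list above can be sketched as follows. This is a minimal illustration, not the library's code: `sdk_initialize`, `sdk_finalize`, and `sdk_is_active` are hypothetical names, and the place where a real build would call the SDK shutdown routine is marked only by a comment.

```c
#include <stdlib.h>

/* Hypothetical state flags; the real library keeps its own bookkeeping. */
static int sdk_active = 0;
static int exit_hook_registered = 0;

/* Shut the SDK down at most once; calling it again is a harmless no-op. */
void sdk_finalize(void)
{
    if (!sdk_active) return;
    sdk_active = 0;
    /* A real implementation would invoke the SDK shutdown routine here. */
}

/* Report whether the (hypothetical) SDK is currently initialized. */
int sdk_is_active(void)
{
    return sdk_active;
}

/* Initialize the SDK and, when use_atexit is nonzero, register the
   finalizer so that it runs at normal process exit.
   Returns 0 on success and -1 if atexit() refuses the registration. */
int sdk_initialize(int use_atexit)
{
    if (sdk_active) return 0;
    sdk_active = 1;
    if (use_atexit && !exit_hook_registered) {
        if (atexit(sdk_finalize) != 0) return -1;
        exit_hook_registered = 1;
    }
    return 0;
}
```

Because the finalizer is idempotent, a program can still call it explicitly before exiting, which is the fallback needed when atexit support is disabled.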
## Misc.
* Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it is now used in libsrc as well as libnczarr.
* Add calls to nc_finalize in the utilities in case atexit is disabled.
* Add header only json parser to the distribution rather than as a built source.
HDF5 can depend on the Z library (in fact required for netCDF). Moved the detection of whether hdf5 was built with zlib up before any other tests that may require linking of the hdf5 library to determine presence/absence of symbols. These tests require that the link line include "-lz" if the hdf5 library was built with libz support.
This is typically handled somewhat automatically if shared libraries are being used, but in the static library case the dependency needs to be specified explicitly. For its internal checks, CMake uses the `CMAKE_REQUIRED_LIBRARIES` list to specify the libraries that should be used in a `CHECK_C_SOURCE_COMPILES` or a `CHECK_LIBRARY_EXISTS` call. In the current CMakeLists.txt ordering, the zlib detection is done _after_ the `CHECK_LIBRARY_EXISTS` calls, which can cause them to fail and give an incorrect result about whether the function being tested for exists. With the reordering in this PR, I am able to correctly configure netCDF on a CRAY HPC system that uses static libraries by default.
On some versions of the HDF5 find_package call, it sets `HDF5_C_LIBRARIES` and `HDF5_HL_LIBRARIES`, but does not set the `HDF5_C_LIBRARY` or `HDF5_HL_LIBRARY` to anything. Control then falls out of the if block with these unset and it falls into the default setting at line 792. This does not include the path, so then when the later `CHECK_LIBRARY_EXISTS` calls are run, they do not have the full path to the library and will not link correctly. Since the link fails, the code defaults to thinking that none of the symbols are defined.
I don't think this change will have any adverse effect since it only sets the symbols if they are unset.
## S3 Related Fixes
* Add comprehensive support for specifying AWS profiles to provide access credentials.
* Parse the files "~/.aws/config" and "~/.aws/credentials" to provide credentials for the HDF5 ROS3 driver and to locate the default region.
* Add a function to obtain the currently active S3 credentials. The search rules are defined in docs/nczarr.md.
* Provide documentation for the new features.
* Modify the struct NCauth (in include/ncauth.h) to replace specific S3 credentials with a profile name.
* Add a unit test to test the operation of profile and credentials management.
* Add support for URLs of the form "s3://<bucket>/<key>"; this requires obtaining a default region.
* Allow the specification of profile and/or region in a URL of the form "#mode=nczarr,...&aws.region=...&aws.profile=..."
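
To illustrate the fragment form in the last item above, here is a small sketch of pulling `aws.region` and `aws.profile` out of a URL fragment. The function name `fragment_lookup` is hypothetical and the code is not taken from ncuri; it assumes the fragment is a plain `&`-separated list of `key=value` pairs.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of `key` from the "#k1=v1&k2=v2" fragment of `url`
   into `out`. Returns 0 if the key was found and its value fits in
   `out`, and -1 otherwise. */
int fragment_lookup(const char *url, const char *key,
                    char *out, size_t outlen)
{
    const char *p;
    size_t keylen;

    if (url == NULL || key == NULL || out == NULL || outlen == 0) return -1;
    p = strchr(url, '#');
    if (p == NULL) return -1;   /* no fragment at all */
    p++;                        /* skip the '#' */
    keylen = strlen(key);

    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        /* Match "key=" exactly at the start of this pair. */
        if (len > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vlen = len - keylen - 1;
            if (vlen >= outlen) return -1;   /* value would not fit */
            memcpy(out, p + keylen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (end == NULL) break;
        p = end + 1;
    }
    return -1;
}
```

For a made-up URL such as `s3://example-bucket/data.zarr#mode=nczarr,s3&aws.region=us-west-2&aws.profile=unidata`, looking up `aws.region` yields `us-west-2` and `aws.profile` yields `unidata`. The bucket, key, region, and profile in that URL are illustrative values only.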
## Misc. Fixes
* Move the ezxml code to libdispatch so that it can be used both by DAP4 and nczarr.
* Modify nclist to provide a deep clone operation.
* Modify ncuri to provide a deep clone operation.
* Modify the .rc file format to allow the specification of a path to be tested when looking for an entry in the .rc file.
* Ensure that the NC_rcload function is called.
* Modify nchttp to support setting request headers.
Filter support has three goals:
1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled.
Detailed usage directions are defined in docs/filters.md.
For now, the existing filter API is left in place, so filters
are defined with `nc_def_var_filter` in the HDF5 style,
where the id and parameters are unsigned integers.
This is a big change since filters affect many parts of the code.
In the following, the terms "compressor", "filter", and "codec" are generally
used synonymously.
### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new tests to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.
### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file.
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with Windows by properly adding __declspec declarations.
### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new tests to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
A fairly vanilla build of 1.12.1 into a non-default directory ends up
with `HDF5_C_LIBRARY` set to `hdf5` which ends up failing all of the
`try_compile` checks because `-lhdf5` cannot be found.
* Set `CMAKE_REQUIRED_INCLUDES` to include the path found for `curl.h`. The `CHECK_C_SOURCE_COMPILES` function uses this and not `INCLUDE_DIRECTORIES`.
* Make the test for version 7.66 or later match the same test in `configure.ac`
* If the version is 7.66 or later, then we can skip the tests for the curl symbols which were all added in versions prior to 7.66.
* If the version is earlier than 7.66, then continue to perform the tests.
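
The version rule in the items above amounts to a single numeric comparison. The sketch below expresses that logic in C for illustration only; the actual checks live in `CMakeLists.txt` and `configure.ac`. The helper names are hypothetical, and the packing follows the `0xXXYYZZ` layout that curl uses for `LIBCURL_VERSION_NUM`.

```c
/* Pack major.minor.patch into the 0xXXYYZZ layout used by curl's
   LIBCURL_VERSION_NUM, so that 7.66.0 becomes 0x074200. */
unsigned long pack_curl_version(unsigned major, unsigned minor, unsigned patch)
{
    return ((unsigned long)major << 16) | ((unsigned long)minor << 8)
           | (unsigned long)patch;
}

/* Return 1 when the per-symbol checks are still needed (version earlier
   than 7.66) and 0 when they can be skipped (7.66 or later). */
int needs_symbol_checks(unsigned long version_num)
{
    return version_num < pack_curl_version(7, 66, 0);
}
```

A packed version compares correctly with plain integer comparison as long as each component fits in one byte.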
The thredds-test server now has some password protected datasets
that can be used to test DAP2 authorization support.
The general location is
````
https://thredds.ucar.edu/thredds/tdscapabilities/authTest.html
````
and specifically:
````
https://thredds.ucar.edu/thredds/dodsC/test3/testData.nc.html
````
This PR replaces old testcases with ncdap_test/testauth.sh.
This testcase allows us to test use of the .dodsrc file and .netrc file
and embedded user+pwd.
As part of this, I had to create a program (ncdap_test/pathcvt.c)
that is essentially the equivalent to cygpath. Given a path in
windows, unix, msys or cygwin format, it converts it to the
equivalent format in one of those four cases. So it can be used
to convert a cygwin path to a windows path, for example. This is
needed in testpathcvt and testauth to make sure that the paths
in .daprc (e.g. the reference to .netrc) are of the proper
format.
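
As a rough illustration of the kind of conversion described above, the sketch below handles just one of the cases, a cygwin path to a Windows path. It is not the code in ncdap_test/pathcvt.c: the function name is hypothetical, and it assumes the default `/cygdrive/<letter>/...` layout for cygwin drive paths.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert "/cygdrive/c/dir/file" to "c:\dir\file".
   Returns 0 on success, and -1 if the input is not a /cygdrive path or
   the output buffer is too small. */
int cygwin_to_windows(const char *in, char *out, size_t outlen)
{
    static const char prefix[] = "/cygdrive/";
    const size_t plen = sizeof(prefix) - 1;
    size_t n, i;

    if (in == NULL || out == NULL) return -1;
    if (strncmp(in, prefix, plen) != 0) return -1;
    if (!isalpha((unsigned char)in[plen])) return -1;        /* drive letter */
    if (in[plen + 1] != '/' && in[plen + 1] != '\0') return -1;

    n = strlen(in + plen + 1);       /* everything after the drive letter */
    if (outlen < n + 3) return -1;   /* letter + ':' + rest + NUL */

    out[0] = in[plen];
    out[1] = ':';
    for (i = 0; i < n; i++) {
        char c = in[plen + 1 + i];
        out[2 + i] = (c == '/') ? '\\' : c;   /* flip the separators */
    }
    out[2 + n] = '\0';
    return 0;
}
```

For example, `/cygdrive/c/home/user/.netrc` becomes `c:\home\user\.netrc`, which is the form a Windows build would need for a path written into an .rc file.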
Misc. Other Changes:
1. Fix some memory leaks in libdap2
2. Setting the env variable CURLOPT_VERBOSE allows tracking of curl
operations.
3. Make tst_charvlenbug be conditional on NC_VLEN_NOTEST.
Re: https://github.com/zarr-developers/zarr-python/pull/716
The Zarr version 2 spec has been extended to include the ability
to choose the dimension separator in chunk name keys. The legal
separators have been extended from {'.'} to {'.' '/'}. So now it
is possible to use a key like "0/1/2/0" for chunk names.
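
A minimal sketch of how a chunk key could be assembled with either separator is shown below. The function name `build_chunk_key` is hypothetical and this is not the NCZarr implementation; it only demonstrates the two key shapes that the extended spec permits.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the chunk key for `indices[0..rank-1]` into `buf`, joining the
   indices with `sep`, which must be '.' or '/'.
   Returns 0 on success, and -1 on an illegal separator, bad arguments,
   or a buffer that is too small. */
int build_chunk_key(const size_t *indices, size_t rank, char sep,
                    char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i;

    if (sep != '.' && sep != '/') return -1;
    if (buf == NULL || buflen == 0) return -1;
    if (indices == NULL && rank > 0) return -1;

    buf[0] = '\0';
    for (i = 0; i < rank; i++) {
        int n;
        if (i > 0)   /* the separator precedes every index but the first */
            n = snprintf(buf + used, buflen - used, "%c%zu", sep, indices[i]);
        else
            n = snprintf(buf + used, buflen - used, "%zu", indices[i]);
        if (n < 0 || (size_t)n >= buflen - used) return -1;   /* truncated */
        used += (size_t)n;
    }
    return 0;
}
```

With indices {0, 1, 2, 0}, the separator '.' produces "0.1.2.0" and the separator '/' produces "0/1/2/0".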
This PR implements this for NCZarr. The V2 spec now says that
this separator can be set on a per-variable basis. For now, I
have chosen to allow this to be set only globally by adding a key
named "ZARR.DIMENSION_SEPARATOR=<char>" in the
.daprc/.dodsrc/.ncrc file. Currently, the only legal separator
characters are '.' (the default) and '/'. On writing, this key
will only be written if its value is different from the default.
This change caused problems because a '/' dimension separator
is difficult to parse when keys/paths also use '/' as the path separator.
A test case was added for this.
Additionally, make nczarr be enabled by default. This required
some additional changes so that if zip and/or AWS S3 sdk are unavailable,
then they are disabled for NCZarr.
In addition the following unrelated changes were made.
1. Tested that pure-zarr mode could read an nczarr formatted store.
1. The .rc file handling now merges all known .rc files (.ncrc,.daprc, and .dodsrc) in that order and using those in HOME first, then in current directory. For duplicate entries, the later ones override the earlier ones. This change is to remove some of the conflicts inherent in the current .rc file load process. A set of test cases was also added.
1. Re-order tests in configure.ac and CMakeLists.txt so that if libcurl
is not found then the other options that depend upon it are properly
disabled.
1. I decided that xarray support should be enabled by default for pure
zarr. In order to allow disabling, I added a new mode flag "noxarray".
1. Certain tests in nczarr_test depend on use of .dodsrc. In order for these
to work when testing in parallel, some inter-test dependencies needed to
be added.
1. Improved authorization testing to use changes in thredds.ucar.edu.
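
The .rc merge rule in the list above, where later entries override earlier ones, can be sketched with a small in-memory table. The types and function names are hypothetical and are not the ones used by the library. The sketch also ignores file parsing and any URL-specific entries, and simply assumes that each file has already been read into an array of key/value pairs.

```c
#include <stddef.h>
#include <string.h>

#define RC_MAX 32   /* arbitrary capacity chosen for this sketch */

struct rc_entry { const char *key; const char *value; };
struct rc_table { struct rc_entry entries[RC_MAX]; size_t count; };

/* Insert key=value; an existing key is overwritten, so whichever file
   is merged later wins. Returns 0 on success, -1 if the table is full. */
int rc_put(struct rc_table *t, const char *key, const char *value)
{
    size_t i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].key, key) == 0) {
            t->entries[i].value = value;   /* later entry overrides */
            return 0;
        }
    }
    if (t->count == RC_MAX) return -1;
    t->entries[t->count].key = key;
    t->entries[t->count].value = value;
    t->count++;
    return 0;
}

/* Merge all entries of one already-parsed file into the table. */
int rc_merge(struct rc_table *t, const struct rc_entry *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (rc_put(t, src[i].key, src[i].value) != 0) return -1;
    return 0;
}

/* Look up a key; returns NULL when it is absent. */
const char *rc_get(const struct rc_table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].key, key) == 0) return t->entries[i].value;
    return NULL;
}
```

Merging the entries from HOME first and those from the current directory second means that a `ZARR.DIMENSION_SEPARATOR` value set in the current directory replaces the one from HOME, while keys that appear in only one file are kept unchanged.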