re: PR https://github.com/Unidata/netcdf-c/pull/2088
re: PR https://github.com/Unidata/netcdf-c/pull/2130
replaces: https://github.com/Unidata/netcdf-c/pull/2140
Changes:
* Add NCZarr-specific quantize functions to the dispatch table.
* Copy (modified) quantize code from libhdf5 to NCZarr
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits
and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen.
* Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5
and for nczarr.
* Make dap4 properly handle quantize functions in dispatch table.
* Add quantize attribute support to ncgen.
Other changes:
* Caught and fixed some S3 problems
* Fixed some nczarr fillvalue problems.
* Fixed some nczarr cache problems.
* Cleanup some flaws in libdispatch/dinfermodel.c
* Allow byterange requests to S3 be readable by dinfermodel.c/check_file_type
* Remove the libnczarr ztracedispatch code (big change).
re:
The current netcdf-c release has some problems with the mingw platform
on windows. Mostly they are path issues.
Changes to support mingw+msys2:
-------------------------------
* Enable option of looking into the windows registry to find
the mingw root path. In aid of proper path handling.
* Add mingw+msys as a specific platform in configure.ac and move testing
of the platform to the front so it is available early.
* Handle mingw X libncpoco (dynamic loader) properly even though
mingw does not yet support it.
* Handle mingw X plugins properly even though mingw does not yet support it.
* Alias pwd='pwd -W' to better handle paths in shell scripts.
* Plus a number of other minor compile irritations.
* Disallow the use of multiple nc_open's on the same file for windows
(and mingw) because windows does not seem to handle these properly.
Not sure why we did not catch this earlier.
* Add mountpoint info to dpathmgr.c to help support mingw.
* Cleanup dpathmgr conversions.
Known problems:
---------------
* I have not been able to get shared libraries to work, so
plugins/filters must be disabled.
* There is some kind of problem with libcurl that I have not solved,
so all uses of libcurl (currently DAP+Byterange) must be disabled.
Misc. other fixes:
------------------
* Cleanup the relationship between ENABLE_PLUGINS and various other flags
in CMakeLists.txt and configure.ac.
* Re-arrange the TESTDIRS order in Makefile.am.
* Add pseudo-breakpoint to nclog.[ch] for debugging.
* Improve the documentation of the path manager code in ncpathmgr.h
* Add better support for relative paths in dpathmgr.c
* Default the mode args to NCfopen to include "b" (binary) for windows.
* Add optional debugging output in various places.
* Make sure that everything builds with plugins disabled.
* Fix numerous (s)printf inconsistencies betweenb the format spec
and the arguments.
re: https://github.com/Unidata/netcdf-c/issues/2151
The of a non-aws appliance broke during the switch to testing
against Amazon S3.
So make necessary changes to get non-aws appliances work correctly.
re: https://github.com/Unidata/netcdf-c/issues/2119
H/T to [Egbert Eich](https://github.com/e4t) and [Bas Couwenberg](https://github.com/sebastic) for this PR.
It is undesirable to make netcdf be dependent on the availability
of libxml2, but it is desirable to allow its use if available.
In order to do this, a wrapper API (include/ncxml.h) was constructed
that supports either ezxml or libxml2 as the implementation.
Additionally, the xml support code was moved to a new directory
netcdf-c/libncxml.
Primary changes:
* Create a new sub-directory named netcdf-c/libncxml to hold all the xml implementation code.
* Move ezxml.c and ezxml.h to libncxml
* Create a wrapper API -- include/ncxml.h
* Create an implementation, ncxml_ezxml.c to support use of ezxml.
* Create an implementation, ncxml_xml2.c to support use of libxml2.
* Add a check for libxml2 in configure.ac and CMakeLists.txt
* Modify libdap to use the wrapper API instead of ezxml directly.
Misc. Other Changes:
* Change include/netcdf_json.h from built source to be part of the distribution.
re: https://github.com/Unidata/netcdf-c/issues/2117
re: https://github.com/Unidata/netcdf-c/issues/2119
* Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case.
* The aws sdk can sometimes cause problems if the Awd::ShutdownAPI function is not called. So at optional atexit() support to ensure it is called. This is disabled for Windows.
* Add documentation to nczarr.md on how to build and use the aws sdk under windows. Currently it builds, but testing fails.
* Switch testing from stratus to the Unidata bucket on S3.
* Improve support for the s3: url protocol.
* Add a s3 specific utility code file: ds3util.c
* Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3.
## Misc.
* Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it now used in libsrc as well as libnczarr.
* Add calls to nc_finalize in the utilities in case atexit is disabled.
* Add header only json parser to the distribution rather than as a built source.
## S3 Related Fixes
* Add comprehensive support for specifying AWS profiles to provide access credentials.
* Parse the files "~/.aws/config" and "~/.aws/credentials to provide credentials for the HDF5 ROS3 driver and to locate default region.
* Add a function to obtain the currently active S3 credentials. The search rules are defined in docs/nczarr.md.
* Provide documentation for the new features.
* Modify the struct NCauth (in include/ncauth.h) to replace specific S3 credentials with a profile name.
* Add a unit test to test the operation of profile and credentials management.
* Add support for URLS of the form "s3://<bucket>/<key>"; this requires obtaining a default region.
* Allows the specification of profile and/or region in a URL of the form "#mode=nczarr,...&aws.region=...&aws.profile=..."
## Misc. Fixes
* Move the ezxml code to libdispatch so that it can be used both by DAP4 and nczarr.
* Modify nclist to provide a deep clone operation.
* Modify ncuri to provide a deep clone operation.
* Modify the .rc file format to allow the specification of a path to be tested when looking for an entry in the .rc file.
* Ensure that the NC_rcload function is called.
* Modify nchttp to support setting request headers.
Filter support has three goals:
1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled
Detailed usage directions are define in docs/filters.md.
For now, the existing filter API is left in place. So filters
are defined using ''nc_def_var_filter'' using the HDF5 style
where the id and parameters are unsigned integers.
This is a big change since filters affect many parts of the code.
In the following, the terms "compressor" and "filter" and "codec" are generally
used synonomously.
### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new test to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.
### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.
### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new test to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
re: https://github.com/zarr-developers/zarr-specs/issues/41
After discussions with the Zarr community, it was decided to
convert to a new representation of the NCZarr meta-data extensions: version 2.
These extensions store information necessary to mapping the Zarr data model
to the netcdf-4 data model.
The basic change is to remove the NCZarr specific objects: .nczarr, .nczgroup, .nczarray, and .nczattr.
The contents of these objects is moved into the corresponding existing Zarr objects as special keys. The mapping is as follows:
* ''.nczarr'' => ''/.zgroup/_NCZARR_SUPERBLOCK_''
* ''.nczgroup => ''.zgroup/_NCZARR_GROUP_''
* ''.nczarray => ''.zarray/_NCZARR_ARRAY_''
* ''.nczattr => ''.zattr/_NCZARR_ATTR_''
Backward compatibility is maintained by looking for the object ''/.nczarr''
and if found, then assuming that the dataset is in the older version 1 format.
This compatibility only supports reading of such version 1 datasets.
Documentation and test cases are also added.
Misc. Other Changes:
1. The json parsing code was added to the general library instead of nczarr only (ncjson.c, ncjson.h).
2. Improved support for different platform paths by allowing conversion
to a single common path representation.
3. Add some new error codes.
4. Modify nccopy usage to mention the new chunking specification.
re: https://github.com/Unidata/netcdf-c/pull/2003#issuecomment-847637871
Turns out that mingw defines both _WIN32 and also defines getopt.
This means that this test:
````
#ifdef _WIN32
#include "XGetopt.h"
#endif
````
fails on this error:
````
../include/XGetopt.h:38:24: error: conflicting types for 'getopt'
````
Fix is to replace
````
#ifdef _WIN32
with
#if defined(_WIN32) && !defined(__MINGW32__)
````
re: Issue https://github.com/Unidata/netcdf-c/issues/1999
NCclosedir code is incorrect. Fix.
Note that this issue crops up when using a non-VisualStudio windows build
such as Mingw because Mingq defines dirent.h, but Visual Studio does not.
Addendum:
Fix some mingw bugs:
1. Modify XGetopt.h to be conditional on _WIN32 instead of _MSC_VER.
2. Make sure sys/stat.h is included in ncpathmgr.h
re: github issue https://github.com/Unidata/netcdf-c/issues/1982
The problem was that the libnczarr/zsjon.c handling of strings with
embedded double quotes was wrong; a one line fix.
Also added a test case.
Misc. other changes:
1. I Discovered, en passant, that the handling of 64 bit constants
had an error that was fixed.
2. cleanup of the constant conversion code to recurse on arrays of values.
Re: https://github.com/zarr-developers/zarr-python/pull/716
The Zarr version 2 spec has been extended to include the ability
to choose the dimension separator in chunk name keys. The legal
separators has been extended from {'.'} to {'.' '/'}. So now it
is possible to use a key like "0/1/2/0" for chunk names.
This PR implements this for NCZarr. The V2 spec now says that
this separator can be set on a per-variable basis. For now, I
have chosen to allow this be set only globally by adding a key
named "ZARR.DIMENSION_SEPARATOR=<char>" in the
.daprc/.dodsrc/ncrc file. Currently, the only legal separator
characters are '.' (the default) and '/'. On writing, this key
will only be written if its value is different than the default.
This change caused problems because supporting a separator of '/'
is difficult to parse when keys/paths use '/' as the path separator.
A test case was added for this.
Additionally, make nczarr be enabled default by default. This required
some additional changes so that if zip and/or AWS S3 sdk are unavailable,
then they are disabled for NCZarr.
In addition the following unrelated changes were made.
1. Tested that pure-zarr mode could read an nczarr formatted store.
1. The .rc file handling now merges all known .rc files (.ncrc,.daprc, and .dodsrc) in that order and using those in HOME first, then in current directory. For duplicate entries, the later ones override the earlier ones. This change is to remove some of the conflicts inherent in the current .rc file load process. A set of test cases was also added.
1. Re-order tests in configure.ac and CMakeLists.txt so that if libcurl
is not found then the other options that depend upon it properly
are disabled.
1. I decided that xarray support should be enabled by default for pure
zarr. In order to allow disabling, I added a new mode flag "noxarray".
1. Certain test in nczarr_test depend on use of .dodsrc. In order for these
to work when testing in parallel, some inter-test dependencies needed to
be added.
1. Improved authorization testing to use changes in thredds.ucar.edu
re: https://github.com/Unidata/netcdf-c/issues/1988
There was an issue with certain shell programs (bash notably).
For certain platforms and when given a url that had an escaped
'#' character (e.g. \\#) bash would not remove the backslash. So I
had to add a hack for this. Unfortunately I overdid it and it
removed all '' characters. This is ok for non-windows platforms,
but obviously fails for windows.
The fix is this.
1. In a utility program (ncgen, ncdump, nccopy, etc) there is probably a call (or calls) to NC_backslashUnescape(xxx) where xxx is a path argument from the command line.
2. Replace each such call with NC_shellUnescape(xxx).
The NC_shellUnescape function was added and searched only for occurrences of "\#" and replaces them with "#".
interoperability fixed. We were given a Zarr format dataset
stored as a directory+file tree. This dataset uses the XArray
conventions and was generated by some non-Unidata Zarr implementation.
In attempting to process it with NCZarr, several interoperability
problems were discovered and fixed. This gives us more confidence
that NCZarr -- using pure zarr -- can interoperate with other
Zarr implementations.
Specific changes:
* Add test nczarr_test/run_interop.sh
* Support attributes with single value not enclosed in JSON array tags.
* Add mode inferencing and use it in nczarr_test/run_purezarr.sh
* Reduce size of tst_err_enddef.nc because it is more than 3 GB.
The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc. These platforms differ
significantly in the kind of file paths that they accept. So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.
A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code. So this can be viewed as a
replacement for that code. And in path in many cases, the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgt.h>' and then replace file operation
calls with the NCxxx equivalent from ncpathmgr.h Note that
recently, the ncwinpath.h was renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.
The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslash and drive letters, as compared to *nix*.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.
The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.
As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.
In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also please notify Unidata
so we can include it as a formal part or our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.
Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
tests to work under windows using shell scripts; the args do
not pass as utf8 but as some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
PR https://github.com/Unidata/netcdf-c/pull/1794,
HDF5 changed its Windows file path handling.
The XArray implementation that uses Zarr for storage
provides a mechanism to simulate named dimensions.
It does this by adding a per-variable attribute called
_ARRAY_DIMENSIONS. This attribute contains a list of names
to be matched against the shape values of the variable.
In effect a named dimension is created with the name
_ARRAY_DIMENSIONS(i) and length shape(i) for all i
in range 0..rank(variable).
Both read and write support is provided.
This XArray support is only invoked if the mode value
of "xarray" is defined. So for example, as in this URL.
````
https://s3.us-west-1.amazonaws.com/bucket/dataset#mode=nczarr,xarray,s3
````
Note that the "xarray" mode flag also implies mode flag "zarr", so the above
is equivalent to this URL.
````
https://s3.us-west-1.amazonaws.com/bucket/dataset#mode=nczarr,zarr,xarray,s3
````
The primary change to implement this was to unify the handling
of dimension references in libnczarr/zsync.
A test for this and other pure-zarr features was added as
nczarr_test/run_purezarr.sh
Other changes:
* Make sure distcheck leaves no files around.
* Change the special attribute flag DIMSCALEFLAG to HIDDENATTRFLAG
to support the xarray attribute.
* Annotate the zmap implementations with feature flags such as
WRITEONCE (for zip files).
The primary change is to support the use of a zip file as a
storage format. Simultaneously the .nz4 support is made obsolete
Use of zip requires the libzip support library, so a number of
changes to the build files (Makefile.am, CMakeLists.txt) are
necessary to locate and incorporate libzip. The nczarr_tests
tests are also changed to add zip testing.
Other changes:
* Make sure distcheck leaves no files around.
* Add some functions to netcdf_aux to export some functions of libnetcdf.
* Add a new error NC_EFOUND as the complement of NC_EEMPTY.
* Add tracing support to nclog and use it in libnczarr.
* Modify the zmap interface to support the writeonce semantics of zip.
* Create a new s3util.c to support a variety of S3 auxilliary functions.
* EXTERNL'ize a number of functions so they can be used in s3util.
* Add support for the S3 ListObjects CommonPrefixes mechanism
to improve search.
* Add experimental support for running nczarr X s3 tests against
the actual Amazon S3 cloud.
* Replace wholevar with more useful wholechunk optimization
* Add optimization to read multiple values at one time
* Replace NCDEFAULT_get/put_vars with native nczarr versions.
* Clarify chunk projection computations
* zdebdispatch.h
* Add more chunking test cases and re-enable run_chunkcases
* If !szip, then suppress deflate interference test
* Make H5Znoop(1) filter produce more information
* cleanup bzlib.c API
* Turn back on the DAP2 remote tests, but leave DAP4 remote tests disabled.
* Turn on selected non-remote DAP4 tests
* Replace d4crc32 with the equivalents in libdispatch
re: https://github.com/Unidata/netcdf-c/issues/1923
re: https://github.com/Unidata/netcdf-c/issues/1921
The issue was raised about the order of returned filter ids
for nc_inq_var_filter_ids() when creating a file as opposed
to later reading the file.
For creation, the order is the same as the order in which the
calls to nc_def_var_filter() occur.
However, after the file is closed and then reopened for reading,
the question was raised if the returned order is the same or the reverse.
In fact the order is the same in both cases.
This PR extends the existing filter order testcase to check the create
versus read orders. This also required changing the H5Znoop(1) filters
in the plugins directory.
Misc. Unrelated Changes
1. fix calls to fdopen under windows
2. Temporarily suppres the nczarr_tests/run_chunkcases test
since it seems to be causing problems with github actions.
Primary Fixes:
* Add a whole variable optimization -- used in the rare case that nc_get/put_vara covers the whole of a variable and the variable has a single chunk.
* Fix chunking error when stride causes whole chunks to be skipped.
* Fix some memory leaks
* Add test cases
* Add one performance test to nczarr_test/. This uses the timer utils from unit_test: timer_utils.[ch].
* Move ncdumpchunks utility from ncdump to nczarr_test
Misc. Other Changes:
* Make check for aws libraries conditional on --enable-nczarr-s3
* Remove all but one bm tests from nczarr_test until they are working.
* Remove another dependency on HDF5 from supposedly non-HDF5 specific code; specifically hdf5_log_hdf5.
* Make the BAIL2 macro be hdf5 specific and replace elsewhere with an HDF5 independent equivalent.
* Move hdf5cache.c to libsrc4/nc4cache.c because it is used by nczarr.
* Modify unit_tests so that some of them are run even if using Windows.
* Misc. small bug fixes and refactors and memory leaks.
* Rename some conflicting tests for cmake.
* Attempted to make nc_perf work with cmake and failed.
Re: GH Issue https://github.com/Unidata/netcdf-c/issues/1900
Apparently the clock_gettime() function is not always available.
It is used in unit_test/tst_exhash.c and unit_test/tst_xcache.c.
To solve this, a number of things were changed:
* Move the timing code to a new file unit_tests/timer_utils.[ch]
* Modify the timing code to choose one of several timing methods
depending on availability. The prioritized order is as follows:
1. If Windows, use the QueryPerformanceCounter mechanism else
2. Use clock_gettime if available else
3. Use gettimeofday if available else
4. Use getrusage if available
Note that the resolution of 3 and 4 is less than 1 or 2.
Misc. Other Changes:
* Move the test in CMakeLists.txt that disables unit tests for WIN32 to unit_test/CMakeLists.txt since some unit tests actually work under Visual Studio.
* Fix some of the unit tests to work under visual studio
* Fix problem with using remove() in zmap_nzf.c
* Remove some warning about use of EXTERNL