re: https://github.com/zarr-developers/zarr-specs/issues/41
After discussions with the Zarr community, it was decided to
convert to a new representation of the NCZarr meta-data extensions: version 2.
These extensions store information necessary to mapping the Zarr data model
to the netcdf-4 data model.
The basic change is to remove the NCZarr specific objects: .nczarr, .nczgroup, .nczarray, and .nczattr.
The contents of these objects is moved into the corresponding existing Zarr objects as special keys. The mapping is as follows:
* ''.nczarr'' => ''/.zgroup/_NCZARR_SUPERBLOCK_''
* ''.nczgroup => ''.zgroup/_NCZARR_GROUP_''
* ''.nczarray => ''.zarray/_NCZARR_ARRAY_''
* ''.nczattr => ''.zattr/_NCZARR_ATTR_''
Backward compatibility is maintained by looking for the object ''/.nczarr''
and if found, then assuming that the dataset is in the older version 1 format.
This compatibility only supports reading of such version 1 datasets.
Documentation and test cases are also added.
Misc. Other Changes:
1. The json parsing code was added to the general library instead of nczarr only (ncjson.c, ncjson.h).
2. Improved support for different platform paths by allowing conversion
to a single common path representation.
3. Add some new error codes.
4. Modify nccopy usage to mention the new chunking specification.
This is a follow-on to pull request
````https://github.com/Unidata/netcdf-c/pull/1959````,
which fixed up type scoping.
The primary changes are to _nc\_inq\_dimid()_ and to ncdump.
The _nc\_inq\_dimid()_ function is supposed to allow the name to be
and FQN, but this apparently never got implemented. So if was modified
to support FQNs.
The ncdump program is supposed to output fully qualified dimension names
in its generated CDL file under certain conditions.
Suppose ncdump has a netcdf-4 file F with variable V, and V's parent group
is G. For each dimension id D referenced by V, ncdump needs to determine
whether to print its name as a simple name or as a fully qualified name (FQN).
The algorithm is as follows:
1. Search up the tree of ancestor groups.
2. If one of those ancestor groups contains the dimid, then call it dimgrp.
3. If one of those ancestor groups contains a dim with the same name as the dimid, but with a different dimid, then record that as duplicate=true.
4. If dimgrp is defined and duplicate == false, then we do not need an fqn.
5. If dimgrp is defined and duplicate == true, then we do need an fqn to avoid incorrectly using the duplicate.
6. If dimgrp is undefined, then do a preorder breadth-first search of all the groups looking for the dimid.
7. If found, then use the fqn of the first found such dimension location.
8. If not found, then fail.
Test case ncdump/test_scope.sh was modified to test the proper
operation of ncdump and _nc\_inq\_dimid()_.
Misc. Other Changes:
* Fix nc_inq_ncid (NC4_inq_ncid actually) to return root group id if the name argument is NULL.
* Modify _ncdump/printfqn_ to print out a dimid FQN; this supports verification that the resulting .nc files were properly created.
The primary change is to support the use of a zip file as a
storage format. Simultaneously the .nz4 support is made obsolete
Use of zip requires the libzip support library, so a number of
changes to the build files (Makefile.am, CMakeLists.txt) are
necessary to locate and incorporate libzip. The nczarr_tests
tests are also changed to add zip testing.
Other changes:
* Make sure distcheck leaves no files around.
* Add some functions to netcdf_aux to export some functions of libnetcdf.
* Add a new error NC_EFOUND as the complement of NC_EEMPTY.
* Add tracing support to nclog and use it in libnczarr.
* Modify the zmap interface to support the writeonce semantics of zip.
* Create a new s3util.c to support a variety of S3 auxilliary functions.
* EXTERNL'ize a number of functions so they can be used in s3util.
* Add support for the S3 ListObjects CommonPrefixes mechanism
to improve search.
* Add experimental support for running nczarr X s3 tests against
the actual Amazon S3 cloud.
In case HDF5 adds more storage specifications, netcdf4 should be able to
cope with them by default. Further specializations could be added
nonetheless.
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".
The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.
More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).
WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:
Platform | Build System | S3 support
------------------------------------
Linux+gcc | Automake | yes
Linux+gcc | CMake | yes
Visual Studio | CMake | no
Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future. Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.
In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*. The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
and the version bumped.
4. An overly complex set of structs was created to support funnelling
all of the filterx operations thru a single dispatch
"filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
to nczarr.
Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
-- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
support zarr and to regularize the structure of the fragments
section of a URL.
Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
* Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.
Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
re: https://github.com/Unidata/netcdf-c/issues/1584
Support has been added for multiple filters per variable. This
affects a number of components in netcdf. The new APIs are
documented in NUG/filters.md.
The primary changes are:
* A set of new functions are provided (see __include/netcdf_filter.h__).
- Obtain a list of the filters associated with a variable
- Obtain the parameters for a specific filter.
* The existing __nc_inq_var_filter__ function now returns info
about the first defined filter.
* The utilities (ncgen, ncdump, and nccopy) now support
an extended format for specifying a sequence of filters.
The general form is __<filter>|<filter>..._.
* The ncdump **_Filter** attribute now dumps a list of all the
filters associated with a variable using the above new format.
* Filter specifications can now use a filter name instead of number
for filters known to the netcdf library, which in turn is taken
from the HDF5 filter registration page.
* New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter
is returned if an attempt is made to access an unknown filter.
* Internally, the dispatch table has been extended to add a function
to handle all of the filter functions.
* New, filter-related, tests were added to nc_test4.
* A new plugin was added to the plugins directory to help with testing.
Notes:
1. The shuffle and fletcher32 filters are not part of the multifilter system.
Misc. changes:
1. A debug module was added to libhdf5 to help catch error locations.
re: issue https://github.com/Unidata/netcdf-c/issues/1251
Assume that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.
This PR allows the netcdf-c to read that dataset's
contents as a netcdf file using HTTP byte ranges
if the remote server supports byte-range access.
Originally, this PR was set up to access Amazon S3 objects,
but it can also access other remote datasets such as those
provided by a Thredds server via the HTTPServer access protocol.
It may also work for other kinds of servers.
Note that this is not intended as a true production
capability because, as is known, this kind of access to
can be quite slow. In addition, the byte-range IO drivers
do not currently do any sort of optimization or caching.
An additional goal here is to gain some experience with
the Amazon S3 REST protocol.
This architecture and its use documented in
the file docs/byterange.dox.
There are currently two test cases:
1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle
for a remote netcdf-3 file and a remote netcdf-4 file.
2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote
datasets.
This PR also incorporates significantly changed model inference code
(see the superceded PR https://github.com/Unidata/netcdf-c/pull/1259).
1. It centralizes the code that infers the dispatcher.
2. It adds support for byte-range URLs
Other changes:
1. NC_HDF5_finalize was not being properly called by nc_finalize().
2. Fix minor bug in ncgen3.l
3. fix memory leak in nc4info.c
4. add code to walk the .daprc triples and to replace protocol=
fragment tag with a more general mode= tag.
Final Note:
Th inference code is still way too complicated. We need to move
to the validfile() model used by netcdf Java, where each
dispatcher is asked if it can process the file. This decentralizes
the inference code. This will be done after all the major new
dispatchers (PIO, Zarr, etc) have been implemented.
re: https://github.com/Unidata/netcdf-c/issues/1154
Inadvertently, the behavior of NC_DISKLESS with nc_create() was
changed in release 4.6.1. Previously, the NC_WRITE flag needed
to be explicitly used with NC_DISKLESS in order to cause the
created file to be persisted to disk.
Additional analyis indicated that the current NC_DISKLESS
implementation was seriously flawed.
This PR attempts to clean up and regularize the situation with
respect to NC_DISKLESS control. One important aspect of diskless
operation is that there are two different notions of write.
1. The file is read-write vs read-only when using the netcdf API.
2. The file is persisted or not to disk at nc_close().
Previously, these two were conflated. The rules now are
as follows.
1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API
2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close.
3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2.
The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag.
NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag.
It is also now the case that for netcdf-4, NC_DISKLESS is independent
of NC_INMEMORY and in fact it is an error to specify both flags
simultaneously.
Finally, the MMAP code was fixed to use NC_PERSIST as well.
Also marked MMAP as deprecated.
Also added a test case to test various combinations of NC_DISKLESS,
NC_PERSIST, and NC_WRITE.
This PR affects a number of files and especially test cases
that used NC_DISKLESS.
Misc. Unrelated fixes
1. fixed some warnings in ncdump/dumplib.c
re: issue https://github.com/Unidata/netcdf-c/issues/1151
Modify DAP2 and DAP4 code to handle case when _FillValue type is not
same as the parent variable type.
Specifically:
1. Define a parameter [fillmismatch] to allow this mismatch;
default is to disallow.
2. If allowed, forcibly change the type of the _FillValue to match
the parent variable.
3. If allowed Convert the values to match new type
4. Generate a log message
5. if not allowed, then fail
Implementing this required some changes to ncdap_test/dapcvt.c
Also added test cases.
Minor Unrelated Changes:
1. There were a number of warnings about e.g.
assigning a const char* to a char*. Fix these
2. In nccopy.1, replace .NP with .IP "n"
(re PR https://github.com/Unidata/netcdf-c/pull/1144)
3. fix minor error in ncdump/ocprint
Add the ability to set some additional curlopt values via .daprc (aka .dodsrc).
This effects both DAP2 and DAP4 protocols.
Related issues:
[1] re: esupport: KOZ-821332
[2] re: github issue https://github.com/Unidata/netcdf4-python/issues/836
[3] re: github issue https://github.com/Unidata/netcdf-c/issues/1074
1. CURLOPT_BUFFERSIZE: Relevant to [1]. Allow user to set the read/write
buffersizes used by curl.
This is done by adding the following to .daprc (aka .dodsrc):
HTTP.READ.BUFFERSIZE=n
where n is the buffersize in bytes. There is a built-in (to curl)
limit of 512k for this value.
2. CURLOPT_TCP_KEEPALIVE (and CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL):
Relevant (maybe) to [2] and [3]. Allow the user to turn on KEEPALIVE
This is done by adding the following to .daprc (aka .dodsrc):
HTTP.KEEPALIVE=on|n/m
If the value is "on", then simply enable default KEEPALIVE. If the value
is n/m, then enable KEEPALIVE and set KEEPIDLE to n and KEEPINTVL to m.
The fix includes the following changes.
1. Checking and using the default file format at file create time is now
done only when the create mode (argument cmode) does not include any
format related flags, i.e. NC_64BIT_OFFSET, NC_64BIT_DATA,
NC_CLASSIC_MODEL, and NC_NETCDF4.
2. Adjustment of cmode based on the default format is now done in
NC_create() only. The idea is to adjust cmode before entering the
dispatcher's file create subroutine.
3. Any adjustment of cmode is removed from all I/O dispatchers, i.e.
NC4_create(), NC3_create(), and NCP_create().
4. Checking for illegal cmode has been done in check_create_mode() called
in NC_create(). This commit removes the redundant checking from
NCP_create().
5. Remove PnetCDF tests in nc_test/tst_names.c, so it can focus on testing
all classic formats and netCDF4 formats.
Two new test programs are added. They can be used to test netCDF with and
without this commit.
1. nc_test/tst_default_format.c
2. nc_test/tst_default_format_pnetcdf.c (use when PnetCDF is enabled).
and https://github.com/Unidata/netcdf-c/issues/708
Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.
Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.
The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
the provided memory will not be freed or reallocated.
Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
then the netcdf-c library will take control of the incoming memory.
This means that the original memory block should not be freed
but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
then modifications to the original memory may fail if the space available
is insufficient.
Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh
WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
(I hope) metadata mechanism. This mostly just adds new pieces of
code (e.g. nclistmap) and does some minor fixes.
It should be transparent to everything else.
The next set of changes will be the big step.