Commit Graph

147 Commits

Author SHA1 Message Date
Ward Fisher
5973f3d683
Merge pull request #2847 from K20shores/packaging
Use cmake netCDF with target_* for many options
2024-03-11 15:55:36 -06:00
Ward Fisher
cc1494d988
Merge branch 'main' into awsdfalt.dmh 2024-03-05 12:50:08 -07:00
Kyle Shores
c29db073eb setting dll export on each target 2024-02-29 11:36:47 -06:00
Peter Hill
907e5cc43f
CMake: Use target_link_libraries with HDF5::HDF5 target 2024-02-16 10:51:20 +00:00
Kyle Shores
f9e3247164 merging main, addressing some PR comments 2024-02-07 09:53:45 -06:00
Kyle Shores
f41178f6e0 adding hdf5 to nczarr 2024-01-26 16:48:50 -06:00
Ward Fisher
16bcb1ddb9 Merge branch 'silence-nclist-warnings' of https://github.com/ZedThree/netcdf-c into rebase-gh2812.wif 2024-01-19 11:11:21 -07:00
Ward Fisher
c1fb4b0bae
Merge pull request #2809 from ZedThree/silence-malloc-warnings
Silence conversion warnings from `malloc` arguments
2023-12-21 17:20:56 -07:00
Ward Fisher
6a628b9ca7 Rebased PR by hand against main. 2023-12-12 16:49:13 -07:00
wkliao
52198b3f12 count argument in H5Sselect_hyperslab
Argument 'count' in NetCDF is not exactly the same as the 'count' in
H5Sselect_hyperslabs(space_id, op, start, stride, count, block).
When the argument 'stride' is NULL, NetCDF's 'count' should be used in
argument 'block', for example,
   H5Sselect_hyperslabs(space_id, op, start, NULL, ones, count);
where 'one' is an array of all 1s. Although using NULL 'block' below
   H5Sselect_hyperslabs(space_id, op, start, NULL, count, NULL);
has the same effect, HDF5 internally stores the space of a subarray as a
list of single elements, instead of a "block", which can affect the
performance.
2023-12-12 16:45:19 -07:00
Ward Fisher
2616e2c411
Merge pull request #2745 from e-kwsm/chmod-x
chore: unset executable flag
2023-12-11 17:28:46 -07:00
Sean McBride
adc4dc1435 Replaced some sprintf with snprintf with aid of new variable containing size
One case required slightly complicated accounting of how much space is left in the buffer.
2023-12-08 13:30:38 -05:00
Dennis Heimbigner
b83ff8fc2b oops 2023-12-07 15:48:39 -07:00
Dennis Heimbigner
80dd2e95c6 Fix truncate code for S3 2023-12-07 15:38:46 -07:00
Dennis Heimbigner
d81d95a5cc unthrow 2023-12-02 21:03:25 -07:00
Peter Hill
4e1ff160e1
Change signature of nczm_sortenvv to take size_t
Always called with a `size_t` and passes `n` to `qsort` which expects
a `size_t` anyway
2023-11-28 16:28:31 +00:00
Peter Hill
653e09fd6d
Try to more consistently use size_t for nclistget index argument 2023-11-28 16:28:31 +00:00
Peter Hill
d07dac918c
Silence conversion warnings from malloc arguments
Mostly just add an explicit cast when calling `malloc` and its
variants. Sometimes instead change the type of a local variable if
this would silence multiple warnings.
2023-11-24 18:20:52 +00:00
Dennis Heimbigner
87497d79cf update 2023-11-04 16:08:59 -06:00
Dennis Heimbigner
5fa2defc7e Improve fetch performance of DAP4
Prior to this PR, DAP4 always fetched the whole (constrained) dataset
This PR changes the query processing so
1. It reads data on a per-variable request (equivalent to calling nc_get_var()).
2. It tracks a response for every query.

Most of the changes reflect having to do per-variable requests.
In any case, doing all this significantly reduces the amount of data transmitted and hence speeds up DAP4 requests.
2023-10-08 19:59:28 -06:00
Dennis Heimbigner
1552d894a2 Cleanup a number of issues.
re: Issue https://github.com/Unidata/netcdf-c/issues/2748

This PR fixes a number of issues and bugs.

## s3cleanup fixes
* Delete extraneous s3cleanup.sh related files.
* Remove duplicate s3cleanup.uids entries.

## Support the Google S3 API
* Add code to recognize "storage.gooleapis.com"
* Add extra code to track the kind of server being accessed: unknown, Amazon, Google.
* Add a new mode flag "gs3" (analog to "s3") to support this api.
* Modify the S3 URL code to support this case.
* Modify the listobjects result parsing because Google returns some non-standard XML elements.
* Change signature and calls for NC_s3urlrebuild.

## Handle corrupt Zarr files where shape is empty for a variable.
Modify behavior when a variable's "shape" dictionary entry.
Previously it returned an error, but now it suppresses such a variable.
This change makes it possible to read non-corrupt data from the file.
Also added a test case.

## Misc. Other Changes
* Fix the nclog level handling to suppress output by default.
* Fix de-duplicates code in ncuri.c
* Restore testing of iridl.ldeo.columbia.edu.
* Fix bug in define_vars() which did not always do a proper reclaim between variables.
2023-10-08 11:22:52 -06:00
Ward Fisher
80c746981d
Merge pull request #2758 from ZedThree/cmake-fix-linking-mpi
CMake: Ensure all libraries link against MPI if needed
2023-10-02 16:20:43 -06:00
Peter Hill
18c813b20b
CMake: Ensure all libraries link against MPI if needed 2023-10-02 10:31:24 +01:00
Dennis Heimbigner
df3636b959 Mitigate S3 test interference + Unlimited Dimensions in NCZarr
This PR started as an attempt to add unlimited dimensions to NCZarr.
It did that, but this exposed significant problems with test interference.
So this PR is mostly about fixing -- well mitigating anyway -- test
interference.

The problem of test interference is now documented in the document docs/internal.md.
The solutions implemented here are also describe in that document.
The solution is somewhat fragile but multiple cleanup mechanisms
are provided. Note that this feature requires that the
AWS command line utility must be installed.

## Unlimited Dimensions.
The existing NCZarr extensions to Zarr are modified to support unlimited dimensions.
NCzarr extends the Zarr meta-data for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as dictionary named "_nczarr_group".
Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the named dimensions. The value associated with each dimension name has one of two forms
Form 1 is a special case of form 2, and is kept for backward compatibility. Whenever a new file is written, it uses format 1 if possible, otherwise format 2.
* Form 1: An integer representing the size of the dimension, which is used for simple named dimensions.
* Form 2: A dictionary with the following keys and values"
   - "size" with an integer value representing the (current) size of the dimension.
   - "unlimited" with a value of either "1" or "0" to indicate if this dimension is an unlimited dimension.

For Unlimited dimensions, the size is initially zero, and as variables extend the length of that dimension, the size value for the dimension increases.
That dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension.
This is the standard semantics for unlimited dimensions.

Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.
* Did a partial refactor of the slice handling code in zwalk.c to clean it up.
* Added a number of tests for unlimited dimensions derived from the same test in nc_test4.
* Added several NCZarr specific unlimited tests; more are needed.
* Add test of endianness.

## Misc. Other Changes
* Modify libdispatch/ncs3sdk_aws.cpp to optionally support use of the
   AWS Transfer Utility mechanism. This is controlled by the
   ```#define TRANSFER```` command in that file. It defaults to being disabled.
* Parameterize both the standard Unidata S3 bucket (S3TESTBUCKET) and the netcdf-c test data prefix (S3TESTSUBTREE).
* Fixed an obscure memory leak in ncdump.
* Removed some obsolete unit testing code and test cases.
* Uncovered a bug in the netcdf-c handling of big-endian floats and doubles. Have not fixed yet. See tst_h5_endians.c.
* Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.
* Modify the semantics of zmap\#ncsmap_write to only allow total rewrite of objects.
* Modify the semantics of zodom to properly handle stride > 1.
* Add a truncate operation to the libnczarr zmap code.
2023-09-26 16:56:48 -06:00
Eisuke Kawashima
d755c4ff32
chore: unset executable flag 2023-08-23 13:31:42 +09:00
Dennis Heimbigner
9094d25409 Fix major bug in the NCZarr cache management
re: PR https://github.com/Unidata/netcdf-c/pull/2734
re: Issue https://github.com/Unidata/netcdf-c/issues/2733

As a result of an investigation by https://github.com/uweschulzweida,
I discovered a significant bug in the NCZarr cache management.
This PR extends the above PR to fix that bug.

## Change Overview
* Insert extra checks for cache overflow.
* Added test cases contingent on the --enable-large-file-tests option.
* The Columbia server is down, so it has been temporarily disabled.
2023-08-16 23:07:05 -06:00
Dennis Heimbigner
f1a3a64b65 Cleanup the handling of cache parameters.
re: https://github.com/Unidata/netcdf-c/issues/2733

When addressing the above issue, I noticed that there was a disconnect
in NCZarr between nc_set_chunk_cache and nc_set_var_chunk cache.
Specifically, setting nc_set_chunk_cache had no impact on the per-variable cache parameters when nc_set_var_chunk_cache was not used.

So, modified the NCZarr code so that the per-variable cache parameters are set in this order (#1 is first choice):
1. The values set by nc_set_var_chunk_cache
2. The values set by nc_set_chunk_cache
3. The defaults set by configure.ac
2023-08-10 16:57:57 -06:00
Dennis Heimbigner
db772ce34c Explicitly suppress variable length type compression
re: PR https://github.com/Unidata/netcdf-c/pull/2716).
re: Issue https://github.com/Unidata/netcdf-c/issues/2189

The basic change is to make use of the fact that HDF5 automatically suppresses optional filters when an attempt is made to apply them to variable-length typed arrays.
This means that e.g. ncdump or nccopy will properly see meaningful data.
Note that if a filter is defined as HDF5 mandatory, then the corresponding variable will be suppressed and will be invisible to ncdump and nccopy.
This functionality is also propagated to NCZarr.

This PR makes some minor changes to PR https://github.com/Unidata/netcdf-c/pull/2716 as follows:
* Move the test for filter X variable-length from dfilter.c down into the dispatch table functions.
* Make all filters for HDF5 optional rather than mandatory so that the built-in HDF5 test for filter X variable-length will be invoked.

The test case for this was expanded to verify that the filters are defined, but suppressed.
2023-08-03 15:47:28 -06:00
Dennis Heimbigner
8cab468169 Suppress filters on variables with non-fixed-size types.
re: Discussion https://github.com/Unidata/netcdf-c/discussions/2554
re: PR https://github.com/Unidata/netcdf-c/pull/2231
re: Issue https://github.com/Unidata/netcdf-c/issues/2189

After some discussion, the issue of applying filters on variables
whose type is not fixed size, was resolved as follows:
1. A call to nc_def_var_filter will ignore such filters, but will issue a log warning.
2. Loading (from an existing file) a variable whose type is not fixed-size and which has filters, will cause the variable to be suppressed.

This PR enforces those rules.

### Misc. Other changes
* Add a test case to test the vlen change.
* Make some minor clean-ups in various cmake and automake files.
* Remove unused test
2023-06-21 14:46:22 -06:00
Dennis Heimbigner
fb40a72b45 Improve performance of the nc_reclaim_data and nc_copy_data functions.
re: Issue https://github.com/Unidata/netcdf-c/issues/2685
re: PR https://github.com/Unidata/netcdf-c/pull/2179

As noted in PR https://github.com/Unidata/netcdf-c/pull/2179,
the old code did not allow for reclaiming instances of types,
nor for properly copying them. That PR provided new functions
capable of reclaiming/copying instances of arbitrary types.

However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these
most general functions resulted in a significant performance
degradation, even for common cases.

This PR attempts to mitigate the cost of using the general
reclaim/copy functions in two ways.

First, the previous functions operating at the top level by
using ncid and typeid arguments. These functions were augmented
with equivalent versions that used the netcdf-c library internal
data structures to allow direct access to needed information.
These new functions are used internally to the library.

The second mitigation involves optimizing the internal functions
by providing early tests for common cases. This avoids
unnecessary recursive function calls.

The overall result is a significant improvement in speed by a
factor of roughly twenty -- your mileage may vary. These
optimized functions are still not as fast as the original (more
limited) functions, but they are getting close. Additional optimizations are
possible. But the cost is a significant "uglification" of the
code that I deemed a step too far, at least for now.

## Misc. Changes
1. Added a test case to check the proper reclamation/copy of complex types.
2. Found and fixed some places where nc_reclaim/copy should have been used.
3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism.
4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher.
5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not.
6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-20 17:11:25 -06:00
Ward Fisher
bfb8a31aed
Merge pull request #2644 from ZhipengXue97/main
Fix potential dead store
2023-05-15 10:38:59 -06:00
Dennis Heimbigner
aeaf9e4bec notrace 2023-04-26 14:16:22 -06:00
Dennis Heimbigner
49737888ca Improve S3 Documentation and Support
## Improvements to S3 Documentation
* Create a new document *quickstart_paths.md* that give a summary of the legal path formats used by netcdf-c. This includes both file paths and URL paths.
* Modify *nczarr.md* to remove most of the S3 related text.
* Move the S3 text from *nczarr.md* to a new document *cloud.md*.
* Add some S3-related text to the *byterange.md* document.

Hopefully, this will make it easier for users to find the information they want.

## Rebuild NCZarr Testing
In order to avoid problems with running make check in parallel, two changes were made:
1. The *nczarr_test* test system was rebuilt. Now, for each test.
any generated files are kept in a test-specific directory, isolated
from all other test executions.
2. Similarly, since the S3 test bucket is shared, any generated S3 objects
are isolated using a test-specific key path.

## Other S3 Related Changes
* Add code to ensure that files created on S3 are reclaimed at end of testing.
* Used the bash "trap" command to ensure S3 cleanup even if the test fails.
* Cleanup the S3 related configure.ac flag set since S3 is used in several places. So now one should use the option *--enable-s3* instead of *--enable-nczarr-s3*, although the latter is still kept as a deprecated alias for the former.
* Get some of the github actions yml to work with S3; required fixing various test scripts adding a secret to access the Unidata S3 bucket.
* Cleanup S3 portion of libnetcdf.settings.in and netcdf_meta.h.in and test_common.in.
* Merge partial S3 support into dhttp.c.
* Create an experimental s3 access library especially for use with Windows. It is enabled by using the options *--enable-s3-internal* (automake) or *-DENABLE_S3_INTERNAL=ON* (CMake). Also add a unit-test for it.
* Move some definitions from ncrc.h to ncs3sdk.h

## Other Changes
* Provide a default implementation of strlcpy and move this and similar defaults into *dmissing.c*.
2023-04-25 17:15:06 -06:00
Ward Fisher
dc6e392c9d
Merge branch 'main' into znotnc.dmh 2023-04-12 16:02:34 -06:00
Dennis Heimbigner
d7d216a3f5 Merge branch 'master' into dap4tests2.dmh 2023-03-16 14:03:29 -06:00
Dennis Heimbigner
69b5fa4f4e fix memory leak 2023-03-13 20:11:54 -06:00
Dennis Heimbigner
5c07ebfd11 Check at nc_open if file appears to be in NCZarr/Zarr format.
re: Issue https://github.com/Unidata/netcdf-c/issues/2656

Charlie Zender notes that *nc_open()* does not immediately detect that the given path refers to a file not in zarr format. Rather it fails later when trying to read the (meta-)data.

The reason is that the Zarr format is highly decentralized. There is no easily testable magic number or superblock to look for. In effect the only way to see if a directory is Zarr is to successfully read it.

It is possible to heuristically detect that a path refers to an NCZarr/Zarr file by doing a breadth-first search of the file tree starting at the given path. If the search encounters a file whose name starts with ".z", then assume it is a legitimate NCZarr/Zarr file. Of course, this test could be costly. One hopes that in practice that it is not.

In addition to this fix, a corresponding test case was added.

## Other Changes

re: PR https://github.com/Unidata/netcdf-c/pull/2529

There was an error under Cygwin for this PR that is fixed in this PR. The fix was to convert all *noinst_* references to *check_*.
2023-03-13 13:24:14 -06:00
Dennis Heimbigner
69e84fe9f1 Fix byterange handling of some URLS
re: Issue

The byterange handling of the following URLS fails.

### Problem 1: "https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes"
It turns out that byterange in hdf5 has two possible targets: S3 and not-S3 (e.g. a thredds server or the crudata URL above). Each uses a different HDF5 Virtual File Driver (VFD).
I incorrectly set up the byterange code in libhdf5 so that it would choose one or the other of the two VFD's for any netcdf-c library build. The fix is to allow it to choose either one at run-time.

### Problem 2: "https://noaa-goes16.s3.amazonaws.com/ABI-L1b-RadF/2022/001/18/OR_ABI-L1b-RadF-M6C01_G16_s20220011800205_e20220011809513_c20220011809562.nc#mode=bytes,s3"
When given what appears to be an S3-related URL, the netcdf-c library code converts it into a canonical, so-called "path" format. In casing out the possible input URL formats, I missed the case where the host contains the bucket ("noaa-goes16"), but not the region. So the fix was to check for this case.

## Misc. Related Changes
1. Since S3 is used in more than just NCZarr, I changed the automake/cmake options to replace "--enable-nczarr-s3" with "--enable-s3", but keeping the former option as a synonym for the latter. This also entailed cleaning up libnetcdf.settings WRT S3 support
2. Added the above URLS as additional test cases

## Misc. Un-Related Changes
1. CURLOPT_PUT is deprecated in favor to CURLOPT_UPLOAD
2. Fix some minor warnings

## Open Problems
* Under Ubuntu, either libcrypto or aws-sdk-cpp has a memory leak.
2023-03-02 19:51:02 -07:00
Zhipeng Xue
a992aadb32 Fix potential dead store 2023-02-28 16:26:27 +08:00
Dennis Heimbigner
9dfafe6c63 Bring up-to-date with main 2023-01-17 16:28:45 -07:00
Dennis Heimbigner
a03bb5e601 Fix infinite loop in file inferencing
re: Issue https://github.com/Unidata/netcdf-c/issues/2573

The file type inferencer in libdispatch/dinference.c has a simple
forward inference mechanism so that the occurrence of certain mode
values in a URL fragment implies inclusion of additional mode values.
This kind of inference is notorious for leading to cycles if not
careful. Unfortunately, this occurred in the one in dinference.c.

This was fixed by providing a more complicated, but more reliable inference
mechanism.

## Misc. Other Changes
* Found and fixed a couple of memory leaks.
* There is a recent problem in building HDF4 support on github actions. Fixed by using the internal HDF4 xdr capability.
* Some filter-related code was not being properly ifdef'd with ENABLE_NCZARRA_FILTERS.
2022-12-18 13:18:00 -07:00
Dennis Heimbigner
591e6b2f6d Fix DAP4 remotetest server
Warning: This PR is a follow on to PR https://github.com/Unidata/netcdf-c/pull/2555 and should not be merged until that prior PR has been merged. The changeset for this PR is a delta on the PR https://github.com/Unidata/netcdf-c/pull/2555.

This PR re-enables the use of the server *remotetest.unidata.ucar.edu/d4ts*
to test several features:
1. Show that access over the Internet to servers using the DAP4 protocol works.
2. Test that DAP4 support in the [Thredds Data Server](https://github.com/Unidata/tds) is operating correctly.
4. Test that the DAP4 support in the [netcdf-java library](https://github.com/Unidata/netcdf-java) library and the DAP4 support in the netcdf-c library are consistent and are interoperable.

The test inputs (primarily *\*.nc* files) provided in the netcdf-c library
are also used by the DAP4 Test Server (aka d4ts) to present web access to a
collection of data files accessible via the DAP4 protocol and which can be
used for testing Internet access to a working server.

To be precise, this version of d4ts is currently in unmerged branches
of the *netcdf-java* and *tds* Github repositories and so are not actually
in the main repositories *yet*. However, the *d4ts.war* file was created
from that branch and used to populate the *remotetest.unidata.ucar.edu*
server

The two other remote servers that were used in the past are *Hyrax* (OPenDAP.org)
and *thredds-test*. These will continue to remain disabled until
those servers can be fixed.

## Primary Changes

* Rebuild the *baselineremote* directory. This directory contains the validation data needed to test the remote servers.
* Re-enable using remotetest.unidata.ucar.edu as part of the DAP4 testing process.
* Fix the *dap4_test/test_remote.sh* test script to match the current available test data.
* Make some changes to libdap4 to improve the ability to catch malformed data streams [affects a lot of files in libdap4].

## Misc. Unrelated Changes

* Remove a raft of warnings, especially in nc_test4/tst_quantize.c.
* Add some additional explanatory information to the NCZarr documentation.
* Cleanup some Doxygen errors in the docs file and reorder some files.
2022-11-15 20:29:21 -07:00
Dan Ibanez
6173956790 Rename variable to avoid function name conflict
I was getting the following error while compiling:

```
netcdf-c/libnczarr/zutil.c:544:26: error: called object 'strlen' is not a function or function pointer
  544 |     if(dnamep) *dnamep = strdup(dname);
      |                          ^~~~~~
netcdf-c/libnczarr/zutil.c:533:68: note: declared here
  533 | ncz_nctype2dtype(nc_type nctype, int endianness, int purezarr, int strlen, char** dnamep)
      |                                                                ~~~~^~~~~~
```

My interpretation is that strdup() is implemented as a macro
which calls strlen() the standard C function, and when that
macro is being substituted here the call to strlen tries
to "call" the integer variable named strlen.

Resolving this by renaming the integer variable to "len"
instead of "strlen", avoiding a conflict with a standard
C library function name.
2022-11-07 13:24:20 -07:00
Dennis Heimbigner
1a45ee025f Fix some addtional errors in NCZarr
re: Issue https://github.com/Unidata/netcdf-c/issues/2502

H/T Charlie Zender

* Fix NCZarr handling of endianness value NC_ENDIAN_NATIVE. This now matches how it is handled in libhdf5
* Fix NCZarr handling of char typed attribute with value "". This now matches how it is handled in libhdf5
* Add test for various char attribute values
* Change the mapping of NC_CHAR and NC_STRING to dtype; requires changing some test files also.
* Optimize the testing for NC_ENOTBUILT in NC_open.
* Turn off debugging left on accidentally
* Fix memory leak in tst_pnetcdf.c
* Fix blosc test
2022-09-09 14:25:24 -06:00
Ward Fisher
c489aad975
Merge branch 'main' into bloscfix.dmh 2022-09-06 15:50:01 -06:00
Ward Fisher
0d17edf2ea
Merge branch 'main' into bloscfix.dmh 2022-09-06 13:49:18 -06:00
Dennis Heimbigner
00a80ec8f9 Catch Xarray dimension inconsistencies 2022-09-04 13:45:29 -06:00
Dennis Heimbigner
6abaab967b Fix some problems with PR https://github.com/Unidata/netcdf-c/pull/2492
re: PR https://github.com/Unidata/netcdf-c/pull/2492
re: Issue https://github.com/Unidata/netcdf-c/issues/2494

This PR fixes some problems with the pull request https://github.com/Unidata/netcdf-c/pull/2492 in response to Issue https://github.com/Unidata/netcdf-c/issues/2494.

* Found and fixed more scalar handling problems and add a test case for scalars.
* Cleanup nczarr_test/run_string.sh test
* Document *_nczarr_default_maxstrlen* and *_nczarr_maxstrlen*.

* Support both "Nan" and *Nan* as being floating point constants
  for attributes. It is unclear from the Zarr V2 spec if
  unquoted *Nan* is legal or not, but support for reading.
  Write the quoted versions when writing an attribute.  Similar
  for Infinity constants.
  So NCZarr supports the following constants for use in Attributes
    * *Nan*, "Nan", *-Nan*, "-Nan"
    * *Nanf*, "Nanf", *-Nanf*, "-Nanf"
    * *Infinity*, "Infinity", *-Infinity*, "-Infinity"
    * *Infinityf*, "Infinityf", *-Infinityf*, "-Infinityf"
2022-09-03 14:21:48 -06:00
Dennis Heimbigner
d0aff6ac3a Fix LGTM alert: too few args 2022-08-27 21:35:04 -06:00
Dennis Heimbigner
231ae96c4b Add support for Zarr string type to NCZarr
* re: https://github.com/Unidata/netcdf-c/pull/2278
* re: https://github.com/Unidata/netcdf-c/issues/2485
* re: https://github.com/Unidata/netcdf-c/issues/2474

This PR subsumes PR https://github.com/Unidata/netcdf-c/pull/2278.
Actually is a bit an omnibus covering several issues.

## PR https://github.com/Unidata/netcdf-c/pull/2278
Add support for the Zarr string type.
Zarr strings are restricted currently to be of fixed size.
The primary issue to be addressed is to provide a way for user to
specify the size of the fixed length strings. This is handled by providing
the following new attributes special:
1. **_nczarr_default_maxstrlen** —
This is an attribute of the root group. It specifies the default
maximum string length for string types. If not specified, then
it has the value of 64 characters.
2. **_nczarr_maxstrlen** —
This is a per-variable attribute. It specifies the maximum
string length for the string type associated with the variable.
If not specified, then it is assigned the value of
**_nczarr_default_maxstrlen**.

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR
type, which does not exist in zarr. The goal was to choose numpy types for
both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that
if a pure zarr implementation read them, it would still work and an
NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:
* "|S1" for NC_CHAR.
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be ok since for
string/char, the endianness has no meaning.

For reading attributes with pure zarr (i.e. with no nczarr
atribute types defined), they will always be interpreted as of
type NC_CHAR.

## Issue: https://github.com/Unidata/netcdf-c/issues/2474
This PR partly fixes this issue because it provided more
comprehensive support for Zarr attributes that are JSON valued expressions.
This PR still does not address the problem in that issue where the
_ARRAY_DIMENSION attribute is incorrectly set. Than can only be
fixed by the creator of the datasets.

## Issue: https://github.com/Unidata/netcdf-c/issues/2485
This PR also fixes the scalar failure shown in this issue.
It generally cleans up scalar handling.
It also adds a note to the documentation describing that
NCZarr supports scalars while Zarr does not and also how
scalar interoperability is achieved.

## Misc. Other Changes
1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" now used "_nczarr_attr. Support back compatibility for the upper case names.
2. Cleanup my too-clever-by-half handling of scalars in libnczarr.
2022-08-27 20:21:13 -06:00