Commit Graph

116 Commits

Author SHA1 Message Date
Dennis Heimbigner
87497d79cf update 2023-11-04 16:08:59 -06:00
Ward Fisher
5f5f908a96 Remove vestigial file glob stanza 2023-10-09 10:18:04 -06:00
Dennis Heimbigner
5fa2defc7e Improve fetch performance of DAP4
Prior to this PR, DAP4 always fetched the whole (constrained) dataset
This PR changes the query processing so
1. It reads data on a per-variable request (equivalent to calling nc_get_var()).
2. It tracks a response for every query.

Most of the changes reflect having to do per-variable requests.
In any case, doing all this significantly reduces the amount of data transmitted and hence speeds up DAP4 requests.
2023-10-08 19:59:28 -06:00
Dennis Heimbigner
df3636b959 Mitigate S3 test interference + Unlimited Dimensions in NCZarr
This PR started as an attempt to add unlimited dimensions to NCZarr.
It did that, but this exposed significant problems with test interference.
So this PR is mostly about fixing -- well mitigating anyway -- test
interference.

The problem of test interference is now documented in the document docs/internal.md.
The solutions implemented here are also describe in that document.
The solution is somewhat fragile but multiple cleanup mechanisms
are provided. Note that this feature requires that the
AWS command line utility must be installed.

## Unlimited Dimensions.
The existing NCZarr extensions to Zarr are modified to support unlimited dimensions.
NCzarr extends the Zarr meta-data for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as dictionary named "_nczarr_group".
Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the named dimensions. The value associated with each dimension name has one of two forms
Form 1 is a special case of form 2, and is kept for backward compatibility. Whenever a new file is written, it uses format 1 if possible, otherwise format 2.
* Form 1: An integer representing the size of the dimension, which is used for simple named dimensions.
* Form 2: A dictionary with the following keys and values"
   - "size" with an integer value representing the (current) size of the dimension.
   - "unlimited" with a value of either "1" or "0" to indicate if this dimension is an unlimited dimension.

For Unlimited dimensions, the size is initially zero, and as variables extend the length of that dimension, the size value for the dimension increases.
That dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension.
This is the standard semantics for unlimited dimensions.

Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.
* Did a partial refactor of the slice handling code in zwalk.c to clean it up.
* Added a number of tests for unlimited dimensions derived from the same test in nc_test4.
* Added several NCZarr specific unlimited tests; more are needed.
* Add test of endianness.

## Misc. Other Changes
* Modify libdispatch/ncs3sdk_aws.cpp to optionally support use of the
   AWS Transfer Utility mechanism. This is controlled by the
   ```#define TRANSFER```` command in that file. It defaults to being disabled.
* Parameterize both the standard Unidata S3 bucket (S3TESTBUCKET) and the netcdf-c test data prefix (S3TESTSUBTREE).
* Fixed an obscure memory leak in ncdump.
* Removed some obsolete unit testing code and test cases.
* Uncovered a bug in the netcdf-c handling of big-endian floats and doubles. Have not fixed yet. See tst_h5_endians.c.
* Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.
* Modify the semantics of zmap\#ncsmap_write to only allow total rewrite of objects.
* Modify the semantics of zodom to properly handle stride > 1.
* Add a truncate operation to the libnczarr zmap code.
2023-09-26 16:56:48 -06:00
Dennis Heimbigner
a37ca49d25 Modify PR https://github.com/Unidata/netcdf-c/pull/2655 to ensure transient types have names.
re: PR https://github.com/Unidata/netcdf-c/pull/2655

This PR modifies the transient types PR so that all created
transient types are given a created unique name (within a
group). The form of the name is "_Anonymous<Class>NN". The class
is the user-defined type class: Enum, Compound, Opaque, or
Vlen. NN is an integer identifier to ensure uniqueness.
Additionally, this was applied to DAP/4 anonymous dimensions.
This also required some test baseline data changes.

The transient test case is modified to verify that the name exists.
2023-07-22 20:40:53 -06:00
Dennis Heimbigner
9d9a40d2eb Fix missing file 2023-06-10 22:08:40 -06:00
Dennis Heimbigner
a88da4165f add testcase 2023-06-10 22:03:04 -06:00
Dennis Heimbigner
9341904b0b Add earthdata test case 2023-06-10 20:11:26 -06:00
Dennis Heimbigner
49737888ca Improve S3 Documentation and Support
## Improvements to S3 Documentation
* Create a new document *quickstart_paths.md* that give a summary of the legal path formats used by netcdf-c. This includes both file paths and URL paths.
* Modify *nczarr.md* to remove most of the S3 related text.
* Move the S3 text from *nczarr.md* to a new document *cloud.md*.
* Add some S3-related text to the *byterange.md* document.

Hopefully, this will make it easier for users to find the information they want.

## Rebuild NCZarr Testing
In order to avoid problems with running make check in parallel, two changes were made:
1. The *nczarr_test* test system was rebuilt. Now, for each test.
any generated files are kept in a test-specific directory, isolated
from all other test executions.
2. Similarly, since the S3 test bucket is shared, any generated S3 objects
are isolated using a test-specific key path.

## Other S3 Related Changes
* Add code to ensure that files created on S3 are reclaimed at end of testing.
* Used the bash "trap" command to ensure S3 cleanup even if the test fails.
* Cleanup the S3 related configure.ac flag set since S3 is used in several places. So now one should use the option *--enable-s3* instead of *--enable-nczarr-s3*, although the latter is still kept as a deprecated alias for the former.
* Get some of the github actions yml to work with S3; required fixing various test scripts adding a secret to access the Unidata S3 bucket.
* Cleanup S3 portion of libnetcdf.settings.in and netcdf_meta.h.in and test_common.in.
* Merge partial S3 support into dhttp.c.
* Create an experimental s3 access library especially for use with Windows. It is enabled by using the options *--enable-s3-internal* (automake) or *-DENABLE_S3_INTERNAL=ON* (CMake). Also add a unit-test for it.
* Move some definitions from ncrc.h to ncs3sdk.h

## Other Changes
* Provide a default implementation of strlcpy and move this and similar defaults into *dmissing.c*.
2023-04-25 17:15:06 -06:00
Ward Fisher
7c48b58a46 Update dap4_test for systems without getopt 2023-04-12 14:05:38 -06:00
Ward Fisher
c8b3b37b1a Merge branch 'dap4tests2.dmh' of https://github.com/DennisHeimbigner/netcdf-c 2023-04-11 16:42:32 -06:00
Dennis Heimbigner
5b42e382b0 Update to latest main 2023-04-04 18:37:20 -06:00
Dennis Heimbigner
cf6fcb3b9c Merge branch 'master' into dap4tests2.dmh 2023-03-02 20:00:05 -07:00
Dennis Heimbigner
bec55cb95e Merge branch 'master' into dap4tests1.dmh 2023-03-02 19:59:32 -07:00
Ward Fisher
677587b444 Fix issue with dangling test file not getting cleaned up. 2023-02-27 15:07:33 -07:00
Dennis Heimbigner
158c790ae5 Fix Memory Leak
re: PR https://github.com/Unidata/netcdf-c/pull/2584
re: PR https://github.com/Unidata/netcdf-c/pull/2596

Repaired a memory leak in *netcdf-c/ncdump/utils.c*. I think introduced
by PR 2584.

## Misc. Other Changes
* Fixed references to *netcdf-c/docs/byterange.dox* ->  *netcdf-c/docs/byterange.md* (PR 2596).
2023-01-26 13:11:25 -07:00
Dennis Heimbigner
d1d2808919 Additional DAP4 fixes
This change-set modifies PR https://github.com/Unidata/netcdf-c/pull/2555
to add the changes listed below. Most of these changes are required
by changes to the Java remotetest.unidata.ucar.edu server.

## DAP4 Related Changes
* Add tests *dap4_test/test_constraints.sh* and *dap4_test/test_hyrax.sh*.
* Provide explicit list of remotetest files to test.
* Cleanup local checksum computing and verification.
* Define a temporary Hyrax hack flag to deal with the way Hyrax handles checksums and add "#hyrax" fragment flag for it.
* Add a hack to get past an LGTM problem with using "http:".
* Improve debug support.

## Other Changes
* Cleanup the recipe in *docs/nczarr.md* for building *aws-sdk-cpp* library.
2023-01-18 19:47:29 -07:00
Dennis Heimbigner
7fd059fc4e conflicts 2022-11-27 16:49:37 -07:00
Dennis Heimbigner
65eca292e6 Merge branch 'dap4tests2.tmp' into dap4tests2.dmh 2022-11-27 14:20:48 -07:00
Dennis Heimbigner
591e6b2f6d Fix DAP4 remotetest server
Warning: This PR is a follow on to PR https://github.com/Unidata/netcdf-c/pull/2555 and should not be merged until that prior PR has been merged. The changeset for this PR is a delta on the PR https://github.com/Unidata/netcdf-c/pull/2555.

This PR re-enables the use of the server *remotetest.unidata.ucar.edu/d4ts*
to test several features:
1. Show that access over the Internet to servers using the DAP4 protocol works.
2. Test that DAP4 support in the [Thredds Data Server](https://github.com/Unidata/tds) is operating correctly.
4. Test that the DAP4 support in the [netcdf-java library](https://github.com/Unidata/netcdf-java) library and the DAP4 support in the netcdf-c library are consistent and are interoperable.

The test inputs (primarily *\*.nc* files) provided in the netcdf-c library
are also used by the DAP4 Test Server (aka d4ts) to present web access to a
collection of data files accessible via the DAP4 protocol and which can be
used for testing Internet access to a working server.

To be precise, this version of d4ts is currently in unmerged branches
of the *netcdf-java* and *tds* Github repositories and so are not actually
in the main repositories *yet*. However, the *d4ts.war* file was created
from that branch and used to populate the *remotetest.unidata.ucar.edu*
server

The two other remote servers that were used in the past are *Hyrax* (OPenDAP.org)
and *thredds-test*. These will continue to remain disabled until
those servers can be fixed.

## Primary Changes

* Rebuild the *baselineremote* directory. This directory contains the validation data needed to test the remote servers.
* Re-enable using remotetest.unidata.ucar.edu as part of the DAP4 testing process.
* Fix the *dap4_test/test_remote.sh* test script to match the current available test data.
* Make some changes to libdap4 to improve the ability to catch malformed data streams [affects a lot of files in libdap4].

## Misc. Unrelated Changes

* Remove a raft of warnings, especially in nc_test4/tst_quantize.c.
* Add some additional explanatory information to the NCZarr documentation.
* Cleanup some Doxygen errors in the docs file and reorder some files.
2022-11-15 20:29:21 -07:00
Dennis Heimbigner
c1bb174ecd switch 2022-11-14 19:30:28 -07:00
Dennis Heimbigner
5d0f1ca907 debug3 2022-11-14 18:53:30 -07:00
Dennis Heimbigner
a0b05758b6 debug1 2022-11-14 18:37:25 -07:00
Dennis Heimbigner
9c1119e824 switch 2022-11-14 18:25:45 -07:00
Dennis Heimbigner
cc25aac175 remotetest 2022-11-14 12:51:09 -07:00
Dennis Heimbigner
2b17c0cc68 ckp 2022-11-13 14:35:48 -07:00
Dennis Heimbigner
835b81a285 Cleanup DAP4 testing
NOTE: This PR should not be included in 4.9.1 since additional
DAP4 related PRs will be forthcoming.

This PR makes major changes to libdap4 and dap4_test driven by changes to TDS.

* Enable DAP4
* Clean up the test input files and the test baseline comparison files. This entails:
    * Remove a multitude of unused test input and baseline data files; among them are dap4_test/: daptestfiles, dmrtestfiles, nctestfiles, and misctestfiles.
    * Define a canonical set of test input files and record in dap4_test/cdltestfiles.
    * Use the cdltestfiles to generate the .nc test inputs. This set of .nc files is then moved to the d4ts (DAP4 test server) war file in the tds repository. This set then becomes the canonical set of DAP4 test sources.
    * Scrape d4ts to obtain copies of the raw streams of DAP4 encoded data. The .dmr and .dap streams are then stored in dap4_test/rawtestfiles.
    * Disable some remote server tests until those servers are fixed.
* Add an option to ncdump (-XF) that forces the type of the _FillValue attribute; this is primarily to simplify testing of fill mismatch.
* Minor bug fixes to ncgen.
* Changes to libdap4:
    * Replace old checksum hack with the dap4.checksum flag.
    * Support the dap4.XXX controls.
    * Cleanup _FillValue handling, especially var-attribute type mismatches.
    * Fix enum handling based on changes to netcdf-java.
* Changes to dap4_test:
    * Add getopt support to various test support programs.
    * Remove unneeded shell scripts.
    * Add new scripts: test_curlopt.sh
2022-11-13 13:15:11 -07:00
Ward Fisher
533572f987 Correct an issue with 'make distcheck' where out-of-source tests were failing. 2022-10-18 14:27:12 -06:00
Ward Fisher
0d17edf2ea
Merge branch 'main' into bloscfix.dmh 2022-09-06 13:49:18 -06:00
Dennis Heimbigner
231ae96c4b Add support for Zarr string type to NCZarr
* re: https://github.com/Unidata/netcdf-c/pull/2278
* re: https://github.com/Unidata/netcdf-c/issues/2485
* re: https://github.com/Unidata/netcdf-c/issues/2474

This PR subsumes PR https://github.com/Unidata/netcdf-c/pull/2278.
Actually is a bit an omnibus covering several issues.

## PR https://github.com/Unidata/netcdf-c/pull/2278
Add support for the Zarr string type.
Zarr strings are restricted currently to be of fixed size.
The primary issue to be addressed is to provide a way for user to
specify the size of the fixed length strings. This is handled by providing
the following new attributes special:
1. **_nczarr_default_maxstrlen** &mdash;
This is an attribute of the root group. It specifies the default
maximum string length for string types. If not specified, then
it has the value of 64 characters.
2. **_nczarr_maxstrlen** &mdash;
This is a per-variable attribute. It specifies the maximum
string length for the string type associated with the variable.
If not specified, then it is assigned the value of
**_nczarr_default_maxstrlen**.

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR
type, which does not exist in zarr. The goal was to choose numpy types for
both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that
if a pure zarr implementation read them, it would still work and an
NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:
* "|S1" for NC_CHAR.
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be ok since for
string/char, the endianness has no meaning.

For reading attributes with pure zarr (i.e. with no nczarr
atribute types defined), they will always be interpreted as of
type NC_CHAR.

## Issue: https://github.com/Unidata/netcdf-c/issues/2474
This PR partly fixes this issue because it provided more
comprehensive support for Zarr attributes that are JSON valued expressions.
This PR still does not address the problem in that issue where the
_ARRAY_DIMENSION attribute is incorrectly set. Than can only be
fixed by the creator of the datasets.

## Issue: https://github.com/Unidata/netcdf-c/issues/2485
This PR also fixes the scalar failure shown in this issue.
It generally cleans up scalar handling.
It also adds a note to the documentation describing that
NCZarr supports scalars while Zarr does not and also how
scalar interoperability is achieved.

## Misc. Other Changes
1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" now used "_nczarr_attr. Support back compatibility for the upper case names.
2. Cleanup my too-clever-by-half handling of scalars in libnczarr.
2022-08-27 20:21:13 -06:00
Dennis Heimbigner
3623e17920 Fix some bugs in the blosc filter wrapper
re: Issue https://github.com/Unidata/netcdf-c/issues/2458

The above Github Issue revealed some bugs in the file netcdf-c/plugins/H5Zblosc.c. Fixed and added a testcase. Also discovered that the Blosc LZ sub-compressors do not work well with small datasets.

Misc. Other Change(s): I noticed that the file "dap4_test/baselinethredds/GOES16_CONUS_20170821_020218_0.47_1km_33.3N_91.4W.nc4.thredds" is still causing tar errors during "make distcheck", so I made some changes to do rename at test-time.
2022-07-12 15:19:07 -06:00
Dennis Heimbigner
53890fd3a0 Fix distcheck problems
re: https://github.com/Unidata/netcdf-c/issues/2342
This PR replaces PR https://github.com/Unidata/netcdf-c/pull/2342

This PR extends the distcheck corrections in PR
https://github.com/Unidata/netcdf-c/pull/2342.  That original PR
exposed some errors in the file naming in the plugins and
nczarr_test directories.  This PR corrects those problems and
should be used instead of https://github.com/Unidata/netcdf-c/pull/2342

Ed Hartnett's suggestion about how to install the plugins in the
user specified directory will be addressed in a subsequent PR.
2022-05-09 12:10:53 -06:00
Dennis Heimbigner
2856ee751d restore 2022-04-29 12:36:33 -06:00
Dennis Heimbigner
ad62ed2d41 ckp 2022-04-26 17:58:20 -06:00
Ward Fisher
3446aa0c13 Merge branch 'winutf8.dmh' of https://github.com/DennisHeimbigner/netcdf-c into gh2222.wif 2022-04-05 10:46:22 -06:00
Dennis Heimbigner
6d44ec39f6 1. Fix conflicts with current master.
2. There is a bug in building tinyxml2 under OSX, so as a hack, the absence of an installed libxml2 under OSX will disable libxml2 and DAP4.
2022-03-15 15:33:13 -06:00
Dennis Heimbigner
ad4a3f69b9 typo 2022-03-14 14:36:49 -06:00
Dennis Heimbigner
746dbc05f8 more strlcat remove 2022-03-14 14:08:54 -06:00
Dennis Heimbigner
1290022433 remove use of strlcat 2022-03-14 14:00:09 -06:00
Dennis Heimbigner
9b7202bf06 Explicitly disallow variable length type compression
re: https://github.com/Unidata/netcdf-c/issues/2189

Compression of a variable whose type is variable length
fails for all current filters. This is because at some point,
the compression buffer will contain pointers to data instead
of the actual data. Compression of pointers of course is meaningless.

The PR changes the behavior of nc_def_var_filter so that it will
fail with error NC_EFILTER if an attempt is made to add a filter
to a variable whose type is variable-length.

A variable is variable-length if it is of type string or VLEN
or transitively (via a compound type) contains a string or VLEN.

Also added a test case for this.

## Misc Changes
1. Turn off a number of debugging statements
2022-02-19 16:47:31 -07:00
Dennis Heimbigner
36102e3c32 Improve UTF8 Support On Windows
re: Issue https://github.com/Unidata/netcdf-c/issues/2190

The primary purpose of this PR is to improve the utf8 support
for windows. This is persuant to a change in Windows that
supports utf8 natively (almost). The almost means that it is
still utf16 internally and the set of characters representable
by utf8 is larger than those representable by utf16.

This leaves open the question in the Issue about handling
the Windows 1252 character set.

This required the following changes:

1. Test the Windows build and major version in order to see if
   native utf8 is supported.
2. If native utf8 is supported, Modify dpathmgr.c to call the 8-bit
   version of the windows fopen() and open() functions.
3. In support of this, programs that use XGetOpt (Windows versions)
   need to get the command line as utf8 and then parse to
   arc+argv as utf8. This requires using a homegrown command line parser
   named XCommandLineToArgvA.
4. Add a utility program called "acpget" that prints out the
   current Windows code page and locale.

Additionally, some technical debt was cleaned up as follows:

1. Unify all the places which attempt to read all or a part
   of a file into the dutil.c#NC_readfile code.
2. Similary unify all the code that creates temp files into
   dutil.c#NC_mktmp code.
3. Convert almost all remaining calls to fopen() and open()
   to NCfopen() and NCopen3(). This is to ensure that path management
   is used consistently. This touches a number of files.
4. extern->EXTERNL as needed to get it to work under Windows.
2022-02-08 20:53:30 -07:00
Milton Woods
b33a6348f1
Merge branch 'main' into mingw-w64-strcasecmp 2022-01-11 10:45:15 +11:00
Dennis Heimbigner
73caeb674d Cleanup the CMake inter-test dependencies
The ncdump test set has a number of inter-test dependencies
that are not properly established in ncdump/CMakeLists.txt.

So this PR attempts to:
1. reorder the tests
2. change tests in CMakeLists.txt from build_bin_test_no_prefix to add_bin_test_no_prefix so they get executed

Plus a couple of minor bug fixes.
1. Change ENABLE_NC4 => ENABLE_HDF5 in github action.
2. fix a memory error in findtestserver.c.in
3. fix bug in ncdap_tests/tst_urls.sh
4. fix netcdf file name bug in tst_netcdf4_4.sh
2021-12-20 15:13:08 -07:00
Dennis Heimbigner
2da684fc37 ckp 2021-10-26 22:52:23 -06:00
Ward Fisher
7ec0ac0a08
Merge branch 'main' into mingw-w64-strcasecmp 2021-10-01 17:07:37 -05:00
Milton Woods
4fa91d8241 Use strcasecmp definitions from config.h 2021-09-05 17:17:30 +10:00
Dennis Heimbigner
11fe00ea05 Add filter support to NCZarr
Filter support has three goals:

1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled

Detailed usage directions are define in docs/filters.md.

For now, the existing filter API is left in place. So filters
are defined using ''nc_def_var_filter'' using the HDF5 style
where the id and parameters are unsigned integers.

This is a big change since filters affect many parts of the code.

In the following, the terms "compressor" and "filter" and "codec" are generally
used synonomously.

### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new test to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.

### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.

### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new test to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-02 17:04:26 -06:00
Dennis Heimbigner
4f37d1a826 Make Issue https://github.com/Unidata/netcdf-c/issues/2077 work when build is repeated.
re: https://github.com/Unidata/netcdf-c/pull/2075

The long file name fix fails when the build is manually repeated
because the source file has already been renamed. Solution is to
test if the dest file exists or not before doing the rename.
2021-08-24 15:11:26 -06:00
Ward Fisher
24411c429e Update cmake-based builds in support of https://github.com/Unidata/netcdf-c/issues/2077 2021-08-18 11:40:09 -06:00
Ward Fisher
379f51a68c Additional refactoring of the thredds dap4 test in support of #2077 2021-08-18 10:24:42 -06:00