Commit Graph

84 Commits

Author SHA1 Message Date
Ward Fisher
54102a3cea Update path in s3sdk shell script unit test. 2023-10-16 14:30:43 -06:00
Ward Fisher
3c013ce342 Merged in current state of https://github.com/Unidata/netcdf-c/pulls/2741 2023-09-29 15:05:43 -06:00
Dennis Heimbigner
df3636b959 Mitigate S3 test interference + Unlimited Dimensions in NCZarr
This PR started as an attempt to add unlimited dimensions to NCZarr.
It did that, but this exposed significant problems with test interference.
So this PR is mostly about fixing -- well mitigating anyway -- test
interference.

The problem of test interference is now documented in the document docs/internal.md.
The solutions implemented here are also describe in that document.
The solution is somewhat fragile but multiple cleanup mechanisms
are provided. Note that this feature requires that the
AWS command line utility must be installed.

## Unlimited Dimensions.
The existing NCZarr extensions to Zarr are modified to support unlimited dimensions.
NCzarr extends the Zarr meta-data for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as dictionary named "_nczarr_group".
Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the named dimensions. The value associated with each dimension name has one of two forms
Form 1 is a special case of form 2, and is kept for backward compatibility. Whenever a new file is written, it uses format 1 if possible, otherwise format 2.
* Form 1: An integer representing the size of the dimension, which is used for simple named dimensions.
* Form 2: A dictionary with the following keys and values"
   - "size" with an integer value representing the (current) size of the dimension.
   - "unlimited" with a value of either "1" or "0" to indicate if this dimension is an unlimited dimension.

For Unlimited dimensions, the size is initially zero, and as variables extend the length of that dimension, the size value for the dimension increases.
That dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension.
This is the standard semantics for unlimited dimensions.

Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.
* Did a partial refactor of the slice handling code in zwalk.c to clean it up.
* Added a number of tests for unlimited dimensions derived from the same test in nc_test4.
* Added several NCZarr specific unlimited tests; more are needed.
* Add test of endianness.

## Misc. Other Changes
* Modify libdispatch/ncs3sdk_aws.cpp to optionally support use of the
   AWS Transfer Utility mechanism. This is controlled by the
   ```#define TRANSFER```` command in that file. It defaults to being disabled.
* Parameterize both the standard Unidata S3 bucket (S3TESTBUCKET) and the netcdf-c test data prefix (S3TESTSUBTREE).
* Fixed an obscure memory leak in ncdump.
* Removed some obsolete unit testing code and test cases.
* Uncovered a bug in the netcdf-c handling of big-endian floats and doubles. Have not fixed yet. See tst_h5_endians.c.
* Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.
* Modify the semantics of zmap\#ncsmap_write to only allow total rewrite of objects.
* Modify the semantics of zodom to properly handle stride > 1.
* Add a truncate operation to the libnczarr zmap code.
2023-09-26 16:56:48 -06:00
Ward Fisher
b5bb7d8837 Correct a couple of typos. 2023-09-14 12:55:45 -06:00
Ward Fisher
571fa068eb Made some changes to spacing, also discovered an issue around specifying rpath on MacOSX when using configure. 2023-09-14 12:01:42 -06:00
Ward Fisher
543eeee980 A few tweaks for readability. 2023-09-14 11:20:10 -06:00
Ward Fisher
22ecb88e8d Turned off verbose debugging in test_s3sdk.c for now. 2023-08-31 10:31:06 -06:00
Ward Fisher
468e98f3b4 Making test cases more verbose, turning off s3-based interoperability tests TEMPORARILY. 2023-08-30 16:11:37 -06:00
Dennis Heimbigner
12ec5711d7 Fix some problems with Earthdata authorization.
re: Issue https://github.com/Unidata/netcdf-c/issues/2704

The issue reported problems accessing e.g. opendap.earthdata.nasa.gov,
which uses the authentication mechanisms of urs.earthdata.nasa.gov.
The file *docs/auth.md* describes how to setup the proper authorization
mechanisms for earthdata, but there turned out to be some bugs
in the code that prevented this from working.

## Primary Changes
* Add some clarification text to *auth.md*.
* Fix the process for loading and merging *.ncrc* and *.dodsrc* file to conform to documentation.
* Fix *NC_s3urlrebuild* so that non-S3 urls are passed through unchanged.
* Fix a bug in the .rc test *test_rcmerge.sh*.
2023-06-10 18:51:13 -06:00
Dennis Heimbigner
fb40a72b45 Improve performance of the nc_reclaim_data and nc_copy_data functions.
re: Issue https://github.com/Unidata/netcdf-c/issues/2685
re: PR https://github.com/Unidata/netcdf-c/pull/2179

As noted in PR https://github.com/Unidata/netcdf-c/pull/2179,
the old code did not allow for reclaiming instances of types,
nor for properly copying them. That PR provided new functions
capable of reclaiming/copying instances of arbitrary types.

However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these
most general functions resulted in a significant performance
degradation, even for common cases.

This PR attempts to mitigate the cost of using the general
reclaim/copy functions in two ways.

First, the previous functions operating at the top level by
using ncid and typeid arguments. These functions were augmented
with equivalent versions that used the netcdf-c library internal
data structures to allow direct access to needed information.
These new functions are used internally to the library.

The second mitigation involves optimizing the internal functions
by providing early tests for common cases. This avoids
unnecessary recursive function calls.

The overall result is a significant improvement in speed by a
factor of roughly twenty -- your mileage may vary. These
optimized functions are still not as fast as the original (more
limited) functions, but they are getting close. Additional optimizations are
possible. But the cost is a significant "uglification" of the
code that I deemed a step too far, at least for now.

## Misc. Changes
1. Added a test case to check the proper reclamation/copy of complex types.
2. Found and fixed some places where nc_reclaim/copy should have been used.
3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim_copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism.
4. In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher.
5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not.
6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.
2023-05-20 17:11:25 -06:00
Dennis Heimbigner
98477b9f25 ## Addendum [5/9/23]
It turns out that attempting to test S3 using a github action secret is a very complex process. So, this was disabled for github actions. However, a new *run_tests_s3.yml* action file was added that will eventually encapsulate S3 testing.
2023-05-09 21:13:49 -06:00
Dennis Heimbigner
681abc3fb1 s3-off 2023-04-30 18:41:31 -06:00
Dennis Heimbigner
6eac55dc44 debug11 2023-04-29 21:01:05 -06:00
Dennis Heimbigner
a97cabfb12 debug10 2023-04-29 20:51:46 -06:00
Dennis Heimbigner
ff6b6a72d1 debug7 2023-04-29 20:40:21 -06:00
Dennis Heimbigner
5be7088c9e debug6 2023-04-29 20:36:18 -06:00
Dennis Heimbigner
20065682bb debug1 2023-04-28 14:30:48 -06:00
Dennis Heimbigner
aa82e1b4cb Merge branch 's3update.dmh' of https://github.com/DennisHeimbigner/netcdf-c into s3update.dmh 2023-04-28 14:05:14 -06:00
Dennis Heimbigner
62f8d31415 profile1 2023-04-28 14:05:06 -06:00
Dennis Heimbigner
7e5c4ebe66 debug1 2023-04-27 20:01:39 -06:00
Dennis Heimbigner
7939ec559f forward1 2023-04-27 19:52:03 -06:00
Dennis Heimbigner
44441af5bb clean1 2023-04-27 19:46:34 -06:00
Dennis Heimbigner
5d5ff73847 segv2 2023-04-27 15:26:34 -06:00
Dennis Heimbigner
c8d14cb029 gdb1 2023-04-27 13:32:02 -06:00
Dennis Heimbigner
6707508a26 list1 2023-04-27 12:28:14 -06:00
Dennis Heimbigner
938dcd913e revamp1 2023-04-27 11:10:10 -06:00
Dennis Heimbigner
91473c231b ga2 2023-04-26 20:48:50 -06:00
Dennis Heimbigner
a77bdc0a91 ga1 2023-04-26 20:39:48 -06:00
Dennis Heimbigner
076aad4174 val10 2023-04-26 20:25:43 -06:00
Dennis Heimbigner
aeaf9e4bec notrace 2023-04-26 14:16:22 -06:00
Dennis Heimbigner
b19bdb6bba vg5 2023-04-26 14:05:23 -06:00
Dennis Heimbigner
f584296472 ga1 2023-04-26 13:51:17 -06:00
Dennis Heimbigner
61f42a4404 valg2 2023-04-26 13:31:36 -06:00
Dennis Heimbigner
8ee9453043 valg2 2023-04-26 13:20:54 -06:00
Dennis Heimbigner
5dd237246f fault1 2023-04-26 13:03:21 -06:00
Dennis Heimbigner
3eaa4bbb2c valgrind1 2023-04-26 12:38:11 -06:00
Dennis Heimbigner
f6f4b89f39 cyg1 2023-04-25 20:27:59 -06:00
Dennis Heimbigner
49737888ca Improve S3 Documentation and Support
## Improvements to S3 Documentation
* Create a new document *quickstart_paths.md* that give a summary of the legal path formats used by netcdf-c. This includes both file paths and URL paths.
* Modify *nczarr.md* to remove most of the S3 related text.
* Move the S3 text from *nczarr.md* to a new document *cloud.md*.
* Add some S3-related text to the *byterange.md* document.

Hopefully, this will make it easier for users to find the information they want.

## Rebuild NCZarr Testing
In order to avoid problems with running make check in parallel, two changes were made:
1. The *nczarr_test* test system was rebuilt. Now, for each test.
any generated files are kept in a test-specific directory, isolated
from all other test executions.
2. Similarly, since the S3 test bucket is shared, any generated S3 objects
are isolated using a test-specific key path.

## Other S3 Related Changes
* Add code to ensure that files created on S3 are reclaimed at end of testing.
* Used the bash "trap" command to ensure S3 cleanup even if the test fails.
* Cleanup the S3 related configure.ac flag set since S3 is used in several places. So now one should use the option *--enable-s3* instead of *--enable-nczarr-s3*, although the latter is still kept as a deprecated alias for the former.
* Get some of the github actions yml to work with S3; required fixing various test scripts adding a secret to access the Unidata S3 bucket.
* Cleanup S3 portion of libnetcdf.settings.in and netcdf_meta.h.in and test_common.in.
* Merge partial S3 support into dhttp.c.
* Create an experimental s3 access library especially for use with Windows. It is enabled by using the options *--enable-s3-internal* (automake) or *-DENABLE_S3_INTERNAL=ON* (CMake). Also add a unit-test for it.
* Move some definitions from ncrc.h to ncs3sdk.h

## Other Changes
* Provide a default implementation of strlcpy and move this and similar defaults into *dmissing.c*.
2023-04-25 17:15:06 -06:00
Dennis Heimbigner
f80cbc2959 Missed unit_test 2023-04-10 16:07:34 -06:00
Dennis Heimbigner
126b3f9423 Support installation of filters into user-specified location
re: https://github.com/Unidata/netcdf-c/issues/2294

Ed Hartnett suggested that the netcdf library installation process
be extended to install the standard filters into a user specified
location. The user can then set HDF5_PLUGIN_PATH to that location.

This PR provides that capability using:
````
configure option: --with-plugin-dir=<absolute directory path>
cmake option: -DPLUGIN_INSTALL_DIR=<absolute directory path>
````

Currently, the following plugins are always installed, if
available: bzip2, zstd, blosc.
If NCZarr is enabled, then additional plugins are installed:
fletcher32, shuffle, deflate, szip.

Additionally, the necessary codec support is installed
for each of the above filters that is installed.

## Changes:
1. Cleanup handling of built-in bzip2.
2. Add documentation to docs/filters.md
3. Re-factor the NCZarr codec libraries
4. Add a test, although it can only be exercised after
   the library is installed, so it cannot be used during
   normal testing.
5. Cleanup use of HDF5_PLUGIN_PATH in the filter test cases.
2022-04-29 14:31:55 -06:00
Edward Hartnett
8d8fdfd12c fixed warnings 2022-04-26 05:31:07 -06:00
Dennis Heimbigner
f3e711e2b8 Add support for setting HDF5 alignment property when creating a file
re: https://github.com/Unidata/netcdf-c/issues/2177
re: https://github.com/Unidata/netcdf-c/pull/2178

Provide get/set functions to store global data alignment
information and apply it when a file is created.

The api is as follows:
````
int nc_set_alignment(int threshold, int alignment);
int nc_get_alignment(int* thresholdp, int* alignmentp);
````

If defined, then for every file created opened after the call to
nc_set_alignment, for every new variable added to the file, the
most recently set threshold and alignment values will be applied
to that variable.

The nc_get_alignment function return the last values set by
nc_set_alignment.  If nc_set_alignment has not been called, then
it returns the value 0 for both threshold and alignment.

The alignment parameters are stored in the NCglobalstate object
(see below) for use as needed. Repeated calls to nc_set_alignment
will overwrite any existing values in NCglobalstate.

The alignment parameters are applied in libhdf5/hdf5create.c
and libhdf5/hdf5open.c

The set/get alignment functions are defined in libsrc4/nc4internal.c.

A test program was added as nc_test4/tst_alignment.c.

## Misc. Changes Unrelated to Alignment

* The NCRCglobalstate type was renamed to NCglobalstate to
  indicate that it represented more general global state than
  just .rc data.  It was also moved to nc4internal.h.  This led
  to a large number of small changes: mostly renaming. The
  global state management functions were moved to nc4internal.c.

* The global chunk cache variables have been moved into
  NCglobalstate.  As warranted, other global state will be moved
  as well.

* Some misc. problems with the nczarr performance tests were corrected.
2022-01-29 15:27:52 -07:00
Dennis Heimbigner
9380790ea8 Support MSYS2/Mingw platform
re:

The current netcdf-c release has some problems with the mingw platform
on windows. Mostly they are path issues.

Changes to support mingw+msys2:
-------------------------------
* Enable option of looking into the windows registry to find
  the mingw root path. In aid of proper path handling.
* Add mingw+msys as a specific platform in configure.ac and move testing
  of the platform to the front so it is available early.
* Handle mingw X libncpoco (dynamic loader) properly even though
  mingw does not yet support it.
* Handle mingw X plugins properly even though mingw does not yet support it.
* Alias pwd='pwd -W' to better handle paths in shell scripts.
* Plus a number of other minor compile irritations.
* Disallow the use of multiple nc_open's on the same file for windows
  (and mingw) because windows does not seem to handle these properly.
  Not sure why we did not catch this earlier.
* Add mountpoint info to dpathmgr.c to help support mingw.
* Cleanup dpathmgr conversions.

Known problems:
---------------
* I have not been able to get shared libraries to work, so
  plugins/filters must be disabled.
* There is some kind of problem with libcurl that I have not solved,
  so all uses of libcurl (currently DAP+Byterange) must be disabled.

Misc. other fixes:
------------------
* Cleanup the relationship between ENABLE_PLUGINS and various other flags
  in CMakeLists.txt and configure.ac.
* Re-arrange the TESTDIRS order in Makefile.am.
* Add pseudo-breakpoint to nclog.[ch] for debugging.
* Improve the documentation of the path manager code in ncpathmgr.h
* Add better support for relative paths in dpathmgr.c
* Default the mode args to NCfopen to include "b" (binary) for windows.
* Add optional debugging output in various places.
* Make sure that everything builds with plugins disabled.
* Fix numerous (s)printf inconsistencies betweenb the format spec
  and the arguments.
2021-12-23 22:18:56 -07:00
Dennis Heimbigner
55a2643cac Fix a number of OS specific bugs
1. Issue https://github.com/Unidata/netcdf-c/issues/2043
   * FreeBSD build fails because of conflicts in defining the fileno() function. So removed all extern declarations of fileno.

2. Issue https://github.com/Unidata/netcdf-c/issues/2124
   * There were a couple of problems here.
     * I was conflating msys with mingw and they need separate handling of paths. So treat mingw like windows.
     * memio.c was not always writing the full content of the memory to file. Untested fix by properly accounting for zero size writes.
     * Fix bug when skipping white space in tst_xcache.c

3. Issue https://github.com/Unidata/netcdf-c/pull/2105
   * On MINGW, bash and other POSIX utilities use a mounted root directory,
     but executables compiled for Windows do not recognise the mount point.
     Ensure that Windows paths are used in tests of Windows executables.

4. Issue https://github.com/Unidata/netcdf-c/issues/2132
   * Apparently the Intel C compiler on OSX defines isnan etc.
     So disable declaration in dutil.c under that condition.

5. Fix and re-enable test_rcmerge.sh by allowing override of where to
   look for .rc files

6. CMakeLists.txt suppresses certain ncdump directory tests because of differences in printing floats/doubles.
   * Extend the list to include those that also fail under mingw.
   * Suppress the mingw tests in ncdump/Makefile.am
2021-11-03 12:49:54 -06:00
Dennis Heimbigner
f6e25b695e Fix additional S3 support issues
re: https://github.com/Unidata/netcdf-c/issues/2117
re: https://github.com/Unidata/netcdf-c/issues/2119

* Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case.
* The aws sdk can sometimes cause problems if the Awd::ShutdownAPI function is not called. So at optional atexit() support to ensure it is called. This is disabled for Windows.
* Add documentation to nczarr.md on how to build and use the aws sdk under windows. Currently it builds, but testing fails.
* Switch testing from stratus to the Unidata bucket on S3.
* Improve support for the s3: url protocol.
* Add a s3 specific utility code file: ds3util.c
* Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3.

## Misc.

* Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it now used in libsrc as well as libnczarr.
* Add calls to nc_finalize in the utilities in case atexit is disabled.
* Add header only json parser to the distribution rather than as a built source.
2021-10-29 20:06:37 -06:00
Dennis Heimbigner
dc2ecc74ac (1) improve INI parser (2) Fix make discheck 2021-09-30 13:45:09 -06:00
Dennis Heimbigner
6b69b9c52c Significantly Improve Amazon S3 Cloud Storage Support
## S3 Related Fixes

* Add comprehensive support for specifying AWS profiles to provide access credentials.
* Parse the files "~/.aws/config" and "~/.aws/credentials to provide credentials for the HDF5 ROS3 driver and to locate default region.
* Add a function to obtain the currently active S3 credentials. The search rules are defined in docs/nczarr.md.
* Provide documentation for the new features.
* Modify the struct NCauth (in include/ncauth.h) to replace specific S3 credentials with a profile name.
* Add a unit test to test the operation of profile and credentials management.
* Add support for URLS of the form "s3://<bucket>/<key>"; this requires obtaining a default region.
* Allows the specification of profile and/or region in a URL of the form "#mode=nczarr,...&aws.region=...&aws.profile=..."

## Misc. Fixes

* Move the ezxml code to libdispatch so that it can be used both by DAP4 and nczarr.
* Modify nclist to provide a deep clone operation.
* Modify ncuri to provide a deep clone operation.
* Modify the .rc file format to allow the specification of a path to be tested when looking for an entry in the .rc file.
* Ensure that the NC_rcload function is called.
* Modify nchttp to support setting request headers.
2021-09-27 18:36:33 -06:00
Dennis Heimbigner
de23473ac3 Support Windows network paths: \\svc\x\y...
re: Issue https:\\github.com\Unidata\netcdf-c\issues\2060

The path conversion code forgot to consider the case of
windows network paths of the form \\svc\x\y...

I have added support for it, but I can't really test it
since I do not have access to a network drive.
2021-08-09 15:34:23 -06:00
Dennis Heimbigner
efd1be5d62 Fix shell handling of escapes
re: https://github.com/Unidata/netcdf-c/issues/1988

There was an issue with certain shell programs (bash notably).
For certain platforms and when given a url that had an escaped
'#' character (e.g. \\#) bash would not remove the backslash. So I
had to add a hack for this. Unfortunately I overdid it and it
removed all '' characters. This is ok for non-windows platforms,
but obviously fails for windows.

The fix is this.

1. In a utility program (ncgen, ncdump, nccopy, etc) there is probably a call (or calls) to NC_backslashUnescape(xxx) where xxx is a path argument from the command line.
2. Replace each such call with NC_shellUnescape(xxx).

The NC_shellUnescape function was added and searched only for occurrences of "\#" and replaces them with "#".
2021-04-21 14:59:15 -06:00
Dennis Heimbigner
0b7a5382e7 Codify cross-platform file paths
The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc.  These platforms differ
significantly in the kind of file paths that they accept.  So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.

A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code. So this can be viewed as a
replacement for that code. And in path in many cases, the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgt.h>' and then replace file operation
calls with the NCxxx equivalent from ncpathmgr.h Note that
recently, the ncwinpath.h was renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.

The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslash and drive letters, as compared to *nix*.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.

The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.

As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.

In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also please notify Unidata
so we can include it as a formal part or our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.

Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
  tests to work under windows using shell scripts; the args do
  not pass as utf8 but as some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
  PR https://github.com/Unidata/netcdf-c/pull/1794,
  HDF5 changed its Windows file path handling.
2021-03-04 13:41:31 -07:00