The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc. These platforms differ
significantly in the kind of file paths that they accept. So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.
A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code. So this can be viewed as a
replacement for that code. And in path in many cases, the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgt.h>' and then replace file operation
calls with the NCxxx equivalent from ncpathmgr.h Note that
recently, the ncwinpath.h was renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.
The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslash and drive letters, as compared to *nix*.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.
The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.
As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.
In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also please notify Unidata
so we can include it as a formal part or our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.
Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
tests to work under windows using shell scripts; the args do
not pass as utf8 but as some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
PR https://github.com/Unidata/netcdf-c/pull/1794,
HDF5 changed its Windows file path handling.
The primary change is to support the use of a zip file as a
storage format. Simultaneously the .nz4 support is made obsolete
Use of zip requires the libzip support library, so a number of
changes to the build files (Makefile.am, CMakeLists.txt) are
necessary to locate and incorporate libzip. The nczarr_tests
tests are also changed to add zip testing.
Other changes:
* Make sure distcheck leaves no files around.
* Add some functions to netcdf_aux to export some functions of libnetcdf.
* Add a new error NC_EFOUND as the complement of NC_EEMPTY.
* Add tracing support to nclog and use it in libnczarr.
* Modify the zmap interface to support the writeonce semantics of zip.
* Create a new s3util.c to support a variety of S3 auxilliary functions.
* EXTERNL'ize a number of functions so they can be used in s3util.
* Add support for the S3 ListObjects CommonPrefixes mechanism
to improve search.
* Add experimental support for running nczarr X s3 tests against
the actual Amazon S3 cloud.
Primary Fixes:
* Add a whole variable optimization -- used in the rare case that nc_get/put_vara covers the whole of a variable and the variable has a single chunk.
* Fix chunking error when stride causes whole chunks to be skipped.
* Fix some memory leaks
* Add test cases
* Add one performance test to nczarr_test/. This uses the timer utils from unit_test: timer_utils.[ch].
* Move ncdumpchunks utility from ncdump to nczarr_test
Misc. Other Changes:
* Make check for aws libraries conditional on --enable-nczarr-s3
* Remove all but one bm tests from nczarr_test until they are working.
* Remove another dependency on HDF5 from supposedly non-HDF5 specific code; specifically hdf5_log_hdf5.
* Make the BAIL2 macro be hdf5 specific and replace elsewhere with an HDF5 independent equivalent.
* Move hdf5cache.c to libsrc4/nc4cache.c because it is used by nczarr.
* Modify unit_tests so that some of them are run even if using Windows.
* Misc. small bug fixes and refactors and memory leaks.
* Rename some conflicting tests for cmake.
* Attempted to make nc_perf work with cmake and failed.
The primary fix is to improve CMake build support.
Specific changes include:
* CMake: Provide a better soln to locating the AWS SDK
libraries; the new way is the preferred method as described in
the aws-cpp-sdk documentation.
* CMake (and Automake): allow -DENABLE_S3_SDK (default off) to suppress
looking for AWS libraries.
* CMake: add the complete set of nczarr tests
* CMake: add EXTERNL as needed to various .h files.
* Improve support for windows drive letters in paths.
* Add nczarr and s3 flags to nc-config
* For VisualStudio X nczarr, cleanup the NAN+INFINITY handling
* Convert _MSC_VER -> _WIN32 and vice versa as needed
* NCZarr - support multiple platform paths including windows, cygwin.
mingw, etc.
* NCZarr - sort the test outputs because different platforms
produce directory contents in different orders.
One big change concerns netcdf-c/CMakeLists.txt and netcdf-c/configure.ac.
In the current versions, it was the case that --disable-hdf5
disabled netcdf-4 (libsrc4). With nczarr, this can no longer
be the case because nczarr requires libsrc4 even if libhdf5
is disabled. So, I modified the above files to move the
format options (HDF5, NCZarr, HDF4, etc) to a single place
near the front of the files. Now it is the case that:
* Enabling any of the formats that require libsrc4
also does an implicit --enable-netcdf4.
* --disable-netcdf4 | --disable-netcdf-4 now becomes
and alias for --disable-hdf5.
There are probably some bugs in this change in terms of
dependencies between format options.
Problems:
* CMake S3 support is still not working for Visual Studio
* A recent issue points out that there is work to do on handling
UTF8 filenames, but that will be addressed in a separate fix.
Notes:
* Consider converting all of our includes/.h files to use EXTERNL
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".
The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.
More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).
WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:
Platform | Build System | S3 support
------------------------------------
Linux+gcc | Automake | yes
Linux+gcc | CMake | yes
Visual Studio | CMake | no
Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future. Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.
In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*. The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
and the version bumped.
4. An overly complex set of structs was created to support funnelling
all of the filterx operations thru a single dispatch
"filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
to nczarr.
Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
-- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
support zarr and to regularize the structure of the fragments
section of a URL.
Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
* Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.
Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
re: https://github.com/Unidata/netcdf-c/issues/1584
Support has been added for multiple filters per variable. This
affects a number of components in netcdf. The new APIs are
documented in NUG/filters.md.
The primary changes are:
* A set of new functions are provided (see __include/netcdf_filter.h__).
- Obtain a list of the filters associated with a variable
- Obtain the parameters for a specific filter.
* The existing __nc_inq_var_filter__ function now returns info
about the first defined filter.
* The utilities (ncgen, ncdump, and nccopy) now support
an extended format for specifying a sequence of filters.
The general form is __<filter>|<filter>..._.
* The ncdump **_Filter** attribute now dumps a list of all the
filters associated with a variable using the above new format.
* Filter specifications can now use a filter name instead of number
for filters known to the netcdf library, which in turn is taken
from the HDF5 filter registration page.
* New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter
is returned if an attempt is made to access an unknown filter.
* Internally, the dispatch table has been extended to add a function
to handle all of the filter functions.
* New, filter-related, tests were added to nc_test4.
* A new plugin was added to the plugins directory to help with testing.
Notes:
1. The shuffle and fletcher32 filters are not part of the multifilter system.
Misc. changes:
1. A debug module was added to libhdf5 to help catch error locations.
* For URL paths, the new approach essentially centralizes all information
in the URL into the "#mode=" fragment key and uses that value
to determine the dispatcher for (most) URLs.
* The new approach has the following steps:
1. canonicalize the path if it is a URL.
2. use the mode= fragment key to determine the dispatcher
3. if dispatcher still not determined, then use the mode flags
argument to nc_open/nc_create to determine the dispatcher.
4. if the path points to something readable, attempt to read the
magic number at the front, and use that to determine the dispatcher.
this case may override all previous cases.
* Misc changes.
1. Update documentation
2. Moved some unit tests from libdispatch to unit_test directory.
3. Fixed use of wrong #ifdef macro in test_filter_reg.c
[I think this may fix an previously reported esupport query].
re: https://github.com/Unidata/netcdf-c/issues/1408
1. Add some function tests to configure.ac; these are functions
not defined with -ansi.
2. When using -ansi, fix include/ncconfigure.h to check for
the possibilty that certain functions are being defined
by macros. Apparently Debian does this for some reason.
No idea why.
Unrelated: modify the debug/cf.cmake debug shell script.
supercede PR: https://github.com/Unidata/netcdf-c/pull/1384
Since we have an mmap user, undeprecate it and make sure
it works. Other changes:
* fix test cases to work with make -j
* fix exposed ncgen error.
re: https://github.com/Unidata/netcdf-c/issues/1373 (partial)
* Mark some global constants be const to indicate to make them easier to track.
* Hide direct access to the ncrc_globalstate behind a function call.
* Convert dispatch tables to constants (except the user defined ones)
This has some consequences in terms of function arguments needing to be marked
as const also.
* Remove some no longer needed global fields
* Aggregate all the globals in nclog.c
* Uniformly replace nc_sizevector{0,1} with NC_coord_{zero,one}
* Uniformly replace nc_ptrdffvector1 with NC_stride_one
* Remove some obsolete code
Add Wei-King Liao's ncvalidator program (with his permission) as
an uninstalled (for now) tool in the ncdump directory. It has in
the past been useful for debugging netcdf-3 files.
A user suggested that the nccopy -F option
syntax should be extended to support specification
of multiple (or all) variables in a single -F option.
The new syntax allows:
1. '*' as the name of the variable; this means apply the
filter to all variables in the data set.
2. *var1|var2|...* as the variable name to indicate that the filter
should be applied to the multiple specified variables.
Primary fixes to get -ansi to work.
1. Convert all '//' C++ style comments to /*...*/ or to use #if 0...#endif
2. It turns out that when -ansi is specified, then a number of
functions no longer are defined in the header -- but they are still
in the .so file.<br>
The big example is strdup(). So, added code to include/ncconfig.h to define
externs for those missing functions that occur in more than one place.
These are enabled if !_WIN32 && __STDC__ == 1 (__STDC__ is supposed to
be the equivalent compile time flag to -ansi). Note that this requires
config.h (which references ncconfig.h) to be included in files where it is
currently not included. Single uses will be only in the file that uses them.
3. Added mmap test for the MAP_ANONYMOUS flag to configure.ac. Apparently
this is not always defined with -ansi.
4. fix some large integer constants in nc_test4/tst_atts3.c and nc_test4/tst_filterparser.c
to avoid compiler complaints.
5. fix a double constant in nc_test4/tst_filterparser.c to avoid compiler complaints.
[Note I suspect #4 and #5 will be a problem on big-endian machines, but we have no way to test]
Misc. Changes:
1. convert more instances of _MSC_VER to _WIN32.
2. added some debugging code to include/nctestserver.h
3. added comment about libdispatch/drc.c always being compiled.
4. modify parser generation in ncgen to remove unneeded files.
re: issue https://github.com/Unidata/netcdf-c/issues/1233
Changes:
1. remove exit that was there for testing.
2. the program tst_open_mem must be netcdf-4 only.
3. fix some diff problems
- Change dataset name for tst_inmemory4_create to tst_inmemory4
- Modify tst_inmemory.c to reorder the variables (somewhat major rewrite)
Minor Unrelated Fixes:
1. fix comment problem in nc_provenance.h
2. Fix memory leak in tst_open_mem.c
3. fix ncdump/bindata.c to properly compile if netcdf4 is disabled.
4. minor changes to ncgen.l
https://github.com/Unidata/netcdf-c/issues/1168https://github.com/Unidata/netcdf-c/issues/1163https://github.com/Unidata/netcdf-c/issues/1162
This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.
The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.
Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.
Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am