Commit Graph

61 Commits

Author SHA1 Message Date
Even Rouault
0582c2044a
Fix a stack-read-overflow in ncindexlookup()
Fixes an issue with strlen() reading outside the stack allocated buffer
by NC4_HDF5_inq_att, when reading a name whose length is NC_MAX_NAME.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39189 found
on GDAL

==1895951== Conditional jump or move depends on uninitialised value(s)
==1895951==    at 0x483EF58: strlen (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1895951==    by 0x48EF73E: ncindexlookup (ncindex.c:60)
==1895951==    by 0x48E81DF: nc4_find_grp_att (nc4internal.c:587)
==1895951==    by 0x48E5B39: nc4_get_att_ptrs (nc4attr.c:72)
==1895951==    by 0x48F98A0: NC4_HDF5_inq_att (hdf5attr.c:818)
==1895951==    by 0x48847F7: nc_inq_att (dattinq.c:91)
==1895951==    by 0x10D693: pr_att (ncdump.c:767)
==1895951==    by 0x110ADB: do_ncdump_rec (ncdump.c:1887)
==1895951==    by 0x1112F1: do_ncdump (ncdump.c:2038)
==1895951==    by 0x11248B: main (ncdump.c:2478)
==1895951==
==1895951== Use of uninitialised value of size 8
==1895951==    at 0x48A24E4: crc64_little (dcrc64.c:173)
==1895951==    by 0x48A27F4: NC_crc64 (dcrc64.c:229)
==1895951==    by 0x4892D49: NC_hashmapkey (nchashmap.c:159)
==1895951==    by 0x489314B: NC_hashmapget (nchashmap.c:263)
==1895951==    by 0x48EF75F: ncindexlookup (ncindex.c:60)
==1895951==    by 0x48E81DF: nc4_find_grp_att (nc4internal.c:587)
==1895951==    by 0x48E5B39: nc4_get_att_ptrs (nc4attr.c:72)
==1895951==    by 0x48F98A0: NC4_HDF5_inq_att (hdf5attr.c:818)
==1895951==    by 0x48847F7: nc_inq_att (dattinq.c:91)
==1895951==    by 0x10D693: pr_att (ncdump.c:767)
==1895951==    by 0x110ADB: do_ncdump_rec (ncdump.c:1887)
==1895951==    by 0x1112F1: do_ncdump (ncdump.c:2038)
==1895951==
2021-09-24 11:59:48 +02:00
Dennis Heimbigner
11fe00ea05 Add filter support to NCZarr
Filter support has three goals:

1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled

Detailed usage directions are define in docs/filters.md.

For now, the existing filter API is left in place. So filters
are defined using ''nc_def_var_filter'' using the HDF5 style
where the id and parameters are unsigned integers.

This is a big change since filters affect many parts of the code.

In the following, the terms "compressor" and "filter" and "codec" are generally
used synonomously.

### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new test to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.

### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.

### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new test to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
2021-09-02 17:04:26 -06:00
Greg Sjaardema
cbcee382b0 Remove need for HDF5-1.6 API being defined 2021-04-28 13:59:24 -06:00
Dennis Heimbigner
0b7a5382e7 Codify cross-platform file paths
The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc.  These platforms differ
significantly in the kind of file paths that they accept.  So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.

A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code. So this can be viewed as a
replacement for that code. And in path in many cases, the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgt.h>' and then replace file operation
calls with the NCxxx equivalent from ncpathmgr.h Note that
recently, the ncwinpath.h was renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.

The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslash and drive letters, as compared to *nix*.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.

The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.

As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.

In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also please notify Unidata
so we can include it as a formal part or our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.

Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
  tests to work under windows using shell scripts; the args do
  not pass as utf8 but as some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
  PR https://github.com/Unidata/netcdf-c/pull/1794,
  HDF5 changed its Windows file path handling.
2021-03-04 13:41:31 -07:00
Dennis Heimbigner
d2316f866c Additional Fixes to NCZarr
Primary Fixes:
* Add a whole variable optimization -- used in the rare case that nc_get/put_vara covers the whole of a variable and the variable has a single chunk.
* Fix chunking error when stride causes whole chunks to be skipped.
* Fix some memory leaks
* Add test cases
* Add one performance test to nczarr_test/. This uses the timer utils from unit_test: timer_utils.[ch].
* Move ncdumpchunks utility from ncdump to nczarr_test

Misc. Other Changes:
* Make check for aws libraries conditional on --enable-nczarr-s3
* Remove all but one bm tests from nczarr_test until they are working.
* Remove another dependency on HDF5 from supposedly non-HDF5 specific code; specifically hdf5_log_hdf5.
* Make the BAIL2 macro be hdf5 specific and replace elsewhere with an HDF5 independent equivalent.
* Move hdf5cache.c to libsrc4/nc4cache.c because it is used by nczarr.
* Modify unit_tests so that some of them are run even if using Windows.
* Misc. small bug fixes and refactors and memory leaks.
* Rename some conflicting tests for cmake.
* Attempted to make nc_perf work with cmake and failed.
2020-12-16 20:48:02 -07:00
Dennis Heimbigner
730aa1f6bc Improve the building of NCZARR S3 support in CMake and Autoconf
There were some irregularities in the flags for handling NCZarr S3 support.

The primary change is to regularize the flags controlling this to the following.

1. Automake: --enable-nczarr-s3 and CMake: ENABLE_NCZARR_S3
2. Automake: --enable-nczarr-s3-tests and CMake: ENABLE_NCZARR_S3_TESTS

Flag 1 indicates that NCZarr should be built with S3 support enabled.
Flag 2 indicates that the NCZarr S3 tests should be run

These two flags are separate because running the NCZarr S3 tests
requires access to protected S3 resources. Currently, running
these tests is restricted to Unidata personnel. However, users
may want to enable S3 support even if they cannot run the tests.
It is, of course, an error to specify 2 without specifying 1.

Additionally, if the AWS S3 SDK library is not found, then the NCZARR S3
support and testing must be disabled. Otherwise an error is signaled
during the build.

Some of these NCZarr and S3 changes are propagated to nc-config.

Misc. Other Changes:

1. Allow testing for CYGWIN or MSVC in shell scripts.
2. Add specific test for HDF5 library version 1.10.6.
   This is encoded as "HDF5_UTF8_PATHS" because that is the first
   version where HDF5 properly supports it under Windows. This is used
   in hdf5internal/nc4_ndf5_ansi_to_utf8.
3. Add a AM Conditional -- AX_IGNORE -- for use in testing
   when it is desirable to temporarily suppress Makefile code.
4. Add MULTIFILTER flag to CMakeLists.txt
2020-10-16 15:04:51 -06:00
Dennis Heimbigner
aeb3ac2809 Mostly revert the filter code to reduce its complexity of use.
re: https://github.com/Unidata/netcdf-c/issues/1836

Revert the internal filter code to simplify it. From the user's
point of view, the only visible changes should be:

1. The functions that convert text to filter specs have had their signature reverted and have been moved to netcdf_aux.h
2. Some filter API functions now return NC_ENOFILTER when inquiry is made about some filter.

Internally,the dispatch table has been modified to get rid of the filter_actions
entry and associated complex structures. It has been replaced with
inq_var_filter_ids and inq_var_filter_info entries and the dispatch table
version has been bumped to 3. Corresponding NOOP and NOTNC4 functions
were added to libdispatch/dnotnc4.c. Also, the filter_action entries
in dispatch tables were replaced for all dispatch code bases (HDF5, DAP2,
etc). This should only impact UDF users.

In the process, it became clear that the form of the filters
field in NC_VAR_INFO_T was format dependent, so I converted it to
be of type void* and pushed its management into the various dispatch
code bases. Specifically libhdf5 and libnczarr now manage the filters
field in their own way.

The auxilliary functions for parsing textual filter specifications
were moved to netcdf_aux.h and were renamed to the following:
* ncaux_h5filterspec_parse
* ncaux_h5filterspec_parselist
* ncaux_h5filterspec_free
* ncaux_h5filter_fix8

Misc. Other Changes:

1. Document NUG/filters.md updated to reflect the changes above.
2. All the old data types (structs and enums)
   used by filter_actions actions were deleted.
   The exception is the NC_H5_Filterspec because it is needed
   by ncaux_h5filterspec_parselist.
3. Clientside filters were removed -- another enhancement
   for which no-one ever asked.
4. The ability to remove filters was itself removed.
5. Some functionality needed by nczarr was moved from libhdf5
   to libsrc4 e.g. nc4_find_default_chunksizes
6. All the filterx code was removed
7. ncfilter.h and nc4filter.c no longer used

Misc. Unrelated Changes:

1. The nczarr_test makefile clean was leaving some directories; so
   add clean-local to take care of them.
2020-09-27 12:43:46 -06:00
bombipappoo
ecbb0f5bbf Convert filename from ANSI to UTF-8 before calling HDF5. 2020-07-14 22:44:42 +09:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Edward Hartnett
6aa6eff710 now properly setting HDF5 file cache for files created/opened sequentially on parallel IO builds 2020-05-08 11:00:56 -06:00
Edward Hartnett
e3c9e83ecf adding internal function, plus some documentation 2020-05-08 08:58:42 -06:00
Dennis Heimbigner
b0e0d81aa9 Fix reclamation of the ->format_XXX_info fields
nc4internal.c contains code to free the format_XXX_info
fields. Since these are format specific, this code
was moved to the dispatch code (libhdf5 and libhdf4
in the current case).

Additionally, there are some fields in nc4internal.h (e.g.
dimscale fields) that are specific to HDF5 and have been moved
to the corresponding HDF5 data structures and code.

Misc. other changes:
1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid
   implying it is necessarily HDF5 specific.
2. prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency.
   this also requires wrapping move_in_NCList() to keep
   hdr.id consistent.
2020-03-29 12:48:59 -06:00
Edward Hartnett
b7ac19a43f only close non-zero typeids 2020-02-09 12:03:21 -07:00
Edward Hartnett
af6b6787bf fix for memory leak due to HDF5 types 2020-02-09 11:47:13 -07:00
Greg Sjaardema
56c0d5cf8a Spelling fixes 2019-09-18 08:03:01 -06:00
Ed Hartnett
c771443e43 cleanup of whitespace in HDF5 directory 2019-02-19 05:18:25 -07:00
Ed Hartnett
66cc9c5020 fixed rename bug 2019-01-24 10:20:46 -07:00
Ed Hartnett
840d51d035 changed NC_GRP_INFO_T to use atts_read instead of atts_not_read 2019-01-22 08:11:52 -07:00
Ed Hartnett
243cef8fa5 changed var atts_not_read to atts_read 2019-01-21 08:40:04 -07:00
Ed Hartnett
15e6a782db removed unneeded variable, shortened function name 2019-01-20 09:25:04 -07:00
Ed Hartnett
1b38d9aef8 lazy read of some var metadata 2018-12-18 07:48:22 -07:00
Ed Hartnett
ef5a39e19a cleaned up hdf5internal.c 2018-12-11 06:22:48 -07:00
Ed Hartnett
b5ed407e9f better handling of normalizing names in HDF5 atts 2018-11-30 13:28:18 -07:00
Ed Hartnett
d0587c9536 more name normalization 2018-11-30 09:14:53 -07:00
Ed Hartnett
77d0922d49 starting to deal with normalized name in HDF5 attribute code 2018-11-30 08:59:58 -07:00
Ed Hartnett
aade08ee22 moving checking for lazy att reads to libhdf5 2018-11-26 07:49:58 -07:00
Ed Hartnett
ab963e3d41 removing HDF5 type info from libsrc4 2018-11-20 14:24:40 -07:00
Ed Hartnett
6829a583da changing over native_hdf_typeid 2018-11-20 10:52:43 -07:00
Ed Hartnett
d29feb53de have switched location of hdf_native_typeid 2018-11-20 10:29:02 -07:00
Ed Hartnett
9e970562cf more type work 2018-11-20 10:01:39 -07:00
Ed Hartnett
c072c89357 more type work 2018-11-20 09:57:47 -07:00
Ed Hartnett
1d5307b600 merged in ejh_next_10 2018-11-19 09:25:04 -07:00
Ed Hartnett
3ec8b34bfb removed unneeded HDF5 fields 2018-11-19 09:23:43 -07:00
Ed Hartnett
d7fe095066 moving rest of var stuff 2018-11-13 16:59:07 -07:00
Ed Hartnett
f6def4a089 moving some fill value handling to libhdf5 2018-11-13 15:37:49 -07:00
Ed Hartnett
55ab435849 more var changes 2018-11-13 11:44:27 -07:00
Ed Hartnett
6c135d0c24 more var changes 2018-11-13 11:39:45 -07:00
Ed Hartnett
25fe9e8364 more var changes 2018-11-13 11:36:25 -07:00
Ed Hartnett
825047b8f6 rest of moving HDF5 specific group info to libhdf5 2018-11-12 14:04:17 -07:00
Ed Hartnett
79c2f843d8 rest of hdf5 specific group changes 2018-11-12 13:57:04 -07:00
Ed Hartnett
856e4ead03 moved hdf_dimscaleid to hdf5-specific dim info 2018-11-08 10:55:21 -07:00
Ed Hartnett
bff06f2c4d starting to use HDF5-specific dim info 2018-11-08 10:22:37 -07:00
Ed Hartnett
dab52a97ee more looking up of HDF5 dim info 2018-11-08 10:16:55 -07:00
Ed Hartnett
92dc85b802 continuing to set hdf5-specific dim info struct 2018-11-08 10:11:46 -07:00
Ed Hartnett
008d4ee796 starting to find HDF5 specific dim info 2018-11-08 08:57:44 -07:00
Ed Hartnett
6f4b4ac80d moving attribute HDF5 stuff to libhdf5 2018-11-07 14:21:57 -07:00
Ed Hartnett
11d725facc allocating and freeing memory for hdf5-specific attribute info 2018-11-07 13:45:51 -07:00
Ed Hartnett
5f36a3b425 merged master 2018-11-02 10:00:53 -06:00
Dennis Heimbigner
245961de00 re: github issues
https://github.com/Unidata/netcdf-c/issues/1168
    https://github.com/Unidata/netcdf-c/issues/1163
    https://github.com/Unidata/netcdf-c/issues/1162

This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.

The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.

Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
   Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.

Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
   called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
2018-10-30 20:48:12 -06:00
Ed Hartnett
c2cb3379f4 doing HDF5 closing in libhdf5 2018-10-23 06:59:22 -06:00