Commit Graph

36 Commits

Author SHA1 Message Date
Dennis Heimbigner
eb3d9eb0c9 Provide a Number of fixes/improvements to NCZarr
Primary changes:
* Add an improved cache system to speed up performance.
* Fix NCZarr to properly handle scalar variables.

Misc. Related Changes:
* Added unit tests for extendible hash and for the generic cache.
* Add config parameter to set size of the NCZarr cache.
* Add initial performance tests but leave them unused.
* Add CRC64 support.
* Move location of ncdumpchunks utility from /ncgen to /ncdump.
* Refactor auth support.

Misc. Unrelated Changes:
* More cleanup of the S3 support
* Add support for S3 authentication in .rc files: HTTP.S3.ACCESSID and HTTP.S3.SECRETKEY.
* Remove the hashkey from the struct OBJHDR since it is never used.
2020-11-19 17:01:04 -07:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Dennis Heimbigner
b0e0d81aa9 Fix reclamation of the ->format_XXX_info fields
nc4internal.c contains code to free the format_XXX_info
fields. Since these are format specific, this code
was moved to the dispatch code (libhdf5 and libhdf4
in the current case).

Additionally, there are some fields in nc4internal.h (e.g.
dimscale fields) that are specific to HDF5 and have been moved
to the corresponding HDF5 data structures and code.

Misc. other changes:
1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid
   implying it is necessarily HDF5 specific.
2. prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency.
   this also requires wrapping move_in_NCList() to keep
   hdr.id consistent.
2020-03-29 12:48:59 -06:00
Edward Hartnett
21d2458afe fixed HDF4 use of contiguous 2020-03-08 07:28:30 -06:00
edwardhartnett
80542c0cdd fixed comment to trigger CI 2019-08-04 08:03:53 -06:00
edwardhartnett
7ce322a6f1 now have libhdf5 use nc4_file_list_add() 2019-08-02 09:29:18 -06:00
edwardhartnett
181f260a20 adding, and starting to use nc4_file_list_add() 2019-08-02 09:14:35 -06:00
edwardhartnett
49dcb48c25 clean up 2019-08-01 14:39:45 -06:00
edwardhartnett
170c5b0901 removed NC from open in dispatch table 2019-08-01 14:30:20 -06:00
edwardhartnett
ef9b459977 now libhdf4 layer also looks up the NC on open 2019-08-01 11:40:36 -06:00
Ed Hartnett
76d6b55eff moved call to nc4_rec_grp_del() to inside nc4_nc4f_list_del() 2019-07-16 16:29:06 -06:00
Ed Hartnett
b8e50c9254 moved freeing of allvars, alldims, alltypes lists to nc4_nc4f_list_del 2019-07-16 16:16:11 -06:00
Ed Hartnett
e9666f7333 moved free(h5) intonc4_nc4f_list_del 2019-07-16 16:07:21 -06:00
Ed Hartnett
d840c1864c removed unused prototype 2019-07-16 16:02:08 -06:00
Ed Hartnett
3310e67567 hdf4file.c cleanup 2019-07-16 14:54:42 -06:00
Ed Hartnett
bdf2711333 cleaned up handling of nc_file->dispatchdata in hdf4 file code 2019-07-16 14:48:26 -06:00
Ed Hartnett
d7e46e8c5f documentation update 2019-07-04 10:16:21 -06:00
Ed Hartnett
9a7a7e28d1 documentation update 2019-07-04 10:15:32 -06:00
Ed Hartnett
b965daf649 documentation update 2019-07-04 10:14:28 -06:00
Ed Hartnett
0ccb3d1c5e documentation update 2019-07-04 10:13:54 -06:00
Ed Hartnett
5a4948faec documentation update 2019-07-04 10:13:15 -06:00
Ed Hartnett
0cfe7ef8b2 removed unneeded reference to int_ncid in the HDF4 code 2019-07-04 10:11:42 -06:00
Dennis Heimbigner
6934aa2e8b Thread safety: step 1: cleanup
re: https://github.com/Unidata/netcdf-c/issues/1373 (partial)

* Mark some global constants be const to indicate to make them easier to track.
* Hide direct access to the ncrc_globalstate behind a function call.
* Convert dispatch tables to constants (except the user defined ones)
  This has some consequences in terms of function arguments needing to be marked
  as const also.
* Remove some no longer needed global fields
* Aggregate all the globals in nclog.c
* Uniformly replace nc_sizevector{0,1} with NC_coord_{zero,one}
* Uniformly replace nc_ptrdffvector1 with NC_stride_one
* Remove some obsolete code
2019-03-30 14:06:20 -06:00
Ed Hartnett
f9a9100d9a cleanup of whitespace in HDF4 directory 2019-02-19 05:17:00 -07:00
Ed Hartnett
6f1d0b8191 fixed HDF4 2019-01-22 10:04:50 -07:00
Ed Hartnett
6416abe887 fixed HDF4 abort warning 2019-01-02 05:58:54 -07:00
Ed Hartnett
f7e6746c7c fixed hDF4 memory freeing of format specific var info 2018-11-13 07:47:24 -07:00
Wei-keng Liao
a7c08c3a49 add a missed update to NC_HDF4_open 2018-09-23 01:36:50 -05:00
Wei-keng Liao
0ed70756cc Ignore flags NC_MPIIO and NC_MPIPOSIX. 2018-09-22 20:22:34 -05:00
Ed Hartnett
8885c75ade removing unneeded lookups 2018-08-22 06:08:19 -06:00
Ed Hartnett
697f033823 renamed NC_HDF5_FILE_INFO to NC_FILE_INFO 2018-06-22 07:08:09 -06:00
Ed Hartnett
6b90169278 switching to att_not_read 2018-06-19 05:05:44 -06:00
Ed Hartnett
dad70cf880 more lazy atts 2018-06-19 04:54:03 -06:00
Ed Hartnett
96154d9303 added merged HDF4 changes 2018-04-04 14:11:44 -06:00
Dennis Heimbigner
25f062528b This completes (for now) the refactoring of libsrc4.
The file docs/indexing.dox tries to provide design
information for the refactoring.

The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.

WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
   changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
   are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
   but as noted elsewhere, this may not be significant because
   the meta-data read performance apparently is being dominated
   by the hdf5 library because we do bulk meta-data reading rather
   than lazy reading.
2018-03-16 11:46:18 -06:00
Ed Hartnett
2358d4a910 moved HDF4 to its own dispatch layer 2018-02-08 06:20:58 -07:00