Commit Graph

520 Commits

Author SHA1 Message Date
Ward Fisher
e4138efa9d
Merge pull request #1851 from brtnfld/master
Replaced deprecated (in 1.8.0) H5Aopen_name with H5Aopen_by_name
2020-10-01 14:25:06 -07:00
Dennis Heimbigner
aeb3ac2809 Mostly revert the filter code to reduce its complexity of use.
re: https://github.com/Unidata/netcdf-c/issues/1836

Revert the internal filter code to simplify it. From the user's
point of view, the only visible changes should be:

1. The functions that convert text to filter specs have had their signature reverted and have been moved to netcdf_aux.h
2. Some filter API functions now return NC_ENOFILTER when inquiry is made about some filter.

Internally,the dispatch table has been modified to get rid of the filter_actions
entry and associated complex structures. It has been replaced with
inq_var_filter_ids and inq_var_filter_info entries and the dispatch table
version has been bumped to 3. Corresponding NOOP and NOTNC4 functions
were added to libdispatch/dnotnc4.c. Also, the filter_action entries
in dispatch tables were replaced for all dispatch code bases (HDF5, DAP2,
etc). This should only impact UDF users.

In the process, it became clear that the form of the filters
field in NC_VAR_INFO_T was format dependent, so I converted it to
be of type void* and pushed its management into the various dispatch
code bases. Specifically libhdf5 and libnczarr now manage the filters
field in their own way.

The auxilliary functions for parsing textual filter specifications
were moved to netcdf_aux.h and were renamed to the following:
* ncaux_h5filterspec_parse
* ncaux_h5filterspec_parselist
* ncaux_h5filterspec_free
* ncaux_h5filter_fix8

Misc. Other Changes:

1. Document NUG/filters.md updated to reflect the changes above.
2. All the old data types (structs and enums)
   used by filter_actions actions were deleted.
   The exception is the NC_H5_Filterspec because it is needed
   by ncaux_h5filterspec_parselist.
3. Clientside filters were removed -- another enhancement
   for which no-one ever asked.
4. The ability to remove filters was itself removed.
5. Some functionality needed by nczarr was moved from libhdf5
   to libsrc4 e.g. nc4_find_default_chunksizes
6. All the filterx code was removed
7. ncfilter.h and nc4filter.c no longer used

Misc. Unrelated Changes:

1. The nczarr_test makefile clean was leaving some directories; so
   add clean-local to take care of them.
2020-09-27 12:43:46 -06:00
Scot Breitenfeld
2620c01067 Replaced deprecated (in 1.8.0) H5Aopen_name with H5Aopen_by_name 2020-09-25 12:17:20 -05:00
Dennis Heimbigner
f3218a2e2c Use the built-in HDF5 byte-range reader, if available.
re: Issue https://github.com/Unidata/netcdf-c/issues/1848

The existing Virtual File Driver built to support byte-range
read-only file access is quite old. It turns out to be extremely
slow (reason unknown at the moment).

Starting with HDF5 1.10.6, the HDF5 library has its own version
of such a file driver. The HDF5 developers have better knowledge
about building such a driver and what incantations are needed to
get good performance.

This PR modifies the byte-range code in hdf5open.c so
that if the HDF5 file driver is available, then it is used
in preference to the one written by the Netcdf group.

Misc. Other Changes:

1. Moved all of nc4print code to ncdump to keep appveyor quiet.
2020-09-24 14:33:58 -06:00
Dennis Heimbigner
2f0a6d22e9 Fix error where not converting fill data
re: Github Issue https://github.com/Unidata/netcdf-c/issues/1826

It turns out that the common get code (NC4_get_vars) in libhdf5
(and libnczarr) has an optimization where it does not attempt to
read from the file if the file is all fill values. Rather it
just fills the output buffer with the fill value.  The problem
is that -- in that case -- it forgets that conversion might still be
needed.  So the conversion never occurs and the raw bits of
the fill data are stored directly into the memory space.

Solution: move some code around to properly do the
conversion no matter how the data was obtained.

Added a test cases nc_test4/test_fillonly.sh and
nczarr_test/test_fillonlyz.sh
2020-09-12 14:49:59 -06:00
Ward Fisher
31dee0c4da
Revert "Revert "Fix nczarr-experimental: improve build support, disengage hdf5 vs netcdf4 flags, and find AWS libraries"" 2020-08-17 19:15:47 -06:00
Ward Fisher
16c27ca13f
Revert "Fix nczarr-experimental: improve build support, disengage hdf5 vs netcdf4 flags, and find AWS libraries" 2020-08-17 15:51:01 -06:00
Dennis Heimbigner
d85bb6fe20 The big change for this commit is complete the
disengagement of enable-netcdf4 from enable-hdf5.
That is, with the advent of nczarr, it is possible
to turn off hdf5 but still need netcdf-4 enabled
because nczarr uses libsrc4, but not libhdf5.
This change involves a bunch of things:
1. Modify configure.ac and CMakelist to make enable_hdf5
   control if hdf5 support is provided. For back compatibility,
   disable-netcdf4 is treated as disable-hdf5. But internally,
   netcdf4 support is controlled only by the enabling of formats
   that require it.
2. In support of #1, modify .travis.yml to use enable/disable-hdf5
   instead of enable/disable-netcdf4.
3. test_common.in is modified to track selected features,
   including enable-hdf5 and enable-s3-tests. This is used in
   selected tests that mix netcdf-3 and netcdf4 tests.
4. The conflation of USE_HDF5 and USE_NETCDF4 is common in
   code, tests, and build files, so all of those had to be weeded out.
5. It turns out that some of the NC4_dim functions really are HDF5 specific,
   but are not treated as such. So they are moved from nc4dim.c to
   hdf5dim.c or hdf5dispatch.c
6. Some generic functions in libhdf5 can be (and were) moved to libsrc4.
2020-08-12 15:42:50 -06:00
bombipappoo
ecbb0f5bbf Convert filename from ANSI to UTF-8 before calling HDF5. 2020-07-14 22:44:42 +09:00
Ward Fisher
0825c9767f Merge branch 'ejh_par_test' of https://github.com/NOAA-GSD/netcdf-c into NOAA-GSD-ejh_par_test 2020-07-09 17:29:45 -06:00
Ward Fisher
7d2a646f25 Merge branch 'ejh_fix_redef' of https://github.com/NOAA-GSD/netcdf-c into NOAA-GSD-ejh_fix_redef 2020-07-09 13:55:37 -06:00
Edward Hartnett
3e60a863de fixed warning in hdf5filter.c 2020-07-08 11:24:54 -06:00
Edward Hartnett
832fbf19c8 now dont return error on second redef call for netcdf/HDF5 files 2020-07-08 11:10:15 -06:00
Edward Hartnett
ac3b77d418 merged in changes from ejh_test_szip_unlim 2020-07-04 07:43:50 -06:00
Edward Hartnett
4b78c0c4a3 merged master 2020-07-03 13:57:47 -06:00
Edward Hartnett
6c112efb8e fixed problem setting szip on var with unlimited dim and added test 2020-07-02 10:55:34 -06:00
Edward Hartnett
dc37446a5f more test development 2020-06-29 09:01:24 -06:00
Edward Hartnett
467f342ae9 further test development 2020-06-29 08:35:11 -06:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Greg Sjaardema
edf0ca6c98
Avoid potential integer overrun
It is possible for the values stored to `file_value_size` to overrun the storage capacity of a 32-bit integer.  The value does need to store negative values potentially, so can be `size_t` or `hsize_t`, so use the `hssize_t` which is a signed 64-bit value.  Could also use `ssize_t`, but that is not used in this routine...
2020-06-10 15:42:22 -06:00
Dennis Heimbigner
84c69afca7 Allow redefinition of variable filters
re: Github issue https://github.com/Unidata/netcdf-c/issues/1713

If nc_def_var_filter or nc_def_var_deflate or nc_def_var_szip is
called multiple times with the same filter id, but possibly with
different sets of parameters, then the first invocation is
sticky and later invocations are ignored. The desired behavior
is to have the last invocation be used.

This PR implements that desired behavior, with some special
cases.  If you call nc_def_var_deflate multiple times, then the
last invocation rule applies with respect to deflate. However,
the shuffle filter, if enabled, is always applied just before
applying deflate.

Misc unrelated changes:
1. Make client-side filters be disabled by default
2. Fix the definition of uintptr_t and use in oc2 and libdap4
3. Add some test cases
4. modify filter order tests to use plugin filters rather
   than client-side filters
2020-05-11 09:42:31 -06:00
Edward Hartnett
6aa6eff710 now properly setting HDF5 file cache for files created/opened sequentially on parallel IO builds 2020-05-08 11:00:56 -06:00
Edward Hartnett
e3c9e83ecf adding internal function, plus some documentation 2020-05-08 08:58:42 -06:00
Greg Sjaardema
3e919a568f
Remove line that was missed in original patch 2020-04-30 14:00:18 -06:00
Greg Sjaardema
1db3d07beb
Proof-of-Concept: Avoid N^2 behavior in NC4_inq_dim
The current library seems to have some behavior which is N^2 in the number of vars in a file.

The `NC4_inq_dim` routine calls down to `nc4_find_dim_len` which iterates through each `var` in the file/group and calls `find_var_dim_max_length` on each var and finds the largest length of the dim on each of those vars. This is done only for unlimited vars.

I have a file with 129 dim and 1630 vars.  The unlimited dimension is of length 41.  In my test program, I am reading data from 4 files which have the same dim and var count and reading every 4th time step (unlimited dimension).  If I run a profile, I see that 98.2% of the program time is in the `nc_get_vara_float` call tree and most of that is in `find_var_dim_max_length` (94.8%).

There are 66,142 calls to `nc_get_vara_float` resulting in 107,307,290 calls to `find_var_dim_max_length` with twice that number of calls to `malloc/free` and calls to 5 HDF5 routines.  All of this, at least in my case, to return the same `41` each time.

The proof of concept patch here will check whether the file is read-only (or no_write) and if so, it will cache the value of the dim length the first time it is calculated.   With this change, my example run is sped up by a factor of 60.  The time for `NC4_inq_dim` and below drops from 97.2% down to 2.7%.

I'm not sure whether this is the correct fix, or if there is some behavior that I am overlooking, but my users would definitely like a 10 second run compared to a 10 minute run... 

This is on current Netcdf master branch.

I will try to attach some valgrind/callgrind profiles.
2020-04-30 11:01:10 -06:00
Scot Breitenfeld
7b1b06b5ca Merge remote-tracking branch 'upstream/master' 2020-04-23 15:36:14 -05:00
Dennis Heimbigner
b0e0d81aa9 Fix reclamation of the ->format_XXX_info fields
nc4internal.c contains code to free the format_XXX_info
fields. Since these are format specific, this code
was moved to the dispatch code (libhdf5 and libhdf4
in the current case).

Additionally, there are some fields in nc4internal.h (e.g.
dimscale fields) that are specific to HDF5 and have been moved
to the corresponding HDF5 data structures and code.

Misc. other changes:
1. NC_VAR_INFO_T->hdf5_name renamed to alt_name to avoid
   implying it is necessarily HDF5 specific.
2. prefix NC_FILE_INFO_T with an instance of NC_OBJ for consistency.
   this also requires wrapping move_in_NCList() to keep
   hdr.id consistent.
2020-03-29 12:48:59 -06:00
Edward Hartnett
e7b9b1b587 fixed documentation of cache int functions 2020-03-24 15:02:42 -06:00
Scot Breitenfeld
c5d2e99417 Updated to use H5O_info2_t for HDF5 1.12 and the use of H5Oget_info3 instead of H5Gget_objinfo 2020-03-12 15:50:24 +00:00
Edward Hartnett
b29f9f34a0 whitespace cleanup 2020-03-08 09:10:07 -06:00
Edward Hartnett
4c7e162f34 less use of contiguous/compact field 2020-03-08 07:31:21 -06:00
Edward Hartnett
053752440b stop setting contiguous field in nc4hdf5.c 2020-03-08 07:18:52 -06:00
Edward Hartnett
04eafff166 stop setting contiguous field in hdf5filter.c 2020-03-08 07:18:11 -06:00
Edward Hartnett
5574317db7 stop setting contiguous/compact fields at file open 2020-03-08 07:17:01 -06:00
Edward Hartnett
61357cfd4d more use of storage field 2020-03-08 07:09:15 -06:00
Edward Hartnett
1761850795 continuing to switch to storage field 2020-03-08 07:05:51 -06:00
Edward Hartnett
b98a37e0b3 using storage field in nc4var.c 2020-03-08 06:38:44 -06:00
Edward Hartnett
119e8e9465 using storage in hdf5filter.c 2020-03-08 06:31:34 -06:00
Edward Hartnett
8dec9f6c99 now setting storage field when setting var storage 2020-03-08 06:29:49 -06:00
Edward Hartnett
d87a073a34 starting to use storage field when opening file 2020-03-08 06:21:08 -06:00
Edward Hartnett
0c419ec582 removed commented-out code 2020-03-06 09:57:33 -07:00
Edward Hartnett
502336c2c7 now return NC_EINVAL on attempt to set chunking on scalar var 2020-03-03 11:57:16 -07:00
Dennis Heimbigner
73537603e2 Make scalar X filter return an error instead of ignoring it 2020-03-02 15:10:54 -07:00
Dennis Heimbigner
420fdf4625 fix memory allocation failure in hdf5var.c 2020-03-02 11:45:41 -07:00
Dennis Heimbigner
7d1ca9ac85 fix references to var->deflate' 2020-03-02 11:12:30 -07:00
Dennis Heimbigner
e66c727c28 Fix Filters x compact 2020-02-29 15:33:27 -07:00
Dennis Heimbigner
f376c23329 Make utilities support NC_COMPACT
re: https://github.com/Unidata/netcdf-c/issues/1642

Modify ncdump, nccopy, and ncgen to support the NC_COMPACT storage option.
Added test cases and added description to the man pages for the utilities.

1. ncdump: For compact storage variable, print special attribute __Storage_ as
````
    <var>: _Storage = "compact";
````

2. ncgen: parse and implement
````
    <var>: _Storage = "compact";
````
in a .cdl file

3. nccopy: Extend the chunk specification (-c flag) to support
   compact using the forms
````
nccopy ... -c <var>:compact
and
nccopy ... -c <var>:contiguous
````

Misc. other changes
1. cleanup the copy_chunking function in ncdump/nccopy.c
2020-02-29 12:06:21 -07:00
Dennis Heimbigner
10d227fc1b fix parallel filter error discovered by Hartnett 2020-02-28 11:36:58 -07:00
Dennis Heimbigner
a3a3e15cb1 fix bad edit 2020-02-27 15:33:39 -07:00
Dennis Heimbigner
afe5a2998c
Merge branch 'master' into multifilter.dmh 2020-02-27 15:02:27 -07:00