Commit Graph

142 Commits

Author SHA1 Message Date
Edward Hartnett
b2c0bb9810 more quantize testing 2021-08-25 01:54:25 -06:00
Edward Hartnett
9a18689ffa getting ready for next try at quantization code 2021-08-24 00:45:38 -06:00
Ward Fisher
9f798e2ed6 Merge branch 'virtual_datasets' of https://github.com/d70-t/netcdf-c into gh1983.wif 2021-07-19 09:44:35 -07:00
Dennis Heimbigner
d953899559 Move to Version 2 NCZarr Extended Meta-Data
re: https://github.com/zarr-developers/zarr-specs/issues/41

After discussions with the Zarr community, it was decided to
convert to a new representation of the NCZarr meta-data extensions: version 2.
These extensions store information necessary to mapping the Zarr data model
to the netcdf-4 data model.

The basic change is to remove the NCZarr specific objects: .nczarr, .nczgroup, .nczarray, and .nczattr.
The contents of these objects is moved into the corresponding existing Zarr objects as special keys. The mapping is as follows:

* ''.nczarr'' => ''/.zgroup/_NCZARR_SUPERBLOCK_''
* ''.nczgroup => ''.zgroup/_NCZARR_GROUP_''
* ''.nczarray => ''.zarray/_NCZARR_ARRAY_''
* ''.nczattr => ''.zattr/_NCZARR_ATTR_''

Backward compatibility is maintained by looking for the object ''/.nczarr''
and if found, then assuming that the dataset is in the older version 1 format.
This compatibility only supports reading of such version 1 datasets.

Documentation and test cases are also added.

Misc. Other Changes:
1. The json parsing code was added to the general library instead of nczarr only (ncjson.c, ncjson.h).
2. Improved support for different platform paths by allowing conversion
   to a single common path representation.
3. Add some new error codes.
4. Modify nccopy usage to mention the new chunking specification.
2021-07-17 16:55:30 -06:00
Dennis Heimbigner
ec5b3f9a4f Regularize the scoping of dimensions
This is a follow-on to pull request
````https://github.com/Unidata/netcdf-c/pull/1959````,
which fixed up type scoping.

The primary changes are to _nc\_inq\_dimid()_ and to ncdump.

The _nc\_inq\_dimid()_ function is supposed to allow the name to be
and FQN, but this apparently never got implemented. So if was modified
to support FQNs.

The ncdump program is supposed to output fully qualified dimension names
in its generated CDL file under certain conditions.

Suppose ncdump has a netcdf-4 file F with variable V, and V's parent group
is G. For each dimension id D referenced by V, ncdump needs to determine
whether to print its name as a simple name or as a fully qualified name (FQN).

The algorithm is as follows:

1. Search up the tree of ancestor groups.
2. If one of those ancestor groups contains the dimid, then call it dimgrp.
3. If one of those ancestor groups contains a dim with the same name as the dimid, but with a different dimid, then record that as duplicate=true.
4. If dimgrp is defined and duplicate == false, then we do not need an fqn.
5. If dimgrp is defined and duplicate == true, then we do need an fqn to avoid incorrectly using the duplicate.
6. If dimgrp is undefined, then do a preorder breadth-first search of all the groups looking for the dimid.
7. If found, then use the fqn of the first found such dimension location.
8. If not found, then fail.

Test case ncdump/test_scope.sh was modified to test the proper
operation of ncdump and _nc\_inq\_dimid()_.

Misc. Other Changes:
* Fix nc_inq_ncid (NC4_inq_ncid actually) to return root group id if the name argument is NULL.
* Modify _ncdump/printfqn_ to print out a dimid FQN; this supports verification that the resulting .nc files were properly created.
2021-05-31 15:51:12 -06:00
Dennis Heimbigner
e7d5f24078 Add zip file support
The primary change is to support the use of a zip file as a
storage format. Simultaneously the .nz4 support is made obsolete

Use of zip requires the libzip support library, so a number of
changes to the build files (Makefile.am, CMakeLists.txt) are
necessary to locate and incorporate libzip.  The nczarr_tests
tests are also changed to add zip testing.

Other changes:
* Make sure distcheck leaves no files around.
* Add some functions to netcdf_aux to export some functions of libnetcdf.
* Add a new error NC_EFOUND as the complement of NC_EEMPTY.
* Add tracing support to nclog and use it in libnczarr.
* Modify the zmap interface to support the writeonce semantics of zip.
* Create a new s3util.c to support a variety of S3 auxilliary functions.
* EXTERNL'ize a number of functions so they can be used in s3util.
* Add support for the S3 ListObjects CommonPrefixes mechanism
  to improve search.
* Add experimental support for running nczarr X s3 tests against
  the actual Amazon S3 cloud.
2021-01-28 20:11:01 -07:00
Tobias Kölling
69fb44ec4b added NC_VIRTUAL storage layout 2020-09-03 17:51:46 +02:00
Tobias Kölling
4c27730ae3 hdf5: added unknown storage specification
In case HDF5 adds more storage specifications, netcdf4 should be able to
cope with them by default. Further specializations could be added
nonetheless.
2020-09-02 16:13:23 +02:00
Dennis Heimbigner
59e04ae071 This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".

The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.

More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).

WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:

Platform | Build System | S3 support
------------------------------------
Linux+gcc      | Automake     | yes
Linux+gcc      | CMake        | yes
Visual Studio  | CMake        | no

Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future.  Note that it is possible
(probable?) that there will be some accidental reversions if the
changes here did not correctly mimic the existing filter testing.

In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*.  The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
   and the version bumped.
4. An overly complex set of structs was created to support funnelling
   all of the filterx operations thru a single dispatch
   "filter_actions" entry.
5. Move common code to from libhdf5 to libsrc4 so that it is accessible
   to nczarr.

Changes directly related to Zarr:
1. Modified CMakeList.txt and configure.ac to support both C and C++
   -- this is in support of S3 support via the awd-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
   support zarr and to regularize the structure of the fragments
   section of a URL.

Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
   e.g. NC_CREAT, NC_INDEF, etc.
3. cleanup include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
   * Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
   and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.

Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-28 18:02:47 -06:00
Dennis Heimbigner
44d0dcaad2 Add support for multiple filters per variable.
re: https://github.com/Unidata/netcdf-c/issues/1584

Support has been added for multiple filters per variable.  This
affects a number of components in netcdf. The new APIs are
documented in NUG/filters.md.

The primary changes are:
* A set of new functions are provided (see __include/netcdf_filter.h__).
    - Obtain a list of the filters associated with a variable
    - Obtain the parameters for a specific filter.
* The existing __nc_inq_var_filter__ function now returns info
  about the first defined filter.
* The utilities (ncgen, ncdump, and nccopy) now support
  an extended format for specifying a sequence of filters.
  The general form is __<filter>|<filter>..._.
* The ncdump **_Filter** attribute now dumps a list of all the
  filters associated with a variable using the above new format.
* Filter specifications can now use a filter name instead of number
  for filters known to the netcdf library, which in turn is taken
  from the HDF5 filter registration page.
* New errors are defined: NC_EFILTER and NC_ENOFILTER. The latter
  is returned if an attempt is made to access an unknown filter.
* Internally, the dispatch table has been extended to add a function
  to handle all of the filter functions.
* New, filter-related, tests were added to nc_test4.
* A new plugin was added to the plugins directory to help with testing.

Notes:
1. The shuffle and fletcher32 filters are not part of the multifilter system.

Misc. changes:
1. A debug module was added to libhdf5 to help catch error locations.
2020-02-16 12:59:33 -07:00
Edward Hartnett
164de982bd merged changes from master branch 2020-02-11 04:05:35 -07:00
Edward Hartnett
da904f6438 raised NC_MAX_HDF4_NAME length to NC_MAX_NAME 2020-02-08 09:21:01 -07:00
Edward Hartnett
96182a9236 trying to fix windows build 2020-02-07 11:57:56 -07:00
Edward Hartnett
82df2876b6 starting to support compact storage 2019-12-04 07:53:37 -07:00
Greg Sjaardema
56c0d5cf8a Spelling fixes 2019-09-18 08:03:01 -06:00
edwardhartnett
fb371d3c3e added 2019 to netcdf.h copyright notice 2019-08-16 04:38:26 -06:00
edwardhartnett
60f436e7ee starting to remove obsolete _CRAYMPP macros 2019-08-14 06:13:45 -06:00
Ward Fisher
e2b31ffae4
Merge branch 'master' into byterange.dmh 2019-03-19 12:05:44 -06:00
Wei-keng Liao
8ebe73059c revise comments about the deprecated flags in netcdf.h to avoid confusion 2019-02-26 11:16:54 -06:00
Dennis Heimbigner
bf2746b8ea Provide byte-range reading of remote datasets
re: issue https://github.com/Unidata/netcdf-c/issues/1251

Assume that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.

This PR allows the netcdf-c to read that dataset's
contents as a netcdf file using HTTP byte ranges
if the remote server supports byte-range access.

Originally, this PR was set up to access Amazon S3 objects,
but it can also access other remote datasets such as those
provided by a Thredds server via the HTTPServer access protocol.
It may also work for other kinds of servers.

Note that this is not intended as a true production
capability because, as is known, this kind of access to
can be quite slow. In addition, the byte-range IO drivers
do not currently do any sort of optimization or caching.

An additional goal here is to gain some experience with
the Amazon S3 REST protocol.

This architecture and its use documented in
the file docs/byterange.dox.

There are currently two test cases:

1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle
   for a remote netcdf-3 file and a remote netcdf-4 file.
2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote
   datasets.

This PR also incorporates significantly changed model inference code
(see the superceded PR https://github.com/Unidata/netcdf-c/pull/1259).

1. It centralizes the code that infers the dispatcher.
2. It adds support for byte-range URLs

Other changes:

1. NC_HDF5_finalize was not being properly called by nc_finalize().
2. Fix minor bug in ncgen3.l
3. fix memory leak in nc4info.c
4. add code to walk the .daprc triples and to replace protocol=
   fragment tag with a more general mode= tag.

Final Note:
Th inference code is still way too complicated. We need to move
to the validfile() model used by netcdf Java, where each
dispatcher is asked if it can process the file. This decentralizes
the inference code. This will be done after all the major new
dispatchers (PIO, Zarr, etc) have been implemented.
2019-01-01 18:27:36 -07:00
Ward Fisher
5be0126920 More standardizing of the copyright stanza. 2018-12-06 14:13:56 -07:00
Ward Fisher
3c59fb860d Updating files to refer to the top-level COPYRIGHT file. 2018-12-04 15:52:43 -07:00
Dennis Heimbigner
4636584d5b Revert/Improve nc_create + NC_DISKLESS behavior
re: https://github.com/Unidata/netcdf-c/issues/1154

Inadvertently, the behavior of NC_DISKLESS with nc_create() was
changed in release 4.6.1. Previously, the NC_WRITE flag needed
to be explicitly used with NC_DISKLESS in order to cause the
created file to be persisted to disk.

Additional analyis indicated that the current NC_DISKLESS
implementation was seriously flawed.

This PR attempts to clean up and regularize the situation with
respect to NC_DISKLESS control. One important aspect of diskless
operation is that there are two different notions of write.

1. The file is read-write vs read-only when using the netcdf API.
2. The file is persisted or not to disk at nc_close().

Previously, these two were conflated. The rules now are
as follows.

1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API
2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close.
3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2.

The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag.
NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag.

It is also now the case that for netcdf-4, NC_DISKLESS is independent
of NC_INMEMORY and in fact it is an error to specify both flags
simultaneously.

Finally, the MMAP code was fixed to use NC_PERSIST as well.
Also marked MMAP as deprecated.

Also added a test case to test various combinations of NC_DISKLESS,
NC_PERSIST, and NC_WRITE.

This PR affects a number of files and especially test cases
that used NC_DISKLESS.

Misc. Unrelated fixes
1. fixed some warnings in ncdump/dumplib.c
2018-10-10 13:32:17 -06:00
Dennis Heimbigner
a5a34f6aba
Merge branch 'master' into nc_mpiio_nc_mpiposix 2018-10-06 13:33:55 -06:00
Dennis Heimbigner
8072d1f6bb Modify DAP2 and DAP4 to optionally allow Fillvalue/Variable mismatch
re: issue https://github.com/Unidata/netcdf-c/issues/1151

Modify DAP2 and DAP4 code to handle case when _FillValue type is not
same as the parent variable type.

Specifically:
1. Define a parameter [fillmismatch] to allow this mismatch;
   default is to disallow.
2. If allowed, forcibly change the type of the _FillValue to match
   the parent variable.
3. If allowed Convert the values to match new type
4. Generate a log message
5. if not allowed, then fail

Implementing this required some changes to ncdap_test/dapcvt.c
Also added test cases.

Minor Unrelated Changes:
1. There were a number of warnings about e.g.
   assigning a const char* to a char*. Fix these
2. In nccopy.1, replace .NP with .IP "n"
   (re PR https://github.com/Unidata/netcdf-c/pull/1144)
3. fix minor error in ncdump/ocprint
2018-10-01 15:51:43 -06:00
Wei-keng Liao
0ed70756cc Ignore flags NC_MPIIO and NC_MPIPOSIX. 2018-09-22 20:22:34 -05:00
Wei-keng Liao
48da78e133 Use PnetCDF instead of parallel-netcdf to avoid confusion with
parallel netcdf4. Also, update PnetCDF web page.
2018-09-17 17:18:48 -05:00
Ward Fisher
fbe0a18b1c
Merge branch 'master' into ejh_loop_cleanup_2 2018-09-05 11:22:55 -06:00
Ward Fisher
3c3119bed2 Merge remote-tracking branch 'origin/dapcurlopt.dmh' into combined-pr.wif 2018-09-04 12:50:55 -06:00
Ed Hartnett
27281dfe4b
Merge branch 'master' into ejh_loop_cleanup_2 2018-08-28 17:32:40 -06:00
Dennis Heimbigner
79e38de840 Add the ability to set some additional curlopt values
Add the ability to set some additional curlopt values via .daprc (aka .dodsrc).
This effects both DAP2 and DAP4 protocols.

Related issues:
[1] re: esupport: KOZ-821332
[2] re: github issue https://github.com/Unidata/netcdf4-python/issues/836
[3] re: github issue https://github.com/Unidata/netcdf-c/issues/1074

1. CURLOPT_BUFFERSIZE: Relevant to [1]. Allow user to set the read/write
buffersizes used by curl.
This is done by adding the following to .daprc (aka .dodsrc):
	HTTP.READ.BUFFERSIZE=n
where n is the buffersize in bytes. There is a built-in (to curl)
limit of 512k for this value.

2. CURLOPT_TCP_KEEPALIVE (and CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL):
Relevant (maybe) to [2] and [3]. Allow the user to turn on KEEPALIVE
This is done by adding the following to .daprc (aka .dodsrc):
	HTTP.KEEPALIVE=on|n/m
If the value is "on", then simply enable default KEEPALIVE. If the value
is n/m, then enable KEEPALIVE and set KEEPIDLE to n and KEEPINTVL to m.
2018-08-26 17:04:46 -06:00
Ed Hartnett
5b2f6ecebc changed literal in netcdf.h 2018-08-18 05:22:07 -06:00
Ward Fisher
59e98ef087
Merge branch 'master' into default_format 2018-08-03 10:59:41 -06:00
Wei-keng Liao
025c7d1bb2 introduce error code NC_EPNETCDF for errors at PnetCDF level 2018-07-29 15:33:08 -05:00
Wei-keng Liao
0ee68a3263 This commit fixes the logical problem of using the default file formats.
The fix includes the following changes.
1. Checking and using the default file format at file create time is now
   done only when the create mode (argument cmode) does not include any
   format related flags, i.e. NC_64BIT_OFFSET, NC_64BIT_DATA,
   NC_CLASSIC_MODEL, and NC_NETCDF4.
2. Adjustment of cmode based on the default format is now done in
   NC_create() only. The idea is to adjust cmode before entering the
   dispatcher's file create subroutine.
3. Any adjustment of cmode is removed from all I/O dispatchers, i.e.
   NC4_create(), NC3_create(), and NCP_create().
4. Checking for illegal cmode has been done in check_create_mode() called
   in NC_create(). This commit removes the redundant checking from
   NCP_create().
5. Remove PnetCDF tests in nc_test/tst_names.c, so it can focus on testing
   all classic formats and netCDF4 formats.

Two new test programs are added. They can be used to test netCDF with and
without this commit.
1. nc_test/tst_default_format.c
2. nc_test/tst_default_format_pnetcdf.c (use when PnetCDF is enabled).
2018-07-28 11:18:28 -05:00
Dennis Heimbigner
3c23254e34
Fix esupport # HYV-329576
nc_finalize() -> nc_finalize(void)
2018-07-12 12:14:14 -06:00
Ward Fisher
82c9788bb8 Correcting #1057, also adding fortran tests to travis-ci for the time being. 2018-07-11 13:30:53 -06:00
Ed Hartnett
e28405744c merged master 2018-06-25 17:08:22 -06:00
Ed Hartnett
2ca8526278 merged master 2018-06-14 16:57:38 -06:00
Ed Hartnett
8c879b7d2b removed unwanted ifdefs 2018-06-09 19:01:47 -06:00
Ed Hartnett
8d08b8c236 more magic numbers 2018-06-06 13:09:16 -06:00
Ed Hartnett
17da700a5c adding support for magic numbers 2018-06-05 12:22:38 -06:00
Ed Hartnett
2e831e9bbe initialization of user-defined formats 2018-06-02 08:43:34 -06:00
Ed Hartnett
a44fda8901 user defined formats 2018-05-13 14:16:07 -06:00
Ed Hartnett
27a74dee54 now always define nc_set_log_level unless user disables 2018-05-12 11:27:11 -06:00
Dennis Heimbigner
4739cd3225 Master merge and conflict resolution 2018-04-12 21:51:17 -06:00
Dennis Heimbigner
ccc70d640b re: esupport MQO-415619
and https://github.com/Unidata/netcdf-c/issues/708

Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.

Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.

The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
   so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
   to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
   retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
   the provided memory will not be freed or reallocated.

Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
   then the netcdf-c library will take control of the incoming memory.
   This means that the original memory block should not be freed
   but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
   then modifications to the original memory may fail if the space available
   is insufficient.

Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh

WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
2018-02-25 21:45:31 -07:00
Dennis Heimbigner
8cb1fc4cfe This is the second step in refactoring the libsrc4 code.
The first was branch newhash0.dmh.

As with newhash0.dmh, these changes should be transparent.
2018-02-24 20:36:24 -07:00
Dennis Heimbigner
727b613459 This is the initial step in moving to the new higher performance
(I hope) metadata mechanism. This mostly just adds new pieces of
code (e.g. nclistmap) and does some minor fixes.

It should be transparent to everything else.
The next set of changes will be the big step.
2018-02-08 19:53:40 -07:00
Ward Fisher
3a758cc32c Reinserted the NC_HAVE_META_H macro in netcdf.h 2018-01-25 12:41:39 -07:00