Commit Graph

137 Commits

Author SHA1 Message Date
Dennis Heimbigner
45a8a265b8 master merge 2019-02-23 17:14:12 -07:00
Ed Hartnett
c771443e43 cleanup of whitespace in HDF5 directory 2019-02-19 05:18:25 -07:00
Ward Fisher
1fde39c8d7 Merge branch 'master' into byterange.dmh 2019-02-07 14:28:23 -07:00
Ward Fisher
b27c7d899d Merge branch 'master' into byterange.dmh 2019-01-25 14:50:23 -07:00
Ed Hartnett
840d51d035 changed NC_GRP_INFO_T to use atts_read instead of atts_not_read 2019-01-22 08:11:52 -07:00
Ed Hartnett
243cef8fa5 changed var atts_not_read to atts_read 2019-01-21 08:40:04 -07:00
Ed Hartnett
adb3356aff merged ejh_test_pnetcdf, fixes broken logging statement, added comments 2019-01-20 09:42:18 -07:00
Dennis Heimbigner
84c2bc0d78 Merge branch 'master' into byterange.dmh 2019-01-02 13:18:45 -07:00
Dennis Heimbigner
bf2746b8ea Provide byte-range reading of remote datasets
re: issue https://github.com/Unidata/netcdf-c/issues/1251

Assume that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.

This PR allows the netcdf-c library to read that dataset's
contents as a netcdf file using HTTP byte ranges
if the remote server supports byte-range access.

Originally, this PR was set up to access Amazon S3 objects,
but it can also access other remote datasets such as those
provided by a Thredds server via the HTTPServer access protocol.
It may also work for other kinds of servers.

Note that this is not intended as a true production
capability because this kind of access can be quite slow.
In addition, the byte-range IO drivers do not currently do
any sort of optimization or caching.

An additional goal here is to gain some experience with
the Amazon S3 REST protocol.

This architecture and its use are documented in
the file docs/byterange.dox.
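
As a rough illustration (not part of this PR's test suite), and assuming
the byte-range mode is selected by appending a "#mode=bytes" fragment to
the URL as sketched in docs/byterange.dox, a caller might open a remote
file like this (the URL is a placeholder):

    #include <stdio.h>
    #include <netcdf.h>

    int main(void)
    {
        int ncid, format, stat;
        /* Hypothetical URL; any HTTP(S) server that honors Range requests should work. */
        const char *url = "https://remote.example.com/data/sample.nc#mode=bytes";

        if ((stat = nc_open(url, NC_NOWRITE, &ncid))) return stat;
        if ((stat = nc_inq_format(ncid, &format))) return stat; /* classic vs netCDF-4 */
        printf("format = %d\n", format);
        return nc_close(ncid);
    }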

There are currently two test cases:

1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle
   for a remote netcdf-3 file and a remote netcdf-4 file.
2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote
   datasets.

This PR also incorporates significantly changed model inference code
(see the superseded PR https://github.com/Unidata/netcdf-c/pull/1259).

1. It centralizes the code that infers the dispatcher.
2. It adds support for byte-range URLs.

Other changes:

1. NC_HDF5_finalize was not being properly called by nc_finalize().
2. Fix minor bug in ncgen3.l
3. Fix memory leak in nc4info.c
4. Add code to walk the .daprc triples and to replace the protocol=
   fragment tag with a more general mode= tag.

Final Note:
The inference code is still way too complicated. We need to move
to the validfile() model used by netcdf Java, where each
dispatcher is asked if it can process the file. This decentralizes
the inference code. This will be done after all the major new
dispatchers (PIO, Zarr, etc.) have been implemented.
2019-01-01 18:27:36 -07:00
Ed Hartnett
029aa5f626 now using hidden coordinates att to speed file opens 2018-12-20 05:59:31 -07:00
Ed Hartnett
7249350c3f now remember whether coords att has been read for a var 2018-12-19 09:43:32 -07:00
Ed Hartnett
f5c7209838 comment and code cleanup 2018-12-19 09:10:15 -07:00
Ed Hartnett
070214f81a more comments, code cleanup 2018-12-19 06:52:26 -07:00
Ed Hartnett
25184f3843 moved code to get_attached_info() 2018-12-18 09:16:03 -07:00
Ed Hartnett
1b38d9aef8 lazy read of some var metadata 2018-12-18 07:48:22 -07:00
Ed Hartnett
e9bd94821d split out some var meta reading, fixed setting of var->container 2018-12-17 08:41:43 -07:00
Ed Hartnett
77b3a81d86 starting to deal with var metadata separately 2018-12-17 08:25:19 -07:00
Ed Hartnett
aaa5a50ca9 cleanup of hdf5open.c, now check for COORDINATES hidden att and use it to find dimids if available 2018-12-11 09:57:08 -07:00
Ed Hartnett
dc1115ae76 moved rec_match_dimscales() to hdf5open.c, made it faster by skipping already-identified dims 2018-12-11 09:40:59 -07:00
Ed Hartnett
26db90e4e3 changes 2018-11-21 07:39:05 -07:00
Ed Hartnett
c1ffb92a7f checking return value for HDF5 function call 2018-11-20 15:43:34 -07:00
Ed Hartnett
ab963e3d41 removing HDF5 type info from libsrc4 2018-11-20 14:24:40 -07:00
Ed Hartnett
9aedbd0c41 changing over native_hdf_typeid 2018-11-20 10:55:45 -07:00
Ed Hartnett
d29feb53de have switched location of hdf_native_typeid 2018-11-20 10:29:02 -07:00
Ed Hartnett
9e970562cf more type work 2018-11-20 10:01:39 -07:00
Ed Hartnett
2708f5b660 more type work 2018-11-20 09:11:28 -07:00
Ed Hartnett
e11e9b7bfd more type work 2018-11-20 09:06:40 -07:00
Ed Hartnett
5104262f6b changing types 2018-11-20 08:00:48 -07:00
Ed Hartnett
b5be30175d cleanup 2018-11-19 09:35:29 -07:00
Ed Hartnett
1d5307b600 merged in ejh_next_10 2018-11-19 09:25:04 -07:00
Ed Hartnett
3ec8b34bfb removed unneeded HDF5 fields 2018-11-19 09:23:43 -07:00
Ed Hartnett
8ae5ebf6bc remove unneeded params from function 2018-11-16 08:26:09 -07:00
Ed Hartnett
6a66ecd3d0 moving rest of var stuff 2018-11-13 17:03:11 -07:00
Ed Hartnett
d7fe095066 moving rest of var stuff 2018-11-13 16:59:07 -07:00
Ed Hartnett
4045588516 removing hid_t from NC_VAR_INFO_T 2018-11-13 16:10:34 -07:00
Ed Hartnett
a9a0e5ba39 moving hdf5-specific var stuff 2018-11-13 10:04:55 -07:00
Ed Hartnett
18ca0e4e9a changing var to libhdf5 info 2018-11-13 09:58:32 -07:00
Ed Hartnett
87ddebac4b setting values for var hdf5 info 2018-11-13 07:31:59 -07:00
Ed Hartnett
421eb32ea0 allocating storage for var hdf5 info 2018-11-13 06:34:09 -07:00
Ed Hartnett
825047b8f6 rest of moving HDF5 specific group info to libhdf5 2018-11-12 14:04:17 -07:00
Ed Hartnett
971b16cdc8 hdf5 specific changes for hdf5open.c 2018-11-12 13:51:08 -07:00
Ed Hartnett
65bde4878b HDF5-specific group changes 2018-11-12 11:03:00 -07:00
Ed Hartnett
6b75bb483b allocating HDF5-specific group info struct for root group 2018-11-12 09:02:46 -07:00
Ed Hartnett
9364157a9b allocating and freeing memory for HDF5-specific group info 2018-11-12 08:11:06 -07:00
Ed Hartnett
7534cb24fe cleanup 2018-11-08 11:12:38 -07:00
Ed Hartnett
e88b98daaa moving hdf5_objid field to hdf5-specific dim info 2018-11-08 11:05:20 -07:00
Ed Hartnett
856e4ead03 moved hdf_dimscaleid to hdf5-specific dim info 2018-11-08 10:55:21 -07:00
Ed Hartnett
cc02ec3aa3 more use of HDF5-specific dim info 2018-11-08 10:48:02 -07:00
Ed Hartnett
3e80723525 continuing to find HDF5 specific dim info 2018-11-08 09:20:01 -07:00
Ed Hartnett
9373989931 allocating and freeing space for HDF5-specific dim info 2018-11-08 08:51:46 -07:00
Ed Hartnett
6f4b4ac80d moving attribute HDF5 stuff to libhdf5 2018-11-07 14:21:57 -07:00
Ed Hartnett
11d725facc allocating and freeing memory for hdf5-specific attribute info 2018-11-07 13:45:51 -07:00
Ed Hartnett
5f36a3b425 merged master 2018-11-02 10:00:53 -06:00
Dennis Heimbigner
245961de00 re: github issues
https://github.com/Unidata/netcdf-c/issues/1168
    https://github.com/Unidata/netcdf-c/issues/1163
    https://github.com/Unidata/netcdf-c/issues/1162

This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.

The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.

Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
   Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.

Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CMakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
   called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
2018-10-30 20:48:12 -06:00
Ed Hartnett
c90ab24b48 moving towards separating HDF5 file close from netcdf4 file close 2018-10-18 03:17:38 -06:00
Dennis Heimbigner
4636584d5b Revert/Improve nc_create + NC_DISKLESS behavior
re: https://github.com/Unidata/netcdf-c/issues/1154

Inadvertently, the behavior of NC_DISKLESS with nc_create() was
changed in release 4.6.1. Previously, the NC_WRITE flag needed
to be explicitly used with NC_DISKLESS in order to cause the
created file to be persisted to disk.

Additional analysis indicated that the current NC_DISKLESS
implementation was seriously flawed.

This PR attempts to clean up and regularize the situation with
respect to NC_DISKLESS control. One important aspect of diskless
operation is that there are two different notions of write.

1. The file is read-write vs read-only when using the netcdf API.
2. The file is persisted or not to disk at nc_close().

Previously, these two were conflated. The rules now are
as follows.

1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API
2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close.
3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2.

The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag.
NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag.
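
For illustration, a minimal sketch of rule 3 above (the file name and
contents are hypothetical): the file is in-memory while open and is
flushed to disk at nc_close() because NC_PERSIST is set.

    #include <netcdf.h>

    int create_persisted_diskless(void)
    {
        int ncid, dimid, varid, stat;
        if ((stat = nc_create("scratch.nc",
                              NC_CLOBBER | NC_DISKLESS | NC_WRITE | NC_PERSIST,
                              &ncid)))
            return stat;
        if ((stat = nc_def_dim(ncid, "x", 4, &dimid))) return stat;
        if ((stat = nc_def_var(ncid, "v", NC_INT, 1, &dimid, &varid))) return stat;
        return nc_close(ncid); /* the file appears on disk here */
    }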

It is also now the case that for netcdf-4, NC_DISKLESS is independent
of NC_INMEMORY and in fact it is an error to specify both flags
simultaneously.

Finally, the MMAP code was fixed to use NC_PERSIST as well.
Also marked MMAP as deprecated.

Also added a test case to test various combinations of NC_DISKLESS,
NC_PERSIST, and NC_WRITE.

This PR affects a number of files and especially test cases
that used NC_DISKLESS.

Misc. Unrelated fixes
1. fixed some warnings in ncdump/dumplib.c
2018-10-10 13:32:17 -06:00
Wei-keng Liao
0ed70756cc Ignore flags NC_MPIIO and NC_MPIPOSIX. 2018-09-22 20:22:34 -05:00
Dennis Heimbigner
108dc0f01d Fix szip filter handling code and corresponding tests
re: https://github.com/Unidata/netcdf-c/issues/972

The current szip plugin code in the HDF5 library has some
unexpected behaviors that require some changes to how
nc_inq_var_szip is implemented and to the corresponding tests:
nc_test4/{test_szip,tst_vars3}.

Specifically, the following can happen:

1. The number of parameters provided by the user will be two,
   but the number of parameters returned by nc_inq_var_filter
   will be four because the HDF5 code (H5Zszip) will add two
   extra parameters for internal use. It turns out that the two
   parameters provided when calling nc_def_var_filter correspond
   to the first two parameters of the four parameters returned
   by nc_inq_var_filter.

2. The values returned by nc_inq_var_szip that correspond to the ones
   provided by the caller may differ from those passed to
   nc_def_var_filter.  The options_mask value is known to have
   additional flag bits added, and the pixels_per_block parameter may
   be modified (see the sketch following this list).
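
A hedged sketch of the behavior described above; the filter id (4 = szip)
and the two input parameter values are illustrative, and the extra
parameters typically appear only after the data has been written:

    #include <netcdf.h>

    #define SZIP_FILTER_ID 4 /* standard HDF5 filter id for szip */

    /* Assumes an open netCDF-4 file and a defined variable. */
    int check_szip_params(int ncid, int varid)
    {
        unsigned int inparams[2] = {32, 32}; /* options_mask, pixels_per_block */
        unsigned int outparams[4];
        unsigned int filterid;
        size_t nparams;
        int stat;

        if ((stat = nc_def_var_filter(ncid, varid, SZIP_FILTER_ID, 2, inparams)))
            return stat;
        /* ...later, typically after the file has been written and reopened... */
        if ((stat = nc_inq_var_filter(ncid, varid, &filterid, &nparams, outparams)))
            return stat;
        /* nparams is expected to be 4; outparams[0..1] correspond to inparams,
           though options_mask may have extra flag bits set. */
        return NC_NOERR;
    }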
2018-09-15 15:21:51 -06:00
Ed Hartnett
8390d572ad Merge branch 'master' into ejh_hdf5_sep_next 2018-09-06 17:30:37 -06:00
Ward Fisher
784d777bff Merge branch 'master' into provenance.dmh 2018-09-06 15:13:09 -06:00
Ed Hartnett
80dc5bc0f7 merged master 2018-09-06 12:24:29 -06:00
Ward Fisher
fbe0a18b1c Merge branch 'master' into ejh_loop_cleanup_2 2018-09-05 11:22:55 -06:00
Dennis Heimbigner
d62a9e623c Fix the NC_INMEMORY code to work in all cases with HDF5 1.10.
re: github issue https://github.com/Unidata/netcdf-c/issues/1111

One of the less common use cases for the in-memory feature is
apparently failing with HDF5-1.10.x.  The fix is complicated and
requires significant changes to libhdf5/nc4memcb.c. The current
setup is detailed in the file docs/inmeminternal.dox.
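
For context, a minimal sketch of the in-memory use case (the path
handling and error checks are illustrative): the caller reads a file
image into a buffer and hands it to nc_open_mem, so no further disk
I/O goes through HDF5. The buffer must stay valid until nc_close();
the caller frees it afterwards.

    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    int open_inmemory(const char *path, int *ncidp)
    {
        FILE *f = fopen(path, "rb");
        long size;
        void *buf;
        int stat;

        if (f == NULL) return NC_EIO;
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);
        if (size < 0) { fclose(f); return NC_EIO; }
        if ((buf = malloc((size_t)size)) == NULL) { fclose(f); return NC_ENOMEM; }
        if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
            fclose(f); free(buf); return NC_EIO;
        }
        fclose(f);
        stat = nc_open_mem(path, NC_NOWRITE, (size_t)size, buf, ncidp);
        if (stat) free(buf);
        return stat;
    }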

Additionally, it was discovered that the program
nc_test/tst_inmemory.c, which is invoked by
nc_test/run_inmemory.sh, was actually failing because of the
above problem. But the failure was not detected since the script
did not return a non-zero value.

Other Changes:
1. Fix nc_test/tst_inmemory to return errors correctly.
2. Make ncdap_test/findtestserver.c and dap4_test/findtestserver4.c
   be generated from ncdap_test/findtestserver.c.in.
3. Make LOG() print output to stderr instead of stdout to
   avoid contaminating e.g. ncdump output.
4. Modify the handling of NC_INMEMORY and NC_DISKLESS flags
   to properly handle that NC_DISKLESS => NC_INMEMORY. This
   affects a number of code pieces, especially memio.c.
2018-09-04 11:27:47 -06:00
Dennis Heimbigner
2ea1cf5f1b There was a request to extend the provenance information
stored in the _NCProperties attribute to allow two things:
1. Capture of additional library dependencies (over and above
   hdf5)
2. Recognition of non-netcdf libraries that create netcdf-4 format
   files.

To this end, the _NCProperties format has been extended to be
an arbitrary set of key=value pairs separated by commas.
This new format has version=2 and uses commas as the pair separator.
Thus the general form is:
    _NCProperties = "version=2,key1=value1,key2=value2..." ;

This new version is accompanied by a new ./configure option of the form
    --with-ncproperties="key1=value1,key2=value2..."
that specifies pairs to add to the _NCProperties attribute for all
files created with that netcdf library.

At this point, what is missing is some programmatic way to
specify either all the pairs or additional pairs
to the _NCProperties attribute. Not sure of the best way
to do this.

Builders using non-netcdf libraries can specify
whatever they want in the key value pairs (as long
as the version=2 is specified first).

By convention, the primary library is expected to be the
first pair after the leading version=2 pair, but this
is convention only and is neither required nor enforced.
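
As an illustration only (assuming the provenance string remains
readable through the normal attribute API, as ncdump -s suggests), a
reader could split a version 2 _NCProperties value on commas:

    #include <stdio.h>
    #include <string.h>
    #include <netcdf.h>

    /* Print each key=value pair of the global _NCProperties attribute of an
       already-open file; the fixed buffer size is arbitrary. */
    int print_ncproperties(int ncid)
    {
        char props[1024];
        size_t len;
        int stat;

        if ((stat = nc_inq_attlen(ncid, NC_GLOBAL, "_NCProperties", &len)))
            return stat;
        if (len >= sizeof(props)) return NC_EINVAL;
        if ((stat = nc_get_att_text(ncid, NC_GLOBAL, "_NCProperties", props)))
            return stat;
        props[len] = '\0';
        for (char *pair = strtok(props, ","); pair; pair = strtok(NULL, ","))
            printf("%s\n", pair); /* e.g. "version=2", then library=version pairs */
        return NC_NOERR;
    }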

Related changes:
1. Fixed the tests that check _NCProperties to properly operate with version=2.
2. When reading a version 1 _NCProperties attribute, convert it to look
   like a version 2 attribute.
3. Added some version 2 tests to ncdump/tst_fileinfo.c and
   ncdump/tst_fileinfo.sh

Misc Changes:
1. Fix minor problem in ncdap_test/testurl.sh where a parameter to
   buildurl needed to be quoted.
2. Minor fix to ncgen to swap switches -H and -h to be consistent
   with other utilities.
3. Document the -M flag in nccopy usage() and the nccopy man page.
4. Modify a test case to use the nccopy -M flag.
2018-08-25 21:44:41 -06:00
Ed Hartnett
8885c75ade removing unneeded lookups 2018-08-22 06:08:19 -06:00
Ed Hartnett
f77350c66d testing error for HDF5 file with circular groups 2018-08-20 09:26:50 -06:00
Ed Hartnett
72805a5eed tracking enum issue 2018-08-20 05:45:30 -06:00
Ed Hartnett
412c18a4a1 fixed error handling in read_coord_dimids 2018-08-18 05:59:30 -06:00
Ed Hartnett
56a57a57f0 bring in changes from lazy vars branch 2018-08-06 10:57:19 -06:00
Ed Hartnett
9dc99c8a04 cleaned up some functions in preparation for lazy vars 2018-08-06 10:16:49 -06:00
Ed Hartnett
da7af8f1e7 comment cleanup 2018-07-21 15:36:58 -06:00
Ed Hartnett
2cc3d10f4b comment cleanup 2018-07-21 10:47:01 -06:00
Ed Hartnett
5a52f28bb7 further condensing code 2018-07-21 10:43:36 -06:00
Ed Hartnett
48d6473844 further condensing code 2018-07-21 09:31:51 -06:00
Ed Hartnett
afa7c0a87d further condensing code 2018-07-21 09:26:16 -06:00
Ed Hartnett
21dade79a9 use same callback function for global and var atts 2018-07-21 09:20:22 -06:00
Ed Hartnett
b56a8edcc9 code cleanup 2018-07-21 09:09:12 -06:00
Ed Hartnett
a17d66f66b clean up 2018-07-21 07:35:27 -06:00
Ed Hartnett
7aed50a902 performance test for fast global att reads 2018-07-21 07:29:12 -06:00
Ed Hartnett
43f094ca50 changing the way global atts are read 2018-07-21 06:46:29 -06:00
Ed Hartnett
fcbd5226a7 clean up 2018-07-20 12:39:02 -06:00
Ed Hartnett
9849c23773 now using NC_HDF5_FILE_INFO_T for hdf5 specific file info 2018-07-19 13:07:13 -06:00
Ed Hartnett
582bd5ec1a further movement towards NC_HDF5_FILE_INFO_T 2018-07-19 12:47:43 -06:00
Ed Hartnett
43b6aeca34 further movement towards NC_HDF5_FILE_INFO_T 2018-07-19 12:39:49 -06:00
Ed Hartnett
2d53a8ece0 moving hdf5 field to NC_HDF5_FILE_INFO_T 2018-07-19 12:32:06 -06:00
Ed Hartnett
752e76df22 cleanup 2018-07-17 09:03:19 -06:00
Ed Hartnett
b0f9f965b7 clean up, moved hdf5open and hdf5create code to their own code files 2018-07-17 08:29:47 -06:00