Commit Graph

20 Commits

Author SHA1 Message Date
Dennis Heimbigner
4636584d5b Revert/Improve nc_create + NC_DISKLESS behavior
re: https://github.com/Unidata/netcdf-c/issues/1154

Inadvertently, the behavior of NC_DISKLESS with nc_create() was
changed in release 4.6.1. Previously, the NC_WRITE flag needed
to be explicitly used with NC_DISKLESS in order to cause the
created file to be persisted to disk.

Additional analyis indicated that the current NC_DISKLESS
implementation was seriously flawed.

This PR attempts to clean up and regularize the situation with
respect to NC_DISKLESS control. One important aspect of diskless
operation is that there are two different notions of write.

1. The file is read-write vs read-only when using the netcdf API.
2. The file is persisted or not to disk at nc_close().

Previously, these two were conflated. The rules now are
as follows.

1. NC_DISKLESS + NC_WRITE means that the file is read/write using the netcdf API
2. NC_DISKLESS + NC_PERSIST means that the file is persisted to a disk file at nc_close.
3. NC_DISKLESS + NC_PERSIST + NC_WRITE means both 1 and 2.

The NC_PERSIST flag is new and takes over the obsolete NC_MPIPOSIX flag.
NC_MPIPOSIX is still defined, but is now an alias for the NC_MPIIO flag.

It is also now the case that for netcdf-4, NC_DISKLESS is independent
of NC_INMEMORY and in fact it is an error to specify both flags
simultaneously.

Finally, the MMAP code was fixed to use NC_PERSIST as well.
Also marked MMAP as deprecated.

Also added a test case to test various combinations of NC_DISKLESS,
NC_PERSIST, and NC_WRITE.

This PR affects a number of files and especially test cases
that used NC_DISKLESS.

Misc. Unrelated fixes
1. fixed some warnings in ncdump/dumplib.c
2018-10-10 13:32:17 -06:00
Ward Fisher
784d777bff Merge branch 'master' into provenance.dmh 2018-09-06 15:13:09 -06:00
Dennis Heimbigner
d62a9e623c Fix the NC_INMEMORY code to work in all cases with HDF5 1.10.
re: github issue https://github.com/Unidata/netcdf-c/issues/1111

One of the less common use cases for the in-memory feature is
apparently failing with HDF5-1.10.x.  The fix is complicated and
requires significant changes to libhdf5/nc4memcb.c. The current
setup is detailed in the file docs/inmeminternal.dox.

Additionally, it was discovered that the program
nc_test/tst_inmemory.c, which is invoked by
nc_test/run_inmemory.sh, actually was failing because of the
above problem. But the failure is not detected since the script
does not return non-zero value.

Other Changes:
1. Fix nc_test_tst_inmemory to return errors correctly.
2. Make ncdap_tests/findtestserver.c and dap4_tests/findtestserver4.c
   be generated from ncdap_test/findtestserver.c.in.
3. Make LOG() print output to stderr instead of stdout to
   avoid contaminating e.g. ncdump output.
4. Modify the handling of NC_INMEMORY and NC_DISKLESS flags
   to properly handle that NC_DISKLESS => NC_INMEMORY. This
   affects a number of code pieces, especially memio.c.
2018-09-04 11:27:47 -06:00
Dennis Heimbigner
2ea1cf5f1b There was a request to extend the provenance information
stored in the _NCProperties attribute to allow two things:
1. capture of additional library dependencies (over and above
   hdf5)
2. Recognition of non-netcdf libraries that create netcdf-4 format
   files.

To this end, the _NCProperties format has been extended to be
and arbitrary set of key=value pairs separated by commas.
This new format has version = 2, and uses commas as the pair separator.
Thus the general form is:
    _NCProperties = "version=2,key1=value,key2=value2..." ;

This new version is accompanied by a new ./configure option of the form
    --with-ncproperties="key1=value1,key2=value2..."
that specifies pairs to add to the _NCProperties attribute for all
files created with that netcdf library.

At this point, what is missing is some programmatic way to
specify either all the pairs or additional pairs
to the _NCProperties attribute. Not sure of the best way
to do this.

Builders using non-netcdf libraries can specify
whatever they want in the key value pairs (as long
as the version=2 is specified first).

By convention, the primary library is expected to be the
the first pair after the leading version=2 pair, but this
is convention only and is neither required nor enforced.

Related changes:
1. Fixed the tests that check _NCProperties to properly operate with version=2.
2. When reading a version 1 _NCProperties attribute, convert it to look
   like a version 2 attribute.
2. Added some version 2 tests to ncdump/tst_fileinfo.c and
   ncdump/tst_fileinfo.sh

Misc Changes:
1. Fix minor problem in ncdap_test/testurl.sh where a parameter to
   buildurl needed to be quoted.
2. Minor fix to ncgen to swap switches -H and -h to be consistent
   with other utilities.
3. Document the -M flag in nccopy usage() and the nccopy man page.
4. Modify a test case to use the nccopy -M flag.
2018-08-25 21:44:41 -06:00
Ward Fisher
df4942d280 Merge branch 'master' into vars.dmh 2018-06-08 15:50:28 -06:00
Dennis Heimbigner
4b936ee26a Fix use of 'int' to represent 'hid_t' that caused HDF5 1.10 to fail.
re: github issue https://github.com/Unidata/netcdf-fortran/issues/82

This was originally discovered in the Fortran tests, but is
a problem in the C library.

The problem only occurred when using HDF5-1.10.x.  The reason it
failed is that starting with 1.10, the hid_t type was changed
from 32 bits to 64 bits.
The function libsrc4/nc4memcb.c#NC4_image_init was using type int (doh!)
to return the hdf fileid instead of hid_t type.  This, of course,
caused the id to be truncated and in turn later use of the id
caused hdf5 to fail.

Fix is trivial: replace int with hid_t. This also requires a related
change in nc4mem.c.

Also added the test case derived from the original Fortran code.

You would think I would learn...
2018-05-30 14:47:37 -06:00
Dennis Heimbigner
ee509ff4f3 Re-Implement the nc_get/put_vars operations for netcdf-4 using the
corresponding HDF5 operations.

re: github issue https://github.com/Unidata/netcdf-c/issues/908
also in reference to https://github.com/pydata/xarray/issues/2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.

Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
   instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
   test case (tst_varsperf.c) to estimate the speedup. After users
   have had some experience with this, I will remove it, probably
   after the 4.7 release.
2018-05-22 16:50:52 -06:00
Dennis Heimbigner
3c7ffcc6d1 Fix https://github.com/Unidata/netcdf-c/issues/963
Fix https://github.com/Unidata/netcdf-c/issues/962

1. remove the --disable-diskless option since it is no
   longer needed. Similarly for CMakeLists.txt.
2. Fixed nc4files.c where BAIL and return were mixed
   leading to situation where cleanup code was not
   being invoked. This probably occurs elsewhere,
   but I did not find any specifically.
2018-05-11 15:30:19 -06:00
Dennis Heimbigner
4739cd3225 Master merge and conflict resolution 2018-04-12 21:51:17 -06:00
Dennis Heimbigner
25f062528b This completes (for now) the refactoring of libsrc4.
The file docs/indexing.dox tries to provide design
information for the refactoring.

The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.

WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
   changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
   are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
   but as noted elsewhere, this may not be significant because
   the meta-data read performance apparently is being dominated
   by the hdf5 library because we do bulk meta-data reading rather
   than lazy reading.
2018-03-16 11:46:18 -06:00
Dennis Heimbigner
ccc70d640b re: esupport MQO-415619
and https://github.com/Unidata/netcdf-c/issues/708

Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.

Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.

The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
   so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
   to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
   retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
   the provided memory will not be freed or reallocated.

Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
   then the netcdf-c library will take control of the incoming memory.
   This means that the original memory block should not be freed
   but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
   then modifications to the original memory may fail if the space available
   is insufficient.

Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh

WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
2018-02-25 21:45:31 -07:00
Dennis Heimbigner
9983b9d911 re e-support UBS-599337
re pull request https://github.com/Unidata/netcdf-c/pull/405
re pull request https://github.com/Unidata/netcdf-c/pull/446

Notes:
1. This branch is a cleanup of the magic.dmh branch.
2. magic.dmh was originally merged, but caused problems with parallel IO.
   It was re-issued as pull request https://github.com/Unidata/netcdf-c/pull/446.
3. This branch + pull request replace any previous pull requests and magic.dmh branch.

Given an otherwise valid netCDF file that has a corrupted header,
the netcdf library currently crashes. Instead, it should return
NC_ENOTNC.

Additionally, the NC_check_file_type code does not do the
forward search required by hdf5 files. It currently only looks
at file position 0 instead of 512, 1024, 2048,... Also, it turns
out that the HDF4 magic number is assumed to always be at the
beginning of the file (unlike HDF5).
The change is localized to libdispatch/dfile.c See
https://support.hdfgroup.org/release4/doc/DSpec_html/DS.pdf

Also, it turns out that the code in NC_check_file_type is duplicated
(mostly) in the function libsrc4/nc4file.c#nc_check_for_hdf.

This branch does the following.
1. Make NC_check_file_type return NC_ENOTNC instead of crashing.
2. Remove nc_check_for_hdf and centralize all file format checking
   NC_check_file_type.
3. Add proper forward search for HDF5 files (but not HDF4 files)
   to look for the magic number at offsets of 0, 512, 1024...
4. Add test tst_hdf5_offset.sh. This tests that hdf5 files with
   an offset are properly recognized. It does so by prefixing
   a legal file with some number of zero bytes: 512, 1024, etc.
5. Off-topic: Added -N flag to ncdump to force a specific output dataset name.
2017-10-24 16:25:09 -06:00
Dennis Heimbigner
47daf33074 Resolves Github issue https://github.com/Unidata/netcdf-c/issues/349.
Update utf8proc.[ch] to use the version now
maintained by the Julia Language project
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
The license for the previous version was
unacceptable for the Debian and Ubuntu release
systems. The new version both updates the code
and addresses the license issue.

It turns out that the utf8proc software we are using
was turned over to the Julia Language developers
and the license terms changed to allow modification.
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).

So the fix here is as follows:
1. Wrap the library with a fixed interface: libdispatch/dutf8.c
   and include/ncutf8.h.
2. Replace the existing utf8proc code with the new version
   from https://github.com/JuliaLang/utf8proc.
3. Add a couple more test cases: nc_test/tst_utf8_validate.c
   and nc_test_utf8_phrases.c.  If/when I can find a usable
   normalization test, I will incorporate that later.
2017-02-16 14:27:54 -07:00
Dennis Heimbigner
11a259ad86 Add provenance info for netcdf-4 files.
This consists of a persistent attribute named
_NCProperties plus two computed attributes
_IsNetcdf4 and _SuperblockVersion.
See the 'Provenance Attributes' section
of docs/attribute_conventions.md for details.
2016-05-07 14:32:07 -06:00
Ward Fisher
612b35a84c Merge branch 'master' into cdf-5, in preparation for merging the CDF-5 functionality into the master branch. This will be the key new feature for netcdf 4.4.0. 2015-11-05 13:40:35 -07:00
dmh
ed57a337ec add ptrdiff_t checks 2015-10-11 13:35:44 -06:00
dmh
859f105005 merge-squash 2015-08-15 16:26:35 -06:00
dmh
d67d00ca7e re NCF-319
The pnetcdf support was not
properly being used to provide
mpi parallel io for netcdf-3 classic
files. The wrong dispatch table was being
used.  The fix was to modify
dfile.c#NC_check_file_type to properly
specify the pnetcdf dispatch table when
use_parallel was true.
2014-10-13 14:33:06 -06:00
dmh
be329e7a23 Add ability to programmatically
extract info from libnetcdf.settings
API is below.
I have made this API public yet
by adding it to netcdf.h. I will
do that when everyone is agreed on the
proper API.

extern const char* nc_settings(const char* key); /*get value of a specific key */
extern const char** nc_settings_all(); /*get all settings in envv format */
extern void nc_settings_reclaim(); /* reclaim all space and clean up */

Envv format is
{key,value}*,NULL

Also added test: nc_test/tst_settings.c
2014-08-29 14:51:14 -06:00
Dennis Heimbigner
48ca394d2e diskless: make read and write loop to read/write whole file when persisting 2012-04-09 22:03:02 +00:00