re: github issue https://github.com/Unidata/netcdf-fortran/issues/82
This was originally discovered in the Fortran tests, but is
a problem in the C library.
The problem only occurred when using HDF5-1.10.x. The reason it
failed is that starting with 1.10, the hid_t type was changed
from 32 bits to 64 bits.
The function libsrc4/nc4memcb.c#NC4_image_init was using type int (doh!)
to return the hdf fileid instead of hid_t type. This, of course,
caused the id to be truncated and in turn later use of the id
caused hdf5 to fail.
Fix is trivial: replace int with hid_t. This also requires a related
change in nc4mem.c.
Also added the test case derived from the original Fortran code.
You would think I would learn...
corresponding HDF5 operations.
re: github issue https://github.com/Unidata/netcdf-c/issues/908
also in reference to https://github.com/pydata/xarray/issues/2004
The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.
This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.
The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.
Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
test case (tst_varsperf.c) to estimate the speedup. After users
have had some experience with this, I will remove it, probably
after the 4.7 release.
Fix unused variable warning for `printid`. Looks like it was not substituted in all places where it should have been and was only used inside an ifdef so was unused if `LOGGING` was false. Add use in `fprintf` line following the ifdef block.
Minor change, but compilation output is so clean that a little wart sticks out...
In hdf5-1.10.2, THG added a new enumeration for use in the `H5Pset_libver_bounds` function which makes it easier to maintain compatibility with hdf5-1.8.X clients.
An explanation is provided at https://www.hdfgroup.org/2018/04/why-should-i-care-about-the-hdf5-1-10-2-release/
In particular, the section "What does this change mean to an HDF5 application?" discusses its use with NetCDF
The file docs/indexing.dox tries to provide design
information for the refactoring.
The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.
WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
but as noted elsewhere, this may not be significant because
the meta-data read performance apparently is being dominated
by the hdf5 library because we do bulk meta-data reading rather
than lazy reading.
and https://github.com/Unidata/netcdf-c/issues/708
Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.
Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.
The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
the provided memory will not be freed or reallocated.
Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
then the netcdf-c library will take control of the incoming memory.
This means that the original memory block should not be freed
but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
then modifications to the original memory may fail if the space available
is insufficient.
Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh
WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
2. Fixed plugin building (nc_test4/hdf5plugins)
to be done properly by cmake and automake.
4. Duplicated part of the nc_test4 filter test code
in examples/C
An incomplete and untested set of hooks exist
for OS-X in nc_test4/findplugins.in. They need testing.
Fix#299
The conditions to make this error are the following:
* Two variables with different chunk sizes
* Both variables write on the same unlimited dimension
* The first variable has already written data when the second variable is created
re pull request https://github.com/Unidata/netcdf-c/pull/405
re pull request https://github.com/Unidata/netcdf-c/pull/446
Notes:
1. This branch is a cleanup of the magic.dmh branch.
2. magic.dmh was originally merged, but caused problems with parallel IO.
It was re-issued as pull request https://github.com/Unidata/netcdf-c/pull/446.
3. This branch + pull request replace any previous pull requests and magic.dmh branch.
Given an otherwise valid netCDF file that has a corrupted header,
the netcdf library currently crashes. Instead, it should return
NC_ENOTNC.
Additionally, the NC_check_file_type code does not do the
forward search required by hdf5 files. It currently only looks
at file position 0 instead of 512, 1024, 2048,... Also, it turns
out that the HDF4 magic number is assumed to always be at the
beginning of the file (unlike HDF5).
The change is localized to libdispatch/dfile.c See
https://support.hdfgroup.org/release4/doc/DSpec_html/DS.pdf
Also, it turns out that the code in NC_check_file_type is duplicated
(mostly) in the function libsrc4/nc4file.c#nc_check_for_hdf.
This branch does the following.
1. Make NC_check_file_type return NC_ENOTNC instead of crashing.
2. Remove nc_check_for_hdf and centralize all file format checking
NC_check_file_type.
3. Add proper forward search for HDF5 files (but not HDF4 files)
to look for the magic number at offsets of 0, 512, 1024...
4. Add test tst_hdf5_offset.sh. This tests that hdf5 files with
an offset are properly recognized. It does so by prefixing
a legal file with some number of zero bytes: 512, 1024, etc.
5. Off-topic: Added -N flag to ncdump to force a specific output dataset name.
dap code will create a real temporary file in which to store the
converted metadata for the DAP .dds or .dmr.
It was assumed that the nc_close code would reclaim the
temporary file. For DAP2, reclamation occurs in the ncio
code. For DAP4, it was assumed that the libsrc4 code would do
the reclamation, but for whatever reason, this is not happening.
Thus, in this situation, a temporary file is left in the file
system. Aside from being irritating to users, this screws up
'make distcheck'.
So the DAP4 code is fixed to ensure that the temporary file is
properly reclaimed independent of the libsrc4 code.
nc4_check_name() checks that the provided string doesn't exceed NC_MAX_NAME,
but fails to do so after calling nc_utf8_normalize(). This extra check is
needed since a caller of nc4_check_name(), like NC4_def_dim, allocates
norm_name as char norm_name[NC_MAX_NAME + 1]
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2840
Credit to OSS-Fuzz
were added to provide a path name converter from e.g. cygwin
paths to e.g. windows paths. This is necessary because
the shell scripts may produce cygwin paths, but the code
may have been compiled with Visual Studio. Similar issues
arise with Mingw.
At appropriate places, and if using Visual Studio or Mingw,
I added calls to the path conversion code.
Apparently I forgot to find all the places where this
conversion was needed. So this pr does the following:
1. Push the calls to the converter to the various libXXX
directories and out of libdispatch/dfile.c.
2. Add conversion calls to other parts of the code like oc2.
I also turns out that conversion code in dapcvt.c
had a bug when handling DAP Byte type under visual studio.
Notes:
1. there may still be places I missed that need to do path conversion.
2. need to make sure that calls to e.g. H5open also use converted path.
Current code only frees char* text in error cases. It should
also free it in success case.
Otherwise Valgrind reports a leak:
==28536== 64 bytes in 1 blocks are definitely lost in loss record 4 of 13
==28536== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28536== by 0xE673496: NC4_buildpropinfo (nc4info.c:239)
==28536== by 0xE67313B: NC4_put_propattr (nc4info.c:162)
==28536== by 0xE65BF35: nc4_create_file (nc4file.c:468)
==28536== by 0xE65C0AC: NC4_create (nc4file.c:564)
==28536== by 0xE608A08: NC_create (dfile.c:1773)
==28536== by 0xE607E6A: nc__create (dfile.c:511)
==28536== by 0xE607E23: nc_create (dfile.c:440)
Credit to OSS Fuzz
This relies on the HDF5 capability to
dynamically load compression filters.
Note that a compression filter is just
a subcase of filters.
The primary user-visible changes are as follows:
1. Add a standard header "netcdf_filter.h" that defines
the necessary API extensions
2. Modify ncgen to support two new special attributes
"_Filter_ID" and "_Filter_Parameters" so that compression
can be turned on when creating a file using ncgen.
4. Add a detailed description of filtering support
to the user's guide; see the file filters.md
5. Add a test case directory for this: nc_test4/filter_test.
It is fragile and a ./configure flags (-enable-filter-test)
is defined (default disabled) to shut this off this test
to avoid spurious 'make check' failures.
Note that the HDF5 documentation is not up-to-date, so
much of what is encoded here comes from examining the
actual code in the file H5PL.c in the HDF5 source code.
Specific changes:
1. Add dap4 code: libdap4 and dap4_test.
Note that until the d4ts server problem is solved, dap4 is turned off.
2. Modify various files to support dap4 flags:
configure.ac, Makefile.am, CMakeLists.txt, etc.
3. Add nc_test/test_common.sh. This centralizes
the handling of the locations of various
things in the build tree: e.g. where is
ncgen.exe located. See nc_test/test_common.sh
for details.
4. Modify .sh files to use test_common.sh
5. Obsolete separate oc2 by moving it to be part of
netcdf-c. This means replacing code with netcdf-c
equivalents.
5. Add --with-testserver to configure.ac to allow
override of the servers to be used for --enable-dap-remote-tests.
6. There were multiple versions of nctypealignment code. Try to
centralize in libdispatch/doffset.c and include/ncoffsets.h
7. Add a unit test for the ncuri code because of its complexity.
8. Move the findserver code out of libdispatch and into
a separate, self contained program in ncdap_test and dap4_test.
9. Move the dispatch header files (nc{3,4}dispatch.h) to
.../include because they are now shared by modules.
10. Revamp the handling of TOPSRCDIR and TOPBUILDDIR for shell scripts.
11. Make use of MREMAP if available
12. Misc. minor changes e.g.
- #include <config.h> -> #include "config.h"
- Add some no-install headers to /include
- extern -> EXTERNL and vice versa as needed
- misc header cleanup
- clean up checking for misc. unix vs microsoft functions
13. Change copyright decls in some files to point to LICENSE file.
14. Add notes to RELEASENOTES.md
The nvars, ndims, and natts fields on the NC_HDF5_FILE_INFO struct are
never set. The nvars field is read, but since it is never written,
the value is always zero.
Update utf8proc.[ch] to use the version now
maintained by the Julia Language project
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
The license for the previous version was
unacceptable for the Debian and Ubuntu release
systems. The new version both updates the code
and addresses the license issue.
It turns out that the utf8proc software we are using
was turned over to the Julia Language developers
and the license terms changed to allow modification.
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
So the fix here is as follows:
1. Wrap the library with a fixed interface: libdispatch/dutf8.c
and include/ncutf8.h.
2. Replace the existing utf8proc code with the new version
from https://github.com/JuliaLang/utf8proc.
3. Add a couple more test cases: nc_test/tst_utf8_validate.c
and nc_test_utf8_phrases.c. If/when I can find a usable
normalization test, I will incorporate that later.
HDF5 does not permit a variable to have no_fill == TRUE if the variable type is variable length. This includes types NC_STRING and NC_VLEN.
The test above also excludes user-defined types which I'm not sure is needed or not.
In nc4 mode, the variables were ignoring the default "database" fill/no_fill mode set via a call to nc_set_fill(). This sets the no_fill mode on the variable to match the default database setting at the time the variable is defined.
Modified provenance code to allocate the minimal space
needed for _NCProperties attribute in file. Basically
required using malloc in the provenance code and in ncdump.
Otherwise should cause no externally visible effects.
Also removed the ENABLE_FILEINFO from configure.ac since
the provenance code is no longer optional.
This modifies the previous change to be more pedantically correct. It should always be an NC_EINVALCOORDS error if start exceeds fdims[2]; however, if start equals fdims[2], then it is only an error if count is non-zero.
The following code is in nc4hdf.c, function `nc4_put_vara`.
```
/* Check dimension bounds. Remember that unlimited dimnsions can
* put data beyond their current length. */
for (d2 = 0; d2 < var->ndims; d2++)
{
dim = var->dim[d2];
assert(dim && dim->dimid == var->dimids[d2]);
if (!dim->unlimited)
{
if (start[d2] >= (hssize_t)fdims[d2])
BAIL_QUIET(NC_EINVALCOORDS);
if (start[d2] + count[d2] > fdims[d2])
BAIL_QUIET(NC_EEDGE);
}
}
```
There is an issue when the process with the highest rank has zero items to output. As an example, if I have 4 mpi processes which are each writing the following amount of data:
* rank 0: 0 items
* rank 1: 2548 items
* rank 2: 4352 items
* rank 3: 0 items.
I will define the variable to have a length of 6900 items (0 + 2548 + 4352 + 0). When I am outputting data to the variable, each rank will call nc_put_vara_longlong with the following start and count values:
* rank 0: start = 0, count = 0
* rank 1: start = 0, count = 2548
* rank 2: start = 2548, count = 4352
* rank 3: start = 6900, count = 0.
In each case, the `start` for rank N is equal to `start` for rank N-1 + `count` for rank N-1. This all works ok until the highest rank is writing 0 items. In that case, the `start` value for that rank is equal to the total size of the variable and the check in the code fragment shown above fails since `start[] == fdims[]`.
This could be fixed in the application code by checking whether the `count` is zero and if so, then set `start` to 0 also, but I think that is a kluge that should not be required.
Note that this test appears three times in this file. In one case, the check for non-zero count already exists, but not in the other two. This pull request adds the check to the other two tests.
The H5Aexists hdf5 function does the same function as the manually coded loop with much less code and fewer function calls.
Also, the H5Aopen_idx and H5Aget_num_attrs functions are deprecated.
If H5Aopen_idx on line 1964 fails, then attid will be < 0. The BAIL will goto exit at line 1989 and then the test of "if (attid ...)" at line 1995 will pass (attd != 0) and then call H5Aclose(attid) with a negative attid. Similar issue for spaceid.
Result of function if probably the same since there is a failure somewhere, but more difficult to track down if looks like failure is happening in the wrong place.
not being computed. Fix in nc4file.c.
Not sure how this ever worked for any variable.
What is also weird is that the dim hash is
apparently being computed.
re: github netcdf-c issue #271
This occurs for several reasons, including:
1. using H5Aopen_name instead of H5Aexists to test if attribute exists.
2. using H5Eset_auto instead of H5Eset_auto2.
There are probably others that will have to be extinguished as encountered.
p.s Hope I did not overdo this and kill too much.
The hash field for phony dimensions was not being set
(in nc4hdf.c). Also added test case (nc_test4/?).
Note that I searched for other similar failures and
did not find any, but I may have missed them.
If in parallel mode and the H5Fcreate fails, then the h5->comm field may/will point to MPI_COMM_NULL instead of to a valid communicator. If that is passed to MPI_Comm_free, then the code will core dump down in the MPI_Comm_free function and not return a valid return status to the caller routine. If communicator is checked, then execution proceeds back up the call tree and caller can report a better error message about the failed file create.
Charlie Zender noted that we forgot to define what happens for
various netcdf API attribute operations, notably nc_inq_att()
and nc_get_att().
So, I added a list of legal and illegal api calls for the provenance
attributes in docs/attribute_conventions.md.
I also added more test cases to ncdump/tst_fileinfo.c to verify
and fixed resultant errors.
This consists of a persistent attribute named
_NCProperties plus two computed attributes
_IsNetcdf4 and _SuperblockVersion.
See the 'Provenance Attributes' section
of docs/attribute_conventions.md for details.
The var struct has a 'dim' field which was not being used
Instead, the dimids field would always search for the dim
with the matching dimid. For db with large numbers of dims,
this could be a significant time sync.
Modified code to always set var-dim[i] when var->dimids[i] was
set (if the dim existed at that point). Then use the var->dim
field instead of var->dimids and search whenever requested.
All var->dim accesses are protected by asserts that verify
non-null and that the var->dim[]->dimid == var->dimids[].
In non-classic netcdf-4 models, it is allowable to have
large numbers of dims and vars. In many operations, the
entire list of dims or vars is searched for a dim/var matching
a specific name which results in *lots* of strncmp or strcmp
calls.
If we add a hash field to the var and dim structs similar to what
has already been done for the netcdf-3 formats, then we can hash the
name being searched for and numerically compare that value with
the var/dim hash value. If they match, then do a more expensive
strncmp call to ensure that the names truly match.