A user suggested that the nccopy -F option
syntax should be extended to support specification
of multiple (or all) variables in a single -F option.
The new syntax allows:
1. '*' as the name of the variable; this means apply the
filter to all variables in the data set.
2. *var1|var2|...* as the variable name to indicate that the filter
should be applied to the multiple specified variables.
re: issue https://github.com/Unidata/netcdf-c/issues/1278
re: issue https://github.com/Unidata/netcdf-c/issues/876
re: issue https://github.com/Unidata/netcdf-c/issues/806
* Major change to the handling of 8-byte parameters for nc_def_var_filter.
The old code was not well thought out.
* The new algorithm is documented in docs/filters.md.
* Added new utility file plugins/H5Zutil.c to support
* Modified plugins/H5Zmisc.c to use new algorithm
the new algorithm.
* Renamed include/ncfilter.h to include/netcdf_filter.h
and made it an installed header so clients can access the
new algorithm utility.
* Fixed nc_test4/tst_filterparser.c and nc_test4/test_filter_misc.c
to use the new algorithm
* libdap4/ fixes:
* d4swap.c has an error in the endian pre-processing such
that record counts were not being swapped correctly.
* d4data.c had an error in that checksums were being computed
after endian swapping rather than before.
* ocinitialize() was never being called, so xxdr bigendian handling
was never set correctly.
* Required adding debug statements to occompile
* Found and fixed memory leak in ncdump.c
Not tested:
* HDF4
* Pnetcdf
* parallel HDF5
re: issue https://github.com/Unidata/netcdf-c/issues/1251
Assume that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.
This PR allows the netcdf-c to read that dataset's
contents as a netcdf file using HTTP byte ranges
if the remote server supports byte-range access.
Originally, this PR was set up to access Amazon S3 objects,
but it can also access other remote datasets such as those
provided by a Thredds server via the HTTPServer access protocol.
It may also work for other kinds of servers.
Note that this is not intended as a true production
capability because, as is known, this kind of access to
can be quite slow. In addition, the byte-range IO drivers
do not currently do any sort of optimization or caching.
An additional goal here is to gain some experience with
the Amazon S3 REST protocol.
This architecture and its use documented in
the file docs/byterange.dox.
There are currently two test cases:
1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle
for a remote netcdf-3 file and a remote netcdf-4 file.
2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote
datasets.
This PR also incorporates significantly changed model inference code
(see the superceded PR https://github.com/Unidata/netcdf-c/pull/1259).
1. It centralizes the code that infers the dispatcher.
2. It adds support for byte-range URLs
Other changes:
1. NC_HDF5_finalize was not being properly called by nc_finalize().
2. Fix minor bug in ncgen3.l
3. fix memory leak in nc4info.c
4. add code to walk the .daprc triples and to replace protocol=
fragment tag with a more general mode= tag.
Final Note:
Th inference code is still way too complicated. We need to move
to the validfile() model used by netcdf Java, where each
dispatcher is asked if it can process the file. This decentralizes
the inference code. This will be done after all the major new
dispatchers (PIO, Zarr, etc) have been implemented.
Primary fixes to get -ansi to work.
1. Convert all '//' C++ style comments to /*...*/ or to use #if 0...#endif
2. It turns out that when -ansi is specified, then a number of
functions no longer are defined in the header -- but they are still
in the .so file.<br>
The big example is strdup(). So, added code to include/ncconfig.h to define
externs for those missing functions that occur in more than one place.
These are enabled if !_WIN32 && __STDC__ == 1 (__STDC__ is supposed to
be the equivalent compile time flag to -ansi). Note that this requires
config.h (which references ncconfig.h) to be included in files where it is
currently not included. Single uses will be only in the file that uses them.
3. Added mmap test for the MAP_ANONYMOUS flag to configure.ac. Apparently
this is not always defined with -ansi.
4. fix some large integer constants in nc_test4/tst_atts3.c and nc_test4/tst_filterparser.c
to avoid compiler complaints.
5. fix a double constant in nc_test4/tst_filterparser.c to avoid compiler complaints.
[Note I suspect #4 and #5 will be a problem on big-endian machines, but we have no way to test]
Misc. Changes:
1. convert more instances of _MSC_VER to _WIN32.
2. added some debugging code to include/nctestserver.h
3. added comment about libdispatch/drc.c always being compiled.
4. modify parser generation in ncgen to remove unneeded files.
This is a follow up to PR https://github.com/Unidata/netcdf-c/pull/1173
Sorry that it is so big, but leak suppression can be complex.
This PR fixes all remaining memory leaks -- as determined by
-fsanitize=address, and with the exceptions noted below.
Unfortunately. there remains a significant leak that I cannot
solve. It involves vlens, and it is unclear if the leak is
occurring in the netcdf-c library or the HDF5 library.
I have added a check_PROGRAM to the ncdump directory to show the
problem. The program is called tst_vlen_demo.c To exercise it,
build the netcdf library with -fsanitize=address enabled. Then
go into ncdump and do a "make clean check". This should build
tst_vlen_demo without actually executing it. Then do the
command "./tst_vlen_demo" to see the output of the memory
checker. Note the the lost malloc is deep in the HDF5 library
(in H5Tvlen.c).
I am temporarily working around this error in the following way.
1. I modified several test scripts to not execute known vlen tests
that fail as described above.
2. Added an environment variable called NC_VLEN_NOTEST.
If set, then those specific tests are suppressed.
This should mean that the --disable-utilities option to
./configure should not need to be set to get a memory leak clean
build. This should allow for detection of any new leaks.
Note: I used an environment variable rather than a ./configure
option to control the vlen tests. This is because it is
temporary (I hope) and because it is a bit tricky for shell
scripts to access ./configure options.
Finally, as before, this only been tested with netcdf-4 and hdf5 support.
https://github.com/Unidata/netcdf-c/issues/1168https://github.com/Unidata/netcdf-c/issues/1163https://github.com/Unidata/netcdf-c/issues/1162
This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.
The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.
Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.
Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
re: issue https://github.com/Unidata/netcdf-c/issues/1156
Starting with HDF5 version 1.10.x, the plugin code MUST be
careful when using the standard *malloc()*, *realloc()*, and
*free()* function.
In the event that the code is allocating, reallocating, or
free'ing memory that either came from -- or will be exported to --
the calling HDF5 library, then one MUST use the corresponding
HDF5 functions *H5allocate_memory()*, *H5resize_memory()*,
*H5free_memory()* [5] to avoid memory failures.
Additionally, if your filter code leaks memory, then the HDF5 library
generates a failure something like this.
````
H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
````
This PR modifies the code in the plugins directory to
conform to these new requirements.
This raises a question about the libhdf5 code where this
same problem may occur. We need to scan especially nc4hdf.c
to look for this problem.
re: issue https://github.com/Unidata/netcdf-c/issues/1151
Modify DAP2 and DAP4 code to handle case when _FillValue type is not
same as the parent variable type.
Specifically:
1. Define a parameter [fillmismatch] to allow this mismatch;
default is to disallow.
2. If allowed, forcibly change the type of the _FillValue to match
the parent variable.
3. If allowed Convert the values to match new type
4. Generate a log message
5. if not allowed, then fail
Implementing this required some changes to ncdap_test/dapcvt.c
Also added test cases.
Minor Unrelated Changes:
1. There were a number of warnings about e.g.
assigning a const char* to a char*. Fix these
2. In nccopy.1, replace .NP with .IP "n"
(re PR https://github.com/Unidata/netcdf-c/pull/1144)
3. fix minor error in ncdump/ocprint
re: https://github.com/Unidata/netcdf-c/issues/972
The current szip plugin code in the HDF5 library has some
unexpected behaviors that require some changes to how
nc_inq_var_szip is implemented and to the corresponding tests:
nc_test4/{test_szip,tst_vars3}.
Specifically, the following can happen:
1. The number of parameters provided by the user will be two,
but the number of parameters returned by nc_inq_var_filter
will be four because the HDF5 code (H5Zszip) will add two
extra parameters for internal use. It turns out that the two
parameters provided when calling nc_def_var_filter correspond
to the first two parameters of the four parameters returned
by nc_inq_var_filter.
2. The nc_inq_var_szip values corresponding to the ones provided
by the caller may be different than those provided by
nc_def_var_filter. The value of the options_mask argument is
known to add additional flag bits, and the pixels_per_block
parameter may be modified.
After a long discussion, I implemented the rules at the end of that issue.
They are documented in nccopy.1.
Additionally, I added a new, per-variable, -c flag that allows
for the direct setting of the chunking parameters for a variable.
The form is
-c var:c1,c2,...ck
where var is the name of the variable (possibly a fully qualified name)
and the ci are the chunksizes for that variable. It must be the case
that the rank of the variable is k. If the new form is used as well
as the old form, then the new form overrides the old form for the
specified variable. Note that multiple occurrences of the new form
-c flag may be specified.
Misc. Other fixes
1. Added -M <size> option to nccopy to specify the minimum
allowable chunksize.
2. Removed the unused variables from bigmeta.c
(Issue https://github.com/Unidata/netcdf-c/issues/1079)
3. Fixed failure of nc_test4/tst_filter.sh by using the new -M
flag (#1) to allow filter test on a small chunk size.
corresponding HDF5 operations.
re: github issue https://github.com/Unidata/netcdf-c/issues/908
also in reference to https://github.com/pydata/xarray/issues/2004
The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.
This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.
The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.
Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
test case (tst_varsperf.c) to estimate the speedup. After users
have had some experience with this, I will remove it, probably
after the 4.7 release.
I took Ed's advice and moved the plugin stuff to its own
top-level directory. This is an attempt to solve the problem of
copying files that we have experienced. In any case, it will
serve as a place to stick additional plugins.
The file docs/indexing.dox tries to provide design
information for the refactoring.
The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.
WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
but as noted elsewhere, this may not be significant because
the meta-data read performance apparently is being dominated
by the hdf5 library because we do bulk meta-data reading rather
than lazy reading.
(I hope) metadata mechanism. This mostly just adds new pieces of
code (e.g. nclistmap) and does some minor fixes.
It should be transparent to everything else.
The next set of changes will be the big step.