## Overwriting
I think I solved the file overwrite problem by doing light name
mangling of the shared library names. With this change the probabilty
is very small that installing our filter wrappers in a directory will
overwrite code produced by others.
## Default Install Location
I have setup the --with-plugin-dir option default to install in
the following locations in order of preference
1. If HDF5_PLUGIN_PATH is defined (at build time remember), then the last directory in that path will be where the filter wrapper shared libraries will be installed.
2. Otherwise the default is "/usr/local/hdf5/lib/plugin" (on *nix*) or "%ALLUSERSPROFILE%\\hdf5\\lib\\plugin" for Windows or Mingw.
Currently, --with-plugin-dir is disabled by default.
I should note that even if I enable it by default, installing
netcdf-c will still not run "out of the box" because the hypothetical
naive user will not know which compressor libraries need to be
pre-installed before netcdf is installed. Nor will that user have any
way to find out what needs to be installed.
re: https://github.com/Unidata/netcdf-c/issues/2294
Ed Hartnett suggested that the netcdf library installation process
be extended to install the standard filters into a user specified
location. The user can then set HDF5_PLUGIN_PATH to that location.
This PR provides that capability using:
````
configure option: --with-plugin-dir=<absolute directory path>
cmake option: -DPLUGIN_INSTALL_DIR=<absolute directory path>
````
Currently, the following plugins are always installed, if
available: bzip2, zstd, blosc.
If NCZarr is enabled, then additional plugins are installed:
fletcher32, shuffle, deflate, szip.
Additionally, the necessary codec support is installed
for each of the above filters that is installed.
## Changes:
1. Cleanup handling of built-in bzip2.
2. Add documentation to docs/filters.md
3. Re-factor the NCZarr codec libraries
4. Add a test, although it can only be exercised after
the library is installed, so it cannot be used during
normal testing.
5. Cleanup use of HDF5_PLUGIN_PATH in the filter test cases.
Update the following documentation files:
## docs/FAQ.md
* Discuss the use of UTF-8 names under Windows 10+.
## docs/filters.md
* Add documentation about NCzarr filters.
* Specifically Codec support and HDF5 <-> Codec translation
* Add documentation about standard filters
## docs/dispatch.md
* Convert from .dox format to .md (markdown) format.
* Add discussion about the user defined dispatch tables.
* Update the example.
* Abbreviate the NC_infermodel documentation and move the more detailed discusion to the companion *dinternal.md* documenation.
## docs/internal.md
This is a (mostly) new file that attempts to provide detailed
descriptions about how various features are implemented inside
the netcdf-c library. The topics currently covered the
following.
### Including C++ Code in the netcdf-c Library {#intern_c++}
The state of C compiler technology has reached the point where
it is possible to include C++ code into the netcdf-c library
code base. The document describes how to do this.
### Managing instances of complex data types
The document describes how to properly handle instances of
complex types (those with variable length). This involves
having functions that can recursively walk instances of such
types to perform various actions on them. These new functions
are intended to replace the *nc_free_vlen*, *nc_free_vlens* and
*nc_free_string* functions in *netcdf.h*.
### Inferring File Types
As described in the companion document -- docs/dispatch.md --
when *nc\_create()* or *nc\_open()* is called, the library must
figure out what kind of file is being created or opened. Once it
has figured out the file kind, the appropriate "dispatch table"
can be used to process that file.
As a result of the introduction of remote data access to the netcdf-c
library, the path arguments to *nc\_open()* and *nc\_create()* have
been extended to support URLs as paths. Processing URLs requires
some significant changes to the file inference algorithm. The
details of that processing are recorded in the document.
## S3 Related Fixes
* Add comprehensive support for specifying AWS profiles to provide access credentials.
* Parse the files "~/.aws/config" and "~/.aws/credentials to provide credentials for the HDF5 ROS3 driver and to locate default region.
* Add a function to obtain the currently active S3 credentials. The search rules are defined in docs/nczarr.md.
* Provide documentation for the new features.
* Modify the struct NCauth (in include/ncauth.h) to replace specific S3 credentials with a profile name.
* Add a unit test to test the operation of profile and credentials management.
* Add support for URLS of the form "s3://<bucket>/<key>"; this requires obtaining a default region.
* Allows the specification of profile and/or region in a URL of the form "#mode=nczarr,...&aws.region=...&aws.profile=..."
## Misc. Fixes
* Move the ezxml code to libdispatch so that it can be used both by DAP4 and nczarr.
* Modify nclist to provide a deep clone operation.
* Modify ncuri to provide a deep clone operation.
* Modify the .rc file format to allow the specification of a path to be tested when looking for an entry in the .rc file.
* Ensure that the NC_rcload function is called.
* Modify nchttp to support setting request headers.
Filter support has three goals:
1. Use the existing HDF5 filter implementations,
2. Allow filter metadata to be stored in the NumCodecs metadata format used by Zarr,
3. Allow filters to be used even when HDF5 is disabled
Detailed usage directions are define in docs/filters.md.
For now, the existing filter API is left in place. So filters
are defined using ''nc_def_var_filter'' using the HDF5 style
where the id and parameters are unsigned integers.
This is a big change since filters affect many parts of the code.
In the following, the terms "compressor" and "filter" and "codec" are generally
used synonomously.
### Filter-Related Changes:
* In order to support dynamic loading of shared filter libraries, a new library was added in the libncpoco directory; it helps to isolate dynamic loading across multiple platforms.
* Provide a json parsing library for use by plugins; this is created by merging libdispatch/ncjson.c with include/ncjson.h.
* Add a new _Codecs attribute to allow clients to see what codecs are being used; let ncdump -s print it out.
* Provide special headers to help support compilation of HDF5 filters when HDF5 is not enabled: netcdf_filter_hdf5_build.h and netcdf_filter_build.h.
* Add a number of new test to test the new nczarr filters.
* Let ncgen parse _Codecs attribute, although it is ignored.
### Plugin directory changes:
* Add support for the Blosc compressor; this is essential because it is the most common compressor used in Zarr datasets. This also necessitated adding a CMake FindBlosc.cmake file
* Add NCZarr support for the big-four filters provided by HDF5: shuffle, fletcher32, deflate (zlib), and szip
* Add a Codec defaulter (see docs/filters.md) for the big four filters.
* Make plugins work with windows by properly adding __declspec declaration.
### Misc. Non-Filter Changes
* Replace most uses of USE_NETCDF4 (deprecated) with USE_HDF5.
* Improve support for caching
* More fixes for path conversion code
* Fix misc. memory leaks
* Add new utility -- ncdump/ncpathcvt -- that does more or less the same thing as cygpath.
* Add a number of new test to test the non-filter fixes.
* Update the parsers
* Convert most instances of '#ifdef _MSC_VER' to '#ifdef _WIN32'
Priority: Low
re: issue https://github.com/Unidata/netcdf-c/issues/1329
HDF5 has the ability to programmatically define new filters,
as opposed to using HDF5_PLUGIN_PATH env variable.
This PR adds support for that feature.
Not clear how useful this is, though.
See docs/filters.md for details.
A user suggested that the nccopy -F option
syntax should be extended to support specification
of multiple (or all) variables in a single -F option.
The new syntax allows:
1. '*' as the name of the variable; this means apply the
filter to all variables in the data set.
2. *var1|var2|...* as the variable name to indicate that the filter
should be applied to the multiple specified variables.
re: issue https://github.com/Unidata/netcdf-c/issues/1278
re: issue https://github.com/Unidata/netcdf-c/issues/876
re: issue https://github.com/Unidata/netcdf-c/issues/806
* Major change to the handling of 8-byte parameters for nc_def_var_filter.
The old code was not well thought out.
* The new algorithm is documented in docs/filters.md.
* Added new utility file plugins/H5Zutil.c to support
* Modified plugins/H5Zmisc.c to use new algorithm
the new algorithm.
* Renamed include/ncfilter.h to include/netcdf_filter.h
and made it an installed header so clients can access the
new algorithm utility.
* Fixed nc_test4/tst_filterparser.c and nc_test4/test_filter_misc.c
to use the new algorithm
* libdap4/ fixes:
* d4swap.c has an error in the endian pre-processing such
that record counts were not being swapped correctly.
* d4data.c had an error in that checksums were being computed
after endian swapping rather than before.
* ocinitialize() was never being called, so xxdr bigendian handling
was never set correctly.
* Required adding debug statements to occompile
* Found and fixed memory leak in ncdump.c
Not tested:
* HDF4
* Pnetcdf
* parallel HDF5
re: issue https://github.com/Unidata/netcdf-c/issues/1156
Starting with HDF5 version 1.10.x, the plugin code MUST be
careful when using the standard *malloc()*, *realloc()*, and
*free()* function.
In the event that the code is allocating, reallocating, or
free'ing memory that either came from -- or will be exported to --
the calling HDF5 library, then one MUST use the corresponding
HDF5 functions *H5allocate_memory()*, *H5resize_memory()*,
*H5free_memory()* [5] to avoid memory failures.
Additionally, if your filter code leaks memory, then the HDF5 library
generates a failure something like this.
````
H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
````
This PR modifies the code in the plugins directory to
conform to these new requirements.
This raises a question about the libhdf5 code where this
same problem may occur. We need to scan especially nc4hdf.c
to look for this problem.
re: https://github.com/Unidata/netcdf-c/issues/972
The current szip plugin code in the HDF5 library has some
unexpected behaviors that require some changes to how
nc_inq_var_szip is implemented and to the corresponding tests:
nc_test4/{test_szip,tst_vars3}.
Specifically, the following can happen:
1. The number of parameters provided by the user will be two,
but the number of parameters returned by nc_inq_var_filter
will be four because the HDF5 code (H5Zszip) will add two
extra parameters for internal use. It turns out that the two
parameters provided when calling nc_def_var_filter correspond
to the first two parameters of the four parameters returned
by nc_inq_var_filter.
2. The nc_inq_var_szip values corresponding to the ones provided
by the caller may be different than those provided by
nc_def_var_filter. The value of the options_mask argument is
known to add additional flag bits, and the pixels_per_block
parameter may be modified.
and https://github.com/Unidata/netcdf-c/issues/708
Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.
Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.
The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
the provided memory will not be freed or reallocated.
Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
then the netcdf-c library will take control of the incoming memory.
This means that the original memory block should not be freed
but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
then modifications to the original memory may fail if the space available
is insufficient.
Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh
WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
2. Fixed plugin building (nc_test4/hdf5plugins)
to be done properly by cmake and automake.
4. Duplicated part of the nc_test4 filter test code
in examples/C
An incomplete and untested set of hooks exist
for OS-X in nc_test4/findplugins.in. They need testing.
2. Factored out the parameter string parsing for ncgen and nccopy
int libdispatch/dfilter.c + include/ncfilter.h
3. Allow a parameter string to use constant types other than
unsigned int. See docs/filters.md for details.
4. Moved the old content of include/netcdf_filter.h into include/netcdf.h
and removed include/netcdf_filter.h as no longer needed.
5. Force the test filter (bzip2) in nc_test4/filter_test to
be built using BUILT_SOURCES.
to docs/filter.md
2. Moved location of filter.md in documentation
3. Add a template file as the basis for building new filters.
4. Did some test case cleanup
1. Allow nccopy to apply filters, especially on the output file.
This provides a third way to do this other than using ncgen or
programatically
2. Make sure that even if the filter code is not available, it is
possible to see the filter id and parameters for variables using
e.g ncdump -hs.
3. Fix bug in nccopy so that the input file does
not necessarily have to be netcdf-4.
4. At last minute decided to change to using a
single "_Filter" attribute for ncgen
5. Added a test to tst_filter.sh to generate C code using ncgen.
This relies on the HDF5 capability to
dynamically load compression filters.
Note that a compression filter is just
a subcase of filters.
The primary user-visible changes are as follows:
1. Add a standard header "netcdf_filter.h" that defines
the necessary API extensions
2. Modify ncgen to support two new special attributes
"_Filter_ID" and "_Filter_Parameters" so that compression
can be turned on when creating a file using ncgen.
4. Add a detailed description of filtering support
to the user's guide; see the file filters.md
5. Add a test case directory for this: nc_test4/filter_test.
It is fragile and a ./configure flags (-enable-filter-test)
is defined (default disabled) to shut this off this test
to avoid spurious 'make check' failures.
Note that the HDF5 documentation is not up-to-date, so
much of what is encoded here comes from examining the
actual code in the file H5PL.c in the HDF5 source code.