Replaces PR https://github.com/Unidata/netcdf-c/pull/3024 and PR https://github.com/Unidata/netcdf-c/pull/3033

re: https://github.com/Unidata/netcdf-c/issues/2753

As suggested by Ed Hartnett, this PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following API as being the simplest possible. It does have the disadvantage that it requires use of a global lock (not implemented) if used in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, the sequence of actions involved in setting up the plugin path must be performed by a single thread, or in some other way that guarantees that two or more threads are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK. Then any thread accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
````

## Internal Architecture

It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global, in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

Internally, however, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the *nc_plugin_path_set* function will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.

There is a complication with respect to the *nc_plugin_path_get* function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

## Test Changes

* New tests<br>
  a. unit_test/run_pluginpaths.sh -- was created to test this new capability.<br>
  b. A new test utility has been added as *unit_test/run_dfaltpluginpath.sh* to test the default plugin path list.
* New test support utilities<br>
  a. unit_test/ncpluginpath.c -- report current state of the plugin path<br>
  b. unit_test/tst_pluginpaths.c -- test program to support run_pluginpaths.sh

## Documentation

* A new file -- docs/pluginpath.md -- provides documentation of the new API. It includes some material taken from filters.md.

## Other Major Changes

1. Clean up the whole plugin path decision tree. This is described in the *docs/pluginpath.md* document and summarized in Addendum 2 below.
2. I noticed that ncdump/testpathcvt.sh had been disabled, so I fixed and re-enabled it. This necessitated some significant changes to dpathmgr.c.

## Misc. Changes

1. Add some path manipulation utilities to netcdf_aux.h
2. Fix some minor bugs in netcdf_json.h
3. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
4. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
5. Fix some size_t <-> int conversion warnings.
6. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
7. Clean up some minor CMakeLists.txt problems.
8. Provide an implementation of echo -n since it appears to not be available on all platforms.
9. Add a property list mechanism to pass environmental information to filters.
10. Clean up Doxyfile.in
11. Fixed a memory leak in libdap2; surprised that I did not find this earlier.

## Addendum 1: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
````
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
````

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_ndirs(size_t* ndirsp);````

  Arguments: *ndirsp* -- store the number of directories in this memory.

  This function returns the number of directories in the internal plugin path list.

* ````int nc_plugin_path_get(NCPluginList* dirs);````

  Arguments: *dirs* -- counted vector for storing the sequence of directories in the internal path list.

  This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required. If the value of *dirs.dirs* is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, the memory of *dirs.dirs* is used to hold the vector of directories.

* ````int nc_plugin_path_set(const NCPluginList* dirs);````

  Arguments: *dirs* -- counted vector for providing the new sequence of directories in the internal path list.

  This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an *ndirs* argument of 0 will clear the set of plugin paths.

## Addendum 2: Build-Time and Run-Time Constants.

### Build-Time Constants

<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid;" align="center"><td colspan="4">Table showing the build-time computation of NETCDF_PLUGIN_INSTALL_DIR and NETCDF_PLUGIN_SEARCH_PATH.</td>
<tr style="outline: thin solid" ><th>--with-plugin-dir<th>--prefix<th>NETCDF_PLUGIN_INSTALL_DIR<th>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid" ><td>undefined<td>undefined<td>undefined<td>PLATFORMDEFAULT
<tr style="outline: thin solid" ><td>undefined<td>&lt;abspath-prefix&gt;<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin&lt;SEP&gt;PLATFORMDEFAULT
<tr style="outline: thin solid" ><td>&lt;abspath-plugins&gt;<td>N.A.<td>&lt;abspath-plugins&gt;<td>&lt;abspath-plugins&gt;&lt;SEP&gt;PLATFORMDEFAULT
</table>

<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid" align="center"><td colspan="2">Table showing the computation of the initial global plugin path</td>
<tr style="outline: thin solid"><th>HDF5_PLUGIN_PATH<th>Initial global plugin path
<tr style="outline: thin solid"><td>undefined<td>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid"><td>&lt;path1;...pathn&gt;<td>&lt;path1;...pathn&gt;
</table>
Appendix E. NetCDF-4 Plugin Path Support
[TOC]
Plugin Path Overview
The process by which plugins are installed into a directory and the process by which plugins are located are unfortunately complicated. This is in part due to the historical requirement to support existing HDF5 and Zarr mechanisms.
This document describes the following major processes:
- Discovery -- at run-time, any reference to a plugin must do a search to locate a dynamic library that implements the plugin.
- Plugin Path Management -- at run-time, the client program may wish to programmatically set the sequence of directories to use in locating plugins.
- Installation -- during the build of the netcdf-c library, any compiled plugins may optionally be installed into some directory.
Discovering a Specific Plugin at Run-Time
The netcdf-c library maintains an internal sequence of directory paths -- collectively called the plugin path -- that controls the search for plugin libraries. When looking for a specific plugin, each directory in the plugin path is examined in order. For each such directory, the files in that directory are checked to see if any of them implements the specified plugin. The details of how a file is processed are described in the document filters.md.
The netcdf-c search algorithm is closely tied to the HDF5 discovery process. The HDF5 process searches its own internal plugin path (sequence of directories) in order to discover a specific plugin library.
The addition of NCZarr support to the netcdf-c library requires yet another plugin path (sequence of directories) for its search process.
It is important to know that the HDF5 and NCZarr plugin paths are completely controlled by a single global plugin path. Whenever the global plugin path changes, it is propagated to HDF5 and NCZarr to ensure that all such plugin paths use the same sequence of directories for discovery.
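The in-order search described in this section can be sketched as follows. This is only an illustration under stated assumptions: `find_plugin_file` is a hypothetical helper, not part of netcdf-c, and it merely checks whether a regular file with a given name exists in each directory (using POSIX `stat`), whereas the real discovery process inspects the libraries themselves as described in filters.md.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper (not netcdf-c code): walk the directories of a plugin
   path in order and report the first one that contains `filename`.
   On success the full path is left in `found` and the index of the matching
   directory is returned; otherwise `found` is emptied and -1 is returned. */
static int
find_plugin_file(size_t ndirs, const char* const* dirs, const char* filename,
                 char* found, size_t foundlen)
{
    size_t i;
    for (i = 0; i < ndirs; i++) {
        struct stat st;
        int n = snprintf(found, foundlen, "%s/%s", dirs[i], filename);
        if (n < 0 || (size_t)n >= foundlen)
            continue; /* candidate path does not fit in the buffer; skip it */
        if (stat(found, &st) == 0 && S_ISREG(st.st_mode))
            return (int)i; /* earlier directories take precedence */
    }
    if (foundlen > 0)
        found[0] = '\0';
    return -1;
}
```

Because the loop stops at the first match, the order of directories in the plugin path decides which copy of a plugin is used when several directories contain one.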
Programmatic Management of the Plugin Path
As of netcdf-c version 4.9.3, it is possible for a client program to set the global plugin path to control plugin discovery. Since the global path and the HDF5 and NCZarr paths are kept in sync, this means that both HDF5 and NCZarr will look in the same directories in order to locate specified plugins. Appendix E.1 defines the current API for managing the global plugin path.
Note that it is best practice for a client program to use the API to set the plugin path before any calls to nc_open or nc_create. Modifying the plugin paths later may fail because it cannot be guaranteed that the underlying implementations (i.e. HDF5 or NCZarr) will take notice of the change.
When the netcdf-c library initializes itself, it chooses an initial global plugin path using the following rules, which are those used by the HDF5 library:
- If the HDF5_PLUGIN_PATH environment variable is defined, then its value is used as the initial plugin path.
- If HDF5_PLUGIN_PATH is not defined, then the initial plugin path is the platform default:
  - /usr/local/hdf5/lib/plugin -- for Linux/Unix/Cygwin,
  - %ALLUSERSPROFILE%/hdf5/lib/plugin -- for Windows/Mingw.
This initial global plugin path will be propagated to HDF5 and NCZarr.
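The selection rule above amounts to "environment variable first, platform default otherwise". A minimal sketch, assuming a hypothetical helper `choose_initial_plugin_path` (not a netcdf-c function) and assuming that an empty value is treated the same as an undefined one, which the text does not specify:

```c
#include <stddef.h>

/* Hypothetical helper (not netcdf-c code).
   env_value:        the value of HDF5_PLUGIN_PATH, or NULL if it is undefined.
   platform_default: the platform default path-list supplied by the caller.
   Assumption: an empty string is handled like an undefined variable. */
static const char*
choose_initial_plugin_path(const char* env_value, const char* platform_default)
{
    if (env_value != NULL && env_value[0] != '\0')
        return env_value;        /* HDF5_PLUGIN_PATH wins when defined */
    return platform_default;     /* otherwise fall back to the default */
}
```

A caller would typically pass `getenv("HDF5_PLUGIN_PATH")` (from `<stdlib.h>`) as the first argument and the platform default directory as the second.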
Installing Plugins at Build-Time
At build-time, the directory into which libraries implementing plugins are installed is specified using a special ./configure option
--with-plugin-dir=<directorypath>
or its corresponding cmake option
-DNETCDF_WITH_PLUGIN_DIR=<directorypath>
Build-Time Operations
At build time, certain plugin-related constants are constructed.
- NETCDF_PLUGIN_INSTALL_DIR -- the directory into which compiled plugins should be installed.
- NETCDF_PLUGIN_SEARCH_PATH -- the default search path to be used at run-time if not overridden by the HDF5_PLUGIN_PATH environment variable.
Table showing the build-time computation of NETCDF_PLUGIN_INSTALL_DIR and NETCDF_PLUGIN_SEARCH_PATH.

| --with-plugin-dir | --prefix | NETCDF_PLUGIN_INSTALL_DIR | NETCDF_PLUGIN_SEARCH_PATH |
|---|---|---|---|
| undefined | undefined | undefined | PLATFORMDEFAULT |
| undefined | <abspath-prefix> | <abspath-prefix>/hdf5/lib/plugin | <abspath-prefix>/hdf5/lib/plugin<SEP>PLATFORMDEFAULT |
| <abspath-plugins> | N.A. | <abspath-plugins> | <abspath-plugins><SEP>PLATFORMDEFAULT |
Notes:
- HDF5_PLUGIN_PATH is ignored at build time.
- ';' is used as a placeholder for PLATFORMSEP.
- The term PLATFORMDEFAULT stands for:
  - /usr/local/hdf5/lib/plugin -- if on a *nix machine
  - %ALLUSERSPROFILE%/hdf5/lib/plugin -- if on a Windows or Mingw platform
- The term SEP stands for:
  - ':' -- if on a *nix machine
  - ';' -- if on a Windows or Mingw platform
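The build-time table can be read as a small decision procedure. The sketch below is an assumption-laden illustration, not the actual configure or CMake logic: `compute_plugin_constants` is a hypothetical helper, undefined options are modelled as NULL, an undefined install directory is modelled as an empty string, and the platform default and separator are passed in by the caller rather than detected.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not netcdf-c build code) mirroring the table rows.
   with_plugin_dir, prefix: NULL (or "") when the option is undefined.
   install_dir, search_path: output buffers, each of size len.
   Returns 0 on success, -1 if a result does not fit in its buffer. */
static int
compute_plugin_constants(const char* with_plugin_dir, const char* prefix,
                         const char* platform_default, const char* sep,
                         char* install_dir, char* search_path, size_t len)
{
    int n1 = 0, n2;
    if (len == 0)
        return -1;
    if (with_plugin_dir != NULL)            /* row 3: explicit plugin dir */
        n1 = snprintf(install_dir, len, "%s", with_plugin_dir);
    else if (prefix != NULL)                /* row 2: derived from --prefix */
        n1 = snprintf(install_dir, len, "%s/hdf5/lib/plugin", prefix);
    else                                    /* row 1: install dir undefined */
        install_dir[0] = '\0';
    if (n1 < 0 || (size_t)n1 >= len)
        return -1;
    if (n1 > 0)   /* install dir first, then the platform default */
        n2 = snprintf(search_path, len, "%s%s%s", install_dir, sep, platform_default);
    else
        n2 = snprintf(search_path, len, "%s", platform_default);
    if (n2 < 0 || (size_t)n2 >= len)
        return -1;
    return 0;
}
```

Passing the platform default and separator as arguments keeps the sketch independent of any particular platform; a real build would fix them at configure time.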
Run-Time Operations
When the netcdf-c library initializes itself (at run-time), it chooses an initial global plugin path. By default this is the NETCDF_PLUGIN_SEARCH_PATH value recorded in config.h at build time. If, however, the HDF5_PLUGIN_PATH environment variable is defined, then its value is used to override NETCDF_PLUGIN_SEARCH_PATH.
Table showing the computation of the initial global plugin path.

| HDF5_PLUGIN_PATH | Initial global plugin path |
|---|---|
| undefined | NETCDF_PLUGIN_SEARCH_PATH |
| <path1;...pathn> | <path1;...pathn> |
Multi-Threaded Access to the Plugin Path
Modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, the sequence of actions involved in setting up the plugin path must be performed by a single thread, or in some other way that guarantees that two or more threads are not simultaneously accessing the plugin path get/set operations. The library itself does not implement such a lock.
As an example, assume there exists a mutex lock called PLUGINLOCK. Then any thread accessing the plugin paths should operate as follows:
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
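One way to realize the pattern above is with a POSIX mutex. The following is a sketch under explicit assumptions: PLUGINLOCK is a lock owned by the application, not by netcdf-c, and `stub_path_get` / `stub_path_set` are hypothetical stand-ins with the same signatures as nc_plugin_path_get and nc_plugin_path_set so that the example compiles without the library. The NCPluginList typedef is repeated from netcdf_filter.h for the same reason and should be dropped when the real header is included.

```c
#include <pthread.h>
#include <stddef.h>

/* Repeated from netcdf_filter.h only to keep the sketch self-contained. */
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;

/* Hypothetical stand-ins for nc_plugin_path_get / nc_plugin_path_set.
   They keep a shallow global list; the real functions manage their own copy. */
static NCPluginList stub_global_path = {0, NULL};
static int stub_path_get(NCPluginList* l) { *l = stub_global_path; return 0; }
static int stub_path_set(const NCPluginList* l) { stub_global_path = *l; return 0; }

/* Application-level lock; the library does not provide one. */
static pthread_mutex_t PLUGINLOCK = PTHREAD_MUTEX_INITIALIZER;

/* Replace the plugin path with `dirs` while holding PLUGINLOCK.
   The previous directory count is reported through old_ndirs (may be NULL). */
static int
locked_replace_path(size_t ndirs, char** dirs, size_t* old_ndirs)
{
    NCPluginList current = {0, NULL};
    NCPluginList updated;
    int stat;

    pthread_mutex_lock(&PLUGINLOCK);
    stat = stub_path_get(&current);           /* nc_plugin_path_get(...) */
    if (stat == 0) {
        if (old_ndirs != NULL)
            *old_ndirs = current.ndirs;
        updated.ndirs = ndirs;                /* <rebuild plugin path> */
        updated.dirs = dirs;
        stat = stub_path_set(&updated);       /* nc_plugin_path_set(...) */
    }
    pthread_mutex_unlock(&PLUGINLOCK);        /* released on every path */
    return stat;
}
```

The important property is that the get, the rebuild, and the set all happen between one lock and one unlock, so no other thread that honors PLUGINLOCK can observe or modify the path halfway through.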
Internal Architecture
It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.
Internally, however, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the nc_plugin_path_set function will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.
There is a complication with respect to the nc_plugin_path_get function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.
Appendix E.1. Programmatic Plugin Path API
The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
The API is as follows (from netcdf-c/include/netcdf_filter.h).
- int nc_plugin_path_ndirs(size_t* ndirsp);

  This function returns the number of directories in the internal plugin path list.

  The argument is as follows:
  - ndirsp -- the number of directories is stored in this memory.

- int nc_plugin_path_get(NCPluginList* dirs);

  This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, a call on its own does not need to be locked; locking is required only when the path is retrieved in order to be modified and set again.

  The argument is as follows:
  - dirs -- counted vector for storing the sequence of directories in the internal path list.

  If the value of dirs.dirs is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, the memory pointed to by dirs.dirs is used to hold the vector of directories.

- int nc_plugin_path_set(const NCPluginList* dirs);

  This function empties the current internal path sequence and replaces it with the sequence of directories given in the argument. Using an ndirs value of 0 will clear the set of plugin paths.

  The argument is as follows:
  - dirs -- counted vector providing the new sequence of directories for the internal path list.
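Since nc_plugin_path_set takes a complete NCPluginList, a client usually needs some way to build one. The sketch below shows one possible approach; `pluginlist_append` and `pluginlist_clear` are hypothetical helpers, not part of the netcdf-c API, and the list is built from scratch with malloc. Whether memory returned by nc_plugin_path_get may be resized or freed this way is not stated in the text, so that is deliberately not assumed here. The typedef is repeated only so the sketch compiles without netcdf_filter.h.

```c
#include <stdlib.h>
#include <string.h>

/* Repeated from netcdf_filter.h only to keep the sketch self-contained. */
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;

/* Hypothetical helper: append a private copy of `dir` to a list whose
   storage was allocated by this helper (or is empty: {0, NULL}).
   Returns 0 on success, -1 on allocation failure (list left unchanged). */
static int
pluginlist_append(NCPluginList* list, const char* dir)
{
    size_t len = strlen(dir) + 1;
    char* copy = malloc(len);
    char** grown;

    if (copy == NULL)
        return -1;
    memcpy(copy, dir, len);
    grown = realloc(list->dirs, (list->ndirs + 1) * sizeof(char*));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[list->ndirs] = copy;
    list->dirs = grown;
    list->ndirs++;
    return 0;
}

/* Hypothetical helper: release everything owned by the list. */
static void
pluginlist_clear(NCPluginList* list)
{
    size_t i;
    for (i = 0; i < list->ndirs; i++)
        free(list->dirs[i]);
    free(list->dirs);
    list->dirs = NULL;
    list->ndirs = 0;
}
```

With the real header in place, a client would start from `NCPluginList list = {0, NULL};`, append the desired directories in search order, pass `&list` to nc_plugin_path_set, and then call `pluginlist_clear(&list)`, ideally before any call to nc_open or nc_create.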
HDF5_PLUGIN_PATH is a typical Windows or Unix style path-list. That is, it is a sequence of absolute directory paths separated by a specific separator character. For Windows, the separator character is a semicolon (';') and for Unix, it is a colon (':').
At the moment, NetCDF optionally (i.e. when not overridden) uses the existing HDF5 environment variable HDF5_PLUGIN_PATH to locate the directories in which plugin libraries are located. It also optionally uses the last directory in the path as the installation directory. This applies both to the HDF5 filter wrappers and to the NCZarr codec wrappers.
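Turning such a path-list into a sequence of directories is a matter of splitting at the separator. A minimal sketch, assuming a hypothetical helper `split_plugin_path` (not a netcdf-c function) and assuming that empty segments, such as those produced by a doubled or trailing separator, are simply skipped:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not netcdf-c code): split `pathlist` at `sep`
   (':' on Unix, ';' on Windows). Empty segments are skipped.
   On success returns the number of directories and stores a malloc'ed
   array of malloc'ed strings in *dirsp (NULL when the count is 0).
   On allocation failure returns -1 and stores NULL. */
static int
split_plugin_path(const char* pathlist, char sep, char*** dirsp)
{
    size_t count = 0, cap = 0;
    char** dirs = NULL;
    const char* p = pathlist;

    while (*p != '\0') {
        const char* q = strchr(p, sep);
        size_t len = (q != NULL) ? (size_t)(q - p) : strlen(p);
        if (len > 0) {
            char* seg;
            if (count == cap) {               /* grow the vector as needed */
                size_t newcap = (cap > 0) ? cap * 2 : 4;
                char** tmp = realloc(dirs, newcap * sizeof(char*));
                if (tmp == NULL) goto fail;
                dirs = tmp;
                cap = newcap;
            }
            seg = malloc(len + 1);
            if (seg == NULL) goto fail;
            memcpy(seg, p, len);
            seg[len] = '\0';
            dirs[count++] = seg;
        }
        if (q == NULL)
            break;                            /* last segment consumed */
        p = q + 1;
    }
    *dirsp = dirs;
    return (int)count;

fail:
    while (count > 0)
        free(dirs[--count]);
    free(dirs);
    *dirsp = NULL;
    return -1;
}
```

Taking the separator as a parameter avoids hard-coding a platform; on Windows the semicolon also keeps drive-letter colons inside each directory intact.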
History
Author: Dennis Heimbigner
Email: dennis.heimbigner@gmail.com
Initial Version: 9/28/2024
Last Revised: 9/28/2024