Update selected documentation

Update the following documentation files:

## docs/FAQ.md
* Discuss the use of UTF-8 names under Windows 10+.

## docs/filters.md
* Add documentation about NCZarr filters.
* Specifically, Codec support and HDF5 <-> Codec translation.
* Add documentation about standard filters.

## docs/dispatch.md
* Convert from .dox format to .md (markdown) format.
* Add discussion about the user-defined dispatch tables.
* Update the example.
* Abbreviate the NC_infermodel documentation and move the more detailed discussion to the companion *internal.md* documentation.

## docs/internal.md
This is a (mostly) new file that attempts to provide detailed descriptions of how various features are implemented inside the netcdf-c library. The topics currently covered are the following.

### Including C++ Code in the netcdf-c Library {#intern_c++}
The state of C compiler technology has reached the point where it is possible to include C++ code into the netcdf-c library code base. The document describes how to do this.

### Managing instances of complex data types
The document describes how to properly handle instances of complex types (those with variable length). This involves having functions that can recursively walk instances of such types to perform various actions on them. These new functions are intended to replace the *nc_free_vlen*, *nc_free_vlens* and *nc_free_string* functions in *netcdf.h*.

### Inferring File Types
As described in the companion document, docs/dispatch.md, when *nc\_create()* or *nc\_open()* is called, the library must figure out what kind of file is being created or opened. Once it has figured out the file kind, the appropriate "dispatch table" can be used to process that file. As a result of the introduction of remote data access to the netcdf-c library, the path arguments to *nc\_open()* and *nc\_create()* have been extended to support URLs as paths. Processing URLs requires some significant changes to the file inference algorithm. The details of that processing are recorded in the document.
2025-04-12 18:10:24 +08:00 · 2022-03-16 12:38:00 -06:00 · 2022-03-16 12:38:00 -06:00 · f1eaefd91e
commit f1eaefd91e
parent e0a2236b5a
7 changed files with 1594 additions and 293 deletions
--- a/.github/workflows/run_tests_win_mingw.yml
+++ b/.github/workflows/run_tests_win_mingw.yml
@@ -7,7 +7,7 @@
 name: Run MSYS2, MinGW64-based Tests


-on: [ pull_request ]
+on: [pull_request]

 jobs:

--- a/docs/Doxyfile.in
+++ b/docs/Doxyfile.in
@@ -754,7 +754,8 @@ INPUT = \
    @abs_top_srcdir@/docs/COPYRIGHT.md \
    @abs_top_srcdir@/docs/credits.md \
    @abs_top_srcdir@/docs/tutorial.dox \
-    @abs_top_srcdir@/docs/internal.dox \
+    @abs_top_srcdir@/docs/internal.md \
+    @abs_top_srcdir@/docs/dispatch.md \
    @abs_top_srcdir@/docs/inmeminternal.dox \
    @abs_top_srcdir@/docs/indexing.dox \
    @abs_top_srcdir@/docs/testserver.dox \
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -1079,9 +1079,22 @@ and writable by programs that used older versions of the libraries.
 However, programs linked to older library versions will not be able to
 create new data objects with the new less-restrictive names.

-How difficult is it to convert my application to handle arbitrary netCDF-4 files? {#How-difficult-is-it-to-convert-my-application-to-handle-arbitrary-netCDF-4-files}
+Can I use UTF-8 File Names with Windows? {#Can-I-use-UTF-8-File-Names-with-Windows}
 -----------------

+Starting with Windows 10 build 17134, Windows can support use of
+the UTF-8 character set. We strongly encourage Windows users to
+enable this feature. This requires the following steps.
+
+1. In the "run" toolbar, execute the command "intl.cpl".
+2. Move to the Administrative tab.
+3. Move to "Change system locale".
+4. Check the box at the bottom labeled something like
+"Beta: Use Unicode UTF-8 for worldwide language support"
+
+
+How difficult is it to convert my application to handle arbitrary netCDF-4 files? {#How-difficult-is-it-to-convert-my-application-to-handle-arbitrary-netCDF-4-files}
+-----------------

 Modifying an application to fully support the new enhanced data model
 may be relatively easy or arbitrarily difficult :-), depending on what
--- a/docs/Makefile.am
+++ b/docs/Makefile.am
@@ -9,7 +9,7 @@
 # These files will be included with the dist.
 EXTRA_DIST = netcdf.m4 DoxygenLayout.xml Doxyfile.in footer.html	\
 mainpage.dox tutorial.dox			\
-architecture.dox internal.dox windows-binaries.md			\
+architecture.dox internal.md windows-binaries.md dispatch.md \
 building-with-cmake.md CMakeLists.txt groups.dox notes.md	\
 install-fortran.md all-error-codes.md credits.md auth.md filters.md \
 obsolete/fan_utils.html indexing.dox	\
--- a/docs/dispatch.md
+++ b/docs/dispatch.md
@@ -0,0 +1,507 @@
+Internal Dispatch Table Architecture
+============================
+<!-- double header is needed to work around a doxygen bug -->
+
+# Internal Dispatch Table Architecture
+
+\tableofcontents
+
+# Introduction {#dispatch_intro}
+
+The netcdf-c library uses an internal dispatch mechanism
+as the means for wrapping the netcdf-c API around a wide variety
+of underlying storage and stream data formats.
+As of last check, the following formats are supported and each
+has its own dispatch table.
+
+Warning: some of the listed function signatures may be out of date
+and the specific code should be consulted to see the actual parameters.
+
+<table>
+<tr><th>Format<th>Directory<th>NC_FORMATX Name
+<tr><td>NetCDF-classic<td>libsrc<td>NC_FORMATX_NC3
+<tr><td>NetCDF-enhanced<td>libhdf5<td>NC_FORMATX_NC_HDF5
+<tr><td>HDF4<td>libhdf4<td>NC_FORMATX_NC_HDF4
+<tr><td>PNetCDF<td>libsrcp<td>NC_FORMATX_PNETCDF
+<tr><td>DAP2<td>libdap2<td>NC_FORMATX_DAP2
+<tr><td>DAP4<td>libdap4<td>NC_FORMATX_DAP4
+<tr><td>UDF0<td>N.A.<td>NC_FORMATX_UDF0
+<tr><td>UDF1<td>N.A.<td>NC_FORMATX_UDF1
+<tr><td>NCZarr<td>libnczarr<td>NC_FORMATX_NCZARR
+</table>
+
+Note that UDF0 and UDF1 allow for user-defined dispatch tables to
+be implemented.
+
+The idea is that when a user opens or creates a netcdf file, a
+specific dispatch table is chosen.  A dispatch table is a struct
+containing an entry for (almost) every function in the netcdf-c API.
+During execution, netcdf API calls are channeled through that
+dispatch table to the appropriate function for implementing that
+API call. The functions in the dispatch table are not quite the
+same as those defined in *netcdf.h*. For simplicity and
+compactness, some netcdf.h API calls are mapped to the same
+dispatch table function. In addition to the functions, the first
+entry in the table defines the model that this dispatch table
+implements. It will be one of the NC_FORMATX_XXX values.
+The second entry in the table is the version of the dispatch table.
+The rule is that previous entries may not be removed, but new entries
+may be added, and adding new entries increases the version number.
+
+The dispatch table represents a distillation of the netcdf API down to
+a minimal set of internal operations. The format of the dispatch table
+is defined in the file *include/ncdispatch.h*. Every new dispatch
+table must define this minimal set of operations.
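The table layout described above can be sketched in a few lines of C. This is a reduced illustration, not the library's actual `NC_Dispatch` definition: the names `Demo_Dispatch`, `DEMO_FORMATX_MEM`, `demo_mem_table` and `demo_inq_format` are hypothetical, and only two function slots are shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an NC_FORMATX_XXX value. */
#define DEMO_FORMATX_MEM 1

/* Reduced stand-in for a dispatch table: model first, version second,
 * then one function pointer per internal operation. */
typedef struct Demo_Dispatch {
    int model;
    int dispatch_version;
    int (*open)(const char *path, int mode);
    int (*inq_format)(int ncid, int *formatp);
} Demo_Dispatch;

static int mem_open(const char *path, int mode)
{
    (void)path; (void)mode;
    return 0;
}

static int mem_inq_format(int ncid, int *formatp)
{
    (void)ncid;
    *formatp = DEMO_FORMATX_MEM;
    return 0;
}

/* One table per format; the entries are filled in positionally. */
const Demo_Dispatch demo_mem_table = {
    DEMO_FORMATX_MEM, 1, mem_open, mem_inq_format
};

/* API-level entry point: forwards to whichever table was chosen. */
int demo_inq_format(const Demo_Dispatch *table, int ncid, int *formatp)
{
    assert(table != NULL && table->inq_format != NULL);
    return table->inq_format(ncid, formatp);
}
```

The point of the indirection is that the API-level function never needs to know which format it is talking to; it only needs the table that was selected when the file was opened or created.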
+
+# Adding a New Dispatch Table
+In order to make this process concrete, let us assume we plan to add
+an in-memory implementation of netcdf-3.
+
+## Defining configure.ac flags
+
+Define a *--enable* flag option for *configure.ac*.  For our
+example, we assume the option "--enable-ncm" and the
+corresponding internal flag "enable_ncm". If you examine the existing
+*configure.ac* and see how, for example, *--enable_dap2* is
+defined, then it should be clear how to do it for your code.
+
+## Defining a "name space"
+
+Choose some prefix of characters to identify the new dispatch
+system. In effect we are defining a name-space. For our in-memory
+system, we will choose "NCM" and "ncm". NCM is used for non-static
+procedures to be entered into the dispatch table and ncm for all other
+non-static procedures. Note that the chosen prefix should probably start
+with "nc" or "NC" in order to avoid name conflicts outside the netcdf-c library.
+
+## Extend include/netcdf.h
+
+Modify the file *include/netcdf.h* to add an NC_FORMATX_XXX flag
+for this dispatch format at the appropriate places.
+````
+  #define NC_FORMATX_NCM  7
+````
+
+Add any format specific new error codes.
+````
+#define NC_ENCM  (?)
+````
+
+## Extend include/ncdispatch.h
+
+Modify the file *include/ncdispatch.h* to
+add format specific data and initialization functions;
+note the use of our NCM namespace.
+````
+    #ifdef ENABLE_NCM
+    extern NC_Dispatch* NCM_dispatch_table;
+    extern int NCM_initialize(void);
+    #endif
+````
+
+## Define the dispatch table functions
+
+Define the functions necessary to fill in the dispatch table. As a
+rule, we assume that a new directory is defined, *libsrcm*, say. Within
+this directory, we need to define *Makefile.am* and *CMakeLists.txt*.
+We also need to define the source files
+containing the dispatch table and the functions to be placed in the
+dispatch table; call them *ncmdispatch.c* and *ncmdispatch.h*. Look at
+*libsrc/nc3dispatch.[ch]* or *libnczarr/zdispatch.[ch]* for examples.
+
+Similarly, it is best to take existing *Makefile.am* and *CMakeLists.txt*
+files (from *libsrcp* for example) and modify them.
+
+## Adding the dispatch code to libnetcdf
+
+Provide for the inclusion of this library in the final libnetcdf
+library. This is accomplished by modifying *liblib/Makefile.am* by
+adding something like the following.
+````
+     if ENABLE_NCM
+        libnetcdf_la_LIBADD += $(top_builddir)/libsrcm/libnetcdfm.la
+     endif
+````
+
+## Extend library initialization
+
+Modify the *NC_initialize* function in *liblib/nc_initialize.c* by adding
+appropriate references to the NCM dispatch function.
+````
+     #ifdef ENABLE_NCM
+     extern int NCM_initialize(void);
+     #endif
+     ...
+     int NC_initialize(void)
+     {
+     ...
+     #ifdef ENABLE_NCM
+         if((stat = NCM_initialize())) return stat;
+     #endif
+     ...
+     }
+````
+
+Finalization is handled in an analogous fashion.
+
+## Testing the new dispatch table
+
+Add a directory of tests: *ncm_test*, say. The file *ncm_test/Makefile.am*
+will look something like this.
+````
+     # These files are created by the tests.
+     CLEANFILES = ...
+     # These are the tests which are always run.
+     TESTPROGRAMS = test1 test2 ...
+     test1_SOURCES = test1.c ...
+     ...
+     # Set up the tests.
+     check_PROGRAMS = $(TESTPROGRAMS)
+     TESTS = $(TESTPROGRAMS)
+     # Any extra files required by the tests
+     EXTRA_DIST = ...
+````
+
+# Top-Level build of the dispatch code
+
+Provide for *libnetcdfm* to be constructed by adding the following to
+the top-level *Makefile.am*.
+
+````
+     if ENABLE_NCM
+     NCM=libsrcm
+     NCMTESTDIR=ncm_test
+     endif
+     ...
+     SUBDIRS = ... $(DISPATCHDIR)  $(NCM) ... $(NCMTESTDIR)
+````
+
+# Choosing a Dispatch Table
+
+The dispatch table is ultimately chosen by the function
+NC_infermodel() in libdispatch/dinfermodel.c. This function is
+invoked by the NC_create and the NC_open procedures.  This can
+be, unfortunately, a complex process. The detailed operation of
+NC_infermodel() is described in the companion document *docs/internal.md*.
+
+In any case, the choice of dispatch table is currently based on the following
+pieces of information.
+
+1. The mode argument – this can be used to detect, for example, what kind
+of file to create: netcdf-3, netcdf-4, 64-bit netcdf-3, etc.
+Using a mode flag is the most common mechanism, in which case
+*netcdf.h* needs to be modified to define the relevant mode flag.
+
+2. The file path – this can be used to detect, for example, a DAP url
+versus a normal file system file. If the path looks like a URL, then
+the fragment part of the URL is examined to determine the specific
+dispatch function.
+
+3. The file contents - when the contents of a real file are available,
+the contents of the file can be used to determine the dispatch table.
+As a rule, this is likely to be useful only for *nc_open*.
+
+4. Whether the file is being opened or created.
+
+5. Whether parallel I/O is available.
+
+The *NC_infermodel* function returns two values.
+
+1. model - this is used by nc_open and nc_create to choose the dispatch table.
+2. newpath - in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.
+
+# Special Dispatch Table Signatures
+
+The entries in the dispatch table do not necessarily correspond
+to the external API. In many cases, multiple related API functions
+are merged into a single dispatch table entry.
+
+## Create/Open
+
+The create table entry and the open table entry in the dispatch table
+have the following signatures respectively.
+````
+     int (*create)(const char *path, int cmode,
+                size_t initialsz, int basepe, size_t *chunksizehintp,
+                int useparallel, void* parameters,
+                struct NC_Dispatch* table, NC* ncp);
+
+     int (*open)(const char *path, int mode,
+              int basepe, size_t *chunksizehintp,
+              int use_parallel, void* parameters,
+              struct NC_Dispatch* table, NC* ncp);
+````
+
+The key difference is that these are the union of all the possible
+create/open signatures from the include/netcdfXXX.h files. Note especially the last
+three parameters. The parameters argument is a pointer to arbitrary data
+to provide extra info to the dispatcher.
+The table argument is included in case the create
+function (e.g. *NCM_create*) needs to invoke other dispatch
+functions. The very last argument, ncp, is a pointer to an NC
+instance. The raw NC instance will have been created by *libdispatch/dfile.c*
+and is passed to e.g. open with the expectation that it will be filled in
+by the dispatch open function.
+
+## Accessing Data with put_vara() and get_vara()
+
+````
+     int (*put_vara)(int ncid, int varid, const size_t *start, const size_t *count,
+                          const void *value, nc_type memtype);
+````
+
+````
+     int (*get_vara)(int ncid, int varid, const size_t *start, const size_t *count,
+                     void *value, nc_type memtype);
+````
+
+Most of the parameters are similar to the netcdf API parameters. The
+last parameter, however, is the type of the data in
+memory. Additionally, instead of using an "int islong" parameter, the
+memtype will be either ::NC_INT or ::NC_INT64, depending on the value
+of sizeof(long). This means that even netcdf-3 code must be prepared
+to encounter the ::NC_INT64 type.
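As a rough illustration of the rule above, the following sketch picks the memory type code for a C `long` from its size. The function name `demo_long_memtype` is hypothetical; the two constants are local copies of the `NC_INT` and `NC_INT64` values so that the fragment compiles without *netcdf.h*.

```c
#include <assert.h>

/* Local copies of two netcdf.h type codes (assumed values: NC_INT is 4,
 * NC_INT64 is 10) so the sketch compiles on its own. */
#define DEMO_NC_INT   4
#define DEMO_NC_INT64 10

/* Choose the in-memory type code that matches the platform's long. */
int demo_long_memtype(void)
{
    /* This sketch only covers the usual 4-byte and 8-byte cases. */
    assert(sizeof(long) == 4 || sizeof(long) == 8);
    return (sizeof(long) == 4) ? DEMO_NC_INT : DEMO_NC_INT64;
}
```

A dispatch layer's put_vara or get_vara implementation would then see one of these two codes as its memtype whenever the caller used one of the `long` API variants.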
+
+## Accessing Attributes with put_att() and get_att()
+
+````
+     int (*get_att)(int ncid, int varid, const char *name,
+                         void *value, nc_type memtype);
+````
+
+````
+     int (*put_att)(int ncid, int varid, const char *name, nc_type datatype, size_t len,
+                    const void *value, nc_type memtype);
+````
+
+Again, the key difference is the memtype parameter. As with
+put/get_vara, it uses ::NC_INT64 to encode the long case.
+
+## Pre-defined Dispatch Functions
+
+It is sometimes not necessary to implement all the functions in the
+dispatch table. Some pre-defined functions are available which may be
+used in many cases.
+
+## Inquiry Functions
+
+Many of the netCDF inquiry functions operate from an in-memory model of
+metadata. Once a file is opened, or a file is created, this
+in-memory metadata model is kept up to date. Consequently the inquiry
+functions do not depend on the dispatch layer code. These functions
+can be used by all dispatch layers which use the internal netCDF
+enhanced data model.
+
+- NC4_inq
+- NC4_inq_type
+- NC4_inq_dimid
+- NC4_inq_dim
+- NC4_inq_unlimdim
+- NC4_inq_att
+- NC4_inq_attid
+- NC4_inq_attname
+- NC4_get_att
+- NC4_inq_varid
+- NC4_inq_var_all
+- NC4_show_metadata
+- NC4_inq_unlimdims
+- NC4_inq_ncid
+- NC4_inq_grps
+- NC4_inq_grpname
+- NC4_inq_grpname_full
+- NC4_inq_grp_parent
+- NC4_inq_grp_full_ncid
+- NC4_inq_varids
+- NC4_inq_dimids
+- NC4_inq_typeids
+- NC4_inq_type_equal
+- NC4_inq_user_type
+- NC4_inq_typeid
+
+## NCDEFAULT get/put Functions
+
+The mapped (varm) get/put functions have been
+implemented in terms of the array (vara) functions. So dispatch layers
+need only implement the vara functions, and can use the following
+functions to get the varm functions:
+
+- NCDEFAULT_get_varm
+- NCDEFAULT_put_varm
+
+For the netcdf-3 format, the strided functions (nc_get/put_vars)
+are similarly implemented in terms of the vara functions. So the following
+convenience functions are available.
+
+- NCDEFAULT_get_vars
+- NCDEFAULT_put_vars
+
+For the netcdf-4 format, the vars functions actually exist, so
+the default vars functions are not used.
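The idea of building a strided read on top of an array read can be sketched as follows. This is a one-dimensional toy, not the code behind `NCDEFAULT_get_vars`: the in-memory variable `demo_var` and the functions `demo_get_vara` and `demo_get_vars` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LEN 10

/* Hypothetical 1-D in-memory "variable". */
static const int demo_var[DEMO_LEN] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

/* vara-style read: count contiguous elements beginning at start. */
int demo_get_vara(size_t start, size_t count, int *out)
{
    if (start > DEMO_LEN || count > DEMO_LEN - start)
        return -1; /* request falls outside the variable */
    for (size_t i = 0; i < count; i++)
        out[i] = demo_var[start + i];
    return 0;
}

/* vars-style read built only from vara calls, one element per call. */
int demo_get_vars(size_t start, size_t count, ptrdiff_t stride, int *out)
{
    assert(out != NULL);
    if (stride < 1)
        return -1;
    for (size_t i = 0; i < count; i++) {
        int stat = demo_get_vara(start + i * (size_t)stride, 1, &out[i]);
        if (stat != 0)
            return stat;
    }
    return 0;
}
```

Reading one element per call is the simplest correct decomposition; a real implementation would be free to batch contiguous runs when the stride is 1.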
+
+## Read-Only Functions
+
+Some dispatch layers are read-only (e.g. HDF4). Any function which
+writes to a file, including nc_create(), needs to return error code
+::NC_EPERM. The following read-only functions are available so that
+these don't have to be re-implemented in each read-only dispatch layer:
+
+- NC_RO_create
+- NC_RO_redef
+- NC_RO__enddef
+- NC_RO_sync
+- NC_RO_set_fill
+- NC_RO_def_dim
+- NC_RO_rename_dim
+- NC_RO_rename_att
+- NC_RO_del_att
+- NC_RO_put_att
+- NC_RO_def_var
+- NC_RO_rename_var
+- NC_RO_put_vara
+- NC_RO_def_var_fill
+
+## Classic NetCDF Only Functions
+
+There are two functions that are only used in the classic code. All
+other dispatch layers (except PnetCDF) return error ::NC_ENOTNC3 for
+these functions. The following functions are provided for this
+purpose:
+
+- NOTNC3_inq_base_pe
+- NOTNC3_set_base_pe
+
+# HDF4 Dispatch Layer as a Simple Example
+
+The HDF4 dispatch layer is about the simplest possible dispatch
+layer. It is read-only, classic model. It will serve as a nice, simple
+example of a dispatch layer.
+
+Note that the HDF4 layer is optional in the netCDF build. Not all
+users will have HDF4 installed, and those users will not build with
+the HDF4 dispatch layer enabled. For this reason HDF4 code is guarded
+as follows.
+````
+#ifdef USE_HDF4
+...
+#endif /*USE_HDF4*/
+````
+
+Code in libhdf4 is only compiled if HDF4 is
+turned on in the build.
+
+### The netcdf.h File
+
+In the main netcdf.h file, we have the following:
+
+````
+#define NC_FORMATX_NC_HDF4   (3)
+````
+
+### The ncdispatch.h File
+
+In ncdispatch.h we have the following:
+
+````
+#ifdef USE_HDF4
+extern NC_Dispatch* HDF4_dispatch_table;
+extern int HDF4_initialize(void);
+extern int HDF4_finalize(void);
+#endif
+````
+
+### The netcdf_meta.h File
+
+The netcdf_meta.h file allows for easy determination of what features
+are in use. For HDF4, it contains the following, set by configure:
+````
+...
+#define NC_HAS_HDF4      0 /*!< HDF4 support. */
+...
+````
+
+### The hdf4dispatch.h File
+
+The file *hdf4dispatch.h* contains prototypes and
+macro definitions used within the HDF4 code in libhdf4. This include
+file should not be used anywhere except in libhdf4.
+
+### Initialization Code Changes in liblib Directory
+
+The file *nc_initialize.c* is modified to include the following:
+````
+#ifdef USE_HDF4
+extern int HDF4_initialize(void);
+extern int HDF4_finalize(void);
+#endif
+````
+
+### Changes to libdispatch/dfile.c
+
+In order for a dispatch layer to be used, it must be correctly
+determined in functions *NC_open()* or *NC_create()* in *libdispatch/dfile.c*.
+HDF4 has a magic number that is detected in
+*NC_interpret_magic_number()*, which allows *NC_open* to automatically
+detect an HDF4 file.
+
+Once HDF4 is detected, the *model* variable is set to *NC_FORMATX_NC_HDF4*,
+and later this is used in a case statement:
+````
+      case NC_FORMATX_NC_HDF4:
+         dispatcher = HDF4_dispatch_table;
+         break;
+````
+
+This sets the dispatcher to the HDF4 dispatcher, which is defined in
+the libhdf4 directory.
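The magic-number detection described above can be sketched like this. The function `demo_interpret_magic` and the `demo_model` enum are hypothetical and are not the library's *NC_interpret_magic_number()*. The byte signatures are the commonly documented ones for HDF5, HDF4 and classic netCDF files, and the sketch assumes the signature sits at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_model { DEMO_UNKNOWN = 0, DEMO_NC3, DEMO_HDF5, DEMO_HDF4 };

/* Classify a file from its leading bytes; only offset 0 is examined. */
enum demo_model demo_interpret_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
    static const unsigned char hdf4[4] = {0x0e, 0x03, 0x13, 0x01};

    assert(buf != NULL);
    if (len >= 8 && memcmp(buf, hdf5, 8) == 0)
        return DEMO_HDF5;
    if (len >= 4 && memcmp(buf, hdf4, 4) == 0)
        return DEMO_HDF4;
    /* Classic files start with "CDF" followed by a version byte. */
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0 &&
        (buf[3] == 1 || buf[3] == 2 || buf[3] == 5))
        return DEMO_NC3;
    return DEMO_UNKNOWN;
}
```

The returned value plays the role of the *model* variable in the case statement shown above: it is the only thing the caller needs in order to pick a dispatch table.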
+
+### Dispatch Table in libhdf4/hdf4dispatch.c
+
+The file *hdf4dispatch.c* contains the definition of the HDF4 dispatch
+table. It looks like this:
+````
+/* This is the dispatch object that holds pointers to all the
+ * functions that make up the HDF4 dispatch interface. */
+static NC_Dispatch HDF4_dispatcher = {
+NC_FORMATX_NC_HDF4,
+NC_DISPATCH_VERSION,
+NC_RO_create,
+NC_HDF4_open,
+NC_RO_redef,
+NC_RO__enddef,
+NC_RO_sync,
+...
+NC_NOTNC4_set_var_chunk_cache,
+NC_NOTNC4_get_var_chunk_cache,
+...
+};
+````
+Note that most entries use the predefined dispatch
+functions. Functions that start with NC_RO* are read-only stubs that return
+::NC_EPERM. Functions that start with NC_NOTNC4* return ::NC_ENOTNC4.
+
+Only the functions that start with NC_HDF4* need to be implemented for
+the HDF4 dispatch layer. There are 6 such functions:
+
+- NC_HDF4_open
+- NC_HDF4_abort
+- NC_HDF4_close
+- NC_HDF4_inq_format
+- NC_HDF4_inq_format_extended
+- NC_HDF4_get_vara
+
+### HDF4 Reading Code
+
+The code in *hdf4file.c* opens the HDF4 SD dataset, and reads the
+metadata. This metadata is stored in the netCDF internal metadata
+model, allowing the inq functions to work.
+
+The code in *hdf4var.c* does an *nc_get_vara()* on the HDF4 SD
+dataset. This is all that is needed for all the nc_get_* functions to
+work.
+
+# Point of Contact {#filters_poc}
+
+*Author*: Dennis Heimbigner<br>
+*Email*: dmh at ucar dot edu<br>
+*Initial Version*: 12/22/2021<br>
+*Last Revised*: 12/22/2021
--- a/docs/filters.md
+++ b/docs/filters.md
--- a/docs/internal.md
+++ b/docs/internal.md
@@ -0,0 +1,639 @@
+Notes On the Internals of the NetCDF-C Library
+============================
+<!-- double header is needed to work around a doxygen bug -->
+
+# Notes On the Internals of the NetCDF-C Library {#intern_head}
+
+\tableofcontents
+
+This document attempts to record important information about
+the internal architecture and operation of the netcdf-c library.
+
+# 1. Including C++ Code in the netcdf-c Library {#intern_c++}
+
+The state of C compiler technology has reached the point where
+it is possible to include C++ code into the netcdf-c library
+code base. Two examples are:
+
+1. The AWS S3 SDK wrapper *libdispatch/ncs3sdk.cpp* file.
+2. The TinyXML wrapper *ncxml\_tinyxml2.cpp* file.
+
+However, there are some consequences that must be handled for this to work.
+Specifically, the compiler must be told that the C++ runtime is needed
+in the following ways.
+
+## Modifications to *lib\_flags.am*
+Suppose we have a flag *ENABLE\_XXX* where the XXX
+feature entails using C++ code. Then the following must be added
+to *lib\_flags.am*.
+````
+if ENABLE_XXX
+AM_LDFLAGS += -lstdc++
+endif
+````
+
+## Modifications to *libxxx/Makefile.am*
+
+The Makefile in which the C++ code is included and compiled
+(assumed here to be the *libxxx* directory) must have this set.
+````
+AM_CXXFLAGS = -std=c++11
+````
+It is possible that other values (e.g. *-std=c++14*) may also work.
+
+# 2. Managing instances of complex data types
+
+For a long time, there have been known problems with the
+management of complex types containing VLENs.  This also
+involves the string type because it is stored as a VLEN of
+chars.
+
+The term complex type refers to any type that directly or
+recursively references a VLEN type. So an array of VLENS, a
+compound with a VLEN field, and so on.
+
+In order to properly handle instances of these complex types, it
+is necessary to have functions that can recursively walk
+instances of such types to perform various actions on them.  The
+term "deep" is also used to mean recursive.
+
+Two deep walking operations are provided by the netcdf-c library
+to aid in managing instances of complex structures.
+* free'ing an instance of the complex type
+* copying an instance of the complex type.
+
+Previously, the netcdf-c library only did shallow free and shallow copy of
+complex types. This meant that only the top level was properly
+free'd or copied, but deep internal blocks in the instance were
+not touched. This led to a host of memory leaks and failures
+when the deep data was effectively shared between the netcdf-c library
+internally and the user's data. 
+
+Note that the term "vector" is used to mean a contiguous (in
+memory) sequence of instances of some type. Given an array with,
+say, dimensions 2 X 3 X 4, this will be stored in memory as a
+vector of length 2*3*4=24 instances.
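The arithmetic above generalizes to any rank: the vector length is the product of the dimension sizes. A minimal sketch, with the hypothetical helper name `demo_vector_length`:

```c
#include <assert.h>
#include <stddef.h>

/* Number of instances in the in-memory vector for an array of the
 * given shape. A rank of 0 (a scalar) yields a single instance. */
size_t demo_vector_length(size_t ndims, const size_t *dims)
{
    size_t n = 1;
    assert(dims != NULL || ndims == 0);
    for (size_t i = 0; i < ndims; i++)
        n *= dims[i];
    return n;
}
```

This count is what would be passed as the number of instances when walking the vector.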
+
+The use cases are primarily these.
+
+## nc\_get\_vars
+Suppose one is reading a vector of instances using nc\_get\_vars
+(or nc\_get\_vara or nc\_get\_var, etc.).  These functions will
+return the vector in the top-level memory provided.  All
+interior blocks (from nested VLENs or strings) will have been
+dynamically allocated. Note that computing the size of the vector
+may be tricky because the strides must be taken into account.
+
+After using this vector of instances, it is necessary to free
+(aka reclaim) the dynamically allocated memory, otherwise a
+memory leak occurs.  So, the recursive reclaim function is used
+to walk the returned instance vector and do a deep reclaim of
+the data.
+
+Currently functions are defined in netcdf.h that are supposed to
+handle this: nc\_free\_vlen(), nc\_free\_vlens(), and
+nc\_free\_string().  Unfortunately, these functions only do a
+shallow free, so deeply nested instances are not properly
+handled by them. They are marked in the description as
+deprecated in favor of the newer recursive function.
+
+## nc\_put\_vars
+
+Suppose one is writing a vector of instances using nc\_put\_vars
+(or nc\_put\_vara or nc\_put\_var, etc.).  These functions will
+write the contents of the vector to the specified variable.
+Note that internally, the data passed to the nc\_put\_xxx function is
+immediately written so there is no need to copy it internally. But the
+caller may need to reclaim the vector of data that was created and passed
+in to the nc\_put\_xxx function.
+
+After writing this vector of instances, and assuming it was dynamically
+created, at some point it will be necessary to reclaim that data.
+So again, the recursive reclaim function can be used
+to walk the returned instance vector and do a deep reclaim of
+the data.
+
+## nc\_put\_att
+Suppose one is writing a vector of instances as the data of an attribute
+using, say, nc\_put\_att.
+
+Internally, the incoming attribute data must be copied and stored
+so that changes/reclamation of the input data will not affect
+the attribute. Note that this copying behavior is different from
+writing to a variable, where the data is written immediately.
+
+Again, the code inside the netcdf library used to use only shallow copying
+rather than deep copy. As a result, one saw effects such as described
+in Github Issue https://github.com/Unidata/netcdf-c/issues/2143.
+
+Also, after defining the attribute, it may be necessary for the user
+to free the data that was provided as input to nc\_put\_att() as in the
+nc\_put\_xxx functions (previously described).
+
+## nc\_get\_att
+Suppose one is reading a vector of instances as the data of an attribute
+using, say, nc\_get\_att.
+
+Internally, the existing attribute data must be copied and returned
+to the caller, and the caller is responsible for reclaiming
+the returned data.
+
+Again, the code inside the netcdf library used to only do shallow copying
+rather than deep copy. So this could lead to memory leaks and errors
+because the deep data was shared between the library and the user.
+
+## New Instance Walking API
+
+Proper recursive functions were added to the netcdf-c library to
+provide reclaim and copy functions and use those as needed.
+These functions are defined in libdispatch/dinstance.c and their
+signatures are defined in include/netcdf.h. For back
+compatibility, corresponding "ncaux\_XXX" functions are defined
+in include/netcdf\_aux.h.
+````
+int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
+int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
+int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
+int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
+````
+There are two variants. The first two, nc\_reclaim\_data() and
+nc\_copy\_data(), assume the top-level vector is managed by the
+caller. For reclaim, this is so the user can use, for example, a
+statically allocated vector. For copy, it assumes the user
+provides the space into which the copy is stored.
+
+The second two, nc\_reclaim\_data\_all() and
+nc\_copy\_data\_all(), allow the functions to manage the
+top-level.  So for nc\_reclaim\_data\_all, the top level is
+assumed to be dynamically allocated and will be free'd by
+nc\_reclaim\_data\_all().  The nc\_copy\_data\_all() function
+will allocate the top level and return a pointer to it to the
+user. The user can later pass that pointer to
+nc\_reclaim\_data\_all() to reclaim the instance(s).
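The difference between the two variants can be illustrated with a self-contained toy that does not call the library. It handles only a vector of VLENs of floats, and every name in it (`demo_vlen_t`, `demo_reclaim`, `demo_reclaim_all`, `demo_copy`, `demo_copy_all`) is hypothetical. The struct mirrors the length-plus-pointer shape of `nc_vlen_t`; the real functions walk arbitrary types using the type information in the file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as nc_vlen_t: an element count plus a pointer. */
typedef struct { size_t len; void *p; } demo_vlen_t;

/* Free only the interior blocks; the caller keeps the top-level vector. */
void demo_reclaim(demo_vlen_t *vec, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(vec[i].p);
        vec[i].p = NULL;
        vec[i].len = 0;
    }
}

/* Free the interior blocks and then the top-level vector itself. */
void demo_reclaim_all(demo_vlen_t *vec, size_t count)
{
    if (vec == NULL) return;
    demo_reclaim(vec, count);
    free(vec);
}

/* Deep copy of float VLENs into caller-provided top-level space. */
int demo_copy(const demo_vlen_t *src, size_t count, demo_vlen_t *dst)
{
    for (size_t i = 0; i < count; i++) { dst[i].len = 0; dst[i].p = NULL; }
    for (size_t i = 0; i < count; i++) {
        size_t nbytes = src[i].len * sizeof(float);
        if (nbytes == 0) continue;
        dst[i].p = malloc(nbytes);
        if (dst[i].p == NULL) { demo_reclaim(dst, count); return -1; }
        memcpy(dst[i].p, src[i].p, nbytes);
        dst[i].len = src[i].len;
    }
    return 0;
}

/* Deep copy that also allocates the top-level vector. */
int demo_copy_all(const demo_vlen_t *src, size_t count, demo_vlen_t **copyp)
{
    demo_vlen_t *top;
    assert(copyp != NULL);
    top = malloc((count ? count : 1) * sizeof(*top));
    if (top == NULL) return -1;
    if (demo_copy(src, count, top) != 0) { free(top); return -1; }
    *copyp = top;
    return 0;
}
```

A shallow copy would have copied only the `len` and `p` fields, leaving both vectors pointing at the same interior blocks; the deep copy gives each vector its own blocks, so reclaiming one cannot invalidate the other.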
+
+# Internal Changes
+The netcdf-c library internals are changed to use the proper reclaim
+and copy functions. This also allows some simplification of the code
+since the stdata and vldata fields of NC\_ATT\_INFO are no longer needed.
+Currently this is commented out using the SEPDATA \#define macro.
+When the bugs are found and fixed, all this code will be removed.
+
+## Optimizations
+
+In order to make these functions as efficient as possible, it is
+desirable to classify all types as to whether or not they contain
+variable-size data. If a type is fixed sized (i.e. does not contain
+variable-size data) then it can be freed or copied as a single chunk.
+This significantly increases the performance for such types.
+For variable-size types, it is necessary to walk each instance of the type
+and recursively reclaim or copy it. As another optimization,
+if the type is a vector of strings, then the per-instance walk can be
+sped up by doing the reclaim or copy inline.
+
+The rules for classifying types as fixed or variable size are as follows.
+
+1. All atomic types, except string, are fixed size.
+2. All enum types and opaque types are fixed size.
+3. All string types and VLEN types are variable size.
+4. A compound type is fixed size if all of the types of its
+   fields are fixed size. Otherwise it has variable size.
+
+The classification of types can be made at the time the type is defined
+or is read in from some existing file. The reclaim and copy functions
+use this information to speed up the handling of fixed size types.
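The four classification rules can be written as a short recursive function. The type descriptor below (`demo_type`, `demo_class`) and the function `demo_is_varsized` are hypothetical simplifications; the library keeps this information in its own internal type structures.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEMO_ATOMIC,   /* any atomic type other than string */
    DEMO_STRING,
    DEMO_ENUM,
    DEMO_OPAQUE,
    DEMO_VLEN,
    DEMO_COMPOUND
} demo_class;

typedef struct demo_type {
    demo_class cls;
    size_t nfields;                         /* DEMO_COMPOUND only */
    const struct demo_type *const *fields;  /* DEMO_COMPOUND only */
} demo_type;

/* Return 1 if instances contain variable-size data, else 0. */
int demo_is_varsized(const demo_type *t)
{
    assert(t != NULL);
    switch (t->cls) {
    case DEMO_STRING:
    case DEMO_VLEN:
        return 1;                           /* rule 3 */
    case DEMO_COMPOUND:
        for (size_t i = 0; i < t->nfields; i++)
            if (demo_is_varsized(t->fields[i]))
                return 1;                   /* rule 4, variable case */
        return 0;                           /* rule 4, fixed case */
    default:
        return 0;                           /* rules 1 and 2 */
    }
}
```

Because the answer depends only on the type definition, it can be computed once and cached with the type, which is what allows the fixed-size fast path to be a single free or memcpy.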
+
+# Warnings
+
+1. The new API functions require that the type information be
+   accessible. This means that you cannot use these functions
+   after the file has been closed. After the file is closed, you
+   are on your own.
+
+2. There is still one known failure that has not been solved; it is
+   possibly an HDF5 memory leak. All the failures revolve around
+   some variant of this .cdl file. The proximate cause of failure is
+   the use of a VLEN FillValue.
+````
+        netcdf x {
+        types:
+          float(*) row_of_floats ;
+        dimensions:
+          m = 5 ;
+        variables:
+          row_of_floats ragged_array(m) ;
+              row_of_floats ragged_array:_FillValue = {-999} ;
+        data:
+          ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, 
+                         {40, 41}, _ ;
+        }
+````
+
+# 3. Inferring File Types
+
+As described in the companion document -- docs/dispatch.md --
+when nc\_create() or nc\_open() is called, the library must figure out what
+kind of file is being created or opened.  Once it has figured out
+the file kind, the appropriate "dispatch table" can be used
+to process that file.
+
+## The Role of URLs
+
+Figuring out the kind of file is referred to as model inference
+and is, unfortunately, a complicated process. The complication
+is mostly a result of allowing a path argument to be a URL.
+Inferring the file kind from a URL requires deep processing of
+the URL structure: the protocol, the host, the path, and the fragment
+parts in particular. The query part is currently not used because
+it usually contains information to be processed by the server
+receiving the URL.
+
+The "fragment" part of the URL may be unfamiliar.
+A URL may optionally end with a fragment, which
+follows the '#' character in this pseudo URL specification.
+````
+<protocol>://<host>/<path>?<query>#<fragment>
+````
+The form of the fragment is similar to a query and takes this general form.
+````
+'#'<key>=<value>&<key>=<value>&...
+````
+The key is a simple name, the value is any sequence of characters,
+although URL special characters such as '&' must be URL encoded in
+the '%XX' form where each X is a hexadecimal digit.
+A nonsensical example might look like this:
+````
+https://host.com/path#mode=nczarr,s3&bytes
+````
+It is important to note that the fragment part is not intended to be
+passed to the server, but rather is processed by the client program.
+It is this property that allows the netcdf-c library to use it to
+pass information deep into the dispatch table code that is processing the
+URL.
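As a minimal sketch of the fragment syntax described above, the following splits the text after the '#' into a flat list of keys and values. It is not the library's own parser: `FragList`, `parse_fragment`, `free_fraglist`, and the `MAXPAIRS` limit are hypothetical names chosen for this example, and '%XX' decoding is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXPAIRS 16   /* arbitrary limit for this sketch */

/* Flat list {key,value,key,value,...}; a key without '=' gets the value "". */
typedef struct {
    size_t n;                     /* number of strings stored (2 per pair) */
    char *items[2 * MAXPAIRS];
} FragList;

/* Duplicate the first n characters of s as a NUL-terminated string. */
static char *dupn(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p) { memcpy(p, s, n); p[n] = '\0'; }
    return p;
}

/* Parse "k=v&k=v&..." (the text after '#') into list.
   Returns 0 on success, -1 on overflow or allocation failure;
   call free_fraglist() in either case. */
int parse_fragment(const char *frag, FragList *list)
{
    assert(frag != NULL && list != NULL);
    list->n = 0;
    while (*frag) {
        size_t len = strcspn(frag, "&");           /* length of one component */
        const char *eq = memchr(frag, '=', len);   /* '=' inside it, if any */
        size_t klen = eq ? (size_t)(eq - frag) : len;
        if (klen > 0) {
            char *k, *v;
            if (list->n + 2 > 2 * MAXPAIRS) return -1;
            k = dupn(frag, klen);
            v = eq ? dupn(eq + 1, len - klen - 1) : dupn("", 0);
            if (!k || !v) { free(k); free(v); return -1; }
            list->items[list->n++] = k;
            list->items[list->n++] = v;
        }
        frag += len;
        if (*frag == '&') frag++;                  /* skip the separator */
    }
    return 0;
}

/* Release all strings held by the list. */
void free_fraglist(FragList *list)
{
    size_t i;
    for (i = 0; i < list->n; i++) free(list->items[i]);
    list->n = 0;
}
```

For the example fragment `mode=nczarr,s3&bytes`, this produces the pairs `mode` / `nczarr,s3` and `bytes` / empty string.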
+
+## Model Inference Inputs
+
+The inference algorithm is given the following information
+from which it must determine the kind of file being accessed.
+
+### Mode
+
+The mode is a set of flags that are passed as the second
+argument to nc\_create and nc\_open. The set of flags is defined in
+the netcdf.h header file. Generally it specifies the overall
+format of the file: netcdf-3 (classic) or netcdf-4 (enhanced).
+Variants of these can also be specified, e.g. 64-bit netcdf-3 or
+classic netcdf-4.
+In the case where the path argument is a simple file path, 
+using a mode flag is the most common mechanism for specifying
+the model.
+
+### Path
+The file path, the first argument to nc\_create and nc\_open,
+can be either a simple file path or a URL.
+If it is a URL, then it will be deeply inspected to determine
+the model.
+
+### File Contents
+When the contents of a real file are available,
+the contents of the file can be used to determine the dispatch table.
+As a rule, this is likely to be useful only for *nc\_open*.
+It also requires access to functions that can open and read at least
+the initial part of the file.
+As a rule, the initial small prefix of the file is read
+and examined to see if it matches any of the so-called
+"magic numbers" that indicate the kind of file being read.
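A minimal sketch of a magic number check is shown here. It relies on two well-known signatures: netcdf classic files begin with the bytes 'C', 'D', 'F' followed by a version byte (1, 2, or 5), and HDF5 files carry an eight-byte signature. The `FileKind` enum and `kind_from_magic` are hypothetical names, and the sketch inspects offset 0 only, whereas an HDF5 signature may also appear at later offsets when a user block is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
typedef enum {
    KIND_UNKNOWN, KIND_CLASSIC, KIND_64BIT_OFFSET, KIND_CDF5, KIND_HDF5
} FileKind;

/* Classify a file from its first len bytes (offset 0 only). */
FileKind kind_from_magic(const unsigned char *buf, size_t len)
{
    /* HDF5 signature: \211 H D F \r \n \032 \n */
    static const unsigned char hdf5sig[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    assert(buf != NULL || len == 0);
    if (len >= 8 && memcmp(buf, hdf5sig, 8) == 0)
        return KIND_HDF5;
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        switch (buf[3]) {            /* version byte */
        case 1: return KIND_CLASSIC;
        case 2: return KIND_64BIT_OFFSET;
        case 5: return KIND_CDF5;
        default: break;
        }
    }
    return KIND_UNKNOWN;
}
```

A caller would read the first few bytes of the file into a buffer and pass that buffer and its length to the function.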
+
+### Open vs Create
+Is the file being opened or is it being created?
+
+### Parallelism
+Is parallel IO available?
+
+## Model Inference Outputs
+The inferencing algorithm outputs two pieces of information.
+
+1. model - this is used by nc\_open and nc\_create to choose the dispatch table.
+2. newpath - in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.
+
+The model output is actually a struct containing two fields:
+
+1. implementation - this is a value from the NC\_FORMATX\_xxx
+   values in netcdf.h. It generally determines the dispatch
+   table to use.
+2. format -- this is an NC\_FORMAT\_xxx value defining, in effect,
+   the netcdf-format to which the underlying format is to be
+   translated. Thus it can tell the netcdf-3 dispatcher that it
+   should actually implement CDF5 rather than standard netcdf classic.
+
+## The Inference Algorithm
+
+The construction of the model is primarily carried out by the function
+*NC\_infermodel()* (in *libdispatch/dinfermodel.c*).
+It is given the following parameters:
+
+1. path -- (IN) absolute file path or URL
+2. modep -- (IN/OUT) the set of mode flags given to *NC\_open* or *NC\_create*.
+3. iscreate -- (IN) distinguish open from create.
+4. useparallel -- (IN) indicate if parallel IO can be used.
+5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
+6. model -- (IN/OUT) place to store inferred model.
+7. newpathp -- (OUT) the canonical rewrite of the path argument.
+
+As a rule, these values are used in this order of preference
+to infer the model.
+
+1. file contents -- highest precedence
+2. url (if it is one) -- using the "mode=" key in the fragment (see below).
+3. mode flags
+4. default format -- lowest precedence
+ 
+The sequence of steps is as follows.
+
+### URL processing -- processuri()
+
+If the path appears to be a URL, then it is parsed
+and processed by the processuri function as follows.
+
+1. Protocol --
+The protocol is extracted and tested against the list of
+legal protocols. If not found, then it is an error.
+If found, then it is replaced by a substitute -- if specified.
+So, for example, the protocol "dods" is replaced by the protocol "http"
+(note that at some point "http" will be replaced with "https").
+Additionally, one or more "key=value" strings are appended
+to the existing fragment of the url. So, again for "dods",
+the fragment is extended by the string "mode=dap2".
+Thus replacing "dods" does not lose information, but rather transfers
+it to the fragment for later use. 
+
+2. Fragment --
+After the protocol is processed, the initial fragment processing occurs
+by converting it to a list data structure of the form
+````
+        {<key>,<value>,<key>,<value>,<key>,<value>....}
+````
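The protocol step described above can be pictured as a table lookup. The sketch below is an assumption-laden illustration, not the table used by processuri(): `ProtocolRule`, `rules`, and `apply_protocol_rule` are hypothetical names, and only the "dods" entry is taken from the text; the other entries are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a legal protocol, an optional substitute,
   and an optional "key=value" string to append to the fragment. */
typedef struct {
    const char *protocol;
    const char *substitute;   /* NULL means keep the protocol as is */
    const char *fragment;     /* NULL means nothing to append */
} ProtocolRule;

static const ProtocolRule rules[] = {
    {"http",  NULL,   NULL},          /* placeholder entry */
    {"https", NULL,   NULL},          /* placeholder entry */
    {"file",  NULL,   NULL},          /* placeholder entry */
    {"dods",  "http", "mode=dap2"},   /* from the text above */
};

/* Look up protocol. On success return 0, store the protocol to use in
   *newproto and the fragment addition (or NULL) in *extra.
   Return -1 if the protocol is not in the table, i.e. not legal. */
int apply_protocol_rule(const char *protocol,
                        const char **newproto, const char **extra)
{
    size_t i;
    assert(protocol != NULL && newproto != NULL && extra != NULL);
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(protocol, rules[i].protocol) == 0) {
            *newproto = rules[i].substitute ? rules[i].substitute
                                            : rules[i].protocol;
            *extra = rules[i].fragment;
            return 0;
        }
    }
    return -1;
}
```

Keeping the substitution and the fragment addition in the same table row is what guarantees that replacing "dods" by "http" never loses the information that DAP2 was requested.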
+
+### Macro Processing -- processmacros()
+
+If the fragment list produced by processuri() is non-empty, then
+it is processed for "macros". Notice that if the original path
+was not a URL, then the fragment list is empty and this
+processing will be bypassed.  In any case, it is convenient to
+allow some singleton fragment keys to be expanded into larger
+fragment components. In effect, those singletons act as
+macros. They can help to simplify the user's URL. The term
+singleton means a fragment key with no associated value:
+"#bytes", for example.
+
+The list of fragments is searched for keys whose
+value part is NULL or the empty string. The table
+of macros is then searched for each such key and, if it is found,
+a key and value are appended to the fragment list and the singleton
+is removed.
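A rough sketch of this expansion, operating on a flat {key,value,...} array of string pointers, might look as follows. The `Macro` type, the `macros` table contents, `expand_macros`, and the `MAXITEMS` limit are all assumptions made for illustration; the real table lives in the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXITEMS 32   /* arbitrary capacity for this sketch */

/* Hypothetical macro: singleton name -> key and value to append. */
typedef struct { const char *name; const char *key; const char *value; } Macro;

/* Illustrative entries only, not the library's actual table. */
static const Macro macros[] = {
    {"bytes", "mode", "bytes"},
    {"s3",    "mode", "s3"},
};

/* list holds *np strings as {key,value,...}. Every singleton (NULL or
   empty value) that names a macro is removed and the macro's key and
   value are appended at the end. The pair count never grows. */
void expand_macros(const char **list, size_t *np)
{
    const char *out[MAXITEMS];
    const char *added[MAXITEMS];
    size_t i, j, nout = 0, nadded = 0;

    assert(list != NULL && np != NULL);
    assert(*np % 2 == 0 && *np <= MAXITEMS);
    for (i = 0; i < *np; i += 2) {
        const Macro *m = NULL;
        if (list[i + 1] == NULL || list[i + 1][0] == '\0') {
            for (j = 0; j < sizeof macros / sizeof macros[0]; j++)
                if (strcmp(list[i], macros[j].name) == 0) { m = &macros[j]; break; }
        }
        if (m) {            /* singleton is a macro: queue its expansion */
            added[nadded++] = m->key;
            added[nadded++] = m->value;
        } else {            /* keep the pair unchanged */
            out[nout++] = list[i];
            out[nout++] = list[i + 1];
        }
    }
    for (i = 0; i < nadded; i++) out[nout++] = added[i];
    memcpy(list, out, nout * sizeof out[0]);
    *np = nout;
}
```

With the illustrative table, the list {"mode","nczarr","bytes",""} becomes {"mode","nczarr","mode","bytes"}; the duplicate "mode" keys are then merged by the later normalization step.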
+
+### Mode Inference -- processinferences()
+
+This function just processes the list of values associated
+with the "mode" key. It is similar to a macro in that
+certain mode values are added or removed based on tables
+of "inferences" and "negations".
+Again, the purpose is to allow users to provide simplified URL fragments.
+
+The list of mode values is repeatedly searched and whenever a value
+is found that is in the "modeinferences" table, then the associated inference value
+is appended to the list of mode values. This process stops when no changes
+occur. This form of inference allows the user to specify "mode=zarr"
+and have it converted to "mode=nczarr,zarr". This avoids the need for the
+dispatch table code to do the same inference.
+
+After the inferences are made, the list of mode values is again
+repeatedly searched and whenever a value
+is found that is in the "modenegations" table, then the associated negation value
+is removed from the list of mode values, assuming it is there. This process stops when no changes
+occur. This form of negation allows the user to make sure that "mode=bytes,nczarr"
+has the bytes mode take precedence by removing the "nczarr" value. Such illegal
+combinations can occur because of previous processing steps.
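The two fixed-point passes can be sketched as below. The table contents use only the two examples given in the text ("zarr" implies "nczarr", "bytes" negates "nczarr"); `ModePair`, `process_mode_values`, and `MAXMODES` are hypothetical names for this sketch, and the order of values in the resulting list is not meant to match the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXMODES 16   /* arbitrary capacity for this sketch */

typedef struct { const char *key; const char *other; } ModePair;

/* Only the two examples from the text; the real tables are larger. */
static const ModePair modeinferences[] = { {"zarr", "nczarr"} };
static const ModePair modenegations[]  = { {"bytes", "nczarr"} };

/* Return the index of v in modes, or -1 if absent. */
static int find_mode(const char **modes, size_t n, const char *v)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(modes[i], v) == 0) return (int)i;
    return -1;
}

/* Apply inferences, then negations, each until nothing changes.
   modes must have room for MAXMODES entries.
   Returns 0 on success, -1 if the list would overflow. */
int process_mode_values(const char **modes, size_t *np)
{
    size_t i;
    int changed;

    assert(modes != NULL && np != NULL);
    do {                                    /* inference pass */
        changed = 0;
        for (i = 0; i < sizeof modeinferences / sizeof modeinferences[0]; i++) {
            const ModePair *p = &modeinferences[i];
            if (find_mode(modes, *np, p->key) >= 0 &&
                find_mode(modes, *np, p->other) < 0) {
                if (*np >= MAXMODES) return -1;
                modes[(*np)++] = p->other;  /* append the inferred value */
                changed = 1;
            }
        }
    } while (changed);
    do {                                    /* negation pass */
        changed = 0;
        for (i = 0; i < sizeof modenegations / sizeof modenegations[0]; i++) {
            const ModePair *p = &modenegations[i];
            int at = find_mode(modes, *np, p->other);
            if (at >= 0 && find_mode(modes, *np, p->key) >= 0) {
                memmove(&modes[at], &modes[at + 1],
                        (*np - (size_t)at - 1) * sizeof modes[0]);
                (*np)--;                    /* remove the negated value */
                changed = 1;
            }
        }
    } while (changed);
    return 0;
}
```

Each pass terminates because an inference is only added when it is missing and a negated value can only be removed once.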
+
+### Fragment List Normalization
+As the fragment list is processed, duplicates may appear with the same key.
+A function -- cleanfragments() -- is applied to clean up the fragment list
+by coalescing the values of duplicate keys and removing duplicate key values.
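The coalescing step for a single key can be sketched as a merge of comma-separated value lists that skips values already present. This is an illustration only and is not the code of cleanfragments(); `merge_values` and its helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the comma-separated list contains the value v[0..vlen). */
static int has_value(const char *list, const char *v, size_t vlen)
{
    while (*list) {
        size_t len = strcspn(list, ",");
        if (len == vlen && memcmp(list, v, vlen) == 0) return 1;
        list += len;
        if (*list == ',') list++;
    }
    return 0;
}

/* Append each value of the comma-separated list src to out unless it
   is already there. out must hold a NUL-terminated string (possibly
   empty) and has capacity outlen. Returns 0, or -1 if out is too small. */
int merge_values(char *out, size_t outlen, const char *src)
{
    assert(out != NULL && src != NULL);
    while (*src) {
        size_t len = strcspn(src, ",");
        if (len > 0 && !has_value(out, src, len)) {
            size_t used = strlen(out);
            /* room for an optional comma, the value, and the NUL */
            size_t need = used + (used ? 1 : 0) + len + 1;
            if (need > outlen) return -1;
            if (used) out[used++] = ',';
            memcpy(out + used, src, len);
            out[used + len] = '\0';
        }
        src += len;
        if (*src == ',') src++;
    }
    return 0;
}
```

Calling `merge_values` once per occurrence of a key, with the same output buffer, yields one value list for that key. For example, merging `nczarr,s3` and then `s3,bytes` gives `nczarr,s3,bytes`.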
+
+### S3 Rebuild
+If the URL is determined to be a reference to a resource on the Amazon S3 cloud,
+then the URL needs to be converted to what is called "path format".
+There are four S3 URL formats:
+
+1. Virtual -- ````https://<bucket>.s3.<region>.amazonaws.com/<path>````
+2. Path -- ````https://s3.<region>.amazonaws.com/<bucket>/<path>````
+3. S3 -- ````s3://<bucket>/<path>````
+4. Other -- ````https://<host>/<bucket>/<path>````
+
+The S3 processing converts all of these to the Path format. In the "S3" format case,
+the region must be found by examining the files in the ".aws" directory, or else defaulted.
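As an illustration of one of these conversions, the sketch below rewrites a Virtual format host and path into the Path format. It is a simplified assumption-based example, not the library's rebuild code: `s3_virtual_to_path` is a hypothetical name, the bucket is taken to be everything before the first ".s3." in the host, and the other three formats are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite host "<bucket>.s3.<region>.amazonaws.com" plus "<path>" as
   "https://s3.<region>.amazonaws.com/<bucket>/<path>" in out.
   Returns 0 on success, -1 if the host is not in Virtual format or
   out (capacity outlen) is too small. */
int s3_virtual_to_path(const char *host, const char *path,
                       char *out, size_t outlen)
{
    static const char suffix[] = ".amazonaws.com";
    const char *marker;
    size_t hlen, slen = sizeof suffix - 1;
    int n;

    assert(host != NULL && path != NULL && out != NULL);
    hlen = strlen(host);
    if (hlen < slen || strcmp(host + hlen - slen, suffix) != 0)
        return -1;                       /* not an amazonaws.com host */
    marker = strstr(host, ".s3.");
    if (marker == NULL || marker == host)
        return -1;                       /* no bucket in front of ".s3." */
    /* bucket is host[0..marker); the new host starts at marker + 1 */
    n = snprintf(out, outlen, "https://%s/%.*s/%s",
                 marker + 1, (int)(marker - host), host, path);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

For the host `mybucket.s3.us-east-1.amazonaws.com` and path `data/x.zarr`, the result is `https://s3.us-east-1.amazonaws.com/mybucket/data/x.zarr`.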
+
+### File Rebuild
+If the URL protocol is "file" and its path is a relative file path,
+then it is made absolute by prepending the path of the current working directory.
+
+In any case, after S3 or File rebuilds, the URL is completely
+rebuilt using any modified protocol, host, path, and
+fragments. The query is left unchanged in the current algorithm.
+The resulting rebuilt URL is passed back to the caller.
+
+### Mode Key Processing
+The set of values of the fragment's "mode" key are processed one by one
+to see if it is possible to determine the model.
+There is a table for format interpretations that maps a mode value
+to the model's implementation and format. So for example,
+if the mode value "dap2" is encountered, then the model
+implementation is set to NC\_FORMATX\_DAP2 and the format
+is set to NC\_FORMAT\_CLASSIC.
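The table lookup described above might be sketched like this. The enum values are stand-ins for the NC\_FORMATX\_xxx and NC\_FORMAT\_xxx constants in netcdf.h, and `Model`, `ModeEntry`, and `model_from_modes` are hypothetical names. Only the "dap2" row comes from the text; the other rows are illustrative guesses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the NC_FORMATX_xxx and NC_FORMAT_xxx constants. */
enum { IMPL_UNKNOWN = 0, IMPL_DAP2, IMPL_DAP4, IMPL_NCZARR };
enum { FMT_UNKNOWN = 0, FMT_CLASSIC, FMT_NETCDF4 };

/* The two fields of the inferred model. */
typedef struct { int impl; int format; } Model;

typedef struct { const char *mode; Model model; } ModeEntry;

static const ModeEntry formats[] = {
    {"dap2",   {IMPL_DAP2,   FMT_CLASSIC}},   /* from the text */
    {"dap4",   {IMPL_DAP4,   FMT_NETCDF4}},   /* illustrative */
    {"nczarr", {IMPL_NCZARR, FMT_NETCDF4}},   /* illustrative */
};

/* Scan the mode values one by one; the first value found in the table
   sets *model. Returns 1 if the model was determined, 0 otherwise. */
int model_from_modes(const char **modes, size_t n, Model *model)
{
    size_t i, j;
    assert(modes != NULL && model != NULL);
    for (i = 0; i < n; i++) {
        for (j = 0; j < sizeof formats / sizeof formats[0]; j++) {
            if (strcmp(modes[i], formats[j].mode) == 0) {
                *model = formats[j].model;
                return 1;
            }
        }
    }
    return 0;
}
```

Mode values that are not in the table, such as "bytes" here, are simply skipped, which leaves the decision to the later steps.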
+
+### Non-Mode Key Processing
+If processing the mode does not tell us the implementation, then
+all other fragment keys are processed to see if the implementation
+(and format) can be deduced. Currently this does nothing.
+
+### URL Defaults
+If the model is still not determined and the path is a URL, then
+the implementation is defaulted to DAP2. This is for backward
+compatibility with the time when all URLs implied DAP2.
+
+### Mode Flags
+In the event that the path is not a URL, then it is necessary
+to use the mode flags and the useparallel argument to choose a model.
+This is just a straightforward flag checking exercise.
+
+### Content Inference -- check\_file\_type()
+If the path is being opened (as opposed to created), then
+it may be possible to actually read the first few bytes of the
+resource specified by the path and use that to determine the
+model. If this succeeds, then it takes precedence over
+all other model inferences.
+
+### Flag Consistency
+Once the model is known, then the set of mode flags
+is modified to be consistent with that information.
+So for example, if DAP2 is the model, then all netcdf-4 mode flags 
+and some netcdf-3 flags are removed from the set of mode flags
+because DAP2 provides only a standard netcdf-classic format.
+
+# 4. Adding a Standard Filter
+
+The standard filter system extends the netcdf-c library API to
+support a fixed set of "standard" filters. This is similar to the
+way that deflate and szip are currently supported.
+For background, the file filters.md should be consulted.
+
+In general, the API for a standard filter has the following prototypes.
+The case of zstandard (libzstd) is used as an example.
+````
+int nc_def_var_zstandard(int ncid, int varid, int level);
+int nc_inq_var_zstandard(int ncid, int varid, int* has_filterp, int* levelp);
+````
+So generally the API has the ncid and the varid as fixed, and then
+a list of parameters specific to the filter -- level in this case.
+For the inquire function, there is an additional argument -- has_filterp --
+that is set to 1 if the filter is defined for the given variable
+and is 0 if not.
+The remainder of the inquiry parameters are pointers to memory
+into which the parameters are stored -- levelp in this case.
+
+It is important to note that including a standard filter still
+requires three supporting objects:
+
+1. The implementing library for the filter. For example,
+   libzstd must be installed in order to use the zstandard
+   API.
+2. An HDF5 wrapper for the filter must be installed in the
+   directory pointed to by the HDF5_PLUGIN_PATH environment
+   variable.
+3. (Optional) An NCZarr Codec implementation must be installed
+   in the HDF5_PLUGIN_PATH directory.
+
+## Adding a New Standard Filter
+
+The implementation of a standard filter must be loaded from one
+of several locations.
+
+1. It can be part of libnetcdf.so (preferred),
+2. it can be loaded as part of the client code,
+3. or it can be loaded as part of an external library such as libccr.
+
+However, the three objects listed above need to be
+stored in the HDF5_PLUGIN_PATH directory, so adding a standard
+filter still requires modification to the netcdf build system.
+This limitation may be lifted in the future.
+
+### Build Changes
+In order to detect the implementing library, the following changes
+must be made for Automake (configure.ac/Makefile.am)
+and CMake (CMakeLists.txt).
+
+#### Configure.ac
+Configure.ac must have a block similar to this that locates
+the implementing library.
+````
+# See if we have libzstd
+AC_CHECK_LIB([zstd],[ZSTD_compress],[have_zstd=yes],[have_zstd=no])
+if test "x$have_zstd" = "xyes" ; then
+   AC_SEARCH_LIBS([ZSTD_compress],[zstd zstd.dll cygzstd.dll], [], [])
+   AC_DEFINE([HAVE_ZSTD], [1], [if true, zstd library is available])
+fi
+AC_MSG_CHECKING([whether libzstd library is available])
+AC_MSG_RESULT([${have_zstd}])
+````
+Note that the entry point (*ZSTD_compress*) is library dependent
+and is used to see if the library is available.
+
+#### Makefile.am
+
+It is assumed you have an HDF5 wrapper for zstd. If you want it
+to be built as part of the netcdf-c library then you need to
+add the following to *netcdf-c/plugins/Makefile.am*.
+````
+if HAVE_ZSTD
+noinst_LTLIBRARIES += libh5zstd.la
+libh5zstd_la_SOURCES = H5Zzstd.c H5Zzstd.h
+endif
+
+# Need our version of szip if libsz available and we are not using HDF5
+if HAVE_SZ
+noinst_LTLIBRARIES += libh5szip.la
+libh5szip_la_SOURCES = H5Zszip.c H5Zszip.h
+endif
+````
+
+#### CMakeLists.txt
+Analogous to *configure.ac*, a block like
+this needs to be in *netcdf-c/CMakeLists.txt*.
+````
+FIND_PACKAGE(Zstd)
+set_std_filter(Zstd)
+````
+The FIND_PACKAGE requires a CMake module for the filter
+in the cmake/modules directory.
+The *set_std_filter* function is a macro.
+
+An entry in the file config.h.cmake.in will also be needed.
+````
+/* Define to 1 if zstd library available. */
+#cmakedefine HAVE_ZSTD 1
+````
+
+### Implementation Template
+As a template, here is the implementation for zstandard.
+It can be used as the template for adding other standard filters.
+It is currently located in *netcdf-c/libdispatch/dfilter.c*, but
+could be anywhere as indicated above.
+````
+#ifdef HAVE_ZSTD
+int
+nc_def_var_zstandard(int ncid, int varid, int level)
+{
+    int stat = NC_NOERR;
+    unsigned ulevel;
+    
+    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
+    /* Filter is available */
+    /* Level must be between -131072 and 22 on Zstandard v. 1.4.5 (~202009)
+       Earlier versions have fewer levels (especially fewer negative levels) */
+    if (level < -131072 || level > 22)
+        return NC_EINVAL;
+    ulevel = (unsigned) level; /* Keep bit pattern */
+    if((stat = nc_def_var_filter(ncid,varid,H5Z_FILTER_ZSTD,1,&ulevel))) goto done;
+done:
+    return stat;
+}
+
+int
+nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int *levelp)
+{
+    int stat = NC_NOERR;
+    size_t nparams;
+    unsigned params = 0;
+    int hasfilter = 0;
+    
+    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
+    /* Filter is available */
+    /* Get filter info */
+    stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,NULL);
+    if(stat == NC_ENOFILTER) {stat = NC_NOERR; hasfilter = 0; goto done;}
+    if(stat != NC_NOERR) goto done;
+    hasfilter = 1;
+    if(nparams != 1) {stat = NC_EFILTER; goto done;}
+    if((stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,&params))) goto done;
+done:
+    if(levelp) *levelp = (int)params;
+    if(hasfilterp) *hasfilterp = hasfilter;
+    return stat;
+}
+#endif /*HAVE_ZSTD*/
+````
+
+# Point of Contact {#intern_poc}
+
+*Author*: Dennis Heimbigner<br>
+*Email*: dmh at ucar dot edu<br>
+*Initial Version*: 12/22/2021<br>
+*Last Revised*: 01/25/2022