Mirror of https://github.com/Unidata/netcdf-c.git
Synced 2024-11-27 07:30:33 +08:00
Merge pull request #2249 from DennisHeimbigner/updatedocs.dmh
Update selected documentation
Commit 2404731793
@ -7,6 +7,7 @@ This file contains a high-level description of this package's evolution. Release

## 4.8.2 - TBD

* [Enhancement] Update the documentation to match the current filter capabilities. See [Github #2249](https://github.com/Unidata/netcdf-c/pull/2249).
* [Enhancement] Support installation of pre-built standard filters into a user-specified location. See [Github #2318](https://github.com/Unidata/netcdf-c/pull/2318).
* [Enhancement] Improve filter support. More specifically: (1) add nc_inq_filter_avail to check whether a filter is available, (2) add the notion of standard filters, and (3) clean up szip support to fix its interaction with NCZarr. See [Github #2245](https://github.com/Unidata/netcdf-c/pull/2245).
* [Enhancement] Switch to tinyxml2 as the default XML parser implementation. See [Github #2170](https://github.com/Unidata/netcdf-c/pull/2170).
@ -754,7 +754,8 @@ INPUT = \
@abs_top_srcdir@/docs/COPYRIGHT.md \
@abs_top_srcdir@/docs/credits.md \
@abs_top_srcdir@/docs/tutorial.dox \
@abs_top_srcdir@/docs/internal.dox \
@abs_top_srcdir@/docs/internal.md \
@abs_top_srcdir@/docs/dispatch.md \
@abs_top_srcdir@/docs/inmeminternal.dox \
@abs_top_srcdir@/docs/indexing.dox \
@abs_top_srcdir@/docs/testserver.dox \
15 docs/FAQ.md
@ -1079,9 +1079,22 @@ and writable by programs that used older versions of the libraries.
However, programs linked to older library versions will not be able to
create new data objects with the new less-restrictive names.

How difficult is it to convert my application to handle arbitrary netCDF-4 files? {#How-difficult-is-it-to-convert-my-application-to-handle-arbitrary-netCDF-4-files}
Can I use UTF-8 File Names with Windows? {#Can-I-use-UTF-8-File-Names-with-Windows}
-----------------

Starting with Windows 10 build 17134, Windows can support use of
the UTF-8 character set. We strongly encourage Windows users to
enable this feature. This requires the following steps.

1. In the "run" toolbar, execute the command "intl.cpl".
2. Move to the Administrative tab.
3. Move to "Change system locale".
4. Check the box at the bottom labeled something like
   "Beta: Use Unicode UTF-8 for worldwide language support".

How difficult is it to convert my application to handle arbitrary netCDF-4 files? {#How-difficult-is-it-to-convert-my-application-to-handle-arbitrary-netCDF-4-files}
-----------------

Modifying an application to fully support the new enhanced data model
may be relatively easy or arbitrarily difficult :-), depending on what
@ -9,7 +9,7 @@

# These files will be included with the dist.
EXTRA_DIST = netcdf.m4 DoxygenLayout.xml Doxyfile.in footer.html \
mainpage.dox tutorial.dox \
architecture.dox internal.dox windows-binaries.md \
architecture.dox internal.md windows-binaries.md dispatch.md \
building-with-cmake.md CMakeLists.txt groups.dox notes.md \
install-fortran.md all-error-codes.md credits.md auth.md filters.md \
obsolete/fan_utils.html indexing.dox \
507 docs/dispatch.md
Normal file
@ -0,0 +1,507 @@
Internal Dispatch Table Architecture
============================
<!-- double header is needed to workaround doxygen bug -->

# Internal Dispatch Table Architecture

\tableofcontents

# Introduction {#dispatch_intro}
The netcdf-c library uses an internal dispatch mechanism
as the means for wrapping the netcdf-c API around a wide variety
of underlying storage and stream data formats.
As of last check, the following formats are supported, and each
has its own dispatch table.

Warning: some of the listed function signatures may be out of date;
consult the actual code for the current parameters.

<table>
<tr><th>Format<th>Directory<th>NC_FORMATX Name
<tr><td>NetCDF-classic<td>libsrc<td>NC_FORMATX_NC3
<tr><td>NetCDF-enhanced<td>libhdf5<td>NC_FORMATX_NC_HDF5
<tr><td>HDF4<td>libhdf4<td>NC_FORMATX_NC_HDF4
<tr><td>PNetCDF<td>libsrcp<td>NC_FORMATX_PNETCDF
<tr><td>DAP2<td>libdap2<td>NC_FORMATX_DAP2
<tr><td>DAP4<td>libdap4<td>NC_FORMATX_DAP4
<tr><td>UDF0<td>N.A.<td>NC_FORMATX_UDF0
<tr><td>UDF1<td>N.A.<td>NC_FORMATX_UDF1
<tr><td>NCZarr<td>libnczarr<td>NC_FORMATX_NCZARR
</table>

Note that UDF0 and UDF1 allow user-defined dispatch tables to
be implemented.
The idea is that when a user opens or creates a netcdf file, a
specific dispatch table is chosen. A dispatch table is a struct
containing an entry for (almost) every function in the netcdf-c API.
During execution, netcdf API calls are channeled through that
dispatch table to the appropriate function implementing that
API call. The functions in the dispatch table are not quite the
same as those defined in *netcdf.h*. For simplicity and
compactness, some netcdf.h API calls are mapped to the same
dispatch table function. In addition to the functions, the first
entry in the table defines the model that this dispatch table
implements; it will be one of the NC_FORMATX_XXX values.
The second entry in the table is the version of the dispatch table.
The rule is that existing entries may not be removed, but new entries
may be added, and adding new entries increases the version number.

The dispatch table represents a distillation of the netcdf API down to
a minimal set of internal operations. The format of the dispatch table
is defined in the file *libdispatch/ncdispatch.h*. Every new dispatch
table must define this minimal set of operations.
# Adding a New Dispatch Table
In order to make this process concrete, let us assume we plan to add
an in-memory implementation of netcdf-3.

## Defining configure.ac flags

Define an *--enable* flag option in *configure.ac*. For our
example, we assume the option "--enable-ncm" and the corresponding
internal flag "enable_ncm". If you examine the existing
*configure.ac* and see how, for example, *--enable-dap2* is
defined, it should be clear how to do the same for your code.
## Defining a "name space"

Choose a prefix of characters to identify the new dispatch
system; in effect we are defining a name space. For our in-memory
system, we choose "NCM" and "ncm". NCM is used for non-static
procedures entered into the dispatch table, and ncm for all other
non-static procedures. Note that the chosen prefix should probably start
with "nc" or "NC" in order to avoid name conflicts outside the netcdf-c library.
## Extend include/netcdf.h

Modify the file *include/netcdf.h* to add an NC_FORMATX_XXX flag
for this dispatch format at the appropriate places.
````
#define NC_FORMATX_NCM 7
````

Add any format-specific new error codes.
````
#define NC_ENCM (?)
````
## Extend include/ncdispatch.h

Modify the file *include/ncdispatch.h* to
add format-specific data and initialization functions;
note the use of our NCM namespace.
````
#ifdef ENABLE_NCM
extern NC_Dispatch* NCM_dispatch_table;
extern int NCM_initialize(void);
#endif
````
## Define the dispatch table functions

Define the functions necessary to fill in the dispatch table. As a
rule, we assume that a new directory, say *libsrcm*, is defined. Within
this directory, we need to define *Makefile.am* and *CMakeLists.txt*.
We also need to define the source files
containing the dispatch table and the functions to be placed in the
dispatch table -- call them *ncmdispatch.c* and *ncmdispatch.h*. Look at
*libsrc/nc3dispatch.[ch]* or *libnczarr/zdispatch.[ch]* for examples.

Similarly, it is best to take existing *Makefile.am* and *CMakeLists.txt*
files (from *libsrcp*, for example) and modify them.
## Adding the dispatch code to libnetcdf

Provide for the inclusion of this library in the final libnetcdf
library. This is accomplished by modifying *liblib/Makefile.am* by
adding something like the following.
````
if ENABLE_NCM
libnetcdf_la_LIBADD += $(top_builddir)/libsrcm/libnetcdfm.la
endif
````
## Extend library initialization

Modify the *NC_initialize* function in *liblib/nc_initialize.c* by adding
appropriate references to the NCM dispatch function.
````
#ifdef ENABLE_NCM
extern int NCM_initialize(void);
#endif
...
int NC_initialize(void)
{
    ...
#ifdef ENABLE_NCM
    if((stat = NCM_initialize())) return stat;
#endif
    ...
}
````

Finalization is handled in an analogous fashion.
## Testing the new dispatch table

Add a directory of tests: *ncm_test*, say. The file *ncm_test/Makefile.am*
will look something like this.
````
# These files are created by the tests.
CLEANFILES = ...
# These are the tests which are always run.
TESTPROGRAMS = test1 test2 ...
test1_SOURCES = test1.c ...
...
# Set up the tests.
check_PROGRAMS = $(TESTPROGRAMS)
TESTS = $(TESTPROGRAMS)
# Any extra files required by the tests
EXTRA_DIST = ...
````
# Top-Level build of the dispatch code

Provide for *libnetcdfm* to be constructed by adding the following to
the top-level *Makefile.am*.

````
if ENABLE_NCM
NCM=libsrcm
NCMTESTDIR=ncm_test
endif
...
SUBDIRS = ... $(DISPATCHDIR) $(NCM) ... $(NCMTESTDIR)
````
# Choosing a Dispatch Table

The dispatch table is ultimately chosen by the function
NC_infermodel() in libdispatch/dinfermodel.c. This function is
invoked by the NC_create and NC_open procedures. This can,
unfortunately, be a complex process. The detailed operation of
NC_infermodel() is described in the companion document docs/dinternal.md.

In any case, the choice of dispatch table is currently based on the following
pieces of information.

1. The mode argument -- this can be used to detect, for example, what kind
of file to create: netcdf-3, netcdf-4, 64-bit netcdf-3, etc.
Using a mode flag is the most common mechanism, in which case
*netcdf.h* needs to be modified to define the relevant mode flag.

2. The file path -- this can be used to detect, for example, a DAP URL
versus a normal file-system file. If the path looks like a URL, then
the fragment part of the URL is examined to determine the specific
dispatch function.

3. The file contents -- when the contents of a real file are available,
they can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc_open*.

4. Whether the file is being opened or being created.

5. Whether parallel IO is available.

The *NC_infermodel* function returns two values.

1. model -- this is used by nc_open and nc_create to choose the dispatch table.
2. newpath -- in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.
# Special Dispatch Table Signatures

The entries in the dispatch table do not necessarily correspond
one-to-one to the external API. In many cases, multiple related API functions
are merged into a single dispatch table entry.

## Create/Open

The create entry and the open entry in the dispatch table
have the following signatures, respectively.
````
int (*create)(const char *path, int cmode,
              size_t initialsz, int basepe, size_t *chunksizehintp,
              int useparallel, void* parameters,
              struct NC_Dispatch* table, NC* ncp);

int (*open)(const char *path, int mode,
            int basepe, size_t *chunksizehintp,
            int use_parallel, void* parameters,
            struct NC_Dispatch* table, NC* ncp);
````

The key difference is that these are the union of all the possible
create/open signatures from the include/netcdfXXX.h files. Note especially the last
three parameters. The parameters argument is a pointer to arbitrary data
that provides extra information to the dispatcher.
The table argument is included in case the create
function (e.g. *NCM_create*) needs to invoke other dispatch
functions. The very last argument, ncp, is a pointer to an NC
instance. The raw NC instance will have been created by *libdispatch/dfile.c*
and is passed to e.g. open with the expectation that it will be filled in
by the dispatch open function.
## Accessing Data with put_vara() and get_vara()

````
int (*put_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                const void *value, nc_type memtype);
````

````
int (*get_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                void *value, nc_type memtype);
````

Most of the parameters are similar to the netcdf API parameters. The
last parameter, however, is the type of the data in
memory. Additionally, instead of using an "int islong" parameter, the
memtype will be either ::NC_INT or ::NC_INT64, depending on the value
of sizeof(long). This means that even netcdf-3 code must be prepared
to encounter the ::NC_INT64 type.
## Accessing Attributes with put_att() and get_att()

````
int (*get_att)(int ncid, int varid, const char *name,
               void *value, nc_type memtype);
````

````
int (*put_att)(int ncid, int varid, const char *name, nc_type datatype, size_t len,
               const void *value, nc_type memtype);
````

Again, the key difference is the memtype parameter. As with
put/get_vara, it uses ::NC_INT64 to encode the long case.
## Pre-defined Dispatch Functions

It is sometimes not necessary to implement all the functions in the
dispatch table. Some pre-defined functions are available that may be
used in many cases.
## Inquiry Functions

Many of the netCDF inquiry functions operate from an in-memory model of
metadata. Once a file is opened or created, this
in-memory metadata model is kept up to date. Consequently, the inquiry
functions do not depend on the dispatch layer code. These functions
can be used by all dispatch layers that use the internal netCDF
enhanced data model.

- NC4_inq
- NC4_inq_type
- NC4_inq_dimid
- NC4_inq_dim
- NC4_inq_unlimdim
- NC4_inq_att
- NC4_inq_attid
- NC4_inq_attname
- NC4_get_att
- NC4_inq_varid
- NC4_inq_var_all
- NC4_show_metadata
- NC4_inq_unlimdims
- NC4_inq_ncid
- NC4_inq_grps
- NC4_inq_grpname
- NC4_inq_grpname_full
- NC4_inq_grp_parent
- NC4_inq_grp_full_ncid
- NC4_inq_varids
- NC4_inq_dimids
- NC4_inq_typeids
- NC4_inq_type_equal
- NC4_inq_user_type
- NC4_inq_typeid
## NCDEFAULT get/put Functions

The mapped (varm) get/put functions have been
implemented in terms of the array (vara) functions. So dispatch layers
need only implement the vara functions, and can use the following
functions to get the varm functions:

- NCDEFAULT_get_varm
- NCDEFAULT_put_varm

For the netcdf-3 format, the strided functions (nc_get/put_vars)
are similarly implemented in terms of the vara functions, so the following
convenience functions are available.

- NCDEFAULT_get_vars
- NCDEFAULT_put_vars

For the netcdf-4 format, the vars functions actually exist, so
the default vars functions are not used.
## Read-Only Functions

Some dispatch layers are read-only (e.g. HDF4). Any function that
writes to a file, including nc_create(), needs to return error code
::NC_EPERM. The following read-only functions are available so that
they don't have to be re-implemented in each read-only dispatch layer:

- NC_RO_create
- NC_RO_redef
- NC_RO__enddef
- NC_RO_sync
- NC_RO_set_fill
- NC_RO_def_dim
- NC_RO_rename_dim
- NC_RO_rename_att
- NC_RO_del_att
- NC_RO_put_att
- NC_RO_def_var
- NC_RO_rename_var
- NC_RO_put_vara
- NC_RO_def_var_fill
## Classic NetCDF Only Functions

There are two functions that are used only in the classic code. All
other dispatch layers (except PnetCDF) return error ::NC_ENOTNC3 for
these functions. The following functions are provided for this
purpose:

- NOTNC3_inq_base_pe
- NOTNC3_set_base_pe
# HDF4 Dispatch Layer as a Simple Example

The HDF4 dispatch layer is about the simplest possible dispatch
layer. It is read-only and uses the classic model. It will serve as a nice, simple
example of a dispatch layer.

Note that the HDF4 layer is optional in the netCDF build. Not all
users will have HDF4 installed, and those users will not build with
the HDF4 dispatch layer enabled. For this reason HDF4 code is guarded
as follows.
````
#ifdef USE_HDF4
...
#endif /*USE_HDF4*/
````

Code in libhdf4 is only compiled if HDF4 is
turned on in the build.
### The netcdf.h File

In the main netcdf.h file, we have the following:

````
#define NC_FORMATX_NC_HDF4 (3)
````

### The ncdispatch.h File

In ncdispatch.h we have the following:

````
#ifdef USE_HDF4
extern NC_Dispatch* HDF4_dispatch_table;
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````
### The netcdf_meta.h File

The netcdf_meta.h file allows for easy determination of what features
are in use. For HDF4, it contains the following, set by configure:
````
...
#define NC_HAS_HDF4 0 /*!< HDF4 support. */
...
````
### The hdf4dispatch.h File

The file *hdf4dispatch.h* contains prototypes and
macro definitions used within the HDF4 code in libhdf4. This include
file should not be used anywhere except in libhdf4.
### Initialization Code Changes in liblib Directory

The file *nc_initialize.c* is modified to include the following:
````
#ifdef USE_HDF4
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````
### Changes to libdispatch/dfile.c

In order for a dispatch layer to be used, it must be correctly
determined in the functions *NC_open()* or *NC_create()* in *libdispatch/dfile.c*.
HDF4 has a magic number that is detected in
*NC_interpret_magic_number()*, which allows *NC_open* to automatically
detect an HDF4 file.

Once HDF4 is detected, the *model* variable is set to *NC_FORMATX_NC_HDF4*,
and later this is used in a case statement:
````
case NC_FORMATX_NC_HDF4:
    dispatcher = HDF4_dispatch_table;
    break;
````

This sets the dispatcher to the HDF4 dispatcher, which is defined in
the libhdf4 directory.
### Dispatch Table in libhdf4/hdf4dispatch.c

The file *hdf4dispatch.c* contains the definition of the HDF4 dispatch
table. It looks like this:
````
/* This is the dispatch object that holds pointers to all the
 * functions that make up the HDF4 dispatch interface. */
static NC_Dispatch HDF4_dispatcher = {
NC_FORMATX_NC_HDF4,
NC_DISPATCH_VERSION,
NC_RO_create,
NC_HDF4_open,
NC_RO_redef,
NC_RO__enddef,
NC_RO_sync,
...
NC_NOTNC4_set_var_chunk_cache,
NC_NOTNC4_get_var_chunk_cache,
...
};
````
Note that most entries use the predefined dispatch
functions. Functions that start with NC_RO_ are read-only; they return
::NC_EPERM. Functions that start with NC_NOTNC4_ return ::NC_ENOTNC4.

Only the functions that start with NC_HDF4_ need to be implemented for
the HDF4 dispatch layer. There are 6 such functions:
- NC_HDF4_open
- NC_HDF4_abort
- NC_HDF4_close
- NC_HDF4_inq_format
- NC_HDF4_inq_format_extended
- NC_HDF4_get_vara

### HDF4 Reading Code

The code in *hdf4file.c* opens the HDF4 SD dataset and reads the
metadata. This metadata is stored in the netCDF internal metadata
model, allowing the inq functions to work.

The code in *hdf4var.c* does an *nc_get_vara()* on the HDF4 SD
dataset. This is all that is needed for all the nc_get_* functions to
work.
# Point of Contact {#filters_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 12/22/2021
712 docs/filters.md
File diff suppressed because it is too large.

639 docs/internal.md
Normal file
@ -0,0 +1,639 @@
Notes On the Internals of the NetCDF-C Library
============================
<!-- double header is needed to workaround doxygen bug -->

# Notes On the Internals of the NetCDF-C Library {#intern_head}

\tableofcontents
This document attempts to record important information about
the internal architecture and operation of the netcdf-c library.

# 1. Including C++ Code in the netcdf-c Library {#intern_c++}

The state of C compiler technology has reached the point where
it is possible to include C++ code in the netcdf-c library
code base. Two examples are:

1. The AWS S3 SDK wrapper *libdispatch/ncs3sdk.cpp* file.
2. The TinyXML wrapper *ncxml\_tinyxml2.cpp* file.
However, there are some consequences that must be handled for this to work.
Specifically, the compiler must be told that the C++ runtime is needed,
in the following ways.

## Modifications to *lib\_flags.am*
Suppose we have a flag *ENABLE\_XXX*, where the XXX
feature entails using C++ code. Then the following must be added
to *lib\_flags.am*:
````
if ENABLE_XXX
AM_LDFLAGS += -lstdc++
endif
````
## Modifications to *libxxx/Makefile.am*

The Makefile in which the C++ code is included and compiled
(assumed here to be the *libxxx* directory) must set the following.
````
AM_CXXFLAGS = -std=c++11
````
It is possible that other values (e.g. *-std=c++14*) may also work.
# 2. Managing instances of complex data types

For a long time, there have been known problems with the
management of complex types containing VLENs. This also
involves the string type, because it is stored as a VLEN of
chars.

The term complex type refers to any type that directly or
recursively references a VLEN type: an array of VLENs, a
compound with a VLEN field, and so on.

In order to properly handle instances of these complex types, it
is necessary to have functions that can recursively walk
instances of such types to perform various actions on them. The
term "deep" is also used to mean recursive.
Two deep walking operations are provided by the netcdf-c library
to aid in managing instances of complex structures:
* free'ing an instance of the complex type
* copying an instance of the complex type.

Previously, the netcdf-c library did only shallow free and shallow copy of
complex types. This meant that only the top level was properly
free'd or copied; deep internal blocks in the instance were
not touched. This led to a host of memory leaks and failures
when the deep data was effectively shared between the netcdf-c library
internals and the user's data.

Note that the term "vector" is used to mean a contiguous (in
memory) sequence of instances of some type. Given an array with,
say, dimensions 2 X 3 X 4, this will be stored in memory as a
vector of length 2*3*4 = 24 instances.
The use cases are primarily these.

## nc\_get\_vars
Suppose one is reading a vector of instances using nc\_get\_vars
(or nc\_get\_vara or nc\_get\_var, etc.). These functions will
return the vector in the top-level memory provided. All
interior blocks (from nested VLENs or strings) will have been
dynamically allocated. Note that computing the size of the vector
may be tricky because the strides must be taken into account.
After using this vector of instances, it is necessary to free
(aka reclaim) the dynamically allocated memory; otherwise a
memory leak occurs. So the recursive reclaim function is used
to walk the returned instance vector and do a deep reclaim of
the data.

Currently, functions are defined in netcdf.h that are supposed to
handle this: nc\_free\_vlen(), nc\_free\_vlens(), and
nc\_free\_string(). Unfortunately, these functions only do a
shallow free, so deeply nested instances are not properly
handled by them. They are marked in the documentation as
deprecated in favor of the newer recursive functions.
## nc\_put\_vars

Suppose one is writing a vector of instances using nc\_put\_vars
(or nc\_put\_vara or nc\_put\_var, etc.). These functions will
write the contents of the vector to the specified variable.
Note that internally, the data passed to the nc\_put\_xxx function is
written immediately, so there is no need to copy it internally. But the
caller may need to reclaim the vector of data that was created and passed
in to the nc\_put\_xxx function.

After writing this vector of instances, and assuming it was dynamically
created, at some point it will be necessary to reclaim that data.
So again, the recursive reclaim function can be used
to walk the instance vector and do a deep reclaim of
the data.
## nc\_put\_att
Suppose one is writing a vector of instances as the data of an attribute
using, say, nc\_put\_att.

Internally, the incoming attribute data must be copied and stored
so that changes to, or reclamation of, the input data will not affect
the attribute. Note that this copying behavior is different from
writing to a variable, where the data is written immediately.

Again, the code inside the netcdf library used to do only shallow copying
rather than deep copying. As a result, one saw effects such as described
in Github Issue https://github.com/Unidata/netcdf-c/issues/2143.

Also, after defining the attribute, it may be necessary for the user
to free the data that was provided as input to nc\_put\_att(), as in the
nc\_put\_xxx functions described previously.
## nc\_get\_att
Suppose one is reading a vector of instances as the data of an attribute
using, say, nc\_get\_att.

Internally, the existing attribute data must be copied and returned
to the caller, and the caller is responsible for reclaiming
the returned data.

Again, the code inside the netcdf library used to do only shallow copying
rather than deep copying. This could lead to memory leaks and errors
because the deep data was shared between the library and the user.
## New Instance Walking API

Proper recursive functions were added to the netcdf-c library to
provide reclaim and copy functions and use them as needed.
These functions are defined in libdispatch/dinstance.c and their
signatures are declared in include/netcdf.h. For backward
compatibility, corresponding "ncaux\_XXX" functions are defined
in include/netcdf\_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants. The first two, nc\_reclaim\_data() and
nc\_copy\_data(), assume the top-level vector is managed by the
caller. For reclaim, this is so the user can use, for example, a
statically allocated vector. For copy, it assumes the user
provides the space into which the copy is stored.

The second two, nc\_reclaim\_data\_all() and
nc\_copy\_data\_all(), allow the functions to manage the
top level. So for nc\_reclaim\_data\_all, the top level is
assumed to be dynamically allocated and will be free'd by
nc\_reclaim\_data\_all(). The nc\_copy\_data\_all() function
will allocate the top level and return a pointer to it to the
user. The user can later pass that pointer to
nc\_reclaim\_data\_all() to reclaim the instance(s).
# Internal Changes
The netcdf-c library internals have been changed to use the proper reclaim
and copy functions. This also allows some simplification of the code,
since the stdata and vldata fields of NC\_ATT\_INFO are no longer needed.
Currently this code is commented out using the SEPDATA \#define macro.
When the bugs are found and fixed, all this code will be removed.
## Optimizations

In order to make these functions as efficient as possible, it is
desirable to classify all types as to whether or not they contain
variable-size data. If a type is fixed size (i.e. does not contain
variable-size data), then it can be freed or copied as a single chunk.
This significantly increases the performance for such types.
For variable-size types, it is necessary to walk each instance of the type
and recursively reclaim or copy it. As another optimization,
if the type is a vector of strings, then the per-instance walk can be
sped up by doing the reclaim or copy inline.
|
||||
|
||||
The rules for classifying types as fixed or variable size are as follows.
|
||||
|
||||
1. All atomic types, except string, are fixed size.
|
||||
2. All enum type and opaque types are fixed size.
|
||||
3. All string types and VLEN types are variable size.
|
||||
4. A compound type is fixed size if all of the types of its
|
||||
fields are fixed size. Otherwise it has variable size.
|
||||
|
||||
The classification of types can be made at the time the type is defined
|
||||
or is read in from some existing file. The reclaim and copy functions
|
||||
use this information to speed up the handling of fixed size types.
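The classification rules above can be sketched as a small recursive predicate over hypothetical type descriptors (`toy_type_t` and its fields are illustrative, not the library's internal representation):

```c
#include <stddef.h>

/* Illustrative type classes mirroring the rules above. */
typedef enum { TC_ATOMIC, TC_STRING, TC_ENUM, TC_OPAQUE, TC_VLEN, TC_COMPOUND } toy_class_t;

typedef struct toy_type {
    toy_class_t cls;
    size_t nfields;                  /* for compound types */
    const struct toy_type **fields;  /* field types, for compound */
} toy_type_t;

/* Return 1 if the type is fixed size, 0 if it contains variable-size data. */
static int toy_fixed_size(const toy_type_t *t) {
    switch (t->cls) {
    case TC_ATOMIC: case TC_ENUM: case TC_OPAQUE:
        return 1;                         /* rules 1 and 2 */
    case TC_STRING: case TC_VLEN:
        return 0;                         /* rule 3 */
    case TC_COMPOUND:                     /* rule 4: fixed iff all fields fixed */
        for (size_t i = 0; i < t->nfields; i++)
            if (!toy_fixed_size(t->fields[i])) return 0;
        return 1;
    }
    return 0;
}
```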

# Warnings

1. The new API functions require that the type information be
   accessible. This means that you cannot use these functions
   after the file has been closed. After the file is closed, you
   are on your own.

2. There is still one known failure that has not been solved; it is
   possibly an HDF5 memory leak. All the failures revolve around
   some variant of this .cdl file. The proximate cause of failure is
   the use of a VLEN FillValue.
````
netcdf x {
types:
  float(*) row_of_floats ;
dimensions:
  m = 5 ;
variables:
  row_of_floats ragged_array(m) ;
  row_of_floats ragged_array:_FillValue = {-999} ;
data:
  ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32},
                 {40, 41}, _ ;
}
````

# 3. Inferring File Types

As described in the companion document -- docs/dispatch.md --
when nc\_create() or nc\_open() is called, it must figure out what
kind of file is being created or opened. Once it has figured out
the file kind, the appropriate "dispatch table" can be used
to process that file.

## The Role of URLs

Figuring out the kind of file is referred to as model inference
and is, unfortunately, a complicated process. The complication
is mostly a result of allowing a path argument to be a URL.
Inferring the file kind from a URL requires deep processing of
the URL structure: the protocol, the host, the path, and the fragment
parts in particular. The query part is currently not used because
it usually contains information to be processed by the server
receiving the URL.

The "fragment" part of the URL may be unfamiliar.
A URL may optionally end with a fragment, which occupies
this position in a pseudo URL specification:
````
<protocol>://<host>/<path>?<query>#<fragment>
````
The form of the fragment is similar to a query and takes this general form:
````
'#'<key>=<value>&<key>=<value>&...
````
Each key is a simple name and each value is an arbitrary sequence of characters,
although URL special characters such as '&' must be URL encoded in
the '%XX' form, where each X is a hexadecimal digit.
A (nonsensical) example might look like this:
````
https://host.com/path#mode=nczarr,s3&bytes
````
It is important to note that the fragment part is not intended to be
passed to the server, but rather is processed by the client program.
It is this property that allows the netcdf-c library to use it to
pass information deep into the dispatch table code that is processing the
URL.

## Model Inference Inputs

The inference algorithm is given the following information
from which it must determine the kind of file being accessed.

### Mode

The mode is a set of flags that are passed as the second
argument to nc\_create and nc\_open. The set of flags is defined in
the netcdf.h header file. Generally it specifies the overall
format of the file: netcdf-3 (classic) or netcdf-4 (enhanced).
Variants of these can also be specified, e.g. 64-bit netcdf-3 or
classic netcdf-4.
In the case where the path argument is a simple file path,
using a mode flag is the most common mechanism for specifying
the model.

### Path
The file path, the first argument to nc\_create and nc\_open,
can be either a simple file path or a URL.
If it is a URL, then it will be deeply inspected to determine
the model.

### File Contents
When the contents of a real file are available,
they can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc\_open*.
It also requires access to functions that can open and read at least
the initial part of the file.
Typically, a small initial prefix of the file is read
and examined to see if it matches any of the so-called
"magic numbers" that indicate the kind of file being read.
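As an illustration of magic-number matching, the sketch below checks a buffer holding the first bytes of a file against the well-known classic-netCDF and HDF5 signatures. It is a simplified model: the library's actual checker recognizes more formats, and `guess_kind` is a hypothetical helper.

```c
#include <string.h>

/* Match the initial bytes of a file against well-known signatures. */
static const char *guess_kind(const unsigned char *buf, size_t n) {
    if (n >= 4 && memcmp(buf, "CDF\001", 4) == 0) return "classic";
    if (n >= 4 && memcmp(buf, "CDF\002", 4) == 0) return "64-bit offset";
    if (n >= 4 && memcmp(buf, "CDF\005", 4) == 0) return "cdf5";
    if (n >= 8 && memcmp(buf, "\211HDF\r\n\032\n", 8) == 0) return "hdf5";
    return "unknown";
}
```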

### Open vs Create
Is the file being opened or is it being created?

### Parallelism
Is parallel IO available?

## Model Inference Outputs
The inference algorithm outputs two pieces of information.

1. model -- this is used by nc\_open and nc\_create to choose the dispatch table.
2. newpath -- in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.

The model output is actually a struct containing two fields:

1. implementation -- this is a value from the NC\_FORMATX\_xxx
   values in netcdf.h. It generally determines the dispatch
   table to use.
2. format -- this is an NC\_FORMAT\_xxx value defining, in effect,
   the netcdf format to which the underlying format is to be
   translated. Thus it can tell the netcdf-3 dispatcher that it
   should actually implement CDF5 rather than standard netcdf classic.

## The Inference Algorithm

The construction of the model is primarily carried out by the function
*NC\_infermodel()* (in *libdispatch/dinfermodel.c*).
It is given the following parameters:

1. path -- (IN) absolute file path or URL
2. modep -- (IN/OUT) the set of mode flags given to *NC\_open* or *NC\_create*.
3. iscreate -- (IN) distinguish open from create.
4. useparallel -- (IN) indicate if parallel IO can be used.
5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
6. model -- (IN/OUT) place to store the inferred model.
7. newpathp -- (OUT) the canonical rewrite of the path argument.

As a rule, these values are used in this order of preference
to infer the model.

1. file contents -- highest precedence
2. URL (if it is one) -- using the "mode=" key in the fragment (see below).
3. mode flags
4. default format -- lowest precedence

The sequence of steps is as follows.

### URL Processing -- processuri()

If the path appears to be a URL, then it is parsed
and processed by the processuri function as follows.

1. Protocol --
   The protocol is extracted and tested against the list of
   legal protocols. If not found, then it is an error.
   If found, then it is replaced by a substitute -- if one is specified.
   So, for example, the protocol "dods" is replaced by the protocol "http"
   (note that at some point "http" will be replaced with "https").
   Additionally, one or more "key=value" strings are appended
   to the existing fragment of the URL. So, again for "dods",
   the fragment is extended by the string "mode=dap2".
   Thus replacing "dods" does not lose information, but rather transfers
   it to the fragment for later use.

2. Fragment --
   After the protocol is processed, the initial fragment processing occurs
   by converting it to a list data structure of the form
````
{<key>,<value>,<key>,<value>,<key>,<value>....}
````

### Macro Processing -- processmacros()

If the fragment list produced by processuri() is non-empty, then
it is processed for "macros". Notice that if the original path
was not a URL, then the fragment list is empty and this
processing is bypassed. In any case, it is convenient to
allow some singleton fragment keys to be expanded into larger
fragment components. In effect, those singletons act as
macros. They can help to simplify the user's URL. The term
singleton means a fragment key with no associated value:
"#bytes", for example.

The list of fragments is searched for keys whose
value part is NULL or the empty string. The table
of macros is then searched for each such key and, if found,
the corresponding key and value are appended to the fragment list
and the singleton is removed.

### Mode Inference -- processinferences()

This function processes the list of values associated
with the "mode" key. It is similar to macro processing in that
certain mode values are added or removed based on tables
of "inferences" and "negations".
Again, the purpose is to allow users to provide simplified URL fragments.

The list of mode values is repeatedly searched, and whenever a value
is found that is in the "modeinferences" table, the associated inference value
is appended to the list of mode values. This process stops when no changes
occur. This form of inference allows the user to specify "mode=zarr"
and have it converted to "mode=nczarr,zarr". This avoids the need for the
dispatch table code to do the same inference.

After the inferences are made, the list of mode values is again
repeatedly searched, and whenever a value
is found that is in the "modenegations" table, the associated negation value
is removed from the list of mode values, assuming it is there. This process
stops when no changes occur. This form of inference allows the user to ensure
that "mode=bytes,nczarr" has the bytes mode take precedence by removing the
"nczarr" value. Such illegal combinations can occur because of previous
processing steps.
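The repeated-search-until-stable behavior can be sketched as a fixed-point loop. The single rule here ("zarr" implies "nczarr") is purely illustrative; the library's modeinferences table contains more entries, and `apply_inferences` is a hypothetical helper.

```c
#include <string.h>

#define MAXMODES 16

/* Return 1 if mode m is already in the list. */
static int has_mode(char **modes, int n, const char *m) {
    for (int i = 0; i < n; i++)
        if (strcmp(modes[i], m) == 0) return 1;
    return 0;
}

/* Repeatedly append inferred mode values until nothing changes;
   returns the new count. */
static int apply_inferences(char **modes, int n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        /* illustrative inference rule: "zarr" => add "nczarr" */
        if (has_mode(modes, n, "zarr") && !has_mode(modes, n, "nczarr")
            && n < MAXMODES) {
            modes[n++] = "nczarr";
            changed = 1;
        }
    }
    return n;
}
```

Negations would run as a second, analogous loop that removes entries instead of adding them.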

### Fragment List Normalization
As the fragment list is processed, duplicates with the same key can appear.
A function -- cleanfragments() -- is applied to clean up the fragment list
by coalescing the values of duplicate keys and removing duplicate key values.

### S3 Rebuild
If the URL is determined to be a reference to a resource on the Amazon S3 cloud,
then the URL needs to be converted to what is called "path format".
There are four S3 URL formats:

1. Virtual -- ````https://<bucket>.s3.<region>.amazonaws.com/<path>````
2. Path -- ````https://s3.<region>.amazonaws.com/<bucket>/<path>````
3. S3 -- ````s3://<bucket>/<path>````
4. Other -- ````https://<host>/<bucket>/<path>````

The S3 processing converts all of these to the Path format. In the "S3" format case,
it is necessary to find, or default, the region by examining the ".aws" directory files.
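As an illustration, converting the Virtual form to the Path form is a string rewrite along these lines (`virtual_to_path` is a hypothetical helper with minimal error handling, not the library's S3 code):

```c
#include <stdio.h>
#include <string.h>

/* Rewrite https://<bucket>.s3.<region>.amazonaws.com/<path>
   into    https://s3.<region>.amazonaws.com/<bucket>/<path>.
   Returns 0 on success, -1 if the URL is not in virtual form. */
static int virtual_to_path(const char *url, char *out, size_t outlen) {
    const char *host = strstr(url, "://");
    if (!host) return -1;
    host += 3;
    const char *dot = strstr(host, ".s3.");  /* end of the bucket name */
    const char *slash = strchr(host, '/');   /* start of the object path */
    if (!dot || !slash || dot > slash) return -1;
    int blen = (int)(dot - host);            /* bucket length */
    int hlen = (int)(slash - (dot + 1));     /* new host length */
    snprintf(out, outlen, "https://%.*s/%.*s%s", hlen, dot + 1, blen, host, slash);
    return 0;
}
```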

### File Rebuild
If the URL protocol is "file" and its path is a relative file path,
then it is made absolute by prepending the path of the current working directory.

In any case, after the S3 or File rebuild, the URL is completely
rebuilt using any modified protocol, host, path, and
fragments. The query is left unchanged by the current algorithm.
The resulting rebuilt URL is passed back to the caller.

### Mode Key Processing
The values of the fragment's "mode" key are processed one by one
to see if it is possible to determine the model.
There is a table of format interpretations that maps a mode value
to the model's implementation and format. So for example,
if the mode value "dap2" is encountered, then the model
implementation is set to NC\_FORMATX\_DAP2 and the format
is set to NC\_FORMAT\_CLASSIC.
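A sketch of such a table lookup follows. The entries are illustrative: only the dap2 row comes from the text above, and the constants are represented as strings rather than the real NC\_FORMATX\_/NC\_FORMAT\_ values.

```c
#include <string.h>

/* Hypothetical mode-value table: each known mode value maps to an
   (implementation, format) pair. */
struct modemap { const char *mode, *impl, *format; };

static const struct modemap modetable[] = {
    {"dap2",   "NC_FORMATX_DAP2",   "NC_FORMAT_CLASSIC"},  /* from the text */
    {"nczarr", "NC_FORMATX_NCZARR", "NC_FORMAT_NETCDF4"},  /* illustrative */
    {NULL, NULL, NULL}
};

/* Return the table entry for a mode value, or NULL if unknown. */
static const struct modemap *lookup_mode(const char *mode) {
    for (const struct modemap *m = modetable; m->mode; m++)
        if (strcmp(m->mode, mode) == 0) return m;
    return NULL;
}
```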

### Non-Mode Key Processing
If processing the mode does not identify the implementation, then
all other fragment keys are processed to see if the implementation
(and format) can be deduced. Currently this does nothing.

### URL Defaults
If the model is still not determined and the path is a URL, then
the implementation defaults to DAP2. This is for backward
compatibility with the time when all URLs implied DAP2.

### Mode Flags
In the event that the path is not a URL, then it is necessary
to use the mode flags and the useparallel argument to choose a model.
This is just a straightforward flag-checking exercise.

### Content Inference -- check\_file\_type()
If the path is being opened (as opposed to created), then
it may be possible to actually read the first few bytes of the
resource specified by the path and use that to determine the
model. If this succeeds, then it takes precedence over
all other model inferences.

### Flag Consistency
Once the model is known, the set of mode flags
is modified to be consistent with that information.
So for example, if DAP2 is the model, then all netcdf-4 mode flags
and some netcdf-3 flags are removed from the set of mode flags
because DAP2 provides only the standard netcdf-classic format.

# 4. Adding a Standard Filter

The standard filter system extends the netcdf-c library API to
support a fixed set of "standard" filters. This is similar to the
way that deflate and szip are currently supported.
For background, the file filter.md should be consulted.

In general, the API for a standard filter has the following prototypes.
The case of zstandard (libzstd) is used as an example.
````
int nc_def_var_zstandard(int ncid, int varid, int level);
int nc_inq_var_zstandard(int ncid, int varid, int* has_filterp, int* levelp);
````
So generally the API takes the ncid and the varid as fixed arguments, followed by
a list of parameters specific to the filter -- level in this case.
For the inquiry function, there is an additional argument -- has_filterp --
that is set to 1 if the filter is defined for the given variable
and to 0 if not.
The remaining inquiry parameters are pointers to memory
into which the parameter values are stored -- levelp in this case.

It is important to note that including a standard filter still
requires three supporting objects:

1. The implementing library for the filter. For example,
   libzstd must be installed in order to use the zstandard
   API.
2. An HDF5 wrapper for the filter must be installed in the
   directory pointed to by the HDF5_PLUGIN_PATH environment
   variable.
3. (Optional) An NCZarr Codec implementation must be installed
   in the HDF5_PLUGIN_PATH directory.

## Adding a New Standard Filter

The implementation of a standard filter must be loaded from one
of several locations.

1. It can be part of libnetcdf.so (preferred),
2. it can be loaded as part of the client code,
3. or it can be loaded as part of an external library such as libccr.

However, the three objects listed above need to be
stored in the HDF5_PLUGIN_PATH directory, so adding a standard
filter still requires modification of the netcdf build system.
This limitation may be lifted in the future.

### Build Changes
In order to detect a standard filter library, the following changes
must be made for Automake (configure.ac/Makefile.am)
and CMake (CMakeLists.txt).

#### Configure.ac
Configure.ac must have a block similar to this one that locates
the implementing library.
````
# See if we have libzstd
AC_CHECK_LIB([zstd],[ZSTD_compress],[have_zstd=yes],[have_zstd=no])
if test "x$have_zstd" = "xyes" ; then
   AC_SEARCH_LIBS([ZSTD_compress],[zstd zstd.dll cygzstd.dll], [], [])
   AC_DEFINE([HAVE_ZSTD], [1], [if true, zstd library is available])
fi
AC_MSG_CHECKING([whether libzstd library is available])
AC_MSG_RESULT([${have_zstd}])
````
Note that the entry point (*ZSTD_compress*) is library dependent
and is used to see if the library is available.

#### Makefile.am

It is assumed you have an HDF5 wrapper for zstd. If you want it
to be built as part of the netcdf-c library, then you need to
add the following to *netcdf-c/plugins/Makefile.am*.
````
if HAVE_ZSTD
noinst_LTLIBRARIES += libh5zstd.la
libh5zstd_la_SOURCES = H5Zzstd.c H5Zzstd.h
endif

# Need our version of szip if libsz available and we are not using HDF5
if HAVE_SZ
noinst_LTLIBRARIES += libh5szip.la
libh5szip_la_SOURCES = H5Zszip.c H5Zszip.h
endif
````
#### CMakeLists.txt
In an analog to *configure.ac*, a block like
this needs to be in *netcdf-c/CMakeLists.txt*.
````
FIND_PACKAGE(Zstd)
set_std_filter(Zstd)
````
The FIND_PACKAGE call requires a CMake module for the filter
in the cmake/modules directory.
The *set_std_filter* function is a macro.

An entry in the file config.h.cmake.in will also be needed.
````
/* Define to 1 if zstd library available. */
#cmakedefine HAVE_ZSTD 1
````

### Implementation Template
As a template, here is the implementation for zstandard.
It can be used as the template for adding other standard filters.
It is currently located in *netcdf-c/libdispatch/dfilter.c*, but
could be anywhere, as indicated above.
````
#ifdef HAVE_ZSTD
int
nc_def_var_zstandard(int ncid, int varid, int level)
{
    int stat = NC_NOERR;
    unsigned ulevel;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Level must be between -131072 and 22 on Zstandard v. 1.4.5 (~202009)
       Earlier versions have fewer levels (especially fewer negative levels) */
    if(level < -131072 || level > 22)
        return NC_EINVAL;
    ulevel = (unsigned) level; /* Keep bit pattern */
    if((stat = nc_def_var_filter(ncid,varid,H5Z_FILTER_ZSTD,1,&ulevel))) goto done;
done:
    return stat;
}

int
nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int *levelp)
{
    int stat = NC_NOERR;
    size_t nparams;
    unsigned params = 0;
    int hasfilter = 0;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Get filter info */
    stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,NULL);
    if(stat == NC_ENOFILTER) {stat = NC_NOERR; hasfilter = 0; goto done;}
    if(stat != NC_NOERR) goto done;
    hasfilter = 1;
    if(nparams != 1) {stat = NC_EFILTER; goto done;}
    if((stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,&params))) goto done;
done:
    if(levelp) *levelp = (int)params;
    if(hasfilterp) *hasfilterp = hasfilter;
    return stat;
}
#endif /*HAVE_ZSTD*/
````

# Point of Contact {#intern_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 01/25/2022

<a name="ref_xarray">[7]</a> [XArray Zarr Encoding Specification](http://xarray.pydata.org/en/latest/internals.html#zarr-encoding-specification)<br>
<a name="ref_xarray">[8]</a> [Dynamic Filter Loading](https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf)<br>
<a name="ref_xarray">[9]</a> [Officially Registered Custom HDF5 Filters](https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins)<br>
<a name="ref_xarray">[10]</a> [C-Blosc Compressor Implementation](https://github.com/Blosc/c-blosc)<br>
<a name="ref_awssdk_conda">[11]</a> [Conda-forge / packages / aws-sdk-cpp](https://anaconda.org/conda-forge/aws-sdk-cpp)<br>
<a name="ref_gdal">[12]</a> [GDAL Zarr](https://gdal.org/drivers/raster/zarr.html)<br>

# Appendix A. Building NCZarr Support {#nczarr_build}

The necessary CMake flags are as follows (with defaults):

1. -DENABLE_NCZARR=off -- equivalent to the Automake _--disable-nczarr_ option.
2. -DENABLE_NCZARR_S3=off -- equivalent to the Automake _--enable-nczarr-s3_ option.
3. -DENABLE_NCZARR_S3_TESTS=off -- equivalent to the Automake _--enable-nczarr-s3-tests_ option.

Building this package from scratch has proven to be a formidable task.
This appears to be due to dependencies on very specific versions of,
for example, openssl.

## *\*nix\** Build

For Linux, the following context works. Of course your mileage may vary.
* OS: ubuntu 21

Note that the limit is defined in terms of bytes and not (Unicode) characters.
This affects the depth to which groups can be nested, because the key encodes the full path name of a group.

# Appendix D. Alternative Mechanisms for Accessing Remote Datasets {#nczarr_altremote}

The NetCDF-C library contains an alternate mechanism for accessing traditional netcdf-4 files stored in Amazon S3: the byte-range mechanism.
The idea is to treat the remote data as if it were a big file.

Specifically, Thredds servers support such access using the HttpServer access method, as in this example:
````
https://thredds-test.unidata.ucar.edu/thredds/fileServer/irma/metar/files/METAR_20170910_0000.nc#bytes
````

# Appendix E. AWS Selection Algorithms {#nczarr_awsselect}

If byterange support is enabled, the netcdf-c library will parse the files

Picking an access-key/secret-key pair is always determined
by the current active profile. To choose not to use keys,
the active profile must be "none".

# Appendix F. NCZarr Version 1 Meta-Data Representation {#nczarr_version1}

In NCZarr Version 1, the NCZarr-specific metadata was represented using new objects rather than as keys in existing Zarr objects.
Due to conflicts with the Zarr specification, that format is deprecated in favor of the one described above.

The content of these objects is the same as the contents of the corresponding keys:

* ''.nczarray <=> ''_NCZARR_ARRAY_''
* ''.nczattr <=> ''_NCZARR_ATTR_''

# Appendix G. JSON Attribute Convention {#nczarr_json}

An attribute may be encountered on read whose value, when parsed
as JSON, is a dictionary. As a special convention, the value is
converted to a string and stored as the value of the attribute,
and the type of the attribute is treated as char.

When writing a character-valued attribute, its value is examined
to see if it looks like a JSON dictionary (i.e. "{...}")
and is parseable as JSON.
If so, then the attribute value is treated as one long string,
parsed as JSON, and stored in the .zattr file in JSON form.

These conventions are intended to help support various
attributes created by other packages where the attribute is a
complex JSON dictionary. An example is the GDAL Driver
convention <a href="#ref_gdal">[12]</a>. The value is a complex
JSON dictionary, and it is desirable to both read and write that kind of
information through the netcdf API.

# Point of Contact {#nczarr_poc}

__Author__: Dennis Heimbigner<br>