Notes On the Internals of the NetCDF-C Library
============================
<!-- double header is needed to workaround doxygen bug -->

# Notes On the Internals of the NetCDF-C Library {#intern_head}

This document attempts to record important information about
the internal architecture and operation of the netcdf-c library.
It covers the following issues.

* [Including C++ Code in the netcdf-c Library](#intern_cpp)
* [Managing instances of variable-length data types](#intern_vlens)
* [Inferring File Types](#intern_infer)
* [Adding a Standard Filter](#intern_filters)
* [Test Interference](#intern_isolation)

# 1. Including C++ Code in the netcdf-c Library {#intern_cpp}

The state of C compiler technology has reached the point where
it is possible to include C++ code into the netcdf-c library
code base. Two examples are:

1. The AWS S3 SDK wrapper *libdispatch/ncs3sdk.cpp* file.
2. The TinyXML wrapper *ncxml\_tinyxml2.cpp* file.

However, there are some consequences that must be handled for this to work.
Specifically, the compiler must be told that the C++ runtime is needed
in the following ways.

## Modifications to *lib\_flags.am*
Suppose we have a flag *ENABLE\_XXX* where that XXX
feature entails using C++ code. Then the following must be added
to *lib\_flags.am*
````
if ENABLE_XXX
AM_LDFLAGS += -lstdc++
endif
````

## Modifications to *libxxx/Makefile.am*

The Makefile in which the C++ code is included and compiled
(assumed here to be the *libxxx* directory) must have this set.
````
AM_CXXFLAGS = -std=c++11
````
It is possible that other values (e.g. *-std=c++14*) may also work.

# 2. Managing instances of variable-length data types {#intern_vlens}

For a long time, there have been known problems with the
management of complex types containing VLENs. This also
involves the string type because it is stored as a VLEN of chars.

The term "variable-length" refers to any
type that directly or recursively references a VLEN type:
an array of VLENs, a compound type with a VLEN field, and so on.

In order to properly handle instances of these variable-length types, it
is necessary to have functions that can recursively walk
instances of such types to perform various actions on them. The
term "deep" is also used to mean recursive.

Two primary deep walking operations are provided by the netcdf-c library
to aid in managing instances of variable-length types:
- free'ing an instance of the type
- copying an instance of the type.

Note that the term "vector" is used in the following text to
mean a contiguous (in memory) sequence of instances of some
type. Given an array with, say, dimensions 2 X 3 X 4, this will
be stored in memory as a vector of length 2\*3\*4=24 instances.

## Special case reclamation functions

Previously, the netcdf-c library provided functions to reclaim an
instance of a subset of the possible variable-length types. It
provided no specialized support for copying instances of
variable-length types.

These specialized functions are still available and can be used
when their pre-conditions are met. They have the advantage of
being faster than the general purpose functions described
later in this document.

These functions, and their pre- and post-conditions, are as follows.

* *int nc_free_string(size_t len, char\* data[])*
  <u>Action</u>: reclaim a vector of variable length strings.
  <u>Pre-condition(s)</u>: the data argument is a vector containing *len* pointers to variable length, nul-terminated strings.
  <u>Post-condition(s)</u>: only the strings in the vector are reclaimed -- the vector itself is not reclaimed.

* *int nc_free_vlens(size_t len, nc_vlen_t vlens[])*
  <u>Action</u>: reclaim a vector of VLEN instances.
  <u>Pre-condition(s)</u>: the vlens argument is a vector containing *len* VLEN (*nc_vlen_t*) instances.
  <u>Post-condition(s)</u>:
    * only the data pointed to by the VLEN instances (i.e. *nc_vlen_t.p*) in the vector is reclaimed -- the vector itself is not reclaimed.
    * the base type of the VLEN must be a fixed-size type -- this means atomic types in the range NC_CHAR thru NC_UINT64, and compound types where the basetype of each field is itself fixed-size.

* *int nc_free_vlen(nc_vlen_t \*vl)*
  <u>Action</u>: this is equivalent to calling *nc_free_vlens(1,vl)*.

If the pre- and post-conditions are not met, then using these
functions can lead to a host of memory leaks and failures
because the deep variable-length data is in effect
shared between the netcdf-c library internals and the user's data.
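
As a brief illustration of the string case, the following is a minimal sketch that reads a 1-D *NC_STRING* variable and then uses *nc_free_string* to reclaim the strings. The variable id and length *N* are assumed to come from the caller; the names here are hypothetical.
````
/* Hedged sketch: read an NC_STRING variable with N elements,
   then reclaim the strings (but not the vector itself). */
#include <stdlib.h>
#include <netcdf.h>

int read_and_free_strings(int ncid, int varid, size_t N)
{
    int stat = NC_NOERR;
    char** data = (char**)calloc(N, sizeof(char*));
    if(data == NULL) return NC_ENOMEM;
    if((stat = nc_get_var_string(ncid, varid, data)) == NC_NOERR) {
        /* ... use the strings ... */
        stat = nc_free_string(N, data); /* reclaims the individual strings */
    }
    free(data); /* the top-level vector is owned by the caller */
    return stat;
}
````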

## Typical use cases

### *nc\_get\_vars*
Suppose one is reading a vector of instances using nc\_get\_vars
(or nc\_get\_vara or nc\_get\_var, etc.). These functions will
return the vector in the top-level memory provided. All
interior blocks (from nested VLENs or strings) will have been
dynamically allocated. Note that computing the size of the vector
may be tricky because the strides must be taken into account.

After using this vector of instances, it is necessary to free
(aka reclaim) the dynamically allocated memory, otherwise a
memory leak occurs. So, the recursive reclaim function is used
to walk the returned instance vector and do a deep reclaim of
the data.

### *nc\_put\_vars*

Suppose one is writing a vector of instances using nc\_put\_vars
(or nc\_put\_vara or nc\_put\_var, etc.). These functions will
write the contents of the vector to the specified variable.
Note that internally, the data passed to the nc\_put\_xxx function is
immediately written, so there is no need to copy it internally. But the
caller may need to reclaim the vector of data that was created and passed
in to the nc\_put\_xxx function.

After writing this vector of instances, and assuming it was dynamically
created, at some point it will be necessary to reclaim that data.
So again, the recursive reclaim function can be used
to walk the instance vector and do a deep reclaim of
the data.

WARNING: If the data passed into these functions contains statically
allocated data, then using any of the reclaim functions will fail.

### *nc\_put\_att*
Suppose one is writing a vector of instances as the data of an attribute
using, say, nc\_put\_att.

Internally, the incoming attribute data must be copied and stored
so that changes to or reclamation of the input data will not affect
the attribute. Note that this copying behavior is different from
writing to a variable, where the data is written immediately.

After defining the attribute, it may be necessary for the user
to free the data that was provided as input to nc\_put\_att(), as with the
nc\_put\_xxx functions (previously described).

### *nc\_get\_att*
Suppose one is reading a vector of instances as the data of an attribute
using, say, nc\_get\_att.

Internally, the existing attribute data must be copied and returned
to the caller, and the caller is responsible for reclaiming
the returned data.
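
As a hedged sketch of the attribute read case (using the deep reclaim function described in the next subsection), suppose a global attribute named "att" has a top-level VLEN type, so that each instance is an *nc_vlen_t*; the attribute name is hypothetical.
````
/* Sketch: read a VLEN-typed global attribute and deep-reclaim the result.
   The attribute name "att" and its type are assumptions for illustration. */
#include <stdlib.h>
#include <netcdf.h>

int read_vlen_attribute(int ncid)
{
    int stat = NC_NOERR;
    nc_type xtype;
    size_t len;
    void* data = NULL;

    if((stat = nc_inq_att(ncid, NC_GLOBAL, "att", &xtype, &len))) return stat;
    if((data = calloc(len, sizeof(nc_vlen_t))) == NULL) return NC_ENOMEM;
    if((stat = nc_get_att(ncid, NC_GLOBAL, "att", data)) == NC_NOERR) {
        /* ... use the attribute data ... */
        /* deep-reclaim the nested data; the top-level vector is free'd by the caller */
        stat = nc_reclaim_data(ncid, xtype, data, len);
    }
    free(data);
    return stat;
}
````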

## New Instance Walking API

Proper recursive functions were added to the netcdf-c library to
provide reclaim and copy operations, and they are used internally as needed.
These functions are defined in libdispatch/dinstance.c and their
signatures are defined in include/netcdf.h. For backward
compatibility, corresponding "ncaux\_XXX" functions are defined
in include/netcdf\_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants. The first two functions, nc\_reclaim\_data() and
nc\_copy\_data(), assume the top-level vector is managed by the
caller. For reclaim, this is so the user can use, for example, a
statically allocated vector. For copy, it assumes the user
provides the space into which the copy is stored.

The other two, nc\_reclaim\_data\_all() and
nc\_copy\_data\_all(), allow the called functions to manage the
top-level vector. So for nc\_reclaim\_data\_all, the top level is
assumed to be dynamically allocated and will be free'd by
nc\_reclaim\_data\_all(). The nc\_copy\_data\_all() function
will allocate the top level and return a pointer to it to the
user. The user can later pass that pointer to
nc\_reclaim\_data\_all() to reclaim the instance(s).
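
The following hedged sketch shows how the *\_all* variants pair up: a vector of *count* instances of type *xtype* (both supplied by the caller here) is deep-copied and later the copy is deep-reclaimed, top level included.
````
/* Sketch: deep-copy a vector of instances and later reclaim the copy.
   ncid, xtype, src, and count are assumed to come from the caller. */
#include <netcdf.h>

int copy_then_reclaim(int ncid, nc_type xtype, const void* src, size_t count)
{
    int stat = NC_NOERR;
    void* copy = NULL;

    /* Allocate and fill the top-level vector plus all nested VLEN/string data */
    if((stat = nc_copy_data_all(ncid, xtype, src, count, &copy))) return stat;

    /* ... use the copy ... */

    /* Reclaim the nested data and the top-level vector allocated above */
    stat = nc_reclaim_data_all(ncid, xtype, copy, count);
    return stat;
}
````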

## Internal Changes
The netcdf-c library internals are changed to use the proper reclaim
and copy functions. This also allows some simplification of the code
since the stdata and vldata fields of NC\_ATT\_INFO are no longer needed.

## Optimizations

In order to make these functions as efficient as possible, it is
desirable to classify all types as to whether or not they contain
variable-size data. If a type is fixed size (i.e. does not contain
variable-size data) then it can be freed or copied as a single chunk.
This significantly increases the performance for such types.
For variable-size types, it is necessary to walk each instance of the type
and recursively reclaim or copy it. As another optimization,
if the type is a vector of strings, then the per-instance walk can be
sped up by doing the reclaim or copy inline.

The rules for classifying types as fixed or variable size are as follows
(a code sketch of the classification appears below).

1. All atomic types, except string, are fixed size.
2. All enum types and opaque types are fixed size.
3. The string type and all VLEN types are variable size.
4. A compound type is fixed size if all of the types of its
fields are fixed size. Otherwise it has variable size.

The classification of types can be made at the time the type is defined
or is read in from some existing file. The reclaim and copy functions
use this information to speed up the handling of fixed size types.
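
The following is a minimal sketch (not the actual library code) of how such a classification could be computed with the public inquiry API; it returns 1 if a type is variable size and 0 if it is fixed size.
````
/* Sketch: classify a netCDF type as variable size (1) or fixed size (0).
   This mirrors the rules above using the public inquiry API; the real
   library caches this flag when the type is defined or read in. */
#include <netcdf.h>

static int type_is_variable_size(int ncid, nc_type xtype)
{
    int klass, varsize = 0;
    size_t nfields, i;

    if(xtype == NC_STRING) return 1;            /* rule 3 */
    if(xtype < NC_STRING) return 0;             /* rule 1: other atomic types */

    /* User-defined type: inquire about its class */
    if(nc_inq_user_type(ncid, xtype, NULL, NULL, NULL, &nfields, &klass) != NC_NOERR)
        return 1; /* be conservative on error */
    switch (klass) {
    case NC_ENUM: case NC_OPAQUE: return 0;     /* rule 2 */
    case NC_VLEN: return 1;                     /* rule 3 */
    case NC_COMPOUND:                           /* rule 4 */
        for(i = 0; i < nfields; i++) {
            nc_type ftype;
            if(nc_inq_compound_fieldtype(ncid, xtype, (int)i, &ftype) != NC_NOERR)
                return 1;
            if(type_is_variable_size(ncid, ftype)) varsize = 1;
        }
        return varsize;
    default: return 1;
    }
}
````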

## Warnings

1. The new API functions require that the type information be
accessible. This means that you cannot use these functions
after the file has been closed. After the file is closed, you
are on your own.

2. There is still one known failure that has not been solved; it is
possibly an HDF5 memory leak. All the failures revolve around
some variant of this .cdl file. The proximate cause of failure is
the use of a VLEN FillValue.
````
netcdf x {
types:
  float(*) row_of_floats ;
dimensions:
  m = 5 ;
variables:
  row_of_floats ragged_array(m) ;
  row_of_floats ragged_array:_FillValue = {-999} ;
data:
  ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32},
                 {40, 41}, _ ;
}
````

# 3. Inferring File Types {#intern_infer}

As described in the companion document -- docs/dispatch.md --
when nc\_create() or nc\_open() is called, it must figure out what
kind of file is being created or opened. Once it has figured out
the file kind, the appropriate "dispatch table" can be used
to process that file.

## The Role of URLs

Figuring out the kind of file is referred to as model inference
and is, unfortunately, a complicated process. The complication
is mostly a result of allowing a path argument to be a URL.
Inferring the file kind from a URL requires deep processing of
the URL structure: the protocol, the host, the path, and the fragment
parts in particular. The query part is currently not used because
it usually contains information to be processed by the server
receiving the URL.

The "fragment" part of the URL may be unfamiliar.
A URL may optionally end with a fragment, which appears
in this pseudo URL specification as follows.
````
<protocol>://<host>/<path>?<query>#<fragment>
````
The form of the fragment is similar to a query and takes this general form.
````
'#'<key>=<value>&<key>=<value>&...
````
The key is a simple name; the value is any sequence of characters,
although URL special characters such as '&' must be URL encoded in
the '%XX' form, where each X is a hexadecimal digit.
A (nonsensical) example might look like this:
````
https://host.com/path#mode=nczarr,s3&bytes
````
It is important to note that the fragment part is not intended to be
passed to the server, but rather is processed by the client program.
It is this property that allows the netcdf-c library to use it to
pass information deep into the dispatch table code that is processing the
URL.
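
As an illustration of how a fragment reaches the library, a client might open a cloud dataset as shown below; the URL itself is a made-up example, but the call is the ordinary *nc\_open* API.
````
/* Sketch: the fragment "#mode=nczarr,s3" is consumed by the netcdf-c client
   code during model inference; it is never sent to the server.
   The URL below is hypothetical. */
#include <netcdf.h>

int open_nczarr_example(int* ncidp)
{
    return nc_open("https://example-bucket.s3.us-east-1.amazonaws.com/data/file.zarr#mode=nczarr,s3",
                   NC_NOWRITE, ncidp);
}
````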

## Model Inference Inputs

The inference algorithm is given the following information
from which it must determine the kind of file being accessed.

### Mode

The mode is a set of flags that are passed as the second
argument to nc\_create and nc\_open. The set of flags is defined in
the netcdf.h header file. Generally it specifies the general
format of the file: netcdf-3 (classic) or netcdf-4 (enhanced).
Variants of these can also be specified, e.g. 64-bit netcdf-3 or
classic netcdf-4.
In the case where the path argument is a simple file path,
using a mode flag is the most common mechanism for specifying
the model.

### Path
The file path, the first argument to nc\_create and nc\_open,
can be either a simple file path or a URL.
If it is a URL, then it will be deeply inspected to determine
the model.

### File Contents
When the contents of a real file are available,
the contents of the file can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc\_open*.
It also requires access to functions that can open and read at least
the initial part of the file.
As a rule, a small initial prefix of the file is read
and examined to see if it matches any of the so-called
"magic numbers" that indicate the kind of file being read.

### Open vs Create
Is the file being opened or is it being created?

### Parallelism
Is parallel IO available?

## Model Inference Outputs
The inference algorithm outputs two pieces of information.

1. model - this is used by nc\_open and nc\_create to choose the dispatch table.
2. newpath - in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.

The model output is actually a struct containing two fields:

1. implementation - this is a value from the NC\_FORMATX\_xxx
values in netcdf.h. It generally determines the dispatch
table to use.
2. format -- this is an NC\_FORMAT\_xxx value defining, in effect,
the netcdf-format to which the underlying format is to be
translated. Thus it can tell the netcdf-3 dispatcher that it
should actually implement CDF5 rather than standard netcdf classic.
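
Conceptually, the inferred-model record is roughly of this shape; this is a sketch only, and the actual definition in the dinfermodel code may differ in detail.
````
/* Sketch of the inferred-model record described above; the real
   struct in libdispatch/dinfermodel.* may differ. */
typedef struct NCmodel_sketch {
    int impl;   /* one of the NC_FORMATX_xxx values: chooses the dispatch table */
    int format; /* one of the NC_FORMAT_xxx values: the netcdf format to present */
} NCmodel_sketch;
````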

## The Inference Algorithm

The construction of the model is primarily carried out by the function
*NC\_infermodel()* (in *libdispatch/dinfermodel.c*).
It is given the following parameters:
1. path -- (IN) absolute file path or URL
2. modep -- (IN/OUT) the set of mode flags given to *NC\_open* or *NC\_create*.
3. iscreate -- (IN) distinguish open from create.
4. useparallel -- (IN) indicate if parallel IO can be used.
5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
6. model -- (IN/OUT) place to store the inferred model.
7. newpathp -- (OUT) the canonical rewrite of the path argument.

As a rule, these values are used in this order of preference
to infer the model.

1. file contents -- highest precedence
2. url (if it is one) -- using the "mode=" key in the fragment (see below).
3. mode flags
4. default format -- lowest precedence

The sequence of steps is as follows.

### URL processing -- processuri()

If the path appears to be a URL, then it is parsed
and processed by the processuri function as follows.

1. Protocol --
The protocol is extracted and tested against the list of
legal protocols. If not found, then it is an error.
If found, then it is replaced by a substitute -- if specified.
So, for example, the protocol "dods" is replaced by the protocol "http"
(note that at some point "http" will be replaced with "https").
Additionally, one or more "key=value" strings are appended
to the existing fragment of the url. So, again for "dods",
the fragment is extended by the string "mode=dap2".
Thus replacing "dods" does not lose information, but rather transfers
it to the fragment for later use.

2. Fragment --
After the protocol is processed, the initial fragment processing occurs
by converting it to a list data structure of the form
````
{<key>,<value>,<key>,<value>,<key>,<value>....}
````

### Macro Processing -- processmacros()

If the fragment list produced by processuri() is non-empty, then
it is processed for "macros". Notice that if the original path
was not a URL, then the fragment list is empty and this
processing will be bypassed. In any case, it is convenient to
allow some singleton fragment keys to be expanded into larger
fragment components. In effect, those singletons act as
macros. They can help to simplify the user's URL. The term
singleton means a fragment key with no associated value:
"#bytes", for example.

The list of fragments is searched looking for keys whose
value part is NULL or the empty string. Then the table
of macros is searched for that key and, if found, the
corresponding key/value expansion is appended to the fragment list and the singleton
is removed.

### Mode Inference -- processinferences()

This function just processes the list of values associated
with the "mode" key. It is similar to a macro in that
certain mode values are added or removed based on tables
of "inferences" and "negations".
Again, the purpose is to allow users to provide simplified URL fragments.

The list of mode values is repeatedly searched and whenever a value
is found that is in the "modeinferences" table, then the associated inference value
is appended to the list of mode values. This process stops when no changes
occur. This form of inference allows the user to specify "mode=zarr"
and have it converted to "mode=nczarr,zarr". This avoids the need for the
dispatch table code to do the same inference.

After the inferences are made, the list of mode values is again
repeatedly searched and whenever a value
is found that is in the "modenegations" table, then the associated negation value
is removed from the list of mode values, assuming it is there. This process stops when no changes
occur. This form of inference allows the user to make sure that "mode=bytes,nczarr"
has the bytes mode take precedence by removing the "nczarr" value. Such illegal
combinations can occur because of previous processing steps.
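
A minimal sketch of the inference fixed point follows; it is illustrative only, and the table contents and helper name are hypothetical rather than the actual dinfermodel.c code.
````
/* Illustrative sketch: keep appending inferred mode values until a pass
   makes no change. The table below is a stand-in; the real tables live
   in libdispatch/dinfermodel.c. */
#include <string.h>

struct modepair { const char* mode; const char* inference; };
static const struct modepair modeinferences[] = {
    {"zarr", "nczarr"},   /* example entry: "zarr" implies "nczarr" */
    {NULL, NULL}
};

/* modes: array holding nmodes entries with capacity maxmodes;
   returns the new number of entries after expansion. */
static size_t apply_inferences(const char** modes, size_t nmodes, size_t maxmodes)
{
    int changed = 1;
    while(changed) {
        changed = 0;
        for(size_t i = 0; i < nmodes; i++) {
            for(const struct modepair* p = modeinferences; p->mode != NULL; p++) {
                if(strcmp(modes[i], p->mode) != 0) continue;
                int present = 0; /* append the inferred value if not already present */
                for(size_t j = 0; j < nmodes; j++)
                    if(strcmp(modes[j], p->inference) == 0) { present = 1; break; }
                if(!present && nmodes < maxmodes) {
                    modes[nmodes++] = p->inference;
                    changed = 1;
                }
            }
        }
    }
    return nmodes;
}
````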

### Fragment List Normalization
As the fragment list is processed, duplicate keys may appear.
A function -- cleanfragments() -- is applied to clean up the fragment list
by coalescing the values of duplicate keys and removing duplicate key values.

### S3 Rebuild
If the URL is determined to be a reference to a resource on the Amazon S3 cloud,
then the URL needs to be converted to what is called "path format".
There are four S3 URL formats:

1. Virtual -- ````https://<bucket>.s3.<region>.amazonaws.com/<path>````
2. Path -- ````https://s3.<region>.amazonaws.com/<bucket>/<path>````
3. S3 -- ````s3://<bucket>/<path>````
4. Other -- ````https://<host>/<bucket>/<path>````

The S3 processing converts all of these to the Path format. In the "S3" format case
it is necessary to find, or default, the region by examining the ".aws" directory files.

### File Rebuild
If the URL protocol is "file" and its path is a relative file path,
then it is made absolute by prepending the path of the current working directory.

In any case, after S3 or File rebuilds, the URL is completely
rebuilt using any modified protocol, host, path, and
fragments. The query is left unchanged in the current algorithm.
The resulting rebuilt URL is passed back to the caller.

### Mode Key Processing
The set of values of the fragment's "mode" key are processed one by one
to see if it is possible to determine the model.
There is a table of format interpretations that maps a mode value
to the model's implementation and format. So for example,
if the mode value "dap2" is encountered, then the model
implementation is set to NC\_FORMATX\_DAP2 and the format
is set to NC\_FORMAT\_CLASSIC.

### Non-Mode Key Processing
If processing the mode does not tell us the implementation, then
all other fragment keys are processed to see if the implementation
(and format) can be deduced. Currently this does nothing.

### URL Defaults
If the model is still not determined and the path is a URL, then
the implementation is defaulted to DAP2. This is for backward
compatibility with the time when all URLs implied DAP2.

### Mode Flags
In the event that the path is not a URL, then it is necessary
to use the mode flags and the isparallel argument to choose a model.
This is just a straightforward flag-checking exercise.

### Content Inference -- check\_file\_type()
If the path is being opened (as opposed to created), then
it may be possible to actually read the first few bytes of the
resource specified by the path and use that to determine the
model. If this succeeds, then it takes precedence over
all other model inferences.

### Flag Consistency
Once the model is known, then the set of mode flags
is modified to be consistent with that information.
So for example, if DAP2 is the model, then all netcdf-4 mode flags
and some netcdf-3 flags are removed from the set of mode flags
because DAP2 provides only a standard netcdf-classic format.

# 4. Adding a Standard Filter {#intern_filters}

The standard filter system extends the netcdf-c library API to
support a fixed set of "standard" filters. This is similar to the
way that deflate and szip are currently supported.
For background, the file filter.md should be consulted.

In general, the API for a standard filter has the following prototypes.
The case of zstandard (libzstd) is used as an example.
````
int nc_def_var_zstandard(int ncid, int varid, int level);
int nc_inq_var_zstandard(int ncid, int varid, int* has_filterp, int* levelp);
````
So generally the API has the ncid and the varid as fixed arguments, and then
a list of parameters specific to the filter -- level in this case.
For the inquire function, there is an additional argument -- has\_filterp --
that is set to 1 if the filter is defined for the given variable
and is 0 if not.
The remainder of the inquiry parameters are pointers to memory
into which the parameters are stored -- levelp in this case.
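
As a hedged usage sketch, a client could apply the filter to a variable as shown below; the file, dimension, and variable names and the compression level are made up for illustration, and the filter prototypes are assumed to be visible via netcdf\_filter.h (or an equivalent header).
````
/* Sketch: define a variable and attach the zstandard standard filter to it.
   Names and the level value are illustrative only. */
#include <netcdf.h>
#include <netcdf_filter.h> /* assumed location of the standard filter prototypes */

int create_with_zstd(void)
{
    int stat, ncid, dimid, varid, level = 0, has = 0;

    if((stat = nc_create("example.nc", NC_NETCDF4|NC_CLOBBER, &ncid))) return stat;
    if((stat = nc_def_dim(ncid, "x", 1000, &dimid))) return stat;
    if((stat = nc_def_var(ncid, "v", NC_FLOAT, 1, &dimid, &varid))) return stat;

    /* Attach the standard zstandard filter with compression level 3 */
    if((stat = nc_def_var_zstandard(ncid, varid, 3))) return stat;

    /* Optionally verify that the filter was attached */
    if((stat = nc_inq_var_zstandard(ncid, varid, &has, &level))) return stat;

    return nc_close(ncid);
}
````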

It is important to note that including a standard filter still
requires three supporting objects:

1. The implementing library for the filter. For example,
libzstd must be installed in order to use the zstandard
API.
2. An HDF5 wrapper for the filter must be installed in the
directory pointed to by the HDF5\_PLUGIN\_PATH environment
variable.
3. (Optional) An NCZarr Codec implementation must be installed
in the HDF5\_PLUGIN\_PATH directory.

## Adding a New Standard Filter

The implementation of a standard filter must be loaded from one
of several locations.

1. It can be part of libnetcdf.so (preferred),
2. it can be loaded as part of the client code,
3. or it can be loaded as part of an external library such as libccr.

However, the three objects listed above need to be
stored in the HDF5\_PLUGIN\_PATH directory, so adding a standard
filter still requires modification to the netcdf build system.
This limitation may be lifted in the future.

### Build Changes
In order to detect the implementing library, the following changes
must be made for Automake (configure.ac/Makefile.am)
and CMake (CMakeLists.txt).

#### Configure.ac
Configure.ac must have a block similar to this that locates
the implementing library.
````
# See if we have libzstd
AC_CHECK_LIB([zstd],[ZSTD_compress],[have_zstd=yes],[have_zstd=no])
if test "x$have_zstd" = "xyes" ; then
   AC_SEARCH_LIBS([ZSTD_compress],[zstd zstd.dll cygzstd.dll], [], [])
   AC_DEFINE([HAVE_ZSTD], [1], [if true, zstd library is available])
fi
AC_MSG_CHECKING([whether libzstd library is available])
AC_MSG_RESULT([${have_zstd}])
````
Note that the entry point (*ZSTD\_compress*) is library dependent
and is used to see if the library is available.

#### Makefile.am

It is assumed you have an HDF5 wrapper for zstd. If you want it
to be built as part of the netcdf-c library then you need to
add the following to *netcdf-c/plugins/Makefile.am*.
````
if HAVE_ZSTD
noinst_LTLIBRARIES += libh5zstd.la
libh5zstd_la_SOURCES = H5Zzstd.c H5Zzstd.h
endif
````

````
# Need our version of szip if libsz available and we are not using HDF5
if HAVE_SZ
noinst_LTLIBRARIES += libh5szip.la
libh5szip_la_SOURCES = H5Zszip.c H5Zszip.h
endif
````
#### CMakeLists.txt
In an analog to *configure.ac*, a block like
this needs to be in *netcdf-c/CMakeLists.txt*.
````
FIND_PACKAGE(Zstd)
set_std_filter(Zstd)
````
The FIND\_PACKAGE requires a CMake module for the filter
in the cmake/modules directory.
The *set\_std\_filter* function is a macro.

An entry in the file config.h.cmake.in will also be needed.
````
/* Define to 1 if zstd library available. */
#cmakedefine HAVE_ZSTD 1
````

### Implementation Template
As a template, here is the implementation for zstandard.
It can be used as the template for adding other standard filters.
It is currently located in *netcdf-c/libdispatch/dfilter.c*, but
could be anywhere as indicated above.
````
#ifdef HAVE_ZSTD
int
nc_def_var_zstandard(int ncid, int varid, int level)
{
    int stat = NC_NOERR;
    unsigned ulevel;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Level must be between -131072 and 22 on Zstandard v. 1.4.5 (~202009)
       Earlier versions have fewer levels (especially fewer negative levels) */
    if (level < -131072 || level > 22)
        return NC_EINVAL;
    ulevel = (unsigned) level; /* Keep bit pattern */
    if((stat = nc_def_var_filter(ncid,varid,H5Z_FILTER_ZSTD,1,&ulevel))) goto done;
done:
    return stat;
}

int
nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int* levelp)
{
    int stat = NC_NOERR;
    size_t nparams;
    unsigned params = 0;
    int hasfilter = 0;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Get filter info */
    stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,NULL);
    if(stat == NC_ENOFILTER) {stat = NC_NOERR; hasfilter = 0; goto done;}
    if(stat != NC_NOERR) goto done;
    hasfilter = 1;
    if(nparams != 1) {stat = NC_EFILTER; goto done;}
    if((stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,&params))) goto done;
done:
    if(levelp) *levelp = (int)params;
    if(hasfilterp) *hasfilterp = hasfilter;
    return stat;
}
#endif /*HAVE_ZSTD*/
````

# 5. Test Interference {#intern_isolation}

At some point, Unidata switched from running tests serially to running tests in parallel.
It soon became apparent that there were resources shared between tests and that parallel
execution sometimes caused interference between tests.

In order to fix the inter-test interference, several approaches were used.
1. Renaming resources (primarily files) so that tests would create different test files.
2. Telling the test system that there were explicit dependencies between tests so that they would not be run in parallel.
3. Isolating test resources by creating independent directories for each test.

## Test Isolation
The isolation mechanism is currently used mostly in nczarr_test.
It requires that tests are all executed inside a shell script.
When the script starts, it invokes a shell function called "isolate".
This function looks in the current directory for a directory called "testset_\<uid\>".
If "testset_\<uid\>" is not found, then it creates it.
This directory is then used to isolate all test output.

After calling "isolate", the script enters the "testset_\<uid\>"
directory. Then each actual test creates a directory in which to
store any file resources that it creates during execution.
Suppose, for example, that the shell script is called "run_XXX.sh".
The isolate function creates a directory with the general name "testset_\<uid\>".
Then the run_XXX.sh script creates a directory "testset_\<uid\>/testdir_XXX",
enters it, and runs the test.
During cleanup, specifically "make clean", all the testset_\<uid\> directories are deleted.

The "\<uid\>" is a unique identifier created using the "date +%s" command. It returns an integer
representing the number of seconds since the start of the so-called "epoch", basically
"00:00:00 UTC, 1 January 1970". Using a date makes it easier to detect and reclaim obsolete
testset directories.

## Cloud Test Isolation

When testing against the cloud (currently Amazon S3), the interference
problem is intensified.
This is because the current cloud testing uses a single S3 bucket,
which means that not only is there inter-test interference, but there
is also potential interference across builds.
This means, for example, that testing by github actions could
interfere with local testing by individual users.
This problem is difficult to solve, but a mostly complete solution has been
implemented for cmake, though not (as yet) for automake.

In any case, there is a shell function called s3isolate in nczarr_test/test_nczarr.sh that operates on cloud resources in a way that is similar to the isolate function.
The s3isolate function does several things:
1. It invokes isolate to ensure local isolation.
2. It creates a path prefix relative to the Unidata S3 bucket that has the name "testset_\<uid\>", where this name
is the same as the one created by the isolate function.
3. It appends the uid to a file called s3cleanup_\<pid\>.uids. This file may accumulate several uids indicating
the keys that need to be cleaned up. The pid is a separate small unique id used to avoid s3cleanup interference.

The test script then ensures that any cloud resources are created as extensions of the path prefix.

Cleanup of S3 resources is complex.
In configure.ac or the top-level CMakeLists.txt file, the path "netcdf-c/testset_\<uid\>"
is created and, via configuration commands, is propagated to various Makefile.am
and specific script files.

The actual cleanup requires different approaches for cmake and for automake.
In cmake, the CTestCustom.cmake mechanism is used and contains the following command:
````
IF(ENABLE_S3_TESTING)
# Assume run in top-level CMAKE_BINARY_DIR
set(CTEST_CUSTOM_POST_TEST "bash -x ${CMAKE_BINARY_DIR}/s3cleanup.sh")
ENDIF()
````

In automake, the "check-local" extension mechanism is used
because it is invoked after all tests are run in the nczarr_test
directory. So nczarr_test/Makefile.am contains the following
equivalent code:
````
if ENABLE_S3_TESTALL
check-local:
	bash -x ${top_srcdir}/s3cleanup.sh
endif
````

### s3cleanup.sh
This script is created by configuring the base file s3cleanup.in.
It is unfortunately complex, but roughly it does the following.
1. It computes a list of all the keys for all objects in the Unidata bucket and stores them in a file
named "s3cleanup_\<pid\>.keys".
2. It gets the list of date-based uids created above.
3. It iterates over the keys and the uids to collect the set of keys matching any one of the uids.
4. It divides the keys into sets of 500. This is because the delete-objects command will not accept more than 1000 keys at a time.
5. It converts each set of 500 keys from step 4 into a properly formatted JSON file suitable for use by the "aws delete-objects" command.
This file is called "s3cleanup_\<pid\>.json".
6. It uses the "aws delete-objects" command to delete the keys.
7. Steps 5 and 6 are repeated for each set of 500 keys.

The pid is a small random number used to avoid local interference.
It is important to note that this script assumes that the
AWS command line package is installed.
This can be installed, for example, using this command:
````apt install awscli````.

### s3gc.sh
This script is created by configuring the base file s3gc.in.
It is intended as a way for users to manually clean up the S3 Unidata bucket.
It takes a single argument, delta, that is the number of days before the present day,
and computes a stop date corresponding to "present_day - delta".
All keys for all uids on or before the stop date are deleted.

It operates as follows:
1. It gets the list of date-based uids created above.
2. It iterates over the keys and collects all that are on or before the stop date.
3. It divides the keys into sets of 500. This is in recognition of the 1000 key limit mentioned previously.
4. It converts each set of 500 keys from step 3 into a properly formatted JSON file suitable for use by the "aws delete-objects" command.
This file is called "s3cleanup_\<pid\>.json".
5. It uses the "aws delete-objects" command to delete the keys.
6. Steps 4 and 5 are repeated for each set of 500 keys.

# Point of Contact {#intern_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 9/16/2023