* Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.
Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before *nc\_enddef* is invoked.
````
int nc_def_var_filter(int ncid, int varid, unsigned int id,
````
* NC\_ENOFILTER — Filter not defined for the variable.
The *id* indicates the filter of interest.
The actual parameters are stored in *params*.
The number of parameters is returned in *nparamsp*.
As is usual with the netcdf API, one is expected to call this function twice: the first time to obtain the parameter count in *nparamsp*, and the second time to retrieve the parameters into client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.
If the specified id is not attached to the variable, then NC\_ENOFILTER is returned.
### nc\_inq\_var\_filter
Query a variable to obtain information about the first filter associated with the variable.
When netcdf-c was modified to support multiple filters per variable, this function became largely redundant, since it returns information only about the first filter defined for the variable.
Internally, it is implemented using the functions *nc\_inq\_var\_filter\_ids* and *nc\_inq\_filter\_info*.
* *\_Filter* — a string containing a comma separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter.
This is a "special" attribute, which means that it will normally be invisible when using *ncdump* unless the -s flag is specified.
For backward compatibility it is probably better to use the *\_Deflate* attribute instead of *\_Filter*. But using *\_Filter* to specify deflation will work.
Note that the lexical order of declaration is important when more than one filter is specified for a variable because it determines the order in which the filters are applied.
### Example CDL File (Data elided)
````
netcdf bzip2szip {
dimensions:
dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
variables:
float var(dim0, dim1, dim2, dim3) ;
var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
var:_Storage = "chunked" ;
var:_ChunkSizes = 4, 4, 4, 4 ;
data:
...
}
````
Note that the assigned filter id for bzip2 is 307 and for szip it is 4.
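
To make the structure of the *\_Filter* value above concrete, here is a minimal C sketch that splits such a string on '|' and reads each filter id and its parameters. It is not the parser used by *ncgen* or *nccopy*: the names *FilterSpec* and *parse\_filter\_list* are hypothetical, and only plain unsigned decimal integers are handled (none of the tagged constant formats).

```c
#include <stdlib.h>

#define MAX_FILTERS 8
#define MAX_PARAMS 16

/* Hypothetical container for one parsed filter; not a netcdf-c type. */
typedef struct {
    unsigned int id;
    size_t nparams;
    unsigned int params[MAX_PARAMS];
} FilterSpec;

/* Parse "id,p1,p2|id,p1,..." into specs[].
 * Returns the number of filters found, or -1 on malformed input.
 * Assumption: every constant is a plain unsigned decimal integer. */
int parse_filter_list(const char *text, FilterSpec *specs, int maxspecs)
{
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        FilterSpec *fs;
        char *end;

        if (count >= maxspecs) return -1;
        fs = &specs[count];
        fs->nparams = 0;

        /* The first constant of each spec is the filter id. */
        fs->id = (unsigned int)strtoul(p, &end, 10);
        if (end == p) return -1;
        p = end;

        /* Each following ",constant" is one parameter. */
        while (*p == ',') {
            if (fs->nparams >= MAX_PARAMS) return -1;
            p++;
            fs->params[fs->nparams] = (unsigned int)strtoul(p, &end, 10);
            if (end == p) return -1;
            fs->nparams++;
            p = end;
        }
        count++;

        /* A '|' separates this filter from the next one in the chain. */
        if (*p == '|') {
            p++;
            if (*p == '\0') return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return count;
}
```

Applied to the string "307,9|4,32,32" from the CDL example, the sketch yields two entries: id 307 with the parameter 9, followed by id 4 with the parameters 32 and 32.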
When copying a netcdf file using *nccopy* it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:
nccopy -F "var,307,9" unfiltered.nc filtered.nc
Assume that *unfiltered.nc* has a chunked but not bzip2 compressed variable named "var".
This command will copy that variable to the *filtered.nc* output file, applying the filter with id 307 (i.e. bzip2) with the single parameter 9, which indicates the compression level.
See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.
The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.
It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional *-F* cases are defined.
Note that the characters '\*', '\&', and '\|' are shell reserved characters, so you will probably need to escape or quote the filter spec in that environment.
The utilities <a href="#NCGEN">ncgen</a> and <a href="#NCCOPY">nccopy</a>, and also the output of *ncdump*, support the specification of filter ids, formats, and parameters in text format.
Basically, these specifications consist of a filter id, a comma, and then a sequence of
comma-separated constants representing the parameters.
The constants are converted within the utility to a proper set of unsigned int constants (see the <a href="#ParamEncode">parameter encoding section</a>).
To simplify things, various kinds of constants can be specified rather than just simple unsigned integers.
The *ncgen* and *nccopy* programs will encode them properly using the rules specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
Since the original types are lost after encoding, *ncdump* will always show a simple list of unsigned integer constants.
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
</table>
Some things to note.
1. In all cases, except for an untagged positive integer, the format tag is required and determines how the constant is converted to one or two unsigned int values.
2. For an untagged positive integer, the constant is treated as being of the smallest type into which it fits (i.e. 8, 16, 32, or 64 bit).
3. For signed byte and short, the value is sign extended to 32 bits and then treated as an unsigned int value, but maintaining the bit-pattern.
4. Constants of type double and signed or unsigned long long are converted as specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
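
Note 3 can be illustrated with a small C sketch. The function names are hypothetical, and the code assumes that unsigned int is 32 bits wide, which is why *uint32\_t* is used. It shows only the sign-extension step, not the full conversion performed by the utilities.

```c
#include <stdint.h>

/* Sign-extend a signed 8-bit value to 32 bits, then reinterpret the
 * result as unsigned; the cast leaves the 32-bit pattern unchanged. */
uint32_t encode_signed_byte(int8_t value)
{
    return (uint32_t)(int32_t)value;
}

/* The same treatment for a signed 16-bit value. */
uint32_t encode_signed_short(int16_t value)
{
    return (uint32_t)(int32_t)value;
}
```

For example, the signed byte -1 becomes 0xFFFFFFFF and the signed short -2 becomes 0xFFFFFFFE, while non-negative values are unchanged.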
1. *H5PL\_type\_t H5PLget\_plugin\_type(void)* -- This function is expected to return the constant value *H5PL\_TYPE\_FILTER* to indicate that this is a filter library.
2. *const void\* H5PLget\_plugin\_info(void)* -- This function returns a pointer to a table of type *H5Z\_class2\_t*.
In particular, it specifies the filter id implemented by the library, and that id must match the one specified for the variable in *nc\_def\_var\_filter* in order for the library to be used.
If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.
# NCZarr Filter Support {#filters_nczarr}
The inclusion of Zarr support in the netcdf-c library creates the need to provide a new representation consistent with the way that Zarr files store filter information.
For Zarr, filters are represented using the JSON notation.
Each filter is defined by a JSON dictionary, and each such filter dictionary
is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.
1. It must store its filter information in its metadata in the above JSON dictionary format.
2. It is required to re-use the HDF5 filter implementations.
This is to avoid having to rewrite the filter implementations.
This means that some mechanism is needed to translate between the HDF5 id+parameter model and the Zarr JSON dictionary model.
3. It must be possible to modify the set of visible parameters in response to environment information such as the type of the associated variable; this is required to mimic the corresponding HDF5 capability.
Note that the term "visible parameters" is used here to refer to the parameters provided by "nc_def_var_filter" or those stored in the dataset's metadata as provided by the JSON codec. The term "working parameters" refers to the parameters given to the compressor itself and derived from the visible parameters.
The standard authority for defining Zarr filters is the list supported by the NumCodecs project [7].
Comparing the set of standard filters (aka codecs) defined by NumCodecs to the set of standard filters defined by HDF5 [3], it can be seen that the two sets overlap, but each has filters not defined by the other.
Note also that it is undesirable that a specific set of filters/codecs be built into the NCZarr implementation.
Rather, it is preferable for there to be some extensible way to associate the JSON with the code implementing the codec. This mirrors the plugin model used by HDF5.
The mechanism provided to address these issues is similar to that taken by HDF5.
A shared library must exist that has certain well-defined entry points that allow the NCZarr code to determine information about a Codec.
The shared library exports a well-known function name to access Codec information and relate it to a corresponding HDF5 implementation,
If this is important, then the filter implementation is responsible for marking this difference using, for example, different number of parameters or some differing value.
### Step 2: Convert visible parameters to working parameters
A new special attribute called *\_Codecs* is defined in parallel to the current *\_Filter* special attribute. Its value is a string containing the JSON representation of the Codecs associated with a given variable.
When it comes time to write out the meta-data, the stored HDF5-style parameters are passed to a specific Codec function to obtain the corresponding JSON representation. Again see [Appendix E](#filters_appendixe).
then the filter chain is (f1,f2,...fn,c) with f1 being applied first and c being applied last when encoding. On decode, the filter chain is executed in the order (c,fn...f2,f1).
So, an HDF5 filter chain is divided into two parts, where the last filter in the chain is assigned as the "compressor" and the remaining
filters are assigned as the "filters".
But independent of this, each codec, whether a compressor or a filter,
is stored in the JSON dictionary form described earlier.
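
The division described above can be sketched in C as follows. This is only an illustration of the ordering rule, not NCZarr code: *ChainSplit*, *split\_chain*, and *decode\_order* are hypothetical names, and filters are represented by bare ids.

```c
#include <stddef.h>

/* Hypothetical view of an HDF5 filter chain divided Zarr-style. */
typedef struct {
    const unsigned int *filters; /* f1..fn, in encode order */
    size_t nfilters;
    unsigned int compressor;     /* c: the last filter of the chain */
    int has_compressor;          /* 0 when the chain is empty */
} ChainSplit;

/* Assign the last id as the "compressor" and the rest as "filters". */
ChainSplit split_chain(const unsigned int *ids, size_t n)
{
    ChainSplit split = { ids, 0, 0, 0 };
    if (n > 0) {
        split.nfilters = n - 1;
        split.compressor = ids[n - 1];
        split.has_compressor = 1;
    }
    return split;
}

/* Fill out[] with the ids in decode order: (c, fn, ..., f2, f1). */
void decode_order(const unsigned int *ids, size_t n, unsigned int *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = ids[n - 1 - i];
}
```

For a chain with the placeholder ids (10, 20, 30), the sketch reports 30 as the compressor, (10, 20) as the filters, and (30, 20, 10) as the decode order.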
## Extensions
The Codec style, using JSON, has the ability to provide very complex parameters that may be hard to encode as a vector of unsigned integers.
It might be desirable to consider exporting a JSON-based API out of the netcdf-c API to support user access to this complexity.
As part of its testing, the NetCDF build process creates a number of shared libraries in the *netcdf-c/plugins* (or sometimes *netcdf-c/plugins/.libs*) directory.
If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
A slightly simplified version of one of the HDF5 filter test cases is also available as an example within the netcdf-c source tree directory *netcdf-c/examples/C*.
The test is called *filter\_example.c* and it is executed as part of the *run\_examples4.sh* shell script.
When multiple filters are defined on a variable, the order of application, when writing data to the file, is the same as the order in which *nc\_def\_var\_filter* is called.
2. If *nc\_def\_var\_filter* or *nc\_def\_var\_deflate* or *nc\_def\_var\_szip* is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change.
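
The ordering behavior can be modeled with a small C sketch. It is a toy model, not the library's implementation: *Chain* and *chain\_define* are hypothetical names, each filter carries a single parameter for brevity, and the sketch assumes that a repeated definition replaces the earlier parameter.

```c
#include <stddef.h>

#define MAX_CHAIN 8

/* Hypothetical model of one filter in a variable's chain. */
typedef struct {
    unsigned int id;
    unsigned int param;
} ChainEntry;

typedef struct {
    ChainEntry entries[MAX_CHAIN];
    size_t count;
} Chain;

/* Define a filter: a new id is appended to the chain, while an id that
 * is already present keeps its position and only has its parameter
 * updated (assumption). Returns 0 on success, -1 if the chain is full. */
int chain_define(Chain *chain, unsigned int id, unsigned int param)
{
    size_t i;
    for (i = 0; i < chain->count; i++) {
        if (chain->entries[i].id == id) {
            chain->entries[i].param = param;
            return 0;
        }
    }
    if (chain->count >= MAX_CHAIN)
        return -1;
    chain->entries[chain->count].id = id;
    chain->entries[chain->count].param = param;
    chain->count++;
    return 0;
}
```

Defining id 307, then id 4, then id 307 again leaves the chain in the order (307, 4); only the parameter recorded for 307 changes.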
It turns out that the two parameters provided when calling nc\_def\_var\_filter correspond to the first two parameters of the four parameters returned by nc\_inq\_var\_filter.
2. Change the values of some parameters: additional flag bits are known to be added to the *options\_mask* argument, and the *pixels\_per\_block* parameter may be modified.
The reason for these changes is that the szip API provided by the underlying H5Pset\_szip function is actually a subset of the capabilities of the real szip implementation.
In any case, if the caller uses the *nc\_inq\_var\_szip* or the *nc\_inq\_var\_filter* functions, then the parameter values returned may differ from those originally specified.
It should also be noted that the HDF5 szip filter wrapper that
is invoked depends on the configuration of the netcdf-c library.
If the HDF5 installation supports szip, then the NCZarr szip
will use the HDF5 wrapper. If HDF5 does not support szip, or HDF5
is not enabled, then the plugins directory will contain a local
HDF5 szip wrapper to be used by NCZarr. This can be confusing,
but is generally transparent to the user since the plugins
HDF5 szip wrapper was taken from the HDF5 code base.
# Appendix A. HDF5 Parameter Encode/Decode {#filters_appendixa}
The filter id for an HDF5 format filter is an unsigned integer.
Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers.
It may be that the parameters required by a filter can naturally be encoded as unsigned integers.
The bzip2 compression filter, for example, expects a single integer value from zero thru nine.
This encodes naturally as a single unsigned integer.
Note that signed integers and single-precision (32-bit) float values also can easily be represented as 32 bit unsigned integers by proper casting to an unsigned integer so that the bit pattern is preserved.
Simple signed integer values of type short or char can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then sign extending. Similarly, unsigned 8 and 16 bit
Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters.
You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
Parameters whose size is larger than 32-bits present a byte order problem.
This specifically includes double precision floats and (signed or unsigned) 64-bit integers.
For these cases, the machine byte order issue must be handled, in part, by the compression code.
This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately.
This means that on a machine whose byte order is different than the machine in which the parameters were initially created, the two integers will be separately
1. The 8 bytes start in native machine order for the machine doing the call to *nc\_def\_var\_filter*.
2. The caller divides the 8 bytes into two 4-byte pieces and passes them to *nc\_def\_var\_filter*.
3. HDF5 takes each four byte piece and ensures that each piece is in network (big) endian order.
4. When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers.
Note that little-endian is used as the standard because it is the most common machine format.
When read, the filter code needs to be aware of this convention and do the appropriate conversions.
This leads to the following set of rules.
### Encoding
1. Encode on little endian (LE) machine: no special action is required.
   The 8-byte value is passed to HDF5 as two 4-byte integers.
   HDF5 byte swaps each integer and stores it in the file.
2. Encode on a big endian (BE) machine: several steps are required:
    1. Do an 8-byte byte swap to convert the original value to little-endian format.
    2. Since the encoding machine is BE, HDF5 will just store the value.
       So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
    3. This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
### Decoding
1. Decode on LE machine: no special action is required.
   HDF5 will get the two 4-byte values from the file and byte-swap each separately.
   The concatenation of those two integers will be the expected LE value.
2. Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
    1. HDF5 sends the two 4-byte values to the filter.
    2. The filter must then byte-swap each 4-byte value independently.
    3. The filter then must concatenate the two 4-byte values into a single 8-byte value.
       Because of the encoding rules, this 8-byte value will be in LE format.
    4. The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to the desired BE format.
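
The encoding and decoding rules can be written as a short C sketch. The names *byteswap*, *host\_is\_big\_endian*, and *fix8\_sketch* are hypothetical, and the *big\_endian* argument exists only so that both sets of rules can be exercised on any machine; a real caller would pass the result of *host\_is\_big\_endian()*.

```c
#include <stddef.h>

/* Reverse n bytes in place. */
void byteswap(unsigned char *mem, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char tmp = mem[i];
        mem[i] = mem[n - 1 - i];
        mem[n - 1 - i] = tmp;
    }
}

/* Returns 1 when the host stores the most significant byte first. */
int host_is_big_endian(void)
{
    const unsigned int probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* Apply the encode rule (decode == 0) or the decode rule (decode != 0)
 * to the 8 bytes at mem8, using the rules for the given byte order. */
void fix8_sketch(unsigned char *mem8, int decode, int big_endian)
{
    if (!big_endian)
        return;                  /* LE machine: no special action */
    if (decode) {
        byteswap(mem8, 4);       /* swap each 4-byte piece ... */
        byteswap(mem8 + 4, 4);
        byteswap(mem8, 8);       /* ... then convert LE to BE */
    } else {
        byteswap(mem8, 8);       /* convert BE to LE ... */
        byteswap(mem8, 4);       /* ... then swap each 4-byte piece */
        byteswap(mem8 + 4, 4);
    }
}
```

Under the BE rules, encoding the bytes 01 02 03 04 05 06 07 08 produces 05 06 07 08 01 02 03 04, and decoding that result restores the original order.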
````
typedef struct NC_H5_Filterspec {
    unsigned int filterid; /* ID for arbitrary filter. */
    size_t nparams;        /* nparams for arbitrary filter. */
    unsigned int* params;  /* Params for arbitrary filter. */
} NC_H5_Filterspec;
````
This struct in effect encapsulates all of the information about an HDF5 formatted filter: the id, the number of parameters, and the parameters themselves.
# Appendix C. Build Flags for Detecting the Filter Mechanism {#filters_appendixc}
This, in conjunction with the error code *NC\_ENOFILTER* in *netcdf.h* can be used to see what filter mechanism is in place as described in the section on \ref filters_compatibility.
2. defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) — indicates that the 4.7.4 mechanism is in place.
It does support multiple filters, but the error return codes for *nc\_inq\_var\_filter* are different and the filter spec parser functions are in a different location with different names.
3. defined(NC\_ENOFILTER) && defined(NC\_HAS\_MULTIFILTERS) — indicates that the multiple filters are supported, and that *nc\_inq\_var\_filter* returns a filterid of zero to indicate that a variable has no filters.
Also, the filter spec parsers have the names and signatures described in this document and are defined in *netcdf\_aux.h*.
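
The two conditions listed above can be folded into a compile-time check, sketched below. *FILTER\_MECHANISM* and *filter\_mechanism\_name* are hypothetical names. The sketch deliberately includes no netcdf headers, so as written it always takes the final branch; in a real build, *netcdf.h* and *netcdf\_meta.h* would be included first.

```c
/* In a real build, include <netcdf.h> and <netcdf_meta.h> before this
 * point so that NC_ENOFILTER and NC_HAS_MULTIFILTERS are visible. */

#if defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS)
#define FILTER_MECHANISM 3 /* case 3: multiple filters supported */
#elif defined(NC_ENOFILTER)
#define FILTER_MECHANISM 2 /* case 2: the 4.7.4 mechanism */
#else
#define FILTER_MECHANISM 0 /* neither of the two cases above */
#endif

/* Hypothetical helper: describe the detected mechanism as text. */
const char *filter_mechanism_name(void)
{
    switch (FILTER_MECHANISM) {
    case 3:
        return "multifilter";
    case 2:
        return "4.7.4";
    default:
        return "other";
    }
}
```

Client code can then select the matching parser names and error handling with ordinary *#if FILTER\_MECHANISM == ...* tests.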
The Codec API mirrors the HDF5 API closely. It has one well-known function that can be invoked to obtain information about the Codec as well as pointers to special functions to perform conversions.
Support for a select set of standard filters is built into the NetCDF API.
Generally, they are accessed using the following generic API, where XXXX is
the filter name. As a rule, the names are those used in the HDF5 filter ID naming authority [4] or the NumCodecs naming authority [7].
````
int nc_def_var_XXXX(int ncid, int varid, unsigned filterid, size_t nparams, unsigned* params);
int nc_inq_var_XXXX(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params);
````
The first function inserts the specified filter into the filter chain for a given variable.
The second function queries the given variable to see if the specified filter
is in the filter chain for that variable. The *hasfilter* argument is set
to one if the filter is in the chain and zero otherwise.
As is usual with the netcdf API, one is expected to call this function twice: the first time to obtain the parameter count in *nparamsp*, and the second time to retrieve the parameters into the client-allocated memory argument *params*.
Any of these arguments can be NULL, in which case no value is returned.
Note that NetCDF inherits four filters from HDF5, namely shuffle, fletcher32, deflate (zlib), and szip. The APIs for these do not conform to the above API.
So aside from those four, the current set of standard filters is as follows.
1. *libzstd.so* | *zstd.dll* | *libzstd.dylib* -- The actual zstandard compressor library; typically installed by using your platform specific package manager.
2. The HDF5 wrapper for *libzstd.so* -- There are several options for obtaining this (see [Appendix G](#filters_appendixg).)
3. (Optional) The Zarr wrapper for *libzstd.so* -- you need this if you intend to read/write Zarr datasets that were compressed using zstandard; again see [Appendix G](#filters_appendixg).
1. HDF5_PLUGIN_PATH is defined and is the same value as it was at build time -- no action needed
2. HDF5_PLUGIN_PATH is defined and has a different value from build time -- the user is responsible for ensuring that the run-time path includes the same directory used at build time; otherwise this case will fail.
3. HDF5_PLUGIN_PATH is not defined at either run-time or build-time -- no action needed
4. HDF5_PLUGIN_PATH is not defined at run-time but was defined at build-time -- this will probably fail