re: https://github.com/Unidata/netcdf-c/issues/1836
Revert the internal filter code to simplify it. From the user's point of view, the only visible changes should be:
1. The functions that convert text to filter specs have had their signature reverted and have been moved to netcdf_aux.h
2. Some filter API functions now return NC_ENOFILTER when inquiry is made about some filter.
Internally, the dispatch table has been modified to get rid of the filter_actions entry and associated complex structures. It has been replaced with inq_var_filter_ids and inq_var_filter_info entries and the dispatch table version has been bumped to 3. Corresponding NOOP and NOTNC4 functions were added to libdispatch/dnotnc4.c. Also, the filter_action entries in dispatch tables were replaced for all dispatch code bases (HDF5, DAP2, etc). This should only impact UDF users.
In the process, it became clear that the form of the filters field in NC_VAR_INFO_T was format dependent, so I converted it to be of type void* and pushed its management into the various dispatch code bases. Specifically libhdf5 and libnczarr now manage the filters field in their own way.
The auxiliary functions for parsing textual filter specifications were moved to netcdf_aux.h and were renamed to the following:
* ncaux_h5filterspec_parse
* ncaux_h5filterspec_parselist
* ncaux_h5filterspec_free
* ncaux_h5filter_fix8
Misc. Other Changes:
1. Document NUG/filters.md updated to reflect the changes above.
2. All the old data types (structs and enums) used by filter_actions were deleted. The exception is the NC_H5_Filterspec because it is needed by ncaux_h5filterspec_parselist.
3. Clientside filters were removed -- another enhancement for which no-one ever asked.
4. The ability to remove filters was itself removed.
5. Some functionality needed by nczarr was moved from libhdf5 to libsrc4 e.g. nc4_find_default_chunksizes
6. All the filterx code was removed
7. ncfilter.h and nc4filter.c no longer used
Misc. Unrelated Changes:
1. The nczarr_test makefile clean was leaving some directories; so add clean-local to take care of them.
NetCDF-4 Filter Support
The netCDF library supports a general filter mechanism to apply various kinds of filters to datasets before reading or writing.
The netCDF enhanced (aka netCDF-4) library inherits this capability since it depends on the HDF5 library. The HDF5 library (1.8.11 and later) supports filters, and netCDF is based closely on that underlying HDF5 mechanism.
Filters assume that a variable has chunking defined and each chunk is filtered before writing and "unfiltered" after reading and before passing the data to the user.
When multiple filters are defined on a variable, they are applied in first-defined order on writing and in the reverse order on reading.
The most common kind of filter is a compression-decompression filter, and that is the focus of this document.
For now, this document is strongly influenced by the HDF5 mechanism. When other implementations (e.g. Zarr) support filters, this document will have multiple sections: one for each mechanism.
A Warning on Backward Compatibility
The API defined in this document should accurately reflect the current state of filters in the netCDF-c library. Be aware that there was a short period in which the filter code was undergoing some revision and extension. Those extensions have largely been reverted. Unfortunately, some users may experience some compilation problems for previously working code because of these reversions. In that case, please revise your code to adhere to this document. Apologies are extended for any inconvenience.
The primary user-visible incompatibilities are as follows:
- Some functions (e.g. nc_inq_var_filter_ids) had a different name in the previous releases.
- The functions nc_inq_var_filter_info and nc_inq_var_filter return the NC_ENOFILTER error when information is requested about a variable that has no filters or does not have the specified filter attached.
- Some auxiliary functions for parsing textual filter specifications have been moved to netcdf_aux.h. See Appendix A.
- It is no longer possible to use a filter name, such as "szip"; one must use the assigned filter id, 4 for szip.
Enabling A HDF5 Compression Filter
HDF5 supports dynamic loading of compression filters using the following process for reading of compressed data.
- Assume that we have a dataset with one or more variables that were compressed using some algorithm. How the dataset was compressed will be discussed subsequently.
- Shared libraries or DLLs exist that implement the compress/decompress algorithm. These libraries have a specific API so that the HDF5 library can locate, load, and utilize the compressor. These libraries are expected to be installed in a specific directory.
In order to compress a variable with an HDF5 compliant filter, the netcdf-c library must be given three pieces of information: (1) some unique identifier for the filter to be used, (2) a vector of parameters for controlling the action of the compression filter, and (3) a shared library implementation of the filter.
The meaning of the parameters is, of course, completely filter dependent and the filter description [3] needs to be consulted. For bzip2, for example, a single parameter is provided representing the compression level. It is legal to provide a zero-length set of parameters. Defaults are not provided, so this assumes that the filter can operate with zero parameters.
Filter ids are assigned by the HDF group. See [4] for a current list of assigned filter ids. Note that ids above 32767 can be used for testing without registration.
The first two pieces of information can be provided in one of three ways: using ncgen, via an API call, or via command line parameters to nccopy. In any case, remember that filtering also requires setting chunking, so the variable must also be marked with chunking information. If compression is set for a non-chunked variable, the variable will forcibly be converted to chunked using a default chunking algorithm.
Using The API
The necessary API methods are included in netcdf_filter.h by default. These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.
- Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before nc_enddef is invoked.
int nc_def_var_filter(int ncid, int varid, unsigned int id, size_t nparams, const unsigned int* params);
Arguments:
* ncid — File and group ID.
* varid — Variable ID.
* id — Filter ID.
* nparams — Number of filter parameters.
* params — Filter parameters.
Return codes:
* NC_NOERR — No error.
* NC_ENOTNC4 — Not a netCDF-4 file.
* NC_EBADID — Bad ncid or bad filter id
* NC_ENOTVAR — Invalid variable ID.
* NC_EINDEFINE — called when not in define mode
* NC_ELATEDEF — called after variable was created
* NC_EINVAL — Scalar variable, or parallel enabled and parallel filters not supported or nparams or params invalid.
-
Query a variable to obtain a list of all filters associated with that variable.
The number of filters associated with the variable is stored in nfiltersp (it may be zero). The set of filter ids will be returned in filterids. As is usual with the netcdf API, one is expected to call this function twice: the first call sets nfiltersp and the second retrieves the filter ids into client-allocated memory. Any of these arguments can be NULL, in which case no value is returned.
int nc_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int* filterids);
Arguments:
* ncid — File and group ID.
* varid — Variable ID.
* nfiltersp — Stores number of filters found; may be zero.
* filterids — Stores set of filter ids.
Return codes:
* NC_NOERR — No error.
* NC_ENOTNC4 — Not a netCDF-4 file.
* NC_EBADID — Bad ncid
* NC_ENOTVAR — Invalid variable ID.
-
Query a variable to obtain information about a specific filter associated with the variable.
The id indicates the filter of interest. The actual parameters are stored in params. The number of parameters is returned in nparamsp. As is usual with the netcdf API, one is expected to call this function twice: the first call sets nparamsp and the second retrieves the parameters into client-allocated memory. Any of these arguments can be NULL, in which case no value is returned. If the specified id is not attached to the variable, then NC_ENOFILTER is returned.
int nc_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparamsp, unsigned int* params);
Arguments:
* ncid — File and group ID.
* varid — Variable ID.
* id — The filter id of interest.
* nparamsp — Stores number of parameters.
* params — Stores set of filter parameters.
Return codes:
* NC_NOERR — No error.
* NC_ENOTNC4 — Not a netCDF-4 file.
* NC_EBADID — Bad ncid
* NC_ENOTVAR — Invalid variable ID.
* NC_ENOFILTER — Filter not defined for the variable.
-
Query a variable to obtain information about the first filter associated with the variable.
When netcdf-c was modified to support multiple filters per variable, this function became redundant since it returns information only about the first defined filter for the variable. Internally, it is implemented using the functions nc_inq_var_filter_ids and nc_inq_var_filter_info.
In any case, the filter id will be returned in the idp argument. If there are no filters, then zero is stored in this argument. Otherwise, the number of parameters is stored in nparamsp and the actual parameters in params. As is usual with the netcdf API, one is expected to call this function twice: the first call sets nparamsp and the second retrieves the parameters into client-allocated memory. Any of these arguments can be NULL, in which case no value is returned.
int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparamsp, unsigned int* params);
Arguments:
* ncid — File and group ID.
* varid — Variable ID.
* idp — Stores the id of the first found filter, set to zero if variable has no filters.
* nparamsp — Stores number of parameters.
* params — Stores set of filter parameters.
Return codes:
* NC_NOERR — No error.
* NC_ENOTNC4 — Not a netCDF-4 file.
* NC_EBADID — Bad ncid
* NC_ENOTVAR — Invalid variable ID.
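The calls described in this section can be combined as in the following sketch. It is only an illustration under stated assumptions: the helper names define_filtered_var and print_filters, the dimension name "x", the variable name "var", and the chunk size are all hypothetical, and filter id 307 (bzip2) with level 9 is borrowed from the examples elsewhere in this document. The bzip2 plugin must be locatable at run time for the write to succeed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>
#include <netcdf_filter.h>

/* Hypothetical helper: define a chunked variable and attach one filter.
 * ncid must refer to a netCDF-4 file that is still in define mode. */
int define_filtered_var(int ncid, int* varidp)
{
    int stat, dimid;
    size_t chunks[1] = {256};      /* illustrative chunk size */
    unsigned int params[1] = {9};  /* bzip2 compression level */

    if ((stat = nc_def_dim(ncid, "x", 1024, &dimid))) return stat;
    if ((stat = nc_def_var(ncid, "var", NC_FLOAT, 1, &dimid, varidp))) return stat;
    if ((stat = nc_def_var_chunking(ncid, *varidp, NC_CHUNKED, chunks))) return stat;
    /* Must be called after nc_def_var and before nc_enddef. */
    return nc_def_var_filter(ncid, *varidp, 307, 1, params);
}

/* Hypothetical helper: list every filter on a variable using the
 * two-call pattern (first obtain the count, then the values). */
int print_filters(int ncid, int varid)
{
    int stat;
    size_t nfilters = 0, nparams = 0, i, j;
    unsigned int* ids = NULL;
    unsigned int* params = NULL;

    if ((stat = nc_inq_var_filter_ids(ncid, varid, &nfilters, NULL))) return stat;
    if (nfilters == 0) { printf("no filters\n"); return NC_NOERR; }
    if ((ids = malloc(nfilters * sizeof *ids)) == NULL) return NC_ENOMEM;
    if ((stat = nc_inq_var_filter_ids(ncid, varid, &nfilters, ids))) goto done;

    for (i = 0; i < nfilters; i++) {
        /* First call: number of parameters only. */
        if ((stat = nc_inq_var_filter_info(ncid, varid, ids[i], &nparams, NULL))) goto done;
        params = malloc((nparams ? nparams : 1) * sizeof *params);
        if (params == NULL) { stat = NC_ENOMEM; goto done; }
        /* Second call: the parameters themselves. */
        if ((stat = nc_inq_var_filter_info(ncid, varid, ids[i], &nparams, params))) goto done;
        printf("filter %u:", ids[i]);
        for (j = 0; j < nparams; j++) printf(" %u", params[j]);
        printf("\n");
        free(params);
        params = NULL;
    }
done:
    free(params);
    free(ids);
    return stat;
}
```

A caller would typically invoke define_filtered_var between nc_create (with the NC_NETCDF4 flag) and nc_enddef, and print_filters at any point after the variable exists.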
Using ncgen
In a CDL file, compression of a variable can be specified by annotating it with the following attribute:
- ''_Filter'' — a string containing a comma separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter. See the section on the parameter encoding syntax for the details on the allowable kinds of constants.
This is a "special" attribute, which means that it will normally be invisible when using ncdump unless the -s flag is specified.
This attribute may be repeated to specify multiple filters. For backward compatibility it is probably better to use the ''_Deflate'' attribute instead of ''_Filter''. But using ''_Filter'' to specify deflation will work.
Note that the lexical order of declaration is important when more than one filter is specified for a variable because it determines the order in which the filters are applied.
Example CDL File (Data elided)
netcdf bzip2szip {
dimensions:
dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
variables:
float var(dim0, dim1, dim2, dim3) ;
var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
var:_Storage = "chunked" ;
var:_ChunkSizes = 4, 4, 4, 4 ;
data:
...
}
Note that the assigned filter id for bzip2 is 307 and for szip it is 4.
Using nccopy
When copying a netcdf file using nccopy it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:
nccopy -F "var,307,9" unfiltered.nc filtered.nc
Assume that unfiltered.nc has a chunked but not bzip2 compressed variable named "var". This command will copy that variable to the filtered.nc output file but using filter with id 307 (i.e. bzip2) and with parameter(s) 9 indicating the compression level. See the section on the parameter encoding syntax for the details on the allowable kinds of constants.
The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.
It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional -F cases are defined.
-F *,...
means apply the filter to all variables in the dataset.
-F v1&v2&..,...
means apply the filter to multiple variables.
Multiple filters can be specified using the pipeline notation '|'. For example
-F v1&v2,307,9|4,32,32
means apply filter 307 (bzip2) then filter 4 (szip) to the multiple variables.
Note that the characters '*', '&', and '|' are shell reserved characters, so you will probably need to escape or quote the filter spec in that environment.
As a rule, any input filter on an input variable will be applied to the equivalent output variable — assuming the output file type is netcdf-4. It is, however, sometimes convenient to suppress output compression either totally or on a per-variable basis. Total suppression of output filters can be accomplished by specifying a special case of "-F", namely this.
nccopy -F none input.nc output.nc
The expression -F *,none is equivalent to -F none.
Suppression of output filtering for a specific set of variables can be accomplished using these formats.
nccopy -F "var,none" input.nc output.nc
nccopy -F "v1&v2&...,none" input.nc output.nc
where "var" and the "vi" are the fully qualified names of variables.
The rules for all possible cases of the "-F none" flag are defined by this table.
-F none | -Fvar,... | Input Filter | Applied Output Filter |
---|---|---|---|
true | undefined | NA | unfiltered |
true | none | NA | unfiltered |
true | defined | NA | use output filter(s) |
false | undefined | defined | use input filter(s) |
false | none | NA | unfiltered |
false | defined | NA | use output filter(s) |
false | undefined | undefined | unfiltered |
false | defined | defined | use output filter(s) |
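The table can be read as a small decision procedure: a per-variable spec takes precedence, then the global "-F none", then whatever the input variable carried. The following is a sketch of that reading only; the type and function names are hypothetical and this is not nccopy's actual code.

```c
/* State of the per-variable -F option for one output variable. */
typedef enum { VARSPEC_UNDEFINED, VARSPEC_NONE, VARSPEC_DEFINED } varspec_t;

/* Which filters end up on the output variable. */
typedef enum { OUT_UNFILTERED, OUT_INPUT_FILTERS, OUT_OUTPUT_FILTERS } outcome_t;

/* Hypothetical helper that reproduces the table above.
 * global_none:      nonzero if "-F none" (or "-F *,none") was given
 * spec:             what "-F var,..." said about this variable
 * input_has_filter: nonzero if the input variable has filters */
outcome_t choose_output_filter(int global_none, varspec_t spec,
                               int input_has_filter)
{
    if (spec == VARSPEC_DEFINED) return OUT_OUTPUT_FILTERS; /* explicit spec wins */
    if (spec == VARSPEC_NONE) return OUT_UNFILTERED;        /* per-variable suppression */
    if (global_none) return OUT_UNFILTERED;                 /* total suppression */
    return input_has_filter ? OUT_INPUT_FILTERS : OUT_UNFILTERED;
}
```

Each row of the table corresponds to one path through this function; for example, the fourth row (no "-F none", no per-variable spec, input filter defined) reaches the final return and yields the input filters.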
HDF5 Parameter Encode/Decode
The filter id for an HDF5 format filter is an unsigned integer. Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers. It may be that the parameters required by a filter can naturally be encoded as unsigned integers. The bzip2 compression filter, for example, expects a single integer value from zero thru nine. This encodes naturally as a single unsigned integer.
Note that signed integers and single-precision (32-bit) float values also can easily be represented as 32 bit unsigned integers by proper casting to an unsigned integer so that the bit pattern is preserved. Simple integer values of type short or char (or the unsigned versions) can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then zero extending.
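The conversions described above can be sketched as follows. The function names are hypothetical, uint32_t stands in for the unsigned int parameter type, and the float case assumes a 32-bit IEEE-754 float, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit float as an unsigned 32-bit parameter.
 * memcpy copies the bit pattern without violating aliasing rules. */
uint32_t param_from_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of param_from_float, as a filter would use it. */
float float_from_param(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Signed 32-bit integer: conversion to unsigned is modular, so the
 * two's-complement bit pattern is preserved. */
uint32_t param_from_int(int32_t i)
{
    return (uint32_t)i;
}

/* short: truncate to 16 bits, then zero extend to 32 bits. */
uint32_t param_from_short(short s)
{
    return (uint32_t)s & 0xFFFFu;
}

/* char: truncate to 8 bits, then zero extend to 32 bits. */
uint32_t param_from_char(signed char c)
{
    return (uint32_t)c & 0xFFu;
}
```

A filter that expects a float parameter would apply float_from_param to the corresponding element of the parameter vector it receives.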
Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters. You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
Parameters whose size is larger than 32-bits present a byte order problem. This specifically includes double precision floats and (signed or unsigned) 64-bit integers. For these cases, the machine byte order issue must be handled, in part, by the compression code. This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately. This means that on a machine whose byte order is different than the machine in which the parameters were initially created, the two integers will be separately endian converted. But this will be incorrect for 64-bit values.
So, we have this situation (for HDF5 only):
- the 8 bytes come in as native machine order for the machine doing the call to nc_def_var_filter.
- HDF5 divides the 8 bytes into 2 four byte pieces and ensures that each piece is in network (big) endian order.
- When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
Encoding Algorithms for HDF5
In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers. Note that little-endian is used as the standard because it is the most common machine format. When read, the filter code needs to be aware of this convention and do the appropriate conversions.
This leads to the following set of rules.
Encoding
-
Encode on little endian (LE) machine: no special action is required. The 8-byte value is passed to HDF5 as two 4-byte integers. HDF5 byte swaps each integer and stores it in the file.
-
Encode on a big endian (BE) machine: several steps are required:
- Do an 8-byte byte swap to convert the original value to little-endian format.
- Since the encoding machine is BE, HDF5 will just store the value. So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
- This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
Decoding
-
Decode on LE machine: no special action is required. HDF5 will get the two 4-bytes values from the file and byte-swap each separately. The concatenation of those two integers will be the expected LE value.
-
Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
- HDF5 sends the two 4-byte values to the filter.
- The filter must then byte-swap each 4-byte value independently.
- The filter then must concatenate the two 4-byte values into a single 8-byte value. Because of the encoding rules, this 8-byte value will be in LE format.
- The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to desired BE format.
To support these rules, some utility programs exist and are discussed in Appendix A.
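The encoding and decoding rules above can be sketched in a few lines of C. This is an illustration of the rules as written, not the source of the library's utility function; the name fix8_sketch and the explicit big_endian argument (added so both branches can be exercised on any host) are assumptions made for this example.

```c
#include <stddef.h>

/* Reverse the order of n bytes in place. */
static void swap_bytes(unsigned char* p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Returns 1 when the host stores the most significant byte first. */
int host_is_big_endian(void)
{
    const unsigned int one = 1;
    return *(const unsigned char*)&one == 0;
}

/* Hypothetical sketch of the 8-byte rules.
 * decode == 0: prepare a native 8-byte value for nc_def_var_filter.
 * decode != 0: recover the native 8-byte value inside the filter.
 * big_endian:  pass host_is_big_endian() in real use. */
void fix8_sketch(unsigned char mem8[8], int decode, int big_endian)
{
    if (!big_endian) return;        /* LE machine: no special action */
    if (decode) {
        swap_bytes(mem8, 4);        /* byte-swap each 4-byte piece */
        swap_bytes(mem8 + 4, 4);
        swap_bytes(mem8, 8);        /* then the 8-byte swap to BE */
    } else {
        swap_bytes(mem8, 8);        /* 8-byte swap to LE */
        swap_bytes(mem8 + 4, 4);    /* then byte-swap each 4-byte piece */
        swap_bytes(mem8, 4);
    }
}
```

On a big endian host the net effect of either direction is to exchange the two 4-byte halves, which is why encoding followed by decoding restores the original value.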
Filter Specification Syntax
The utilities ncgen and nccopy, and also the output of ncdump, support the specification of filter ids, formats, and parameters in text format. The BNF specification is defined in Appendix C. Basically, these specifications consist of a filter id, a comma, and then a sequence of comma separated constants representing the parameters. The constants are converted within the utility to a proper set of unsigned int constants (see the parameter encoding section).
To simplify things, various kinds of constants can be specified rather than just simple unsigned integers. The ncgen and nccopy will encode them properly using the rules specified in the section on parameter encode/decode. Since the original types are lost after encoding, ncdump will always show a simple list of unsigned integer constants.
The currently supported constants are as follows.
Example | Type | Format Tag | Notes |
---|---|---|---|
-17b | signed 8-bit byte | b|B | Truncated to 8 bits and zero extended to 32 bits |
23ub | unsigned 8-bit byte | u|U b|B | Truncated to 8 bits and zero extended to 32 bits |
-25S | signed 16-bit short | s|S | Truncated to 16 bits and zero extended to 32 bits |
27US | unsigned 16-bit short | u|U s|S | Truncated to 16 bits and zero extended to 32 bits |
-77 | implicit signed 32-bit integer | Leading minus sign and no tag | |
77 | implicit unsigned 32-bit integer | No tag | |
93U | explicit unsigned 32-bit integer | u|U | |
789f | 32-bit float | f|F | |
12345678.12345678d | 64-bit double | d|D | LE encoding |
-9223372036854775807L | 64-bit signed long long | l|L | LE encoding |
18446744073709551615UL | 64-bit unsigned long long | u|U l|L | LE encoding |
- In all cases, except for an untagged positive integer, the format tag is required and determines how the constant is converted to one or two unsigned int values.
- For an untagged positive integer, the constant is treated as of the smallest type into which it fits (i.e. 8,16,32, or 64 bit).
- For signed byte and short, the value is sign extended to 32 bits and then treated as an unsigned int value, but maintaining the bit-pattern.
- For double, and signed|unsigned long long, they are converted as specified in the section on parameter encode/decode.
- In order to support multiple filters, the argument to ''_Filter'' may be a list of filter specs separated by the pipeline character '|'.
Dynamic Loading Process
Each filter is assumed to be compiled into a separate dynamically loaded library. For HDF5 conformant filters, these filter libraries are assumed to be in some specific location. The details for writing such a filter are defined in the HDF5 documentation[1,2].
Plugin directory
The HDF5 loader expects plugins to be in a specified plugin directory. The default directory is:
- "/usr/local/hdf5/lib/plugin" for linux/unix operating systems (including Cygwin)
- "%ALLUSERSPROFILE%\hdf5\lib\plugin" for Windows systems, although the code does not appear to explicitly use this path.
The default may be overridden using the environment variable HDF5_PLUGIN_PATH.
Plugin Library Naming
Given a plugin directory, HDF5 examines every file in that directory that conforms to a specified name pattern as determined by the platform on which the library is being executed.
Platform | Basename | Extension |
---|---|---|
Linux | lib* | .so* |
OSX | lib* | .so* |
Cygwin | cyg* | .dll* |
Windows | * | .dll |
Plugin Verification
For each dynamic library located using the previous patterns, HDF5 attempts to load the library and attempts to obtain information from it. Specifically, it looks for two functions with the following signatures.
- H5PL_type_t H5PLget_plugin_type(void) — This function is expected to return the constant value H5PL_TYPE_FILTER to indicate that this is a filter library.
- const void* H5PLget_plugin_info(void) — This function returns a pointer to a table of type H5Z_class2_t. This table contains the necessary information needed to utilize the filter both for reading and for writing. In particular, it specifies the filter id implemented by the library and it must match that id specified for the variable in nc_def_var_filter in order to be used.
If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.
Debugging
Debugging plugins can be very difficult. You will probably need to use the old printf approach for debugging the filter itself.
One case worth mentioning is when you have a dataset that is using an unknown filter. For this situation, you need to identify what filter(s) are used in the dataset. This can be accomplished using this command.
ncdump -s -h <dataset filename>
Since ncdump is not being asked to access the data (the -h flag), it can obtain the filter information without failures. Then it can print out the filter id and the parameters (the -s flag).
Test Cases
Within the netcdf-c source tree, the directory netcdf-c/nc_test4 contains a number of test cases for testing dynamic filter writing and reading. These include
- test_filter.c — tests simple compression/decompression using the bzip2 compressor in the directory plugins.
- test_filterparser.c — validates parameter passing.
- test_multifilter.c — tests applying multiple filters to a single variable: bzip2, deflate(zip), and szip (if enabled).
- test_filter.sh — test driver to execute the above tests.
These tests are disabled if --disable-shared or if --disable-hdf5 is specified.
Example
A slightly simplified version of the filter test case is also available as an example within the netcdf-c source tree directory netcdf-c/examples/C. The test is called filter_example.c and it is executed as part of the run_examples4.sh shell script. The test case demonstrates dynamic filter writing and reading.
The files examples/C/hdf5plugins/Makefile.am and examples/C/hdf5plugins/CMakeLists.txt demonstrate how to build the hdf5 plugin for bzip2.
Notes
Order of Invocation for Multiple Filters
When multiple filters are defined on a variable, the order of application, when writing data to the file, is the same as the order in which nc_def_var_filter is called. When reading a file the order of application is of necessity the reverse.
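To see why the read order must be the reverse of the write order, consider the following toy sketch. It does not use HDF5 or netCDF; the two "filters" are simple invertible byte transforms invented for this example, and all names are hypothetical.

```c
#include <stddef.h>

/* A toy filter is a pair of in-place transforms over a chunk. */
typedef void (*chunk_fn)(unsigned char* buf, size_t n);
typedef struct { chunk_fn encode; chunk_fn decode; } toy_filter;

static void add1(unsigned char* b, size_t n)
{ size_t i; for (i = 0; i < n; i++) b[i] = (unsigned char)(b[i] + 1); }

static void sub1(unsigned char* b, size_t n)
{ size_t i; for (i = 0; i < n; i++) b[i] = (unsigned char)(b[i] - 1); }

static void mul3(unsigned char* b, size_t n)
{ size_t i; for (i = 0; i < n; i++) b[i] = (unsigned char)(b[i] * 3); }

/* 171 is the multiplicative inverse of 3 modulo 256. */
static void div3(unsigned char* b, size_t n)
{ size_t i; for (i = 0; i < n; i++) b[i] = (unsigned char)(b[i] * 171); }

/* Filters in definition order: add1 first, mul3 second. */
const toy_filter toy_pipeline[2] = { {add1, sub1}, {mul3, div3} };

/* Writing: apply encoders in first-defined order. */
void filter_on_write(const toy_filter* f, size_t nf, unsigned char* buf, size_t n)
{
    size_t i;
    for (i = 0; i < nf; i++) f[i].encode(buf, n);
}

/* Reading: apply decoders in the reverse order. */
void filter_on_read(const toy_filter* f, size_t nf, unsigned char* buf, size_t n)
{
    size_t i;
    for (i = nf; i-- > 0; ) f[i].decode(buf, n);
}
```

Because the two transforms do not commute, applying the decoders in definition order instead of reverse order would not recover the original bytes.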
There are some special cases.
- The fletcher32 filter is always applied first, if enabled.
- If nc_def_var_filter or nc_def_var_deflate or nc_def_var_szip is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change. However the last set of parameters specified is used when actually writing the dataset.
- Deflate and shuffle — these two are inextricably linked in the current API, but have quite different semantics. If you call nc_def_var_deflate multiple times, then the previous rule applies with respect to deflate. However, the shuffle filter, if enabled, is ''always'' applied before applying any other filters, except fletcher32.
- Once a filter is defined for a variable, it cannot be removed nor can its position in the filter order be changed.
Memory Allocation Issues
Starting with HDF5 version 1.10.x, the plugin code MUST be careful when using the standard malloc(), realloc(), and free() functions.
In the event that the code is allocating, reallocating, or freeing memory that either came from or will be exported to the calling HDF5 library, then one MUST use the corresponding HDF5 functions H5allocate_memory(), H5resize_memory(), H5free_memory() [5] to avoid memory failures.
Additionally, if your filter code leaks memory, then the HDF5 library generates a failure something like this.
H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
One can look at the code in plugins/H5Zbzip2.c and H5Zmisc.c as illustrations.
SZIP Issues
The current szip plugin code in the HDF5 library has some behaviors that can catch the unwary. These are handled internally to (mostly) hide them so that they should not affect users. Specifically, this filter may do two things.
- Add extra parameters to the filter parameters: going from the two parameters provided by the user to four parameters for internal use. It turns out that the two parameters provided when calling nc_def_var_filter correspond to the first two parameters of the four parameters returned by nc_inq_var_filter.
- Change the values of some parameters: the value of the options_mask argument is known to add additional flag bits, and the pixels_per_block parameter may be modified.
The reason for these changes is that the szip API provided by the underlying H5Pset_szip function is actually a subset of the capabilities of the real szip implementation. Presumably this is for historical reasons.
In any case, if the caller uses the nc_inq_var_szip or the nc_inq_var_filter functions, then the parameter values returned may differ from those originally specified.
Supported Systems
The current matrix of operating systems and build systems known to work is as follows.
Build System | Supported OS |
---|---|
Automake | Linux, Cygwin, OSX |
Cmake | Linux, Cygwin, OSX, Visual Studio |
Generic Plugin Build
If you do not want to use Automake or Cmake, the following has been known to work.
gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
Appendix A. Support Utilities
Several functions are exported from the netcdf-c library for use by client programs and by filter implementations. They are defined in the header file netcdf_aux.h. The h5 tag indicates that they assume that the result of the parse is a set of unsigned integers — the format used by HDF5.
-
int ncaux_h5filterspec_parse(const char* txt, unsigned int* idp, size_t* nparamsp, unsigned int** paramsp);
- txt contains the text of a sequence of comma separated constants
- idp will contain the first constant — the filter id
- nparamsp will contain the number of params
- paramsp will contain a vector of params; the caller must free it.
This function can parse single filter spec strings as defined in the section on Filter Specification Syntax.
-
int ncaux_h5filterspec_parselist(const char* txt, int* formatp, size_t* nspecsp, struct NC_H5_Filterspec*** vectorp);
- txt contains the text of a sequence of '|' separated filter specs.
- formatp currently always returns 0.
- nspecsp will return the number of filter specifications.
- vectorp will return a pointer to a vector of pointers to filter specification instances; the caller must free it.
This function parses a sequence of filter specifications each separated by a '|' character. The text between '|' separators must be parsable by ncaux_h5filterspec_parse.
-
void ncaux_h5filterspec_free(struct NC_H5_Filterspec* f);
- f is a pointer to an instance of struct NC_H5_Filterspec. Typically this was returned as an element of the vector returned by ncaux_h5filterspec_parselist.
This reclaims the parameters of the filter spec object as well as the object itself.
-
int ncaux_h5filterspec_fix8(unsigned char* mem8, int decode);
- mem8 is a pointer to the 8-byte value to fix.
- decode is 1 if the function should apply the 8-byte decoding algorithm; otherwise it applies the encoding algorithm.
This function implements the 8-byte conversion algorithms for HDF5. Before calling nc_def_var_filter (unless ncaux_h5filterspec_parse was used), the client must call this function with the decode argument set to 0. Inside the filter code, this function should be called with the decode argument set to 1.
Examples of the use of these functions can be seen in the test program nc_test4/tst_filterparser.c.
Some of the above functions use a C struct defined in netcdf_filter.h. The definition of that struct is as follows.
typedef struct NC_H5_Filterspec {
unsigned int filterid; /* ID for arbitrary filter. */
size_t nparams; /* nparams for arbitrary filter. */
unsigned int* params; /* Params for arbitrary filter. */
} NC_H5_Filterspec;
This struct in effect encapsulates all of the information about an HDF5 formatted filter: the id, the number of parameters, and the parameters themselves.
Appendix B. Build Flags for Detecting the Filter Mechanism
The include file netcdf_meta.h contains the following definition.
#define NC_HAS_MULTIFILTERS 1
This, in conjunction with the error code NC_ENOFILTER in netcdf.h can be used to see what filter mechanism is in place.
- !defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS): indicates that the old pre-4.7.4 mechanism is in place. It does not support multiple filters.
- defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS): indicates that the 4.7.4 mechanism is in place. It does support multiple filters, but the error return codes for nc_inq_var_filter are different and the filter spec parser functions are in a different location with different names.
- defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS): indicates that multiple filters are supported and that nc_inq_var_filter returns a filterid of zero to indicate that a variable has no filters. Also, the filter spec parsers have the names and signatures described in this document and are defined in netcdf_aux.h.
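The three cases above translate directly into preprocessor conditionals. The following fragment is a sketch; the macro name FILTER_MECHANISM and the function name are hypothetical and exist only for this example.

```c
#include <netcdf.h>
#include <netcdf_meta.h>

/* Select a label for the filter mechanism found at compile time.
 * FILTER_MECHANISM is a hypothetical macro used only in this example. */
#if defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS)
#define FILTER_MECHANISM "multiple filters, API as described in this document"
#elif defined(NC_ENOFILTER)
#define FILTER_MECHANISM "4.7.4 mechanism"
#else
#define FILTER_MECHANISM "pre-4.7.4 mechanism, single filter only"
#endif

/* Hypothetical accessor so that client code can report the mechanism. */
const char* filter_mechanism(void)
{
    return FILTER_MECHANISM;
}
```

Client code that must build against several library versions can use the same conditionals to choose between the old and new parser function names.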
Appendix C. BNF for Specifying Filters in Utilities
speclist: spec
| speclist '|' spec
;
spec: filterid
| filterid ',' parameterlist
;
filterid: unsigned32
;
parameterlist: parameter
| parameterlist ',' parameter
;
parameter: unsigned32
where
unsigned32: <32 bit unsigned integer>
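A direct transcription of this grammar into C might look like the sketch below. It accepts only untagged unsigned 32-bit decimal constants, exactly as the BNF states, and does not handle whitespace or the tagged constants from the table in the Filter Specification Syntax section. The names spec_t, MAXPARAMS, and parse_speclist are hypothetical; the library functions in Appendix A are the supported way to parse filter specs.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXPARAMS 16 /* hypothetical fixed limit for this sketch */

typedef struct {
    unsigned int id;
    size_t nparams;
    unsigned int params[MAXPARAMS];
} spec_t;

/* unsigned32: parse one decimal constant at *sp and advance *sp past it. */
static int parse_u32(const char** sp, unsigned int* out)
{
    char* end;
    unsigned long v;
    if (!isdigit((unsigned char)**sp)) return -1; /* rejects signs and empty fields */
    errno = 0;
    v = strtoul(*sp, &end, 10);
    if (errno != 0 || v > 0xFFFFFFFFul) return -1;
    *out = (unsigned int)v;
    *sp = end;
    return 0;
}

/* spec: filterid | filterid ',' parameterlist */
static int parse_spec(const char** sp, spec_t* spec)
{
    spec->nparams = 0;
    if (parse_u32(sp, &spec->id)) return -1;
    while (**sp == ',') {
        (*sp)++;
        if (spec->nparams == MAXPARAMS) return -1;
        if (parse_u32(sp, &spec->params[spec->nparams])) return -1;
        spec->nparams++;
    }
    return 0;
}

/* speclist: spec | speclist '|' spec
 * Returns 0 on success and stores the number of specs in *nspecsp. */
int parse_speclist(const char* txt, spec_t* specs, size_t maxspecs, size_t* nspecsp)
{
    const char* s = txt;
    size_t n = 0;
    for (;;) {
        if (n == maxspecs) return -1;
        if (parse_spec(&s, &specs[n])) return -1;
        n++;
        if (*s == '\0') break;
        if (*s != '|') return -1; /* anything else is a syntax error */
        s++;
    }
    *nspecsp = n;
    return 0;
}
```

Given the text 307,9|4,32,32 from the earlier CDL example, this sketch yields two specs: id 307 with one parameter, and id 4 with two parameters.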
References
1. https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
2. https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf
3. https://portal.hdfgroup.org/display/support/Contributions#Contributions-filters
4. https://support.hdfgroup.org/services/contributions.html#filters
5. https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html
6. https://confluence.hdfgroup.org/display/HDF5/Filters
Point of Contact
Author: Dennis Heimbigner
Email: dmh at ucar dot edu
Initial Version: 1/10/2018
Last Revised: 9/18/2020