Merge pull request #2388 from rkouznetsov/lossydoc

Add clarification for the meaning of NSB
This commit is contained in:
Ward Fisher 2023-12-19 10:08:34 -07:00 committed by GitHub
commit 4bdb093fca
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -611,7 +611,99 @@ As part of its testing, the NetCDF build process creates a number of shared libr
If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.
## Debugging {#filters_debug}
# Lossy One-Way Filters
As of NetCDF version 4.8.2, the netcdf-c library supports
bit-grooming filters.
````
Bit-grooming is a lossy compression algorithm that removes the
bloat due to false-precision, those bits and bytes beyond the
meaningful precision of the data. Bit Grooming is statistically
unbiased, applies to all floating point numbers, and is easy to
use. Bit-Grooming reduces data storage requirements by
25-80%. Unlike its best-known competitor Linear Packing, Bit
Grooming imposes no software overhead on users, and guarantees
its precision throughout the whole floating point range
[https://doi.org/10.5194/gmd-9-3199-2016].
````
The generic term "quantize" is used to refer collectively to the various
precision-trimming algorithms. The key thing to note about quantization is that
it occurs at the point of writing of data only. Since its output is
legal data, it does not need to be "de-quantized" when the data is read.
Because of this, quantization is not part of the standard filter
mechanism and has a separate API.
The API for bit-groom is currently as follows.
````
int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
````
The *quantize_mode* argument specifies the particular algorithm.
Currently, three are supported: NC_QUANTIZE_BITGROOM, NC_QUANTIZE_GRANULARBR,
and NC_QUANTIZE_BITROUND. In addition quantization can be disabled using
the value NC_NOQUANTIZE.
The input to ncgen or the output from ncdump supports special attributes
to indicate if quantization was applied to a given variable.
These attributes have the following form.
````
_QuantizeBitGroomNumberOfSignificantDigits = <NSD>
or
_QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
or
_QuantizeBitRoundNumberOfSignificantBits = <NSB>
````
The value NSD is the number of significant (decimal) digits to keep.
The value NSB is the number of bits to keep in the fraction part of an
IEEE754 floating-point number. Note that NSB of QuantizeBitRound is the same as
"number of explicit mantissa bits" (https://doi.org/10.5194/gmd-9-3199-2016) and same as
the number of "keep-bits" (https://doi.org/10.5194/gmd-14-377-2021), but is not
one less than the number of significant bunary figures:
`_QuantizeBitRoundNumberOfSignificantBits = 0` means one significant binary figure,
`_QuantizeBitRoundNumberOfSignificantBits = 1` means two significant binary figures etc.
## Distortions introduced by lossy filters
Any lossy filter introduces distortions to data.
The lossy filters implemented in netcdf-c introduce a distortoin
that can be quantified in terms of a _relative_ error. The magnitude of
distortion introduced to every single value V is guaranteed to be within
a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.
Two other methods use different definitions of _decimal precision_, though both
are guaranteed to reproduce NSD decimals when printed.
The margin for a relative error introduced by the methods are summarised in the table
```
NSD 1 2 3 4 5 6 7
BitGroom
Error Margin 3.1e-2 3.9e-3 4.9e-4 3.1e-5 3.8e-6 4.7e-7 -
GranularBitRound
Error Margin 1.4e-1 1.9e-2 2.2e-3 1.4e-4 1.8e-5 2.2e-6 -
```
If one defines decimal precision as in BitGroom, i.e. the introduced relative
error must not exceed half of the unit at the decimal place NSD in the
worst-case scenario, the following values of NSB should be used for BitRound:
```
NSD 1 2 3 4 5 6 7
NSB 3 6 9 13 16 19 23
```
The resulting application of BitRound is as fast as BitGroom, and is free from
artifacts in multipoint statistics introduced by BitGroom
(see https://doi.org/10.5194/gmd-14-377-2021).
# Debugging {#filters_debug}
Depending on the debugger one uses, debugging plugins can be very difficult.
It may be necessary to use the old printf approach for debugging the filter itself.