Merge pull request #2388 from rkouznetsov/lossydoc

Add clarification for the meaning of NSB
2024-12-09 08:11:38 +08:00 · 2023-12-19 10:08:34 -07:00 · 2023-12-19 10:08:34 -07:00 · 4bdb093fca
commit 4bdb093fca
parent 4c2899017e b55211b6c7
1 changed files with 93 additions and 1 deletions
--- a/docs/filters.md
+++ b/docs/filters.md
@ -611,7 +611,99 @@ As part of its testing, the NetCDF build process creates a number of shared libr
 If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
 to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.

-## Debugging {#filters_debug}
+# Lossy One-Way Filters
+
+As of NetCDF version 4.8.2, the netcdf-c library supports 
+bit-grooming filters.
+````
+Bit-grooming is a lossy compression algorithm that removes the
+bloat due to false-precision, those bits and bytes beyond the
+meaningful precision of the data. Bit Grooming is statistically
+unbiased, applies to all floating point numbers, and is easy to
+use. Bit-Grooming reduces data storage requirements by
+25-80%. Unlike its best-known competitor Linear Packing, Bit
+Grooming imposes no software overhead on users, and guarantees
+its precision throughout the whole floating point range 
+[https://doi.org/10.5194/gmd-9-3199-2016].
+````
+The generic term "quantize" is used to refer collectively to the various
+precision-trimming algorithms. The key thing to note about quantization is that
+it occurs at the point of writing of data only. Since its output is
+legal data, it does not need to be "de-quantized" when the data is read.
+Because of this, quantization is not part of the standard filter
+mechanism and has a separate API.
+
+The API for bit-groom is currently as follows.
+````
+int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
+int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
+````
+The *quantize_mode* argument specifies the particular algorithm.
+Currently, three are supported: NC_QUANTIZE_BITGROOM, NC_QUANTIZE_GRANULARBR,
+and NC_QUANTIZE_BITROUND. In addition quantization can be disabled using
+the value NC_NOQUANTIZE.
+
+The input to ncgen or the output from ncdump supports special attributes
+to indicate if quantization was applied to a given variable.
+These attributes have the following form.
+````
+_QuantizeBitGroomNumberOfSignificantDigits = <NSD>
+or
+_QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
+or
+_QuantizeBitRoundNumberOfSignificantBits = <NSB>
+````
+The value NSD is the number of significant (decimal) digits  to keep.
+The value NSB is the number of bits to keep in the fraction part of an
+IEEE754 floating-point number. Note that NSB of QuantizeBitRound is the same as
+"number of explicit mantissa bits" (https://doi.org/10.5194/gmd-9-3199-2016) and same as 
+the number of "keep-bits" (https://doi.org/10.5194/gmd-14-377-2021), but is not 
+one less than the number of significant bunary figures:
+`_QuantizeBitRoundNumberOfSignificantBits = 0` means one significant binary figure,
+`_QuantizeBitRoundNumberOfSignificantBits = 1` means two significant binary figures etc.
+
+## Distortions introduced by lossy filters
+
+  Any lossy filter introduces distortions to data.
+  The lossy filters implemented in netcdf-c introduce a distortoin 
+ that can be quantified in terms of a _relative_ error. The magnitude of 
+ distortion introduced to every single value V is guaranteed to be within 
+ a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
+ i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.
+ 
+
+ Two other methods use different definitions of _decimal precision_, though both
+ are guaranteed to reproduce NSD decimals when printed.
+ The margin for a relative error introduced by the methods are summarised in the table
+
+ ```
+  NSD                   1        2        3       4       5        6      7 
+
+  BitGroom   
+  Error Margin      3.1e-2  3.9e-3   4.9e-4  3.1e-5  3.8e-6    4.7e-7     -
+
+  GranularBitRound
+  Error Margin      1.4e-1  1.9e-2   2.2e-3  1.4e-4  1.8e-5    2.2e-6     - 
+   
+ ```
+ 
+
+ If one defines decimal precision as in BitGroom, i.e. the introduced relative 
+ error must not exceed half of the unit at the decimal place NSD in the 
+ worst-case scenario,  the following  values of NSB should be used for BitRound:
+
+ ```
+  NSD     1     2    3     4     5     6     7   
+  NSB     3     6    9    13    16    19    23
+ ```
+ 
+ The resulting application of BitRound is as fast as BitGroom, and is free from 
+ artifacts in multipoint  statistics introduced by BitGroom 
+ (see https://doi.org/10.5194/gmd-14-377-2021).
+
+
+# Debugging {#filters_debug}
+

 Depending on the debugger one uses, debugging plugins can be very difficult.
 It may be necessary to use the old printf approach for debugging the filter itself.