mirror of
https://github.com/Unidata/netcdf-c.git
synced 2024-11-21 03:13:42 +08:00
adding quantize documentation
parent 6a935c6812
commit aba1f76f72
@ -95,6 +95,6 @@ obsolete/fan_utils.html bestpractices.md filters.md indexing.md
 inmemory.md DAP2.dox FAQ.md
 known_problems.md
 COPYRIGHT.dox user_defined_formats.md DAP4.md DAP4.dox
-testserver.dox byterange.dox filters.md nczarr.md auth.md)
+testserver.dox byterange.dox filters.md nczarr.md auth.md quantize.md)

 ADD_EXTRA_DIST("${CUR_EXTRA_DIST}")
@ -758,6 +758,7 @@ INPUT = \
 @abs_top_srcdir@/docs/inmeminternal.dox \
 @abs_top_srcdir@/docs/indexing.dox \
 @abs_top_srcdir@/docs/testserver.dox \
+@abs_top_srcdir@/docs/quantize.md \
 @abs_top_srcdir@/include/netcdf.h \
 @abs_top_srcdir@/include/netcdf_mem.h \
 @abs_top_srcdir@/include/netcdf_par.h \
@ -906,7 +907,9 @@ IMAGE_PATH = @abs_top_srcdir@/docs/images/chunking2.png \
 @abs_top_srcdir@/docs/images/netcdf_architecture.png \
 @abs_top_srcdir@/docs/images/pnetcdf.png \
 @abs_top_srcdir@/docs/images/deptree.jpg \
-@abs_top_srcdir@/docs/images/InstallTreeWindows.png
+@abs_top_srcdir@/docs/images/InstallTreeWindows.png \
+@abs_top_srcdir@/docs/images/quantize_pi.png \
+@abs_top_srcdir@/docs/images/quantize_performance.png

 # The INPUT_FILTER tag can be used to specify a program that doxygen should
 # invoke to filter for each input file. Doxygen will invoke the filter program
@ -13,7 +13,7 @@ windows-binaries.md dispatch.md building-with-cmake.md CMakeLists.txt groups.dox
 notes.md install-fortran.md credits.md auth.md filters.md \
 obsolete/fan_utils.html indexing.dox inmemory.md FAQ.md \
 known_problems.md COPYRIGHT.md inmeminternal.dox testserver.dox \
-byterange.dox nczarr.md
+byterange.dox nczarr.md quantize.md

 # Turn off parallel builds in this directory.
 .NOTPARALLEL:
@ -5,7 +5,8 @@

 # See netcdf-c/COPYRIGHT file for more info.

-EXTRA_DIST = aqua.jpg chunking2.png compatibility3.png compression.png \
-groups.png nc4-model.png ncatts.png nc-classic-uml.png nccoords.png \
-ncfile.png pnetcdf.png terra.jpg netcdf_architecture.png \
-deptree.jpg InstallTreeWindows.png uniLogo.png
+EXTRA_DIST = aqua.jpg chunking2.png compatibility3.png \
+compression.png groups.png nc4-model.png ncatts.png \
+nc-classic-uml.png nccoords.png ncfile.png pnetcdf.png terra.jpg \
+netcdf_architecture.png deptree.jpg InstallTreeWindows.png \
+uniLogo.png quantize_pi.png quantize_performance.png
BIN
docs/images/quantize_performance.png
Normal file
Binary file not shown.
Size: 75 KiB
BIN
docs/images/quantize_pi.png
Normal file
Binary file not shown.
Size: 183 KiB
290
docs/quantize.md
Normal file
@ -0,0 +1,290 @@
# Lossy Compression with Quantize

## Introduction

The quantize feature was initially developed as part of the Community
Codec Repository (CCR) [2]. The CCR project allows netCDF users to
make use of HDF5 plugins (a.k.a. “filters”) which can add new
compression and other algorithms to the HDF5 library. As part of CCR,
the quantization algorithms were implemented as HDF5 filters.

However, one drawback of implementing quantization as a filter is
that the filter is also required when reading the data [1]. Although
this makes sense for compression/decompression algorithms, the
quantize algorithms are only needed when data are written. Requiring
that the readers of the data also install the filters places an
unnecessary burden on data readers. Furthermore, using the quantize
filter results in data that cannot be read by netCDF-Java or versions
of netcdf-c before 4.8.0, when support for multiple HDF5 filters was
added. For these reasons, it was decided to merge the quantize
algorithms into the netcdf-c library [5].

As part of the netcdf-c library, the quantize algorithms are available
for netCDF/HDF5 files and the new NCZarr format, and produce data
files that are fully backward compatible with all versions of netcdf-c
since 4.0, and also fully compatible with netCDF-Java.

## The Quantize Feature

The quantize algorithms assist with lossy compression by setting
excess bits to all zeros or all ones (in alternate array values). This
allows a subsequent compression algorithm, like the zlib-based
deflation built into netCDF-4, to better compress the data.

The quantize feature is applied to a variable in a netCDF file, and
may only be used with single or double precision floating point
variables (netCDF types NC_FLOAT and NC_DOUBLE). Attempting to turn
on quantize for any other type of netCDF variable will result in an
error.

It should be noted that turning on quantize does not, by itself,
reduce the size of the data. Only if subsequent compression is used
will setting the quantize feature result in additional compression.

![Quantization of Pi](quantize_pi.png)
Figure 1: The value of Pi expressed as a 32-bit floating point number,
with different levels of quantization applied, from Number of
Significant Digits equal to 8 (no quantization), to 1 (maximum
quantization). The least significant bits of the significand are
replaced with zeros, to the extent possible, while preserving the
desired number of significant digits. In this example the Bit Grooming
quantization algorithm is used.

## Quantization Algorithms

Three different quantization algorithms are provided in the netcdf-c
quantize feature. Each does a somewhat different calculation to
determine the number of bits that can be set to zeros (or ones), while
preserving the number of significant digits specified by the user.

Two of the algorithms, Bit Grooming and Granular Bit Round, accept the
number of significant decimal digits to be preserved in the data. One
algorithm, Bit Round, accepts the number of binary bits to preserve.
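
As a rough illustration of how a decimal-digit target relates to a bit
count: each decimal digit carries about log2(10), or roughly 3.32, bits of
information, so preserving NSD decimal digits needs on the order of
ceil(NSD x 3.32) significand bits. The C sketch below encodes only that
rule of thumb. The function name `bits_for_decimal_digits` is
hypothetical, and the calculation each netcdf-c algorithm actually
performs may differ (for example, by keeping extra guard bits).

```c
/* Rule-of-thumb sketch (an assumption, not the netcdf-c implementation):
 * estimate how many significand bits are needed for `nsd` decimal digits. */
#define LOG2_OF_10 3.321928094887362 /* bits of information per decimal digit */

static int bits_for_decimal_digits(int nsd)
{
    double exact = nsd * LOG2_OF_10;
    int bits = (int)exact;      /* truncate toward zero */
    if ((double)bits < exact)
        bits++;                 /* round up to a whole number of bits */
    return bits;
}
```

Under this estimate, 3 decimal digits need 10 bits, which would leave 13
of the 23 explicitly stored significand bits of a single precision value
available as excess bits.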

### Bit Grooming

The Bit Grooming algorithm determines the number of bits which
are necessary for the required number of significant decimal
digits. This determination is made at the beginning of processing and
is applied to all values.

Bit Grooming then sets excess bits of the first array value to zero,
then excess bits of the next array value to one, and continues
alternating between zero and one for the excess bits of every other
array value. In this way, the downward bias from zeroing bits and the
upward bias from setting them largely cancel, so the average value of
the array is approximately preserved.

For the Bit Grooming algorithm, the NSD parameter refers to the number
of significant decimal digits that will be preserved. The number of
significant digits may be 1-7 for single precision floating point, or
1-15 for double precision floating point.
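
The alternating pattern described above can be sketched in C for single
precision data. This is a minimal illustration, not the netcdf-c
implementation: the function name `groom_floats` and the `drop_bits`
parameter (the number of excess low-order significand bits) are
hypothetical, and the sketch ignores special cases such as fill values
that a real implementation has to handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: alternately clear and set the lowest
 * `drop_bits` bits of each float's 23-bit stored significand. */
static void groom_floats(float *data, size_t n, unsigned drop_bits)
{
    if (drop_bits == 0 || drop_bits > 23)
        return;                                      /* nothing sensible to do */
    uint32_t low = (UINT32_C(1) << drop_bits) - 1u;  /* mask of excess bits */
    for (size_t i = 0; i < n; i++) {
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);  /* view the float as raw bits */
        if (i % 2 == 0)
            u &= ~low;                   /* even index: excess bits -> 0 */
        else
            u |= low;                    /* odd index: excess bits -> 1 */
        memcpy(&data[i], &u, sizeof u);
    }
}
```

For the 32-bit pattern of Pi (0x40490FDB), dropping 13 bits yields
0x40490000 at an even index and 0x40491FFF at an odd index. The long runs
of identical trailing bits are what a later compressor can exploit.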

### Granular Bit Round

Granular Bit Round determines the number of required bits for each
value in the array, and uses IEEE rounding to change the data
value. It achieves a better overall compression ratio by more
aggressively determining the minimum number of bits required to
preserve the specified number of decimal digits of precision.

For the Granular Bit Round algorithm, the NSD parameter refers to the
number of significant decimal digits that will be preserved (as with
the Bit Grooming algorithm). The number of significant digits may be
1-7 for single precision floating point, or 1-15 for double precision
floating point.

### Bit Round

The Bit Round algorithm allows the user to directly specify the number
of bits of the significand which will be preserved, and then sets
excess bits to zero or one for alternate array values.

For the Bit Round algorithm, the NSD parameter refers to the number of
significant binary digits (bits) that will be preserved. The number of
significant bits may be 1-23 for single precision floating point, or
1-52 for double precision floating point.

## Quantize Attribute

When the quantize feature is used, an integer attribute is added to
the variable which contains the NSD setting. Without this attribute it
would be impossible for readers to know that quantize had been applied
to the data. The name of the attribute reflects the quantize algorithm
used. In accordance with the conventions established by the NetCDF
Users Guide, these attribute names begin with an underscore,
indicating that they are added by the library and should not be
modified or deleted by users [6].

Algorithm | Attribute Name
----------|---------------
Bit Groom | _QuantizeBitGroomNumberOfSignificantDigits
Granular Bit Round | _QuantizeGranularBitRoundNumberOfSignificantDigits
Bit Round | _QuantizeBitRoundNumberOfSignificantBits

Figure 2: Table showing the names of the attributes added to a variable
after the quantize feature has been applied. The name of the attribute
indicates the algorithm used; the integer value represents the number
of significant decimal digits (for Bit Groom and Granular Bit Round),
or the number of significand bits retained (for Bit Round).

## Handling of Fill Values

In a netCDF file, the fill value is the value used for elements of
the data not written by the user. For example, if a variable contains
an array of 10 values, and the user only writes 8 of them, the other
two values will be set to the fill value for that variable.

The fill value of a variable may be set by the user by adding an
attribute of the same type as the variable with the name
“_FillValue”. If present, the value of this attribute will be used as
the fill value for that variable. If not specified, a default value
for each type is used as the fill value. The default fill values may
be found in the netcdf.h file.

When using the quantize feature, any fill values will remain
unquantized. That is, the excess bits of any array element will not be
changed, if that element is the fill value. This is necessary if the
fill value is to retain its purpose as an indicator of values that
have not been written.
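
The fill-value rule can be illustrated with a short C sketch. It is a
simplified stand-in, not the netcdf-c implementation: the function name
`shave_except_fill` and its parameters are hypothetical, it only zeros
excess bits (no alternation or rounding), and it assumes the caller
passes in whatever fill value applies to the variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: zero the lowest `drop_bits` significand bits
 * of each element, leaving any element equal to `fill` untouched. */
static void shave_except_fill(float *data, size_t n, unsigned drop_bits,
                              float fill)
{
    if (drop_bits == 0 || drop_bits > 23)
        return;
    uint32_t keep = ~((UINT32_C(1) << drop_bits) - 1u); /* bits to retain */
    for (size_t i = 0; i < n; i++) {
        if (data[i] == fill)
            continue;                    /* fill values stay unquantized */
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);  /* view the float as raw bits */
        u &= keep;
        memcpy(&data[i], &u, sizeof u);
    }
}
```

If the fill value were quantized like ordinary data, its bit pattern could
change and a reader comparing elements against `_FillValue` would no
longer recognize them as unwritten.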

## Using the Quantize Feature

Turning on the quantize feature must be done on a per-variable basis,
after the variable has been defined, and before nc_enddef() (or its
Fortran equivalent) has been called. (Recall that for netCDF/HDF5
files, nc_enddef() is automatically called when data are written to or
read from a variable.)

In accordance with the usual NetCDF API practice, an inquiry function
is also provided which may be called to check if quantize has been
turned on for a variable. Calling the inquiry function is not required
when reading the data; it is provided for user convenience.

### Using Quantize with the NetCDF C API

Quantize is available in the main branch of the netcdf-c library, and
will be part of the next release (netcdf-c-4.9.0).

To turn on the quantize feature, call the nc_def_var_quantize()
function. To inquire about whether quantize has been turned on for a
variable, use the nc_inq_var_quantize() function.

@code
    /* Create two variables, one float, one double. Quantization
     * may only be applied to floating point data. ERR is the
     * caller's error-handling macro; NDIM1 and NSD_3 are
     * caller-defined constants (assumed here to be 1 and 3). */
    if (nc_def_var(ncid, "var1", NC_FLOAT, NDIM1, &dimid, &varid1)) ERR;
    if (nc_def_var(ncid, "var2", NC_DOUBLE, NDIM1, &dimid, &varid2)) ERR;

    /* Set up quantization. This will not make the data any
     * smaller, unless compression is also turned on. In this
     * case, we will set 3 significant digits. */
    if (nc_def_var_quantize(ncid, varid1, NC_QUANTIZE_BITGROOM, NSD_3)) ERR;
    if (nc_def_var_quantize(ncid, varid2, NC_QUANTIZE_BITGROOM, NSD_3)) ERR;

    /* Set up zlib compression. This will work better because the
     * data are quantized, yielding a smaller output file. The last
     * three arguments are shuffle (0 = off), deflate (1 = on), and
     * the compression level, set to 1, which is usually the best
     * choice. */
    if (nc_def_var_deflate(ncid, varid1, 0, 1, 1)) ERR;
    if (nc_def_var_deflate(ncid, varid2, 0, 1, 1)) ERR;
@endcode

Figure 3: Example of using the quantize feature in C. Note that the
example also demonstrates adding zlib (a.k.a. deflate) compression to
the variables. Without turning on the compression, use of quantize
alone will not result in smaller data output.

### Using Quantize with the NetCDF Fortran 90 API

Quantize is available on a branch of the netcdf-fortran libraries, and
will be merged to main after the next netcdf-c release (4.9.0) and
will be released as part of the netCDF Fortran 90 API in the
subsequent release of netcdf-fortran.

In the Fortran 90 API, quantization is turned on by using two new
optional arguments to nf90_def_var(), the quantize_mode and the nsd
arguments.

@code
    ! Define some variables.
    call check(nf90_def_var(ncid, VAR1_NAME, NF90_FLOAT, dimids, varid1&
         &, deflate_level = DEFLATE_LEVEL, quantize_mode =&
         & nf90_quantize_bitgroom, nsd = 3))
    call check(nf90_def_var(ncid, VAR2_NAME, NF90_DOUBLE, dimids,&
         & varid2, contiguous = .TRUE., quantize_mode =&
         & nf90_quantize_bitgroom, nsd = 3))
@endcode

Figure 4: In the Fortran 90 netCDF API, two additional optional
parameters to nf90_def_var() are available for the quantize feature:
quantize_mode and nsd.

### Using Quantize with the NetCDF Fortran 77 API

Quantize is available on a branch of the netcdf-fortran libraries, and
will be merged to main after the next netcdf-c release (4.9.0) and
will be released as part of the netCDF Fortran 77 API in the
subsequent release of netcdf-fortran.

@code
C     Create some variables.
      do x = 1, NVARS
         retval = nf_def_var(ncid, var_name(x), var_type(x), NDIM1,
     $        dimids, varid(x))
         if (retval .ne. nf_noerr) stop 3

C     Turn on quantize.
         retval = nf_def_var_quantize(ncid, varid(x),
     $        NF_QUANTIZE_BITGROOM, NSD_3)
         if (retval .ne. nf_noerr) stop 3

C     Turn on zlib compression.
         retval = nf_def_var_deflate(ncid, varid(x), 0, 1, 1)
         if (retval .ne. nf_noerr) stop 3
      end do
@endcode

Figure 5: In the Fortran 77 netCDF API, nf_def_var_quantize() and
nf_inq_var_quantize() are provided, which wrap the quantize functions
from the C API.

## Performance

![Quantization Performance](images/quantize_performance.png)

Figure 6: Compression ratio of the E3SM Atmosphere Model (EAM) v2
default monthly dataset (raw size 445 MB), compressed with the default
netCDF lossless compression algorithm (DEFLATE, compression level=1)
alone (leftmost), or after pre-filtering with one of three lossy codecs
(BitGroom, Granular BitRound, or BitRound) with quantization increasing
(and precision decreasing) to the right.

## References

1. HDF5 Dynamically Loaded Filters, The HDF Group, retrieved on
December 2, 2021 from
https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf.

2. Hartnett, E., and C. S. Zender (2020), ADDITIONAL NETCDF COMPRESSION
OPTIONS WITH THE COMMUNITY CODEC REPOSITORY (CCR), American
Meteorological Society (AMS) Annual Meeting, retrieved on July 3, 2021
from
https://www.researchgate.net/publication/347726695_ADDITIONAL_NETCDF_COMPRESSION_OPTIONS_WITH_THE_COMMUNITY_CODEC_REPOSITORY_CCR.

3. Zender, C. S. (2016), Bit Grooming: Statistically accurate
precision-preserving quantization with compression, evaluated in the
netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211,
doi:10.5194/gmd-9-3199-2016, retrieved on Sep 21, 2020 from
https://www.researchgate.net/publication/301575383_Bit_Grooming_Statistically_accurate_precision-preserving_quantization_with_compression_evaluated_in_the_netCDF_Operators_NCO_v448.

4. Delaunay, X., A. Courtois, and F. Gouillon (2019), Evaluation of
lossless and lossy algorithms for the compression of scientific
datasets in netCDF-4 or HDF5 files, Geosci. Model Dev., 12(9),
4099-4113, doi:10.5194/gmd-2018-250, retrieved on Sep 21, 2020 from
https://www.researchgate.net/publication/335987647_Evaluation_of_lossless_and_lossy_algorithms_for_the_compression_of_scientific_datasets_in_netCDF-4_or_HDF5_files.

5. Hartnett, E., et al., “Provide a way to do bit grooming before
compression”, netcdf-c GitHub Issue #1548,
https://github.com/Unidata/netcdf-c/issues/1548.

6. Rew, R., et al., NetCDF Users Guide, Appendix A: Attribute
Conventions, Unidata,
https://docs.unidata.ucar.edu/netcdf-c/current/attribute_conventions.html.