/* Copyright 2018-2022 University Corporation for Atmospheric
   Research/Unidata. */
/**
 * @file This header file contains macros, types, and prototypes for
 * the HDF5 code in libhdf5. This header should not be included in
 * code outside libhdf5.
 *
 * @author Ed Hartnett
 */

#ifndef _HDF5INTERNAL_
#define _HDF5INTERNAL_

#include "config.h"
#include <hdf5.h>
#include <hdf5_hl.h>
#include "nc4internal.h"
#include "ncdimscale.h"
#include "nc4dispatch.h"
#include "hdf5dispatch.h"
#include "netcdf_filter.h"

#define NC_MAX_HDF5_NAME (NC_MAX_NAME + 10)

/* These have to do with creating chunked datasets in HDF5. */
#define NC_HDF5_UNLIMITED_DIMSIZE (0)
#define NC_HDF5_CHUNKSIZE_FACTOR (10)
#define NC_HDF5_MIN_CHUNK_SIZE (2)

#define NC_EMPTY_SCALE "NC_EMPTY_SCALE"

/* This is an attribute I had to add to handle multidimensional
 * coordinate variables. See NC_ATT_COORDINATES in nc4internal.h.
 */
#define COORDINATES NC_ATT_COORDINATES
#define COORDINATES_LEN (NC_MAX_NAME * 5)

/* This is used when the user defines a non-coordinate variable with
 * the same name as a dimension. */
#define NON_COORD_PREPEND "_nc4_non_coord_"
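
/* Illustrative sketch (not part of libhdf5, never compiled here): one way
the "secret" HDF5 name assigned by nc4_give_var_secret_name() could be
built by prepending NON_COORD_PREPEND to the variable name. The helper
make_secret_name(), its return codes, and the use of NC_MAX_NAME as the
length limit are assumptions made for this example. They are not the
library's actual code.

```c
#include <stdio.h>
#include <string.h>

// Fallbacks so the sketch compiles on its own; inside netCDF these
// come from netcdf.h and this header.
#ifndef NC_MAX_NAME
#define NC_MAX_NAME 256
#endif
#ifndef NON_COORD_PREPEND
#define NON_COORD_PREPEND "_nc4_non_coord_"
#endif

// Hypothetical helper: writes NON_COORD_PREPEND followed by name into out.
// Returns 0 on success, or -1 if the result would be longer than
// NC_MAX_NAME characters or would not fit in out (including the NUL).
static int make_secret_name(const char *name, char *out, size_t outlen)
{
    size_t len = strlen(NON_COORD_PREPEND) + strlen(name);
    if (len > NC_MAX_NAME || len + 1 > outlen)
        return -1;
    snprintf(out, outlen, "%s%s", NON_COORD_PREPEND, name);
    return 0;
}
```

For example, a non-coordinate variable named "lat" in a group that also
has a dimension "lat" would get the HDF5 name "_nc4_non_coord_lat". The
intent is that netCDF callers still see the name "lat".
*/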

/* An attribute in the HDF5 root group of this name means that the
 * file must follow strict netCDF classic format rules. */
#define NC3_STRICT_ATT_NAME NC_ATT_NC3_STRICT_NAME

/* If this attribute is present on a dimscale variable, use the value
 * as the netCDF dimid. */
#define NC_DIMID_ATT_NAME NC_ATT_DIMID_NAME /*See nc4internal.h*/

/** This is the name of the HDF5 dimension scale class attribute. */
#define HDF5_DIMSCALE_CLASS_ATT_NAME NC_ATT_CLASS /*See nc4internal.h*/

/** This is the name of the HDF5 dimension scale name attribute. */
#define HDF5_DIMSCALE_NAME_ATT_NAME NC_ATT_NAME

/* forward */
struct NCauth;

/** Struct to hold HDF5-specific info for the file. */
typedef struct NC_HDF5_FILE_INFO {
   hid_t hdfid;
#if defined(ENABLE_BYTERANGE)
   int byterange;
   NCURI* uri; /* Parse of the incoming path, if url */
#if defined(ENABLE_HDF5_ROS3) || defined(ENABLE_S3_SDK)
   struct NCauth* auth;
#endif
#endif
} NC_HDF5_FILE_INFO_T;

/* This is a struct to handle the dim metadata. */
typedef struct NC_HDF5_DIM_INFO
{
   hid_t hdf_dimscaleid; /* Non-zero if a DIM_WITHOUT_VARIABLE dataset is in use (no coord var). */
   HDF5_OBJID_T hdf5_objid;
} NC_HDF5_DIM_INFO_T;

/** Struct to hold HDF5-specific info for attributes. */
typedef struct NC_HDF5_ATT_INFO
{
   hid_t native_hdf_typeid; /* Native HDF5 datatype for attribute's data */
} NC_HDF5_ATT_INFO_T;

/* Struct to hold HDF5-specific info for a group. */
typedef struct NC_HDF5_GRP_INFO
{
   hid_t hdf_grpid;
} NC_HDF5_GRP_INFO_T;

/* Struct to hold HDF5-specific info for a variable. */
typedef struct NC_HDF5_VAR_INFO
{
   hid_t hdf_datasetid;
   HDF5_OBJID_T *dimscale_hdf5_objids;
   nc_bool_t dimscale; /**< True if var is a dimscale. */
   nc_bool_t *dimscale_attached; /**< Array of flags that are true if dimscale is attached for that dim index. */
   int flags;
# define NC_HDF5_VAR_FILTER_MISSING 1 /* if any filter is missing */
} NC_HDF5_VAR_INFO_T;

/* Struct to hold HDF5-specific info for a field. */
typedef struct NC_HDF5_FIELD_INFO
{
   hid_t hdf_typeid;
   hid_t native_hdf_typeid;
} NC_HDF5_FIELD_INFO_T;

/* Struct to hold HDF5-specific info for a type. */
typedef struct NC_HDF5_TYPE_INFO
{
   hid_t hdf_typeid;
   hid_t native_hdf_typeid;
} NC_HDF5_TYPE_INFO_T;

/* Logging and debugging. */
void reportopenobjects(int log, hid_t);
int hdf5_set_log_level(void);
void nc_log_hdf5(void);

/* These functions deal with HDF5 dimension scales. */
int rec_detach_scales(NC_GRP_INFO_T *grp, int dimid, hid_t dimscaleid);
int rec_reattach_scales(NC_GRP_INFO_T *grp, int dimid, hid_t dimscaleid);
int delete_dimscale_dataset(NC_GRP_INFO_T *grp, int dimid, NC_DIM_INFO_T *dim);

/* Write metadata. */
int nc4_rec_write_metadata(NC_GRP_INFO_T *grp, nc_bool_t bad_coord_order);
int nc4_rec_write_groups_types(NC_GRP_INFO_T *grp);

/* Adjust the cache. */
int nc4_adjust_var_cache(NC_GRP_INFO_T *grp, NC_VAR_INFO_T * var);

/* Open an HDF5 dataset. */
int nc4_open_var_grp2(NC_GRP_INFO_T *grp, int varid, hid_t *dataset);

/* Find types. */
NC_TYPE_INFO_T *nc4_rec_find_hdf_type(NC_FILE_INFO_T* h5,
                                      hid_t target_hdf_typeid);
int nc4_get_hdf_typeid(NC_FILE_INFO_T *h5, nc_type xtype,
                       hid_t *hdf_typeid, int endianness);

/* Enddef and closing files. */
int nc4_close_hdf5_file(NC_FILE_INFO_T *h5, int abort, NC_memio *memio);
int nc4_rec_grp_HDF5_del(NC_GRP_INFO_T *grp);
int nc4_enddef_netcdf4_file(NC_FILE_INFO_T *h5);
int nc4_HDF5_close_type(NC_TYPE_INFO_T* type);

/* Break & reform coordinate variables */
int nc4_break_coord_var(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *coord_var, NC_DIM_INFO_T *dim);
int nc4_reform_coord_var(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *coord_var, NC_DIM_INFO_T *dim);

/* In-memory functions */
extern hid_t NC4_image_init(NC_FILE_INFO_T* h5);
extern void NC4_image_finalize(void*);

/* Create HDF5 dataset for dim without a coord var. */
extern int nc4_create_dim_wo_var(NC_DIM_INFO_T *dim);

/* Give a var a secret HDF5 name, for use when there is a dim of this
 * name, but the var is not a coord var of that dim. */
extern int nc4_give_var_secret_name(NC_VAR_INFO_T *var);

/* Find file, group, var, and att info, doing lazy reads if needed. */
int nc4_hdf5_find_grp_var_att(int ncid, int varid, const char *name, int attnum,
                              int use_name, char *norm_name, NC_FILE_INFO_T **h5,
                              NC_GRP_INFO_T **grp, NC_VAR_INFO_T **var,
                              NC_ATT_INFO_T **att);

/* Find var, doing lazy var metadata read if needed. */
int nc4_hdf5_find_grp_h5_var(int ncid, int varid, NC_FILE_INFO_T **h5,
                             NC_GRP_INFO_T **grp, NC_VAR_INFO_T **var);

int nc4_HDF5_close_att(NC_ATT_INFO_T *att);

/* Perform lazy read of the rest of the metadata for a var. */
int nc4_get_var_meta(NC_VAR_INFO_T *var);
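
/* Illustrative sketch (not part of libhdf5, never compiled here): the
general lazy-read pattern described for nc4_get_var_meta(). The expensive
part of a variable's metadata is loaded on first use and then cached. The
struct SketchVar, its meta_read flag, and sketch_load_meta() are
hypothetical stand-ins. They are not the real NC_VAR_INFO_T fields or the
real HDF5 calls.

```c
#include <stdbool.h>

// Hypothetical stand-in for NC_VAR_INFO_T with only what the sketch needs.
struct SketchVar {
    bool meta_read; // true once the full metadata has been loaded
    int loads;      // how many times the expensive load ran
    int ndims;      // example of a lazily loaded property
};

// Hypothetical expensive load; libhdf5 would query the HDF5 dataset here.
static int sketch_load_meta(struct SketchVar *var)
{
    var->loads++;
    var->ndims = 2; // placeholder value for the example
    return 0;
}

// Load the metadata on the first call only; later calls return at once.
static int sketch_get_var_meta(struct SketchVar *var)
{
    if (var->meta_read)
        return 0;
    int ret = sketch_load_meta(var);
    if (ret == 0)
        var->meta_read = true;
    return ret;
}
```

The usual motivation for this pattern is to keep opening a file with many
variables cheap. The cost is a flag check on each access.
*/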

/* Get the file chunk cache settings from HDF5. */
int nc4_hdf5_get_chunk_cache(int ncid, size_t *sizep, size_t *nelemsp,
                             float *preemptionp);
|
2020-09-28 02:43:46 +08:00
|
|
|
/* Filter Dispatch Entries */
|
|
|
|
int NC4_hdf5_def_var_filter(int ncid, int varid, unsigned int filterid, size_t nparams, const unsigned int *params);
|
|
|
|
int NC4_hdf5_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int *filterids);
|
|
|
|
int NC4_hdf5_inq_var_filter_info(int ncid, int varid, unsigned int filterid, size_t* nparamsp, unsigned int *params);
|
Enhance/Fix filter support
re: Discussion https://github.com/Unidata/netcdf-c/discussions/2214
The primary change is to support so-called "standard filters".
A standard filter is one that is defined by the following
netcdf-c API:
````
int nc_def_var_XXX(int ncid, int varid, size_t nparams, unsigned* params);
int nc_inq_var_XXXX(int ncid, int varid, int* usefilterp, unsigned* params);
````
So for example, zstandard would be a standard filter by defining
the functions *nc_def_var_zstandard* and *nc_inq_var_zstandard*.
In order to define these functions, we need a new dispatch function:
````
int nc_inq_filter_avail(int ncid, unsigned filterid);
````
This function, combined with the existing filter API can be used
to implement arbitrary standard filters using a simple code pattern.
Note that I would have preferred that this function return a list
of all available filters, but HDF5 does not support that functionality.
So this PR implements the dispatch function and implements
the following standard functions:
+ bzip2
+ zstandard
+ blosc
Specific test cases are also provided for HDF5 and NCZarr.
Over time, other specific standard filters will be defined.
## Primary Changes
* Add nc_inq_filter_avail() to netcdf-c API.
* Add standard filter implementations to test use of *nc_inq_filter_avail*.
* Bump the dispatch table version number and add to all the relevant
dispatch tables (libsrc, libsrcp, etc).
* Create a program to invoke nc_inq_filter_avail so that it is accessible
to shell scripts.
* Cleanup szip support to properly support szip
when HDF5 is disabled. This involves detecting
libsz separately from testing if HDF5 supports szip.
* Integrate shuffle and fletcher32 into the existing
filter API. This means that, for example, nc_def_var_fletcher32
is now a wrapper around nc_def_var_filter.
* Extend the Codec defaulting to allow multiple default shared libraries.
## Misc. Changes
* Modify configure.ac/CMakeLists.txt to look for the relevant
libraries implementing standard filters.
* Modify libnetcdf.settings to list available standard filters
(including deflate and szip).
* Add CMake test modules to locate libbz2 and libzstd.
* Clean up the use of the HDF5 memory manager functions in the plugins.
* Remove the unused file include/ncfilter.h.
* Remove tests for the HDF5 memory operations, e.g. H5allocate_memory.
* Add a flag to ncdump to force the use of _Filter instead of _Deflate,
_Shuffle, or _Fletcher32. Used for testing.
2022-03-15 02:39:37 +08:00
int NC4_hdf5_inq_filter_avail(int ncid, unsigned id);
/* Filterlist management */
/* The NC_VAR_INFO_T->filters field is an NClist of this struct */
struct NC_HDF5_Filter {
    int flags; /**< Flags describing state of this filter. */
# define NC_HDF5_FILTER_MISSING 1 /* Filter implementation is not accessible */
    unsigned int filterid; /**< ID for arbitrary filter. */
    size_t nparams; /**< nparams for arbitrary filter. */
    unsigned int* params; /**< Params for arbitrary filter. */
};
int NC4_hdf5_filter_initialize(void);
int NC4_hdf5_filter_finalize(void);
int NC4_hdf5_filter_remove(NC_VAR_INFO_T* var, unsigned int id);
int NC4_hdf5_filter_lookup(NC_VAR_INFO_T* var, unsigned int id, struct NC_HDF5_Filter** fi);
int NC4_hdf5_addfilter(NC_VAR_INFO_T* var, unsigned int id, size_t nparams, const unsigned int* params, int flags);
int NC4_hdf5_filter_freelist(NC_VAR_INFO_T* var);
int NC4_hdf5_find_missing_filter(NC_VAR_INFO_T* var, unsigned int* idp);
/* Add an attribute to the attribute list. */
int nc4_put_att(NC_GRP_INFO_T* grp, int varid, const char *name, nc_type file_type,
                size_t len, const void *data, nc_type mem_type, int force);
/* Support functions for provenance info (defined in nc4hdf.c) */
extern int NC4_hdf5get_libversion(unsigned*,unsigned*,unsigned*);/*libsrc4/nc4hdf.c*/
extern int NC4_hdf5get_superblock(struct NC_FILE_INFO*, int*);/*libsrc4/nc4hdf.c*/
extern int NC4_isnetcdf4(struct NC_FILE_INFO*); /*libsrc4/nc4hdf.c*/
Codify cross-platform file paths
The netcdf-c code has to deal with a variety of platforms:
Windows, OSX, Linux, Cygwin, MSYS, etc. These platforms differ
significantly in the kind of file paths that they accept. So in
order to handle this, I have created a set of replacements for
the most common file system operations such as _open_ or _fopen_
or _access_ to manage the file path differences correctly.
A more limited version of this idea was already implemented via
the ncwinpath.h and dwinpath.c code, so this can be viewed as a
replacement for that code. In fact, in many cases the only
change that was required was to replace '#include <ncwinpath.h>'
with '#include <ncpathmgmt.h>' and then replace file operation
calls with the NCxxx equivalents from ncpathmgmt.h. Note that
ncwinpath.h was recently renamed ncpathmgmt.h, so this pull
request should not require dealing with winpath.
The heart of the change is include/ncpathmgmt.h, which provides
alternate operations such as NCfopen or NCaccess and which properly
parse and rebuild path arguments to work for the platform on which
the code is executing. This mostly matters for Windows because of the
way that it uses backslashes and drive letters, as compared to *nix.
One important feature is that the user can do string manipulations
on a file path without having to worry too much about the platform
because the path management code will properly handle most mixed cases.
So one can for example concatenate a path suffix that uses forward
slashes to a Windows path and have it work correctly.
The conversion code is in libdispatch/dpathmgr.c, and the
important function there is NCpathcvt which does the proper
conversions to the local path format.
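The sketch below makes the mixed-separator case concrete with a much-simplified, hypothetical converter in the spirit of NCpathcvt. It is not the dpathmgr.c code. It only rewrites the Cygwin `/cygdrive/<letter>/` prefix as a drive letter and turns every `/` into `\`. The real conversion must also cope with the other platform conventions listed above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified converter in the spirit of NCpathcvt:
 * turns a path that may mix '/' and '\\' (and may use the Cygwin
 * "/cygdrive/<letter>/" prefix) into a Windows-style path.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int sketch_to_windows_path(const char *in, char *out, size_t outlen)
{
    static const char cyg[] = "/cygdrive/";
    const size_t cyglen = sizeof cyg - 1;
    size_t o = 0;

    if (outlen == 0)
        return -1;
    /* "/cygdrive/c/..." becomes "c:" followed by the rest of the path. */
    if (strncmp(in, cyg, cyglen) == 0 && isalpha((unsigned char)in[cyglen])
        && (in[cyglen + 1] == '/' || in[cyglen + 1] == '\0')) {
        if (outlen < 3)
            return -1;
        out[o++] = (char)tolower((unsigned char)in[cyglen]);
        out[o++] = ':';
        in += cyglen + 1;                 /* continue at the '/' after the letter */
    }
    /* Copy the remainder, normalizing every separator to a backslash. */
    for (; *in != '\0'; in++) {
        if (o + 1 >= outlen)
            return -1;                    /* keep room for the terminator */
        out[o++] = (*in == '/') ? '\\' : *in;
    }
    out[o] = '\0';
    return 0;
}
```

For example, the string `C:\Users\me` with the suffix `/sub/file.nc` appended becomes `C:\Users\me\sub\file.nc`, and `/cygdrive/c/data/x.nc` becomes `c:\data\x.nc`.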
As a rule, most code should just replace their file operations with
the corresponding NCxxx ones defined in include/ncpathmgmt.h. These
NCxxx functions all call NCpathcvt on their path arguments before
executing the actual file operation.
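The NCxxx wrapper shape itself is small. The sketch below shows a hypothetical `sketch_fopen`, not the actual `NCfopen`, which converts its path argument and then calls the real `fopen`. Its stand-in converter is also hypothetical: it only swaps separators when compiled for Windows (`_WIN32`) and copies the path unchanged elsewhere.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter: on Windows, swap '/' for '\\'; elsewhere the
 * path is assumed to be native already and is copied unchanged.
 * Returns a malloc'd string that the caller must free, or NULL. */
static char *sketch_pathcvt(const char *path)
{
    char *copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, path);
#ifdef _WIN32
    for (char *p = copy; *p != '\0'; p++)
        if (*p == '/')
            *p = '\\';
#endif
    return copy;
}

/* The NCxxx pattern: convert the path argument, then perform the real
 * file operation on the converted path. */
static FILE *sketch_fopen(const char *path, const char *mode)
{
    FILE *f;
    char *local = sketch_pathcvt(path);
    if (local == NULL)
        return NULL;
    f = fopen(local, mode);
    free(local);   /* fopen does not keep a reference to the name */
    return f;
}
```

The converted copy is freed as soon as `fopen` returns. Callers keep passing ordinary strings and never handle the converted form themselves.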
In some rare cases, the client may need to directly use NCpathcvt,
but this should be avoided as much as possible. If there is a need
for supporting a new file operation not already in ncpathmgmt.h, then
use the code in dpathmgr.c as a template. Also, please notify Unidata
so we can include it as a formal part of our supported operations.
Also, if you see an operation in the library that is not using the
NCxxx form, then please submit an issue so we can fix it.
Misc. Changes:
* Clean up the utf8 testing code; it is impossible to get some
tests to work under Windows using shell scripts because the args are
not passed as utf8 but in some other encoding.
* Added an extra utf8 test case: test_unicode_path.sh
* Add a true test for HDF5 1.10.6 or later because as noted in
PR https://github.com/Unidata/netcdf-c/pull/1794,
HDF5 changed its Windows file path handling.
2021-03-05 04:41:31 +08:00
extern int nc4_find_default_chunksizes2(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var);
EXTERNL hid_t nc4_H5Fopen(const char *filename, unsigned flags, hid_t fapl_id);
EXTERNL hid_t nc4_H5Fcreate(const char *filename, unsigned flags, hid_t fcpl_id, hid_t fapl_id);
int hdf5set_format_compatibility(hid_t fapl_id);
#endif /* _HDF5INTERNAL_ */