This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".
The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.
More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).
WARNING: this code has had limited testing, so do not use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:
Platform | Build System | S3 support
--- | --- | ---
Linux+gcc | Automake | yes
Linux+gcc | CMake | yes
Visual Studio | CMake | no
Additionally, and as a consequence of the addition of NCZarr,
major changes have been made to the Filter API. NOTE: NCZarr
does not yet support filters, but these changes are enablers for
that support in the future. Note that it is possible
(probable?) that there will be some accidental regressions if the
changes here did not correctly mimic the existing filter testing.
In any case, previously filter ids and parameters were of type
unsigned int. In order to support the more general zarr filter
model, this was all converted to char*. The old HDF5-specific,
unsigned int operations are still supported but they are
wrappers around the new, char* based nc_filterx_XXX functions.
This entailed at least the following changes:
1. Added the files libdispatch/dfilterx.c and include/ncfilter.h
2. Some filterx utilities have been moved to libdispatch/daux.c
3. A new entry, "filter_actions" was added to the NCDispatch table
and the version bumped.
4. An overly complex set of structs was created to support funnelling
all of the filterx operations thru a single dispatch
"filter_actions" entry.
5. Moved common code from libhdf5 to libsrc4 so that it is accessible
to nczarr.
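The wrapper arrangement described above (unsigned int operations layered over string-based ones) can be sketched roughly as follows. This is only an illustration: `FilterSpecX`, `filterx_define`, and `filter_define` are hypothetical names invented for this sketch and are not the actual nc_filterx_XXX signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 8
#define MAX_TEXT 32

/* Hypothetical record of one filter definition, held entirely as strings. */
typedef struct {
    char id[MAX_TEXT];
    size_t nparams;
    char params[MAX_PARAMS][MAX_TEXT];
} FilterSpecX;

/* Hypothetical string-based entry point: id and parameters are text. */
static int filterx_define(FilterSpecX *spec, const char *id,
                          size_t nparams, const char *const *params)
{
    size_t i;
    assert(spec != NULL && id != NULL);
    if (nparams > MAX_PARAMS || strlen(id) >= MAX_TEXT) return -1;
    strcpy(spec->id, id);
    spec->nparams = nparams;
    for (i = 0; i < nparams; i++) {
        if (strlen(params[i]) >= MAX_TEXT) return -1;
        strcpy(spec->params[i], params[i]);
    }
    return 0;
}

/* Legacy-style wrapper: format the unsigned ints as text, then delegate. */
static int filter_define(FilterSpecX *spec, unsigned int id,
                         size_t nparams, const unsigned int *params)
{
    char idtext[MAX_TEXT];
    char ptext[MAX_PARAMS][MAX_TEXT];
    const char *pptr[MAX_PARAMS];
    size_t i;
    if (nparams > MAX_PARAMS) return -1;
    snprintf(idtext, sizeof(idtext), "%u", id);
    for (i = 0; i < nparams; i++) {
        snprintf(ptext[i], sizeof(ptext[i]), "%u", params[i]);
        pptr[i] = ptext[i];
    }
    return filterx_define(spec, idtext, nparams, pptr);
}
```

The point of the layering is that only the string-based function needs to understand storage, so a non-HDF5 backend can accept filter ids that are not integers.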
Changes directly related to Zarr:
1. Modified CMakeLists.txt and configure.ac to support both C and C++
-- this is in support of S3 support via the aws-sdk libraries.
2. Define a size64_t type to support nczarr.
3. More reworking of libdispatch/dinfermodel.c to
support zarr and to regularize the structure of the fragments
section of a URL.
Changes not directly related to Zarr:
1. Make client-side filter registration be conditional, with default off.
2. Hack include/nc4internal.h to make some flags added by Ed be unique:
e.g. NC_CREAT, NC_INDEF, etc.
3. Cleaned up include/nchttp.h and libdispatch/dhttp.c.
4. Misc. changes to support compiling under Visual Studio including:
* Better testing under windows for dirent.h and opendir and closedir.
5. Misc. changes to the oc2 code to support various libcurl CURLOPT flags
and to centralize error reporting.
6. By default, suppress the vlen tests that have unfixed memory leaks; add option to enable them.
7. Make part of the nc_test/test_byterange.sh test be contingent on remotetest.unidata.ucar.edu being accessible.
Changes Left TO-DO:
1. fix provenance code, it is too HDF5 specific.
2020-06-29 08:02:47 +08:00

/* Copyright 2018, University Corporation for Atmospheric
 * Research. See COPYRIGHT file for copying and redistribution
 * conditions. */

/**
 * @file @internal The functions which control NCZ
 * caching. These caching controls allow the user to change the cache
 * sizes of ZARR before opening files.
 *
 * @author Dennis Heimbigner, Ed Hartnett
 */

#include "zincludes.h"
#include "zcache.h"

#undef DEBUG

#undef FILLONREAD

#undef FLUSH

/* Forward */
static int get_chunk(NCZChunkCache* cache, const char* key, NCZCacheEntry* entry);
static int put_chunk(NCZChunkCache* cache, const char* key, const NCZCacheEntry*);
static int create_chunk(NCZChunkCache* cache, const char* key, NCZCacheEntry* entry);
static int buildchunkkey(size_t R, const size64_t* chunkindices, char** keyp);
static int makeroom(NCZChunkCache* cache);

/**************************************************/
/* Dispatch table per-var cache functions */

/**
 * @internal Set chunk cache size for a variable. This is the internal
 * function called by nc_set_var_chunk_cache().
 *
 * @param ncid File ID.
 * @param varid Variable ID.
 * @param cachesize Size in bytes to set cache.
 * @param nelems # of entries in cache
 * @param preemption Controls cache swapping; must be in the range [0,1].
 *
 * @returns ::NC_NOERR No error.
 * @returns ::NC_EBADID Bad ncid.
 * @returns ::NC_ENOTVAR Invalid variable ID.
 * @returns ::NC_ESTRICTNC3 Attempting netcdf-4 operation on strict
 * nc3 netcdf-4 file.
 * @returns ::NC_EINVAL Invalid input.
 * @returns ::NC_EHDFERR HDF5 error.
 * @author Ed Hartnett
 */
int
NCZ_set_var_chunk_cache(int ncid, int varid, size_t cachesize, size_t nelems, float preemption)
{
    NC_GRP_INFO_T *grp;
    NC_FILE_INFO_T *h5;
    NC_VAR_INFO_T *var;
    NCZ_VAR_INFO_T *zvar;
    int retval;

    /* Check input for validity. */
    if (preemption < 0 || preemption > 1)
        return NC_EINVAL;

    /* Find info for this file and group, and set pointer to each. */
    if ((retval = nc4_find_nc_grp_h5(ncid, NULL, &grp, &h5)))
        return retval;
    assert(grp && h5);

    /* Find the var. */
    if (!(var = (NC_VAR_INFO_T *)ncindexith(grp->vars, varid)))
        return NC_ENOTVAR;
    assert(var && var->hdr.id == varid);

    zvar = (NCZ_VAR_INFO_T*)var->format_var_info;
    assert(zvar != NULL && zvar->cache != NULL);

    /* Set the values. */
    var->chunk_cache_size = cachesize;
    var->chunk_cache_nelems = nelems;
    var->chunk_cache_preemption = preemption;

#ifdef LOOK
    /* Reopen the dataset to bring new settings into effect. */
    if ((retval = nc4_reopen_dataset(grp, var)))
        return retval;
#endif
    return NC_NOERR;
}

/**
 * @internal Adjust the chunk cache of a var for better
 * performance.
 *
 * @note For contiguous and compact storage vars, or when parallel I/O
 * is in use, this function will do nothing and return ::NC_NOERR;
 *
 * @param grp Pointer to group info struct.
 * @param var Pointer to var info struct.
 *
 * @return ::NC_NOERR No error.
 * @author Ed Hartnett
 */
int
NCZ_adjust_var_cache(NC_GRP_INFO_T *grp, NC_VAR_INFO_T *var)
{
    /* Reset the cache parameters since var chunking may have changed */

    return NC_NOERR;
}

/**************************************************/

/**
 * Create a chunk cache object
 *
 * @param var containing var
 * @param chunksize Size in bytes of a chunk (one cache entry)
 * @param cachep return cache pointer
 *
 * @return ::NC_NOERR No error.
 * @return ::NC_EINVAL Chunk size is zero.
 * @return ::NC_ENOMEM Out of memory.
 * @author Dennis Heimbigner, Ed Hartnett
 */
int
NCZ_create_chunk_cache(NC_VAR_INFO_T* var, size64_t chunksize, NCZChunkCache** cachep)
{
    int stat = NC_NOERR;
    NCZChunkCache* cache = NULL;
    void* fill = NULL;
    size_t nelems, cachesize;
    NCZ_VAR_INFO_T* zvar = NULL;

    if(chunksize == 0) return NC_EINVAL;

    zvar = (NCZ_VAR_INFO_T*)var->format_var_info;

    if((cache = calloc(1,sizeof(NCZChunkCache))) == NULL)
        {stat = NC_ENOMEM; goto done;}
    cache->var = var;
    cache->ndims = var->ndims + zvar->scalar;
    cache->chunksize = chunksize;
    assert(cache->fillchunk == NULL);
    cache->fillchunk = NULL;

    /* Figure out the actual cache size */
    cachesize = var->chunk_cache_size;
    nelems = (cachesize / chunksize);
    if(nelems == 0) nelems = 1;
    /* Make consistent */
    cachesize = nelems * chunksize;
    cache->maxentries = nelems;
#ifdef FLUSH
    cache->maxentries = 1;
#endif

#ifdef DEBUG
    fprintf(stderr,"%s.cache: nelems=%lu size=%lu\n",
            var->hdr.name,(unsigned long)cache->maxentries,(unsigned long)(cache->maxentries*cache->chunksize));
#endif
    if((cache->entries = nclistnew()) == NULL)
        {stat = NC_ENOMEM; goto done;}
    nclistsetalloc(cache->entries,cache->maxentries);
    if(cachep) {*cachep = cache; cache = NULL;}
done:
    nullfree(fill);
    nullfree(cache);
    return THROW(stat);
}

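The sizing arithmetic in NCZ_create_chunk_cache can be read in isolation as the sketch below. It assumes nothing beyond what the function above does: the entry count is the requested byte size divided by the chunk size, floored at one, and the effective size is then rounded to whole chunks. The helper names `cache_max_entries` and `cache_effective_size` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole chunks that fit in `cachesize` bytes, never fewer than one. */
static size_t cache_max_entries(size_t cachesize, size_t chunksize)
{
    size_t nelems;
    assert(chunksize > 0); /* the real function rejects a zero chunk size with NC_EINVAL */
    nelems = cachesize / chunksize;
    if (nelems == 0) nelems = 1;
    return nelems;
}

/* Effective cache size in bytes after rounding to whole chunks. */
static size_t cache_effective_size(size_t cachesize, size_t chunksize)
{
    return cache_max_entries(cachesize, chunksize) * chunksize;
}
```

One consequence worth noting: a requested cache smaller than a single chunk still holds one chunk, so the effective size can exceed the request.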
void
NCZ_free_chunk_cache(NCZChunkCache* cache)
{
    if(cache == NULL) return;
    /* Iterate over the entries */
    while(nclistlength(cache->entries) > 0) {
        NCZCacheEntry* entry = nclistremove(cache->entries,0);
        nullfree(entry->data); nullfree(entry->key); nullfree(entry);
    }
#ifdef DEBUG
    fprintf(stderr,"|cache.free|=%ld\n",nclistlength(cache->entries));
#endif
    nclistfree(cache->entries);
    cache->entries = NULL;
    nullfree(cache->fillchunk);
    nullfree(cache);
}

size64_t
NCZ_cache_entrysize(NCZChunkCache* cache)
{
    assert(cache);
    return cache->chunksize;
}

/* Return number of active entries in cache */
size64_t
NCZ_cache_size(NCZChunkCache* cache)
{
    assert(cache);
    return nclistlength(cache->entries);
}

int
NCZ_read_cache_chunk(NCZChunkCache* cache, const size64_t* indices, void** datap)
{
    int stat = NC_NOERR;
    char* key = NULL;
    int rank = cache->ndims;
    NC_FILE_INFO_T* file = cache->var->container->nc4_info;
    NCZCacheEntry* entry = NULL;
    int i;

    /* Create the key for this cache */
    if((stat = NCZ_buildchunkpath(cache,indices,&key))) goto done;

    /* See if already in cache try MRU */
    for(i=nclistlength(cache->entries)-1;i>=0;i--) {
        entry = (NCZCacheEntry*)nclistget(cache->entries,i);
        if(strcmp(key,entry->key)==0) {
            if(datap) *datap = entry->data;
            /* Move to keep MRU at end */
            nclistremove(cache->entries,i);
            break;
        } else entry = NULL;
    }
    if(entry == NULL) { /*!found*/
        /* Make room in the cache */
        if((stat=makeroom(cache))) goto done;
        /* Create a new entry */
        if((entry = calloc(1,sizeof(NCZCacheEntry)))==NULL)
            {stat = NC_ENOMEM; goto done;}
        memcpy(entry->indices,indices,rank*sizeof(size64_t));
        /* Create the local copy space */
        if((entry->data = calloc(1,cache->chunksize)) == NULL)
            {stat = NC_ENOMEM; goto done;}
        entry->key= key; key = NULL;
        /* Try to read the object in toto */
        stat=get_chunk(cache,entry->key,entry);
        switch (stat) {
        case NC_NOERR: break;
        case NC_EEMPTY:
        case NC_ENOTFOUND: /*signals the chunk needs to be created */
            /* If the file is read-only, then fake the chunk */
            entry->modified = (!file->no_write);
            if(!file->no_write) {
                if((stat = create_chunk(cache,entry->key,entry))) goto done;
            }
#ifdef FILLONREAD
            /* apply fill value */
            memcpy(entry->data,cache->fillchunk,cache->chunksize);
#else
            memset(entry->data,0,cache->chunksize);
#endif
            break;
        default: goto done;
        }
    }
    nclistpush(cache->entries,entry);
#ifdef DEBUG
    fprintf(stderr,"|cache.read.lru|=%ld\n",nclistlength(cache->entries));
#endif
    if(datap) *datap = entry->data;
    entry = NULL;

done:
    if(entry) {nullfree(entry->data); nullfree(entry->key);}
    nullfree(entry);
    nullfree(key);
    return THROW(stat);
}

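The lookup in NCZ_read_cache_chunk keeps the entry list ordered from least to most recently used: a hit is removed and pushed back on the end, and a miss evicts from the front when the list is full. A minimal standalone sketch of that ordering, using a fixed-size array of keys instead of NClist, might look like this. `MruList`, `mru_remove`, and `mru_touch` are hypothetical names for this illustration, and the capacity of three is arbitrary.

```c
#include <assert.h>
#include <string.h>

#define MAXENTRIES 3
#define KEYLEN 32

/* Keys are ordered least recently used (index 0) to most recently used. */
typedef struct {
    char keys[MAXENTRIES][KEYLEN];
    size_t length;
} MruList;

/* Remove the key at index i, shifting later keys down by one slot. */
static void mru_remove(MruList *l, size_t i)
{
    assert(i < l->length);
    if (i + 1 < l->length)
        memmove(l->keys[i], l->keys[i + 1], (l->length - i - 1) * KEYLEN);
    l->length--;
}

/* Touch a key: returns 1 if it was already cached, 0 if it had to be added. */
static int mru_touch(MruList *l, const char *key)
{
    size_t i;
    int hit = 0;
    assert(strlen(key) < KEYLEN);
    /* Search from the MRU end, since recent keys are the likeliest hits. */
    for (i = l->length; i-- > 0;) {
        if (strcmp(l->keys[i], key) == 0) {
            mru_remove(l, i);
            hit = 1;
            break;
        }
    }
    /* On a miss with a full list, evict the LRU key at the front. */
    if (!hit && l->length >= MAXENTRIES)
        mru_remove(l, 0);
    /* Either way the key ends up in the MRU position at the end. */
    strcpy(l->keys[l->length++], key);
    return hit;
}
```

A linear scan is adequate here because the entry count is bounded by the cache capacity; a hash index would only pay off for much larger caches.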
int
|
|
|
|
NCZ_write_cache_chunk(NCZChunkCache* cache, const size64_t* indices, void** datap)
|
|
|
|
{
|
|
|
|
int stat = NC_NOERR;
|
|
|
|
char* key = NULL;
|
2020-11-20 08:01:04 +08:00
|
|
|
int i,rank = cache->ndims;
|
This PR adds EXPERIMENTAL support for accessing data in the
cloud using a variant of the Zarr protocol and storage
format. This enhancement is generically referred to as "NCZarr".
The data model supported by NCZarr is netcdf-4 minus the user-defined
types and the String type. In this sense it is similar to the CDF-5
data model.
More detailed information about enabling and using NCZarr is
described in the document NUG/nczarr.md and in a
[Unidata Developer's blog entry](https://www.unidata.ucar.edu/blogs/developer/en/entry/overview-of-zarr-support-in).
WARNING: this code has had limited testing, so do use this version
for production work. Also, performance improvements are ongoing.
Note especially the following platform matrix of successful tests:
Platform | Build System | S3 support
------------------------------------
Linux+gcc | Automake | yes
Linux+gcc | CMake | yes
    NCZCacheEntry* entry = NULL;

    /* Create the key for this cache */
    if((stat = NCZ_buildchunkpath(cache,indices,&key))) goto done;

    /* See if already in cache; search from the MRU end */
    for(i=nclistlength(cache->entries)-1;i>=0;i--) {
        entry = (NCZCacheEntry*)nclistget(cache->entries,i);
        if(strcmp(key,entry->key)==0) {
            if(datap) *datap = entry->data;
            /* Remove here; it is re-pushed below to keep MRU at end */
            nclistremove(cache->entries,i);
            break;
        } else entry = NULL;
    }
    if(entry == NULL) { /* not found */
        if((stat=makeroom(cache))) goto done;
        /* Create a new entry */
        if((entry = calloc(1,sizeof(NCZCacheEntry)))==NULL)
            {stat = NC_ENOMEM; goto done;}
        memcpy(entry->indices,indices,rank*sizeof(size64_t));
        /* Create the local copy space */
        if((entry->data = calloc(1,cache->chunksize)) == NULL)
            {stat = NC_ENOMEM; goto done;}
        entry->key= key; key = NULL;
    }
    entry->modified = 1;
    nclistpush(cache->entries,entry); /* MRU order */
#ifdef DEBUG
    fprintf(stderr,"|cache.write|=%lu\n",(unsigned long)nclistlength(cache->entries));
#endif
    entry = NULL; /* now owned by the cache; do not free below */

done:
    if(entry) {nullfree(entry->data); nullfree(entry->key);}
    nullfree(entry);
    nullfree(key);
    return THROW(stat);
}

static int
makeroom(NCZChunkCache* cache)
{
    int stat = NC_NOERR;
    /* Flush from LRU end if we are at capacity */
    while(nclistlength(cache->entries) >= cache->maxentries) {
        NCZCacheEntry* e = nclistremove(cache->entries,0);
        assert(e != NULL);
        if(e->modified) { /* flush to file */
            int ret = put_chunk(cache,e->key,e);
            if(stat == NC_NOERR) stat = ret; /* keep the first error */
        }
        /* reclaim */
        nullfree(e->data); nullfree(e->key); nullfree(e);
    }
#ifdef DEBUG
    fprintf(stderr,"|cache.makeroom|=%lu\n",(unsigned long)nclistlength(cache->entries));
#endif
    return stat;
}

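The lookup and eviction logic above keeps the most recently used entry at the end of a list and evicts from the front. The following is a minimal, standalone sketch of that idea, not the NCZarr implementation: the `Cache` and `Entry` types, the fixed `MAXENTRIES` capacity, and the `flushed` counter (a stand-in for writing a modified chunk out) are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXENTRIES 2 /* hypothetical capacity, analogous to cache->maxentries */

typedef struct Entry { char* key; int modified; } Entry;
typedef struct Cache { Entry* entries[MAXENTRIES]; size_t length; int flushed; } Cache;

/* Remove and return the entry at position i, shifting later entries down. */
static Entry* cache_remove(Cache* c, size_t i)
{
    Entry* e;
    assert(i < c->length);
    e = c->entries[i];
    memmove(&c->entries[i], &c->entries[i+1], (c->length - i - 1) * sizeof(Entry*));
    c->length--;
    return e;
}

/* Evict from the LRU end (index 0) until there is room for one more entry. */
static void cache_makeroom(Cache* c)
{
    while(c->length >= MAXENTRIES) {
        Entry* e = cache_remove(c, 0);
        if(e->modified) c->flushed++; /* stand-in for writing the chunk out */
        free(e->key); free(e);
    }
}

/* Look up key; on a miss create it. Either way it ends up at the MRU end. */
static Entry* cache_touch(Cache* c, const char* key)
{
    Entry* e = NULL;
    size_t i;
    for(i = c->length; i-- > 0;) { /* search from the MRU end */
        if(strcmp(c->entries[i]->key, key) == 0) { e = cache_remove(c, i); break; }
    }
    if(e == NULL) { /* miss: make room, then build a new entry */
        cache_makeroom(c);
        if((e = calloc(1, sizeof(Entry))) == NULL) return NULL;
        if((e->key = malloc(strlen(key) + 1)) == NULL) { free(e); return NULL; }
        strcpy(e->key, key);
    }
    e->modified = 1;
    c->entries[c->length++] = e; /* MRU order */
    return e;
}
```

Searching from the end of the list means a recently touched chunk is found after very few comparisons, which suits access patterns that revisit the same chunk repeatedly.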
int
NCZ_flush_chunk_cache(NCZChunkCache* cache)
{
    int stat = NC_NOERR;
    size_t i;

    if(NCZ_cache_size(cache) == 0) goto done;

    /* Iterate over the entries in the list */
    for(i=0;i<nclistlength(cache->entries);i++) {
        NCZCacheEntry* entry = nclistget(cache->entries,i);
        if(entry->modified) {
            /* Write out this chunk in toto */
            if((stat=put_chunk(cache,entry->key,entry)))
                goto done;
        }
        entry->modified = 0;
    }

done:
    return THROW(stat);
}

#if 0
int
NCZ_chunk_cache_modified(NCZChunkCache* cache, const size64_t* indices)
{
    int stat = NC_NOERR;
    char* key = NULL;
    NCZCacheEntry* entry = NULL;
    int rank = cache->ndims;

    /* Create the key for this cache */
    if((stat=buildchunkkey(rank, indices, &key))) goto done;

    /* See if already in cache */
    if(NC_hashmapget(cache->entries, key, strlen(key), (uintptr_t*)&entry)) { /* found */
        entry->modified = 1;
    }

done:
    nullfree(key);
    return THROW(stat);
}
#endif

/**************************************************/
/*
From Zarr V2 Specification:
"The compressed sequence of bytes for each chunk is stored under
a key formed from the index of the chunk within the grid of
chunks representing the array. To form a string key for a
chunk, the indices are converted to strings and concatenated
Upgrade the nczarr code to match Zarr V2
Re: https://github.com/zarr-developers/zarr-python/pull/716
The Zarr version 2 spec has been extended to include the ability
to choose the dimension separator in chunk name keys. The set of legal
separators has been extended from {'.'} to {'.' '/'}. So now it
is possible to use a key like "0/1/2/0" for chunk names.
This PR implements this for NCZarr. The V2 spec now says that
this separator can be set on a per-variable basis. For now, I
have chosen to allow this to be set only globally by adding a key
named "ZARR.DIMENSION_SEPARATOR=<char>" in the
.daprc/.dodsrc/.ncrc file. Currently, the only legal separator
characters are '.' (the default) and '/'. On writing, this key
will only be written if its value differs from the default.
This change caused problems because a separator of '/' is
difficult to parse when keys/paths also use '/' as the path separator.
A test case was added for this.
Additionally, make nczarr be enabled by default. This required
some additional changes so that if zip and/or the AWS S3 sdk are unavailable,
then they are disabled for NCZarr.
In addition, the following unrelated changes were made.
1. Tested that pure-zarr mode could read an nczarr formatted store.
2. The .rc file handling now merges all known .rc files (.ncrc, .daprc, and .dodsrc) in that order, using those in HOME first, then those in the current directory. For duplicate entries, the later ones override the earlier ones. This change is to remove some of the conflicts inherent in the current .rc file load process. A set of test cases was also added.
3. Re-order tests in configure.ac and CMakeLists.txt so that if libcurl
is not found then the other options that depend upon it
are properly disabled.
4. I decided that xarray support should be enabled by default for pure
zarr. In order to allow disabling, I added a new mode flag "noxarray".
5. Certain tests in nczarr_test depend on use of .dodsrc. In order for these
to work when testing in parallel, some inter-test dependencies needed to
be added.
6. Improved authorization testing to use changes in thredds.ucar.edu
2021-04-25 09:48:15 +08:00
with the dimension_separator character ('/' or '.') separating each index. For
example, given an array with shape (10000, 10000) and chunk
shape (1000, 1000) there will be 100 chunks laid out in a 10 by
10 grid. The chunk with indices (0, 0) provides data for rows
0-1000 and columns 0-1000 and is stored under the key "0.0"; the
chunk with indices (2, 4) provides data for rows 2000-3000 and
columns 4000-5000 and is stored under the key "2.4"; etc."
*/

/**
 * @internal Build the chunk key: the decimal chunk indices joined by ".".
 *
 * @param R Rank
 * @param chunkindices The chunk indices
 * @param keyp Return the chunk key string
 */
static int
buildchunkkey(size_t R, const size64_t* chunkindices, char** keyp)
{
    int stat = NC_NOERR;
    size_t r; /* same type as R, avoids a signed/unsigned comparison */
    NCbytes* key = ncbytesnew();

    if(keyp) *keyp = NULL;

    for(r=0;r<R;r++) {
        char sindex[64];
        if(r > 0) ncbytescat(key,".");
        /* Print as decimal with no leading zeros; use a 64-bit format
           so large indices are not truncated where long is 32 bits */
        snprintf(sindex,sizeof(sindex),"%llu",(unsigned long long)chunkindices[r]);
        ncbytescat(key,sindex);
    }
    ncbytesnull(key);
    if(keyp) *keyp = ncbytesextract(key);

    ncbytesfree(key);
    return THROW(stat);
}

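The key format quoted from the Zarr V2 specification, together with the '.' or '/' dimension separator, can be sketched without any netcdf-c types. This is an illustrative standalone version under stated assumptions, not the library routine: `chunk_key` and its caller-supplied buffer are hypothetical, and it writes into a fixed buffer rather than an `NCbytes` object.

```c
#include <assert.h>
#include <stdio.h>

/* Format the chunk indices into buf as "i0<sep>i1<sep>...".
   Returns 0 on success, -1 if sep is not '.' or '/' or buf is too small. */
static int chunk_key(size_t rank, const unsigned long long* indices,
                     char sep, char* buf, size_t buflen)
{
    size_t r, used = 0;
    assert(buf != NULL);
    if(sep != '.' && sep != '/') return -1; /* only legal separators */
    if(buflen == 0) return -1;
    buf[0] = '\0';
    for(r = 0; r < rank; r++) {
        /* Decimal, no leading zeros; separator before all but the first */
        int n = snprintf(buf + used, buflen - used, "%s%llu",
                         (r > 0 ? (sep == '.' ? "." : "/") : ""), indices[r]);
        if(n < 0 || (size_t)n >= buflen - used) return -1; /* truncated */
        used += (size_t)n;
    }
    return 0;
}
```

With indices (2, 4) this yields "2.4" for the default separator and "2/4" for the '/' separator, matching the examples in the specification text.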
/**
 * @internal Push data to chunk of a file.
 * If chunk does not exist, create it
 *
 * @param cache Pointer to parent cache
 * @param key chunk key
 * @param entry cache entry holding the chunk data to write
 *
 * @return ::NC_NOERR No error.
 * @author Dennis Heimbigner
 */
static int
put_chunk(NCZChunkCache* cache, const char* key, const NCZCacheEntry* entry)
{
    int stat = NC_NOERR;
    NCZ_FILE_INFO_T* zfile = NULL;
    NCZMAP* map = NULL;

    LOG((3, "%s: var: %p", __func__, cache->var));

    zfile = ((cache->var->container)->nc4_info)->format_file_info;
    map = zfile->map;

    stat = nczmap_write(map,key,0,cache->chunksize,entry->data);
    switch(stat) {
    case NC_NOERR: break;
    case NC_EEMPTY:
        /* Create the chunk */
        switch (stat = nczmap_defineobj(map,key)) {
        case NC_NOERR: case NC_EFOUND: break;
        default: goto done;
        }
        /* write again */
        if((stat = nczmap_write(map,key,0,cache->chunksize,entry->data)))
            goto done;
        break;
    default: goto done;
    }
done:
    return THROW(stat);
}

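The control flow in put_chunk (try the write, create the object if it is reported missing, then write once more) is easier to see in isolation. The sketch below uses a hypothetical one-object `Store` and made-up status codes (`ST_OK`, `ST_EMPTY`, `ST_FOUND`, `ST_IO`) in place of the NCZMAP API and the NC_* error codes; it only illustrates the pattern.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status codes standing in for NC_NOERR, NC_EEMPTY, etc. */
enum { ST_OK = 0, ST_EMPTY = 1, ST_FOUND = 2, ST_IO = 3 };

/* Hypothetical one-object store standing in for an NCZMAP */
typedef struct Store { int exists; char data[16]; } Store;

static int store_write(Store* s, const char* data, size_t len)
{
    assert(s != NULL && data != NULL);
    if(!s->exists) return ST_EMPTY;          /* nothing to write into yet */
    if(len > sizeof(s->data)) return ST_IO;  /* does not fit */
    memcpy(s->data, data, len);
    return ST_OK;
}

static int store_define(Store* s)
{
    if(s->exists) return ST_FOUND;
    s->exists = 1;
    return ST_OK;
}

/* Try to write; if the object is missing, define it and write once more. */
static int put(Store* s, const char* data, size_t len)
{
    int stat = store_write(s, data, len);
    if(stat != ST_EMPTY) return stat;        /* success or a real error */
    stat = store_define(s);
    if(stat != ST_OK && stat != ST_FOUND) return stat;
    return store_write(s, data, len);        /* second and final attempt */
}
```

Treating "already exists" from the define step as success keeps the retry safe if some other path created the object between the two calls.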
/**
 * @internal Pull data from file into memory.
 *
 * @param cache Pointer to parent cache
 * @param key chunk key
 * @param entry cache entry to read into
 *
 * @return ::NC_NOERR No error.
 * @author Dennis Heimbigner
 */
static int
get_chunk(NCZChunkCache* cache, const char* key, NCZCacheEntry* entry)
{
    int stat = NC_NOERR;
    NCZMAP* map = NULL;
    NC_FILE_INFO_T* file = NULL;
    NCZ_FILE_INFO_T* zfile = NULL;

    file = (cache->var->container)->nc4_info;
    zfile = file->format_file_info;
    map = zfile->map;
    assert(map && entry->data);

    /* Log after file is assigned; logging it earlier always printed NULL */
    LOG((3, "%s: file: %p", __func__, file));

    stat = nczmap_read(map,key,0,cache->chunksize,(char*)entry->data);

    return THROW(stat);
}

static int
create_chunk(NCZChunkCache* cache, const char* key, NCZCacheEntry* entry)
{
    int stat = NC_NOERR;
    NC_FILE_INFO_T* file = NULL;
    NCZ_FILE_INFO_T* zfile = NULL;
    NCZMAP* map = NULL;

    file = (cache->var->container)->nc4_info;
    zfile = file->format_file_info;
    map = zfile->map;

    /* Create the chunk */
    if((stat = nczmap_defineobj(map,key))) goto done;
    entry->modified = 1; /* mark as modified */
    /* let higher function decide on fill */

done:
    return THROW(stat);
}

int
NCZ_buildchunkpath(NCZChunkCache* cache, const size64_t* chunkindices, char** keyp)
{
    int stat = NC_NOERR;
    char* chunkname = NULL;
    char* varkey = NULL;
    char* key = NULL;

    /* Get the chunk object name */
    if((stat = buildchunkkey(cache->ndims, chunkindices, &chunkname))) goto done;
    /* Get the var object key */
    if((stat = NCZ_varkey(cache->var,&varkey))) goto done;
    /* Prefix the path to the containing variable object */
    if((stat=nczm_concat(varkey,chunkname,&key))) goto done;
    if(keyp) {*keyp = key; key = NULL;}

done:
    nullfree(chunkname);
    nullfree(varkey);
    nullfree(key);
    return THROW(stat);
}

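NCZ_buildchunkpath prefixes the chunk name with the key of its containing variable. As a rough illustration of that kind of key concatenation, the sketch below joins two parts with exactly one '/' between them. The helper `join_key` and the example keys are hypothetical; this does not claim to reproduce the exact behavior of nczm_concat.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join prefix and suffix with exactly one '/' between them.
   Returns a malloc'd string the caller must free, or NULL on failure. */
static char* join_key(const char* prefix, const char* suffix)
{
    size_t plen;
    char* key;
    assert(prefix != NULL && suffix != NULL);
    plen = strlen(prefix);
    while(plen > 0 && prefix[plen-1] == '/') plen--; /* drop trailing '/' */
    while(*suffix == '/') suffix++;                  /* drop leading '/' */
    key = malloc(plen + 1 + strlen(suffix) + 1);
    if(key == NULL) return NULL;
    memcpy(key, prefix, plen);
    key[plen] = '/';
    strcpy(key + plen + 1, suffix);
    return key;
}
```

For a variable key such as "/g1/v1" and a chunk name "0.0", this produces "/g1/v1/0.0". As in the function above, ownership of the result passes to the caller, who frees it when done.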