netcdf-c/libnczarr/zcache.h
Dennis Heimbigner 231ae96c4b Add support for Zarr string type to NCZarr
* re: https://github.com/Unidata/netcdf-c/pull/2278
* re: https://github.com/Unidata/netcdf-c/issues/2485
* re: https://github.com/Unidata/netcdf-c/issues/2474

This PR subsumes PR https://github.com/Unidata/netcdf-c/pull/2278.
It is actually something of an omnibus, covering several issues.

## PR https://github.com/Unidata/netcdf-c/pull/2278
Add support for the Zarr string type.
Zarr strings are currently restricted to a fixed size.
The primary issue to be addressed is to provide a way for the user to
specify the size of the fixed-length strings. This is handled by providing
the following new special attributes:
1. **_nczarr_default_maxstrlen** —
This is an attribute of the root group. It specifies the default
maximum string length for string types. If not specified, then
it has the value of 64 characters.
2. **_nczarr_maxstrlen** —
This is a per-variable attribute. It specifies the maximum
string length for the string type associated with the variable.
If not specified, then it is assigned the value of
**_nczarr_default_maxstrlen**.
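
The fallback order described above can be sketched as follows. This is only an illustration, not the code in libnczarr: the helper name `effective_maxstrlen`, the macro `DEFAULT_MAXSTRLEN`, and the convention that a value of 0 or less means "attribute not specified" are all assumptions made for the example.

```c
/* Fallback used when the root group carries no
 * _nczarr_default_maxstrlen attribute (64, per the description above). */
#define DEFAULT_MAXSTRLEN 64

/* Hypothetical helper: resolve the maximum string length for a variable.
 * A value <= 0 stands for "attribute not specified" (an assumption of
 * this sketch, not a libnczarr convention). */
static int effective_maxstrlen(int var_maxstrlen, int root_default_maxstrlen)
{
    if (var_maxstrlen > 0)
        return var_maxstrlen;           /* per-variable _nczarr_maxstrlen */
    if (root_default_maxstrlen > 0)
        return root_default_maxstrlen;  /* root _nczarr_default_maxstrlen */
    return DEFAULT_MAXSTRLEN;           /* built-in default */
}
```

For example, a variable with no `_nczarr_maxstrlen` in a dataset whose root group sets `_nczarr_default_maxstrlen` to 128 would resolve to 128.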

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR
type, which does not exist in zarr. The goal was to choose numpy types for
both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that
if a pure zarr implementation read them, it would still work and an
NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:
* "|S1" for NC_CHAR.
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n
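
As a rough sketch of the mapping listed above, the function below builds the dtype string for a given type and maximum string length. The enum `sketch_type` and the function `sketch_dtype` are hypothetical names used only for this example; they are not part of the netcdf-c API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for NC_CHAR and NC_STRING. */
enum sketch_type { SKETCH_CHAR, SKETCH_STRING };

/* Write the dtype string for type t into buf.
 * Returns 0 on success, -1 on bad input or if buf is too small. */
static int sketch_dtype(enum sketch_type t, int maxstrlen,
                        char *buf, size_t buflen)
{
    int n;

    if (buf == NULL || buflen == 0)
        return -1;
    if (t == SKETCH_CHAR)
        n = snprintf(buf, buflen, "|S1");            /* NC_CHAR */
    else if (t == SKETCH_STRING && maxstrlen >= 1)
        n = snprintf(buf, buflen, ">S%d", maxstrlen); /* NC_STRING */
    else
        return -1;
    /* snprintf reports the length it wanted; reject truncation. */
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Note that an NC_STRING with a maximum length of 1 yields `>S1`, which differs from the NC_CHAR form `|S1` only in the leading byte-order character.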

Note that it is a bit of a hack to use endianness, but it should be ok since for
string/char, the endianness has no meaning.
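
Because the byte-order character is the only thing separating `|S1` from `>S1`, a reader could use it to recover the original netcdf-c type. The sketch below shows one way that might look; it assumes the reader simply inverts the write mapping above, and the names `sketch_kind` and `sketch_classify` are hypothetical rather than taken from libnczarr.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of a dtype string. */
enum sketch_kind { SKETCH_KIND_CHAR, SKETCH_KIND_STRING, SKETCH_KIND_OTHER };

/* Classify a dtype such as "|S1" or ">S64".
 * On SKETCH_KIND_CHAR or SKETCH_KIND_STRING, *maxstrlenp (if non-NULL)
 * receives the string length encoded in the dtype. */
static enum sketch_kind sketch_classify(const char *dtype, int *maxstrlenp)
{
    char *end = NULL;
    long n;

    if (dtype == NULL || strlen(dtype) < 3 || dtype[1] != 'S')
        return SKETCH_KIND_OTHER;
    if (!isdigit((unsigned char)dtype[2]))
        return SKETCH_KIND_OTHER;       /* rejects signs and whitespace */
    n = strtol(dtype + 2, &end, 10);
    if (*end != '\0' || n < 1 || n > INT_MAX)
        return SKETCH_KIND_OTHER;

    if (dtype[0] == '|' && n == 1) {    /* "|S1" => NC_CHAR */
        if (maxstrlenp) *maxstrlenp = 1;
        return SKETCH_KIND_CHAR;
    }
    if (dtype[0] == '>') {              /* ">Sn" => NC_STRING, length n */
        if (maxstrlenp) *maxstrlenp = (int)n;
        return SKETCH_KIND_STRING;
    }
    return SKETCH_KIND_OTHER;
}
```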

When reading attributes from pure zarr (i.e. with no nczarr
attribute types defined), they will always be interpreted as being of
type NC_CHAR.

## Issue: https://github.com/Unidata/netcdf-c/issues/2474
This PR partly fixes this issue because it provides more
comprehensive support for Zarr attributes that are JSON-valued expressions.
This PR still does not address the problem in that issue where the
_ARRAY_DIMENSION attribute is incorrectly set. That can only be
fixed by the creator of the datasets.

## Issue: https://github.com/Unidata/netcdf-c/issues/2485
This PR also fixes the scalar failure shown in this issue.
It generally cleans up scalar handling.
It also adds a note to the documentation describing that
NCZarr supports scalars while Zarr does not and also how
scalar interoperability is achieved.

## Misc. Other Changes
1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" now uses "_nczarr_attr". Support backward compatibility for the upper case names.
2. Clean up my too-clever-by-half handling of scalars in libnczarr.
2022-08-27 20:21:13 -06:00


/*********************************************************************
* Copyright 2018, UCAR/Unidata
* See netcdf/COPYRIGHT file for copying and redistribution conditions.
*********************************************************************/
#ifndef ZCACHE_H
#define ZCACHE_H
/* This holds all the fields
   needed to support either implementation of the cache.
*/
struct NCxcache;
/* Note in the following: the term "real"
   refers to the unfiltered/uncompressed data.
   The term "filtered" refers to the result of running
   the real data through the filter chain. Note that the
   size of the filtered data might be larger than the size of
   the real data.
   The term "raw" is used to refer to the data on disk and it may either
   be real or filtered.
*/
typedef struct NCZCacheEntry {
    struct List {void* next; void* prev; void* unused;} list;
    int modified;
    size64_t indices[NC_MAX_VAR_DIMS];
    struct ChunkKey {
        char* varkey; /* key to the containing variable */
        char* chunkkey; /* name of the chunk */
    } key;
    size64_t hashkey;
    int isfiltered; /* 1=>data contains filtered data else real data */
    int isfixedstring; /* 1 => data contains the fixed strings, 0 => data contains pointers to strings */
    size64_t size; /* |data| */
    void* data; /* contains either filtered or real data */
} NCZCacheEntry;
typedef struct NCZChunkCache {
    int valid; /* 0 => following fields need to be re-calculated */
    NC_VAR_INFO_T* var; /* backlink */
    size64_t ndims; /* true ndims == var->ndims + scalar */
    size64_t chunksize; /* for real data */
    size64_t chunkcount; /* cross product of chunksizes */
    void* fillchunk; /* enough fillvalues to fill a real chunk */
    size_t maxentries; /* Max number of entries allowed; maxsize can override */
    size_t maxsize; /* Maximum space used by cache; 0 => nolimit */
    size_t used; /* How much total space is being used */
    NClist* mru; /* NClist<NCZCacheEntry> all cache entries in mru order */
    struct NCxcache* xcache;
    char dimension_separator;
} NCZChunkCache;
/**************************************************/
#define FILTERED(cache) (nclistlength((NClist*)(cache)->var->filters))
extern int NCZ_set_var_chunk_cache(int ncid, int varid, size_t size, size_t nelems, float preemption);
extern int NCZ_adjust_var_cache(NC_VAR_INFO_T *var);
extern int NCZ_create_chunk_cache(NC_VAR_INFO_T* var, size64_t, char dimsep, NCZChunkCache** cachep);
extern void NCZ_free_chunk_cache(NCZChunkCache* cache);
extern int NCZ_read_cache_chunk(NCZChunkCache* cache, const size64_t* indices, void** datap);
extern int NCZ_flush_chunk_cache(NCZChunkCache* cache);
extern size64_t NCZ_cache_entrysize(NCZChunkCache* cache);
extern NCZCacheEntry* NCZ_cache_entry(NCZChunkCache* cache, const size64_t* indices);
extern size64_t NCZ_cache_size(NCZChunkCache* cache);
extern int NCZ_buildchunkpath(NCZChunkCache* cache, const size64_t* chunkindices, struct ChunkKey* key);
extern int NCZ_ensure_fill_chunk(NCZChunkCache* cache);
extern int NCZ_reclaim_fill_chunk(NCZChunkCache* cache);
#endif /*ZCACHE_H*/