mirror of
https://github.com/Unidata/netcdf-c.git
synced 2024-11-27 07:30:33 +08:00
d953899559
re: https://github.com/zarr-developers/zarr-specs/issues/41

After discussions with the Zarr community, it was decided to convert to a new representation of the NCZarr meta-data extensions: version 2. These extensions store the information necessary for mapping the Zarr data model to the netcdf-4 data model. The basic change is to remove the NCZarr-specific objects: .nczarr, .nczgroup, .nczarray, and .nczattr. The contents of these objects are moved into the corresponding existing Zarr objects as special keys. The mapping is as follows:
* ''.nczarr'' => ''/.zgroup/_NCZARR_SUPERBLOCK_''
* ''.nczgroup'' => ''.zgroup/_NCZARR_GROUP_''
* ''.nczarray'' => ''.zarray/_NCZARR_ARRAY_''
* ''.nczattr'' => ''.zattr/_NCZARR_ATTR_''

Backward compatibility is maintained by looking for the object ''/.nczarr'' and, if it is found, assuming that the dataset is in the older version 1 format. This compatibility only supports reading of such version 1 datasets. Documentation and test cases are also added.

Misc. Other Changes:
1. The JSON parsing code was added to the general library instead of nczarr only (ncjson.c, ncjson.h).
2. Improved support for different platform paths by allowing conversion to a single common path representation.
3. Add some new error codes.
4. Modify nccopy usage to mention the new chunking specification.
365 lines
14 KiB
C
/* Copyright 2018-2018 University Corporation for Atmospheric
   Research/Unidata. */

/**
 * @file This header file contains types (and type-related macros)
 * for the libzarr code.
 *
 *
 * @author Dennis Heimbigner
 */

/*
This API essentially implements a simplified variant
of the Amazon S3 API. Specifically, we have the following
kinds of things.

As with Amazon S3, keys are UTF-8 strings with a specific structure:
that of a path similar to a Unix path, with '/' as the
separator for the segments of the path.

As with Unix, all keys have this BNF syntax:
<pre>
key: '/' | keypath ;
keypath: '/' segment | keypath '/' segment ;
segment: <sequence of UTF-8 characters except control characters and '/'>
</pre>

Obviously, one can infer a tree structure from this key structure.
A containment relationship is defined by key prefixes.
Thus one key is "contained" (possibly transitively)
by another if the second key is a prefix (in the string sense) of the first.
So in this sense the key "/x/y/z" is contained by the key "/x/y".

In this model all keys "exist" but only some keys refer to
objects containing content; such objects are called content-bearing.
An important restriction is placed on the structure of the tree.
Namely, keys are only defined for content-bearing objects.
Further, all the leaves of the tree are these content-bearing objects.
This means that the key for one content-bearing object cannot
be a prefix of any other key.

There are several other concepts of note.
1. Dataset - a dataset is the complete tree contained by the key defining
   the root of the dataset. Technically, the root of the tree is the key <dataset>/.nczarr,
   where .nczarr can be considered the superblock of the dataset.
2. Object - equivalent of the S3 object; each object has a unique key
   and "contains" data in the form of an arbitrary sequence of 8-bit bytes.

The zmap API defined here isolates the key-value pair mapping code
from the Zarr-based implementation of NetCDF-4.

It wraps an internal C dispatch table manager
that implements an abstract data structure
for the key/object model.

Search:
The search function has two purposes:
a. Support reading of pure Zarr datasets (because they do not explicitly
   track their contents).
b. Debugging, to allow raw examination of the storage. See zdump
   for an example.

The search function takes a prefix path which has a key syntax (see above).
The set of legal keys is the set of keys such that the key references
a content-bearing object -- e.g. /x/y/.zarray or /.zgroup. Essentially
this is the set of keys pointing to the leaf objects of the tree of keys
constituting a dataset. This set potentially limits the set of keys that need to be
examined during search.

Ideally the search function would return
the set of names that are immediate suffixes of a
given prefix path. That is, if <prefix> is the prefix path,
then search returns all <name> such that <prefix>/<name> is itself a prefix of a "legal" key.
This could be used to implement glob style searches such as "/x/y/ *" or "/x/y/ **".

These semantics were chosen because they appear to be the minimum required to implement
all other kinds of search using recursion. The design goals are to:
1. Avoid returning keys that are not a prefix of some legal key.
2. Avoid returning all the legal keys in the dataset because that set may be very large;
   although the implementation may still have to examine all legal keys to get the desired subset.
3. Allow for use of partial read mechanisms such as iterators, if available.
   This can support processing a limited set of keys for each iteration. This is a
   straightforward tradeoff of space over time.

This is doable in S3 search using common prefixes with a delimiter of '/', although
the implementation is a bit tricky. For the file system zmap implementation, the legal search keys can be obtained
one level at a time, which directly implements the search semantics. For the zip file implementation,
these semantics are not possible, so the whole tree must be obtained and searched.

Issues:
1. S3 limits key lengths to 1024 bytes. Some deeply nested netcdf files
   will almost certainly exceed this limit.
2. Besides content, S3 objects can have an associated small set
   of what may be called tags, which are themselves of the form of
   key-value pairs, but where the key and value are always text. As
   far as it is possible to determine, Zarr never uses these tags,
   so they are not included in the zmap data structure.

A Note on Error Codes:

This model uses the S3 concept of keys. All legal keys "exist"
in that it is possible to write to them. The concept of a key
not existing has no meaning: all keys exist. Normally, in S3,
each key specifies an object, but unless that object has
content, it does not exist. Therefore we distinguish
content-bearing "objects" from non-content-bearing objects. Our
model only holds content-bearing objects. Note that the length of
that content may be zero. The important point is that in this
model, only content-bearing objects actually exist. Note that
this is different from, say, a directory tree where a key will
always lead to something: a directory or a file.

In any case, the zmap API returns two distinguished error codes:
1. NC_NOERR if an operation succeeded.
2. NC_EEMPTY if the operation accessed a key that has no content.
This does not preclude other errors being returned, such as NC_EACCESS or NC_EPERM or NC_EINVAL,
if there are permission errors or illegal function arguments, for example.
It also does not preclude the use of other error codes internal to the zmap
implementation. So zmap_file, for example, uses NC_ENOTFOUND internally
because it is possible to detect the existence of directories and files.
This does not propagate to the API.

Note that NC_EEMPTY is a new error code that signals that the
caller asked for a non-content-bearing key.

The current set of operations defined for zmaps is defined by the
generic nczmap_xxx functions below.

Each zmap implementation has retrievable flags defining limitations
of the implementation.

*/

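The containment and search semantics described in the comment above can be sketched in a few lines of C. This is only an illustration, not the library's code: the names key_contained_by, next_segment, search_names, and DEMO_MAXNAME are hypothetical, the legal keys are held in a plain array, and containment is checked on segment boundaries, which is an assumption stricter than a raw string prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXNAME 64 /* hypothetical limit on the length of one segment name */

/* Hypothetical helper: return 1 if `key` is contained (possibly transitively)
   by `prefix`, i.e. `prefix` is a proper prefix of `key` ending on a
   segment boundary; return 0 otherwise. */
int key_contained_by(const char* key, const char* prefix)
{
    size_t plen;
    assert(key != NULL && prefix != NULL);
    if(strcmp(prefix, "/") == 0) /* the root contains every other key */
        return key[0] == '/' && key[1] != '\0';
    plen = strlen(prefix);
    if(strncmp(key, prefix, plen) != 0) return 0;
    return key[plen] == '/';
}

/* Hypothetical helper: if `key` lies below `prefix`, copy the name of the
   segment immediately after `prefix` into `name` and return 1; else return 0. */
int next_segment(const char* key, const char* prefix, char* name, size_t cap)
{
    const char* start;
    size_t len;
    assert(name != NULL && cap > 0);
    if(!key_contained_by(key, prefix)) return 0;
    start = key + strlen(prefix);
    if(*start == '/') start++;   /* skip the separator after the prefix */
    len = strcspn(start, "/");   /* length of the next segment */
    if(len == 0 || len >= cap) return 0;
    memcpy(name, start, len);
    name[len] = '\0';
    return 1;
}

/* Collect the distinct names (not keys) immediately below `prefix`.
   Returns the number of names stored in `names`. */
size_t search_names(const char* keys[], size_t nkeys, const char* prefix,
                    char names[][DEMO_MAXNAME], size_t maxnames)
{
    size_t i, j, n = 0;
    char name[DEMO_MAXNAME];
    for(i = 0; i < nkeys; i++) {
        if(!next_segment(keys[i], prefix, name, sizeof(name))) continue;
        for(j = 0; j < n; j++)   /* skip names already collected */
            if(strcmp(names[j], name) == 0) break;
        if(j == n && n < maxnames)
            strcpy(names[n++], name);
    }
    return n;
}
```

With the legal keys /.zgroup, /x/.zgroup, /x/y/.zarray, and /x/z/.zarray, a search on the prefix /x collects the names .zgroup, y, and z. A recursive walk over those names would then reach every legal key without ever listing the whole dataset at once.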
#ifndef ZMAP_H
#define ZMAP_H

#include "ncexternl.h"

#define NCZM_SEP "/"

#define NCZM_DOT '.'

/*Mnemonic*/
#define LOCALIZE 1

/* Forward */
typedef struct NCZMAP_API NCZMAP_API;

/* Define the space of implemented (eventually) map implementations */
typedef enum NCZM_IMPL {
    NCZM_UNDEF=0, /* Undefined: no implementation selected */
    NCZM_FILE=1, /* File system directory-based implementation */
    NCZM_ZIP=2, /* Zip-file based implementation */
    NCZM_S3=3, /* Amazon S3 implementation */
} NCZM_IMPL;

/* Define the default map implementation */
#define NCZM_DEFAULT NCZM_FILE

/* Define the per-implementation limitations flags */
typedef size64_t NCZM_FEATURES;
/* powers of 2 */
#define NCZM_UNIMPLEMENTED 1 /* Unknown/unimplemented */
#define NCZM_WRITEONCE 2 /* Objects can only be written once */
#define NCZM_ZEROSTART 4 /* Objects can only be written using a start offset of zero */

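Because the limitation flags are powers of 2, a caller can test them with bitwise AND. The following is a minimal sketch of how such a check might look. It assumes stand-in names (demo_features_t, the DEMO_ flags, and demo_write_allowed are hypothetical and not part of this header), and the interpretation of each flag follows only the one-line comments above.

```c
#include <assert.h>

typedef unsigned long long demo_features_t; /* stand-in for NCZM_FEATURES */

/* Stand-ins mirroring the flag values defined above */
#define DEMO_UNIMPLEMENTED 1ULL
#define DEMO_WRITEONCE     2ULL
#define DEMO_ZEROSTART     4ULL
#define DEMO_ALLFLAGS (DEMO_UNIMPLEMENTED | DEMO_WRITEONCE | DEMO_ZEROSTART)

/* Hypothetical check: return 1 if a write at offset `start` is permitted
   under the limitation flags in `features`, else 0. `already_written` is
   nonzero when the object has been written before. */
int demo_write_allowed(demo_features_t features, unsigned long long start,
                       int already_written)
{
    assert((features & ~DEMO_ALLFLAGS) == 0); /* only known flags are set */
    if(features & DEMO_UNIMPLEMENTED) return 0;
    if((features & DEMO_ZEROSTART) && start != 0) return 0;
    if((features & DEMO_WRITEONCE) && already_written) return 0;
    return 1;
}
```

An implementation with no limitations passes 0 for `features`, so every write is allowed; each flag that is set removes one class of writes.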
/*
For each dataset, we create what amounts to a class
defining data and the API function implementations.
All datasets are subclasses of NCZMAP.
In the usual C approach, subclassing is performed by
casting.

So all Dataset structs have this as their first field
so we can cast to this form; this avoids the need for
a separate per-implementation malloc piece.

*/
typedef struct NCZMAP {
    NCZM_IMPL format;
    char* url;
    int mode;
    size64_t flags; /* Passed in by caller */
    struct NCZMAP_API* api;
} NCZMAP;

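The subclassing-by-casting pattern described above, combined with a per-implementation dispatch table, can be sketched as follows. This is a toy illustration under stated assumptions: every Demo/DEMO_ name is hypothetical, the "storage" is a single key, and DEMO_EEMPTY is a placeholder value rather than the real NC_EEMPTY.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NOERR  0
#define DEMO_EEMPTY (-1) /* placeholder value, not the real NC_EEMPTY */

typedef struct DemoMap DemoMap;

/* Per-implementation dispatch table, in the spirit of NCZMAP_API */
typedef struct DemoMapAPI {
    int version;
    int (*exists)(DemoMap* map, const char* key);
} DemoMapAPI;

/* Common "base class" state, in the spirit of NCZMAP */
struct DemoMap {
    int format;
    const DemoMapAPI* api;
};

/* A "subclass": the base struct is the first member, so a pointer to a
   DemoMemMap can be converted to a pointer to DemoMap and back. */
typedef struct DemoMemMap {
    DemoMap map;         /* must be the first field */
    const char* onlykey; /* toy storage: one content-bearing key */
} DemoMemMap;

/* Implementation-specific operation; recovers the subclass by casting */
static int memmap_exists(DemoMap* map, const char* key)
{
    DemoMemMap* self = (DemoMemMap*)map;
    return strcmp(self->onlykey, key) == 0 ? DEMO_NOERR : DEMO_EEMPTY;
}

static const DemoMapAPI memmap_api = { 1, memmap_exists };

/* Initialize a toy map whose only content-bearing key is `key` */
void demo_memmap_init(DemoMemMap* m, const char* key)
{
    assert(m != NULL && key != NULL);
    m->map.format = 0;
    m->map.api = &memmap_api;
    m->onlykey = key;
}

/* Generic wrapper: forwards to whatever implementation the map carries */
int demo_exists(DemoMap* map, const char* key)
{
    assert(map != NULL && map->api != NULL);
    return map->api->exists(map, key);
}
```

The generic wrapper never needs to know which implementation it is talking to; it only sees the base struct and the function pointers, which is why the base struct has to be the first field of every implementation struct.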
/* zmap_s3sdk related types and constants */

#define AWSHOST ".amazonaws.com"

enum URLFORMAT {UF_NONE=0, UF_VIRTUAL=1, UF_PATH=2, UF_OTHER=3};

typedef struct ZS3INFO {
    enum URLFORMAT urlformat;
    char* host; /* non-null if other */
    char* region; /* region */
    char* bucket; /* bucket name */
    char* rootkey;
} ZS3INFO;

/* Forward */
struct NClist;

/* Define the object-level API */

struct NCZMAP_API {
    int version;
    /* Map Operations */
    int (*close)(NCZMAP* map, int deleteit);
    /* Object Operations */
    int (*exists)(NCZMAP* map, const char* key);
    int (*len)(NCZMAP* map, const char* key, size64_t* sizep);
    int (*read)(NCZMAP* map, const char* key, size64_t start, size64_t count, void* content);
    int (*write)(NCZMAP* map, const char* key, size64_t start, size64_t count, const void* content);
    int (*search)(NCZMAP* map, const char* prefix, struct NClist* matches);
};

/* Define the Dataset level API */
typedef struct NCZMAP_DS_API {
    int version;
    NCZM_FEATURES features;
    int (*create)(const char *path, int mode, size64_t constraints, void* parameters, NCZMAP** mapp);
    int (*open)(const char *path, int mode, size64_t constraints, void* parameters, NCZMAP** mapp);
} NCZMAP_DS_API;

#ifdef __cplusplus
extern "C" {
#endif

/**
Get limitations of a particular implementation.
@param impl -- the map implementation type
@return the limitation flags (NCZM_FEATURES) of the specified implementation
*/
EXTERNL NCZM_FEATURES nczmap_features(NCZM_IMPL);

/* Object API Wrappers; note that there are no group operations
   because group keys do not map to directories.
*/

/**
Check if a specified content-bearing object exists or not.
@param map -- the containing map
@param key -- the key specifying the content-bearing object
@return NC_NOERR if the object exists
@return NC_EEMPTY if the object is not content-bearing
@return NC_EXXX if the operation failed for one of several possible reasons
*/
EXTERNL int nczmap_exists(NCZMAP* map, const char* key);

/**
Return the current size of a specified content-bearing object.
@param map -- the containing map
@param key -- the key specifying the content-bearing object
@param sizep -- the object's size is returned through this pointer
@return NC_NOERR if the object exists
@return NC_EEMPTY if the object is not content-bearing
@return NC_EXXX if the operation failed for one of several possible reasons
*/
EXTERNL int nczmap_len(NCZMAP* map, const char* key, size64_t* sizep);

/**
Read the content of a specified content-bearing object.
@param map -- the containing map
@param key -- the key specifying the content-bearing object
@param start -- offset into the content to start reading
@param count -- number of bytes to read
@param content -- read the data into this memory
@return NC_NOERR if the operation succeeded
@return NC_EEMPTY if the object is not content-bearing
@return NC_EXXX if the operation failed for one of several possible reasons
*/
EXTERNL int nczmap_read(NCZMAP* map, const char* key, size64_t start, size64_t count, void* content);

/**
Write the content of a specified content-bearing object.
@param map -- the containing map
@param key -- the key specifying the content-bearing object
@param start -- offset into the content to start writing
@param count -- number of bytes to write
@param content -- write the data from this memory
@return NC_NOERR if the operation succeeded
@return NC_EXXX if the operation failed for one of several possible reasons
Note that this makes the key a content-bearing object.
*/
EXTERNL int nczmap_write(NCZMAP* map, const char* key, size64_t start, size64_t count, const void* content);

/**
Return a vector of names (not keys) representing the
next segment of legal objects that are immediately contained by the prefix key.
@param map -- the containing map
@param prefix -- the key into the tree where the search is to occur
@param matches -- return the set of names in this list; might be empty
@return NC_NOERR if the operation succeeded
@return NC_EXXX if the operation failed for one of several possible reasons
*/
EXTERNL int nczmap_search(NCZMAP* map, const char* prefix, struct NClist* matches);

/**
Close a map.
@param map -- the map to close
@param deleteit -- if true, then delete the corresponding dataset
@return NC_NOERR if the operation succeeded
@return NC_EXXX if the operation failed for one of several possible reasons
*/
EXTERNL int nczmap_close(NCZMAP* map, int deleteit);

/* Create/open and control a dataset using a specific implementation */
EXTERNL int nczmap_create(NCZM_IMPL impl, const char *path, int mode, size64_t constraints, void* parameters, NCZMAP** mapp);
EXTERNL int nczmap_open(NCZM_IMPL impl, const char *path, int mode, size64_t constraints, void* parameters, NCZMAP** mapp);

/* Utility functions */

/** Split a path into pieces along '/' character; elide any leading '/' */
EXTERNL int nczm_split(const char* path, struct NClist* segments);

/* Split a path into pieces along some character; elide any leading char */
EXTERNL int nczm_split_delim(const char* path, char delim, struct NClist* segments);

/* Convenience: Join all segments into a path using '/' character */
EXTERNL int nczm_join(struct NClist* segments, char** pathp);

/* Convenience: Join all segments into a path using '/' character
   but taking a possible leading Windows drive letter into account
*/
EXTERNL int nczm_joinpath(struct NClist* segments, char** pathp);

/* Convenience: concat two strings with '/' between; caller frees */
EXTERNL int nczm_concat(const char* prefix, const char* suffix, char** pathp);

/* Convenience: concat multiple strings with no intermediate separators; caller frees */
EXTERNL int nczm_appendn(char** resultp, int n, ...);

/* Break a key into prefix and suffix, where prefix is the first nsegs segments;
   nsegs can be negative to specify that suffix is |nsegs| segments long
*/
EXTERNL int nczm_divide_at(const char* key, int nsegs, char** prefixp, char** suffixp);

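The prefix/suffix split described for nczm_divide_at can be illustrated with a small stand-alone sketch. This is not the library's implementation: demo_divide_at and count_segments are hypothetical names, the -1 error return is a placeholder, and the choice to keep the leading '/' on the suffix is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Count the non-empty segments of a key that starts with '/' */
static int count_segments(const char* key)
{
    int n = 0;
    const char* p;
    for(p = key; *p; p++)
        if(*p == '/' && p[1] != '\0' && p[1] != '/') n++;
    return n;
}

/* Hypothetical sketch: split `key` so that the prefix holds the first
   nsegs segments; a negative nsegs means the suffix holds the last
   |nsegs| segments. The caller frees *prefixp and *suffixp.
   Returns 0 on success and -1 (placeholder error code) on failure. */
int demo_divide_at(const char* key, int nsegs, char** prefixp, char** suffixp)
{
    int total, presegs, seen = 0;
    const char* p;
    size_t plen;
    char* prefix;
    char* suffix;

    assert(key != NULL && key[0] == '/');
    assert(prefixp != NULL && suffixp != NULL);
    total = count_segments(key);
    presegs = (nsegs >= 0) ? nsegs : total + nsegs;
    if(presegs < 0 || presegs > total) return -1;

    /* Stop at the '/' that begins the first segment of the suffix */
    for(p = key; *p; p++) {
        if(*p == '/') {
            if(seen == presegs) break;
            seen++;
        }
    }
    plen = (size_t)(p - key);
    prefix = malloc(plen + 1);
    suffix = malloc(strlen(p) + 1);
    if(prefix == NULL || suffix == NULL) {
        free(prefix);
        free(suffix);
        return -1;
    }
    memcpy(prefix, key, plen);
    prefix[plen] = '\0';
    strcpy(suffix, p);
    *prefixp = prefix;
    *suffixp = suffix;
    return 0;
}
```

Under these assumptions, dividing "/a/b/c" at 1 gives the prefix "/a" and the suffix "/b/c", while dividing at -1 gives "/a/b" and "/c".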
/* Reclaim the content of a map but not the map itself */
EXTERNL int nczm_clear(NCZMAP* map);

/* Return 1 if path is absolute; takes Windows drive letters into account */
EXTERNL int nczm_isabsolutepath(const char* path);

/* Convert forward to back slash if needed */
EXTERNL int nczm_localize(const char* path, char** newpathp, int local);

EXTERNL int nczm_canonicalpath(const char* path, char** cpathp);
EXTERNL int nczm_basename(const char* path, char** basep);
EXTERNL int nczm_segment1(const char* path, char** seg1p);
EXTERNL int nczm_lastsegment(const char* path, char** lastp);

/* bubble sorts (note arguments) */
EXTERNL void nczm_sortlist(struct NClist* l);
EXTERNL void nczm_sortenvv(int n, char** envv);
EXTERNL void NCZ_freeenvv(int n, char** envv);

#ifdef __cplusplus
}
#endif

#endif /*ZMAP_H*/