/** \file

\internal

\page nchashmap Indexed Access to Metadata Objects

\tableofcontents

The original internal representations of metadata in memory
relied on linear searching of lists to locate various objects
by name or by numeric id: by _varid_ or by _grpid_, for example.

In recent years, the flaws in that approach have become obvious
as users create files with extremely large numbers of objects:
groups, variables, attributes, and dimensions. One case
has 14 megabytes of metadata. Creating and (especially) later
opening such files was exceedingly slow.

This problem was partially alleviated in both netcdf-3 (libsrc)
and netcdf-4 (libsrc4) by adding name hashing tables.
However, especially for netcdf-4, linear search still prevailed.

A pervasive change has been made to convert searches by name
or by id from O(n) to O(1).
This uses hashing for name-based search
and vectors for numeric id-based search.
All other cases were left as O(n) searches.

This document describes the architecture and details of the netCDF
internal object lookup mechanisms now in place.

\section Sindexed_searches Indexed Searches

As a rule, two kinds of search are used to locate
metadata objects: (1) search by name and (2) search by
externally visible id (e.g. dimid or varid).

Before this change, hashing was used to locate objects by name
only after all the metadata had been read or created. In all
other cases, lookup was apparently a linear search of a
linked list.

It is relevant that, once created, no metadata object except
an attribute can be deleted. Objects can be renamed, but that
does not change the associated structure or id. Deletion occurs
only when an error occurs while creating an object, or on invoking
"nc_close()".

The numeric identifiers for dimensions, types, and groups are
all globally unique across a file. Note, however, that variable ids
are not globally unique (IMO a bad design decision) but are only
unique within the containing group. Thus, a file-wide unique id
for a variable must be composed of the containing
group id plus the variable id.
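
As a minimal sketch of such a composite id, assuming non-negative ids that fit in 32 bits, the group id and the group-local varid can be packed into one 64-bit key. The helper names and the bit layout are hypothetical illustrations; netcdf-c itself does not necessarily encode variable identity this way.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of netcdf-c): combine a group id and a
// group-local variable id into one file-wide key. Both ids are assumed
// to be non-negative and to fit in 32 bits.
static inline uint64_t var_global_key(int grpid, int varid)
{
    assert(grpid >= 0 && varid >= 0);
    return ((uint64_t)(uint32_t)grpid << 32) | (uint32_t)varid;
}

// Recover the group id from a key built by var_global_key().
static inline int var_key_grpid(uint64_t key)
{
    return (int)(key >> 32);
}

// Recover the group-local variable id from the same key.
static inline int var_key_varid(uint64_t key)
{
    return (int)(key & 0xFFFFFFFFu);
}
```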

Note also that names are unique only within a group and with respect
to a particular kind of metadata. That is, a group cannot have, e.g., two
dimensions with the same name. But it can have a variable and a dimension
with the same name (as with coordinate variables).

Finally, attribute names need only be unique among the attributes
of the same containing object (a variable or a group).

\section Sbasic_data_structures Basic Data Structures

The basic data structures used by the new lookup mechanisms
are described in the following sections.

\subsection Snclist NClist

With rare exceptions, vectors of objects are maintained as
instances of NClist, which provides a dynamically extendible
vector of pointers: pointers to metadata objects in this case.
It is possible to append new objects, insert at a specific
vector offset, or overwrite an existing pointer at a specific
offset.

The NClist structure definition is as follows.

\code
typedef struct NClist {
  size_t alloc;    // number of slots allocated in content
  size_t length;   // number of slots in use
  void** content;  // the vector of pointers
} NClist;
\endcode
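
The following sketch shows the three NClist operations described above (append, insert at an offset, overwrite at an offset) plus bounds-checked access. The function names and the growth policy (start at 4 slots, then double) are assumptions for illustration, not the API of netcdf-c's own list code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct NClist {
    size_t alloc;    // number of slots allocated in content
    size_t length;   // number of slots in use
    void** content;  // the vector of pointers
} NClist;

// Ensure room for at least n slots, doubling the allocation as needed.
static int list_reserve(NClist* l, size_t n)
{
    if (n <= l->alloc) return 1;
    size_t newalloc = l->alloc ? l->alloc : 4;
    while (newalloc < n) newalloc *= 2;
    void** p = realloc(l->content, newalloc * sizeof(void*));
    if (p == NULL) return 0;
    l->content = p;
    l->alloc = newalloc;
    return 1;
}

// Append elem at the end of the vector.
static int list_push(NClist* l, void* elem)
{
    if (!list_reserve(l, l->length + 1)) return 0;
    l->content[l->length++] = elem;
    return 1;
}

// Insert elem at offset i (0 <= i <= length), shifting later entries up.
static int list_insert(NClist* l, size_t i, void* elem)
{
    if (i > l->length || !list_reserve(l, l->length + 1)) return 0;
    memmove(&l->content[i + 1], &l->content[i], (l->length - i) * sizeof(void*));
    l->content[i] = elem;
    l->length++;
    return 1;
}

// Overwrite the pointer at an existing offset i.
static int list_set(NClist* l, size_t i, void* elem)
{
    if (i >= l->length) return 0;
    l->content[i] = elem;
    return 1;
}

// Return the pointer at offset i, or NULL if i is out of range.
static void* list_get(const NClist* l, size_t i)
{
    return i < l->length ? l->content[i] : NULL;
}
```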

\subsection Snc_hashmap NC_hashmap

The NC_hashmap type is a hash table mapping a string
(the key) to a data item. As a rule, the data item is a pointer to a
metadata object. The current implementation supports table
expansion when the number of entries in the table grows too
large. Collisions are resolved by simple linear rehashing (linear
probing) rather than by separate hash chains. This means that when
the table is expanded, it must be completely rebuilt. The performance
cost of this has yet to be determined. The alternative is to move to
some form of extendible hashing as used in databases.

The hashtable definition is as follows.

\code
typedef struct NC_hashmap {
  size_t size;
  size_t count;
  NC_hentry* table;
} NC_hashmap;
\endcode

where _size_ is the current allocated size and _count_ is the
number of active entries in the table. The _table_ field is
a vector of entries of this form.

\code
typedef struct NC_hentry {
  int flags;
  uintptr_t data;
  unsigned int hashkey;
  size_t keysize;
  char* key;
} NC_hentry;
\endcode

The _flags_ field indicates the state of the entry, which is
always one of three disjoint states:

1. ACTIVE - there is an object referenced in this entry.
2. DELETED - an entry was deleted, but must be marked so
   that linear rehash will work.
3. EMPTY - unused.
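
To show why DELETED must be distinct from EMPTY, here is a minimal sketch of lookup, insertion, and deletion with linear rehashing. Its assumptions are all hypothetical: a fixed-size table (expansion, which would require a rebuild, is omitted), FNV-1a in place of CRC32, and keys that are referenced rather than copied. If deletion reset a slot to EMPTY, a later probe for a key stored beyond that slot would stop early and wrongly report the key missing; marking the slot DELETED lets the probe continue, while insertion may still reuse the slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { EMPTY = 0, ACTIVE = 1, DELETED = 2 };   // hypothetical flag values

typedef struct {
    int flags;
    uintptr_t data;
    unsigned int hashkey;
    const char* key;   // not copied in this sketch; caller keeps it alive
} Entry;

typedef struct {
    size_t size;       // number of slots in table
    size_t count;      // number of ACTIVE slots
    Entry* table;
} Map;

// FNV-1a, standing in for the CRC32 hash used by netcdf-c.
static unsigned int hash_str(const char* s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

// Return the slot holding key, or SIZE_MAX if absent. The probe walks
// forward from the home slot, skips DELETED slots, and stops at EMPTY.
static size_t find_slot(const Map* m, const char* key, unsigned int hk)
{
    size_t i = hk % m->size;
    for (size_t n = 0; n < m->size; n++, i = (i + 1) % m->size) {
        const Entry* e = &m->table[i];
        if (e->flags == EMPTY) return SIZE_MAX;
        if (e->flags == ACTIVE && e->hashkey == hk && strcmp(e->key, key) == 0)
            return i;
    }
    return SIZE_MAX;
}

static int map_get(const Map* m, const char* key, uintptr_t* data)
{
    size_t i = find_slot(m, key, hash_str(key));
    if (i == SIZE_MAX) return 0;
    if (data != NULL) *data = m->table[i].data;
    return 1;
}

// Insert or replace; the first non-ACTIVE slot on the probe path is reused.
static int map_put(Map* m, const char* key, uintptr_t data)
{
    unsigned int hk = hash_str(key);
    size_t i = find_slot(m, key, hk);
    if (i != SIZE_MAX) { m->table[i].data = data; return 1; }
    i = hk % m->size;
    for (size_t n = 0; n < m->size; n++, i = (i + 1) % m->size) {
        Entry* e = &m->table[i];
        if (e->flags != ACTIVE) {
            e->flags = ACTIVE; e->data = data; e->hashkey = hk; e->key = key;
            m->count++;
            return 1;
        }
    }
    return 0;   // table full; a real table would expand and rebuild here
}

// Mark the slot DELETED, not EMPTY, so later probes continue past it.
static int map_remove(Map* m, const char* key)
{
    size_t i = find_slot(m, key, hash_str(key));
    if (i == SIZE_MAX) return 0;
    m->table[i].flags = DELETED;
    m->count--;
    return 1;
}
```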

The "data" field is of type "uintptr_t". Note that we assume
that sizeof(uintptr_t) == sizeof(void*). This is very
important. Strictly, the C standard only guarantees that uintptr_t,
where provided, is an unsigned integer type such that any valid void*
converted to it and back compares equal to the original pointer; the
equal-size assumption holds on common platforms.

This means that the data field can hold an unsigned integer or a
void* pointer. As a pointer, it often points to an instance of a
variable, a dimension, or some other object.
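
A small sketch of the pointer-in-an-integer convention: a C11 _Static_assert turns the size assumption into a compile-time check, and a pointer stored in a uintptr_t field is recovered unchanged. Dim and the helper names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

// Compile-time check of the size assumption (requires C11).
_Static_assert(sizeof(uintptr_t) == sizeof(void*),
               "uintptr_t must be pointer-sized");

typedef struct { const char* name; } Dim;   // hypothetical metadata object

// Store a pointer in an integer-typed data field.
static uintptr_t to_data(void* p)
{
    return (uintptr_t)p;
}

// Recover the pointer; the round trip yields the original value.
static void* from_data(uintptr_t d)
{
    return (void*)d;
}
```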

The _hashkey_ field is a CRC32 hash of the key. Note that comparing
hashkeys is not sufficient to ensure that the corresponding keys are
the same, because hash collisions are possible. Even moving to, say,
64-bit keys is probably not sufficient to avoid hash collisions.
A 128-bit key (e.g. MD5) might be sufficient, but mathematical
investigation would be required.

Since comparing only hash keys is not sufficient, it is necessary to store
a copy of the actual key. The _key_ and _keysize_ fields are used for this.
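
The sketch below pairs a standard CRC-32 (reflected polynomial 0xEDB88320, the variant zlib uses) with a match test that uses the hash only as a fast reject and then compares the stored copy of the key. Whether nchashmap.c uses exactly this CRC-32 variant is an assumption, and KeyedEntry is a trimmed, hypothetical version of NC_hentry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
// final XOR 0xFFFFFFFF). Compact but slow; real code would use a table.
static uint32_t crc32_bytes(const void* buf, size_t len)
{
    const unsigned char* p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

typedef struct {
    uint32_t hashkey;   // CRC32 of key
    size_t keysize;     // length of key in bytes
    char* key;          // private copy of the key
} KeyedEntry;

// Equal hash keys only make a match possible; the stored key copy must
// still be compared to rule out a collision.
static int entry_matches(const KeyedEntry* e, uint32_t hk,
                         const char* key, size_t keysize)
{
    if (e->hashkey != hk) return 0;   // cheap reject
    return e->keysize == keysize && memcmp(e->key, key, keysize) == 0;
}
```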

\subsection Sncindex NCindex

An "index" (an instance of type "NCindex") is a combination
of one NClist instance plus one NC_hashmap instance.
The hashmap maps a name to the position of the correspondingly
named object in the NClist part of the NCindex.

An index is used to provide several kinds of lookup with respect
to a specific list of metadata objects. For example, the
subgroups of a group are stored using a vector of pointers to
the subgroup objects. This also provides information about
creation order, which is sometimes important. However, we often
need fast access to that vector by name, so an NCindex object
provides both capabilities. The NCindex object contains:

1. A vector into which the object pointers can be stored
   and iterated over.
2. A map from name to the corresponding object index in the vector.
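
As a minimal sketch of how the two parts of an index cooperate, assuming a fixed capacity of 16 objects, FNV-1a in place of CRC32, and names that are already unique: the vector keeps creation order, and the hash slots store positions in that vector rather than the objects themselves. All names are hypothetical; this is not netcdf-c's ncindex.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IDX_CAP 16   // fixed capacity for the sketch; a real index grows

typedef struct { const char* name; } Obj;   // stand-in for NC_OBJ

typedef struct {
    Obj* vec[IDX_CAP];    // objects in creation order (the NClist part)
    size_t nobjs;         // number of objects in vec
    long slot[IDX_CAP];   // hash part: -1 = empty, else a position in vec
} Index;

// FNV-1a, standing in for the CRC32 hash used by netcdf-c.
static uint32_t hash_name(const char* s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

static void index_init(Index* ix)
{
    ix->nobjs = 0;
    for (size_t i = 0; i < IDX_CAP; i++) ix->slot[i] = -1;
}

// Append obj to the vector, then record name -> position by linear probing.
static int index_add(Index* ix, Obj* obj)
{
    if (ix->nobjs == IDX_CAP) return 0;
    size_t i = hash_name(obj->name) % IDX_CAP;
    while (ix->slot[i] != -1) i = (i + 1) % IDX_CAP;
    ix->slot[i] = (long)ix->nobjs;
    ix->vec[ix->nobjs++] = obj;
    return 1;
}

// Name lookup: hash to a position, then fetch the object from the vector.
static Obj* index_lookup(const Index* ix, const char* name)
{
    size_t i = hash_name(name) % IDX_CAP;
    for (size_t n = 0; n < IDX_CAP && ix->slot[i] != -1; n++) {
        Obj* o = ix->vec[ix->slot[i]];
        if (strcmp(o->name, name) == 0) return o;
        i = (i + 1) % IDX_CAP;
    }
    return NULL;
}

// Positional access, e.g. for iterating in creation order.
static Obj* index_ith(const Index* ix, size_t i)
{
    return i < ix->nobjs ? ix->vec[i] : NULL;
}
```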

Note that currently, NCindex is only used in libsrc4 and libhdf4.
If performance issues warrant, it will also be used in
libsrc.

Note also that alternative implementations are feasible that do not
use a hash table for name indexing, but rather keep a list sorted by name
and use binary search for name-based lookup. If this alternative were
implemented, then we could probably stop using the NC_hashmap
structure altogether for netcdf-4. There is a performance cost, since binary
search is O(log n) rather than O(1). In practice, this is probably of
negligible effect. The advantage is that rename operations become considerably simpler.
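
The sorted-list alternative can be sketched with the standard qsort() and bsearch() functions. Sorting a separate array of pointers, rather than the objects themselves, leaves the creation-order vector untouched; a rename then only requires re-sorting that pointer array, with no hash table to rebuild. Obj and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char* name; } Obj;   // hypothetical metadata object

// Comparator for qsort/bsearch over an array of Obj*: order by name.
static int cmp_by_name(const void* a, const void* b)
{
    const Obj* x = *(const Obj* const*)a;
    const Obj* y = *(const Obj* const*)b;
    return strcmp(x->name, y->name);
}

// (Re)sort the name-ordered array of pointers, e.g. after a rename.
static void sort_by_name(Obj** byname, size_t n)
{
    qsort(byname, n, sizeof(Obj*), cmp_by_name);
}

// O(log n) lookup by name; returns NULL when no object has that name.
static Obj* find_by_name(Obj** byname, size_t n, const char* name)
{
    Obj probe = { name };
    Obj* key = &probe;
    Obj** hit = bsearch(&key, byname, n, sizeof(Obj*), cmp_by_name);
    return hit ? *hit : NULL;
}
```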

\section Sglobal_object_access Global Object Access

As mentioned, dimension, group, and type external ids (dimid,
grpid, typeid) are unique across the whole file. It is therefore
convenient to store in memory a per-file vector for each object
type such that the external id of the object is the same as the
position of that object in the corresponding per-file
vector. This makes lookup by external id very efficient.
Note that this was already the case for netcdf-3 (libsrc), so
this is a change for libsrc4 only.

The global set of dimensions, types, and groups is maintained by
three instances of NClist in the NC_FILE_INFO_T structure,
namely _alldims_, _alltypes_, and _allgroups_.
The position of an object within the corresponding list determines
the object's external id. Thus, the position of a dimension object within the
_alldims_ field of the file structure determines its dimid. The same
holds for types and groups.
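
A minimal sketch of a per-file vector in which position equals external id, assuming ids are handed out by a counter: storing an object at its id pads any gap with NULL slots, and lookup by id is a bounds check plus an array access. Vec, FileInfo, and the helper names are hypothetical simplifications of NClist and NC_FILE_INFO_T.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { const char* name; size_t dimid; } Dim;   // hypothetical

typedef struct {
    size_t length;
    void** content;
} Vec;

// Store obj at position id, padding any gap with NULL slots, so that
// an object's position always equals its externally visible id.
static int vec_set_at(Vec* v, size_t id, void* obj)
{
    if (id >= v->length) {
        void** p = realloc(v->content, (id + 1) * sizeof(void*));
        if (p == NULL) return 0;
        for (size_t i = v->length; i <= id; i++) p[i] = NULL;
        v->content = p;
        v->length = id + 1;
    }
    v->content[id] = obj;
    return 1;
}

// Lookup by id is a bounds check plus an array access.
static void* vec_get(const Vec* v, size_t id)
{
    return id < v->length ? v->content[id] : NULL;
}

typedef struct {
    Vec alldims;         // position == dimid
    size_t dimcounter;   // next dimid to assign
} FileInfo;

// Assign the next id from the counter, then force the position to match.
static int add_dim(FileInfo* f, Dim* d)
{
    d->dimid = f->dimcounter++;
    return vec_set_at(&f->alldims, d->dimid, d);
}
```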

\section Sper_group_object_access Per-Group Object Access

Each group object (NC_GRP_INFO_T) contains five instances of
NCindex: one each for dimensions, types, subgroups, variables,
and attributes. An index is used for two reasons. First, it allows
name-based lookup for these items. Second, the declaration order is
maintained by the index's vector. Note that the position of
an object in a group index vector has no necessary
relationship to the position of that object within the global
vectors.

Note, however, that the index vector for variables does define
the variable id, which is unique only within a group.
In this special case, the external id for the variable is
the same as its offset in the index's vector for the group.

A note about typeids: since user-defined types have external
ids starting at NC_FIRSTUSERTYPEID, we leave the global type
vector entries 0..NC_FIRSTUSERTYPEID-1 empty.
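
The reserved type slots can be sketched as follows. We assume NC_FIRSTUSERTYPEID is 32, its value in current netcdf.h, and define a local copy so the example stands alone; TypeTable, its fixed capacity, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Local copy of NC_FIRSTUSERTYPEID (assumed to be 32, as in netcdf.h).
#define SKETCH_FIRSTUSERTYPEID 32
#define SKETCH_MAXTYPES 64      // fixed capacity for the sketch

typedef struct { const char* name; int id; } Type;   // hypothetical

typedef struct {
    Type* alltypes[SKETCH_MAXTYPES];   // slots 0..31 stay NULL
    int next_id;                       // next user type id to assign
} TypeTable;

static void types_init(TypeTable* t)
{
    for (int i = 0; i < SKETCH_MAXTYPES; i++) t->alltypes[i] = NULL;
    t->next_id = SKETCH_FIRSTUSERTYPEID;   // user type ids start here
}

// Give ty the next user type id and store it at that position.
static int types_add(TypeTable* t, Type* ty)
{
    if (t->next_id >= SKETCH_MAXTYPES) return 0;
    ty->id = t->next_id++;
    t->alltypes[ty->id] = ty;
    return 1;
}

// Ids below the first user type id never resolve to an entry.
static Type* types_get(const TypeTable* t, int id)
{
    if (id < SKETCH_FIRSTUSERTYPEID || id >= SKETCH_MAXTYPES) return NULL;
    return t->alltypes[id];
}
```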

\section Smetadata_object_header Metadata Object Header

Each metadata object (e.g. NC_DIM_INFO_T, NC_VAR_INFO_T)
now has what is called an "hdr" object as its first field.
This provides a form of pseudo-inheritance for these objects,
because they can all be cast to "NC_OBJ" to get common information.
The structure of the header is as follows.

\code
typedef struct NC_OBJ {
  NC_SORT sort;          // kind of object: NCVAR, NCDIM, NCATT, NCTYP, or NCGRP
  char* name;            // assumed to be null terminated
  size_t id;             // the object's assigned id
  unsigned int hashkey;  // hash of name, as used by NC_hashmap
} NC_OBJ;
\endcode

The sort is one of the values _NCVAR_, _NCDIM_, _NCATT_, _NCTYP_, or _NCGRP_.
The name is assumed to be null-terminated. The id is the assigned id
for the object. The hashkey is the same hash value as used in nchashmap.c.
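
The pseudo-inheritance works because C guarantees that a pointer to a structure, suitably converted, points to its first member. The sketch below uses simplified, hypothetical stand-ins for the netcdf-c types (the real NC_SORT enum and info structs have different or additional members) to show generic code reading the header of any object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Simplified stand-ins; the real netcdf-c definitions differ.
typedef enum { NCVAR, NCDIM, NCATT, NCTYP, NCGRP } NC_SORT;

typedef struct NC_OBJ {
    NC_SORT sort;
    char* name;            // assumed to be null terminated
    size_t id;
    unsigned int hashkey;
} NC_OBJ;

typedef struct {
    NC_OBJ hdr;            // must be the first member
    size_t len;
} DimInfo;

typedef struct {
    NC_OBJ hdr;            // must be the first member
    int ndims;
} VarInfo;

// Generic accessors: valid for any object whose first member is an NC_OBJ.
static const char* obj_name(const void* obj)
{
    return ((const NC_OBJ*)obj)->name;
}

static int is_dim(const void* obj)
{
    return ((const NC_OBJ*)obj)->sort == NCDIM;
}
```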

\section Scliches Programming cliches

\subsection Slookupname Lookup an Object by Name

In the original code, the following _cliche_ (code snippet)
was common for looking up an object by name.
\code
NC_GRP_INFO_T* grp = ...;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next) {
  if(strcmp(name,g->name)==0) {
    // ... code to process matching grp by name
  }
  ...
}
\endcode
In this case, the loop iterates across all the subgroups (children)
of grp by walking the linked list of child groups, doing
a name comparison to find the group with the desired name.

In the new code, this iteration cliche is replaced by something
like this.
\code
NC_GRP_INFO_T* grp = ...;
NC_GRP_INFO_T* g;
...
g = (NC_GRP_INFO_T*)ncindexlookup(grp->children,name);
if(g != NULL) {
  // ... code to process matching grp by name
}
\endcode
In this case, the iteration is replaced by a hashtable lookup.

\subsection Slookupid Lookup an Object by id

In the original code, the following _cliche_ (code snippet)
was common for looking up an object by its id (dimid, varid, etc).
\code
NC_GRP_INFO_T* grp = ...;
...
NC_DIM_INFO_T* d;
...
for (d = grp->dim; d; d = d->l.next) {
  if(dimid == d->dimid) {
    // ... code to process matching dim by id
  }
  ...
}
\endcode
In this case, the loop iterates across all the dimension objects
of grp by walking the linked list of dimensions, doing
an id comparison to find the dimension with the desired
dimid.

In the new code, this iteration cliche is replaced by something
like this.
\code
NC_FILE_INFO_T* h5 = ...;
NC_DIM_INFO_T* d;
...
d = nclistget(h5->alldims,dimid);
if(d != NULL) {
  // ... code to process matching dim by id
}
\endcode
This shows how the _alldims_ vector is used to map from a
dimid directly to the matching dimension object.
In this example, h5 is the NC_FILE_INFO_T file object.
This approach works for dimension ids, group ids, and type ids
because they are globally unique.

For variables and attributes, we have to use the containing group's
NCindex, such as grp->vars. In this case, the varid is mapped using
code like this.
\code
NC_GRP_INFO_T* grp = ...;
NC_VAR_INFO_T* v;
...
v = (NC_VAR_INFO_T*)ncindexith(grp->vars,varid);
if(v != NULL) {
  // ... code to process matching variable by id
}
\endcode

\subsection Siterate Iterating over sets of objects

In the original code, the following _cliche_ (code snippet)
was common.
\code
NC_GRP_INFO_T* grp;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next)
  ...
\endcode
In this case, the loop iterates across all the subgroups (children)
of grp by walking the linked list of child groups.
Similar loops are used to walk a list of dimensions, variables, types,
or attributes.

In the new code, this iteration cliche is replaced by something
like this.
\code
NC_GRP_INFO_T* grp;
size_t i;
...
for(i=0;i<ncindexsize(grp->children);i++) {
  NC_GRP_INFO_T* g = (NC_GRP_INFO_T*)ncindexith(grp->children,i);
  ...
}
\endcode
In this case, the iteration is by index into the underlying vector.

\section Sperf Performance

The initial impetus for this change was to improve the performance
of netcdf-4 metadata loading by replacing linear searches with O(1)
searches.

In fact, this goal has not been met. It appears
that the metadata loading costs are entirely dominated by the
performance of the HDF5 library. The reason for this is that
the netcdf-c library loads all the metadata immediately
when a file is opened. This in turn means that all of the metadata
is immediately extracted from the underlying HDF5 file, so there is
no opportunity for lazy loading to be used.

The remedies I can conceive of are these.

1. Modify the netcdf-c library to also do lazy loading
   (work on this is under way).
2. Store a single metadata object into the file so it can
   be read in one shot. This object would then be processed
   in memory to construct the internal metadata. The costs of
   this approach are space in the file plus the need to keep it
   consistent with the actual metadata stored by HDF5.

The change does nonetheless have a visible effect.
Using gprof, one can see that in the original code the obj_list_add
function was the dominant function, accounting for a large percentage
(about 20%) of the profile. With the new code, the function call
distribution is much more even, with no function taking more than 4-5%.

Some other observations:

1. The UTF-8 code now shows up as taking about 4%. Given that most names
   are straight ASCII, it might pay to optimize for this case and avoid
   invoking the UTF-8 processing code.
2. In the new code, attribute processing appears to take up a lot of the
   time. This, however, might be an artifact of the test cases.
3. There is a small performance improvement from avoiding walking the linked
   list. It appears that creating a file is about 10% faster and opening a file
   is also about 10% faster.

\section Snotes_and_warnings Notes and Warnings

1. NCindex is currently not used for enum constants and compound fields.
   Additionally, it is not used for listing the dimensions associated
   with a variable.
2. References between metadata objects (e.g. group parent or
   containing group) are stored directly and not using any kind
   of vector or hashtable.
3. Attribute deletion is still a costly operation because it causes
   the whole attribute index to be rebuilt.
4. Renaming is still a costly operation because it causes
   the whole containing index to be rebuilt.
5. As in the original code, object ids (dimid, etc) are assigned
   explicitly using counters within the NC_FILE_INFO_T object.
   When stored into, for example, _alldims_, the position of the
   object is forcibly made to match the value of the assigned id.
6. The file nchashmap.c has a constant, SMALLTABLE, that controls
   the size of the default hash table. Setting this constant
   may be useful in debugging by reducing the default table size.
7. The file ncindex.c has a constant, SMALLTABLE, that controls
   the size of the default hash table. Setting this constant
   may be useful in debugging by reducing the default table size.
8. The file ncindex.c has a constant, NCNOHASH, that controls
   whether the index uses the hash table or just searches the
   index's vector. This is for experimental purposes.

\section Sprovenance Contact Information

__Author__: Dennis Heimbigner<br>
__Initial Version__: 01/10/2018<br>
__Last Revised__: 03/15/2018

*/