/** \file

\internal

\page nchashmap Indexed Access to Metadata Objects

\tableofcontents

The original internal representations of metadata in memory
relied on linear searching of lists to locate various objects
by name or by numeric id: by _varid_ or by _grpid_, for example.

In recent years, the flaws in that approach have become obvious
as users create files with extremely large numbers of objects:
groups, variables, attributes, and dimensions. One case
has 14 megabytes of metadata. Creating and (especially) later
opening such files was exceedingly slow.

This problem was partially alleviated in both netcdf-3 (libsrc)
and netcdf-4 (libsrc4) by adding name hashing tables.
However, and especially for netcdf-4, linear search still prevailed.

A pervasive change has been made to reduce searches by name
or by id from O(n) to O(1).
This uses hashing for name-based search
and vectors for numeric id-based search.
All other cases were left as O(n) searches.

This document describes the architecture and details of the netCDF
internal object lookup mechanisms now in place.

\section Sindexed_searches Indexed Searches

There are, as a rule, two searches that are used to locate
a metadata object: (1) search by name and (2) search by
externally visible id (e.g. dimid or varid).

It is currently the case that after all the metadata is read or
created, hashing is used for locating objects by name. In all
other cases -- apparently -- lookup is by linear search of a
linked list.

It is relevant that, once created, no metadata object -- except
attributes -- can be deleted. Objects can be renamed, but that
does not change the associated structure or id. Deletion only
occurs when an error occurs in creating an object or on invoking
"nc_close()".

The numeric identifiers for dimensions, types, and groups are
all globally unique across a file. But note that variable ids
are not globally unique (IMO a bad design decision) but are only
unique within the containing group. Thus, in order to provide a
unique id for a variable it must be composed of the containing
group id plus the variable id.

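As an illustration of composing a file-wide variable identifier from the
two parts, here is a minimal sketch. The helper names are hypothetical and
are not part of netcdf-c; the sketch also assumes that both ids are
non-negative and fit in 32 bits.

```c
#include <stdint.h>

// Hypothetical helpers (not part of netcdf-c).
// Pack a group id and a group-local variable id into one 64-bit key.
// Assumption: both ids are non-negative and fit in 32 bits.
static uint64_t make_var_key(int grpid, int varid)
{
    return ((uint64_t)(uint32_t)grpid << 32) | (uint64_t)(uint32_t)varid;
}

// Recover the two components from a packed key.
static int var_key_grpid(uint64_t key) { return (int)(key >> 32); }
static int var_key_varid(uint64_t key) { return (int)(key & 0xFFFFFFFFu); }
```

With such a key, variable 0 of group 1 and variable 1 of group 0 map to
different values even though each varid is only unique inside its own group.
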
Note also that names are unique only within a group and with respect
to a given kind of metadata. That is, a group cannot have e.g. two
dimensions with the same name. But it can have a variable and a dimension
with the same name (as with coordinate variables).

Finally, attribute names are unique only with respect to each other
and with respect to the containing object (a variable or a group).

\section Sbasic_data_structures Basic Data Structures

The basic data structures used by the new lookup mechanisms
are described in the following sections.

\subsection Snclist NClist

With rare exceptions, vectors of objects are maintained as
instances of NClist, which provides a dynamically extendible
vector of pointers: pointers to metadata objects in this case.
It is possible to append new objects, insert at a specific
vector offset, or overwrite an existing pointer at a specific
offset.

The NClist structure definition is as follows.

\code
typedef struct NClist {
    size_t alloc;
    size_t length;
    void** content;
} NClist;
\endcode

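To make the append and overwrite operations concrete, the following is a
minimal sketch of a growable pointer vector with the same three fields.
The type and function names (PtrVec, ptrvec_push, and so on) are
illustrative stand-ins, not the actual NClist API, and the doubling growth
policy is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

// Illustrative stand-in for NClist; all names here are hypothetical.
typedef struct PtrVec {
    size_t alloc;    // number of slots allocated
    size_t length;   // number of slots in use
    void** content;  // the slots themselves
} PtrVec;

// Ensure at least `need` slots are allocated; new slots are set to NULL.
static int ptrvec_reserve(PtrVec* v, size_t need)
{
    size_t newalloc, i;
    void** p;
    if (need <= v->alloc) return 1;
    newalloc = v->alloc ? v->alloc : 8;
    while (newalloc < need) newalloc *= 2;   // assumed growth policy
    p = realloc(v->content, newalloc * sizeof(void*));
    if (p == NULL) return 0;
    for (i = v->alloc; i < newalloc; i++) p[i] = NULL;
    v->content = p;
    v->alloc = newalloc;
    return 1;
}

// Append a pointer at the end of the vector.
static int ptrvec_push(PtrVec* v, void* elem)
{
    if (!ptrvec_reserve(v, v->length + 1)) return 0;
    v->content[v->length++] = elem;
    return 1;
}

// Overwrite the slot at `index`, extending the vector if necessary.
static int ptrvec_set(PtrVec* v, size_t index, void* elem)
{
    if (!ptrvec_reserve(v, index + 1)) return 0;
    v->content[index] = elem;
    if (index >= v->length) v->length = index + 1;
    return 1;
}

// Fetch the pointer at `index`, or NULL when out of range.
static void* ptrvec_get(const PtrVec* v, size_t index)
{
    return index < v->length ? v->content[index] : NULL;
}
```

Storing at an offset past the current end leaves the intervening slots
NULL, which is the property that lets a vector position double as an
object id.
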
\subsection Snc_hashmap NC_hashmap

The NC_hashmap type is a hash table mapping a string
(the key) to a data item. As a rule, the data item is a pointer to a
metadata object. The current implementation supports table
expansion when the number of entries in the table starts to get too
large. A simple linear rehash is used for collisions
and no separate hash-chain is used. This means that when the table is
expanded, it must be completely rebuilt. The performance hit for
this has yet to be determined. The alternative is to move to some
form of extendible hashing as used in databases.

The hashtable definition is as follows.

\code
typedef struct NC_hashmap {
    size_t size;
    size_t count;
    NC_hentry* table;
} NC_hashmap;
\endcode

where size is the current allocated size and count is the
number of active entries in the table. The "table" field is
a vector of entries of this form.

\code
typedef struct NC_hentry {
    int flags;
    uintptr_t data;
    unsigned int hashkey;
    size_t keysize;
    char* key;
} NC_hentry;
\endcode

The _flags_ field indicates the state of the entry and can be
in one of three disjoint states:

1. ACTIVE - there is an object referenced in this entry
2. DELETED - an entry was deleted, but must be marked so
   that linear rehash will work.
3. EMPTY - unused

The "data" field is of type "uintptr_t". Note that we assume
that sizeof(uintptr_t) == sizeof(void*). This is very
important. It is supposed to be the case that a value of type
uintptr_t is an integer of sufficient size to hold a void* pointer.

This means that the data field can hold an unsigned integer or a
void* pointer. As a pointer, it often points to an instance of a
variable, or dimension, or other object.

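The size assumption can be checked at compile time, and the two uses of
the data field can be shown with a few one-line helpers. This is only a
sketch; the typedef and function names below are hypothetical and do not
appear in netcdf-c.

```c
#include <stddef.h>
#include <stdint.h>

// Compile-time check: the array size is -1, and compilation fails,
// if uintptr_t and void* differ in size.
typedef char uintptr_matches_pointer[(sizeof(uintptr_t) == sizeof(void*)) ? 1 : -1];

// Hypothetical helpers showing the two ways the data field is used.
static uintptr_t data_from_pointer(void* p) { return (uintptr_t)p; }
static void*     data_to_pointer(uintptr_t d) { return (void*)d; }
static uintptr_t data_from_index(size_t i)  { return (uintptr_t)i; }
```

A pointer converted to uintptr_t and back compares equal to the original,
which is what allows one field to carry either an object pointer or a
vector position.
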
The hashkey field is a CRC32 hash of the key. Note that comparing
hashkeys is not sufficient to ensure that the corresponding keys are
the same because hash collisions are possible. Even moving to, say,
64-bit keys is probably not sufficient to avoid hash collisions.
A 128-bit key (e.g. MD5) might be sufficient but mathematical
investigation would be required.

Since comparing only hash keys is not sufficient, it is necessary to store
a copy of the actual key. The key and keysize fields are used for this.

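The interplay of the three entry states, the stored hashkey, and the
stored key copy can be sketched as follows. This is not the nchashmap.c
implementation: the names are hypothetical, the table has a fixed size
(no expansion), and FNV-1a is substituted for CRC32 to keep the example
short.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SLOT_EMPTY = 0, SLOT_ACTIVE = 1, SLOT_DELETED = 2 };

typedef struct Slot {
    int flags;
    uintptr_t data;
    uint32_t hashkey;
    char* key;              // private copy of the key
} Slot;

#define NSLOTS 8            // fixed size; this sketch never expands
typedef struct Table { Slot slots[NSLOTS]; } Table;

// FNV-1a, standing in for the CRC32 used by the real code.
static uint32_t hash_name(const char* s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

// Return the slot index holding `key`, or -1 if it is absent.
static int table_find(const Table* t, const char* key)
{
    uint32_t h = hash_name(key);
    size_t i, n;
    for (n = 0, i = h % NSLOTS; n < NSLOTS; n++, i = (i + 1) % NSLOTS) {
        const Slot* s = &t->slots[i];
        if (s->flags == SLOT_EMPTY) return -1;   // end of the probe run
        if (s->flags == SLOT_ACTIVE && s->hashkey == h
            && strcmp(s->key, key) == 0) return (int)i;
        // DELETED, or ACTIVE with another key: keep probing
    }
    return -1;
}

// Insert or overwrite; returns 0 if the table is full or out of memory.
static int table_put(Table* t, const char* key, uintptr_t data)
{
    uint32_t h = hash_name(key);
    size_t i, n;
    int at = table_find(t, key);
    if (at >= 0) { t->slots[at].data = data; return 1; }
    for (n = 0, i = h % NSLOTS; n < NSLOTS; n++, i = (i + 1) % NSLOTS) {
        Slot* s = &t->slots[i];
        if (s->flags == SLOT_ACTIVE) continue;
        s->key = malloc(strlen(key) + 1);        // EMPTY or DELETED: reuse
        if (s->key == NULL) return 0;
        strcpy(s->key, key);
        s->flags = SLOT_ACTIVE;
        s->hashkey = h;
        s->data = data;
        return 1;
    }
    return 0;
}

// Mark the entry DELETED so that later probes still pass through it.
static int table_remove(Table* t, const char* key)
{
    int at = table_find(t, key);
    if (at < 0) return 0;
    free(t->slots[at].key);
    t->slots[at].key = NULL;
    t->slots[at].flags = SLOT_DELETED;
    return 1;
}
```

If table_remove reset the slot to EMPTY instead, a lookup for any key that
had been placed further along the same probe run would stop early and
wrongly report the key as absent. The hashkey comparison is only a cheap
filter; the strcmp on the stored key copy is what decides a match.
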
\subsection Sncindex NCindex

An "index" (that is, an instance of type "NCindex") is a combination
of one NClist instance plus one NC_hashmap instance.
The hashmap maps a name to the position of the correspondingly
named object in the NClist part of the NCindex.

An index is used to provide several kinds of lookup with respect
to a specific list of metadata objects. For example, the
subgroups of a group are stored using a vector of pointers to
the subgroup objects. This also provides information about
creation order, which is sometimes important. However, we often
need fast access to that vector by name, so an NCindex object
provides both capabilities. The NCindex object contains:

1. A vector into which the object pointers can be stored
   and iterated over.
2. A map from name to the corresponding object index in the vector.

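The two parts listed above can be sketched in a few lines. This is an
illustration only: the Obj and Index types and the index_* functions are
hypothetical, capacities are fixed, and a small open-addressed array
stands in for NC_hashmap.

```c
#include <stddef.h>
#include <string.h>

#define CAP 16        // fixed capacity; a real index would grow
#define NBUCKETS 32   // larger than CAP, so a free bucket always exists

typedef struct Obj { const char* name; } Obj;

typedef struct Index {
    Obj*   list[CAP];       // part 1: objects in creation order
    size_t count;
    int    pos[NBUCKETS];   // part 2: name -> (position + 1); 0 means empty
} Index;

static unsigned name_hash(const char* s)
{
    unsigned h = 5381u;
    for (; *s; s++) h = h * 33u + (unsigned char)*s;
    return h;
}

// Append `o` and record its position under its name.
// Returns 0 if the index is full or the name is already present.
static int index_add(Index* ix, Obj* o)
{
    unsigned b;
    if (ix->count >= CAP) return 0;
    for (b = name_hash(o->name) % NBUCKETS; ix->pos[b] != 0; b = (b + 1) % NBUCKETS)
        if (strcmp(ix->list[ix->pos[b] - 1]->name, o->name) == 0) return 0;
    ix->list[ix->count] = o;
    ix->pos[b] = (int)(ix->count + 1);
    ix->count++;
    return 1;
}

// Name-based lookup: hash to a bucket, then follow the stored position.
static Obj* index_lookup(const Index* ix, const char* name)
{
    unsigned b;
    for (b = name_hash(name) % NBUCKETS; ix->pos[b] != 0; b = (b + 1) % NBUCKETS) {
        Obj* o = ix->list[ix->pos[b] - 1];
        if (strcmp(o->name, name) == 0) return o;
    }
    return NULL;
}

// Position-based lookup, which is also what iteration uses.
static Obj* index_ith(const Index* ix, size_t i)
{
    return i < ix->count ? ix->list[i] : NULL;
}
```

Because the map stores positions rather than pointers, the vector remains
the single owner of the objects and preserves their creation order.
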
Note that currently, NCindex is only used in libsrc4 and libhdf4.
But if performance issues warrant, it will also be used in
libsrc.

Note also that alternative implementations are feasible that do not
use a hash table for name indexing, but rather keep a list sorted by name
and use binary search to do name-based lookup. If this alternative were
implemented, then it is probable that we could get rid of using the NC_hashmap
structure altogether for netcdf-4. There is a performance cost since binary
search is O(log n). In practice, it is probable that this is of negligible
effect. The advantage is that rename operations become considerably simpler.

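For comparison, the sorted-list alternative amounts to something like the
following sketch. It is not implemented in the library; the Named type and
find_by_name function are hypothetical, and the array is assumed to be
kept ordered by strcmp on the name.

```c
#include <stddef.h>
#include <string.h>

typedef struct Named { const char* name; } Named;

// Binary search over `sorted`, which must be ordered by strcmp on name.
// Returns the matching element or NULL, using O(log n) comparisons.
static Named* find_by_name(Named* const* sorted, size_t n, const char* name)
{
    size_t lo = 0, hi = n;               // search the half-open range [lo, hi)
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, sorted[mid]->name);
        if (c == 0) return sorted[mid];
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}
```

Under this scheme a rename only requires moving one element to its new
sorted position, with no hash table to rebuild.
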
\section Sglobal_object_access Global Object Access

As mentioned, dimension, group, and type external ids (dimid,
grpid, typeid) are unique across the whole file. It is therefore
convenient to store in memory a per-file vector for each object
type such that the external id of the object is the same as the
position of that object in the corresponding per-file
vector. This makes lookup by external id very efficient.
Note that this was already the case for netcdf-3 (libsrc) so
this is a change for libsrc4 only.

The global set of dimensions, types, and groups is maintained by
three instances of NClist in the NC_FILE_INFO_T structure:
namely _alldims_, _alltypes_, and _allgroups_.
The position of the object within the corresponding list determines
the object's external id. Thus, the position of a dimension object within the
"alldims" field of the file structure determines its dimid. Similarly
for types and groups.

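The "id equals position" rule can be illustrated with a small sketch.
The Dim and File types and the file_* functions are hypothetical
simplifications, and a fixed-size array is used where the library uses an
NClist.

```c
#include <stddef.h>

#define MAXDIMS 64   // fixed capacity keeps the sketch short

typedef struct Dim { int dimid; const char* name; } Dim;

typedef struct File {
    Dim* alldims[MAXDIMS];   // alldims[dimid] is the dimension with that id
    int  next_dimid;         // counter used to assign ids
} File;

// Assign the next id to `d` and store it at the matching position.
// Returns the new dimid, or -1 if the table is full.
static int file_add_dim(File* f, Dim* d)
{
    if (f->next_dimid >= MAXDIMS) return -1;
    d->dimid = f->next_dimid++;
    f->alldims[d->dimid] = d;
    return d->dimid;
}

// O(1) lookup by external id; NULL if the id was never assigned.
static Dim* file_find_dim(const File* f, int dimid)
{
    if (dimid < 0 || dimid >= f->next_dimid) return NULL;
    return f->alldims[dimid];
}
```

No search is involved in file_find_dim: the external id is used directly
as the vector offset.
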
\section Sper_group_object_access Per-Group Object Access

Each group object (NC_GRP_INFO_T) contains five instances of
NCindex. One is for dimensions, one is for types, one is for
subgroups, one is for variables, and one is for attributes. An
index is used for two reasons. First, it allows name-based lookup
for these items. Second, the declaration order is maintained by
the index's vector. Note that the position of
an object in a group index vector has no necessary
relationship to the position of that object within the global
vectors.

Note however that the index vector for variables does define
the variable id, which is unique only within a group.
In this special case, the external id for the variable is
the same as its offset in the index's vector for the group.

A note about typeids. Since user-defined types have an external
id starting at NC_FIRSTUSERTYPEID, we leave the global type
vector entries 0..NC_FIRSTUSERTYPEID-1 empty.

\section Smetadata_object_header Metadata Object Header

Each metadata object (e.g. NC_DIM_INFO_T, NC_VAR_INFO_T)
now has what is called a "hdr" object as its first field.
This provides a form of pseudo-inheritance for these objects
because they can all be cast to "NC_OBJ" to get common information.
The structure of the header is as follows.

\code
typedef struct NC_OBJ {
    NC_SORT sort;
    char* name; /* assumed to be null terminated */
    size_t id;
    unsigned int hashkey;
} NC_OBJ;
\endcode

The sort is one of the values _NCVAR_, _NCDIM_, _NCATT_, _NCTYP_, or _NCGRP_.
The name is assumed to be nul-terminated. The id is the assigned id
for the object. The hashkey is the same hash value as used in nchashmap.c.

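The pseudo-inheritance works because C guarantees that a pointer to a
structure, suitably converted, points to its first member. A minimal
sketch, using simplified hypothetical types rather than the real
NC_DIM_INFO_T and NC_VAR_INFO_T:

```c
#include <stddef.h>

typedef enum Sort { SORT_VAR, SORT_DIM } Sort;

// Simplified stand-in for NC_OBJ.
typedef struct Hdr { Sort sort; const char* name; size_t id; } Hdr;

// Two unrelated object types that both begin with the header.
typedef struct DimInfo { Hdr hdr; size_t len; } DimInfo;
typedef struct VarInfo { Hdr hdr; int ndims; } VarInfo;

// These accept any object whose first member is a Hdr.
static const char* obj_name(const void* obj) { return ((const Hdr*)obj)->name; }
static Sort        obj_sort(const void* obj) { return ((const Hdr*)obj)->sort; }
static size_t      obj_id(const void* obj)   { return ((const Hdr*)obj)->id; }
```

Generic code such as an index can therefore read the name, id, and sort of
any stored object without knowing its concrete type.
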
\section Scliches Programming cliches

\subsection Slookupname Lookup an Object by Name

In the original code, the following _cliche_ (code snippet)
was common for looking up an object by name.
\code
NC_GRP_INFO_T* grp = ...;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next) {
    if(strcmp(name,g->name)==0) {
        ... code to process matching grp by name
    }
    ...
}
\endcode
In this case, the loop iterates across all the subgroups (children)
of the grp. It does so by walking the linked list of child groups.
It does a name comparison in order to find the group with the desired name.

In the new code, this iteration cliche is replaced by something
that will look like this.
\code
NC_GRP_INFO_T* grp = ...;
NC_GRP_INFO_T* g;
...
g = (NC_GRP_INFO_T*)ncindexlookup(grp->children,name);
if(g != NULL) {
    ... code to process matching grp by name
}
\endcode
In this case, the iteration is replaced by a hashtable lookup.

\subsection Slookupid Lookup an Object by id

In the original code, the following _cliche_ (code snippet)
was common for looking up an object by its id (dimid, varid, etc).
\code
NC_GRP_INFO_T* grp = ...;
...
NC_DIM_INFO_T* d;
...
for (d = grp->dim; d; d = d->l.next) {
    if(dimid == d->dimid) {
        ... code to process matching dim by id
    }
    ...
}
\endcode
In this case, the loop iterates across all the dimension objects
of the grp. It does so by walking the linked list of dimensions.
It does an id comparison in order to find the dimension with the desired
id.

In the new code, this iteration cliche is replaced by something
that will look like this.
\code
NC_FILE_INFO_T* h5 = ...;
NC_DIM_INFO_T* d;
...
d = nclistget(h5->alldims,dimid);
if(d != NULL) {
    ... code to process matching dim by id
}
\endcode
This shows how the alldims vector is used to map from a
dimid directly to the matching dimension object.
In this example, h5 is the NC_FILE_INFO_T file object.
This approach works for dimension ids, group ids, and type ids
because they are globally unique.

For variables and attributes, we have to use the containing group's
NCindex, such as grp->vars. In this case, the varid is mapped using
code like this.
\code
NC_GRP_INFO_T* grp = ...;
NC_VAR_INFO_T* v;
...
v = (NC_VAR_INFO_T*)ncindexith(grp->vars,varid);
if(v != NULL) {
    ... code to process matching variable by id
}
\endcode

\subsection Siterate Iterating over sets of objects

In the original code, the following _cliche_ (code snippet)
was common.
\code
NC_GRP_INFO_T* grp;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next)
    ...
\endcode
In this case, the loop iterates across all the subgroups (children)
of the grp. It does so by walking the linked list of child groups.
Similar loops are used to walk a list of dimensions, variables, types,
or attributes.

In the new code, this iteration cliche is replaced by something
that will look like this.
\code
NC_GRP_INFO_T* grp;
size_t i;
...
for(i=0;i<ncindexsize(grp->children);i++) {
    NC_GRP_INFO_T* g = (NC_GRP_INFO_T*)ncindexith(grp->children,i);
    ...
}
\endcode
In this case, the iteration is by index into the underlying vector.

\section Sperf Performance

The initial impetus for this change was to improve the performance
of netcdf-4 metadata loading by replacing linear searches with O(1)
searches.

In fact, this goal has not been met. It appears to be the case
that the metadata loading costs are entirely dominated by the
performance of the HDF5 library. The reason for this is that
the netcdf-c library loads all the metadata immediately
when a file is opened. This in turn means that all of the metadata
is immediately extracted from the underlying HDF5 file. So, there is
no opportunity for lazy loading to be used.

The remedies of which I can conceive are these.

1. Modify the netcdf-c library to also do lazy loading
   (work on this is under way).
2. Store a single metadata object into the file so it can
   be read in one shot. This object would then be processed
   in-memory to construct the internal metadata. The costs for
   this approach are space in the file plus the need to keep it
   consistent with the actual metadata stored by HDF5.

It should be noted that there is an effect from this change.
Using gprof, one can see that in the original code the obj_list_add
function was the dominant function by a large margin (about 20%),
whereas with the new code the function call distribution is much more
even, with no function taking more than 4-5%.

Some other observations:

1. The utf8 code now shows up as taking about 4%. Given that most names
   are straight ASCII, it might pay to try to optimize for this to avoid
   invoking the utf8 processing code.
2. In the new code, attribute processing appears to take up a lot of the
   time. This, however, might be an artifact of the test cases.
3. There is a small performance improvement from avoiding walking the linked
   list. It appears that creating a file is about 10% faster and opening a file
   is also about 10% faster.

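The ASCII optimization suggested in the first observation could start from
a check like the one below. This is a sketch of the idea only, not code
from the library; the function name is hypothetical, and the premise is
that a name made up solely of 7-bit bytes is already valid UTF-8 and so
needs no further utf8 processing.

```c
// Hypothetical fast-path test (not part of netcdf-c).
// Returns 1 if every byte of the nul-terminated name is 7-bit ASCII.
static int name_is_ascii(const char* name)
{
    const unsigned char* p = (const unsigned char*)name;
    for (; *p; p++)
        if (*p & 0x80) return 0;   // high bit set: multi-byte UTF-8 sequence
    return 1;
}
```

A caller would invoke the existing utf8 code only when this test returns 0.
Whether that actually recovers the 4% would have to be measured.
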
\section Snotes_and_warnings Notes and Warnings

1. NCindex is currently not used for enum constants and compound fields.
   Additionally, it is not used for listing the dimensions associated
   with a variable.
2. References between metadata objects (e.g. group parent or
   containing group) are stored directly and not using any kind
   of vector or hashtable.
3. Attribute deletion is still a costly operation because it causes
   the whole attribute index to be rebuilt.
4. Renaming is still a costly operation because it causes
   the whole containing index to be rebuilt.
5. As in the original code, object ids (dimid, etc) are assigned
   explicitly using counters within the NC_FILE_INFO_T object.
   When stored into, for example, "alldims", the position of the
   object is forcibly made to match the value of the assigned id.
6. The file nchashmap.c has a constant, SMALLTABLE, that controls
   the size of the default hash table. Setting this constant
   may be useful in debugging by reducing the default table size.
7. The file ncindex.c has a constant, SMALLTABLE, that controls
   the size of the default hash table. Setting this constant
   may be useful in debugging by reducing the default table size.
8. The file ncindex.c has a constant, NCNOHASH, that controls
   whether the index uses the hash table or just searches the
   index's vector. This is for experimental purposes.

\section Sprovenance Contact Information

__Author__: Dennis Heimbigner<br>
__Initial Version__: 01/10/2018<br>
__Last Revised__: 03/15/2018

*/