mirror of
https://github.com/Unidata/netcdf-c.git
synced 2025-01-12 15:45:21 +08:00
506 lines
14 KiB
HTML
506 lines
14 KiB
HTML
|
<html>
|
||
|
<body>
|
||
|
<h1>DAP to Netcdf Translation Rules</h1>
|
||
|
Two translations are currently available.
|
||
|
<ul>
|
||
|
<li><a href="#ncdap3">DAP 2 Protocol to netCDF-3</a>
|
||
|
<li><a href="#ncdap4">DAP 2 Protocol to netCDF-4</a>
|
||
|
</ul>
|
||
|
|
||
|
<h2><a name="ncdap3">netCDF-3 Translation Rules</a></h3>
|
||
|
The current set of translation rules to convert an OPeNDAP
|
||
|
DAP protocol version 2 DDS to netCDF-3 is designed to mimic
|
||
|
as closely as possible those currently used by the libnc-dap
|
||
|
system. Please note that the translation is still subject
|
||
|
to change to respond to unforeseen problems and user
|
||
|
suggestions.
|
||
|
<p>
|
||
|
For illustrative purposes, the following example will be used.
|
||
|
<pre>
|
||
|
Dataset {
|
||
|
Int32 f1;
|
||
|
Structure {
|
||
|
Int32 f11;
|
||
|
Structure {
|
||
|
Int32 f1[3];
|
||
|
Int32 f2;
|
||
|
} FS2[2];
|
||
|
} S1;
|
||
|
Structure {
|
||
|
Grid {
|
||
|
Array:
|
||
|
Float32 temp[lat=2][lon=2];
|
||
|
Maps:
|
||
|
Int32 lat[lat=2];
|
||
|
Int32 lon[lon=2];
|
||
|
} G1;
|
||
|
} S2;
|
||
|
Grid {
|
||
|
Array:
|
||
|
Float32 G2[lat=2][lon=2];
|
||
|
Maps:
|
||
|
Int32 lat[2];
|
||
|
Int32 lon[2];
|
||
|
} G2;
|
||
|
Int32 lat[lat=2];
|
||
|
Int32 lon[lon=2];
|
||
|
} D1;
|
||
|
</pre>
|
||
|
|
||
|
<h3>Variable Definition</h3>
|
||
|
The set of variables is defined by the fields with
|
||
|
primitive base types as they occur in
|
||
|
Sequences, Grids, and Structures.
|
||
|
The field names are modified to be fully qualified initially.
|
||
|
For the above, the set of variables, the variables are as follows.
|
||
|
<ol>
|
||
|
<li>f1
|
||
|
<li>S1.f11
|
||
|
<li>S1.FS2.f1
|
||
|
<li>S1.FS2.f2
|
||
|
<li>S2.G1.temp
|
||
|
<li>S2.G1.lat
|
||
|
<li>S2.G1.lon
|
||
|
<li>S2.G2.G2
|
||
|
<li>S2.G2.lat
|
||
|
<li>S2.G2.lon
|
||
|
<li>lat
|
||
|
<li>lon
|
||
|
</ol>
|
||
|
|
||
|
<h3>Variable Dimension Translation</h3>
|
||
|
A variable's rank is determined from three sources.
|
||
|
<ol>
|
||
|
<li>
|
||
|
The variable has the dimensions associated with the field
|
||
|
it represents (e.g. S1.FS2.f1[3] in the above example).
|
||
|
<li>
|
||
|
The variable inherits the dimensions associated with any containing
|
||
|
structure that has a rank greater than zero.
|
||
|
These dimensions precede those of case 1.
|
||
|
Thus, we have in our example, f1[2][3], where the first dimension
|
||
|
comes from the containing Structure FS2[2].
|
||
|
<li>
|
||
|
The variable's set of dimensions are altered
|
||
|
if any of its containers is a DAP DDS Sequence.
|
||
|
This is discussed more fully below.
|
||
|
</ol>
|
||
|
|
||
|
<h3>Dimension translation</h3>
|
||
|
For dimensions, the rules are as follows.
|
||
|
<ol>
|
||
|
<li> Fields in dimensioned structures inherit the dimension
|
||
|
of the structure; thus the above list would have the following
|
||
|
dimensioned variables.
|
||
|
<ul>
|
||
|
<li>S1.FS2.f1 -> S1.FS2.f1[2][3]
|
||
|
<li>S1.FS2.f2 -> S1.FS2.f2[2]
|
||
|
<li>S2.G1.temp -> S2.G1.temp[lat=2][lon=2]
|
||
|
<li>S2.G1.lat -> S2.G1.lat[lat=2]
|
||
|
<li>S2.G1.lon -> S2.G1.lon[lon=2]
|
||
|
<li>S2.G2.G2 -> S2.G2.lon[lat=2][lon=2]
|
||
|
<li>S2.G2.lat -> S2.G2.lat[lat=2]
|
||
|
<li>S2.G2.lon -> S2.G2.lon[lon=2]
|
||
|
<li>lat -> lat[lat=2]
|
||
|
<li>lon -> lon[lon=2]
|
||
|
</ul>
|
||
|
<p>
|
||
|
<li>
|
||
|
Collect all of the dimension specifications from the DDS, both
|
||
|
named and anonymous (unnamed)
|
||
|
For each unique anonymous dimension with value NN
|
||
|
create a netCDF dimension of the form "<array>_<i>=NN",
|
||
|
where <array> is the fully qualified name of the variable and i is the
|
||
|
i'th (inherited) dimension of the array where the anonymous dimension occurs.
|
||
|
For our example, this would create the following dimensions.
|
||
|
<ul>
|
||
|
<li>S1.FS2.f1_0 = 2 ;
|
||
|
<li>S1.FS2.f1_1 = 3 ;
|
||
|
<li>S1.FS2.f2_0 = 2 ;
|
||
|
<li>S2.G2.lat_0 = 2 ;
|
||
|
<li>S2.G2.lon_0 = 2 ;
|
||
|
</ul>
|
||
|
<p>
|
||
|
<li>
|
||
|
If, however, the anonymous dimension is the single dimension
|
||
|
of a MAP vector in a Grid then the dimension is given the
|
||
|
same name as the map vector This leads to the following.
|
||
|
<ul>
|
||
|
<li>S2.G2.lat_0 -> S2.G2.lat
|
||
|
<li>S2.G2.lon_0 -> S2.G2.lon
|
||
|
</ul>
|
||
|
<p>
|
||
|
<li>
|
||
|
For each unique named dimension "<name>=NN",
|
||
|
create a netCDF dimension of the form "<name>=NN",
|
||
|
where name has the qualifications removed.
|
||
|
If this leads to duplicates (i.e. same name and same value),
|
||
|
then the duplicates are ignored.
|
||
|
This produces the following.
|
||
|
<ul>
|
||
|
<li>S2.G2.lat -> lat
|
||
|
<li>S2.G2.lon -> lon
|
||
|
</ul>
|
||
|
Note that this produces duplicates.
|
||
|
<p>
|
||
|
<li>
|
||
|
At this point the only dimensions left to process should be named
|
||
|
dimensions with the same name as some dimension from step number 3,
|
||
|
but with a different value. For those dimensions create a dimension
|
||
|
of the form "<name>M=NN" where M is a counter starting at 1.
|
||
|
The example has no instances of this.
|
||
|
<p>
|
||
|
<li>
|
||
|
Finally and if needed, define a single UNLIMITED dimension named "unlimited"
|
||
|
with value zero.
|
||
|
</ol>
|
||
|
This leads to the following set of dimensions.
|
||
|
<pre>
|
||
|
dimensions:
|
||
|
unlimited = UNLIMITED;
|
||
|
lat = 2 ;
|
||
|
lon = 2 ;
|
||
|
S1.FS2.f1_0 = 2 ;
|
||
|
S1.FS2.f1_1 = 3 ;
|
||
|
S1.FS2.f2_0 = 2 ;
|
||
|
</pre>
|
||
|
|
||
|
<h3>Variable Name Translation</h3>
|
||
|
The steps for variable name translation are as follows.
|
||
|
<p>
|
||
|
<ol>
|
||
|
<li>Take the set of variables captured above.
|
||
|
Thus for the above DDS, the following fields would be collected.
|
||
|
<ul>
|
||
|
<li>f1
|
||
|
<li>S1.f11
|
||
|
<li>S1.FS2.f1
|
||
|
<li>S1.FS2.f2
|
||
|
<li>S2.G1.temp
|
||
|
<li>S2.G1.lat
|
||
|
<li>S2.G1.lon
|
||
|
<li>S2.G2.G2
|
||
|
<li>S2.G2.lat
|
||
|
<li>S2.G2.lon
|
||
|
<li>lat
|
||
|
<li>lon
|
||
|
</ul>
|
||
|
<p>
|
||
|
<li>All grid array variables are renamed to be the same as the containing
|
||
|
grid and the grid prefix is removed.
|
||
|
In the above DDS, this results in the following changes.
|
||
|
<ol>
|
||
|
<li> G1.temp -> G1
|
||
|
<li> G2.G2 -> G2
|
||
|
</ol>
|
||
|
Note that, for example, the G1.lon keeps that name.
|
||
|
Also note that libnc-dap just drops the grid map variables,
|
||
|
so this is one place where the translation differs from
|
||
|
libnc-dap, but in a compatible way.
|
||
|
|
||
|
</ol>
|
||
|
<p>
|
||
|
It is important to note that if this process could produce duplicate
|
||
|
variables (i.e. with the same name); in that case they are all assumed
|
||
|
to have the same content and the duplicates are ignored.
|
||
|
If it turns out that the duplicates have different content, then
|
||
|
the translation will not detect this. YOU HAVE BEEN WARNED.
|
||
|
<p>
|
||
|
The final netCDF-3 schema (minus attributes) is then as follows.
|
||
|
<pre>
|
||
|
netcdf t {
|
||
|
dimensions:
|
||
|
unlimited = UNLIMITED
|
||
|
lat = 2 ;
|
||
|
lon = 2 ;
|
||
|
S1.FS2.f1_0 = 2 ;
|
||
|
S1.FS2.f1_1 = 3 ;
|
||
|
S1.FS2.f2_0 = 2 ;
|
||
|
variables:
|
||
|
int f1 ;
|
||
|
int lat(lat) ;
|
||
|
int lon(lon) ;
|
||
|
int S1.f11 ;
|
||
|
int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
|
||
|
int S1.FS2.f2(S1_FS2_f2_0) ;
|
||
|
float S2.G1(lat, lon) ;
|
||
|
int S2.G1.lat(lat) ;
|
||
|
int S2.G1.lon(lon) ;
|
||
|
float G2(lat, lon) ;
|
||
|
int G2.lat(lat) ;
|
||
|
int G2.lon(lon) ;
|
||
|
}
|
||
|
</pre>
|
||
|
In actuality, the unlimited dimension is dropped because
|
||
|
it is unused.
|
||
|
<p>
|
||
|
There are differences with the original libnc-dap here
|
||
|
because libnc-dap technically was incorrect. The original
|
||
|
would have said this, for example.
|
||
|
<pre>
|
||
|
int S1.FS2.f1(lat, lat) ;
|
||
|
</pre>
|
||
|
Note that this is incorrect because it dimensions
|
||
|
S1.FS2.f1(2,2) rather than S1.FS2.f1(2,3).
|
||
|
|
||
|
<h3>Translating DAP DDS Sequences</h3>
|
||
|
Any variable (as determined above) that is contained
|
||
|
directly or indirectly by a Sequence is subject to revision
|
||
|
of its rank using the following rules.
|
||
|
<ol>
|
||
|
<li>
|
||
|
Let the variable be contained in Sequence Q1, where Q1 is the
|
||
|
innermost containing sequence. If Q1 is itself contained
|
||
|
(directly or indirectly) in a sequence,
|
||
|
or Q1 is contained (again directly or indirectly)
|
||
|
in a structure that has rank greater than 0,
|
||
|
then the variable will have an initial UNLIMITED
|
||
|
dimension. However, all dimensions coming from "above" and including (in
|
||
|
the containment sense) the innermost Sequence, Q1, will be
|
||
|
removed and replaced by the single UNLIMITED dimension. The
|
||
|
size associated with that UNLIMITED is zero, which means
|
||
|
that its contents are inaccessible through the netcdf-3 API.
|
||
|
Again, this differs from libnc-dap, which leaves out such variables.
|
||
|
Again, however, this difference is compatible.
|
||
|
<p>
|
||
|
<li>
|
||
|
If the variable is contained in a single Sequence (i.e. not nested)
|
||
|
and all containing structures have rank 0, then the variable will
|
||
|
have an initial dimension whose size is the record count for that
|
||
|
Sequence. The name of the new dimension will be the name of the
|
||
|
Sequence.
|
||
|
</ol>
|
||
|
<p>
|
||
|
Consider this example.
|
||
|
<pre>
|
||
|
Dataset {
|
||
|
Structure {
|
||
|
Sequence {
|
||
|
Int32 f1[3];
|
||
|
Int32 f2;
|
||
|
} SQ1;
|
||
|
} S1[2];
|
||
|
Sequence {
|
||
|
Structure {
|
||
|
Int32 x1[7];
|
||
|
} S2[5];
|
||
|
} Q2;
|
||
|
} D;
|
||
|
</pre>
|
||
|
The corresponding netcdf-3 translation is pretty much as follows
|
||
|
(the value for dimension Q2 may differ).
|
||
|
<pre>
|
||
|
dimensions:
|
||
|
unlimited = UNLIMITED ; // (0 currently)
|
||
|
S1.SQ1.f1_0 = 2 ;
|
||
|
S1.SQ1.f1_1 = 3 ;
|
||
|
S1.SQ1.f2_0 = 2 ;
|
||
|
Q2.S2.x1_0 = 5 ;
|
||
|
Q2.S2.x1_1 = 7 ;
|
||
|
Q2 = 5 ;
|
||
|
variables:
|
||
|
int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
|
||
|
int S1.SQ1.f2(unlimited) ;
|
||
|
int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;
|
||
|
</pre>
|
||
|
Note that for example S1.SQ1.f1_0
|
||
|
is not actually used because it has been folded
|
||
|
into the unlimited dimension.
|
||
|
<p>
|
||
|
Note that there is a performance cost
|
||
|
because the translation code has to walk the data to determine
|
||
|
how many records are associated with the sequence.
|
||
|
Since libnc-dap did something similar, it can be assumed that
|
||
|
the cost is not prohibitive.
|
||
|
|
||
|
<h2><a name="ncdap4">netCDF-4 Translation Rules</a></h2>
|
||
|
The DAP to netCDF-4 translation is enabled if the
|
||
|
"--enable-netcdf-4" option is specified at configure time.
|
||
|
This translation includes some elements of the libnc-dap
|
||
|
translation, but attempts to provide a simpler (but not,
|
||
|
unfortunately, simple) set of translation rules than is used
|
||
|
for the netCDF-3 translation.
|
||
|
Please note that the translation is still subject
|
||
|
to change to respond to unforeseen problems or to
|
||
|
suggested improvements.
|
||
|
<p>
|
||
|
This text will use this running example.
|
||
|
<pre>
|
||
|
Dataset {
|
||
|
Int32 f1[fdim=10];
|
||
|
Structure {
|
||
|
Int32 f11;
|
||
|
Structure {
|
||
|
Int32 f1[3];
|
||
|
Int32 f2;
|
||
|
} FS2[2];
|
||
|
} S1;
|
||
|
Grid {
|
||
|
Array:
|
||
|
Float32 temp[lat=2][lon=2];
|
||
|
Maps:
|
||
|
Int32 lat[2];
|
||
|
Int32 lon[2];
|
||
|
} G1;
|
||
|
Sequence {
|
||
|
Float64 depth;
|
||
|
} Q1;
|
||
|
} D
|
||
|
</pre>
|
||
|
|
||
|
<h3>Variable Definition</h3>
|
||
|
The rules for choosing variables is as follows.
|
||
|
<ol>
|
||
|
<li> Start with the names of the top-level fields of the DDS.
|
||
|
The term top-level means that the object is a direct subnode
|
||
|
of the Dataset object. In our example, this produces the set
|
||
|
[f1, S1, G1, Q1].
|
||
|
<p>
|
||
|
<li>
|
||
|
Replace all Grid objects with the fully qualified list of array
|
||
|
and map fields of the grid.
|
||
|
Our variable set then becomes [f1, S1, G1.temp, G1.lat, G1.lon, Q1].
|
||
|
Note that the libnc-dap practice of re-naming the array variable to be
|
||
|
that of the Grid is not used.
|
||
|
<p>
|
||
|
<li>
|
||
|
Attempt to remove the prefix Grid name from the top-level Grid array and map
|
||
|
variables. If that eventually conflicts with some other name,
|
||
|
then leave the conflicting Grids alone.
|
||
|
Our variable set then becomes [f1, S1, temp, lat, lon, Q1].
|
||
|
<li>
|
||
|
If the Grid array name is the same as the Grid name, then
|
||
|
remove the prefix Grid name (not shown).
|
||
|
</ol>
|
||
|
|
||
|
<h3>Dimension Definition</h3>
|
||
|
The rules for choosing and defining dimensions is as follows.
|
||
|
<ol>
|
||
|
<li>
|
||
|
Collect the set of dimensions (named and anonymous) directly
|
||
|
associated with the variables as defined above.
|
||
|
This means that dimensions
|
||
|
within user-defined types are ignored. From our example,
|
||
|
the dimension set is [fdim=10,lat=2,lon=2,2,2]. Note that the
|
||
|
unqualified names are used.
|
||
|
<p>
|
||
|
<li>
|
||
|
If an anonymous dimension is associated with a Grid Map variable,
|
||
|
then given the dimension, the name of the map.
|
||
|
Our dimension set now becomes
|
||
|
[fdim=10,lat=2,lon=2,lat=2,lon=2].
|
||
|
<p>
|
||
|
<li>
|
||
|
All remaining anonymous dimensions are given the
|
||
|
name "<var>_NN", where "<var>" is the
|
||
|
unqualified name of the variable in which the anonymous
|
||
|
dimension appears and NN is the relative position of that
|
||
|
dimension in the dimensions associated with that array.
|
||
|
No instances of this rule occur in the running example.
|
||
|
<p>
|
||
|
<li>
|
||
|
Remove duplicate dimensions (those with same name and value).
|
||
|
Our dimension set now becomes
|
||
|
[fdim=10,lat=2,lon=2].
|
||
|
<p>
|
||
|
<li>
|
||
|
The final case occurs when there are dimensions with the same
|
||
|
name but with different values. For this case,
|
||
|
the size of the dimension is appended to the dimension name.
|
||
|
</ol>
|
||
|
|
||
|
<h3>Type Definition</h3>
|
||
|
The rules for choosing user-defined types are as follows.
|
||
|
<ol>
|
||
|
<li>For every Structure, Sequence, and non-top-level Grid,
|
||
|
netcdf-4 compound type is created whose fields are the fields
|
||
|
of the Structure, Sequence, or Grid. The name of the type is the
|
||
|
same as the Structure or Grid name suffixed with "_t".
|
||
|
However, the compound types derived from Sequences are instead
|
||
|
suffixed with "_record_t".
|
||
|
<p>
|
||
|
The types of the fields are the types of the corresponding field
|
||
|
of the Structure, Sequence, or Grid. Note that this type
|
||
|
might be itself a user-defined type.
|
||
|
<p>
|
||
|
From the example, we get the following compound types.
|
||
|
<pre>
|
||
|
compound FS2_t {
|
||
|
int f1[3];
|
||
|
int f2;
|
||
|
};
|
||
|
compound S1_t {
|
||
|
int f11;
|
||
|
FS2_t FS2[2];
|
||
|
};
|
||
|
compound Q1_record_t {
|
||
|
double depth;
|
||
|
};
|
||
|
</pre>
|
||
|
<p>
|
||
|
<li>
|
||
|
For all sequences of name X,
|
||
|
also create this type.
|
||
|
<pre>
|
||
|
X_record_t (*) X_t
|
||
|
</pre>
|
||
|
In our example, this produces the following type.
|
||
|
<pre>
|
||
|
Q1_record_t (*) Q1_t
|
||
|
</pre>
|
||
|
<p>
|
||
|
<li>
|
||
|
If a Sequence, Q has a single field F,
|
||
|
whose type is a primitive type, T,
|
||
|
(e.g., int, float, string), then
|
||
|
do not apply the previous rule, but instead replace the whole
|
||
|
sequence with the the following field.
|
||
|
<pre>
|
||
|
T (*) Q.f
|
||
|
</pre>
|
||
|
<p>
|
||
|
<li>
|
||
|
Attempt to maximally shorten the type names as long as there is no conflict.
|
||
|
</ol>
|
||
|
|
||
|
<h2>Choosing a Translation</h2>
|
||
|
The decision about whether to translate to netCDF-3
|
||
|
(libnc-dap) or netCDF-4 is determined by applying the
|
||
|
following rules in order.
|
||
|
<ol>
|
||
|
<li>
|
||
|
If the NC_CLASSIC_MODEL flag is set on nc_open(), then netcdf-3
|
||
|
(i.e. libnc-dap) translation is used.
|
||
|
<li>
|
||
|
If the NC_NETCDF4 flag is set on nc_open(), then netCDF-4
|
||
|
translation is used.
|
||
|
<li>
|
||
|
If the URL is prefixed with the string
|
||
|
"[mode=netcdf3]" or "[mode=libnc-dap]",
|
||
|
then the libnc-dap translation is used.
|
||
|
<li>
|
||
|
If the URL is prefixed with the string "[mode=netcdf4]",
|
||
|
then the netCDF-4 translation described below is used.
|
||
|
<li>
|
||
|
If none of the above is specified, then the default
|
||
|
is "[mode=libnc-dap]".
|
||
|
</ol>
|
||
|
|
||
|
<h2>Defined Client Parameters</h2>
|
||
|
Currently, a limited set of client parameters is recognized.
|
||
|
Parameters not listed here are ignored, but no error is signalled.
|
||
|
<table borderwidth=1 cellpadding=2>
|
||
|
<tr><th>Parameter Name<th>Legal Values<th>Semantics
|
||
|
<tr><td>[mode=...]<td>libnc‑dap|netcdf3|netcdf4
|
||
|
<td>Specify the translation to be applied to the DAP data source on the
|
||
|
client side.
|
||
|
<tr><td>[show=...]<td>das|dds|url
|
||
|
<td>This causes information to appear as specific global attributes. The
|
||
|
tags may be combined using comma with no spaces
|
||
|
(e.g. "show=dds,url"). The currently recognized tags are "dds" to
|
||
|
display the underlying DDS, "das" similarly, and "url" to display
|
||
|
the url used to retrieve the data.
|
||
|
</table>
|
||
|
</body>
|
||
|
</html>
|