netcdf-c/libncdap4/translation.html
2010-06-03 13:24:43 +00:00

506 lines
14 KiB
HTML

<html>
<body>
<h1>DAP to Netcdf Translation Rules</h1>
Two translations are currently available.
<ul>
<li><a href="#ncdap3">DAP 2 Protocol to netCDF-3</a>
<li><a href="#ncdap4">DAP 2 Protocol to netCDF-4</a>
</ul>
<h2><a name="ncdap3">netCDF-3 Translation Rules</a></h3>
The current set of translation rules to convert an OPeNDAP
DAP protocol version 2 DDS to netCDF-3 is designed to mimic
as closely as possible those currently used by the libnc-dap
system. Please note that the translation is still subject
to change to respond to unforeseen problems and user
suggestions.
<p>
For illustrative purposes, the following example will be used.
<pre>
Dataset {
Int32 f1;
Structure {
Int32 f11;
Structure {
Int32 f1[3];
Int32 f2;
} FS2[2];
} S1;
Structure {
Grid {
Array:
Float32 temp[lat=2][lon=2];
Maps:
Int32 lat[lat=2];
Int32 lon[lon=2];
} G1;
} S2;
Grid {
Array:
Float32 G2[lat=2][lon=2];
Maps:
Int32 lat[2];
Int32 lon[2];
} G2;
Int32 lat[lat=2];
Int32 lon[lon=2];
} D1;
</pre>
<h3>Variable Definition</h3>
The set of variables is defined by the fields with
primitive base types as they occur in
Sequences, Grids, and Structures.
The field names are modified to be fully qualified initially.
For the above, the set of variables, the variables are as follows.
<ol>
<li>f1
<li>S1.f11
<li>S1.FS2.f1
<li>S1.FS2.f2
<li>S2.G1.temp
<li>S2.G1.lat
<li>S2.G1.lon
<li>S2.G2.G2
<li>S2.G2.lat
<li>S2.G2.lon
<li>lat
<li>lon
</ol>
<h3>Variable Dimension Translation</h3>
A variable's rank is determined from three sources.
<ol>
<li>
The variable has the dimensions associated with the field
it represents (e.g. S1.FS2.f1[3] in the above example).
<li>
The variable inherits the dimensions associated with any containing
structure that has a rank greater than zero.
These dimensions precede those of case 1.
Thus, we have in our example, f1[2][3], where the first dimension
comes from the containing Structure FS2[2].
<li>
The variable's set of dimensions are altered
if any of its containers is a DAP DDS Sequence.
This is discussed more fully below.
</ol>
<h3>Dimension translation</h3>
For dimensions, the rules are as follows.
<ol>
<li> Fields in dimensioned structures inherit the dimension
of the structure; thus the above list would have the following
dimensioned variables.
<ul>
<li>S1.FS2.f1 -&gt; S1.FS2.f1[2][3]
<li>S1.FS2.f2 -&gt; S1.FS2.f2[2]
<li>S2.G1.temp -&gt; S2.G1.temp[lat=2][lon=2]
<li>S2.G1.lat -&gt; S2.G1.lat[lat=2]
<li>S2.G1.lon -&gt; S2.G1.lon[lon=2]
<li>S2.G2.G2 -&gt; S2.G2.lon[lat=2][lon=2]
<li>S2.G2.lat -&gt; S2.G2.lat[lat=2]
<li>S2.G2.lon -&gt; S2.G2.lon[lon=2]
<li>lat -&gt; lat[lat=2]
<li>lon -&gt; lon[lon=2]
</ul>
<p>
<li>
Collect all of the dimension specifications from the DDS, both
named and anonymous (unnamed)
For each unique anonymous dimension with value NN
create a netCDF dimension of the form "&lt;array&gt;_&lt;i&gt;=NN",
where <array> is the fully qualified name of the variable and i is the
i'th (inherited) dimension of the array where the anonymous dimension occurs.
For our example, this would create the following dimensions.
<ul>
<li>S1.FS2.f1_0 = 2 ;
<li>S1.FS2.f1_1 = 3 ;
<li>S1.FS2.f2_0 = 2 ;
<li>S2.G2.lat_0 = 2 ;
<li>S2.G2.lon_0 = 2 ;
</ul>
<p>
<li>
If, however, the anonymous dimension is the single dimension
of a MAP vector in a Grid then the dimension is given the
same name as the map vector This leads to the following.
<ul>
<li>S2.G2.lat_0 -&gt; S2.G2.lat
<li>S2.G2.lon_0 -&gt; S2.G2.lon
</ul>
<p>
<li>
For each unique named dimension "&lt;name&gt;=NN",
create a netCDF dimension of the form "&lt;name&gt;=NN",
where name has the qualifications removed.
If this leads to duplicates (i.e. same name and same value),
then the duplicates are ignored.
This produces the following.
<ul>
<li>S2.G2.lat -&gt; lat
<li>S2.G2.lon -&gt; lon
</ul>
Note that this produces duplicates.
<p>
<li>
At this point the only dimensions left to process should be named
dimensions with the same name as some dimension from step number 3,
but with a different value. For those dimensions create a dimension
of the form "&lt;name&gt;M=NN" where M is a counter starting at 1.
The example has no instances of this.
<p>
<li>
Finally and if needed, define a single UNLIMITED dimension named "unlimited"
with value zero.
</ol>
This leads to the following set of dimensions.
<pre>
dimensions:
unlimited = UNLIMITED;
lat = 2 ;
lon = 2 ;
S1.FS2.f1_0 = 2 ;
S1.FS2.f1_1 = 3 ;
S1.FS2.f2_0 = 2 ;
</pre>
<h3>Variable Name Translation</h3>
The steps for variable name translation are as follows.
<p>
<ol>
<li>Take the set of variables captured above.
Thus for the above DDS, the following fields would be collected.
<ul>
<li>f1
<li>S1.f11
<li>S1.FS2.f1
<li>S1.FS2.f2
<li>S2.G1.temp
<li>S2.G1.lat
<li>S2.G1.lon
<li>S2.G2.G2
<li>S2.G2.lat
<li>S2.G2.lon
<li>lat
<li>lon
</ul>
<p>
<li>All grid array variables are renamed to be the same as the containing
grid and the grid prefix is removed.
In the above DDS, this results in the following changes.
<ol>
<li> G1.temp -> G1
<li> G2.G2 -> G2
</ol>
Note that, for example, the G1.lon keeps that name.
Also note that libnc-dap just drops the grid map variables,
so this is one place where the translation differs from
libnc-dap, but in a compatible way.
</ol>
<p>
It is important to note that if this process could produce duplicate
variables (i.e. with the same name); in that case they are all assumed
to have the same content and the duplicates are ignored.
If it turns out that the duplicates have different content, then
the translation will not detect this. YOU HAVE BEEN WARNED.
<p>
The final netCDF-3 schema (minus attributes) is then as follows.
<pre>
netcdf t {
dimensions:
unlimited = UNLIMITED
lat = 2 ;
lon = 2 ;
S1.FS2.f1_0 = 2 ;
S1.FS2.f1_1 = 3 ;
S1.FS2.f2_0 = 2 ;
variables:
int f1 ;
int lat(lat) ;
int lon(lon) ;
int S1.f11 ;
int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
int S1.FS2.f2(S1_FS2_f2_0) ;
float S2.G1(lat, lon) ;
int S2.G1.lat(lat) ;
int S2.G1.lon(lon) ;
float G2(lat, lon) ;
int G2.lat(lat) ;
int G2.lon(lon) ;
}
</pre>
In actuality, the unlimited dimension is dropped because
it is unused.
<p>
There are differences with the original libnc-dap here
because libnc-dap technically was incorrect. The original
would have said this, for example.
<pre>
int S1.FS2.f1(lat, lat) ;
</pre>
Note that this is incorrect because it dimensions
S1.FS2.f1(2,2) rather than S1.FS2.f1(2,3).
<h3>Translating DAP DDS Sequences</h3>
Any variable (as determined above) that is contained
directly or indirectly by a Sequence is subject to revision
of its rank using the following rules.
<ol>
<li>
Let the variable be contained in Sequence Q1, where Q1 is the
innermost containing sequence. If Q1 is itself contained
(directly or indirectly) in a sequence,
or Q1 is contained (again directly or indirectly)
in a structure that has rank greater than 0,
then the variable will have an initial UNLIMITED
dimension. However, all dimensions coming from "above" and including (in
the containment sense) the innermost Sequence, Q1, will be
removed and replaced by the single UNLIMITED dimension. The
size associated with that UNLIMITED is zero, which means
that its contents are inaccessible through the netcdf-3 API.
Again, this differs from libnc-dap, which leaves out such variables.
Again, however, this difference is compatible.
<p>
<li>
If the variable is contained in a single Sequence (i.e. not nested)
and all containing structures have rank 0, then the variable will
have an initial dimension whose size is the record count for that
Sequence. The name of the new dimension will be the name of the
Sequence.
</ol>
<p>
Consider this example.
<pre>
Dataset {
Structure {
Sequence {
Int32 f1[3];
Int32 f2;
} SQ1;
} S1[2];
Sequence {
Structure {
Int32 x1[7];
} S2[5];
} Q2;
} D;
</pre>
The corresponding netcdf-3 translation is pretty much as follows
(the value for dimension Q2 may differ).
<pre>
dimensions:
unlimited = UNLIMITED ; // (0 currently)
S1.SQ1.f1_0 = 2 ;
S1.SQ1.f1_1 = 3 ;
S1.SQ1.f2_0 = 2 ;
Q2.S2.x1_0 = 5 ;
Q2.S2.x1_1 = 7 ;
Q2 = 5 ;
variables:
int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
int S1.SQ1.f2(unlimited) ;
int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;
</pre>
Note that for example S1.SQ1.f1_0
is not actually used because it has been folded
into the unlimited dimension.
<p>
Note that there is a performance cost
because the translation code has to walk the data to determine
how many records are associated with the sequence.
Since libnc-dap did something similar, it can be assumed that
the cost is not prohibitive.
<h2><a name="ncdap4">netCDF-4 Translation Rules</a></h2>
The DAP to netCDF-4 translation is enabled if the
"--enable-netcdf-4" option is specified at configure time.
This translation includes some elements of the libnc-dap
translation, but attempts to provide a simpler (but not,
unfortunately, simple) set of translation rules than is used
for the netCDF-3 translation.
Please note that the translation is still subject
to change to respond to unforeseen problems or to
suggested improvements.
<p>
This text will use this running example.
<pre>
Dataset {
Int32 f1[fdim=10];
Structure {
Int32 f11;
Structure {
Int32 f1[3];
Int32 f2;
} FS2[2];
} S1;
Grid {
Array:
Float32 temp[lat=2][lon=2];
Maps:
Int32 lat[2];
Int32 lon[2];
} G1;
Sequence {
Float64 depth;
} Q1;
} D
</pre>
<h3>Variable Definition</h3>
The rules for choosing variables is as follows.
<ol>
<li> Start with the names of the top-level fields of the DDS.
The term top-level means that the object is a direct subnode
of the Dataset object. In our example, this produces the set
[f1, S1, G1, Q1].
<p>
<li>
Replace all Grid objects with the fully qualified list of array
and map fields of the grid.
Our variable set then becomes [f1, S1, G1.temp, G1.lat, G1.lon, Q1].
Note that the libnc-dap practice of re-naming the array variable to be
that of the Grid is not used.
<p>
<li>
Attempt to remove the prefix Grid name from the top-level Grid array and map
variables. If that eventually conflicts with some other name,
then leave the conflicting Grids alone.
Our variable set then becomes [f1, S1, temp, lat, lon, Q1].
<li>
If the Grid array name is the same as the Grid name, then
remove the prefix Grid name (not shown).
</ol>
<h3>Dimension Definition</h3>
The rules for choosing and defining dimensions is as follows.
<ol>
<li>
Collect the set of dimensions (named and anonymous) directly
associated with the variables as defined above.
This means that dimensions
within user-defined types are ignored. From our example,
the dimension set is [fdim=10,lat=2,lon=2,2,2]. Note that the
unqualified names are used.
<p>
<li>
If an anonymous dimension is associated with a Grid Map variable,
then given the dimension, the name of the map.
Our dimension set now becomes
[fdim=10,lat=2,lon=2,lat=2,lon=2].
<p>
<li>
All remaining anonymous dimensions are given the
name "&lt;var&gt;_NN", where "&lt;var&gt;" is the
unqualified name of the variable in which the anonymous
dimension appears and NN is the relative position of that
dimension in the dimensions associated with that array.
No instances of this rule occur in the running example.
<p>
<li>
Remove duplicate dimensions (those with same name and value).
Our dimension set now becomes
[fdim=10,lat=2,lon=2].
<p>
<li>
The final case occurs when there are dimensions with the same
name but with different values. For this case,
the size of the dimension is appended to the dimension name.
</ol>
<h3>Type Definition</h3>
The rules for choosing user-defined types are as follows.
<ol>
<li>For every Structure, Sequence, and non-top-level Grid,
netcdf-4 compound type is created whose fields are the fields
of the Structure, Sequence, or Grid. The name of the type is the
same as the Structure or Grid name suffixed with "_t".
However, the compound types derived from Sequences are instead
suffixed with "_record_t".
<p>
The types of the fields are the types of the corresponding field
of the Structure, Sequence, or Grid. Note that this type
might be itself a user-defined type.
<p>
From the example, we get the following compound types.
<pre>
compound FS2_t {
int f1[3];
int f2;
};
compound S1_t {
int f11;
FS2_t FS2[2];
};
compound Q1_record_t {
double depth;
};
</pre>
<p>
<li>
For all sequences of name X,
also create this type.
<pre>
X_record_t (*) X_t
</pre>
In our example, this produces the following type.
<pre>
Q1_record_t (*) Q1_t
</pre>
<p>
<li>
If a Sequence, Q has a single field F,
whose type is a primitive type, T,
(e.g., int, float, string), then
do not apply the previous rule, but instead replace the whole
sequence with the the following field.
<pre>
T (*) Q.f
</pre>
<p>
<li>
Attempt to maximally shorten the type names as long as there is no conflict.
</ol>
<h2>Choosing a Translation</h2>
The decision about whether to translate to netCDF-3
(libnc-dap) or netCDF-4 is determined by applying the
following rules in order.
<ol>
<li>
If the NC_CLASSIC_MODEL flag is set on nc_open(), then netcdf-3
(i.e. libnc-dap) translation is used.
<li>
If the NC_NETCDF4 flag is set on nc_open(), then netCDF-4
translation is used.
<li>
If the URL is prefixed with the string
"[mode=netcdf3]" or "[mode=libnc-dap]",
then the libnc-dap translation is used.
<li>
If the URL is prefixed with the string "[mode=netcdf4]",
then the netCDF-4 translation described below is used.
<li>
If none of the above is specified, then the default
is "[mode=libnc-dap]".
</ol>
<h2>Defined Client Parameters</h2>
Currently, a limited set of client parameters is recognized.
Parameters not listed here are ignored, but no error is signalled.
<table borderwidth=1 cellpadding=2>
<tr><th>Parameter Name<th>Legal Values<th>Semantics
<tr><td>[mode=...]<td>libnc&#8209;dap|netcdf3|netcdf4
<td>Specify the translation to be applied to the DAP data source on the
client side.
<tr><td>[show=...]<td>das|dds|url
<td>This causes information to appear as specific global attributes. The
tags may be combined using comma with no spaces
(e.g. "show=dds,url"). The currently recognized tags are "dds" to
display the underlying DDS, "das" similarly, and "url" to display
the url used to retrieve the data.
</table>
</body>
</html>