mirror of
https://github.com/Unidata/netcdf-c.git
synced 2024-12-27 08:49:16 +08:00
bf2746b8ea
re: issue https://github.com/Unidata/netcdf-c/issues/1251 Assume that you have the URL to a remote dataset which is a normal netcdf-3 or netcdf-4 file. This PR allows the netcdf-c to read that dataset's contents as a netcdf file using HTTP byte ranges if the remote server supports byte-range access. Originally, this PR was set up to access Amazon S3 objects, but it can also access other remote datasets such as those provided by a Thredds server via the HTTPServer access protocol. It may also work for other kinds of servers. Note that this is not intended as a true production capability because, as is known, this kind of access to can be quite slow. In addition, the byte-range IO drivers do not currently do any sort of optimization or caching. An additional goal here is to gain some experience with the Amazon S3 REST protocol. This architecture and its use documented in the file docs/byterange.dox. There are currently two test cases: 1. nc_test/tst_s3raw.c - this does a simple open, check format, close cycle for a remote netcdf-3 file and a remote netcdf-4 file. 2. nc_test/test_s3raw.sh - this uses ncdump to investigate some remote datasets. This PR also incorporates significantly changed model inference code (see the superceded PR https://github.com/Unidata/netcdf-c/pull/1259). 1. It centralizes the code that infers the dispatcher. 2. It adds support for byte-range URLs Other changes: 1. NC_HDF5_finalize was not being properly called by nc_finalize(). 2. Fix minor bug in ncgen3.l 3. fix memory leak in nc4info.c 4. add code to walk the .daprc triples and to replace protocol= fragment tag with a more general mode= tag. Final Note: Th inference code is still way too complicated. We need to move to the validfile() model used by netcdf Java, where each dispatcher is asked if it can process the file. This decentralizes the inference code. This will be done after all the major new dispatchers (PIO, Zarr, etc) have been implemented.
255 lines
10 KiB
Plaintext
255 lines
10 KiB
Plaintext
/*!
|
|
\page dap4 DAP4 Protocol Support
|
|
|
|
|
|
<!-- Note that this file has the .dox extension, but is mostly markdown -->
|
|
<!-- Begin MarkDown -->
|
|
|
|
\tableofcontents
|
|
|
|
# DAP4 Introduction {#dap4_introduction}
|
|
|
|
Beginning with netCDF version 4.5.0, optional support is provided for
|
|
accessing data from servers supporting the DAP4 protocol.
|
|
|
|
DAP4 support is enabled if the _--enable-dap_ option
|
|
is used with './configure'. If DAP4 support is enabled, then
|
|
a usable version of _libcurl_ must be specified
|
|
using the _LDFLAGS_ environment variable (similar to the way
|
|
that the _HDF5_ libraries are referenced).
|
|
Refer to the installation manual for details.
|
|
By default DAP4 support is enabled if _libcurl_ is found.
|
|
DAP4 support can be disabled using the _--disable-dap_.
|
|
|
|
DAP4 uses a data model that is, by design, similar to,
|
|
but -- for historical reasons -- not identical with,
|
|
the netCDF-Enhanced (aka netcdf-4) model.
|
|
Generically, the DAP4 data model is encoded in XML document
|
|
called a DMR.
|
|
For detailed information about the DAP4 DMR, refer to
|
|
the DAP4 specification Volume 1:
|
|
http://docs.opendap.org/index.php/DAP4:_Specification_Volume_1
|
|
|
|
# Accessing Data Using the DAP4 Prototocol {#dap4_accessing_data}
|
|
|
|
In order to access a DAP4 data source through the netCDF API, the
|
|
file name normally used is replaced with a URL with a specific
|
|
format. The URL is composed of three parts.
|
|
- URL - this is a standard form URL with specific markers to indicate that
|
|
it refers to a DAP4 encoded dataset. The markers are of the form
|
|
"dap4", "mode=dap4", or "/thredds/dap4". The following
|
|
examples show how they are specified. Note that the "/thredds/dap4"
|
|
case will only work when accessing a Thredds-based server.
|
|
+ [dap4]http://remotetest.unidata.ucar.edu/d4ts/test.01
|
|
+ [mode=dap4]http://remotetest.unidata.ucar.edu/d4ts/test.01
|
|
+ http://remotetest.unidata.ucar.edu/d4ts/test.01#dap4
|
|
+ http://remotetest.unidata.ucar.edu/d4ts/test.01#mode=dap4
|
|
+ http://thredds.ucar.edu/thredds/dap4/...
|
|
|
|
- Constraints - these are suffixed to the URL and take the form
|
|
“?dap4.ce=\<expression\>”. The form of the constraint expression
|
|
is somewhat complicated, and the specification should be consulted.
|
|
|
|
- Client parameters - these may be specified in either of
|
|
two ways. The older, deprecated form prefixes text to the
|
|
front of the url and is of the the general form [\<name>]
|
|
or [\<name>=value]. Examples include [show=fetch] or [noprefetch].
|
|
The newer, preferred form prefixes the
|
|
parameters to the end of the url using the semi-standard '#'
|
|
format: e.g. http://....#show=fetch&noprefetch.
|
|
|
|
It is possible to see what the translation does to a particular DAP4
|
|
data source in two steps. First, one can examine the DMR
|
|
source through a web browser and then second, one can examine
|
|
the translation using the "ncdump -h" command to see the
|
|
corresponding netCDF-4 representation.
|
|
|
|
For example, if a web browser is given the following (fictional) URL,
|
|
it will return the DMR for the specified dataset (in XML format).
|
|
\code
|
|
http://remotetest.unidata.ucar.edu/d4ts/test.01.dmr.xml#dap4
|
|
\endcode
|
|
|
|
By using the following ncdump command, it is possible to see the
|
|
equivalent netCDF-4 translation.
|
|
|
|
\code
|
|
ncdump -h '[dap4]http://remotetest.unidata.ucar.edu/d4ts/test.01'
|
|
\endcode
|
|
|
|
# Defined Client Parameters {#dap4_defined_params}
|
|
|
|
Currently, a limited set of client parameters is
|
|
recognized. Parameters not listed here are
|
|
ignored, but no error is signalled.
|
|
|
|
Parameter Name Legal Values Semantics
|
|
- "log" | "log=<file>" - Turn on logging and send the log output to
|
|
the specified file. If no file is specified, then output is sent to standard
|
|
error.
|
|
- "show=fetch" - This parameter causes the netCDF code to log a copy
|
|
of the complete url for every HTTP get request. If logging is
|
|
enabled, then this can be helpful in checking to see the access
|
|
behavior of the netCDF code.
|
|
- "translate=nc4" - This parameter causes the netCDF code to look
|
|
for specially named elements in the DMR XML in order to
|
|
achieve a better translation of the DAP4 meta-data to NetCDF enhanced
|
|
metadata.
|
|
- "opaquesize=<integer>" - This parameter causes the netCDF code to
|
|
convert DAP4 variable size OPAQUE objects to netcdf-4 fixed size
|
|
objects and forces all of them to be of the size specified.
|
|
- "fillmismatch" - Unfortunately, a number of servers sometimes
|
|
fail to make sure that the type of the "_FillValue" attribute of a variable
|
|
is the same as the type of the containing variable. Setting this tag
|
|
caused the netcdf translation to attempt to fix this mismatch. If not set,
|
|
then an error will occur.
|
|
|
|
# Notes on Debugging DAP4 Access {#dap4_debug}
|
|
|
|
The DAP4 support has a logging facility.
|
|
Turning on this logging can
|
|
sometimes give important information. Logging can be enabled by
|
|
using the client parameter "log" or "log=filename",
|
|
where the first case will send log output to standard error and the
|
|
second will send log output to the specified file.
|
|
|
|
Users should also be aware that if one is
|
|
accessing data over an NFS mount, one may see some .nfsxxxxx files;
|
|
those can be ignored.
|
|
|
|
## HTTP Configuration. {#dap4_http2_config}
|
|
|
|
Limited support for configuring the http connection is provided via
|
|
parameters in the “.daprc” configuration file (aka ".dodsrc").
|
|
The relevant .daprc file is
|
|
located by first looking in the current working directory, and if not
|
|
found, then looking in the directory specified by the “$HOME”
|
|
environment variable.
|
|
|
|
Entries in the .daprc file are of the form:
|
|
````
|
|
['['<url>']']<key>=<value>
|
|
````
|
|
|
|
That is, it consists of a key name and value pair and optionally
|
|
preceded by a url enclosed in square brackets.
|
|
|
|
For given KEY and URL strings, the value chosen is as follows:
|
|
|
|
If URL is null, then look for the .daprc entry that has no url prefix
|
|
and whose key is same as the KEY for which we are looking.
|
|
|
|
If the URL is not null, then look for all the .daprc entries that
|
|
have a url, URL1, say, and for which URL1 has the same host and port
|
|
as URL. All parts of the url's except host and port are ignored.
|
|
For example, if URL = http//x.y/a, then it will match
|
|
entries of the form
|
|
_[http//x.y/a]KEY=VALUE_ or _[http//x.y/b]KEY=VALUE_.
|
|
It will not match an entry of the form _[http//x.y:8080]KEY=VALUE
|
|
because the second has a port number (8080) different than the URL.
|
|
Finally from the set so constructed, choose the first matching entry.
|
|
|
|
Currently, the supported set of keys (with descriptions) are as
|
|
follows.
|
|
|
|
-# HTTP.VERBOSE
|
|
Type: boolean ("1"/"0")
|
|
Description: Produce verbose output, especially using SSL.
|
|
Related CURL Flags: CURLOPT_VERBOSE
|
|
|
|
-# HTTP.DEFLATE
|
|
Type: boolean ("1"/"0")
|
|
Description: Allow use of compression by the server.
|
|
Related CURL Flags: CURLOPT_ENCODING
|
|
|
|
-# HTTP.COOKIEJAR
|
|
Type: String representing file path
|
|
Description: Specify the name of file into which to store cookies. Defaults to in-memory storage.
|
|
Related CURL Flags:CURLOPT_COOKIEJAR
|
|
|
|
-# HTTP.CREDENTIALS.USER
|
|
Type: String representing user name
|
|
Description: Specify the user name for Digest and Basic authentication.
|
|
Related CURL Flags:
|
|
|
|
-# HTTP.CREDENTIALS.PASSWORD
|
|
Type: String representing password
|
|
Type: boolean ("1"/"0")
|
|
Description: Specify the password for Digest and Basic authentication.
|
|
Related CURL Flags:
|
|
|
|
-# HTTP.SSL.CERTIFICATE
|
|
Type: String representing file path
|
|
Description: Path to a file containing a PEM cerficate.
|
|
Related CURL Flags: CURLOPT_CERT
|
|
|
|
-# HTTP.SSL.KEY
|
|
Type: String representing file path
|
|
Description: Same as HTTP.SSL.CERTIFICATE, and should usually have the same value.
|
|
Related CURL Flags: CURLOPT_SSLKEY
|
|
|
|
-# HTTP.SSL.KEYPASSWORD
|
|
Type: String representing password
|
|
Description: Password for accessing the HTTP.SSL.KEY/HTTP.SSL.CERTIFICATE
|
|
Related CURL Flags: CURLOPT_KEYPASSWORD
|
|
|
|
-# HTTP.SSL.CAPATH
|
|
Type: String representing directory
|
|
Description: Path to a directory containing trusted certificates for validating server certificates.
|
|
Related CURL Flags: CURLOPT_CAPATH
|
|
|
|
-# HTTP.SSL.VALIDATE
|
|
Type: boolean ("1"/"0")
|
|
Description: Cause the client to verify the server's presented certificate.
|
|
Related CURL Flags: CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST
|
|
|
|
-# HTTP.TIMEOUT
|
|
Type: String ("dddddd")
|
|
Description: Specify the maximum time in seconds that you allow the http transfer operation to take.
|
|
Related CURL Flags: CURLOPT_TIMEOUT, CURLOPT_NOSIGNAL
|
|
|
|
-# HTTP.PROXY_SERVER
|
|
Type: String representing url to access the proxy: (e.g.http://[username:password@]host[:port])
|
|
Description: Specify the needed information for accessing a proxy.
|
|
Related CURL Flags: CURLOPT_PROXY, CURLOPT_PROXYHOST, CURLOPT_PROXYUSERPWD
|
|
|
|
-# HTTP.READ.BUFFERSIZE
|
|
Type: String ("dddddd")
|
|
Description: Specify the the internal buffer size for curl reads.
|
|
Related CURL Flags: CURLOPT_BUFFERSIZE, CURL_MAX_WRITE_SIZE (16kB),
|
|
CURL_MAX_READ_SIZE (512kB).
|
|
|
|
-# HTTP.KEEPALIVE
|
|
Type: String ("on|n/m")
|
|
Description: Specify that TCP KEEPALIVE should be enabled and that the associated idle wait time is n and that the associated repeat interval is m. If the value is of the form is the string "on", then turn on keepalive, but do not set idle or interval.
|
|
Related CURL Flags: CURLOPT_TCP_KEEPALIVE, CURLOPT_TCP_KEEPIDLE,
|
|
CURLOPT_TCP_KEEPINTVL.
|
|
|
|
The related curl flags line indicates the curl flags modified by this
|
|
key. See the libcurl documentation of the _curl_easy_setopt()_ function
|
|
for more detail (http://curl.haxx.se/libcurl/c/curl_easy_setopt.html).
|
|
|
|
For servers that require client authentication, as a rule, the following
|
|
.daprc entries should be specified.
|
|
````
|
|
HTTP.SSL.VALIDATE
|
|
HTTP.COOKIEJAR
|
|
HTTP.SSL.CERTIFICATE
|
|
HTTP.SSL.KEY
|
|
HTTP.SSL.CAPATH
|
|
````
|
|
|
|
Additionally, the _HTTP.SSL.CERTIFICATE_ and _HTTP.SSL.KEY_
|
|
entries should have same value, which is the file path for a
|
|
certificate. The HTTP.SSL.CAPATH entry should
|
|
be the path to a directory containing validation "certificates".
|
|
|
|
# Point of Contact {#dap4_poc}
|
|
|
|
__Author__: Dennis Heimbigner<br>
|
|
__Email__: dmh at ucar dot edu<br>
|
|
__Initial Version__: 6/5/2017<br>
|
|
__Last Revised__: 11/7/2018
|
|
|
|
*/
|