netcdf-c/docs/auth.md
Dennis Heimbigner 12ec5711d7 Fix some problems with Earthdata authorization.
re: Issue https://github.com/Unidata/netcdf-c/issues/2704

The issue reported problems accessing e.g. opendap.earthdata.nasa.gov,
which uses the authentication mechanisms of urs.earthdata.nasa.gov.
The file *docs/auth.md* describes how to setup the proper authorization
mechanisms for earthdata, but there turned out to be some bugs
in the code that prevented this from working.

## Primary Changes
* Add some clarification text to *auth.md*.
* Fix the process for loading and merging *.ncrc* and *.dodsrc* file to conform to documentation.
* Fix *NC_s3urlrebuild* so that non-S3 urls are passed through unchanged.
* Fix a bug in the .rc test *test_rcmerge.sh*.
2023-06-10 18:51:13 -06:00


NetCDF Authorization Support


[TOC]

Introduction

netCDF can support user authorization using the facilities provided by the curl library. This includes basic password authentication as well as certificate-based authorization. At the moment, this document only applies to DAP2 and DAP4 access.

With some exceptions (e.g. see the section on redirection), the libcurl authorization mechanisms can be accessed in two ways:

  1. Inserting the username and password into the url, or
  2. Accessing information from a so-called rc file named either .ncrc or .dodsrc. The latter is historical and deprecated, but will be supported indefinitely.

URL-Based Authentication

For simple password-based authentication, it is possible to insert the username and the password directly into a url in this form:

http://username:password@host/...

This username and password will be used if the server asks for authentication. Note that only simple password authentication is supported in this format.

Specifically note that redirection-based authorization may work with this but it is a security risk. This is because the username and password may be sent to each server in the redirection chain.

Note also that the user:password form may contain characters that must be escaped. See the password escaping section to see how to properly escape the user and password.

RC File Authentication

The netcdf library supports an rc file mechanism to allow the passing of a number of parameters to libnetcdf and libcurl. Locating the rc file is a multi-step process.

Search Order

The netcdf-c library searches for, and loads from, the following files, in this order:

  1. $HOME/.ncrc
  2. $HOME/.dodsrc
  3. $CWD/.ncrc
  4. $CWD/.dodsrc

$HOME is the user's home directory and $CWD is the current working directory. Entries in later files override matching entries from earlier files.

It is strongly suggested that you pick a uniform location and a uniform name and use them always. Otherwise you may observe unexpected results when the netcdf-c library loads an rc file you did not expect.
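To check which rc files are present on a given machine, the load order above can be sketched in shell. The function name list_rc_files is hypothetical, and the sketch only reports which candidate files exist; it does not reproduce the library's own loading or merging code.

```shell
#!/bin/sh
# list_rc_files HOMEDIR CWD
# Hypothetical helper: print the rc files that exist, in the documented
# load order. Entries in files printed later override earlier ones.
list_rc_files() {
    for f in "$1/.ncrc" "$1/.dodsrc" "$2/.ncrc" "$2/.dodsrc"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: show the candidates for the current user and directory.
list_rc_files "$HOME" "$(pwd)"
```

If more than one file is printed, that is a hint that the setup does not follow the "one location, one name" advice above.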

RC File Format

The rc file format is a series of lines of the general form:

[<host:port>]<key>=<value>

where the bracket-enclosed host:port is optional.

URL Constrained RC File Entries

Each line of the rc file can begin with a host+port enclosed in square brackets. The form is "host:port". If the port is not specified then the form is just "host". The reason that more of the url is not used is that libcurl's authorization grain is not any finer than host level.

Here are some examples.

    [remotetest.unidata.ucar.edu]HTTP.VERBOSE=1
or
    [fake.ucar.edu:9090]HTTP.VERBOSE=0

If the url passed to, say, the nc_open function has a host+port matching one of the prefixes in the rc file, then the corresponding entry will be used; otherwise it is ignored. An entry with a matching host+port takes precedence over an entry without a host+port.

For example, the URL

    http://remotetest.unidata.ucar.edu/thredds/dodsC/testdata/testData.nc

will have HTTP.VERBOSE set to 1 because its host matches the example above.

Similarly,

    http://fake.ucar.edu:9090/dts/test.01

will have HTTP.VERBOSE set to 0 because its host+port matches the example above.
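The precedence rule described above can be illustrated with a small shell sketch. The helper name rc_lookup is hypothetical, and the matching here is a plain string comparison on the bracketed prefix; the library's own matching may differ in details such as whitespace handling.

```shell
#!/bin/sh
# rc_lookup FILE KEY [HOST[:PORT]]
# Hypothetical helper: print the value for KEY, preferring an entry whose
# [host:port] prefix equals the third argument over an unprefixed entry.
rc_lookup() {
    awk -v key="$2" -v hostport="${3:-}" '
        BEGIN { scoped = "[" hostport "]" key "="; plain = key "=" }
        hostport != "" && index($0, scoped) == 1 {
            scoped_val = substr($0, length(scoped) + 1); have_scoped = 1; next
        }
        index($0, plain) == 1 {
            plain_val = substr($0, length(plain) + 1); have_plain = 1
        }
        END {
            if (have_scoped) print scoped_val
            else if (have_plain) print plain_val
        }
    ' "$1"
}

# Usage with a scratch rc file (not the user's real .ncrc).
rcfile=$(mktemp)
cat > "$rcfile" <<'EOF'
HTTP.VERBOSE=0
[remotetest.unidata.ucar.edu]HTTP.VERBOSE=1
[fake.ucar.edu:9090]HTTP.TIMEOUT=30
EOF

# Host-specific entry wins over the unprefixed one.
rc_lookup "$rcfile" HTTP.VERBOSE remotetest.unidata.ucar.edu
# No host-specific HTTP.VERBOSE entry, so the unprefixed one is used.
rc_lookup "$rcfile" HTTP.VERBOSE fake.ucar.edu:9090
```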

Authorization-Related Keys

The currently defined set of authorization-related keys is as follows. The second column lists the affected curl_easy_setopt option(s), if any (see reference #1).

| Key | Affected curl_easy_setopt Options | Notes |
|---|---|---|
| HTTP.COOKIEJAR | CURLOPT_COOKIEJAR | |
| HTTP.COOKIEFILE | CURLOPT_COOKIEJAR | COOKIEJAR and COOKIEFILE are considered aliases, so setting one will set the other as well. |
| HTTP.PROXY.SERVER | CURLOPT_PROXY, CURLOPT_PROXYPORT, CURLOPT_PROXYUSERPWD | |
| HTTP.PROXY_SERVER | CURLOPT_PROXY, CURLOPT_PROXYPORT, CURLOPT_PROXYUSERPWD | Deprecated: use HTTP.PROXY.SERVER |
| HTTP.SSL.CERTIFICATE | CURLOPT_SSLCERT | |
| HTTP.SSL.KEY | CURLOPT_SSLKEY | |
| HTTP.SSL.KEYPASSWORD | CURLOPT_KEYPASSWORD | |
| HTTP.SSL.CAINFO | CURLOPT_CAINFO | |
| HTTP.SSL.CAPATH | CURLOPT_CAPATH | |
| HTTP.SSL.VERIFYPEER | CURLOPT_SSL_VERIFYPEER | |
| HTTP.SSL.VALIDATE | CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST | |
| HTTP.CREDENTIALS.USERPASSWORD | CURLOPT_USERPWD | |
| HTTP.CREDENTIALS.USERNAME | CURLOPT_USERNAME | |
| HTTP.CREDENTIALS.PASSWORD | CURLOPT_PASSWORD | |
| HTTP.NETRC | CURLOPT_NETRC, CURLOPT_NETRC_FILE | Specifies the path of the .netrc file and enables its use. |
| AWS.PROFILE | N.A. | Specifies the name of a profile from the .aws/credentials file. |
| AWS.REGION | N.A. | Specifies the name of a default region. |

Password Authentication

The key HTTP.CREDENTIALS.USERPASSWORD can be used to set the simple password authentication. This is an alternative to setting it in the url. The value must be of the form "username:password". See the password escaping section to see how this value must escape certain characters. Also see redirection authorization for important additional information.

The pair of keys HTTP.CREDENTIALS.USERNAME and HTTP.CREDENTIALS.PASSWORD can be used as an alternative to HTTP.CREDENTIALS.USERPASSWORD to set the simple password authentication. If present, they take precedence over HTTP.CREDENTIALS.USERPASSWORD. The values do not need to be escaped. See redirection authorization for important additional information.

The HTTP.COOKIEJAR key specifies the name of the file from which to read cookies (CURLOPT_COOKIEFILE) and also the file into which to store cookies (CURLOPT_COOKIEJAR). The same value is used for both CURLOPT values. It defaults to in-memory storage. See redirection authorization for important additional information.

Certificate Authentication

HTTP.SSL.CERTIFICATE specifies a file path for a file containing a PEM certificate. This is typically used for client-side authentication.

HTTP.SSL.KEY is essentially the same as HTTP.SSL.CERTIFICATE and should always have the same value.

HTTP.SSL.KEYPASSWORD specifies the password for accessing the HTTP.SSL.CERTIFICATE/HTTP.SSL.KEY file.

HTTP.SSL.CAPATH specifies the path to a directory containing trusted certificates for validating server certificates. See reference #2 for more info.

HTTP.SSL.VALIDATE is a boolean (1/0) value that if true (1) specifies that the client should verify the server's presented certificate.

HTTP.PROXY.SERVER specifies the url for accessing the proxy: e.g. http://[username:password@]host[:port]

HTTP.PROXY_SERVER is deprecated; use HTTP.PROXY.SERVER instead.

HTTP.NETRC specifies the absolute path of the .netrc file, and causes it to be used instead of username and password. See redirection authorization for information about using .netrc.

Password Escaping

With current password rules, it is likely that the password will contain characters that need to be escaped. Similarly, the user name may contain characters such as '@' that need to be escaped. To support this, it is assumed that all occurrences of user:password use URL (i.e. %XX) escaping for at least the characters in the table below.

The minimum set of characters that must be escaped depends on the location. If the user+pwd is embedded in the URL, then '@' and ':' must be escaped. If the user+pwd is the value for the HTTP.CREDENTIALS.USERPASSWORD key in the rc file, then ':' must be escaped. Escaping should not be used in the .netrc file nor in HTTP.CREDENTIALS.USERNAME or HTTP.CREDENTIALS.PASSWORD.

The relevant escape codes are as follows.

| Character | Escaped Form |
|---|---|
| '@' | %40 |
| ':' | %3a |

Additional characters can be escaped if desired.
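As a rough illustration of the escaping rules, the following shell sketch escapes a user name and password before embedding them in a URL. The helper name urlescape, the credentials, and the host are placeholders. The sketch also escapes '%' itself so that a literal percent sign is not mistaken for an escape; that extra step is an assumption of this sketch, covered by the allowance for escaping additional characters.

```shell
#!/bin/sh
# urlescape STRING
# Hypothetical helper: percent-escape '%', '@' and ':' in STRING.
# '%' is handled first so the escapes added for '@' and ':' are left alone.
urlescape() {
    printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's/@/%40/g' -e 's/:/%3a/g'
}

# Usage: build a URL with embedded credentials (placeholder values).
user='jane@example.org'
pass='s3cret:pw'
url="http://$(urlescape "$user"):$(urlescape "$pass")@host.example/path"
printf '%s\n' "$url"
```

Only the ':' that separates the user from the password and the '@' that precedes the host remain unescaped in the resulting URL.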

Redirection-Based Authentication

Some sites provide authentication by using a third party site to do the authentication. Examples include ESG, URS, RDA, and most oauth2-based systems.

The process is usually as follows.

  1. The client contacts the server of interest (SOI), the actual data provider, typically using the http protocol.
  2. The SOI sends a redirect to the client telling it to connect to, e.g., the URS system using the https protocol (note the use of https instead of http).
  3. The client authenticates with URS.
  4. URS sends a redirect (with authorization information) to send the client back to the SOI to actually obtain the data.

It turns out that libcurl, by default, uses the password in the .ncrc file (or from the url) for all connections that request a password. This causes problems because only the specific redirected connection actually requires the password. This is where the .netrc file comes in: libcurl will use .netrc for the redirected connection. It is possible to make libcurl always use the .ncrc password, but this introduces a security hole because it may send the initial user+pwd to every server in the redirection chain. In summary, if you are using redirection, then you are strongly encouraged to create a .netrc file to hold the password for the site to which the redirection is sent.

The format of this .netrc file will contain lines that typically look like this.

machine mmmmmm login xxxxxx password yyyyyy

where the machine, mmmmmm, is the hostname of the machine to which the client is redirected for authorization, and the login and password are those needed to authenticate on that machine.
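Before relying on a .netrc file, it can be useful to confirm that it holds an entry for the redirect host. The following is a minimal shell sketch under the assumption that the file uses only whitespace-separated machine/login/password tokens; the helper name netrc_login is hypothetical, and other .netrc features (such as default entries) are ignored.

```shell
#!/bin/sh
# netrc_login FILE MACHINE
# Hypothetical helper: print the login recorded for MACHINE in a .netrc
# file, or nothing if there is no such entry. The password is never printed.
netrc_login() {
    awk -v host="$2" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "machine") { current = $(i + 1); i++ }
                else if ($i == "login") {
                    if (current == host) { print $(i + 1); exit }
                    i++
                }
                else if ($i == "password") i++
            }
        }
    ' "$1"
}

# Usage with a scratch file that mirrors the template above.
netrc=$(mktemp)
printf 'machine mmmmmm login xxxxxx password yyyyyy\n' > "$netrc"
netrc_login "$netrc" mmmmmm
```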

The location of the .netrc file can be specified by putting the following line in your .ncrc/.dodsrc file.

HTTP.NETRC=<path to .netrc file>

If not specified, then libcurl will look first in the current directory, and then in the HOME directory.

One final note: when using this, you MUST specify a real file in the file system to act as the cookie jar file (HTTP.COOKIEJAR) so that the redirect site can properly pass back authorization information.

Accessing earthdata.nasa.gov

Since it is so common, here is a set of templates to use to access earthdata.nasa.gov.

.ncrc File

HTTP.NETRC=/home/<user>/.netrc
HTTP.COOKIEJAR=/home/<user>/.urs_cookies

.netrc File

machine urs.earthdata.nasa.gov login <user> password <password>
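The two templates can be generated with a short shell sketch such as the one below. The function name write_earthdata_rc and its arguments are hypothetical. Restricting the files to the owner via umask is a precaution assumed here because .netrc holds a plain-text password; it only affects newly created files.

```shell
#!/bin/sh
# write_earthdata_rc DIR USER PASSWORD
# Hypothetical helper: write the .ncrc and .netrc templates into DIR.
write_earthdata_rc() {
    dir=$1 user=$2 password=$3
    (
        umask 077   # new files are readable and writable by the owner only
        printf 'HTTP.NETRC=%s/.netrc\nHTTP.COOKIEJAR=%s/.urs_cookies\n' \
            "$dir" "$dir" > "$dir/.ncrc"
        printf 'machine urs.earthdata.nasa.gov login %s password %s\n' \
            "$user" "$password" > "$dir/.netrc"
    )
}

# Usage: try it in a scratch directory before touching $HOME.
scratch=$(mktemp -d)
write_earthdata_rc "$scratch" myuser mypassword
cat "$scratch/.ncrc"
```

Once the output looks right, the same call with "$HOME" as the first argument would produce files matching the templates above.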

Client-Side Certificates

Some systems, notably ESG (Earth System Grid), require the use of client-side certificates and are also re-direction based. This requires setting the following entries:

  • HTTP.COOKIEJAR — a file path for storing cookies across re-direction.
  • HTTP.NETRC — the path to the netrc file.
  • HTTP.SSL.CERTIFICATE — the file path for the client side certificate file.
  • HTTP.SSL.KEY — this should have the same value as HTTP.SSL.CERTIFICATE.
  • HTTP.SSL.CAPATH — the path to a "certificates" directory.
  • HTTP.SSL.VALIDATE — force validation of the server certificate.

Note that the first two are there to support re-direction based authentication.

References

  1. https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
  2. https://curl.haxx.se/docs/ssl-compared.html

Appendix A. All RC-File Keys

For completeness, this is the list of all rc-file keys. If this documentation is out of date with respect to the actual code, the code is definitive.

| Key | curl_easy_setopt Option |
|---|---|
| HTTP.DEFLATE | CURLOPT_DEFLATE with value "deflate,gzip" |
| HTTP.VERBOSE | CURLOPT_VERBOSE |
| HTTP.TIMEOUT | CURLOPT_TIMEOUT |
| HTTP.USERAGENT | CURLOPT_USERAGENT |
| HTTP.COOKIEJAR | CURLOPT_COOKIEJAR |
| HTTP.COOKIE_JAR | CURLOPT_COOKIEJAR |
| HTTP.PROXY.SERVER | CURLOPT_PROXY, CURLOPT_PROXYPORT, CURLOPT_PROXYUSERPWD |
| HTTP.PROXY_SERVER | CURLOPT_PROXY, CURLOPT_PROXYPORT, CURLOPT_PROXYUSERPWD |
| HTTP.SSL.CERTIFICATE | CURLOPT_SSLCERT |
| HTTP.SSL.KEY | CURLOPT_SSLKEY |
| HTTP.SSL.KEYPASSWORD | CURLOPT_KEYPASSWORD |
| HTTP.SSL.CAINFO | CURLOPT_CAINFO |
| HTTP.SSL.CAPATH | CURLOPT_CAPATH |
| HTTP.SSL.VERIFYPEER | CURLOPT_SSL_VERIFYPEER |
| HTTP.CREDENTIALS.USERPASSWORD | CURLOPT_USERPWD |
| HTTP.CREDENTIALS.USERNAME | CURLOPT_USERNAME |
| HTTP.CREDENTIALS.PASSWORD | CURLOPT_PASSWORD |
| HTTP.NETRC | CURLOPT_NETRC, CURLOPT_NETRC_FILE |

Appendix B. URS Access in Detail

It is possible to use the NASA Earthdata Login System (URS) with netcdf by using the process specified in the redirection-based authentication section. In order to access URS-controlled datasets, however, it is necessary to register as a user with NASA at this website (subject to change):

https://uat.urs.earthdata.nasa.gov/

Appendix C. ESG Access in Detail

It is possible to access Earth Systems Grid (ESG) datasets from ESG servers through the netCDF API using the techniques described in the section on Client-Side Certificates.

In order to access ESG datasets, however, it is necessary to register as a user with ESG and to set up your environment so that proper authentication is established between a netcdf client program and the ESG data server. Specifically, it is necessary to use what is called "client-side keys" to enable this authentication. Normally, when a client accesses a server in a secure fashion (using "https"), the server provides an authentication certificate to the client. With client-side keys, the client must also provide a certificate to the server so that the server can know with whom it is communicating. Note that this section is subject to change as ESG changes its procedures.

The netcdf library uses the curl library and it is that underlying library that must be properly configured.

Terminology

Using client-side keys requires the construction of two "stores" on the client side.

  • Keystore - a repository to hold the client side key.
  • Truststore - a repository to hold a chain of certificates that can be used to validate the certificate sent by the server to the client.

The server actually has a similar set of stores, but the client need not be concerned with those.

Initial Steps

The first step is to obtain authorization from ESG. Note that this information may evolve over time, and may be out of date. This discussion is in terms of BADC and NCSA. You will need to substitute as necessary.

  1. Register at http://badc.nerc.ac.uk/register to obtain access to BADC and to obtain an openid, which will look something like:

    https://ceda.ac.uk/openid/Firstname.Lastname

  2. Ask BADC for access to whatever datasets are of interest.

  3. Obtain short-term credentials at http://grid.ncsa.illinois.edu/myproxy/MyProxyLogon/. You will need to download and run the MyProxyLogon program. This will create a keyfile in, typically, the directory ".globus". The keyfile will have a name similar to this: "x509up_u13615". The other elements in ".globus" are certificates to use in validating the certificate your client gets from the server.

  4. Obtain the program source ImportKey.java from this location: http://www.agentbob.info/agentbob/79-AB.html (read the whole page, it will help you understand the remaining steps).

Building the KeyStore

You will have to modify the keyfile in the previous step and then create a keystore and install the key and a certificate. The commands are these:

openssl pkcs8 -topk8 -nocrypt -in x509up_u13615 -inform PEM -out key.der -outform DER
openssl x509 -in x509up_u13615 -inform PEM -out cert.der -outform DER
java -classpath <path to ImportKey.class> -Dkeypassword="<password>" -Dkeystore=./<keystorefilename> key.der cert.der

Note, the file names "key.der" and "cert.der" can be whatever you choose. It is probably best to leave the .der extension, though.

Building the TrustStore

Building the truststore is a bit tricky because, as provided, the certificates in ".globus" need some massaging. See the script below for the details. The primary command is this, which is executed for every certificate, c, in ".globus". It adds the certificate to the file named "truststore".

keytool -trustcacerts -storepass "password" -v -keystore "truststore"  -importcert -file "${c}"

Running the C Client

Refer to the section on Client-Side Certificates. The keys specified there must be set in the rc file to support ESG access.

  • HTTP.COOKIEJAR=~/.dods_cookies
  • HTTP.NETRC=~/.netrc
  • HTTP.SSL.CERTIFICATE=~/esgkeystore
  • HTTP.SSL.KEY=~/esgkeystore
  • HTTP.SSL.CAPATH=~/.globus
  • HTTP.SSL.VALIDATE=1

Of course, the file paths above are suggestions only; you can modify them as needed. The HTTP.SSL.CERTIFICATE and HTTP.SSL.KEY entries should have the same value, which is the file path for the certificate produced by MyProxyLogon. The HTTP.SSL.CAPATH entry should be the path to the "certificates" directory produced by MyProxyLogon.

As noted, ESG also uses re-direction based authentication. So, when it receives an initial connection from a client, it redirects to a separate authentication server. When that server has authenticated the client, it redirects back to the original url to complete the request.

Script for creating Stores

The following script shows in detail how to actually construct the key and trust stores. It is specific to the format of the globus file as it was when ESG support was first added. It may have changed since then, in which case, you will need to seek some help in fixing this script. It would help if you communicated what you changed to the author so this document can be updated.

#!/bin/sh -x
KEYSTORE="esgkeystore"
TRUSTSTORE="esgtruststore"
GLOBUS="globus"
TRUSTROOT="certificates"
CERT="x509up_u13615"
TRUSTROOTPATH="$GLOBUS/$TRUSTROOT"
CERTFILE="$GLOBUS/$CERT"
# Store password. Not named PWD, because the shell uses PWD for the
# current working directory.
STOREPASS="password"

D="-Dglobus=$GLOBUS"
CCP="bcprov-jdk16-145.jar"
CP="./build:${CCP}"
JAR="myproxy.jar"

# Initialize needed directories
rm -fr build
mkdir build
rm -fr "$GLOBUS"
mkdir "$GLOBUS"
rm -f "$KEYSTORE"
rm -f "$TRUSTSTORE"

# Compile MyProxyCmd and ImportKey
javac -d ./build -classpath "$CCP" *.java
javac -d ./build ImportKey.java

# Execute MyProxyCmd
java -cp "$CP" myproxy.MyProxyCmd

# Build the keystore
openssl pkcs8 -topk8 -nocrypt -in "$CERTFILE" -inform PEM -out key.der -outform DER
openssl x509 -in "$CERTFILE" -inform PEM -out cert.der -outform DER
java -Dkeypassword="$STOREPASS" -Dkeystore="./${KEYSTORE}" -cp ./build ImportKey key.der cert.der

# Clean up the certificates in the globus directory:
# drop everything up to and including the first line containing "---",
# then put a clean BEGIN line back. (The 0,/re/ address needs GNU sed.)
for c in "${TRUSTROOTPATH}"/*.0 ; do
    alias=`basename "$c" .0`
    sed -e '0,/---/d' <"$c" >"/tmp/${alias}"
    echo "-----BEGIN CERTIFICATE-----" >"$c"
    cat "/tmp/${alias}" >>"$c"
done

# Build the truststore
for c in "${TRUSTROOTPATH}"/*.0 ; do
    alias=`basename "$c" .0`
    echo "adding: ${c}"
    echo "alias: $alias"
    yes | keytool -trustcacerts -storepass "$STOREPASS" -v -keystore "./$TRUSTSTORE" -alias "$alias" -importcert -file "${c}"
done
exit

Point of Contact

Author: Dennis Heimbigner
Email: dmh at ucar dot edu
Initial Version: 11/21/2014
Last Revised: 08/24/2017