\endcode
That is, it consists of a key name and value pair, optionally
preceded by a URL enclosed in square brackets.
For given KEY and URL strings, the value chosen is as follows:
If URL is null, then look for the .dodsrc entry that has no URL prefix
and whose key is the same as the KEY for which we are looking.
If the URL is not null, then look for all the .dodsrc entries that
have a URL, URL1, say, and for which URL1 is a prefix (in the string
sense) of URL. For example, if URL = http://x.y/a/b/c, then it will match
entries of the form
\code
1. [http://x.y/a]KEY=VALUE
2. [http://x.y/a/b]KEY=VALUE
\endcode
It will not match an entry of the form
\code
[http://x.y/b]KEY=VALUE
\endcode
because "http://x.y/b" is not a string prefix of
"http://x.y/a/b/c". Finally, from the set so constructed, choose the entry
with the longest URL prefix: "[http://x.y/a/b]KEY=VALUE" in this case.
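
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the library's implementation: the function name `lookup`, the tuple layout of `entries`, and the sample values are all hypothetical, and no fallback to un-prefixed entries is modeled for the non-null URL case.

```python
def lookup(entries, key, url=None):
    """Return the value selected for key, or None if nothing matches.

    entries is a list of (url_prefix_or_None, key, value) tuples, a
    hypothetical in-memory form of already-parsed .dodsrc lines.
    """
    if url is None:
        # Null URL: only entries without a URL prefix qualify.
        for prefix, k, value in entries:
            if prefix is None and k == key:
                return value
        return None
    # Non-null URL: keep entries whose URL is a string prefix of url.
    candidates = [
        (prefix, value)
        for prefix, k, value in entries
        if prefix is not None and k == key and url.startswith(prefix)
    ]
    if not candidates:
        return None
    # Choose the entry with the longest URL prefix.
    return max(candidates, key=lambda c: len(c[0]))[1]


# Hypothetical entries, for illustration only.
entries = [
    (None, "HTTP.TIMEOUT", "10"),
    ("http://x.y/a", "HTTP.TIMEOUT", "20"),
    ("http://x.y/a/b", "HTTP.TIMEOUT", "30"),
    ("http://x.y/b", "HTTP.TIMEOUT", "40"),
]
chosen = lookup(entries, "HTTP.TIMEOUT", "http://x.y/a/b/c")
default = lookup(entries, "HTTP.TIMEOUT")
```

Here `chosen` comes from the `[http://x.y/a/b]` entry, because it is the longest prefix of the requested URL, and `default` comes from the entry with no URL.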
Currently, the supported set of keys (with descriptions) is as
follows.
HTTP.VERBOSE
Type: boolean ("1"/"0")
Description: Produce verbose output, especially using SSL.
Related CURL Flags: CURLOPT_VERBOSE
HTTP.DEFLATE
Type: boolean ("1"/"0")
Description: Allow use of compression by the server.
Related CURL Flags: CURLOPT_ENCODING
HTTP.COOKIEJAR
Type: String representing file path
Description: Specify the name of file into which to store cookies. Defaults to in-memory storage.
Related CURL Flags: CURLOPT_COOKIEJAR
HTTP.COOKIEFILE
Type: String representing file path
Description: Same as HTTP.COOKIEJAR.
Related CURL Flags: CURLOPT_COOKIEFILE
HTTP.CREDENTIALS.USER
Type: String representing user name
Description: Specify the user name for Digest and Basic authentication.
Related CURL Flags:
HTTP.CREDENTIALS.PASSWORD
Type: String representing password
Description: Specify the password for Digest and Basic authentication.
Related CURL Flags:
HTTP.SSL.CERTIFICATE
Type: String representing file path
Description: Path to a file containing a PEM certificate.
Related CURL Flags: CURLOPT_SSLCERT
HTTP.SSL.KEY
Type: String representing file path
Description: Same as HTTP.SSL.CERTIFICATE, and should usually have the same value.
Related CURL Flags: CURLOPT_SSLKEY
HTTP.SSL.KEYPASSWORD
Type: String representing password
Description: Password for accessing the HTTP.SSL.KEY/HTTP.SSL.CERTIFICATE
Related CURL Flags: CURLOPT_KEYPASSWD
HTTP.SSL.CAPATH
Type: String representing directory
Description: Path to a directory containing trusted certificates for validating server certificates.
Related CURL Flags: CURLOPT_CAPATH
HTTP.SSL.VALIDATE
Type: boolean ("1"/"0")
Description: Cause the client to verify the server's presented certificate.
Related CURL Flags: CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST
HTTP.TIMEOUT
Type: String ("dddddd")
Description: Specify the maximum time in seconds that you allow the http transfer operation to take.
Related CURL Flags: CURLOPT_TIMEOUT, CURLOPT_NOSIGNAL
HTTP.PROXY_SERVER
Type: String representing URL to access the proxy (e.g. http://[username:password@]host[:port])
Description: Specify the needed information for accessing a proxy.
Related CURL Flags: CURLOPT_PROXY, CURLOPT_PROXYHOST, CURLOPT_PROXYUSERPWD
The related curl flags line indicates the curl flags modified by this
key. See the libcurl documentation of the curl_easy_setopt() function
for more detail (http://curl.haxx.se/libcurl/c/curl_easy_setopt.html).
For ESG client side key support, the following entries must be specified:
\code
HTTP.SSL.VALIDATE
HTTP.COOKIEJAR
HTTP.SSL.CERTIFICATE
HTTP.SSL.KEY
HTTP.SSL.CAPATH
\endcode
Additionally, for ESG, the HTTP.SSL.CERTIFICATE and HTTP.SSL.KEY
entries should have the same value, which is the file path for the
certificate produced by MyProxyLogon. The HTTP.SSL.CAPATH entry should
be the path to the "certificates" directory produced by MyProxyLogon.
\page netcdf_perf_chunking Improving Performance with Chunking
\tableofcontents
\section chunk_cache The Chunk Cache
When data are first read or written to a netCDF-4/HDF5 variable, the
HDF5 library opens a cache for that variable. The default size of that
cache is settable with the --with-chunk-cache-size option at netCDF
build time.
For good performance your chunk cache must be larger than one chunk of
your data, and preferably large enough to hold multiple chunks
of data.
In addition, when a file is opened (or a variable created in an open
file), the netCDF-4 library checks to make sure the default chunk
cache size will work for that variable. The cache will be large enough
to hold N chunks, up to a maximum size of M bytes. (Both N and M are
settable at configure time with the --with-default-chunks-in-cache and
the --with-max-default-cache-size options to the configure
script. Currently they are set to 10 and 64 MB.)
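
As a rough illustration of the sizing rule just described (and not the library's actual code), the sketch below computes a cache size that holds N chunks but never exceeds M bytes. The function name, its parameters, and the reading of 64 MB as 64 * 2^20 bytes are assumptions made for this example.

```python
def default_cache_bytes(chunk_shape, type_size,
                        n_chunks=10, max_bytes=64 * 1024 * 1024):
    """Bytes needed to hold n_chunks chunks, capped at max_bytes."""
    chunk_bytes = type_size
    for extent in chunk_shape:
        chunk_bytes *= extent  # elements per chunk times element size
    return min(n_chunks * chunk_bytes, max_bytes)


# A 256 x 256 chunk of 4-byte values: ten chunks fit well under the cap.
small = default_cache_bytes((256, 256), 4)
# A 1024 x 256 x 256 chunk of 4-byte values: ten chunks exceed the cap.
large = default_cache_bytes((1024, 256, 256), 4)
```

In the second case a single chunk is already larger than the cap, which is exactly the situation where setting the cache size explicitly is worth considering.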
To change the default chunk cache size, use the set_chunk_cache
function before opening the file. C programmers see
nc_set_chunk_cache(), Fortran 77 programmers see
NF_SET_CHUNK_CACHE. Fortran 90 programmers use the
optional cache_size, cache_nelems, and cache_preemption parameters to
nf90_open/nf90_create to change the chunk cache size before opening the
file.
To change the per-variable cache size, use the set_var_chunk_cache
function at any time on an open file. C programmers see
nc_set_var_chunk_cache(), Fortran 77 programmers see
NF_SET_VAR_CHUNK_CACHE.
\section default_chunking_4_1 The Default Chunking Scheme
When the data writer does not specify chunk sizes for a variable, the
netCDF library has to come up with some default values.
For unlimited dimensions, a chunk size of one is always used. For
large datasets, where the size of fixed dimensions is small compared
to the unlimited dimensions, users are advised to avoid unlimited
dimensions or to increase the chunk sizes of the unlimited
dimensions. Be aware that an unlimited dimension with chunksize > 1
may result in slower performance for record-oriented access patterns
that were common with netCDF-3.
\section chunking_parallel_io Chunking and Parallel I/O
When files are opened for read/write parallel I/O access, the chunk
cache is not used. Therefore it is important to open parallel files
with read-only access when possible, to achieve the best performance.
\section bm_file A Utility to Help Benchmark Results: bm_file
The bm_file utility may be used to copy files, from one netCDF format
to another, changing chunking, filter, parallel I/O, and other
parameters. This program may be used for benchmarking netCDF
performance for user data files with a range of choices, allowing data
producers to pick settings that best serve their user base.
NetCDF must have been configured with --enable-benchmarks at build time
for the bm_file program to be built. When built with
--enable-benchmarks, netCDF will include tests (run with "make check")
that will run the bm_file program on sample data files.
Since data files and their access patterns vary from case to case,
these benchmark tests are intended to suggest further use of the
bm_file program for users.
Here's an example of a call to bm_file:
\code
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:256:256 tst_elena_int_3D.nc
\endcode
Generally a range of settings must be tested. This is best done with a
shell script, which calls bm_file repeatedly, to create output like
this:
\code
*** Running benchmarking program bm_file for simple shorts test files, 1D to 6D...
input format, output_format, input size, output size, meta read time, meta write time, data read time, data write time, enddianness, metadata reread time, data reread time, read rate, write rate, reread rate, deflate, shuffle, chunksize[0], chunksize[1], chunksize[2], chunksize[3]
1, 4, 200092, 207283, 1613, 1054, 409, 312, 0, 1208, 1551, 488.998, 641.026, 128.949, 0, 0, 100000, 0, 0, 0
1, 4, 199824, 208093, 1545, 1293, 397, 284, 0, 1382, 1563, 503.053, 703.211, 127.775, 0, 0, 316, 316, 0, 0
1, 4, 194804, 204260, 1562, 1611, 390, 10704, 0, 1627, 2578, 499.159, 18.1868, 75.5128, 0, 0, 46, 46, 46, 0
1, 4, 167196, 177744, 1531, 1888, 330, 265, 0, 12888, 1301, 506.188, 630.347, 128.395, 0, 0, 17, 17, 17, 17
1, 4, 200172, 211821, 1509, 2065, 422, 308, 0, 1979, 1550, 473.934, 649.351, 129.032, 0, 0, 10, 10, 10, 10
1, 4, 93504, 106272, 1496, 2467, 191, 214, 0, 32208, 809, 488.544, 436.037, 115.342, 0, 0, 6, 6, 6, 6
*** SUCCESS!!!
\endcode
Such tables are suitable for import into spreadsheets, for easy
graphing of results.
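
Besides spreadsheets, such a table can be read with a short script. The following Python sketch parses comma-separated benchmark output of the form shown above and picks the row with the highest write rate. The function name `parse_bm_output` is hypothetical, the sample is abbreviated to three of the rows above, and the column names are taken from the header line as printed.

```python
import csv
import io


def parse_bm_output(text):
    """Parse bm_file-style output into a list of dicts keyed by column name."""
    # Drop banner lines such as "*** Running ..." and "*** SUCCESS!!!".
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("***")]
    reader = csv.reader(io.StringIO("\n".join(lines)), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (float(v) for v in row))) for row in reader]


sample = """\
*** Running benchmarking program bm_file for simple shorts test files, 1D to 6D...
input format, output_format, input size, output size, meta read time, meta write time, data read time, data write time, enddianness, metadata reread time, data reread time, read rate, write rate, reread rate, deflate, shuffle, chunksize[0], chunksize[1], chunksize[2], chunksize[3]
1, 4, 200092, 207283, 1613, 1054, 409, 312, 0, 1208, 1551, 488.998, 641.026, 128.949, 0, 0, 100000, 0, 0, 0
1, 4, 199824, 208093, 1545, 1293, 397, 284, 0, 1382, 1563, 503.053, 703.211, 127.775, 0, 0, 316, 316, 0, 0
1, 4, 194804, 204260, 1562, 1611, 390, 10704, 0, 1627, 2578, 499.159, 18.1868, 75.5128, 0, 0, 46, 46, 46, 0
*** SUCCESS!!!
"""

rows = parse_bm_output(sample)
best = max(rows, key=lambda r: r["write rate"])
```

With this sample, `best` is the row whose first two chunk sizes are 316, which also shows how sharply the write rate can drop for a poorly matched chunk shape (the third row).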
Several test scripts are run during the “make check” of the netCDF
build, in the nc_test4 directory. The following example may be found
in nc_test4/run_bm_elena.sh.
\code
#!/bin/sh
# This shell runs some benchmarks that Elena ran as described here:
# http://hdfeos.org/workshops/ws06/presentations/Pourmal/HDF5_IO_Perf.pdf
# $Id: netcdf.texi,v 1.82 2010/05/15 20:43:13 dmh Exp $
set -e
echo ""
echo "*** Testing the benchmarking program bm_file for simple float file, no compression..."
./bm_file -h -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:16:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:256:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:512:64:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:512:256:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:256:64:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:256:256:256 tst_elena_int_3D.nc
echo '*** SUCCESS!!!'
exit 0
\endcode
The reading that bm_file does can be tailored to match the expected
access pattern.
The bm_file program is controlled with command line options.
\code
./bm_file
bm_file -v [-s N]|[-t V:S:S:S -u V:C:C:C -r V:I:I:I] -o file_out -f N -h -c V:C:C,V:C:C:C -d -m -p -i -e 1|2 file
  [-v]        Verbose
  [-o file]   Output file name
  [-f N]      Output format (1 - classic, 2 - 64-bit offset, 3 - netCDF-4, 4 - netCDF4/CLASSIC)
  [-h]        Print output header
  [-c V:Z:S:C:C:C[,V:Z:S:C:C:C, etc.]] Deflate, shuffle, and chunking parameters for vars
  [-t V:S:S:S[,V:S:S:S, etc.]] Starts for reads/writes
  [-u V:C:C:C[,V:C:C:C, etc.]] Counts for reads/writes
  [-r V:I:I:I[,V:I:I:I, etc.]] Incs for reads/writes
  [-d]        Doublecheck output by rereading each value
  [-m]        Do compare of each data value during doublecheck (slow for large files!)
  [-p]        Use parallel I/O
  [-s N]      Denom of fraction of slowest varying dimension read.
  [-i]        Use MPIIO (only relevant for parallel builds).
  [-e 1|2]    Set the endianness of output (1=little 2=big).
  file        Name of netCDF file
\endcode
\page netcdf_utilities_guide NetCDF Utilities
\tableofcontents
\section cdl_guide CDL Guide
\subsection cdl_syntax CDL Syntax
Below is an example of CDL, describing a netCDF classic format file with several
named dimensions (lat, lon, time), variables (z, t, p, rh, lat, lon,
time), variable attributes (units, _FillValue, valid_range), and some
data.
\code
netcdf foo {    // example netCDF specification in CDL
dimensions:
    lat = 10, lon = 5, time = unlimited;
variables:
    int     lat(lat), lon(lon), time(time);
    float   z(time,lat,lon), t(time,lat,lon);
    double  p(time,lat,lon);
    int     rh(time,lat,lon);
    lat:units = "degrees_north";
    lon:units = "degrees_east";
    time:units = "seconds";
    z:units = "meters";
    z:valid_range = 0., 5000.;
    p:_FillValue = -9999.;
    rh:_FillValue = -1;
data:
    lat   = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
    lon   = -140, -118, -96, -84, -52;
}
\endcode
All CDL statements are terminated by a semicolon. Spaces, tabs, and
newlines can be used freely for readability. Comments may follow the
double slash characters '//' on any line.
A CDL description for a classic model file consists of three optional
parts: dimensions, variables, and data. The variable part may contain
variable declarations and attribute assignments. For the enhanced
model supported by netCDF-4, a CDL description may also include
groups, subgroups, and user-defined types.
A dimension is used to define the shape of one or more of the
multidimensional variables described by the CDL description. A
dimension has a name and a length. At most one dimension in a classic
CDL description can have the unlimited length, which means a variable
using this dimension can grow to any length (like a record number in a
file). Any number of dimensions can be declared of unlimited length in
CDL for an enhanced model file.
A variable represents a multidimensional array of values of the same
type. A variable has a name, a data type, and a shape described by its
list of dimensions. Each variable may also have associated attributes
(see below) as well as data values. The name, data type, and shape of
a variable are specified by its declaration in the variable section of
a CDL description. A variable may have the same name as a dimension;
by convention such a variable contains coordinates of the dimension it
names.
An attribute contains information about a variable or about the whole
netCDF dataset or containing group. Attributes may be used to specify
such properties as units, special values, maximum and minimum valid
values, and packing parameters. Attribute information is represented
by single values or one-dimensional arrays of values. For example,
“units” might be an attribute represented by a string such as
“celsius”. An attribute has an associated variable, a name, a data
type, a length, and a value. In contrast to variables that are
intended for data, attributes are intended for ancillary data or
metadata (data about data).
In CDL, an attribute is designated by a variable and attribute name,
separated by a colon (':'). It is possible to assign global attributes
to the netCDF dataset as a whole by omitting the variable name and
beginning the attribute name with a colon (':'). The data type of an
attribute in CDL, if not explicitly specified, is derived from the
type of the value assigned to it. In the netCDF-4
enhanced model, attributes may be declared to be of user-defined type,
like variables.
The length of an attribute is the number of data values assigned to
it. Multiple values are assigned to non-character attributes by
separating the values with commas (','). All values assigned to an
attribute must be of the same type. In the classic data model,
character arrays are used for textual information. The length of a
character attribute is the number of bytes, and an array of
character values can be represented in string notation. In the
enhanced data model of netCDF-4, variable-length strings are available
as a primitive type, and the length of a string attribute is the
number of string values assigned to it.
In CDL, just as for netCDF, the names of dimensions, variables and
attributes (and, in netCDF-4 files, groups, user-defined types,
compound member names, and enumeration symbols) consist of arbitrary
sequences of alphanumeric characters, underscore '_', period '.', plus
'+', hyphen '-', or at sign '@', but beginning with a letter or
underscore. However, names commencing with underscore are reserved for
system use. Case is significant in netCDF names. A zero-length name is
not allowed. Some widely used conventions restrict names to only
alphanumeric characters or underscores. Names that have trailing space
characters are also not permitted.
Beginning with versions 3.6.3 and 4.0, names may also include UTF-8
encoded Unicode characters as well as other special characters, except
for the character '/', which may not appear in a name (because it is
reserved for path names of nested groups). In CDL, most special
characters are escaped with a backslash '\' character, but that
character is not actually part of the netCDF name. The special
characters that do not need to be escaped in CDL names are underscore
'_', period '.', plus '+', hyphen '-', or at sign '@'. The formal
specification of CDL name syntax is provided in the classic format
specification (see \ref classic_format). Note that by using
special characters in names, you may make your data not compliant with
conventions that have more stringent requirements on valid names for
netCDF components, for example the CF Conventions.
The names for the primitive data types are reserved words in CDL, so
names of variables, dimensions, and attributes must not be primitive
type names.
The optional data section of a CDL description is where netCDF
variables may be initialized. The syntax of an initialization is
simple:
\code
variable = value_1, value_2, ... ;
\endcode
The comma-delimited list of constants may be separated by spaces,
tabs, and newlines. For multidimensional arrays, the last dimension
varies fastest. Thus, row-order rather than column order is used for
matrices. If fewer values are supplied than are needed to fill a
variable, it is extended with the fill value. The types of constants
need not match the type declared for a variable; coercions are done to
convert integers to floating point, for example. All meaningful type
conversions among numeric primitive types are supported.
A special notation for fill values is supported: the ‘_’ character
designates a fill value for variables.
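
The layout rules for the data section (last dimension varies fastest, short lists are padded with the fill value, and '_' stands for a fill value) can be mimicked with a small Python sketch. It is meant only to illustrate the ordering. The function `fill_variable` and its arguments are hypothetical and are not part of any netCDF tool.

```python
def fill_variable(shape, values, fill_value):
    """Arrange a flat list of constants into nested lists in row-major order.

    "_" entries are replaced by fill_value, and missing trailing values
    are padded with fill_value.
    """
    total = 1
    for extent in shape:
        total *= extent
    if len(values) > total:
        raise ValueError("more values than the variable can hold")
    flat = [fill_value if v == "_" else v for v in values]
    flat += [fill_value] * (total - len(flat))

    def fold(data, dims):
        # The last dimension varies fastest, so split on the first one.
        if len(dims) == 1:
            return data
        step = len(data) // dims[0]
        return [fold(data[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]

    return fold(flat, list(shape))


# A 2 x 3 variable given only four constants, one of them "_".
grid = fill_variable((2, 3), [1, "_", 3, 4], -9999)
```

Here `grid` has two rows of three values each: the first row holds the first three constants, with the '_' replaced by the fill value, and the second row starts with 4 and is padded with fill values.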
\subsection cdl_data_types CDL Data Types
The CDL primitive data types for the classic model are:
- char - Characters.
- byte - Eight-bit integers.
- short - 16-bit signed integers.
- int - 32-bit signed integers.
- long - (Deprecated, synonymous with int)
- float - IEEE single-precision floating point (32 bits).
- real - (Synonymous with float).
- double - IEEE double-precision floating point (64 bits).
NetCDF-4 supports the additional primitive types:
- ubyte - Unsigned eight-bit integers.
- ushort - Unsigned 16-bit integers.
- uint - Unsigned 32-bit integers.
- int64 - 64-bit signed integers.
- uint64 - Unsigned 64-bit integers.
- string - Variable-length string of characters.
Except for the added numeric data-types byte and ubyte, CDL supports
the same numeric primitive
data types as C. For backward compatibility, in declarations primitive
type names may be specified in either upper or lower case.
The byte type differs from the char type in that it is intended for
numeric data, and the zero byte has no special significance, as it may
for character data. In the classic data model, byte data could be
interpreted as either signed (-128 to 127) or unsigned (0 to
255). When reading byte data in a way that converts it into another
numeric type, the default interpretation is signed. The netCDF-4
enhanced data model added an unsigned byte type.
The short type holds values between -32768 and
32767. The ushort type holds values between 0 and 65535. The int type
can hold values between -2147483648 and 2147483647. The uint type
holds values between 0 and 4294967295. The int64 type can hold values
between -9223372036854775808 and 9223372036854775807. The uint64 type
can hold values between 0 and 18446744073709551615.
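
The bounds of each integer type follow directly from its bit width and signedness. As a quick check, the Python sketch below derives them; the helper `int_range` and the table layout are illustrative only.

```python
def int_range(bits, signed):
    """Return (minimum, maximum) for a two's-complement integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# CDL integer type names mapped to (bit width, signed).
cdl_int_types = {
    "byte": (8, True), "ubyte": (8, False),
    "short": (16, True), "ushort": (16, False),
    "int": (32, True), "uint": (32, False),
    "int64": (64, True), "uint64": (64, False),
}
ranges = {name: int_range(*spec) for name, spec in cdl_int_types.items()}
```

Note that the largest unsigned value is one less than the corresponding power of two, since zero takes one of the available bit patterns.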
The float type can hold values between about -3.4E+38 and 3.4E+38, with
external representation as 32-bit IEEE normalized single-precision
floating-point numbers. The double type can hold values between about
-1.7E+308 and 1.7E+308, with external representation as 64-bit IEEE
standard normalized double-precision floating-point numbers. The
string type holds variable-length strings.
A netCDF-4 string is a variable-length array of Unicode
characters. When reading or writing a string to a
netCDF file or other external representation, the characters are UTF-8
encoded (note that ASCII is a subset of UTF-8). Libraries may use
different internal representations; for example, the Java library
uses UTF-16 encoding.
The netCDF char type contains uninterpreted characters, one character
per byte. Typically these contain 7-bit ASCII characters, but the
character encoding is application specific. For this reason,
applications writing data using the enhanced data model are encouraged
to use the netCDF-4 string data type in preference to the char data
type. Applications writing string data using the char data type are
encouraged to add the special variable attribute "_Encoding" with a
value that the netCDF libraries recognize. Currently those valid
values are "UTF-8" or "ASCII", case insensitive.
\subsection cdl_constants CDL Notation for Data Constants
This section describes the CDL notation for constants.
Attributes are initialized in the variables section of a CDL
description by providing a list of constants that determines the
attribute's length and type (if primitive and not explicitly
declared). CDL defines a syntax for constant values that permits
distinguishing among different netCDF primitive types. The syntax for
CDL constants is similar to C syntax, with type suffixes appended to
bytes, shorts, and floats to distinguish them from ints and doubles.
A byte constant is represented by an integer constant with a 'b' (or
'B') appended. In the old netCDF-2 API, byte constants could also be
represented using single characters or standard C character escape
sequences such as 'a' or '\n'. This is still supported for backward
compatibility, but deprecated to make the distinction clear between
the numeric byte type and the textual char type. Example byte
constants include:
\code
0b // a zero byte
-1b // -1 as an 8-bit byte
255b // also -1 as a signed 8-bit byte
\endcode
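
The reason 255b and -1b denote the same stored value is that both share the same eight-bit pattern. A minimal Python sketch of that reinterpretation follows; the helper name `as_signed_byte` is made up for this example.

```python
import struct


def as_signed_byte(value):
    """Reinterpret the low eight bits of value as a signed byte."""
    return struct.unpack("b", struct.pack("B", value & 0xFF))[0]


from_255 = as_signed_byte(255)    # bit pattern 0xFF
from_minus_one = as_signed_byte(-1)
```

Both results are -1, matching the default signed interpretation of byte data described earlier.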
Character constants are enclosed in double quotes. A character array
may be represented as a string enclosed in double quotes. Multiple CDL
strings are concatenated into a single array of characters, permitting
long character arrays to appear on multiple lines. To support multiple
variable-length textual values, a conventional delimiter such as ','
or blank may be used, but interpretation of any such convention for a
delimiter must be implemented in software above the netCDF library
layer. The usual escape conventions for C strings are honored. For
example:
\code
"a" // ASCII 'a'
"Two\nlines\n" // a 10-character string with two embedded newlines
"a bell:\007" // a character array containing an ASCII bell
"ab","cde" // the same as "abcde"
\endcode
The form of a short constant is an integer constant with an 's' or 'S'
appended. If a short constant begins with '0', it is interpreted as
octal. When it begins with '0x', it is interpreted as a hexadecimal
constant. For example:
\code
2s // a short 2
0123s // octal
0x7ffs // hexadecimal
\endcode
The form of an int constant is an ordinary integer constant. If an int
constant begins with '0', it is interpreted as octal. When it begins
with '0x', it is interpreted as a hexadecimal constant. Examples of
valid int constants include:
\code
-2
0123 // octal
0x7ff // hexadecimal
1234567890L // deprecated, uses old long suffix
\endcode
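
The prefix rules for short and int constants (leading '0' for octal, leading '0x' for hexadecimal) can be illustrated with a small Python function. This is a simplified sketch, not the ncgen parser: the name `parse_cdl_int` is hypothetical, and only the 's', 'S', 'l', and 'L' suffixes are stripped (unsigned suffixes are not handled).

```python
def parse_cdl_int(text):
    """Return the integer value of a CDL short or int constant."""
    body = text.strip().rstrip("sSlL")  # drop a trailing type suffix
    sign = -1 if body.startswith("-") else 1
    digits = body.lstrip("+-")
    if digits.lower().startswith("0x"):
        return sign * int(digits, 16)   # hexadecimal
    if len(digits) > 1 and digits.startswith("0"):
        return sign * int(digits, 8)    # octal
    return sign * int(digits, 10)       # decimal


values = [parse_cdl_int(s) for s in ["2s", "0123", "0x7ffs", "-2", "1234567890L"]]
```

In particular, 0123 is 83 in decimal and 0x7ff is 2047, which is worth remembering when zero-padding numbers in hand-written CDL.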
The float type is appropriate for representing data with about seven
significant digits of precision. The form of a float constant is the
same as a C floating-point constant with an 'f' or 'F' appended. A
decimal point is required in a CDL float to distinguish it from an
integer. For example, the following are all acceptable float
constants:
\code
-2.0f
3.14159265358979f // will be truncated to less precision
1.f
.1f
\endcode
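
To see what "truncated to less precision" means in practice, the following Python sketch rounds a value through a 32-bit IEEE float using the standard struct module. The helper name `to_float32` is an assumption for this example, and the result is what a 32-bit float stores, not necessarily what any netCDF tool prints.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit IEEE float."""
    return struct.unpack("f", struct.pack("f", value))[0]


original = 3.14159265358979
stored = to_float32(original)
error = abs(stored - original)
```

The stored value agrees with the original to roughly seven significant digits, while values such as -2.0 or 0.5 that are exactly representable pass through unchanged.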
The double type is appropriate for representing floating-point data
with about 16 significant digits of precision. The form of a double
constant is the same as a C floating-point constant. An optional 'd'
or 'D' may be appended. A decimal point is required in a CDL double to
distinguish it from an integer. For example, the following are all
acceptable double constants:
\code
-2.0
3.141592653589793
1.0e-20
1.d
\endcode
Unsigned integer constants can be created by inserting the character
'U' or 'u' between the constant and any trailing size specifier. Thus
one could say 10U, 100us, 100000ul, or 1000000ull, for example.
Constants for the variable-length string type, available as a
primitive type in the netCDF-4 enhanced data model, are, like character
constants, represented using double quotes. This represents a
potential ambiguity since a multi-character string may also indicate a
dimensioned character value. Disambiguation usually occurs by context,
but care should be taken to specify the string type to ensure the
proper choice. For example, these two CDL specifications of global
attributes have different types:
\code
:att1 = "abcd", "efg" ; // a char attribute of length 7
string :att2 = "abcd", "efg" ; // a string attribute of length 2
\endcode
Opaque constants are represented as sequences of hexadecimal digits
preceded by 0X or 0x: 0xaa34ffff, for example. These constants can
still be used as integer constants and will be either truncated or
extended as necessary.
The ncgen man-page reference has more details about CDL representation
of constants of user-defined types.
\section guide_ncdump ncdump
Convert NetCDF file to text form (CDL)
\subsection ncdump_SYNOPSIS ncdump synopsis
\code
ncdump [-chistxw] [-v var1,...] [-b lang] [-f lang]
[-l len] [-n name] [-p n[,n]] [-g grp1,...] file
ncdump -k file
\endcode
\subsection ncdump_DESCRIPTION ncdump description
The \b ncdump utility generates a text representation of a specified
netCDF file on standard output, optionally excluding some or all of
the variable data in the output. The text representation is in a form
called CDL (network Common Data form Language) that can be viewed,
edited, or serve as input to \b ncgen, a companion program that can
generate a binary netCDF file from a CDL file. Hence \b ncgen and \b
ncdump can be used as inverses to transform the data representation
between binary and text representations. See \b ncgen documentation
for a description of CDL and netCDF representations.
\b ncdump may also be used to determine what kind of netCDF file
is used (which variant of the netCDF file format) with the -k
option.
If DAP support was enabled when \b ncdump was built, the file name may
specify a DAP URL. This allows \b ncdump to access data sources from
DAP servers, including data in other formats than netCDF. When used
with DAP URLs, \b ncdump shows the translation from the DAP data
model to the netCDF data model.
\b ncdump may also be used as a simple browser for netCDF data files,
to display the dimension names and lengths; variable names, types, and
shapes; attribute names and values; and optionally, the values of data
for all variables or selected variables in a netCDF file. For
netCDF-4 files, groups and user-defined types are also included in \b
ncdump output.
\b ncdump uses '_' to represent data values that are equal to the
'_FillValue' attribute for a variable, intended to represent
data that has not yet been written. If a variable has no
'_FillValue' attribute, the default fill value for the variable
type is used unless the variable is of byte type.
\b ncdump defines a default display format used for each type of
netCDF data, but this can be changed if a 'C_format' attribute
is defined for a netCDF variable. In this case, \b ncdump will
use the 'C_format' attribute to format each value. For
example, if floating-point data for the netCDF variable 'Z' is
known to be accurate to only three significant digits, it would
be appropriate to use the variable attribute
\code
Z:C_format = "%.3g"
\endcode
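
The effect of a "%.3g" format can be previewed without ncdump, because Python's % operator follows the same conventions as C's printf for this conversion. The sample values below are arbitrary.

```python
samples = [1234.5678, 12.3456, 0.0123456]
formatted = ["%.3g" % value for value in samples]
```

Each value is reduced to three significant digits, and values whose exponent is at least the precision are printed in exponent notation (the first sample becomes 1.23e+03).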
\subsection ncdump_OPTIONS ncdump options
@par -c
Show the values of \e coordinate \e variables (1D variables with the
same names as dimensions) as well as the declarations of all
dimensions, variables, attribute values, groups, and user-defined
types. Data values of non-coordinate variables are not included in
the output. This is usually the most suitable option to use for a
brief look at the structure and contents of a netCDF file.
@par -h
Show only the header information in the output, that is, output only
the declarations for the netCDF dimensions, variables, attributes,
groups, and user-defined types of the input file, but no data values
for any variables. The output is identical to using the '-c' option
except that the values of coordinate variables are not included. (At
most one of '-c' or '-h' options may be present.)
@par -v \a var1,...
@par
The output will include data values for the specified variables, in
addition to the declarations of all dimensions, variables, and
attributes. One or more variables must be specified by name in the
comma-delimited list following this option. The list must be a single
argument to the command, hence cannot contain unescaped blanks or
other white space characters. The named variables must be valid netCDF
variables in the input-file. A variable within a group in a netCDF-4
file may be specified with an absolute path name, such as
'/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or
'grp/var' specifies all matching variable names in the file. The
default, without this option and in the absence of the '-c' or '-h'
options, is to include data values for \e all variables in the output.
@par -b [c|f]
A brief annotation in the form of a CDL comment (text beginning with
the characters '//') will be included in the data section of the
output for each 'row' of data, to help identify data values for
multidimensional variables. If lang begins with 'C' or 'c', then C
language conventions will be used (zero-based indices, last dimension
varying fastest). If lang begins with 'F' or 'f', then FORTRAN
language conventions will be used (one-based indices, first dimension
varying fastest). In either case, the data will be presented in the
same order; only the annotations will differ. This option may be
useful for browsing through large volumes of multidimensional data.
@par -f [c|f]
Full annotations in the form of trailing CDL comments (text beginning
with the characters '//') for every data value (except individual
characters in character arrays) will be included in the data
section. If lang begins with 'C' or 'c', then C language conventions
will be used. If lang begins with 'F' or 'f', then FORTRAN language
conventions will be used. In either case, the data will be presented
in the same order; only the annotations will differ. This option may
be useful for piping data into other filters, since each data value
appears on a separate line, fully identified. (At most one of '-b' or
'-f' options may be present.)
@par -l \e length
@par
Changes the default maximum line length (80) used in formatting lists
of non-character data values.
@par -n \e name
@par
CDL requires a name for a netCDF file, for use by 'ncgen -b' in
generating a default netCDF file name. By default, \b ncdump
constructs this name from the last component of the file name of
the input netCDF file by stripping off any extension it has. Use
the '-n' option to specify a different name. Although the output
file name used by 'ncgen -b' can be specified, it may be wise to
have \b ncdump change the default name to avoid inadvertently
overwriting a valuable netCDF file when using \b ncdump, editing the
resulting CDL file, and using 'ncgen -b' to generate a new netCDF
file from the edited CDL file.
@par -p \e float_digits[, \e double_digits ]
@par
Specifies default precision (number of significant digits) to use in
displaying floating-point or double precision data values for
attributes and variables. If specified, this value overrides the value
of the C_format attribute, if any, for a variable. Floating-point data
will be displayed with \e float_digits significant digits. If \e
double_digits is also specified, double-precision values will be
displayed with that many significant digits. In the absence of any
'-p' specifications, floating-point and double-precision data are
displayed with 7 and 15 significant digits respectively. CDL files can
be made smaller if less precision is required. If both floating-point
and double precisions are specified, the two values must appear
separated by a comma (no blanks) as a single argument to the command.
(To represent every last bit of precision in a CDL file for all
possible floating-point values would require '-p 9,17'.)
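
Why 17 digits are needed for doubles can be checked with a round-trip test: format a value with a given number of significant digits, parse the text back, and compare. The Python sketch below does this for double precision only; the function name `roundtrips_double` is hypothetical.

```python
def roundtrips_double(value, digits):
    """True if value survives formatting with the given significant digits."""
    text = "%.*g" % (digits, value)
    return float(text) == value


x = 0.1 + 0.2  # a double that is not exactly 0.3
ok_with_15 = roundtrips_double(x, 15)
ok_with_17 = roundtrips_double(x, 17)
```

With 15 digits the text collapses to "0.3" and the original bits are lost, while 17 digits recover the value exactly.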
@par -k
Show \e kind of netCDF file, that is, which format variant the file uses.
Other options are ignored if this option is specified. Output will be
one of 'classic', '64-bit offset', 'netCDF-4', or 'netCDF-4 classic
model'.
@par -s
Specifies that \e special virtual attributes should be output for the
file format variant and for variable properties such as
compression, chunking, and other properties specific to the format
implementation that are primarily related to performance rather
than the logical schema of the data. All the special virtual
attributes begin with '_' followed by an upper-case
letter. Currently they include the global attribute '_Format' and
the variable attributes '_ChunkSizes', '_DeflateLevel',
'_Endianness', '_Fletcher32', '_NoFill', '_Shuffle', and '_Storage'.
The \b ncgen utility recognizes these attributes and
supports them appropriately.
@par -t
Controls display of time data, if stored in a variable that uses a
udunits compliant time representation such as 'days since 1970-01-01'
or 'seconds since 2009-03-15 12:01:17'. If this option is specified,
time values are displayed as human-readable date-time strings rather
than numerical values, interpreted in terms of a 'calendar' variable
attribute, if specified. For numeric attributes of time variables,
the human-readable time value is displayed after the actual value, in
an associated CDL comment. Calendar attribute values interpreted with
this option include the CF Conventions values 'gregorian' or
'standard', 'proleptic_gregorian', 'noleap' or '365_day', 'all_leap'
or '366_day', '360_day', and 'julian'.
@par -i
Same as the '-t' option, except output time data as date-time strings
with ISO-8601 standard 'T' separator, instead of a blank.
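To make the '-t' and '-i' behavior concrete, here is a minimal Python sketch of the underlying conversion from a numeric time to a date-time string. It is not the code used by \b ncdump and does not reproduce its exact output format. The function name `decode_time` is hypothetical, only a few unit names are handled, and Python's proleptic Gregorian calendar is assumed, so calendars such as 'noleap' or '360_day' are out of scope.

```python
from datetime import datetime, timedelta

def decode_time(value, units):
    """Convert a numeric time into a datetime for units like 'days since 1970-01-01'."""
    unit, _since, origin_text = units.split(" ", 2)
    seconds_per_unit = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}[unit]
    fmt = "%Y-%m-%d %H:%M:%S" if " " in origin_text else "%Y-%m-%d"
    origin = datetime.strptime(origin_text, fmt)
    return origin + timedelta(seconds=value * seconds_per_unit)

t = decode_time(14318.5, "days since 1970-01-01")
print(t.isoformat(sep=" "))   # blank separator, in the spirit of '-t'
print(t.isoformat(sep="T"))   # ISO-8601 'T' separator, in the spirit of '-i'
```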
@par -g \e grp1,...
@par
The output will include data values only for the specified groups.
One or more groups must be specified by name in the comma-delimited
list following this option. The list must be a single argument to the
command. The named groups must be valid netCDF groups in the
input-file. The default, without this option and in the absence of the
'-c' or '-h' options, is to include data values for all groups in the
output.
@par -w
For file names that request remote access using DAP URLs, access data
with client-side caching of entire variables.
@par -x
Output XML (NcML) instead of CDL. The NcML does not include data values.
The NcML output option currently only works for netCDF classic model data.
\subsection ncdump_EXAMPLES ncdump examples
Look at the structure of the data in the netCDF file foo.nc:
\code
ncdump -c foo.nc
\endcode
Produce an annotated CDL version of the structure and data in the
netCDF file foo.nc, using C-style indexing for the annotations:
\code
ncdump -b c foo.nc > foo.cdl
\endcode
Output data for only the variables uwind and vwind from the netCDF
file foo.nc, and show the floating-point data with only three
significant digits of precision:
\code
ncdump -v uwind,vwind -p 3 foo.nc
\endcode
Produce a fully-annotated (one data value per line) listing of the
data for the variable omega, using FORTRAN conventions for indices,
and changing the netCDF file name in the resulting CDL file to
omega:
\code
ncdump -v omega -f fortran -n omega foo.nc > Z.cdl
\endcode
Examine the translated DDS for the DAP source from the specified URL:
\code
ncdump -h http://test.opendap.org:8080/dods/dts/test.01
\endcode
Without dumping all the data, show the special virtual attributes that indicate
performance-related characteristics of a netCDF-4 file:
\code
ncdump -h -s nc4file.nc
\endcode
\subsection see_also_ncdump SEE ALSO
ncgen(1), netcdf(3)
- \ref guide_ncgen
- \ref guide_nccopy
\subsection ncdump_string_note NOTE ON STRING OUTPUT
For classic, 64-bit offset or netCDF-4 classic model data, \b ncdump
generates line breaks after embedded newlines in displaying character
data. This is not done for netCDF-4 files, because netCDF-4 supports
arrays of real strings of varying length.
\section guide_nccopy nccopy
Copy a netCDF file, optionally changing format, compression, or chunking in the output.
\subsection nccopy_SYNOPSIS nccopy synopsis
\code
nccopy [-k kind] [-d n] [-s] [-c chunkspec] [-u] [-w] [-[v|V] var1,...]
[-[g|G] grp1,...] [-m bufsize] [-h chunk_cache] [-e cache_elems]
[-r] infile outfile
\endcode
\subsection nccopy_DESCRIPTION nccopy description
The \b nccopy utility copies an input netCDF file in any supported
format variant to an output netCDF file, optionally converting the
output to any compatible netCDF format variant, compressing the data,
or rechunking the data. For example, if built with the netCDF-3
library, a netCDF classic file may be copied to a netCDF 64-bit offset
file, permitting larger variables. If built with the netCDF-4
library, a netCDF classic file may be copied to a netCDF-4 file or to
a netCDF-4 classic model file as well, permitting data compression,
efficient schema changes, larger variable sizes, and use of other
netCDF-4 features.
\b nccopy also serves as an example of a generic netCDF-4 program,
with its ability to read any valid netCDF file and handle nested
groups, strings, and user-defined types, including arbitrarily
nested compound types, variable-length types, and data of any valid
netCDF-4 type.
If DAP support was enabled when \b nccopy was built, the file name may
specify a DAP URL. This may be used to convert data on DAP servers to
local netCDF files.
\subsection nccopy_OPTIONS nccopy options
\par -k \e kind
Specifies the kind of file to be created (that is, the format variant)
and, by inference, the data model (i.e. netcdf-3 (classic) versus
netcdf-4 (enhanced)). The possible arguments are as follows. \n
'1' or 'classic' => netCDF classic format \n
'2', '64-bit-offset', or '64-bit offset' => netCDF 64-bit offset format \n
'3', 'hdf5', 'netCDF-4', or 'enhanced' => netCDF-4 format (enhanced data model) \n
'4', 'hdf5-nc3', 'netCDF-4 classic model', or 'enhanced-nc3' => netCDF-4 classic model format \n
\par
If no value for -k is specified, then the output will use the same
format as the input, except if the input is classic or 64-bit offset
and either chunking or compression is specified, in which case the
output will be netCDF-4 classic model format. Note that attempting
some kinds of format conversion will result in an error, if the
conversion is not possible. For example, an attempt to copy a
netCDF-4 file that uses features of the enhanced model, such as groups
or variable-length strings, to any of the other kinds of netCDF
formats that use the classic model will result in an error.
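The default rule for the output kind can be restated as a short Python sketch. This is only a paraphrase of the paragraph above, not code taken from \b nccopy; the function name and the kind strings (borrowed from the output of 'ncdump -k') are choices made for the illustration.

```python
CLASSIC_KINDS = {"classic", "64-bit offset"}

def default_output_kind(input_kind, chunking=False, compression=False):
    """Output kind when -k is not given, as described in the text above."""
    if input_kind in CLASSIC_KINDS and (chunking or compression):
        # Classic and 64-bit offset files cannot hold chunked or compressed data.
        return "netCDF-4 classic model"
    return input_kind

print(default_output_kind("classic", compression=True))
print(default_output_kind("netCDF-4", compression=True))
```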
\par -d \e n
For netCDF-4 output, including netCDF-4 classic model, specify
deflation level (level of compression) for variable data output. 0
corresponds to no compression and 9 to maximum compression, with
higher levels of compression requiring marginally more time to
compress or uncompress than lower levels. Compression achieved may
also depend on output chunking parameters. If this option is
specified for a classic format or 64-bit offset format input file, it
is not necessary to also specify that the output should be netCDF-4
classic model, as that will be the default. If this option is not
specified and the input file has compressed variables, the compression
will still be preserved in the output, using the same chunking as in
the input by default.
\par
Note that \b nccopy requires all variables to be compressed using the
same compression level, but the API has no such restriction. With a
program you can customize compression for each variable independently.
\par -s
For netCDF-4 output, including netCDF-4 classic model, specify
shuffling of variable data bytes before compression or after
decompression. Shuffling refers to interlacing of bytes in a chunk so
that the first bytes of all values are contiguous in storage, followed
by all the second bytes, and so on, which often improves compression.
This option is ignored unless a non-zero deflation level is specified.
Using -d0 to specify no deflation on input data that has been
compressed and shuffled turns off both compression and shuffling in
the output.
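The effect of shuffling can be sketched in a few lines of Python. This is an illustration of the byte rearrangement described above, using `zlib` as a stand-in for the deflate filter; it is not the HDF5 shuffle implementation, and the function names are hypothetical.

```python
import struct
import zlib

def shuffle(data, width):
    """Group byte 0 of every value, then byte 1, and so on."""
    return b"".join(data[i::width] for i in range(width))

def unshuffle(data, width):
    """Invert shuffle(): re-interleave the byte planes into whole values."""
    count = len(data) // width
    planes = [data[i * count:(i + 1) * count] for i in range(width)]
    return bytes(b for value in zip(*planes) for b in value)

# 1000 slowly varying 32-bit big-endian integers.
values = struct.pack(">1000i", *range(1000, 2000))
shuffled = shuffle(values, 4)

# Compare compressed sizes at deflation level 1, without and with shuffling.
print(len(zlib.compress(values, 1)), len(zlib.compress(shuffled, 1)))
```

For data like this, the high-order bytes are nearly constant, so grouping them produces long runs that deflate handles well.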
\par -u
Convert any unlimited size dimensions in the input to fixed size
dimensions in the output. This can speed up variable-at-a-time
access, but slow down record-at-a-time access to multiple variables
along an unlimited dimension.
\par -w
Keep output in memory (as a diskless netCDF file) until output is
closed, at which time the output file is written to disk. This can
greatly speed up operations such as converting an unlimited dimension to
fixed size (-u option), chunking, rechunking, or compressing the
input. It requires that available memory is large enough to hold the
output file. This option may provide a larger speedup than careful
tuning of the -m, -h, or -e options, and it's certainly a lot simpler.
\par -c \e chunkspec
\par
For netCDF-4 output, including netCDF-4 classic model, specify
chunking (multidimensional tiling) for variable data in the output.
This is useful to specify the units of disk access, compression, or
other filters such as checksums. Changing the chunking in a netCDF
file can also greatly speed up access, by choosing chunk shapes that
are appropriate for the most common access patterns.
\par
The chunkspec argument is a string of comma-separated associations,
each specifying a dimension name, a '/' character, and optionally the
corresponding chunk length for that dimension. No blanks should
appear in the chunkspec string, except possibly escaped blanks that
are part of a dimension name. A chunkspec names dimensions along
which chunking is to take place, and omits dimensions which are
not to be chunked or for
which the default chunk length is desired. If a dimension name is
followed by a '/' character but no subsequent chunk length, the actual
dimension length is assumed. If copying a classic model file to a
netCDF-4 output file and not naming all dimensions in the chunkspec,
unnamed dimensions will also use the actual dimension length for the
chunk length. An example of a chunkspec for variables that use
'm' and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200
chunks.
\par
The chunkspec '/' that omits all dimension names and
corresponding chunk lengths specifies that no chunking is to occur in
the output, so can be used to unchunk all the chunked variables.
To see the chunking resulting from copying with a chunkspec,
use the '-s' option of ncdump on the output file.
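The chunkspec syntax described above can be summarized with a small Python parser. This is a sketch written for this guide, not the parser inside \b nccopy. The function name is hypothetical, escaped blanks in dimension names are not handled, and the choice of `None` for "use the full dimension length" and of an empty dictionary for "no chunking" is an assumption of the example.

```python
def parse_chunkspec(spec):
    """Parse 'm/100,n/200' into {'m': 100, 'n': 200}.

    A name with no length ('m/') maps to None, meaning the full
    dimension length. The bare spec '/' returns an empty dict,
    which this sketch uses to mean that no chunking is requested.
    """
    if spec == "/":
        return {}
    chunks = {}
    for item in spec.split(","):
        name, sep, length = item.rpartition("/")
        if not sep or not name:
            raise ValueError("expected name/length, got %r" % item)
        chunks[name] = int(length) if length else None
    return chunks

print(parse_chunkspec("m/100,n/200"))
print(parse_chunkspec("time/"))
```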
\par
As an I/O optimization, \b nccopy has a threshold for the minimum size of
non-record variables that get chunked, currently 8192 bytes. In the future,
use of this threshold and its size may be settable in an option.
\par
Note that \b nccopy requires variables that share a dimension to also
share the chunk size associated with that dimension, but the
programming interface has no such restriction. If you need to
customize chunking for variables independently, you will need to use
the library API in a custom utility program.
\par -v \a var1,...
\par
The output will include data values for the specified variables, in
addition to the declarations of all dimensions, variables, and
attributes. One or more variables must be specified by name in the
comma-delimited list following this option. The list must be a single
argument to the command, hence cannot contain unescaped blanks or
other white space characters. The named variables must be valid netCDF
variables in the input-file. A variable within a group in a netCDF-4
file may be specified with an absolute path name, such as
'/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or
'grp/var' specifies all matching variable names in the file. The
default, without this option, is to include data values for \e all variables
in the output.
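The difference between absolute and relative variable names can be illustrated with a short Python sketch. It assumes that a relative name matches any variable whose path ends with those components; that reading of "all matching variable names" is an assumption, and the function name `matches` is hypothetical.

```python
def matches(spec, full_path):
    """True if a variable name from the option list selects the variable at full_path.

    full_path is an absolute path such as '/GroupA/GroupA2/var'.
    """
    if spec.startswith("/"):
        return spec == full_path            # absolute name: exact match only
    return full_path.endswith("/" + spec)   # relative name: match trailing components

paths = ["/var", "/GroupA/var", "/GroupA/GroupA2/var", "/GroupA/other"]
print([p for p in paths if matches("var", p)])
print([p for p in paths if matches("GroupA2/var", p)])
print([p for p in paths if matches("/GroupA/GroupA2/var", p)])
```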
\par -V \a var1,...
\par
The output will include only the specified variables, along with all dimensions and
global or group attributes. One or more variables must be specified by name in the
comma-delimited list following this option. The list must be a single argument
to the command, hence cannot contain unescaped blanks or other white space
characters. The named variables must be valid netCDF variables in the
input-file. A variable within a group in a netCDF-4 file may be specified with
an absolute path name, such as '/GroupA/GroupA2/var'. Use of a relative path
name such as 'var' or 'grp/var' specifies all matching variable names in the
file. The default, without this option, is to include \e all variables in the
output.
\par -g \e grp1,...
\par
The output will include data values only for the specified groups.
One or more groups must be specified by name in the comma-delimited
list following this option. The list must be a single argument to the
command. The named groups must be valid netCDF groups in the
input-file. The default, without this option, is to include data values for all
groups in the output.
\par -G \e grp1,...
\par
The output will include only the specified groups.
One or more groups must be specified by name in the comma-delimited
list following this option. The list must be a single argument to the
command. The named groups must be valid netCDF groups in the
input-file. The default, without this option, is to include all groups in the
output.
\par -m \e bufsize
\par
An integer or floating-point number that specifies the size, in bytes,
of the copy buffer used to copy large variables. A suffix of K, M, G,
or T multiplies the copy buffer size by one thousand, million,
billion, or trillion, respectively. The default is 5 Mbytes,
but will be increased if necessary to hold at least one chunk of
netCDF-4 chunked variables in the input file. You may want to specify
a value larger than the default for copying large files over high
latency networks. Using the '-w' option may provide better
performance, if the output fits in memory.
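The suffix convention used by '-m' (and by '-h' and '-e') is decimal: K, M, G, and T stand for powers of one thousand. A minimal Python sketch of that conversion follows. The function name is hypothetical, `Decimal` is used only to avoid binary rounding of values such as 4.194304, and truncating any fractional byte is an assumption of the example.

```python
from decimal import Decimal

MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def parse_size(text):
    """Convert strings such as '5M' or '4.194304M' to a whole number."""
    text = text.strip()
    factor = 1
    if text and text[-1] in MULTIPLIERS:
        factor = MULTIPLIERS[text[-1]]
        text = text[:-1]
    return int(Decimal(text) * factor)

print(parse_size("5M"))
print(parse_size("4.194304M"))
```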
\par -h \e chunk_cache
\par
For netCDF-4 output, including netCDF-4 classic model, an integer or
floating-point number that specifies the size in bytes of chunk cache
allocated for each chunked variable. This is not a property of the file, but merely
a performance tuning parameter for avoiding compressing or
decompressing the same data multiple times while copying and changing
chunk shapes. A suffix of K, M, G, or T multiplies the chunk cache
size by one thousand, million, billion, or trillion, respectively.
The default is 4.194304 Mbytes (or whatever was specified for the
configure-time constant CHUNK_CACHE_SIZE when the netCDF library was
built). Ideally, the \b nccopy utility should accept only one memory
buffer size and divide it optimally between a copy buffer and chunk
cache, but no general algorithm for computing the optimum chunk cache
size has been implemented yet. Using the '-w' option may provide
better performance, if the output fits in memory.
\par -e \e cache_elems
\par
For netCDF-4 output, including netCDF-4 classic model, specifies
number of chunks that the chunk cache can hold. A suffix of K, M, G,
or T multiplies the number of chunks that can be held in the cache
by one thousand, million, billion, or trillion, respectively. This is not a
property of the file, but merely a performance tuning parameter for
avoiding compressing or decompressing the same data multiple times
while copying and changing chunk shapes. The default is 1009 (or
whatever was specified for the configure-time constant
CHUNK_CACHE_NELEMS when the netCDF library was built). Ideally, the
\b nccopy utility should determine an optimum value for this parameter,
but no general algorithm for computing the optimum number of chunk
cache elements has been implemented yet.
\par -r
Read netCDF classic or 64-bit offset input file into a diskless netCDF
file in memory before copying. Requires that input file be small
enough to fit into memory. For \b nccopy, this doesn't seem to provide
any significant speedup, so may not be a useful option.
\subsection nccopy_EXAMPLES nccopy examples
Simple Copy
Make a copy of foo1.nc, a netCDF file of any type, to
foo2.nc, a netCDF file of the same type:
\code
nccopy foo1.nc foo2.nc
\endcode
Note that the above copy will not be as fast as using cp or another
simple copy utility, because the file is copied using only the netCDF
API. If the input file has extra bytes after the end of the netCDF
data, those will not be copied, because they are not accessible
through the netCDF interface. If the original file was generated in
'No fill' mode so that fill values are not stored for padding for data
alignment, the output file may have different padding bytes.
Uncompress Data
Convert a netCDF-4 classic model file, compressed.nc, that uses
compression, to a netCDF-3 file classic.nc:
\code
nccopy -k classic compressed.nc classic.nc
\endcode
Note that '1' could be used instead of 'classic'.
Remote Access to Data Subset
Download the variable 'time_bnds' and its associated attributes from
an OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
\code
nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc
\endcode
Note that URLs that name specific variables as command-line arguments
should generally be quoted, to avoid the shell interpreting special
characters such as '?'.
Compress Data
Compress all the variables in the input file foo.nc, a netCDF file of
any type, to the output file bar.nc:
\code
nccopy -d1 foo.nc bar.nc
\endcode
If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
netCDF-4 classic model netCDF file, because the classic and 64-bit
offset format variants don't support compression. If foo.nc was a
netCDF-4 file with some variables compressed using various deflation
levels, the output will also be a netCDF-4 file of the same type, but
all the variables, including any uncompressed variables in the input,
will now use deflation level 1.
Rechunk Data for Faster Access
Assume the input data includes gridded variables that use time, lat,
lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
and that the time dimension varies most slowly. Also assume that
users want quick access to data at all times for a small set of
lat-lon points. Accessing data for 1000 times would typically require
accessing 1000 disk blocks, which may be slow.
Reorganizing the data into chunks on disk that have all the time in
each chunk for a few lat and lon coordinates would greatly speed up
such access. To chunk the data in the input file slow.nc, a netCDF
file of any type, to the output file fast.nc, you could use:
\code
nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc
\endcode
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
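The benefit of this chunk shape can be estimated with a little arithmetic. The Python sketch below counts how many chunks a read has to touch. It models the original layout as one chunk per time step, which is a simplification made for the example rather than a description of how the library stores contiguous data, and the function name is hypothetical.

```python
def chunks_touched(chunk, start, count):
    """Number of chunks that intersect the hyperslab given by start and count."""
    total = 1
    for s, c, n in zip(start, chunk, count):
        first = s // c
        last = (s + n - 1) // c
        total *= last - first + 1
    return total

# Dimensions are (time, lat, lon), each of length 1000.
# Read all 1000 times at a single lat-lon point.
point_series = ((0, 500, 500), (1000, 1, 1))

by_time_slice = chunks_touched((1, 1000, 1000), *point_series)
by_time_column = chunks_touched((1000, 40, 40), *point_series)
print(by_time_slice, by_time_column)
```

The trade-off is that reading a full lat-lon map for one time now touches many chunks, so the chunk shape should follow the most common access pattern.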
If you had enough memory to contain the output file, you could speed
up the rechunking operation significantly by creating the output in
memory before writing it to disk on close:
\code
nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc
\endcode
\subsection see_also_nccopy SEE ALSO
ncdump(1), ncgen(1), netcdf(3)
\section guide_ncgen ncgen
The ncgen tool generates a netCDF file or a C or FORTRAN program that
creates a netCDF dataset. If no options are specified in invoking
ncgen, the program merely checks the syntax of the CDL input,
producing error messages for any violations of CDL syntax.
The ncgen tool is now capable of producing netcdf-4 files. It
operates essentially identically to the original ncgen.
The CDL input to ncgen may include data model constructs from the
netcdf-4 data model. In particular, it includes new primitive types
such as unsigned integers and strings, opaque data, enumerations, and
user-defined constructs using vlen and compound types. The ncgen man
page should be consulted for more detailed information.
UNIX syntax for invoking ncgen:
\code
ncgen [-b] [-o netcdf-file] [-c] [-f] [-k] [-l] [-x] [input-file]
\endcode
where:
-b
Create a (binary) netCDF file. If the '-o' option is absent, a default
file name will be constructed from the netCDF name (specified after
the netcdf keyword in the input) by appending the '.nc'
extension. Warning: if a file already exists with the specified name
it will be overwritten.
-o netcdf-file
Name for the netCDF file created. If this option is specified, it
implies the '-b' option. (This option is necessary because netCDF
files are direct-access files created with seek calls, and hence
cannot be written to standard output.)
-c
Generate C source code that will create a netCDF dataset matching the
netCDF specification. The C source code is written to standard
output. This is only useful for relatively small CDL files, since all
the data is included in variable initializations in the generated
program. The -c flag is deprecated and the -lc flag should be used
instead.
-f
Generate FORTRAN source code that will create a netCDF dataset
matching the netCDF specification. The FORTRAN source code is written
to standard output. This is only useful for relatively small CDL
files, since all the data is included in variable initializations in
the generated program. The -f flag is deprecated and the -lf77 flag
should be used instead.
-k
The -k flag specifies the kind of netCDF file to generate. The
arguments to the -k flag can be as follows.
1, classic – Produce a netcdf classic file format file.
2, 64-bit-offset, '64-bit offset' – Produce a netcdf 64-bit offset format file.
3, hdf5, netCDF-4, enhanced – Produce a netcdf-4 format file.
4, hdf5-nc3, 'netCDF-4 classic model', enhanced-nc3 – Produce a netcdf-4 file format, but restricted to netcdf-3 classic CDL input.
Note that the -v flag is a deprecated alias for -k.
-l
The -l flag specifies that ncgen should output (to standard output)
the text of a program that, when compiled and executed, will produce
the corresponding binary .nc file. The arguments to the -l flag can be
as follows.
c|C => C language output.
f77|fortran77 => FORTRAN 77 language output; note that currently only the classic model is supported for fortran output.
-x
Use “no fill” mode, omitting the initialization of variable values
with fill values. This can make the creation of large files much
faster, but it will also eliminate the possibility of detecting the
inadvertent reading of values that haven't been written.
Examples
Check the syntax of the CDL file foo.cdl:
\code
ncgen foo.cdl
\endcode
From the CDL file foo.cdl, generate an equivalent binary netCDF file
named bar.nc:
\code
ncgen -o bar.nc foo.cdl
\endcode
From the CDL file foo.cdl, generate a C program containing netCDF
function invocations that will create an equivalent binary netCDF
dataset:
\code
ncgen -l c foo.cdl > foo.c
\endcode
\section guide_ncgen3 ncgen3
The ncgen3 tool is the new name for the older, original ncgen utility.
The ncgen3 tool generates a netCDF file or a C or FORTRAN program that
creates a netCDF dataset. If no options are specified in invoking
ncgen3, the program merely checks the syntax of the CDL input,
producing error messages for any violations of CDL syntax.
The ncgen3 utility can only generate classic-model netCDF-4 files or
programs.
UNIX syntax for invoking ncgen3:
\code
ncgen3 [-b] [-o netcdf-file] [-c] [-f] [-v2|-v3] [-x] [input-file]
\endcode
where:
-b
Create a (binary) netCDF file. If the '-o' option is absent, a default
file name will be constructed from the netCDF name (specified after
the netcdf keyword in the input) by appending the '.nc'
extension. Warning: if a file already exists with the specified name
it will be overwritten.
-o netcdf-file
Name for the netCDF file created. If this option is specified, it
implies the '-b' option. (This option is necessary because netCDF
files are direct-access files created with seek calls, and hence
cannot be written to standard output.)
-c
Generate C source code that will create a netCDF dataset matching the
netCDF specification. The C source code is written to standard
output. This is only useful for relatively small CDL files, since all
the data is included in variable initializations in the generated
program.
-f
Generate FORTRAN source code that will create a netCDF dataset
matching the netCDF specification. The FORTRAN source code is written
to standard output. This is only useful for relatively small CDL
files, since all the data is included in variable initializations in
the generated program.
-v2
The generated netCDF file or program will use the version of the
format with 64-bit offsets, to allow for the creation of very large
files. These files are not as portable as classic format netCDF files,
because they require version 3.6.0 or later of the netCDF library.
-v3
The generated netCDF file will be in netCDF-4/HDF5 format. These files
are not as portable as classic format netCDF files, because they
require version 4.0 or later of the netCDF library.
-x
Use “no fill” mode, omitting the initialization of variable values
with fill values. This can make the creation of large files much
faster, but it will also eliminate the possibility of detecting the
inadvertent reading of values that haven't been written.
\page file_format_specifications File Format Specifications
\tableofcontents
\todo Wrap the following group of pages into this. From NetCDF Classic Format Spec through HDF4 SD Format.
\section classic_format_spec The NetCDF Classic Format Specification
To present the format more formally, we use a BNF grammar notation. In
this notation:
- Non-terminals (entities defined by grammar rules) are in lower case.
- Terminals (atomic entities in terms of which the format
specification is written) are in upper case, and are specified
literally as US-ASCII characters within single-quote characters or are
described with text between angle brackets (‘\<’ and ‘\>’).
- Optional entities are enclosed between brackets (‘[’ and ‘]’).
- A sequence of zero or more occurrences of an entity is denoted by
‘[entity ...]’.
- A vertical line character (‘|’) separates alternatives. Alternation
has lower precedence than concatenation.
- Comments follow ‘//’ characters.
- A single byte that is not a printable character is denoted using a
hexadecimal number with the notation ‘\\xDD’, where each D is a
hexadecimal digit.
- A literal single-quote character is denoted by ‘\'’, and a literal
back-slash character is denoted by ‘\\’.
Following the grammar, a few additional notes are included to specify
format characteristics that are impractical to capture in a BNF
grammar, and to note some special cases for implementers. Comments in
the grammar point to the notes and special cases, and help to clarify
the intent of elements of the format.
The Format in Detail
netcdf_file = header data
header = magic numrecs dim_list gatt_list var_list
magic = 'C' 'D' 'F' VERSION
VERSION = \\x01 | // classic format
\\x02 // 64-bit offset format
numrecs = NON_NEG | STREAMING // length of record dimension
dim_list = ABSENT | NC_DIMENSION nelems [dim ...]
gatt_list = att_list // global attributes
att_list = ABSENT | NC_ATTRIBUTE nelems [attr ...]
var_list = ABSENT | NC_VARIABLE nelems [var ...]
ABSENT = ZERO ZERO // Means list is not present
ZERO = \\x00 \\x00 \\x00 \\x00 // 32-bit zero
NC_DIMENSION = \\x00 \\x00 \\x00 \\x0A // tag for list of dimensions
NC_VARIABLE = \\x00 \\x00 \\x00 \\x0B // tag for list of variables
NC_ATTRIBUTE = \\x00 \\x00 \\x00 \\x0C // tag for list of attributes
nelems = NON_NEG // number of elements in following sequence
dim = name dim_length
name = nelems namestring
// Names a dimension, variable, or attribute.
// Names should match the regular expression
// ([a-zA-Z0-9_]|{MUTF8})([^\\x00-\\x1F/\\x7F-\\xFF]|{MUTF8})*
// For other constraints, see "Note on names", below.
namestring = ID1 [IDN ...] padding
ID1 = alphanumeric | '_'
IDN = alphanumeric | special1 | special2
alphanumeric = lowercase | uppercase | numeric | MUTF8
lowercase = 'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|
'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'
uppercase = 'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'
numeric = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
// special1 chars have traditionally been
// permitted in netCDF names.
special1 = '_'|'.'|'@'|'+'|'-'
// special2 chars are recently permitted in
// names (and require escaping in CDL).
// Note: '/' is not permitted.
special2 = ' ' | '!' | '"' | '#' | '$' | '%' | '&' | '\'' |
'(' | ')' | '*' | ',' | ':' | ';' | '<' | '=' |
'>' | '?' | '[' | '\\' | ']' | '^' | '`' | '{' |
'|' | '}' | '~'
MUTF8 = \<multibyte UTF-8 encoded, NFC-normalized Unicode character\>
dim_length = NON_NEG // If zero, this is the record dimension.
// There can be at most one record dimension.
attr = name nc_type nelems [values ...]
nc_type = NC_BYTE |
NC_CHAR |
NC_SHORT |
NC_INT |
NC_FLOAT |
NC_DOUBLE
var = name nelems [dimid ...] vatt_list nc_type vsize begin
// nelems is the dimensionality (rank) of the
// variable: 0 for scalar, 1 for vector, 2
// for matrix, ...
dimid = NON_NEG // Dimension ID (index into dim_list) for
// variable shape. We say this is a "record
// variable" if and only if the first
// dimension is the record dimension.
vatt_list = att_list // Variable-specific attributes
vsize = NON_NEG // Variable size. If not a record variable,
// the amount of space in bytes allocated to
// the variable's data. If a record variable,
// the amount of space per record. See "Note
// on vsize", below.
begin = OFFSET // Variable start location. The offset in
// bytes (seek index) in the file of the
// beginning of data for this variable.
data = non_recs recs
non_recs = [vardata ...] // The data for all non-record variables,
// stored contiguously for each variable, in
// the same order the variables occur in the
// header.
vardata = [values ...] // All data for a non-record variable, as a
// block of values of the same type as the
// variable, in row-major order (last
// dimension varying fastest).
recs = [record ...] // The data for all record variables are
// stored interleaved at the end of the
// file.
record = [varslab ...] // Each record consists of the n-th slab
// from each record variable, for example
// x[n,...], y[n,...], z[n,...] where the
// first index is the record number, which
// is the unlimited dimension index.
varslab = [values ...] // One record of data for a variable, a
// block of values all of the same type as
// the variable in row-major order (last
// index varying fastest).
values = bytes | chars | shorts | ints | floats | doubles
string = nelems [chars]
bytes = [BYTE ...] padding
chars = [CHAR ...] padding
shorts = [SHORT ...] padding
ints = [INT ...]
floats = [FLOAT ...]
doubles = [DOUBLE ...]
padding = <0, 1, 2, or 3 bytes to next 4-byte boundary>
// Header padding uses null (\\x00) bytes. In
// data, padding uses variable's fill value.
// See "Note on padding", below, for a special
// case.
NON_NEG = \<non-negative INT\>
STREAMING = \\xFF \\xFF \\xFF \\xFF // Indicates indeterminate record
// count, allows streaming data
OFFSET = \<non-negative INT\> | // For classic format or
\<non-negative INT64\> // for 64-bit offset format
BYTE = <8-bit byte> // See "Note on byte data", below.
CHAR = <8-bit byte> // See "Note on char data", below.
SHORT = <16-bit signed integer, Bigendian, two's complement>
INT = <32-bit signed integer, Bigendian, two's complement>
INT64 = <64-bit signed integer, Bigendian, two's complement>
FLOAT = <32-bit IEEE single-precision float, Bigendian>
DOUBLE = <64-bit IEEE double-precision float, Bigendian>
// following type tags are 32-bit integers
NC_BYTE = \\x00 \\x00 \\x00 \\x01 // 8-bit signed integers
NC_CHAR = \\x00 \\x00 \\x00 \\x02 // text characters
NC_SHORT = \\x00 \\x00 \\x00 \\x03 // 16-bit signed integers
NC_INT = \\x00 \\x00 \\x00 \\x04 // 32-bit signed integers
NC_FLOAT = \\x00 \\x00 \\x00 \\x05 // IEEE single precision floats
NC_DOUBLE = \\x00 \\x00 \\x00 \\x06 // IEEE double precision floats
// Default fill values for each type, may be
// overridden by variable attribute named
// '_FillValue'. See "Note on fill values",
// below.
FILL_CHAR = \\x00 // null byte
FILL_BYTE = \\x81 // (signed char) -127
FILL_SHORT = \\x80 \\x01 // (short) -32767
FILL_INT = \\x80 \\x00 \\x00 \\x01 // (int) -2147483647
FILL_FLOAT = \\x7C \\xF0 \\x00 \\x00 // (float) 9.9692099683868690e+36
FILL_DOUBLE = \\x47 \\x9E \\x00 \\x00 \\x00 \\x00 \\x00 \\x00 //(double)9.9692099683868690e+36
Note on vsize: This number is the product of the dimension lengths
(omitting the record dimension) and the number of bytes per value
(determined from the type), increased to the next multiple of 4, for
each variable. If a record variable, this is the amount of space per
record (except that, for backward compatibility, it always includes
padding to the next multiple of 4 bytes, even in the exceptional case
noted below under “Note on padding”). The netCDF “record size” is
calculated as the sum of the vsize's of all the record variables.
The vsize field is actually redundant, because its value may be
computed from other information in the header. The 32-bit vsize field
is not large enough to contain the size of variables that require more
than 2^32 - 4 bytes, so 2^32 - 1 is used in the vsize field for such
variables.
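As a rough illustration of the rule above, the following C sketch
computes the value that would be stored in the vsize field. The
function name and its parameters are hypothetical and are not part of
the netCDF library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of the netCDF API: compute the value
// stored in the 32-bit vsize field.
// dimlens:   dimension lengths, leftmost first; for a record variable
//            dimlens[0] is the record dimension and is skipped.
// extsize:   external size in bytes of one value (1, 2, 4, or 8).
uint32_t sketch_vsize(const uint64_t *dimlens, size_t ndims,
                      int is_record, uint64_t extsize)
{
    assert(extsize == 1 || extsize == 2 || extsize == 4 || extsize == 8);
    uint64_t nbytes = extsize;
    for (size_t i = is_record ? 1 : 0; i < ndims; i++)
        nbytes *= dimlens[i];               // assumes no 64-bit overflow
    nbytes = (nbytes + 3) & ~(uint64_t)3;   // round up to a multiple of 4
    // Sizes above 2^32 - 4 do not fit, so 2^32 - 1 is stored instead.
    if (nbytes > UINT32_MAX - 3)
        return UINT32_MAX;
    return (uint32_t)nbytes;
}
```

For a one-dimensional NC_SHORT variable of length 5, this gives 12 (10
bytes of data rounded up to a multiple of 4).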
Note on names: Earlier versions of the netCDF C-library reference
implementation enforced a more restricted set of characters in
creating new names, but permitted reading names containing arbitrary
bytes. This specification extends the permitted characters in names to
include multi-byte UTF-8 encoded Unicode and additional printing
characters from the US-ASCII alphabet. The first character of a name
must be alphanumeric, a multi-byte UTF-8 character, or '_' (reserved
for special names with meaning to implementations, such as the
“_FillValue” attribute). Subsequent characters may also include
printing special characters, except for '/' which is not allowed in
names. Names that have trailing space characters are also not
permitted.
Implementations of the netCDF classic and 64-bit offset format must
ensure that names are normalized according to Unicode NFC
normalization rules during encoding as UTF-8 for storing in the file
header. This is necessary to ensure that gratuitous differences in the
representation of Unicode names do not cause anomalies in comparing
files and querying data objects by name.
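The byte-level parts of these naming rules can be sketched in C as
follows. This is only an approximation under stated assumptions: the
function name is hypothetical, empty names are assumed to be invalid,
and the sketch validates neither UTF-8 sequences nor NFC
normalization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of the netCDF API. Checks only the
// byte-level rules described above. Any byte >= 0x80 is treated as
// part of a multi-byte UTF-8 character without further validation.
// Returns 1 if the name passes these checks, 0 otherwise.
int sketch_name_ok(const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);
    if (n == 0)
        return 0;                   // assumption: empty names rejected
    unsigned char first = (unsigned char)name[0];
    int alnum = (first >= '0' && first <= '9') ||
                (first >= 'A' && first <= 'Z') ||
                (first >= 'a' && first <= 'z');
    if (!alnum && first != '_' && first < 0x80)
        return 0;                   // bad first character
    for (size_t i = 1; i < n; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c == '/')
            return 0;               // '/' is never allowed
        if (c < 0x20 || c == 0x7F)
            return 0;               // not a printing character
    }
    if (name[n - 1] == ' ')
        return 0;                   // trailing space not permitted
    return 1;
}
```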
Note on streaming data: The largest possible record count, 2^32 - 1,
is reserved to indicate an indeterminate number of records. This means
that the number of records in the file must be determined by other
means, such as reading them or computing the current number of records
from the file length and other information in the header. It also
means that the numrecs field in the header will not be updated as
records are added to the file. [This feature is not yet implemented].
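Since this feature is not implemented, the following C sketch is only
one possible way to compute a record count from the file length. It
assumes the file ends right after the last complete record; the
function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: estimate the number of complete records when
// the numrecs field holds the STREAMING value.
// file_len:    total file length in bytes
// first_begin: smallest begin offset among the record variables
// recsize:     bytes per record
uint64_t sketch_numrecs(uint64_t file_len, uint64_t first_begin,
                        uint64_t recsize)
{
    assert(recsize > 0);
    if (file_len <= first_begin)
        return 0;                   // no record data present
    return (file_len - first_begin) / recsize;   // partial record ignored
}
```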
Note on padding: In the special case when there is only one record
variable and it is of type character, byte, or short, no padding is
used between record slabs, so records after the first record do not
necessarily start on four-byte boundaries. However, as noted above
under “Note on vsize”, the vsize field is computed to include padding
to the next multiple of 4 bytes. In this case, readers should ignore
vsize and assume no padding. Writers should store vsize as if padding
were included.
Note on byte data: It is possible to interpret byte data as either
signed (-128 to 127) or unsigned (0 to 255). When reading byte data
through an interface that converts it into another numeric type, the
default interpretation is signed. There are various attribute
conventions for specifying whether bytes represent signed or unsigned
data, but no standard convention has been established. The variable
attribute “_Unsigned” is reserved for this purpose in future
implementations.
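The two interpretations of a stored byte can be illustrated with a
minimal C sketch. The function names are hypothetical; the arithmetic
avoids relying on implementation-defined integer conversions.

```c
#include <stdint.h>

// Interpret one stored byte as signed (-128 to 127), the default
// interpretation described above.
int sketch_byte_signed(uint8_t b)
{
    return b >= 128 ? (int)b - 256 : (int)b;
}

// Interpret the same stored byte as unsigned (0 to 255).
int sketch_byte_unsigned(uint8_t b)
{
    return (int)b;
}
```

For example, the FILL_BYTE value 0x81 reads as -127 when signed and as
129 when unsigned.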
Note on char data: Although the characters used in netCDF names must
be encoded as UTF-8, character data may use other encodings. The
variable attribute “_Encoding” is reserved for this purpose in future
implementations.
Note on fill values: Because data variables may be created before
their values are written, and because values need not be written
sequentially in a netCDF file, default “fill values” are defined for
each type, for initializing data values before they are explicitly
written. This makes it possible to detect reading values that were
never written. The variable attribute “_FillValue”, if present,
overrides the default fill value for a variable. If _FillValue is
defined then it should be scalar and of the same type as the variable.
Fill values are not required, however, because netCDF libraries have
traditionally supported a “no fill” mode when writing, omitting the
initialization of variable values with fill values. This makes the
creation of large files faster, but also eliminates the possibility of
detecting the inadvertent reading of values that haven't been written.
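To relate the default fill bytes in the grammar to the values given in
its comments, the following C sketch decodes big-endian fill patterns.
The function names are hypothetical, and the float decoder assumes the
host uses 32-bit IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Decode 4 big-endian bytes as an IEEE single-precision float.
float sketch_be_float(const unsigned char b[4])
{
    assert(sizeof(float) == 4);     // assumption about the host
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float f;
    memcpy(&f, &bits, sizeof f);    // reinterpret the bit pattern
    return f;
}

// Decode 2 big-endian bytes as a two's-complement 16-bit integer.
int sketch_be_short(const unsigned char b[2])
{
    int v = (b[0] << 8) | b[1];
    return v >= 0x8000 ? v - 0x10000 : v;
}
```

Applied to the FILL_FLOAT bytes 7C F0 00 00 this yields a value of
about 9.97e+36, and applied to the FILL_SHORT bytes 80 01 it yields
-32767.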
\section computing_offsets Notes on Computing File Offsets
The offset (position within the file) of a specified data value in a
classic format or 64-bit offset data file is completely determined by
the variable start location (the offset in the begin field), the
external type of the variable (the nc_type field), and the dimension
indices (one for each of the variable's dimensions) of the value
desired.
The external size in bytes of one data value for each possible netCDF
type, denoted extsize below, is:
- NC_BYTE 1
- NC_CHAR 1
- NC_SHORT 2
- NC_INT 4
- NC_FLOAT 4
- NC_DOUBLE 8
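The table above can be written as a small C lookup. This is a sketch
only: the SK_ prefixed constants mirror the type tags in the grammar
but are local to this example, not the netcdf.h definitions, and the
function name is hypothetical.

```c
#include <stddef.h>

// Type tags as given in the grammar above, with an SK_ prefix to make
// clear they are local to this sketch.
enum sk_type {
    SK_NC_BYTE = 1, SK_NC_CHAR = 2, SK_NC_SHORT = 3,
    SK_NC_INT = 4, SK_NC_FLOAT = 5, SK_NC_DOUBLE = 6
};

// Return the external size in bytes of one value, or 0 if the tag is
// not one of the six classic types.
size_t sketch_extsize(int type)
{
    switch (type) {
    case SK_NC_BYTE:   return 1;
    case SK_NC_CHAR:   return 1;
    case SK_NC_SHORT:  return 2;
    case SK_NC_INT:    return 4;
    case SK_NC_FLOAT:  return 4;
    case SK_NC_DOUBLE: return 8;
    default:           return 0;
    }
}
```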
The record size, denoted by recsize below, is the sum of the vsize
fields of record variables (variables that use the unlimited
dimension), using the actual value determined by dimension sizes and
variable type in case the vsize field is too small for the variable
size.
To compute the offset of a value relative to the beginning of a
variable, it is helpful to precompute a “product vector” from the
dimension lengths. Form the products of the dimension lengths for the
variable from right to left, skipping the leftmost (record) dimension
for record variables, and storing the results as the product vector
for each variable.
For example:
\code
Non-record variable:
dimension lengths: [ 5 3 2 7] product vector: [210 42 14 7]
Record variable:
dimension lengths: [0 2 9 4] product vector: [0 72 36 4]
\endcode
At this point, the leftmost product (for a record variable, the first
product after the record dimension), multiplied by the external size
of the variable's type and rounded up to the next multiple of 4, is
the variable size, vsize, in the grammar above. For example, assuming
a one-byte type such as NC_BYTE, the value of the vsize field for the
non-record variable above is 212 (210 rounded up to a multiple of
4). For the record variable, the value of vsize is just 72, since this
is already a multiple of 4.
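The product vector computation described above can be sketched in C as
follows. The function name is hypothetical. The record dimension is
assumed to be passed with length 0, which is what produces the leading
0 in the record variable example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Fill prod[0..ndims-1] with the right-to-left running products of
// the dimension lengths. For a record variable dimlens[0] is 0 (the
// record dimension), so prod[0] comes out as 0.
void sketch_product_vector(const uint64_t *dimlens, size_t ndims,
                           uint64_t *prod)
{
    assert(ndims == 0 || (dimlens != NULL && prod != NULL));
    uint64_t running = 1;
    for (size_t i = ndims; i-- > 0; ) {
        running *= dimlens[i];      // assumes no 64-bit overflow
        prod[i] = running;
    }
}
```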
Let coord be the array of coordinates (dimension indices, zero-based)
of the desired data value. Multiply each coordinate by the product of
the lengths of the dimensions to its right, that is, by the next entry
of the product vector (the last coordinate is multiplied by 1), and
sum the results; for a record variable, leave the record coordinate,
'coord[0]', out of this sum. The offset of the value from the
beginning of the file is then the file offset of the first data value
of the desired variable (its begin field) plus this sum times the
size, in bytes, of each datum for the variable. Finally, if the
variable is a record variable, the product of the record number,
'coord[0]', and the record size, recsize, is added to yield the final
offset value.
A special case: Where there is exactly one record variable, we drop
the requirement that each record be four-byte aligned, so in this case
there is no record padding.
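Putting these steps together, a minimal C sketch of the offset
computation might look as follows. The function and parameter names
are hypothetical. In the single record variable special case, the
caller is assumed to pass the unpadded record size as recsize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: byte offset from the start of the file of one
// data value.
// begin:     the variable's begin field
// dimlens:   dimension lengths, leftmost first (record dimension
//            first, if any)
// coord:     zero-based indices, one per dimension
// extsize:   external size in bytes of one value
// is_record: nonzero if dimension 0 is the record dimension
// recsize:   record size in bytes (ignored for non-record variables)
uint64_t sketch_value_offset(uint64_t begin, const uint64_t *dimlens,
                             const uint64_t *coord, size_t ndims,
                             uint64_t extsize, int is_record,
                             uint64_t recsize)
{
    assert(!is_record || ndims > 0);
    uint64_t index = 0;    // element index within the variable or record
    uint64_t stride = 1;   // product of dimension lengths to the right
    size_t first = is_record ? 1 : 0;
    for (size_t i = ndims; i-- > first; ) {
        index += coord[i] * stride;
        stride *= dimlens[i];
    }
    uint64_t offset = begin + index * extsize;
    if (is_record)
        offset += coord[0] * recsize;   // skip whole records
    return offset;
}
```

For a one-dimensional NC_SHORT variable whose data begins at offset
80, the value at index 2 is found at offset 80 + 2 * 2 = 84.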
\subsection offset_examples Examples
By using the grammar above, we can derive the smallest valid netCDF
file, having no dimensions, no variables, no attributes, and hence, no
data. A CDL representation of the empty netCDF file is
\code
netcdf empty { }
\endcode
This empty netCDF file has 32 bytes. It begins with the four-byte
“magic number” that identifies it as a netCDF version 1 file: ‘C’,
‘D’, ‘F’, ‘\\x01’. Following are seven 32-bit integer zeros
representing the number of records, an empty list of dimensions, an
empty list of global attributes, and an empty list of variables.
Below is an (edited) dump of the file produced using the Unix command
\code
od -xcs empty.nc
\endcode
Each 16-byte portion of the file is displayed with 4 lines. The first
line displays the bytes in hexadecimal. The second line displays the
bytes as characters. The third line displays each group of two bytes
interpreted as a signed 16-bit integer. The fourth line (added by
hand) presents the interpretation of the bytes in terms of netCDF
components and values.
\code
4344 4601 0000 0000 0000 0000 0000 0000
C D F 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
17220 17921 00000 00000 00000 00000 00000 00000
[magic number ] [ 0 records ] [ 0 dimensions (ABSENT) ]
0000 0000 0000 0000 0000 0000 0000 0000
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
00000 00000 00000 00000 00000 00000 00000 00000
[ 0 global atts (ABSENT) ] [ 0 variables (ABSENT) ]
\endcode
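The 32 bytes of the empty file described above can also be produced
programmatically. The following C sketch, with a hypothetical function
name, fills a caller-supplied buffer with those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Fill buf with the 32 bytes of the empty classic-format file shown
// above and return the number of bytes written. buf must have room
// for at least 32 bytes.
size_t sketch_empty_file(unsigned char *buf)
{
    static const unsigned char magic[4] = { 'C', 'D', 'F', 0x01 };
    assert(buf != NULL);
    memset(buf, 0, 32);                 // seven 32-bit zeros after the magic
    memcpy(buf, magic, sizeof magic);   // 'C' 'D' 'F' and version byte 1
    return 32;
}
```

Writing the buffer to a file, for example with fwrite, should give a
file whose dump matches the one above.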
As a less trivial example, consider the CDL
\code
netcdf tiny {
dimensions:
dim = 5;
variables:
short vx(dim);
data:
vx = 3, 1, 4, 1, 5 ;
}
\endcode
which corresponds to a 92-byte netCDF file. The following is an edited dump of this file:
\code
4344 4601 0000 0000 0000 000a 0000 0001
C D F 001 \0 \0 \0 \0 \0 \0 \0 \n \0 \0 \0 001
17220 17921 00000 00000 00000 00010 00000 00001
[magic number ] [ 0 records ] [NC_DIMENSION ] [ 1 dimension ]
0000 0003 6469 6d00 0000 0005 0000 0000
\0 \0 \0 003 d i m \0 \0 \0 \0 005 \0 \0 \0 \0
00000 00003 25705 27904 00000 00005 00000 00000
[ 3 char name = "dim" ] [ size = 5 ] [ 0 global atts
0000 0000 0000 000b 0000 0001 0000 0002
\0 \0 \0 \0 \0 \0 \0 013 \0 \0 \0 001 \0 \0 \0 002
00000 00000 00000 00011 00000 00001 00000 00002
(ABSENT) ] [NC_VARIABLE ] [ 1 variable ] [ 2 char name =
7678 0000 0000 0001 0000 0000 0000 0000
v x \0 \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 \0
30328 00000 00000 00001 00000 00000 00000 00000
"vx" ] [1 dimension ] [ with ID 0 ] [ 0 attributes
0000 0000 0000 0003 0000 000c 0000 0050
\0 \0 \0 \0 \0 \0 \0 003 \0 \0 \0 \f \0 \0 \0 P
00000 00000 00000 00003 00000 00012 00000 00080
(ABSENT) ] [type NC_SHORT] [size 12 bytes] [offset: 80]
0003 0001 0004 0001 0005 8001
\0 003 \0 001 \0 004 \0 001 \0 005 200 001
00003 00001 00004 00001 00005 -32767
[ 3] [ 1] [ 4] [ 1] [ 5] [fill ]
\endcode
\section offset_format_spec The 64-bit Offset Format
The netCDF 64-bit offset format differs from the classic format only
in the VERSION byte, ‘\\x02’ instead of ‘\\x01’, and the OFFSET entity,
a 64-bit instead of a 32-bit offset from the beginning of the
file. This small format change permits much larger files, but there
are still some practical size restrictions. Each fixed-size variable
and the data for one record's worth of each record variable are still
limited in size to a little less than 4 GiB. The rationale for this
limitation is to permit aggregate access to all the data in a netCDF
variable (or a record's worth of data) on 32-bit platforms.
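A reader can tell the two formats apart from the first four bytes of
the file. The following C sketch shows one way to do this; the
function names are hypothetical and not part of the netCDF library.

```c
#include <assert.h>
#include <stddef.h>

// Inspect the first bytes of a file.
// Returns 1 for the classic format, 2 for the 64-bit offset format,
// and 0 if the bytes are not a 'C' 'D' 'F' magic number followed by
// one of those two version bytes.
int sketch_cdf_version(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    if (len < 4 || bytes[0] != 'C' || bytes[1] != 'D' || bytes[2] != 'F')
        return 0;
    if (bytes[3] == 0x01 || bytes[3] == 0x02)
        return bytes[3];
    return 0;
}

// Size in bytes of the OFFSET entity (the begin field) for a version
// returned by sketch_cdf_version.
size_t sketch_offset_size(int version)
{
    return version == 2 ? 8 : 4;
}
```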
\section netcdf_4_spec The NetCDF-4 Format
The netCDF-4 format implements and expands the netCDF-3 data model by
using an enhanced version of HDF5 as the storage layer. Use is made of
features that are only available in HDF5 version 1.8 and later.
Using HDF5 as the underlying storage layer, netCDF-4 files remove many
of the restrictions for classic and 64-bit offset files. The richer
enhanced model supports user-defined types and data structures,
hierarchical scoping of names using groups, additional primitive types
including strings, larger variable sizes, and multiple unlimited
dimensions. The underlying HDF5 storage layer also supports
per-variable compression, multidimensional tiling, and efficient
dynamic schema changes, so that data need not be copied when adding
new variables to the file schema.
Creating a netCDF-4/HDF5 file with netCDF-4 results in an HDF5
file. The features of netCDF-4 are a subset of the features of HDF5,
so the resulting file can be used by any existing HDF5 application.
Although every file in netCDF-4 format is an HDF5 file, there are HDF5
files that are not netCDF-4 format files, because the netCDF-4 format
intentionally uses a limited subset of the HDF5 data model and file
format features. Some HDF5 features not supported in the netCDF
enhanced model and netCDF-4 format include non-hierarchical group
structures, HDF5 reference types, multiple links to a data object,
user-defined atomic data types, stored property lists, more permissive
rules for data object names, the HDF5 date/time type, and attributes
associated with user-defined types.
A complete specification of HDF5 files is beyond the scope of this
document. For more information about HDF5, see the HDF5 web site:
http://hdf.ncsa.uiuc.edu/HDF5/.
The specification that follows is sufficient to allow HDF5 users to
create files that will be accessible from netCDF-4.
\subsection creation_order Creation Order
The netCDF API maintains the creation order of objects that are
created in the file. The same is not true in HDF5, which maintains the
objects in alphabetical order. Starting in version 1.8 of HDF5, the
ability to maintain creation order was added. Creation order tracking
must be explicitly turned on in the HDF5 data file in several places.
Each group must have link and attribute creation order set. The
following code (from libsrc4/nc4hdf.c) shows how the netCDF-4 library
sets these when creating a group.
\code
/* Create group, with link_creation_order set in the group
* creation property list. */
if ((gcpl_id = H5Pcreate(H5P_GROUP_CREATE)) < 0)
return NC_EHDFERR;
if (H5Pset_link_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
BAIL(NC_EHDFERR);
if (H5Pset_attr_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
BAIL(NC_EHDFERR);
if ((grp->hdf_grpid = H5Gcreate2(grp->parent->hdf_grpid, grp->name,
H5P_DEFAULT, gcpl_id, H5P_DEFAULT)) < 0)
BAIL(NC_EHDFERR);
if (H5Pclose(gcpl_id) < 0)
BAIL(NC_EHDFERR);
\endcode
Each dataset in the HDF5 file must be created with a property list for
which the attribute creation order has been set to creation
ordering. The H5Pset_attr_creation_order function is used to set the
creation ordering of attributes of a variable.
The following example code (from libsrc4/nc4hdf.c) shows how the
creation ordering is turned on by the netCDF library.
\code
/* Turn on creation order tracking. */
if (H5Pset_attr_creation_order(plistid, H5P_CRT_ORDER_TRACKED|
H5P_CRT_ORDER_INDEXED) < 0)
BAIL(NC_EHDFERR);
\endcode
\subsection groups_spec Groups
NetCDF-4 groups are the same as HDF5 groups, but groups in a netCDF-4
file must be strictly hierarchical. In general, HDF5 permits
non-hierarchical structuring of groups (for example, a group that is
its own grandparent). These non-hierarchical relationships are not
allowed in netCDF-4 files.
In the netCDF API, the global attribute becomes a group-level
attribute. That is, each group may have its own global attributes.
The root group of a file is named “/” in the netCDF API, where names
of groups are used. It should be noted that the netCDF API (like the
HDF5 API) makes little use of names, and refers to entities by number.
\subsection dims_spec Dimensions with HDF5 Dimension Scales
Until version 1.8, HDF5 did not have any capability to represent
shared dimensions. With the 1.8 release, HDF5 introduced the dimension
scale feature to allow shared dimensions in HDF5 files.
The dimension scale is unfortunately not exactly equivalent to the
netCDF shared dimension, and this leads to a number of compromises in
the design of netCDF-4.
A netCDF shared dimension consists solely of a length and a name. An
HDF5 dimension scale also includes values for each point along the
dimension, information that is (optionally) included in a netCDF
coordinate variable.
To handle the case of a netCDF dimension without a coordinate
variable, netCDF-4 creates dimension scales of type char, and leaves
the contents of the dimension scale empty. Only the name and length of
the scale are significant. To distinguish this case, netCDF-4 takes
advantage of the NAME attribute of the dimension scale. (Not to be
confused with the name of the scale itself.) In the case of dimensions
without coordinate data, the HDF5 dimension scale NAME attribute is
set to the string: "This is a netCDF dimension but not a netCDF
variable."
In the case where a coordinate variable is defined for a dimension,
the HDF5 dimscale matches the type of the netCDF coordinate variable,
and contains the coordinate data.
A further difficulty arises when an n-dimensional coordinate variable
is defined, where n is greater than one. NetCDF allows such coordinate
variables, but the HDF5 model does not allow dimension scales to be
attached to other dimension scales, making it impossible to completely
represent the multi-dimensional coordinate variables of the netCDF
model.
To capture this information, multidimensional coordinate variables
have an attribute named _Netcdf4Coordinates. The attribute is an array
of H5T_NATIVE_INT, with the netCDF dimension IDs of each of its
dimensions.
The _Netcdf4Coordinates attribute is otherwise hidden by the netCDF
API. It does not appear as one of the attributes for the netCDF
variable involved, except through the HDF5 API.
\subsection dim_spec2 Dimensions without HDF5 Dimension Scales
Starting with the netCDF-4.1 release, netCDF can read HDF5 files which
do not use dimension scales. In this case the netCDF library assigns
dimensions to the HDF5 dataset as needed, based on the length of the
dimension.
When an HDF5 file is opened, each dataset is examined in turn. The
lengths of all the dimensions involved in the shape of the dataset are
determined. Each new (i.e. previously unencountered) length results in
the creation of a phony dimension in the netCDF API.
This will not accurately detect a shared, unlimited dimension in the
HDF5 file, if different datasets have different lengths along this
dimension (possible in HDF5, but not in netCDF).
Note that this is a read-only capability for the netCDF library. When
the netCDF library writes HDF5 files, they always use a dimension
scale for every dimension.
Datasets must have either dimension scales for every dimension, or no
dimension scales at all. Partial dimension scales are not, at this
time, understood by the netCDF library.
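The assignment of phony dimensions by length, as described above, can
be sketched in C as follows. This follows only the description in
this section and is not the library's actual code; the type, the
function name, and the fixed capacity are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_PHONY 64   // arbitrary capacity for this sketch

struct sk_phony_dims {
    uint64_t len[SK_MAX_PHONY];   // length of each phony dimension
    size_t count;                 // number of phony dimensions so far
};

// Return the ID of the phony dimension with the given length,
// creating it if this length has not been seen before.
// Returns -1 if the table is full.
int sketch_phony_dimid(struct sk_phony_dims *pd, uint64_t length)
{
    assert(pd != NULL);
    for (size_t i = 0; i < pd->count; i++)
        if (pd->len[i] == length)
            return (int)i;            // length already encountered
    if (pd->count == SK_MAX_PHONY)
        return -1;
    pd->len[pd->count] = length;      // new length, new phony dimension
    return (int)pd->count++;
}
```

Two datasets that both have a dimension of length 10 would therefore
share one phony dimension, whether or not they were meant to.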
\subsection dim_spec3 Dimension and Coordinate Variable Ordering
In order to preserve creation order, the netCDF-4 library writes
variables in their creation order. Since some variables are also
dimension scales, their order reflects both the order of the
dimensions and the order of the coordinate variables.
However, these may be different. Consider the following code:
\code
/* Create a test file. */
if (nc_create(FILE_NAME, NC_CLASSIC_MODEL|NC_NETCDF4, &ncid)) ERR;
/* Define dimensions in order. */
if (nc_def_dim(ncid, DIM0, NC_UNLIMITED, &dimids[0])) ERR;
if (nc_def_dim(ncid, DIM1, 4, &dimids[1])) ERR;
/* Define coordinate variables in a different order. */
if (nc_def_var(ncid, DIM1, NC_DOUBLE, 1, &dimids[1], &varid[1])) ERR;
if (nc_def_var(ncid, DIM0, NC_DOUBLE, 1, &dimids[0], &varid[0])) ERR;
\endcode
In this case the order of the coordinate variables will be different
from the order of the dimensions.
In practice, this should make little difference in user code. For
code that does depend on the ordering of dimensions, the netCDF
library was updated in version 4.1 to detect this condition and add
the attribute _Netcdf4Dimid to the dimension scales in the HDF5
file. This attribute holds a scalar H5T_NATIVE_INT which is the
(zero-based) dimension ID for this dimension.
If this attribute is present on any dimension scale, it must be
present on all dimension scales in the file.
\subsection vars_spec Variables
Variables in netCDF-4/HDF5 files exactly correspond to HDF5
datasets. The data types match naturally between netCDF and HDF5.
In netCDF classic format, the problem of endianness is solved by
writing all data in big-endian order. The HDF5 library allows data to
be written as either big or little endian, and automatically reorders
the data when it is read, if necessary.
By default, netCDF uses the native types on the machine which writes
the data. Users may change the endianness of a variable (before any
data are written). In that case the specified endian type will be used
in HDF5 (for example, a H5T_STD_I16LE will be used for NC_SHORT, if
little-endian has been specified for that variable.)
- NC_BYTE = H5T_NATIVE_SCHAR
- NC_UBYTE = H5T_NATIVE_UCHAR
- NC_CHAR = H5T_C_S1
- NC_STRING = variable length array of H5T_C_S1
- NC_SHORT = H5T_NATIVE_SHORT
- NC_USHORT = H5T_NATIVE_USHORT
- NC_INT = H5T_NATIVE_INT
- NC_UINT = H5T_NATIVE_UINT
- NC_INT64 = H5T_NATIVE_LLONG
- NC_UINT64 = H5T_NATIVE_ULLONG
- NC_FLOAT = H5T_NATIVE_FLOAT
- NC_DOUBLE = H5T_NATIVE_DOUBLE
The NC_CHAR type represents a single character, and the NC_STRING an
array of characters. This can be confusing because a one-dimensional
array of NC_CHAR is used to represent a string (i.e. a scalar
NC_STRING).
An odd case may arise in which the user defines a variable with the
same name as a dimension, but which is not intended to be the
coordinate variable for that dimension. In this case the string
"_nc4_non_coord_" is prepended to the name of the HDF5 dataset, and
stripped from the name for the netCDF API.
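The stripping of this prefix can be illustrated with a short C
sketch. The macro and function names are hypothetical; only the
prefix string itself comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_NON_COORD_PREFIX "_nc4_non_coord_"

// Return the name as the netCDF API would present it: the HDF5
// dataset name with the prefix skipped, if present. The result points
// into hdf5_name; no memory is allocated.
const char *sketch_strip_non_coord(const char *hdf5_name)
{
    assert(hdf5_name != NULL);
    size_t n = strlen(SK_NON_COORD_PREFIX);
    if (strncmp(hdf5_name, SK_NON_COORD_PREFIX, n) == 0)
        return hdf5_name + n;
    return hdf5_name;
}
```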
\subsection atts_spec Attributes
Attributes in HDF5 and netCDF-4 correspond very closely. Each
attribute in an HDF5 file is represented as an attribute in the
netCDF-4 file, with the exception of the attributes below, which are
ignored by the netCDF-4 API.
- _Netcdf4Coordinates An integer array containing the dimension IDs of
a variable which is a multi-dimensional coordinate variable.
- _nc3_strict When this (scalar, H5T_NATIVE_INT) attribute exists in
the root group of the HDF5 file, the netCDF API will enforce the
netCDF classic model on the data file.
- REFERENCE_LIST This attribute is created and maintained by the HDF5
dimension scale API.
- CLASS This attribute is created and maintained by the HDF5 dimension
scale API.
- DIMENSION_LIST This attribute is created and maintained by the HDF5
dimension scale API.
- NAME This attribute is created and maintained by the HDF5 dimension
scale API.
- _Netcdf4Dimid Holds a scalar H5T_NATIVE_INT that is the (zero-based)
dimension ID for this dimension, needed when dimensions and
coordinate variables are defined in different orders.
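A reader working through the HDF5 API could filter out these
attributes with a check such as the following C sketch. The function
name is hypothetical; the attribute names are those listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Return 1 if the attribute name is one of those listed above as
// ignored by the netCDF-4 API, 0 otherwise.
int sketch_is_hidden_att(const char *name)
{
    static const char *const hidden[] = {
        "_Netcdf4Coordinates", "_nc3_strict", "REFERENCE_LIST",
        "CLASS", "DIMENSION_LIST", "NAME", "_Netcdf4Dimid"
    };
    assert(name != NULL);
    for (size_t i = 0; i < sizeof hidden / sizeof hidden[0]; i++)
        if (strcmp(name, hidden[i]) == 0)
            return 1;
    return 0;
}
```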
\subsection user_defined_spec User-Defined Data Types
Each user-defined data type in an HDF5 file exactly corresponds to a
user-defined data type in the netCDF-4 file. Only base data types
which correspond to netCDF-4 data types may be used. (For example, no
HDF5 reference data types may be used.)
\subsection compression_spec Compression
The HDF5 library provides data compression using the zlib library and
the szlib library. NetCDF-4 only allows users to create data with the
zlib library (due to licensing restrictions on the szlib
library). Since HDF5 supports the transparent reading of the data with
either compression filter, the netCDF-4 library can read data
compressed with szlib (if the underlying HDF5 library is built to
support szlib), but has no way to write data with szlib compression.
With zlib compression (a.k.a. deflation) the user may set a deflation
level from 0 to 9. In our measurements the zero deflation level does
not compress the data, but does incur the performance penalty of
compressing the data. The netCDF API does not allow the user to write
a variable with zlib deflation of 0 - when asked to do so, it turns
off deflation for the variable instead. NetCDF can read an HDF5 file
with deflation of zero, and correctly report that to the user.
\section netcdf_4_classic_spec The NetCDF-4 Classic Model Format
Every classic and 64-bit offset file can be represented as a netCDF-4
file, with no loss of information. There are some significant benefits
to using the simpler netCDF classic model with the netCDF-4 file
format. For example, software that writes or reads classic model data
can write or read netCDF-4 classic model format data by
recompiling/relinking to a netCDF-4 API library, with no or only
trivial changes needed to the program source code. The netCDF-4
classic model format supports this usage by enforcing rules on what
functions may be called to store data in the file, to make sure its
data can be read by older netCDF applications (when relinked to a
netCDF-4 library).
Writing data in this format prevents use of enhanced model features
such as groups, added primitive types not available in the classic
model, and user-defined types. However, performance features of the
netCDF-4 formats that do not require additional features of the
enhanced model, such as per-variable compression and chunking,
efficient dynamic schema changes, and larger variable size limits,
offer potentially significant performance improvements to readers of
data stored in this format, without requiring program changes.
When a file is created via the netCDF API with the NC_CLASSIC_MODEL mode
flag, the library creates an attribute (_nc3_strict) in the root
group. This attribute is hidden by the netCDF API, but is read when
the file is later opened, and used to ensure that no enhanced model
features are written to the file.
\section hdf4_sd_format HDF4 SD Format
Starting with version 4.1, the netCDF libraries can read HDF4 SD
(Scientific Dataset) files. Access is limited to those HDF4 files
created with the Scientific Dataset API. Access is read-only.
Dataset types are translated between HDF4 and netCDF in a
straightforward manner.
- DFNT_CHAR = NC_CHAR
- DFNT_UCHAR, DFNT_UINT8 = NC_UBYTE
- DFNT_INT8 = NC_BYTE
- DFNT_INT16 = NC_SHORT
- DFNT_UINT16 = NC_USHORT
- DFNT_INT32 = NC_INT
- DFNT_UINT32 = NC_UINT
- DFNT_FLOAT32 = NC_FLOAT
- DFNT_FLOAT64 = NC_DOUBLE
*/