/*! \file

NetCDF Users Guide

\page user_guide The NetCDF Users Guide

\ref netcdf_summary

Introduction
- \ref netcdf_interface
- \ref netcdf_format
- \ref performance
- \ref archival
- \ref attribute_conventions
- \ref background
- \ref limitations

Components of a NetCDF Data Set
- \ref data_model
- \ref dimensions
- \ref variables
- \ref attributes
- \ref differences_atts_vars

Data
- \ref external_types
- \ref classic_structures
- \ref user_defined_types
- \ref type_conversion
- \ref data_access
- \ref remote_client

File Structure and Performance
- \ref classic_file_parts
- \ref parts_of_netcdf4
- \ref xdr_layer
- \ref large_file_support
- \ref offset_format_limitations
- \ref classic_format_limitations
- \ref netcdf_3_io
- \ref parallel_access
- \ref interoperability_with_hdf5
- \ref creating_self
- \ref dap_support

Improving Performance with Chunking
- \ref chunk_cache
- \ref default_chunking_4_1
- \ref default_chunking_4_0_1
- \ref chunking_parallel_io
- \ref bm_file

NetCDF Utilities
- \ref cdl_syntax
- \ref cdl_data_types
- \ref cdl_constants
- \ref guide_ncgen
- \ref guide_ncdump
- \ref guide_nccopy
- \ref guide_ncgen3

File Format Specification
- \ref classic_format_spec
- \ref computing_offsets
- \ref offset_examples
- \ref offset_format_spec
- \ref netcdf_4_spec
- \ref netcdf_4_classic_spec
- \ref hdf4_sd_format

\page netcdf_summary Summary

The purpose of the Network Common Data Form (netCDF) interface is to
allow you to create, access, and share array-oriented data in a form
that is self-describing and portable. "Self-describing" means that a
dataset includes information defining the data it contains. "Portable"
means that the data in a dataset is represented in a form that can be
accessed by computers with different ways of storing integers,
characters, and floating-point numbers. Using the netCDF interface for
creating new datasets makes the data portable. Using the netCDF
interface in software for data access, management, analysis, and
display can make the software more generally useful.

The netCDF software includes C, Fortran 77, Fortran 90, and C++
interfaces for accessing netCDF data. These libraries are available
for many common computing platforms.

The community of netCDF users has contributed ports of the software to
additional platforms and interfaces for other programming languages as
well. Source code for netCDF software libraries is freely available to
encourage the sharing of both array-oriented data and the software
that makes the data useful.

This User's Guide presents the netCDF data model. It explains how the
netCDF data model uses dimensions, variables, and attributes to store
data.

Reference documentation for UNIX systems, in the form of UNIX 'man'
pages for the C and FORTRAN interfaces, is also available at the
netCDF web site (http://www.unidata.ucar.edu/netcdf) and with the
netCDF distribution.

The latest version of this document, and the language-specific guides,
can be found at the netCDF web site,
http://www.unidata.ucar.edu/netcdf/docs, along with extensive
additional information about netCDF, including pointers to other
software that works with netCDF data.

Separate documentation of the Java netCDF library can be found at
http://www.unidata.ucar.edu/software/netcdf-java.

\page netcdf_interface The NetCDF Interface

The Network Common Data Form, or netCDF, is an interface to a library
of data access functions for storing and retrieving data in the form
of arrays. An array is an n-dimensional (where n is 0, 1, 2, ...)
rectangular structure containing items which all have the same data
type (e.g., 8-bit character, 32-bit integer). A scalar (simple single
value) is a 0-dimensional array.

NetCDF is an abstraction that supports a view of data as a collection
of self-describing, portable objects that can be accessed through a
simple interface. Array values may be accessed directly, without
knowing details of how the data are stored. Auxiliary information
about the data, such as what units are used, may be stored with the
data. Generic utilities and application programs can access netCDF
datasets and transform, combine, analyze, or display specified fields
of the data. The development of such applications has led to improved
accessibility of data and improved re-usability of software for
array-oriented data management, analysis, and display.

The netCDF software implements an abstract data type, which means that
all operations to access and manipulate data in a netCDF dataset must
use only the set of functions provided by the interface. The
representation of the data is hidden from applications that use the
interface, so that how the data are stored could be changed without
affecting existing programs. The physical representation of netCDF
data is designed to be independent of the computer on which the data
were written.

Unidata supports the netCDF interfaces for C (see <a
href="http://www.unidata.ucar.edu/netcdf/docs/netcdf-c.html#Top" >NetCDF C Interface
Guide</a>), FORTRAN 77 (see <a
href="http://www.unidata.ucar.edu/netcdf/docs/netcdf-f77.html#Top" >NetCDF Fortran 77
Interface Guide</a>), FORTRAN 90 (see <a
href="http://www.unidata.ucar.edu/netcdf/docs/netcdf-f90.html#Top" >NetCDF Fortran 90
Interface Guide</a>), and C++ (see <a
href="http://www.unidata.ucar.edu/netcdf/docs/netcdf-cxx.html#Top" >NetCDF C++ Interface
Guide</a>).

The netCDF library is supported for various UNIX operating systems. An
MS Windows port is also available. Before each major release, the
software is also ported to and tested on a few other operating
systems, with assistance from users who have access to those systems.
Unidata's netCDF software is freely available <a
href="ftp://ftp.unidata.ucar.edu/pub/netcdf">via FTP</a> to encourage
its widespread use.

For detailed installation instructions, see <a
href="http://www.unidata.ucar.edu/netcdf/docs/building.html" >Building NetCDF</a>.

\page netcdf_format The netCDF File Format

Until version 3.6.0, all versions of netCDF employed only one binary
data format, now referred to as netCDF classic format. NetCDF classic
is the default format for all versions of netCDF.

In version 3.6.0 a new binary format was introduced, 64-bit offset
format. Nearly identical to netCDF classic format, it uses 64-bit
offsets (hence the name), and allows users to create far larger
datasets.

In version 4.0.0 a third binary format was introduced: the HDF5
format. Starting with this version, the netCDF library can use HDF5
files as its base format. (Only HDF5 files created with netCDF-4 can
be understood by netCDF-4.)

By default, netCDF uses the classic format. To use the 64-bit offset
or netCDF-4/HDF5 format, set the appropriate constant when creating
the file.

To achieve network-transparency (machine-independence), netCDF classic
and 64-bit offset formats are implemented in terms of an external
representation much like XDR (eXternal Data Representation, see
http://www.ietf.org/rfc/rfc1832.txt), a standard for describing and
encoding data. This representation provides encoding of data into
machine-independent sequences of bits. It has been implemented on a
wide variety of computers, by assuming only that eight-bit bytes can
be encoded and decoded in a consistent way. The IEEE 754
floating-point standard is used for floating-point data
representation.

Descriptions of the overall structure of netCDF classic and 64-bit
offset files are provided later in this manual. See Structure.

The details of the classic and 64-bit offset formats are described in
an appendix. See File Format. However, users are discouraged from
using the format specification to develop independent low-level
software for reading and writing netCDF files, because this could lead
to compatibility problems if the format is ever modified.

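The idea of a machine-independent external representation can be
illustrated with a small sketch. XDR stores multi-byte integers most
significant byte first (big-endian). The two functions below are
hypothetical helpers written for this guide, not part of the netCDF
API; they encode and decode a 32-bit integer using only shifts and
masks on eight-bit bytes, so the bytes produced are the same on any
host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the netCDF API. */

/* Encode a 32-bit signed integer as four bytes, most significant byte
 * first, as XDR does. Only shifts and masks are used, so the byte
 * order of the host does not matter. */
void put_int32_be(unsigned char out[4], int32_t value)
{
    uint32_t u = (uint32_t)value;

    assert(out != NULL);
    out[0] = (unsigned char)((u >> 24) & 0xFFu);
    out[1] = (unsigned char)((u >> 16) & 0xFFu);
    out[2] = (unsigned char)((u >> 8) & 0xFFu);
    out[3] = (unsigned char)(u & 0xFFu);
}

/* Decode four big-endian bytes back into a 32-bit signed integer. */
int32_t get_int32_be(const unsigned char in[4])
{
    uint32_t u;

    assert(in != NULL);
    u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    return (int32_t)u;
}
```

Application code never needs to do this itself: the netCDF library
performs the conversion between the external and the native
representation on every read and write.
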
\section select_format How to Select the Format

With three different base formats, care must be taken in creating data
files to choose the correct base format.

The format of a netCDF file is determined at create time.

When opening an existing netCDF file the netCDF library will
transparently detect its format and adjust accordingly. However,
netCDF library versions earlier than 3.6.0 cannot read 64-bit offset
format files, and library versions before 4.0 can't read netCDF-4/HDF5
files. NetCDF classic format files (even if created by version 3.6.0
or later) remain compatible with older versions of the netCDF library.

Users are encouraged to use netCDF classic format to distribute data,
for maximum portability.

To select 64-bit offset or netCDF-4 format files, C programmers should
pass the flag NC_64BIT_OFFSET or NC_NETCDF4 to the function
nc_create().

In Fortran, pass the flag nf_64bit_offset or nf_netcdf4 to the
function NF_CREATE.

It is also possible to change the default creation format, to convert
a large body of code without changing every create call. C programmers
should see nc_set_default_format(); Fortran programmers should see
NF_SET_DEFAULT_FORMAT.

\subsection classic_format NetCDF Classic Format

The original netCDF format is identified using four bytes in the file
header. All files in this format have “CDF\001” at the beginning of
the file. In this documentation this format is referred to as “netCDF
classic format.”

NetCDF classic format is identical to the format used by every
previous version of netCDF. It has maximum portability, and is still
the default netCDF format.

For some users, the various 2 GiB format limitations of the classic
format become a problem (see \ref classic_format_limitations).

\subsection netcdf_64bit_offset_format NetCDF 64-bit Offset Format

For users who run into the 2 GiB limitations of the classic format,
64-bit offset format is a natural choice. It greatly eases the size
restrictions of netCDF classic files (see
\ref offset_format_limitations).

Files with the 64-bit offsets are identified with a “CDF\002” at the
beginning of the file. In this documentation this format is called
“64-bit offset format.”

Since 64-bit offset format was introduced in version 3.6.0, earlier
versions of the netCDF library can't read 64-bit offset files.

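The identifying bytes described above are enough to tell the formats
apart. The following is a minimal sketch, not the library's own
detection code: the enum and function names are hypothetical, and the
8-byte HDF5 signature is taken from the HDF5 file format rather than
from this guide. Only offset 0 of the file is examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum file_kind { KIND_UNKNOWN, KIND_CLASSIC, KIND_64BIT_OFFSET, KIND_HDF5 };

/* Classify a file from the first bytes of its header.
 * buf holds bytes read from the start of the file; len says how many
 * of them are valid. */
enum file_kind classify_header(const unsigned char *buf, size_t len)
{
    /* Assumption: the standard HDF5 signature, located at offset 0. */
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n' };

    assert(buf != NULL);

    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return KIND_HDF5;
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        if (buf[3] == 1)
            return KIND_CLASSIC;        /* "CDF\001" */
        if (buf[3] == 2)
            return KIND_64BIT_OFFSET;   /* "CDF\002" */
    }
    return KIND_UNKNOWN;
}
```

In practice there is no need to do this by hand, because the library
detects the format transparently when a file is opened. A check like
this is mainly useful in diagnostic tools.
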
\subsection netcdf_4_format NetCDF-4 Format

In version 4.0, netCDF included another new underlying format: HDF5.

NetCDF-4 format files offer new features such as groups, compound
types, variable length arrays, new unsigned integer types, parallel
I/O access, etc. None of these new features can be used with classic
or 64-bit offset files.

NetCDF-4 files can't be created at all, unless the netCDF configure
script is run with --enable-netcdf-4. This also requires version 1.8.0
or later of HDF5.

For the netCDF-4.0 release, netCDF-4 features are only available from
the C and Fortran interfaces. We plan to bring netCDF-4 features to
the C++ API in a future release of netCDF.

NetCDF-4 files can't be read by any version of the netCDF library
previous to 4.0. (But they can be read by HDF5, version 1.8.0 or
better.)

For more discussion of format issues see The NetCDF Tutorial.

\page performance What about Performance?

One of the goals of netCDF is to support efficient access to small
subsets of large datasets. To support this goal, netCDF uses direct
access rather than sequential access. This can be much more efficient
when the order in which data is read is different from the order in
which it was written, or when it must be read in different orders for
different applications.

The amount of overhead for a portable external representation depends
on many factors, including the data type, the type of computer, the
granularity of data access, and how well the implementation has been
tuned to the computer on which it is run. This overhead is typically
small in comparison to the overall resources used by an
application. In any case, the overhead of the external representation
layer is usually a reasonable price to pay for portable data access.

Although efficiency of data access has been an important concern in
designing and implementing netCDF, it is still possible to use the
netCDF interface to access data in inefficient ways: for example, by
requesting a slice of data that requires a single value from each
record. Advice on how to use the interface efficiently is provided in
Structure.

The use of HDF5 as a data format adds significant overhead in metadata
operations, less so in data access operations. We continue to study
the challenge of implementing netCDF-4/HDF5 format without
compromising performance.

\page creating_self Creating Self-Describing Data conforming to Conventions

The mere use of netCDF is not sufficient to make data
"self-describing" and meaningful to both humans and machines. The
names of variables and dimensions should be meaningful and conform to
any relevant conventions. Dimensions should have corresponding
coordinate variables where sensible.

Attributes play a vital role in providing ancillary information. It is
important to use all the relevant standard attributes using the
relevant conventions. For a description of reserved attributes (used
by the netCDF library) and attribute conventions for generic
application software, see \ref attribute_conventions.

A number of groups have defined their own additional conventions and
styles for netCDF data. Descriptions of these conventions, as well as
examples incorporating them, can be accessed from the netCDF
Conventions site, http://www.unidata.ucar.edu/netcdf/conventions.html.

These conventions should be used where suitable. Additional
conventions are often needed for local use. These should be
contributed to the above netCDF conventions site if likely to interest
other users in similar areas.

\page limitations Limitations of NetCDF

The netCDF classic data model is widely applicable to data that can be
organized into a collection of named array variables with named
attributes, but there are some important limitations to the model and
its implementation in software. Some of these limitations have been
removed or relaxed in netCDF-4 files, but still apply to netCDF
classic and netCDF 64-bit offset files.

Currently, netCDF classic and 64-bit offset formats offer a limited
number of external numeric data types: 8-, 16-, and 32-bit integers,
and 32- or 64-bit floating-point numbers. (The netCDF-4 format adds
64-bit integer types and unsigned integer types.)

With the netCDF-4/HDF5 format, new unsigned integers (of various
sizes), 64-bit integers, and the string type allow improved expression
of meaning in scientific data. The new VLEN (variable length) and
COMPOUND types allow users to organize data in new ways.

With the classic netCDF file format, there are constraints that limit
how a dataset is structured to store more than 2 GiBytes of data in a
single netCDF dataset (see \ref classic_format_limitations). A GiByte
is 2^30 or 1,073,741,824 bytes, as compared to a Gbyte, which is
1,000,000,000 bytes. This limitation is a result of 32-bit offsets
used for storing relative offsets within a classic netCDF format
file. Since one of the goals of netCDF is portable data, and some file
systems still can't deal with files larger than 2 GiB, it is best to
keep files that must be portable below this limit. Nevertheless, it is
possible to create and access netCDF files larger than 2 GiB on
platforms that provide support for such files (see
\ref large_file_support).

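As a rough illustration of the arithmetic involved, the sketch below
computes the size of an array variable from its dimension lengths and
element size and compares it with 2 GiB. The function names are
hypothetical, and the check is a deliberately conservative
simplification: the actual structural constraints of the classic
format are more detailed than a single per-variable comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2 GiB = 2 * 2^30 bytes, using the definition of GiByte given above. */
#define TWO_GIB ((uint64_t)2 * 1024 * 1024 * 1024)

/* Hypothetical helper: total size in bytes of an array variable with
 * ndims dimensions of the given lengths and elem_size bytes per value.
 * With ndims == 0 the variable is a scalar holding a single value. */
uint64_t variable_bytes(const uint64_t *dim_lens, size_t ndims,
                        uint64_t elem_size)
{
    uint64_t total = elem_size;
    size_t i;

    assert(ndims == 0 || dim_lens != NULL);
    for (i = 0; i < ndims; i++)
        total *= dim_lens[i];   /* overflow is not checked in this sketch */
    return total;
}

/* Conservative check: nonzero if the variable occupies less than 2 GiB. */
int fits_below_2gib(const uint64_t *dim_lens, size_t ndims,
                    uint64_t elem_size)
{
    return variable_bytes(dim_lens, ndims, elem_size) < TWO_GIB;
}
```

For example, a 1000 x 1000 x 1000 array of 4-byte values needs
4,000,000,000 bytes and fails the check, while the same array of
2-byte values needs 2,000,000,000 bytes and passes it.
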
The new 64-bit offset format allows large files, and makes it easy to
create fixed variables of about 4 GiB, and record variables of about
4 GiB per record (see \ref offset_format_limitations). However, old
netCDF applications will not be able to read the 64-bit offset files
until they are upgraded to at least version 3.6.0 of netCDF (i.e. the
version in which 64-bit offset format was introduced).

With the netCDF-4/HDF5 format, size limitations are further relaxed,
and files can be as large as the underlying file system
supports. NetCDF-4/HDF5 files are unreadable to the netCDF library
before version 4.0.

Another limitation of the classic (and 64-bit offset) model is that
only one unlimited (changeable) dimension is permitted for each netCDF
data set. Multiple variables can share an unlimited dimension, but
then they must all grow together. Hence the classic netCDF model does
not permit variables with several unlimited dimensions or the use of
multiple unlimited dimensions in different variables within the same
dataset. Variables that have non-rectangular shapes (for example,
ragged arrays) cannot be represented conveniently.

In netCDF-4/HDF5 files, multiple unlimited dimensions are fully
supported. Any variable can be defined with any combination of limited
and unlimited dimensions.

The extent to which data can be completely self-describing is limited:
there is always some assumed context without which sharing and
archiving data would be impractical. NetCDF permits storing meaningful
names for variables, dimensions, and attributes; units of measure in a
form that can be used in computations; text strings for attribute
values that apply to an entire data set; and simple kinds of
coordinate system information. But for more complex kinds of metadata
(for example, the information necessary to provide accurate
georeferencing of data on unusual grids or from satellite images), it
is often necessary to develop conventions.

Specific additions to the netCDF data model might make some of these
conventions unnecessary or allow some forms of metadata to be
represented in a uniform and compact way. For example, adding explicit
georeferencing to the netCDF data model would simplify elaborate
georeferencing conventions at the cost of complicating the model. The
problem is finding an appropriate trade-off between the richness of
the model and its generality (i.e., its ability to encompass many
kinds of data). A data model tailored to capture the shared context
among researchers within one discipline may not be appropriate for
sharing or combining data from multiple disciplines.

The classic netCDF data model (which is used for classic-format and
64-bit offset format data) does not support nested data structures
such as trees, nested arrays, or other recursive structures. Through
use of indirection and conventions it is possible to represent some
kinds of nested structures, but the result may fall short of the
netCDF goal of self-describing data.

In netCDF-4/HDF5 format files, the introduction of the compound type
allows the creation of complex data types, involving any combination
of types. The VLEN type allows efficient storage of ragged arrays, and
the introduction of hierarchical groups allows users new ways to
organize data.

Finally, using the netCDF-3 programming interfaces, concurrent access
to a netCDF dataset is limited. One writer and multiple readers may
access data in a single dataset simultaneously, but there is no
support for multiple concurrent writers.

NetCDF-4 supports parallel read/write access to netCDF-4/HDF5 files
using the underlying HDF5 library, and parallel read/write access to
classic and 64-bit offset files using the parallel-netcdf library.

For more information about HDF5, see the HDF5 web site:
http://hdfgroup.org/HDF5/.

For more information about parallel-netcdf, see the parallel-netcdf
web site: http://www.mcs.anl.gov/parallel-netcdf.

\page data_model The Data Model

A netCDF dataset contains dimensions, variables, and attributes, which
all have both a name and an ID number by which they are
identified. These components can be used together to capture the
meaning of data and relations among data fields in an array-oriented
dataset. The netCDF library allows simultaneous access to multiple
netCDF datasets which are identified by dataset ID numbers, in
addition to ordinary file names.

\section enhanced_data_model Enhanced Data Model in NetCDF-4/HDF5 Files

Files created with the netCDF-4 format have access to an enhanced data
model, which includes named groups. Groups, like directories in a Unix
file system, are hierarchically organized, to arbitrary depth. They
can be used to organize large numbers of variables.

\image html nc4-model.png "Enhanced NetCDF Data Model"
\image latex nc4-model.png "Enhanced NetCDF Data Model"
\image rtf nc4-model.png "Enhanced NetCDF Data Model"

Each group acts as an entire netCDF dataset in the classic model. That
is, each group may have attributes, dimensions, and variables, as well
as other groups.

The default group is the root group, which allows the classic netCDF
data model to fit neatly into the new model.

Dimensions are scoped such that they can be seen in all descendant
groups. That is, dimensions can be shared between variables in
different groups, if they are defined in a parent group.

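The scoping rule for dimensions can be sketched with a toy in-memory
model. The structures and the function below are hypothetical and are
not the netCDF library's own data structures; they only show that a
lookup starts in a group and then walks up through its ancestors, so
a dimension defined in a parent is visible in every descendant but
not in the other direction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model, for illustration only. */
struct dim {
    const char *name;
    size_t len;
};

struct group {
    const char *name;
    const struct group *parent;   /* NULL for the root group */
    const struct dim *dims;       /* dimensions defined in this group */
    size_t ndims;
};

/* Look up a dimension by name, starting in group g and walking up
 * through its ancestors. Returns NULL if no visible dimension has
 * that name. */
const struct dim *find_visible_dim(const struct group *g, const char *name)
{
    assert(name != NULL);
    for (; g != NULL; g = g->parent) {
        size_t i;
        for (i = 0; i < g->ndims; i++) {
            if (strcmp(g->dims[i].name, name) == 0)
                return &g->dims[i];
        }
    }
    return NULL;
}
```

With a root group that defines "time" and "lat" and a child group
that defines "level", a lookup of "lat" from the child succeeds,
while a lookup of "level" from the root or from a sibling group
returns NULL.
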
In netCDF-4 files, the user may also define a type. For example, a
compound type may hold information from an array of C structures, and
a variable length type allows the user to read and write arrays of
variable length values.

Variables, groups, and types share a namespace. Within the same group,
variables, groups, and types must have unique names. (That is, a type
and variable may not have the same name within the same group, and
similarly for sub-groups of that group.)

Groups and user-defined types are only available in files created in
the netCDF-4/HDF5 format. They are not available for classic or 64-bit
offset format files.

\page object_name Name

\section permitted_characters Permitted Characters in NetCDF Names

The names of dimensions, variables and attributes (and, in netCDF-4
files, groups, user-defined types, compound member names, and
enumeration symbols) consist of arbitrary sequences of alphanumeric
characters, underscore '_', period '.', plus '+', hyphen '-', or at
sign '@', but beginning with an alphanumeric character or
underscore. However, names commencing with underscore are reserved for
system use.

Beginning with versions 3.6.3 and 4.0, names may also include UTF-8
encoded Unicode characters as well as other special characters, except
for the character '/', which may not appear in a name.

Names that have trailing space characters are also not permitted.

Case is significant in netCDF names.

\section name_length Name Length

A zero-length name is not allowed.

Names longer than ::NC_MAX_NAME will not be accepted by any netCDF
define function. An error of ::NC_EMAXNAME will be returned.

All netCDF inquiry functions will return names of maximum size
::NC_MAX_NAME for netCDF files. Since this does not include the
terminating null character, space should be reserved for
::NC_MAX_NAME + 1 characters.

\section name_conventions Conventions

Some widely used conventions restrict names to only alphanumeric
characters or underscores.

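The basic naming rule and the length limit can be combined in a short
check. This is a sketch under stated assumptions, not the validation
the library performs: the function name is hypothetical, the limit of
256 is assumed to match NC_MAX_NAME in netcdf.h, and only the ASCII
rule is covered (the UTF-8 and special-character extension introduced
in 3.6.3 and 4.0 is not). Names that begin with an underscore pass
the check even though they are reserved for system use.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 256 is the value of NC_MAX_NAME in netcdf.h. It is
 * redefined here under another name so the sketch has no dependencies. */
#define SKETCH_MAX_NAME 256

/* Hypothetical checker for the basic (ASCII-only) naming rule.
 * Returns nonzero if the name is acceptable. */
int is_basic_netcdf_name(const char *name)
{
    size_t len, i;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > SKETCH_MAX_NAME)
        return 0;               /* empty or too long */

    /* The first character must be alphanumeric or an underscore. */
    if (!isalnum((unsigned char)name[0]) && name[0] != '_')
        return 0;

    /* Remaining characters: alphanumeric or one of _ . + - @
     * This also rejects '/' and trailing spaces. */
    for (i = 1; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && strchr("_.+-@", c) == NULL)
            return 0;
    }
    return 1;
}
```

When reading names back through an inquiry function, a buffer
declared with room for the maximum name length plus one character
leaves space for the terminator.
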
\page archival Is NetCDF a Good Archive Format?

NetCDF classic or 64-bit offset formats can be used as a
general-purpose archive format for storing arrays. Compression of data
is possible with netCDF (e.g., using arrays of eight-bit or 16-bit
integers to encode low-resolution floating-point numbers instead of
arrays of 32-bit numbers), or the resulting data file may be
compressed before storage (but must be uncompressed before it is
read). Hence, using these netCDF formats may require more space than
special-purpose archive formats that exploit knowledge of particular
characteristics of specific datasets.

With netCDF-4/HDF5 format, the zlib library can provide compression on
a per-variable basis. That is, some variables may be compressed,
others not. In this case the compression and decompression of data
happen transparently to the user, and the data may be stored, read,
and written compressed.

\page attribute_conventions Attribute Conventions

Attribute conventions are assumed by some netCDF generic applications,
e.g., ‘units’ as the name for a string attribute that gives the units
for a netCDF variable.

It is strongly recommended that applicable conventions be followed
unless there are good reasons for not doing so. Below we list the
names and meanings of recommended standard attributes that have proven
useful. Note that some of these (e.g. units, valid_range,
scale_factor) assume numeric data and should not be used with
character data.

\note Attribute names commencing with underscore ('_') are reserved
for use by the netCDF library.

\section units

A character string that specifies the units used for the variable's
data. Unidata has developed a freely-available library of routines to
convert between character string and binary forms of unit
specifications and to perform various useful operations on the binary
forms. This library is used in some netCDF applications. Using the
recommended units syntax permits data represented in conformable units
to be automatically converted to common units for arithmetic
operations. For more information see Units.

\section long_name

A long descriptive name. This could be used for labeling plots, for
example. If a variable has no long_name attribute assigned, the
variable name should be used as a default.

\section _FillValue

The _FillValue attribute specifies the fill value used to pre-fill
disk space allocated to the variable. Such pre-fill occurs unless
nofill mode is set using nc_set_fill(). The fill value is returned
when reading values that were never written. If _FillValue is defined,
it should be scalar and of the same type as the variable. If the
variable is packed using scale_factor and add_offset attributes (see
below), the _FillValue attribute should have the data type of the
packed data.

It is not necessary to define your own _FillValue attribute for a
variable if the default fill value for the type of the variable is
adequate. However, use of the default fill value for data type byte is
not recommended. Note that if you change the value of this attribute,
the changed value applies only to subsequent writes; previously
written data are not changed.

Generic applications often need to write a value to represent
undefined or missing values. The fill value provides an appropriate
value for this purpose because it is normally outside the valid range
and therefore treated as missing when read by generic applications. It
is legal (but not recommended) for the fill value to be within the
valid range.

\section missing_value

This attribute is not treated in any special way by the library or
conforming generic applications, but is often useful documentation and
may be used by specific applications. The missing_value attribute can
be a scalar or vector containing values indicating missing data. These
values should all be outside the valid range so that generic
applications will treat them as missing.

When scale_factor and add_offset are used for packing, the value(s) of
the missing_value attribute should be specified in the domain of the
data in the file (the packed data), so that missing values can be
detected before the scale_factor and add_offset are applied.

\section valid_min

A scalar specifying the minimum valid value for this variable.

\section valid_max

A scalar specifying the maximum valid value for this variable.

\section valid_range

A vector of two numbers specifying the minimum and maximum valid
values for this variable, equivalent to specifying values for both
valid_min and valid_max attributes. Any of these three attributes
(valid_min, valid_max, valid_range) defines the valid range. The
attribute valid_range must not be defined if either valid_min or
valid_max is defined.

Generic applications should treat values outside the valid range as
missing. The type of each valid_range, valid_min and valid_max
attribute should match the type of its variable (except that for byte
data, these can be of a signed integral type to specify the intended
range).

If neither valid_min, valid_max nor valid_range is defined then
generic applications should define a valid range as follows. If the
data type is byte and _FillValue is not explicitly defined, then the
valid range should include all possible values. Otherwise, the valid
range should exclude the _FillValue (whether defined explicitly or by
default) as follows. If the _FillValue is positive then it defines a
valid maximum, otherwise it defines a valid minimum. For integer
types, there should be a difference of 1 between the _FillValue and
this valid minimum or maximum. For floating point types, the
difference should be twice the minimum possible (1 in the least
significant bit) to allow for rounding error.

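For integer types, the default rule above can be written down
directly. The structure and function names in this sketch are
hypothetical, and it covers only the integer case: the exception for
byte data without an explicit _FillValue and the floating-point rule
are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration: an inclusive range whose bounds
 * are optional. has_min and has_max say which bounds are set. */
struct int_range {
    int has_min, has_max;
    long min, max;
};

/* Derive the default valid range of an integer variable from its fill
 * value. A positive fill value bounds the range from above, any other
 * fill value bounds it from below, with a difference of 1 either way. */
struct int_range default_valid_range(long fill_value)
{
    struct int_range r = { 0, 0, 0, 0 };

    if (fill_value > 0) {
        r.has_max = 1;
        r.max = fill_value - 1;
    } else {
        r.has_min = 1;
        r.min = fill_value + 1;
    }
    return r;
}

/* Nonzero if value lies inside the range. */
int in_range(const struct int_range *r, long value)
{
    assert(r != NULL);
    if (r->has_min && value < r->min)
        return 0;
    if (r->has_max && value > r->max)
        return 0;
    return 1;
}
```

With a fill value of -32767, for instance, the derived range has a
valid minimum of -32766 and no maximum, so the fill value itself is
treated as missing while every larger value is accepted.
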
If the variable is packed using scale_factor and add_offset attributes
(see below), the _FillValue, missing_value, valid_range, valid_min, or
valid_max attributes should have the data type of the packed data.

\section scale_factor

If present for a variable, the data are to be multiplied by this
factor after the data are read by the application that accesses the
data.

If valid values are specified using the valid_min, valid_max,
valid_range, or _FillValue attributes, those values should be
specified in the domain of the data in the file (the packed data), so
that they can be interpreted before the scale_factor and add_offset
are applied.

\section add_offset
|
||
|
||
If present for a variable, this number is to be added to the data
|
||
after it is read by the application that accesses the data. If both
|
||
scale_factor and add_offset attributes are present, the data are first
|
||
scaled before the offset is added. The attributes scale_factor and
|
||
add_offset can be used together to provide simple data compression to
|
||
store low-resolution floating-point data as small integers in a netCDF
|
||
dataset. When scaled data are written, the application should first
|
||
subtract the offset and then divide by the scale factor, rounding the
|
||
result to the nearest integer to avoid a bias caused by truncation
|
||
towards zero.
|
||
|
||
When scale_factor and add_offset are used for packing, the associated
|
||
variable (containing the packed data) is typically of type byte or
|
||
short, whereas the unpacked values are intended to be of type float or
|
||
double. The attributes scale_factor and add_offset should both be of
|
||
the type intended for the unpacked data, e.g. float or double.
|
||
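The packing and unpacking steps described above might be sketched in C
as follows. The function names pack_value and unpack_value are
hypothetical; the sketch assumes the packed type is short, and it
omits the range checks a real application would need before
converting to short.

```c
// Unpack a stored value: scale first, then add the offset.
static double unpack_value(short packed, double scale_factor,
                           double add_offset)
{
    return packed * scale_factor + add_offset;
}

// Pack a value for storage: subtract the offset, divide by the scale
// factor, then round to the nearest integer (halves away from zero)
// rather than truncating towards zero.
static short pack_value(double value, double scale_factor,
                        double add_offset)
{
    double scaled = (value - add_offset) / scale_factor;
    return (short)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

With scale_factor = 0.5 and add_offset = 1.0, the value -3.75 packs to
-10, which unpacks to -4.0, the nearest representable value.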
|
||
\section signedness
|
||
|
||
Deprecated attribute, originally designed to indicate whether byte
|
||
values should be treated as signed or unsigned. The attributes
|
||
valid_min and valid_max may be used for this purpose. For example, if
|
||
you intend that a byte variable store only non-negative values, you
|
||
can use valid_min = 0 and valid_max = 255. This attribute is ignored
|
||
by the netCDF library.
|
||
|
||
\section C_format
|
||
|
||
A character array providing the format that should be used by C
|
||
applications to print values for this variable. For example, if you
|
||
know a variable is only accurate to three significant digits, it would
|
||
be appropriate to define the C_format attribute as "%.3g". The ncdump
|
||
utility program uses this attribute for variables for which it is
|
||
defined. The format applies to the scaled (internal) type and value,
|
||
regardless of the presence of the scaling attributes scale_factor and
|
||
add_offset.
|
||
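A C application might apply such a format roughly as shown below. The
helper format_value is hypothetical, and the sketch assumes that the
attribute text has already been read into a string, that it is
trusted, and that it contains exactly one conversion taking a double.

```c
#include <stdio.h>

// Format one value using the text of a C_format attribute, for
// example "%.3g". Returns the number of characters that the full
// result needs, as snprintf does.
static int format_value(char *buf, size_t len, const char *c_format,
                        double value)
{
    return snprintf(buf, len, c_format, value);
}
```

Formatting 3.14159 with "%.3g" in this way produces the text 3.14.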
|
||
\section FORTRAN_format
|
||
|
||
A character array providing the format that should be used by FORTRAN
|
||
applications to print values for this variable. For example, if you
|
||
know a variable is only accurate to three significant digits, it would
|
||
be appropriate to define the FORTRAN_format attribute as "(G10.3)".
|
||
|
||
\section title
|
||
|
||
A global attribute that is a character array providing a succinct
|
||
description of what is in the dataset.
|
||
|
||
\section history
|
||
|
||
A global attribute for an audit trail. This is a character array with
|
||
a line for each invocation of a program that has modified the
|
||
dataset. Well-behaved generic netCDF applications should append a line
|
||
containing: date, time of day, user name, program name and command
|
||
arguments.
|
||
|
||
\section Conventions
|
||
|
||
If present, 'Conventions' is a global attribute that is a character
|
||
array for the name of the conventions followed by the
|
||
dataset. Originally, these conventions were named by a string that was
|
||
interpreted as a directory name relative to the directory
|
||
/pub/netcdf/Conventions/ on the host ftp.unidata.ucar.edu. The web
|
||
page http://www.unidata.ucar.edu/netcdf/conventions.html is now the
|
||
preferred and authoritative location for registering a URI reference
|
||
to a set of conventions maintained elsewhere. The FTP site will be
|
||
preserved for compatibility with existing references, but authors of
|
||
new conventions should submit a request to
|
||
support-netcdf@unidata.ucar.edu for listing on the Unidata conventions
|
||
web page.
|
||
|
||
It may be convenient for defining institutions and groups to use a
|
||
hierarchical structure for general conventions and more specialized
|
||
conventions. For example, if a group named NUWG agrees upon a set of
|
||
conventions for dimension names, variable names, required attributes,
|
||
and netCDF representations for certain discipline-specific data
|
||
structures, they may store a document describing the agreed-upon
|
||
conventions in a dataset in the NUWG/ subdirectory of the Conventions
|
||
directory. Datasets that followed these conventions would contain a
|
||
global Conventions attribute with value "NUWG".
|
||
|
||
Later, if the group agrees upon some additional conventions for a
|
||
specific subset of NUWG data, for example time series data, the
|
||
description of the additional conventions might be stored in the
|
||
NUWG/Time_series/ subdirectory, and datasets that adhered to these
|
||
additional conventions would use the global Conventions attribute with
|
||
value "NUWG/Time_series", implying that this dataset adheres to the
|
||
NUWG conventions and also to the additional NUWG time-series
|
||
conventions.
|
||
|
||
It is possible for a netCDF file to adhere to more than one set of
|
||
conventions, even when there is no inheritance relationship among the
|
||
conventions. In this case, the value of the 'Conventions' attribute
|
||
may be a single text string containing a list of the convention names
|
||
separated by blank space (recommended) or commas (if a convention name
|
||
contains blanks).
|
||
|
||
Typical conventions web sites will include references to documents in
|
||
some form agreed upon by the community that supports the conventions
|
||
and examples of netCDF file structures that follow the conventions.
|
||
|
||
\page background Background and Evolution of the NetCDF Interface
|
||
|
||
The development of the netCDF interface began with a modest goal
|
||
related to Unidata's needs: to provide a common interface between
|
||
Unidata applications and real-time meteorological data. Since Unidata
|
||
software was intended to run on multiple hardware platforms with
|
||
access from both C and FORTRAN, achieving Unidata's goals had the
|
||
potential for providing a package that was useful in a broader
|
||
context. By making the package widely available and collaborating with
|
||
other organizations with similar needs, we hoped to improve the then
|
||
current situation in which software for scientific data access was
|
||
only rarely reused by others in the same discipline and almost never
|
||
reused between disciplines (Fulker, 1988).
|
||
|
||
Important concepts employed in the netCDF software originated in a
|
||
paper (Treinish and Gough, 1987) that described data-access software
|
||
developed at the NASA Goddard National Space Science Data Center
|
||
(NSSDC). The interface provided by this software was called the Common
|
||
Data Format (CDF). The NASA CDF was originally developed as a
|
||
platform-specific FORTRAN library to support an abstraction for
|
||
storing arrays.
|
||
|
||
The NASA CDF package had been used for many different kinds of data in
|
||
an extensive collection of applications. It had the virtues of
|
||
simplicity (only 13 subroutines), independence from storage format,
|
||
generality, ability to support logical user views of data, and support
|
||
for generic applications.
|
||
|
||
Unidata held a workshop on CDF in Boulder in August 1987. We proposed
|
||
exploring the possibility of collaborating with NASA to extend the CDF
|
||
FORTRAN interface, to define a C interface, and to permit the access
|
||
of data aggregates with a single call, while maintaining compatibility
|
||
with the existing NASA interface.
|
||
|
||
Independently, Dave Raymond at the New Mexico Institute of Mining and
|
||
Technology had developed a package of C software for UNIX that
|
||
supported sequential access to self-describing array-oriented data and
|
||
a "pipes and filters" (or "data flow") approach to processing,
|
||
analyzing, and displaying the data. This package also used the "Common
|
||
Data Format" name, later changed to C-Based Analysis and Display
|
||
System (CANDIS). Unidata learned of Raymond's work (Raymond, 1988),
|
||
and incorporated some of his ideas, such as the use of named
|
||
dimensions and variables with differing shapes in a single data
|
||
object, into the Unidata netCDF interface.
|
||
|
||
In early 1988, Glenn Davis of Unidata developed a prototype netCDF
|
||
package in C that was layered on XDR. This prototype proved that a
|
||
single-file, XDR-based implementation of the CDF interface could be
|
||
achieved at acceptable cost and that the resulting programs could be
|
||
implemented on both UNIX and VMS systems. However, it also
|
||
demonstrated that providing a small, portable, and NASA CDF-compatible
|
||
FORTRAN interface with the desired generality was not
|
||
practical. NASA's CDF and Unidata's netCDF have since evolved
|
||
separately, but recent CDF versions share many characteristics with
|
||
netCDF.
|
||
|
||
In early 1988, Joe Fahle of SeaSpace, Inc. (a commercial software
|
||
development firm in San Diego, California), a participant in the 1987
|
||
Unidata CDF workshop, independently developed a CDF package in C that
|
||
extended the NASA CDF interface in several important ways (Fahle,
|
||
1989). Like Raymond's package, the SeaSpace CDF software permitted
|
||
variables with unrelated shapes to be included in the same data object
|
||
and permitted a general form of access to multidimensional
|
||
arrays. Fahle's implementation was used at SeaSpace as the
|
||
intermediate form of storage for a variety of steps in their
|
||
image-processing system. This interface and format have subsequently
|
||
evolved into the Terascan data format.
|
||
|
||
After studying Fahle's interface, we concluded that it solved many of
|
||
the problems we had identified in trying to stretch the NASA interface
|
||
to our purposes. In August 1988, we convened a small workshop to agree
|
||
on a Unidata netCDF interface, and to resolve remaining open
|
||
issues. Attending were Joe Fahle of SeaSpace, Michael Gough of Apple
|
||
(an author of the NASA CDF software), Angel Li of the University of
|
||
Miami (who had implemented our prototype netCDF software on VMS and
|
||
was a potential user), and Unidata systems development
|
||
staff. Consensus was reached at the workshop after some further
|
||
simplifications were discovered. A document incorporating the results
|
||
of the workshop into a proposed Unidata netCDF interface specification
|
||
was distributed widely for comments before Glenn Davis and Russ Rew
|
||
implemented the first version of the software. Comparison with other
|
||
data-access interfaces and experience using netCDF are discussed in
|
||
Rew and Davis (1990a), Rew and Davis (1990b), Jenter and Signell
|
||
(1992), and Brown, Folk, Goucher, and Rew (1993).
|
||
|
||
In October 1991, we announced version 2.0 of the netCDF software
|
||
distribution. Slight modifications to the C interface (declaring
|
||
dimension lengths to be long rather than int) improved the usability
|
||
of netCDF on inexpensive platforms such as MS-DOS computers, without
|
||
requiring recompilation on other platforms. This change to the
|
||
interface required no changes to the associated file format.
|
||
|
||
Release of netCDF version 2.3 in June 1993 preserved the same file
|
||
format but added single call access to records, optimizations for
|
||
accessing cross-sections involving non-contiguous data, subsampling
|
||
along specified dimensions (using 'strides'), accessing non-contiguous
|
||
data (using 'mapped array sections'), improvements to the ncdump and
|
||
ncgen utilities, and an experimental C++ interface.
|
||
|
||
In version 2.4, released in February 1996, support was added for new
|
||
platforms and for the C++ interface, significant optimizations were
|
||
implemented for supercomputer architectures, and the file format was
|
||
formally specified in an appendix to the User's Guide.
|
||
|
||
FAN (File Array Notation), software providing a high-level interface
|
||
to netCDF data, was made available in May 1996. The capabilities of
|
||
the FAN utilities include extracting and manipulating array data from
|
||
netCDF datasets, printing selected data from netCDF arrays, copying
|
||
ASCII data into netCDF arrays, and performing various operations (sum,
|
||
mean, max, min, product, and others) on netCDF arrays.
|
||
|
||
In 1996 and 1997, Joe Sirott implemented and made available the first
|
||
implementation of a read-only netCDF interface for Java, Bill Noon
|
||
made a Python module available for netCDF, and Konrad Hinsen
|
||
contributed another netCDF interface for Python.
|
||
|
||
In May 1997, Version 3.3 of netCDF was released. This included a new
|
||
type-safe interface for C and Fortran, as well as many other
|
||
improvements. A month later, Charlie Zender released version 1.0 of
|
||
the NCO (netCDF Operators) package, providing command-line utilities
|
||
for general purpose operations on netCDF data.
|
||
|
||
Version 3.4 of Unidata's netCDF software, released in March 1998,
|
||
included initial large file support, performance enhancements, and
|
||
improved Cray platform support. Later in 1998, Dan Schmitt provided a
|
||
Tcl/Tk interface, and Glenn Davis provided version 1.0 of netCDF for
|
||
Java.
|
||
|
||
In May 1999, Glenn Davis, who was instrumental in creating and
|
||
developing netCDF, died in a small plane crash during a
|
||
thunderstorm. The memory of Glenn's passions and intellect continues to
|
||
inspire those of us who worked with him.
|
||
|
||
In February 2000, an experimental Fortran 90 interface developed by
|
||
Robert Pincus was released.
|
||
|
||
John Caron released netCDF for Java, version 2.0 in February
|
||
2001. This version incorporated a new high-performance package for
|
||
multidimensional arrays, simplified the interface, and included
|
||
OPeNDAP (known previously as DODS) remote access, as well as remote
|
||
netCDF access via HTTP contributed by Don Denbo.
|
||
|
||
In March 2001, netCDF 3.5.0 was released. This release fully
|
||
integrated the new Fortran 90 interface, enhanced portability,
|
||
improved the C++ interface, and added a few new tuning functions.
|
||
|
||
Also in 2001, Takeshi Horinouchi and colleagues made a netCDF
|
||
interface for Ruby available, as did David Pierce for the R language
|
||
for statistical computing and graphics. Charles Denham released
|
||
WetCDF, an independent implementation of the netCDF interface for
|
||
Matlab, as well as updates to the popular netCDF Toolbox for Matlab.
|
||
|
||
In 2002, Unidata and collaborators developed NcML, an XML
|
||
representation for netCDF data useful for cataloging data holdings,
|
||
aggregation of data from multiple datasets, augmenting metadata in
|
||
existing datasets, and support for alternative views of data. The Java
|
||
interface currently provides access to netCDF data through NcML.
|
||
|
||
Additional developments in 2002 included translation of C and Fortran
|
||
User Guides into Japanese by Masato Shiotani and colleagues, creation
|
||
of a “Best Practices” guide for writing netCDF files, and provision of
|
||
an Ada-95 interface by Alexandru Corlan.
|
||
|
||
In July 2003 a group of researchers at Northwestern University and
|
||
Argonne National Laboratory (Jianwei Li, Wei-keng Liao, Alok
|
||
Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham)
|
||
contributed a new parallel interface for writing and reading netCDF
|
||
data, tailored for use on high performance platforms with parallel
|
||
I/O. The implementation built on the MPI-IO interface, providing
|
||
portability to many platforms.
|
||
|
||
In October 2003, Greg Sjaardema contributed support for an alternative
|
||
format with 64-bit offsets, to provide more complete support for very
|
||
large files. These changes, with slight modifications at Unidata, were
|
||
incorporated into version 3.6.0, released in December, 2004.
|
||
|
||
In 2004, thanks to a NASA grant, Unidata and NCSA began a
|
||
collaboration to increase the interoperability of netCDF and HDF5, and
|
||
bring some advanced HDF5 features to netCDF users.
|
||
|
||
In February, 2006, release 3.6.1 fixed some minor bugs.
|
||
|
||
In March, 2007, release 3.6.2 introduced an improved build system that
|
||
used automake and libtool, and an upgrade to the most recent autoconf
|
||
release, to support shared libraries and the netcdf-4 builds. This
|
||
release also introduced the NetCDF Tutorial and example programs.
|
||
|
||
The first beta release of netCDF-4.0 was celebrated with a giant party
|
||
at Unidata in April, 2007. Over 2000 people danced 'til dawn at the
|
||
NCAR Mesa Lab, listening to the Flaming Lips and the Denver Gilbert &
|
||
Sullivan repertory company. Brittany Spears performed the
|
||
world premiere of her smash hit "Format me baby, one more time."
|
||
|
||
In June, 2008, netCDF-4.0 was released. Version 3.6.3, the same code
|
||
but with netcdf-4 features turned off, was released at the same
|
||
time. The 4.0 release uses HDF5 1.8.1 as the data storage layer for
|
||
netcdf, and introduces many new features including groups and
|
||
user-defined types. The 3.6.3/4.0 releases also introduced handling of
|
||
UTF8-encoded Unicode names.
|
||
|
||
NetCDF-4.1.1, released in April, 2010, provided built-in client
|
||
support for the DAP protocol for accessing data from remote OPeNDAP
|
||
servers, full support for the enhanced netCDF-4 data model in the
|
||
ncgen utility, a new nccopy utility for copying and conversion among
|
||
netCDF format variants, ability to read some HDF4/HDF5 data archives
|
||
through the netCDF C or Fortran interfaces, support for parallel I/O
|
||
on netCDF classic and 64-bit offset files using the parallel-netcdf
|
||
(formerly pnetcdf) library from Argonne/Northwestern, a new nc-config
|
||
utility to help compile and link programs that use netCDF, inclusion
|
||
of the UDUNITS library for handling “units” attributes, and inclusion
|
||
of libcf to assist in creating data compliant with the Climate and
|
||
Forecast (CF) metadata conventions.
|
||
|
||
In September, 2010, the NetCDF-Java/CDM (Common Data Model) version
|
||
4.2 library was declared stable and made available to users. This
|
||
100%-Java implementation provides a read-write interface to netCDF-3
|
||
classic and 64-bit offset data, as well as a read-only interface to
|
||
netCDF-4 enhanced model data and many other formats of scientific data
|
||
through a common (CDM) interface. The NetCDF-Java library also
|
||
implements NcML, which allows you to add metadata to CDM datasets, as
|
||
well as to create virtual datasets through aggregation. A ToolsUI
|
||
application is also included that provides a graphical user interface
|
||
to capabilities similar to the C-based ncdump and ncgen utilities, as
|
||
well as CF-compliance checking and many other features.
|
||
|
||
\page remote_client The Remote Data Access Client
|
||
|
||
Starting with version 4.1.1 the netCDF C libraries and utilities have
|
||
supported remote data access.
|
||
|
||
\page data_access Data Access
|
||
|
||
To access (read or write) netCDF data you specify an open netCDF
|
||
dataset, a netCDF variable, and information (e.g., indices)
|
||
identifying elements of the variable. The name of the access function
|
||
corresponds to the internal type of the data. If the internal type has
|
||
a different representation from the external type of the variable, a
|
||
conversion between the internal type and external type will take place
|
||
when the data is read or written.
|
||
|
||
Access to data in classic and 64-bit offset format is direct. Access
|
||
to netCDF-4 data is buffered by the HDF5 layer. In either case you can
|
||
access a small subset of data from a large dataset efficiently,
|
||
without first accessing all the data that precedes it.
|
||
|
||
Reading and writing data by specifying a variable, instead of a
|
||
position in a file, makes data access independent of how many other
|
||
variables are in the dataset, making programs immune to data format
|
||
changes that involve adding more variables to the data.
|
||
|
||
In the C and FORTRAN interfaces, datasets are not specified by name
|
||
every time you want to access data, but instead by a small integer
|
||
called a dataset ID, obtained when the dataset is first created or
|
||
opened.
|
||
|
||
Similarly, a variable is not specified by name for every data access
|
||
either, but by a variable ID, a small integer used to identify each
|
||
variable in a netCDF dataset.
|
||
|
||
\section forms_of_data_access Forms of Data Access
|
||
|
||
The netCDF interface supports several forms of direct access to data
|
||
values in an open netCDF dataset. We describe each of these forms of
|
||
access in order of increasing generality:
|
||
- access to all elements;
|
||
- access to individual elements, specified with an index vector;
|
||
- access to array sections, specified with an index vector and count vector;
|
||
- access to sub-sampled array sections, specified with an index
|
||
vector, count vector, and stride vector; and
|
||
- access to mapped array sections, specified with an index vector,
|
||
count vector, stride vector, and an index mapping vector.
|
||
|
||
The four types of vector (index vector, count vector, stride vector
|
||
and index mapping vector) each have one element for each dimension of
|
||
the variable. Thus, for an n-dimensional variable (rank = n),
|
||
n-element vectors are needed. If the variable is a scalar (no
|
||
dimensions), these vectors are ignored.
|
||
|
||
An array section is a "slab" or contiguous rectangular block that is
|
||
specified by two vectors. The index vector gives the indices of the
|
||
element in the corner closest to the origin. The count vector gives
|
||
the lengths of the edges of the slab along each of the variable's
|
||
dimensions, in order. The number of values accessed is the product of
|
||
these edge lengths.
|
||
|
||
A subsampled array section is similar to an array section, except that
|
||
an additional stride vector is used to specify sampling. This vector
|
||
has an element for each dimension giving the length of the strides to
|
||
be taken along that dimension. For example, a stride of 4 means every
|
||
fourth value along the corresponding dimension. The total number of
|
||
values accessed is again the product of the elements of the count
|
||
vector.
|
||
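The arithmetic for array sections and subsampled array sections can be
sketched in C as follows. The helpers section_size and sampled_index
are hypothetical names used only for illustration; they are not part
of the netCDF interface.

```c
#include <stddef.h>

// Number of values accessed: the product of the elements of the
// count vector. A scalar variable (ndims == 0) gives 1.
static size_t section_size(const size_t *count, size_t ndims)
{
    size_t n = 1;
    for (size_t d = 0; d < ndims; d++)
        n *= count[d];
    return n;
}

// Index along one dimension of the i-th sampled value, given the
// start index and the stride for that dimension.
static size_t sampled_index(size_t start, ptrdiff_t stride, size_t i)
{
    return start + i * (size_t)stride;
}
```

With a start index of 2 and a stride of 4, the sampled indices along a
dimension are 2, 6, 10, 14, and so on; the stride changes which values
are accessed, not how many.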
|
||
A mapped array section is similar to a subsampled array section except
|
||
that an additional index mapping vector allows one to specify how data
|
||
values associated with the netCDF variable are arranged in memory. The
|
||
offset of each value from the reference location, is given by the sum
|
||
of the products of each index (of the imaginary internal array which
|
||
would be used if there were no mapping) by the corresponding element
|
||
of the index mapping vector. The number of values accessed is the same
|
||
as for a subsampled array section.
|
||
|
||
The use of mapped array sections is discussed more fully below, but
|
||
first we present an example of the more commonly used array-section
|
||
access.
|
||
|
||
\section c_array_section_access A C Example of Array-Section Access
|
||
|
||
Assume that in our earlier example of a netCDF dataset (see Network
|
||
Common Data Form Language (CDL)), we wish to read a cross-section of
|
||
all the data for the temp variable at one level (say, the second), and
|
||
assume that there are currently three records (time values) in the
|
||
netCDF dataset. Recall that the dimensions are defined as
|
||
|
||
\code
|
||
lat = 5, lon = 10, level = 4, time = unlimited;
|
||
\endcode
|
||
|
||
and the variable temp is declared as
|
||
|
||
\code
|
||
float temp(time, level, lat, lon);
|
||
\endcode
|
||
|
||
in the CDL notation.
|
||
|
||
A corresponding C variable that holds data for only one level might be
|
||
declared as:
|
||
|
||
\code
|
||
#define LATS 5
#define LONS 10
#define LEVELS 1
#define TIMES 3                 /* currently */
    ...
float temp[TIMES*LEVELS*LATS*LONS];
|
||
\endcode
|
||
|
||
to keep the data in a one-dimensional array, or
|
||
|
||
\code
|
||
...
|
||
float temp[TIMES][LEVELS][LATS][LONS];
|
||
\endcode
|
||
|
||
using a multidimensional array declaration.
|
||
|
||
To specify the block of data that represents just the second level,
|
||
all times, all latitudes, and all longitudes, we need to provide a
|
||
start index and some edge lengths. The start index should be (0, 1, 0,
|
||
0) in C, because we want to start at the beginning of each of the
|
||
time, lon, and lat dimensions, but we want to begin at the second
|
||
value of the level dimension. The edge lengths should be (3, 1, 5, 10)
|
||
in C, since we want to get data for all three time values, only one
|
||
level value, all five lat values, and all 10 lon values. We should
|
||
expect to get a total of 150 floating-point values returned (3 * 1 * 5
|
||
* 10), and should provide enough space in our array for this many. The
|
||
order in which the data will be returned is with the last dimension,
|
||
lon, varying fastest:
|
||
|
||
\code
|
||
temp[0][1][0][0]
temp[0][1][0][1]
temp[0][1][0][2]
temp[0][1][0][3]

    ...

temp[2][1][4][7]
temp[2][1][4][8]
temp[2][1][4][9]
|
||
\endcode
|
||
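The start and count vectors for this example, and the order in which
values are returned, can be sketched in C as follows. The helper
kth_index is hypothetical and only illustrates the ordering; in a real
program the two vectors would be passed, together with a dataset ID
and a variable ID, to an array-section read function such as
nc_get_vara_float.

```c
#include <stddef.h>

#define NDIMS 4

// Section for the second level (index 1): all times, lats and lons.
static const size_t start[NDIMS] = {0, 1, 0, 0};
static const size_t count[NDIMS] = {3, 1, 5, 10};

// Compute the (time, level, lat, lon) indices of the k-th value
// returned, with the last dimension (lon) varying fastest.
static void kth_index(size_t k, size_t idx[NDIMS])
{
    for (int d = NDIMS - 1; d >= 0; d--) {
        idx[d] = start[d] + k % count[d];
        k /= count[d];
    }
}
```

Here value 0 corresponds to indices (0, 1, 0, 0) and value 149, the
last of the 150 values, to (2, 1, 4, 9), matching the listing above.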
|
||
Different dimension orders for the C, FORTRAN, or other language
|
||
interfaces do not reflect a different order for values stored on the
|
||
disk, but merely different orders supported by the procedural
|
||
interfaces to the languages. In general, it does not matter whether a
|
||
netCDF dataset is written using the C, FORTRAN, or another language
|
||
interface; netCDF datasets written from any supported language may be
|
||
read by programs written in other supported languages.

\section more_on_general_array_section_access More on General Array Section Access for C
|
||
|
||
The use of mapped array sections allows non-trivial relationships
|
||
between the disk addresses of variable elements and the addresses
|
||
where they are stored in memory. For example, a matrix in memory could
|
||
be the transpose of that on disk, giving a quite different order of
|
||
elements. In a regular array section, the mapping between the disk and
|
||
memory addresses is trivial: the structure of the in-memory values
|
||
(i.e., the dimensional lengths and their order) is identical to that
|
||
of the array section. In a mapped array section, however, an index
|
||
mapping vector is used to define the mapping between indices of netCDF
|
||
variable elements and their memory addresses.
|
||
|
||
With mapped array access, the offset (number of array elements) from
|
||
the origin of a memory-resident array to a particular point is given
|
||
by the inner product of the index mapping vector with the point's
|
||
coordinate offset vector. A point's coordinate offset vector gives,
|
||
for each dimension, the offset from the origin of the containing array
|
||
to the point. In C, a point's coordinate offset vector is the same as
|
||
its coordinate vector.
|
||
|
||
The index mapping vector for a regular array section would have, in
order from most rapidly varying dimension to most slowly, a constant 1,
|
||
the product of that value with the edge length of the most rapidly
|
||
varying dimension of the array section, then the product of that value
|
||
with the edge length of the next most rapidly varying dimension, and
|
||
so on. In a mapped array, however, the correspondence between netCDF
|
||
variable disk locations and memory locations can be different.
|
||
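As a minimal sketch of these two rules, the C functions below build
the index mapping vector of a regular array section and compute a
memory offset as an inner product. The names regular_imap and
mapped_offset are hypothetical, and distances are assumed to be
measured in array elements, as in the description above.

```c
#include <stddef.h>

// Index mapping vector of a regular array section: 1 for the most
// rapidly varying (last) dimension, then running products of the
// edge lengths for each more slowly varying dimension.
static void regular_imap(const size_t *count, size_t ndims,
                         ptrdiff_t *imap)
{
    ptrdiff_t step = 1;
    for (size_t d = ndims; d-- > 0; ) {
        imap[d] = step;
        step *= (ptrdiff_t)count[d];
    }
}

// Offset (in elements) of a point from the origin of the in-memory
// array: the inner product of the index mapping vector with the
// point's coordinate offset vector.
static ptrdiff_t mapped_offset(const ptrdiff_t *imap,
                               const size_t *coord, size_t ndims)
{
    ptrdiff_t off = 0;
    for (size_t d = 0; d < ndims; d++)
        off += imap[d] * (ptrdiff_t)coord[d];
    return off;
}
```

For edge lengths (3, 1, 5, 10) the regular index mapping vector is
(50, 50, 10, 1). A transposed in-memory layout of a 2 by 3 section
would instead use the vector (1, 2), so that the point (1, 2) lies at
offset 5.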
|
||
For example, the following C definitions:
|
||
|
||
\code
|
||
struct vel {
    int flags;
    float u;
    float v;
} vel[NX][NY];
ptrdiff_t imap[2] = {
    sizeof(struct vel),
    sizeof(struct vel)*NY
};
|
||
\endcode
|
||
|
||
where imap is the index mapping vector, can be used to access the
|
||
memory-resident values of the netCDF variable, vel(NY,NX), even though
|
||
the dimensions are transposed and the data is contained in a 2-D array
|
||
of structures rather than a 2-D array of floating-point values.
|
||
|
||
A detailed example of mapped array access is presented in the
|
||
description of the interfaces for mapped array access. See Write a
|
||
Mapped Array of Values - nc_put_varm_ type.
|
||
|
||
Note that, although the netCDF abstraction allows the use of
|
||
subsampled or mapped array-section access, their use is not
|
||
required. If you do not need these more general forms of access, you
|
||
may ignore these capabilities and use single value access or regular
|
||
array section access instead.
|
||
|
||
\page dimensions Dimensions
|
||
|
||
A dimension may be used to represent a real physical dimension, for
|
||
example, time, latitude, longitude, or height. A dimension might also
|
||
be used to index other quantities, for example station or
|
||
model-run-number.
|
||
|
||
A netCDF dimension has both a name and a length.
|
||
|
||
A dimension length is an arbitrary positive integer, except that one
|
||
dimension in a classic or 64-bit offset netCDF dataset can have the
|
||
length UNLIMITED. In a netCDF-4 dataset, any number of unlimited
|
||
dimensions can be used.
|
||
|
||
Such a dimension is called the unlimited dimension or the record
|
||
dimension. A variable with an unlimited dimension can grow to any
|
||
length along that dimension. The unlimited dimension index is like a
|
||
record number in conventional record-oriented files.
|
||
|
||
A netCDF classic or 64-bit offset dataset can have at most one
|
||
unlimited dimension, but need not have any. If a variable has an
|
||
unlimited dimension, that dimension must be the most significant
|
||
(slowest changing) one. Thus any unlimited dimension must be the first
|
||
dimension in a CDL shape and the first dimension in corresponding C
|
||
array declarations.
|
||
|
||
A netCDF-4 dataset may have multiple unlimited dimensions, and there
|
||
are no restrictions on their order in the list of a variable's
|
||
dimensions.
|
||
|
||
To grow variables along an unlimited dimension, write the data using
|
||
any of the netCDF data writing functions, and set the index of the
|
||
unlimited dimension to the desired record number. The netCDF library
|
||
will write however many records are needed (using the fill value,
|
||
unless that feature is turned off, to fill in any intervening
|
||
records).
|
||
|
||
CDL dimension declarations may appear on one or more lines following
|
||
the CDL keyword dimensions. Multiple dimension declarations on the
|
||
same line may be separated by commas. Each declaration is of the form
|
||
name = length. Use the “/” character to include group information
|
||
(netCDF-4 output only).
|
||
|
||
There are four dimensions in the above example: lat, lon, level, and
|
||
time (see \ref data_model). The first three are assigned fixed
|
||
lengths; time is assigned the length UNLIMITED, which means it is the
|
||
unlimited dimension.
|
||
|
||
The basic unit of named data in a netCDF dataset is a variable. When a
|
||
variable is defined, its shape is specified as a list of
|
||
dimensions. These dimensions must already exist. The number of
|
||
dimensions is called the rank (a.k.a. dimensionality). A scalar
|
||
variable has rank 0, a vector has rank 1 and a matrix has rank 2.
|
||
|
||
It is possible (since version 3.1 of netCDF) to use the same dimension
|
||
more than once in specifying a variable shape. For example,
|
||
correlation(instrument, instrument) could be a matrix giving
|
||
correlations between measurements using different instruments. But
|
||
data whose dimensions correspond to those of physical space/time
|
||
should have a shape comprising different dimensions, even if some of
|
||
these have the same length.
|
||
|
||
\page variables Variables
|
||
|
||
Variables are used to store the bulk of the data in a netCDF
|
||
dataset. A variable represents an array of values of the same type. A
|
||
scalar value is treated as a 0-dimensional array. A variable has a
|
||
name, a data type, and a shape described by its list of dimensions
|
||
specified when the variable is created. A variable may also have
|
||
associated attributes, which may be added, deleted or changed after
|
||
the variable is created.
|
||
|
||
A variable's external data type is one of a small set of netCDF
|
||
types. In classic and 64-bit offset files, only the original six types
|
||
are available (byte, character, short, int, float, and
|
||
double). Variables in netCDF-4 files may also use unsigned byte, unsigned short,
|
||
unsigned int, 64-bit int, unsigned 64-bit int, or string. Or the user
|
||
may define a type, as an opaque blob of bytes, as an array of variable
|
||
length arrays, or as a compound type, which acts like a C struct. (See
|
||
\ref data_type).
|
||
|
||
In the CDL notation, classic and 64-bit offset types can be used. They
|
||
are given the simpler names byte, char, short, int, float, and
|
||
double. The name real may be used as a synonym for float in the CDL
|
||
notation. The name long is a deprecated synonym for int. For the exact
|
||
meaning of each of the types see External Types. The ncgen utility
|
||
supports new primitive types with names ubyte, ushort, uint, int64,
|
||
uint64, and string.
|
||
|
||
CDL variable declarations appear after the variables keyword in a CDL
|
||
unit. They have the form
|
||
|
||
\code
|
||
type variable_name ( dim_name_1, dim_name_2, ... );
|
||
\endcode
|
||
|
||
for variables with dimensions, or
|
||
|
||
\code
|
||
type variable_name;
|
||
\endcode
|
||
|
||
for scalar variables.
|
||
|
||
In the above CDL example there are six variables. As discussed below,
|
||
four of these are coordinate variables. The remaining variables
|
||
(sometimes called primary variables), temp and rh, contain what is
|
||
usually thought of as the data. Each of these variables has the
|
||
unlimited dimension time as its first dimension, so they are called
|
||
record variables. A variable that is not a record variable has a fixed
|
||
length (number of data values) given by the product of its dimension
|
||
lengths. The length of a record variable is also the product of its
|
||
dimension lengths, but in this case the product is variable because it
|
||
involves the length of the unlimited dimension, which can vary. The
|
||
length of the unlimited dimension is the number of records.

\section coordinate_variables Coordinate Variables
|
||
|
||
It is legal for a variable to have the same name as a dimension. Such
|
||
variables have no special meaning to the netCDF library. However there
|
||
is a convention that such variables should be treated in a special way
|
||
by software using this library.
|
||
|
||
A variable with the same name as a dimension is called a coordinate
|
||
variable. It typically defines a physical coordinate corresponding to
|
||
that dimension. The above CDL example includes the coordinate
|
||
variables lat, lon, level and time, defined as follows:
|
||
|
||
\code
|
||
int lat(lat), lon(lon), level(level);
|
||
short time(time);
|
||
...
|
||
data:
|
||
level = 1000, 850, 700, 500;
|
||
lat = 20, 30, 40, 50, 60;
|
||
lon = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
|
||
time = 12;
|
||
\endcode
|
||
|
||
These define the latitudes, longitudes, barometric pressures and times
|
||
corresponding to positions along these dimensions. Thus there is data
|
||
at altitudes corresponding to 1000, 850, 700 and 500 millibars; and at
|
||
latitudes 20, 30, 40, 50 and 60 degrees north. Note that each
|
||
coordinate variable is a vector and has a shape consisting of just the
|
||
dimension with the same name.
|
||
|
||
A position along a dimension can be specified using an index. This is
|
||
an integer with a minimum value of 0 for C programs, 1 in Fortran
|
||
programs. Thus the 700 millibar level would have an index value of 2
|
||
in the example above in a C program, and 3 in a Fortran program.
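
The index arithmetic can be sketched as a lookup in the coordinate vector. The helper names coord_index_c and coord_index_fortran are hypothetical; they are not netCDF functions, and a linear search is used only because it is the simplest thing that shows the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 0-based (C) index of value in the coordinate vector
 * coord, or -1 if the value is absent. */
long coord_index_c(const int *coord, size_t n, int value)
{
    assert(coord != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (coord[i] == value)
            return (long)i;
    return -1;
}

/* Fortran counts from 1, so the same position is one higher. */
long coord_index_fortran(const int *coord, size_t n, int value)
{
    long i = coord_index_c(coord, n, value);
    return i < 0 ? -1 : i + 1;
}
```

With level = 1000, 850, 700, 500 from the example, looking up 700 gives 2 in the C convention and 3 in the Fortran convention.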
|
||
|
||
If a dimension has a corresponding coordinate variable, then this
|
||
provides an alternative, and often more convenient, means of
|
||
specifying position along it. Current application packages that make
|
||
use of coordinate variables commonly assume they are numeric vectors
|
||
and strictly monotonic (all values are different and either increasing
|
||
or decreasing).
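
A writer who wants to meet that expectation can check a coordinate vector before storing it. The function below is a minimal sketch with a hypothetical name; it is not something the netCDF library provides or enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the values are all different and either always increasing
 * or always decreasing. Vectors of length 0 or 1 count as monotonic. */
bool is_strictly_monotonic(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n < 2)
        return true;
    bool increasing = v[1] > v[0];   /* direction set by the first pair */
    for (size_t i = 1; i < n; i++) {
        bool ok = increasing ? (v[i] > v[i - 1]) : (v[i] < v[i - 1]);
        if (!ok)
            return false;            /* repeated value or change of direction */
    }
    return true;
}
```

Both the lon values (increasing) and the level values (decreasing) from the example pass this check; a vector with a repeated value does not.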
|
||
|
||
\page attributes Attributes
|
||
|
||
NetCDF attributes are used to store data about the data (ancillary
|
||
data or metadata), similar in many ways to the information stored in
|
||
data dictionaries and schema in conventional database systems. Most
|
||
attributes provide information about a specific variable. These are
|
||
identified by the name (or ID) of that variable, together with the
|
||
name of the attribute.
|
||
|
||
Some attributes provide information about the dataset as a whole and
|
||
are called global attributes. These are identified by the attribute
|
||
name together with a blank variable name (in CDL) or a special null
|
||
"global variable" ID (in C or Fortran).
|
||
|
||
In netCDF-4 files, attributes can also be added at the group level.
|
||
|
||
An attribute has an associated variable (the null "global variable"
|
||
for a global or group-level attribute), a name, a data type, a length,
|
||
and a value. The current version treats all attributes as vectors;
|
||
scalar values are treated as single-element vectors.
|
||
|
||
Conventional attribute names should be used where applicable. New
|
||
names should be as meaningful as possible.
|
||
|
||
The external type of an attribute is specified when it is created. The
|
||
types permitted for attributes are the same as the netCDF external
|
||
data types for variables. Attributes with the same name for different
|
||
variables should sometimes be of different types. For example, the
|
||
attribute valid_max specifying the maximum valid data value for a
|
||
variable of type int should be of type int, whereas the attribute
|
||
valid_max for a variable of type double should instead be of type
|
||
double.
|
||
|
||
Attributes are more dynamic than variables or dimensions; they can be
|
||
deleted and have their type, length, and values changed after they are
|
||
created, whereas the netCDF interface provides no way to delete a
|
||
variable or to change its type or shape.
|
||
|
||
The CDL notation for defining an attribute is
|
||
|
||
\code
|
||
variable_name:attribute_name = list_of_values;
|
||
\endcode
|
||
|
||
for a variable attribute, or
|
||
|
||
\code
|
||
:attribute_name = list_of_values;
|
||
\endcode
|
||
|
||
for a global attribute.
|
||
|
||
For the netCDF classic model, the type and length of each attribute
|
||
are not explicitly declared in CDL; they are derived from the values
|
||
assigned to the attribute. All values of an attribute must be of the
|
||
same type. The notation used for constant values of the various netCDF
|
||
types is discussed later (see CDL Constants).
|
||
|
||
The extended CDL syntax for the enhanced data model supported by
|
||
netCDF-4 allows optional type specifications, including user-defined
|
||
types, for attributes of user-defined types. See ncdump output or the
|
||
reference documentation for ncgen for details of the extended CDL
|
||
syntax.
|
||
|
||
In the netCDF example (see \ref data_model), units is an attribute for
|
||
the variable lat that has a 13-character array value
|
||
'degrees_north'. And valid_range is an attribute for the variable rh
|
||
that has length 2 and values '0.0' and '1.0'.
|
||
|
||
One global attribute, called “source”, is defined for the example
|
||
netCDF dataset. This is a character array intended for documenting the
|
||
data. Actual netCDF datasets might have more global attributes to
|
||
document the origin, history, conventions, and other characteristics
|
||
of the dataset as a whole.
|
||
|
||
Most generic applications that process netCDF datasets assume standard
|
||
attribute conventions and it is strongly recommended that these be
|
||
followed unless there are good reasons for not doing so. For
|
||
information about units, long_name, valid_min, valid_max, valid_range,
|
||
scale_factor, add_offset, _FillValue, and other conventional
|
||
attributes, see Attribute Conventions.
|
||
|
||
Attributes may be added to a netCDF dataset long after it is first
|
||
defined, so you don't have to anticipate all potentially useful
|
||
attributes. However adding new attributes to an existing classic or
|
||
64-bit offset format dataset can incur the same expense as copying the
|
||
dataset. For a more extensive discussion see Structure.
|
||
|
||
\page differences_atts_vars Differences between Attributes and Variables
|
||
|
||
In contrast to variables, which are intended for bulk data, attributes
|
||
are intended for ancillary data, or information about the data. The
|
||
total amount of ancillary data associated with a netCDF object, and
|
||
stored in its attributes, is typically small enough to be
|
||
memory-resident. However variables are often too large to entirely fit
|
||
in memory and must be split into sections for processing.
|
||
|
||
Another difference between attributes and variables is that variables
|
||
may be multidimensional. Attributes are all either scalars
|
||
(single-valued) or vectors (a single, fixed dimension).
|
||
|
||
Variables are created with a name, type, and shape before they are
|
||
assigned data values, so a variable may exist with no values. The
|
||
value of an attribute is specified when it is created, unless it is a
|
||
zero-length attribute.
|
||
|
||
A variable may have attributes, but an attribute cannot have
|
||
attributes. Attributes assigned to variables may have the same units
|
||
as the variable (for example, valid_range) or have no units (for
|
||
example, scale_factor). If you want to store data that requires units
|
||
different from those of the associated variable, it is better to use a
|
||
variable than an attribute. More generally, if data require ancillary
|
||
data to describe them, are multidimensional, require any of the
|
||
defined netCDF dimensions to index their values, or require a
|
||
significant amount of storage, that data should be represented using
|
||
variables rather than attributes.
|
||
|
||
\page classic_file_parts Parts of a NetCDF Classic File
|
||
|
||
A netCDF classic or 64-bit offset dataset is stored as a single file
|
||
comprising two parts:
|
||
- a header, containing all the information about dimensions, attributes,
|
||
and variables except for the variable data;
|
||
- a data part, comprising fixed-size data, containing the data for
|
||
variables that don't have an unlimited dimension; and variable-size
|
||
data, containing the data for variables that have an unlimited
|
||
dimension.
|
||
|
||
Both the header and data parts are represented in a
|
||
machine-independent form. This form is very similar to XDR (eXternal
|
||
Data Representation), extended to support efficient storage of arrays
|
||
of non-byte data.
|
||
|
||
The header at the beginning of the file contains information about the
|
||
dimensions, variables, and attributes in the file, including their
|
||
names, types, and other characteristics. The information about each
|
||
variable includes the offset to the beginning of the variable's data
|
||
for fixed-size variables or the relative offset of other variables
|
||
within a record. The header also contains dimension lengths and
|
||
information needed to map multidimensional indices for each variable
|
||
to the appropriate offsets.
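
The mapping from multidimensional indices to an offset can be sketched as follows, assuming the usual C ordering in which the last dimension varies fastest. The function names and the begin and elem_size parameters are hypothetical stand-ins for values a reader would take from the header; this is not the library's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, counted in values, of the element at index[] in an array
 * whose dimension lengths are dimlens[]. A scalar (ndims == 0) has
 * offset 0. */
size_t element_offset(const size_t *dimlens, const size_t *index,
                      size_t ndims)
{
    size_t offset = 0;
    for (size_t d = 0; d < ndims; d++) {
        assert(index[d] < dimlens[d]);      /* index must be in range */
        offset = offset * dimlens[d] + index[d];
    }
    return offset;
}

/* Byte position of that element for a fixed-size variable whose data
 * begins at byte begin and whose values each occupy elem_size bytes. */
unsigned long long value_position(unsigned long long begin,
                                  size_t elem_size,
                                  const size_t *dimlens,
                                  const size_t *index, size_t ndims)
{
    return begin + (unsigned long long)elem_size *
                   element_offset(dimlens, index, ndims);
}
```

For a variable shaped (5, 10), the element at index (2, 3) is value number 23; with 4-byte values and data beginning at byte 1024, it sits at byte 1116.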
|
||
|
||
By default, this header has little usable extra space; it is only as
|
||
large as it needs to be for the dimensions, variables, and attributes
|
||
(including all the attribute values) in the netCDF dataset, with a
|
||
small amount of extra space from rounding up to the nearest disk block
|
||
size. This has the advantage that netCDF files are compact, requiring
|
||
very little overhead to store the ancillary data that makes the
|
||
datasets self-describing. A disadvantage of this organization is that
|
||
any operation on a netCDF dataset that requires the header to grow
|
||
(or, less likely, to shrink), for example adding new dimensions or new
|
||
variables, requires moving the data by copying it. This expense is
|
||
incurred when the enddef function is called: nc_enddef in C (see
|
||
nc_enddef), NF_ENDDEF in Fortran (see NF_ENDDEF), after a previous
|
||
call to the redef function: nc_redef in C (see nc_redef) or NF_REDEF
|
||
in Fortran (see NF_REDEF). If you create all necessary dimensions,
|
||
variables, and attributes before writing data, and avoid later
|
||
additions and renamings of netCDF components that require more space
|
||
in the header part of the file, you avoid the cost associated with
|
||
later changing the header.
|
||
|
||
Alternatively, you can use a variant of the enddef
|
||
function with two underbar characters instead of one to explicitly
|
||
reserve extra space in the file header when the file is created: in C
|
||
nc__enddef (see nc__enddef), in Fortran NF__ENDDEF (see NF__ENDDEF),
|
||
after a previous call to the redef function. This avoids the expense
|
||
of moving all the data later by reserving enough extra space in the
|
||
header to accommodate anticipated changes, such as the addition of new
|
||
attributes or the extension of existing string attributes to hold
|
||
longer strings.
|
||
|
||
When the size of the header is changed, data in the file is moved, and
|
||
the location of data values in the file changes. If another program is
|
||
reading the netCDF dataset during redefinition, its view of the file
|
||
will be based on old, probably incorrect indexes. If netCDF datasets
|
||
are shared across redefinition, some mechanism external to the netCDF
|
||
library must be provided that prevents access by readers during
|
||
redefinition, and causes the readers to call nc_sync/NF_SYNC before
|
||
any subsequent access.
|
||
|
||
The fixed-size data part that follows the header contains all the
|
||
variable data for variables that do not employ an unlimited
|
||
dimension. The data for each variable is stored contiguously in this
|
||
part of the file. If there is no unlimited dimension, this is the last
|
||
part of the netCDF file.
|
||
|
||
The record-data part that follows the fixed-size data consists of a
|
||
variable number of fixed-size records, each of which contains data for
|
||
all the record variables. The record data for each variable is stored
|
||
contiguously in each record.
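
The record layout described above implies a simple position formula, sketched here. The names are hypothetical, and the per-record sizes are assumed to be already padded as the format requires.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long file_off;

/* Size of one record: the sum of the per-record sizes of all record
 * variables, in definition order. */
file_off record_size(const file_off *per_record_sizes, size_t nvars)
{
    file_off total = 0;
    for (size_t i = 0; i < nvars; i++)
        total += per_record_sizes[i];
    return total;
}

/* Byte position of the data that record variable var contributes to
 * record rec, given where the record-data part starts. */
file_off record_var_position(file_off record_data_start,
                             const file_off *per_record_sizes,
                             size_t nvars, size_t var, file_off rec)
{
    assert(var < nvars);
    file_off within = 0;                 /* offset of var inside a record */
    for (size_t i = 0; i < var; i++)
        within += per_record_sizes[i];
    return record_data_start
         + rec * record_size(per_record_sizes, nvars)
         + within;
}
```

With three record variables of 8, 400, and 400 bytes per record and record data starting at byte 1000, the third variable's data for record 3 begins at byte 3832.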
|
||
|
||
The order in which the variable data appears in each data section is
|
||
the same as the order in which the variables were defined, in
|
||
increasing numerical order by netCDF variable ID. This knowledge can
|
||
sometimes be used to enhance data access performance, since the best
|
||
data access is currently achieved by reading or writing the data in
|
||
sequential order.
|
||
|
||
\page parts_of_netcdf4 Parts of a NetCDF-4 HDF5 File
|
||
|
||
NetCDF-4 files are created with the HDF5 library, and are HDF5 files
|
||
in every way, and can be read without the netCDF-4 interface. (Note
|
||
that modifying these files with HDF5 will almost certainly make them
|
||
unreadable to netCDF-4.)
|
||
|
||
Groups in a netCDF-4 file correspond with HDF5 groups (although the
|
||
netCDF-4 tree is rooted not at the HDF5 root, but in group “_netCDF”).
|
||
|
||
Variables in netCDF correspond with identically named datasets in
|
||
HDF5. Attributes are treated similarly.
|
||
|
||
Since there is more metadata in a netCDF file than an HDF5 file,
|
||
special datasets are used to hold netCDF metadata.
|
||
|
||
The _netcdf_dim_info dataset (in group _netCDF) contains the ids of
|
||
the shared dimensions, and their length (0 for unlimited dimensions).
|
||
|
||
The _netcdf_var_info dataset (in group _netCDF) holds an array of
|
||
compound types which contain the variable ID, and the associated
|
||
dimension ids.
|
||
|
||
\page xdr_layer The Extended XDR Layer
|
||
|
||
XDR is a standard for describing and encoding data and a library of
|
||
functions for external data representation, allowing programmers to
|
||
encode data structures in a machine-independent way. Classic or 64-bit
|
||
offset netCDF employs an extended form of XDR for representing
|
||
information in the header part and the data parts. This extended XDR
|
||
is used to write portable data that can be read on any other machine
|
||
for which the library has been implemented.
|
||
|
||
The cost of using a canonical external representation for data varies
|
||
according to the type of data and whether the external form is the
|
||
same as the machine's native form for that type.
|
||
|
||
For some data types on some machines, the time required to convert
|
||
data to and from external form can be significant. The worst case is
|
||
reading or writing large arrays of floating-point data on a machine
|
||
that does not use IEEE floating-point as its native representation.
|
||
|
||
\page large_file_support Large File Support
|
||
|
||
It is possible to write netCDF files that exceed 2 GiByte on platforms
|
||
that have "Large File Support" (LFS). Such files are
|
||
portable across other LFS platforms, but trying to open them
|
||
on an older platform without LFS yields a "file too large" error.
|
||
|
||
Without LFS, no files larger than 2 GiBytes can be used. The rest of
|
||
this section applies only to systems with LFS.
|
||
|
||
The original binary format of netCDF (classic format) limits the size
|
||
of data files by using a signed 32-bit offset within its internal
|
||
structure. Files larger than 2 GiB can be created, with certain
|
||
limitations. See Classic Limitations.
|
||
|
||
In version 3.6.0, netCDF included its first-ever variant of the
|
||
underlying data format. The new format introduced in 3.6.0 uses 64-bit
|
||
file offsets in place of the 32-bit offsets. There are still some
|
||
limits on the sizes of variables, but the new format can create very
|
||
large datasets. See 64 bit Offset Limitations.
|
||
|
||
NetCDF-4 variables and files can be any size supported by the
|
||
underlying file system.
|
||
|
||
The original data format (netCDF classic) is still the default data
|
||
format for the netCDF library.
|
||
|
||
The following table summarizes the size limitations of various
|
||
permutations of LFS support, netCDF version, and data format. Note
|
||
that 1 GiB = 2^30 bytes or about 1.07e+9 bytes, 1 EiB = 2^60 bytes or
|
||
about 1.15e+18 bytes. Note also that all sizes are really 4 bytes less
|
||
than the ones given below. For example the maximum size of a fixed
|
||
variable in netCDF 3.6 classic format is really 2 GiB - 4 bytes.
|
||
|
||
| Limit | No LFS | v3.5 | v3.6/classic | v3.6/64-bit offset | v4.0/netCDF-4 |
|---|---|---|---|---|---|
| Max File Size | 2 GiB | 8 EiB | 8 EiB | 8 EiB | ?? |
| Max Number of Fixed Vars > 2 GiB | 0 | 1 (last) | 1 (last) | 2^32 | ?? |
| Max Record Vars w/ Rec Size > 2 GiB | 0 | 1 (last) | 1 (last) | 2^32 | ?? |
| Max Size of Fixed/Record Size of Record Var | 2 GiB | 2 GiB | 2 GiB | 4 GiB | ?? |
| Max Record Size | 2 GiB/nrecs | 4 GiB | 8 EiB/nrecs | 8 EiB/nrecs | ?? |
|
||
|
||
For more information about the different file formats of netCDF, see
|
||
Which Format.
|
||
|
||
\page offset_format_limitations NetCDF 64-bit Offset Format Limitations
|
||
|
||
Although the 64-bit offset format allows the creation of much larger
|
||
netCDF files than was possible with the classic format, there are
|
||
still some restrictions on the size of variables.
|
||
|
||
It's important to note that without Large File Support (LFS) in the
|
||
operating system, it's impossible to create any file larger than 2
|
||
GiBytes. Assuming an operating system with LFS, the following
|
||
restrictions apply to the netCDF 64-bit offset format.
|
||
|
||
No fixed-size variable can require more than 2^32 - 4 bytes (i.e. 4GiB
|
||
- 4 bytes, or 4,294,967,292 bytes) of storage for its data, unless it
|
||
is the last fixed-size variable and there are no record
|
||
variables. When there are no record variables, the last fixed-size
|
||
variable can be any size supported by the file system, e.g. terabytes.
|
||
|
||
A 64-bit offset format netCDF file can have up to 2^32 - 1 fixed-size
|
||
variables, each under 4GiB in size. If there are no record variables
|
||
in the file the last fixed variable can be any size.
|
||
|
||
No record variable can require more than 2^32 - 4 bytes of storage for
|
||
each record's worth of data, unless it is the last record variable. A
|
||
64-bit offset format netCDF file can have up to 2^32 - 1 records, of
|
||
up to 2^32 - 1 variables, as long as the size of one record's data for
|
||
each record variable except the last is less than 4 GiB - 4.
|
||
|
||
Note also that all netCDF variables and records are padded to 4 byte
|
||
boundaries.
|
||
|
||
\page classic_format_limitations NetCDF Classic Format Limitations
|
||
|
||
There are important constraints on the structure of large netCDF
|
||
classic files that result from the 32-bit relative offsets that are
|
||
part of the netCDF classic file format:
|
||
|
||
The maximum size of a record in the classic format in versions 3.5.1
|
||
and earlier is 2^32 - 4 bytes, or about 4 GiB. In versions 3.6.0 and
|
||
later, there is no such restriction on total record size for the
|
||
classic format or 64-bit offset format.
|
||
|
||
If you don't use the unlimited dimension, only one variable can exceed
|
||
2 GiB in size, but it can be as large as the underlying file system
|
||
permits. It must be the last variable in the dataset, and the offset
|
||
to the beginning of this variable must be less than about 2 GiB.
|
||
|
||
The limit is really 2^31 - 4. If you were to specify a variable size
|
||
of 2^31 - 3, for example, it would be rounded up to the nearest
|
||
multiple of 4 bytes, which would be 2^31, which is larger than the
|
||
largest signed integer, 2^31 - 1.
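
The rounding argument can be checked with a few lines of arithmetic. The helper names pad4 and fits_signed32_after_padding are hypothetical and exist only to make the reasoning concrete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4. */
uint64_t pad4(uint64_t nbytes)
{
    assert(nbytes <= UINT64_MAX - 3);    /* keep the addition in range */
    return (nbytes + 3) & ~(uint64_t)3;
}

/* True if the padded size still fits in a signed 32-bit integer,
 * whose largest value is 2^31 - 1. */
bool fits_signed32_after_padding(uint64_t nbytes)
{
    return pad4(nbytes) <= (uint64_t)INT32_MAX;
}
```

A size of 2^31 - 4 is already a multiple of 4 and fits; a size of 2^31 - 3 pads up to 2^31 and no longer fits.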
|
||
|
||
For example, the structure of the data might be something like:
|
||
|
||
\code
|
||
netcdf bigfile1 {
|
||
dimensions:
|
||
x=2000;
|
||
y=5000;
|
||
z=10000;
|
||
variables:
|
||
double x(x); // coordinate variables
|
||
double y(y);
|
||
double z(z);
|
||
double var(x, y, z); // 800 Gbytes
|
||
}
|
||
\endcode
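
The sizes in this example can be reproduced by multiplying the dimension lengths by the external size of one value (8 bytes for a double). The function name var_size_bytes is a hypothetical helper, not a netCDF call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a variable: the product of its dimension lengths
 * times the external size of one value. */
uint64_t var_size_bytes(const uint64_t *dimlens, size_t ndims,
                        uint64_t elem_size)
{
    uint64_t n = elem_size;
    for (size_t d = 0; d < ndims; d++) {
        /* refuse to overflow the 64-bit product */
        assert(dimlens[d] == 0 || n <= UINT64_MAX / dimlens[d]);
        n *= dimlens[d];
    }
    return n;
}
```

For var(x, y, z) this gives 2000 * 5000 * 10000 * 8 = 800,000,000,000 bytes. The three coordinate variables defined before it take 16,000 + 40,000 + 80,000 = 136,000 bytes, so the large variable is last and its data starts well within the first 2 GiB of the file.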
|
||
|
||
If you use the unlimited dimension, record variables may exceed 2 GiB
|
||
in size, as long as the offset of the start of each record variable
|
||
within a record is less than 2 GiB - 4. For example, the structure of
|
||
the data in a 2.4 Tbyte file might be something like:
|
||
|
||
\code
|
||
netcdf bigfile2 {
|
||
dimensions:
|
||
x=2000;
|
||
y=5000;
|
||
z=10;
|
||
t=UNLIMITED; // 1000 records, for example
|
||
variables:
|
||
double x(x); // coordinate variables
|
||
double y(y);
|
||
double z(z);
|
||
double t(t);
|
||
// 3 record variables, 2400000000 bytes per record
|
||
double var1(t, x, y, z);
|
||
double var2(t, x, y, z);
|
||
double var3(t, x, y, z);
|
||
}
|
||
\endcode
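
The constraint on record variables can be checked mechanically. The sketch below assumes the per-record sizes are listed in definition order; the names CLASSIC_OFFSET_LIMIT and record_layout_ok are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLASSIC_OFFSET_LIMIT ((uint64_t)INT32_MAX - 3)   /* 2^31 - 4 */

/* True if every record variable starts, within a record, at an offset
 * below the classic-format limit. Only start offsets are checked, so
 * the last variable in a record may itself be large. */
bool record_layout_ok(const uint64_t *per_record_sizes, size_t nvars)
{
    assert(per_record_sizes != NULL || nvars == 0);
    uint64_t start = 0;
    for (size_t i = 0; i < nvars; i++) {
        if (start >= CLASSIC_OFFSET_LIMIT)
            return false;
        start += per_record_sizes[i];
    }
    return true;
}
```

In bigfile2 each of var1, var2, and var3 needs 2000 * 5000 * 10 * 8 = 800,000,000 bytes per record, and t needs 8 bytes. The record variables therefore start at offsets 0, 8, 800,000,008, and 1,600,000,008, all below the limit. A fourth variable of the same shape could still be added, but a fifth would start at 3,200,000,008 and break the rule.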
|
||
|
||
\page netcdf_3_io The NetCDF-3 I/O Layer
|
||
|
||
The following discussion applies only to netCDF classic and 64-bit
|
||
offset files. For netCDF-4 files, the I/O layer is the HDF5 library.
|
||
|
||
For netCDF classic and 64-bit offset files, an I/O layer implemented
|
||
much like the C standard I/O (stdio) library is used by netCDF to read
|
||
and write portable data to netCDF datasets. Hence an understanding of
|
||
the standard I/O library provides answers to many questions about
|
||
multiple processes accessing data concurrently, the use of I/O
|
||
buffers, and the costs of opening and closing netCDF files. In
|
||
particular, it is possible to have one process writing a netCDF
|
||
dataset while other processes read it.
|
||
|
||
Data reads and writes are no more atomic than calls to stdio fread()
|
||
and fwrite(). An nc_sync/NF_SYNC call is analogous to the fflush call
|
||
in the C standard I/O library, writing unwritten buffered data so
|
||
other processes can read it. The C function nc_sync (see nc_sync), or
|
||
the Fortran function NF_SYNC (see NF_SYNC), also brings header changes
|
||
up-to-date (for example, changes to attribute values). Opening the
|
||
file with the NC_SHARE flag (in C) or the NF_SHARE flag (in Fortran) is
|
||
analogous to setting a stdio stream to be unbuffered with the _IONBF
|
||
flag to setvbuf.
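
The stdio side of this analogy looks as follows. This sketch uses only the C standard I/O library and does not call netCDF; the function names are hypothetical. It shows the behavior that the text compares nc_sync and NC_SHARE to, not how the netCDF library is implemented.

```c
#include <assert.h>
#include <stdio.h>

/* Append a line and flush it so that another reader of the same file
 * can see it. This is the stdio operation that nc_sync is compared to.
 * Returns 0 on success, -1 on failure. */
int write_and_flush(FILE *out, const char *line)
{
    assert(out != NULL && line != NULL);
    if (fputs(line, out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Open a file for writing with buffering disabled, the stdio
 * counterpart of opening a dataset with NC_SHARE. setvbuf must be
 * called before any other operation on the stream. */
FILE *open_unbuffered(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");
    if (fp != NULL && setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With the buffered stream, data written since the last flush may be invisible to other readers; with the unbuffered stream each write is passed on immediately, at the cost of more system calls.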
|
||
|
||
As in the stdio library, flushes are also performed when "seeks" occur
|
||
to a different area of the file. Hence the order of read and write
|
||
operations can influence I/O performance significantly. Reading data
|
||
in the same order in which it was written within each record will
|
||
minimize buffer flushes.
|
||
|
||
You should not expect netCDF classic or 64-bit offset format data
|
||
access to work with multiple writers having the same file open for
|
||
writing simultaneously.
|
||
|
||
It is possible to tune an implementation of netCDF for some platforms
|
||
by replacing the I/O layer with a different platform-specific I/O
|
||
layer. This may change the similarities between netCDF and standard
|
||
I/O, and hence characteristics related to data sharing, buffering, and
|
||
the cost of I/O operations.
|
||
|
||
The distributed netCDF implementation is meant to be
|
||
portable. Platform-specific ports that further optimize the
|
||
implementation for better I/O performance are practical in some cases.
|
||
|
||
\page parallel_access Parallel Access with NetCDF-4
|
||
|
||
Use the special parallel open (or create) calls to open (or create) a
|
||
file, and then use parallel I/O to read or write that file (see
|
||
nc_open_par()).
|
||
|
||
Note that the chunk cache is turned off if a file is opened for
|
||
parallel I/O in read/write mode. Open the file in read-only mode to
|
||
engage the chunk cache.
|
||
|
||
NetCDF uses the HDF5 parallel programming model for parallel I/O with
|
||
netCDF-4/HDF5 files. The HDF5 tutorial
|
||
(http://hdfgroup.org/HDF5//HDF5/Tutor) is a good reference.
|
||
|
||
For classic and 64-bit offset files, netCDF uses the parallel-netcdf
|
||
(formerly pnetcdf) library from Argonne National Labs/Northwestern
|
||
University. For parallel access of classic and 64-bit offset files,
|
||
netCDF must be configured with the --with-pnetcdf option at build
|
||
time. See the parallel-netcdf site for more information
|
||
(http://www.mcs.anl.gov/parallel-netcdf).
|
||
|
||
\page interoperability_with_hdf5 Interoperability with HDF5
|
||
|
||
To create HDF5 files that can be read by netCDF-4, use the latest in
|
||
the HDF5 1.8.x series.
|
||
|
||
HDF5 has some features that will not be supported by netCDF-4, and
|
||
will cause problems for interoperability:
|
||
- HDF5 allows a Group to be both an ancestor and a descendant of
|
||
another Group, creating cycles in the subgroup graph. HDF5 also
|
||
permits multiple parents for a Group. In the netCDF-4 data model,
|
||
Groups form a tree with no cycles, so each Group (except the
|
||
top-level unnamed Group) has a unique parent.
|
||
- HDF5 supports "references" which are like pointers to objects and
|
||
data regions within a file. The netCDF-4 data model omits
|
||
references.
|
||
- HDF5 supports some primitive types that are not included in the
|
||
netCDF-4 data model, including H5T_TIME and H5T_BITFIELD.
|
||
- HDF5 supports multiple names for data objects like Datasets
|
||
(netCDF-4 variables) with no distinguished name. The netCDF-4 data
|
||
model requires that each variable, attribute, dimension, and group
|
||
have a single distinguished name.
|
||
- HDF5 (like netCDF) supports scalar attributes, but netCDF-4 cannot
|
||
read a scalar HDF5 attribute (unless it is a string
|
||
attribute). This limitation will be removed in a future release of
|
||
netCDF.
|
||
|
||
These are fairly easy requirements to meet, but there is one relating
|
||
to shared dimensions which is a little more challenging. Every HDF5
|
||
dataset must have a dimension scale attached to each dimension.
|
||
|
||
Dimension scales are a new feature for HDF5 1.8, which allow
|
||
specification of shared dimensions.
|
||
|
||
Without creation order in the HDF5 file, the files will still be
|
||
readable by netCDF-4; netCDF-4 will simply number the
|
||
variables in alphabetical, rather than creation, order.
|
||
|
||
Interoperability is a complex task, and all of this is in the alpha
|
||
release stage. It is tested in libsrc4/tst_interops.c, which contains
|
||
some examples of how to create HDF5 files, modify them in netCDF-4,
|
||
and then verify them in HDF5 (and vice versa).
|
||
|
||
\page dap_support DAP Support
|
||
|
||
Beginning with netCDF version 4.1, optional support is provided for
|
||
accessing data through OPeNDAP servers using the DAP protocol.
|
||
|
||
DAP support is automatically enabled if a usable curl library can be
|
||
located using the curl-config program.
|
||
DAP support can forcibly be enabled or disabled using the --enable-dap
|
||
flag or the --disable-dap flag, respectively. If enabled, then DAP
|
||
support requires access to the curl library. Refer to the installation
|
||
manual for details: The NetCDF Installation and Porting Guide.
|
||
|
||
DAP uses a data model that is different from that supported by netCDF,
|
||
either classic or enhanced. Generically, the DAP data model is encoded
|
||
textually in a DDS (Dataset Descriptor Structure). There is a second
|
||
data model for DAP attributes, which is encoded textually in a DAS
|
||
(Dataset Attribute Structure). For detailed information about the DAP
|
||
DDS and DAS, refer to the OPeNDAP web site http://opendap.org.
|
||
|
||
\section accessing_opendap_data Accessing OPeNDAP Data
|
||
|
||
In order to access an OPeNDAP data source through the netCDF API, the
|
||
file name normally used is replaced with a URL with a specific
|
||
format. The URL is composed of three parts.
|
||
- Client parameters - these are prefixed to the front of the URL and
|
||
are of the general form [{name}] or [{name}=value]. Examples
|
||
include [cache=1] and [netcdf3].
|
||
- URL - this is a standard form URL such as
|
||
http://motherlode.unidata.ucar.edu:8081/dts/test.01
|
||
- Constraints - these are suffixed to the URL and take the form
|
||
“?\<projections>&selections”. The meaning of the terms projection
|
||
and selection is somewhat complicated; and the OPeNDAP web site,
|
||
http://www.opendap.org, should be consulted. The interaction of DAP
|
||
constraints with netCDF is complex and at the moment requires an
|
||
understanding of how DAP is translated to netCDF.
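
Putting the parts listed above together is plain string concatenation, sketched below. The function build_dap_url is a hypothetical helper, not part of the netCDF API, and the constraint string passed to it is the caller's responsibility.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose "<client params><url>?<constraints>" into buf. Either the
 * client parameters or the constraints may be NULL or empty. Returns
 * the length written, or -1 if the result does not fit in buflen. */
int build_dap_url(char *buf, size_t buflen, const char *client_params,
                  const char *url, const char *constraints)
{
    assert(buf != NULL && url != NULL);
    const char *params = client_params != NULL ? client_params : "";
    int n;
    if (constraints != NULL && strlen(constraints) > 0)
        n = snprintf(buf, buflen, "%s%s?%s", params, url, constraints);
    else
        n = snprintf(buf, buflen, "%s%s", params, url);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The resulting string is what would be passed in place of a file name when opening the dataset; for example, the client parameter [cache=1] followed directly by the server URL.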
|
||
|
||
It is possible to see what the translation does to a particular DAP
|
||
data source in either of two ways. First, one can examine the DDS
|
||
source through a web browser and then examine the translation using
|
||
the ncdump -h command to see the netCDF Classic translation. The
|
||
ncdump output will actually be the union of the DDS with the DAS, so
|
||
to see the complete translation, it is necessary to view both.
|
||
|
||
For example, if a web browser is given the following, the first URL
|
||
will return the DDS for the specified dataset, and the second URL will
|
||
return the DAS for the specified dataset.
|
||
|
||
\code
|
||
http://test.opendap.org:8080/dods/dts/test.01.dds
|
||
http://test.opendap.org:8080/dods/dts/test.01.das
|
||
\endcode
|
||
|
||
Then by using the following ncdump command, it is possible to see the
|
||
equivalent netCDF Classic translation.
|
||
|
||
\code
|
||
ncdump -h http://test.opendap.org:8080/dods/dts/test.01
|
||
\endcode
|
||
|
||
The DDS output from the web server should look like this.
|
||
|
||
\code
|
||
Dataset {
|
||
Byte b;
|
||
Int32 i32;
|
||
UInt32 ui32;
|
||
Int16 i16;
|
||
UInt16 ui16;
|
||
Float32 f32;
|
||
Float64 f64;
|
||
String s;
|
||
Url u;
|
||
} SimpleTypes;
|
||
\endcode
|
||
|
||
The DAS output from the web server should look like this.
|
||
|
||
\code
|
||
Attributes {
    Facility {
        String PrincipleInvestigator "Mark Abbott", "Ph.D";
        String DataCenter "COAS Environmental Computer Facility";
        String DrifterType "MetOcean WOCE/OCM";
    }
    b {
        String Description "A test byte";
        String units "unknown";
    }
    i32 {
        String Description "A 32 bit test server int";
        String units "unknown";
    }
}
|
||
\endcode
|
||
|
||
The output from ncdump should look like this.
|
||
|
||
\code
|
||
netcdf test {
|
||
dimensions:
|
||
stringdim64 = 64 ;
|
||
variables:
|
||
byte b ;
|
||
b:Description = "A test byte" ;
|
||
b:units = "unknown" ;
|
||
int i32 ;
|
||
i32:Description = "A 32 bit test server int" ;
|
||
i32:units = "unknown" ;
|
||
int ui32 ;
|
||
short i16 ;
|
||
short ui16 ;
|
||
float f32 ;
|
||
double f64 ;
|
||
char s(stringdim64) ;
|
||
char u(stringdim64) ;
|
||
}
|
||
\endcode
|
||
|
||
Note that the fields of type String and type Url have suddenly
|
||
acquired a dimension. This is because strings are translated to arrays
|
||
of char, which requires adding an extra dimension. The size of the
|
||
dimension is determined in a variety of ways and can be specified. It
|
||
defaults to 64 and when read, the underlying string is either padded
|
||
or truncated to that length.
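
The pad-or-truncate behavior can be sketched in C as follows. This is
only an illustration: the function name fit_string is hypothetical, and
padding with NUL bytes is an assumption, since the text does not say
which pad character is used.

```c
#include <string.h>

/* Copy src into a fixed-length char field of dimlen bytes: longer
 * strings are truncated, shorter ones are padded. Padding with NUL
 * bytes is an assumption made for this sketch. */
void fit_string(char *dst, size_t dimlen, const char *src)
{
    size_t n = strlen(src);
    if (n > dimlen)
        n = dimlen;                  /* truncate to the string dimension */
    memcpy(dst, src, n);
    memset(dst + n, 0, dimlen - n);  /* pad the remainder */
}
```

With the default string dimension, dimlen would be 64.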
|
||
|
||
Also note that the Facility attributes do not appear in the
|
||
translation because they are neither global nor associated with a
|
||
variable in the DDS.
|
||
|
||
Alternately, one can get the text of the DDS as a global attribute by
|
||
using the client parameters mechanism. In this case, the parameter
|
||
“[show=dds]” can be prefixed to the URL and the data retrieved using
|
||
the following command.
|
||
|
||
\code
|
||
ncdump -h [show=dds]http://test.opendap.org:8080/dods/dts/test.01
|
||
\endcode
|
||
|
||
The ncdump -h command will then show both the translation and the
|
||
original DDS. In the above example, the DDS would appear as the global
|
||
attribute “_DDS” as follows.
|
||
|
||
\code
|
||
netcdf test {
|
||
...
|
||
variables:
|
||
:_DDS = "Dataset { Byte b; Int32 i32; UInt32 ui32; Int16 i16;
|
||
UInt16 ui16; Float32 f32; Float64 f64;
|
||
String s; Url u; } SimpleTypes;"
|
||
|
||
byte b ;
|
||
...
|
||
}
|
||
\endcode
|
||
|
||
\section dap_to_netcdf DAP to NetCDF Translation Rules
|
||
|
||
Two translations are currently available.
|
||
|
||
- DAP 2 Protocol to netCDF-3
- DAP 2 Protocol to netCDF-4
|
||
|
||
\subsection nc3_trans_rules netCDF-3 Translation Rules
|
||
|
||
The current default translation code translates the OPeNDAP protocol
|
||
to netCDF-3 (classic). This netCDF-3 translation converts an OPeNDAP
|
||
DAP protocol version 2 DDS to netCDF-3 and is designed to mimic as
|
||
closely as possible the translation provided by the libnc-dap
|
||
system. In addition, a translation to netCDF-4 (enhanced) is provided
|
||
that is entirely new.
|
||
|
||
For illustrative purposes, the following example will be used.
|
||
|
||
\code
|
||
Dataset {
|
||
Int32 f1;
|
||
Structure {
|
||
Int32 f11;
|
||
Structure {
|
||
Int32 f1[3];
|
||
Int32 f2;
|
||
} FS2[2];
|
||
} S1;
|
||
Structure {
|
||
Grid {
|
||
Array:
|
||
Float32 temp[lat=2][lon=2];
|
||
Maps:
|
||
Int32 lat[lat=2];
|
||
Int32 lon[lon=2];
|
||
} G1;
|
||
} S2;
|
||
Grid {
|
||
Array:
|
||
Float32 G2[lat=2][lon=2];
|
||
Maps:
|
||
Int32 lat[2];
|
||
Int32 lon[2];
|
||
} G2;
|
||
Int32 lat[lat=2];
|
||
Int32 lon[lon=2];
|
||
} D1;
|
||
\endcode
|
||
|
||
\subsection var_def Variable Definition
|
||
|
||
The set of netCDF variables is derived from the fields with primitive
|
||
base types as they occur in Sequences, Grids, and Structures. The
|
||
field names are modified to be fully qualified initially. For the
|
||
above, the set of variables is as follows. The coordinate variables
|
||
within grids are left out in order to mimic the behavior of libnc-dap.
|
||
|
||
\code
|
||
f1
|
||
S1.f11
|
||
S1.FS2.f1
|
||
S1.FS2.f2
|
||
S2.G1.temp
|
||
G2.G2
|
||
lat
|
||
lon
|
||
\endcode
|
||
|
||
\subsection var_dim_trans Variable Dimension Translation
|
||
|
||
A variable's rank is determined from three sources.
|
||
- The variable has the dimensions associated with the field it
|
||
represents (e.g. S1.FS2.f1[3] in the above example).
|
||
- The variable inherits the dimensions associated with any containing
|
||
structure that has a rank greater than zero. These dimensions precede
|
||
those of case 1. Thus, we have in our example, f1[2][3], where the
|
||
first dimension comes from the containing Structure FS2[2].
|
||
- The variable's set of dimensions is altered if any of its
|
||
containers is a DAP DDS Sequence. This is discussed more fully below.
|
||
|
||
If the type of the netCDF variable is char, then an extra string
|
||
dimension is added as the last dimension.
|
||
|
||
\subsection dim_trans Dimension Translation
|
||
|
||
For dimensions, the rules are as follows.
|
||
|
||
Fields in dimensioned structures inherit the dimension of the
|
||
structure; thus the above list would have the following dimensioned
|
||
variables.
|
||
|
||
\code
|
||
S1.FS2.f1 -> S1.FS2.f1[2][3]
|
||
S1.FS2.f2 -> S1.FS2.f2[2]
|
||
S2.G1.temp -> S2.G1.temp[lat=2][lon=2]
|
||
S2.G1.lat -> S2.G1.lat[lat=2]
|
||
S2.G1.lon -> S2.G1.lon[lon=2]
|
||
G2.G2 -> G2.G2[lat=2][lon=2]
G2.lat -> G2.lat[2]
G2.lon -> G2.lon[2]
|
||
lat -> lat[lat=2]
|
||
lon -> lon[lon=2]
|
||
\endcode
|
||
|
||
Collect all of the dimension specifications from the DDS, both named
|
||
and anonymous (unnamed). For each unique anonymous dimension with value
NN create a netCDF dimension of the form "XX_\<i\>=NN", where XX is the
|
||
fully qualified name of the variable and i is the i'th (inherited)
|
||
dimension of the array where the anonymous dimension occurs. For our
|
||
example, this would create the following dimensions.
|
||
|
||
\code
|
||
S1.FS2.f1_0 = 2 ;
|
||
S1.FS2.f1_1 = 3 ;
|
||
S1.FS2.f2_0 = 2 ;
|
||
G2.lat_0 = 2 ;
G2.lon_0 = 2 ;
|
||
\endcode
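
A minimal sketch of the naming rule for anonymous dimensions follows.
The helper name anon_dim_name is hypothetical and is not taken from the
translation code.

```c
#include <stdio.h>

/* Hypothetical helper: build the name of the i'th (inherited)
 * anonymous dimension of the variable whose fully qualified name is
 * var, in the form "var_i". Returns 0 on success, -1 if the buffer
 * is too small. */
int anon_dim_name(char *out, size_t outlen, const char *var, int i)
{
    int n = snprintf(out, outlen, "%s_%d", var, i);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For the variable S1.FS2.f1, index 0 names the dimension inherited from
FS2[2] and index 1 names the field's own dimension of size 3.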
|
||
|
||
If, however, the anonymous dimension is the single dimension of a MAP
vector in a Grid, then the dimension is given the same name as the map
vector. This leads to the following.
|
||
|
||
\code
|
||
G2.lat_0 -> G2.lat
G2.lon_0 -> G2.lon
|
||
\endcode
|
||
|
||
For each unique named dimension "\<name\>=NN", create a netCDF dimension
of the form "\<name\>=NN", where name has the qualifications removed. If
|
||
this leads to duplicates (i.e. same name and same value), then the
|
||
duplicates are ignored. This produces the following.
|
||
|
||
\code
|
||
G2.lat -> lat
G2.lon -> lon
|
||
\endcode
|
||
|
||
Note that this produces duplicates that will be ignored later.
|
||
|
||
At this point the only dimensions left to process should be named
|
||
dimensions with the same name as some dimension from the previous step,
but with a different value. For those dimensions create a dimension of
the form "\<name\>M=NN" where M is a counter starting at 1. The example
|
||
has no instances of this.
|
||
|
||
Finally, and if needed, define a single UNLIMITED dimension named
|
||
"unlimited" with value zero. Unlimited will be used to handle certain
|
||
kinds of DAP sequences (see below).
|
||
|
||
This leads to the following set of dimensions.
|
||
|
||
\code
|
||
dimensions:
|
||
unlimited = UNLIMITED;
|
||
lat = 2 ;
|
||
lon = 2 ;
|
||
S1.FS2.f1_0 = 2 ;
|
||
S1.FS2.f1_1 = 3 ;
|
||
S1.FS2.f2_0 = 2 ;
|
||
\endcode
|
||
|
||
\subsection var_name_trans Variable Name Translation
|
||
|
||
The steps for variable name translation are as follows.
|
||
|
||
Take the set of variables captured above. Thus for the above DDS, the
|
||
following fields would be collected.
|
||
|
||
\code
|
||
f1
|
||
S1.f11
|
||
S1.FS2.f1
|
||
S1.FS2.f2
|
||
S2.G1.temp
|
||
G2.G2
|
||
lat
|
||
lon
|
||
\endcode
|
||
|
||
All grid array variables are renamed to be the same as the containing
|
||
grid and the grid prefix is removed. In the above DDS, this results in
|
||
the following changes.
|
||
|
||
\code
|
||
S2.G1.temp -> S2.G1
|
||
G2.G2 -> G2
|
||
\endcode
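
In terms of the fully qualified name, this renaming amounts to dropping
the final dotted component. The sketch below shows that step only; the
helper name rename_grid_array is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: rename a grid array variable to its containing
 * grid by dropping the last dotted component, in place.
 * Returns 0 on success, -1 if the name has no containing grid. */
int rename_grid_array(char *name)
{
    char *dot = strrchr(name, '.');
    if (dot == NULL)
        return -1;
    *dot = '\0';   /* "S2.G1.temp" becomes "S2.G1" */
    return 0;
}
```

The caller is assumed to know already which variables are grid arrays;
the sketch does not check that.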
|
||
|
||
It is important to note that this process could produce duplicate
|
||
variables (i.e. with the same name); in that case they are all assumed
|
||
to have the same content and the duplicates are ignored. If it turns
|
||
out that the duplicates have different content, then the translation
|
||
will not detect this. YOU HAVE BEEN WARNED.
|
||
|
||
The final netCDF-3 schema (minus attributes) is then as follows.
|
||
|
||
\code
|
||
netcdf t {
|
||
dimensions:
|
||
unlimited = UNLIMITED ;
|
||
lat = 2 ;
|
||
lon = 2 ;
|
||
S1.FS2.f1_0 = 2 ;
|
||
S1.FS2.f1_1 = 3 ;
|
||
S1.FS2.f2_0 = 2 ;
|
||
variables:
|
||
int f1 ;
|
||
int lat(lat) ;
|
||
int lon(lon) ;
|
||
int S1.f11 ;
|
||
int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
|
||
int S1.FS2.f2(S1.FS2.f2_0) ;
|
||
float S2.G1(lat, lon) ;
|
||
float G2(lat, lon) ;
|
||
}
|
||
\endcode
|
||
|
||
In actuality, the unlimited dimension is dropped because it is unused.
|
||
|
||
There are differences with the original libnc-dap here because
|
||
libnc-dap technically was incorrect. The original would have said
|
||
this, for example.
|
||
|
||
\code
|
||
int S1.FS2.f1(lat, lat) ;
|
||
\endcode
|
||
|
||
Note that this is incorrect because it dimensions S1.FS2.f1(2,2)
|
||
rather than S1.FS2.f1(2,3).
|
||
|
||
\subsection dap_seq_trans Translating DAP DDS Sequences
|
||
|
||
Any variable (as determined above) that is contained directly or
|
||
indirectly by a Sequence is subject to revision of its rank using the
|
||
following rules.
|
||
|
||
Let the variable be contained in Sequence Q1, where Q1 is the
|
||
innermost containing sequence. If Q1 is itself contained (directly or
|
||
indirectly) in a sequence, or Q1 is contained (again directly or
|
||
indirectly) in a structure that has rank greater than 0, then the
|
||
variable will have an initial UNLIMITED dimension. Further, all
|
||
dimensions coming from "above" and including (in the containment
|
||
sense) the innermost Sequence, Q1, will be removed and replaced by
|
||
that single UNLIMITED dimension. The size associated with that
|
||
UNLIMITED is zero, which means that its contents are inaccessible
|
||
through the netCDF-3 API. Again, this differs from libnc-dap, which
|
||
leaves out such variables. Again, however, this difference is backward
|
||
compatible.
|
||
|
||
If the variable is contained in a single Sequence (i.e. not nested)
|
||
and all containing structures have rank 0, then the variable will have
|
||
an initial dimension whose size is the record count for that
|
||
Sequence. The name of the new dimension will be the name of the
|
||
Sequence.
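
The two cases can be condensed into a small decision function. This is
a sketch of the rule as stated in the two paragraphs above, with
hypothetical names; it is not the translation code itself.

```c
/* Kind of leading dimension given to a variable inside a Sequence. */
typedef enum {
    LEAD_UNLIMITED,    /* single UNLIMITED dimension of size zero */
    LEAD_RECORD_COUNT  /* dimension named after the Sequence, sized
                          by its record count */
} LeadDim;

/* q1_in_sequence: the innermost Sequence Q1 is itself contained,
 *   directly or indirectly, in another Sequence.
 * q1_in_dimensioned_struct: Q1 is contained, directly or indirectly,
 *   in a Structure of rank greater than 0. */
LeadDim sequence_lead_dim(int q1_in_sequence, int q1_in_dimensioned_struct)
{
    if (q1_in_sequence || q1_in_dimensioned_struct)
        return LEAD_UNLIMITED;
    return LEAD_RECORD_COUNT;
}
```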
|
||
|
||
Consider this example.
|
||
|
||
\code
|
||
Dataset {
|
||
Structure {
|
||
Sequence {
|
||
Int32 f1[3];
|
||
Int32 f2;
|
||
} SQ1;
|
||
} S1[2];
|
||
Sequence {
|
||
Structure {
|
||
Int32 x1[7];
|
||
} S2[5];
|
||
} Q2;
|
||
} D;
|
||
\endcode
|
||
|
||
The corresponding netCDF-3 translation is roughly as follows (the
|
||
value for dimension Q2 may differ).
|
||
|
||
\code
|
||
dimensions:
|
||
unlimited = UNLIMITED ; // (0 currently)
|
||
S1.SQ1.f1_0 = 2 ;
|
||
S1.SQ1.f1_1 = 3 ;
|
||
S1.SQ1.f2_0 = 2 ;
|
||
Q2.S2.x1_0 = 5 ;
|
||
Q2.S2.x1_1 = 7 ;
|
||
Q2 = 5 ;
|
||
variables:
|
||
int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
|
||
int S1.SQ1.f2(unlimited) ;
|
||
int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;
|
||
\endcode
|
||
|
||
Note that for example S1.SQ1.f1_0 is not actually used because it has
|
||
been folded into the unlimited dimension.
|
||
|
||
Note that for sequences without a leading unlimited dimension, there
|
||
is a performance cost because the translation code has to walk the
|
||
data to determine how many records are associated with the
|
||
sequence. Since libnc-dap did essentially the same thing, it can be
|
||
assumed that the cost is not prohibitive.
|
||
|
||
\subsection nc4_trans_rules netCDF-4 Translation Rules
|
||
|
||
A DAP to netCDF-4 translation also exists, but is not the default and
|
||
in any case is only available if the "--enable-netcdf-4" option is
|
||
specified at configure time. This translation includes some elements
|
||
of the libnc-dap translation, but attempts to provide a simpler (but
|
||
not, unfortunately, simple) set of translation rules than is used for
|
||
the netCDF-3 translation. Please note that the translation is still
|
||
experimental and will change to respond to unforeseen problems or to
|
||
suggested improvements.
|
||
|
||
This text will use this running example.
|
||
|
||
\code
|
||
Dataset {
|
||
Int32 f1[fdim=10];
|
||
Structure {
|
||
Int32 f11;
|
||
Structure {
|
||
Int32 f1[3];
|
||
Int32 f2;
|
||
} FS2[2];
|
||
} S1;
|
||
Grid {
|
||
Array:
|
||
Float32 temp[lat=2][lon=2];
|
||
Maps:
|
||
Int32 lat[2];
|
||
Int32 lon[2];
|
||
} G1;
|
||
Sequence {
|
||
Float64 depth;
|
||
} Q1;
|
||
} D;
|
||
\endcode
|
||
|
||
\subsection nc4_var_def Variable Definition
|
||
|
||
The rule for choosing variables is relatively simple. Start with the
|
||
names of the top-level fields of the DDS. The term top-level means
|
||
that the object is a direct subnode of the Dataset object. In our
|
||
example, this produces the set [f1, S1, G1, Q1].
|
||
|
||
\subsection nc4_dim_def Dimension Definition
|
||
|
||
The rules for choosing and defining dimensions are as follows.
|
||
|
||
Collect the set of dimensions (named and anonymous) directly
|
||
associated with the variables as defined above. This means that
|
||
dimensions within user-defined types are ignored. From our example,
|
||
the dimension set is [fdim=10,lat=2,lon=2,2,2]. Note that the
|
||
unqualified names are used.
|
||
|
||
All remaining anonymous dimensions are given the name "\<var\>_NN",
where "\<var\>" is the unqualified name of the variable in which the
|
||
anonymous dimension appears and NN is the relative position of that
|
||
dimension in the dimensions associated with that array. No instances
|
||
of this rule occur in the running example.
|
||
|
||
Remove duplicate dimensions (those with same name and value). Our
|
||
dimension set now becomes [fdim=10,lat=2,lon=2].
|
||
|
||
The final case occurs when there are dimensions with the same name but
|
||
with different values. For this case, the size of the dimension is
|
||
appended to the dimension name.
|
||
|
||
\subsection nc4_type_def Type Definition
|
||
|
||
The rules for choosing user-defined types are as follows.
|
||
|
||
For every Structure, Grid, and Sequence, a netCDF-4 compound type is
|
||
created whose fields are the fields of the Structure, Sequence, or
|
||
Grid. With one exception, the name of the type is the same as the
|
||
Structure or Grid name suffixed with "_t". The exception is that the
|
||
compound types derived from Sequences are instead suffixed with
|
||
"_record_t".
|
||
|
||
The types of the fields are the types of the corresponding field of
|
||
the Structure, Sequence, or Grid. Note that this type might be itself
|
||
a user-defined type.
|
||
|
||
From the example, we get the following compound types.
|
||
|
||
\code
|
||
compound FS2_t {
|
||
int f1(3);
|
||
int f2;
|
||
};
|
||
compound S1_t {
|
||
int f11;
|
||
FS2_t FS2(2);
|
||
};
|
||
compound G1_t {
|
||
float temp(2,2);
|
||
int lat(2);
|
||
int lon(2);
|
||
};
|
||
compound Q1_record_t {
|
||
double depth;
|
||
};
|
||
\endcode
|
||
|
||
For all sequences of name X, also create this type.
|
||
|
||
\code
|
||
X_record_t (*) X_t
|
||
\endcode
|
||
|
||
In our example, this produces the following type.
|
||
|
||
\code
|
||
Q1_record_t (*) Q1_t
|
||
\endcode
|
||
|
||
If a Sequence, Q, has a single field, F, whose type is a primitive type,
T (e.g., int, float, string), then do not apply the previous rule,
but instead replace the whole sequence with the following field.
|
||
|
||
\code
|
||
T (*) Q.F
|
||
\endcode
|
||
|
||
\subsection choosing_translation Choosing a Translation
|
||
|
||
The decision about whether to translate to netCDF-3 or netCDF-4 is
|
||
determined by applying the following rules in order.
|
||
- If the NC_CLASSIC_MODEL flag is set on nc_open(), then netCDF-3
|
||
translation is used.
|
||
- If the NC_NETCDF4 flag is set on nc_open(), then netCDF-4
|
||
translation is used.
|
||
- If the URL is prefixed with the client parameter "[netcdf3]" or
"[netcdf-3]" then netCDF-3 translation is used.
|
||
- If the URL is prefixed with the client parameter "[netcdf4]" or
"[netcdf-4]" then netCDF-4 translation is used.
|
||
- If none of the above holds, then default to netCDF-3 classic translation.
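
The ordered rules above can be sketched as a single function. The names
below are hypothetical, and the EX_ constants are stand-ins for the
NC_CLASSIC_MODEL and NC_NETCDF4 mode flags; a real program would take
those from netcdf.h rather than define its own.

```c
#include <string.h>

/* Stand-ins for the NC_CLASSIC_MODEL and NC_NETCDF4 mode flags. */
#define EX_CLASSIC_MODEL 0x0100
#define EX_NETCDF4       0x1000

/* Return 1 if one of the leading "[...]" groups of url is exactly
 * "[name]", else 0. Only the bracketed prefix is examined. */
static int has_client_param(const char *url, const char *name)
{
    size_t len = strlen(name);
    while (*url == '[') {
        const char *end = strchr(url, ']');
        if (end == NULL)
            return 0;  /* malformed prefix */
        if ((size_t)(end - url - 1) == len &&
            strncmp(url + 1, name, len) == 0)
            return 1;
        url = end + 1;
    }
    return 0;
}

/* Apply the rules in order; returns 3 for netCDF-3 translation and
 * 4 for netCDF-4 translation. */
int choose_translation(int mode, const char *url)
{
    if (mode & EX_CLASSIC_MODEL)
        return 3;
    if (mode & EX_NETCDF4)
        return 4;
    if (has_client_param(url, "netcdf3") || has_client_param(url, "netcdf-3"))
        return 3;
    if (has_client_param(url, "netcdf4") || has_client_param(url, "netcdf-4"))
        return 4;
    return 3;  /* default: netCDF-3 classic translation */
}
```

Because the rules are applied in order, a mode flag given to nc_open()
takes precedence over a client parameter in the URL.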
|
||
|
||
\subsection dap_caching Caching
|
||
|
||
In an effort to provide better performance for some access patterns,
|
||
client-side caching of data is available. The default is no caching,
|
||
but it may be enabled by prefixing the URL with "[cache]".
|
||
|
||
Caching operates basically as follows.
|
||
|
||
When a URL is first accessed using nc_open(), netCDF automatically
|
||
does a pre-fetch of selected variables. These include all variables
|
||
smaller than a specified (and user definable) size. This allows, for
|
||
example, quick access to coordinate variables.
|
||
|
||
Whenever a request is made using some variant of the nc_get_var() API
|
||
procedures, the complete variable is fetched and stored in the cache
|
||
as a new cache entry. Subsequent requests for any part of that
|
||
variable will access the cache entry to obtain the data.
|
||
|
||
The cache may become too full, either because there are too many
|
||
entries or because it is taking up too much disk space. In this case
|
||
cache entries are purged until the cache size limits are reached. The
|
||
cache purge algorithm is LRU (least recently used) so that variables
|
||
that are repeatedly referenced will tend to stay in the cache.
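
A minimal sketch of an LRU purge of this kind follows. The entry layout
and the function name are hypothetical and chosen only for
illustration; the actual cache data structures may differ.

```c
#include <stddef.h>

/* Hypothetical cache entry. */
typedef struct {
    size_t size;         /* bytes held by this entry */
    unsigned long last;  /* logical time of the most recent access */
    int live;            /* nonzero while the entry is in the cache */
} CacheEntry;

/* Purge least-recently-used entries until the cache holds at most
 * max_bytes bytes and at most max_count entries.
 * Returns the number of entries purged. */
int lru_purge(CacheEntry *e, int n, size_t max_bytes, int max_count)
{
    size_t total = 0;
    int count = 0, purged = 0, i;

    for (i = 0; i < n; i++) {
        if (e[i].live) {
            total += e[i].size;
            count++;
        }
    }
    while (count > 0 && (total > max_bytes || count > max_count)) {
        int victim = -1;
        for (i = 0; i < n; i++) {
            /* pick the live entry with the oldest access time */
            if (e[i].live && (victim < 0 || e[i].last < e[victim].last))
                victim = i;
        }
        e[victim].live = 0;
        total -= e[victim].size;
        count--;
        purged++;
    }
    return purged;
}
```

An entry whose access time is refreshed on every request is never the
oldest for long, which is why repeatedly referenced variables tend to
survive a purge.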
|
||
|
||
The cache is completely purged when nc_close() is invoked.
|
||
|
||
In order to decide if you should enable caching, you will need to have
|
||
some understanding of the access patterns of your program.
|
||
|
||
The ncdump program always dumps one or more whole variables so it
|
||
turns on caching.
|
||
|
||
If your program accesses only parts of a number of variables, then
|
||
caching should probably not be used since fetching whole variables
|
||
will probably slow down your program for no purpose.
|
||
|
||
Unfortunately, caching is currently an all or nothing proposition, so
|
||
for more complex access patterns, the decision to cache or not may not
|
||
have an obvious answer. Probably a good rule of thumb is to avoid
|
||
caching initially and later turn it on to see its effect on
|
||
performance.
|
||
|
||
\subsection dap_client_params Defined Client Parameters
|
||
|
||
Currently, a limited set of client parameters is
|
||
recognized. Parameters not listed here are ignored, but no error is
|
||
signalled.
|
||
|
||
Each entry below gives the parameter name, any legal values, and the semantics.

- "[netcdf3]|[netcdf-3]" - Specify translation to netCDF-3.
- "[netcdf4]|[netcdf-4]" - Specify translation to netCDF-4.
|
||
- "[log]|[log=<file>]" "" - Turn on logging and send the log output to
|
||
the specified file. If no file is specified, then output to standard
|
||
error.
|
||
- "[show=...]" das|dds|url - This causes information to appear as
|
||
specific global attributes. The currently recognized tags are "dds"
|
||
to display the underlying DDS, "das" similarly, and "url" to display
|
||
the url used to retrieve the data. This parameter may be specified
|
||
multiple times (e.g. “[show=dds][show=url]”).
|
||
- "[show=fetch]" - This parameter causes the netCDF code to log a copy
|
||
of the complete url for every HTTP get request. If logging is
|
||
enabled, then this can be helpful in checking to see the access
|
||
behavior of the netCDF code.
|
||
- "[stringlength=NN]" - Specify the default string length to use for
|
||
string dimensions. The default is 64.
|
||
- "[stringlength_<var>=NN]" - Specify the default string length to use
|
||
for a string dimension for the specified variable. The default is
|
||
64.
|
||
- "[cache]" - This enables caching.
|
||
- "[cachelimit=NN]" - Specify the maximum amount of space allowed for
|
||
the cache.
|
||
- "[cachecount=NN]" - Specify the maximum number of entries in the
|
||
cache.
|
||
|
||
\subsection dap_debug Notes on Debugging OPeNDAP Access
|
||
|
||
The OPeNDAP support makes use of the logging facility of the
|
||
underlying oc system. Note that this is currently separate from the
|
||
existing netCDF logging facility. Turning on this logging can
|
||
sometimes give important information. Logging can be enabled by
|
||
prefixing the url with the client parameter [log] or [log=filename],
|
||
where the first case will send log output to standard error and the
|
||
second will send log output to the specified file.
|
||
|
||
Users should also be aware that the DAP subsystem creates temporary
|
||
files of the name dataddsXXXXXX, where XXXXXX is some random string. If
|
||
the program using the DAP subsystem crashes, these files may be left
|
||
around. It is perfectly safe to delete them. Also, if you are
|
||
accessing data over an NFS mount, you may see some .nfsxxxxx files;
|
||
those can be ignored as well.

\subsection http_config HTTP Configuration
|
||
|
||
Limited support for configuring the http connection is provided via
|
||
parameters in the “.httprc” configuration file. Although deprecated,
|
||
the name “.dodsrc” may also be used. The relevant .httprc file is
|
||
located by first looking in the current working directory, and if not
|
||
found, then looking in the directory specified by the “$HOME”
|
||
environment variable.
|
||
|
||
Entries in the .httprc file are of the form:
|
||
|
||
\code
|
||
['['<url>']']<key>=<value>
|
||
\endcode
|
||
|
||
That is, each entry consists of a key and value pair, optionally
preceded by a url enclosed in square brackets.
|
||
|
||
For given KEY and URL strings, the value chosen is as follows:
|
||
|
||
If URL is null, then look for the .httprc entry that has no url prefix
and whose key is the same as the KEY for which we are looking.
|
||
|
||
If the URL is not null, then look for all the .httprc entries that
have a url, URL1, say, and for which URL1 is a prefix (in the string
sense) of URL. For example, if URL = http://x.y/a/b/c, then it will match
entries of the form

\code
1. [http://x.y/a]KEY=VALUE
2. [http://x.y/a/b]KEY=VALUE
\endcode

It will not match an entry of the form

\code
[http://x.y/b]KEY=VALUE
\endcode

because “http://x.y/b” is not a string prefix of
“http://x.y/a/b/c”. Finally from the set so constructed, choose the entry
with the longest url prefix: “[http://x.y/a/b]KEY=VALUE” in this case.
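
The lookup rule can be sketched in C as follows. The sketch implements
the rule as stated, namely that an entry matches when its url is a
string prefix of the requested URL and that the longest such prefix
wins. The RcEntry type and rc_lookup function are hypothetical names
used only for this illustration.

```c
#include <string.h>

/* Hypothetical in-memory form of one configuration entry. */
typedef struct {
    const char *url;    /* NULL when the entry has no [url] prefix */
    const char *key;
    const char *value;
} RcEntry;

/* Return the value for key, or NULL if no entry applies.
 * url == NULL selects the entry without a url prefix; otherwise the
 * entry whose url is the longest string prefix of url is chosen. */
const char *rc_lookup(const RcEntry *rc, int n,
                      const char *url, const char *key)
{
    const char *best = NULL;
    size_t best_len = 0;
    int i;

    for (i = 0; i < n; i++) {
        size_t len;
        if (strcmp(rc[i].key, key) != 0)
            continue;
        if (url == NULL) {
            if (rc[i].url == NULL)
                return rc[i].value;
            continue;
        }
        if (rc[i].url == NULL)
            continue;  /* only entries with a url are considered */
        len = strlen(rc[i].url);
        if (strncmp(url, rc[i].url, len) == 0 &&
            (best == NULL || len > best_len)) {
            best = rc[i].value;
            best_len = len;
        }
    }
    return best;
}
```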
|
||
|
||
Currently, the supported set of keys (with descriptions) is as
|
||
follows.
|
||
|
||
<pre>
|
||
HTTP.VERBOSE
|
||
Type: boolean ("1"/"0")
|
||
Description: Produce verbose output, especially using SSL.
|
||
Related CURL Flags: CURLOPT_VERBOSE
|
||
HTTP.DEFLATE
|
||
Type: boolean ("1"/"0")
|
||
Description: Allow use of compression by the server.
|
||
Related CURL Flags: CURLOPT_ENCODING
|
||
HTTP.COOKIEJAR
|
||
Type: String representing file path
|
||
Description: Specify the name of file into which to store cookies. Defaults to in-memory storage.
|
||
Related CURL Flags: CURLOPT_COOKIEJAR
|
||
HTTP.COOKIEFILE
|
||
Type: String representing file path
|
||
Description: Same as HTTP.COOKIEJAR.
|
||
Related CURL Flags: CURLOPT_COOKIEFILE
|
||
HTTP.CREDENTIALS.USER
|
||
Type: String representing user name
|
||
Description: Specify the user name for Digest and Basic authentication.
|
||
Related CURL Flags:
|
||
HTTP.CREDENTIALS.PASSWORD
|
||
Type: String representing password
|
||
Description: Specify the password for Digest and Basic authentication.
|
||
Related CURL Flags:
|
||
HTTP.SSL.CERTIFICATE
|
||
Type: String representing file path
|
||
Description: Path to a file containing a PEM certificate.
|
||
Related CURL Flags: CURLOPT_SSLCERT
|
||
HTTP.SSL.KEY
|
||
Type: String representing file path
|
||
Description: Same as HTTP.SSL.CERTIFICATE, and should usually have the same value.
|
||
Related CURL Flags: CURLOPT_SSLKEY
|
||
HTTP.SSL.KEYPASSWORD
|
||
Type: String representing password
|
||
Description: Password for accessing the HTTP.SSL.KEY/HTTP.SSL.CERTIFICATE
|
||
Related CURL Flags: CURLOPT_KEYPASSWD
|
||
HTTP.SSL.CAPATH
|
||
Type: String representing directory
|
||
Description: Path to a directory containing trusted certificates for validating server certificates.
|
||
Related CURL Flags: CURLOPT_CAPATH
|
||
HTTP.SSL.VALIDATE
|
||
Type: boolean ("1"/"0")
|
||
Description: Cause the client to verify the server's presented certificate.
|
||
Related CURL Flags: CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST
|
||
HTTP.TIMEOUT
|
||
Type: String ("dddddd")
|
||
Description: Specify the maximum time in seconds that you allow the http transfer operation to take.
|
||
Related CURL Flags: CURLOPT_TIMEOUT, CURLOPT_NOSIGNAL
|
||
HTTP.PROXY_SERVER
|
||
Type: String representing url to access the proxy (e.g. http://[username:password@]host[:port])
|
||
Description: Specify the needed information for accessing a proxy.
|
||
Related CURL Flags: CURLOPT_PROXY, CURLOPT_PROXYHOST, CURLOPT_PROXYUSERPWD
|
||
</pre>
|
||
|
||
The related curl flags line indicates the curl flags modified by this
|
||
key. See the libcurl documentation of the curl_easy_setopt() function
|
||
for more detail: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html.
|
||
|
||
For ESG, the following entries must be specified:
|
||
|
||
\code
|
||
HTTP.SSL.VALIDATE
|
||
HTTP.COOKIEJAR
|
||
HTTP.SSL.CERTIFICATE
|
||
HTTP.SSL.KEY
|
||
HTTP.SSL.CAPATH
|
||
\endcode
|
||
|
||
Additionally, for ESG, the HTTP.SSL.CERTIFICATE and HTTP.SSL.KEY
|
||
entries should have the same value, which is the file path for the
|
||
certificate produced by MyProxyLogon. The HTTP.SSL.CAPATH entry should
|
||
be the path to the "certificates" directory produced by MyProxyLogon.
|
||
|
||
\page chunk_cache The Chunk Cache
|
||
|
||
When data are first read or written to a netCDF-4/HDF5 variable, the
|
||
HDF5 library opens a cache for that variable. The default size of that
cache is settable with the --with-chunk-cache-size option at netCDF build time.
|
||
|
||
For good performance your chunk cache must be larger than one chunk of
|
||
your data, and preferably large enough to hold multiple chunks
|
||
of data.
|
||
|
||
In addition, when a file is opened (or a variable created in an open
|
||
file), the netCDF-4 library checks to make sure the default chunk
|
||
cache size will work for that variable. The cache will be large enough
|
||
to hold N chunks, up to a maximum size of M bytes. (Both N and M are
|
||
settable at configure time with the --with-default-chunks-in-cache and
the --with-max-default-cache-size options to the configure
|
||
script. Currently they are set to 10 and 64 MB.)
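
One reading of this sizing rule is "room for N chunks, capped at M
bytes". The sketch below expresses that reading in C; the function name
is hypothetical and the library's actual check may differ in detail.

```c
#include <stddef.h>

/* Cache size for a variable whose chunks are chunk_bytes each:
 * room for n_chunks chunks, but never more than max_bytes. */
size_t default_var_cache_bytes(size_t chunk_bytes, size_t n_chunks,
                               size_t max_bytes)
{
    /* Dividing instead of multiplying first avoids overflow. */
    if (chunk_bytes != 0 && n_chunks > max_bytes / chunk_bytes)
        return max_bytes;
    return chunk_bytes * n_chunks;
}
```

Under this reading, a variable with very large chunks ends up with a
cache that holds fewer than N chunks, which is one reason to check
chunk sizes against the cache limits.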
|
||
|
||
To change the default chunk cache size, use the set_chunk_cache
function before opening the file. C programmers see
nc_set_chunk_cache(); Fortran 77 programmers see
NF_SET_CHUNK_CACHE. Fortran 90 programmers use the
optional cache_size, cache_nelems, and cache_preemption parameters to
nf90_open/nf90_create to change the chunk cache size before opening the
file.
|
||
|
||
To change the per-variable cache size, use the set_var_chunk_cache
|
||
function at any time on an open file. C programmers see
|
||
nc_set_var_chunk_cache(), Fortran 77 programmers see
|
||
NF_SET_VAR_CHUNK_CACHE.
|
||
|
||
\page default_chunking_4_1 The Default Chunking Scheme in version 4.1 (and 4.1.1)
|
||
|
||
When the data writer does not specify chunk sizes for a variable, the
|
||
netCDF library has to come up with some default values.
|
||
|
||
The C code below determines the default chunk sizes.
|
||
|
||
For unlimited dimensions, a chunk size of one is always used. Users
|
||
are advised to set chunk sizes for large data sets with one or more
|
||
unlimited dimensions, since a chunk size of one is quite inefficient.
|
||
|
||
For fixed dimensions, the algorithm below chooses a chunk size for
each dimension so that the resulting chunks are about DEFAULT_CHUNK_SIZE bytes
(which can be modified in the netCDF configure script).
|
||
|
||
\code
|
||
/* Unlimited dim always gets chunksize of 1. */
if (dim->unlimited)
    chunksize[d] = 1;
else
    chunksize[d] = pow((double)DEFAULT_CHUNK_SIZE/type_size,
                       1/(double)(var->ndims - unlimdim));
|
||
\endcode
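
To make the arithmetic concrete, here is a self-contained sketch of the
same idea. It is not the library code: it replaces the floating-point
pow() call with an exact integer n-th root, takes the default chunk
size as a parameter instead of assuming its value, and uses
hypothetical function names.

```c
#include <stddef.h>

/* Largest c such that c^n <= limit, for n >= 1 (integer n-th root).
 * Never returns less than 1. */
static size_t int_root(size_t limit, int n)
{
    size_t c = 1;
    for (;;) {
        size_t next = c + 1, p = 1;
        int i, over = 0;
        for (i = 0; i < n; i++) {
            if (p > limit / next) {  /* p * next would exceed limit */
                over = 1;
                break;
            }
            p *= next;
        }
        if (over)
            break;
        c = next;
    }
    return c;
}

/* unlimited[d] is nonzero for unlimited dimensions. Each fixed
 * dimension gets the same chunk length, chosen so that one chunk
 * holds at most default_chunk_size bytes. */
void default_chunksizes(const int *unlimited, int ndims, size_t type_size,
                        size_t default_chunk_size, size_t *chunksize)
{
    int d, unlimdim = 0;
    size_t fixed = 1;

    for (d = 0; d < ndims; d++)
        if (unlimited[d])
            unlimdim++;
    if (ndims > unlimdim)
        fixed = int_root(default_chunk_size / type_size, ndims - unlimdim);
    for (d = 0; d < ndims; d++)
        chunksize[d] = unlimited[d] ? 1 : fixed;
}
```

For example, with a hypothetical default of 4194304 bytes and 4-byte
values, a variable with one unlimited and two fixed dimensions gets
chunks of 1 x 1024 x 1024, while three fixed dimensions give
101 x 101 x 101.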
|
||
|
||
\page default_chunking_4_0_1 The Default Chunking Scheme in version 4.0.1
|
||
|
||
In the 4.0.1 release, the default chunk sizes were chosen with a
|
||
different scheme, as demonstrated in the following C code:
|
||
|
||
\code
|
||
/* These are limits for default chunk sizes. (2^16 and 2^20). */
#define NC_LEN_TOO_BIG 65536
#define NC_LEN_WAY_TOO_BIG 1048576

/* Now we must determine the default chunksize. */
if (dim->unlimited)
    chunksize[d] = 1;
else if (dim->len < NC_LEN_TOO_BIG)
    chunksize[d] = dim->len;
else if (dim->len > NC_LEN_TOO_BIG && dim->len <= NC_LEN_WAY_TOO_BIG)
    chunksize[d] = dim->len / 2 + 1;
else
    chunksize[d] = NC_LEN_WAY_TOO_BIG;
|
||
\endcode
|
||
|
||
As can be seen from this code, the default chunksize is 1 for
|
||
unlimited dimensions, otherwise it is the full length of the dimension
|
||
(if it is under NC_LEN_TOO_BIG), or half the size of the dimension (if
|
||
it is between NC_LEN_TOO_BIG and NC_LEN_WAY_TOO_BIG), and, if it's
|
||
longer than NC_LEN_WAY_TOO_BIG, it is set to NC_LEN_WAY_TOO_BIG.
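
The same conditions can be wrapped in a function to try out particular
dimension lengths. The function name default_chunksize_401 is
hypothetical; the conditions are copied from the code above.

```c
#include <stddef.h>

/* Limits for default chunk sizes (2^16 and 2^20). */
#define NC_LEN_TOO_BIG 65536
#define NC_LEN_WAY_TOO_BIG 1048576

/* Default chunk length for one dimension of length len, following
 * the conditions of the 4.0.1 scheme shown above. */
size_t default_chunksize_401(size_t len, int unlimited)
{
    if (unlimited)
        return 1;
    else if (len < NC_LEN_TOO_BIG)
        return len;
    else if (len > NC_LEN_TOO_BIG && len <= NC_LEN_WAY_TOO_BIG)
        return len / 2 + 1;
    else
        return NC_LEN_WAY_TOO_BIG;
}
```

A dimension of length 1000 keeps its full length, one of length 100000
gets 50001, and one of length 2000000 gets 1048576. Tracing the
conditions as written, a length of exactly 65536 matches neither of the
first two length tests and so also gets 1048576.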
|
||
|
||
Our experience is that these defaults work well for small data sets,
|
||
but once variable size reaches the GB range, the user is better off
|
||
determining chunk sizes for their read access patterns.
|
||
|
||
In particular, the idea of using 1 for the chunksize of an unlimited
|
||
dimension works well if the data are being read a record at a
|
||
time. Any other read access patterns will result in slower
|
||
performance.
|
||
|
||
\page chunking_parallel_io Chunking and Parallel I/O
|
||
|
||
When files are opened for read/write parallel I/O access, the chunk
|
||
cache is not used. Therefore it is important to open parallel files
|
||
with read only access when possible, to achieve the best performance.
|
||
|
||
\page bm_file A Utility to Help Benchmark Results: bm_file
|
||
|
||
The bm_file utility may be used to copy files, from one netCDF format
|
||
to another, changing chunking, filter, parallel I/O, and other
|
||
parameters. This program may be used for benchmarking netCDF
|
||
performance for user data files with a range of choices, allowing data
|
||
producers to pick settings that best serve their user base.
|
||
|
||
NetCDF must have been configured with --enable-benchmarks at build time
|
||
for the bm_file program to be built. When built with
|
||
--enable-benchmarks, netCDF will include tests (run with “make check”)
|
||
that will run the bm_file program on sample data files.
|
||
|
||
Since data files and their access patterns vary from case to case,
|
||
these benchmark tests are intended to suggest further use of the
|
||
bm_file program for users.
|
||
|
||
Here's an example of a call to bm_file:
|
||
|
||
\code
|
||
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:256:256 tst_elena_int_3D.nc
|
||
\endcode
|
||
|
||
Generally a range of settings must be tested. This is best done with a
|
||
shell script, which calls bm_file repeatedly, to create output like
|
||
this:
|
||
|
||
<pre>
*** Running benchmarking program bm_file for simple shorts test files, 1D to 6D...
input format, output_format, input size, output size, meta read time, meta write time, data read time, data write time, enddianness, metadata reread time, data reread time, read rate, write rate, reread rate, deflate, shuffle, chunksize[0], chunksize[1], chunksize[2], chunksize[3]
1, 4, 200092, 207283, 1613, 1054, 409, 312, 0, 1208, 1551, 488.998, 641.026, 128.949, 0, 0, 100000, 0, 0, 0
1, 4, 199824, 208093, 1545, 1293, 397, 284, 0, 1382, 1563, 503.053, 703.211, 127.775, 0, 0, 316, 316, 0, 0
1, 4, 194804, 204260, 1562, 1611, 390, 10704, 0, 1627, 2578, 499.159, 18.1868, 75.5128, 0, 0, 46, 46, 46, 0
1, 4, 167196, 177744, 1531, 1888, 330, 265, 0, 12888, 1301, 506.188, 630.347, 128.395, 0, 0, 17, 17, 17, 17
1, 4, 200172, 211821, 1509, 2065, 422, 308, 0, 1979, 1550, 473.934, 649.351, 129.032, 0, 0, 10, 10, 10, 10
1, 4, 93504, 106272, 1496, 2467, 191, 214, 0, 32208, 809, 488.544, 436.037, 115.342, 0, 0, 6, 6, 6, 6
*** SUCCESS!!!
</pre>
|
||
|
||
Such tables are suitable for import into spreadsheets, for easy
|
||
graphing of results.
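
A spreadsheet is not the only way to work with this output. As a minimal sketch, assuming the table has the comma-separated layout shown above, the following Python reads it with the standard csv module and picks the row with the highest write rate. The names SAMPLE, load_rows, and best_by are illustrative only and are not part of netCDF; the two data rows are copied from the sample output.

```python
import csv
import io

# Header and two rows copied from the sample output; a real run would read a file.
SAMPLE = """\
*** Running benchmarking program bm_file for simple shorts test files, 1D to 6D...
input format, output_format, input size, output size, meta read time, meta write time, data read time, data write time, enddianness, metadata reread time, data reread time, read rate, write rate, reread rate, deflate, shuffle, chunksize[0], chunksize[1], chunksize[2], chunksize[3]
1, 4, 200092, 207283, 1613, 1054, 409, 312, 0, 1208, 1551, 488.998, 641.026, 128.949, 0, 0, 100000, 0, 0, 0
1, 4, 194804, 204260, 1562, 1611, 390, 10704, 0, 1627, 2578, 499.159, 18.1868, 75.5128, 0, 0, 46, 46, 46, 0
*** SUCCESS!!!
"""


def load_rows(text):
    """Parse the comma-separated table into a list of dicts of floats."""
    # Drop blank lines and the "***" banner lines around the table.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("***")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), skipinitialspace=True)
    return [{key.strip(): float(val) for key, val in row.items()} for row in reader]


def best_by(rows, column):
    """Return the row with the largest value in the given column."""
    return max(rows, key=lambda row: row[column])


rows = load_rows(SAMPLE)
fastest = best_by(rows, "write rate")
print(fastest["chunksize[0]"], fastest["write rate"])
```

The same rows can be sorted or filtered on any other column, such as "reread rate", to compare settings for a particular access pattern.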
|
||
|
||
Several test scripts are run during the “make check” of the netCDF
|
||
build, in the nc_test4 directory. The following example may be found
|
||
in nc_test4/run_bm_elena.sh.
|
||
|
||
<pre>
#!/bin/sh

# This shell runs some benchmarks that Elena ran as described here:
# http://hdfeos.org/workshops/ws06/presentations/Pourmal/HDF5_IO_Perf.pdf

# $Id: netcdf.texi,v 1.82 2010/05/15 20:43:13 dmh Exp $

set -e
echo ""

echo "*** Testing the benchmarking program bm_file for simple float file, no compression..."
./bm_file -h -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:16:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:1024:256:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:512:64:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:512:256:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:256:64:256 tst_elena_int_3D.nc
./bm_file -d -f 3 -o tst_elena_out.nc -c 0:-1:0:256:256:256 tst_elena_int_3D.nc
echo '*** SUCCESS!!!'

exit 0
</pre>
|
||
|
||
The reading that bm_file does can be tailored to match the expected
|
||
access pattern.
|
||
|
||
The bm_file program is controlled with command line options.
|
||
|
||
<pre>
./bm_file
bm_file -v [-s N]|[-t V:S:S:S -u V:C:C:C -r V:I:I:I] -o file_out -f N -h -c V:C:C,V:C:C:C -d -m -p -i -e 1|2 file
  [-v]        Verbose
  [-o file]   Output file name
  [-f N]      Output format (1 - classic, 2 - 64-bit offset, 3 - netCDF-4, 4 - netCDF4/CLASSIC)
  [-h]        Print output header
  [-c V:Z:S:C:C:C[,V:Z:S:C:C:C, etc.]] Deflate, shuffle, and chunking parameters for vars
  [-t V:S:S:S[,V:S:S:S, etc.]] Starts for reads/writes
  [-u V:C:C:C[,V:C:C:C, etc.]] Counts for reads/writes
  [-r V:I:I:I[,V:I:I:I, etc.]] Incs for reads/writes
  [-d]        Doublecheck output by rereading each value
  [-m]        Do compare of each data value during doublecheck (slow for large files!)
  [-p]        Use parallel I/O
  [-s N]      Denom of fraction of slowest varying dimension read.
  [-i]        Use MPIIO (only relevant for parallel builds).
  [-e 1|2]    Set the endianness of output (1=little 2=big).
  file        Name of netCDF file
</pre>
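
Sweeping a range of chunk shapes means building many similar command lines. As a minimal sketch, the Python below assembles argument lists in the style of the script shown earlier, without running anything. It assumes that in the V:Z:S:C:C:C form V is a variable index, Z the deflate setting (with -1 taken here to mean no deflation, as the example calls suggest), and S the shuffle flag; the helper names chunk_spec and sweep are hypothetical.

```python
def chunk_spec(var, deflate, shuffle, chunks):
    """Format one V:Z:S:C:C:C entry for the -c option."""
    return ":".join(str(item) for item in (var, deflate, shuffle, *chunks))


def sweep(infile, outfile, chunk_shapes):
    """Build one bm_file argument list per chunk shape.

    Only the first command asks for the header line (-h), so that the
    combined output forms a single table.
    """
    commands = []
    for position, shape in enumerate(chunk_shapes):
        args = ["./bm_file"]
        if position == 0:
            args.append("-h")
        args += ["-d", "-f", "3", "-o", outfile,
                 "-c", chunk_spec(0, -1, 0, shape), infile]
        commands.append(args)
    return commands


commands = sweep("tst_elena_int_3D.nc", "tst_elena_out.nc",
                 [(1024, 16, 256), (1024, 256, 256), (512, 64, 256)])
for args in commands:
    print(" ".join(args))
```

Each list can be passed to a process launcher such as subprocess.run once bm_file has been built.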
|
||
|
||
\page cdl_syntax CDL Syntax
|
||
|
||
Below is an example of CDL, describing a netCDF dataset with several
|
||
named dimensions (lat, lon, time), variables (z, t, p, rh, lat, lon,
|
||
time), variable attributes (units, _FillValue, valid_range), and some
|
||
data.
|
||
|
||
\code
netcdf foo {    // example netCDF specification in CDL

dimensions:
    lat = 10, lon = 5, time = unlimited;

variables:
    int     lat(lat), lon(lon), time(time);
    float   z(time,lat,lon), t(time,lat,lon);
    double  p(time,lat,lon);
    int     rh(time,lat,lon);

    lat:units = "degrees_north";
    lon:units = "degrees_east";
    time:units = "seconds";
    z:units = "meters";
    z:valid_range = 0., 5000.;
    p:_FillValue = -9999.;
    rh:_FillValue = -1;

data:
    lat = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
    lon = -140, -118, -96, -84, -52;
}
\endcode
|
||
|
||
All CDL statements are terminated by a semicolon. Spaces, tabs, and
|
||
newlines can be used freely for readability. Comments may follow the
|
||
double slash characters '//' on any line.
|
||
|
||
A CDL description for a classic model file consists of three optional
|
||
parts: dimensions, variables, and data. The variable part may contain
|
||
variable declarations and attribute assignments. For the enhanced
|
||
model supported by netCDF-4, a CDL description may also include
|
||
groups, subgroups, and user-defined types.
|
||
|
||
A dimension is used to define the shape of one or more of the
|
||
multidimensional variables described by the CDL description. A
|
||
dimension has a name and a length. At most one dimension in a classic
|
||
CDL description can have the unlimited length, which means a variable
|
||
using this dimension can grow to any length (like a record number in a
|
||
file). Any number of dimensions can be declared of unlimited length in
|
||
CDL for an enhanced model file.
|
||
|
||
A variable represents a multidimensional array of values of the same
|
||
type. A variable has a name, a data type, and a shape described by its
|
||
list of dimensions. Each variable may also have associated attributes
|
||
(see below) as well as data values. The name, data type, and shape of
|
||
a variable are specified by its declaration in the variable section of
|
||
a CDL description. A variable may have the same name as a dimension;
|
||
by convention such a variable contains coordinates of the dimension it
|
||
names.
|
||
|
||
An attribute contains information about a variable or about the whole
|
||
netCDF dataset or containing group. Attributes may be used to specify
|
||
such properties as units, special values, maximum and minimum valid
|
||
values, and packing parameters. Attribute information is represented
|
||
by single values or one-dimensional arrays of values. For example,
|
||
“units” might be an attribute represented by a string such as
|
||
“celsius”. An attribute has an associated variable, a name, a data
|
||
type, a length, and a value. In contrast to variables that are
|
||
intended for data, attributes are intended for ancillary data or
|
||
metadata (data about data).
|
||
|
||
In CDL, an attribute is designated by a variable and attribute name,
|
||
separated by a colon (':'). It is possible to assign global attributes
|
||
to the netCDF dataset as a whole by omitting the variable name and
|
||
beginning the attribute name with a colon (':'). The data type of an
|
||
attribute in CDL, if not explicitly specified, is derived from the
|
||
type of the value assigned to it. The length of an attribute is the
|
||
number of data values or the number of characters in the character
|
||
string assigned to it. Multiple values are assigned to non-character
|
||
attributes by separating the values with commas (','). All values
|
||
assigned to an attribute must be of the same type. In the netCDF-4
|
||
enhanced model, attributes may be declared to be of user-defined type,
|
||
like variables.
|
||
|
||
In CDL, just as for netCDF, the names of dimensions, variables and
|
||
attributes (and, in netCDF-4 files, groups, user-defined types,
|
||
compound member names, and enumeration symbols) consist of arbitrary
|
||
sequences of alphanumeric characters, underscore '_', period '.', plus
|
||
'+', hyphen '-', or at sign '@', but beginning with a letter or
|
||
underscore. However, names commencing with underscore are reserved for
|
||
system use. Case is significant in netCDF names. A zero-length name is
|
||
not allowed. Some widely used conventions restrict names to only
|
||
alphanumeric characters or underscores. Names that have trailing space
|
||
characters are also not permitted.
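
The basic naming rule in this paragraph can be written as a regular expression. The Python below is a sketch of that conservative subset only; it does not cover UTF-8 names or backslash-escaped special characters, and the function name is_basic_name is illustrative rather than part of any netCDF interface.

```python
import re

# A letter or underscore first, then letters, digits, '_', '.', '+', '@', '-'.
_BASIC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.+@-]*")


def is_basic_name(name):
    """True if name follows the basic rule; empty names are rejected."""
    return _BASIC_NAME.fullmatch(name) is not None


for candidate in ["rh", "valid_range", "t+1.5@x-y", "9lives", "a/b", "bad ", ""]:
    print(repr(candidate), is_basic_name(candidate))
```

Names that begin with an underscore pass this check, since the library itself uses them (for example _FillValue), but they remain reserved for system use.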
|
||
|
||
Beginning with versions 3.6.3 and 4.0, names may also include UTF-8
|
||
encoded Unicode characters as well as other special characters, except
|
||
for the character '/', which may not appear in a name (because it is
|
||
reserved for path names of nested groups). In CDL, most special
|
||
characters are escaped with a backslash '\' character, but that
|
||
character is not actually part of the netCDF name. The special
|
||
characters that do not need to be escaped in CDL names are underscore
|
||
'_', period '.', plus '+', hyphen '-', or at sign '@'. For the formal
|
||
specification of CDL name syntax, see Format. Note that by using
|
||
special characters in names, you may make your data not compliant with
|
||
conventions that have more stringent requirements on valid names for
|
||
netCDF components, for example the CF Conventions.
|
||
|
||
The names for the primitive data types are reserved words in CDL, so
|
||
names of variables, dimensions, and attributes must not be primitive
|
||
type names.
|
||
|
||
The optional data section of a CDL description is where netCDF
|
||
variables may be initialized. The syntax of an initialization is
|
||
simple:
|
||
|
||
\code
|
||
variable = value_1, value_2, ...;
|
||
\endcode
|
||
|
||
The comma-delimited list of constants may be separated by spaces,
|
||
tabs, and newlines. For multidimensional arrays, the last dimension
|
||
varies fastest. Thus, row-major rather than column-major order is used for
|
||
matrices. If fewer values are supplied than are needed to fill a
|
||
variable, it is extended with the fill value. The types of constants
|
||
need not match the type declared for a variable; coercions are done to
|
||
convert integers to floating point, for example. All meaningful type
|
||
conversions among primitive types are supported.
|
||
|
||
A special notation for fill values is supported: the ‘_’ character
|
||
designates a fill value for variables.
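
To make the ordering and padding rules concrete, here is a small Python sketch that assigns a flat list of constants to index positions with the last dimension varying fastest, and pads a short list with a fill marker. It only models the layout described above; the name layout and the use of the string "_" as a stand-in for the fill value are assumptions for illustration.

```python
from itertools import product

FILL = "_"  # stands in for the fill value of the variable


def layout(shape, values):
    """Map each index tuple to a value, last dimension varying fastest."""
    total = 1
    for length in shape:
        total *= length
    # Pad a short list of constants with the fill marker.
    padded = list(values) + [FILL] * (total - len(values))
    # product() steps through the last range fastest, matching row-major order.
    indices = product(*(range(length) for length in shape))
    return dict(zip(indices, padded))


grid = layout((2, 3), [1, 2, 3, 4])
for index, value in grid.items():
    print(index, value)
```

With a shape of (2, 3) and four constants, the first row receives 1, 2, 3, the second row starts with 4, and the remaining two positions hold the fill marker.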
|
||
|
||
\page cdl_data_types CDL Data Types
|
||
|
||
The CDL primitive data types for the classic model are:
|
||
- char - Characters.
- byte - Eight-bit integers.
- short - 16-bit signed integers.
- int - 32-bit signed integers.
- long - (Deprecated, synonymous with int)
- float - IEEE single-precision floating point (32 bits).
- real - (Synonymous with float).
- double - IEEE double-precision floating point (64 bits).
|
||
|
||
NetCDF-4 supports the additional primitive types:
|
||
- ubyte - Unsigned eight-bit integers.
- ushort - Unsigned 16-bit integers.
- uint - Unsigned 32-bit integers.
- int64 - 64-bit signed integers.
- uint64 - Unsigned 64-bit integers.
- string - Variable-length string of characters
|
||
|
||
Except for the added data-type byte, CDL supports the same primitive
|
||
data types as C. For backward compatibility, in declarations primitive
|
||
type names may be specified in either upper or lower case.
|
||
|
||
The byte type differs from the char type in that it is intended for
|
||
numeric data, and the zero byte has no special significance, as it may
|
||
for character data. The short type holds values between -32768 and
|
||
32767. The ushort type holds values between 0 and 65535. The int type
|
||
can hold values between -2147483648 and 2147483647. The uint type
|
||
holds values between 0 and 4294967295. The int64 type can hold values
|
||
between -9223372036854775808 and 9223372036854775807. The uint64 type
|
||
can hold values between 0 and 18446744073709551615.
|
||
|
||
The float type can hold values between about -3.4e+38 and 3.4e+38, with
|
||
external representation as 32-bit IEEE normalized single-precision
|
||
floating-point numbers. The double type can hold values between about
|
||
-1.7e+308 and 1.7e+308, with external representation as 64-bit IEEE
|
||
standard normalized double-precision, floating-point numbers. The
|
||
string type holds variable length strings.
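
The integer limits follow directly from the bit width and signedness of each type. As a quick check, this Python sketch computes them; the helper name int_range and the table of widths are written for this illustration and are not taken from the netCDF sources.

```python
def int_range(bits, signed):
    """Return (minimum, maximum) for a two's-complement or unsigned integer."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# (width in bits, signed?) for the integer types whose ranges are given above.
CDL_INTEGER_TYPES = {
    "short": (16, True),
    "ushort": (16, False),
    "int": (32, True),
    "uint": (32, False),
    "int64": (64, True),
    "uint64": (64, False),
}

for name, (bits, signed) in CDL_INTEGER_TYPES.items():
    low, high = int_range(bits, signed)
    print(f"{name:7s} {low} .. {high}")
```

An unsigned type of n bits tops out at 2^n - 1, one less than the power of two.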
|
||
|
||
\page cdl_constants CDL Notation for Data Constants
|
||
|
||
This section describes the CDL notation for constants.
|
||
|
||
Attributes are initialized in the variables section of a CDL
|
||
description by providing a list of constants that determines the
|
||
attribute's length and type (if primitive and not explicitly
|
||
declared). CDL defines a syntax for constant values that permits
|
||
distinguishing among different netCDF primitive types. The syntax for
|
||
CDL constants is similar to C syntax, with type suffixes appended to
|
||
bytes, shorts, and floats to distinguish them from ints and doubles.
|
||
|
||
A byte constant is represented by a single character or multiple
|
||
character escape sequence enclosed in single quotes. For example:
|
||
|
||
\code
'a'     // ASCII a
'\0'    // a zero byte
'\n'    // ASCII newline character
'\33'   // ASCII escape character (33 octal)
'\x2b'  // ASCII plus (2b hex)
'\376'  // 376 octal = -2 (or 254) decimal
\endcode
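
The two decimal readings of an octal escape come from interpreting the same eight bits as signed or unsigned. The Python below is a small illustration of that arithmetic; byte_values is a hypothetical helper, not an ncgen function.

```python
def byte_values(octal_digits):
    """Return (signed, unsigned) decimal values of an octal byte escape."""
    unsigned = int(octal_digits, 8)
    if not 0 <= unsigned <= 255:
        raise ValueError("escape does not fit in eight bits")
    # Values above 127 wrap to negative numbers in two's complement.
    signed = unsigned - 256 if unsigned > 127 else unsigned
    return signed, unsigned


for digits in ["33", "376", "377"]:
    print(digits, byte_values(digits))
```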
|
||
|
||
Character constants are enclosed in double quotes. A character array
|
||
may be represented as a string enclosed in double quotes. Multiple
|
||
strings are concatenated into a single array of characters, permitting
|
||
long character arrays to appear on multiple lines. To support multiple
|
||
variable-length string values, a conventional delimiter such as ','
|
||
may be used, but interpretation of any such convention for a string
|
||
delimiter must be implemented in software above the netCDF library
|
||
layer. The usual escape conventions for C strings are honored. For
|
||
example:
|
||
|
||
\code
"a"               // ASCII 'a'
"Two\nlines\n"    // a 10-character string with two embedded newlines
"a bell:\007"     // a string containing an ASCII bell
"ab","cde"        // the same as "abcde"
\endcode
|
||
|
||
The form of a short constant is an integer constant with an 's' or 'S'
|
||
appended. If a short constant begins with '0', it is interpreted as
|
||
octal. When it begins with '0x', it is interpreted as a hexadecimal
|
||
constant. For example:
|
||
|
||
\code
2s      // a short 2
0123s   // octal
0x7ffs  // hexadecimal
\endcode
|
||
|
||
The form of an int constant is an ordinary integer constant. If an int
|
||
constant begins with '0', it is interpreted as octal. When it begins
|
||
with '0x', it is interpreted as a hexadecimal constant. Examples of
|
||
valid int constants include:
|
||
|
||
\code
-2
0123            // octal
0x7ff           // hexadecimal
1234567890L     // deprecated, uses old long suffix
\endcode
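
The prefix and suffix rules for short and int constants can be sketched as a small parser. This is an illustration of the rules as stated here, not the grammar ncgen actually uses; parse_int_constant is a hypothetical name, and range checking against the target type is left out.

```python
def parse_int_constant(text):
    """Return (type_name, value) for a CDL short or int constant."""
    kind, body = "int", text
    if body[-1] in "sS":
        kind, body = "short", body[:-1]
    elif body[-1] in "lL":
        body = body[:-1]  # deprecated long suffix; the value is still an int
    sign = -1 if body.startswith("-") else 1
    digits = body.lstrip("+-")
    if digits.lower().startswith("0x"):
        value = int(digits, 16)   # leading 0x: hexadecimal
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits, 8)    # leading 0: octal
    else:
        value = int(digits, 10)
    return kind, sign * value


for constant in ["2s", "0123s", "0x7ffs", "-2", "0123", "0x7ff", "1234567890L"]:
    print(constant, parse_int_constant(constant))
```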
|
||
|
||
The float type is appropriate for representing data with about seven
|
||
significant digits of precision. The form of a float constant is the
|
||
same as a C floating-point constant with an 'f' or 'F' appended. A
|
||
decimal point is required in a CDL float to distinguish it from an
|
||
integer. For example, the following are all acceptable float
|
||
constants:
|
||
|
||
\code
-2.0f
3.14159265358979f   // will be truncated to less precision
1.f
.1f
\endcode
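
The loss of precision noted in the comment can be observed by forcing a value through a 32-bit IEEE representation. This Python sketch uses the standard struct module for that; as_float32 is an illustrative helper and says nothing about how ncgen performs the conversion internally.

```python
import struct


def as_float32(value):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


written = 3.14159265358979
stored = as_float32(written)
print(repr(written), repr(stored))
print("difference:", abs(stored - written))
```

The stored value agrees with the written constant to about seven significant digits and differs beyond that.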
|
||
|
||
The double type is appropriate for representing floating-point data
|
||
with about 16 significant digits of precision. The form of a double
|
||
constant is the same as a C floating-point constant. An optional 'd'
|
||
or 'D' may be appended. A decimal point is required in a CDL double to
|
||
distinguish it from an integer. For example, the following are all
|
||
acceptable double constants:
|
||
|
||
\code
-2.0
3.141592653589793
1.0e-20
1.d
\endcode
|
||
|
||
\page guide_ncgen ncgen
|
||
|
||
The ncgen tool generates a netCDF file or a C or FORTRAN program that
|
||
creates a netCDF dataset. If no options are specified in invoking
|
||
ncgen, the program merely checks the syntax of the CDL input,
|
||
producing error messages for any violations of CDL syntax.
|
||
|
||
The ncgen tool is now capable of producing netCDF-4 files. It
|
||
operates essentially identically to the original ncgen.
|
||
|
||
The CDL input to ncgen may include data model constructs from the
|
||
netCDF-4 data model. In particular, it includes new primitive types
|
||
such as unsigned integers and strings, opaque data, enumerations, and
|
||
user-defined constructs using vlen and compound types. The ncgen man
|
||
page should be consulted for more detailed information.
|
||
|
||
UNIX syntax for invoking ncgen:
|
||
|
||
\code
|
||
ncgen [-b] [-o netcdf-file] [-c] [-f] [-k<kind>] [-l<language>] [-x] [input-file]
|
||
\endcode
|
||
|
||
where:
|
||
|
||
<pre>
|
||
-b
|
||
Create a (binary) netCDF file. If the '-o' option is absent, a default
|
||
file name will be constructed from the netCDF name (specified after
|
||
the netcdf keyword in the input) by appending the '.nc'
|
||
extension. Warning: if a file already exists with the specified name
|
||
it will be overwritten.
|
||
|
||
-o netcdf-file
|
||
Name for the netCDF file created. If this option is specified, it
|
||
implies the '-b' option. (This option is necessary because netCDF
|
||
files are direct-access files created with seek calls, and hence
|
||
cannot be written to standard output.)
|
||
|
||
-c
|
||
Generate C source code that will create a netCDF dataset matching the
|
||
netCDF specification. The C source code is written to standard
|
||
output. This is only useful for relatively small CDL files, since all
|
||
the data is included in variable initializations in the generated
|
||
program. The -c flag is deprecated and the -lc flag should be used
|
||
instead.
|
||
|
||
-f
|
||
Generate FORTRAN source code that will create a netCDF dataset
|
||
matching the netCDF specification. The FORTRAN source code is written
|
||
to standard output. This is only useful for relatively small CDL
|
||
files, since all the data is included in variable initializations in
|
||
the generated program. The -f flag is deprecated and the -lf77 flag
|
||
should be used instead.
|
||
|
||
-k
|
||
The -k flag specifies the kind of netCDF file to generate. The
|
||
arguments to the -k flag can be as follows.
|
||
1, classic – Produce a netcdf classic file format file.
|
||
2, 64-bit-offset, '64-bit offset' – Produce a netCDF 64-bit offset format file.
|
||
3, hdf5, netCDF-4, enhanced – Produce a netcdf-4 format file.
|
||
4, hdf5-nc3, 'netCDF-4 classic model', enhanced-nc3 – Produce a netcdf-4 file format, but restricted to netcdf-3 classic CDL input.
|
||
|
||
Note that the -v flag is a deprecated alias for -k.
|
||
|
||
-l
|
||
The -l flag specifies that ncgen should output (to standard output)
|
||
the text of a program that, when compiled and executed, will produce
|
||
the corresponding binary .nc file. The arguments to the -l flag can be
|
||
as follows.
|
||
c|C => C language output.
|
||
f77|fortran77 => FORTRAN 77 language output; note that currently only the classic model is supported for fortran output.
|
||
cml|CML => (experimental) NcML language output
|
||
j|java => (experimental) Java language output; the generated java code targets the existing Unidata Java interface, which means that only the classic model is supported.
|
||
|
||
|
||
-x
|
||
Use “no fill” mode, omitting the initialization of variable values
|
||
with fill values. This can make the creation of large files much
|
||
faster, but it will also eliminate the possibility of detecting the
|
||
inadvertent reading of values that haven't been written.
|
||
</pre>
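
The -k arguments listed above and the default output name used by -b can be summarized as a lookup table. The Python below is a sketch for scripts that wrap ncgen; the names KIND_ALIASES, kind_number, and default_output_name are hypothetical, and whether ncgen itself matches these strings case-sensitively is not stated here, so the lookup is kept exact.

```python
# Aliases accepted by -k, grouped by the numeric kind they stand for.
KIND_ALIASES = {
    "1": 1, "classic": 1,
    "2": 2, "64-bit-offset": 2, "64-bit offset": 2,
    "3": 3, "hdf5": 3, "netCDF-4": 3, "enhanced": 3,
    "4": 4, "hdf5-nc3": 4, "netCDF-4 classic model": 4, "enhanced-nc3": 4,
}


def kind_number(argument):
    """Translate a -k argument into its numeric kind (1 to 4)."""
    try:
        return KIND_ALIASES[argument]
    except KeyError:
        raise ValueError(f"unknown kind: {argument!r}") from None


def default_output_name(netcdf_name):
    """Name that -b uses when -o is absent: the CDL name plus '.nc'."""
    return netcdf_name + ".nc"


print(kind_number("hdf5"), kind_number("64-bit offset"))
print(default_output_name("foo"))
```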
|
||
|
||
<h1>Examples</h1>
|
||
|
||
Check the syntax of the CDL file foo.cdl:
|
||
|
||
\code
|
||
ncgen foo.cdl
|
||
\endcode
|
||
|
||
From the CDL file foo.cdl, generate an equivalent binary netCDF file
|
||
named bar.nc:
|
||
|
||
\code
|
||
ncgen -o bar.nc foo.cdl
|
||
\endcode
|
||
|
||
From the CDL file foo.cdl, generate a C program containing netCDF
|
||
function invocations that will create an equivalent binary netCDF
|
||
dataset:
|
||
|
||
\code
|
||
ncgen -c foo.cdl > foo.c
|
||
\endcode
|
||
|
||
\page guide_ncdump ncdump
|
||
|
||
The \b ncdump utility generates a text representation of a specified
|
||
netCDF file on standard output, optionally excluding some or all of
|
||
the variable data in the output. The text representation is in a form
|
||
called CDL (network Common Data form Language) that can be viewed,
|
||
edited, or serve as input to \b ncgen, a companion program that can
|
||
generate a binary netCDF file from a CDL file. Hence \b ncgen and \b
|
||
ncdump can be used as inverses to transform the data representation
|
||
between binary and text representations. See \b ncgen documentation
|
||
for a description of CDL and netCDF representations.
|
||
|
||
\b ncdump may also be used to determine what kind of netCDF file
|
||
is used (which variant of the netCDF file format) with the -k
|
||
option.
|
||
|
||
If DAP support was enabled when \b ncdump was built, the file name may
|
||
specify a DAP URL. This allows \b ncdump to access data sources from
|
||
DAP servers, including data in other formats than netCDF. When used
|
||
with DAP URLs, \b ncdump shows the translation from the DAP data
|
||
model to the netCDF data model.
|
||
|
||
\b ncdump may also be used as a simple browser for netCDF data files,
|
||
to display the dimension names and lengths; variable names, types, and
|
||
shapes; attribute names and values; and optionally, the values of data
|
||
for all variables or selected variables in a netCDF file. For
|
||
netCDF-4 files, groups and user-defined types are also included in \b
|
||
ncdump output.
|
||
|
||
\b ncdump uses '_' to represent data values that are equal to the
|
||
'_FillValue' attribute for a variable, intended to represent
|
||
data that has not yet been written. If a variable has no
|
||
'_FillValue' attribute, the default fill value for the variable
|
||
type is used unless the variable is of byte type.
|
||
|
||
\b ncdump defines a default display format used for each type of
|
||
netCDF data, but this can be changed if a 'C_format' attribute
|
||
is defined for a netCDF variable. In this case, \b ncdump will
|
||
use the 'C_format' attribute to format each value. For
|
||
example, if floating-point data for the netCDF variable 'Z' is
|
||
known to be accurate to only three significant digits, it would
|
||
be appropriate to use the variable attribute
|
||
|
||
\code
|
||
Z:C_format = "%.3g"
|
||
\endcode
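
The effect of a "%.3g" format is easy to preview, because Python's % operator follows the same printf-style conversion rules as C for this specifier. The helper format_values below is illustrative only.

```python
def format_values(values, c_format="%.3g"):
    """Format each value with a printf-style format string."""
    return [c_format % value for value in values]


print(format_values([3.14159, 1234.5678, 0.000123456]))
# prints ['3.14', '1.23e+03', '0.000123']
```

Three significant digits are kept in every case, and the conversion switches to exponent notation when the value is too large to show that way in fixed notation.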
|
||
|
||
@section OPTIONS
|
||
|
||
@par -c
|
||
Show the values of \e coordinate \e variables (1D variables with the
|
||
same names as dimensions) as well as the declarations of all
|
||
dimensions, variables, attribute values, groups, and user-defined
|
||
types. Data values of non-coordinate variables are not included in
|
||
the output. This is usually the most suitable option to use for a
|
||
brief look at the structure and contents of a netCDF file.
|
||
|
||
@par -h
|
||
Show only the header information in the output, that is, output only
|
||
the declarations for the netCDF dimensions, variables, attributes,
|
||
groups, and user-defined types of the input file, but no data values
|
||
for any variables. The output is identical to using the '-c' option
|
||
except that the values of coordinate variables are not included. (At
|
||
most one of '-c' or '-h' options may be present.)
|
||
|
||
@par -v \a var1,...
|
||
@par
|
||
The output will include data values for the specified variables, in
|
||
addition to the declarations of all dimensions, variables, and
|
||
attributes. One or more variables must be specified by name in the
|
||
comma-delimited list following this option. The list must be a single
|
||
argument to the command, hence cannot contain unescaped blanks or
|
||
other white space characters. The named variables must be valid netCDF
|
||
variables in the input-file. A variable within a group in a netCDF-4
|
||
file may be specified with an absolute path name, such as
|
||
'/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or
|
||
'grp/var' specifies all matching variable names in the file. The
|
||
default, without this option and in the absence of the '-c' or '-h'
|
||
options, is to include data values for \e all variables in the output.
|
||
|
||
@par -b [c|f]
|
||
A brief annotation in the form of a CDL comment (text beginning with
|
||
the characters '//') will be included in the data section of the
|
||
output for each 'row' of data, to help identify data values for
|
||
multidimensional variables. If the option argument begins with 'C' or 'c', then C
|
||
language conventions will be used (zero-based indices, last dimension
|
||
varying fastest). If the option argument begins with 'F' or 'f', then FORTRAN
|
||
language conventions will be used (one-based indices, first dimension
|
||
varying fastest). In either case, the data will be presented in the
|
||
same order; only the annotations will differ. This option may be
|
||
useful for browsing through large volumes of multidimensional data.
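
The relationship between the two annotation conventions can be stated in one line: a FORTRAN-style index is the C-style index reversed, with one added to each position. The Python below sketches that mapping; fortran_index is a hypothetical helper, and the exact text of the comments that ncdump prints is not reproduced here.

```python
def fortran_index(c_index):
    """Convert a zero-based, last-fastest index to one-based, first-fastest."""
    return tuple(position + 1 for position in reversed(c_index))


# Element [0][2][4] of a C array is element (5, 3, 1) of the FORTRAN view.
print(fortran_index((0, 2, 4)))
```

Because only the labels change, the data values appear in the same order under either convention.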
|
||
|
||
@par -f [c|f]
|
||
Full annotations in the form of trailing CDL comments (text beginning
|
||
with the characters '//') for every data value (except individual
|
||
characters in character arrays) will be included in the data
|
||
section. If the option argument begins with 'C' or 'c', then C language conventions
|
||
will be used. If the option argument begins with 'F' or 'f', then FORTRAN language
|
||
conventions will be used. In either case, the data will be presented
|
||
in the same order; only the annotations will differ. This option may
|
||
be useful for piping data into other filters, since each data value
|
||
appears on a separate line, fully identified. (At most one of '-b' or
|
||
'-f' options may be present.)
|
||
|
||
@par -l \e length
|
||
|
||
@par
|
||
Changes the default maximum line length (80) used in formatting lists
|
||
of non-character data values.
|
||
|
||
@par -n \e name
|
||
|
||
@par
|
||
CDL requires a name for a netCDF file, for use by 'ncgen -b' in
|
||
generating a default netCDF file name. By default, \b ncdump
|
||
constructs this name from the last component of the file name of
|
||
the input netCDF file by stripping off any extension it has. Use
|
||
the '-n' option to specify a different name. Although the output
|
||
file name used by 'ncgen -b' can be specified, it may be wise to
|
||
have \b ncdump change the default name to avoid inadvertently
|
||
overwriting a valuable netCDF file when using \b ncdump, editing the
|
||
resulting CDL file, and using 'ncgen -b' to generate a new netCDF
|
||
file from the edited CDL file.
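
The default name derivation can be approximated in a few lines. This Python sketch takes the last path component and strips one extension, which is an assumption about how "any extension" is treated for names with several dots; default_cdl_name is an illustrative helper, not part of ncdump.

```python
import os.path


def default_cdl_name(path):
    """Last component of the input path with its extension removed."""
    return os.path.splitext(os.path.basename(path))[0]


print(default_cdl_name("/data/run1/foo.nc"))
```

Since 'ncgen -b' would turn the name foo back into foo.nc, passing a different name with '-n' protects the original file from being overwritten.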
|
||
|
||
@par -p \e float_digits[, \e double_digits ]
|
||
|
||
@par
|
||
Specifies default precision (number of significant digits) to use in
|
||
displaying floating-point or double precision data values for
|
||
attributes and variables. If specified, this value overrides the value
|
||
of the C_format attribute, if any, for a variable. Floating-point data
|
||
will be displayed with \e float_digits significant digits. If \e
|
||
double_digits is also specified, double-precision values will be
|
||
displayed with that many significant digits. In the absence of any
|
||
'-p' specifications, floating-point and double-precision data are
|
||
displayed with 7 and 15 significant digits respectively. CDL files can
|
||
be made smaller if less precision is required. If both floating-point
|
||
and double precisions are specified, the two values must appear
|
||
separated by a comma (no blanks) as a single argument to the command.
|
||
(To represent every last bit of precision in a CDL file for all
|
||
possible floating-point values would require '-p 9,17'.)
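
The figures 9 and 17 are the numbers of significant decimal digits needed to reproduce any IEEE single or double exactly. The Python sketch below checks this for sample values by printing with a given precision and reading the text back; the helper names are illustrative, and the struct module is used only to emulate 32-bit storage.

```python
import struct


def f32(value):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def double_survives(value, digits):
    """True if a double printed with `digits` significant digits reads back equal."""
    return float("%.*g" % (digits, value)) == value


def float_survives(value, digits):
    """Same check for a value stored as a 32-bit float."""
    stored = f32(value)
    return f32(float("%.*g" % (digits, stored))) == stored


x = 0.1 + 0.2
print(double_survives(x, 15), double_survives(x, 17))
print(float_survives(1.0 / 3.0, 9))
```

The default of 15 digits is enough for most data, but a value such as 0.1 + 0.2 only reads back exactly when 17 digits are written.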
|
||
|
||
@par -k
|
||
Show \e kind of netCDF file, that is which format variant the file uses.
|
||
Other options are ignored if this option is specified. Output will be
|
||
one of 'classic', '64-bit offset', 'netCDF-4', or 'netCDF-4 classic
|
||
model'.
|
||
|
||
@par -s
|
||
Specifies that \e special virtual attributes should be output for the
|
||
file format variant and for variable properties such as
|
||
compression, chunking, and other properties specific to the format
|
||
implementation that are primarily related to performance rather
|
||
than the logical schema of the data. All the special virtual
|
||
attributes begin with '_' followed by an upper-case
|
||
letter. Currently they include the global attribute '_Format' and
|
||
the variable attributes '_ChunkSizes', '_DeflateLevel',
|
||
'_Endianness', '_Fletcher32', '_NoFill', '_Shuffle', and '_Storage'.
|
||
The \b ncgen utility recognizes these attributes and
|
||
supports them appropriately.
|
||
|
||
@par -t
|
||
Controls display of time data, if stored in a variable that uses a
|
||
udunits compliant time representation such as 'days since 1970-01-01'
|
||
or 'seconds since 2009-03-15 12:01:17'. If this option is specified,
|
||
time values are displayed as human-readable date-time strings rather
|
||
than numerical values, interpreted in terms of a 'calendar' variable
|
||
attribute, if specified. For numeric attributes of time variables,
|
||
the human-readable time value is displayed after the actual value, in
|
||
an associated CDL comment. Calendar attribute values interpreted with
|
||
this option include the CF Conventions values 'gregorian' or
|
||
'standard', 'proleptic_gregorian', 'noleap' or '365_day', 'all_leap'
|
||
or '366_day', '360_day', and 'julian'.
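
For the standard calendar, the conversion from a numeric time to a date-time string can be sketched with Python's datetime module. This is an illustration under stated assumptions, not ncdump's implementation: it handles only the units forms shown above, ignores the 'calendar' attribute (Python's datetime is proleptic Gregorian), and decode_time is a hypothetical name.

```python
from datetime import datetime, timedelta

_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_time(value, units):
    """Convert a numeric time to a datetime, given '<unit> since <origin>'."""
    unit, since, stamp = units.split(" ", 2)
    if since != "since":
        raise ValueError(f"unsupported units string: {units!r}")
    # The origin may be a date alone or a date followed by a time of day.
    pattern = "%Y-%m-%d %H:%M:%S" if " " in stamp else "%Y-%m-%d"
    origin = datetime.strptime(stamp, pattern)
    return origin + timedelta(seconds=value * _UNIT_SECONDS[unit])


first = decode_time(1, "days since 1970-01-01")
second = decode_time(90, "seconds since 2009-03-15 12:01:17")
print(str(first))           # blank between date and time
print(second.isoformat())   # ISO 8601 'T' separator
```

The two print calls show the difference between a blank and a 'T' separator in the resulting strings.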
|
||
|
||
@par -i
|
||
Same as the '-t' option, except output time data as date-time strings
|
||
with ISO-8601 standard 'T' separator, instead of a blank.
|
||
|
||
@par -g \e grp1,...
|
||
|
||
@par
|
||
The output will include data values only for the specified groups.
|
||
One or more groups must be specified by name in the comma-delimited
|
||
list following this option. The list must be a single argument to the
|
||
command. The named groups must be valid netCDF groups in the
|
||
input-file. The default, without this option and in the absence of the
|
||
'-c' or '-h' options, is to include data values for all groups in the
|
||
output.
|
||
|
||
@par -w
|
||
For file names that request remote access using DAP URLs, access data
|
||
with client-side caching of entire variables.
|
||
|
||
@par -x
|
||
Output XML (NcML) instead of CDL. The NcML does not include data values.
|
||
The NcML output option currently only works for netCDF classic model data.
|
||
|
||
@section EXAMPLES
|
||
|
||
Look at the structure of the data in the netCDF file foo.nc:
|
||
|
||
\code
|
||
ncdump -c foo.nc
|
||
\endcode
|
||
|
||
Produce an annotated CDL version of the structure and data in the
|
||
netCDF file foo.nc, using C-style indexing for the annotations:
|
||
|
||
\code
|
||
ncdump -b c foo.nc > foo.cdl
|
||
\endcode
|
||
|
||
Output data for only the variables uwind and vwind from the netCDF
|
||
file foo.nc, and show the floating-point data with only three
|
||
significant digits of precision:
|
||
|
||
\code
|
||
ncdump -v uwind,vwind -p 3 foo.nc
|
||
\endcode
|
||
|
||
Produce a fully-annotated (one data value per line) listing of the
|
||
data for the variable omega, using FORTRAN conventions for indices,
|
||
and changing the netCDF file name in the resulting CDL file to
|
||
omega:
|
||
|
||
\code
|
||
ncdump -v omega -f fortran -n omega foo.nc > Z.cdl
|
||
\endcode
|
||
|
||
Examine the translated DDS for the DAP source from the specified URL:
|
||
|
||
\code
|
||
ncdump -h http://test.opendap.org:8080/dods/dts/test.01
|
||
\endcode
|
||
|
||
Without dumping all the data, show the special virtual attributes that indicate
|
||
performance-related characteristics of a netCDF-4 file:
|
||
|
||
\code
|
||
ncdump -h -s nc4file.nc
|
||
\endcode
|
||
|
||
@section see_also SEE ALSO
|
||
|
||
ncgen(1), netcdf(3)
|
||
|
||
@section string_note NOTE ON STRING OUTPUT
|
||
|
||
For classic, 64-bit offset or netCDF-4 classic model data, \b ncdump
|
||
generates line breaks after embedded newlines in displaying character
|
||
data. This is not done for netCDF-4 files, because netCDF-4 supports
|
||
arrays of real strings of varying length.
|
||
|
||
\page guide_nccopy nccopy
|
||
|
||
The \b nccopy utility copies an input netCDF file in any supported
|
||
format variant to an output netCDF file, optionally converting the
|
||
output to any compatible netCDF format variant, compressing the data,
|
||
or rechunking the data. For example, if built with the netCDF-3
|
||
library, a netCDF classic file may be copied to a netCDF 64-bit offset
|
||
file, permitting larger variables. If built with the netCDF-4
|
||
library, a netCDF classic file may be copied to a netCDF-4 file or to
|
||
a netCDF-4 classic model file as well, permitting data compression,
|
||
efficient schema changes, larger variable sizes, and use of other
|
||
netCDF-4 features.
|
||
|
||
\b nccopy also serves as an example of a generic netCDF-4 program,
|
||
with its ability to read any valid netCDF file and handle nested
|
||
groups, strings, and user-defined types, including arbitrarily
|
||
nested compound types, variable-length types, and data of any valid
|
||
netCDF-4 type.
|
||
|
||
If DAP support was enabled when \b nccopy was built, the file name may
|
||
specify a DAP URL. This may be used to convert data on DAP servers to
|
||
local netCDF files.
|
||
|
||
UNIX syntax for invoking nccopy:
|
||
|
||
\code
|
||
nccopy [-k kind] [-d n] [-s] [-u] [-w] [-c chunkspec] [-m bufsize]
|
||
[-h chunk_cache] [-e cache_elems] [-r] infile outfile
|
||
\endcode
|
||
|
||
where:
|
||
@par -k \e kind
|
||
Specifies the kind of file to be created (that is, the format variant)
|
||
and, by inference, the data model (that is, netCDF-3 (classic) versus
netCDF-4 (enhanced)). The possible arguments are as follows. \n
|
||
'1' or 'classic' => netCDF classic format \n
|
||
'2', '64-bit-offset', or '64-bit offset' => netCDF 64-bit offset format \n
|
||
'3', 'hdf5', 'netCDF-4', or 'enhanced' => netCDF-4 format (enhanced data model) \n
|
||
'4', 'hdf5-nc3', 'netCDF-4 classic model', or 'enhanced-nc3' => netCDF-4 classic model format \n
|
||
|
||
@par
|
||
If no value for -k is specified, then the output will use the same
|
||
format as the input, except if the input is classic or 64-bit offset
|
||
and either chunking or compression is specified, in which case the
|
||
output will be netCDF-4 classic model format. Note that attempting
|
||
some kinds of format conversion will result in an error, if the
|
||
conversion is not possible. For example, an attempt to copy a
|
||
netCDF-4 file that uses features of the enhanced model, such as groups
|
||
or variable-length strings, to any of the other kinds of netCDF
|
||
formats that use the classic model will result in an error.
|
||
|
||
@par -d \e n
|
||
For netCDF-4 output, including netCDF-4 classic model, specify
|
||
deflation level (level of compression) for variable data output. 0
|
||
corresponds to no compression and 9 to maximum compression, with
|
||
higher levels of compression requiring marginally more time to
|
||
compress or uncompress than lower levels. Compression achieved may
|
||
also depend on output chunking parameters. If this option is
|
||
specified for a classic format or 64-bit offset format input file, it
|
||
is not necessary to also specify that the output should be netCDF-4
|
||
classic model, as that will be the default. If this option is not
|
||
specified and the input file has compressed variables, the compression
|
||
will still be preserved in the output, using the same chunking as in
|
||
the input by default.
|
||
|
||
@par
|
||
Note that \b nccopy requires all variables to be compressed using the
|
||
same compression level, but the API has no such restriction. With a
|
||
program you can customize compression for each variable independently.
|
||
|
||
@par -s
|
||
For netCDF-4 output, including netCDF-4 classic model,
|
||
specify shuffling of variable data bytes before compression or after
|
||
decompression. This option is ignored unless a non-zero deflation
|
||
level is specified. Turning shuffling on sometimes improves
|
||
compression.
|
||
|
||
@par -u
|
||
Convert any unlimited size dimensions in the input to fixed size
|
||
dimensions in the output. This can speed up variable-at-a-time
|
||
access, but slow down record-at-a-time access to multiple variables
|
||
along an unlimited dimension.
|
||
|
||
@par -w
|
||
Keep output in memory (as a diskless netCDF file) until output is
|
||
closed, at which time the output file is written to disk. This can
|
||
greatly speed up operations such as converting unlimited dimensions to
|
||
fixed size (-u option), chunking, rechunking, or compressing the
|
||
input. It requires that available memory is large enough to hold the
|
||
output file. This option may provide a larger speedup than careful
|
||
tuning of the -m, -h, or -e options, and it's certainly a lot simpler.
|
||
|
||
@par -c \e chunkspec
|
||
@par
|
||
For netCDF-4 output, including netCDF-4 classic model, specify
|
||
chunking (multidimensional tiling) for variable data in the output.
|
||
This is useful to specify the units of disk access, compression, or
|
||
other filters such as checksums. Changing the chunking in a netCDF
|
||
file can also greatly speed up access, by choosing chunk shapes that
|
||
are appropriate for the most common access patterns.
|
||
|
||
@par
|
||
The chunkspec argument is a string of comma-separated associations,
|
||
each specifying a dimension name, a '/' character, and optionally the
|
||
corresponding chunk length for that dimension. No blanks should
|
||
appear in the chunkspec string, except possibly escaped blanks that
|
||
are part of a dimension name. A chunkspec must name at least one
|
||
dimension, and may omit dimensions which are not to be chunked or for
|
||
which the default chunk length is desired. If a dimension name is
|
||
followed by a '/' character but no subsequent chunk length, the actual
|
||
dimension length is assumed. If copying a classic model file to a
|
||
netCDF-4 output file and not naming all dimensions in the chunkspec,
|
||
unnamed dimensions will also use the actual dimension length for the
|
||
chunk length. An example of a chunkspec for variables that use
|
||
'm' and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200
|
||
chunks. To see the chunking resulting from copying with a chunkspec,
|
||
use the '-s' option of ncdump on the output file.
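
As an illustration of the chunkspec syntax described above, the
following Python sketch splits a chunkspec string into a mapping from
dimension name to chunk length. It is not the parser used by \b nccopy:
the function name parse_chunkspec is hypothetical, a missing length is
represented by None (standing for "use the actual dimension length"),
and escaped blanks in dimension names are not handled.

```python
def parse_chunkspec(spec):
    """Map each dimension name in a chunkspec to its chunk length.

    A name followed by '/' but no length maps to None, standing for
    "use the actual dimension length".
    """
    if not spec:
        raise ValueError("a chunkspec must name at least one dimension")
    chunks = {}
    for item in spec.split(","):
        # Dimension names cannot contain '/', so the first '/' ends the name.
        name, slash, length = item.partition("/")
        if not name or not slash:
            raise ValueError("expected 'name/length' or 'name/': %r" % item)
        if length and int(length) <= 0:
            raise ValueError("chunk length must be positive: %r" % item)
        chunks[name] = int(length) if length else None
    return chunks


if __name__ == "__main__":
    print(parse_chunkspec("m/100,n/200"))
    print(parse_chunkspec("time/,lat/40,lon/40"))
```

With the example from the text, 'm/100,n/200' yields a chunk length of
100 for 'm' and 200 for 'n'.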
|
||
|
||
@par
|
||
Note that \b nccopy requires variables that share a dimension to also
|
||
share the chunk size associated with that dimension, but the
|
||
programming interface has no such restriction. If you need to
|
||
customize chunking for variables independently, you will need to use
|
||
the library API in a custom utility program.
|
||
|
||
@par -m \e bufsize
|
||
@par
|
||
An integer or floating-point number that specifies the size, in bytes,
|
||
of the copy buffer used to copy large variables. A suffix of K, M, G,
|
||
or T multiplies the copy buffer size by one thousand, million,
|
||
billion, or trillion, respectively. The default is 5 Mbytes,
|
||
but will be increased if necessary to hold at least one chunk of
|
||
netCDF-4 chunked variables in the input file. You may want to specify
|
||
a value larger than the default for copying large files over high
|
||
latency networks. Using the '-w' option may provide better
|
||
performance, if the output fits in memory.
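
The suffix arithmetic can be sketched in Python as follows. This only
illustrates the decimal multipliers described above and is not the
code \b nccopy uses. The name parse_size is hypothetical, and
truncating a fractional byte count to an integer is an assumption.

```python
from decimal import Decimal

# Decimal multipliers for the K, M, G, and T suffixes.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text):
    """Convert a size such as '5M' or '1.5K' to a number of bytes."""
    text = text.strip()
    factor = 1
    if text and text[-1] in MULTIPLIERS:
        factor = MULTIPLIERS[text[-1]]
        text = text[:-1]
    # Decimal avoids binary floating-point rounding of values like 4.194304.
    return int(Decimal(text) * factor)


if __name__ == "__main__":
    for arg in ("5M", "1.5K", "4.194304M"):
        print(arg, parse_size(arg))
```

Under these assumptions '5M' is 5000000 bytes, and a value such as
'4.194304M' works out to 4194304 bytes, which is 2 to the power 22.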
|
||
|
||
@par -h \e chunk_cache
@par
For netCDF-4 output, including netCDF-4 classic model, an integer or
floating-point number that specifies the size in bytes of the chunk cache
for chunked variables. This is not a property of the file, but merely
a performance tuning parameter for avoiding compressing or
decompressing the same data multiple times while copying and changing
chunk shapes. A suffix of K, M, G, or T multiplies the chunk cache
size by one thousand, million, billion, or trillion, respectively.
The default is 4.194304 Mbytes (or whatever was specified for the
configure-time constant CHUNK_CACHE_SIZE when the netCDF library was
built). Ideally, the \b nccopy utility should accept only one memory
buffer size and divide it optimally between a copy buffer and chunk
cache, but no general algorithm for computing the optimum chunk cache
size has been implemented yet. Using the '-w' option may provide
better performance, if the output fits in memory.

@par -e \e cache_elems
|
||
@par
|
||
For netCDF-4 output, including netCDF-4 classic model, specifies
|
||
number of elements that the chunk cache can hold. This is not a
|
||
property of the file, but merely a performance tuning parameter for
|
||
avoiding compressing or decompressing the same data multiple times
|
||
while copying and changing chunk shapes. The default is 1009 (or
|
||
whatever was specified for the configure-time constant
|
||
CHUNK_CACHE_NELEMS when the netCDF library was built). Ideally, the
|
||
\b nccopy utility should determine an optimum value for this parameter,
|
||
but no general algorithm for computing the optimum number of chunk
|
||
cache elements has been implemented yet.
|
||
|
||
@par -r
|
||
Read netCDF classic or 64-bit offset input file into a diskless netCDF
|
||
file in memory before copying. Requires that input file be small
|
||
enough to fit into memory. For \b nccopy, this doesn't seem to provide
|
||
any significant speedup, so it may not be a useful option.
|
||
|
||
@section EXAMPLES
|
||
|
||
@subsection simple_copy Simple Copy
|
||
Make a copy of foo1.nc, a netCDF file of any type, to
|
||
foo2.nc, a netCDF file of the same type:
|
||
\code
|
||
nccopy foo1.nc foo2.nc
|
||
\endcode
|
||
Note that the above copy will not be as fast as using cp or another
|
||
simple copy utility, because the file is copied using only the netCDF
|
||
API. If the input file has extra bytes after the end of the netCDF
|
||
data, those will not be copied, because they are not accessible
|
||
through the netCDF interface. If the original file was generated in
|
||
'No fill' mode so that fill values are not stored for padding for data
|
||
alignment, the output file may have different padding bytes.
|
||
|
||
@subsection uncompress Uncompress Data
|
||
Convert a netCDF-4 classic model file, compressed.nc, that uses
|
||
compression, to a netCDF-3 file classic.nc:
|
||
\code
|
||
nccopy -k classic compressed.nc classic.nc
|
||
\endcode
|
||
Note that '1' could be used instead of 'classic'.
|
||
|
||
@subsection remote_access Remote Access to Data Subset
|
||
Download the variable 'time_bnds' and its associated attributes from
|
||
an OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
|
||
\code
|
||
nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc
|
||
\endcode
|
||
Note that URLs that name specific variables as command-line arguments
|
||
should generally be quoted, to avoid the shell interpreting special
|
||
characters such as '?'.
|
||
|
||
@subsection compress Compress Data
|
||
Compress all the variables in the input file foo.nc, a netCDF file of
|
||
any type, to the output file bar.nc:
|
||
\code
|
||
nccopy -d1 foo.nc bar.nc
|
||
\endcode
|
||
If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
|
||
netCDF-4 classic model netCDF file, because the classic and 64-bit
|
||
offset format variants don't support compression. If foo.nc was a
|
||
netCDF-4 file with some variables compressed using various deflation
|
||
levels, the output will also be a netCDF-4 file of the same type, but
|
||
all the variables, including any uncompressed variables in the input,
|
||
will now use deflation level 1.
|
||
|
||
@subsection rechunk Rechunk Data for Faster Access
|
||
Assume the input data includes gridded variables that use time, lat,
|
||
lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
|
||
and that the time dimension varies most slowly. Also assume that
|
||
users want quick access to data at all times for a small set of
|
||
lat-lon points. Accessing data for 1000 times would typically require
|
||
accessing 1000 disk blocks, which may be slow.
|
||
|
||
Reorganizing the data into chunks on disk that have all the time in
|
||
each chunk for a few lat and lon coordinates would greatly speed up
|
||
such access. To chunk the data in the input file slow.nc, a netCDF
|
||
file of any type, to the output file fast.nc, you could use:
|
||
\code
|
||
nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc
|
||
\endcode
|
||
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
|
||
If you had enough memory to contain the output file, you could speed
|
||
up the rechunking operation significantly by creating the output in
|
||
memory before writing it to disk on close:
|
||
\code
|
||
nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc
|
||
\endcode
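
To make the benefit concrete, the following Python sketch counts how
many chunks a read touches for a given chunk shape. The helper names
are hypothetical and are not part of any netCDF API. Modeling the
original layout as one chunk per time step is an assumption made only
for comparison.

```python
def chunks_touched(start, count, chunk):
    """Number of chunks that intersect a hyperslab.

    start, count, and chunk give, for each dimension, the first index,
    the number of indices read, and the chunk length.
    """
    touched = 1
    for s, c, k in zip(start, count, chunk):
        first = s // k
        last = (s + c - 1) // k
        touched *= last - first + 1
    return touched


def total_chunks(shape, chunk):
    """Number of chunks needed to tile a variable of the given shape."""
    total = 1
    for n, k in zip(shape, chunk):
        total *= -(-n // k)  # ceiling division
    return total


if __name__ == "__main__":
    shape = (1000, 1000, 1000)              # time, lat, lon
    start, count = (0, 500, 500), (1000, 1, 1)  # all times at one point
    print(chunks_touched(start, count, (1, 1000, 1000)))
    print(chunks_touched(start, count, (1000, 40, 40)))
    print(total_chunks(shape, (1000, 40, 40)))
```

Reading all 1000 times at one lat-lon point touches 1000 chunks in the
one-chunk-per-time-step model but only one of the 625 chunks produced
by 'time/1000,lat/40,lon/40'.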
|
||
|
||
\page guide_ncgen3 ncgen3
|
||
|
||
The ncgen3 tool is the new name for the older, original ncgen utility.
|
||
|
||
The ncgen3 tool generates a netCDF file or a C or FORTRAN program that
|
||
creates a netCDF dataset. If no options are specified in invoking
|
||
ncgen3, the program merely checks the syntax of the CDL input,
|
||
producing error messages for any violations of CDL syntax.
|
||
|
||
The ncgen3 utility can only generate classic-model netCDF-4 files or
|
||
programs.
|
||
|
||
UNIX syntax for invoking ncgen3:
|
||
|
||
\code
|
||
ncgen3 [-b] [-o netcdf-file] [-c] [-f] [-v2|-v3] [-x] [input-file]
|
||
\endcode
|
||
|
||
where:
|
||
<pre>
|
||
-b
|
||
Create a (binary) netCDF file. If the '-o' option is absent, a default
|
||
file name will be constructed from the netCDF name (specified after
|
||
the netcdf keyword in the input) by appending the '.nc'
|
||
extension. Warning: if a file already exists with the specified name
|
||
it will be overwritten.
|
||
|
||
-o netcdf-file
|
||
Name for the netCDF file created. If this option is specified, it
|
||
implies the '-b' option. (This option is necessary because netCDF
|
||
files are direct-access files created with seek calls, and hence
|
||
cannot be written to standard output.)
|
||
|
||
-c
|
||
Generate C source code that will create a netCDF dataset matching the
|
||
netCDF specification. The C source code is written to standard
|
||
output. This is only useful for relatively small CDL files, since all
|
||
the data is included in variable initializations in the generated
|
||
program.
|
||
|
||
-f
|
||
Generate FORTRAN source code that will create a netCDF dataset
|
||
matching the netCDF specification. The FORTRAN source code is written
|
||
to standard output. This is only useful for relatively small CDL
|
||
files, since all the data is included in variable initializations in
|
||
the generated program.
|
||
|
||
-v2
|
||
The generated netCDF file or program will use the version of the
|
||
format with 64-bit offsets, to allow for the creation of very large
|
||
files. These files are not as portable as classic format netCDF files,
|
||
because they require version 3.6.0 or later of the netCDF library.
|
||
|
||
-v3
|
||
The generated netCDF file will be in netCDF-4/HDF5 format. These files
|
||
are not as portable as classic format netCDF files, because they
|
||
require version 4.0 or later of the netCDF library.
|
||
|
||
-x
|
||
Use “no fill” mode, omitting the initialization of variable values
|
||
with fill values. This can make the creation of large files much
|
||
faster, but it will also eliminate the possibility of detecting the
|
||
inadvertent reading of values that haven't been written.
|
||
</pre>
|
||
|
||
\page classic_format_spec The NetCDF Classic Format Specification
|
||
|
||
To present the format more formally, we use a BNF grammar notation. In
|
||
this notation:
|
||
- Non-terminals (entities defined by grammar rules) are in lower case.
|
||
- Terminals (atomic entities in terms of which the format
|
||
specification is written) are in upper case, and are specified
|
||
literally as US-ASCII characters within single-quote characters or are
|
||
described with text between angle brackets (‘\<’ and ‘\>’).
|
||
- Optional entities are enclosed between brackets (‘[’ and ‘]’).
|
||
- A sequence of zero or more occurrences of an entity is denoted by
|
||
‘[entity ...]’.
|
||
- A vertical line character (‘|’) separates alternatives. Alternation
|
||
has lower precedence than concatenation.
|
||
- Comments follow ‘//’ characters.
|
||
- A single byte that is not a printable character is denoted using a
|
||
hexadecimal number with the notation ‘\\xDD’, where each D is a
|
||
hexadecimal digit.
|
||
- A literal single-quote character is denoted by ‘\'’, and a literal
|
||
back-slash character is denoted by ‘\\’.
|
||
|
||
Following the grammar, a few additional notes are included to specify
|
||
format characteristics that are impractical to capture in a BNF
|
||
grammar, and to note some special cases for implementers. Comments in
|
||
the grammar point to the notes and special cases, and help to clarify
|
||
the intent of elements of the format.
|
||
|
||
<h1>The Format in Detail</h1>
|
||
|
||
<pre>
|
||
netcdf_file = header data
|
||
header = magic numrecs dim_list gatt_list var_list
|
||
magic = 'C' 'D' 'F' VERSION
|
||
VERSION = \\x01 | // classic format
|
||
\\x02 // 64-bit offset format
|
||
numrecs = NON_NEG | STREAMING // length of record dimension
|
||
dim_list = ABSENT | NC_DIMENSION nelems [dim ...]
|
||
gatt_list = att_list // global attributes
|
||
att_list = ABSENT | NC_ATTRIBUTE nelems [attr ...]
|
||
var_list = ABSENT | NC_VARIABLE nelems [var ...]
|
||
ABSENT = ZERO ZERO // Means list is not present
|
||
ZERO = \\x00 \\x00 \\x00 \\x00 // 32-bit zero
|
||
NC_DIMENSION = \\x00 \\x00 \\x00 \\x0A // tag for list of dimensions
|
||
NC_VARIABLE = \\x00 \\x00 \\x00 \\x0B // tag for list of variables
|
||
NC_ATTRIBUTE = \\x00 \\x00 \\x00 \\x0C // tag for list of attributes
|
||
nelems = NON_NEG // number of elements in following sequence
|
||
dim = name dim_length
|
||
name = nelems namestring
|
||
// Names a dimension, variable, or attribute.
|
||
// Names should match the regular expression
|
||
// ([a-zA-Z0-9_]|{MUTF8})([^\\x00-\\x1F/\\x7F-\\xFF]|{MUTF8})*
|
||
// For other constraints, see "Note on names", below.
|
||
namestring = ID1 [IDN ...] padding
|
||
ID1 = alphanumeric | '_'
|
||
IDN = alphanumeric | special1 | special2
|
||
alphanumeric = lowercase | uppercase | numeric | MUTF8
|
||
lowercase = 'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|
|
||
'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'
|
||
uppercase = 'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
|
||
'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'
|
||
numeric = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
|
||
// special1 chars have traditionally been
|
||
// permitted in netCDF names.
|
||
special1 = '_'|'.'|'@'|'+'|'-'
|
||
// special2 chars are recently permitted in
|
||
// names (and require escaping in CDL).
|
||
// Note: '/' is not permitted.
|
||
special2 = ' ' | '!' | '"' | '#' | '$' | '%' | '&' | '\'' |
|
||
'(' | ')' | '*' | ',' | ':' | ';' | '<' | '=' |
|
||
'>' | '?' | '[' | '\\' | ']' | '^' | '`' | '{' |
|
||
'|' | '}' | '~'
|
||
MUTF8 = <multibyte UTF-8 encoded, NFC-normalized Unicode character>
|
||
dim_length = NON_NEG // If zero, this is the record dimension.
|
||
// There can be at most one record dimension.
|
||
attr = name nc_type nelems [values ...]
|
||
nc_type = NC_BYTE |
|
||
NC_CHAR |
|
||
NC_SHORT |
|
||
NC_INT |
|
||
NC_FLOAT |
|
||
NC_DOUBLE
|
||
var = name nelems [dimid ...] vatt_list nc_type vsize begin
|
||
// nelems is the dimensionality (rank) of the
|
||
// variable: 0 for scalar, 1 for vector, 2
|
||
// for matrix, ...
|
||
dimid = NON_NEG // Dimension ID (index into dim_list) for
|
||
// variable shape. We say this is a "record
|
||
// variable" if and only if the first
|
||
// dimension is the record dimension.
|
||
vatt_list = att_list // Variable-specific attributes
|
||
vsize = NON_NEG // Variable size. If not a record variable,
|
||
// the amount of space in bytes allocated to
|
||
// the variable's data. If a record variable,
|
||
// the amount of space per record. See "Note
|
||
// on vsize", below.
|
||
begin = OFFSET // Variable start location. The offset in
|
||
// bytes (seek index) in the file of the
|
||
// beginning of data for this variable.
|
||
data = non_recs recs
|
||
non_recs = [vardata ...] // The data for all non-record variables,
|
||
// stored contiguously for each variable, in
|
||
// the same order the variables occur in the
|
||
// header.
|
||
vardata = [values ...] // All data for a non-record variable, as a
|
||
// block of values of the same type as the
|
||
// variable, in row-major order (last
|
||
// dimension varying fastest).
|
||
recs = [record ...] // The data for all record variables are
|
||
// stored interleaved at the end of the
|
||
// file.
|
||
record = [varslab ...] // Each record consists of the n-th slab
|
||
// from each record variable, for example
|
||
// x[n,...], y[n,...], z[n,...] where the
|
||
// first index is the record number, which
|
||
// is the unlimited dimension index.
|
||
varslab = [values ...] // One record of data for a variable, a
|
||
// block of values all of the same type as
|
||
// the variable in row-major order (last
|
||
// index varying fastest).
|
||
values = bytes | chars | shorts | ints | floats | doubles
|
||
string = nelems [chars]
|
||
bytes = [BYTE ...] padding
|
||
chars = [CHAR ...] padding
|
||
shorts = [SHORT ...] padding
|
||
ints = [INT ...]
|
||
floats = [FLOAT ...]
|
||
doubles = [DOUBLE ...]
|
||
padding = <0, 1, 2, or 3 bytes to next 4-byte boundary>
|
||
// Header padding uses null (\\x00) bytes. In
|
||
// data, padding uses variable's fill value.
|
||
// See "Note on padding", below, for a special
|
||
// case.
|
||
NON_NEG = <non-negative INT>
|
||
STREAMING = \\xFF \\xFF \\xFF \\xFF // Indicates indeterminate record
|
||
// count, allows streaming data
|
||
OFFSET = <non-negative INT> | // For classic format or
|
||
<non-negative INT64> // for 64-bit offset format
|
||
BYTE = <8-bit byte> // See "Note on byte data", below.
|
||
CHAR = <8-bit byte> // See "Note on char data", below.
|
||
SHORT = <16-bit signed integer, Bigendian, two's complement>
|
||
INT = <32-bit signed integer, Bigendian, two's complement>
|
||
INT64 = <64-bit signed integer, Bigendian, two's complement>
|
||
FLOAT = <32-bit IEEE single-precision float, Bigendian>
|
||
DOUBLE = <64-bit IEEE double-precision float, Bigendian>
|
||
// following type tags are 32-bit integers
|
||
NC_BYTE = \\x00 \\x00 \\x00 \\x01 // 8-bit signed integers
|
||
NC_CHAR = \\x00 \\x00 \\x00 \\x02 // text characters
|
||
NC_SHORT = \\x00 \\x00 \\x00 \\x03 // 16-bit signed integers
|
||
NC_INT = \\x00 \\x00 \\x00 \\x04 // 32-bit signed integers
|
||
NC_FLOAT = \\x00 \\x00 \\x00 \\x05 // IEEE single precision floats
|
||
NC_DOUBLE = \\x00 \\x00 \\x00 \\x06 // IEEE double precision floats
|
||
// Default fill values for each type, may be
|
||
// overridden by variable attribute named
|
||
// `_FillValue'. See "Note on fill values",
|
||
// below.
|
||
FILL_CHAR = \\x00 // null byte
|
||
FILL_BYTE = \\x81 // (signed char) -127
|
||
FILL_SHORT = \\x80 \\x01 // (short) -32767
|
||
FILL_INT = \\x80 \\x00 \\x00 \\x01 // (int) -2147483647
|
||
FILL_FLOAT = \\x7C \\xF0 \\x00 \\x00 // (float) 9.9692099683868690e+36
|
||
FILL_DOUBLE = \\x47 \\x9E \\x00 \\x00 \\x00 \\x00 \\x00 \\x00 // (double) 9.9692099683868690e+36
|
||
</pre>
|
||
|
||
Note on vsize: This number is the product of the dimension lengths
|
||
(omitting the record dimension) and the number of bytes per value
|
||
(determined from the type), increased to the next multiple of 4, for
|
||
each variable. If a record variable, this is the amount of space per
|
||
record (except that, for backward compatibility, it always includes
|
||
padding to the next multiple of 4 bytes, even in the exceptional case
|
||
noted below under “Note on padding”). The netCDF “record size” is
|
||
calculated as the sum of the vsize's of all the record variables.
|
||
|
||
The vsize field is actually redundant, because its value may be
|
||
computed from other information in the header. The 32-bit vsize field
|
||
is not large enough to contain the size of variables that require more
|
||
than 2^32 - 4 bytes, so 2^32 - 1 is used in the vsize field for such
|
||
variables.
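
The vsize rules above can be sketched in Python as follows. The
function and constant names are hypothetical and the sketch is not
taken from the netCDF library. It also does not model the exceptional
case described under “Note on padding”.

```python
# External size in bytes of one value of each classic type.
EXTSIZE = {"NC_BYTE": 1, "NC_CHAR": 1, "NC_SHORT": 2,
           "NC_INT": 4, "NC_FLOAT": 4, "NC_DOUBLE": 8}

VSIZE_LIMIT = 2**32 - 4    # largest size the 32-bit field can record
VSIZE_TOO_BIG = 2**32 - 1  # value stored for larger variables


def actual_size(dim_lengths, nc_type, is_record=False):
    """Bytes per variable (or per record), padded to a multiple of 4."""
    lengths = dim_lengths[1:] if is_record else dim_lengths
    nbytes = EXTSIZE[nc_type]
    for n in lengths:
        nbytes *= n
    return (nbytes + 3) // 4 * 4


def vsize_field(dim_lengths, nc_type, is_record=False):
    """Value to store in the vsize field of the header."""
    size = actual_size(dim_lengths, nc_type, is_record)
    return size if size <= VSIZE_LIMIT else VSIZE_TOO_BIG


def record_size(record_vars):
    """Sum of the per-record sizes of (dim_lengths, nc_type) pairs."""
    return sum(actual_size(dims, t, True) for dims, t in record_vars)


if __name__ == "__main__":
    print(vsize_field([5], "NC_SHORT"))            # 10 bytes padded to 12
    print(vsize_field([2**30, 8], "NC_DOUBLE"))    # too large for the field
    print(record_size([([0, 2, 9, 4], "NC_BYTE"), ([0, 5], "NC_SHORT")]))
```

Note that record_size uses the actual padded sizes rather than the
stored vsize fields, so it stays meaningful for variables too large for
the 32-bit field.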
|
||
|
||
Note on names: Earlier versions of the netCDF C-library reference
|
||
implementation enforced a more restricted set of characters in
|
||
creating new names, but permitted reading names containing arbitrary
|
||
bytes. This specification extends the permitted characters in names to
|
||
include multi-byte UTF-8 encoded Unicode and additional printing
|
||
characters from the US-ASCII alphabet. The first character of a name
|
||
must be alphanumeric, a multi-byte UTF-8 character, or '_' (reserved
|
||
for special names with meaning to implementations, such as the
|
||
“_FillValue” attribute). Subsequent characters may also include
|
||
printing special characters, except for '/' which is not allowed in
|
||
names. Names that have trailing space characters are also not
|
||
permitted.
|
||
|
||
Implementations of the netCDF classic and 64-bit offset formats must
|
||
ensure that names are normalized according to Unicode NFC
|
||
normalization rules during encoding as UTF-8 for storing in the file
|
||
header. This is necessary to ensure that gratuitous differences in the
|
||
representation of Unicode names do not cause anomalies in comparing
|
||
files and querying data objects by name.
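
The naming rules can be approximated with a short Python check. This
is a sketch under stated assumptions, not the validation code of any
netCDF implementation. It works on decoded Python strings, treats any
non-ASCII character as a multi-byte UTF-8 character, and uses the
hypothetical function name is_valid_name.

```python
import re
import unicodedata

# First character: ASCII letter, digit, underscore, or any non-ASCII character.
FIRST_CHAR = re.compile(r"[A-Za-z0-9_]|[^\x00-\x7F]")
# Characters never allowed: ASCII control characters, DEL, and '/'.
FORBIDDEN = re.compile(r"[\x00-\x1F\x7F/]")


def is_valid_name(name):
    """Return True if name satisfies the rules in the note on names."""
    if not name or not FIRST_CHAR.match(name):
        return False
    if FORBIDDEN.search(name):
        return False
    if name.endswith(" "):
        return False  # trailing space characters are not permitted
    # Names must already be in Unicode NFC form.
    return unicodedata.normalize("NFC", name) == name


if __name__ == "__main__":
    for candidate in ("_FillValue", "time_bnds", "a/b", "temp ", "+x"):
        print(repr(candidate), is_valid_name(candidate))
```

The last check rejects a name such as 'e' followed by a combining
acute accent, whose NFC form is the single precomposed character.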
|
||
|
||
Note on streaming data: The largest possible record count, 2^32 - 1,
|
||
is reserved to indicate an indeterminate number of records. This means
|
||
that the number of records in the file must be determined by other
|
||
means, such as reading them or computing the current number of records
|
||
from the file length and other information in the header. It also
|
||
means that the numrecs field in the header will not be updated as
|
||
records are added to the file. [This feature is not yet implemented].
|
||
|
||
Note on padding: In the special case when there is only one record
|
||
variable and it is of type character, byte, or short, no padding is
|
||
used between record slabs, so records after the first record do not
|
||
necessarily start on four-byte boundaries. However, as noted above
|
||
under “Note on vsize”, the vsize field is computed to include padding
|
||
to the next multiple of 4 bytes. In this case, readers should ignore
|
||
vsize and assume no padding. Writers should store vsize as if padding
|
||
were included.
|
||
|
||
Note on byte data: It is possible to interpret byte data as either
|
||
signed (-128 to 127) or unsigned (0 to 255). When reading byte data
|
||
through an interface that converts it into another numeric type, the
|
||
default interpretation is signed. There are various attribute
|
||
conventions for specifying whether bytes represent signed or unsigned
|
||
data, but no standard convention has been established. The variable
|
||
attribute “_Unsigned” is reserved for this purpose in future
|
||
implementations.
|
||
|
||
Note on char data: Although the characters used in netCDF names must
|
||
be encoded as UTF-8, character data may use other encodings. The
|
||
variable attribute “_Encoding” is reserved for this purpose in future
|
||
implementations.
|
||
|
||
Note on fill values: Because data variables may be created before
|
||
their values are written, and because values need not be written
|
||
sequentially in a netCDF file, default “fill values” are defined for
|
||
each type, for initializing data values before they are explicitly
|
||
written. This makes it possible to detect reading values that were
|
||
never written. The variable attribute “_FillValue”, if present,
|
||
overrides the default fill value for a variable. If _FillValue is
|
||
defined then it should be scalar and of the same type as the variable.
|
||
|
||
Fill values are not required, however, because netCDF libraries have
|
||
traditionally supported a “no fill” mode when writing, omitting the
|
||
initialization of variable values with fill values. This makes the
|
||
creation of large files faster, but also eliminates the possibility of
|
||
detecting the inadvertent reading of values that haven't been written.
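
The default fill-value byte patterns listed in the grammar can be
checked by decoding them as big-endian values, as in the following
Python sketch. The table and function names are hypothetical. A double
occupies eight bytes, so FILL_DOUBLE is taken here to be 0x47 0x9E
followed by six zero bytes.

```python
import struct

# Byte pattern and big-endian struct format for each default fill value.
FILL_PATTERNS = {
    "FILL_BYTE":   (b"\x81", ">b"),
    "FILL_SHORT":  (b"\x80\x01", ">h"),
    "FILL_INT":    (b"\x80\x00\x00\x01", ">i"),
    "FILL_FLOAT":  (b"\x7c\xf0\x00\x00", ">f"),
    "FILL_DOUBLE": (b"\x47\x9e" + b"\x00" * 6, ">d"),
}


def decode_fill(name):
    """Decode the named fill pattern into a Python number."""
    pattern, fmt = FILL_PATTERNS[name]
    return struct.unpack(fmt, pattern)[0]


if __name__ == "__main__":
    for name in FILL_PATTERNS:
        print(name, decode_fill(name))
```

The float and double patterns decode to the same value, 15 times 2 to
the power 119, which is the number written as 9.9692099683868690e+36
in the grammar.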
|
||
|
||
\page computing_offsets Notes on Computing File Offsets
|
||
|
||
The offset (position within the file) of a specified data value in a
|
||
classic format or 64-bit offset data file is completely determined by
|
||
the variable start location (the offset in the begin field), the
|
||
external type of the variable (the nc_type field), and the dimension
|
||
indices (one for each of the variable's dimensions) of the value
|
||
desired.
|
||
|
||
The external size in bytes of one data value for each possible netCDF
|
||
type, denoted extsize below, is:
|
||
- NC_BYTE 1
|
||
- NC_CHAR 1
|
||
- NC_SHORT 2
|
||
- NC_INT 4
|
||
- NC_FLOAT 4
|
||
- NC_DOUBLE 8
|
||
|
||
The record size, denoted by recsize below, is the sum of the vsize
|
||
fields of record variables (variables that use the unlimited
|
||
dimension), using the actual value determined by dimension sizes and
|
||
variable type in case the vsize field is too small for the variable
|
||
size.
|
||
|
||
To compute the offset of a value relative to the beginning of a
|
||
variable, it is helpful to precompute a “product vector” from the
|
||
dimension lengths. Form the products of the dimension lengths for the
|
||
variable from right to left, skipping the leftmost (record) dimension
|
||
for record variables, and storing the results as the product vector
|
||
for each variable.
|
||
|
||
For example:
|
||
|
||
\code
|
||
Non-record variable:
|
||
|
||
dimension lengths: [ 5 3 2 7] product vector: [210 42 14 7]
|
||
|
||
Record variable:
|
||
|
||
dimension lengths: [0 2 9 4] product vector: [0 72 36 4]
|
||
\endcode
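
The two product vectors above can be reproduced with a few lines of
Python. The function name product_vector is hypothetical. The sketch
follows the description in the text: products are formed from right to
left, and the record dimension is skipped for record variables.

```python
def product_vector(dim_lengths, is_record=False):
    """Right-to-left running products of the dimension lengths.

    For a record variable the leftmost (record) dimension is skipped
    and its entry is left as 0.
    """
    products = [0] * len(dim_lengths)
    running = 1
    stop = 0 if is_record else -1
    for i in range(len(dim_lengths) - 1, stop, -1):
        running *= dim_lengths[i]
        products[i] = running
    return products


if __name__ == "__main__":
    print(product_vector([5, 3, 2, 7]))                  # non-record variable
    print(product_vector([0, 2, 9, 4], is_record=True))  # record variable
```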
|
||
|
||
At this point, the leftmost product (for a record variable, the
leftmost product that follows the skipped record dimension),
multiplied by the external size of one value and rounded up to the
next multiple of 4, is the variable size, vsize, in the grammar above.
For example, if the non-record variable above holds one-byte values,
the value of the vsize field is 212 (210 rounded up to a multiple of
4). For the record variable, again with one-byte values, the value of
vsize is just 72, since this is already a multiple of 4.
|
||
|
||
Let coord be the array of coordinates (dimension indices, zero-based)
of the desired data value. Each coordinate is weighted by the product
of the lengths of the dimensions to its right, that is, by the entry
one place to its right in the product vector, with a weight of 1 for
the last coordinate. The offset of the value from the beginning of the
file is then the file offset of the first data value of the desired
variable (its begin field) added to the sum of these weighted
coordinates times the size, in bytes, of each datum for the variable.
For a record variable, the record coordinate, 'coord[0]', is left out
of that sum; instead, the product of the record number, 'coord[0]',
and the record size, recsize, is added to yield the final offset value.
|
||
|
||
A special case: Where there is exactly one record variable, we drop
|
||
the requirement that each record be four-byte aligned, so in this case
|
||
there is no record padding.
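
The offset computation can be sketched in Python as follows. This is
one reading of the procedure, in which each coordinate is multiplied
by the product of the dimension lengths to its right (the usual
row-major rule). The function name value_offset and its parameters are
hypothetical. The caller supplies recsize, so the special case above is
handled by passing the unpadded record size.

```python
def value_offset(begin, dim_lengths, coord, extsize,
                 is_record=False, recsize=0):
    """File offset in bytes of the value at the given coordinates.

    begin is the variable's begin field and extsize the external size
    of one value. For a record variable, coord[0] is the record number.
    """
    first = 1 if is_record else 0
    index = 0   # position of the value within the variable or record slab
    stride = 1  # product of the dimension lengths to the right
    for i in range(len(dim_lengths) - 1, first - 1, -1):
        index += coord[i] * stride
        stride *= dim_lengths[i]
    offset = begin + index * extsize
    if is_record:
        offset += coord[0] * recsize
    return offset


if __name__ == "__main__":
    # Non-record variable of one-byte values with dimensions 5 x 3 x 2 x 7.
    print(value_offset(100, [5, 3, 2, 7], [1, 2, 1, 3], 1))
    # Record variable of two-byte values, record 3, with a 144-byte record.
    print(value_offset(200, [0, 2, 9, 4], [3, 1, 8, 3], 2,
                       is_record=True, recsize=144))
```

The begin values and the 144-byte record size in the usage lines are
made-up inputs chosen only to exercise the function.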

\page offset_examples Examples

By using the grammar above, we can derive the smallest valid netCDF
file, having no dimensions, no variables, no attributes, and hence, no
data. A CDL representation of the empty netCDF file is

\code
netcdf empty { }
\endcode

This empty netCDF file has 32 bytes. It begins with the four-byte
“magic number” that identifies it as a netCDF version 1 file: ‘C’,
‘D’, ‘F’, ‘\\x01’. Following are seven 32-bit integer zeros
representing the number of records, an empty list of dimensions, an
empty list of global attributes, and an empty list of variables.

Below is an (edited) dump of the file produced using the Unix command

\code
od -xcs empty.nc
\endcode

Each 16-byte portion of the file is displayed with 4 lines. The first
line displays the bytes in hexadecimal. The second line displays the
bytes as characters. The third line displays each group of two bytes
interpreted as a signed 16-bit integer. The fourth line (added by
hand) presents the interpretation of the bytes in terms of netCDF
components and values.

\code
    4344    4601    0000    0000    0000    0000    0000    0000
   C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
   17220   17921   00000   00000   00000   00000   00000   00000
[ magic number ][  0 records   ][  0 dimensions   (ABSENT)     ]

    0000    0000    0000    0000    0000    0000    0000    0000
  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
   00000   00000   00000   00000   00000   00000   00000   00000
[  0 global atts  (ABSENT)     ][  0 variables    (ABSENT)     ]
\endcode
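The 32-byte layout shown in the dump can be reproduced in memory with a few lines of C. This is only a sketch of the byte layout described above; the function name and the size constant are hypothetical and not part of the netCDF API.

```c
#include <stddef.h>
#include <string.h>

enum { EMPTY_CLASSIC_SIZE = 32 };

// Fill buf with the bytes of an empty classic-format file: the magic
// number followed by seven 32-bit zeros (numrecs, then the ABSENT
// dimension, global attribute, and variable lists).
// Returns the number of bytes written.
size_t build_empty_classic(unsigned char buf[EMPTY_CLASSIC_SIZE])
{
    memset(buf, 0, EMPTY_CLASSIC_SIZE);  // the seven 32-bit zeros
    buf[0] = 'C';
    buf[1] = 'D';
    buf[2] = 'F';
    buf[3] = 0x01;                       // VERSION byte of the classic format
    return EMPTY_CLASSIC_SIZE;
}
```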

As a less trivial example, consider the CDL

\code
netcdf tiny {
dimensions:
        dim = 5;
variables:
        short vx(dim);
data:
        vx = 3, 1, 4, 1, 5 ;
}
\endcode

which corresponds to a 92-byte netCDF file. The following is an edited dump of this file:

\code
    4344    4601    0000    0000    0000    000a    0000    0001
   C   D   F 001  \0  \0  \0  \0  \0  \0  \0  \n  \0  \0  \0 001
   17220   17921   00000   00000   00000   00010   00000   00001
[ magic number ][  0 records   ][ NC_DIMENSION ][ 1 dimension  ]

    0000    0003    6469    6d00    0000    0005    0000    0000
  \0  \0  \0 003   d   i   m  \0  \0  \0  \0 005  \0  \0  \0  \0
   00000   00003   25705   27904   00000   00005   00000   00000
[ 3 char name = "dim"          ][ size = 5     ][ 0 global atts

    0000    0000    0000    000b    0000    0001    0000    0002
  \0  \0  \0  \0  \0  \0  \0 013  \0  \0  \0 001  \0  \0  \0 002
   00000   00000   00000   00011   00000   00001   00000   00002
  (ABSENT)     ][ NC_VARIABLE  ][ 1 variable   ][ 2 char name =

    7678    0000    0000    0001    0000    0000    0000    0000
   v   x  \0  \0  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0
   30328   00000   00000   00001   00000   00000   00000   00000
 "vx"          ][ 1 dimension  ][ with ID 0    ][ 0 attributes

    0000    0000    0000    0003    0000    000c    0000    0050
  \0  \0  \0  \0  \0  \0  \0 003  \0  \0  \0  \f  \0  \0  \0   P
   00000   00000   00000   00003   00000   00012   00000   00080
  (ABSENT)     ][ type NC_SHORT][ size 12 bytes][ offset: 80   ]

    0003    0001    0004    0001    0005    8001
  \0 003  \0 001  \0 004  \0 001  \0 005 200 001
   00003   00001   00004   00001   00005  -32767
[    3 ][    1 ][    4 ][    1 ][    5 ][ fill ]
\endcode
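To check the annotations in the dump, the bytes can be decoded by hand. The sketch below transcribes the 92 bytes from the dump above and reads big-endian fields at fixed positions. The helper names are hypothetical, and the positions used (byte 72 for vsize, byte 76 for begin) apply only to this particular file.

```c
#include <stdint.h>

// The 92 bytes of the tiny file, transcribed from the dump above.
const unsigned char tiny_nc[92] = {
    'C', 'D', 'F', 0x01,  0, 0, 0, 0,     0, 0, 0, 0x0a,  0, 0, 0, 1,
    0, 0, 0, 3,   'd', 'i', 'm', 0,       0, 0, 0, 5,     0, 0, 0, 0,
    0, 0, 0, 0,   0, 0, 0, 0x0b,          0, 0, 0, 1,     0, 0, 0, 2,
    'v', 'x', 0, 0,   0, 0, 0, 1,         0, 0, 0, 0,     0, 0, 0, 0,
    0, 0, 0, 0,   0, 0, 0, 3,             0, 0, 0, 0x0c,  0, 0, 0, 0x50,
    0, 3,  0, 1,  0, 4,  0, 1,  0, 5,  0x80, 0x01
};

// Read a big-endian unsigned 32-bit integer starting at p.
uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

// Read a big-endian signed 16-bit integer (an NC_SHORT) starting at p.
int16_t be16s(const unsigned char *p)
{
    int32_t v = ((int32_t)p[0] << 8) | (int32_t)p[1];
    if (v >= 0x8000)
        v -= 0x10000;                    // undo two's complement by hand
    return (int16_t)v;
}

// Value i of the NC_SHORT variable whose begin field is stored at
// byte position begin_pos in the header.
int16_t short_value(const unsigned char *file, unsigned begin_pos, unsigned i)
{
    uint32_t begin = be32(file + begin_pos);
    return be16s(file + begin + 2u * i);
}
```

For example, `be32(tiny_nc + 76)` gives the begin offset of vx, and `short_value(tiny_nc, 76, i)` returns the i-th stored value, including the fill value that pads the data to 12 bytes.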

\page offset_format_spec The 64-bit Offset Format

The netCDF 64-bit offset format differs from the classic format only
in the VERSION byte, ‘\\x02’ instead of ‘\\x01’, and the OFFSET entity,
a 64-bit instead of a 32-bit offset from the beginning of the
file. This small format change permits much larger files, but there
are still some practical size restrictions. Each fixed-size variable
and the data for one record's worth of each record variable are still
limited in size to a little less than 4 GiB. The rationale for this
limitation is to permit aggregate access to all the data in a netCDF
variable (or a record's worth of data) on 32-bit platforms.
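Because the formats differ in their first bytes, a reader can tell them apart before parsing anything else. The following sketch assumes the first bytes of the file are already in a buffer. The enum and function names are hypothetical. The eight-byte HDF5 signature is taken from the HDF5 file format, not from this document, and only offset 0 is checked here.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    FMT_UNKNOWN,
    FMT_CLASSIC,         // 'C' 'D' 'F' 0x01
    FMT_64BIT_OFFSET,    // 'C' 'D' 'F' 0x02
    FMT_HDF5             // HDF5 signature, as used by netCDF-4 files
} file_format;

// Classify a file from its leading bytes.
file_format classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5_sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return FMT_HDF5;
    if (len >= 4 && buf[0] == 'C' && buf[1] == 'D' && buf[2] == 'F') {
        if (buf[3] == 0x01)
            return FMT_CLASSIC;
        if (buf[3] == 0x02)
            return FMT_64BIT_OFFSET;
    }
    return FMT_UNKNOWN;
}
```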

\page netcdf_4_spec The NetCDF-4 Format

The netCDF-4 format implements and expands the netCDF-3 data model by
using an enhanced version of HDF5 as the storage layer. It relies on
features that are only available in HDF5 version 1.8 and later.

Using HDF5 as the underlying storage layer, netCDF-4 files remove many
of the restrictions for classic and 64-bit offset files. The richer
enhanced model supports user-defined types and data structures,
hierarchical scoping of names using groups, additional primitive types
including strings, larger variable sizes, and multiple unlimited
dimensions. The underlying HDF5 storage layer also supports
per-variable compression, multidimensional tiling, and efficient
dynamic schema changes, so that data need not be copied when adding
new variables to the file schema.

Creating a netCDF-4/HDF5 file with netCDF-4 results in an HDF5
file. The features of netCDF-4 are a subset of the features of HDF5,
so the resulting file can be used by any existing HDF5 application.

Although every file in netCDF-4 format is an HDF5 file, there are HDF5
files that are not netCDF-4 format files, because the netCDF-4 format
intentionally uses a limited subset of the HDF5 data model and file
format features. Some HDF5 features not supported in the netCDF
enhanced model and netCDF-4 format include non-hierarchical group
structures, HDF5 reference types, multiple links to a data object,
user-defined atomic data types, stored property lists, more permissive
rules for data object names, the HDF5 date/time type, and attributes
associated with user-defined types.

A complete specification of HDF5 files is beyond the scope of this
document. For more information about HDF5, see the HDF5 web site:
http://hdf.ncsa.uiuc.edu/HDF5/.

The specification that follows is sufficient to allow HDF5 users to
create files that will be accessible from netCDF-4.

\section creation_order Creation Order

The netCDF API maintains the creation order of objects that are
created in the file. The same is not true in HDF5, which by default
maintains the objects in alphabetical order. Starting in version 1.8
of HDF5, the ability to maintain creation order was added. Creation
order tracking must be explicitly turned on in the HDF5 data file, in
each of the places described below.

Each group must have link and attribute creation order set. The
following code (from libsrc4/nc4hdf.c) shows how the netCDF-4 library
sets these when creating a group.

\code
      /* Create group, with link_creation_order set in the group
       * creation property list. */
      if ((gcpl_id = H5Pcreate(H5P_GROUP_CREATE)) < 0)
         return NC_EHDFERR;
      if (H5Pset_link_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
         BAIL(NC_EHDFERR);
      if (H5Pset_attr_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
         BAIL(NC_EHDFERR);
      if ((grp->hdf_grpid = H5Gcreate2(grp->parent->hdf_grpid, grp->name,
                                       H5P_DEFAULT, gcpl_id, H5P_DEFAULT)) < 0)
         BAIL(NC_EHDFERR);
      if (H5Pclose(gcpl_id) < 0)
         BAIL(NC_EHDFERR);
\endcode

Each dataset in the HDF5 file must be created with a property list for
which the attribute creation order has been set to creation
ordering. The H5Pset_attr_creation_order function is used to set the
creation ordering of attributes of a variable.

The following example code (from libsrc4/nc4hdf.c) shows how the
creation ordering is turned on by the netCDF library.

\code
   /* Turn on creation order tracking. */
   if (H5Pset_attr_creation_order(plistid, H5P_CRT_ORDER_TRACKED|
                                  H5P_CRT_ORDER_INDEXED) < 0)
      BAIL(NC_EHDFERR);
\endcode

\section groups_spec Groups

NetCDF-4 groups are the same as HDF5 groups, but groups in a netCDF-4
file must be strictly hierarchical. In general, HDF5 permits
non-hierarchical structuring of groups (for example, a group that is
its own grandparent). These non-hierarchical relationships are not
allowed in netCDF-4 files.

In the netCDF API, the global attribute becomes a group-level
attribute. That is, each group may have its own global attributes.

The root group of a file is named “/” in the netCDF API, where names
of groups are used. It should be noted that the netCDF API (like the
HDF5 API) makes little use of names, and refers to entities by number.

\section dims_spec Dimensions with HDF5 Dimension Scales

Until version 1.8, HDF5 did not have any capability to represent
shared dimensions. With the 1.8 release, HDF5 introduced the dimension
scale feature to allow shared dimensions in HDF5 files.

The dimension scale is unfortunately not exactly equivalent to the
netCDF shared dimension, and this leads to a number of compromises in
the design of netCDF-4.

A netCDF shared dimension consists solely of a length and a name. An
HDF5 dimension scale also includes values for each point along the
dimension, information that is (optionally) included in a netCDF
coordinate variable.

To handle the case of a netCDF dimension without a coordinate
variable, netCDF-4 creates dimension scales of type char, and leaves
the contents of the dimension scale empty. Only the name and length of
the scale are significant. To distinguish this case, netCDF-4 takes
advantage of the NAME attribute of the dimension scale. (Not to be
confused with the name of the scale itself.) In the case of dimensions
without coordinate data, the HDF5 dimension scale NAME attribute is
set to the string: "This is a netCDF dimension but not a netCDF
variable."

In the case where a coordinate variable is defined for a dimension,
the HDF5 dimscale matches the type of the netCDF coordinate variable,
and contains the coordinate data.

A further difficulty arises when an n-dimensional coordinate variable
is defined, where n is greater than one. NetCDF allows such coordinate
variables, but the HDF5 model does not allow dimension scales to be
attached to other dimension scales, making it impossible to completely
represent the multi-dimensional coordinate variables of the netCDF
model.

To capture this information, multidimensional coordinate variables
have an attribute named _Netcdf4Coordinates. The attribute is an array
of H5T_NATIVE_INT, with the netCDF dimension IDs of each of its
dimensions.

The _Netcdf4Coordinates attribute is otherwise hidden by the netCDF
API. It does not appear as one of the attributes for the netCDF
variable involved, except through the HDF5 API.

\section dim_spec2 Dimensions without HDF5 Dimension Scales

Starting with the netCDF-4.1 release, netCDF can read HDF5 files which
do not use dimension scales. In this case the netCDF library assigns
dimensions to the HDF5 dataset as needed, based on the length of the
dimension.

When an HDF5 file is opened, each dataset is examined in turn. The
lengths of all the dimensions involved in the shape of the dataset are
determined. Each new (i.e. previously unencountered) length results in
the creation of a phony dimension in the netCDF API.

This will not accurately detect a shared, unlimited dimension in the
HDF5 file, if different datasets have different lengths along this
dimension (possible in HDF5, but not in netCDF).

Note that this is a read-only capability for the netCDF library. When
the netCDF library writes HDF5 files, they always use a dimension
scale for every dimension.

Datasets must have either dimension scales for every dimension, or no
dimension scales at all. Partial dimension scales are not, at this
time, understood by the netCDF library.
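The phony dimension assignment described above amounts to a lookup by length. The following sketch illustrates the idea only, under the assumption that lengths are matched across all datasets in one table. The type and function names are hypothetical, and the library's actual bookkeeping may differ.

```c
#include <stddef.h>

enum { MAX_PHONY_DIMS = 64 };

typedef struct {
    size_t lengths[MAX_PHONY_DIMS];  // length of each phony dimension so far
    size_t count;                    // number of phony dimensions created
} phony_dims;

// Return the ID of the phony dimension with this length, creating a new
// one if the length has not been seen before. Returns -1 if the table
// is full.
int phony_dim_for_length(phony_dims *dims, size_t length)
{
    for (size_t i = 0; i < dims->count; i++)
        if (dims->lengths[i] == length)
            return (int)i;
    if (dims->count == MAX_PHONY_DIMS)
        return -1;
    dims->lengths[dims->count] = length;
    return (int)dims->count++;
}

// Assign a dimension ID to every axis of one dataset's shape.
// Returns 0 on success, -1 if the table overflows.
int assign_phony_dims(phony_dims *dims, const size_t *shape,
                      size_t rank, int *dimids)
{
    for (size_t i = 0; i < rank; i++) {
        dimids[i] = phony_dim_for_length(dims, shape[i]);
        if (dimids[i] < 0)
            return -1;
    }
    return 0;
}
```

Because the match is by length alone, two datasets that share an unlimited dimension in HDF5 but currently have different lengths along it end up with different phony dimensions, which is the limitation noted above.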

\section dim_spec3 Dimension and Coordinate Variable Ordering

In order to preserve creation order, the netCDF-4 library writes
variables in their creation order. Since some variables are also
dimension scales, their order reflects both the order of the
dimensions and the order of the coordinate variables.

However, these may be different. Consider the following code:

\code
   /* Create a test file. */
   if (nc_create(FILE_NAME, NC_CLASSIC_MODEL|NC_NETCDF4, &ncid)) ERR;

   /* Define dimensions in order. */
   if (nc_def_dim(ncid, DIM0, NC_UNLIMITED, &dimids[0])) ERR;
   if (nc_def_dim(ncid, DIM1, 4, &dimids[1])) ERR;

   /* Define coordinate variables in a different order. */
   if (nc_def_var(ncid, DIM1, NC_DOUBLE, 1, &dimids[1], &varid[1])) ERR;
   if (nc_def_var(ncid, DIM0, NC_DOUBLE, 1, &dimids[0], &varid[0])) ERR;
\endcode

In this case the order of the coordinate variables will be different
from the order of the dimensions.

In practice, this should make little difference in user code. For
users whose code depends on the ordering of dimensions, the netCDF
library was updated in version 4.1 to detect this condition and add
the attribute _Netcdf4Dimid to the dimension scales in the HDF5
file. This attribute holds a scalar H5T_NATIVE_INT which is the
(zero-based) dimension ID for this dimension.

If this attribute is present on any dimension scale, it must be
present on all dimension scales in the file.

\section vars_spec Variables

Variables in netCDF-4/HDF5 files exactly correspond to HDF5
datasets. The data types match naturally between netCDF and HDF5.

In netCDF classic format, the problem of endianness is solved by
writing all data in big-endian order. The HDF5 library allows data to
be written as either big or little endian, and automatically reorders
the data when it is read, if necessary.

By default, netCDF uses the native types on the machine which writes
the data. Users may change the endianness of a variable (before any
data are written). In that case the specified endian type will be used
in HDF5 (for example, an H5T_STD_I16LE will be used for NC_SHORT, if
little-endian has been specified for that variable).
- NC_BYTE = H5T_NATIVE_SCHAR
- NC_UBYTE = H5T_NATIVE_UCHAR
- NC_CHAR = H5T_C_S1
- NC_STRING = variable length array of H5T_C_S1
- NC_SHORT = H5T_NATIVE_SHORT
- NC_USHORT = H5T_NATIVE_USHORT
- NC_INT = H5T_NATIVE_INT
- NC_UINT = H5T_NATIVE_UINT
- NC_INT64 = H5T_NATIVE_LLONG
- NC_UINT64 = H5T_NATIVE_ULLONG
- NC_FLOAT = H5T_NATIVE_FLOAT
- NC_DOUBLE = H5T_NATIVE_DOUBLE

The NC_CHAR type represents a single character, and the NC_STRING an
array of characters. This can be confusing because a one-dimensional
array of NC_CHAR is used to represent a string (i.e. a scalar
NC_STRING).

An odd case may arise in which the user defines a variable with the
same name as a dimension, but which is not intended to be the
coordinate variable for that dimension. In this case the string
"_nc4_non_coord_" is prepended to the name of the HDF5 dataset, and
stripped from the name for the netCDF API.
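A reader working directly with the HDF5 API can recover the netCDF variable name by skipping this prefix. A minimal sketch follows. The function and macro names are hypothetical; only the prefix string itself comes from the text above.

```c
#include <string.h>

#define NON_COORD_PREFIX "_nc4_non_coord_"

// Return the netCDF-visible name for an HDF5 dataset name: the same
// string with the non-coordinate prefix skipped if it is present.
// The result points into hdf5_name; nothing is copied.
const char *netcdf_var_name(const char *hdf5_name)
{
    size_t n = strlen(NON_COORD_PREFIX);
    if (strncmp(hdf5_name, NON_COORD_PREFIX, n) == 0)
        return hdf5_name + n;
    return hdf5_name;
}
```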

\section atts_spec Attributes

Attributes in HDF5 and netCDF-4 correspond very closely. Each
attribute in an HDF5 file is represented as an attribute in the
netCDF-4 file, with the exception of the attributes below, which are
ignored by the netCDF-4 API.
- _Netcdf4Coordinates An integer array containing the dimension IDs of
  a variable which is a multi-dimensional coordinate variable.
- _nc3_strict When this (scalar, H5T_NATIVE_INT) attribute exists in
  the root group of the HDF5 file, the netCDF API will enforce the
  netCDF classic model on the data file.
- REFERENCE_LIST This attribute is created and maintained by the HDF5
  dimension scale API.
- CLASS This attribute is created and maintained by the HDF5 dimension
  scale API.
- DIMENSION_LIST This attribute is created and maintained by the HDF5
  dimension scale API.
- NAME This attribute is created and maintained by the HDF5 dimension
  scale API.
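Code that lists HDF5 attributes and wants to show only those visible through the netCDF API could filter on the names above. This is a sketch of such a filter, not the library's own logic. The function name is hypothetical and the comparison is assumed to be case-sensitive.

```c
#include <string.h>

// Return 1 if name is one of the attributes listed above that the
// netCDF-4 API does not expose, 0 otherwise.
int is_hidden_attribute(const char *name)
{
    static const char *const hidden[] = {
        "_Netcdf4Coordinates", "_nc3_strict", "REFERENCE_LIST",
        "CLASS", "DIMENSION_LIST", "NAME"
    };
    for (size_t i = 0; i < sizeof hidden / sizeof hidden[0]; i++)
        if (strcmp(name, hidden[i]) == 0)
            return 1;
    return 0;
}
```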

\section user_defined_spec User-Defined Data Types

Each user-defined data type in an HDF5 file exactly corresponds to a
user-defined data type in the netCDF-4 file. Only base data types
which correspond to netCDF-4 data types may be used. (For example, no
HDF5 reference data types may be used.)

\section compression_spec Compression

The HDF5 library provides data compression using the zlib library and
the szlib library. NetCDF-4 only allows users to write compressed data
with the zlib library (due to licensing restrictions on the szlib
library). Since HDF5 supports the transparent reading of the data with
either compression filter, the netCDF-4 library can read data
compressed with szlib (if the underlying HDF5 library is built to
support szlib), but has no way to write data with szlib compression.

With zlib compression (a.k.a. deflation) the user may set a deflation
level from 0 to 9. In our measurements the zero deflation level does
not compress the data, but does incur the performance penalty of
compressing the data. The netCDF API does not allow the user to write
a variable with zlib deflation of 0; when asked to do so, it turns
off deflation for the variable instead. NetCDF can read an HDF5 file
with deflation of zero, and correctly report that to the user.
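The write-side rule for level 0 can be expressed as a small normalization step. The sketch below is illustrative only. The struct and function names are hypothetical, and it does not call the netCDF or HDF5 APIs.

```c
// Deflation settings as they would be applied to a variable on write.
typedef struct {
    int deflate;   // nonzero if the deflate filter is enabled
    int level;     // deflation level, 0 to 9
} deflate_settings;

// Map a requested deflation level to the settings used when writing.
// A request for level 0 turns deflation off rather than enabling a
// filter that does not compress. Returns 0 on success, or -1 if the
// level is outside 0 to 9.
int normalize_deflate(int requested_level, deflate_settings *out)
{
    if (requested_level < 0 || requested_level > 9)
        return -1;
    out->deflate = requested_level > 0;
    out->level = requested_level;
    return 0;
}
```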

\page netcdf_4_classic_spec The NetCDF-4 Classic Model Format

Every classic and 64-bit offset file can be represented as a netCDF-4
file, with no loss of information. There are some significant benefits
to using the simpler netCDF classic model with the netCDF-4 file
format. For example, software that writes or reads classic model data
can write or read netCDF-4 classic model format data by
recompiling/relinking to a netCDF-4 API library, with no or only
trivial changes needed to the program source code. The netCDF-4
classic model format supports this usage by enforcing rules on what
functions may be called to store data in the file, to make sure its
data can be read by older netCDF applications (when relinked to a
netCDF-4 library).

Writing data in this format prevents use of enhanced model features
such as groups, added primitive types not available in the classic
model, and user-defined types. However, performance features of the
netCDF-4 formats that do not require additional features of the
enhanced model, such as per-variable compression and chunking,
efficient dynamic schema changes, and larger variable size limits,
offer potentially significant performance improvements to readers of
data stored in this format, without requiring program changes.

When a file is created via the netCDF API with the NC_CLASSIC_MODEL
mode flag, the library creates an attribute (_nc3_strict) in the root
group. This attribute is hidden by the netCDF API, but is read when
the file is later opened, and used to ensure that no enhanced model
features are written to the file.

\page hdf4_sd_format HDF4 SD Format

Starting with version 4.1, the netCDF libraries can read HDF4 SD
(Scientific Dataset) files. Access is limited to those HDF4 files
created with the Scientific Dataset API. Access is read-only.

Dataset types are translated between HDF4 and netCDF in a
straightforward manner.
- DFNT_CHAR = NC_CHAR
- DFNT_UCHAR, DFNT_UINT8 = NC_UBYTE
- DFNT_INT8 = NC_BYTE
- DFNT_INT16 = NC_SHORT
- DFNT_UINT16 = NC_USHORT
- DFNT_INT32 = NC_INT
- DFNT_UINT32 = NC_UINT
- DFNT_FLOAT32 = NC_FLOAT
- DFNT_FLOAT64 = NC_DOUBLE
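The translation above is a plain table lookup. The sketch below works on type names as strings so that it does not depend on the HDF4 or netCDF headers. The function name is hypothetical, and types not listed above are reported as unsupported.

```c
#include <string.h>

// Return the name of the netCDF type that corresponds to an HDF4 SD
// type name, or NULL if the type is not in the table above.
const char *hdf4_to_netcdf_type(const char *hdf4_name)
{
    static const char *const map[][2] = {
        { "DFNT_CHAR",    "NC_CHAR"   },
        { "DFNT_UCHAR",   "NC_UBYTE"  },
        { "DFNT_UINT8",   "NC_UBYTE"  },
        { "DFNT_INT8",    "NC_BYTE"   },
        { "DFNT_INT16",   "NC_SHORT"  },
        { "DFNT_UINT16",  "NC_USHORT" },
        { "DFNT_INT32",   "NC_INT"    },
        { "DFNT_UINT32",  "NC_UINT"   },
        { "DFNT_FLOAT32", "NC_FLOAT"  },
        { "DFNT_FLOAT64", "NC_DOUBLE" }
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(hdf4_name, map[i][0]) == 0)
            return map[i][1];
    return NULL;
}
```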

\htmlonly
\page htmlonly_page html only page

LA LA LA!

\endhtmlonly
*/