mirror of
https://github.com/Unidata/netcdf-c.git
synced 2025-01-12 15:45:21 +08:00
7e2a9a9bdf
the new per-variable chunking specifications.
427 lines
20 KiB
Groff
427 lines
20 KiB
Groff
.\" $Id: nccopy.1 400 2010-08-27 21:02:52Z russ $
|
|
.TH NCCOPY 1 "2012-03-08" "Release 4.2" "UNIDATA UTILITIES"
|
|
.SH NAME
|
|
nccopy \- Copy a netCDF file, optionally changing format, compression, or chunking in the output.
|
|
.SH SYNOPSIS
|
|
.ft B
|
|
.HP
|
|
nccopy
|
|
.nh
|
|
\%[\-k \fI kind_name \fP]
|
|
\%[\-\fIkind_code\fP]
|
|
\%[\-d \fI n \fP]
|
|
\%[\-s]
|
|
\%[\-c \fI chunkspec \fP]
|
|
\%[\-u]
|
|
\%[\-w]
|
|
\%[\-[v|V] var1,...]
|
|
\%[\-[g|G] grp1,...]
|
|
\%[\-m \fI bufsize \fP]
|
|
\%[\-h \fI chunk_cache \fP]
|
|
\%[\-e \fI cache_elems \fP]
|
|
\%[\-r]
|
|
\%[\-F \fI filterspec \fP]
|
|
\%[\-L \fI n \fP]
|
|
\%[\-M \fI n \fP]
|
|
\%\fI infile \fP
|
|
\%\fI outfile \fP
|
|
.hy
|
|
.ft
|
|
.SH DESCRIPTION
|
|
.LP
|
|
The \fBnccopy\fP utility copies an input netCDF file in any supported
|
|
format variant to an output netCDF file, optionally converting the
|
|
output to any compatible netCDF format variant, compressing the data,
|
|
or rechunking the data. For example, if built with the netCDF-3
|
|
library, a netCDF classic file may be copied to a netCDF 64-bit offset
|
|
file, permitting larger variables. If built with the netCDF-4
|
|
library, a netCDF classic file may be copied to a netCDF-4 file or to
|
|
a netCDF-4 classic model file as well, permitting data compression,
|
|
efficient schema changes, larger variable sizes, and use of other
|
|
netCDF-4 features.
|
|
.LP
|
|
If no output format is specified, with either \-k \fIkind_name\fP
|
|
or \fI-kind_code\fP, then the output will use the same
|
|
format as the input, unless the input is classic or 64-bit offset
|
|
and either chunking or compression is specified, in which case the
|
|
output will be netCDF-4 classic model format. Attempting
|
|
some kinds of format conversion will result in an error, if the
|
|
conversion is not possible. For example, an attempt to copy a
|
|
netCDF-4 file that uses features of the enhanced model, such as
|
|
groups or variable-length strings, to any of the other kinds of netCDF
|
|
formats that use the classic model will result in an error.
|
|
.LP
|
|
\fBnccopy\fP also serves as an example of a generic netCDF-4 program,
|
|
with its ability to read any valid netCDF file and handle nested
|
|
groups, strings, and user-defined types, including arbitrarily
|
|
nested compound types, variable-length types, and data of any valid
|
|
netCDF-4 type.
|
|
.LP
|
|
If DAP support was enabled when \fBnccopy\fP was built, the file name may
|
|
specify a DAP URL. This may be used to convert data on DAP servers to
|
|
local netCDF files.
|
|
.SH OPTIONS
|
|
.IP "\fB \-k \fP \fI kind_name \fP"
|
|
Use format name to specify the kind of file to be created
|
|
and, by inference, the data model (i.e. netcdf-3 (classic) or
|
|
netcdf-4 (enhanced)). The possible arguments are:
|
|
.RS
|
|
.RS
|
|
.IP "'nc3' or 'classic' => netCDF classic format"
|
|
.IP "'nc6' or '64-bit offset' => netCDF 64-bit format"
|
|
.IP "'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data model)"
|
|
.IP "'nc7' or 'netCDF-4 classic model' => netCDF-4 classic model format"
|
|
.RE
|
|
.RE
|
|
.IP
|
|
Note: The old format numbers '1', '2', '3', '4', equivalent
|
|
to the format names 'nc3', 'nc6', 'nc4', or 'nc7' respectively, are
|
|
also still accepted but deprecated, due to easy confusion between
|
|
format numbers and format names.
|
|
.IP "[\fB-\fP\fIkind_code\fP]"
|
|
Use format numeric code (instead of format name) to specify the kind of file to be created
|
|
and, by inference, the data model (i.e. netcdf-3 (classic) versus
|
|
netcdf-4 (enhanced)). The numeric codes are:
|
|
.RS
|
|
.RS
|
|
.IP "3 => netcdf classic format"
|
|
.IP "6 => netCDF 64-bit format"
|
|
.IP "4 => netCDF-4 format (enhanced data model)"
|
|
.IP "7 => netCDF-4 classic model format"
|
|
.RE
|
|
.RE
|
|
The numeric code "7" is used because "7=3+4", specifying the format
|
|
that uses the netCDF-3 data model for compatibility with the netCDF-4
|
|
storage format for performance. Credit is due to NCO for use of these
|
|
numeric codes instead of the old and confusing format numbers.
|
|
.IP "\fB \-d \fP \fI n \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
deflation level (level of compression) for variable data output. 0
|
|
corresponds to no compression and 9 to maximum compression, with
|
|
higher levels of compression requiring marginally more time to
|
|
compress or uncompress than lower levels. Compression achieved may
|
|
also depend on output chunking parameters. If this option is
|
|
specified for a classic format or 64-bit offset format input file, it
|
|
is not necessary to also specify that the output should be netCDF-4
|
|
classic model, as that will be the default. If this option is not
|
|
specified and the input file has compressed variables, the compression
|
|
will still be preserved in the output, using the same chunking as in
|
|
the input by default.
|
|
.IP
|
|
Note that \fBnccopy\fP requires all variables to be compressed using the
|
|
same compression level, but the API has no such restriction. With
|
|
a program you can customize compression for each variable independently.
|
|
.IP "\fB \-s \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
shuffling of variable data bytes before compression or after
|
|
decompression. Shuffling refers to interlacing of bytes in a chunk so
|
|
that the first bytes of all values are contiguous in storage, followed
|
|
by all the second bytes, and so on, which often improves compression.
|
|
This option is ignored unless a non-zero deflation level is specified.
|
|
Using \-d0 to specify no deflation on input data that has been
|
|
compressed and shuffled turns off both compression and shuffling in
|
|
the output.
|
|
.IP "\fB \-u \fP"
|
|
Convert any unlimited size dimensions in the input to fixed size
|
|
dimensions in the output. This can speed up variable-at-a-time
|
|
access, but slow down record-at-a-time access to multiple variables
|
|
along an unlimited dimension.
|
|
.IP "\fB \-w \fP"
|
|
Keep output in memory (as a diskless netCDF file) until output is
|
|
closed, at which time output file is written to disk. This can
|
|
greatly speedup operations such as converting unlimited dimension to
|
|
fixed size (\-u option), chunking, rechunking, or compressing the
|
|
input. It requires that available memory is large enough to hold the
|
|
output file. This option may provide a larger speedup than careful
|
|
tuning of the \-m, \-h, or \-e options, and it's certainly a lot simpler.
|
|
.IP "\fB \-c \fP \fIchunkspec\fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
chunking (multidimensional tiling) for variable data in the output.
|
|
This is useful to specify the units of disk access, compression, or
|
|
other filters such as checksums. Changing the chunking in a netCDF
|
|
file can also greatly speedup access, by choosing chunk shapes that
|
|
are appropriate for the most common access patterns.
|
|
.IP
|
|
The \fIchunkspec\fP argument has two forms. The first form is the
|
|
original, deprecated form and is a string of comma-separated associations,
|
|
each specifying a dimension name, a '/' character, and optionally the
|
|
corresponding chunk length for that dimension. No blanks should
|
|
appear in the chunkspec string, except possibly escaped blanks that
|
|
are part of a dimension name. A chunkspec names at least one
|
|
dimension, and may omit dimensions which are not to be chunked or for
|
|
which the default chunk length is desired. If a dimension name is
|
|
followed by a '/' character but no subsequent chunk length, the actual
|
|
dimension length is assumed. If copying a classic model file to a
|
|
netCDF-4 output file and not naming all dimensions in the chunkspec,
|
|
unnamed dimensions will also use the actual dimension length for the
|
|
chunk length. An example of a chunkspec for variables that use 'm'
|
|
and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200
|
|
chunks. To see the chunking resulting from copying with a chunkspec,
|
|
use the '\-s' option of ncdump on the output file.
|
|
.IP
|
|
The chunkspec '/' that omits all dimension names and
|
|
corresponding chunk lengths specifies that no chunking is to occur in
|
|
the output, so can be used to unchunk all the chunked variables.
|
|
To see the chunking resulting from copying with a chunkspec,
|
|
use the '\-s' option of ncdump on the output file.
|
|
.IP
|
|
As an I/O optimization, \fBnccopy\fP has a threshold for the minimum size of
|
|
non-record variables that get chunked, currently 8192 bytes. The -M flag
|
|
can be used to override this value.
|
|
.IP
|
|
Note that \fBnccopy\fP requires variables that share a dimension to also
|
|
share the chunk size associated with that dimension, but the
|
|
programming interface has no such restriction. If you need to
|
|
customize chunking for variables independently, you will need to use
|
|
the second form of chunkspec. This second form of chunkspec has this
|
|
syntax: \fI var:n1,n2,...,nn \fP. This assumes that the variable named
|
|
"var" has rank n. The chunking to be applied to each dimension of the
|
|
variable is specified by the values of n1 through nn. This second
|
|
form of chunking specification can be repeated multiple times to specify
|
|
the exact chunking for different variables.
|
|
If the variable is specified but no chunk sizes are specified (i.e. \fI -d var: \fP)
|
|
then chunking is disabled for that variable.
|
|
If the same variable is specified
|
|
more than once, the second and later specifications are ignored.
|
|
Also, this second form, per-variable chunking, takes precedence over any
|
|
per-dimension chunking except the bare "/" case.
|
|
.IP "\fB \-v \fP \fI var1,... \fP"
|
|
The output will include data values for the specified variables, in
|
|
addition to the declarations of all dimensions, variables, and
|
|
attributes. One or more variables must be specified by name in the
|
|
comma-delimited list following this option. The list must be a single
|
|
argument to the command, hence cannot contain unescaped blanks or
|
|
other white space characters. The named variables must be valid netCDF
|
|
variables in the input-file. A variable within a group in a netCDF-4
|
|
file may be specified with an absolute path name, such as
|
|
"/GroupA/GroupA2/var". Use of a relative path name such as 'var' or
|
|
"grp/var" specifies all matching variable names in the file. The
|
|
default, without this option, is to include data values for \fI all \fP variables
|
|
in the output.
|
|
.IP "\fB \-V \fP \fI var1,... \fP"
|
|
The output will include the specified variables only but all dimensions and
|
|
global or group attributes. One or more variables must be specified by name in the
|
|
comma-delimited list following this option. The list must be a single argument
|
|
to the command, hence cannot contain unescaped blanks or other white space
|
|
characters. The named variables must be valid netCDF variables in the
|
|
input-file. A variable within a group in a netCDF-4 file may be specified with
|
|
an absolute path name, such as '/GroupA/GroupA2/var'. Use of a relative path
|
|
name such as 'var' or 'grp/var' specifies all matching variable names in the
|
|
file. The default, without this option, is to include \fI all \fP variables in the
|
|
output.
|
|
.IP "\fB \-g \fP \fI grp1,... \fP"
|
|
The output will include data values only for the specified groups.
|
|
One or more groups must be specified by name in the comma-delimited
|
|
list following this option. The list must be a single argument to the
|
|
command. The named groups must be valid netCDF groups in the
|
|
input-file. The default, without this option, is to include data values for all
|
|
groups in the output.
|
|
.IP "\fB \-G \fP \fI grp1,... \fP"
|
|
The output will include only the specified groups.
|
|
One or more groups must be specified by name in the comma-delimited
|
|
list following this option. The list must be a single argument to the
|
|
command. The named groups must be valid netCDF groups in the
|
|
input-file. The default, without this option, is to include all groups in the
|
|
output.
|
|
.IP "\fB \-m \fP \fI bufsize \fP"
|
|
An integer or floating-point number that specifies the size, in bytes,
|
|
of the copy buffer used to copy large variables. A suffix of K, M, G,
|
|
or T multiplies the copy buffer size by one thousand, million,
|
|
billion, or trillion, respectively. The default is 5 Mbytes,
|
|
but will be increased if necessary to hold at least one chunk of
|
|
netCDF-4 chunked variables in the input file. You may want to specify
|
|
a value larger than the default for copying large files over high
|
|
latency networks. Using the '\-w' option may provide better
|
|
performance, if the output fits in memory.
|
|
.IP "\fB \-h \fP \fI chunk_cache \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, an integer or
|
|
floating-point number that specifies the size in bytes of chunk cache
|
|
allocated for each chunked variable. This is not a property of the file, but merely
|
|
a performance tuning parameter for avoiding compressing or
|
|
decompressing the same data multiple times while copying and changing
|
|
chunk shapes. A suffix of K, M, G, or T multiplies the chunk cache
|
|
size by one thousand, million, billion, or trillion, respectively.
|
|
The default is 4.194304 Mbytes (or whatever was specified for the
|
|
configure-time constant CHUNK_CACHE_SIZE when the netCDF library was
|
|
built). Ideally, the \fBnccopy\fP utility should accept only one memory
|
|
buffer size and divide it optimally between a copy buffer and chunk
|
|
cache, but no general algorithm for computing the optimum chunk cache
|
|
size has been implemented yet. Using the '\-w' option may provide
|
|
better performance, if the output fits in memory.
|
|
.IP "\fB \-e \fP \fI cache_elems \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specifies
|
|
number of chunks that the chunk cache can hold. A suffix of K, M, G,
|
|
or T multiplies the number of chunks that can be held in the cache
|
|
by one thousand, million, billion, or trillion, respectively. This is not a
|
|
property of the file, but merely a performance tuning parameter for
|
|
avoiding compressing or decompressing the same data multiple times
|
|
while copying and changing chunk shapes. The default is 1009 (or
|
|
whatever was specified for the configure-time constant
|
|
CHUNK_CACHE_NELEMS when the netCDF library was built). Ideally, the
|
|
\fBnccopy\fP utility should determine an optimum value for this parameter,
|
|
but no general algorithm for computing the optimum number of chunk
|
|
cache elements has been implemented yet.
|
|
.IP "\fB \-r \fP"
|
|
Read netCDF classic or 64-bit offset input file into a diskless netCDF
|
|
file in memory before copying. Requires that input file be small
|
|
enough to fit into memory. For \fBnccopy\fP, this doesn't seem to provide
|
|
any significant speedup, so may not be a useful option.
|
|
.IP "\fB \-L \fP \fIn\fP"
|
|
Set the log level; only usable if nccopy supports netCDF-4 (enhanced).
|
|
.IP "\fB \-M \fP \fIn\fP"
|
|
Set the minimum chunk size; only usable if nccopy supports netCDF-4 (enhanced).
|
|
.IP "\fB \-F \fP \fIfilterspec\fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify a filter
|
|
to apply to an specified variable in the output. As a rule, the filter
|
|
is a compression/decompression algorithm with a unique numeric identifier
|
|
assigned by the HDF Group (see https://support.hdfgroup.org/services/filters.html).
|
|
.IP
|
|
The \fIfilterspec\fP argument has this general form.
|
|
.RS
|
|
fqn,filterid,param1,param2...paramn
|
|
.RE
|
|
The fqn (fully qualified name) is the name
|
|
of a variable prefixed by its containing
|
|
groups with the group names separated by forward slash ('/').
|
|
An example might be \FI/g1/g2/var\fP. Alternatively,
|
|
just the variable name can be given if it is in the root group:
|
|
e.g. \FIvar\fP. Backslash escapes may be used as needed.
|
|
The filterid is an unsigned positive integer representing the id
|
|
assigned by the HDFgroup to the filter. Following the id is a sequence of
|
|
parameters defining the operation of the filter. Each parameter
|
|
is a 32-bit unsigned integer.
|
|
.IP
|
|
This parameter may be repeated multiple times with different
|
|
variable names.
|
|
|
|
.SH EXAMPLES
|
|
.LP
|
|
Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a
|
|
netCDF file of the same type:
|
|
.RS
|
|
.HP
|
|
nccopy foo1.nc foo2.nc
|
|
.RE
|
|
.LP
|
|
Note that the above copy will not be as fast as use of cp or other
|
|
simple copy utility, because the file is copied using only the netCDF
|
|
API. If the input file has extra bytes after the end of the netCDF
|
|
data, those will not be copied, because they are not accessible
|
|
through the netCDF interface. If the original file was generated in
|
|
"No fill" mode so that fill values are not stored for padding for data
|
|
alignment, the output file may have different padding bytes.
|
|
.LP
|
|
Convert a netCDF-4 classic model file, compressed.nc, that uses compression,
|
|
to a netCDF-3 file classic.nc:
|
|
.RS
|
|
.HP
|
|
nccopy \-k classic compressed.nc classic.nc
|
|
.RE
|
|
.LP
|
|
Note that 'nc3' could be used instead of 'classic'.
|
|
.LP
|
|
Download the variable 'time_bnds' and its associated attributes from
|
|
an OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
|
|
.RS
|
|
.HP
|
|
nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc
|
|
.RE
|
|
.LP
|
|
Note that URLs that name specific variables as command-line arguments
|
|
should generally be quoted, to avoid the shell interpreting special
|
|
characters such as '?'.
|
|
.LP
|
|
Compress all the variables in the input file foo.nc, a netCDF file of any
|
|
type, to the output file bar.nc:
|
|
.RS
|
|
.HP
|
|
nccopy \-d1 foo.nc bar.nc
|
|
.RE
|
|
.LP
|
|
If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
|
|
netCDF-4 classic model netCDF file, because the classic and 64-bit
|
|
offset format variants don't support compression. If foo.nc was a
|
|
netCDF-4 file with some variables compressed using various deflation
|
|
levels, the output will also be a netCDF-4 file of the same type, but
|
|
all the variables, including any uncompressed variables in the input,
|
|
will now use deflation level 1.
|
|
.LP
|
|
Assume the input data includes gridded variables that use time, lat,
|
|
lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
|
|
and that the time dimension varies most slowly. Also assume that
|
|
users want quick access to data at all times for a small set of
|
|
lat-lon points. Accessing data for 1000 times would typically require
|
|
accessing 1000 disk blocks, which may be slow.
|
|
.LP
|
|
Reorganizing the data into chunks on disk that have all the time in
|
|
each chunk for a few lat and lon coordinates would greatly speed up
|
|
such access. To chunk the data in the input file slow.nc, a netCDF
|
|
file of any type, to the output file fast.nc, you could use;
|
|
.RS
|
|
.HP
|
|
nccopy \-c time/1000,lat/40,lon/40 slow.nc fast.nc
|
|
.RE
|
|
.LP
|
|
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
|
|
If you had enough memory to contain the output file, you could speed
|
|
up the rechunking operation significantly by creating the output in
|
|
memory before writing it to disk on close (using the -w flag):
|
|
.RS
|
|
.HP
|
|
nccopy \-w \-c time/1000,lat/40,lon/40 slow.nc fast.nc
|
|
.RE
|
|
Alternatively, one could write this using the alternate, variable-specific
|
|
chunking specification and assuming that times, lat, and lon
|
|
are variables.
|
|
.RS
|
|
.HP
|
|
nccopy \-c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc
|
|
.RE
|
|
.LP
|
|
.SH "Chunking Rules"
|
|
.LP
|
|
The complete set of chunking rules is captured here. As a rough
|
|
summary, these rules preserve all chunking properties from the
|
|
input file. These rules apply only when the selected output
|
|
format supports chunking, i.e. for the netcdf-4 variants.
|
|
.LP
|
|
The variable specific chunking specification should be obvious
|
|
and translates directly to the corresponding "nc_def_var_chunking"
|
|
API call.
|
|
.LP
|
|
.\" see: https://github.com/Unidata/netcdf-c/issues/725
|
|
The original per-dimension, chunking specification requires some
|
|
interpretation by nccopy.
|
|
The following rules are applied in the given order independently
|
|
for each variable to be copied from input to output. The rules are
|
|
written assuming we are trying to determine the chunking for a given
|
|
output variable Vout that comes from an input variable Vin.
|
|
.IP "1."
|
|
For each dimension of Vout explicitly specified on the command line
|
|
(using the '-c' option), apply the chunking value for that
|
|
dimension regardless of input format or input properties.
|
|
.IP "2."
|
|
For dimensions of Vout not named on the command line, preserve chunk
|
|
sizes from the corresponding input variable, if it is chunked.
|
|
.IP "3."
|
|
If Vin is contiguous, and none of its dimensions are
|
|
named on the command line, and chunking is not mandated by other
|
|
options, then make Vout be contiguous.
|
|
.IP "4."
|
|
If the input variable is contiguous (or is some netcdf-3
|
|
variant) and there are no options requiring chunking, or the '/'
|
|
special case for the '-c' option is specified, then the output
|
|
variable V is marked as contiguous.
|
|
.IP "5."
|
|
Final, default case: some or all chunk sizes are not
|
|
determined by the command line or the input
|
|
variable. This includes the non-chunked input cases such
|
|
as netcdf-3, cdf5, and DAP. In these cases retain all
|
|
chunk sizes determined by previous rules, and use the full
|
|
dimension size as the default. The exception is unlimited dimensions,
|
|
where the default is 4 megabytes.
|
|
|
|
.SH "SEE ALSO"
|
|
.LP
|
|
.BR ncdump(1), ncgen(1), netcdf(3)
|