mirror of
https://github.com/Unidata/netcdf-c.git
synced 2024-11-27 07:30:33 +08:00
304 lines
14 KiB
Groff
304 lines
14 KiB
Groff
.\" $Id: nccopy.1 400 2010-08-27 21:02:52Z russ $
|
|
.TH NCCOPY 1 "2012-03-08" "Release 4.2" "UNIDATA UTILITIES"
|
|
.SH NAME
|
|
nccopy \- Copy a netCDF file to specified variant of netCDF format,
|
|
optionally compressing or chunking data in the output copy.
|
|
.SH SYNOPSIS
|
|
.ft B
|
|
.HP
|
|
nccopy
|
|
.nh
|
|
\%[-k \fI kind \fP]
|
|
\%[-d \fI n \fP]
|
|
\%[-s]
|
|
\%[-c \fI chunkspec \fP]
|
|
\%[-u]
|
|
\%[-w]
|
|
\%[-[v|V] var1,...]
|
|
\%[-[g|G] grp1,...]
|
|
\%[-m \fI bufsize \fP]
|
|
\%[-h \fI chunk_cache \fP]
|
|
\%[-e \fI cache_elems \fP]
|
|
\%[-r]
|
|
\%\fI infile \fP
|
|
\%\fI outfile \fP
|
|
.hy
|
|
.ft
|
|
.SH DESCRIPTION
|
|
.LP
|
|
The \fBnccopy\fP utility copies an input netCDF file in any supported
|
|
format variant to an output netCDF file, optionally converting the
|
|
output to any compatible netCDF format variant, compressing the data,
|
|
or rechunking the data. For example, if built with the netCDF-3
|
|
library, a netCDF classic file may be copied to a netCDF 64-bit offset
|
|
file, permitting larger variables. If built with the netCDF-4
|
|
library, a netCDF classic file may be copied to a netCDF-4 file or to
|
|
a netCDF-4 classic model file as well, permitting data compression,
|
|
efficient schema changes, larger variable sizes, and use of other
|
|
netCDF-4 features.
|
|
.LP
|
|
\fBnccopy\fP also serves as an example of a generic netCDF-4 program,
|
|
with its ability to read any valid netCDF file and handle nested
|
|
groups, strings, and user-defined types, including arbitrarily
|
|
nested compound types, variable-length types, and data of any valid
|
|
netCDF-4 type.
|
|
.LP
|
|
If DAP support was enabled when \fBnccopy\fP was built, the file name may
|
|
specify a DAP URL. This may be used to convert data on DAP servers to
|
|
local netCDF files.
|
|
.SH OPTIONS
|
|
.IP "\fB -k \fP \fI kind \fP"
|
|
Specifies the kind of file to be created (that is, the format variant)
|
|
and, by inference, the data model (i.e. netcdf-3 (classic) versus
|
|
netcdf-4 (enhanced)). The possible arguments are as follows.
|
|
.RS
|
|
.RS
|
|
.IP "'1' or 'classic' => netcdf classic format"
|
|
.IP "'2', '64-bit-offset', or '64-bit offset' => netCDF 64-bit format"
|
|
.IP "'3', 'hdf5', 'netCDF-4', or 'enhanced' => netCDF-4 format (enhanced data model)"
|
|
.IP "'4', 'hdf5-nc3', 'netCDF-4 classic model', or 'enhanced-nc3' => netCDF-4 classic model format"
|
|
.RE
|
|
.RE
|
|
.IP
|
|
If no value for -k is specified, then the output will use the same
|
|
format as the input, except if the input is classic or 64-bit offset
|
|
and either chunking or compression is specified, in which case the
|
|
output will be netCDF-4 classic model format. Note that attempting
|
|
some kinds of format conversion will result in an error, if the
|
|
conversion is not possible. For example, an attempt to copy a
|
|
netCDF-4 file that uses features of the enhanced model, such as
|
|
groups or variable-length strings, to any of the other kinds of netCDF
|
|
formats that use the classic model will result in an error.
|
|
.IP "\fB -d \fP \fI n \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
deflation level (level of compression) for variable data output. 0
|
|
corresponds to no compression and 9 to maximum compression, with
|
|
higher levels of compression requiring marginally more time to
|
|
compress or uncompress than lower levels. Compression achieved may
|
|
also depend on output chunking parameters. If this option is
|
|
specified for a classic format or 64-bit offset format input file, it
|
|
is not necessary to also specify that the output should be netCDF-4
|
|
classic model, as that will be the default. If this option is not
|
|
specified and the input file has compressed variables, the compression
|
|
will still be preserved in the output, using the same chunking as in
|
|
the input by default.
|
|
.IP
|
|
Note that specifying output deflation level with \fBnccopy\fP results in
|
|
all output variables compressed using
|
|
the same compression level, but the API has no such restriction. With
|
|
a program you can customize compression for each variable independently.
|
|
.IP "\fB -s \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
shuffling of variable data bytes before compression or after
|
|
decompression. This option is ignored unless a non-zero deflation
|
|
level is specified. Turning shuffling on sometimes improves
|
|
compression.
|
|
.IP "\fB -u \fP"
|
|
Convert any unlimited size dimensions in the input to fixed size
|
|
dimensions in the output. This can speed up variable-at-a-time
|
|
access, but slow down record-at-a-time access to multiple variables
|
|
along an unlimited dimension.
|
|
.IP "\fB -w \fP"
|
|
Keep output in memory (as a diskless netCDF file) until output is
|
|
closed, at which time output file is written to disk. This can
|
|
greatly speedup operations such as converting unlimited dimension to
|
|
fixed size (-u option), chunking, rechunking, or compressing the
|
|
input. It requires that available memory is large enough to hold the
|
|
output file. This option may provide a larger speedup than careful
|
|
tuning of the -m, -h, or -e options, and it's certainly a lot simpler.
|
|
.IP "\fB -c \fP \fIchunkspec\fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specify
|
|
chunking (multidimensional tiling) for variable data in the output.
|
|
This is useful to specify the units of disk access, compression, or
|
|
other filters such as checksums. Changing the chunking in a netCDF
|
|
file can also greatly speedup access, by choosing chunk shapes that
|
|
are appropriate for the most common access patterns.
|
|
.IP
|
|
The \fIchunkspec\fP argument is a string of comma-separated
|
|
associations, each specifying a dimension name, a '/' character, and
|
|
optionally the corresponding chunk length for that dimension. No
|
|
blanks should appear in the chunkspec string, except possibly escaped
|
|
blanks that are part of a dimension name. A chunkspec must name at
|
|
least one dimension, and may omit dimensions which are not to be
|
|
chunked or for which the default chunk length is desired. If a
|
|
dimension name is followed by a '/' character but no subsequent chunk
|
|
length, the actual dimension length is assumed. If copying a classic
|
|
model file to a netCDF-4 output file and not naming all dimensions in
|
|
the chunkspec, unnamed dimensions will also use the actual dimension
|
|
length for the chunk length. An example of a chunkspec for variables
|
|
that use 'm' and 'n' dimensions might be 'm/100,n/200' to specify
|
|
100 by 200 chunks. To see the chunking resulting from copying with a
|
|
chunkspec, use the '-s' option of ncdump on the output file.
|
|
.IP
|
|
As an I/O optimization, \fBnccopy\fP has a threshold for the minimum size of
|
|
non-record variables that get chunked, currently 8192 bytes. In the future,
|
|
use of this threshold and its size may be settable in an option.
|
|
.IP
|
|
Note that \fBnccopy\fP requires variables that share a dimension to
|
|
also share the chunk size associated with that dimension, but the
|
|
programming interface has no such restriction. If you need to
|
|
customize chunking for variables independently, you will need to use
|
|
the library API in a custom utility program.
|
|
.IP "\fB -v \fP \fI var1,... \fP"
|
|
The output will include data values for the specified variables, in
|
|
addition to the declarations of all dimensions, variables, and
|
|
attributes. One or more variables must be specified by name in the
|
|
comma-delimited list following this option. The list must be a single
|
|
argument to the command, hence cannot contain unescaped blanks or
|
|
other white space characters. The named variables must be valid netCDF
|
|
variables in the input-file. A variable within a group in a netCDF-4
|
|
file may be specified with an absolute path name, such as
|
|
'/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or
|
|
'grp/var' specifies all matching variable names in the file. The
|
|
default, without this option, is to include data values for \fI all \fP variables
|
|
in the output.
|
|
.IP "\fB -V \fP \fI var1,... \fP"
|
|
The output will include the specified variables only but all dimensions and
|
|
global or group attributes. One or more variables must be specified by name in the
|
|
comma-delimited list following this option. The list must be a single argument
|
|
to the command, hence cannot contain unescaped blanks or other white space
|
|
characters. The named variables must be valid netCDF variables in the
|
|
input-file. A variable within a group in a netCDF-4 file may be specified with
|
|
an absolute path name, such as '/GroupA/GroupA2/var'. Use of a relative path
|
|
name such as 'var' or 'grp/var' specifies all matching variable names in the
|
|
file. The default, without this option, is to include \fI all \fP variables in the
|
|
output.
|
|
.IP "\fB -g \fP \fI grp1,... \fP"
|
|
The output will include data values only for the specified groups.
|
|
One or more groups must be specified by name in the comma-delimited
|
|
list following this option. The list must be a single argument to the
|
|
command. The named groups must be valid netCDF groups in the
|
|
input-file. The default, without this option, is to include data values for all
|
|
groups in the output.
|
|
.IP "\fB -G \fP \fI grp1,... \fP"
|
|
The output will include only the specified groups.
|
|
One or more groups must be specified by name in the comma-delimited
|
|
list following this option. The list must be a single argument to the
|
|
command. The named groups must be valid netCDF groups in the
|
|
input-file. The default, without this option, is to include all groups in the
|
|
output.
|
|
.IP "\fB -m \fP \fI bufsize \fP"
|
|
An integer or floating-point number that specifies the size, in bytes,
|
|
of the copy buffer used to copy large variables. A suffix of K, M, G,
|
|
or T multiplies the copy buffer size by one thousand, million,
|
|
billion, or trillion, respectively. The default is 5 Mbytes,
|
|
but will be increased if necessary to hold at least one chunk of
|
|
netCDF-4 chunked variables in the input file. You may want to specify
|
|
a value larger than the default for copying large files over high
|
|
latency networks. Using the '-w' option may provide better
|
|
performance, if the output fits in memory.
|
|
.IP "\fB -h \fP \fI chunk_cache \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, an integer or
|
|
floating-point number that specifies the size in bytes of chunk cache
|
|
for each chunked variable. This is not a property of the file, but merely
|
|
a performance tuning parameter for avoiding compressing or
|
|
decompressing the same data multiple times while copying and changing
|
|
chunk shapes. A suffix of K, M, G, or T multiplies the chunk cache
|
|
size by one thousand, million, billion, or trillion, respectively.
|
|
The default is 4.194304 Mbytes (or whatever was specified for the
|
|
configure-time constant CHUNK_CACHE_SIZE when the netCDF library was
|
|
built). Ideally, the \fBnccopy\fP utility should accept only one
|
|
memory buffer size and divide it optimally between a copy buffer and
|
|
chunk cache, but no general algorithm for computing the optimum chunk
|
|
cache size has been implemented yet. Using the '-w' option may
|
|
provide better performance, if the output fits in memory.
|
|
.IP "\fB -e \fP \fI cache_elems \fP"
|
|
For netCDF-4 output, including netCDF-4 classic model, specifies
|
|
number of chunkss that the chunk cache can hold. A suffix of K, M, G,
|
|
or T multiplies the number of chunks that can be held in the cache
|
|
by one thousand, million, billion, or trillion, respectively. This is not a
|
|
property of the file, but merely a performance tuning parameter for
|
|
avoiding compressing or decompressing the same data multiple times
|
|
while copying and changing chunk shapes. The default is 1009 (or
|
|
whatever was specified for the configure-time constant
|
|
CHUNK_CACHE_NELEMS when the netCDF library was built). Ideally, the
|
|
\fBnccopy\fP utility should determine an optimum value for this
|
|
parameter, but no general algorithm for computing the optimum number
|
|
of chunk cache elements has been implemented yet.
|
|
.IP "\fB -r \fP"
|
|
Read netCDF classic or 64-bit offset input file into a diskless netCDF
|
|
file in memory before copying. Requires that input file be small
|
|
enough to fit into memory. For nccopy, this doesn't seem to provide
|
|
any significant speedup, so may not be a useful option.
|
|
.SH EXAMPLES
|
|
.LP
|
|
Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a
|
|
netCDF file of the same type:
|
|
.RS
|
|
.HP
|
|
nccopy foo1.nc foo2.nc
|
|
.RE
|
|
.LP
|
|
Note that the above copy will not be as fast as use of cp or other
|
|
simple copy utility, because the file is copied using
|
|
only the netCDF
|
|
API. If the input file has extra bytes
|
|
after the end of the
|
|
netCDF data, those will not be copied, because they are not accessible
|
|
through the netCDF interface. If the original file was generated in
|
|
'No fill' mode so that fill values are not stored for padding for data
|
|
alignment, the output file may have different padding bytes.
|
|
.LP
|
|
Convert a netCDF-4 classic model file, compressed.nc, that uses compression,
|
|
to a netCDF-3 file classic.nc:
|
|
.RS
|
|
.HP
|
|
nccopy -k classic compressed.nc classic.nc
|
|
.RE
|
|
.LP
|
|
Note that '1' could be used instead of 'classic'.
|
|
.LP
|
|
Download the variable 'time_bnds' and its associated attributes from
|
|
an OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
|
|
.RS
|
|
.HP
|
|
nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc
|
|
.RE
|
|
.LP
|
|
Note that URLs that name specific variables as command-line arguments
|
|
should generally be quoted, to avoid the shell interpreting special
|
|
characters such as '?'.
|
|
.LP
|
|
Compress all the variables in the input file foo.nc, a netCDF file of any
|
|
type, to the output file bar.nc:
|
|
.RS
|
|
.HP
|
|
nccopy -d1 foo.nc bar.nc
|
|
.RE
|
|
.LP
|
|
If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
|
|
netCDF-4 classic model netCDF file, because the classic and 64-bit
|
|
offset format variants don't support compression. If foo.nc was a
|
|
netCDF-4 file with some variables compressed using various deflation
|
|
levels, the output will also be a netCDF-4 file of the same type, but
|
|
all the variables, including any uncompressed variables in the input,
|
|
will now use deflation level 1.
|
|
.LP
|
|
Assume the input data includes gridded variables that use time, lat,
|
|
lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
|
|
and that the time dimension varies most slowly. Also assume that
|
|
users want quick access to data at all times for a small set of
|
|
lat-lon points. Accessing data for 1000 times would typically require
|
|
accessing 1000 disk blocks, which may be slow.
|
|
.LP
|
|
Reorganizing the data into chunks on disk that have all the time in
|
|
each chunk for a few lat and lon coordinates would greatly speed up
|
|
such access. To chunk the data in the input file slow.nc, a netCDF
|
|
file of any type, to the output file fast.nc, you could use;
|
|
.RS
|
|
.HP
|
|
nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc
|
|
.RE
|
|
.LP
|
|
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
|
|
If you had enough memory to contain the output file, you could speed
|
|
up the rechunking operation significantly by creating the output in
|
|
memory before writing it to disk on close:
|
|
.RS
|
|
.HP
|
|
nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc
|
|
.RE
|
|
.SH "SEE ALSO"
|
|
.LP
|
|
.BR ncdump(1), ncgen(1), netcdf(3)
|