\input texinfo @c -*-texinfo-*-
@comment $Id: netcdf-internal.texi,v 1.5 2009/09/29 18:49:08 dmh Exp $
@c %**start of header
@setfilename netcdf-internal.info
@settitle Documentation for NetCDF Library Internals
@setcontentsaftertitlepage
@c %**end of header
@setchapternewpage off
@html
<link rel=stylesheet type=text/css href=http://www.unidata.ucar.edu/themes/Unidata/style/style.css>
@end html
@dircategory netCDF scientific data format
@direntry
* netcdf-internal: (netcdf-internal). Documentation for NetCDF Library Internals
@end direntry
@titlepage
@title Documentation for NetCDF Library Internals
@author Ed Hartnett
@author Unidata Program Center
@page
@vskip 0pt plus 1filll
This is internal documentation of the netCDF library.
Copyright @copyright{} 2004, Unidata Program Center
@end titlepage
@ifnottex
@node Top, C Code, (dir), (dir)
@top NetCDF Library Internals
@end ifnottex
The most recent update of this documentation was released with version
3.6.0 of netCDF.
The netCDF data model is described in The NetCDF Users' Guide
(@pxref{Top, The NetCDF Users' Guide,, netcdf, The NetCDF
Users' Guide}).
Reference guides are available for C (@pxref{Top, The NetCDF Users'
Guide for C,, netcdf-c, The NetCDF Users' Guide for C}), C++
(@pxref{Top, The NetCDF Users' Guide for C++,, netcdf-cxx, The NetCDF
Users' Guide for C++}), FORTRAN 77 (@pxref{Top, The NetCDF Users' Guide
for FORTRAN 77,, netcdf-f77, The NetCDF Users' Guide for FORTRAN 77}),
and FORTRAN 90 (@pxref{Top, The NetCDF Users' Guide for FORTRAN 90,,
netcdf-f90, The NetCDF Users' Guide for FORTRAN 90}).
@menu
* C Code::
* Derivative Works::
* Concept Index::
@detailmenu
--- The Detailed Node Listing ---
C Code
* libsrc directory::
* nc_test directory::
* nctest directory::
* cxx directory::
* man directory::
* ncgen and ncgen4 directories::
* ncdump directory::
* fortran directory::
@end detailmenu
@end menu
@node C Code, Derivative Works, Top, Top
@chapter C Code
The netCDF library is implemented in C in a bunch of directories under
netcdf-3.
@menu
* libsrc directory::
* nc_test directory::
* nctest directory::
* cxx directory::
* man directory::
* ncgen and ncgen4 directories::
* ncdump directory::
* fortran directory::
@end menu
@node libsrc directory, nc_test directory, C Code, C Code
@section libsrc directory
The libsrc directory holds the core library C code.
@subsection m4 Files
The m4 macro processor is used as a pre-pre-processor for attr.m4,
putget.m4, ncx.m4, and t_ncxx.m4. The m4 macros are used to deal with the
six netCDF data types.
@table @code
@item attr.m4
Attribute functions.
@item putget.m4
Contains putNCvx_@var{type}_@var{type}, putNCv_@var{type}_@var{type},
getNCvx_@var{type}_@var{type}, getNCv_@var{type}_@var{type}, plus a
bunch of other important internal functions dealing with reading and writing
data. The external functions nc_put_var@var{X}_@var{type} and
nc_get_var@var{X}_@var{type} are also implemented here.
@item ncx.m4
Contains netCDF implementation of XDR. Bit-fiddling on VAXes and other
fun stuff.
@item t_ncxx.m4
Test program for netCDF XDR library.
@end table
@subsection C Header Files
@table @code
@item netcdf.h
The formal definition of the netCDF API.
@item nc.h
Private data structures, objects and interfaces.
@item ncio.h
I/O abstraction interface, including struct ncio.
@item ncx.h
External data representation interface.
@item fbits.h
Preprocessor macros for dealing with flags: fSet, fClr, fIsSet, fMask,
pIf, pIff.
@item ncconfig.h
Generated automatically by configure.
@item onstack.h
This file provides definitions which allow us to "allocate" arrays on
the stack where possible. (Where not possible, malloc and free are
used.)
@item rnd.h
Some rounding macros.
@end table
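
The flag macros listed for fbits.h are small bit-manipulation helpers.
A minimal sketch of how such macros are commonly written is shown
here. The definitions, the FLAG_ values, and demo_flags are assumptions
made for illustration; the real fbits.h may differ in detail.

@verbatim
```c
#include <assert.h>

/* Illustrative definitions only; the real fbits.h may differ. */
#define fSet(t, f)    ((t) |= (f))        /* turn on the bits in f */
#define fClr(t, f)    ((t) &= ~(f))       /* turn off the bits in f */
#define fIsSet(t, f)  (((t) & (f)) != 0)  /* true if any bit of f is on */
#define fMask(t, f)   ((t) & ~(f))        /* value of t with f bits cleared */
#define pIf(a, b)     (!(a) || (b))       /* logical implication: a implies b */
#define pIff(a, b)    (!(a) == !(b))      /* a if and only if b */

/* Hypothetical flag values, used only in this sketch. */
enum { FLAG_WRITE = 0x1, FLAG_SHARE = 0x4 };

/* Sets WRITE and SHARE, clears SHARE again, and returns the result. */
static int demo_flags(void)
{
    int flags = 0;
    fSet(flags, FLAG_WRITE);
    fSet(flags, FLAG_SHARE);
    assert(fIsSet(flags, FLAG_SHARE));
    fClr(flags, FLAG_SHARE);
    return flags;
}
```
@end verbatim

In this sketch pIf and pIff are meant for assertions about flag
combinations, for example stating that one flag being set implies
another.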
@subsection C Code Files
@table @code
@item nc.c
Holds nc_open, nc_create, nc_enddef, nc_close, nc_delete, nc_abort,
nc_redef, nc_inq, nc_sync, and nc_set_fill. It also holds a lot of
internal functions.
@item attr.c
Generated from attr.m4, contains attribute functions.
@item dim.c
Dimension functions.
@item var.c
Variable functions.
@item v1hpg.c
This module defines the external representation of the "header" of a
netcdf version one file. For each of the components of the NC
structure, there are (static) ncx_len_XXX(), ncx_put_XXX() and
v1h_get_XXX() functions. These define the external representation of
the components. The exported entry points for the whole NC structure
are built up from these.
Although the name of this file implies that it should only apply to
data version 1, it was modified by the 64-bit offset people, so that
it actually handles version 2 data as well.
@item v2i.c
Version 2 API implemented in terms of version 3 API.
@item error.c
Contains nc_strerror.
@item ncio.c
Just decides whether to use ffio.c (for Crays) or posixio.c (for
everyone else). (See below).
@item posixio.c
@itemx ffio.c
Some file I/O, and a Cray-specific implementation. These two files
have some functions with the same name and signatures, for example
ncio_open. If building on a Cray, ffio.c is used. If building on a
POSIX system, posixio.c is used.
One of the really complex functions in posixio.c is px_get, which
reads data from a netCDF file in perhaps a super-efficient manner?
There are functions for the rel, get, move, sync, free operations, one
set of functions (with _spx_) if NC_SHARE is in use, and another set
(with _px_) when NC_SHARE is not in use.
See also the section I/O Layering below.
@item putget.c
Generated from putget.m4.
@item string.c
NC_string structures are manipulated here. Also contains NC_check_name.
@item imap.c
Check map functionality?
@item libvers.c
Implements nc_inq_libvers.
@item ncx.c
@itemx ncx_cray.c
Created from ncx.m4, this file contains implementation of netCDF XDR,
and a Cray-specific implementation.
@item t_nc.c
@itemx t_ncio.c
@itemx t_ncx.c
These files hold extra tests, activated with make full_test. They didn't
compile on my Cygwin system, but worked fine on Linux. See the extra
tests section below.
@end table
@subsection Makefile
Let us not neglect the Makefile, hand-crafted by Glenn and Steve to
stand the test of many different installation platforms.
@subsection I/O Layering
Here's some discussion from Glenn (July, 1997) in the support archive:
@example
Given any platform specific I/O system which is capable of
random access, it is straightforward to write an ncio
implementation which uses that I/O system. A competent C
programmer can do it in less than a day. The interface is
defined in ncio.h. There are two implementations (buffered and
unbuffered) in posixio.c, another in ffio.c, and another contributed
mmapio.c (in pub/netcdf/contrib at our ftp site)
which can be used as examples. (The buffered version in posixio.c has
gotten unreadable at this point, I'm sorry to say.)
A brief outline of the ncio interface follows.
There are 2 public 'constructors':
ncio_create()
and
ncio_open().
The first creates a new file and the second opens an existing one.
There is a public 'destructor',
ncio_close()
which closes the descriptor and calls internal function ncio.free()
to free any allocated resources.
The 'constructors' return a data structure which includes
4 other 'member functions'
ncio.get() - converts a file region specified by an offset
and extent to a memory pointer. The region may be
locked until the corresponding call to rel().
ncio.rel() - releases the region specified by offset.
ncio.move() - Copy one region to another without making anything
available to higher layers. May be just implemented in
terms of get() and rel(), or may be tricky to be efficient.
Only used by nc_enddef() after redefinition.
and
ncio.sync() - Flush any buffers to disk. May be a no-op
if I/O is unbuffered.
The interactions between layers and error semantics are more clearly defined
for ncio than they were for the older xdr stream based system. The functions
all return the system error indication (errno.h) or 0 for no error.
The sizehint parameter to ncio_open() and ncio_create() is a contract between
the upper layers and ncio. It is a negotiated value. A suggested value
is passed in by reference, and may be modified upon return. The upper layers
use the returned value as a maximum extent for calls to ncio.get().
In the netcdf distribution, there is test program 't_ncio.c',
which can be used to unit test an ncio implementation. The program
is script driven, so a variety of access patterns can be tested
by feeding it different scripts.
@end example
And some more:
@example
For netcdf I/O on the CRAY, we can identify a couple of levels of buffering.
There is a "contract" between the higher layers of
netcdf and the "ncio" layer which says "we won't request more than
'chunksize' bytes per request." This controls the maximum size of
the buffer used internal to the ncio implementation for system
read and write.
On the Cray, the ncio implementation is usually 'ffio.c', which uses the
cray specific ffio library. This is in contrast to the implementation
used on most other systems, 'posixio.c', which uses POSIX read(), write(),
lseek() calls. The ffio library has all sorts of controls (which I don't
pretend to understand), including the bufa directive you cite above.
If you are using some sort of assign statement external to netcdf, it
is probably being overridden by the following code sequence from
ffio.c:
ControlString = getenv("NETCDF_FFIOSPEC");
if(ControlString == NULL)
@{
ControlString="bufa:336:2";
@}
fd = ffopens(path, oflags, 0, 0, &stat, ControlString);
EG, if the NETCDF_FFIOSPEC environment variable is not set, use the
default shown. Otherwise, use the environment variable.
Currently for ffio, the "chunksize" contract is set to the st_oblksize
member of struct ffc_stat_s obtained from a fffcntl(). You can check
this by putting a breakpoint in ffio.c:blksize().
As you may be able to tell from ffio.c, we have stubbed out interlayer
communication to allow this to be set as the *sizehintp argument to
ncio_open() and ncio_create(). In netcdf-3.4, we intend to fully implement this
and expose this parameter to the user.
@end example
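
The ncio outline quoted earlier (two constructors, a destructor, and
the get, rel, move and sync member functions) can be sketched as a
struct of function pointers. The following is a toy in-memory version
written for illustration only. Every name beginning with toy_ is
hypothetical, move() is omitted for brevity, and the real interface in
ncio.h has different signatures and more state.

@verbatim
```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for the ncio idea: a "file" held entirely in memory. */
typedef struct toy_ncio toy_ncio;
struct toy_ncio {
    /* Map the region [offset, offset + extent) to a memory pointer. */
    int (*get)(toy_ncio *nciop, size_t offset, size_t extent, void **vpp);
    /* Release a region previously obtained with get(). */
    int (*rel)(toy_ncio *nciop, size_t offset);
    /* Flush buffers; a no-op here because nothing is buffered. */
    int (*sync)(toy_ncio *nciop);
    unsigned char *data;   /* backing store */
    size_t size;           /* bytes in the backing store */
    int held;              /* number of regions currently held */
};

static int toy_get(toy_ncio *nciop, size_t offset, size_t extent, void **vpp)
{
    if (offset > nciop->size || extent > nciop->size - offset)
        return EINVAL;     /* region falls outside the "file" */
    *vpp = nciop->data + offset;
    nciop->held++;
    return 0;
}

static int toy_rel(toy_ncio *nciop, size_t offset)
{
    (void)offset;
    if (nciop->held == 0)
        return EINVAL;     /* rel() without a matching get() */
    nciop->held--;
    return 0;
}

static int toy_sync(toy_ncio *nciop) { (void)nciop; return 0; }

/* 'Constructor': returns 0 (or an errno value) and sets *nciopp. */
static int toy_create(size_t size, toy_ncio **nciopp)
{
    toy_ncio *nciop = calloc(1, sizeof *nciop);
    if (nciop == NULL) return ENOMEM;
    nciop->data = calloc(size ? size : 1, 1);
    if (nciop->data == NULL) { free(nciop); return ENOMEM; }
    nciop->size = size;
    nciop->get = toy_get;
    nciop->rel = toy_rel;
    nciop->sync = toy_sync;
    *nciopp = nciop;
    return 0;
}

/* 'Destructor': frees everything the constructor allocated. */
static int toy_close(toy_ncio *nciop)
{
    assert(nciop != NULL);
    free(nciop->data);
    free(nciop);
    return 0;
}
```
@end verbatim

As in the quoted description, every function returns 0 on success or
an errno.h value on failure, and each successful get() is expected to
be paired with a rel().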
In 2004, while on an unrelated research expedition to find the
Brazilian brown-toed tree frog, Russ Rew found a sealed tomb. Within - a
bronze statue of the terrible Monkey God. While scratching the netCDF logo
in the Monkey God's forehead, Russ discovered a secret compartment,
containing an old, decaying scroll.
Naturally he brought it back to Unidata's History and Antiquities
Division, on the 103rd floor of UCAR Tower #2. Within the scroll, written in
blood, Unidata archeologists found the following:
@example
Internal to the netcdf implementation, there is a parameterized
contract between the upper layers and the i/o layer regarding the
maximum amount of data to be transferred between the layers in a
single call. Call this the "chunksize." If the file is opened for
synchronous operations (NC_SHARE), this is the amount of data
transferred from the file in a single i/o call, and, typically, the
size of the allocated buffer in the i/o layer. In the more usual
buffered case (NC_NOSHARE), accesses to the underlying file system are
aligned on chunksize boundaries and the size of the allocated buffer
is twice this number.
So, the chunksize controls a space versus time tradeoff, memory
allocated in the netcdf library versus number of system calls. It also
controls the relationship between outer and inner loop counters for
large data accesses. A large chunksize would have a small (1?) outer
loop counter and a larger inner loop counter, reducing function call
overhead. (Generally this effect is much less significant than the
memory versus system call effect.)
May the curse of the Monkey God devour any who desecrate the chunksize
parameter.
@end example
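
The arithmetic behind the chunksize tradeoff can be sketched in a few
lines of C. This is not library code: the function names are
hypothetical, and the only facts taken from the scroll are that
buffered accesses are aligned on chunksize boundaries, that the
buffered allocation is twice the chunksize, and that a larger
chunksize means fewer outer-loop passes.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Round a file offset down to the nearest chunksize boundary. */
static size_t chunk_align_down(size_t offset, size_t chunksize)
{
    assert(chunksize > 0);
    return offset - (offset % chunksize);
}

/* Outer-loop passes needed to move nbytes when each pass may
 * request at most chunksize bytes from the i/o layer. */
static size_t chunk_passes(size_t nbytes, size_t chunksize)
{
    assert(chunksize > 0);
    return nbytes / chunksize + (nbytes % chunksize != 0);
}

/* Buffer size in the buffered (NC_NOSHARE) case: twice the chunksize. */
static size_t buffered_alloc(size_t chunksize)
{
    return 2 * chunksize;
}
```
@end verbatim

With a 4096-byte chunksize, a 1,000,000-byte transfer takes 245 passes
and an 8192-byte buffer. With a 1 MiB chunksize the same transfer takes
a single pass, at the cost of a 2 MiB buffer.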
@subsection Extra Tests
According to Russ, make full_test runs three tests:
@table @code
@item test
the library blunder test which is a quick test of the library
implementation
@item nctest
the test of the netCDF-2 interface, which is still used in a lot of
third-party netCDF software.
@item test_ncx
a test of the XDR-replacement library. netCDF-2 used the
industry-standard XDR library, but netCDF-3 uses our own replacement
for it that Glenn wrote as ncx.c and ncxx.c.
@end table
The latter test seems to work if you run something like
@example
c89 t_ncxx.c ncx.o -o t_ncx
@end example
first to create the "t_ncx" executable, then run "make test_ncx" and
it will run the two tests "t_ncx" and "t_ncxx":
@example
$ make test_ncx
c89 -o t_ncxx -g t_ncxx.o ncx.o
./t_ncx
./t_ncxx
@end example
which produce no output if they succeed.
@node nc_test directory, nctest directory, libsrc directory, C Code
@section nc_test directory
This runs the version 3 test suite.
The main program, nc_test, can be run with a command line option to
create a fairly rich test file. It's then called again without the
option to read the file and also engage in a bunch of test writes to
scratch.nc.
@node nctest directory, cxx directory, nc_test directory, C Code
@section nctest directory
This runs the version 2 test suite.
@node cxx directory, man directory, nctest directory, C Code
@section cxx directory
This directory contains the C++ interface to netCDF.
@node man directory, ncgen and ncgen4 directories, cxx directory, C Code
@section man directory
This directory holds the .m4 file that is used to generate both the C
and Fortran man pages. I wish I had known about this directory before
I introduced the doc directory! I have moved all the docs
into the man directory, and tried to delete the doc directory. But cvs
won't let me. It intends that I always remember my sins. Thanks cvs,
you're like a conscience.
@node ncgen and ncgen4 directories, ncdump directory, man directory, C Code
@section ncgen and ncgen4 directories
The ncgen directory is the home of ncgen, of course.
This program uses lex and yacc to parse the CDL input file.
Note that the ncgen program used to be called ncgen4, so this
version of ncgen can in fact handle the full netCDF-4 enhanced
data model as CDL input. The old ncgen is still available,
but is now called ncgen3.
lex and yacc files:
@table @code
@item ncgen.l
Input for flex, the output of which is renamed ncgenyy.c and #included
by ncgentab.c.
@item ncgen.y
Input for yacc, the output of which is renamed ncgentab.c.
@end table
Header files:
@table @code
@item ncgen.h
Defines a bunch of extern variables, like ncid, ndims, nvars,
... *dims, *vars, *atts.
@item ncgentab.h
Long list of defines generated by yacc?
@item genlib.h
Prototypes for all the ncgen functions.
@item generic.h
Defines union generic, which can hold any type of value (used for
handling fill values).
@end table
Code files:
@table @code
@item main.c
Main entry point. Handles command line options and then calls yyparse.
@item ncgentab.c
This is generated from ncgen.y by yacc.
@item ncgenyy.c
This is included in ncgentab.c, and not, therefore, on the compile
list in the makefile, since it's compiled as part of ncgentab.c.
@item genlib.c
Bunch of functions for ncgen, including ones to write appropriate
fortran or C code for a netcdf file. Neat! Also the gen_netcdf
function, which actually writes out the netcdf file being generated
by ncgen. Also has emalloc function.
@item getfill.c
A few functions dealing with fill values and their defaults.
@item init.c
@item load.c
@item escapes.c
Contains one function, expand_escapes, which expands escape characters,
like \t, in input.
@end table
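
To make the escape expansion concrete, here is a minimal sketch of
that kind of function. It is not the ncgen implementation: the name
expand_escapes_sketch, its signature, and the small set of escapes it
handles are assumptions, and the real function may well support more
escapes.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst, replacing a few backslash escapes with the
 * characters they name. dst needs room for strlen(src) + 1 bytes.
 * Returns the length of the expanded string. */
static size_t expand_escapes_sketch(char *dst, const char *src)
{
    char *out = dst;
    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        if (*src == '\\' && src[1] != '\0') {
            src++;                          /* skip the backslash */
            switch (*src) {
            case 'n':  *out++ = '\n'; break;
            case 't':  *out++ = '\t'; break;
            case '\\': *out++ = '\\'; break;
            case '"':  *out++ = '"';  break;
            default:   *out++ = *src; break; /* unknown: keep the character */
            }
            src++;
        } else {
            *out++ = *src++;                /* ordinary character */
        }
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```
@end verbatim

The expanded string is never longer than the input, so a destination
buffer the same size as the source is always large enough.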
@node ncdump directory, fortran directory, ncgen and ncgen4 directories, C Code
@section ncdump directory
This directory holds ncdump, of course! No m4 or any of that stuff
here - just plain old C.
@node fortran directory, , ncdump directory, C Code
@section fortran directory
Amazingly, the fortran interface is actually C code! Steve gets some
package involving cfortran.h, which defines a C function of the exact
signature which will be produced by a fortran program calling a C
function. So _nf_open will map to nc_open.
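
A hand-written sketch may help show what such a wrapper has to do.
This is not what cfortran.h generates. It assumes one common calling
convention (lowercase name with a trailing underscore, arguments
passed by reference, string lengths passed as hidden trailing by-value
arguments), which varies between Fortran compilers. The names
nf_open_sketch_ and fake_nc_open are hypothetical, and fake_nc_open
only stands in for the real C entry point.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the C entry point; records the path it was given. */
static char last_path[256];
static int fake_nc_open(const char *path, int mode, int *ncidp)
{
    (void)mode;
    strncpy(last_path, path, sizeof last_path - 1);
    last_path[sizeof last_path - 1] = '\0';
    *ncidp = 7;                 /* arbitrary id for the sketch */
    return 0;
}

/* Fortran-callable wrapper under the assumed calling convention. */
int nf_open_sketch_(const char *path, const int *mode, int *ncid,
                    int pathlen)
{
    char *cpath;
    int status;

    assert(pathlen >= 0);
    /* Fortran strings are blank padded, not NUL terminated. */
    while (pathlen > 0 && path[pathlen - 1] == ' ')
        pathlen--;
    cpath = malloc((size_t)pathlen + 1);
    if (cpath == NULL)
        return -1;              /* sketch-only error value */
    memcpy(cpath, path, (size_t)pathlen);
    cpath[pathlen] = '\0';

    status = fake_nc_open(cpath, *mode, ncid);
    free(cpath);
    return status;
}
```
@end verbatim

The point of cfortran.h is that this boilerplate does not have to be
written by hand for each compiler.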
@node Derivative Works, Concept Index, C Code, Top
@chapter Derivative Works
At Unidata, the creative energies are simply enormous. NetCDF has
spawned a host of derivative works, some samples of which are listed
below.
@section From ``A Tale of Two Data Formats,'' the bestselling novel
@example
It was the best of times, it was the worst of times,
it was the age of webpages, it was the age of ftp,
it was the epoch of free software, it was the epoch of Microsoft,
it was the season of C, it was the season of Java,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to HDF5, we were all going direct
the other way--
@end example
@section ``The Marriage of NetCDF,'' an Opera in Three Acts
The massive response to this opera has been called the ``Marriage
Phenomenon'' by New York Times Sunday Arts and Leisure Section Editor
Albert Winklepops. The multi-billion dollar marketing empire that
sprang from the opera after its first season at the Cambridge Theatre
in London's West End is headquartered in little-known Boulder,
Colorado (most famous resident: Mork from Ork).
@example
Dr. Rew:
[enters stage left and moves downstage center for aria]
I've developed a format,
a great data format,
and it must take over the world!
But it might be surpassed,
by that pain in the ass,
that data format
from Illinois!
Mike Folk:
[enters stage right and moves downstage center]
I've a nice data format,
a real data format,
with plenty of features,
yoh-hoooo
But scientists don't use it,
they say it's confusing,
A simpler API
would dooooo...
@end example
@contents
@node Transcript from Jerry Springer Show
@section Transcript from Jerry Springer Show, aired 12/12/03,
@subsection show title: ``I've Been Dumped for a Newer Data Format''
@example
[Announcer]: ...live today from Boulder, Colorado, the Jerry Springer Show!
[Audience]: (chanting) Jer-ry! Jer-ry!
[Jerry]: Hello and welcome to the Jerry Springer show - today we have
some very special guests, starting with netCDF, a data format from
right here in Boulder.
[Audience]: (applause)
[netCDF]: Hi Jerry!
[Jerry]: Hello netCDF, and welcome to the show. You're here to tell us
something, right?
[netCDF]: (sobbing) I think I'm going to be abandoned for a newer,
younger data format.
[Audience]: (gasps)
[Jerry]: Well, netCDF, we've brought in WRF, a professional model,
also from right here in Boulder.
[Audience]: (applause)
[Jerry]: Now WRF, you've been involved in an intimate relationship
with netCDF for how long now?
[WRF]: Oh yea, like, a looong time. But, you know, I got "intimate
relationships" with lots of data formats, if you know what I mean!
(winking)
[Audience]: (laughter)
[Jerry]: Hmmm, well I suppose that as a professional model, you must
be quite attractive to data formats. I'm sure lots of data formats are
interested in you. Is that correct?
[WRF]: Oh yea Jerry. Like you sez, I'm, you know, like, "attractive."
Oh yea! (flexes beefy arm muscles)
[Audience]: (booing from men, wooo-hooing from women)
[Jerry]: (to audience) Settle down now. Let's hear what WRF has to say
to netCDF. (to WRF) WRF, how do you feel about the fact that netCDF is
worried about younger data formats? Do you think she's right to be
worried?
[WRF]: Hey baby, what can I say? I'm a love machine! (gyrates flabby
hips)
[Audience]: (more booing, shouts of "sit down lard-ass!")
[WRF]: (apologetically, kneels in front of netCDF) But I'll always
love you baby! Just 'cause there are other formats in my life, like,
it doesn't mean that I can't love you, you know?
[Audience]: (sighing) Ahhhhhh...
[Jerry]: Now we have a surprise guest, flown all the way here from
Champaign-Urbana, Illinois, to appear as a guest on this show. Her
name is HDF5, and she's a professional data format.
[HDF5]: (flouncing in, twirling a red handbag, and winking to WRF) Hi-ya, hon.
[Audience]: (booing, shouts of "dumb slut!")
[netCDF]: (screaming to HDF5) I can't believe you have the nerve to
come here, you hoe! (WRF jumps up in alarm and backs away)
[Jerry]: (to security) Steve, you better get between these two before
someone gets hurt...
@end example
@node Concept Index, , Derivative Works, Top
@chapter Concept Index
@printindex cp
@bye