Greg Sjaardema 1db3d07beb
Proof-of-Concept: Avoid N^2 behavior in NC4_inq_dim
The current library seems to have some behavior which is N^2 in the number of vars in a file.

The `NC4_inq_dim` routine calls down to `nc4_find_dim_len`, which iterates through each `var` in the file/group, calls `find_var_dim_max_length` on it, and keeps the largest length of the dim found across those vars. This is done only for unlimited dims.

I have a file with 129 dims and 1630 vars. The unlimited dimension is of length 41. In my test program, I am reading data from 4 files which have the same dim and var counts, reading every 4th time step (unlimited dimension). If I run a profile, I see that 98.2% of the program time is in the `nc_get_vara_float` call tree, and most of that (94.8%) is in `find_var_dim_max_length`.

There are 66,142 calls to `nc_get_vara_float` resulting in 107,307,290 calls to `find_var_dim_max_length` with twice that number of calls to `malloc/free` and calls to 5 HDF5 routines.  All of this, at least in my case, to return the same `41` each time.

The proof of concept patch here will check whether the file is read-only (or no_write) and if so, it will cache the value of the dim length the first time it is calculated.   With this change, my example run is sped up by a factor of 60.  The time for `NC4_inq_dim` and below drops from 97.2% down to 2.7%.

I'm not sure whether this is the correct fix, or if there is some behavior that I am overlooking, but my users would definitely like a 10 second run compared to a 10 minute run... 

This is on the current netCDF master branch.

I will try to attach some valgrind/callgrind profiles.
2020-04-30 11:01:10 -06:00
.github
cmake
conda.recipe
ctest_scripts Added a ctest script with DAP tests enabled. 2020-02-11 15:09:29 -07:00
dap4_test fixed distclean target in dap4_test 2020-01-23 04:40:37 -07:00
debug force github checks restart 2020-03-29 14:50:28 -06:00
docs Tweaked docs to fix dead references introduced as part of separating out NUG from netCDF-C. 2020-03-27 14:21:25 -06:00
examples Add support for multiple filters per variable. 2020-02-16 12:59:33 -07:00
h5_test fixed missing declaration 2020-04-23 23:32:29 -05:00
hdf4_test
include Merge remote-tracking branch 'upstream/master' 2020-04-23 15:36:14 -05:00
libdap2 Fix conflicts with master 2020-02-27 14:06:45 -07:00
libdap4 Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
libdispatch cleanup 2020-04-15 05:53:59 -06:00
libhdf4 Fix reclamation of the ->format_XXX_info fields 2020-03-29 12:48:59 -06:00
libhdf5 Proof-of-Concept: Avoid N^2 behavior in NC4_inq_dim 2020-04-30 11:01:10 -06:00
liblib Updated so version info in line with guidelines found at https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html 2020-03-26 11:26:10 -06:00
libsrc Fix conflicts with master 2020-02-27 14:06:45 -07:00
libsrc4 Fix reclamation of the ->format_XXX_info fields 2020-03-29 12:48:59 -06:00
libsrcp Fix conflicts with master 2020-02-27 14:06:45 -07:00
nc_perf fix for memory leak due to HDF5 types 2020-02-09 11:47:13 -07:00
nc_test whitespace cleanup of test 2020-04-15 06:10:12 -06:00
nc_test4 Fix missing forward declarations 2020-04-03 20:15:34 -06:00
ncdap_test
ncdump Correcting a formatting error for scalars when dumping with ncdump -f 2020-04-28 15:49:03 -06:00
ncgen Make utilities support NC_COMPACT 2020-02-29 12:06:21 -07:00
ncgen3
nctest
NUG Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
oc2 Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
plugins Fix missing forward declarations 2020-04-03 20:15:34 -06:00
unit_test fixed warning 2020-03-02 16:29:52 -07:00
.gitignore Shuffling NUG and documentation. 2020-02-06 16:14:25 -07:00
.travis.yml
acinclude.m4
appveyor.yml
bootstrap
cmake_uninstall.cmake.in
CMakeInstallation.cmake
CMakeLists.txt Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
COMPILE.cmake.txt
config.h.cmake.in Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
config.h.cmake.in.old-works
configure.ac Use proper CURLOPT values for VERIFYHOST and VERIFYPEER 2020-04-10 13:42:27 -06:00
COPYRIGHT
CTestConfig.cmake.in
CTestCustom.cmake
dods.m4
FixBundle.cmake.in
INSTALL.md
lib_flags.am
libnetcdf.settings.in Merge pull request #1619 from NOAA-GSD/ejh_more_szip 2020-02-06 12:42:27 -07:00
Makefile.am Add support for multiple filters per variable. 2020-02-16 12:59:33 -07:00
mclean
nc-config.cmake.in
nc-config.in
netcdf.pc.in
netCDFConfig.cmake.in Correct typo. 2020-01-24 16:53:16 -07:00
PostInstall.cmake
postinstall.sh.in
README.md Correcting dead link to installation 2020-04-24 16:44:07 -06:00
RELEASE_NOTES.md Updated release notes. 2020-04-28 15:52:40 -06:00
test_common.in
test_prog.c
test-driver-verbose
wjna

Unidata NetCDF


About

The Unidata network Common Data Form (netCDF) is an interface for scientific data access and a freely-distributed software library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. The current netCDF software provides C interfaces for applications and data. Separate software distributions available from Unidata provide Java, Fortran, Python, and C++ interfaces. They have been tested on various common platforms.

Properties

NetCDF files are self-describing, network-transparent, directly accessible, and extendible. Self-describing means that a netCDF file includes information about the data it contains. Network-transparent means that a netCDF file is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers. Direct-access means that a small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data. Extendible means that data can be appended to a netCDF dataset without copying it or redefining its structure.

Use

NetCDF is useful for supporting access to diverse kinds of scientific data in heterogeneous networking environments and for writing application software that does not depend on application-specific formats. For information about a variety of analysis and display packages that have been developed to analyze and display data in netCDF form, see

More information

For more information about netCDF, see

Latest releases

You can obtain a copy of the latest released version of netCDF software for various languages:

Copyright and licensing information can be found here, as well as in the COPYRIGHT file accompanying the software.

Installation

To install the netCDF-C software, please see the file INSTALL in the netCDF-C distribution, or the (usually more up-to-date) document:

Documentation

A language-independent User's Guide for netCDF, and some other language-specific user-level documents are available from:

A mailing list, netcdfgroup@unidata.ucar.edu, exists for discussion of the netCDF interface and announcements about netCDF bugs, fixes, and enhancements. For information about how to subscribe, see the URL

Feedback

We appreciate feedback from users of this package. Please send comments, suggestions, and bug reports to support-netcdf@unidata.ucar.edu.