Commit Graph

49 Commits

Author SHA1 Message Date
Dennis Heimbigner
ee509ff4f3 Re-Implement the nc_get/put_vars operations for netcdf-4 using the
corresponding HDF5 operations.

re: github issue https://github.com/Unidata/netcdf-c/issues/908
also in reference to https://github.com/pydata/xarray/issues/2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.

Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
   instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
   test case (tst_varsperf.c) to estimate the speedup. After users
   have had some experience with this, I will remove it, probably
   after the 4.7 release.
2018-05-22 16:50:52 -06:00
Dennis Heimbigner
5bc174920f re: esupport MQO-415619
There is an error in handling the nc_create_memio function
as noted in the above esupport thread.
This attempts to fix it.

Also do a master merge
2018-05-03 21:02:32 -06:00
Dennis Heimbigner
0a44d9ae3a Merge branch 'master' into inmemory.dmh 2018-04-23 11:30:14 -06:00
Dennis Heimbigner
4739cd3225 Master merge and conflict resolution 2018-04-12 21:51:17 -06:00
Ed Hartnett
96154d9303 added merged HDF4 changes 2018-04-04 14:11:44 -06:00
Dennis Heimbigner
25f062528b This completes (for now) the refactoring of libsrc4.
The file docs/indexing.dox tries to provide design
information for the refactoring.

The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.

WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
   changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
   are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
   but as noted elsewhere, this may not be significant because
   the meta-data read performance apparently is being dominated
   by the hdf5 library because we do bulk meta-data reading rather
   than lazy reading.
2018-03-16 11:46:18 -06:00
Dennis Heimbigner
ccc70d640b re: esupport MQO-415619
and https://github.com/Unidata/netcdf-c/issues/708

Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.

Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.

The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
   so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
   to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
   retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
   the provided memory will not be freed or reallocated.

Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
   then the netcdf-c library will take control of the incoming memory.
   This means that the original memory block should not be freed
   but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
   then modifications to the original memory may fail if the space available
   is insufficient.

Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh

WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
2018-02-25 21:45:31 -07:00
Ed Hartnett
2358d4a910 moved HDF4 to its own dispatch layer 2018-02-08 06:20:58 -07:00
Ed Hartnett
3fa3d3f9f9 ported rename fix changes from branch ejh_fill_values for easy merging 2018-01-05 06:01:22 -07:00
Dennis Heimbigner
ed44fd7306 Add szip support via libaec 2017-08-27 13:35:20 -06:00
Dennis Heimbigner
86fc8745dc merge master and resolve conflicts 2017-08-12 15:50:31 -06:00
Wei-keng Liao
29ae0b72fe X_INT64_MIN, X_INT64_MAX, and X_UINT64_MAX should be used internally 2017-06-06 18:20:26 -05:00
Dennis Heimbigner
7c3164577e Finalize the compression support.
This relies on the HDF5 capability to
dynamically load compression filters.
Note that a compression filter is just
a subcase of filters.

The primary user-visible changes are as follows:
1. Add a standard header "netcdf_filter.h" that defines
   the necessary API extensions
2. Modify ncgen to support two new special attributes
   "_Filter_ID" and "_Filter_Parameters" so that compression
   can be turned on when creating a file using ncgen.
4. Add a detailed description of filtering support
   to the user's guide; see the file filters.md
5. Add a test case directory for this: nc_test4/filter_test.
   It is fragile and a ./configure flags (-enable-filter-test)
   is defined (default disabled) to shut this off this test
   to avoid spurious 'make check' failures.

Note that the HDF5 documentation is not up-to-date, so
much of what is encoded here comes from examining the
actual code in the file H5PL.c in the HDF5 source code.
2017-04-27 13:01:59 -06:00
Greg Sjaardema
cbb9448ab0 Remove unused fields from struct
The nvars, ndims, and natts fields on the NC_HDF5_FILE_INFO struct are
never set.  The nvars field is read, but since it is never written,
the value is always zero.
2017-03-06 11:14:00 -07:00
Greg Sjaardema
473529d199 Remove unused ndims from grp struct 2017-03-06 11:14:00 -07:00
Greg Sjaardema
c84b475ccf Remove var linked list 2016-11-16 08:45:10 -07:00
Greg Sjaardema
dee1baca8e Store vars in array instead of linked list (linked list still active) 2016-11-16 08:45:06 -07:00
Dennis Heimbigner
0cf1e2c49f re: Github issue netcdf-c 300
Modified provenance code to allocate the minimal space
needed for _NCProperties attribute in file.  Basically
required using malloc in the provenance code and in ncdump.
Otherwise should cause no externally visible effects.
Also removed the ENABLE_FILEINFO from configure.ac since
the provenance code is no longer optional.
2016-08-08 09:24:19 -06:00
Ward Fisher
2e71768c47 Fenceposted includes to nc4internal.h in support of https://github.com/Unidata/netcdf-c/issues/275 2016-06-08 11:26:37 -06:00
Dennis Heimbigner
11a259ad86 Add provenance info for netcdf-4 files.
This consists of a persistent attribute named
_NCProperties plus two computed attributes
_IsNetcdf4 and _SuperblockVersion.
See the 'Provenance Attributes' section
of docs/attribute_conventions.md for details.
2016-05-07 14:32:07 -06:00
Greg Sjaardema
1dda09655a Add missing function prototype for hash_fast 2016-03-08 09:41:24 -07:00
Greg Sjaardema
1a84a6a99e Add hash field to dim and var to facilitate fast name compare
In non-classic netcdf-4 models, it is allowable to have
large numbers of dims and vars.  In many operations, the
entire list of dims or vars is searched for a dim/var matching
a specific name which results in *lots* of strncmp or strcmp
calls.

If we add a hash field to the var and dim structs similar to what
has already been done for the netcdf-3 formats, then we can hash the
name being searched for and numerically compare that value with
the var/dim hash value.  If they match, then do a more expensive
strncmp call to ensure that the names truly match.
2016-03-03 13:18:31 -07:00
Ward Fisher
d8b65ccea1 Fix for https://github.com/Unidata/netcdf-c/issues/223 2016-02-19 15:05:39 -07:00
Dennis Heimbigner
b5ba424793 Clean up the handling of hdf5 initialization by
creating an nc4_hdf5_initialize(void) function
plus nc4_hdf5_initialized flag.
Also fix potential null exception in nc4internal.c
2016-01-28 16:19:38 -07:00
Ward Fisher
612b35a84c Merge branch 'master' into cdf-5, in preparation for merging the CDF-5 functionality into the master branch. This will be the key new feature for netcdf 4.4.0. 2015-11-05 13:40:35 -07:00
tbeu
e2820e4d8a Fix common typos
Detected by https://github.com/vlajos/misspell_fixer
2015-08-20 11:42:05 +02:00
dmh
859f105005 merge-squash 2015-08-15 16:26:35 -06:00
Russ Rew
b40068afdb fix arg type in internal function 2015-02-05 17:05:10 -07:00
Quincey Koziol
2917a6a123 Interim checkpoint of working code. 2014-12-01 08:52:53 -06:00
Quincey Koziol
8769d58b1d Initial fix for further rename issue. 2014-11-24 09:36:58 -06:00
Quincey Koziol
b2dfacbcfa Big clean up to type handling in libsrc4, which makes fill-values work
correctly for variables with string datatype, plus a few other minor changes.
2014-02-11 17:12:08 -06:00
Quincey Koziol
b3044de434 Refactored read_scale(), memio_new(), var_create_dataset() and makespecial()
to clean up resources properly on failure.

Refactored doubly-linked list code for objects in the libsrc4 directory,
    cleaning up the add/del routines, breaking out the common next/prev
    pointers into a struct and extracting the add/del operations on them,
    changed the list of dims to add new dims in the same order as the other
    types, made all add routines able to optionally return a pointer to the
    newly created object.

Removed some dead code (pg_var(), nc4_pg_var1(), nc4_pg_varm(), misc. small
    routines, etc)

Fixed fill value handling for string types in nc4_get_vara().

Changed many malloc()+strcpy() pairs into calls to strdup().

Cleaned up misc. other minor Coverity issues.
2013-12-08 03:29:26 -06:00
Quincey Koziol
e1fc13b215 Many changes to address NCF-177 (renaming dimensions and variables). Also
many cleanups to fix compiler warnings, streamline iteration over objects
in HDF5 file when opening the file, and generally straightening out the code
to be cleaner and simpler.

Tested on Mac OS/X with gcc 4.8 and OpenMPI (which uses clang).
2013-11-30 23:20:28 -06:00
Quincey Koziol
d9069aaeaa More progress toward fixing NCF-177. 2013-09-28 12:40:21 -05:00
Quincey Koziol
3cdce9e3af Correct error when a parallel application writes different amounts of data to
an unlimited-dimension variable, and different processes don't agree on the
whether to extend the underlying HDF5 dataset, or don't agree on the amount
to extend the dataset.
2013-08-18 20:45:17 -05:00
Dennis Heimbigner
b85d3568ec Added the necessary code to support
group renaming. The primary API
is 'nc_rename_grp(int grpid, const char* name)'.
No test cases provided yet.
This also required adding a rename_grp entry
to the dispatch tables.
2013-07-19 21:48:37 +00:00
Russ Rew
6cd8fca60d Quincey's fixes for NCF-250, netCDF-4 parallel independent access with
unlimited dimension hanging.  Extending the size of an unlimited
dimension in HDF5 must be a collective operation, so now an error is
returned if trying to extend in independent access mode.

Quincey's bug fixes for parallel build portability, particularly
OpenMPI on MacOS-X.
2013-07-08 21:31:13 +00:00
Dennis Heimbigner
c583f91992 merge with trunk and fix conflicts 2013-05-10 17:04:28 +00:00
Russ Rew
e1c8870fea Renamed internal function. 2013-04-22 22:58:43 +00:00
Ward Fisher
9f187a1484 Merged the fix for NCF-29 from Quincy into the trunk. 2013-03-26 18:57:26 +00:00
Dennis Heimbigner
c659633d2e merge from trunk 2013-03-26 16:45:02 +00:00
Russ Rew
b36dc5b5fb Fix NCF-244 bug for case where using different order for defining
coordinate variables and their associated dimensions occurs in any
subgroup, rather than just the root group.  If this occurs, the
variable attribute "_Netcdf4Dimid" is created for every dimension
scale.

Also add a test for this bug fix in tst_dims3.nc, based on Pedro
Vicente's demo.
2013-03-26 15:14:19 +00:00
Russ Rew
2071ca9bf3 Fixed NCF-244, a netCDF-4 bug resulting in two dimensions with the same dimension ID 2013-03-25 18:06:19 +00:00
Dennis Heimbigner
dea3c726c8 merge trunk into this branch 2013-03-15 20:31:07 +00:00
Koziol
0f29dd60c0 Description:
Fix Jira issue NCF-29 (https://bugtracking.unidata.ucar.edu/browse/NCF-29):
making the netCDF-4 library ignore HDF5 datasets and attributes which have
datatypes (such as references) that it doesn't understand.

Tested on:
    Mac OSX/64 10.8.2 (amazon) w/--enable-netcdf-4 --enable-extra-tests --enable-extra-example-tests
        --disable-shared --enable-logging
2013-03-01 08:10:28 +00:00
Ward Fisher
7b91248723 Merged changes from netcdf-branch.
o Changed variable names 'typeid' to 'typeid1' to avoid a namespace conflict
in visual studio.
o Cleaned up a handful of warnings in Visual Studio.
o Addressed a few Coverity-discovered issuesl
o Made changes to CMake-based builds.
2012-12-13 22:09:41 +00:00
Dennis Heimbigner
5ca78309cc The effect of this change is to make the struct NC structure
contain as little file-type specific info as possible.  It
modifies especially libsrc so that all of the netcdf-3 data
that used to be in struct NC is now kept in a separate chunk
of data pointed to by the struct NC. This makes all of
current protocols consistent: netcdf-3, netcdf-4, and dap.
2012-09-06 19:44:03 +00:00
Dennis Heimbigner
d7790e7e7e 2011-09-16 18:36:08 +00:00
Ed Hartnett
d58c18c623 added diskless files API, subsetting not working, classic model only 2011-08-16 21:04:33 +00:00