netcdf-c

mirror of https://github.com/Unidata/netcdf-c.git synced 2025-02-17 16:50:18 +08:00

Author	SHA1	Message	Date
Dennis Heimbigner	fb40a72b45	Improve performance of the nc_reclaim_data and nc_copy_data functions. re: Issue https://github.com/Unidata/netcdf-c/issues/2685 re: PR https://github.com/Unidata/netcdf-c/pull/2179 As noted in PR https://github.com/Unidata/netcdf-c/pull/2179, the old code did not allow for reclaiming instances of types, nor for properly copying them. That PR provided new functions capable of reclaiming/copying instances of arbitrary types. However, as noted by Issue https://github.com/Unidata/netcdf-c/issues/2685, using these most general functions resulted in a significant performance degradation, even for common cases. This PR attempts to mitigate the cost of using the general reclaim/copy functions in two ways. First, the previous functions operated at the top level by using ncid and typeid arguments. These functions were augmented with equivalent versions that use the netcdf-c library internal data structures to allow direct access to needed information. These new functions are used internally to the library. The second mitigation involves optimizing the internal functions by providing early tests for common cases. This avoids unnecessary recursive function calls. The overall result is a significant improvement in speed by a factor of roughly twenty -- your mileage may vary. These optimized functions are still not as fast as the original (more limited) functions, but they are getting close. Additional optimizations are possible, but the cost is a significant "uglification" of the code that I deemed a step too far, at least for now. ## Misc. Changes 1. Added a test case to check the proper reclamation/copy of complex types. 2. Found and fixed some places where nc_reclaim/copy should have been used. 3. Replaced, in the netcdf-c library, (almost all) occurrences of nc_reclaim/copy with calls to NC_reclaim/copy. This plus the optimizations is the primary speed-up mechanism. 4. 
In DAP4, the metadata is held in a substrate in-memory file; this required some changes so that the reclaim/copy code accessed that substrate dispatcher rather than the DAP4 dispatcher. 5. Re-factored and isolated the code that computes if a type is (transitively) variable-sized or not. 6. Clean up the reclamation code in ncgen; adding the use of nc_reclaim exposed some memory problems.	2023-05-20 17:11:25 -06:00
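The "early test" idea and the transitive variable-size check described in this commit can be sketched roughly as follows. This is an illustrative sketch only: `TypeDesc`, `type_is_varsized` and `reclaim_needs_walk` are hypothetical names, not the library's internal structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type descriptor; NOT the netcdf-c internal structure. */
typedef enum { KIND_ATOMIC, KIND_STRING, KIND_VLEN, KIND_COMPOUND } TypeKind;

typedef struct TypeDesc {
    TypeKind kind;
    const struct TypeDesc *base;           /* VLEN: element type (unused here) */
    const struct TypeDesc *const *fields;  /* COMPOUND: field types */
    size_t nfields;
} TypeDesc;

/* A type is variable-sized if it is a string or a VLEN, or if any
   compound field is (transitively) variable-sized. */
bool type_is_varsized(const TypeDesc *t)
{
    assert(t != NULL);
    switch (t->kind) {
    case KIND_STRING:
    case KIND_VLEN:
        return true;
    case KIND_COMPOUND:
        for (size_t i = 0; i < t->nfields; i++) {
            if (type_is_varsized(t->fields[i]))
                return true;
        }
        return false;
    default:
        return false;
    }
}

/* Early test: a vector of fixed-size instances owns no interior blocks,
   so the recursive walk can be skipped entirely. */
bool reclaim_needs_walk(const TypeDesc *t, size_t count)
{
    return count > 0 && type_is_varsized(t);
}
```

In a real implementation the result of `type_is_varsized` would presumably be computed once per type and cached, so the common fixed-size case costs a single flag test.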
Dennis Heimbigner	49737888ca	Improve S3 Documentation and Support ## Improvements to S3 Documentation * Create a new document quickstart_paths.md that gives a summary of the legal path formats used by netcdf-c. This includes both file paths and URL paths. * Modify nczarr.md to remove most of the S3 related text. * Move the S3 text from nczarr.md to a new document cloud.md. * Add some S3-related text to the byterange.md document. Hopefully, this will make it easier for users to find the information they want. ## Rebuild NCZarr Testing In order to avoid problems with running make check in parallel, two changes were made: 1. The nczarr_test test system was rebuilt. Now, for each test, any generated files are kept in a test-specific directory, isolated from all other test executions. 2. Similarly, since the S3 test bucket is shared, any generated S3 objects are isolated using a test-specific key path. ## Other S3 Related Changes * Add code to ensure that files created on S3 are reclaimed at end of testing. * Use the bash "trap" command to ensure S3 cleanup even if the test fails. * Clean up the S3 related configure.ac flag set since S3 is used in several places. So now one should use the option --enable-s3 instead of --enable-nczarr-s3, although the latter is still kept as a deprecated alias for the former. * Get some of the github actions yml to work with S3; this required fixing various test scripts and adding a secret to access the Unidata S3 bucket. * Clean up the S3 portion of libnetcdf.settings.in and netcdf_meta.h.in and test_common.in. * Merge partial S3 support into dhttp.c. * Create an experimental s3 access library especially for use with Windows. It is enabled by using the options --enable-s3-internal (automake) or -DENABLE_S3_INTERNAL=ON (CMake). Also add a unit-test for it. * Move some definitions from ncrc.h to ncs3sdk.h ## Other Changes * Provide a default implementation of strlcpy and move this and similar defaults into dmissing.c.	2023-04-25 17:15:06 -06:00
Dennis Heimbigner	591e6b2f6d	Fix DAP4 remotetest server Warning: This PR is a follow on to PR https://github.com/Unidata/netcdf-c/pull/2555 and should not be merged until that prior PR has been merged. The changeset for this PR is a delta on the PR https://github.com/Unidata/netcdf-c/pull/2555. This PR re-enables the use of the server remotetest.unidata.ucar.edu/d4ts to test several features: 1. Show that access over the Internet to servers using the DAP4 protocol works. 2. Test that DAP4 support in the [Thredds Data Server](https://github.com/Unidata/tds) is operating correctly. 3. Test that the DAP4 support in the [netcdf-java library](https://github.com/Unidata/netcdf-java) and the DAP4 support in the netcdf-c library are consistent and interoperable. The test inputs (primarily .nc* files) provided in the netcdf-c library are also used by the DAP4 Test Server (aka d4ts) to present web access to a collection of data files accessible via the DAP4 protocol and which can be used for testing Internet access to a working server. To be precise, this version of d4ts is currently in unmerged branches of the netcdf-java and tds Github repositories and so is not actually in the main repositories yet. However, the d4ts.war file was created from that branch and used to populate the remotetest.unidata.ucar.edu server. The two other remote servers that were used in the past are Hyrax (OPeNDAP.org) and thredds-test. These will remain disabled until those servers can be fixed. ## Primary Changes * Rebuild the baselineremote directory. This directory contains the validation data needed to test the remote servers. * Re-enable using remotetest.unidata.ucar.edu as part of the DAP4 testing process. * Fix the dap4_test/test_remote.sh test script to match the current available test data. * Make some changes to libdap4 to improve the ability to catch malformed data streams [affects a lot of files in libdap4]. ## Misc. 
Unrelated Changes * Remove a raft of warnings, especially in nc_test4/tst_quantize.c. * Add some additional explanatory information to the NCZarr documentation. * Cleanup some Doxygen errors in the docs file and reorder some files.	2022-11-15 20:29:21 -07:00
Dennis Heimbigner	8b9253fef2	Fix various problems around VLENs re: https://github.com/Unidata/netcdf-c/issues/541 re: https://github.com/Unidata/netcdf-c/issues/1208 re: https://github.com/Unidata/netcdf-c/issues/2078 re: https://github.com/Unidata/netcdf-c/issues/2041 re: https://github.com/Unidata/netcdf-c/issues/2143 For a long time, there have been known problems with the management of complex types containing VLENs. This also involves the string type because it is stored as a VLEN of chars. This PR (mostly) fixes this problem. But note that it adds new functions to netcdf.h (see below) and this may require bumping the .so number. These new functions can be removed, if desired, in favor of functions in netcdf_aux.h, but netcdf.h seems the better place for them because they are intended as alternatives to the nc_free_vlen and nc_free_string functions already in netcdf.h. The term complex type refers to any type that directly or transitively references a VLEN type. So an array of VLENs, a compound with a VLEN field, and so on. In order to properly handle instances of these complex types, it is necessary to have functions that can recursively walk instances of such types to perform various actions on them. The term "deep" is also used to mean recursive. At the moment, the two operations needed by the netcdf library are: * free'ing an instance of the complex type * copying an instance of the complex type. The current library does only shallow free and shallow copy of complex types. This means that only the top level is properly free'd or copied, but deep internal blocks in the instance are not touched. Note that the term "vector" will be used to mean a contiguous (in memory) sequence of instances of some type. Given an array with, say, dimensions 2 X 3 X 4, this will be stored in memory as a vector of length 2*3*4=24 instances. The use cases are primarily these. 
## nc_get_vars Suppose one is reading a vector of instances using nc_get_vars (or nc_get_vara or nc_get_var, etc.). These functions will return the vector in the top-level memory provided. All interior blocks (from nested VLENs or strings) will have been dynamically allocated. After using this vector of instances, it is necessary to free (aka reclaim) the dynamically allocated memory, otherwise a memory leak occurs. So, the recursive reclaim function is used to walk the returned instance vector and do a deep reclaim of the data. Currently functions are defined in netcdf.h that are supposed to handle this: nc_free_vlen(), nc_free_vlens(), and nc_free_string(). Unfortunately, these functions only do a shallow free, so deeply nested instances are not properly handled by them. Note that internally, the provided data is immediately written so there is no need to copy it. But the caller may need to reclaim the data it passed into the function. ## nc_put_att Suppose one is writing a vector of instances as the data of an attribute using, say, nc_put_att. Internally, the incoming attribute data must be copied and stored so that changes/reclamation of the input data will not affect the attribute. Again, the code inside the netcdf library does only shallow copying rather than deep copy. As a result, one sees effects such as described in Github Issue https://github.com/Unidata/netcdf-c/issues/2143. Also, after defining the attribute, it may be necessary for the user to free the data that was provided as input to nc_put_att(). ## nc_get_att Suppose one is reading a vector of instances as the data of an attribute using, say, nc_get_att. Internally, the existing attribute data must be copied and returned to the caller, and the caller is responsible for reclaiming the returned data. Again, the code inside the netcdf library does only shallow copying rather than deep copy. So this can lead to memory leaks and errors because the deep data is shared between the library and the user. 
# Solution The solution is to build properly recursive reclaim and copy functions and use those as needed. These recursive functions are defined in libdispatch/dinstance.c and their signatures are defined in include/netcdf.h. For back compatibility, corresponding "ncaux_XXX" functions are defined in include/netcdf_aux.h. ```` int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count); int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy); int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp); ```` There are two variants. The first two, nc_reclaim_data() and nc_copy_data(), assume the top-level vector is managed by the caller. For reclaim, this is so the user can use, for example, a statically allocated vector. For copy, it assumes the user provides the space into which the copy is stored. The second two, nc_reclaim_data_all() and nc_copy_data_all(), allow the functions to manage the top level. So for nc_reclaim_data_all(), the top level is assumed to be dynamically allocated and will be free'd by nc_reclaim_data_all(). The nc_copy_data_all() function will allocate the top level and return a pointer to it to the user. The user can later pass that pointer to nc_reclaim_data_all() to reclaim the instance(s). # Internal Changes The netcdf-c library internals are changed to use the proper reclaim and copy functions. It turns out that the places where these functions are needed are quite pervasive in the netcdf-c library code. Using these functions also allows some simplification of the code since the stdata and vldata fields of NC_ATT_INFO are no longer needed. Currently this is commented out using the SEPDATA #define macro. When any bugs are largely fixed, all this code will be removed. # Known Bugs 1. There is still one known failure that has not been solved. 
All the failures revolve around some variant of this .cdl file. The proximate cause of failure is the use of a VLEN FillValue. ```` netcdf x { types: float() row_of_floats ; dimensions: m = 5 ; variables: row_of_floats ragged_array(m) ; row_of_floats ragged_array:_FillValue = {-999} ; data: ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32}, {40, 41}, _ ; } ```` When a solution is found, I will either add it to this PR or post a new PR. # Related Changes * Mark nc_free_vlen(s) as deprecated in favor of ncaux_reclaim_data. * Remove the --enable-unfixed-memory-leaks option. * Remove the NC_VLENS_NOTEST code that suppresses some vlen tests. * Document this change in docs/internal.md * Disable the tst_vlen_data test in ncdump/tst_nccopy4.sh. * Mark types as fixed size or not (transitively) to optimize the reclaim and copy functions. # Misc. Changes * Make Doxygen process libdispatch/daux.c * Make sure the NC_ATT_INFO_T.container field is set.	2022-01-08 18:30:00 -07:00
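The difference between a shallow free and the deep reclaim described in this commit can be illustrated without the library. The sketch below uses a stand-in `vlen_t` with the same two fields as `nc_vlen_t` (`len` and `p`); the functions `reclaim_vlens` and `reclaim_vlens_all` are hypothetical and only mimic the split between a caller-managed top level and a library-managed top level. They are not the actual nc_reclaim_data implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in with the same layout idea as nc_vlen_t: a length plus a
   pointer to a dynamically allocated block. */
typedef struct { size_t len; void *p; } vlen_t;

/* Deep reclaim of a vector of `count` VLEN instances.
   `depth` is the number of further VLEN levels nested inside each
   element; 0 means the elements are plain fixed-size values.
   The top-level vector itself is NOT freed (caller-managed).
   Returns the number of heap blocks released. */
size_t reclaim_vlens(vlen_t *vec, size_t count, int depth)
{
    size_t nfreed = 0;
    assert(vec != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (vec[i].p == NULL)
            continue;
        if (depth > 0)  /* interior blocks first, then the block itself */
            nfreed += reclaim_vlens((vlen_t *)vec[i].p, vec[i].len, depth - 1);
        free(vec[i].p);
        nfreed++;
        vec[i].p = NULL;
        vec[i].len = 0;
    }
    return nfreed;
}

/* Same walk, but the top-level vector is assumed to be heap-allocated
   and is released as well. */
size_t reclaim_vlens_all(vlen_t *vec, size_t count, int depth)
{
    size_t nfreed = reclaim_vlens(vec, count, depth);
    if (vec != NULL) {
        free(vec);
        nfreed++;
    }
    return nfreed;
}
```

A shallow free would release only `vec[i].p` and leak every block reachable through it, which is the failure mode the commit message describes for nested VLENs.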
Edward Hartnett	0f26083f4d	preparing to apply bitgroom algorithm	2021-08-25 01:31:26 -06:00
Greg Sjaardema	56c0d5cf8a	Spelling fixes	2019-09-18 08:03:01 -06:00
Ed Hartnett	af91209981	cleanup of whitespace in HDF5 directory	2019-02-19 05:55:22 -07:00
Ed Hartnett	f6443bce8f	rest of separation of libhdf5 and libsrc4	2018-11-30 14:05:11 -07:00
Ed Hartnett	b5ed407e9f	better handling of normalizing names in HDF5 atts	2018-11-30 13:28:18 -07:00
Ed Hartnett	433499771b	moved special att reading function to libhdf5	2018-11-30 07:50:15 -07:00
Ed Hartnett	104b4b50fe	clean up	2018-11-29 06:25:34 -07:00
Ed Hartnett	cc18944fa7	moved lazy atts handling for nc_inq_attid()	2018-11-26 09:58:31 -07:00
Ed Hartnett	8e7fc913cb	moving lazy atts code to libhdf5	2018-11-26 08:24:18 -07:00
Ed Hartnett	1df4bb1762	moving lazy atts code to libhdf5	2018-11-26 08:21:32 -07:00
Ed Hartnett	8390d572ad	Merge branch 'master' into ejh_hdf5_sep_next	2018-09-06 17:30:37 -06:00
Dennis Heimbigner	2ea1cf5f1b	There was a request to extend the provenance information stored in the _NCProperties attribute to allow two things: 1. capture of additional library dependencies (over and above hdf5) 2. Recognition of non-netcdf libraries that create netcdf-4 format files. To this end, the _NCProperties format has been extended to be an arbitrary set of key=value pairs separated by commas. This new format has version = 2, and uses commas as the pair separator. Thus the general form is: _NCProperties = "version=2,key1=value1,key2=value2..." ; This new version is accompanied by a new ./configure option of the form --with-ncproperties="key1=value1,key2=value2..." that specifies pairs to add to the _NCProperties attribute for all files created with that netcdf library. At this point, what is missing is some programmatic way to specify either all the pairs or additional pairs to the _NCProperties attribute. Not sure of the best way to do this. Builders using non-netcdf libraries can specify whatever they want in the key value pairs (as long as the version=2 is specified first). By convention, the primary library is expected to be the first pair after the leading version=2 pair, but this is convention only and is neither required nor enforced. Related changes: 1. Fixed the tests that check _NCProperties to properly operate with version=2. 2. When reading a version 1 _NCProperties attribute, convert it to look like a version 2 attribute. 3. Added some version 2 tests to ncdump/tst_fileinfo.c and ncdump/tst_fileinfo.sh Misc Changes: 1. Fix minor problem in ncdap_test/testurl.sh where a parameter to buildurl needed to be quoted. 2. Minor fix to ncgen to swap switches -H and -h to be consistent with other utilities. 3. Document the -M flag in nccopy usage() and the nccopy man page. 4. Modify a test case to use the nccopy -M flag.	2018-08-25 21:44:41 -06:00
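As an illustration of the version 2 format described in this commit (comma-separated key=value pairs with version=2 first), here is a minimal parsing sketch. `PropPair`, `PROP_MAXLEN` and `props_parse` are hypothetical names invented for this example; the library's own provenance code is not shown here and may work quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROP_MAXLEN 64  /* arbitrary limit chosen for this sketch */

typedef struct {
    char key[PROP_MAXLEN];
    char value[PROP_MAXLEN];
} PropPair;

/* Split "version=2,key1=value1,..." into key/value pairs.
   Returns the number of pairs stored, or -1 if the text is malformed,
   does not start with version=2, or does not fit in `pairs`. */
int props_parse(const char *text, PropPair *pairs, size_t maxpairs)
{
    size_t n = 0;
    const char *p = text;
    assert(text != NULL && pairs != NULL);
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        if (eq == NULL || eq == p)
            return -1;                       /* no '=' or empty key */
        size_t klen = (size_t)(eq - p);
        size_t vlen = len - klen - 1;
        if (n >= maxpairs || klen >= PROP_MAXLEN || vlen >= PROP_MAXLEN)
            return -1;
        memcpy(pairs[n].key, p, klen);
        pairs[n].key[klen] = '\0';
        memcpy(pairs[n].value, eq + 1, vlen);
        pairs[n].value[vlen] = '\0';
        n++;
        p += len;
        if (*p == ',')
            p++;                             /* skip the pair separator */
    }
    /* The version=2 pair must come first. */
    if (n == 0 || strcmp(pairs[0].key, "version") != 0
               || strcmp(pairs[0].value, "2") != 0)
        return -1;
    return (int)n;
}
```

Only the leading version=2 pair is checked; the position of the primary library is left unenforced, matching the "convention only" wording above.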
Ed Hartnett	8885c75ade	removing unneeded lookups	2018-08-22 06:08:19 -06:00
Ed Hartnett	5a52f28bb7	further condensing code	2018-07-21 10:43:36 -06:00
Ed Hartnett	7aed50a902	performance test for fast global att reads	2018-07-21 07:29:12 -06:00
Ed Hartnett	786c5a8f2e	moved hdf5 specific header stuff to hdf5internal.h	2018-07-12 07:05:21 -06:00
Ed Hartnett	697f033823	renamed NC_HDF5_FILE_INFO to NC_FILE_INFO	2018-06-22 07:08:09 -06:00
Ed Hartnett	eafe151f13	added test	2018-06-19 14:59:07 -06:00
Ed Hartnett	6b90169278	switching to att_not_read	2018-06-19 05:05:44 -06:00
Ed Hartnett	dad70cf880	more lazy atts	2018-06-19 04:54:03 -06:00
Ed Hartnett	19ae8b47d1	took out src_long and dest_long again. Getting good at it! ;-)	2018-06-16 05:33:04 -06:00
Ed Hartnett	037a3cb58c	reverting	2018-06-09 06:17:52 -06:00
Ed Hartnett	09366bf43b	removed longs from conver_type again	2018-06-09 06:14:14 -06:00
Ward Fisher	1d789d9d39	Additional reconciliation	2018-06-08 15:50:39 -06:00
Ed Hartnett	9a2782b56c	got long working with master	2018-06-05 14:40:49 -06:00
Ward Fisher	4283b791b5	Misc changes to appveyor testing.	2018-05-24 13:16:56 -06:00
Ed Hartnett	3e320a5bfb	moved more HDF5 functions to libhdf5	2018-05-15 06:47:52 -06:00
Dennis Heimbigner	42e8028726	Re: github issues https://github.com/Unidata/netcdf-c/issues/917 https://github.com/Unidata/netcdf-c/issues/915 Fix following memory errors: 1. global_buffer_overflow 2. nc4_att_list_add	2018-03-29 14:57:40 -06:00
Dennis Heimbigner	25f062528b	This completes (for now) the refactoring of libsrc4. The file docs/indexing.dox tries to provide design information for the refactoring. The primary change is to replace all walking of linked lists with the use of the NCindex data structure. NCindex is a combination of a hash table (for name-based lookup) and a vector (for walking the elements in the index). Additionally, global vectors are added to NC_HDF5_FILE_INFO_T to support direct mapping of, e.g., a dimid to the NC_DIM_INFO_T object. These global vectors exist for dimensions, types, and groups because they have globally unique id numbers. WARNING: 1. Since libsrc4 and libsrchdf4 share code, there are also changes in libsrchdf4. 2. Any outstanding pull requests that change libsrc4 or libhdf4 are likely to cause conflicts with this code. 3. The original reason for doing this was for performance improvements, but as noted elsewhere, this may not be significant because the meta-data read performance apparently is being dominated by the hdf5 library because we do bulk meta-data reading rather than lazy reading.	2018-03-16 11:46:18 -06:00
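The combination described in this commit, a hash table for name lookup plus a vector for ordered walking and id-based access, can be sketched as below. This is not the NCindex implementation: `Index`, `idx_add`, `idx_lookup` and `idx_ith` are hypothetical names, and the fixed capacities and the djb2-style hash are simplifying assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IDX_CAP   64   /* vector capacity (assumption for the sketch) */
#define IDX_SLOTS 128  /* hash slots: power of two, larger than IDX_CAP */

typedef struct { const char *name; void *obj; } IdxEntry;

typedef struct {
    IdxEntry vec[IDX_CAP];  /* entries in insertion order */
    size_t count;
    int slots[IDX_SLOTS];   /* vec position + 1; 0 marks an empty slot */
} Index;

/* djb2-style string hash. */
static size_t idx_hash(const char *s)
{
    size_t h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

void idx_init(Index *ix)
{
    memset(ix, 0, sizeof *ix);
}

/* Append an entry. Returns its position in the vector, or -1 if the
   index is full or the name is already present. The name is not copied,
   so the caller must keep it alive. */
int idx_add(Index *ix, const char *name, void *obj)
{
    assert(ix != NULL && name != NULL);
    if (ix->count >= IDX_CAP)
        return -1;
    size_t s = idx_hash(name) & (IDX_SLOTS - 1);
    while (ix->slots[s] != 0) {              /* linear probing */
        if (strcmp(ix->vec[ix->slots[s] - 1].name, name) == 0)
            return -1;
        s = (s + 1) & (IDX_SLOTS - 1);
    }
    ix->vec[ix->count].name = name;
    ix->vec[ix->count].obj = obj;
    ix->slots[s] = (int)(ix->count + 1);
    return (int)ix->count++;
}

/* Name-based lookup through the hash table. */
void *idx_lookup(const Index *ix, const char *name)
{
    size_t s = idx_hash(name) & (IDX_SLOTS - 1);
    while (ix->slots[s] != 0) {
        const IdxEntry *e = &ix->vec[ix->slots[s] - 1];
        if (strcmp(e->name, name) == 0)
            return e->obj;
        s = (s + 1) & (IDX_SLOTS - 1);
    }
    return NULL;
}

/* Position-based access, e.g. for walking or id-to-object mapping. */
void *idx_ith(const Index *ix, size_t i)
{
    return i < ix->count ? ix->vec[i].obj : NULL;
}
```

Both access paths avoid the linked-list walk: lookup by name costs a hash probe, and lookup by position is a direct array access.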
Ed Hartnett	7c936a7bb6	brought in changes from ejh_att	2018-01-31 08:44:33 -07:00
Ed Hartnett	4de61e21f2	more docs, more cleaning	2017-12-04 12:21:14 -07:00
Ed Hartnett	fec74e18ef	more internal documentation	2017-12-04 07:07:45 -07:00
Ward Fisher	7edb08977a	Added platform checks for ARM.	2017-02-03 11:19:39 -07:00
Wei-keng Liao	fe9685deb4	implement an error code precedence	2016-12-01 12:31:20 -06:00
Wei-keng Liao	4cdbf7dba5	Merge branch 'master' into issue258	2016-12-01 01:09:45 -06:00
Greg Sjaardema	a55d96eba1	Clean-up build after changes -- remove unused variables	2016-11-16 08:45:28 -07:00
Greg Sjaardema	dee1baca8e	Store vars in array instead of linked list (linked list still active)	2016-11-16 08:45:06 -07:00
Wei-keng Liao	4725c1484b	move varid check right after ncid	2016-10-12 13:33:17 -05:00
Dennis Heimbigner	0cf1e2c49f	re: Github issue netcdf-c 300 Modified provenance code to allocate the minimal space needed for _NCProperties attribute in file. Basically required using malloc in the provenance code and in ncdump. Otherwise should cause no externally visible effects. Also removed the ENABLE_FILEINFO from configure.ac since the provenance code is no longer optional.	2016-08-08 09:24:19 -06:00
Ward Fisher	1ebb104f74	Tentatively fixed https://github.com/Unidata/netcdf-c/issues/239 but the test needs to be extended.	2016-06-10 17:03:08 -06:00
Dennis Heimbigner	835511eaeb	HDF5 is generating unnecessary error messages when netcdf4 logging is enabled re: github netcdf-c issue #271 This occurs for several reasons, including: 1. using H5Aopen_name instead of H5Aexists to test if attribute exists. 2. using H5Eset_auto instead of H5Eset_auto2. There are probably others that will have to be extinguished as encountered. p.s Hope I did not overdo this and kill too much.	2016-05-27 10:08:01 -06:00
Ward Fisher	fcca7ae57d	Merge branch 'master' into provfix	2016-05-16 12:39:15 -06:00
Dennis Heimbigner	4fa1470241	re: github issue https://github.com/Unidata/netcdf-c/issues/265 Charlie Zender noted that we forgot to define what happens for various netcdf API attribute operations, notably nc_inq_att() and nc_get_att(). So, I added a list of legal and illegal api calls for the provenance attributes in docs/attribute_conventions.md. I also added more test cases to ncdump/tst_fileinfo.c to verify and fixed resultant errors.	2016-05-15 18:03:04 -06:00
Dennis Heimbigner	7e0db68dce	Finally get around to removing all that obsolete pnetcdf related code in libsrc4.	2016-05-14 22:31:41 -06:00
Dennis Heimbigner	11a259ad86	Add provenance info for netcdf-4 files. This consists of a persistent attribute named _NCProperties plus two computed attributes _IsNetcdf4 and _SuperblockVersion. See the 'Provenance Attributes' section of docs/attribute_conventions.md for details.	2016-05-07 14:32:07 -06:00
Ward Fisher	473259b772	Corrected issue where overwriting an attribute of type NC_CHAR with NC_STRING would result in dangling data.	2015-11-11 11:32:12 -07:00


77 Commits