Fix github issue https://github.com/Unidata/netcdf-c/issues/899
which came from e-support UOY-859712.
The problem was that the vlen_max parameter
to libsrc/var.c#NC_check_vlen was of type size_t.
However, it is being called, sometimes, with values
of size X_INT64_MAX. The resulting truncation was
causing dimension failures as noted in the e-support
report.
Fix is to change the vlen_max argument (and some
local variables in NC_check_vlen) to be of declared
as unsigned long long.
The file docs/indexing.dox tries to provide design
information for the refactoring.
The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.
WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
but as noted elsewhere, this may not be significant because
the meta-data read performance apparently is being dominated
by the hdf5 library because we do bulk meta-data reading rather
than lazy reading.
and https://github.com/Unidata/netcdf-c/issues/708
Expand the NC_INMEMORY capabilities to support writing and accessing
the final modified memory.
Three new functions have been added:
nc_open_memio, nc_create_mem, and nc_close_memio.
The following new capabilities were added.
1. nc_open_memio() allows the NC_WRITE mode flag
so a chunk of memory can be passed in and be modified
2. nc_create_mem() allows the NC_INMEMORY flag to be set
to cause the created file to be kept in memory.
3. nc_close_mem() allows the final in-memory contents to be
retrieved at the time the file is closed.
4. A special flag, NC_MEMIO_LOCK, is provided to ensure that
the provided memory will not be freed or reallocated.
Note the following.
1. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is not set,
then the netcdf-c library will take control of the incoming memory.
This means that the original memory block should not be freed
but the block returned by nc_close_mem() must be freed.
2. If nc_open_memio() is called with NC_WRITE, and NC_MEMIO_LOCK is set,
then modifications to the original memory may fail if the space available
is insufficient.
Documentation is provided in the file docs/inmemory.md.
A test case is provided: nc_test/tst_inmemory.c driven by
nc_test/run_inmemory.sh
WARNING: changes were made to the dispatch table for
the close entry. From int (*close)(int) to int (*close)(int,void*).
(I hope) metadata mechanism. This mostly just adds new pieces of
code (e.g. nclistmap) and does some minor fixes.
It should be transparent to everything else.
The next set of changes will be the big step.
strlcat provides better protection against buffer overflows.
Code is taken from the FreeBSD project source code. Specifically:
https://github.com/freebsd/freebsd/blob/master/lib/libc/string/strlcat.c
License appears to be acceptable, but needs to be checked by e.g. Debian.
Step 1:
1. Add to netcdf-c/include/ncconfigure.h to use our version
if not already available as determined by HAVE_STRLCAT in config.h.
2. Add the strlcat code to libdispatch/dstring.c
3. Turns out that strlcat was already defined in several places.
So remove it from:
ncgen3/genlib.c
ncdump/dumplib.c
3. Define strlcat extern definition in ncconfigure.h.
4. Modify following directories to use strlcat:
libdap2 libdap4 ncdap_test dap4_test
Will do others in subsequent steps.
re pull request https://github.com/Unidata/netcdf-c/pull/405
re pull request https://github.com/Unidata/netcdf-c/pull/446
Notes:
1. This branch is a cleanup of the magic.dmh branch.
2. magic.dmh was originally merged, but caused problems with parallel IO.
It was re-issued as pull request https://github.com/Unidata/netcdf-c/pull/446.
3. This branch + pull request replace any previous pull requests and magic.dmh branch.
Given an otherwise valid netCDF file that has a corrupted header,
the netcdf library currently crashes. Instead, it should return
NC_ENOTNC.
Additionally, the NC_check_file_type code does not do the
forward search required by hdf5 files. It currently only looks
at file position 0 instead of 512, 1024, 2048,... Also, it turns
out that the HDF4 magic number is assumed to always be at the
beginning of the file (unlike HDF5).
The change is localized to libdispatch/dfile.c See
https://support.hdfgroup.org/release4/doc/DSpec_html/DS.pdf
Also, it turns out that the code in NC_check_file_type is duplicated
(mostly) in the function libsrc4/nc4file.c#nc_check_for_hdf.
This branch does the following.
1. Make NC_check_file_type return NC_ENOTNC instead of crashing.
2. Remove nc_check_for_hdf and centralize all file format checking
NC_check_file_type.
3. Add proper forward search for HDF5 files (but not HDF4 files)
to look for the magic number at offsets of 0, 512, 1024...
4. Add test tst_hdf5_offset.sh. This tests that hdf5 files with
an offset are properly recognized. It does so by prefixing
a legal file with some number of zero bytes: 512, 1024, etc.
5. Off-topic: Added -N flag to ncdump to force a specific output dataset name.
2. Factored out the parameter string parsing for ncgen and nccopy
int libdispatch/dfilter.c + include/ncfilter.h
3. Allow a parameter string to use constant types other than
unsigned int. See docs/filters.md for details.
4. Moved the old content of include/netcdf_filter.h into include/netcdf.h
and removed include/netcdf_filter.h as no longer needed.
5. Force the test filter (bzip2) in nc_test4/filter_test to
be built using BUILT_SOURCES.
The use of the following version-specific curl flags
is not always properly wrapped or aliased using
config.h HAVE_CURL... ifdefs.
# CURLOPT_USERNAME is not defined until curl version 7.19.1
# CURLOPT_PASSWORD is not defined until curl version 7.19.1
# CURLOPT_KEYPASSWD is not defined until curl version 7.16.4
-- aliased as needed to CURLOPT_SSLKEYPASSWD
# CURLINFO_RESPONSE_CODE is not defined until curl version 7.10.7
-- aliased as needed to CURLINFO_HTTP_CODE
# CURLOPT_CHUNK_BGN_FUNCTION is not defined until curl version 7.21.0
-- not used in our code
Primary change is to cleanup code and remove duplicated code.
1. Unify the rc file reading into libdispatch/drc.c. Eventually extend
if we need rc file for netcdf itself as opposed to the dap code.
2. Unify the extraction from the rc file of DAP authorization info.
3. Misc. other small unifications: make temp file, read file.
4. Avoid use of libcurl when reading file:// because
there is some kind of problem with the Visual Studio version.
Might be related to the winpath problem.
In any case, do direct read instead.
5. Add new error code NC_ERCFILE for errors in reading RC file.
6. Complete documentation cleanup as indicated in this comment
https://github.com/Unidata/netcdf-c/pull/472#issuecomment-325926426
7. Convert some occurrences of #ifdef _WIN32 to #ifdef _MSC_VER
generates garbage. This in turn interferes with using .netrc
because the garbage user+pwd can will override the
.netrc. Note that this may work ok sometimes
if the garbage happens to start with a nul character.
2. It turns out that the user:pwd combination needs to support
character escaping. One reason is the user may contain an '@' character.
The other is that modern password rules make it not unlikely that
the password will contain characters that interfere with url parsing.
So, the rule I have implemented is that all occurrences of the user:pwd
format must escape any dodgy characters. The escape format is URL escaping
of the form %XX. This applies both to user:pwd
embedded in a URL as well as the use of HTTP.CREDENTIALS.USERPASSWORD
in a .dodsrc/.daprc file. The user and password in .netrc must not
be escaped. This is now documented in docs/auth.md
The fix for #2 actually obviated #1. Now, internally, the user and pwd
are stored separately and not in the user:pwd format. They are combined
(and escaped) only when needed.
2. modified ncdap_tests to remove <cr>
from generated output before comparison
to expected. This is a hack because
my other attempts to force windows to use
binary mode have not worked.
from e-support OYW-455599.
Problem was that in nctime.c#CDMonthDay, it was setting up
the month -> #days table correctly, but it did not use it
because it forgot to check for Cd366, it only checked for Cd365.
were added to provide a path name converter from e.g. cygwin
paths to e.g. windows paths. This is necessary because
the shell scripts may produce cygwin paths, but the code
may have been compiled with Visual Studio. Similar issues
arise with Mingw.
At appropriate places, and if using Visual Studio or Mingw,
I added calls to the path conversion code.
Apparently I forgot to find all the places where this
conversion was needed. So this pr does the following:
1. Push the calls to the converter to the various libXXX
directories and out of libdispatch/dfile.c.
2. Add conversion calls to other parts of the code like oc2.
I also turns out that conversion code in dapcvt.c
had a bug when handling DAP Byte type under visual studio.
Notes:
1. there may still be places I missed that need to do path conversion.
2. need to make sure that calls to e.g. H5open also use converted path.
This is a follow-on in that the old utf8 code was still being
used in ncgen to convert utf8->utf16 when converting cdl to Java
(see genj.c).
The new code apparently has no utf16 support, but it does have
utf32 support. Converting utf32 -> utf16 can be approximated by
truncating the 32bits to 16 bits, unless the top 16 bits are
not zero. This latter condition is unlikely to be common because
it implies use of some rather obscure characters.
So solution is to convert to utf32 and truncate to 16 bits to
get utf16. An error is reported if the high-order truncated 16
bits are not zero. If we get complaints, then I will figure out
how to convert full utf32 to a utf16 pair.
Other changes:
1. removed the old code from ncgen.
2. changed UTF8PROC_DLLEXPORT (in utf8proc) to EXTERNL
and added appropriate includes. This should fix
issue https://github.com/Unidata/netcdf-c/issues/404,
but since we cannot duplicate the failure, I am not quite
sure.
This is a follow-on in that the old utf8 code was still being
used in ncgen to convert utf8->utf16 when converting cdl to Java
(see genj.c).
The new code apparently has no utf16 support, but it does have
utf32 support. Converting utf32 -> utf16 can be approximated by
truncating the 32bits to 16 bits, unless the top 16 bits are
not zero. This latter condition is unlikely to be common because
it implies use of some rather obscure characters.
So solution is to convert to utf32 and truncate to 16 bits to
get utf16. An error is reported if the high-order truncated 16
bits are not zero. If we get complaints, then I will figure out
how to convert full utf32 to a utf16 pair.
Also removed the old code from ncgen.
This relies on the HDF5 capability to
dynamically load compression filters.
Note that a compression filter is just
a subcase of filters.
The primary user-visible changes are as follows:
1. Add a standard header "netcdf_filter.h" that defines
the necessary API extensions
2. Modify ncgen to support two new special attributes
"_Filter_ID" and "_Filter_Parameters" so that compression
can be turned on when creating a file using ncgen.
4. Add a detailed description of filtering support
to the user's guide; see the file filters.md
5. Add a test case directory for this: nc_test4/filter_test.
It is fragile and a ./configure flags (-enable-filter-test)
is defined (default disabled) to shut this off this test
to avoid spurious 'make check' failures.
Note that the HDF5 documentation is not up-to-date, so
much of what is encoded here comes from examining the
actual code in the file H5PL.c in the HDF5 source code.
1. When running under windows (as opposed to cygwin)
we need to make sure to not user /cygdrive/ file paths.
This was ocurring in libdap4/d4read.c, but may occur
elsewhere.
2. Shell scripts in the git repo are not being checked-out
with the executable mode set. Had core.filemode set to false.
Was a major hassle to fix.
Specific changes:
1. Add dap4 code: libdap4 and dap4_test.
Note that until the d4ts server problem is solved, dap4 is turned off.
2. Modify various files to support dap4 flags:
configure.ac, Makefile.am, CMakeLists.txt, etc.
3. Add nc_test/test_common.sh. This centralizes
the handling of the locations of various
things in the build tree: e.g. where is
ncgen.exe located. See nc_test/test_common.sh
for details.
4. Modify .sh files to use test_common.sh
5. Obsolete separate oc2 by moving it to be part of
netcdf-c. This means replacing code with netcdf-c
equivalents.
5. Add --with-testserver to configure.ac to allow
override of the servers to be used for --enable-dap-remote-tests.
6. There were multiple versions of nctypealignment code. Try to
centralize in libdispatch/doffset.c and include/ncoffsets.h
7. Add a unit test for the ncuri code because of its complexity.
8. Move the findserver code out of libdispatch and into
a separate, self contained program in ncdap_test and dap4_test.
9. Move the dispatch header files (nc{3,4}dispatch.h) to
.../include because they are now shared by modules.
10. Revamp the handling of TOPSRCDIR and TOPBUILDDIR for shell scripts.
11. Make use of MREMAP if available
12. Misc. minor changes e.g.
- #include <config.h> -> #include "config.h"
- Add some no-install headers to /include
- extern -> EXTERNL and vice versa as needed
- misc header cleanup
- clean up checking for misc. unix vs microsoft functions
13. Change copyright decls in some files to point to LICENSE file.
14. Add notes to RELEASENOTES.md
The nvars, ndims, and natts fields on the NC_HDF5_FILE_INFO struct are
never set. The nvars field is read, but since it is never written,
the value is always zero.
Update utf8proc.[ch] to use the version now
maintained by the Julia Language project
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
The license for the previous version was
unacceptable for the Debian and Ubuntu release
systems. The new version both updates the code
and addresses the license issue.
It turns out that the utf8proc software we are using
was turned over to the Julia Language developers
and the license terms changed to allow modification.
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
So the fix here is as follows:
1. Wrap the library with a fixed interface: libdispatch/dutf8.c
and include/ncutf8.h.
2. Replace the existing utf8proc code with the new version
from https://github.com/JuliaLang/utf8proc.
3. Add a couple more test cases: nc_test/tst_utf8_validate.c
and nc_test_utf8_phrases.c. If/when I can find a usable
normalization test, I will incorporate that later.
Modified provenance code to allocate the minimal space
needed for _NCProperties attribute in file. Basically
required using malloc in the provenance code and in ncdump.
Otherwise should cause no externally visible effects.
Also removed the ENABLE_FILEINFO from configure.ac since
the provenance code is no longer optional.
As best I can tell, this should be ENABLE_PARALLEL4 instead of ENABLE_PARALLEL. ENABLE_PARALLEL is not used other than in a couple documentation files. But, ENABLE_PARALLEL4 is set in the top-level CMakeLists.txt file if a parallel hdf5 library is detected.
The single commented out line was the only use of a c-99 or c++ style comment in the entire file. Either remove line completely or change to c-89 style comment to permit use with projects that still require c89 compatible code.
This consists of a persistent attribute named
_NCProperties plus two computed attributes
_IsNetcdf4 and _SuperblockVersion.
See the 'Provenance Attributes' section
of docs/attribute_conventions.md for details.
The addition of the nc_hashmap to facilitate quick
retrieval of var and dim by name did not take into
account key collisions -- two or more names hashed
to the same value. If the keys matched, it assumed
that the names matched also.
This change fixes this incorrect assumption and
checks both the key (which is the hash of the name)
and if the keys match, it also checks that the names
match.
While there have been no instances of duplicate keys,
they are certain to occur and cause difficult to
debug issues. This fix eliminates that defect.
In non-classic netcdf-4 models, it is allowable to have
large numbers of dims and vars. In many operations, the
entire list of dims or vars is searched for a dim/var matching
a specific name which results in *lots* of strncmp or strcmp
calls.
If we add a hash field to the var and dim structs similar to what
has already been done for the netcdf-3 formats, then we can hash the
name being searched for and numerically compare that value with
the var/dim hash value. If they match, then do a more expensive
strncmp call to ensure that the names truly match.
re: e support ZCL-340681 and CPW-270700
HDF4 supports compression (and chunking)
but the chunking was not being recorded
for HDF4 files. So, I modified the necessary
files to support HDF4 chunking.
was incomplete; complete the fix.
2. There is an inconsistency in the netcdfh interface
about whether rank (# dimensions) is an int or
size_t. I inadvertently assumed the latter and that
breaks some API calls; so revert back.
times, but returning the same ncid.
2. Separate out the dap auth tests
and make them disabled by default.
3. turn of ncdap_test/test_varm3 until
we can find a copy of coads_climatology.nc
error occurs after an "exit:" label.
Corrected a dozen Coverity errors (mainly allocation issues, along with a few
other things):
711711, 711802, 711803, 711905, 970825, 996123, 996124, 1025787,
1047274, 1130013, 1130014, 1139538
Refactored internal fill-value code to correctly handle string types, and
especially to allow NULL pointers and null strings (ie. "") to be
distinguished. The code now avoids partially aliasing the two together
(which only happened on the 'write' side of things and wasn't reflected on
the 'read' side, adding to the previous confusion).
Probably still weak on handling fill-values of variable-length and compound
datatypes.
Refactored the recursive metadata reads a bit more, to process HDF5 named
datatypes and datasets immediately, avoiding chewing up memory for those
types of objects, etc.
Finished uncommenting and updating the nc_test4/tst_fills2.c code (as I'm
proceeding alphabetically through the nc_test4 code files).
Add a new function called nc_inq_format_extended that
returns more detailed format information (vis-a-vis
nc_inq_format) about an open dataset.
Note that the netcdf API will present the file as if it had
the format specified by nc_inq_format. The true file
format, however, may not even be a netcdf file; it might be
DAP, HDF4, or PNETCDF, for example. This function returns
that true file type. It also returns the effective mode for
the file.
signature: nc_inq_format_extended(int ncid, int* formatp, int* modep)
where
* ncid is the NetCDF ID from a previous call to nc_open() or
nc_create().
* formatp is a pointer to a location for returned true format.
* modep is a pointer to a location for returned mode flags.
Refer to the actual list in the file netcdf.h to see the
currently defined set.
Also added test cases (tst_formatx*).
to clean up resources properly on failure.
Refactored doubly-linked list code for objects in the libsrc4 directory,
cleaning up the add/del routines, breaking out the common next/prev
pointers into a struct and extracting the add/del operations on them,
changed the list of dims to add new dims in the same order as the other
types, made all add routines able to optionally return a pointer to the
newly created object.
Removed some dead code (pg_var(), nc4_pg_var1(), nc4_pg_varm(), misc. small
routines, etc)
Fixed fill value handling for string types in nc4_get_vara().
Changed many malloc()+strcpy() pairs into calls to strdup().
Cleaned up misc. other minor Coverity issues.
many cleanups to fix compiler warnings, streamline iteration over objects
in HDF5 file when opening the file, and generally straightening out the code
to be cleaner and simpler.
Tested on Mac OS/X with gcc 4.8 and OpenMPI (which uses clang).
are duplicated in netcdf.h and netcdf_par.h
This commit removed the duplicates from netcdf.h
since they are only used when parallism is
enabled.
/* Use these with nc_var_par_access(). */
#define NC_INDEPENDENT 0
#define NC_COLLECTIVE 1
/* Set parallel access for a variable to independent (the default) or
* collective. */
EXTERNL int
nc_var_par_access(int ncid, int varid, int par_access);
an unlimited-dimension variable, and different processes don't agree on the
whether to extend the underlying HDF5 dataset, or don't agree on the amount
to extend the dataset.
group renaming. The primary API
is 'nc_rename_grp(int grpid, const char* name)'.
No test cases provided yet.
This also required adding a rename_grp entry
to the dispatch tables.
Also added ability capability for netCDF-4 to write and read NIL
values for string type attributes and variables, so these can be read
if used in HDF5 files.
Include are additions to CMakeLists files to reflect new tests.
unlimited dimension hanging. Extending the size of an unlimited
dimension in HDF5 must be a collective operation, so now an error is
returned if trying to extend in independent access mode.
Quincey's bug fixes for parallel build portability, particularly
OpenMPI on MacOS-X.
The DAP code compiles constraints into an internal
tree form. When it goes to fetch data, it converts
that tree form into a string suitable for use in a url.
The tree->string code was using '<' when it should be
using '<=' and vice versa.
The fix is to make sure the right operator is used.
coordinate variables and their associated dimensions occurs in any
subgroup, rather than just the root group. If this occurs, the
variable attribute "_Netcdf4Dimid" is created for every dimension
scale.
Also add a test for this bug fix in tst_dims3.nc, based on Pedro
Vicente's demo.
Fix Jira issue NCF-29 (https://bugtracking.unidata.ucar.edu/browse/NCF-29):
making the netCDF-4 library ignore HDF5 datasets and attributes which have
datatypes (such as references) that it doesn't understand.
Tested on:
Mac OSX/64 10.8.2 (amazon) w/--enable-netcdf-4 --enable-extra-tests --enable-extra-example-tests
--disable-shared --enable-logging
CMake related changes in CMakeLists.txt files,
cmake_config.h.in. Other changes relate to
Windows-specific issues, and changes made
when regenerating generated source files.
o Changed variable names 'typeid' to 'typeid1' to avoid a namespace conflict
in visual studio.
o Cleaned up a handful of warnings in Visual Studio.
o Addressed a few Coverity-discovered issuesl
o Made changes to CMake-based builds.
contain as little file-type specific info as possible. It
modifies especially libsrc so that all of the netcdf-3 data
that used to be in struct NC is now kept in a separate chunk
of data pointed to by the struct NC. This makes all of
current protocols consistent: netcdf-3, netcdf-4, and dap.
reported by static analysis, including memory leak in ncdump, missing
size_t cast for chunk cache. Fixed various doc problems, including
byte vs. char issues, missing NC_UBYTE in type list, needed link to
"Building with Windows" page.
implementation. Deleted obsolete win32, soon to be replaced by Ward's
Windows 32- and 64-bit fixes for building with MSYS/MinGW. Made
cosmetic cleanup to output of "make check" to make it easier for users
to interpret. Fixed bug NCF-175: ncdump -t incorrectly interpreting
units attribute (such as "days") without a base time (such as "since
2007-01-01") as a time unit.
Changed name to 4.2.1-beta.
range_error checks in netCDF-4 type conversion code. Made netCDF
attribute tests with type conversion more comprehensive and stringent,
fixing bugs identified with better tests. Changed a test in
nc_test/tst_atts.c to use netCDF-3 file instead of netCDF-4 file,
because that directory is supposed to be for tests that work with
--disable-netcdf-4. Added test demonstrating NCF-171 bug on 32-bit
platforms, only run when configured with --enable-extra-tests.
o include/nctime.h:
o Modified how the following functions are specified
external when building a DLL:
* cdRel2Iso
* cdChar2Comp
* Cdh2e
* Cde2h
* cdParseRelunits
The in-memory files can be made persistent if nc_create is called with
NC_DISKLESS|NC_WRITE flags set. Initial test case also included.
- Modified ncio mechanism to support
multiple ncio packages; this is so we
can have posixio and memio operating
at the same time.
- cleanup up a bunch of lint issues (unused variables, etc).
- Fix NCF-157 to modify DAP code to support
partial variable retrieval.
- Fix of NCF-154 to solve problem of ncgen
improperly processing data lists for variables
of size greater than 2**18 bytes.
- Fix ncgen processing of char variables that have
multiple unlimited dimensions.
- Partly fix Jira issue: NCF-145 (vlen issues).
- Benchmark program nc_test4/tst_ar4_*) requires arguments
and should only be invoked inside a shell
script; fixed so that they terminate cleanly
if invoked with no arguments.
- Fix the Doxygen processing so it will work
with make distcheck.
- Begin switchover to using an alternative to ncio.
- Begin support for in-memory (diskless) files.