netcdf-c/nc_test/test_byterange.sh

#!/bin/sh
if test "x$srcdir" = x ; then srcdir=`pwd`; fi
. ../test_common.sh
set -e
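# ../test_common.sh is expected to provide the variables used below:
#   NCDUMP, NCGEN, execdir, srcdir  - tool paths and build/source directories
#   FEATURE_THREDDSTEST, FEATURE_S3, FEATURE_S3TESTS, FEATURE_HDF5
#                                   - optional-feature flags
# (Their exact provenance is assumed here; see test_common.sh.)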
# Note: thredds-test is currently down and the URLs need to be replaced
# Test URLs
if test "x$FEATURE_THREDDSTEST" = x1 ; then
URL3="https://thredds-test.unidata.ucar.edu/thredds/fileServer/pointData/cf_dsg/example/point.nc#mode=bytes"
URL4b="https://thredds-test.unidata.ucar.edu/thredds/fileServer/irma/metar/files/METAR_20170910_0000.nc#bytes"
fi
if test "x$FEATURE_S3TESTS" = xyes ; then
URL4a="https://s3.us-east-1.amazonaws.com/noaa-goes16/ABI-L1b-RadC/2017/059/03/OR_ABI-L1b-RadC-M3C13_G16_s20170590337505_e20170590340289_c20170590340316.nc#mode=bytes"
URL4c="s3://noaa-goes16/ABI-L1b-RadC/2017/059/03/OR_ABI-L1b-RadC-M3C13_G16_s20170590337505_e20170590340289_c20170590340316.nc#mode=bytes"
# Requires auth
URL3b="s3://unidata-zarr-test-data/byterangefiles/upload3.nc#bytes"
# Requires auth
URL4d="s3://unidata-zarr-test-data/byterangefiles/upload4.nc#bytes&aws.profile=unidata"
# Test alternate URL with no specified region
URL4e="http://noaa-goes16.s3.amazonaws.com/ABI-L1b-RadF/2022/001/18/OR_ABI-L1b-RadF-M6C01_G16_s20220011800205_e20220011809513_c20220011809562.nc#mode=bytes,s3"
fi
URL4f="https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes"
# Do not use unless we know it has some permanence (note the 'testing' segment in the URL)
URL4x="https://s3.us-west-2.amazonaws.com/coawst-public/testing/HadCRUT.4.6.0.0.median.nc#mode=bytes,&aws.profile=none"
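# The URLs above exercise the byte-range access syntax: appending the fragment
# "#mode=bytes" (or the shorthand "#bytes") to an http/https or s3 URL makes the
# library read the remote file via HTTP range requests; extra fragment
# parameters such as "&aws.profile=unidata" select the AWS credentials to use.
# Illustration only (not executed by this test), assuming ncdump is on the PATH:
#   ncdump -h "https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc#mode=bytes"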
echo ""
testsetup() {
U=$1
# Create and upload test files
if test "x$FEATURE_S3TESTS" = xyes ; then
rm -f upload4.nc upload3.nc
${execdir}/../nczarr_test/s3util clear -u ${U} -k /byterangefiles
${NCGEN} -lb -3 ${srcdir}/nc_enddef.cdl
mv nc_enddef.nc upload3.nc
${execdir}/../nczarr_test/s3util upload -u ${U} -k /byterangefiles/upload3.nc -f upload3.nc
if test "x$FEATURE_HDF5" = xyes ; then
${NCGEN} -lb -4 ${srcdir}/nc_enddef.cdl
mv nc_enddef.nc upload4.nc
${execdir}/../nczarr_test/s3util upload -u ${U} -k /byterangefiles/upload4.nc -f upload4.nc
fi
fi
rm -f tst_http_nc3.cdl tst_http_nc4?.cdl
}
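# testcleanup <bucket-url>
# Remove the local upload files and, when S3 testing is enabled, clear the
# /byterangefiles area of the bucket again.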
testcleanup() {
U=$1
rm -f upload4.nc upload3.nc
if test "x$FEATURE_S3TESTS" = xyes ; then
${execdir}/../nczarr_test/s3util clear -u ${U} -k /byterangefiles
fi
}
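# testbytes <tag> <expected-format> <url>
# Verify that `ncdump -k` reports <expected-format> (e.g. classic or netCDF-4)
# for the byte-range URL, then dump its header to tst_http_<tag>.cdl and
# compare against the reference file ref_tst_http_<tag>.cdl.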
testbytes() {
TAG="$1"
EXPECTED="$2"
U="$3"
K=`${NCDUMP} -k "$U" | tr -d '\r'`
if test "x$K" != "x$EXPECTED" ; then
echo "test_http: -k flag mismatch: expected=$EXPECTED have=$K"
exit 1
fi
rm -f tst_http_$TAG.cdl
# Now test that at least the metadata (header) can be read
${NCDUMP} -h "$U" >tst_http_$TAG.cdl
# compare
diff -wb tst_http_$TAG.cdl ${srcdir}/ref_tst_http_$TAG.cdl
}
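# tests3auth <tag> <expected-format> <url>
# Same format check as testbytes, but for URLs that require AWS credentials;
# the whole file is dumped with the dataset name forced to nc_enddef (-n) and
# compared against ${srcdir}/nc_enddef.cdl, from which the uploads were built.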
tests3auth() {
TAG="$1"
EXPECTED="$2"
U="$3"
K=`${NCDUMP} -k "$U" | tr -d '\r'`
if test "x$K" != "x$EXPECTED" ; then
echo "test_http: -k flag mismatch: expected=$EXPECTED have=$K"
exit 1
fi
rm -f tmp_${TAG}.cdl
# Now test that the whole file (header and data) can be read
${NCDUMP} -n nc_enddef "$U" >tmp_${TAG}.cdl
# compare
diff -wb tmp_$TAG.cdl ${srcdir}/nc_enddef.cdl
}
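# Main driver: create and upload the auth-protected test files, run whichever
# remote-read cases the configured features allow, then clean up the uploads.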
testsetup https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data
echo "*** Testing reading NetCDF-3 file with http"
if test "x$FEATURE_THREDDSTEST" = x1 ; then
echo "***Test remote classic file"
testbytes nc3 classic "$URL3"
fi
if test "x$FEATURE_HDF5" = xyes ; then
echo "***Test remote netcdf-4 files: non-s3"
if test "x$FEATURE_THREDDSTEST" = x1 ; then
testbytes nc4b netCDF-4 "$URL4b"
fi
testbytes nc4f netCDF-4 "$URL4f"
fi
if test "x$FEATURE_S3" = xyes ; then
echo "***Test remote netcdf-3 file: s3 auth"
tests3auth nc3b classic "$URL3b"
fi
if test "x$FEATURE_S3" = xyes && test "x$FEATURE_HDF5" = xyes ; then
echo "***Test remote netdf-4 file: s3"
testbytes nc4a netCDF-4 "$URL4a"
echo "***Test remote netcdf-4 file: s3"
testbytes nc4c netCDF-4 "$URL4c"
echo "***Test remote netcdf-4 file: s3 auth"
tests3auth nc4d netCDF-4 "$URL4d"
echo "***Test remote netcdf-4 file: s3 noauth"
testbytes nc4e netCDF-4 "$URL4e"
fi
# Cleanup
testcleanup https://s3.us-east-1.amazonaws.com/unidata-zarr-test-data
exit