Add a configurable "test case" that will create
and then open a file with a lot of metadata.
The test is configurable to determine the parameters
for the created metadata.
Dennis Heimbigner 2018-03-17 16:25:13 -06:00
parent b70f67a891
commit cc136cad08
6 changed files with 57 additions and 6 deletions

cf

@@ -118,6 +118,7 @@ FLAGS="$FLAGS --enable-logging"
#FLAGS="$FLAGS --disable-silent-rules"
#FLAGS="$FLAGS --with-testservers=remotestserver.localhost:8083"
FLAGS="$FLAGS --disable-filter-testing"
#FLAGS="$FLAGS --enable-metadata-perf"
if test "x$PAR4" != x1 ; then
FLAGS="$FLAGS --disable-parallel4"


@@ -1208,6 +1208,7 @@ AM_CONDITIONAL(BUILD_DISKLESS, [test x$enable_diskless = xyes])
AM_CONDITIONAL(BUILD_MMAP, [test x$enable_mmap = xyes])
AM_CONDITIONAL(BUILD_DOCS, [test x$enable_doxygen = xyes])
AM_CONDITIONAL(SHOW_DOXYGEN_TAG_LIST, [test x$enable_doxygen_tasks = xyes])
AM_CONDITIONAL(ENABLE_METADATA_PERF, [test x$enable_metadata_perf = xyes])
# If the machine doesn't have a long long, and we want netCDF-4, then
# we've got problems!
@@ -1295,6 +1296,14 @@ if test "x$enable_jna" = xyes ; then
AC_DEFINE([JNA], [1], [if true, include JNA bug fix])
fi
# Control large metadata performance test
AC_MSG_CHECKING([whether large metadata performance testing should be run])
AC_ARG_ENABLE([metadata-perf],
[AS_HELP_STRING([--enable-metadata-perf],
[Test performance of nc_create and nc_open on large metadata])])
test "x$enable_metadata_perf" = xyes || enable_metadata_perf=no
AC_MSG_RESULT($enable_metadata_perf)
# Control filter test/example
AC_MSG_CHECKING([whether filter testing should be run])
AC_ARG_ENABLE([filter-testing],


@@ -347,6 +347,46 @@ for(i=0;i<ncindexsize(grp->children);i++) {
\endcode
In this case, the iteration is by index into the underlying vector.
\section Sperf Performance
The initial impetus for this change was to improve the performance
of netcdf-4 metadata loading by replacing linear searches with O(1)
searches.
In fact, this goal has not been met. It appears to be the case
that the metadata loading costs are entirely dominated by the
performance of the HDF5 library. The reason for this is that
the netcdf-c library loads all the metadata immediately
when a file is opened. This in turn means that all of the metadata
is immediately extracted from the underlying HDF5 file. So, there is
no opportunity for lazy loading to be used.
The remedies of which I can conceive are these.
1. Modify the netcdf-c library to also do lazy loading
2. Store a single metadata object into the file so it can
be read at one shot. This object would then be processed
in-memory to construct the internal metadata. The costs for
this approach are space in the file plus the need to keep it
consistent with the actual metadata stored by HDF5.
It should be noted that this change does have a measurable effect.
Using gprof, one can see that in the original code the obj_list_add
function was the dominant function, accounting for a large percentage (about 20%).
With the new code, the function call distribution is much more
even, with no function taking more than 4-5%.
Some other observations:
1. The utf8 code now shows up as taking about 4%. Given that most names
are straight ASCII, it might pay to try to optimize for this to avoid
invoking the utf8 processing code.
2. In the new code, attribute processing appears to take up a lot of the
time. This, however, might be an artifact of the test cases.
3. There is a small performance improvement from avoiding walking the linked
list. It appears that creating a file is about 10% faster and opening a file
is also about 10% faster.
\section Snotes_and_warnings Notes and Warning
1. NCindex is currently not used for enum constants and compound fields.


@@ -119,6 +119,11 @@ tst_parallel4 tst_nc4perf tst_mode tst_simplerw_coll_r
TESTS += run_par_test.sh
endif
if ENABLE_METADATA_PERF
check_PROGRAMS += bigmeta openbigmeta
TESTS += perftest.sh
endif
EXTRA_DIST = run_par_test.sh run_bm.sh run_bm_test1.sh \
run_bm_test2.sh run_bm_radar_2D.sh run_bm_radar_2D_compression1.sh \
run_par_bm_test.sh run_bm_elena.sh run_par_bm_radar_2D.sh \
@@ -131,7 +136,7 @@ tst_put_vars_two_unlim_dim.c tst_empty_vlen_unlim.c \
run_empty_vlen_test.sh ref_hdf5_compat1.nc ref_hdf5_compat2.nc \
ref_hdf5_compat3.nc tst_misc.sh tdset.h5 tst_szip.sh ref_szip.h5 \
ref_szip.cdl tst_filter.sh bzip2.cdl filtered.cdl unfiltered.cdl \
ref_bzip2.c findplugin.in
ref_bzip2.c findplugin.in perftest.sh
CLEANFILES = tst_mpi_parallel.bin cdm_sea_soundings.nc bm_chunking.nc \
bm_radar.nc bm_radar1.nc radar_3d_compression_test.txt \
@@ -145,7 +150,7 @@ tst_interops2.h4 tst_h5_endians.nc tst_h4_lendian.h4 test.nc \
tst_atts_string_rewrite.nc tst_empty_vlen_unlim.nc \
tst_empty_vlen_lim.nc tst_parallel4_simplerw_coll.nc \
tst_fill_attr_vanish.nc tst_rehash.nc testszip.nc test.h5 \
szip_dump.cdl
szip_dump.cdl perftest.txt bigmeta.nc
DISTCLEANFILES = findplugin.sh


@@ -57,7 +57,6 @@ if test "x$NOP" != x1 ; then
echo "***Testing url prefix parameters"
buildurl $PREFIX ""
# Invoke ncdump to extract the URL
echo "command: ${NCDUMP} -h $url"
${NCDUMP} -h "$url" >./tmp_testurl 2> ./errtmp_testurl
if test "x${SHOW}" = x1 ; then cat ./tmp_testurl ; fi
fi
@@ -67,7 +66,6 @@ if test "x$NOS" != x1 ; then
echo "***Testing url suffix parameters"
buildurl "" $SUFFIX
# Invoke ncdump to extract the URL
echo "command: ${NCDUMP} -h $url"
${NCDUMP} -h "$url" >./tmp_testurl 2> ./errtmp_testurl
if test "x${SHOW}" = x1 ; then cat ./tmp_testurl ; fi
fi
@@ -77,7 +75,6 @@ if test "x$NOB" != x1 ; then
echo "***Testing url prefix+suffix parameters"
buildurl $BOTHP $BOTHS
# Invoke ncdump to extract the URL
echo "command: ${NCDUMP} -h $url"
${NCDUMP} -h "$url" >./tmp_testurl 2> ./errtmp_testurl
if test "x${SHOW}" = x1 ; then cat ./tmp_testurl ; fi
fi


@@ -73,7 +73,6 @@ for x in ${TESTSET} ; do
if test "x${t}" = "x${x}" ; then isxfail=1; fi
done
ok=1
echo command: ${VALGRIND} ${NCDUMP} ${FLAGS} "${url}"
if ${VALGRIND} ${NCDUMP} ${FLAGS} "${url}" | sed 's/\\r//g' > ${x}.dmp ; then ok=$ok; else ok=0; fi
# compare with expected
if diff -w ${EXPECTED}/${x}.dmp ${x}.dmp ; then ok=$ok; else ok=0; fi