gcc/libgomp/testsuite/Makefile.am
Thomas Schwinge 6c3b30ef9e Support parallel testing in libgomp, part II [PR66005]
..., and enable if 'flock' is available for serializing execution testing.

Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:

    $ uname -srvi
    Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         32 model name      : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available.  For both cases, default plus '-m32' variant.

    $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Case (a), baseline:

    6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
    6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k

This is what people have been complaining about, rightly so, in
<https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and
elsewhere.

Case (a), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k

Yay!

Case (b), baseline; 2+ h:

    7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k

Case (b), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k

On my Dell Precision 7530 laptop:

    $ uname -srvi
    Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
    $ nvidia-smi -L
    GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)

... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.

    $ \time make check-target-libgomp

Case (c), baseline; roughly half of case (a) (just one variant):

    1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
    1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k

Case (c), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
    1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
    2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
    2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
    2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k

Case (d), baseline (compared to case (b): only nvptx offloading compilation,
but also nvptx offloading execution); ~1 h:

    2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
    2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k

Case (d), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
    3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k

Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90').  This effectively
stalls progress for almost 5 min: quickly other executions test cases queue up
on the lock for all parallel slots.  That's working as expected; just noting
this as it accordingly does skew the wall time numbers.

	PR testsuite/66005
	libgomp/
	* configure.ac: Look for 'flock'.
	* testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing.
	* testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here...
	* testsuite/lib/libgomp.exp: ... but here, instead.
	(libgomp_load): Override for parallel testing.
	* testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
	* configure: Regenerate.
	* Makefile.in: Regenerate.
	* testsuite/Makefile.in: Regenerate.
2023-05-15 12:11:18 +02:00

147 lines
5.9 KiB
Makefile

## Process this file with automake to produce Makefile.in.
AUTOMAKE_OPTIONS = foreign
# May be used by various substitution variables.
gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
EXPECT = $(shell if test -f $(top_builddir)/../expect/expect; then \
echo $(top_builddir)/../expect/expect; else echo expect; fi)
_RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir
PWD_COMMAND = $${PWDCMD-pwd}
EXTRA_DEJAGNU_SITE_CONFIG = libgomp-site-extra.exp
# Instead of directly in ../testsuite/libgomp-test-support.exp.in, the
# following variables have to be "routed through" this Makefile, for expansion
# of the several (Makefile) variables used therein.
libgomp-test-support.exp: libgomp-test-support.pt.exp Makefile
cp $< $@.tmp
echo >> $@.tmp \
'set offload_additional_options "$(offload_additional_options)"'
echo >> $@.tmp \
'set offload_additional_lib_paths "$(offload_additional_lib_paths)"'
mv $@.tmp $@
site.exp: Makefile $(EXTRA_DEJAGNU_SITE_CONFIG)
@echo 'Making a new site.exp file ...'
@echo '## these variables are automatically generated by make ##' >site.tmp
@echo '# Do not edit here. If you wish to override these values' >>site.tmp
@echo '# edit the last section' >>site.tmp
@echo 'set srcdir "$(srcdir)"' >>site.tmp
@echo "set objdir `pwd`" >>site.tmp
@echo 'set build_alias "$(build_alias)"' >>site.tmp
@echo 'set build_triplet $(build_triplet)' >>site.tmp
@echo 'set host_alias "$(host_alias)"' >>site.tmp
@echo 'set host_triplet $(host_triplet)' >>site.tmp
@echo 'set target_alias "$(target_alias)"' >>site.tmp
@echo 'set target_triplet $(target_triplet)' >>site.tmp
@list='$(EXTRA_DEJAGNU_SITE_CONFIG)'; for f in $$list; do \
echo "## Begin content included from file $$f. Do not modify. ##" \
&& cat `test -f "$$f" || echo '$(srcdir)/'`$$f \
&& echo "## End content included from file $$f. ##" \
|| exit 1; \
done >> site.tmp
@echo "## End of auto-generated content; you can edit from here. ##" >> site.tmp
@if test -f site.exp; then \
sed -e '1,/^## End of auto-generated content.*##/d' site.exp >> site.tmp; \
fi
@-rm -f site.bak
@test ! -f site.exp || mv site.exp site.bak
@mv site.tmp site.exp
%/site.exp: site.exp
-@test -d $* || mkdir $*
@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
@objdir=`${PWD_COMMAND}`/$*; \
sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
-e "s|^set objdir .*$$|set objdir $$objdir|" \
site.exp > $*/site.exp.tmp
@-rm -f $*/site.bak
@test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
@mv $*/site.exp.tmp $*/site.exp
check_p_numbers0:=1 2 3 4 5 6 7 8 9
check_p_numbers1:=0 $(check_p_numbers0)
check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers1)))
check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers3)))
check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers5)))
check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) $(check_p_numbers6)
# If unable to serialize execution testing, use just one parallel slot.
gcc_test_parallel_slots:=$(if $(FLOCK),$(if $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),19),1)
check_p_subdirs=$(wordlist 1,$(gcc_test_parallel_slots),$(check_p_numbers))
check_DEJAGNU_libgomp_targets = $(addprefix check-DEJAGNUlibgomp,$(check_p_subdirs))
$(check_DEJAGNU_libgomp_targets): check-DEJAGNUlibgomp%: libgomp%/site.exp
check-DEJAGNU $(check_DEJAGNU_libgomp_targets): check-DEJAGNU%: site.exp
$(if $*,@)AR="$(AR)"; export AR; \
RANLIB="$(RANLIB)"; export RANLIB; \
if [ -z "$*" ] && [ -n "$(filter -j%, $(MFLAGS))" ]; then \
rm -rf libgomp-parallel || true; \
mkdir libgomp-parallel; \
$(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_libgomp_targets); \
rm -rf libgomp-parallel || true; \
for idx in $(check_p_subdirs); do \
if [ -d libgomp$$idx ]; then \
mv -f libgomp$$idx/libgomp.sum libgomp$$idx/libgomp.sum.sep; \
mv -f libgomp$$idx/libgomp.log libgomp$$idx/libgomp.log.sep; \
fi; \
done; \
$(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh \
libgomp[0-9]*/libgomp.sum.sep > libgomp.sum; \
$(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh -L \
libgomp[0-9]*/libgomp.log.sep > libgomp.log; \
exit 0; \
fi; \
srcdir=`$(am__cd) $(srcdir) && pwd`; export srcdir; \
EXPECT=$(EXPECT); export EXPECT; \
runtest=$(_RUNTEST); \
if [ -z "$$runtest" ]; then runtest=runtest; fi; \
tool=libgomp; \
if [ -n "$*" ]; then \
if [ -f libgomp-parallel/finished ]; then rm -rf "$*"; exit 0; fi; \
GCC_RUNTEST_PARALLELIZE_DIR=`${PWD_COMMAND}`/libgomp-parallel; \
export GCC_RUNTEST_PARALLELIZE_DIR; \
cd "$*"; \
fi; \
if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
$$runtest $(AM_RUNTESTFLAGS) $(RUNTESTDEFAULTFLAGS) \
$(RUNTESTFLAGS); \
if [ -n "$*" ]; then \
touch $$GCC_RUNTEST_PARALLELIZE_DIR/finished; \
fi; \
else \
echo "WARNING: could not find \`runtest'" 1>&2; :;\
fi
distclean-DEJAGNU:
-rm -f site.exp site.bak
-l='$(PACKAGE)'; for tool in $$l; do \
rm -f $$tool.sum $$tool.log; \
done
distclean-am: distclean-DEJAGNU
check-am:
@if test -n "$(filter -j%, $(MFLAGS))"; then \
num_cpus=@CPU_COUNT@; \
if type -p getconf 2>/dev/null >/dev/null; then \
num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`; \
case "$$num_cpus" in \
'' | 0* | *[!0-9]*) num_cpus=@CPU_COUNT@;; \
esac; \
fi; \
if test $$num_cpus -gt 8 && test -z "$$OMP_NUM_THREADS"; then \
OMP_NUM_THREADS=8; export OMP_NUM_THREADS; \
echo @@@ libgomp OMP_NUM_THREADS adjusted to 8 because of parallel make check and too many CPUs; \
fi; \
fi; \
$(MAKE) $(AM_MAKEFLAGS) check-DEJAGNU
all-local: libgomp-test-support.exp
.PHONY: check-DEJAGNU distclean-DEJAGNU