glibc/sysdeps/powerpc
Pedro Franco de Carvalho 813c6ec808 powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.
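A minimal C sketch of how the two functions can share one copy routine. It assumes a hypothetical helper copy_core() that returns the address of the copied NUL. The real code is POWER assembly, and my_strcpy/my_stpcpy are illustrative names, not glibc symbols. strcpy returns the start of the destination, while stpcpy returns a pointer to the terminating NUL in the destination. Presumably this is what the extra code before returning computes.

```c
/* Hypothetical shared core: copies src, including its terminating NUL,
   into dest and returns a pointer to the copied NUL in dest. */
static char *copy_core(char *restrict dest, const char *restrict src)
{
    while ((*dest = *src) != '\0') {
        dest++;
        src++;
    }
    return dest;
}

/* strcpy semantics: return the start of the destination. */
char *my_strcpy(char *restrict dest, const char *restrict src)
{
    copy_core(dest, src);
    return dest;
}

/* stpcpy semantics: same copy, but return the end of the destination. */
char *my_stpcpy(char *restrict dest, const char *restrict src)
{
    return copy_core(dest, src);
}
```

For example, my_stpcpy(buf, "power") returns buf + 5, while my_strcpy(buf, "power") returns buf.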

On POWER9, in an experiment that ran the benchmarks five times, with
improvements averaged across the benchmark inputs (length, source
alignment and destination alignment), bench-strcpy improved by 5.23%
and bench-stpcpy improved by 6.59%.

On POWER10, bench-strcpy improved by 13.16% and bench-stpcpy by 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this costs a few extra cycles for the null string, in
   combination with the second change it improved performance for
   other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased the number of unrolled iterations in the main loop to 6.
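The following portable C sketch shows only the overall shape of such a routine: a short preamble, then a main loop that tests 16-byte blocks for a NUL, unrolled six times. It makes two substitutions. A byte loop stands in for the vector preamble, copying up to 15 bytes until the source is 16-byte aligned. A 64-bit bit trick stands in for the vector compare. The 16-byte block size, the helper names and the structure are assumptions, not the actual POWER9/10 code. Like other word-at-a-time string routines, it reads whole aligned blocks that may extend past the terminator. An aligned 16-byte read cannot cross a page boundary, but such reads fall outside strict ISO C object rules and may be flagged by sanitizers.

```c
#include <stdint.h>
#include <string.h>

/* Exact test: nonzero iff some byte of x is zero. */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Nonzero iff the 16-byte block starting at p contains a NUL byte. */
static int block_has_nul(const char *p)
{
    uint64_t lo, hi;
    memcpy(&lo, p, sizeof lo);
    memcpy(&hi, p + 8, sizeof hi);
    return has_zero_byte(lo) || has_zero_byte(hi);
}

/* One unrolled step: leave the loop if this block holds the NUL,
   otherwise copy all 16 bytes and advance both pointers. */
#define COPY_BLOCK_OR_EXIT()          \
    do {                              \
        if (block_has_nul(src))       \
            goto tail;                \
        memcpy(dest, src, 16);        \
        dest += 16;                   \
        src += 16;                    \
    } while (0)

/* Hypothetical stpcpy-style routine: returns a pointer to the NUL in dest. */
char *sketch_stpcpy(char *dest, const char *src)
{
    /* Preamble stand-in: copy at most 15 bytes one at a time until src
       is 16-byte aligned, returning early if the NUL is among them. */
    while (((uintptr_t)src & 15) != 0) {
        if ((*dest = *src) == '\0')
            return dest;
        dest++;
        src++;
    }

    /* Main loop: six unrolled block steps per iteration. */
    for (;;) {
        COPY_BLOCK_OR_EXIT();
        COPY_BLOCK_OR_EXIT();
        COPY_BLOCK_OR_EXIT();
        COPY_BLOCK_OR_EXIT();
        COPY_BLOCK_OR_EXIT();
        COPY_BLOCK_OR_EXIT();
    }

tail:
    /* The current block contains the NUL: finish byte by byte. */
    while ((*dest = *src) != '\0') {
        dest++;
        src++;
    }
    return dest;
}
```

Unrolling does not skip the NUL test. Every block is still checked, and only the loop-back branch is taken less often.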

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
2021-07-01 17:58:53 -03:00
bits
fpu
nofpu
nptl nptl: Move pthread_spin_trylock into libc 2021-04-23 17:06:48 +02:00
power4
power6
powerpc32
powerpc64 powerpc: optimize strcpy/stpcpy for POWER9/10 2021-07-01 17:58:53 -03:00
sys/platform
abort-instr.h
atomic-machine.h
cpu-features.c
cpu-features.h
dl-procinfo.c
dl-procinfo.h
dl-tls.c
dl-tls.h
dl-tunables.list
elf-initfini.h
ffs.c
fpu_control.h
gccframe.h
hwcapinfo.c
hwcapinfo.h
ifunc-sel.h
jmpbuf-offsets.h
jmpbuf-unwind.h
ldsodefs.h
libc-tls.c
locale-defines.sym
longjmp.c
machine-gmon.h
Makefile Remove stale references to libdl.a 2021-06-09 19:14:02 +02:00
math-tests-snan-cast.h
memusage.h
mod-cache-ppc.c
mod-tlsopt-powerpc.c
mp_clz_tab.c
novmx-longjmp.c
novmx-sigjmp.c
novmxsetjmp.h
preconfigure
preconfigure.ac
rtld-global-offsets.sym
sigjmp.c
sotruss-lib.c
stackinfo.h
sysdep.h
test-arith.c
test-arithf.c
test-get_hwcap-static.c
test-get_hwcap.c
test-gettimebase.c
tls-macros.h
tst-cache-ppc-static-dlopen.c
tst-cache-ppc-static.c
tst-cache-ppc.c
tst-set_ppr.c
tst-stack-align.h Properly check stack alignment [BZ #27901] 2021-05-24 07:42:12 -07:00
tst-tlsifunc-static.c
tst-tlsifunc.c
tst-tlsopt-powerpc.c
Versions