glibc/sysdeps
Siddhesh Poyarekar 70c97f8493 aarch64,falkor: Ignore prefetcher hints for memmove tail
The tails of the copy loops are unable to train the falkor hardware
prefetcher because they load from a different base compared to the hot
loop.  In this case avoid serializing the instructions by loading the
data into different registers.  Also peel the last iteration of the
loop into the tail (and have it use different registers too) since it
gives better performance for medium sizes.
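
The structure described above can be sketched in C.  This is only an
illustration under stated assumptions, not the code in
memmove_falkor.S, which is hand-written AArch64 assembly.  The names
chunk16, load16, store16 and copy_peeled_tail are hypothetical, the
buffers are assumed not to overlap, and n is assumed to be at least 64.
A C compiler chooses its own register allocation, so the sketch shows
the shape of the loop and tail rather than guaranteeing any particular
register use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte vector register.  */
typedef struct { uint64_t lo, hi; } chunk16;

static chunk16 load16(const unsigned char *p)
{
    chunk16 c;
    memcpy(&c, p, sizeof c);
    return c;
}

static void store16(unsigned char *p, chunk16 c)
{
    memcpy(p, &c, sizeof c);
}

/* Forward copy of n >= 64 bytes between non-overlapping buffers.  */
void copy_peeled_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    const unsigned char *src_end = src + n;
    unsigned char *dst_end = dst + n;

    /* Hot loop: a single temporary, addressed from the moving base.
       It stops once 64 bytes or fewer remain, which leaves what would
       have been the last iteration to the tail.  */
    while ((size_t)(src_end - src) > 64) {
        chunk16 t = load16(src);
        store16(dst, t);
        src += 16;
        dst += 16;
    }

    /* Tail: addressed from the end of the buffer, i.e. a different
       base than the loop.  Four independent temporaries mean no load
       has to wait for an earlier store to free its destination.  */
    chunk16 a = load16(src_end - 64);
    chunk16 b = load16(src_end - 48);
    chunk16 c = load16(src_end - 32);
    chunk16 d = load16(src_end - 16);
    store16(dst_end - 64, a);
    store16(dst_end - 48, b);
    store16(dst_end - 32, c);
    store16(dst_end - 16, d);
}
```

The tail may rewrite a few bytes the loop already copied.  That is
harmless here because the buffers do not overlap and both passes write
the same values.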

This results in performance improvements of between 3% and 20% over
the current falkor implementation for sizes between 128 bytes and 1K
on the memmove-walk benchmark, thus mostly covering the regressions
seen against the generic memmove.

	* sysdeps/aarch64/multiarch/memmove_falkor.S
	(__memmove_falkor): Use multiple registers to move data in
	loop tail.
2018-05-11 00:08:02 +05:30
aarch64
alpha
arm
generic
gnu
hppa
htl
hurd
i386
ia64
ieee754
init_array
m68k
mach
microblaze
mips
nios2
nptl
posix
powerpc
pthread
riscv
s390
sh
sparc
unix
wordsize-32
wordsize-64
x86
x86_64