glibc/sysdeps/aarch64
Siddhesh Poyarekar db725a458e aarch64,falkor: Ignore prefetcher tagging for smaller copies
For small and medium-sized copies, the effect of hardware
prefetching is not as dominant as instruction-level parallelism.
Hence it makes more sense to load data into multiple registers than to
try to route the loads to the same prefetch unit.  This is also the
case for the loop exit, where we are unable to latch on to the same
prefetch unit anyway, so it makes more sense to load data in parallel.
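
As a rough illustration of the idea above (not the code in
memcpy_falkor.S, which is AArch64 assembly), the C sketch below copies
a 64-byte tail by issuing four independent 16-byte loads into separate
temporaries before any store.  The names chunk16 and copy_tail64 are
hypothetical, and whether a compiler keeps each temporary in its own
register is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte chunk standing in for one 128-bit register. */
typedef struct { uint8_t b[16]; } chunk16;

/* Copy exactly 64 bytes from src to dst.  All four loads go into
   distinct temporaries and none depends on another, so they can be
   issued in parallel instead of being serialized through a single
   register.  The stores happen only after every load is done.  */
static void copy_tail64(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);

    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    chunk16 a, b, c, e;

    /* Independent loads. */
    memcpy(&a, s,      sizeof a);
    memcpy(&b, s + 16, sizeof b);
    memcpy(&c, s + 32, sizeof c);
    memcpy(&e, s + 48, sizeof e);

    /* Stores, after all loads have been issued. */
    memcpy(d,      &a, sizeof a);
    memcpy(d + 16, &b, sizeof b);
    memcpy(d + 32, &c, sizeof c);
    memcpy(d + 48, &e, sizeof e);
}
```

The sketch only shows the dependency structure; it says nothing about
the performance of this C code on Falkor or any other core.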

The performance results are a bit mixed with memcpy-random, with
numbers jumping between -1% and +3%, i.e. the numbers don't seem
repeatable.  memcpy-walk sees a 70% improvement (i.e. > 2x) for 128
bytes, and that improvement shrinks as the impact of the tail copy
decreases in comparison to the loop.

	* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor):
	Use multiple registers to copy data in loop tail.
2018-05-11 00:11:52 +05:30
..
bits
fpu Move math_opt_barrier, math_force_eval to separate math-barriers.h. 2018-05-09 19:45:47 +00:00
multiarch aarch64,falkor: Ignore prefetcher tagging for smaller copies 2018-05-11 00:11:52 +05:30
nptl hurd: add gscope support 2018-03-11 13:06:33 +01:00
soft-fp
__longjmp.S
abort-instr.h
atomic-machine.h
bsd-_setjmp.S
bsd-setjmp.S
configure
configure.ac
crti.S
crtn.S
dl-irel.h
dl-link.sym
dl-machine.h elf: Unify symbol address run-time calculation [BZ #19818] 2018-04-04 23:09:37 +01:00
dl-sysdep.h
dl-tls.h
dl-tlsdesc.h
dl-tlsdesc.S
dl-trampoline.S
dl-tunables.list
Implies
jmpbuf-offsets.h
jmpbuf-unwind.h
ldsodefs.h
libc-tls.c
libm-test-ulps [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs 2018-04-03 16:52:16 +01:00
libm-test-ulps-name
linkmap.h
machine-gmon.h
Makefile
math-tests.h
mcount.c
memchr.S
memcmp.S aarch64: Fix branch target to loop16 2018-03-06 23:01:02 +05:30
memcpy.S
memmove.S
memset-reg.h
memset.S
memusage.h
preconfigure
rawmemchr.S
setjmp.S
sotruss-lib.c
stackinfo.h
start.S
stpcpy.S
strchr.S
strchrnul.S
strcmp.S aarch64/strcmp: fix misaligned loop jump target 2018-02-22 23:48:14 +05:30
strcpy.S
string_private.h
strlen.S
strncmp.S aarch64/strncmp: Use lsr instead of mov+lsr 2018-03-15 08:06:21 +05:30
strnlen.S
strrchr.S
sysdep.h
tls-macros.h
tlsdesc.c
tlsdesc.sym
tst-audit.h
Versions