glibc/sysdeps/aarch64
Wilco Dijkstra 922369032c [AArch64] Optimized memcmp.
This is an optimized memcmp for AArch64: a complete rewrite using a
different algorithm.  The previous version split into three cases:
both inputs aligned, inputs mutually aligned, and unaligned inputs,
which were handled with a byte loop.  The new version combines all
these cases, while small inputs of less than 8 bytes are handled
separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.
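
The steps described above can be sketched in portable C.  This is only an
illustration of the control flow, not the assembly in memcmp.S: the names
sketch_memcmp, load64 and cmp_bytes are hypothetical, and locating the
differing byte with a byte loop is an assumption made to keep the sketch
independent of endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address (hypothetical helper). */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-wise compare: used for short inputs and to locate a mismatch
   inside an 8-byte window (an assumption of this sketch). */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs of less than 8 bytes are handled separately. */
    if (n < 8)
        return cmp_bytes(a, b, n);

    const unsigned char *a_end = a + n, *b_end = b + n;

    /* First 8 bytes: unaligned loads are safe because n >= 8. */
    if (load64(a) != load64(b))
        return cmp_bytes(a, b, 8);

    /* Align the first input.  The 1..8 bytes skipped here were already
       compared above, so nothing is missed. */
    size_t skip = 8 - ((uintptr_t)a & 7);
    a += skip;
    b += skip;

    /* Main loop: a is 8-byte aligned, so at most the load from b is
       unaligned in each iteration. */
    while ((size_t)(a_end - a) > 8) {
        if (load64(a) != load64(b))
            return cmp_bytes(a, b, 8);
        a += 8;
        b += 8;
    }

    /* Last 8 bytes: unaligned loads that may overlap bytes already
       known to be equal. */
    if (load64(a_end - 8) != load64(b_end - 8))
        return cmp_bytes(a_end - 8, b_end - 8, 8);
    return 0;
}
```

Because any bytes re-read by the final overlapping load are already known
to be equal, the first mismatch found in that window is also the first
mismatch overall, so the sign of the result matches a plain byte loop.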

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

	* sysdeps/aarch64/memcmp.S (memcmp):
	Rewrite of optimized memcmp.
2017-08-10 17:00:38 +01:00
bits
fpu
multiarch memcpy_falkor: Fix code style in comments 2017-08-09 12:57:59 +05:30
nptl
soft-fp
__longjmp.S
abort-instr.h
atomic-machine.h
backtrace.c
bsd-_setjmp.S
bsd-setjmp.S
configure
configure.ac
crti.S
crtn.S
dl-irel.h
dl-link.sym
dl-machine.h
dl-sysdep.h
dl-tls.h
dl-tlsdesc.h
dl-tlsdesc.S [AArch64] Add more cfi annotations to tlsdesc entry points 2017-06-21 15:04:37 +01:00
dl-trampoline.S
dl-tunables.list tunables, aarch64: New tunable to override cpu 2017-06-30 22:58:39 +05:30
Implies
jmpbuf-offsets.h
jmpbuf-unwind.h
ldsodefs.h
libc-tls.c
libm-test-ulps
libm-test-ulps-name
linkmap.h
machine-gmon.h
Makefile
math-tests.h
mcount.c
memchr.S
memcmp.S [AArch64] Optimized memcmp. 2017-08-10 17:00:38 +01:00
memcpy.S
memmove.S
memset.S
memusage.h
preconfigure
rawmemchr.S
setjmp.S
sotruss-lib.c
stackinfo.h
start.S
stpcpy.S
strchr.S
strchrnul.S
strcmp.S
strcpy.S
string_private.h
strlen.S
strncmp.S
strnlen.S
strrchr.S
sysdep.h
tls-macros.h
tlsdesc.c
tlsdesc.sym
tst-audit.h
Versions