glibc/sysdeps/powerpc/powerpc64
Adhemerval Zanella 71ae86478e PowerPC: memset optimization for POWER8/PPC64
This patch adds an optimized memset implementation for POWER8.  For
sizes from 0 to 255 bytes, a word/doubleword algorithm similar to
the POWER7-optimized one is used.

For sizes larger than 255 bytes, two strategies are used:

1. If the constant is nonzero, the memory is written with Altivec
   vector instructions;

2. If the constant is 0, dcbz instructions are used.  The loop is
   unrolled to clear 512 bytes at a time.

Using vector instructions increases throughput considerably, doubling
performance for sizes larger than 1024 bytes.  Unrolling the dcbz loop
also improves performance, doubling throughput for sizes larger than
8192 bytes.
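
The size-based dispatch described above can be sketched in portable C.
This is only an illustration of the selection logic, not the assembly
in the patch: the names sketch_memset, set_small, set_vector and
clear_blocks are hypothetical, the 16-byte block width is an assumption
standing in for a vector store, and plain memcpy calls stand in for the
Altivec and dcbz instructions.  The alignment handling that cache-block
instructions would need is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 255-byte threshold and the 512-byte clear step come from the
   commit message.  The 16-byte vector width is an assumption.  */
#define SMALL_LIMIT 255
#define VECTOR_BYTES 16
#define CLEAR_CHUNK 512

/* Hypothetical small-size path: doubleword stores, then a byte tail.  */
static void
set_small (unsigned char *dst, int c, size_t n)
{
  /* Replicate the fill byte into all eight bytes of a doubleword.  */
  uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;

  while (n >= sizeof pattern)
    {
      memcpy (dst, &pattern, sizeof pattern);
      dst += sizeof pattern;
      n -= sizeof pattern;
    }
  for (; n > 0; n--)
    *dst++ = (unsigned char) c;
}

/* Hypothetical stand-in for the vector path: one 16-byte store per
   iteration, with the remainder handed to the small-size path.  */
static void
set_vector (unsigned char *dst, int c, size_t n)
{
  unsigned char block[VECTOR_BYTES];

  for (size_t i = 0; i < VECTOR_BYTES; i++)
    block[i] = (unsigned char) c;

  while (n >= VECTOR_BYTES)
    {
      memcpy (dst, block, VECTOR_BYTES);
      dst += VECTOR_BYTES;
      n -= VECTOR_BYTES;
    }
  set_small (dst, c, n);
}

/* Hypothetical stand-in for the unrolled dcbz loop: clear 512 bytes
   per iteration, then finish the remainder with the small-size path.  */
static void
clear_blocks (unsigned char *dst, size_t n)
{
  static const unsigned char zeros[CLEAR_CHUNK];

  while (n >= CLEAR_CHUNK)
    {
      memcpy (dst, zeros, CLEAR_CHUNK);
      dst += CLEAR_CHUNK;
      n -= CLEAR_CHUNK;
    }
  set_small (dst, 0, n);
}

/* Pick a strategy from the size and the fill value.  */
void *
sketch_memset (void *s, int c, size_t n)
{
  unsigned char *dst = s;

  if (n <= SMALL_LIMIT)
    set_small (dst, c, n);
  else if ((unsigned char) c != 0)
    set_vector (dst, c, n);
  else
    clear_blocks (dst, n);
  return s;
}
```

Calling sketch_memset (buf, 0, 600) takes the block-clearing path,
while sketch_memset (buf, 0x5C, 300) takes the vector path and
sketch_memset (buf, 7, 13) takes the small-size path.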
2014-09-10 07:39:46 -04:00
970
a2
bits
cell
fpu
multiarch PowerPC: memset optimization for POWER8/PPC64 2014-09-10 07:39:46 -04:00
power4 PowerPC: multiarch bzero cleanup for PPC64 2014-09-10 07:39:46 -04:00
power5
power5+
power6 PowerPC: multiarch bzero cleanup for PPC64 2014-09-10 07:39:46 -04:00
power6x
power7 PowerPC: multiarch bzero cleanup for PPC64 2014-09-10 07:39:46 -04:00
power8 PowerPC: memset optimization for POWER8/PPC64 2014-09-10 07:39:46 -04:00
__longjmp-common.S
__longjmp.S
addmul_1.S
backtrace.c
bsd-_setjmp.S
bsd-setjmp.S
bzero.S
configure
configure.ac
crti.S
crtn.S
dl-dtprocnum.h
dl-irel.h
dl-machine.c
dl-machine.h
dl-trampoline.S
entry.h PowerPC: Fix gprof entry point for LE 2014-07-30 09:01:25 -03:00
ffsll.c
hp-timing.h Always provide HP_SMALL_TIMING_AVAIL 2014-07-03 08:38:36 -07:00
Implies
lshift.S
Makefile Remove HP_TIMING_DIFF_INIT and dl_hp_timing_overhead 2014-07-03 08:38:25 -07:00
memcpy.S
memset.S
mul_1.S
ppc-mcount.S
register-dump.h
rtld-memset.c
setjmp-common.S Remove unnecessary uses of NOT_IN_libc 2014-08-21 10:26:46 +05:30
setjmp.S
stackguard-macros.h
start.S
stpcpy.S
strchr.S
strcmp.S
strcpy.S
strlen.S
strncmp.S
submul_1.S
sysdep.h
tls-macros.h
tst-audit.h