glibc/sysdeps/x86_64
Noah Goldstein 642933158e x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions
Optimizations are:
    1. Use more overlapping stores to avoid branches.
    2. Reduce how unrolled the aligning copies are (this is more of a
       code-size save, its a negative for some sizes in terms of
       perf).
    3. For st{r|p}n{cat|cpy} re-order the branches to minimize the
       number that are taken.

Performance Changes:

    Times are from N = 10 runs of the benchmark suite and are
    reported as geometric mean of all ratios of
    New Implementation / Old Implementation.

    strcat-avx2      -> 0.998
    strcpy-avx2      -> 0.937
    stpcpy-avx2      -> 0.971

    strncpy-avx2     -> 0.793
    stpncpy-avx2     -> 0.775

    strncat-avx2     -> 0.962

Code Size Changes:
    function         -> Bytes New / Bytes Old -> Ratio

    strcat-avx2      ->  685 / 1639 -> 0.418
    strcpy-avx2      ->  560 /  903 -> 0.620
    stpcpy-avx2      ->  592 /  939 -> 0.630

    strncpy-avx2     -> 1176 / 2390 -> 0.492
    stpncpy-avx2     -> 1268 / 2438 -> 0.520

    strncat-avx2     -> 1042 / 2563 -> 0.407

Notes:
    1. Because of the significant difference between the
       implementations they are split into three files.

           strcpy-avx2.S    -> strcpy, stpcpy, strcat
           strncpy-avx2.S   -> strncpy
           strncat-avx2.S    > strncat

       I couldn't find a way to merge them without making the
       ifdefs incredibly difficult to follow.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
2022-11-08 19:22:33 -08:00
..
64
fpu x86: Remove .tfloat usage 2022-10-03 14:03:21 -03:00
multiarch x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
nptl elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
x32 Fix build with GCC 13 _FloatN, _FloatNx built-in functions 2022-10-31 23:20:08 +00:00
____longjmp_chk.S
__longjmp.S Introduce <pointer_guard.h>, extracted from <sysdep.h> 2022-10-18 17:03:55 +02:00
_mcount.S
abort-instr.h
add_n.S
addmul_1.S
bsd-_setjmp.S
bsd-setjmp.S
configure
configure.ac
crti.S
crtn.S
dl-hwcaps-subdirs.c
dl-irel.h
dl-machine.h x86-64: Only define used SSE/AVX/AVX512 run-time resolvers 2022-06-27 14:17:52 -07:00
dl-procinfo.c
dl-runtime.h
dl-tls.c
dl-tls.h
dl-tlsdesc.h
dl-tlsdesc.S
dl-trampoline.h x86-64: Small improvements to dl-trampoline.S 2022-06-29 19:47:52 -07:00
dl-trampoline.S x86-64: Small improvements to dl-trampoline.S 2022-06-29 19:47:52 -07:00
ffs.c
ffsll.c
ifuncmain8.c
ifuncmod8.c
Implies
isa-default-impl.h x86: Remove faulty sanity tests for RTLD build with no multiarch 2022-06-23 11:14:08 -07:00
isa.h
jmpbuf-offsets.h
jmpbuf-unwind.h Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources 2022-10-18 17:04:10 +02:00
l10nflist.c
link-defines.sym
locale-defines.sym
localplt.data elf: Rework exception handling in the dynamic loader [BZ #25486] 2022-11-03 09:39:31 +01:00
lshift.S
machine-gmon.h
Makefile x86_64: Remove platform directory library loading test 2022-10-06 07:59:48 -03:00
memchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
memcmp-isa-default-impl.h x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcmp.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcmpeq.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcpy_chk.S
memcpy.S
memmove_chk.S
memmove.S x86: Add support for building {w}memmove{_chk} with explicit ISA level 2022-07-05 16:42:42 -07:00
mempcpy_chk.S
mempcpy.S
memrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
memset_chk.S
memset.S x86: Add support for building {w}memset{_chk} with explicit ISA level 2022-07-05 16:42:42 -07:00
mp_clz_tab.c
mul_1.S
preconfigure
preconfigure.ac
rawmemchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
rshift.S
rtld-offsets.sym
setjmp.S Introduce <pointer_guard.h>, extracted from <sysdep.h> 2022-10-18 17:03:55 +02:00
stackguard-macros.h
stackinfo.h nptl: x86_64: Use same code for CURRENT_STACK_FRAME and stackinfo_get_sp 2022-08-31 09:04:27 -03:00
start.S
stpcpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
stpncpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strcasecmp_l-nonascii.c
strcasecmp_l.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcasecmp.S
strcat.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strchr-isa-default-impl.h x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strchrnul.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strcspn-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strcspn.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncase_l-nonascii.c
strncase_l.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncase.S
strncat.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strncmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strnlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strpbrk-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strpbrk.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strspn-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strspn.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
sub_n.S
submul_1.S
sysdep.h x86-64: Move LP_SIZE definition to its own header 2022-10-18 17:02:08 +02:00
tls_get_addr.S
tlsdesc.c
tlsdesc.sym
tst-audit3.c
tst-audit4-aux.c
tst-audit4.c
tst-audit5.c
tst-audit6.c
tst-audit7.c
tst-audit10-aux.c
tst-audit10.c
tst-audit.h
tst-auditmod3a.c
tst-auditmod3b.c
tst-auditmod4a.c
tst-auditmod4b.c
tst-auditmod5a.c
tst-auditmod5b.c
tst-auditmod6a.c
tst-auditmod6b.c
tst-auditmod6c.c
tst-auditmod7a.c
tst-auditmod7b.c
tst-auditmod10a.c
tst-auditmod10b.c
tst-avx512-aux.c
tst-avx512.c
tst-avx512mod.c
tst-avx-aux.c
tst-avx.c
tst-avxmod.c
tst-glibc-hwcaps.c
tst-platform-1.c
tst-platformmod-1.c
tst-platformmod-2.c
tst-quad1.c
tst-quad1pie.c
tst-quad2.c
tst-quad2pie.c
tst-quadmod1.S
tst-quadmod1pie.S
tst-quadmod2.S
tst-quadmod2pie.S
tst-rsi-strlen.c
tst-rsi-wcslen.c
tst-split-dynreloc.c
tst-split-dynreloc.lds
tst-sse.c
tst-ssemod.c
tst-x86-64-tls-1.c
varshift.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
Versions
wcschr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscpy-generic.c x86: Add support to build wcscpy with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscpy.S x86: Add support to build wcscpy with explicit ISA level 2022-07-16 03:07:59 -07:00
wcslen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsncmp-generic.c x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsncmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsnlen-generic.c x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsnlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wmemchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
wmemcmp.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
wmemset_chk.S
wmemset.S
wordcopy.c
x86-lp_size.h x86-64: Move LP_SIZE definition to its own header 2022-10-18 17:02:08 +02:00