glibc/sysdeps/x86_64
Noah Goldstein c796418d00 x86: Optimize L(less_vec) case in memcmp-evex-movbe.S
No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change control flow so that the L(less_vec) case gets the fall-through.

Change 2) helps comparisons in the [0, 32] size range but comes at the cost
of comparisons in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to be the right tradeoff.
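
The idea behind change 1) can be modeled in portable C. The sketch below is
not the code from memcmp-evex-movbe.S; it is a scalar stand-in that assumes a
32-byte vector and a length in [0, 32]. The function name less_vec_model is
hypothetical. A mask with one bit per byte selects the lanes to load, so bytes
past the length are never touched. That property is what makes separate
page-cross and 0/1-length branches unnecessary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked 32-byte compare.
   Assumption: 0 <= len <= 32.  Not the actual glibc implementation.  */
static int
less_vec_model (const unsigned char *a, const unsigned char *b, size_t len)
{
  /* One mask bit per byte lane; the low `len` bits are set.  */
  uint32_t mask = (len >= 32) ? UINT32_MAX : (((uint32_t) 1 << len) - 1);

  for (unsigned int i = 0; i < 32; i++)
    {
      /* Disabled lanes are skipped, so no memory beyond `len` is read.
         This stands in for a masked vector load.  */
      if (((mask >> i) & 1) == 0)
        continue;

      /* The first differing byte decides the result, as in memcmp.  */
      if (a[i] != b[i])
        return (int) a[i] - (int) b[i];
    }

  /* All enabled lanes matched; this also covers len == 0.  */
  return 0;
}
```

A length of 0 produces an empty mask and falls straight through to the
"equal" result, with no dedicated branch for the empty case.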

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit abddd61de0)
2022-04-26 18:18:16 -07:00
64
fpu x86-64: Optimize load of all bits set into ZMM register [BZ #28252] 2022-04-26 18:18:15 -07:00
multiarch x86: Optimize L(less_vec) case in memcmp-evex-movbe.S 2022-04-26 18:18:16 -07:00
nptl nptl: Move pthread_spin_trylock into libc 2021-04-23 17:06:48 +02:00
x32 mcheck: Align struct hdr to MALLOC_ALIGNMENT bytes [BZ #28068] 2021-07-12 18:13:32 -07:00
____longjmp_chk.S
__longjmp.S
_mcount.S
abort-instr.h
add_n.S
addmul_1.S
bsd-_setjmp.S
bsd-setjmp.S
bzero.S
configure x86_64: Remove unneeded static PIE check for undefined weak diagnostic 2021-07-08 14:26:22 -07:00
configure.ac x86_64: Remove unneeded static PIE check for undefined weak diagnostic 2021-07-08 14:26:22 -07:00
crti.S
crtn.S
dl-hwcaps-subdirs.c <sys/platform/x86.h>: Remove the C preprocessor magic 2021-01-21 05:58:17 -08:00
dl-irel.h
dl-machine.h elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
dl-procinfo.c
dl-runtime.h elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
dl-tls.c elf: Use relaxed atomics for racy accesses [BZ #19329] 2021-05-11 17:16:37 +01:00
dl-tls.h
dl-tlsdesc.h x86_64: Remove lazy tlsdesc relocation related code 2021-04-15 09:47:47 +01:00
dl-tlsdesc.S x86_64: Remove lazy tlsdesc relocation related code 2021-04-15 09:47:47 +01:00
dl-trampoline.h elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
dl-trampoline.S
ffs.c
ffsll.c
htonl.S
ifuncmain8.c
ifuncmod8.c
Implies Remove dbl-64/wordsize-64 (part 2) 2021-01-07 15:26:26 +00:00
isa.h
jmpbuf-offsets.h
jmpbuf-unwind.h
l10nflist.c
link-defines.sym
locale-defines.sym
localplt.data mtrace: Wean away from malloc hooks 2021-07-22 18:38:06 +05:30
lshift.S
machine-gmon.h
Makefile Add a generic malloc test for MALLOC_ALIGNMENT 2021-07-09 06:39:30 -07:00
memchr.S x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974] 2021-06-23 14:13:03 -04:00
memcmp.S
memcpy_chk.S
memcpy.S
memmove_chk.S
memmove.S x86: Optimize memmove-vec-unaligned-erms.S 2022-04-26 18:18:16 -07:00
mempcpy_chk.S
mempcpy.S
memrchr.S
memset_chk.S
memset.S x86: Optimize memset-vec-unaligned-erms.S 2022-04-26 18:18:16 -07:00
memusage.h
mp_clz_tab.c
mul_1.S
preconfigure
preconfigure.ac
rawmemchr.S
rshift.S
rtld-offsets.sym
setjmp.S
stackguard-macros.h
stackinfo.h
start.S Reduce the statically linked startup code [BZ #23323] 2021-02-25 12:13:02 +01:00
stpcpy.S
strcasecmp_l-nonascii.c
strcasecmp_l.S
strcasecmp.S
strcat.S
strchr.S
strchrnul.S
strcmp.S x86-64: Replace movzx with movzbl 2022-04-26 18:18:16 -07:00
strcpy.S
strcspn.S
strlen.S x86-64: Move strlen.S to multiarch/strlen-vec.S 2021-06-23 10:24:35 -07:00
strncase_l-nonascii.c
strncase_l.S
strncase.S
strncmp.S
strnlen.S
strpbrk.S
strrchr.S
strspn.S
sub_n.S
submul_1.S
sysdep.h x86-64: Add AVX optimized string/memory functions for RTM 2021-03-29 07:40:17 -07:00
tls_get_addr.S
tls-macros.h
tlsdesc.c elf: Remove lazy tlsdesc relocation related code 2021-04-21 14:35:53 +01:00
tlsdesc.sym
tst-audit3.c
tst-audit4-aux.c
tst-audit4.c
tst-audit5.c
tst-audit6.c
tst-audit7.c
tst-audit10-aux.c
tst-audit10.c
tst-audit.h
tst-auditmod3a.c
tst-auditmod3b.c
tst-auditmod4a.c
tst-auditmod4b.c
tst-auditmod5a.c
tst-auditmod5b.c
tst-auditmod6a.c
tst-auditmod6b.c
tst-auditmod6c.c
tst-auditmod7a.c
tst-auditmod7b.c
tst-auditmod10a.c
tst-auditmod10b.c
tst-avx512-aux.c
tst-avx512.c
tst-avx512mod.c
tst-avx-aux.c
tst-avx.c
tst-avxmod.c
tst-glibc-hwcaps.c <sys/platform/x86.h>: Remove the C preprocessor magic 2021-01-21 05:58:17 -08:00
tst-platform-1.c
tst-platformmod-1.c
tst-platformmod-2.c
tst-quad1.c
tst-quad1pie.c
tst-quad2.c
tst-quad2pie.c
tst-quadmod1.S
tst-quadmod1pie.S
tst-quadmod2.S
tst-quadmod2pie.S
tst-rsi-strlen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-rsi-wcslen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-split-dynreloc.c
tst-split-dynreloc.lds
tst-sse.c
tst-ssemod.c
tst-x86_64-1.c
tst-x86_64mod-1.c
tst-x86-64-tls-1.c x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591] 2021-04-01 07:00:22 -07:00
Versions
wcschr.S
wcscmp.S
wcslen.S
wcsrchr.S
wmemset_chk.S
wmemset.S
wordcopy.c