glibc/sysdeps/x86_64
Noah Goldstein a910d7e164 x86: Shrink code size of memchr-avx2.S
This is not meant as a performance optimization. The previous code was
far too liberal in aligning targets and wasted code size unnecessarily.

The total code size saving is: 59 bytes

There are no major changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 0.967

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 6dcbb7d95d)

x86: Fix page cross case in rawmemchr-avx2 [BZ #29234]

commit 6dcbb7d95d
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Jun 6 21:11:33 2022 -0700

    x86: Shrink code size of memchr-avx2.S

Changed how the page cross case aligned string (rdi) in
rawmemchr. This was incompatible with how
`L(cross_page_continue)` expected the pointer to be aligned and
would cause rawmemchr to read data that started before the
beginning of the string. What it would read was in valid memory
but could count CHAR matches resulting in an incorrect return
value.
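
The failure mode described above can be modelled in scalar C. This is only a sketch under stated assumptions, not the code in memchr-avx2.S: the 32-byte vector width and the helpers `match_mask` and `first_match` are hypothetical names introduced for illustration. It shows why a search that loads from an aligned-down address has to discard match bits that lie before the start of the string.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-byte vectors, as with AVX2. */
#define VEC_SIZE 32

/* Hypothetical helper: bit i is set when block[i] == c, a scalar
   stand-in for a vector compare followed by a movemask. */
static uint32_t match_mask(const unsigned char *block, unsigned char c)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (block[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Hypothetical helper: index within the aligned block of the first
   match, or -1 if there is none. The string starts at block + offset.
   When discard_prefix is 0, matches before the string start are kept,
   which models the incorrect behaviour. */
static int first_match(const unsigned char *block, unsigned offset,
                       unsigned char c, int discard_prefix)
{
    assert(offset < VEC_SIZE);
    uint32_t mask = match_mask(block, c);
    if (discard_prefix)
        mask &= UINT32_MAX << offset;   /* clear bits below the start */
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (mask & ((uint32_t)1 << i))
            return (int)i;
    return -1;
}
```

For a block in which the character appears at index 3 and index 20, with the string starting at offset 8, `first_match(block, 8, c, 1)` yields 20, while the unmasked call `first_match(block, 8, c, 0)` yields 3, a position before the string begins.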

This commit fixes that issue by essentially reverting the changes to
the L(page_cross) case as they didn't really matter.

Test cases added and all pass with the new code (and were confirmed
to fail with the old code).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 2c9af8421d)
2022-07-18 20:45:21 -07:00
64
fpu x86-64: Optimize load of all bits set into ZMM register [BZ #28252] 2022-04-26 18:18:15 -07:00
multiarch x86: Shrink code size of memchr-avx2.S 2022-07-18 20:45:21 -07:00
nptl nptl: Move pthread_spin_trylock into libc 2021-04-23 17:06:48 +02:00
x32 mcheck: Align struct hdr to MALLOC_ALIGNMENT bytes [BZ #28068] 2021-07-12 18:13:32 -07:00
____longjmp_chk.S
__longjmp.S
_mcount.S
abort-instr.h
add_n.S
addmul_1.S
bsd-_setjmp.S
bsd-setjmp.S
configure x86_64: Remove unneeded static PIE check for undefined weak diagnostic 2021-07-08 14:26:22 -07:00
configure.ac x86_64: Remove unneeded static PIE check for undefined weak diagnostic 2021-07-08 14:26:22 -07:00
crti.S
crtn.S
dl-hwcaps-subdirs.c
dl-irel.h
dl-machine.h x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT 2022-07-18 20:45:20 -07:00
dl-procinfo.c
dl-runtime.h elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
dl-tls.c elf: Use relaxed atomics for racy accesses [BZ #19329] 2021-05-11 17:16:37 +01:00
dl-tls.h
dl-tlsdesc.h
dl-tlsdesc.S
dl-trampoline.h elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
dl-trampoline.S
ffs.c
ffsll.c
htonl.S
ifuncmain8.c
ifuncmod8.c
Implies
isa.h
jmpbuf-offsets.h
jmpbuf-unwind.h
l10nflist.c
link-defines.sym
locale-defines.sym
localplt.data mtrace: Wean away from malloc hooks 2021-07-22 18:38:06 +05:30
lshift.S
machine-gmon.h
Makefile Add a generic malloc test for MALLOC_ALIGNMENT 2021-07-09 06:39:30 -07:00
memchr.S x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974] 2021-06-23 14:13:03 -04:00
memcmp.S
memcpy_chk.S
memcpy.S
memmove_chk.S
memmove.S x86: Optimize memmove-vec-unaligned-erms.S 2022-04-26 18:18:16 -07:00
mempcpy_chk.S
mempcpy.S
memrchr.S x86: Optimize memrchr-sse2.S 2022-07-18 20:45:21 -07:00
memset_chk.S
memset.S x86_64: Remove bzero optimization 2022-07-18 20:45:20 -07:00
memusage.h
mp_clz_tab.c
mul_1.S
preconfigure
preconfigure.ac
rawmemchr.S
rshift.S
rtld-offsets.sym
setjmp.S
stackguard-macros.h
stackinfo.h
start.S
stpcpy.S
strcasecmp_l-nonascii.c
strcasecmp_l.S
strcasecmp.S
strcat.S
strchr.S
strchrnul.S
strcmp.S x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S 2022-05-16 18:54:17 -07:00
strcpy.S
strlen.S x86-64: Move strlen.S to multiarch/strlen-vec.S 2021-06-23 10:24:35 -07:00
strncase_l-nonascii.c
strncase_l.S
strncase.S
strncmp.S
strnlen.S
strrchr.S x86: Optimize {str|wcs}rchr-sse2 2022-05-16 18:55:37 -07:00
sub_n.S
submul_1.S
sysdep.h x86: Add COND_VZEROUPPER that can replace vzeroupper if no ret 2022-07-18 20:45:20 -07:00
tls_get_addr.S
tls-macros.h
tlsdesc.c elf: Remove lazy tlsdesc relocation related code 2021-04-21 14:35:53 +01:00
tlsdesc.sym
tst-audit3.c
tst-audit4-aux.c
tst-audit4.c
tst-audit5.c
tst-audit6.c
tst-audit7.c
tst-audit10-aux.c
tst-audit10.c
tst-audit.h
tst-auditmod3a.c
tst-auditmod3b.c
tst-auditmod4a.c
tst-auditmod4b.c
tst-auditmod5a.c
tst-auditmod5b.c
tst-auditmod6a.c
tst-auditmod6b.c
tst-auditmod6c.c
tst-auditmod7a.c
tst-auditmod7b.c
tst-auditmod10a.c
tst-auditmod10b.c
tst-avx512-aux.c
tst-avx512.c
tst-avx512mod.c
tst-avx-aux.c
tst-avx.c
tst-avxmod.c
tst-glibc-hwcaps.c
tst-platform-1.c
tst-platformmod-1.c
tst-platformmod-2.c
tst-quad1.c
tst-quad1pie.c
tst-quad2.c
tst-quad2pie.c
tst-quadmod1.S
tst-quadmod1pie.S
tst-quadmod2.S
tst-quadmod2pie.S
tst-rsi-strlen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-rsi-wcslen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-split-dynreloc.c
tst-split-dynreloc.lds
tst-sse.c
tst-ssemod.c
tst-x86_64-1.c
tst-x86_64mod-1.c
tst-x86-64-tls-1.c
Versions
wcschr.S
wcscmp.S
wcslen.S x86: Small improvements for wcslen 2022-05-16 18:55:09 -07:00
wcsrchr.S x86: Optimize {str|wcs}rchr-sse2 2022-05-16 18:55:37 -07:00
wmemset_chk.S
wmemset.S
wordcopy.c