glibc/sysdeps
Noah Goldstein b05bd59823 x86: Optimize memrchr-avx2.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.

The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760

Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
10-20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 15-45% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit af5306a735)
2022-07-18 20:45:21 -07:00
..
aarch64 elf: Fix runtime linker auditing on aarch64 (BZ #26643) 2022-04-12 13:33:10 -04:00
alpha elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
arc elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
arm elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
csky elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
generic csu: Implement and use _dl_early_allocate during static startup 2022-05-19 12:13:53 +02:00
gnu hurd: Fix glob lstat compatibility 2021-07-22 20:31:52 +02:00
hppa hppa: Remove _dl_skip_args usage (BZ# 29165) 2022-06-10 09:13:54 -03:00
htl htl: Do not expose pthread hidden proto outside libpthread 2021-07-18 20:25:33 +00:00
hurd Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
i386 i386: Regenerate ulps 2022-04-27 21:20:43 -04:00
ia64 elf: Issue la_symbind for bind-now (BZ #23734) 2022-04-12 13:32:59 -04:00
ieee754 Update math: redirect roundeven function 2021-06-27 07:56:57 -07:00
m68k elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
mach rtld: Use generic argv adjustment in ld.so [BZ #23293] 2022-05-19 16:48:47 +01:00
microblaze elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
mips elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
nios2 nios2: Remove _dl_skip_args usage (BZ# 29187) 2022-06-10 09:15:00 -03:00
nptl nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
posix socket: Use 64 bit stat for isfdtype (BZ# 29209) 2022-06-01 13:34:51 -03:00
powerpc powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197] 2022-06-07 15:34:20 -03:00
pthread nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214) 2022-06-08 17:15:08 -03:00
riscv elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
s390 S390: Enable static PIE 2022-05-19 17:15:57 +02:00
sh elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
sparc elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
unix linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304) 2022-06-30 10:46:39 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
wordsize-64 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
x86 x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) 2022-05-16 18:52:19 -07:00
x86_64 x86: Optimize memrchr-avx2.S 2022-07-18 20:45:21 -07:00