glibc/sysdeps
Raghuveer Devulapalli 82a707aeb7 x86_64: Add strstr function with 512-bit EVEX
Adding a 512-bit EVEX version of strstr. The algorithm works as follows:

(1) We spend a few cycles at the begining to peek into the needle. We
locate an edge in the needle (first occurance of 2 consequent distinct
characters) and also store the first 64-bytes into a zmm register.

(2) We search for the edge in the haystack by looking into one cache
line of the haystack at a time. This avoids having to read past a page
boundary which can cause a seg fault.

(3) If an edge is found in the haystack we first compare the first
64-bytes of the needle (already stored in a zmm register) before we
proceed with a full string compare performed byte by byte.

Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512)

Geometric mean of all benchmarks: new / old =  0.66

Difficult skiptable(0) : new / old =  0.02
Difficult skiptable(1) : new / old =  0.01
Difficult 2-way : new / old =  0.25
Difficult testing first 2 : new / old =  1.26
Difficult skiptable(0) : new / old =  0.05
Difficult skiptable(1) : new / old =  0.06
Difficult 2-way : new / old =  0.26
Difficult testing first 2 : new / old =  1.05
Difficult skiptable(0) : new / old =  0.42
Difficult skiptable(1) : new / old =  0.24
Difficult 2-way : new / old =  0.21
Difficult testing first 2 : new / old =  1.04
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 5082a287d5)

x86: Remove __mmask intrinsics in strstr-avx512.c

The intrinsics are not available before GCC7 and using standard
operators generates code of equivalent or better quality.

Removed:
    _cvtmask64_u64
    _kshiftri_mask64
    _kand_mask64

Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958

(cherry picked from commit f2698954ff)
2022-07-18 20:45:20 -07:00
..
aarch64 elf: Fix runtime linker auditing on aarch64 (BZ #26643) 2022-04-12 13:33:10 -04:00
alpha elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
arc elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
arm elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
csky elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
generic csu: Implement and use _dl_early_allocate during static startup 2022-05-19 12:13:53 +02:00
gnu hurd: Fix glob lstat compatibility 2021-07-22 20:31:52 +02:00
hppa hppa: Remove _dl_skip_args usage (BZ# 29165) 2022-06-10 09:13:54 -03:00
htl htl: Do not expose pthread hidden proto outside libpthread 2021-07-18 20:25:33 +00:00
hurd Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
i386 i386: Regenerate ulps 2022-04-27 21:20:43 -04:00
ia64 elf: Issue la_symbind for bind-now (BZ #23734) 2022-04-12 13:32:59 -04:00
ieee754 Update math: redirect roundeven function 2021-06-27 07:56:57 -07:00
m68k elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
mach rtld: Use generic argv adjustment in ld.so [BZ #23293] 2022-05-19 16:48:47 +01:00
microblaze elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
mips elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
nios2 nios2: Remove _dl_skip_args usage (BZ# 29187) 2022-06-10 09:15:00 -03:00
nptl nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
posix socket: Use 64 bit stat for isfdtype (BZ# 29209) 2022-06-01 13:34:51 -03:00
powerpc powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197] 2022-06-07 15:34:20 -03:00
pthread nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214) 2022-06-08 17:15:08 -03:00
riscv elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
s390 S390: Enable static PIE 2022-05-19 17:15:57 +02:00
sh elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
sparc elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
unix linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304) 2022-06-30 10:46:39 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
wordsize-64 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
x86 x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) 2022-05-16 18:52:19 -07:00
x86_64 x86_64: Add strstr function with 512-bit EVEX 2022-07-18 20:45:20 -07:00