glibc/sysdeps
Noah Goldstein ea19c490a3 x86: Improve vec generation in memset-vec-unaligned-erms.S
No bug.

Split vec generation into multiple steps. This allows the
broadcast in AVX2 to use 'xmm' registers for the L(less_vec)
case. This saves an expensive lane-cross instruction and removes
the need for 'vzeroupper'.

For SSE2 replace 2x 'punpck' instructions with zero-idiom 'pxor' for
byte broadcast.
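
As a rough illustration of why a zeroed register helps with a byte broadcast, here is a minimal scalar sketch in C. It assumes the zero register serves as the control mask of a PSHUFB-style byte shuffle; the message above names only 'pxor', so that pairing is an assumption. The names vec128, shuffle_bytes, from_u32, broadcast_byte and self_check are hypothetical and do not come from glibc. The code models the instruction semantics in portable C and is not the assembly from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { uint8_t b[16]; } vec128;

/* Scalar model of a PSHUFB-style shuffle: output byte i is
   src.b[mask.b[i] & 15], or 0 when the mask byte has its top bit set. */
static vec128 shuffle_bytes(vec128 src, vec128 mask)
{
    vec128 out;
    for (int i = 0; i < 16; i++)
        out.b[i] = (mask.b[i] & 0x80) ? 0 : src.b[mask.b[i] & 0x0f];
    return out;
}

/* Scalar model of MOVD: value in the low 32 bits, upper bytes zeroed. */
static vec128 from_u32(uint32_t v)
{
    vec128 out;
    memset(&out, 0, sizeof out);
    for (int i = 0; i < 4; i++)
        out.b[i] = (uint8_t)(v >> (8 * i));
    return out;
}

/* Assumed sequence: zero a register (the 'pxor' zero idiom), then use it
   as the shuffle mask so every output byte selects source byte 0. */
static vec128 broadcast_byte(int c)
{
    vec128 zero;
    memset(&zero, 0, sizeof zero);   /* models pxor reg, reg */
    return shuffle_bytes(from_u32((uint32_t)c), zero);
}

/* Small self-check: all 16 bytes must equal the low byte of the input. */
static void self_check(void)
{
    vec128 v = broadcast_byte(0xab);
    for (int i = 0; i < 16; i++)
        assert(v.b[i] == 0xab);
}
```

With an all-zero mask every lane reads source byte 0, so a single shuffle stands in for a chain of unpack steps, and zeroing a register with 'pxor' is a recognized zero idiom on x86.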

Results for memset-avx2 small (geomean of N = 20 benchset runs).

size, New Time, Old Time, New / Old
   0,    4.100,    3.831,     0.934
   1,    5.074,    4.399,     0.867
   2,    4.433,    4.411,     0.995
   4,    4.487,    4.415,     0.984
   8,    4.454,    4.396,     0.987
  16,    4.502,    4.443,     0.987

All relevant string/wcsmbs tests are passing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit b62ace2740)
2022-05-05 08:54:11 -07:00
aarch64
alpha
arc
arm
csky
generic
gnu
hppa hppa: Fix bind-now audit (BZ #28857) 2022-04-12 13:33:17 -04:00
htl
hurd
i386 i386: Regenerate ulps 2022-04-27 21:20:43 -04:00
ia64
ieee754
m68k
mach
microblaze
mips
nios2
nptl nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
posix
powerpc
pthread nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
riscv
s390 S390: Add new s390 platform z16. 2022-04-14 14:21:57 +02:00
sh
sparc
unix Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h 2022-05-03 11:08:52 +02:00
wordsize-32
wordsize-64
x86 x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI 2022-04-26 18:18:16 -07:00
x86_64 x86: Improve vec generation in memset-vec-unaligned-erms.S 2022-05-05 08:54:11 -07:00