glibc/sysdeps
Adhemerval Zanella Netto e169aff0e9 x86: Add SSE2 optimized chacha20
It adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-ssse3.S.  It replaces ROTATE_SHUF_2 (which
uses pshufb) with ROTATE2, so that the implementation requires only
SSE2.
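
For illustration, pshufb is an SSSE3 instruction, so a byte-shuffle rotate is not available on SSE2-only hardware; a rotate built from two shifts and an OR is. The following is a minimal sketch of that general technique using SSE2 intrinsics, not the ROTATE2 assembly macro itself. The function name rotl32x4 is hypothetical, and the scalar fallback is only there so the sketch also builds on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Rotate each of four 32-bit lanes left by c bits (0 < c < 32).
   Hypothetical helper: illustrates shift-and-OR rotation only.  */
static void
rotl32x4 (uint32_t v[4], int c)
{
  assert (c > 0 && c < 32);
#if defined(__SSE2__)
  __m128i x = _mm_loadu_si128 ((const __m128i *) v);
  /* Shift counts are passed in the low 64 bits of an XMM register.  */
  __m128i hi = _mm_sll_epi32 (x, _mm_cvtsi32_si128 (c));
  __m128i lo = _mm_srl_epi32 (x, _mm_cvtsi32_si128 (32 - c));
  _mm_storeu_si128 ((__m128i *) v, _mm_or_si128 (hi, lo));
#else
  /* Portable fallback with the same per-lane result.  */
  for (int i = 0; i < 4; i++)
    v[i] = (v[i] << c) | (v[i] >> (32 - c));
#endif
}
```

A shuffle-based rotate only works for rotation counts that are a multiple of 8 bits (16 and 8 in ChaCha20), whereas the shift-and-OR form handles every count at the cost of extra instructions.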

As with the generic implementation, the last step, which XORs the
keystream with the input, is omitted.  The final state register
clearing is also omitted.
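
To make the omitted XOR step concrete, here is a minimal scalar sketch of a ChaCha20 block function that writes the raw keystream instead of XORing it with an input buffer. It assumes the RFC 8439 state layout (4 constant words, 8 key words, a 32-bit counter, 3 nonce words), which may differ from the layout glibc uses internally. The names quarter_round and chacha20_keystream_block are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t
rotl32 (uint32_t x, int c)
{
  assert (c > 0 && c < 32);
  return (x << c) | (x >> (32 - c));
}

/* ChaCha quarter round on four words of the working state.  */
static void
quarter_round (uint32_t s[16], int a, int b, int c, int d)
{
  s[a] += s[b]; s[d] = rotl32 (s[d] ^ s[a], 16);
  s[c] += s[d]; s[b] = rotl32 (s[b] ^ s[c], 12);
  s[a] += s[b]; s[d] = rotl32 (s[d] ^ s[a], 8);
  s[c] += s[d]; s[b] = rotl32 (s[b] ^ s[c], 7);
}

/* Produce one 64-byte keystream block from STATE.  There is no XOR
   with an input buffer: the keystream itself is the output.  */
static void
chacha20_keystream_block (const uint32_t state[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, state, sizeof x);

  /* 20 rounds: 10 iterations of a column round plus a diagonal round.  */
  for (int i = 0; i < 10; i++)
    {
      quarter_round (x, 0, 4, 8, 12);
      quarter_round (x, 1, 5, 9, 13);
      quarter_round (x, 2, 6, 10, 14);
      quarter_round (x, 3, 7, 11, 15);
      quarter_round (x, 0, 5, 10, 15);
      quarter_round (x, 1, 6, 11, 12);
      quarter_round (x, 2, 7, 8, 13);
      quarter_round (x, 3, 4, 9, 14);
    }

  /* Add the input state and serialize each word as little-endian.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t w = x[i] + state[i];
      out[4 * i + 0] = (uint8_t) (w);
      out[4 * i + 1] = (uint8_t) (w >> 8);
      out[4 * i + 2] = (uint8_t) (w >> 16);
      out[4 * i + 3] = (uint8_t) (w >> 24);
    }
}
```

A random number generator only needs the keystream bytes, so skipping the XOR avoids reading an input buffer that would be all zeros anyway.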

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------
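
The two tables can be compared row by row by dividing the vectorized throughput by the generic one. The sketch below does that arithmetic with the figures copied from the tables above; the array and function names are hypothetical. It gives a speedup of roughly 1.6x for arc4random and about 2.4x for arc4random_buf(128).

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s, in table order: arc4random, then
   arc4random_buf with sizes 16, 32, ..., 128.  */
static const double generic_mbs[9] = {
  443.11, 552.27, 626.86, 649.81, 663.95, 674.78, 675.17, 680.69, 683.20
};
static const double sse_mbs[9] = {
  704.25, 1018.17, 1315.27, 1449.36, 1511.16, 1539.48, 1571.06, 1596.16,
  1613.48
};

/* Speedup of the vectorized variant over the generic one for ROW.  */
static double
speedup (size_t row)
{
  assert (row < 9);
  return sse_mbs[row] / generic_mbs[row];
}
```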

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
..
aarch64 aarch64: Add optimized chacha20 2022-07-22 11:58:27 -03:00
alpha
arc
arm
csky
generic aarch64: Add optimized chacha20 2022-07-22 11:58:27 -03:00
gnu
hppa
htl
hurd
i386 i386: Remove -Wa,-mtune=i686 2022-07-12 11:14:32 -07:00
ia64
ieee754
m68k
mach stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
microblaze
mips
nios2
nptl stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
or1k
posix
powerpc
pthread
riscv
s390
sh
sparc
unix stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
wordsize-32
wordsize-64
x86 x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
x86_64 x86: Add SSE2 optimized chacha20 2022-07-22 11:58:27 -03:00