openssl/crypto/md5
Min Zhou 3d68e2937e md5: add assembly implementation for loongarch64
This change can improve md5 performance by using a hand-optimized
assembly implementation of the inner loop of md5 calculation.
This implementation refered to md5-x86_64.pl and made more effort
to reorder instructions for separating data dependencies as much
as possible.

Test with:
$ openssl speed md5

3A5000
type             16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
md5              45061.04k   130440.75k   291105.28k   421101.23k   484639.27k   488320.43k
md5-modified     47179.95k   139015.57k   308836.69k   445963.26k   512540.67k   518215.00k
                   +5%         +7%          +6%          +6%          +6%          +6%

3A6000
type             16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
md5              60070.06k   161822.76k   325817.60k   438017.02k   486864.21k   492243.31k
md5-modified     62827.74k   170294.04k   343795.03k   463324.50k   515831.13k   520060.93k
                   +5%         +5%          +6%          +6%          +6%          +6%

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Co-authored-by: Xi Ruoyao <xry111@xry111.site>

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21704)
2023-12-27 10:15:29 +01:00
..
asm
build.info
md5_dgst.c
md5_local.h
md5_one.c
md5_sha1.c