This patch optimizes SM4 for ARM processors using ASIMD instructions.
It will improve performance if both of the following conditions are met:
1) The input data is equal to or more than 4 blocks
2) The cipher mode allows parallelism, including ECB, CTR, GCM, or CBC decryption
This patch implements the SM4 SBOX lookup in vector registers, with the
benefit of constant processing time over the existing C implementation.
It is only enabled for the N1/V1 micro-architectures. In the ideal
scenario, performance can reach up to 2.7x that of the C implementation.
When either of the above two conditions is not met, e.g. single-block
input, CFB/OFB mode, or CBC encryption, performance could drop by about 50%.
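The idea behind the constant-time lookup (a minimal sketch using ACLE
intrinsics, not the patch's actual assembly) is to hold the 256-byte SBOX
in vector registers and substitute 16 bytes at a time with TBL
instructions; TBL returns zero for out-of-range indices, so the four
64-byte quarters of the table can simply be ORed together:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Constant-time 256-entry S-box substitution for 16 bytes at once. */
    static uint8x16_t sbox_lookup_ct(uint8x16_t idx, const uint8_t sbox[256])
    {
        uint8x16x4_t t0 = vld1q_u8_x4(sbox);        /* entries   0..63  */
        uint8x16x4_t t1 = vld1q_u8_x4(sbox + 64);   /* entries  64..127 */
        uint8x16x4_t t2 = vld1q_u8_x4(sbox + 128);  /* entries 128..191 */
        uint8x16x4_t t3 = vld1q_u8_x4(sbox + 192);  /* entries 192..255 */
        uint8x16_t off = vdupq_n_u8(64);

        /* Each TBL covers one 64-byte quarter; indices >= 64 yield 0,
         * so only the quarter owning a byte contributes to the result. */
        uint8x16_t r = vqtbl4q_u8(t0, idx);
        idx = vsubq_u8(idx, off);
        r = vorrq_u8(r, vqtbl4q_u8(t1, idx));
        idx = vsubq_u8(idx, off);
        r = vorrq_u8(r, vqtbl4q_u8(t2, idx));
        idx = vsubq_u8(idx, off);
        r = vorrq_u8(r, vqtbl4q_u8(t3, idx));
        return r;
    }

No memory load is indexed by secret data, which is what gives the constant
processing time mentioned above.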
The assembly code has been reviewed internally by ARM engineer
Fangming.Fang@arm.com
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17951)
This patch implements the SM4 optimization for ARM processors using the
SM4 hardware instructions, which are an optional feature of the crypto
extension for AArch64 (Armv8).
Tested on some modern ARM micro-architectures with SM4 support, a
performance uplift of around 8x~40x over the existing C implementation
in OpenSSL can be observed. Modes that can be parallelized (such as CTR,
ECB, and CBC decryption) are at the higher end, while modes such as CBC
encryption are at the lower end (due to inter-block dependency).
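As a hedged sketch of the core idea (not OpenSSL's actual code): each SM4E
instruction performs four SM4 rounds on the 128-bit state, so the 32-round
cipher body collapses to eight instructions. Building this requires the
"+sm4" target feature (__ARM_FEATURE_SM4):

    #include <arm_neon.h>

    /* block: the four 32-bit state words; rk: 32 round keys packed four
     * per vector.  The byte/word reordering SM4 requires on input and
     * output is left to the caller and omitted here for brevity. */
    static inline uint32x4_t sm4_rounds_hw(uint32x4_t block,
                                           const uint32x4_t rk[8])
    {
        for (int i = 0; i < 8; i++)
            block = vsm4eq_u32(block, rk[i]);  /* SM4E: 4 rounds per call */
        return block;
    }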
Perf data on Yitian-710 2.75GHz hardware, before and after optimization:
Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k
SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k
SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k
After (7.4x - 36.6x faster):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k
SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k
SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
For functions that exist in 1.1.1, provide simple aliases via #define.
Fixes #15236
Functions with OSSL_DECODER_, OSSL_ENCODER_, OSSL_STORE_LOADER_,
EVP_KEYEXCH_, EVP_KEM_, EVP_ASYM_CIPHER_, EVP_SIGNATURE_,
EVP_KEYMGMT_, EVP_RAND_, EVP_MAC_, EVP_KDF_, EVP_PKEY_,
EVP_MD_, and EVP_CIPHER_ prefixes are renamed.
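As an illustration of the aliasing pattern (the authoritative list lives in
the public headers, e.g. include/openssl/evp.h for the EVP ones; the names
below assume the 3.0 get-style renames), the 1.1.1 spellings remain
available as plain macro aliases:

    # define EVP_MD_size            EVP_MD_get_size
    # define EVP_MD_type            EVP_MD_get_type
    # define EVP_CIPHER_block_size  EVP_CIPHER_get_block_size
    # define EVP_CIPHER_key_length  EVP_CIPHER_get_key_length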
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15405)
Add a "where did this EVP_{CIPHER,MD} come from" flag: global, via fetch,
or via EVP_{CIPHER,MD}_meth_new. Update EVP_{CIPHER,MD}_free to handle all
three origins. The flag is deliberately right before some function pointers,
so that compile-time failures (int/pointer) will occur, as opposed to
taking a bit in the existing "flags" field. The "global variable" flag
is non-zero, so the default case of using OPENSSL_zalloc (for provider
ciphers), will do the right thing. Ref-counting is a no-op for
Make up_ref no-op for global MD and CIPHER objects
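A hedged sketch of the layout idea (the names and values below are
illustrative, not the actual OpenSSL definitions):

    #include <stddef.h>

    /* Possible origins; the zalloc default (0) is the fetched/provider
     * case, and the "global" value is deliberately non-zero. */
    #define ORIG_DYNAMIC  0   /* OPENSSL_zalloc'd provider object  */
    #define ORIG_GLOBAL   1   /* statically defined legacy object  */
    #define ORIG_METH     2   /* from EVP_{CIPHER,MD}_meth_new()   */

    struct md_sketch_st {
        int type;
        unsigned long flags;      /* existing bit-flags field        */
        int origin;               /* new field: where it came from   */
        /* function pointers follow, so assigning the origin to one
         * of them (or vice versa) is an int/pointer compile error   */
        int (*init)(void *ctx);
        int (*update)(void *ctx, const void *data, size_t count);
    };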
Deprecate EVP_MD_CTX_md(). Add EVP_MD_CTX_get0_md() (same semantics as
the deprecated function) and EVP_MD_CTX_get1_md(). Likewise, deprecate
EVP_CIPHER_CTX_cipher() in favor of EVP_CIPHER_CTX_get0_cipher(), and add
EVP_CIPHER_CTX_get1_cipher().
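A usage sketch of the intended semantics (assuming the naming above): the
get0 variant borrows the object for the lifetime of the context, while the
get1 variant returns a reference the caller must release:

    #include <stdio.h>
    #include <openssl/evp.h>

    static int print_digest_name(EVP_MD_CTX *ctx)
    {
        const EVP_MD *borrowed = EVP_MD_CTX_get0_md(ctx); /* do not free */
        EVP_MD *owned = EVP_MD_CTX_get1_md(ctx);          /* must free   */

        if (borrowed == NULL || owned == NULL) {
            EVP_MD_free(owned);     /* freeing NULL is a no-op */
            return 0;
        }
        printf("%s\n", EVP_MD_get0_name(borrowed));
        EVP_MD_free(owned);
        return 1;
    }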
Refactor EVP_MD_free() and EVP_MD_meth_free() to call new common
evp_md_free_int() function.
Refactor EVP_CIPHER_free() and EVP_CIPHER_meth_free() to call new common
evp_cipher_free_int() function.
Also change some flags tests to explicitly test == or != zero. E.g.,
if (flags & x) --> if ((flags & x) != 0)
if (!(flags & x)) --> if ((flags & x) == 0)
Only done for those lines where "get0_cipher" calls were made.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14193)
Inline the pre-13273237a65d46186b6bea0b51aec90670d4598a versions
of EVP_CIPHER_CTX_iv(), EVP_CIPHER_CTX_original_iv(), and
EVP_CIPHER_CTX_iv_noconst() in e_sm4.c.
For the legacy implementations, there is no need to use in-provider
storage for the IV when the crypto operations themselves are performed
outside of the provider.
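A sketch of what the inlined helpers amount to (assuming the legacy
EVP_CIPHER_CTX layout with in-context iv/oiv buffers, visible through the
internal evp headers; not the literal e_sm4.c code):

    /* Borrow the context's own IV buffers instead of the public accessors. */
    static const unsigned char *sm4_ctx_iv(const EVP_CIPHER_CTX *ctx)
    {
        return ctx->iv;   /* current (working) IV */
    }

    static const unsigned char *sm4_ctx_original_iv(const EVP_CIPHER_CTX *ctx)
    {
        return ctx->oiv;  /* IV as originally set */
    }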
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/12233)
Currently, there are two different directories which contain internal
header files of libcrypto which are meant to be shared internally:
While header files in 'include/internal' are intended to be shared
between libcrypto and libssl, the files in 'crypto/include/internal'
are intended to be shared inside libcrypto only.
To make things more complicated, the include search path is set up in
such a way that the directive #include "internal/file.h" could refer to
a file in either of these two directories. This makes it necessary in
some cases to add a '_int.h' suffix to some files to resolve this
ambiguity:
#include "internal/file.h" # located in 'include/internal'
#include "internal/file_int.h" # located in 'crypto/include/internal'
This commit moves the private crypto headers from
'crypto/include/internal' to 'include/crypto'
As a result, the include directives become unambiguous:
#include "internal/file.h" # located in 'include/internal'
#include "crypto/file.h" # located in 'include/crypto'
hence the superfluous '_int.h' suffixes can be stripped.
The files 'store_int.h' and 'store.h' need to be treated specially;
they are joined into a single file.
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9333)