Typos in doc/man* will be fixed in a different commit.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20910)
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20202)
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19914)
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>
Reviewed-by: Paul Yang <kaishen.yy@antfin.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19910)
Fixes#19858
During decryption, the last ciphertext is not fed to next block
correctly when the number of input blocks is exactly 4. Fix this
and add the corresponding test cases.
Thanks xu-yi-zhou for reporting this issue and proposing the fix.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19872)
this optimization comes from libgcrypt, increse about 48% speed
Benchmark on my AMD Ryzen Threadripper 3990X
before:
Did 5752000 SM4 setup operations in 1000151us (5751131.6 ops/sec)
after:
Did 8506000 SM4 setup operations in 1000023us (8505804.4 ops/sec)
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19270)
This patch optimizes SM4 for ARM processor using ASIMD instruction
It will improve performance if both of following conditions are met:
1) Input data equal to or more than 4 blocks
2) Cipher mode allows parallelism, including ECB,CTR,GCM or CBC decryption
This patch implements SM4 SBOX lookup in vector registers, with the
benefit of constant processing time over existing C implementation.
It is only enabled for micro-architecture N1/V1. In the ideal scenario,
performance can reach up to 2.7X
When either of above two conditions is not met, e.g. single block input
or CFB/OFB mode, CBC encryption, performance could drop about 50%.
The assembly code has been reviewed internally by ARM engineer
Fangming.Fang@arm.com
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17951)
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.
Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)
Perf data on Yitian-710 2.75GHz hardware, before and after optimization:
Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k
SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k
SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k
After (7.4x - 36.6x faster):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k
SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k
SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
Currently, there are two different directories which contain internal
header files of libcrypto which are meant to be shared internally:
While header files in 'include/internal' are intended to be shared
between libcrypto and libssl, the files in 'crypto/include/internal'
are intended to be shared inside libcrypto only.
To make things complicated, the include search path is set up in such
a way that the directive #include "internal/file.h" could refer to
a file in either of these two directoroes. This makes it necessary
in some cases to add a '_int.h' suffix to some files to resolve this
ambiguity:
#include "internal/file.h" # located in 'include/internal'
#include "internal/file_int.h" # located in 'crypto/include/internal'
This commit moves the private crypto headers from
'crypto/include/internal' to 'include/crypto'
As a result, the include directives become unambiguous
#include "internal/file.h" # located in 'include/internal'
#include "crypto/file.h" # located in 'include/crypto'
hence the superfluous '_int.h' suffixes can be stripped.
The files 'store_int.h' and 'store.h' need to be treated specially;
they are joined into a single file.
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9333)