Also fully duplicate the context on dup
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18142)
When the context is reinitialized, i.e. the same key should be used
we must properly reinitialize the underlying implementation.
However in POLY1305 case it does not make sense as this special MAC
should not reuse keys. We fail with this provided implementation
when reinitialization happens.
Fixes#17811
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18100)
CLA: trivial
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18023)
This patch optimizes SM4 for ARM processor using ASIMD instruction
It will improve performance if both of following conditions are met:
1) Input data equal to or more than 4 blocks
2) Cipher mode allows parallelism, including ECB,CTR,GCM or CBC decryption
This patch implements SM4 SBOX lookup in vector registers, with the
benefit of constant processing time over existing C implementation.
It is only enabled for micro-architecture N1/V1. In the ideal scenario,
performance can reach up to 2.7X
When either of above two conditions is not met, e.g. single block input
or CFB/OFB mode, CBC encryption, performance could drop about 50%.
The assembly code has been reviewed internally by ARM engineer
Fangming.Fang@arm.com
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17951)
This refactors OSSL_LIB_CTX to avoid using CRYPTO_EX_DATA. The assorted
objects to be managed by OSSL_LIB_CTX are hardcoded and are initialized
eagerly rather than lazily, which avoids the need for locking on access
in most cases.
Fixes#17116.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17881)
The scrypt KDF provider's dup method calls kdf_scrypt_new passing a
libctx, but a provider context is expected. Since the provider context
is passed as void *, this was not caught.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17873)
This patch will allow the SM4-GCM function to leverage the SM4
high-performance CTR crypto interface already implemented for ARM,
which is faster than current single block cipher routine used
for GCM
It does not address the acceleration of GHASH function of GCM,
which can be a future task, still we can see immediate uplift of
performance (up to 4X)
Before this patch:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-GCM 186432.92k 394234.05k 587916.46k 639365.12k 648486.91k 652924.25k
After the patch:
SM4-GCM 193924.87k 860940.35k 1696083.71k 2302548.31k 2580411.73k 2607398.91k
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17814)
This header files are included by multiple other headers.
It's better to add define guards to prevent multi-inclusion.
Adhere to the coding style, all preprocessor directives inside
the guards gain a space.
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
(Merged from https://github.com/openssl/openssl/pull/17666)
Since the OPENSSL_strdup() may return NULL if allocation
fails, it should be better to check the return value.
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17651)
Including e_os.h with a path from a header file doesn't work well on
certain exotic platform. It simply fails to build.
Since we don't seem to be able to stop ourselves, the better move is
to move e_os.h to an include directory that's part of the inclusion
path given to the compiler.
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17641)
Add copyright to files that were missing it.
Update license from OpenSSL to Apache as needed.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17606)
The private key for rsa, dsa, dh and ecx was being included when the
selector was just the public key. (ec was working correctly).
This matches the documented behaviour.
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17200)
Increase the block numbers to 8 for every iteration. Increase the hash
table capacity. Make use of EOR3 instruction to improve the performance.
This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1. We also
see 20-30% performance improvements on other architectures such as the M1.
Assembly code reviewd by Tom Cosgrove (ARM).
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
Assembly code reviewed by Shricharan Srivatsan <ssrivat@us.ibm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16854)
This involves the following functions:
ERR_new(), ERR_set_debug(), ERR_set_error(), ERR_vset_error(),
ERR_set_mark(), ERR_clear_last_mark(), ERR_pop_to_mark(void)
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17474)
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.
Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)
Perf data on Yitian-710 2.75GHz hardware, before and after optimization:
Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k
SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k
SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k
After (7.4x - 36.6x faster):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k
SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k
SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
Most of the DRGB code is run under lock from the EVP layer. This is relied
on to make the majority of TSAN operations safe. However, it is still necessary
to enable locking for all DRBGs created.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/17479)
There is risk to pass the gctx with NULL value to rsa_gen_set_params
which dereference gctx directly.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17429)