Description:
Mark Wooden and Franck Rondepierre noted that the square-root-mod-p
operations used in the EdDSA RFC (RFC 8032) can be simplified. For
Ed25519, instead of computing u*v^3 * (u * v^7)^((p-5)/8), we can
compute u * (u*v)^((p-5)/8). This saves 3 multiplications and 2
squarings. For more details (including a proof), see the following
message from the CFRG mailing list:
https://mailarchive.ietf.org/arch/msg/cfrg/qlKpMBqxXZYmDpXXIx6LO3Oznv4/
Note that the Ed448 implementation (see
ossl_curve448_point_decode_like_eddsa_and_mul_by_ratio() in
./crypto/ec/curve448/curve448.c) appears to already use this simpler
method (i.e. it does not follow the method suggested in RFC 8032).
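As a rough illustration of the simplification described above, the following Python sketch evaluates both candidate formulas on plain integers and applies the usual correction by a square root of -1. The function names are hypothetical, and this is not OpenSSL's field arithmetic; it only assumes Python's built-in three-argument pow().

```python
# Illustrative sketch only: names are hypothetical, not OpenSSL identifiers.
P = 2**255 - 19
# 2 is a non-residue mod P (P = 5 mod 8), so this is a square root of -1.
SQRT_M1 = pow(2, (P - 1) // 4, P)


def _fixup(x, u, v):
    """Return a root of u/v derived from candidate x, or None if none exists."""
    vxx = v * x * x % P
    if vxx == u % P:
        return x
    if vxx == (-u) % P:
        return x * SQRT_M1 % P
    return None


def sqrt_ratio_rfc8032(u, v):
    # Candidate from RFC 8032: u * v^3 * (u * v^7)^((p-5)/8)
    x = u * pow(v, 3, P) * pow(u * pow(v, 7, P), (P - 5) // 8, P) % P
    return _fixup(x, u, v)


def sqrt_ratio_simplified(u, v):
    # Simplified candidate: u * (u * v)^((p-5)/8)
    x = u * pow(u * v, (P - 5) // 8, P) % P
    return _fixup(x, u, v)


if __name__ == "__main__":
    v = 987654321
    u = 123456789**2 * v % P  # u/v is a square by construction
    for f in (sqrt_ratio_rfc8032, sqrt_ratio_simplified):
        x = f(u, v)
        print(f.__name__, x is not None and v * x * x % P == u)
```

Both functions return some x with v*x^2 == u (mod p) whenever u/v is a square; the two results may differ in sign.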
Testing:
Build and then run the test suite:
./Configure -Werror --strict-warnings
make update
make
make test
Numerical testing of the square-root computation can be done using the
following sage script:
def legendre(x,p):
    return kronecker(x,p)
# Ed25519
p = 2**255-19
# -1 is a square
if legendre(-1,p)==1:
    print("-1 is a square")
# suppose u/v is a square.
# to compute one of its square roots, find x such that
# x**4 == (u/v)**2 .
# this implies
# x**2 == u/v, or
# x**2 == -(u/v) ,
# which implies either x or i*x is a square-root of u/v (where i is a square root of -1).
# we can take x equal to u * (u*v)**((p-5)/8).
# 2 is a generator
# this can be checked by factoring p-1
# and then showing 2**((p-1)/q) != 1 (mod p)
# for all primes q dividing p-1.
g = 2
s = p>>2 # s = (p-1)/4
i = power_mod(g, s, p)
t = p>>3 # t = (p-5)/8
COUNT = 1<<18
while COUNT > 0:
    COUNT -= 1
    r = randint(0,p-1) # r = u/v
    v = randint(1,p-1)
    u = mod(r*v,p)
    # compute x = u * (u*v)**((p-5)/8)
    w = mod(u*v,p)
    x = mod(u*power_mod(w, t, p), p)
    # check that x**2 == r, or (i*x)**2 == r, or r is not a square
    rr = power_mod(x, 2, p)
    if rr==r:
        continue
    rr = power_mod(mod(i*x,p), 2, p)
    if rr==r:
        continue
    if legendre(r,p) != 1:
        continue
    print("failure!")
    exit()
print("passed!")
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17544)
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17542)
This patch implements the SM4 optimization for ARM processors,
using the SM4 HW instructions, which are an optional feature of
the crypto extension for aarch64 V8.
On some modern ARM micro-architectures with SM4 support, a
performance uplift of around 8x to 40x over the existing C
implementation in openssl can be observed. Algorithms that can be
parallelized (like CTR, ECB, CBC decryption) are at the higher end,
while algorithms like CBC encryption are at the lower end (due to
inter-block dependency).
Perf data on Yitian-710 2.75GHz hardware, before and after optimization:
Before:
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CTR    105787.80k   107837.87k   108380.84k   108462.08k   108549.46k   108554.92k
SM4-ECB    111924.58k   118173.76k   119776.00k   120093.70k   120264.02k   120274.94k
SM4-CBC    106428.09k   109190.98k   109674.33k   109774.51k   109827.41k   109827.41k
After (7.4x - 36.6x faster):
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CTR    781979.02k  2432994.28k  3437753.86k  3834177.88k  3963715.58k  3974556.33k
SM4-ECB    937590.69k  2941689.02k  3945751.81k  4328655.87k  4459181.40k  4468692.31k
SM4-CBC    890639.88k  1027746.58k  1050621.78k  1056696.66k  1058613.93k  1058701.31k
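For readers who want to see how the speedup figures relate to the tables above, here is a small Python sketch that divides each "after" throughput by the matching "before" throughput. The numbers are copied from the tables (in thousands of bytes per second); the variable and function names are made up for this illustration.

```python
# Throughput in 1000s of bytes/s, copied from the tables above.
SIZES = [16, 64, 256, 1024, 8192, 16384]
BEFORE = {
    "SM4-CTR": [105787.80, 107837.87, 108380.84, 108462.08, 108549.46, 108554.92],
    "SM4-ECB": [111924.58, 118173.76, 119776.00, 120093.70, 120264.02, 120274.94],
    "SM4-CBC": [106428.09, 109190.98, 109674.33, 109774.51, 109827.41, 109827.41],
}
AFTER = {
    "SM4-CTR": [781979.02, 2432994.28, 3437753.86, 3834177.88, 3963715.58, 3974556.33],
    "SM4-ECB": [937590.69, 2941689.02, 3945751.81, 4328655.87, 4459181.40, 4468692.31],
    "SM4-CBC": [890639.88, 1027746.58, 1050621.78, 1056696.66, 1058613.93, 1058701.31],
}


def speedups(mode):
    """Per-block-size ratio of optimized to baseline throughput for one mode."""
    return [after / before for after, before in zip(AFTER[mode], BEFORE[mode])]


if __name__ == "__main__":
    for mode in BEFORE:
        ratios = speedups(mode)
        print(f"{mode}: {min(ratios):.1f}x to {max(ratios):.1f}x")
```

With the figures quoted above, the CBC ratios stay below 10x while CTR and ECB exceed 35x at the largest block sizes, which is consistent with the point about inter-block dependency.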
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
These compilers define _ARCH_PPC64 for 32 bit builds,
so we cannot depend solely on this define to identify
a 32 bit build.
Fixes #17087
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17497)
This takes out the lock step stacks that allow a fast property-to-name
resolution. Follow-on from #17325.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17388)
Also update and slightly extend the respective documentation and simplify some code.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16251)
The SM3 hardware instructions are an optional feature of the crypto
extension for aarch64. This implementation accelerates SM3 via the SM3
instructions. On platforms that do not support the SM3 instructions, the
original C implementation still works. Thanks to AliBaba for testing and
reporting the following perf numbers for Yitian710:
Benchmark on T-Head Yitian-710 2.75GHz:
Before:
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sm3         49297.82k   121062.63k   223106.05k   283371.52k   307574.10k   309400.92k
After (33% - 74% faster):
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sm3         65640.01k   179121.79k   359854.59k   481448.96k   534055.59k   538274.47k
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)
PR #17255 fixed a bug in EVP_DigestInit_ex(). While backporting the PR
to 1.1.1 (see #17472) I spotted an error in the original patch. This fixes
it.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17473)
Doing the tsan operations under lock would be difficult to arrange here (locks
require memory allocation).
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/17479)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17471)
Specifically:
* out of range
* unsigned negatives
* inexact reals
* bad param types
* buffers that are too small
* null function arguments
* unknown sizes of real
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17440)
Attempting to fetch one of the above and providing a query string was
failing with an internal assertion error. We must ensure that we pass the
provider when calling ossl_method_store_cache_set().
Fixes #17456
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17459)
We already statically link libcrypto to endecode_test even in a "shared"
build. This can cause problems on some platforms with tests that load the
legacy provider which is dynamically linked to libcrypto. Two versions of
libcrypto are then linked to the same executable which can lead to crashes.
Fixes #17059
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17345)
If an EVP_MD_CTX is reused then memory allocated and stored in md_data
can be leaked unless the EVP_MD's cleanup function is called.
Fixes #17149
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
(Merged from https://github.com/openssl/openssl/pull/17255)
MDs created via EVP_MD_meth_new() are inherently legacy and therefore
need to go down the legacy route when they are used.
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
(Merged from https://github.com/openssl/openssl/pull/17255)
When building openssl for tianocore, compiling abs_val() and pow_10()
fails with the following error because SSE support is disabled:
crypto/bio/bio_print.c:587:46: error: SSE register return with SSE disabled
Fix that by using the EFIAPI calling convention when compiling for UEFI.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17442)
GCC's __ARMEL__ and __ARMEB__ defines denote little- and big-endian arm,
respectively. They are not defined on aarch64, which instead uses
__AARCH64EL__ and __AARCH64EB__.
However, OpenSSL's assembly originally used the 32-bit defines on both
platforms and even defined __ARMEL__ and __ARMEB__ in arm_arch.h. This is
less portable and can even interfere with other headers, which use
__ARMEL__ to detect little-endian arm.
Over time, the aarch64 assembly has switched to the correct defines,
such as in 32bbb62ea6. This commit
finishes the job: poly1305-armv8.pl needed a fix and the dual-arch
armx.pl files get one more transform to convert from 32-bit to 64-bit.
(There is an even more official endianness detector, __ARM_BIG_ENDIAN in
the Arm C Language Extensions. But I've stuck with the GCC ones here as
that would be a larger change.)
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/17373)
When building with clang -Wconditional-uninitialized, the error
"initialize the variable 'buffer_size' to silence this warning" is
reported.
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17375)
See discussion in #17088, where the real solution was postponed to 4.0.
This preliminarily fixes the issue that the HTTP(S) proxy environment vars
were neglected when determining whether a proxy should be used for HTTPS.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17310)
On IA64 the use of setjmp()/longjmp() does not properly save the
state of the register stack engine (RSE) and requires extra care.
The use of it in the async interface has led to a failure in the
test_async.t test since its introduction in the 1.1.0 series.
Instead of properly adding the needed assembly bits here, use the
swapcontext() function, which properly saves the whole context.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17370)
OSSL_trace_end() should validate that the category it has been passed
by the caller is valid, and return immediately if not.
Fixes #17353
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17371)
The UI method always adds NUL termination and we need to
compensate for that when using it from a pem_password_cb
because the buffer used in pem_password_cb does not account
for that and the returned password should be able to fill the
whole buffer.
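The off-by-one described above can be modelled in a few lines of Python. This is only a toy model under stated assumptions: ui_read_into() and password_cb() are hypothetical names standing in for a UI-style reader that always NUL-terminates and a pem_password_cb-style callback whose buffer has no room reserved for the terminator. It is not the OpenSSL code.

```python
def ui_read_into(buf: bytearray, typed: bytes) -> int:
    """Hypothetical UI-style reader: stores at most len(buf) - 1 bytes plus a NUL."""
    if not buf:
        return 0
    n = min(len(typed), len(buf) - 1)
    buf[:n] = typed[:n]
    buf[n] = 0  # the terminator always consumes one byte of the buffer
    return n


def password_cb(buf: bytearray, typed: bytes) -> int:
    """Hypothetical callback: the password may fill the whole of buf."""
    # Compensate for the terminator with a scratch buffer one byte larger.
    scratch = bytearray(len(buf) + 1)
    n = ui_read_into(scratch, typed)
    buf[:n] = scratch[:n]
    return n


if __name__ == "__main__":
    direct = bytearray(4)
    print(ui_read_into(direct, b"abcd"), bytes(direct))
    via_cb = bytearray(4)
    print(password_cb(via_cb, b"abcd"), bytes(via_cb))
```

Reading straight into a 4-byte buffer truncates a 4-byte password to 3 bytes, while going through the larger scratch buffer lets it occupy all 4 bytes.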
Fixes #16601
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from https://github.com/openssl/openssl/pull/17320)