Commit Graph

387 Commits

Author SHA1 Message Date
Tom Cosgrove
7b508cd1e1 Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a
Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20621)
2023-03-29 12:21:31 +02:00
Tomas Mraz
e5dd732749 rsaz-*k-avx512.pl: fix wrong name of avx512 flag variable
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20519)

(cherry picked from commit d4765408c7)
2023-03-17 11:25:29 +01:00
Richard Levitte
523e057730 Fix LLVM vs Apple LLVM version numbering confusion, for $avx512ifma
Apple LLVM has a different version numbering scheme than upstream LLVM.
That makes for quite a bit of confusion.

https://en.wikipedia.org/wiki/Xcode#Toolchain_versions to the rescue,
they have collected quite a lot of useful data.

This change is concentrated around the `$avx512ifma` flag

Fixes #16670 for the master branch

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19361)
2022-10-13 15:45:46 +02:00
Rohan McLure
2f1112b22a Fix unrolled montgomery multiplication for POWER9
In the reference C implementation in bn_asm.c, tp[num + 1] contains the
carry bit for accumulations into tp[num]. tp[num + 1] is only ever
assigned, never itself incremented.

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18883)
2022-08-17 13:00:50 +02:00
Rohan McLure
eae70100fa Revert "Revert "bn: Add fixed length (n=6), unrolled PPC Montgomery Multiplication""
This reverts commit 712d9cc90e.

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18883)
2022-08-17 13:00:50 +02:00
Tomas Mraz
0ae365e1f8 Always end BN_mod_exp_mont_consttime with normal Montgomery reduction.
This partially fixes a bug where, on x86_64, BN_mod_exp_mont_consttime
would sometimes return m, the modulus, when it should have returned
zero. Thanks to Guido Vranken for reporting it. It is only a partial fix
because the same bug also exists in the "rsaz" codepath.

The bug only affects zero outputs (with non-zero inputs), so we believe
it has no security impact on our cryptographic functions.

The fx is to delete lowercase bn_from_montgomery altogether, and have the
mont5 path use the same BN_from_montgomery ending as the non-mont5 path.
This only impacts the final step of the whole exponentiation and has no
measurable perf impact.

See the original BoringSSL commit
https://boringssl.googlesource.com/boringssl/+/13c9d5c69d04485a7a8840c12185c832026c8315
for further analysis.

Original-author: David Benjamin <davidben@google.com>

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18510)
2022-06-16 15:22:35 +02:00
Tomas Mraz
712d9cc90e Revert "bn: Add fixed length (n=6), unrolled PPC Montgomery Multiplication"
This reverts commit 0d40ca47bd.

It was found that the computation produces incorrect results in some
cases.

Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18512)
2022-06-15 09:54:02 +02:00
Matt Caswell
fecb3aae22 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes
2022-05-03 13:34:51 +01:00
Dimitris Apostolou
e304aa87b3 Fix typos
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17392)
2022-01-05 12:37:20 +01:00
Bernd Edlinger
336923c0c8 Fix a carry overflow bug in bn_sqr_comba4/8 for mips 32-bit targets
bn_sqr_comba8 does for instance compute a wrong result for the value:
a=0x4aaac919 62056c84 fba7334e 1a6be678 022181ba fd3aa878 899b2346 ee210f45

The correct result is:
r=0x15c72e32 605a3061 d11b1012 3c187483 6df96999 bd0c22ba d3e7d437 4724a82f
    912c5e61 6a187efe 8f7c47fc f6945fe5 75be8e3d 97ed17d4 7950b465 3cb32899

but the actual result was:
r=0x15c72e32 605a3061 d11b1012 3c187483 6df96999 bd0c22ba d3e7d437 4724a82f
    912c5e61 6a187efe 8f7c47fc f6945fe5 75be8e3c 97ed17d4 7950b465 3cb32899

so the forth word of the result was 0x75be8e3c but should have been
0x75be8e3d instead.

Likewise bn_sqr_comba4 has an identical bug for the same value as well:
a=0x022181ba fd3aa878 899b2346 ee210f45

correct result:
r=0x00048a69 9fe82f8b 62bd2ed1 88781335 75be8e3d 97ed17d4 7950b465 3cb32899

wrong result:
r=0x00048a69 9fe82f8b 62bd2ed1 88781335 75be8e3c 97ed17d4 7950b465 3cb32899

Fortunately the bn_mul_comba4/8 code paths are not affected.

Also the mips64 target does in fact not handle the carry propagation
correctly.

Example:
a=0x4aaac91900000000 62056c8400000000 fba7334e00000000 1a6be67800000000
    022181ba00000000 fd3aa87800000000 899b234635dad283 ee210f4500000001

correct result:
r=0x15c72e32272c4471 392debf018c679c8 b85496496bf8254c d0204f36611e2be1
    0cdb3db8f3c081d8 c94ba0e1bacc5061 191b83d47ff929f6 5be0aebfc13ae68d
    3eea7a7fdf2f5758 42f7ec656cab3cb5 6a28095be34756f2 64f24687bf37de06
    2822309cd1d292f9 6fa698c972372f09 771e97d3a868cda0 dc421e8a00000001

wrong result:
r=0x15c72e32272c4471 392debf018c679c8 b85496496bf8254c d0204f36611e2be1
    0cdb3db8f3c081d8 c94ba0e1bacc5061 191b83d47ff929f6 5be0aebfc13ae68d
    3eea7a7fdf2f5758 42f7ec656cab3cb5 6a28095be34756f2 64f24687bf37de06
    2822309cd1d292f8 6fa698c972372f09 771e97d3a868cda0 dc421e8a00000001

Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17258)
2021-12-14 06:43:04 +01:00
Andrey Matyukov
f87b4c4ea6 Dual 1536/2048-bit exponentiation optimization for Intel IceLake CPU
It uses AVX512_IFMA + AVX512_VL (with 256-bit wide registers) ISA to
keep lower power license.

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14908)
2021-11-19 12:50:34 +10:00
x2018
1287dabd0b fix some code with obvious wrong coding style
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16918)
2021-10-28 13:10:46 +10:00
Russ Butler
19e277dd19 aarch64: support BTI and pointer authentication in assembly
This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.

Both features can be enabled with additional compiler flags.
Unless any of these are enabled explicitly there is no code change at
all.

The extensions are briefly described below. Please read the appropriate
chapters of the Arm Architecture Reference Manual for the complete
specification.

Scope
-----

This change only affects generated assembly code.

Armv8.3-A Pointer Authentication
--------------------------------

Pointer Authentication extension supports the authentication of the
contents of registers before they are used for indirect branching
or load.

PAuth provides a probabilistic method to detect corruption of register
values. PAuth signing instructions generate a Pointer Authentication
Code (PAC) based on the value of a register, a seed and a key.
The generated PAC is inserted into the original value in the register.
A PAuth authentication instruction recomputes the PAC, and if it matches
the PAC in the register, restores its original value. In case of a
mismatch, an architecturally unmapped address is generated instead.

With PAuth, mitigation against ROP (Return-oriented Programming) attacks
can be implemented. This is achieved by signing the contents of the
link-register (LR) before it is pushed to stack. Once LR is popped,
it is authenticated. This way a stack corruption which overwrites the
LR on the stack is detectable.

The PAuth extension adds several new instructions, some of which are not
recognized by older hardware. To support a single codebase for both pre
Armv8.3-A targets and newer ones, only NOP-space instructions are added
by this patch. These instructions are treated as NOPs on hardware
which does not support Armv8.3-A. Furthermore, this patch only considers
cases where LR is saved to the stack and then restored before branching
to its content. There are cases in the code where LR is pushed to stack
but it is not used later. We do not address these cases as they are not
affected by PAuth.

There are two keys available to sign an instruction address: A and B.
PACIASP and PACIBSP only differ in the used keys: A and B, respectively.
The keys are typically managed by the operating system.

To enable generating code for PAuth compile with
-mbranch-protection=<mode>:

- standard or pac-ret: add PACIASP and AUTIASP, also enables BTI
  (read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP

Armv8.5-A Branch Target Identification
--------------------------------------

Branch Target Identification features some new instructions which
protect the execution of instructions on guarded pages which are not
intended branch targets.

If Armv8.5-A is supported by the hardware, execution of an instruction
changes the value of PSTATE.BTYPE field. If an indirect branch
lands on a guarded page the target instruction must be one of the
BTI <jc> flavors, or in case of a direct call or jump it can be any
other instruction. If the target instruction is not compatible with the
value of PSTATE.BTYPE a Branch Target Exception is generated.

In short, indirect jumps are compatible with BTI <j> and <jc> while
indirect calls are compatible with BTI <c> and <jc>. Please refer to the
specification for the details.

Armv8.3-A PACIASP and PACIBSP are implicit branch target
identification instructions which are equivalent with BTI c or BTI jc
depending on system register configuration.

BTI is used to mitigate JOP (Jump-oriented Programming) attacks by
limiting the set of instructions which can be jumped to.

BTI requires active linker support to mark the pages with BTI-enabled
code as guarded. For ELF64 files BTI compatibility is recorded in the
.note.gnu.property section. For a shared object or static binary it is
required that all linked units support BTI. This means that even a
single assembly file without the required note section turns-off BTI
for the whole binary or shared object.

The new BTI instructions are treated as NOPs on hardware which does
not support Armv8.5-A or on pages which are not guarded.

To insert this new and optional instruction compile with
-mbranch-protection=standard (also enables PAuth) or +bti.

When targeting a guarded page from a non-guarded page, weaker
compatibility restrictions apply to maintain compatibility between
legacy and new code. For detailed rules please refer to the Arm ARM.

Compiler support
----------------

Compiler support requires understanding '-mbranch-protection=<mode>'
and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT
and __ARM_FEATURE_PAC_DEFAULT). The current state is the following:

-------------------------------------------------------
| Compiler | -mbranch-protection | Feature macros     |
+----------+---------------------+--------------------+
| clang    | 9.0.0               | 11.0.0             |
+----------+---------------------+--------------------+
| gcc      | 9                   | expected in 10.1+  |
-------------------------------------------------------

Available Platforms
------------------

Arm Fast Model and QEMU support both extensions.

https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/

Implementation Notes
--------------------

This change adds BTI landing pads even to assembly functions which are
likely to be directly called only. In these cases, landing pads might
be superfluous depending on what code the linker generates.
Code size and performance impact for these cases would be negligible.

Interaction with C code
-----------------------

Pointer Authentication is a per-frame protection while Branch Target
Identification can be turned on and off only for all code pages of a
whole shared object or static binary. Because of these properties if
C/C++ code is compiled without any of the above features but assembly
files support any of them unconditionally there is no incompatibility
between the two.

Useful Links
------------

To fully understand the details of both PAuth and BTI it is advised to
read the related chapters of the Arm Architecture Reference Manual
(Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/

Additional materials:

"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software

Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en

Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest

Addional Notes
--------------

This patch is a copy of the work done by Tamas Petz in boringssl. It
contains the changes from the following commits:

aarch64: support BTI and pointer authentication in assembly
    Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/42084
aarch64: Improve conditional compilation
    Change-Id: I14902a64e5f403c2b6a117bc9f5fb1a4f4611ebf
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/43524
aarch64: Fix name of gnu property note section
    Change-Id: I6c432d1c852129e9c273f6469a8b60e3983671ec
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/44024

Change-Id: I2d95ebc5e4aeb5610d3b226f9754ee80cf74a9af

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16674)
2021-10-01 09:35:38 +02:00
Matt Caswell
54b4053130 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16176)
2021-07-29 15:41:35 +01:00
Tomas Mraz
52f7e44ec8 Split bignum code out of the sparcv9cap.c
Fixes #15978

Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16019)
2021-07-15 09:33:04 +02:00
Martin Schwenke
e7370fa016 bn: Fix .size directive
This requires the text address.

Fixes #15923

Signed-off-by: Martin Schwenke <martin@meltin.net>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15971)
2021-07-06 10:49:01 +10:00
Martin Schwenke
bf9b78214d bn: Use a basic branch-if-not-zero
Ancient toolchains fail the build because they don't like the hints,
newer ISAs recommend not using the hints and relying on dynamic branch
prediction.

Signed-off-by: Martin Schwenke <martin@meltin.net>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15971)
2021-07-06 10:49:01 +10:00
Martin Schwenke
3f55ff6af5 bn: save/restore registers to/from stack
mtvsrd/mfvsrd are ISA >= 2.07 only, so this won't work for older
CPUs.

It would be possible to use this scheme only in the ISA >= 3.0
implementation.  However, in the future it may be possible for newer
ISAs to allow CPU implementations without a vector unit, so don't
bother.  The performance improvement versus using the stack was small
anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15798)
2021-06-22 18:30:17 +10:00
Martin Schwenke
5b7f986457 bn: Switch $i to be unused r9
No need to save/restore because it is volatile.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15798)
2021-06-22 18:30:17 +10:00
Martin Schwenke
77bd294bd0 bn: Drop unnecessary use of r9
This is done in other versions due to the possibility of an early
return.  However, there is no early return here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15798)
2021-06-22 18:30:17 +10:00
Martin Schwenke
450d980480 bn: Update .align pseudo-ops to match convention
64-bit alignment at the beginning of functions, 32-bit alignment for
loop targets.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15798)
2021-06-22 18:30:17 +10:00
Martin Schwenke
7f98eaab8b bn: Drop use of .p2align pseudo-op
This works on Linux but breaks the build on AIX.

Fixes #15748

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15798)
2021-06-22 18:30:17 +10:00
Pauli
e475d9a443 rsa: rename global rsaz_ sumbols so they are in namespace
The symbols renamed are:

RSAZ_amm52x20_x1_256
RSAZ_amm52x20_x2_256
rsaz_avx512ifma_eligible
RSAZ_mod_exp_avx512_x2

Additionally, RSAZ_exp52x20_x2_256 was made static

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/15445)
2021-05-27 09:35:50 +10:00
Pauli
190c029eab bn: rename extract_multiplier_2x20_win5 -> ossl_extract_multiplier_2x20_win5
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/15445)
2021-05-27 09:35:50 +10:00
Matt Caswell
d136db212e Remove some perl 5.14 use from rsaz-avx512.pl
The non-destructive substitution syntax (s///r), was introduced in perl
5.14. We need to support 5.10 and above.

Fixes #15378

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15379)
2021-05-24 11:23:58 +10:00
Matt Caswell
0789c7d834 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15381)
2021-05-20 14:22:33 +01:00
fangming.fang
1064616012 Optimize RSA on armv8
Add Neon path for RSA on armv8, this optimisation targets to A72
and N1 that are ones of important cores of infrastructure. Other
platforms are not impacted.

A72
                        old		new             improved
rsa  512 sign		9828.6		9738.7		-1%
rsa  512 verify		121497.2	122367.7	1%
rsa 1024 sign		1818		1816.9		0%
rsa 1024 verify		37175.6		37161.3		0%
rsa 2048 sign		267.3		267.4		0%
rsa 2048 verify		10127.6		10119.6		0%
rsa 3072 sign		86.8		87		0%
rsa 3072 verify		4604.2		4956.2		8%
rsa 4096 sign		38.3		38.5		1%
rsa 4096 verify		2619.8		2972.1		13%
rsa 7680 sign		5		7		40%
rsa 7680 verify		756	     	929.4		23%
rsa 15360 sign		0.8	     	1		25%
rsa 15360 verify	190.4	 	246		29%

N1
                        old		new             improved
rsa  512 sign		12599.2		12596.7		0%
rsa  512 verify		148636.1	148656.2	0%
rsa 1024 sign		2150.6		2148.9		0%
rsa 1024 verify		42353.5		42265.2		0%
rsa 2048 sign		305.5		305.3		0%
rsa 2048 verify		11209.7		11205.2		0%
rsa 3072 sign		97.8		98.2		0%
rsa 3072 verify		5061.3		5990.7		18%
rsa 4096 sign		42.8		43		0%
rsa 4096 verify		2867.6		3509.8		22%
rsa 7680 sign		5.5		8.4		53%
rsa 7680 verify		823.5		1058.3		29%
rsa 15360 sign		0.9		1.1		22%
rsa 15360 verify	207		273.9		32%

CustomizedGitHooks: yes
Change-Id: I01c732cc429d793c4eb5ffd27ccd30ff9cebf8af
Jira: SECLIB-540

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14761)
2021-05-09 23:15:07 +10:00
Martin Schwenke
0d40ca47bd bn: Add fixed length (n=6), unrolled PPC Montgomery Multiplication
Overall improvement for p384 of ~18% on Power 9, compared to existing
Power assembling code.  See comment in code for more details.

Multiple unrolled versions could be generated for values other than
6.  However, for TLS 1.3 the only other ECC algorithms that might use
Montgomery Multiplication are p256 and p521, but these have custom
algorithms that don't use Montgomery Multiplication.  Non-ECC
algorithms are likely to use larger key lengths that won't fit into
the n <= 10 length limitation of this code.

Signed-off-by: Amitay Isaacs <amitay@ozlabs.org>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15175)
2021-05-08 20:39:29 +10:00
Matt Caswell
3c2bdd7df9 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14801)
2021-04-08 13:04:41 +01:00
Andrey Matyukov
788a72e92f Increase minimum clang version requirement for rsaz-avx512.pl
The reason is that clang-6 does not enable proper -march flags by
default for assembly modules (rsaz-avx512.pl requires avx512ifma, avx512dq,
avx512vl, avx512f). This is not true for newer clang versions - clang-7 and
further work ok.

For older clang versions users who want to get optimization from this
file, we have a note in the OPENSSL_ia32cap.pod with the workaround that
proposes having a wrapper that forces using external assembler.

Fixes #14668: clang-6.0.0 build broken

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14671)
2021-03-30 19:05:35 +02:00
Andrey Matyukov
b238e78fe8 Rearranged .pdata entries in rsaz-avx512.pl to make them properly ordered.
Fixes #14660: Windows linking error

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14665)
2021-03-24 09:11:55 +00:00
Andrey Matyukov
c781eb1c63 Dual 1024-bit exponentiation optimization for Intel IceLake CPU
with AVX512_IFMA + AVX512_VL instructions, primarily for RSA CRT private key
operations. It uses 256-bit registers to avoid CPU frequency scaling issues.
The performance speedup for RSA2k signature on ICL is ~2x.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/13750)
2021-03-22 09:48:00 +00:00
Jung-uk Kim
cd84d8832d Ignore vendor name in Clang version number.
For example, FreeBSD prepends "FreeBSD" to version string, e.g.,

FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
Target: x86_64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin

This prevented us from properly detecting AVX support, etc.

CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from https://github.com/openssl/openssl/pull/12725)
2020-08-27 20:27:26 -07:00
Matt Caswell
33388b44b6 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11616)
2020-04-23 13:55:52 +01:00
David Benjamin
a21314dbbc Also check for errors in x86_64-xlate.pl.
In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.

That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)
2020-02-17 12:17:53 +10:00
Dr. Matthias St. Pierre
7fa8bcfe43 Fix misspelling errors and typos reported by codespell
Fixes #10998

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/11000)
2020-02-06 17:01:00 +01:00
David Benjamin
32be631ca1 Do not silently truncate files on perlasm errors
If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.

Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
2020-01-22 18:11:30 +01:00
Richard Levitte
9bb3e5fd87 For all assembler scripts where it matters, recognise clang > 9.x
Fixes #10853

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10855)
2020-01-17 08:55:45 +01:00
Bernd Edlinger
013c2e8d1a Add some missing cfi frame info in rsaz-x86_64
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10652)
2019-12-20 23:05:42 +01:00
Bernd Edlinger
0190c52ab8 Add some missing cfi frame info in x86_64-mont5.pl
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10651)
2019-12-20 22:56:00 +01:00
Bernd Edlinger
8736f95381 Improve the overflow handling in rsaz_512_sqr
We have always a carry in %rcx or %rbx in range 0..2
from the previous stage, that is added to the result
of the 64-bit square, but the low nibble of any square
can only be 0, 1, 4, 9.

Therefore one "adcq $0, %rdx" can be removed.
Likewise in the ADX code we can remove one
"adcx %rbp, $out" since %rbp is always 0, and carry is
also zero, therefore that is a no-op.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10574)
2019-12-06 13:31:31 +01:00
Andy Polyakov
8c6f86c7c5 Fix an overflow bug in rsaz_512_sqr
There is an overflow bug in the x64_64 Montgomery squaring procedure used in
exponentiation with 512-bit moduli. No EC algorithms are affected. Analysis
suggests that attacks against 2-prime RSA1024, 3-prime RSA1536, and DSA1024 as a
result of this defect would be very difficult to perform and are not believed
likely. Attacks against DH512 are considered just feasible. However, for an
attack the target would have to re-use the DH512 private key, which is not
recommended anyway. Also applications directly using the low level API
BN_mod_exp may be affected if they use BN_FLG_CONSTTIME.

CVE-2019-1551

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/10574)
2019-12-06 13:31:31 +01:00
Patrick Steuer
97a986f782 s390x assembly pack: fix bn_mul_comba4
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10454)
2019-11-17 13:52:02 +01:00
Patrick Steuer
6f93f06135 s390x assembly pack: enable clang build
clang imposes some restrictions on the assembler code that
gcc does not.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10330)
2019-11-03 11:25:31 +01:00
Dr. Matthias St. Pierre
706457b7bd Reorganize local header files
Apart from public and internal header files, there is a third type called
local header files, which are located next to source files in the source
directory. Currently, they have different suffixes like

  '*_lcl.h', '*_local.h', or '*_int.h'

This commit changes the different suffixes to '*_local.h' uniformly.

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9333)
2019-09-28 20:26:35 +02:00
Richard Levitte
1aa89a7a3a Unify all assembler file generators
They now generally conform to the following argument sequence:

    script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
              $(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file.  This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

    open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:

    open FOO, "something";

This, however, will fail if "something" can't be opened:

    open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
2019-09-16 16:29:57 +02:00
Antoine Cœur
c2969ff6e7 Fix Typos
CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
(Merged from https://github.com/openssl/openssl/pull/9288)
2019-07-02 14:22:29 +02:00
Hua Zhang
1b9c5f2e2f Fix compiling error for mips32r6 and mips64r6
There are some compiling errors for mips32r6 and mips64r6:

crypto/bn/bn-mips.S:56: Error: opcode not supported on this processor: mips2 (mips2) `mulu $1,$12,$7'
crypto/mips_arch.h: Assembler messages:
crypto/mips_arch.h:15: Error: junk at end of line, first unrecognized character is `&'

Signed-off-by: Hua Zhang <hua.zhang1974@hotmail.com>

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8464)
2019-03-19 07:36:21 +01:00
Richard Levitte
2864df8f9d Add missing '.text' in crypto/bn/asm/ppc.pl
Fixes #8495

Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
(Merged from https://github.com/openssl/openssl/pull/8496)
2019-03-19 07:33:28 +01:00
David Benjamin
c0e8e5007b Fix some CFI issues in x86_64 assembly
The add/double shortcut in ecp_nistz256-x86_64.pl left one instruction
point that did not unwind, and the "slow" path in AES_cbc_encrypt was
not annotated correctly. For the latter, add
.cfi_{remember,restore}_state support to perlasm.

Next, fill in a bunch of functions that are missing no-op .cfi_startproc
and .cfi_endproc blocks. libunwind cannot unwind those stack frames
otherwise.

Finally, work around a bug in libunwind by not encoding rflags. (rflags
isn't a callee-saved register, so there's not much need to annotate it
anyway.)

These were found as part of ABI testing work in BoringSSL.

Reviewed-by: Richard Levitte <levitte@openssl.org>
GH: #8109
2019-02-17 23:39:51 +01:00