Commit Graph

34 Commits

Author SHA1 Message Date
Xiaokang Qian
513e103f14 Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20184)
2023-02-08 16:54:57 +01:00
Xu Yizhou
88c53cf17d Apply SM4 optimization patch to Kunpeng-920
In the ideal scenario, performance can reach up to 2.2X.
But in single block input or CFB/OFB mode, CBC encryption,
performance could drop about 50%.

Perf data on Kunpeng-920 2.6GHz hardware, before and after optimization:

Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 75318.96k 79089.62k 79736.15k 79934.12k 80325.44k 80068.61k
SM4-ECB 80211.39k 84998.36k 86472.28k 87024.93k 87144.80k 86862.51k
SM4-GCM 72156.19k 82012.08k 83848.02k 84322.65k 85103.65k 84896.43k
SM4-CBC 77956.13k 80638.81k 81976.17k 81606.31k 82078.91k 81750.70k
SM4-CFB 78078.20k 81054.87k 81841.07k 82396.38k 82203.99k 82236.76k
SM4-OFB 78282.76k 82074.03k 82765.74k 82989.06k 83200.68k 83487.17k

After:
type    16 bytes  64 bytes   256 bytes  1024 bytes 8192 bytes 16384 bytes
SM4-CTR 35678.07k 120687.25k 176632.27k 177192.62k 177586.18k 178295.18k
SM4-ECB 35540.32k 122628.07k 175067.90k 178007.84k 178298.88k 178328.92k
SM4-GCM 34215.75k 116720.50k 170275.16k 171770.88k 172714.21k 172272.30k
SM4-CBC 35645.60k 36544.86k  36515.50k  36732.15k  36618.24k  36629.16k
SM4-CFB 35528.14k 35690.99k  35954.86k  35843.42k  35809.18k  35809.96k
SM4-OFB 35563.55k 35853.56k  35963.05k  36203.52k  36233.85k  36307.82k

Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19547)
2022-11-02 08:45:10 +11:00
Tom Cosgrove
1efd8533e1 Fix aarch64 signed bit shift issue found by UBSAN
Also fix conditional branch out of range when using sanitisers.

Fixes #18813

Signed-off-by: Tom Cosgrove <tom.cosgrove@arm.com>

Change-Id: Ic543885091ed3ef2ddcbe21de0a4ac0bca1e2494

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18816)
2022-07-19 12:14:33 +02:00
XiaokangQian
9224a407f9 Apply the AES-GCM unroll8 optimization patch to Neoverse N2
The loop unrolling and use of EOR3 can improve N2 performance
by up to 32%

Signed-off-by: XiaokangQian <xiaokang.qian@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18350)
2022-05-23 11:05:51 +10:00
Matt Caswell
fecb3aae22 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes
2022-05-03 13:34:51 +01:00
Daniel Hu
b1b2146ded Acceleration of chacha20 on aarch64 by SVE
This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with
256-bit SVE, it has the potential to improve performance up to 20%

The solution takes a hybrid approach. SVE will handle multi-blocks that fit
the SVE vector length, with Neon/Scalar to process any tail data

Test result:
With SVE
type            1024 bytes   8192 bytes  16384 bytes
ChaCha20        1596208.13k  1650010.79k  1653151.06k

Without SVE (by Neon/Scalar)
type            1024 bytes   8192 bytes  16384 bytes
chacha20        1355487.91k  1372678.83k  1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
2022-05-03 14:37:46 +10:00
XiaokangQian
954f45ba4c Optimize AES-GCM for uarchs with unroll and new instructions
Increase the block numbers to 8 for every iteration.  Increase the hash
table capacity.  Make use of EOR3 instruction to improve the performance.

This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1.  We also
see 20-30% performance improvements on other architectures such as the M1.

Assembly code reviewd by Tom Cosgrove (ARM).

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
2022-01-25 14:30:00 +11:00
Daniel Hu
15b7175f55 SM4 optimization for ARM by HW instruction
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  105787.80k   107837.87k   108380.84k   108462.08k   108549.46k   108554.92k
  SM4-ECB  111924.58k   118173.76k   119776.00k   120093.70k   120264.02k   120274.94k
  SM4-CBC  106428.09k   109190.98k   109674.33k   109774.51k   109827.41k   109827.41k

After (7.4x - 36.6x faster):
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  781979.02k  2432994.28k  3437753.86k  3834177.88k  3963715.58k  3974556.33k
  SM4-ECB  937590.69k  2941689.02k  3945751.81k  4328655.87k  4459181.40k  4468692.31k
  SM4-CBC  890639.88k  1027746.58k  1050621.78k  1056696.66k  1058613.93k  1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
2022-01-18 11:52:14 +01:00
fangming.fang
71396cd048 SM3 acceleration with SM3 hardware instruction on aarch64
SM3 hardware instruction is optional feature of crypto extension for
aarch64. This implementation accelerates SM3 via SM3 instructions. For
the platform not supporting SM3 instruction, the original C
implementation still works. Thanks to AliBaba for testing and reporting
the following perf numbers for Yitian710:

Benchmark on T-Head Yitian-710 2.75GHz:

Before:
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   49297.82k   121062.63k   223106.05k   283371.52k   307574.10k   309400.92k

After (33% - 74% faster):
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   65640.01k   179121.79k   359854.59k   481448.96k   534055.59k   538274.47k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)
2022-01-14 11:40:05 +01:00
David Benjamin
40c24d74de Don't use __ARMEL__/__ARMEB__ in aarch64 assembly
GCC's __ARMEL__ and __ARMEB__ defines denote little- and big-endian arm,
respectively. They are not defined on aarch64, which instead use
__AARCH64EL__ and __AARCH64EB__.

However, OpenSSL's assembly originally used the 32-bit defines on both
platforms and even define __ARMEL__ and __ARMEB__ in arm_arch.h. This is
less portable and can even interfere with other headers, which use
__ARMEL__ to detect little-endian arm.

Over time, the aarch64 assembly has switched to the correct defines,
such as in 32bbb62ea6. This commit
finishes the job: poly1305-armv8.pl needed a fix and the dual-arch
armx.pl files get one more transform to convert from 32-bit to 64-bit.

(There is an even more official endianness detector, __ARM_BIG_ENDIAN in
the Arm C Language Extensions. But I've stuck with the GCC ones here as
that would be a larger change.)

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/17373)
2022-01-09 07:40:44 +01:00
Orr Toledano
efa1f22483 Add Arm Assembly (aarch64) support for RNG
Include aarch64 asm instructions for random number generation using the
RNDR and RNDRRS instructions. Provide detection functions for RNDR and
RNDRRS getauxval.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15361)
2021-12-16 12:38:09 +01:00
Russ Butler
19e277dd19 aarch64: support BTI and pointer authentication in assembly
This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.

Both features can be enabled with additional compiler flags.
Unless any of these are enabled explicitly there is no code change at
all.

The extensions are briefly described below. Please read the appropriate
chapters of the Arm Architecture Reference Manual for the complete
specification.

Scope
-----

This change only affects generated assembly code.

Armv8.3-A Pointer Authentication
--------------------------------

Pointer Authentication extension supports the authentication of the
contents of registers before they are used for indirect branching
or load.

PAuth provides a probabilistic method to detect corruption of register
values. PAuth signing instructions generate a Pointer Authentication
Code (PAC) based on the value of a register, a seed and a key.
The generated PAC is inserted into the original value in the register.
A PAuth authentication instruction recomputes the PAC, and if it matches
the PAC in the register, restores its original value. In case of a
mismatch, an architecturally unmapped address is generated instead.

With PAuth, mitigation against ROP (Return-oriented Programming) attacks
can be implemented. This is achieved by signing the contents of the
link-register (LR) before it is pushed to stack. Once LR is popped,
it is authenticated. This way a stack corruption which overwrites the
LR on the stack is detectable.

The PAuth extension adds several new instructions, some of which are not
recognized by older hardware. To support a single codebase for both pre
Armv8.3-A targets and newer ones, only NOP-space instructions are added
by this patch. These instructions are treated as NOPs on hardware
which does not support Armv8.3-A. Furthermore, this patch only considers
cases where LR is saved to the stack and then restored before branching
to its content. There are cases in the code where LR is pushed to stack
but it is not used later. We do not address these cases as they are not
affected by PAuth.

There are two keys available to sign an instruction address: A and B.
PACIASP and PACIBSP only differ in the used keys: A and B, respectively.
The keys are typically managed by the operating system.

To enable generating code for PAuth compile with
-mbranch-protection=<mode>:

- standard or pac-ret: add PACIASP and AUTIASP, also enables BTI
  (read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP

Armv8.5-A Branch Target Identification
--------------------------------------

Branch Target Identification features some new instructions which
protect the execution of instructions on guarded pages which are not
intended branch targets.

If Armv8.5-A is supported by the hardware, execution of an instruction
changes the value of PSTATE.BTYPE field. If an indirect branch
lands on a guarded page the target instruction must be one of the
BTI <jc> flavors, or in case of a direct call or jump it can be any
other instruction. If the target instruction is not compatible with the
value of PSTATE.BTYPE a Branch Target Exception is generated.

In short, indirect jumps are compatible with BTI <j> and <jc> while
indirect calls are compatible with BTI <c> and <jc>. Please refer to the
specification for the details.

Armv8.3-A PACIASP and PACIBSP are implicit branch target
identification instructions which are equivalent with BTI c or BTI jc
depending on system register configuration.

BTI is used to mitigate JOP (Jump-oriented Programming) attacks by
limiting the set of instructions which can be jumped to.

BTI requires active linker support to mark the pages with BTI-enabled
code as guarded. For ELF64 files BTI compatibility is recorded in the
.note.gnu.property section. For a shared object or static binary it is
required that all linked units support BTI. This means that even a
single assembly file without the required note section turns-off BTI
for the whole binary or shared object.

The new BTI instructions are treated as NOPs on hardware which does
not support Armv8.5-A or on pages which are not guarded.

To insert this new and optional instruction compile with
-mbranch-protection=standard (also enables PAuth) or +bti.

When targeting a guarded page from a non-guarded page, weaker
compatibility restrictions apply to maintain compatibility between
legacy and new code. For detailed rules please refer to the Arm ARM.

Compiler support
----------------

Compiler support requires understanding '-mbranch-protection=<mode>'
and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT
and __ARM_FEATURE_PAC_DEFAULT). The current state is the following:

-------------------------------------------------------
| Compiler | -mbranch-protection | Feature macros     |
+----------+---------------------+--------------------+
| clang    | 9.0.0               | 11.0.0             |
+----------+---------------------+--------------------+
| gcc      | 9                   | expected in 10.1+  |
-------------------------------------------------------

Available Platforms
------------------

Arm Fast Model and QEMU support both extensions.

https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/

Implementation Notes
--------------------

This change adds BTI landing pads even to assembly functions which are
likely to be directly called only. In these cases, landing pads might
be superfluous depending on what code the linker generates.
Code size and performance impact for these cases would be negligible.

Interaction with C code
-----------------------

Pointer Authentication is a per-frame protection while Branch Target
Identification can be turned on and off only for all code pages of a
whole shared object or static binary. Because of these properties if
C/C++ code is compiled without any of the above features but assembly
files support any of them unconditionally there is no incompatibility
between the two.

Useful Links
------------

To fully understand the details of both PAuth and BTI it is advised to
read the related chapters of the Arm Architecture Reference Manual
(Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/

Additional materials:

"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software

Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en

Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest

Addional Notes
--------------

This patch is a copy of the work done by Tamas Petz in boringssl. It
contains the changes from the following commits:

aarch64: support BTI and pointer authentication in assembly
    Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/42084
aarch64: Improve conditional compilation
    Change-Id: I14902a64e5f403c2b6a117bc9f5fb1a4f4611ebf
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/43524
aarch64: Fix name of gnu property note section
    Change-Id: I6c432d1c852129e9c273f6469a8b60e3983671ec
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/44024

Change-Id: I2d95ebc5e4aeb5610d3b226f9754ee80cf74a9af

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16674)
2021-10-01 09:35:38 +02:00
Matt Caswell
0789c7d834 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15381)
2021-05-20 14:22:33 +01:00
Xiaofei Bai
2bdec3b037 crypto/arm_arch.h: add a variable declaration
Add this variable declaration to prevent
"-Werror,-Wmissing-variable-declarations" error from compiler.
This error currently only happens on clang.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15240)
2021-05-14 00:03:30 +10:00
Fangming.Fang
5ea64b456b Read MIDR_EL1 system register on aarch64
MIDR_EL1 system register exposes microarchitecture information so that
people can make micro-arch related optimization such as exposing as
much instruction level parallelism as possible.

MIDR_EL1 register can be read only if HWCAP_CPUID feature is supported.

Change-Id: Iabb8a36c5d31b184dba6399f378598058d394d4e

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/11744)
2020-12-09 16:17:17 +01:00
Dr. Matthias St. Pierre
ae4186b004 Fix header file include guard names
Make the include guards consistent by renaming them systematically according
to the naming conventions below

For the public header files (in the 'include/openssl' directory), the guard
names try to match the path specified in the include directives, with
all letters converted to upper case and '/' and '.' replaced by '_'. For the
private header files files, an extra 'OSSL_' is added as prefix.

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9333)
2019-09-28 20:26:36 +02:00
Richard Levitte
0e9725bcb9 Following the license change, modify the boilerplates in crypto/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7827)
2018-12-06 15:32:17 +01:00
Bernd Edlinger
0e0f8116e2 Fix building linux-armv4 with --strict-warnings
Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6026)
2018-04-20 15:49:33 +02:00
Matt Caswell
6738bf1417 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-13 13:59:25 +00:00
Andy Polyakov
77f3612e2b crypto/armcap.c: detect hardware-assisted SHA512 support.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-12 14:04:53 +01:00
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Rich Salz
d2e9e32018 Copyright consolidation 07/10
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-17 14:51:26 -04:00
Matt Caswell
0f113f3ee4 Run util/openssl-format-source -v -c .
Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-22 09:20:09 +00:00
Andy Polyakov
c1669e1c20 Remove inconsistency in ARM support.
This facilitates "universal" builds, ones that target multiple
architectures, e.g. ARMv5 through ARMv7. See commentary in
Configure for details.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
2015-01-04 23:45:08 +01:00
Dr. Stephen Henson
c603c723ce Remove OPENSSL_FIPSCANISTER code.
OPENSSL_FIPSCANISTER is only set if the fips module is being built
(as opposed to being used). Since the fips module wont be built in
master this is redundant.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2014-12-08 13:25:16 +00:00
Andy Polyakov
e8d93e342b Add linux-aarch64 taget.
armcap.c is shared between 32- and 64-bit builds and features link-time
detection of getauxval.

Submitted by: Ard Biesheuvel.
2014-06-01 17:21:06 +02:00
Andy Polyakov
4afa9f033d crypto/armcap.c: detect ARMv8 capabilities [in 32-bit build]. 2014-05-04 10:55:49 +02:00
Andy Polyakov
3c075bf07f arm_arch.h: allow to specify __ARM_ARCH__ elsewhere. 2011-11-09 20:08:44 +00:00
Andy Polyakov
3ee4d41fe1 arm_arch.h: add missing pre-defined macro, __ARM_ARCH_5TEJ__. 2011-10-19 18:57:03 +00:00
Dr. Stephen Henson
1d5121552d Make sure OPENSSL_FIPSCANISTER is visible to ARM assembly language files. 2011-07-22 14:20:50 +00:00
Andy Polyakov
87873f4328 ARM assembler pack: add platform run-time detection. 2011-07-17 17:40:29 +00:00
Dr. Stephen Henson
ce02589259 Now the FIPS capable OpenSSL is available simplify the various FIPS test
build options.

All fispcanisterbuild builds only build fipscanister.o and include symbol
renaming.

Move all renamed symbols to fipssyms.h

Update README.FIPS
2011-06-22 12:30:18 +00:00
Dr. Stephen Henson
a95bbadb57 Include fipssyms.h for ARM builds to translate symbols.
Translate arm symbol to fips_*.
2011-05-04 14:16:03 +00:00
Andy Polyakov
e512375186 ARM assembler pack: add missing arm_arch.h. 2011-04-01 21:09:09 +00:00