Commit Graph

22 Commits

Author SHA1 Message Date
Matt Caswell
fecb3aae22 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes
2022-05-03 13:34:51 +01:00
Daniel Hu
b1b2146ded Acceleration of chacha20 on aarch64 by SVE
This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with
256-bit SVE, it has the potential to improve performance up to 20%

The solution takes a hybrid approach. SVE will handle multi-blocks that fit
the SVE vector length, with Neon/Scalar to process any tail data

Test result:
With SVE
type            1024 bytes   8192 bytes  16384 bytes
ChaCha20        1596208.13k  1650010.79k  1653151.06k

Without SVE (by Neon/Scalar)
type            1024 bytes   8192 bytes  16384 bytes
chacha20        1355487.91k  1372678.83k  1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
2022-05-03 14:37:46 +10:00
yavtuk
4d63eaf99b Prefer .inst rather than .long for probe instructions in arm64cpuid.pl
Fixes an issue disassembling the functions because the symtab contains
an attribute indicating the presence of data within them.

CLA: trivial

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18086)
2022-04-29 08:49:07 +02:00
Tom Cosgrove
3a23f01268 aarch64: fix branch target indications in arm64cpuid.pl and keccak1600
Add missing AARCH64_VALID_CALL_TARGET to armv8_rng_probe(). Also add
these to the functions defined by gen_random(), and note that this Perl
sub prints the assembler out directly, not going via the $code xlate
mechanism (and therefore coming before the include of arm_arch.h). So
fix this too.

In KeccakF1600_int, AARCH64_SIGN_LINK_REGISTER functions as
AARCH64_VALID_CALL_TARGET on BTI-only builds, so it needs to come before
the 'adr' line.

Change-Id: If241efe71591c88253a3e36647ced00300c3c1a3

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17653)
2022-02-09 13:24:31 +11:00
XiaokangQian
954f45ba4c Optimize AES-GCM for uarchs with unroll and new instructions
Increase the block numbers to 8 for every iteration.  Increase the hash
table capacity.  Make use of EOR3 instruction to improve the performance.

This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1.  We also
see 20-30% performance improvements on other architectures such as the M1.

Assembly code reviewd by Tom Cosgrove (ARM).

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
2022-01-25 14:30:00 +11:00
Daniel Hu
15b7175f55 SM4 optimization for ARM by HW instruction
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  105787.80k   107837.87k   108380.84k   108462.08k   108549.46k   108554.92k
  SM4-ECB  111924.58k   118173.76k   119776.00k   120093.70k   120264.02k   120274.94k
  SM4-CBC  106428.09k   109190.98k   109674.33k   109774.51k   109827.41k   109827.41k

After (7.4x - 36.6x faster):
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  781979.02k  2432994.28k  3437753.86k  3834177.88k  3963715.58k  3974556.33k
  SM4-ECB  937590.69k  2941689.02k  3945751.81k  4328655.87k  4459181.40k  4468692.31k
  SM4-CBC  890639.88k  1027746.58k  1050621.78k  1056696.66k  1058613.93k  1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
2022-01-18 11:52:14 +01:00
fangming.fang
71396cd048 SM3 acceleration with SM3 hardware instruction on aarch64
SM3 hardware instruction is optional feature of crypto extension for
aarch64. This implementation accelerates SM3 via SM3 instructions. For
the platform not supporting SM3 instruction, the original C
implementation still works. Thanks to AliBaba for testing and reporting
the following perf numbers for Yitian710:

Benchmark on T-Head Yitian-710 2.75GHz:

Before:
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   49297.82k   121062.63k   223106.05k   283371.52k   307574.10k   309400.92k

After (33% - 74% faster):
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   65640.01k   179121.79k   359854.59k   481448.96k   534055.59k   538274.47k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)
2022-01-14 11:40:05 +01:00
Orr Toledano
efa1f22483 Add Arm Assembly (aarch64) support for RNG
Include aarch64 asm instructions for random number generation using the
RNDR and RNDRRS instructions. Provide detection functions for RNDR and
RNDRRS getauxval.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15361)
2021-12-16 12:38:09 +01:00
Russ Butler
19e277dd19 aarch64: support BTI and pointer authentication in assembly
This change adds optional support for
- Armv8.3-A Pointer Authentication (PAuth) and
- Armv8.5-A Branch Target Identification (BTI)
features to the perl scripts.

Both features can be enabled with additional compiler flags.
Unless any of these are enabled explicitly there is no code change at
all.

The extensions are briefly described below. Please read the appropriate
chapters of the Arm Architecture Reference Manual for the complete
specification.

Scope
-----

This change only affects generated assembly code.

Armv8.3-A Pointer Authentication
--------------------------------

Pointer Authentication extension supports the authentication of the
contents of registers before they are used for indirect branching
or load.

PAuth provides a probabilistic method to detect corruption of register
values. PAuth signing instructions generate a Pointer Authentication
Code (PAC) based on the value of a register, a seed and a key.
The generated PAC is inserted into the original value in the register.
A PAuth authentication instruction recomputes the PAC, and if it matches
the PAC in the register, restores its original value. In case of a
mismatch, an architecturally unmapped address is generated instead.

With PAuth, mitigation against ROP (Return-oriented Programming) attacks
can be implemented. This is achieved by signing the contents of the
link-register (LR) before it is pushed to stack. Once LR is popped,
it is authenticated. This way a stack corruption which overwrites the
LR on the stack is detectable.

The PAuth extension adds several new instructions, some of which are not
recognized by older hardware. To support a single codebase for both pre
Armv8.3-A targets and newer ones, only NOP-space instructions are added
by this patch. These instructions are treated as NOPs on hardware
which does not support Armv8.3-A. Furthermore, this patch only considers
cases where LR is saved to the stack and then restored before branching
to its content. There are cases in the code where LR is pushed to stack
but it is not used later. We do not address these cases as they are not
affected by PAuth.

There are two keys available to sign an instruction address: A and B.
PACIASP and PACIBSP only differ in the used keys: A and B, respectively.
The keys are typically managed by the operating system.

To enable generating code for PAuth compile with
-mbranch-protection=<mode>:

- standard or pac-ret: add PACIASP and AUTIASP, also enables BTI
  (read below)
- pac-ret+b-key: add PACIBSP and AUTIBSP

Armv8.5-A Branch Target Identification
--------------------------------------

Branch Target Identification features some new instructions which
protect the execution of instructions on guarded pages which are not
intended branch targets.

If Armv8.5-A is supported by the hardware, execution of an instruction
changes the value of PSTATE.BTYPE field. If an indirect branch
lands on a guarded page the target instruction must be one of the
BTI <jc> flavors, or in case of a direct call or jump it can be any
other instruction. If the target instruction is not compatible with the
value of PSTATE.BTYPE a Branch Target Exception is generated.

In short, indirect jumps are compatible with BTI <j> and <jc> while
indirect calls are compatible with BTI <c> and <jc>. Please refer to the
specification for the details.

Armv8.3-A PACIASP and PACIBSP are implicit branch target
identification instructions which are equivalent with BTI c or BTI jc
depending on system register configuration.

BTI is used to mitigate JOP (Jump-oriented Programming) attacks by
limiting the set of instructions which can be jumped to.

BTI requires active linker support to mark the pages with BTI-enabled
code as guarded. For ELF64 files BTI compatibility is recorded in the
.note.gnu.property section. For a shared object or static binary it is
required that all linked units support BTI. This means that even a
single assembly file without the required note section turns-off BTI
for the whole binary or shared object.

The new BTI instructions are treated as NOPs on hardware which does
not support Armv8.5-A or on pages which are not guarded.

To insert this new and optional instruction compile with
-mbranch-protection=standard (also enables PAuth) or +bti.

When targeting a guarded page from a non-guarded page, weaker
compatibility restrictions apply to maintain compatibility between
legacy and new code. For detailed rules please refer to the Arm ARM.

Compiler support
----------------

Compiler support requires understanding '-mbranch-protection=<mode>'
and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT
and __ARM_FEATURE_PAC_DEFAULT). The current state is the following:

-------------------------------------------------------
| Compiler | -mbranch-protection | Feature macros     |
+----------+---------------------+--------------------+
| clang    | 9.0.0               | 11.0.0             |
+----------+---------------------+--------------------+
| gcc      | 9                   | expected in 10.1+  |
-------------------------------------------------------

Available Platforms
------------------

Arm Fast Model and QEMU support both extensions.

https://developer.arm.com/tools-and-software/simulation-models/fast-models
https://www.qemu.org/

Implementation Notes
--------------------

This change adds BTI landing pads even to assembly functions which are
likely to be directly called only. In these cases, landing pads might
be superfluous depending on what code the linker generates.
Code size and performance impact for these cases would be negligible.

Interaction with C code
-----------------------

Pointer Authentication is a per-frame protection while Branch Target
Identification can be turned on and off only for all code pages of a
whole shared object or static binary. Because of these properties if
C/C++ code is compiled without any of the above features but assembly
files support any of them unconditionally there is no incompatibility
between the two.

Useful Links
------------

To fully understand the details of both PAuth and BTI it is advised to
read the related chapters of the Arm Architecture Reference Manual
(Arm ARM):
https://developer.arm.com/documentation/ddi0487/latest/

Additional materials:

"Providing protection for complex software"
https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software

Arm Compiler Reference Guide Version 6.14: -mbranch-protection
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en

Arm C Language Extensions (ACLE)
https://developer.arm.com/docs/101028/latest

Addional Notes
--------------

This patch is a copy of the work done by Tamas Petz in boringssl. It
contains the changes from the following commits:

aarch64: support BTI and pointer authentication in assembly
    Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/42084
aarch64: Improve conditional compilation
    Change-Id: I14902a64e5f403c2b6a117bc9f5fb1a4f4611ebf
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/43524
aarch64: Fix name of gnu property note section
    Change-Id: I6c432d1c852129e9c273f6469a8b60e3983671ec
    URL: https://boringssl-review.googlesource.com/c/boringssl/+/44024

Change-Id: I2d95ebc5e4aeb5610d3b226f9754ee80cf74a9af

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16674)
2021-10-01 09:35:38 +02:00
Fangming.Fang
5ea64b456b Read MIDR_EL1 system register on aarch64
MIDR_EL1 system register exposes microarchitecture information so that
people can make micro-arch related optimization such as exposing as
much instruction level parallelism as possible.

MIDR_EL1 register can be read only if HWCAP_CPUID feature is supported.

Change-Id: Iabb8a36c5d31b184dba6399f378598058d394d4e

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/11744)
2020-12-09 16:17:17 +01:00
Matt Caswell
33388b44b6 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11616)
2020-04-23 13:55:52 +01:00
David Benjamin
a21314dbbc Also check for errors in x86_64-xlate.pl.
In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.

That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)
2020-02-17 12:17:53 +10:00
David Benjamin
32be631ca1 Do not silently truncate files on perlasm errors
If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.

Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
2020-01-22 18:11:30 +01:00
Richard Levitte
1aa89a7a3a Unify all assembler file generators
They now generally conform to the following argument sequence:

    script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
              $(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file.  This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

    open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:

    open FOO, "something";

This, however, will fail if "something" can't be opened:

    open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
2019-09-16 16:29:57 +02:00
Richard Levitte
0e9725bcb9 Following the license change, modify the boilerplates in crypto/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7827)
2018-12-06 15:32:17 +01:00
Andy Polyakov
9a708bf982 {arm64|x86_64}cpuid.pl: add special 16-byte case to OPENSSL_memcmp.
OPENSSL_memcmp is a must in GCM decrypt and general-purpose loop takes
quite a portion of execution time for short inputs, more than GHASH for
few-byte inputs according to profiler. Special 16-byte case takes it off
top five list in profiler output.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6312)
2018-06-03 21:15:18 +02:00
Matt Caswell
6738bf1417 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-13 13:59:25 +00:00
Andy Polyakov
77f3612e2b crypto/armcap.c: detect hardware-assisted SHA512 support.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-12 14:04:53 +01:00
Andy Polyakov
e33826f01b Add assembly CRYPTO_memcmp.
GH: #102

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-19 22:33:00 +02:00
Andy Polyakov
6133b4edbd ARMv8 assembly pack: add OPENSSL_cleanse.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-16 22:21:17 +02:00
Rich Salz
e0a651945c Copyright consolidation: perl files
Add copyright to most .pl files
This does NOT cover any .pl file that has other copyright in it.
Most of those are Andy's but some are public domain.
Fix typo's in some existing files.

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-04-20 09:45:40 -04:00
Andy Polyakov
9b05cbc33e Add assembly support to ios64-cross.
Fix typos in ios64-cross config line.

Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-23 15:38:41 +01:00