Commit Graph

36 Commits

Author SHA1 Message Date
Matt Caswell
fecb3aae22 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes
2022-05-03 13:34:51 +01:00
Daniel Hu
b1b2146ded Acceleration of chacha20 on aarch64 by SVE
This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with
256-bit SVE, it has the potential to improve performance up to 20%

The solution takes a hybrid approach. SVE will handle multi-blocks that fit
the SVE vector length, with Neon/Scalar to process any tail data

Test result:
With SVE
type            1024 bytes   8192 bytes  16384 bytes
ChaCha20        1596208.13k  1650010.79k  1653151.06k

Without SVE (by Neon/Scalar)
type            1024 bytes   8192 bytes  16384 bytes
chacha20        1355487.91k  1372678.83k  1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
2022-05-03 14:37:46 +10:00
XiaokangQian
954f45ba4c Optimize AES-GCM for uarchs with unroll and new instructions
Increase the block numbers to 8 for every iteration.  Increase the hash
table capacity.  Make use of EOR3 instruction to improve the performance.

This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1.  We also
see 20-30% performance improvements on other architectures such as the M1.

Assembly code reviewd by Tom Cosgrove (ARM).

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
2022-01-25 14:30:00 +11:00
Daniel Hu
15b7175f55 SM4 optimization for ARM by HW instruction
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  105787.80k   107837.87k   108380.84k   108462.08k   108549.46k   108554.92k
  SM4-ECB  111924.58k   118173.76k   119776.00k   120093.70k   120264.02k   120274.94k
  SM4-CBC  106428.09k   109190.98k   109674.33k   109774.51k   109827.41k   109827.41k

After (7.4x - 36.6x faster):
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  781979.02k  2432994.28k  3437753.86k  3834177.88k  3963715.58k  3974556.33k
  SM4-ECB  937590.69k  2941689.02k  3945751.81k  4328655.87k  4459181.40k  4468692.31k
  SM4-CBC  890639.88k  1027746.58k  1050621.78k  1056696.66k  1058613.93k  1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
2022-01-18 11:52:14 +01:00
fangming.fang
71396cd048 SM3 acceleration with SM3 hardware instruction on aarch64
SM3 hardware instruction is optional feature of crypto extension for
aarch64. This implementation accelerates SM3 via SM3 instructions. For
the platform not supporting SM3 instruction, the original C
implementation still works. Thanks to AliBaba for testing and reporting
the following perf numbers for Yitian710:

Benchmark on T-Head Yitian-710 2.75GHz:

Before:
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   49297.82k   121062.63k   223106.05k   283371.52k   307574.10k   309400.92k

After (33% - 74% faster):
type  16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3   65640.01k   179121.79k   359854.59k   481448.96k   534055.59k   538274.47k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)
2022-01-14 11:40:05 +01:00
fangming.fang
abc4345a19 fix building failure when using -Wconditional-uninitialized
Use clang -Wconditional-uninitialized to build, the error "initialize
the variable 'buffer_size' to silence this warning" will be reported.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17375)
2022-01-05 09:59:31 +01:00
Orr Toledano
efa1f22483 Add Arm Assembly (aarch64) support for RNG
Include aarch64 asm instructions for random number generation using the
RNDR and RNDRRS instructions. Provide detection functions for RNDR and
RNDRRS getauxval.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15361)
2021-12-16 12:38:09 +01:00
Allan Jude
c1dabe26e3 Fix detection of ARMv7 and ARM64 CPU features on FreeBSD
OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25
Switch to using AT_HWCAP, and setting it to 16 if it is not defined.

OpenSSL calls elf_auxv_info() with AT_CANARY which returns ENOENT
resulting in all ARM acceleration features being disabled.

CLA: trivial

Reviewed-by: Ben Kaduk <kaduk@mit.edu>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17082)
2021-11-24 11:00:24 +01:00
yunh
d5567d5f6e enable getauxval on android 10
Fixes #9498

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15870)

(cherry picked from commit b2dea4d5f2)
2021-06-25 18:31:05 +10:00
Lars Immisch
a5d250e57e Use getauxval on Android with API level > 18
We received analytics that devices of the device family Oppo A37x
are crashing with SIGILL when trying to load libcrypto.so.
These crashes were fixed by using the system-supplied getauxval function.

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11257)
2021-06-15 12:53:26 +02:00
Tom Cosgrove
bb97dc508f Initialise OPENSSL_armcap_P to 0 before setting it based on capabilities, not after
Signed-off-by: Tom Cosgrove <tom.cosgrove@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15486)
2021-05-28 10:28:29 +10:00
David CARLIER
f1a45f68bc armcap: fix Mac M1 SHA512 support.
The SIGILL catch/trap works however disabled purposely for Darwin,
 thus relying on native api instead.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14935)
2021-05-11 10:00:27 +02:00
fangming.fang
1064616012 Optimize RSA on armv8
Add Neon path for RSA on armv8, this optimisation targets to A72
and N1 that are ones of important cores of infrastructure. Other
platforms are not impacted.

A72
                        old		new             improved
rsa  512 sign		9828.6		9738.7		-1%
rsa  512 verify		121497.2	122367.7	1%
rsa 1024 sign		1818		1816.9		0%
rsa 1024 verify		37175.6		37161.3		0%
rsa 2048 sign		267.3		267.4		0%
rsa 2048 verify		10127.6		10119.6		0%
rsa 3072 sign		86.8		87		0%
rsa 3072 verify		4604.2		4956.2		8%
rsa 4096 sign		38.3		38.5		1%
rsa 4096 verify		2619.8		2972.1		13%
rsa 7680 sign		5		7		40%
rsa 7680 verify		756	     	929.4		23%
rsa 15360 sign		0.8	     	1		25%
rsa 15360 verify	190.4	 	246		29%

N1
                        old		new             improved
rsa  512 sign		12599.2		12596.7		0%
rsa  512 verify		148636.1	148656.2	0%
rsa 1024 sign		2150.6		2148.9		0%
rsa 1024 verify		42353.5		42265.2		0%
rsa 2048 sign		305.5		305.3		0%
rsa 2048 verify		11209.7		11205.2		0%
rsa 3072 sign		97.8		98.2		0%
rsa 3072 verify		5061.3		5990.7		18%
rsa 4096 sign		42.8		43		0%
rsa 4096 verify		2867.6		3509.8		22%
rsa 7680 sign		5.5		8.4		53%
rsa 7680 verify		823.5		1058.3		29%
rsa 15360 sign		0.9		1.1		22%
rsa 15360 verify	207		273.9		32%

CustomizedGitHooks: yes
Change-Id: I01c732cc429d793c4eb5ffd27ccd30ff9cebf8af
Jira: SECLIB-540

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14761)
2021-05-09 23:15:07 +10:00
Richard Levitte
4333b89f50 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/13999)
2021-01-28 13:54:57 +01:00
David Carlier
5eb24fbd1c OPENSSL_cpuid_setup FreeBSD arm update.
when possible using the getauxval equivalent which has similar ids as Linux, instead of bad instructions catch approach.

Reviewed-by: Ben Kaduk <kaduk@mit.edu>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/13650)
2021-01-14 08:34:38 +00:00
Fangming.Fang
5ea64b456b Read MIDR_EL1 system register on aarch64
MIDR_EL1 system register exposes microarchitecture information so that
people can make micro-arch related optimization such as exposing as
much instruction level parallelism as possible.

MIDR_EL1 register can be read only if HWCAP_CPUID feature is supported.

Change-Id: Iabb8a36c5d31b184dba6399f378598058d394d4e

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/11744)
2020-12-09 16:17:17 +01:00
Richard Levitte
5f40dd158c crypto/armcap.c, crypto/ppccap.c: stricter use of getauxval()
Having a weak getauxval() and only depending on GNU C without looking
at the library we build against meant that it got picked up where not
really expected.

So we change this to check for the glibc version, and since we know it
exists from that version, there's no real need to make it weak.

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/8028)
2019-01-16 18:00:48 +01:00
Richard Levitte
0e9725bcb9 Following the license change, modify the boilerplates in crypto/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7827)
2018-12-06 15:32:17 +01:00
Andy Polyakov
72983c0eab crypto/armcap.c: mask SHA512 hardware detection on iOS.
When running iOS application from command line it's impossible to
get past the failing capability detection. This is because it's
executed under debugger and iOS debugger is impossible to deal with.
[If Apple implements SHA512 in silicon, it would have to be detected
with sysctlbyname.]

Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-03-06 23:18:24 +01:00
Matt Caswell
6738bf1417 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-13 13:59:25 +00:00
Andy Polyakov
77f3612e2b crypto/armcap.c: detect hardware-assisted SHA512 support.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-12 14:04:53 +01:00
Kurt Roeckx
d807db26a4 Create a prototype for OPENSSL_rdtsc
Switch to make it return an uint32_t instead of the various different
types it returns now.

Fixes: #3125

Reviewed-by: Andy Polyakov <appro@openssl.org>
GH: #4757
2017-11-25 14:30:11 +01:00
Xiaoyin Liu
c9a41d7dd6 Fix typo in files in crypto folder
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Andy Polyakov <appro@openssl.org>
GH: #4093
2017-08-05 20:42:06 +02:00
komainu8
6ea3bca427 Modify type of variable in OPENSSL_cpuid_setup function
CLA: trivial

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3651)
2017-06-16 16:58:51 -04:00
Andy Polyakov
8653e78f43 crypto/armcap.c: short-circuit processor capability probe in iOS builds.
Capability probing by catching SIGILL appears to be problematic
on iOS. But since Apple universe is "monocultural", it's actually
possible to simply set pre-defined processor capability mask.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/2617)
2017-02-15 23:16:23 +01:00
Rich Salz
d2e9e32018 Copyright consolidation 07/10
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-17 14:51:26 -04:00
Andy Polyakov
313e6ec11f Add assembly support for 32-bit iOS.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
2015-04-20 15:06:22 +02:00
Andy Polyakov
9b05cbc33e Add assembly support to ios64-cross.
Fix typos in ios64-cross config line.

Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-23 15:38:41 +01:00
Matt Caswell
0f113f3ee4 Run util/openssl-format-source -v -c .
Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-22 09:20:09 +00:00
Andy Polyakov
c1669e1c20 Remove inconsistency in ARM support.
This facilitates "universal" builds, ones that target multiple
architectures, e.g. ARMv5 through ARMv7. See commentary in
Configure for details.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
2015-01-04 23:45:08 +01:00
Andy Polyakov
e8d93e342b Add linux-aarch64 taget.
armcap.c is shared between 32- and 64-bit builds and features link-time
detection of getauxval.

Submitted by: Ard Biesheuvel.
2014-06-01 17:21:06 +02:00
Andy Polyakov
4afa9f033d crypto/armcap.c: detect ARMv8 capabilities [in 32-bit build]. 2014-05-04 10:55:49 +02:00
Andy Polyakov
8e52a9063a crypto/armcap.c: fix typo in rdtsc subroutine.
PR: 3125
Submitted by: Kyle McMartin
2013-09-15 22:07:49 +02:00
Dr. Stephen Henson
482cdf2489 typo 2011-10-24 13:23:51 +00:00
Andy Polyakov
033a25cef5 armcap.c: auto-setup processor capability vector. 2011-10-20 20:52:26 +00:00
Andy Polyakov
87873f4328 ARM assembler pack: add platform run-time detection. 2011-07-17 17:40:29 +00:00