Commit Graph

337 Commits

Author SHA1 Message Date
Bernd Edlinger
b0d3442efc Add some missing cfi frame info in aesni-sha and sha-x86_64.pl
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10655)
2019-12-20 23:14:37 +01:00
Bernd Edlinger
95bbe6eff7 Add some missing cfi frame info in keccak1600-x86_64.pl
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10654)
2019-12-20 23:12:00 +01:00
Bernd Edlinger
9ce91035bc Fix sha512_block_data_order_avx2 backtrace info
We store a secondary frame pointer info for the debugger
in the red zone.

Fixes #8853

[extended tests]

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9624)
2019-11-20 14:06:23 +01:00
Richard Levitte
1aa89a7a3a Unify all assembler file generators
They now generally conform to the following argument sequence:

    script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
              $(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file.  This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

    open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:

    open FOO, "something";

This, however, will fail if "something" can't be opened:

    open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
2019-09-16 16:29:57 +02:00
Omid Najafi
2a17758940 Fix syntax error for the armv4 assembler
The error was from the alignment syntax of the code.
More details:
https://stackoverflow.com/questions/57316823/arm-assembly-syntax-in-vst-vld-commands?noredirect=1#comment101133590_57316823

CLA: trivial

Fixes: #9518

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/9518)
2019-08-15 14:22:16 +02:00
Lei Maohui
7b0fceed21 Fix build error for aarch64 big endian.
Modified rev to rev64, because rev only takes integer registers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90827
Otherwise, the following error will occur.

Error: operand 1 must be an integer register -- `rev v31.16b,v31.16b'

CLA: trivial

Signed-off-by: Lei Maohui <leimaohui@cn.fujitsu.com>

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9151)
2019-07-08 10:53:02 +02:00
Antoine Cœur
c2969ff6e7 Fix Typos
CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
(Merged from https://github.com/openssl/openssl/pull/9288)
2019-07-02 14:22:29 +02:00
Andy Polyakov
6465321e40 ARM64 assembly pack: add ThunderX2 results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8776)
2019-04-17 21:08:13 +02:00
Andy Polyakov
1b1ff9b94d sha/asm/keccak1600-ppc64.pl: up 10% performance improvement.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8444)
2019-03-11 12:33:39 +01:00
Andy Polyakov
db42bb440e ARM64 assembly pack: make it Windows-friendly.
"Windows friendliness" means a) unified PIC-ification, unified across
all platforms; b) unified commantary delimiter; c) explicit ldur/stur,
as Visual Studio assembler can't automatically encode ldr/str as
ldur/stur when needed.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8256)
2019-02-16 17:01:15 +01:00
Andy Polyakov
3405db97e5 ARM assembly pack: make it Windows-friendly.
"Windows friendliness" means a) flipping .thumb and .text directives,
b) always generate Thumb-2 code when asked(*); c) Windows-specific
references to external OPENSSL_armcap_P.

(*) so far *some* modules were compiled as .code 32 even if Thumb-2
was targeted. It works at hardware level because processor can alternate
between the modes with no overhead. But clang --target=arm-windows's
builtin assembler just refuses to compile .code 32...

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8252)
2019-02-16 16:59:23 +01:00
Andy Polyakov
9a18aae5f2 AArch64 assembly pack: authenticate return addresses.
ARMv8.3 adds pointer authentication extension, which in this case allows
to ensure that, when offloaded to stack, return address is same at return
as at entry to the subroutine. The new instructions are nops on processors
that don't implement the extension, so that the vetification is backward
compatible.

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8205)
2019-02-12 19:00:42 +01:00
Richard Levitte
a598ed0dc4 Following the license change, modify the boilerplates in crypto/sha/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7816)
2018-12-06 15:23:03 +01:00
Richard Levitte
389c09fa09 License: change any non-boilerplate comment referring to "OpenSSL license"
Make it just say "the License", which refers back to the standard
boilerplate.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7764)
2018-12-06 13:26:28 +01:00
Andy Polyakov
6b956fe77b sha/asm/sha512p8-ppc.pl: optimize epilogue.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7643)
2018-11-16 09:23:50 +01:00
Andy Polyakov
79d7fb990c sha/asm/sha512p8-ppc.pl: fix typo in prologue.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7643)
2018-11-16 09:23:50 +01:00
Andy Polyakov
9986bfefa4 sha/asm/keccak1600-armv8.pl: halve the size of hw-assisted subroutine.
Yes, it's second halving, i.e. it's now 1/4 of original size, or more
specifically inner loop. The challenge with Keccak is that you need
more temporary registers than there are available. By reversing the
order in which columns are assigned in Chi, it's possible to use
three of A[][] registers as temporary prior their assigment.

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7294)
2018-10-19 10:43:02 +02:00
Andy Polyakov
fc97c882f4 sha/asm/keccak1600-s390x.pl: resolve -march=z900 portability issue.
Negative displacement in memory references was not originally specified,
so that for maximum coverage one should abstain from it, just like with
any other extension. [Unless it's guarded by run-time switch, but there
is no switch in keccak1600-s390x.]

Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7239)
2018-10-12 20:51:27 +02:00
Matt Caswell
1212818eb0 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7176)
2018-09-11 13:45:17 +01:00
Pauli
8794be2ed8 Remove development artifacts.
The issue was discovered on the x86/64 when attempting to include
libcrypto inside another shared library.  A relocation of type
R_X86_64_PC32 was generated which causes a linker error.

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6595)
2018-07-02 07:21:26 +10:00
Andy Polyakov
1753d12374 PA-RISC assembly pack: make it work with GNU assembler for HP-UX.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6583)
2018-06-25 16:45:48 +02:00
Andy Polyakov
2e51557bc9 sha/asm/sha{256|512}-armv4.pl: harmonize thumb2 support with the rest.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-06-22 14:28:08 +02:00
Matt Caswell
fd38836ba8 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6538)
2018-06-20 15:29:23 +01:00
Andy Polyakov
b55e21b357 sha/asm/sha{1|256}-586.pl: harmonize clang version detection.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6499)
2018-06-18 19:59:03 +02:00
Andy Polyakov
f0c77d66b4 sha/asm/sha512p8-ppc.pl: fix build on Mac OS X.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6419)
2018-06-06 22:13:24 +02:00
Andy Polyakov
c4d9ef4cc5 sha/asm/sha512p8-ppc.pl: improve POWER9 performance by ~10%.
Biggest part, ~7%, of improvement resulted from omitting constants'
table index increment in each round. And minor part from rescheduling
instructions. Apparently POWER9 (and POWER8) manage to dispatch
instructions more efficiently if they are laid down as if they have
no latency...

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
2018-06-03 21:20:40 +02:00
Andy Polyakov
41013cd63c PPC assembly pack: correct POWER9 results.
As it turns out originally published results were skewed by "turbo"
mode. VM apparently remains oblivious to dynamic frequency scaling,
and reports that processor operates at "base" frequency at all times.
While actual frequency gets increased under load.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
2018-06-03 21:20:06 +02:00
Matt Caswell
83cf7abf8e Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6371)
2018-05-29 13:16:04 +01:00
Andy Polyakov
13f6857db1 PPC assembly pack: add POWER9 results.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-05-10 11:44:21 +02:00
Matt Caswell
6ec5fce25e Update copyright year
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6145)
2018-05-01 13:34:30 +01:00
Andy Polyakov
e9afe7a143 sha/asm/keccak1600-armv4.pl: adapt for multi-platform.
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6042)
2018-04-23 17:27:53 +02:00
Andy Polyakov
0fe72aaaa9 sha/asm/keccak1600-x86_64.pl: make it work on Windows.
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6042)
2018-04-23 17:27:31 +02:00
Andy Polyakov
dd2d7b19f8 sha/asm/keccak1600-armv8.pl: halve the size of hw-assisted subroutine.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-04-23 17:19:57 +02:00
Matt Caswell
b0edda11cb Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5689)
2018-03-20 13:08:46 +00:00
Andy Polyakov
9d3cab4bdb MIPS assembly pack: default heuristic detection to little-endian.
Current endianness detection is somewhat opportunistic and can fail
in cross-compile scenario. Since we are more likely to cross-compile
for little-endian now, adjust the default accordingly.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5613)
2018-03-19 14:31:30 +01:00
Richard Levitte
2bd3b626dd Make a few more asm modules conform: last argument is output file
Fixes #5310

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5315)
2018-03-08 19:31:41 +01:00
Andy Polyakov
b761ff4e77 sha/asm/keccak1600-armv8.pl: add hardware-assisted ARMv8.2 subroutines.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5358)
2018-02-19 14:15:31 +01:00
Matt Caswell
6738bf1417 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-13 13:59:25 +00:00
Andy Polyakov
af0fcf7b46 sha/asm/sha512-armv8.pl: add hardware-assisted SHA512 subroutine.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-12 14:05:05 +01:00
Andy Polyakov
24d06e8ca0 Add sha/asm/keccak1600-avx512vl.pl.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4948)
2017-12-22 12:38:40 +01:00
Andy Polyakov
7533162322 ARMv8 assembly pack: add Qualcomm Kryo results.
[skip ci]

Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-11-13 11:13:00 +01:00
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Patrick Steuer
bc4e831ccd s390x assembly pack: extend s390x capability vector.
Extend the s390x capability vector to store the longer facility list
available from z13 onwards. The bits indicating the vector extensions
are set to zero, if the kernel does not enable the vector facility.

Also add capability bits returned by the crypto instructions' query
functions.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4542)
2017-10-30 14:31:32 +01:00
Patrick Steuer
af1d638730 s390x assembly pack: remove capability double-checking.
An instruction's QUERY function is executed at initialization, iff the required
MSA level is installed. Therefore, it is sufficient to check the bits returned
by the QUERY functions. The MSA level does not have to be checked at every
function call.
crypto/aes/asm/aes-s390x.pl: The AES key schedule must be computed if the
required KM or KMC function codes are not available. Formally, the availability
of a KMC function code does not imply the availability of the corresponding KM
function code.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4501)
2017-10-17 21:55:33 +02:00
Rich Salz
e3713c365c Remove email addresses from source code.
Names were not removed.
Some comments were updated.
Replace Andy's address with openssl.org

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/4516)
2017-10-13 10:06:59 -04:00
Andy Polyakov
236dd46339 sha/asm/keccak1600-armv8.pl: fix return value buglet and ...
... script data load.

On related note an attempt was made to merge rotations with logical
operations. I mean as we know, ARM ISA has merged rotate-n-logical
instructions which can be used here. And they were used to improve
keccak1600-armv4 performance. But not here. Even though this approach
resulted in improvement on Cortex-A53 proportional to reduction of
amount of instructions, ~8%, it didn't exactly worked out on
non-Cortex cores. Presumably because they break merged instructions
to separate μ-ops, which results in higher *operations* count. X-Gene
and Denver went ~20% slower and Apple A7 - 40%. The optimization was
therefore dismissed.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-09-09 19:09:36 +02:00
Andy Polyakov
e0584e96c1 sha/asm/keccak1600-armv4.pl: optimize for Thumb-2.
Reduce per-round instruction count in Thumb-2 case by 16%. This is
achieved by folding ldr/str pairs to their double-word counterparts.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-08-16 20:25:20 +02:00
Andy Polyakov
3c1a60e56f sha/asm/keccak1600-avx512.pl: fix buglet in SHA3_squeeze tail.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-08-12 12:23:31 +02:00
Andy Polyakov
d9ca12cbf6 sha/asm/keccak1600-armv4.pl: improve non-NEON performance by ~10%.
This is achieved mostly by ~10% reduction of amount of instructions
per round thanks to a) switch to KECCAK_2X variant; b) merge of
almost 1/2 rotations with logical instructions. Performance is
improved on all observed processors except on Cortex-A15. This is
because it's capable of exploiting more parallelism and can execute
original code for same amount of time.

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
(Merged from https://github.com/openssl/openssl/pull/4057)
2017-08-02 23:22:28 +02:00
Xiaoyin Liu
bac5b39c96 Fix typo in sha1-thumb.pl
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4056)
2017-07-30 21:26:38 -04:00