In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.
That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)
FIXES#10692#10638
a bug for aarch64 bigendian with instructions 'st1' and 'ld1' on AES-GCM mode.
CLA: trivial
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10751)
If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.
Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
Aes-ecb mode can be optimized by inverleaving cipher operation on
several blocks and loop unrolling. Interleaving needs one ideal
unrolling factor, here we adopt the same factor with aes-cbc,
which is described as below:
If blocks number > 5, select 5 blocks as one iteration,every
loop, decrease the blocks number by 5.
If 3 < left blocks < 5 select 3 blocks as one iteration, every
loop, decrease the block number by 3.
If left blocks < 3, treat them as tail blocks.
Detailed implementation will have a little adjustment for squeezing
code space.
With this way, for small size such as 16 bytes, the performance is
similar as before, but for big size such as 16k bytes, the performance
improves a lot, even reaches to 100%, for some arches such as A57,
the improvement even exceeds 100%. The following table will list the
encryption performance data on aarch64, take a72 and a57 as examples.
Performance value takes the unit of cycles per byte, takes the format
as comparision of values. List them as below:
A72:
Before optimization After optimization Improve
evp-aes-128-ecb@16 17.26538237 16.82663866 2.61%
evp-aes-128-ecb@64 5.50528499 5.222637557 5.41%
evp-aes-128-ecb@256 2.632700213 1.908442892 37.95%
evp-aes-128-ecb@1024 1.876102047 1.078018868 74.03%
evp-aes-128-ecb@8192 1.6550392 0.853982929 93.80%
evp-aes-128-ecb@16384 1.636871283 0.847623957 93.11%
evp-aes-192-ecb@16 17.73104961 17.09692468 3.71%
evp-aes-192-ecb@64 5.78984398 5.418545192 6.85%
evp-aes-192-ecb@256 2.872005308 2.081815274 37.96%
evp-aes-192-ecb@1024 2.083226672 1.25095642 66.53%
evp-aes-192-ecb@8192 1.831992057 0.995916251 83.95%
evp-aes-192-ecb@16384 1.821590009 0.993820525 83.29%
evp-aes-256-ecb@16 18.0606306 17.96963317 0.51%
evp-aes-256-ecb@64 6.19651997 5.762465812 7.53%
evp-aes-256-ecb@256 3.176991394 2.24642538 41.42%
evp-aes-256-ecb@1024 2.385991919 1.396018192 70.91%
evp-aes-256-ecb@8192 2.147862636 1.142222597 88.04%
evp-aes-256-ecb@16384 2.131361787 1.135944617 87.63%
A57:
Before optimization After optimization Improve
evp-aes-128-ecb@16 18.61045121 18.36456218 1.34%
evp-aes-128-ecb@64 6.438628994 5.467959461 17.75%
evp-aes-128-ecb@256 2.957452881 1.97238604 49.94%
evp-aes-128-ecb@1024 2.117096219 1.099665054 92.52%
evp-aes-128-ecb@8192 1.868385973 0.837440804 123.11%
evp-aes-128-ecb@16384 1.853078526 0.822420027 125.32%
evp-aes-192-ecb@16 19.07021756 18.50018552 3.08%
evp-aes-192-ecb@64 6.672351486 5.696088921 17.14%
evp-aes-192-ecb@256 3.260427769 2.131449916 52.97%
evp-aes-192-ecb@1024 2.410522832 1.250529718 92.76%
evp-aes-192-ecb@8192 2.17921605 0.973225504 123.92%
evp-aes-192-ecb@16384 2.162250997 0.95919871 125.42%
evp-aes-256-ecb@16 19.3008384 19.12743654 0.91%
evp-aes-256-ecb@64 6.992950658 5.92149541 18.09%
evp-aes-256-ecb@256 3.576361743 2.287619504 56.34%
evp-aes-256-ecb@1024 2.726671027 1.381267599 97.40%
evp-aes-256-ecb@8192 2.493583657 1.110959913 124.45%
evp-aes-256-ecb@16384 2.473916816 1.099967073 124.91%
Change-Id: Iccd23d972e0d52d22dc093f4c208f69c9d5a0ca7
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10518)
They now generally conform to the following argument sequence:
script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
$(PROCESSOR) <output file>
However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file. This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).
While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.
There's a perl lesson in this, regarding operator priority...
This will always succeed, even when it fails:
open FOO, "something" || die "ERR: $!";
The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:
open FOO, "something";
This, however, will fail if "something" can't be opened:
open FOO, "something" or die "ERR: $!";
The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
"Windows friendliness" means a) flipping .thumb and .text directives,
b) always generate Thumb-2 code when asked(*); c) Windows-specific
references to external OPENSSL_armcap_P.
(*) so far *some* modules were compiled as .code 32 even if Thumb-2
was targeted. It works at hardware level because processor can alternate
between the modes with no overhead. But clang --target=arm-windows's
builtin assembler just refuses to compile .code 32...
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8252)
ARMv8.3 adds pointer authentication extension, which in this case allows
to ensure that, when offloaded to stack, return address is same at return
as at entry to the subroutine. The new instructions are nops on processors
that don't implement the extension, so that the vetification is backward
compatible.
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8205)
Around 138 distinct errors found and fixed; thanks!
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
The prevailing style seems to not have trailing whitespace, but a few
lines do. This is mostly in the perlasm files, but a few C files got
them after the reformat. This is the result of:
find . -name '*.pl' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.c' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
find . -name '*.h' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
Then bn_prime.h was excluded since this is a generated file.
Note mkerr.pl has some changes in a heredoc for some help output, but
other lines there lack trailing whitespace too.
Reviewed-by: Kurt Roeckx <kurt@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
ARM has optimized Cortex-A5x pipeline to favour pairs of complementary
AES instructions. While modified code improves performance of post-r0p0
Cortex-A53 performance by >40% (for CBC decrypt and CTR), it hurts
original r0p0. We favour later revisions, because one can't prevent
future from coming. Improvement on post-r0p0 Cortex-A57 exceeds 50%,
while new code is not slower on r0p0, or Apple A7 for that matter.
[Update even SHA results for latest Cortex-A53.]
Reviewed-by: Richard Levitte <levitte@openssl.org>
This facilitates "universal" builds, ones that target multiple
architectures, e.g. ARMv5 through ARMv7. See commentary in
Configure for details.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Caswell <matt@openssl.org>