Commit Graph

84 Commits

Author SHA1 Message Date
Ard Biesheuvel
fcf6e9d056 crypto/poly1305/asm: fix armv8 pointer authentication
PAC pointer authentication signs the return address against the value
of the stack pointer, to prevent stack overrun exploits from corrupting
the control flow. However, this requires that the AUTIASP is issued with
SP holding the same value as it held when the PAC value was generated.
The Poly1305 armv8 code got this wrong, resulting in crashes on PAC
capable hardware.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/13256)
2020-10-29 17:17:21 +01:00
Romain Geissler
a135948dda Fix aarch64 static linking into shared libraries (see issue #10842 and pull request #11464)
This tries to fix the following link errors on aarch64 when using OpenSSL
3.0.0 alpha 6, compiling it with "no-shared" and -fPIC in CFLAGS, then
trying to use the resulting OpenSSL static libraries in the build of
elfutils, which embed libcrypto.a into libdebuginfo.so, which hides all
symbols (except the libdebuginfod ones) by default:

/opt/1A/toolchain/aarch64-v4.0.86/lib/gcc/aarch64-1a-linux-gnu/8.4.1/../../../../aarch64-1a-linux-gnu/bin/ld: /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha1-armv8.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `OPENSSL_armcap_P' which may bind externally can not be used when making a shared object; recompile with -fPIC
/workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha1-armv8.o): in function `sha1_block_data_order':
(.text+0x0): dangerous relocation: unsupported relocation
/opt/1A/toolchain/aarch64-v4.0.86/lib/gcc/aarch64-1a-linux-gnu/8.4.1/../../../../aarch64-1a-linux-gnu/bin/ld: /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-chacha-armv8.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `OPENSSL_armcap_P' which may bind externally can not be used when making a shared object; recompile with -fPIC
/workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-chacha-armv8.o): in function `ChaCha20_ctr32':
(.text+0x6c): dangerous relocation: unsupported relocation
/opt/1A/toolchain/aarch64-v4.0.86/lib/gcc/aarch64-1a-linux-gnu/8.4.1/../../../../aarch64-1a-linux-gnu/bin/ld: /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha256-armv8.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `OPENSSL_armcap_P' which may bind externally can not be used when making a shared object; recompile with -fPIC /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha256-armv8.o): in function `sha256_block_data_order':
(.text+0x0): dangerous relocation: unsupported relocation
/opt/1A/toolchain/aarch64-v4.0.86/lib/gcc/aarch64-1a-linux-gnu/8.4.1/../../../../aarch64-1a-linux-gnu/bin/ld: /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha512-armv8.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `OPENSSL_armcap_P' which may bind externally can not be used when making a shared object; recompile with -fPIC /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-sha512-armv8.o): in function `sha512_block_data_order':
(.text+0x0): dangerous relocation: unsupported relocation
/opt/1A/toolchain/aarch64-v4.0.86/lib/gcc/aarch64-1a-linux-gnu/8.4.1/../../../../aarch64-1a-linux-gnu/bin/ld: /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-poly1305-armv8.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `OPENSSL_armcap_P' which may bind externally can not be used when making a shared object; recompile with -fPIC
/workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-poly1305-armv8.o): in function `poly1305_init':
(.text+0x14): dangerous relocation: unsupported relocation
/workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-poly1305-armv8.o): in function `poly1305_emit_neon':
(.text+0x8e4): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `poly1305_emit' defined in .text section in /workdir/build/build-pack/build-pack-temporary-static-dependencies/install/lib/libcrypto.a(libcrypto-lib-poly1305-armv8.o)

In poly1305-armv8.pl, hide symbols the same way they are hidden in poly1305-x86_64.pl.

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/13056)
2020-10-22 12:16:49 +10:00
Jung-uk Kim
cd84d8832d Ignore vendor name in Clang version number.
For example, FreeBSD prepends "FreeBSD" to version string, e.g.,

FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
Target: x86_64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin

This prevented us from properly detecting AVX support, etc.

CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from https://github.com/openssl/openssl/pull/12725)
2020-08-27 20:27:26 -07:00
Matt Caswell
33388b44b6 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11616)
2020-04-23 13:55:52 +01:00
David Benjamin
a21314dbbc Also check for errors in x86_64-xlate.pl.
In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.

That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)
2020-02-17 12:17:53 +10:00
H.J. Lu
98ad3fe82b x86_64: Add endbranch at function entries for Intel CET
To support Intel CET, all indirect branch targets must start with
endbranch.  Here is a patch to add endbranch to function entries
in x86_64 assembly codes which are indirect branch targets as
discovered by running openssl testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error" ./Configure shared linux-x86_64 -fcf-protection
$ make
$ make test

and

$ CC="gcc -mx32 -Wl,-z,cet-report=error" ./Configure shared linux-x32 -fcf-protection
$ make
$ make test # <<< passed with https://github.com/openssl/openssl/pull/10988

Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10982)
2020-02-15 22:15:03 +01:00
David Benjamin
32be631ca1 Do not silently truncate files on perlasm errors
If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.

Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
2020-01-22 18:11:30 +01:00
Richard Levitte
9bb3e5fd87 For all assembler scripts where it matters, recognise clang > 9.x
Fixes #10853

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10855)
2020-01-17 08:55:45 +01:00
Bernd Edlinger
048fa13e5e Add some missing cfi frame info in poly1305-x86_64.pl
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10678)
2019-12-23 20:26:19 +01:00
Patrick Steuer
826112295a s390x assembly pack: perlasm module update
- add instructions: clfi, stck, stckf, kdsa
- clfi and clgfi belong to extended-immediate (not long-displacement)
- some cleanup

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10346)
2019-11-05 10:05:27 +01:00
Richard Levitte
1aa89a7a3a Unify all assembler file generators
They now generally conform to the following argument sequence:

    script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
              $(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file.  This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

    open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:

    open FOO, "something";

This, however, will fail if "something" can't be opened:

    open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
2019-09-16 16:29:57 +02:00
Antoine Cœur
c2969ff6e7 Fix Typos
CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matthias St. Pierre <Matthias.St.Pierre@ncp-e.com>
(Merged from https://github.com/openssl/openssl/pull/9288)
2019-07-02 14:22:29 +02:00
Patrick Steuer
5ee08f45bc s390x assembly pack: remove poly1305 dependency on non-base memnonics
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8181)
2019-04-25 23:07:36 +02:00
Andy Polyakov
6465321e40 ARM64 assembly pack: add ThunderX2 results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8776)
2019-04-17 21:08:13 +02:00
Patrick Steuer
2e6b615f79 s390x assembly pack: import poly from cryptogams repo
>=20% faster than present code.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8560)
2019-03-29 17:05:39 +01:00
Andy Polyakov
291bc802e4 IA64 assembly pack: add {chacha|poly1305}-ia64 modules.
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8540)
2019-03-29 07:33:15 +01:00
Andy Polyakov
db42bb440e ARM64 assembly pack: make it Windows-friendly.
"Windows friendliness" means a) unified PIC-ification, unified across
all platforms; b) unified commantary delimiter; c) explicit ldur/stur,
as Visual Studio assembler can't automatically encode ldr/str as
ldur/stur when needed.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8256)
2019-02-16 17:01:15 +01:00
Andy Polyakov
3405db97e5 ARM assembly pack: make it Windows-friendly.
"Windows friendliness" means a) flipping .thumb and .text directives,
b) always generate Thumb-2 code when asked(*); c) Windows-specific
references to external OPENSSL_armcap_P.

(*) so far *some* modules were compiled as .code 32 even if Thumb-2
was targeted. It works at hardware level because processor can alternate
between the modes with no overhead. But clang --target=arm-windows's
builtin assembler just refuses to compile .code 32...

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8252)
2019-02-16 16:59:23 +01:00
Andy Polyakov
9a18aae5f2 AArch64 assembly pack: authenticate return addresses.
ARMv8.3 adds pointer authentication extension, which in this case allows
to ensure that, when offloaded to stack, return address is same at return
as at entry to the subroutine. The new instructions are nops on processors
that don't implement the extension, so that the vetification is backward
compatible.

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8205)
2019-02-12 19:00:42 +01:00
Patrick Steuer
d6f4b0a8bf crypto/poly1305/asm/poly1305-s390x.pl: add vx code path.
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7991)
2019-02-05 15:43:40 +01:00
Andy Polyakov
a28e4890ee poly1305/asm/poly1305-ppc.pl: add vector base 2^26 implementation.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8120)
2019-02-01 09:23:25 +01:00
Richard Levitte
49d3b6416b Following the license change, modify the boilerplates in crypto/poly1305/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7810)
2018-12-06 15:13:16 +01:00
Matt Caswell
1212818eb0 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7176)
2018-09-11 13:45:17 +01:00
Andy Polyakov
8977880603 poly1305/asm/poly1305-x86_64.pl: fix solaris64-x86_64-cc build.
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6676)
2018-07-10 12:01:56 +02:00
Andy Polyakov
0edb109f97 evp/e_chacha20_poly1305.c: further improve small-fragment TLS performance.
Improvement coefficients vary with TLS fragment length and platform, on
most Intel processors maximum improvement is ~50%, while on Ryzen - 80%.
The "secret" is new dedicated ChaCha20_128 code path and vectorized xor
helpers.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6638)
2018-07-06 16:33:19 +02:00
Matt Caswell
fd38836ba8 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6538)
2018-06-20 15:29:23 +01:00
Andy Polyakov
27635a4ecb {chacha|poly1305}/asm/*-x64.pl: harmonize clang version detection.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6499)
2018-06-18 19:59:07 +02:00
Andy Polyakov
41013cd63c PPC assembly pack: correct POWER9 results.
As it turns out originally published results were skewed by "turbo"
mode. VM apparently remains oblivious to dynamic frequency scaling,
and reports that processor operates at "base" frequency at all times.
While actual frequency gets increased under load.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
2018-06-03 21:20:06 +02:00
Matt Caswell
83cf7abf8e Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6371)
2018-05-29 13:16:04 +01:00
Andy Polyakov
13f6857db1 PPC assembly pack: add POWER9 results.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-05-10 11:44:21 +02:00
Matt Caswell
6ec5fce25e Update copyright year
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6145)
2018-05-01 13:34:30 +01:00
Rahul Chaudhry
5bb1cd2292 poly1305/asm/poly1305-armv4.pl: remove unintentional relocation.
Branch to global symbol results in reference to PLT, and when compiling
for THUMB-2 - in a R_ARM_THM_JUMP19 relocation. Some linkers don't
support this relocation (ld.gold), while others can end up truncating
the relocation to fit (ld.bfd).

Convert this branch through PLT into a direct branch that the assembler
can resolve locally.

See https://github.com/android-ndk/ndk/issues/337 for background.

The current workaround is to disable poly1305 optimization assembly,
which is not optimal and can be reverted after this patch:
beab607d2b

CLA: trivial

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/5949)
2018-04-18 19:47:53 +02:00
Andy Polyakov
4dfe4310c3 poly1305/asm/poly1305-x86_64.pl: add Knights Landing AVX512 result.
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4855)
2017-12-23 16:06:25 +01:00
Andy Polyakov
a8f302e5ba poly1305/asm/poly1305-x86_64.pl: switch to pure AVX512F.
Convert AVX512F+VL+BW code path to pure AVX512F, so that it can be
executed even on Knights Landing. Trigger for modification was
observation that AVX512 code paths can negatively affect overall
Skylake-X system performance. Since we are likely to suppress
AVX512F capability flag [at least on Skylake-X], conversion serves
as kind of "investment protection".

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4758)
2017-11-25 22:06:10 +01:00
Andy Polyakov
7533162322 ARMv8 assembly pack: add Qualcomm Kryo results.
[skip ci]

Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-11-13 11:13:00 +01:00
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Andy Polyakov
64d92d7498 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-21 14:07:32 +02:00
Andy Polyakov
54f8f9a1ed x86_64 assembly pack: fill some blanks in Ryzen results.
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
2017-07-03 18:17:00 +02:00
David Benjamin
e195c8a256 Remove filename argument to x86 asm_init.
The assembler already knows the actual path to the generated file and,
in other perlasm architectures, is left to manage debug symbols itself.
Notably, in OpenSSL 1.1.x's new build system, which allows a separate
build directory, converting .pl to .s as the scripts currently do result
in the wrong paths.

This also avoids inconsistencies from some of the files using $0 and
some passing in the filename.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3431)
2017-05-11 17:00:23 -04:00
Andy Polyakov
0a5d1a38f2 poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x.
As hinted by its name new subroutine processes 8 input blocks in
parallel by loading data to 512-bit registers. It still needs more
work, as it needs to handle some specific input lengths better.
In this sense it's yet another intermediate step...

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-03-22 10:59:59 +01:00
Andy Polyakov
6cbfd94d08 x86_64 assembly pack: add some Ryzen performance results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-03-22 10:58:01 +01:00
Andy Polyakov
c2b935904a poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x.
As hinted by its name new subroutine processes 4 input blocks in
parallel. It still operates on 256-bit registers and is just
another step toward full-blown AVX512IFMA procedure.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-03-13 18:48:34 +01:00
Andy Polyakov
a25cef89fd poly1305/asm/poly1305-armv8.pl: ilp32-specific poly1305_init fix.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-03-13 18:46:11 +01:00
Andy Polyakov
e052083cc7 poly1305/asm/poly1305-x86_64.pl: minor AVX512 optimization.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-26 21:27:54 +01:00
Andy Polyakov
1c47e8836f poly1305/asm/poly1305-x86_64.pl: add CFI annotations.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-26 21:26:07 +01:00
Andy Polyakov
fd910ef959 poly1305/asm/poly1305-x86_64.pl: add VPMADD52 code path.
This is initial and minimal single-block implementation.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:36:41 +01:00
Andy Polyakov
73e8a5c826 poly1305/asm/poly1305-x86_64.pl: switch to vpermdd in table expansion.
Effectively it's minor size optimization, 5-6% per affected subroutine.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:36:37 +01:00
Andy Polyakov
c1e1fc500d poly1305/asm/poly1305-x86_64.pl: optimize AVX512 code path.
On pre-Skylake best optimization strategy was balancing port-specific
instructions, while on Skylake minimizing the sheer amount appears
more sensible.

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-25 18:35:45 +01:00
Andy Polyakov
a30b0522cb x86 assembly pack: update performance results.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-12-19 16:18:25 +01:00
Andy Polyakov
1ea01427c5 poly1305/asm/poly1305-x86_64.pl: allow nasm to assemble AVX512 code.
chacha/asm/chacha-x86_64.pl: refine nasm version detection logic.

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-12-15 17:57:50 +01:00