Commit Graph

15 Commits

Author SHA1 Message Date
Dimitri Papadopoulos
eb4129e12c Fix typos found by codespell
Typos in doc/man* will be fixed in a different commit.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20910)
2023-06-15 10:11:46 +10:00
Xu Yizhou
cded5d0525 Fix SM4-XTS build failure on Mac mini M1
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20202)
2023-02-06 12:36:07 +01:00
Xu Yizhou
c007203b94 SM4 AESE optimization for ARMv8
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19914)
2023-02-02 10:16:47 +11:00
Xu Yizhou
accd3bdd11 Fix SM4 test failures on big-endian ARM processors
Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com>

Reviewed-by: Paul Yang <kaishen.yy@antfin.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19910)
2023-01-06 14:08:13 +01:00
fangming.fang
d89e0361d5 Fix SM4-CBC regression on Armv8
Fixes #19858

During decryption, the last ciphertext is not fed to next block
correctly when the number of input blocks is exactly 4. Fix this
and add the corresponding test cases.

Thanks xu-yi-zhou for reporting this issue and proposing the fix.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19872)
2022-12-13 09:42:06 +11:00
zhangzhilei
704e8090b4 optimize ossl_sm4_set_key speed
this optimization comes from libgcrypt, increse about 48% speed

Benchmark on my AMD Ryzen Threadripper 3990X

before:
Did 5752000 SM4 setup operations in 1000151us (5751131.6 ops/sec)
after:
Did 8506000 SM4 setup operations in 1000023us (8505804.4 ops/sec)

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19270)
2022-10-13 13:20:24 +01:00
Matt Caswell
fecb3aae22 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes
2022-05-03 13:34:51 +01:00
Daniel Hu
4908787f21 SM4 optimization for ARM by ASIMD
This patch optimizes SM4 for ARM processor using ASIMD instruction

It will improve performance if both of following conditions are met:
1) Input data equal to or more than 4 blocks
2) Cipher mode allows parallelism, including ECB,CTR,GCM or CBC decryption

This patch implements SM4 SBOX lookup in vector registers, with the
benefit of constant processing time over existing C implementation.

It is only enabled for micro-architecture N1/V1. In the ideal scenario,
performance can reach up to 2.7X

When either of above two conditions is not met, e.g. single block input
or CFB/OFB mode, CBC encryption, performance could drop about 50%.

The assembly code has been reviewed internally by ARM engineer
Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17951)
2022-04-12 10:37:42 +02:00
zhangzhilei
13ba91cb02 SM4 optimization for non-asm mode
This patch use table-lookup borrow from aes in crypto/aes/aes_core.c.

Test on my PC(AMD Ryzen Threadripper 3990X 64-Core Processor),

before and after optimization:

debug mode:

Before:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CBC          40101.14k    41453.80k    42073.86k    42174.81k    42216.11k    42227.03k
SM4-ECB          41222.60k    42074.88k    42673.66k    42868.05k    42896.04k    42844.16k
SM4-CTR          35867.22k    36874.47k    37004.97k    37083.82k    37052.42k    37076.99k

After:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CBC          47273.51k    48957.40k    49665.19k    49810.77k    49859.24k    49834.67k
SM4-ECB          48100.01k    49323.34k    50224.04k    50273.28k    50533.72k    50730.12k
SM4-CTR          41352.64k    42621.29k    42971.22k    43061.59k    43089.92k    43100.84k

non-debug mode:

Before:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CBC         141596.59k   145102.93k   146794.50k   146540.89k   146650.45k   146877.10k
SM4-ECB         144774.71k   155106.28k   158166.36k   158279.00k   158520.66k   159280.97k
SM4-CTR         138021.10k   141577.60k   142493.53k   142736.38k   142852.10k   143125.16k

After:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
SM4-CBC         142016.95k   150068.48k   152238.25k   152773.97k   153094.83k   152027.14k
SM4-ECB         148842.94k   159919.87k   163628.37k   164515.84k   164697.43k   164790.27k
SM4-CTR         141774.23k   146206.89k   147470.25k   147816.28k   146770.60k   148346.20k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17766)
2022-03-03 13:19:55 +01:00
Daniel Hu
15b7175f55 SM4 optimization for ARM by HW instruction
This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  105787.80k   107837.87k   108380.84k   108462.08k   108549.46k   108554.92k
  SM4-ECB  111924.58k   118173.76k   119776.00k   120093.70k   120264.02k   120274.94k
  SM4-CBC  106428.09k   109190.98k   109674.33k   109774.51k   109827.41k   109827.41k

After (7.4x - 36.6x faster):
  type      16 bytes     64 bytes    256 bytes    1024 bytes   8192 bytes  16384 bytes
  SM4-CTR  781979.02k  2432994.28k  3437753.86k  3834177.88k  3963715.58k  3974556.33k
  SM4-ECB  937590.69k  2941689.02k  3945751.81k  4328655.87k  4459181.40k  4468692.31k
  SM4-CBC  890639.88k  1027746.58k  1050621.78k  1056696.66k  1058613.93k  1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
2022-01-18 11:52:14 +01:00
Matt Caswell
3c2bdd7df9 Update copyright year
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14801)
2021-04-08 13:04:41 +01:00
Shane Lontis
8a6e912520 Add ossl_ symbols for sm3 and sm4
Partial fix for #12964

Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/14473)
2021-03-18 17:52:37 +10:00
Dr. Matthias St. Pierre
25f2138b0a Reorganize private crypto header files
Currently, there are two different directories which contain internal
header files of libcrypto which are meant to be shared internally:

While header files in 'include/internal' are intended to be shared
between libcrypto and libssl, the files in 'crypto/include/internal'
are intended to be shared inside libcrypto only.

To make things complicated, the include search path is set up in such
a way that the directive #include "internal/file.h" could refer to
a file in either of these two directoroes. This makes it necessary
in some cases to add a '_int.h' suffix to some files to resolve this
ambiguity:

  #include "internal/file.h"      # located in 'include/internal'
  #include "internal/file_int.h"  # located in 'crypto/include/internal'

This commit moves the private crypto headers from

  'crypto/include/internal'  to  'include/crypto'

As a result, the include directives become unambiguous

  #include "internal/file.h"       # located in 'include/internal'
  #include "crypto/file.h"         # located in 'include/crypto'

hence the superfluous '_int.h' suffixes can be stripped.

The files 'store_int.h' and 'store.h' need to be treated specially;
they are joined into a single file.

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9333)
2019-09-28 20:26:34 +02:00
Richard Levitte
f9f859adc6 Following the license change, modify the boilerplates in crypto/smN/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7818)
2018-12-06 15:24:52 +01:00
Ronald Tse
f19a5ff9ab SM4: Add SM4 block cipher to EVP
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/4552)
2017-10-31 15:19:14 +10:00