Optimize AES-XTS mode in OpenSSL for aarch64

Aes-xts mode can be optimized by interleaving cipher operation on
several blocks and loop unrolling. Interleaving needs one ideal
unrolling factor, here we adopt the same factor with aes-cbc,
which is described as below:
	If blocks number > 5, select 5 blocks as one iteration,every
	loop, decrease the blocks number by 5.
	If left blocks < 5, treat them as tail blocks.
Detailed implementation has a little adjustment for squeezing
code space.
With this way, for small size such as 16 bytes, the performance is
similar as before, but for big size such as 16k bytes, the performance
improves a lot, even reaches to 2x uplift, for some arches such as A57,
the improvement even reaches more than 2x uplift. We collect many
performance datas on different micro-archs such as thunderx2,
ampere-emag, a72, a75, a57, a53 and N1, all of which reach 0.5-2x uplift.
The following table lists the encryption performance data on aarch64,
take a72, a75, a57, a53 and N1 as examples. Performance value takes the
unit of cycles per byte, takes the format as comparision of values.
List them as below:

A72:
                            Before optimization     After optimization  Improve
evp-aes-128-xts@16          8.899913518             5.949087263         49.60%
evp-aes-128-xts@64          4.525512668             3.389141845         33.53%
evp-aes-128-xts@256         3.502906908             1.633573479         114.43%
evp-aes-128-xts@1024        3.174210419             1.155952639         174.60%
evp-aes-128-xts@8192        3.053019303             1.028134888         196.95%
evp-aes-128-xts@16384       3.025292462             1.02021169          196.54%
evp-aes-256-xts@16          9.971105023             6.754233758         47.63%
evp-aes-256-xts@64          4.931479093             3.786527393         30.24%
evp-aes-256-xts@256         3.746788153             1.943975947         92.74%
evp-aes-256-xts@1024        3.401743802             1.477394648         130.25%
evp-aes-256-xts@8192        3.278769327             1.32950421          146.62%
evp-aes-256-xts@16384       3.27093296              1.325276257         146.81%

A75:
                            Before optimization     After optimization  Improve
evp-aes-128-xts@16          8.397965173             5.126839098         63.80%
evp-aes-128-xts@64          4.176860631             2.59817764          60.76%
evp-aes-128-xts@256         3.069126585             1.284561028         138.92%
evp-aes-128-xts@1024        2.805962699             0.932754655         200.83%
evp-aes-128-xts@8192        2.725820131             0.829820397         228.48%
evp-aes-128-xts@16384       2.71521905              0.823251591         229.82%
evp-aes-256-xts@16          11.24790935             7.383914448         52.33%
evp-aes-256-xts@64          5.294128847             3.048641998         73.66%
evp-aes-256-xts@256         3.861649617             1.570359905         145.91%
evp-aes-256-xts@1024        3.537646797             1.200493533         194.68%
evp-aes-256-xts@8192        3.435353012             1.085345319         216.52%
evp-aes-256-xts@16384       3.437952563             1.097963822         213.12%

A57:
                            Before optimization     After optimization  Improve
evp-aes-128-xts@16          10.57455446             7.165438012         47.58%
evp-aes-128-xts@64          5.418185447             3.721241202         45.60%
evp-aes-128-xts@256         3.855184592             1.747145379         120.66%
evp-aes-128-xts@1024        3.477199757             1.253049735         177.50%
evp-aes-128-xts@8192        3.36768104              1.091943159         208.41%
evp-aes-128-xts@16384       3.360373443             1.088942789         208.59%
evp-aes-256-xts@16          12.54559459             8.745489036         43.45%
evp-aes-256-xts@64          6.542808937             4.326387568         51.23%
evp-aes-256-xts@256         4.62668822              2.119908754         118.25%
evp-aes-256-xts@1024        4.161716505             1.557335554         167.23%
evp-aes-256-xts@8192        4.032462227             1.377749511         192.68%
evp-aes-256-xts@16384       4.023293877             1.371558933         193.34%

A53:
                            Before optimization     After optimization  Improve
evp-aes-128-xts@16          18.07842135             13.96980808         29.40%
evp-aes-128-xts@64          7.933818397             6.07159276          30.70%
evp-aes-128-xts@256         5.264604704             2.611155744         101.60%
evp-aes-128-xts@1024        4.606660117             1.722713454         167.40%
evp-aes-128-xts@8192        4.405160115             1.454379201         202.90%
evp-aes-128-xts@16384       4.401592028             1.442279392         205.20%
evp-aes-256-xts@16          20.07084054             16.00803726         25.40%
evp-aes-256-xts@64          9.192647294             6.883876732         33.50%
evp-aes-256-xts@256         6.336143161             3.108140452         103.90%
evp-aes-256-xts@1024        5.62502952              2.097960651         168.10%
evp-aes-256-xts@8192        5.412085608             1.807294191         199.50%
evp-aes-256-xts@16384       5.403062591             1.790135764         201.80%

N1:
                            Before optimization     After optimization  Improve
evp-aes-128-xts@16          6.48147613              4.209415473         53.98%
evp-aes-128-xts@64          2.847744115             1.950757468         45.98%
evp-aes-128-xts@256         2.085711968             1.061903238         96.41%
evp-aes-128-xts@1024        1.842014669             0.798486302         130.69%
evp-aes-128-xts@8192        1.760449052             0.713853939         146.61%
evp-aes-128-xts@16384       1.760763546             0.707702009         148.80%
evp-aes-256-xts@16          7.264142817             5.265970454         37.94%
evp-aes-256-xts@64          3.251356212             2.41176323          34.81%
evp-aes-256-xts@256         2.380488469             1.342095742         77.37%
evp-aes-256-xts@1024        2.08853022              1.041718215         100.49%
evp-aes-256-xts@8192        2.027432668             0.944571334         114.64%
evp-aes-256-xts@16384       2.00740782              0.941991415         113.10%

Add more XTS test cases to cover the cipher stealing mode and cases of different
number of blocks.

CustomizedGitHooks: yes
Change-Id: I93ee31b2575e1413764e27b599af62994deb4c96

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/11399)
This commit is contained in:
XiaokangQian 2020-03-13 03:27:34 +00:00 committed by Tomas Mraz
parent c87a7f31a3
commit 9ce8e0d17e
3 changed files with 1468 additions and 0 deletions

File diff suppressed because it is too large Load Diff

View File

@ -90,6 +90,10 @@ void AES_xts_decrypt(const unsigned char *inp, unsigned char *out, size_t len,
# define HWAES_decrypt aes_v8_decrypt
# define HWAES_cbc_encrypt aes_v8_cbc_encrypt
# define HWAES_ecb_encrypt aes_v8_ecb_encrypt
# if __ARM_MAX_ARCH__>=8
# define HWAES_xts_encrypt aes_v8_xts_encrypt
# define HWAES_xts_decrypt aes_v8_xts_decrypt
# endif
# define HWAES_ctr32_encrypt_blocks aes_v8_ctr32_encrypt_blocks
# define AES_PMULL_CAPABLE ((OPENSSL_armcap_P & ARMV8_PMULL) && (OPENSSL_armcap_P & ARMV8_AES))
# define AES_GCM_ENC_BYTES 512

View File

@ -1146,6 +1146,44 @@ IV = 00000000000000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1
Ciphertext = 27A7479BEFA1D476489F308CD4CFA6E2A96E4BBE3208FF25287DD3819616E89CC78CF7F5E543445F8333D8FA7F56000005279FA5D8B5E4AD40E736DDB4D35412328063FD2AAB53E5EA1E0A9F332500A5DF9487D07A5C92CC512C8866C7E860CE93FDF166A24912B422976146AE20CE846BB7DC9BA94A767AAEF20C0D61AD02655EA92DC4C4E41A8952C651D33174BE51A10C421110E6D81588EDE82103A252D8A750E8768DEFFFED9122810AAEB99F910409B03D164E727C31290FD4E039500872AF
Title = AES XTS Non standard test vectors - generated from reference implementation
Cipher = aes-128-xts
Key = fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0
IV = 9a785634120000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f2021
Ciphertext = edbf9dace45d6f6a7306e64be5dd824b9dc31efeb418c373ce073b66755529982538
Cipher = aes-128-xts
Key = fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0
IV = 9a785634120000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031
Ciphertext = edbf9dace45d6f6a7306e64be5dd824b2538f5724fcf24249ac111ab45ad39237a709959673bd8747d58690f8c762a353ad6
Cipher = aes-128-xts
Key = 2718281828459045235360287471352631415926535897932384626433832795
IV = 00000000000000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f
Ciphertext = 27a7479befa1d476489f308cd4cfa6e2a96e4bbe3208ff25287dd3819616e89cc78cf7f5e543445f8333d8fa7f56000005279fa5d8b5e4ad40e736ddb4d35412
Cipher = aes-128-xts
Key = fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0
IV = 9a785634120000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40
Ciphertext = edbf9dace45d6f6a7306e64be5dd824b2538f5724fcf24249ac111ab45ad39233ad6183c66fa548a3cdf3e36d2b21ccde9ffb48286ec211619e02decc7ca0883c6
Cipher = aes-128-xts
Key = fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0
IV = 9a785634120000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f
Ciphertext = edbf9dace45d6f6a7306e64be5dd824b2538f5724fcf24249ac111ab45ad39233ad6183c66fa548a3cdf3e36d2b21ccdc6bc657cb3aeb87ba2c5f58ffafacd76d0a098b687c0b6536d560ca007051b0b
Cipher = aes-128-xts
Key = fffefdfcfbfaf9f8f7f6f5f4f3f2f1f0bfbebdbcbbbab9b8b7b6b5b4b3b2b1b0
IV = 9a785634120000000000000000000000
Plaintext = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f5051
Ciphertext = edbf9dace45d6f6a7306e64be5dd824b2538f5724fcf24249ac111ab45ad39233ad6183c66fa548a3cdf3e36d2b21ccdc6bc657cb3aeb87ba2c5f58ffafacd765ecc4c85c0a01bf317b823fbd6111956d0a0
Title = Case insensitive AES tests
Cipher = Aes-128-eCb