Mirror of https://github.com/openssl/openssl.git (synced 2025-03-01 19:28:10 +08:00)
AES-ECB mode can be optimized by interleaving the cipher operations on several blocks and unrolling the loop. Interleaving needs a suitable unrolling factor. Here we adopt the same factor as AES-CBC, which is described below:

- If the number of blocks is > 5, select 5 blocks as one iteration; every loop decreases the block count by 5.
- If 3 < remaining blocks < 5, select 3 blocks as one iteration; every loop decreases the block count by 3.
- If remaining blocks < 3, treat them as tail blocks.

The detailed implementation makes small adjustments to save code space. With this approach, performance for small sizes such as 16 bytes is similar to before. For large sizes such as 16384 bytes, performance improves considerably: close to 100% on A72, and more than 100% on some cores such as A57.

The following tables list encryption performance on aarch64, taking A72 and A57 as examples. Values are in cycles per byte, so lower is better. The Improve column is the before/after ratio minus one.

A72:

| Test | Before optimization | After optimization | Improve |
|---|---|---|---|
| evp-aes-128-ecb@16 | 17.26538237 | 16.82663866 | 2.61% |
| evp-aes-128-ecb@64 | 5.50528499 | 5.222637557 | 5.41% |
| evp-aes-128-ecb@256 | 2.632700213 | 1.908442892 | 37.95% |
| evp-aes-128-ecb@1024 | 1.876102047 | 1.078018868 | 74.03% |
| evp-aes-128-ecb@8192 | 1.6550392 | 0.853982929 | 93.80% |
| evp-aes-128-ecb@16384 | 1.636871283 | 0.847623957 | 93.11% |
| evp-aes-192-ecb@16 | 17.73104961 | 17.09692468 | 3.71% |
| evp-aes-192-ecb@64 | 5.78984398 | 5.418545192 | 6.85% |
| evp-aes-192-ecb@256 | 2.872005308 | 2.081815274 | 37.96% |
| evp-aes-192-ecb@1024 | 2.083226672 | 1.25095642 | 66.53% |
| evp-aes-192-ecb@8192 | 1.831992057 | 0.995916251 | 83.95% |
| evp-aes-192-ecb@16384 | 1.821590009 | 0.993820525 | 83.29% |
| evp-aes-256-ecb@16 | 18.0606306 | 17.96963317 | 0.51% |
| evp-aes-256-ecb@64 | 6.19651997 | 5.762465812 | 7.53% |
| evp-aes-256-ecb@256 | 3.176991394 | 2.24642538 | 41.42% |
| evp-aes-256-ecb@1024 | 2.385991919 | 1.396018192 | 70.91% |
| evp-aes-256-ecb@8192 | 2.147862636 | 1.142222597 | 88.04% |
| evp-aes-256-ecb@16384 | 2.131361787 | 1.135944617 | 87.63% |

A57:

| Test | Before optimization | After optimization | Improve |
|---|---|---|---|
| evp-aes-128-ecb@16 | 18.61045121 | 18.36456218 | 1.34% |
| evp-aes-128-ecb@64 | 6.438628994 | 5.467959461 | 17.75% |
| evp-aes-128-ecb@256 | 2.957452881 | 1.97238604 | 49.94% |
| evp-aes-128-ecb@1024 | 2.117096219 | 1.099665054 | 92.52% |
| evp-aes-128-ecb@8192 | 1.868385973 | 0.837440804 | 123.11% |
| evp-aes-128-ecb@16384 | 1.853078526 | 0.822420027 | 125.32% |
| evp-aes-192-ecb@16 | 19.07021756 | 18.50018552 | 3.08% |
| evp-aes-192-ecb@64 | 6.672351486 | 5.696088921 | 17.14% |
| evp-aes-192-ecb@256 | 3.260427769 | 2.131449916 | 52.97% |
| evp-aes-192-ecb@1024 | 2.410522832 | 1.250529718 | 92.76% |
| evp-aes-192-ecb@8192 | 2.17921605 | 0.973225504 | 123.92% |
| evp-aes-192-ecb@16384 | 2.162250997 | 0.95919871 | 125.42% |
| evp-aes-256-ecb@16 | 19.3008384 | 19.12743654 | 0.91% |
| evp-aes-256-ecb@64 | 6.992950658 | 5.92149541 | 18.09% |
| evp-aes-256-ecb@256 | 3.576361743 | 2.287619504 | 56.34% |
| evp-aes-256-ecb@1024 | 2.726671027 | 1.381267599 | 97.40% |
| evp-aes-256-ecb@8192 | 2.493583657 | 1.110959913 | 124.45% |
| evp-aes-256-ecb@16384 | 2.473916816 | 1.099967073 | 124.91% |

Change-Id: Iccd23d972e0d52d22dc093f4c208f69c9d5a0ca7
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10518)
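The block-grouping rule above can be sketched as a small C planning function. This is a hedged illustration, not the AArch64 assembly from the commit. The names `ecb_plan` and `plan_ecb_blocks` are hypothetical. The stated bounds leave exactly 5 and exactly 3 remaining blocks unspecified, so the sketch assumes two things: it runs 5-block iterations while at least 5 blocks remain, and it runs 3-block iterations while at least 3 remain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of how a run of 16-byte blocks is grouped. */
struct ecb_plan {
    size_t iters5;  /* iterations that interleave 5 blocks */
    size_t iters3;  /* iterations that interleave 3 blocks */
    size_t tail;    /* leftover blocks handled as the tail */
};

/*
 * Assumed reading of the boundaries: ">= 5" for the main loop and
 * ">= 3" for the secondary loop; the real code may differ slightly.
 */
static struct ecb_plan plan_ecb_blocks(size_t nblocks)
{
    struct ecb_plan p = {0, 0, 0};

    while (nblocks >= 5) {      /* main interleaved loop */
        p.iters5++;
        nblocks -= 5;
    }
    while (nblocks >= 3) {      /* runs at most once after the 5-block loop */
        p.iters3++;
        nblocks -= 3;
    }
    p.tail = nblocks;           /* 0, 1 or 2 blocks remain */
    assert(p.tail < 3);
    return p;
}
```

Under these assumptions, a 16384-byte request (1024 blocks) yields 204 five-block iterations, one three-block iteration and one tail block. A single 16-byte block goes straight to the tail path, which is consistent with the small-size numbers barely changing.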
Subdirectories: aes, aria, asn1, async, bf, bio, bn, buffer, camellia, cast, chacha, cmac, cmp, cms, comp, conf, crmf, ct, des, dh, dsa, dso, ec, engine, err, ess, evp, hmac, idea, kdf, lhash, md2, md4, md5, mdc2, modes, objects, ocsp, pem, perlasm, pkcs7, pkcs12, poly1305, property, rand, rc2, rc4, rc5, ripemd, rsa, seed, serializer, sha, siphash, sm2, sm3, sm4, srp, stack, store, ts, txt_db, ui, whrlpool, x509

Files: alphacpuid.pl, arm64cpuid.pl, arm_arch.h, armcap.c, armv4cpuid.pl, asn1_dsa.c, bsearch.c, build.info, c64xpluscpuid.pl, context.c, core_algorithm.c, core_fetch.c, core_namemap.c, cpt_err.c, cryptlib.c, ctype.c, cversion.c, dllmain.c, ebcdic.c, ex_data.c, getenv.c, ia64cpuid.S, info.c, init.c, initthread.c, LPdir_nyi.c, LPdir_unix.c, LPdir_vms.c, LPdir_win32.c, LPdir_win.c, LPdir_wince.c, mem_clr.c, mem_dbg.c, mem_sec.c, mem.c, mips_arch.h, o_dir.c, o_fips.c, o_fopen.c, o_init.c, o_str.c, o_time.c, packet.c, param_build.c, params_from_text.c, params.c, pariscid.pl, ppc_arch.h, ppccap.c, ppccpuid.pl, provider_conf.c, provider_core.c, provider_local.h, provider_predefined.c, provider.c, README.sparse_array, s390x_arch.h, s390xcap.c, s390xcpuid.pl, sparc_arch.h, sparccpuid.S, sparcv9cap.c, sparse_array.c, threads_none.c, threads_pthread.c, threads_win.c, trace.c, uid.c, vms_rms.h, x86_64cpuid.pl, x86cpuid.pl
The sparse_array.c file contains an implementation of a sparse array that attempts to be both space and time efficient.

The sparse array is represented using a tree structure. Each node in the tree contains a block of pointers to either the user-supplied leaf values or to another node.

There are a number of parameters used to define the block size:

    OPENSSL_SA_BLOCK_BITS  Specifies the number of bits covered by each block
    SA_BLOCK_MAX           Specifies the number of pointers in each block
    SA_BLOCK_MASK          Specifies a bit mask used to reduce an index modulo the block size
    SA_BLOCK_MAX_LEVELS    Indicates the maximum possible height of the tree

These constants are inter-related:

    SA_BLOCK_MAX        = 2 ^ OPENSSL_SA_BLOCK_BITS
    SA_BLOCK_MASK       = SA_BLOCK_MAX - 1
    SA_BLOCK_MAX_LEVELS = number of bits in size_t divided by
                          OPENSSL_SA_BLOCK_BITS, rounded up to the next
                          whole number

OPENSSL_SA_BLOCK_BITS can be defined at compile time, and this overrides the built-in setting.

As a space and performance optimisation, the height of the tree is usually less than the maximum possible height. Only sufficient height is allocated to accommodate the largest index added to the data structure.

The largest index used to add a value to the array determines the tree height. A tree of height h addresses indices 0 to SA_BLOCK_MAX ^ h - 1:

    +----------------------+---------------------+
    | Largest Added Index  | Height of Tree      |
    +----------------------+---------------------+
    | SA_BLOCK_MAX - 1     | 1                   |
    | SA_BLOCK_MAX ^ 2 - 1 | 2                   |
    | SA_BLOCK_MAX ^ 3 - 1 | 3                   |
    | ...                  | ...                 |
    | size_t max           | SA_BLOCK_MAX_LEVELS |
    +----------------------+---------------------+

The tree height is dynamically increased as needed based on additions.

An empty tree is represented by a NULL root pointer.
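The relationships between the constants, and the height rule in the table, can be checked with a small C sketch. It assumes a block size of 4 bits only so that the numbers stay small. The DEMO_SA_* macros and demo_sa_* helpers are hypothetical stand-ins written for illustration; they are not the definitions in sparse_array.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical block size, chosen for illustration only. */
#define DEMO_SA_BLOCK_BITS   4
#define DEMO_SA_BLOCK_MAX    ((size_t)1 << DEMO_SA_BLOCK_BITS)  /* 2 ^ bits */
#define DEMO_SA_BLOCK_MASK   (DEMO_SA_BLOCK_MAX - 1)
/* Bits in size_t divided by the block bits, rounded up. */
#define DEMO_SA_BLOCK_MAX_LEVELS \
    ((sizeof(size_t) * CHAR_BIT + DEMO_SA_BLOCK_BITS - 1) / DEMO_SA_BLOCK_BITS)

/* Slot used for index n in a node at the given level (0 = bottom level). */
static size_t demo_sa_slot(size_t n, size_t level)
{
    return (n >> (level * DEMO_SA_BLOCK_BITS)) & DEMO_SA_BLOCK_MASK;
}

/* Smallest tree height whose nodes can address index n. */
static size_t demo_sa_height_for(size_t n)
{
    size_t levels = 1;

    /* Checking levels < MAX_LEVELS first keeps the shift below the width of size_t. */
    while (levels < DEMO_SA_BLOCK_MAX_LEVELS
           && (n >> (levels * DEMO_SA_BLOCK_BITS)) != 0)
        levels++;
    return levels;
}
```

With a 4-bit block, SA_BLOCK_MAX is 16, so index 33 (2N+1 for N = 16) needs a height of 2. It sits in slot 2 of the root and slot 1 of the level below, which is the path used in the 2N+1 insertion example in this README.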
Inserting a value at index 0 results in the allocation of a top-level node full of null pointers except for the single pointer to the user's data (N = SA_BLOCK_MAX for brevity):

        +----+
        |Root|
        |Node|
        +-+--+
          |
          |
          |
          v
        +-+-+---+---+---+---+
        | 0 | 1 | 2 |...|N-1|
        |   |nil|nil|...|nil|
        +-+-+---+---+---+---+
          |
          |
          |
          v
        +-+--+
        |User|
        |Data|
        +----+
        Index 0

Inserting at index 2N+1 needs a tree of height 2, because a single node only covers indices 0 to N-1. Its path is slot 2 (the quotient (2N+1) / N) in the root and slot 1 (the remainder) in the second level. The insertion therefore creates a new root node and pushes the old root node down into slot 0. It then creates a second second-level node to hold the pointer to the user's new data:

        +----+
        |Root|
        |Node|
        +-+--+
          |
          |
          |
          v
        +-+-+---+---+---+---+
        | 0 | 1 | 2 |...|N-1|
        |   |nil|   |...|nil|
        +-+-+---+-+-+---+---+
          |       |
          |       +------------------+
          |                          |
          v                          v
        +-+-+---+---+---+---+      +-+-+---+---+---+---+
        | 0 | 1 | 2 |...|N-1|      | 0 | 1 | 2 |...|N-1|
        |   |nil|nil|...|nil|      |nil|   |nil|...|nil|
        +-+-+---+---+---+---+      +---+-+-+---+---+---+
          |                              |
          |                              |
          |                              |
          v                              v
        +-+--+                         +-+--+
        |User|                         |User|
        |Data|                         |Data|
        +----+                         +----+
        Index 0                        Index 2N+1

The nodes themselves are allocated in a sparse manner. Only nodes that lie on a path from the root of the tree to an added leaf are allocated. This complexity is hidden, and nodes are allocated on an as-needed basis. Because the data is expected to be sparse, this doesn't waste much space.

Values can be removed from the sparse array by setting their index position to NULL. The data structure does not attempt to reclaim nodes or reduce the height of the tree on removal. For example, setting index 0 to NULL now would result in:

        +----+
        |Root|
        |Node|
        +-+--+
          |
          |
          |
          v
        +-+-+---+---+---+---+
        | 0 | 1 | 2 |...|N-1|
        |   |nil|   |...|nil|
        +-+-+---+-+-+---+---+
          |       |
          |       +------------------+
          |                          |
          v                          v
        +-+-+---+---+---+---+      +-+-+---+---+---+---+
        | 0 | 1 | 2 |...|N-1|      | 0 | 1 | 2 |...|N-1|
        |nil|nil|nil|...|nil|      |nil|   |nil|...|nil|
        +---+---+---+---+---+      +---+-+-+---+---+---+
                                         |
                                         |
                                         |
                                         v
                                       +-+--+
                                       |User|
                                       |Data|
                                       +----+
                                       Index 2N+1

Accesses to elements in the sparse array take O(log n) time, where n is the largest index in use.
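The grow-and-push-down insertion and the non-reclaiming removal described above can be sketched as a toy C implementation. It is a hedged illustration only. The toy_sa_* names, the 4-bit block size and the calloc-based node allocation are assumptions made for this sketch; this is not the implementation in sparse_array.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BITS 4                        /* hypothetical block size */
#define TOY_MAX  ((size_t)1 << TOY_BITS)  /* pointers per node (N) */
#define TOY_MASK (TOY_MAX - 1)

/* Hypothetical toy sparse array. */
struct toy_sa {
    void **root;     /* NULL represents an empty tree */
    size_t levels;   /* current height of the tree */
};

/* A node is TOY_MAX pointers, all initially NULL. */
static void **toy_node_new(void)
{
    return calloc(TOY_MAX, sizeof(void *));
}

/* Non-zero if index n is addressable by a tree of the given height. */
static int toy_fits(size_t n, size_t levels)
{
    size_t shift = levels * TOY_BITS;

    return shift >= sizeof(size_t) * CHAR_BIT || (n >> shift) == 0;
}

static void *toy_sa_get(const struct toy_sa *sa, size_t n)
{
    void **node = sa->root;
    size_t level;

    if (node == NULL || !toy_fits(n, sa->levels))
        return NULL;
    /* Walk down; the root uses the most significant group of bits. */
    for (level = sa->levels - 1; level > 0 && node != NULL; level--)
        node = node[(n >> (level * TOY_BITS)) & TOY_MASK];
    return node == NULL ? NULL : node[n & TOY_MASK];
}

/* Returns 1 on success, 0 on allocation failure. */
static int toy_sa_set(struct toy_sa *sa, size_t n, void *val)
{
    void **node;
    size_t level;

    /* An index outside the current tree is already NULL. */
    if (val == NULL && (sa->root == NULL || !toy_fits(n, sa->levels)))
        return 1;
    if (sa->root == NULL) {
        if ((sa->root = toy_node_new()) == NULL)
            return 0;
        sa->levels = 1;
    }
    /* Grow: a new root takes the old root as its slot 0 child. */
    while (!toy_fits(n, sa->levels)) {
        void **top = toy_node_new();

        if (top == NULL)
            return 0;
        top[0] = sa->root;
        sa->root = top;
        sa->levels++;
    }
    node = sa->root;
    for (level = sa->levels - 1; level > 0; level--) {
        size_t slot = (n >> (level * TOY_BITS)) & TOY_MASK;

        if (node[slot] == NULL) {
            if (val == NULL)
                return 1;   /* no path, so the value is already NULL */
            if ((node[slot] = toy_node_new()) == NULL)
                return 0;
        }
        node = node[slot];
    }
    node[n & TOY_MASK] = val;   /* storing NULL frees no nodes */
    return 1;
}

/* Free a subtree of the given height; user data is not freed. */
static void toy_free_node(void **node, size_t levels)
{
    size_t i;

    if (node == NULL)
        return;
    if (levels > 1)
        for (i = 0; i < TOY_MAX; i++)
            toy_free_node(node[i], levels - 1);
    free(node);
}
```

The steps follow the diagrams. Setting index 0 and then index 33 (2N+1 with N = 16) raises the height to 2, with the old root pushed into slot 0. Clearing index 0 afterwards leaves both second-level nodes in place.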
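The base of the logarithm is SA_BLOCK_MAX, so for moderately small indices (e.g. NIDs), single-level (constant-time) access is achievable. Space usage is O(min(n, m log n)), where m is the number of elements in the array. Each element needs at most one node per level on its path. No more nodes can be allocated than a fully populated tree over indices 0 to n would use.

Note: sparse arrays store only pointers to the element type. Thus SPARSE_ARRAY_OF(char) holds char * values and can be used to store strings.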
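The space bound can be illustrated by counting how many nodes a given set of indices forces the tree to allocate. The tree needs one node per distinct index prefix at each level, so m indices in a tree of height h need at most m * h nodes. The sketch below is a hypothetical helper (demo_count_nodes, with an assumed 4-bit block size) written for this explanation; it is not part of sparse_array.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_BITS 4   /* hypothetical block size: SA_BLOCK_MAX = 16 */

/* qsort comparator for size_t values. */
static int demo_cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;

    return (x > y) - (x < y);
}

/* Prefix identifying the node that holds idx at the given depth
 * (depth 1 = root, depth == levels = bottom level of the tree). */
static size_t demo_prefix(size_t idx, size_t depth, size_t levels)
{
    size_t shift = (levels - depth + 1) * DEMO_BITS;

    return shift >= sizeof(size_t) * CHAR_BIT ? 0 : idx >> shift;
}

/*
 * Count the nodes allocated for idx[0..m-1] in a tree of the given
 * height: one node per distinct prefix at each depth. Assumes every
 * index fits in the tree. Sorts idx in place.
 */
static size_t demo_count_nodes(size_t *idx, size_t m, size_t levels)
{
    size_t total = 0, depth, i;

    if (m == 0)
        return 0;               /* empty tree: NULL root, no nodes */
    qsort(idx, m, sizeof(*idx), demo_cmp_size);
    for (depth = 1; depth <= levels; depth++) {
        total++;                /* node holding the smallest index */
        for (i = 1; i < m; i++)
            if (demo_prefix(idx[i], depth, levels)
                != demo_prefix(idx[i - 1], depth, levels))
                total++;        /* sorted input: a new prefix means a new node */
    }
    return total;
}
```

For example, indices 0 and 33 (0 and 2N+1 with N = 16) in a height-2 tree give 3 nodes: the root plus two second-level nodes. Four indices spread across different blocks (0, 16, 32, 48) give 5 nodes, still within the m * h = 8 bound.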