mirror of https://github.com/openssl/openssl.git
synced 2025-01-18 13:44:20 +08:00
4c22909e31
the remainder left in %edx.

Here is the resulting performance improvement matrix (improvement as a result of this *and* previous tune-up committed two days ago). The results were obtained by profiling the "div" part of the crypto/bn/bnspeed.c.

CPU | BN_div | bn_div_words | overall | comment
---|---|---|---|---
PII | +16% | accumulated by inlining | +2-3% | PII multiplies damn fast! Taking multiplication out of the loop didn't make too much difference. Eliminating of the multiplication involved in remainder calculation is the major factor.
Pentium | +45% | accumulated by inlining | +7-9% | mull isn't that fast and replacing multiplications with additions in the loop has more visible effect:-)
MIPS R10000 | +75% | +12% | +20-25% | In addition to the taking mults out of the loop (giving 12% in the asm/mips3.s) three mults were eliminated in BN_div.
Alpha EV4 | +30% | +50% | +10-15% | Same as above. But remember that bn_div_words is a C implementation. It takes 4 Alpha mults in C to do the same thing as 1 MIPS mult in assembler does. So the effect (50%) is more impressive. But not the overall one... Well, if Alpha bn_mul_add would be implemented in assembler overall improvement would be closer to MIPS...

||
---|---|---|
.. | ||
asm | ||
old | ||
.cvsignore | ||
bn_add.c | ||
bn_asm.c | ||
bn_blind.c | ||
bn_comba.c | ||
bn_div.c | ||
bn_err.c | ||
bn_exp2.c | ||
bn_exp.c | ||
bn_gcd.c | ||
bn_lcl.h | ||
bn_lib.c | ||
bn_mont.c | ||
bn_mpi.c | ||
bn_mul.c | ||
bn_opts.c | ||
bn_prime.c | ||
bn_prime.h | ||
bn_prime.pl | ||
bn_print.c | ||
bn_rand.c | ||
bn_recp.c | ||
bn_shift.c | ||
bn_sqr.c | ||
bn_word.c | ||
bn.h | ||
bn.mul | ||
bnspeed.c | ||
bntest.c | ||
comba.pl | ||
d.c | ||
exp.c | ||
expspeed.c | ||
exptest.c | ||
Makefile.ssl | ||
new | ||
test.c | ||
todo | ||
vms-helper.c |