openssl/crypto/des/options.txt


Note that the UNROLL option makes the 'inner' DES loop unroll all 16 rounds
instead of the default of 4 rounds per loop iteration.
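
As a rough illustration of what UNROLL changes, here is a C sketch of a 16-round Feistel loop written both ways. It is not DES: the round function toy_f, the ROUND macro, and all function names are hypothetical, and the 4-round loop body simply mirrors the "default 4" described above. Both versions perform the same 16 rounds in the same order, so they must produce identical output; the unrolled one only removes the loop counter and branch, at the cost of larger code.

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 16

/* Hypothetical toy round function standing in for the DES f-function.
 * This is NOT DES: there is no expansion, S-box, or permutation step. */
static uint32_t toy_f(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    return (x << 3) | (x >> 29); /* rotate left by 3 bits */
}

/* One Feistel round: XOR f(other half, subkey) into one half.
 * Alternating which half is updated avoids an explicit swap. */
#define ROUND(a, b, k) ((a) ^= toy_f((b), (k)))

/* Default style: the loop body holds 4 rounds and runs 4 times. */
static void encrypt_rolled4(uint32_t *l, uint32_t *r, const uint32_t ks[ROUNDS])
{
    uint32_t L = *l, R = *r;
    for (int i = 0; i < ROUNDS; i += 4) {
        ROUND(L, R, ks[i + 0]);
        ROUND(R, L, ks[i + 1]);
        ROUND(L, R, ks[i + 2]);
        ROUND(R, L, ks[i + 3]);
    }
    *l = L;
    *r = R;
}

/* UNROLL style: all 16 rounds written out, so there is no loop counter,
 * compare, or branch, and every subkey index is a constant. */
static void encrypt_unrolled16(uint32_t *l, uint32_t *r, const uint32_t ks[ROUNDS])
{
    uint32_t L = *l, R = *r;
    ROUND(L, R, ks[0]);  ROUND(R, L, ks[1]);
    ROUND(L, R, ks[2]);  ROUND(R, L, ks[3]);
    ROUND(L, R, ks[4]);  ROUND(R, L, ks[5]);
    ROUND(L, R, ks[6]);  ROUND(R, L, ks[7]);
    ROUND(L, R, ks[8]);  ROUND(R, L, ks[9]);
    ROUND(L, R, ks[10]); ROUND(R, L, ks[11]);
    ROUND(L, R, ks[12]); ROUND(R, L, ks[13]);
    ROUND(L, R, ks[14]); ROUND(R, L, ks[15]);
    *l = L;
    *r = R;
}

/* Returns 1 if both forms give the same output for the input (l, r). */
static int unroll_forms_agree(uint32_t l, uint32_t r)
{
    uint32_t ks[ROUNDS];
    for (int i = 0; i < ROUNDS; i++)
        ks[i] = 0x9E3779B9u * (uint32_t)(i + 1); /* arbitrary subkeys */

    uint32_t l1 = l, r1 = r, l2 = l, r2 = r;
    encrypt_rolled4(&l1, &r1, ks);
    encrypt_unrolled16(&l2, &r2, ks);
    return l1 == l2 && r1 == r2;
}
```

For any input, unroll_forms_agree() returns 1. Whether the unrolled form is actually faster depends on the compiler, the number of registers, and the size of the instruction cache, so it has to be measured per machine.
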
RISC1 and RISC2 are two alternative implementations of the inner loop, and
PTR means to use pointer arithmetic instead of array indexing.
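
One way pointer-style code can differ from array code is in how a table lookup is addressed. The C sketch that follows is an assumption about the general technique, not a copy of the library's macros; table, init_table, lookup_array, and lookup_ptr are hypothetical names. The array version shifts the selector bits down to an element index, while the pointer version uses the still-scaled bits directly as a byte offset from the table's base address, which can save a shift per lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-entry table of 32-bit words, standing in for an S-box
 * table; the contents are arbitrary. */
static uint32_t table[64];

static void init_table(void)
{
    for (uint32_t i = 0; i < 64; i++)
        table[i] = i * 0x01010101u;
}

/* Array style: bits 2..7 of u select the entry, so shift them down to an
 * index in 0..63 and let the compiler scale it by sizeof(uint32_t). */
static uint32_t lookup_array(uint32_t u)
{
    return table[(u >> 2) & 0x3f];
}

/* Pointer style: with 4-byte entries, entry i starts at byte offset 4*i,
 * which is exactly (u & 0xfc) for the same bits 2..7. The masked value is
 * used directly as a byte offset, so no shift is needed.
 * Assumes 8-bit bytes, so that sizeof(uint32_t) == 4. */
static uint32_t lookup_ptr(uint32_t u)
{
    const unsigned char *base = (const unsigned char *)table;
    uint32_t offset = u & 0xfc;
    assert(offset + sizeof(uint32_t) <= sizeof table);
    return *(const uint32_t *)(base + offset);
}
```

After init_table(), both functions return the same entry for every u; for example, u = 0x0c selects entry 3 in both. Whether the pointer form is faster depends on the target's addressing modes and on the compiler: on x86, for instance, a scaled index can be folded into the load instruction anyway.
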

System - CPU - compiler - options                                    DES/s  kB/s
FreeBSD - Pentium Pro 200MHz - gcc 2.7.2.2 - assembler             577,000  4620
IRIX 6.2 - R10000 195MHz - cc (-O3 -n32) - UNROLL RISC2 PTR        496,000  3968
Solaris 2.5.1 - UltraSPARC 167MHz?? - SC4.0 - UNROLL RISC1 PTR [1] 459,400  3672
FreeBSD - Pentium Pro 200MHz - gcc 2.7.2.2 - UNROLL RISC1          433,000  3468
Solaris 2.5.1 - UltraSPARC 167MHz?? - gcc 2.7.2 - UNROLL           380,000  3041
Linux - Pentium 100MHz - gcc 2.7.0 - assembler                     281,000  2250
NT 4.0 - Pentium 100MHz - VC 4.2 - assembler                       281,000  2250
AIX 4.1? - PPC604 100MHz - cc - UNROLL                             275,000  2200
IRIX 5.3 - R4400 200MHz - gcc 2.6.3 - UNROLL RISC2 PTR             235,300  1882
IRIX 5.3 - R4400 200MHz - cc - UNROLL RISC2 PTR                    233,700  1869
NT 4.0 - Pentium 100MHz - VC 4.2 - UNROLL RISC1 PTR                191,000  1528
DEC Alpha 165MHz?? - cc - RISC2 PTR [2]                            181,000  1448
Linux - Pentium 100MHz - gcc 2.7.0 - UNROLL RISC1 PTR              158,500  1268
HPUX 10 - 9000/887 - cc - UNROLL [3]                               148,000  1190
Solaris 2.5.1 - SPARC 10 50MHz - gcc 2.7.2 - UNROLL                123,600   989
IRIX 5.3 - R4000 100MHz - cc - UNROLL RISC2 PTR                    101,000   808
DGUX - 88100 50MHz(?) - gcc 2.6.3 - UNROLL                          81,000   648
HPUX 10 - 9000/887 - K&R cc (default compiler) - UNROLL PTR         76,000   608
Solaris 2.4 - 486 50MHz - gcc 2.6.3 - assembler                     65,000   522
Solaris 2.4 - 486 50MHz - gcc 2.6.3 - UNROLL RISC2                  43,500   344
AIX - old slow one :-) - cc                                         39,000   312
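
The two numbers on each line are related by the DES block size: the first is DES operations per second (as in note [1]), and each operation encrypts one 64-bit (8-byte) block, so the k/s (kilobytes per second) figure is roughly des/s × 8 / 1000. A minimal C helper makes the conversion explicit; the names DES_BLOCK_BYTES and des_per_sec_to_kb_per_sec are hypothetical.

```c
#include <assert.h>

/* DES encrypts 64-bit blocks, so one DES operation processes 8 bytes. */
#define DES_BLOCK_BYTES 8.0

/* Convert DES operations per second to kilobytes per second.
 * Uses k = 1000, which reproduces the table's figures (k = 1024 does not). */
static double des_per_sec_to_kb_per_sec(double des_per_sec)
{
    assert(des_per_sec >= 0.0);
    return des_per_sec * DES_BLOCK_BYTES / 1000.0;
}
```

For example, 496,000 des/s gives 3968 k/s, matching the IRIX 6.2 line; a few other lines differ by a few k/s, presumably from rounding in the original measurements.
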
Notes.
[1] For the UltraSPARC with SunC 4.0
(cc -xtarget=ultra -xarch=v8plus -Xa -xO5), running 'des_opts'
gives 344,000 des/s while 'speed' gives 459,000 des/s.
I record the higher figure since it comes from the library, but the
discrepancy is rather weird.
[2] As with the UltraSPARC ([1]), the two programs disagree: 181,000 des/s
from 'des_opts' vs 175,000 des/s from 'speed'.
[3] I was unable to get access to this machine when it was not heavily loaded,
so my timing program never got more than 30% of the CPU. This gives much
lower speed numbers, because the program has to 'fight' the other CPU-bound
processes to keep its data in the cache.