glibc/sysdeps/powerpc/powerpc64
Adhemerval Zanella cfee7d9cf4 powerpc: Remove f{max,min}{f} assembly implementations
This patch removes the powerpc assembly implementation of fmax/fmin.
Based on benchtests, the assembly implementations show:

$ ./testrun.sh benchtests/bench-fmax
  "fmax": {
   "": {
    "duration": 5.07586e+09,
    "iterations": 2.01676e+09,
    "max": 1350.39,
    "min": 2.073,
    "mean": 2.51684
   },
   "qNaN": {
    "duration": 5.09315e+09,
    "iterations": 8.4568e+08,
    "max": 2788,
    "min": 5.806,
    "mean": 6.02255
   },
   "sNaN": {
    "duration": 5.09073e+09,
    "iterations": 8.42316e+08,
    "max": 4215.84,
    "min": 5.737,
    "mean": 6.04373
   }
  }

And:

$ ./testrun.sh benchtests/bench-fmin
  "fmin": {
   "": {
    "duration": 5.07711e+09,
    "iterations": 2.02982e+09,
    "max": 497.094,
    "min": 2.073,
    "mean": 2.50126
   },
   "qNaN": {
    "duration": 5.09134e+09,
    "iterations": 8.46968e+08,
    "max": 2255.14,
    "min": 5.807,
    "mean": 6.01125
   },
   "sNaN": {
    "duration": 5.09122e+09,
    "iterations": 8.4746e+08,
    "max": 1969.38,
    "min": 5.729,
    "mean": 6.00763
   }
  }

The default implementation (math/s_f{max,min}_template.c) shows better mean
latency in all cases (roughly 10% to 28% lower):

$ ./testrun.sh benchtests/bench-fmax
  "fmax": {
   "": {
    "duration": 5.07044e+09,
    "iterations": 2.38695e+09,
    "max": 2048.58,
    "min": 2.073,
    "mean": 2.12423
   },
   "qNaN": {
    "duration": 5.09004e+09,
    "iterations": 9.45428e+08,
    "max": 3306.93,
    "min": 5.138,
    "mean": 5.38385
   },
   "sNaN": {
    "duration": 5.08458e+09,
    "iterations": 1.15959e+09,
    "max": 972.008,
    "min": 3.321,
    "mean": 4.3848
   }
  }

And:

$ ./testrun.sh benchtests/bench-fmin
  "fmin": {
   "": {
    "duration": 5.06817e+09,
    "iterations": 2.3913e+09,
    "max": 1177.9,
    "min": 2.073,
    "mean": 2.11942
   },
   "qNaN": {
    "duration": 5.08857e+09,
    "iterations": 9.45656e+08,
    "max": 2658.83,
    "min": 5.09,
    "mean": 5.38099
   },
   "sNaN": {
    "duration": 5.08093e+09,
    "iterations": 1.16725e+09,
    "max": 1030.74,
    "min": 3.323,
    "mean": 4.3529
   }
  }

Both were run with GCC 5.4 (the Ubuntu 16 default installation) using default
compiler flags on a 3.4 GHz POWER8E (powerpc64le-linux-gnu).
2016-12-27 17:42:09 -02:00
970
a2
bits Define wordsize.h macros everywhere 2016-11-04 09:37:44 -07:00
cell
fpu powerpc: Remove f{max,min}{f} assembly implementations 2016-12-27 17:42:09 -02:00
multiarch powerpc: strncmp optimization for power9 2016-12-13 10:53:42 +05:30
power4
power5
power5+
power6 Fix cmpli usage in power6 memset. 2016-10-25 15:54:16 +00:00
power6x
power7 Fix powerpc64/power7 memchr for large input sizes 2016-12-16 11:30:20 -02:00
power8 powerpc: Fix return code of strcasecmp for unaligned inputs 2016-07-05 21:20:41 +05:30
power9 powerpc: strncmp optimization for power9 2016-12-13 10:53:42 +05:30
__longjmp-common.S
__longjmp.S
addmul_1.S
atomic-machine.h Remove atomic_compare_and_exchange_bool_rel. 2016-06-24 23:04:40 +03:00
backtrace.c
bsd-_setjmp.S
bsd-setjmp.S
bzero.S
configure
configure.ac
crti.S
crtn.S
dl-dtprocnum.h
dl-irel.h
dl-machine.c
dl-machine.h
dl-trampoline.S
entry.h
ffsll.c
hp-timing.h
Implies
lshift.S
Makefile
memcpy.S
memset.S
mul_1.S
ppc-mcount.S
register-dump.h
rtld-memset.c
setjmp-common.S powerpc: Add hidden definition for __sigsetjmp 2016-11-29 10:16:35 +01:00
setjmp.S
stackguard-macros.h
start.S
strchr.S
strcmp.S
strlen.S
strncmp.S
strtok_r.S
strtok.S
submul_1.S
sysdep.h
tls-macros.h
tst-audit.h