glibc/sysdeps
Joseph Myers 0f04fc07f6 Use VSQRT instruction for ARM sqrt (bug 20660).
This patch makes ARM sqrt and sqrtf use the VSQRT VFP square root
instruction when available, instead of much larger generic code for
computing square roots.
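
As a rough illustration of that dispatch (a minimal sketch, not the code added by this patch): the function names sqrt_sketch and generic_sqrt_sketch are hypothetical, the __ARM_FP bit test for double-precision hardware is an assumption about how availability could be detected, and the Newton iteration only stands in for the real generic code, which is correctly rounded.

```c
#include <float.h>

#if defined (__arm__) && defined (__ARM_FP) && (__ARM_FP & 8)

/* Assumed check: 32-bit ARM with double-precision VFP hardware, so a
   single VSQRT instruction computes the result.  */
double
sqrt_sketch (double x)
{
  double ret;
  __asm__ ("vsqrt.f64 %P0, %P1" : "=w" (ret) : "w" (x));
  return ret;
}

#else

/* Hypothetical stand-in for the much larger generic software routine:
   a plain Newton iteration, NOT correctly rounded.  */
static double
generic_sqrt_sketch (double x)
{
  if (x != x || x == 0.0 || x > DBL_MAX)
    return x;                   /* NaN, +-0 and +Inf map to themselves.  */
  if (x < 0.0)
    return (x - x) / (x - x);   /* Negative input: 0/0 yields NaN.  */

  double guess = x > 1.0 ? x : 1.0;   /* Start at or above the root.  */
  for (;;)
    {
      double next = 0.5 * (guess + x / guess);
      if (next >= guess)        /* No longer decreasing: converged.  */
        return guess;
      guess = next;
    }
}

double
sqrt_sketch (double x)
{
  return generic_sqrt_sketch (x);
}

#endif
```

On a target without the instruction, the fallback branch is compiled instead, so the same source builds for both hard-float and soft-float configurations. Neither branch sets errno; in this sketch that is left to whatever wrapper calls it.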

GCC normally inlines sqrt calls, going out of line only for negative
arguments where errno needs to be set.  Because the benchtests do not
use -fno-builtin, the sqrt benchmark results therefore show no
significant difference.  However, libm itself makes many internal
__ieee754_sqrt calls, which are *not* inlined: some architectures
define __ieee754_sqrt in their math_private.h for that purpose, but
ARM does not.  Improving out-of-line sqrt performance is thus still
relevant to those other functions, if not to most ordinary direct
users of sqrt.  With the benchtests changed to use -fno-builtin for
the sqrt tests, typical performance results before the change are
("max" varies wildly in any case):

    "duration": 9.88358e+09,
    "iterations": 4.8783e+07,
    "max": 457.764,
    "min": 183.105,
    "mean": 202.603

and after it are:

    "duration": 9.45663e+09,
    "iterations": 2.24385e+08,
    "max": 274.659,
    "min": 30.517,
    "mean": 42.1447

Tested for ARM (hard-float and soft-float).

	[BZ #20660]
	* sysdeps/arm/e_sqrt.c: New file.
	* sysdeps/arm/e_sqrtf.c: Likewise.
2016-10-20 23:24:44 +00:00
aarch64 Add femode_t functions: aarch64. 2016-09-07 16:41:20 +00:00
alpha Add femode_t functions: alpha. 2016-09-07 16:42:19 +00:00
arm Use VSQRT instruction for ARM sqrt (bug 20660). 2016-10-20 23:24:44 +00:00
generic Define HIGH_ORDER_BIT_IS_SET_FOR_SNAN to 0 or 1. 2016-10-17 22:48:51 +00:00
gnu Add TCP_REPAIR_WINDOW from Linux 4.8. 2016-10-03 21:01:42 +00:00
hppa Define HIGH_ORDER_BIT_IS_SET_FOR_SNAN to 0 or 1. 2016-10-17 22:48:51 +00:00
i386 Installed-header hygiene (BZ#20366): stack_t. 2016-09-23 08:43:56 -04:00
ia64 Remove the ptw-% patterns 2016-09-14 16:02:06 +02:00
ieee754 Add getpayload, getpayloadf, getpayloadl. 2016-10-19 01:49:09 +00:00
init_array Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
m68k Installed-header hygiene (BZ#20366): stack_t. 2016-09-23 08:43:56 -04:00
mach Installed-header hygiene (BZ#20366): stack_t. 2016-09-23 08:43:56 -04:00
microblaze Add femode_t functions. 2016-09-07 16:40:09 +00:00
mips Define HIGH_ORDER_BIT_IS_SET_FOR_SNAN to 0 or 1. 2016-10-17 22:48:51 +00:00
nacl Add getpayload, getpayloadf, getpayloadl. 2016-10-19 01:49:09 +00:00
nios2 Add femode_t functions. 2016-09-07 16:40:09 +00:00
nptl Installed-header hygiene (BZ#20366): time.h types. 2016-09-23 08:43:56 -04:00
posix hurd: fix fcntl visibility 2016-09-18 23:48:55 +02:00
powerpc Stop powerpc copysignl raising "invalid" for sNaN argument (bug 20718). 2016-10-19 22:58:34 +00:00
pthread Installed-header hygiene (BZ#20366): time.h types. 2016-09-23 08:43:56 -04:00
s390 S390: Fix fp comparison not raising FE_INVALID. 2016-10-17 10:37:11 +02:00
sh Add femode_t functions: sh. 2016-09-07 16:48:08 +00:00
sparc Remove remnants of .og patterns 2016-09-20 12:18:13 +02:00
tile Add femode_t functions. 2016-09-07 16:40:09 +00:00
unix Add getpayload, getpayloadf, getpayloadl. 2016-10-19 01:49:09 +00:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wordsize-64 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
x86 Bug 20689: Fix FMA and AVX2 detection on Intel 2016-10-17 19:39:54 -04:00
x86_64 Use __builtin_fma more in dbl-64 code. 2016-09-30 15:49:51 +00:00