glibc/sysdeps/ieee754/dbl-64/e_exp.c

178 lines
5.7 KiB
C
Raw Normal View History

Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-02-13 02:16:03 +08:00
/* Double-precision e^x function.
Copyright (C) 2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
Update. 2001-03-11 Ulrich Drepper <drepper@redhat.com> Last-bit accurate math library implementation by IBM Haifa. Contributed by Abraham Ziv <ziv@il.ibm.com>, Moshe Olshansky <olshansk@il.ibm.com>, Ealan Henis <ealan@il.ibm.com>, and Anna Reitman <reitman@il.ibm.com>. * math/Makefile (dbl-only-routines): New variable. (libm-routines): Add $(dbl-only-routines). * sysdeps/ieee754/dbl-64/e_acos.c: Empty, definition is in e_asin.c. * sysdeps/ieee754/dbl-64/e_asin.c: Replaced with accurate asin implementation. * sysdeps/ieee754/dbl-64/e_atan2.c: Replaced with accurate atan2 implementation. * sysdeps/ieee754/dbl-64/e_exp.c: Replaced with accurate exp implementation. * sysdeps/ieee754/dbl-64/e_lgamma_r.c: Don't use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/e_log.c: Replaced with accurate log implementation. * sysdeps/ieee754/dbl-64/e_remainder.c: Replaced with accurate remainder implementation. * sysdeps/ieee754/dbl-64/e_pow.c: Replaced with accurate pow implementation. * sysdeps/ieee754/dbl-64/e_sqrt.c: Replaced with accurate sqrt implementation. * sysdeps/ieee754/dbl-64/k_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/k_sin.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_atan.c: Replaced with accurate atan implementation. * sysdeps/ieee754/dbl-64/s_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_sin.c: Replaced with accurate sin/cos implementation. * sysdeps/ieee754/dbl-64/s_sincos.c: Rewritten to not use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/s_tan.c: Replaced with accurate tan implementation. * sysdeps/ieee754/dbl-64/Dist: Add new non-code files. * sysdeps/ieee754/dbl-64/MathLib.h: New file. * sysdeps/ieee754/dbl-64/asincos.tbl: New file. * sysdeps/ieee754/dbl-64/atnat.h: New file. * sysdeps/ieee754/dbl-64/atnat2.h: New file. * sysdeps/ieee754/dbl-64/branred.c: New file. * sysdeps/ieee754/dbl-64/branred.h: New file. * sysdeps/ieee754/dbl-64/dla.h: New file. * sysdeps/ieee754/dbl-64/doasin.c: New file. * sysdeps/ieee754/dbl-64/doasin.h: New file. * sysdeps/ieee754/dbl-64/dosincos.c: New file. * sysdeps/ieee754/dbl-64/dosincos.h: New file. * sysdeps/ieee754/dbl-64/endian.h: New file. * sysdeps/ieee754/dbl-64/halfulp.c: New file. * sysdeps/ieee754/dbl-64/mpa.c: New file. * sysdeps/ieee754/dbl-64/mpa.h: New file. * sysdeps/ieee754/dbl-64/mpa2.h: New file. * sysdeps/ieee754/dbl-64/mpatan.c: New file. * sysdeps/ieee754/dbl-64/mpatan.h: New file. * sysdeps/ieee754/dbl-64/mpatan2.c: New file. * sysdeps/ieee754/dbl-64/mpexp.c: New file. * sysdeps/ieee754/dbl-64/mpexp.h: New file. * sysdeps/ieee754/dbl-64/mplog.c: New file. * sysdeps/ieee754/dbl-64/mplog.h: New file. * sysdeps/ieee754/dbl-64/mpsqrt.c: New file. * sysdeps/ieee754/dbl-64/mpsqrt.h: New file. * sysdeps/ieee754/dbl-64/mptan.c: New file. * sysdeps/ieee754/dbl-64/mydefs.h: New file. * sysdeps/ieee754/dbl-64/powtwo.tbl: New file. * sysdeps/ieee754/dbl-64/root.tbl: New file. * sysdeps/ieee754/dbl-64/sincos.tbl: New file. * sysdeps/ieee754/dbl-64/sincos32.c: New file. * sysdeps/ieee754/dbl-64/sincos32.h: New file. * sysdeps/ieee754/dbl-64/slowexp.c: New file. * sysdeps/ieee754/dbl-64/slowpow.c: New file. * sysdeps/ieee754/dbl-64/uasncs.h: New file. * sysdeps/ieee754/dbl-64/uatan.tbl: New file. * sysdeps/ieee754/dbl-64/uexp.h: New file. * sysdeps/ieee754/dbl-64/uexp.tbl: New file. * sysdeps/ieee754/dbl-64/ulog.h: New file. * sysdeps/ieee754/dbl-64/ulog.tbl: New file. * sysdeps/ieee754/dbl-64/upow.h: New file. * sysdeps/ieee754/dbl-64/upow.tbl: New file. * sysdeps/ieee754/dbl-64/urem.h: New file. * sysdeps/ieee754/dbl-64/uroot.h: New file. * sysdeps/ieee754/dbl-64/usncs.h: New file. * sysdeps/ieee754/dbl-64/utan.h: New file. * sysdeps/ieee754/dbl-64/utan.tbl: New file. * sysdeps/i386/fpu/branred.c: New file. * sysdeps/i386/fpu/doasin.c: New file. * sysdeps/i386/fpu/dosincos.c: New file. * sysdeps/i386/fpu/halfulp.c: New file. * sysdeps/i386/fpu/mpa.c: New file. * sysdeps/i386/fpu/mpatan.c: New file. * sysdeps/i386/fpu/mpatan2.c: New file. * sysdeps/i386/fpu/mpexp.c: New file. * sysdeps/i386/fpu/mplog.c: New file. * sysdeps/i386/fpu/mpsqrt.c: New file. * sysdeps/i386/fpu/mptan.c: New file. * sysdeps/i386/fpu/sincos32.c: New file. * sysdeps/i386/fpu/slowexp.c: New file. * sysdeps/i386/fpu/slowpow.c: New file. * sysdeps/ia64/fpu/branred.c: New file. * sysdeps/ia64/fpu/doasin.c: New file. * sysdeps/ia64/fpu/dosincos.c: New file. * sysdeps/ia64/fpu/halfulp.c: New file. * sysdeps/ia64/fpu/mpa.c: New file. * sysdeps/ia64/fpu/mpatan.c: New file. * sysdeps/ia64/fpu/mpatan2.c: New file. * sysdeps/ia64/fpu/mpexp.c: New file. * sysdeps/ia64/fpu/mplog.c: New file. * sysdeps/ia64/fpu/mpsqrt.c: New file. * sysdeps/ia64/fpu/mptan.c: New file. * sysdeps/ia64/fpu/sincos32.c: New file. * sysdeps/ia64/fpu/slowexp.c: New file. * sysdeps/ia64/fpu/slowpow.c: New file. * sysdeps/m68k/fpu/branred.c: New file. * sysdeps/m68k/fpu/doasin.c: New file. * sysdeps/m68k/fpu/dosincos.c: New file. * sysdeps/m68k/fpu/halfulp.c: New file. * sysdeps/m68k/fpu/mpa.c: New file. * sysdeps/m68k/fpu/mpatan.c: New file. * sysdeps/m68k/fpu/mpatan2.c: New file. * sysdeps/m68k/fpu/mpexp.c: New file. * sysdeps/m68k/fpu/mplog.c: New file. * sysdeps/m68k/fpu/mpsqrt.c: New file. * sysdeps/m68k/fpu/mptan.c: New file. * sysdeps/m68k/fpu/sincos32.c: New file. * sysdeps/m68k/fpu/slowexp.c: New file. * sysdeps/m68k/fpu/slowpow.c: New file. * iconvdata/gconv-modules: Add a number of alias, mostly for IBM codepages.
2001-03-12 08:04:52 +08:00
#include <math.h>
Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-02-13 02:16:03 +08:00
#include <stdint.h>
Do not include math-barriers.h in math_private.h. This patch continues the math_private.h cleanup by stopping math_private.h from including math-barriers.h and making the users of the barrier macros include the latter header directly. No attempt is made to remove any math_private.h includes that are now unused, except in strtod_l.c where that is done to avoid line number changes in assertions, so that installed stripped shared libraries can be compared before and after the patch. (I think the floating-point environment support in math_private.h should also move out - some architectures already have fenv_private.h as an architecture-internal header included from their math_private.h - and after moving that out might be a better time to identify unused math_private.h includes.) Tested for x86_64 and x86, and tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch. * sysdeps/generic/math_private.h: Do not include <math-barriers.h>. * stdlib/strtod_l.c: Include <math-barriers.h> instead of <math_private.h>. * math/fromfp.h: Include <math-barriers.h>. * math/math-narrow.h: Likewise. * math/s_nextafter.c: Likewise. * math/s_nexttowardf.c: Likewise. * sysdeps/aarch64/fpu/s_llrint.c: Likewise. * sysdeps/aarch64/fpu/s_llrintf.c: Likewise. * sysdeps/aarch64/fpu/s_lrint.c: Likewise. * sysdeps/aarch64/fpu/s_lrintf.c: Likewise. * sysdeps/i386/fpu/s_nextafterl.c: Likewise. * sysdeps/i386/fpu/s_nexttoward.c: Likewise. * sysdeps/i386/fpu/s_nexttowardf.c: Likewise. * sysdeps/ieee754/dbl-64/e_atan2.c: Likewise. * sysdeps/ieee754/dbl-64/e_atanh.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp2.c: Likewise. * sysdeps/ieee754/dbl-64/e_j0.c: Likewise. * sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise. * sysdeps/ieee754/dbl-64/s_expm1.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/dbl-64/s_log1p.c: Likewise. * sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/flt-32/e_atanhf.c: Likewise. * sysdeps/ieee754/flt-32/e_j0f.c: Likewise. * sysdeps/ieee754/flt-32/s_expm1f.c: Likewise. * sysdeps/ieee754/flt-32/s_log1pf.c: Likewise. * sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise. * sysdeps/ieee754/flt-32/s_nextafterf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_expl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nextafterl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise. * sysdeps/ieee754/ldbl-96/e_j0l.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-96/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Likewise.
2018-05-11 23:11:38 +08:00
#include <math-barriers.h>
Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-02-13 02:16:03 +08:00
#include <math-narrow-eval.h>
#include "math_config.h"
#define N (1 << EXP_TABLE_BITS)
#define InvLn2N __exp_data.invln2N
#define NegLn2hiN __exp_data.negln2hiN
#define NegLn2loN __exp_data.negln2loN
#define Shift __exp_data.shift
#define T __exp_data.tab
#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
/* Handle cases that may overflow or underflow when computing the result that
is scale*(1+TMP) without intermediate rounding. The bit representation of
scale is in SBITS, however it has a computed exponent that may have
overflown into the sign bit so that needs to be adjusted before using it as
a double. (int32_t)KI is the k used in the argument reduction and exponent
adjustment of scale, positive k here means the result may overflow and
negative k means the result may underflow. */
static inline double
specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
{
double_t scale, y;
if ((ki & 0x80000000) == 0)
{
/* k > 0, the exponent of scale might have overflowed by <= 460. */
sbits -= 1009ull << 52;
scale = asdouble (sbits);
y = 0x1p1009 * (scale + scale * tmp);
return check_oflow (y);
}
/* k < 0, need special care in the subnormal range. */
sbits += 1022ull << 52;
scale = asdouble (sbits);
y = scale + scale * tmp;
if (y < 1.0)
{
/* Round y to the right precision before scaling it into the subnormal
range to avoid double rounding that can cause 0.5+E/2 ulp error where
E is the worst-case ulp error outside the subnormal range. So this
is only useful if the goal is better than 1 ulp worst-case error. */
double_t hi, lo;
lo = scale - y + scale * tmp;
hi = 1.0 + y;
lo = 1.0 - hi + y + lo;
y = math_narrow_eval (hi + lo) - 1.0;
/* Avoid -0.0 with downward rounding. */
if (WANT_ROUNDING && y == 0.0)
y = 0.0;
/* The underflow exception needs to be signaled explicitly. */
math_force_eval (math_opt_barrier (0x1p-1022) * 0x1p-1022);
}
y = 0x1p-1022 * y;
return check_uflow (y);
}
/* Top 12 bits of a double (sign and exponent bits). */
static inline uint32_t
top12 (double x)
{
return asuint64 (x) >> 52;
}
/* Computes exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|.
If hastail is 0 then xtail is assumed to be 0 too. */
static inline double
exp_inline (double x, double xtail, int hastail)
{
uint32_t abstop;
uint64_t ki, idx, top, sbits;
/* double_t for better performance on targets with FLT_EVAL_METHOD==2. */
double_t kd, z, r, r2, scale, tail, tmp;
abstop = top12 (x) & 0x7ff;
if (__glibc_unlikely (abstop - top12 (0x1p-54)
>= top12 (512.0) - top12 (0x1p-54)))
{
if (abstop - top12 (0x1p-54) >= 0x80000000)
/* Avoid spurious underflow for tiny x. */
/* Note: 0 is common input. */
return WANT_ROUNDING ? 1.0 + x : 1.0;
if (abstop >= top12 (1024.0))
{
if (asuint64 (x) == asuint64 (-INFINITY))
return 0.0;
if (abstop >= top12 (INFINITY))
return 1.0 + x;
if (asuint64 (x) >> 63)
return __math_uflow (0);
else
return __math_oflow (0);
}
/* Large x is special cased below. */
abstop = 0;
}
/* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)]. */
/* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N]. */
z = InvLn2N * x;
#if TOINT_INTRINSICS
kd = roundtoint (z);
ki = converttoint (z);
#else
/* z - kd is in [-1, 1] in non-nearest rounding modes. */
kd = math_narrow_eval (z + Shift);
ki = asuint64 (kd);
kd -= Shift;
#endif
r = x + kd * NegLn2hiN + kd * NegLn2loN;
/* The code assumes 2^-200 < |xtail| < 2^-8/N. */
if (hastail)
r += xtail;
/* 2^(k/N) ~= scale * (1 + tail). */
idx = 2 * (ki % N);
top = ki << (52 - EXP_TABLE_BITS);
tail = asdouble (T[idx]);
/* This is only a valid scale when -1023*N < k < 1024*N. */
sbits = T[idx + 1] + top;
/* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1). */
/* Evaluation is optimized assuming superscalar pipelined execution. */
r2 = r * r;
/* Without fma the worst case error is 0.25/N ulp larger. */
/* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp. */
tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5);
if (__glibc_unlikely (abstop == 0))
return specialcase (tmp, sbits, ki);
scale = asdouble (sbits);
/* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there
is no spurious underflow here even without fma. */
return scale + scale * tmp;
}
Revert exp reimplementation (causes test failures). Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
2017-12-20 02:11:37 +08:00
#ifndef SECTION
# define SECTION
#endif
double
Revert exp reimplementation (causes test failures). Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
2017-12-20 02:11:37 +08:00
SECTION
__ieee754_exp (double x)
2013-10-08 18:52:28 +08:00
{
Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-02-13 02:16:03 +08:00
return exp_inline (x, 0, 0);
Update. 2001-03-11 Ulrich Drepper <drepper@redhat.com> Last-bit accurate math library implementation by IBM Haifa. Contributed by Abraham Ziv <ziv@il.ibm.com>, Moshe Olshansky <olshansk@il.ibm.com>, Ealan Henis <ealan@il.ibm.com>, and Anna Reitman <reitman@il.ibm.com>. * math/Makefile (dbl-only-routines): New variable. (libm-routines): Add $(dbl-only-routines). * sysdeps/ieee754/dbl-64/e_acos.c: Empty, definition is in e_asin.c. * sysdeps/ieee754/dbl-64/e_asin.c: Replaced with accurate asin implementation. * sysdeps/ieee754/dbl-64/e_atan2.c: Replaced with accurate atan2 implementation. * sysdeps/ieee754/dbl-64/e_exp.c: Replaced with accurate exp implementation. * sysdeps/ieee754/dbl-64/e_lgamma_r.c: Don't use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/e_log.c: Replaced with accurate log implementation. * sysdeps/ieee754/dbl-64/e_remainder.c: Replaced with accurate remainder implementation. * sysdeps/ieee754/dbl-64/e_pow.c: Replaced with accurate pow implementation. * sysdeps/ieee754/dbl-64/e_sqrt.c: Replaced with accurate sqrt implementation. * sysdeps/ieee754/dbl-64/k_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/k_sin.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_atan.c: Replaced with accurate atan implementation. * sysdeps/ieee754/dbl-64/s_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_sin.c: Replaced with accurate sin/cos implementation. * sysdeps/ieee754/dbl-64/s_sincos.c: Rewritten to not use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/s_tan.c: Replaced with accurate tan implementation. * sysdeps/ieee754/dbl-64/Dist: Add new non-code files. * sysdeps/ieee754/dbl-64/MathLib.h: New file. * sysdeps/ieee754/dbl-64/asincos.tbl: New file. * sysdeps/ieee754/dbl-64/atnat.h: New file. * sysdeps/ieee754/dbl-64/atnat2.h: New file. * sysdeps/ieee754/dbl-64/branred.c: New file. * sysdeps/ieee754/dbl-64/branred.h: New file. * sysdeps/ieee754/dbl-64/dla.h: New file. * sysdeps/ieee754/dbl-64/doasin.c: New file. * sysdeps/ieee754/dbl-64/doasin.h: New file. * sysdeps/ieee754/dbl-64/dosincos.c: New file. * sysdeps/ieee754/dbl-64/dosincos.h: New file. * sysdeps/ieee754/dbl-64/endian.h: New file. * sysdeps/ieee754/dbl-64/halfulp.c: New file. * sysdeps/ieee754/dbl-64/mpa.c: New file. * sysdeps/ieee754/dbl-64/mpa.h: New file. * sysdeps/ieee754/dbl-64/mpa2.h: New file. * sysdeps/ieee754/dbl-64/mpatan.c: New file. * sysdeps/ieee754/dbl-64/mpatan.h: New file. * sysdeps/ieee754/dbl-64/mpatan2.c: New file. * sysdeps/ieee754/dbl-64/mpexp.c: New file. * sysdeps/ieee754/dbl-64/mpexp.h: New file. * sysdeps/ieee754/dbl-64/mplog.c: New file. * sysdeps/ieee754/dbl-64/mplog.h: New file. * sysdeps/ieee754/dbl-64/mpsqrt.c: New file. * sysdeps/ieee754/dbl-64/mpsqrt.h: New file. * sysdeps/ieee754/dbl-64/mptan.c: New file. * sysdeps/ieee754/dbl-64/mydefs.h: New file. * sysdeps/ieee754/dbl-64/powtwo.tbl: New file. * sysdeps/ieee754/dbl-64/root.tbl: New file. * sysdeps/ieee754/dbl-64/sincos.tbl: New file. * sysdeps/ieee754/dbl-64/sincos32.c: New file. * sysdeps/ieee754/dbl-64/sincos32.h: New file. * sysdeps/ieee754/dbl-64/slowexp.c: New file. * sysdeps/ieee754/dbl-64/slowpow.c: New file. * sysdeps/ieee754/dbl-64/uasncs.h: New file. * sysdeps/ieee754/dbl-64/uatan.tbl: New file. * sysdeps/ieee754/dbl-64/uexp.h: New file. * sysdeps/ieee754/dbl-64/uexp.tbl: New file. * sysdeps/ieee754/dbl-64/ulog.h: New file. * sysdeps/ieee754/dbl-64/ulog.tbl: New file. * sysdeps/ieee754/dbl-64/upow.h: New file. * sysdeps/ieee754/dbl-64/upow.tbl: New file. * sysdeps/ieee754/dbl-64/urem.h: New file. * sysdeps/ieee754/dbl-64/uroot.h: New file. * sysdeps/ieee754/dbl-64/usncs.h: New file. * sysdeps/ieee754/dbl-64/utan.h: New file. * sysdeps/ieee754/dbl-64/utan.tbl: New file. * sysdeps/i386/fpu/branred.c: New file. * sysdeps/i386/fpu/doasin.c: New file. * sysdeps/i386/fpu/dosincos.c: New file. * sysdeps/i386/fpu/halfulp.c: New file. * sysdeps/i386/fpu/mpa.c: New file. * sysdeps/i386/fpu/mpatan.c: New file. * sysdeps/i386/fpu/mpatan2.c: New file. * sysdeps/i386/fpu/mpexp.c: New file. * sysdeps/i386/fpu/mplog.c: New file. * sysdeps/i386/fpu/mpsqrt.c: New file. * sysdeps/i386/fpu/mptan.c: New file. * sysdeps/i386/fpu/sincos32.c: New file. * sysdeps/i386/fpu/slowexp.c: New file. * sysdeps/i386/fpu/slowpow.c: New file. * sysdeps/ia64/fpu/branred.c: New file. * sysdeps/ia64/fpu/doasin.c: New file. * sysdeps/ia64/fpu/dosincos.c: New file. * sysdeps/ia64/fpu/halfulp.c: New file. * sysdeps/ia64/fpu/mpa.c: New file. * sysdeps/ia64/fpu/mpatan.c: New file. * sysdeps/ia64/fpu/mpatan2.c: New file. * sysdeps/ia64/fpu/mpexp.c: New file. * sysdeps/ia64/fpu/mplog.c: New file. * sysdeps/ia64/fpu/mpsqrt.c: New file. * sysdeps/ia64/fpu/mptan.c: New file. * sysdeps/ia64/fpu/sincos32.c: New file. * sysdeps/ia64/fpu/slowexp.c: New file. * sysdeps/ia64/fpu/slowpow.c: New file. * sysdeps/m68k/fpu/branred.c: New file. * sysdeps/m68k/fpu/doasin.c: New file. * sysdeps/m68k/fpu/dosincos.c: New file. * sysdeps/m68k/fpu/halfulp.c: New file. * sysdeps/m68k/fpu/mpa.c: New file. * sysdeps/m68k/fpu/mpatan.c: New file. * sysdeps/m68k/fpu/mpatan2.c: New file. * sysdeps/m68k/fpu/mpexp.c: New file. * sysdeps/m68k/fpu/mplog.c: New file. * sysdeps/m68k/fpu/mpsqrt.c: New file. * sysdeps/m68k/fpu/mptan.c: New file. * sysdeps/m68k/fpu/sincos32.c: New file. * sysdeps/m68k/fpu/slowexp.c: New file. * sysdeps/m68k/fpu/slowpow.c: New file. * iconvdata/gconv-modules: Add a number of alias, mostly for IBM codepages.
2001-03-12 08:04:52 +08:00
}
#ifndef __ieee754_exp
strong_alias (__ieee754_exp, __exp_finite)
#endif
Update. 2001-03-11 Ulrich Drepper <drepper@redhat.com> Last-bit accurate math library implementation by IBM Haifa. Contributed by Abraham Ziv <ziv@il.ibm.com>, Moshe Olshansky <olshansk@il.ibm.com>, Ealan Henis <ealan@il.ibm.com>, and Anna Reitman <reitman@il.ibm.com>. * math/Makefile (dbl-only-routines): New variable. (libm-routines): Add $(dbl-only-routines). * sysdeps/ieee754/dbl-64/e_acos.c: Empty, definition is in e_asin.c. * sysdeps/ieee754/dbl-64/e_asin.c: Replaced with accurate asin implementation. * sysdeps/ieee754/dbl-64/e_atan2.c: Replaced with accurate atan2 implementation. * sysdeps/ieee754/dbl-64/e_exp.c: Replaced with accurate exp implementation. * sysdeps/ieee754/dbl-64/e_lgamma_r.c: Don't use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/e_log.c: Replaced with accurate log implementation. * sysdeps/ieee754/dbl-64/e_remainder.c: Replaced with accurate remainder implementation. * sysdeps/ieee754/dbl-64/e_pow.c: Replaced with accurate pow implementation. * sysdeps/ieee754/dbl-64/e_sqrt.c: Replaced with accurate sqrt implementation. * sysdeps/ieee754/dbl-64/k_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/k_sin.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_atan.c: Replaced with accurate atan implementation. * sysdeps/ieee754/dbl-64/s_cos.c: Empty, definition is in s_sin.c. * sysdeps/ieee754/dbl-64/s_sin.c: Replaced with accurate sin/cos implementation. * sysdeps/ieee754/dbl-64/s_sincos.c: Rewritten to not use __kernel_sin and __kernel_cos. * sysdeps/ieee754/dbl-64/s_tan.c: Replaced with accurate tan implementation. * sysdeps/ieee754/dbl-64/Dist: Add new non-code files. * sysdeps/ieee754/dbl-64/MathLib.h: New file. * sysdeps/ieee754/dbl-64/asincos.tbl: New file. * sysdeps/ieee754/dbl-64/atnat.h: New file. * sysdeps/ieee754/dbl-64/atnat2.h: New file. * sysdeps/ieee754/dbl-64/branred.c: New file. * sysdeps/ieee754/dbl-64/branred.h: New file. * sysdeps/ieee754/dbl-64/dla.h: New file. * sysdeps/ieee754/dbl-64/doasin.c: New file. * sysdeps/ieee754/dbl-64/doasin.h: New file. * sysdeps/ieee754/dbl-64/dosincos.c: New file. * sysdeps/ieee754/dbl-64/dosincos.h: New file. * sysdeps/ieee754/dbl-64/endian.h: New file. * sysdeps/ieee754/dbl-64/halfulp.c: New file. * sysdeps/ieee754/dbl-64/mpa.c: New file. * sysdeps/ieee754/dbl-64/mpa.h: New file. * sysdeps/ieee754/dbl-64/mpa2.h: New file. * sysdeps/ieee754/dbl-64/mpatan.c: New file. * sysdeps/ieee754/dbl-64/mpatan.h: New file. * sysdeps/ieee754/dbl-64/mpatan2.c: New file. * sysdeps/ieee754/dbl-64/mpexp.c: New file. * sysdeps/ieee754/dbl-64/mpexp.h: New file. * sysdeps/ieee754/dbl-64/mplog.c: New file. * sysdeps/ieee754/dbl-64/mplog.h: New file. * sysdeps/ieee754/dbl-64/mpsqrt.c: New file. * sysdeps/ieee754/dbl-64/mpsqrt.h: New file. * sysdeps/ieee754/dbl-64/mptan.c: New file. * sysdeps/ieee754/dbl-64/mydefs.h: New file. * sysdeps/ieee754/dbl-64/powtwo.tbl: New file. * sysdeps/ieee754/dbl-64/root.tbl: New file. * sysdeps/ieee754/dbl-64/sincos.tbl: New file. * sysdeps/ieee754/dbl-64/sincos32.c: New file. * sysdeps/ieee754/dbl-64/sincos32.h: New file. * sysdeps/ieee754/dbl-64/slowexp.c: New file. * sysdeps/ieee754/dbl-64/slowpow.c: New file. * sysdeps/ieee754/dbl-64/uasncs.h: New file. * sysdeps/ieee754/dbl-64/uatan.tbl: New file. * sysdeps/ieee754/dbl-64/uexp.h: New file. * sysdeps/ieee754/dbl-64/uexp.tbl: New file. * sysdeps/ieee754/dbl-64/ulog.h: New file. * sysdeps/ieee754/dbl-64/ulog.tbl: New file. * sysdeps/ieee754/dbl-64/upow.h: New file. * sysdeps/ieee754/dbl-64/upow.tbl: New file. * sysdeps/ieee754/dbl-64/urem.h: New file. * sysdeps/ieee754/dbl-64/uroot.h: New file. * sysdeps/ieee754/dbl-64/usncs.h: New file. * sysdeps/ieee754/dbl-64/utan.h: New file. * sysdeps/ieee754/dbl-64/utan.tbl: New file. * sysdeps/i386/fpu/branred.c: New file. * sysdeps/i386/fpu/doasin.c: New file. * sysdeps/i386/fpu/dosincos.c: New file. * sysdeps/i386/fpu/halfulp.c: New file. * sysdeps/i386/fpu/mpa.c: New file. * sysdeps/i386/fpu/mpatan.c: New file. * sysdeps/i386/fpu/mpatan2.c: New file. * sysdeps/i386/fpu/mpexp.c: New file. * sysdeps/i386/fpu/mplog.c: New file. * sysdeps/i386/fpu/mpsqrt.c: New file. * sysdeps/i386/fpu/mptan.c: New file. * sysdeps/i386/fpu/sincos32.c: New file. * sysdeps/i386/fpu/slowexp.c: New file. * sysdeps/i386/fpu/slowpow.c: New file. * sysdeps/ia64/fpu/branred.c: New file. * sysdeps/ia64/fpu/doasin.c: New file. * sysdeps/ia64/fpu/dosincos.c: New file. * sysdeps/ia64/fpu/halfulp.c: New file. * sysdeps/ia64/fpu/mpa.c: New file. * sysdeps/ia64/fpu/mpatan.c: New file. * sysdeps/ia64/fpu/mpatan2.c: New file. * sysdeps/ia64/fpu/mpexp.c: New file. * sysdeps/ia64/fpu/mplog.c: New file. * sysdeps/ia64/fpu/mpsqrt.c: New file. * sysdeps/ia64/fpu/mptan.c: New file. * sysdeps/ia64/fpu/sincos32.c: New file. * sysdeps/ia64/fpu/slowexp.c: New file. * sysdeps/ia64/fpu/slowpow.c: New file. * sysdeps/m68k/fpu/branred.c: New file. * sysdeps/m68k/fpu/doasin.c: New file. * sysdeps/m68k/fpu/dosincos.c: New file. * sysdeps/m68k/fpu/halfulp.c: New file. * sysdeps/m68k/fpu/mpa.c: New file. * sysdeps/m68k/fpu/mpatan.c: New file. * sysdeps/m68k/fpu/mpatan2.c: New file. * sysdeps/m68k/fpu/mpexp.c: New file. * sysdeps/m68k/fpu/mplog.c: New file. * sysdeps/m68k/fpu/mpsqrt.c: New file. * sysdeps/m68k/fpu/mptan.c: New file. * sysdeps/m68k/fpu/sincos32.c: New file. * sysdeps/m68k/fpu/slowexp.c: New file. * sysdeps/m68k/fpu/slowpow.c: New file. * iconvdata/gconv-modules: Add a number of alias, mostly for IBM codepages.
2001-03-12 08:04:52 +08:00
Remove slow paths from pow Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 18:42:42 +08:00
/* Compute e^(x+xx). */
double
SECTION
Remove slow paths from pow Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 18:42:42 +08:00
__exp1 (double x, double xx)
2013-10-08 18:52:28 +08:00
{
Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-02-13 02:16:03 +08:00
return exp_inline (x, xx, 1);
}