glibc/sysdeps/mips
Adhemerval Zanella bccb0648ea math: Use tanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic tanf.

The code was adapted to glibc style, to use the definitions from
math_config.h, to remove errno handling, and to use a generic
128-bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
..
bits
fpu
ieee754
include/sys
mips32 math: Use tanf from CORE-MATH 2024-11-22 10:52:27 -03:00
mips64 math: Use tanf from CORE-MATH 2024-11-22 10:52:27 -03:00
nofpu
nptl
sys
__longjmp.c
abort-instr.h
add_n.S
addmul_1.S
atomic-machine.h
bsd-_setjmp.S
bsd-setjmp.S
configure
configure.ac
dl-debug.h
dl-dtprocnum.h
dl-machine-reject-phdr.h
dl-machine-rel.h
dl-machine.h
dl-procinfo.c
dl-procinfo.h
dl-r_debug.h
dl-relocate-ld.h
dl-tls.h
dl-trampoline.c
elf_machine_sym_no_match.h
elf-initfini.h
fpregdef.h
fpu_control.h
gccframe.h
Implies
jmpbuf-unwind.h
ldbl-classify-compat.h
ldsodefs.h
libc-tls.c
linkmap.h
localplt.data
lshift.S
machine-gmon.h
Makefile
math-tests-snan-payload.h
math-use-builtins-ffs.h
memcpy.S
memset.S
mul_1.S
nan-high-order-bit.h
preconfigure
preconfigure.ac
regdef.h
rshift.S
setjmp_aux.c
setjmp.S
sgidefs.h
sotruss-lib.c
stackinfo.h
start.S
strcmp.S
sub_n.S
submul_1.S
tininess.h
tst-abi-fp32mod.c
tst-abi-fp64amod.c
tst-abi-fp64mod.c
tst-abi-fpxxmod.c
tst-abi-fpxxomod.c
tst-abi-interlink.c
tst-audit.h
tst-mode-switch-1.c
tst-mode-switch-2.c
tst-mode-switch-3.c
tst-undefined-weak-lib.S
tst-undefined-weak.c
unwind-arch.h
utmp-size.h