41782 Commits

Author SHA1 Message Date
DJ Delorie
e79e5c4899 assert: ensure posix compliance, add tests for such
Fix assert.c so that even the fallback case conforms to POSIX.  Its
output is deliberately not identical to that of the default case, so
that a test can tell the two apart.

Add a test that verifies that abort is called, and that the
message printed to stderr has all the info that POSIX requires.
Verify this even when malloc isn't usable.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
2024-12-20 22:44:01 -05:00
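
A check of this kind can be sketched in C.  The following is a minimal
illustration, not the test added by this commit: failing_assert_is_reported
and demo_value are hypothetical names, and it assumes that the information
POSIX requires includes the expression text, the source file name, and the
enclosing function name (the line number is also required but is not
checked here).  It does not cover the case where malloc is unusable.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: run a failing assert in a child process and report
   whether the child died from SIGABRT and whether its stderr output
   mentioned the expression, the file name, and the function name.
   Returns 1 if all of that holds, 0 if not, -1 on setup failure.  */
int
failing_assert_is_reported (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: send stderr into the pipe, then fail an assertion.  */
      close (fds[0]);
      dup2 (fds[1], STDERR_FILENO);
      close (fds[1]);
      int demo_value = 0;
      assert (demo_value == 1);
      _exit (0);   /* Only reached if NDEBUG is defined.  */
    }

  /* Parent: collect whatever the child wrote to stderr.  */
  close (fds[1]);
  char buf[1024];
  size_t used = 0;
  ssize_t n;
  while (used < sizeof buf - 1
         && (n = read (fds[0], buf + used, sizeof buf - 1 - used)) > 0)
    used += (size_t) n;
  buf[used] = '\0';
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  return WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT
         && strstr (buf, "demo_value == 1") != NULL
         && strstr (buf, __FILE__) != NULL
         && strstr (buf, __func__) != NULL;
}
```

Running the assertion in a child keeps the abort from taking down the
checking process itself.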
Adhemerval Zanella
b3a7a15d99 cet: Drop '#pragma GCC target' in tst-cet-legacy-10a[-static].c
After

commit 215447f5cb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Dec 17 06:18:55 2024 +0800

    cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c

we can remove '#pragma GCC target' in tst-cet-legacy-10a[-static].c.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
2024-12-21 06:16:58 +08:00
Aurelien Jarno
6fd215d6ae posix: fix system when a child cannot be created [BZ #32450]
POSIX states that "if a child process cannot be created, or if the
termination status for the command language interpreter cannot be
obtained, system() shall return -1 and set errno to indicate the error."

In the glibc implementation this can happen when posix_spawn fails,
which in turn happens when the underlying fork, vfork, or clone call
fails, for example with EAGAIN or ENOMEM.

Resolves: BZ #32450
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 22:57:06 +01:00
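
From the caller's side, the POSIX rule quoted above might be handled as in
the following sketch.  run_command is a hypothetical helper, not part of
glibc; it only illustrates distinguishing the -1 return (with errno set)
from a normal exit status.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run CMD through system() and return its exit
   status, or -1 if the command could not be run or did not exit
   normally.  */
int
run_command (const char *cmd)
{
  errno = 0;
  int status = system (cmd);
  if (status == -1)
    {
      /* No child could be created, or its termination status could not
         be obtained; errno (for example EAGAIN or ENOMEM) says why.  */
      fprintf (stderr, "system: %s\n", strerror (errno));
      return -1;
    }
  if (WIFEXITED (status))
    return WEXITSTATUS (status);
  return -1;   /* Terminated by a signal or otherwise abnormal.  */
}
```

For example, run_command ("exit 3") is expected to return 3 on a system
with a working shell.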
H.J. Lu
034cd67528 Don't use glibc <tgmath.h> when testing with Clang
Clang has its own <tgmath.h> and doesn't use <tgmath.h> from glibc.  Pass
"-I." to compiler only if $($(<F)-no-include-dot) are undefined.  Define
it to yes for tgmath tests when testing with Clang.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-21 05:24:07 +08:00
H.J. Lu
6025b399c7 stdio-common: Exclude bug28 when clang is used
Clang 19 takes a very long time to compile bug28.c; it ran for more than
27 minutes on an Intel Core i7-1195G7 before the process was killed:

https://github.com/llvm/llvm-project/issues/120462

Exclude it when Clang is used for testing.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2024-12-21 05:14:01 +08:00
H.J. Lu
40bf25b754 Fix elf: Introduce is_rtld_link_map [BZ #32488]
Also use is_rtld_link_map in dl-cet.c.  This fixes BZ #32488.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-21 04:36:18 +08:00
Adhemerval Zanella
c3ee510267 math: xfail some tanpi tests for ibm128-libgcc
On powerpc math/test-ibm128-tanpi shows multiple failures:

testing long double (without inline functions)
Failure: tanpi_downward (0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_downward (0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_downward (0xfffffffffffffffdp-1)
Result:
 is:          4.68843873182857939141363635204365e+28   0x1.2efbb6629d1d59b032520400df8p+95
 should be:   inf   inf
Failure: tanpi_downward (0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_downward (0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_downward (0x3fffffffffffffffffffffffffdp-1)
Result:
 is:          1.41444453325831960404472183124793e+16   0x1.9202627cbf98e052d5fdbeee1f8p+53
 should be:   inf   inf
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020)
Result:
 is:         qNaN
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
Failure: Test: tanpi_downward (0x3.fffffffffffffffcp+108)
Result:
 is:          2.91356019227449116879287504834896e-15   0x1.a3e365fee24d4632f95a2235698p-49
 should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 difference:  2.91356019227449116879287504834896e-15   0x1.a3e365fee24d4632f95a2235698p-49
 ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
 max.ulp   :  8.0000
Failure: Test: tanpi_downward (0x3.ffffffffffffffffffffffffffp+108)
Result:
 is:          7.94911926685664643005642781870827e-16   0x1.ca3c4b83eb5688e1474146dc338p-51
 should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 difference:  7.94911926685664643005642781870827e-16   0x1.ca3c4b83eb5688e1474146dc338p-51
 ulp       :  160891965142034222272327839154722485473479235229008379884749401713481320342777314570400076204240982703218835644458374555276642
 max.ulp   :  8.0000
Failure: tanpi_towardzero (0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (0xfffffffffffffffdp-1)
Result:
 is:          2.14718475310122677917055904836884e+28   0x1.1584624c14882fff76592b4ec10p+94
 should be:   inf   inf
Failure: tanpi_towardzero (-0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (-0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (-0xfffffffffffffffdp-1)
Result:
 is:         -2.14718475310122677917055904836884e+28  -0x1.1584624c14882fff76592b4ec10p+94
 should be:  -inf  -inf
Failure: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1)
Result:
 is:          6.60739946234609289593176521179840e+15   0x1.7796511d79d6ce55bc8bf083fe0p+52
 should be:   inf   inf
Failure: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1)
Result:
 is:         -6.60739946234609289593176521179840e+15  -0x1.7796511d79d6ce55bc8bf083fe0p+52
 should be:  -inf  -inf
Failure: Test: tanpi_towardzero (-0x3.fffffffffffffffcp+108)
Result:
 is:         -1.17953443892757434921819283936141e-14  -0x1.a8f8d97fb893518cbe5688935c0p-47
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  1.17953443892757434921819283936141e-14   0x1.a8f8d97fb893518cbe5688935c0p-47
 ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
 max.ulp   :  8.0000
Failure: Test: tanpi_towardzero (-0x3.ffffffffffffffffffffffffffp+108)
Result:
 is:         -1.85584803206881692897837494734542e-14  -0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  1.85584803206881692897837494734542e-14   0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
 max.ulp   :  8.0000
Failure: Test: tanpi_towardzero (0x3.fffffffffffffffcp+108)
Result:
 is:          1.17953443892757434921819283936141e-14   0x1.a8f8d97fb893518cbe5688935c0p-47
 should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 difference:  1.17953443892757434921819283936141e-14   0x1.a8f8d97fb893518cbe5688935c0p-47
 ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
 max.ulp   :  8.0000
Failure: Test: tanpi_towardzero (0x3.ffffffffffffffffffffffffffp+108)
Result:
 is:          1.85584803206881692897837494734542e-14   0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
 difference:  1.85584803206881692897837494734542e-14   0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 ulp       :  179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
 max.ulp   :  8.0000
Failure: tanpi_upward (-0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_upward (-0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_upward (-0xfffffffffffffffdp-1)
Result:
 is:         -2.14718475310122677917055904836884e+28  -0x1.1584624c14882fff76592b4ec10p+94
 should be:  -inf  -inf
Failure: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1)
Result:
 is:         -6.60739946234609289593176521179829e+15  -0x1.7796511d79d6ce55bc8bf083fdbp+52
 should be:  -inf  -inf
Failure: Test: tanpi_upward (-0x3.fffffffffffffffcp+108)
Result:
 is:         -1.17953443892757434921819283936138e-14  -0x1.a8f8d97fb893518cbe5688935b0p-47
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  1.17953443892757434921819283936139e-14   0x1.a8f8d97fb893518cbe5688935b0p-47
 ulp       :  inf
 max.ulp   :  8.0000
Failure: Test: tanpi_upward (-0x3.ffffffffffffffffffffffffffp+108)
Result:
 is:         -1.85584803206881692897837494734542e-14  -0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 should be:  -0.00000000000000000000000000000000e+00  -0x0.000000000000000000000000000p+0
 difference:  1.85584803206881692897837494734543e-14   0x1.4e51e25c1f5ab4470a3a0a42c24p-46
 ulp       :  inf
 max.ulp   :  8.0000
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020)
Result:
 is:         qNaN
 should be:   0.00000000000000000000000000000000e+00   0x0.000000000000000000000000000p+0
2024-12-20 15:09:40 -03:00
Florian Weimer
495b96e064 elf: Reorder audit events in dlclose to match _dl_fini (bug 32066)
This was discovered after extending elf/tst-audit23 to cover
dlclose of the dlmopen namespace.

Auditors already experience the new order during process
shutdown (_dl_fini), so no LAV_CURRENT bump or backwards
compatibility code seems necessary.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 16:17:10 +01:00
Florian Weimer
c4b160744c elf: Call la_objclose for proxy link maps in _dl_fini (bug 32065)
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 16:17:08 +01:00
Florian Weimer
8f36b14696 elf: Signal la_objopen for the proxy link map in dlmopen (bug 31985)
Previously, the ld.so link map was silently added to the namespace.
This change produces an auditing event for it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 16:16:21 +01:00
Florian Weimer
a20bc2f623 elf: Add the endswith function to <endswith.h>
And include <stdbool.h> for a definition of bool.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 16:15:53 +01:00
Florian Weimer
4a50fdf8b2 elf: Update DSO list, write audit log to elf/tst-audit23.out
After commit 1d5024f4f0
("support: Build with exceptions and asynchronous unwind tables
[BZ #30587]"), libgcc_s is expected to show up in the DSO
list on 32-bit Arm.  Do not update max_objs because vdso is not
tracked (which is also why the test currently passes even with
libgcc_s present).

Also write the log output from the auditor to standard output,
for easier test debugging.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-20 16:15:51 +01:00
Florian Weimer
ef5823d955 elf: Move _dl_rtld_map, _dl_rtld_audit_state out of GL
This avoids immediate GLIBC_PRIVATE ABI issues if the size of
struct link_map or struct auditstate changes.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20 15:52:57 +01:00
Florian Weimer
2b1dba3eb3 elf: Introduce is_rtld_link_map
Unconditionally define it to false for static builds.

This avoids the awkward use of weak_extern for _dl_rtld_map
in checks that cannot be possibly true on static builds.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20 15:52:57 +01:00
Joseph Myers
322e9d4e44 Add F_CREATED_QUERY from Linux 6.12 to bits/fcntl-linux.h
Linux 6.12 adds a new constant F_CREATED_QUERY.  Add it to glibc's
bits/fcntl-linux.h.

Tested for x86_64.
2024-12-20 11:47:33 +00:00
Joseph Myers
37d9618492 Add HWCAP_LOONGARCH_LSPW from Linux 6.12 to bits/hwcap.h
Add the new Linux 6.12 HWCAP_LOONGARCH_LSPW to the corresponding
bits/hwcap.h.

Tested with build-many-glibcs.py for loongarch64-linux-gnu-lp64d.
2024-12-20 11:47:03 +00:00
Joseph Myers
fbdd8b3fa8 Add MSG_SOCK_DEVMEM from Linux 6.12 to bits/socket.h
Linux 6.12 adds a constant MSG_SOCK_DEVMEM (recall that various
constants such as this one are defined in the non-uapi linux/socket.h
but still form part of the kernel/userspace interface, so that
non-uapi header is one that needs checking each release for new such
constants).  Add it to glibc's bits/socket.h.

Tested for x86_64.
2024-12-20 11:46:06 +00:00
Florian Weimer
9a6533429e i386: Regenerate ulps
As seen on an Intel i9-9900K CPU, with glibc built with GCC 11.5,
configured with and without --disable-multi-arch.
2024-12-20 12:40:17 +01:00
Florian Weimer
6fba7d6578 x86_64: Regenerate ulps
As seen with an AMD 7950X CPU, on a glibc built with GCC 11.5.
2024-12-20 07:22:02 +01:00
Florian Weimer
6a99b4172a aarch64: Regenerate ulps
Results from running on Neoverse-V2, built with GCC 11.5.
2024-12-20 07:12:30 +01:00
Florian Weimer
e79b9e962d elf: Remove code dependent on __rtld_lock_default_lock_recursive macro
Neither NPTL nor Hurd define this macro anymore.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-19 21:29:58 +01:00
Florian Weimer
70d0836305 Linux: Accept null arguments for utimensat pathname
This matches kernel behavior.  With this change, it is possible
to use utimensat as a replacement for the futimens interface,
similar to what glibc does internally.

Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
2024-12-19 21:21:30 +01:00
Florian Weimer
30d3fd7f4f x86_64: Remove unused padding from tcbhead_t
This padding is difficult to use for preserving the internal
GLIBC_PRIVATE ABI.  The comment is misleading.  Current Address
Sanitizer uses heuristics to determine struct pthread size.
It does not depend on its precise layout.  It merely scans for
pointers allocated using malloc.

Due to the removal of the padding, the assert for its start
is no longer required.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-12-19 21:21:30 +01:00
Joseph Myers
d7f587398c Add further DSO dependency sorting tests
The current DSO dependency sorting tests are for a limited number of
specific cases, including some from particular bug reports.

Add tests that systematically cover all possible DAGs for an
executable and the shared libraries it depends on, directly or
indirectly, up to four objects (an executable and three shared
libraries).  (For this kind of DAG - ones with a single source vertex
from which all others are reachable, and an ordering on the edges from
each vertex - there are 57 DAGs on four vertices, 3399 on five
vertices and 1026944 on six vertices; see
https://arxiv.org/pdf/2303.14710 for more details on this enumeration.
I've tested that the 3399 cases with five vertices do all pass if
enabled.)

These tests are replicating the sorting logic from the dynamic linker
(thereby, for example, asserting that it doesn't accidentally change);
I'm not claiming that the logic in the dynamic linker is in some
abstract sense optimal.  Note that these tests do illustrate how in
some cases the two sorting algorithms produce different results for a
DAG (I think all the existing tests for such differences are ones
involving cycles, and the motivation for the new algorithm was also to
improve the handling of cycles):

  tst-dso-ordering-all4-44: a->[bc];{}->[cba]
  output(glibc.rtld.dynamic_sort=1): c>b>a>{}<a<b<c
  output(glibc.rtld.dynamic_sort=2): b>c>a>{}<a<c<b

They also illustrate that sometimes the sorting algorithms do not
follow the order in which dependencies are listed in DT_NEEDED even
though there is a valid topological sort that does follow that, which
might be counterintuitive considering that the DT_NEEDED ordering is
followed in the simplest cases:

  tst-dso-ordering-all4-56: {}->[abc]
  output: c>b>a>{}<a<b<c

shows such a simple case following DT_NEEDED order for destructor
execution (the reverse of it for constructor execution), but

  tst-dso-ordering-all4-41: a->[cb];{}->[cba]
  output: c>b>a>{}<a<b<c

shows that c and b are in the opposite order to what might be expected
from the simplest case, though there is no dependency requiring such
an opposite order to be used.

(I'm not asserting that either of those things is a problem, simply
observing them as less obvious properties of the sorting algorithms
shown up by these tests.)

Tested for x86_64.
2024-12-19 18:56:04 +00:00
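
The observation that one DAG admits several valid orders can be checked
with a small C sketch.  This is not the dynamic linker's sorting code and
not part of the new tests; the enum names, struct dep, and
is_valid_ctor_order are hypothetical.  It assumes, following the
description above, that the part of each output before "{}" lists
constructor execution order, and it encodes tst-dso-ordering-all4-44.

```c
#include <stdbool.h>
#include <stddef.h>

/* OBJ_EXE stands for the executable, written "{}" in the test names.  */
enum { OBJ_EXE, OBJ_A, OBJ_B, OBJ_C, NOBJ };

struct dep { int from; int to; };   /* "from" depends on "to".  */

/* Return true if ORDER (a permutation of all NOBJ objects) constructs
   every dependency before the object that needs it.  */
bool
is_valid_ctor_order (const struct dep *deps, size_t ndeps,
                     const int order[NOBJ])
{
  int pos[NOBJ];
  for (int i = 0; i < NOBJ; i++)
    pos[order[i]] = i;
  for (size_t i = 0; i < ndeps; i++)
    if (pos[deps[i].to] > pos[deps[i].from])
      return false;
  return true;
}

/* tst-dso-ordering-all4-44: a->[bc];{}->[cba]  */
const struct dep all4_44[] = {
  { OBJ_A, OBJ_B }, { OBJ_A, OBJ_C },
  { OBJ_EXE, OBJ_C }, { OBJ_EXE, OBJ_B }, { OBJ_EXE, OBJ_A }
};

/* Constructor orders read from the two outputs quoted above.  */
const int sort1_order[NOBJ] = { OBJ_C, OBJ_B, OBJ_A, OBJ_EXE };  /* c>b>a>{} */
const int sort2_order[NOBJ] = { OBJ_B, OBJ_C, OBJ_A, OBJ_EXE };  /* b>c>a>{} */
```

Both sort1_order and sort2_order pass the check, since the graph only
requires b and c to come before a, and a before the executable.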
Joseph Myers
539bf8dd41 Add NT_X86_XSAVE_LAYOUT and NT_ARM_POE from Linux 6.12 to elf.h
Linux 6.12 adds new ELF note types NT_X86_XSAVE_LAYOUT and NT_ARM_POE.
Add these to glibc's elf.h.

Tested for x86_64.
2024-12-19 17:09:19 +00:00
Joseph Myers
29ae632e76 Add SCHED_EXT from Linux 6.12 to bits/sched.h
Linux 6.12 adds the SCHED_EXT constant.  Add it to glibc's
bits/sched.h and update the kernel version in tst-sched-consts.py.

Tested for x86_64.
2024-12-19 17:08:38 +00:00
John David Anglin
57256971b0 hppa: Fix strace detach-vfork test
This change implements vfork.S for direct support of the vfork
syscall.  clone.S is revised to correct child support for the
vfork case.

The main bug was creating a frame prior to the clone syscall.
This was done to allow the rp and r4 registers to be saved and
restored from the stack frame.  r4 was used to save and restore
the PIC register, r19, across the system call and the call to
set errno.  But in the vfork case, it is undefined behavior
for the child to return from the function in which vfork was
called.  It is surprising that this usually worked.

Syscalls on hppa save and restore rp and r19, so we don't need
to create a frame prior to the clone syscall.  We only need a
frame when __syscall_error is called.  We also don't need to
save and restore r19 around the call to $$dyncall as r19 is not
used in the code after $$dyncall.

This considerably simplifies clone.S.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2024-12-19 11:30:09 -05:00
Joseph Myers
5fcee06dc7 Update kernel version to 6.12 in header constant tests
There are no new constants covered by tst-mman-consts.py,
tst-mount-consts.py or tst-pidfd-consts.py in Linux 6.12 that need any
header changes, so update the kernel version in those tests.
(tst-sched-consts.py will need updating separately along with adding
SCHED_EXT.)

Tested with build-many-glibcs.py.
2024-12-19 15:38:59 +00:00
Paul Zimmermann
d421d36582 added url of CORE-MATH project
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
0e0be3ed80 math: Use tanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic tanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      51.5273        41.0951        20.25%
x86_64v2                    47.7021        39.1526        17.92%
x86_64v3                    45.0373        34.2737        23.90%
i686                       133.9970        83.8596        37.42%
aarch64 (Neoverse)          21.5439        14.7961        31.32%
power10                     13.3301         8.4406        36.68%

reciprocal-throughput        master        patched   improvement
x86_64                      24.9493        12.8547        48.48%
x86_64v2                    20.7051        12.7761        38.29%
x86_64v3                    19.2492        11.0851        42.41%
i686                        78.6498        29.8211        62.08%
aarch64 (Neoverse)          11.6026        7.11487        38.68%
power10                      6.3328         2.8746        54.61%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
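
The improvement column in the tables above appears to be the relative
reduction with respect to master.  That is an inference from the numbers,
not something the commit message states, and improvement_pct below is a
hypothetical helper written only to illustrate the arithmetic.

```c
/* Relative improvement in percent.  A positive result means PATCHED has
   a lower latency or reciprocal throughput than MASTER.  */
double
improvement_pct (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

With the x86_64 latency row of the tanhf table, improvement_pct (51.5273,
41.0951) gives roughly 20.25, matching the reported value; a negative
result would indicate a slowdown.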
Adhemerval Zanella
1751c0519a math: Use sinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic sinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.6819        49.1489         6.71%
x86_64v2                    49.1162        42.9447        12.57%
x86_64v3                    46.9732        39.9157        15.02%
i686                       141.1470       129.6410         8.15%
aarch64 (Neoverse)          20.8539        17.1288        17.86%
power10                     14.5258         9.1906        36.73%

reciprocal-throughput        master        patched   improvement
x86_64                      27.5553        23.9395        13.12%
x86_64v2                    21.6423        20.3219         6.10%
x86_64v3                    21.4842        16.0224        25.42%
i686                        87.9709        86.1626         2.06%
aarch64 (Neoverse)          15.1919        12.2744        19.20%
power10                      7.2188         5.2611        27.12%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
9583836785 math: Use coshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one.  The current
implementation's performance comes mainly from the internal usage of
the optimized expf implementation, and it shows a maximum error of 2 ULPs
for FE_TONEAREST and 3 ULPs for other rounding modes.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6995        49.0737       -20.58%
x86_64v2                    40.5841        44.3604        -9.30%
x86_64v3                    39.3879        39.7502        -0.92%
i686                       112.3380       129.8570       -15.59%
aarch64 (Neoverse)          18.6914        17.0946         8.54%
power10                     11.1343         9.3245        16.25%

reciprocal-throughput        master        patched   improvement
x86_64                      18.6471        24.1077       -29.28%
x86_64v2                    17.7501        20.2946       -14.34%
x86_64v3                    17.8262        17.1877         3.58%
i686                        64.1454        86.5645       -34.95%
aarch64 (Neoverse)          9.77226        12.2314       -25.16%
power10                      4.0200         5.3316       -32.63%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
7cfd8b5698 math: Use atanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      59.4930        45.8568        22.92%
x86_64v2                    59.5705        45.5804        23.48%
x86_64v3                    53.1838        37.7155        29.08%
i686                        169.354       133.5940        21.12%
aarch64 (Neoverse)          26.0781        16.9829        34.88%
power10                     15.6591        10.7623        31.27%

reciprocal-throughput        master        patched   improvement
x86_64                      23.5903        18.5766        21.25%
x86_64v2                    22.6489        18.2683        19.34%
x86_64v3                    19.0401        13.9474        26.75%
i686                        97.6034       107.3260        -9.96%
aarch64 (Neoverse)          15.3664        9.57846        37.67%
power10                      6.8877         4.6242        32.86%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
6f9bacf36b math: Use atan2f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atan2f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.1175        69.2014        -1.59%
x86_64v2                    66.9884        66.0081         1.46%
x86_64v3                    57.7034        61.6407        -6.82%
i686                       189.8690       152.7560        19.55%
aarch64 (Neoverse)          32.6151        24.5382        24.76%
power10                     21.7282        17.1896        20.89%

reciprocal-throughput        master        patched   improvement
x86_64                      34.5202        31.6155         8.41%
x86_64v2                    32.6379        30.3372         7.05%
x86_64v3                    34.3677        23.6455        31.20%
i686                       157.7290        75.8308        51.92%
aarch64 (Neoverse)          27.7788        16.2671        41.44%
power10                     15.5715         8.1588        47.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
a357d6273f math: Use atanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      56.8265        53.6842         5.53%
x86_64v2                    54.8177        53.6842         2.07%
x86_64v3                    46.2915        48.7034        -5.21%
i686                       158.3760       108.9560        31.20%
aarch64 (Neoverse)           21.687        20.5893         5.06%
power10                     13.1903        13.5012        -2.36%

reciprocal-throughput        master        patched   improvement
x86_64                      16.6787        16.7601        -0.49%
x86_64v2                    16.6983        16.7601        -0.37%
x86_64v3                    16.2268        12.1391        25.19%
i686                       138.6840        36.0640        74.00%
aarch64 (Neoverse)          11.8012        10.3565        12.24%
power10                      5.3212         4.2894        19.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
ed608a40e2 math: Use asinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      64.5128        56.9717        11.69%
x86_64v2                    63.3065        57.2666         9.54%
x86_64v3                    62.8719        51.4170        18.22%
i686                       189.1630        137.635        27.24%
aarch64 (Neoverse)          25.3551        20.5757        18.85%
power10                     17.9712        13.3302        25.82%

reciprocal-throughput        master        patched   improvement
x86_64                      20.0844        15.4731        22.96%
x86_64v2                    19.2919        15.4000        20.17%
x86_64v3                    18.7226        11.9009        36.44%
i686                       103.7670        80.2681        22.65%
aarch64 (Neoverse)          12.5005        8.68969        30.49%
power10                      7.2220        5.03617        30.27%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780       104.4710        51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227         7.4241        38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862        3.7819         24.15%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acoshf.

The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911        9.5486         12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acosf.

The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
45126f866c math: Fix the expected carg (inf) results
The predefined pi constants are not the expected values for carg
in non-default rounding modes (similar to atan).  Instead use the
autogenerated values.
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
abe1d65aa6 math: Fix the expected atan2f (inf) results
The predefined pi constants are not the expected values for atan2f
in non-default rounding modes.  Instead use the autogenerated values.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
517c213377 math: Fix the expected atanf (inf) results
The M_PI_2 (lit_pi_2_d) constant is not the expected value for atanf
in non-default rounding modes.  Instead use the autogenerated value.
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
aa3e67ced6 math: Add inf support on gen-auto-libm-tests.c
For some correctly rounded functions where an infinite input yields a
finite result (like atanf), comparing against a predefined constant
does not yield the expected result in all rounding modes.

The most straightforward way to handle this is to get the expected
result from mpfr, which handles all the rounding modes.
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
a993eea641 math: Fix spurious-divbyzero flag name
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
042ed4b28a benchtests: Add tanhf benchmark
Random inputs in the range [-10,10].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
b76b90a809 benchtests: Add sinhf benchmark
Random inputs in the range [-10,10].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
7b7a3fa121 benchtests: Add coshf benchmark
Random inputs in the range [-10,10].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
4f1e26ba47 benchtests: Add atanhf benchmark
The inputs are based on the acosf ones (random inputs in [-1,1]).

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
fa857e6c7b benchtests: Add atan2f benchmark
Random inputs in the range [-10,10].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00
Adhemerval Zanella
74a275d244 benchtests: Add atanf benchmark
Random inputs in the range [-10,10].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:42 -03:00