C23 adds various <math.h> function families originally defined in TS
18661-4. Add the tanpi functions (tan(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
Update i686 libm-test-ulps to fix
FAIL: math/test-float64x-cospi
FAIL: math/test-float64x-sinpi
FAIL: math/test-ldouble-cospi
FAIL: math/test-ldouble-sinpi
when building glibc with GCC 7.4.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the sinpi functions (sin(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the cospi functions (cos(pi*x)).
Tested for x86_64 and x86, and with build-many-glibcs.py.
Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes
on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead
of up to 3 branches. On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.
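The repeated-store idea can be sketched as follows. This is only an
illustration and not the code in calloc-clear-memory.h: the helper name
clear_words_sketch is hypothetical, and it assumes the chunk is cleared
in size_t-sized words, so 9 words is 36 bytes with a 4-byte size_t and
72 bytes with an 8-byte size_t.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not the glibc implementation.
   Zero NWORDS size_t-sized words starting at D.  */
static void *
clear_words_sketch (size_t *d, size_t nwords)
{
  /* Larger blocks still go through memset.  */
  if (nwords > 9)
    return memset (d, 0, nwords * sizeof (size_t));

  /* One dispatch on the word count, then straight-line stores.  */
  switch (nwords)
    {
    case 9: d[8] = 0; /* fall through */
    case 8: d[7] = 0; /* fall through */
    case 7: d[6] = 0; /* fall through */
    case 6: d[5] = 0; /* fall through */
    case 5: d[4] = 0; /* fall through */
    case 4: d[3] = 0; /* fall through */
    case 3: d[2] = 0; /* fall through */
    case 2: d[1] = 0; /* fall through */
    case 1: d[0] = 0; /* fall through */
    default: break;
    }
  return d;
}
```

Whether the switch compiles to a single indirect jump depends on the
compiler and target; the sketch only shows the shape of the approach.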
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Add a threaded test for pthread_spin_trylock that attempts to lock an
already acquired spin lock and checks for the correct return code.
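A minimal sketch of the scenario being tested, not the test added by
this commit: one thread holds the lock while a second thread calls
pthread_spin_trylock, which POSIX specifies to fail with EBUSY. The
function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

static pthread_spinlock_t sketch_lock;

/* Runs in the second thread: record what trylock returns.  */
static void *
sketch_try_thread (void *arg)
{
  int *result = arg;
  *result = pthread_spin_trylock (&sketch_lock);
  if (*result == 0)
    pthread_spin_unlock (&sketch_lock);
  return NULL;
}

/* Hypothetical driver: returns the trylock result seen by the second
   thread while this thread holds the lock, or -1 on setup failure.  */
static int
sketch_trylock_on_held_lock (void)
{
  pthread_t thr;
  int result = -1;

  if (pthread_spin_init (&sketch_lock, PTHREAD_PROCESS_PRIVATE) != 0)
    return -1;
  pthread_spin_lock (&sketch_lock);
  if (pthread_create (&thr, NULL, sketch_try_thread, &result) == 0)
    pthread_join (thr, NULL);
  pthread_spin_unlock (&sketch_lock);
  pthread_spin_destroy (&sketch_lock);
  return result;
}
```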
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.
[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html
Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
GCC 15 (e876acab6cdd84bb2b32c98fc69fb0ba29c81153) and binutils
(e7a16d9fd65098045ef5959bf98d990f12314111) both removed all Nios II
support, and the architecture has been EOL'ed by the vendor. The
kernel still has support, but without a proper compiler there
is not much sense in keeping it in glibc.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Some CORE-MATH routines use roundeven, and most ISAs do not have
a specific instruction for the operation. In this case, the call
is routed to the generic implementation.
However, if the ISA does support round() and ctz(), there is a better
alternative (as used by CORE-MATH).
This patch adds this optimization and also enables it on powerpc.
On a power10 it shows the following improvement:
expm1f                  master  patched  improvement
latency                 9.8574   7.0139       28.85%
reciprocal-throughput   4.3742   2.6592       39.21%
Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The constants themselves were added to elf.h back in 8754a4133e but the
array in _dl_show_auxv wasn't modified accordingly, resulting in the
following output when running LD_SHOW_AUXV=1 /bin/true on recent Linux:
AT_??? (0x1b): 0x1c
AT_??? (0x1c): 0x20
With this patch:
AT_RSEQ_FEATURE_SIZE: 28
AT_RSEQ_ALIGN: 32
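The same two entries can also be read from a program with getauxval.
The sketch below is illustrative only; the helper names are
hypothetical, and the fallback values 27 and 28 are taken from the
AT_??? output above (0x1b and 0x1c) for headers that predate the
constants.

```c
#include <sys/auxv.h>

/* Fallbacks for older headers; values match 0x1b and 0x1c above.  */
#ifndef AT_RSEQ_FEATURE_SIZE
# define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
# define AT_RSEQ_ALIGN 28
#endif

/* Hypothetical helpers: getauxval returns 0 when the kernel did not
   provide the entry.  */
static unsigned long
sketch_rseq_feature_size (void)
{
  return getauxval (AT_RSEQ_FEATURE_SIZE);
}

static unsigned long
sketch_rseq_align (void)
{
  return getauxval (AT_RSEQ_ALIGN);
}
```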
Tested on Linux 6.11 x86_64
Signed-off-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
When adding explicit initialization of rseq fields prior to
registration, I glossed over the fact that 'cpu_id_start' is also
documented as initialized by user-space.
While current kernels don't validate the content of this field on
registration, future ones could.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The k>>31 in signgam = 1 - (((k&(k>>31))&1)<<1); is not portable:
* The ISO C standard says "If E1 has a signed type and a negative
value, the resulting value is implementation-defined." (this is
still in C23).
* If the int type is larger than 32 bits (e.g. a 64-bit type),
then k = INT_MAX; line 144 will make k>>31 put 1 in bit 0
(thus signgam will be -1) while 0 is expected.
Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0
is probably better for 2 reasons:
* The signgam expression has more or less a condition on the sign
of fx (the goal of k>>31, which can be dropped with this new
condition). Since fx ≥ 0 should be the most common case, one can
get signgam directly in this case (value 1). And this simplifies
the expression for the other case (fx < 0).
* This new condition may be easier/faster to test on the processor
(e.g. by avoiding a load of a constant from the memory).
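The shape of the reworked sign computation can be sketched as follows.
This is an illustration of the reasoning above, not the CORE-MATH code:
the function name is hypothetical, and k is assumed to be the integer
derived from fx (here taken as its floor) whose parity decides the sign
for negative arguments.

```c
/* Hypothetical helper: sign of gamma for an argument whose floor is K
   and whose float value is FX.  No right shift of a negative value is
   needed once the sign of FX is tested directly.  */
static int
signgam_sketch (float fx, int k)
{
  if (fx >= 0)
    return 1;			/* Common case: gamma is positive.  */
  return 1 - ((k & 1) << 1);	/* -1 if K is odd, 1 if K is even.  */
}
```

With this form the result no longer depends on the width of int or on
how the implementation shifts negative values.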
This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH.
Checked on x86_64-linux-gnu.
Test coverage of sem_getvalue is fairly limited. Add a test that runs
it on threads on each CPU. For this purpose I adapted
tst-skeleton-thread-affinity.c; it didn't seem very suitable to use
as-is or include directly in a different test doing things per-CPU,
but did seem a suitable starting point (thus sharing
tst-skeleton-affinity.c) for such testing.
Tested for x86_64.
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanf.
The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.
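As an illustration of what a generic 128-bit routine can look like on
an ABI without a native 128-bit integer type, the sketch below builds a
64x64 -> 128 bit multiplication from 32-bit halves. The type and
function names are hypothetical and this is not the routine added to
glibc.

```c
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} sketch_u128;

/* Full 128-bit product of A and B using only 64-bit arithmetic.  */
static sketch_u128
sketch_mul_u64 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

  uint64_t p0 = a_lo * b_lo;	/* Contributes to bits 0..63.  */
  uint64_t p1 = a_lo * b_hi;	/* Contributes to bits 32..95.  */
  uint64_t p2 = a_hi * b_lo;	/* Contributes to bits 32..95.  */
  uint64_t p3 = a_hi * b_hi;	/* Contributes to bits 64..127.  */

  /* Sum of the three terms landing in bits 32..63, with carry.  */
  uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

  sketch_u128 r;
  r.lo = (mid << 32) | (p0 & 0xffffffffu);
  r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return r;
}
```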
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):
latency                 master    patched   improvement
x86_64                 82.3961    54.8052        33.49%
x86_64v2               82.3415    54.8052        33.44%
x86_64v3               69.3661    50.4864        27.22%
i686                   219.271    45.5396        79.23%
aarch64                29.2127    19.1951        34.29%
power10                19.5060    16.2760        16.56%
reciprocal-throughput   master    patched   improvement
x86_64                 28.3976    19.7334        30.51%
x86_64v2               28.4568    19.7334        30.65%
x86_64v3               21.1815    16.1811        23.61%
i686                   105.016    15.1426        85.58%
aarch64                18.1573    10.7681        40.70%
power10                 8.7207     8.7097         0.13%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
The hardware architects have a new recommendation not to use
non-temporal load/stores for memset. This patch removes this path.
I found there was no difference in the memset speed with/without
non-temporal load/stores either.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The hardware architects have a new recommendation not to use
non-temporal load/stores for memcpy. This patch removes this path.
I found there was no difference in the memcpy speed with/without
non-temporal load/stores either.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The ROP instructions were added in ISA 3.1 (i.e., Power10); however, they
were defined so that if executed on older CPUs, they behave as
nops. This allows us to emit them on older CPUs, where they are just
ignored, while on a Power10 the binary is ROP protected.
Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.
Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This patch adds support for memory protection keys on AArch64 systems
that enable the Stage 1 permission overlays feature introduced in
Armv8.9 / 9.4 (FEAT_S1POE) [1].
1. Internal functions "pkey_read" and "pkey_write" to access data
associated with memory protection keys.
2. Implementation of API functions "pkey_get" and "pkey_set" for
the AArch64 target.
3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
4. New target-specific test that checks behaviour of pkeys on
AArch64 targets.
5. This patch also extends existing generic test for pkeys.
6. HWCAP constant for Permission Overlay Extension feature.
To support more accurate mapping of underlying permissions to the
PKEY flags, we introduce additional AArch64-specific flags. The full
list of flags is:
- PKEY_UNRESTRICTED: 0x0 (for completeness)
- PKEY_DISABLE_ACCESS: 0x1 (existing flag)
- PKEY_DISABLE_WRITE: 0x2 (existing flag)
- PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
- PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific)
The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as
it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ.
For this reason mapping between permission bits RWX and "restrictions"
bits awxr (a for disable access, etc) becomes complicated:
- PKEY_DISABLE_ACCESS disables both R and W
- PKEY_DISABLE_{WRITE,READ} disables W and R respectively
- PKEY_DISABLE_EXECUTE disables X
Combinations like the one below are accepted although they are redundant:
- PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE
Reverse mapping tries to retain backward compatibility and ORs
PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and
PKEY_DISABLE_WRITE would be present.
This will break code that compares pkey_get output with == instead
of using bitwise operations. The latter is more correct since PKEY_*
constants are essentially bit flags.
It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.
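The mapping rules described above can be sketched in a few lines. This
is an illustration only, not the AArch64 implementation: the SK_ names
are hypothetical, the flag values are copied from the list above, and
the R/W/X permission bits are an arbitrary encoding chosen for the
sketch.

```c
/* Flag values from the list above, prefixed to avoid clashing with
   system headers.  */
enum
{
  SK_PKEY_DISABLE_ACCESS = 0x1,
  SK_PKEY_DISABLE_WRITE = 0x2,
  SK_PKEY_DISABLE_EXECUTE = 0x4,
  SK_PKEY_DISABLE_READ = 0x8
};

/* Hypothetical permission bits used only by this sketch.  */
enum { SK_PERM_R = 0x1, SK_PERM_W = 0x2, SK_PERM_X = 0x4 };

/* Forward mapping: restriction flags -> remaining permissions.  */
static unsigned int
sk_flags_to_perms (unsigned int flags)
{
  unsigned int perms = SK_PERM_R | SK_PERM_W | SK_PERM_X;
  if (flags & SK_PKEY_DISABLE_ACCESS)
    perms &= ~(unsigned int) (SK_PERM_R | SK_PERM_W);
  if (flags & SK_PKEY_DISABLE_READ)
    perms &= ~(unsigned int) SK_PERM_R;
  if (flags & SK_PKEY_DISABLE_WRITE)
    perms &= ~(unsigned int) SK_PERM_W;
  if (flags & SK_PKEY_DISABLE_EXECUTE)
    perms &= ~(unsigned int) SK_PERM_X;
  return perms;
}

/* Reverse mapping: ORs in DISABLE_ACCESS when both READ and WRITE
   are disabled, for backward compatibility.  */
static unsigned int
sk_perms_to_flags (unsigned int perms)
{
  unsigned int flags = 0;
  if (!(perms & SK_PERM_R))
    flags |= SK_PKEY_DISABLE_READ;
  if (!(perms & SK_PERM_W))
    flags |= SK_PKEY_DISABLE_WRITE;
  if (!(perms & SK_PERM_X))
    flags |= SK_PKEY_DISABLE_EXECUTE;
  if ((flags & SK_PKEY_DISABLE_READ) && (flags & SK_PKEY_DISABLE_WRITE))
    flags |= SK_PKEY_DISABLE_ACCESS;
  return flags;
}
```

In this sketch, setting only SK_PKEY_DISABLE_ACCESS and mapping back
yields ACCESS | READ | WRITE, so a comparison with == against
SK_PKEY_DISABLE_ACCESS fails while a bitwise test with & still succeeds.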
[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4
Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
ThunderX1 and ThunderX2 have been retired for a few years now,
so remove the thunderx{,2} specific versions of memcpy.
Their performance gain was for medium and large sizes, where
the generic (aarch64) memcpy performs only slightly worse.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Two of the architecture bits/fenv.h headers define femode_t if
__GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition
__GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit
0175c9e9be5f0b2000859666b6e1ef3696f1123b, but were probably first
developed before it and then not updated to take account of its
changes). This results in failures of the installed headers check for
fenv.h when building with GCC 15 (defaults to -std=gnu23 - we don't
yet have an installed-headers test specifically for C23 mode and don't
yet require a compiler with such a mode for building glibc) together
with a combination of options leaving C23 features enabled, since the
declarations of functions using femode_t use the correct conditions;
see
<https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>.
Fix the conditionals to get <fenv.h> to work correctly in C23 mode
again.
Tested with build-many-glibcs.py (arc-linux-gnu, arc-linux-gnuhf,
or1k-linux-gnu-hard, or1k-linux-gnu-soft).
Update the inline asm syscall wrappers to match the newer register constraint
usage in INTERNAL_VSYSCALL_CALL_TYPE. Use the faster mfocrf instruction when
available, rather than the slower mfcr microcoded instruction.