H.J. Lu
d8a1a1aef7 x86: Detect Intel Diamond Rapids
Detect Intel Diamond Rapids and tune it similarly to Intel Granite Rapids.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit de14f1959ee5f9b845a7cae43bee03068b8136f0)
2025-04-12 12:27:19 -07:00
Sunil K Pandey
9ee8083c4e x86: Handle unknown Intel processor with default tuning
Enable default tuning for unknown Intel processors.

Tested on x86, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 9f0deff558d1d6b08c425c157f50de85013ada9c)
2025-04-12 12:24:53 -07:00
Sunil K Pandey
44f92df800 x86: Add ARL/PTL/CWF model detection support
- Add ARROWLAKE model detection.
- Add PANTHERLAKE model detection.
- Add CLEARWATERFOREST model detection.

Intel® Architecture Instruction Set Extensions Programming Reference
https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

No regression, validated model detection on SDE.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit e53eb952b970ac94c97d74fb447418fb327ca096)
2025-04-12 12:22:25 -07:00
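
For context, model detection like this boils down to decoding the display family and model from CPUID leaf 1 and comparing the model number against the new constants. The helper below is a hedged illustration of that decoding (following the usual Intel family/model rules), not glibc's actual cpu-features.c code.

    #include <cpuid.h>

    /* Illustrative only: decode display family/model from CPUID leaf 1.
       Extended family is added for family 0x0f; extended model is used
       for families 0x06 and 0x0f.  */
    static void
    get_family_model (unsigned int *family, unsigned int *model)
    {
      unsigned int eax, ebx, ecx, edx;
      __cpuid (1, eax, ebx, ecx, edx);
      *family = (eax >> 8) & 0x0f;
      *model = (eax >> 4) & 0x0f;
      if (*family == 0x0f)
        *family += (eax >> 20) & 0xff;
      if (*family == 0x06 || *family >= 0x0f)
        *model += ((eax >> 16) & 0x0f) << 4;
    }
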
Sunil K Pandey
7c6bd71b4d x86: Optimize xstate size calculation
Scan xstate IDs up to the maximum supported xstate ID.  Remove the
separate AMX xstate calculation.  Instead, exclude the AMX space from
the start of TILECFG to the end of TILEDATA in xsave_state_size.

Completed validation on SKL/SKX/SPR/SDE and compared xsave state size
with "ld.so --list-diagnostics" option, no regression.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
(cherry picked from commit 70b648855185e967e54668b101d24704c3fb869d)
2025-04-12 12:22:17 -07:00
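
A hedged sketch of the calculation described above, using raw CPUID leaf 0xD (the component IDs 17/TILECFG and 18/TILEDATA and all names below are assumptions for illustration; the real code in glibc's cpu-features differs in detail):

    #include <cpuid.h>
    #include <stdint.h>

    #define XSTATE_TILECFG_ID  17
    #define XSTATE_TILEDATA_ID 18

    static unsigned int
    xsave_state_size_estimate (void)
    {
      unsigned int eax, ebx, ecx, edx;

      /* CPUID leaf 0xD, sub-leaf 0: EDX:EAX is the bitmap of supported
         XSAVE state components.  */
      __cpuid_count (0xd, 0, eax, ebx, ecx, edx);
      uint64_t supported = ((uint64_t) edx << 32) | eax;

      unsigned int size = 0, amx_start = 0, amx_end = 0;

      /* Scan extended component IDs up to the maximum supported one.  */
      for (unsigned int i = 2; i <= 63; i++)
        {
          if (!(supported & (1ULL << i)))
            continue;
          /* Sub-leaf i: EAX = size of component i, EBX = its offset.  */
          __cpuid_count (0xd, i, eax, ebx, ecx, edx);
          if (i == XSTATE_TILECFG_ID)
            amx_start = ebx;
          if (i == XSTATE_TILEDATA_ID)
            amx_end = ebx + eax;
          if (ebx + eax > size)
            size = ebx + eax;
        }

      /* Exclude the AMX space from the start of TILECFG to the end of
         TILEDATA, as the commit describes.  */
      if (amx_end > amx_start)
        size -= amx_end - amx_start;
      return size;
    }
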
Noah Goldstein
8fe27af20c x86: Use Avoid_Non_Temporal_Memset to control non-temporal path
This is just a refactor and there should be no behavioral change from
this commit.

The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob
for controlling whether we use non-temporal memset rather than having
extra logic based on vendor.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit b93dddfaf440aa12f45d7c356f6ffe9f27d35577)
2025-04-12 12:21:32 -07:00
Florian Weimer
a87d9a2c2c x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread
This fixes a test build failure on Hurd.

Fixes commit 145097dff170507fe73190e8e41194f5b5f7e6bf ("x86: Use separate
variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)").

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit c6e2895695118ab59c7b17feb0fcb75a53e3478c)
2025-03-31 21:34:27 +02:00
Florian Weimer
df22af58f6 x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)
Previously, the initialization code reused the xsave_state_full_size
member of struct cpu_features for the TLSDESC state size.  However,
the tunable processing code assumes that this member has the
original XSAVE (non-compact) state size, so that it can use its
value if XSAVEC is disabled via tunable.

This change uses a separate variable and not a struct member because
the value is only needed in ld.so and the static libc, but not in
libc.so.  As a result, struct cpu_features layout does not change,
helping a future backport of this change.

Fixes commit 9b7091415af47082664717210ac49d51551456ab ("x86-64:
Update _dl_tlsdesc_dynamic to preserve AMX registers").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 145097dff170507fe73190e8e41194f5b5f7e6bf)
2025-03-29 10:19:51 +01:00
Florian Weimer
8fc492bb42 x86: Skip XSAVE state size reset if ISA level requires XSAVE
If we have to use XSAVE or XSAVEC trampolines, do not adjust the size
information they need.  Technically, it is an operator error to try to
run with -XSAVE,-XSAVEC on such builds, but this change disables
some unnecessary code with higher ISA levels and simplifies testing.

Related to commit befe2d3c4dec8be2cdd01a47132e47bdb7020922
("x86-64: Don't use SSE resolvers for ISA level 3 or above").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 59585ddaa2d44f22af04bb4b8bd4ad1e302c4c02)
2025-03-29 10:19:51 +01:00
Sunil K Pandey
e5f5dfdda2 x86_64: Add atanh with FMA
On SPR, it improves atanh bench performance by:

			Before		After		Improvement
reciprocal-throughput	15.1715		14.8628		2%
latency			57.1941		56.1883		2%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c7c4a5906f326f1290b1c2413a83c530564ec4b8)
2025-03-18 10:03:01 -07:00
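
The FMA variant follows glibc's usual multiarch pattern, sketched below under the assumption that the generic implementation lives in sysdeps/ieee754/dbl-64/e_atanh.c: rebuild it with FMA enabled under a new symbol name, and let an ifunc resolver (not shown) pick that symbol when the CPU supports FMA.

    /* Sketch of an e_atanh-fma.c, compiled with -mfma: reuse the generic
       code under a renamed symbol so the compiler contracts multiply-adds.  */
    #define __ieee754_atanh __ieee754_atanh_fma
    #include <sysdeps/ieee754/dbl-64/e_atanh.c>
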
Sunil K Pandey
e854f6d37c x86_64: Add sinh with FMA
On SPR, it improves sinh bench performance by:

			Before		After		Improvement
reciprocal-throughput	14.2017		11.815		17%
latency			36.4917		35.2114		4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit dded0d20f67ba1925ccbcb9cf28f0c75febe0dbe)
2025-03-18 09:59:36 -07:00
Sunil K Pandey
ee1ab93023 x86_64: Add tanh with FMA
On Skylake, it improves tanh bench performance by:

	Before 		After 		Improvement
max	110.89		95.826		14%
min	20.966		20.157		4%
mean	30.9601		29.8431		4%

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c6352111c72a20b3588ae304dd99b63e25dd6d85)
2025-03-18 09:59:29 -07:00
Michael Jeanson
3e820e17a8 nptl: clear the whole rseq area before registration
Due to the extensible nature of the rseq area we can't explicitly
initialize fields that are not part of the ABI yet. It was agreed with
upstream that all new fields will be documented as zero initialized by
userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will
validate the content of all fields during registration.

Replace the explicit field initialization with a memset of the whole
rseq area which will cover fields as they are added to future kernels.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 689a62a4217fae78b9ce0db781dc2a421f2b1ab4)
2025-03-12 19:31:34 +00:00
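
A minimal sketch of the change, assuming a pointer to the thread's rseq area and its registered size (the real code lives in glibc's rseq registration path and uses internal types): one memset replaces per-field initialization and also zeroes any fields future kernels add.

    #include <stddef.h>
    #include <string.h>

    /* Sketch: clear the whole extensible area before registration.
       Kernels built with CONFIG_DEBUG_RSEQ verify unknown fields are zero.  */
    static void
    prepare_rseq_area (void *rseq_area, size_t rseq_size)
    {
      memset (rseq_area, 0, rseq_size);
      /* ... the rseq registration system call follows ... */
    }
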
Wilco Dijkstra
d8e8342369 math: Improve layout of exp/exp10 data
GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp_data struct slightly so that the fields are better aligned
and without gaps.  As a result on targets that support them, more load-pair
instructions are used in exp.  Exp10 is improved by moving invlog10_2N later
so that neglog10_2hiN and neglog10_2loN can be loaded using load-pair.

The exp benchmark improves by 2.5%, "144bits" by 7.2%, "768bits" by 12.7% on
Neoverse V2.  Exp10 improves by 1.5%.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 5afaf99edb326fd9f36eb306a828d129a3a1d7f7)
2025-02-28 14:23:10 +00:00
Wilco Dijkstra
d6175a44e9 AArch64: Use prefer_sve_ifuncs for SVE memset
Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 0f044be1dae5169d0e57f8d487b427863aeadab4)
2025-02-28 14:23:10 +00:00
Wilco Dijkstra
7038970f1f AArch64: Add SVE memset
Add SVE memset based on the generic memset with predicated load for sizes < 16.
Unaligned memsets of 128-1024 bytes are improved by ~20% on average by using aligned
stores for the last 64 bytes.  Performance of random memset benchmark improves
by ~2% on Neoverse V1.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
(cherry picked from commit 163b1bbb76caba4d9673c07940c5930a1afa7548)
2025-02-28 14:23:10 +00:00
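
The memset itself is assembly, but the predicated small-size idea can be sketched with standard arm_sve.h intrinsics (the function below is illustrative, assumes n does not exceed the SVE vector length, which is at least 16 bytes, and must be compiled with SVE enabled):

    #include <arm_sve.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: for n < 16, a WHILELT predicate with n active lanes lets a
       single predicated store write exactly n bytes, with no size branches.  */
    static void
    memset_small_sve (uint8_t *dst, int c, size_t n)
    {
      svbool_t pg = svwhilelt_b8_u64 (0, n);
      svst1_u8 (pg, dst, svdup_n_u8 ((uint8_t) c));
    }
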
Wilco Dijkstra
27fa0268ea math: Improve layout of expf data
GCC aligns global data to 16 bytes if their size is >= 16 bytes.  This patch
changes the exp2f_data struct slightly so that the fields are better aligned.
As a result on targets that support them, load-pair instructions accessing
poly_scaled and invln2_scaled are now 16-byte aligned.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 44fa9c1080fe6a9539f0d2345b9d2ae37b8ee57a)
2025-02-28 14:23:10 +00:00
Wilco Dijkstra
41eb2f8b58 AArch64: Remove zva_128 from memset
Remove ZVA 128 support from memset - the new memset no longer
guarantees count >= 256, which can result in underflow and a
crash if ZVA size is 128 ([1]).  Since only one CPU uses a ZVA
size of 128 and its memcpy implementation was removed in commit
e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special
case too.

[1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html

Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit a08d9a52f967531a77e1824c23b5368c6434a72d)
2025-02-28 14:23:10 +00:00
Wilco Dijkstra
544fb349d3 AArch64: Optimize memset
Improve small memsets by avoiding branches and using overlapping stores.
Use DC ZVA for copies over 128 bytes.  Remove unnecessary code for ZVA sizes
other than 64 and 128.  Performance of random memset benchmark improves by 24%
on Neoverse N1.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit cec3aef32412779e207f825db0d057ebb4628ae8)
2025-02-28 14:22:28 +00:00
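
The routine is assembly, but the branch-avoidance trick for small sizes can be sketched in C (the 8-byte granule and size range below are illustrative; the real code uses 16-byte SIMD stores and DC ZVA for larger sizes): a store at the start and a store at the end cover every length in a range, overlapping when the buffer is short.

    #include <stdint.h>
    #include <string.h>

    /* Sketch: handles any 8 <= n <= 16 with exactly two 8-byte stores
       and no per-size branching.  */
    static void
    memset_8_to_16 (unsigned char *dst, int c, size_t n)
    {
      uint64_t v = 0x0101010101010101ULL * (unsigned char) c;
      memcpy (dst, &v, 8);            /* first 8 bytes */
      memcpy (dst + n - 8, &v, 8);    /* last 8 bytes, overlaps when n < 16 */
    }
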
Wilco Dijkstra
64896b7d32 AArch64: Improve generic strlen
Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates.  Change final
size computation to avoid increasing latency.  On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3dc426b642dcafdbc11a99f2767e081d086f5fc7)
2025-02-28 14:22:28 +00:00
Wilco Dijkstra
fd9a3a36fd Revert "AArch64: Add vector logp1 alias for log1p"
This reverts commit a991a0fc7c051d7ef2ea7778e0a699f22d4e53d7.
2025-02-27 20:34:34 +00:00
Yat Long Poon
06fd8ad78f AArch64: Improve codegen for SVE powf
Improve memory access with indexed/unpredicated instructions.
Eliminate register spills.  Speedup on Neoverse V1: 3%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 95e807209b680257a9afe81a507754f1565dbb4d)
2025-02-27 17:08:58 +00:00
Yat Long Poon
7dc549c5a4 AArch64: Improve codegen for SVE pow
Move constants to struct.  Improve memory access with indexed/unpredicated
instructions.  Eliminate register spills.  Speedup on Neoverse V1: 24%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 0b195651db3ae793187c7dd6d78b5a7a8da9d5e6)
2025-02-27 17:08:49 +00:00
Yat Long Poon
194185c289 AArch64: Improve codegen for SVE erfcf
Reduce number of MOV/MOVPRFXs and use unpredicated FMUL.
Replace MUL with LSL.  Speedup on Neoverse V1: 6%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit f5ff34cb3c75ec1061c75bb9188b3c1176426947)
2025-02-27 17:08:42 +00:00
Luna Lamb
4b0bb84eb7 Aarch64: Improve codegen in SVE exp and users, and update expf_inline
Use unpredicated muls and improve memory access.
7%, 3% and 1% improvement in throughput microbenchmark on Neoverse V1,
for exp, exp2 and cosh respectively.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit c0ff447edf19bd4630fe79adf5e8b896405b059f)
2025-02-27 17:08:35 +00:00
Luna Lamb
0ff6a9ff79 Aarch64: Improve codegen in SVE asinh
Use unpredicated muls, use lanewise MLAs, and improve memory access.
1% regression in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 8f0e7fe61e0a2ad5ed777933703ce09053810ec4)
2025-02-27 17:08:28 +00:00
Luna Lamb
d983f14c30 AArch64: Improve codegen in SVE expm1f and users
Use unpredicated muls, use absolute compare and improve memory access.
Expm1f, sinhf and tanhf show 7%, 5% and 1% improvement in throughput
microbenchmark on Neoverse V1.

(cherry picked from commit f86b4cf87581cf1e45702b07880679ffa0b1f47a)
2025-02-27 17:08:16 +00:00
Yat Long Poon
aa7c61ea15 AArch64: Improve codegen for SVE log1pf users
Reduce memory access by using lanewise MLA and reduce number of MOVPRFXs.
Move log1pf implementation to inline helper function.
Speedup on Neoverse V1 for log1pf (10%), acoshf (-1%), atanhf (2%), asinhf (2%).

(cherry picked from commit 91c1fadba338752bf514cd4cca057b27b1b10eed)
2025-02-27 17:08:06 +00:00
Yat Long Poon
ab5ba6c188 AArch64: Improve codegen for SVE logs
Reduce memory access by using lanewise MLA and moving constants to struct
and reduce number of MOVPRFXs.
Update maximum ULP error for double log_sve from 1 to 2.
Speedup on Neoverse V1 for log (3%), log2 (5%), and log10 (4%).

(cherry picked from commit 32d193a372feb28f9da247bb7283d404b84429c6)
2025-02-27 17:07:55 +00:00
Luna Lamb
5f45c0f91e AArch64: Improve codegen in SVE tans
Improves memory access.
Tan: MOVPRFX 7 -> 2, LD1RD 12 -> 5, move MOV away from return.
Tanf: MOV 2 -> 1, MOVPRFX 6 -> 3, LD1RW 5 -> 4, move MOV away from return.

(cherry picked from commit aa6609feb20ebf8653db639dabe2a6afc77b02cc)
2025-02-27 17:07:37 +00:00
Luna Lamb
abfd20ebbd AArch64: Improve codegen in AdvSIMD asinh
Improves memory access and removes spills.
Load the polynomial evaluation coefficients into 2 vectors and use lanewise
MLAs.  Reduces MOVs 6->3, LDR 11->5, STR/STP 2->0, ADRP 3->2.

(cherry picked from commit 140b985e5a2071000122b3cb63ebfe88cf21dd29)
2025-02-27 17:07:30 +00:00
Joana Cruz
bf2b60a560 AArch64: Improve codegen of AdvSIMD expf family
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
Also use intrinsics instead of native operations.
expf: 3% improvement in throughput microbenchmark on Neoverse V1, exp2f: 5%,
exp10f: 13%, coshf: 14%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit cff9648d0b50d19cdaf685f6767add040d4e1a8e)
2025-02-27 17:07:22 +00:00
Joana Cruz
41dc9e7c2d AArch64: Improve codegen of AdvSIMD atan(2)(f)
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 6914774b9d3460876d9ad4482782213ec01a752e)
2025-02-27 17:07:15 +00:00
Joana Cruz
9170b921fa AArch64: Improve codegen of AdvSIMD logf function family
Load the polynomial evaluation coefficients into 2 vectors and use lanewise MLAs.
8% improvement in throughput microbenchmark on Neoverse V1 for log2 and log,
and 2% for log10.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit d6e034f5b222a9ed1aeb5de0c0c7d0dda8b63da3)
2025-02-27 17:07:08 +00:00
Pierre Blanchard
2aed9796bf AArch64: Improve codegen in users of ADVSIMD log1p helper
Add inline helper for log1p and rearrange operations so MOV
is not necessary in reduction or around the special-case handler.
Reduce memory access by using more indexed MLAs in polynomial.
Speedup on Neoverse V1 for log1p (3.5%), acosh (7.5%) and atanh (10%).

(cherry picked from commit ca0c0d0f26fbf75b9cacc65122b457e8fdec40b8)
2025-02-27 17:07:01 +00:00
Pierre Blanchard
ae04f63087 AArch64: Improve codegen in AdvSIMD logs
Remove spurious ADRP and a few MOVs.
Reduce memory access by using more indexed MLAs in polynomial.
Align notation so that algorithms are easier to compare.
Speedup on Neoverse V1 for log10 (8%), log (8.5%), and log2 (10%).
Update error threshold in AdvSIMD log (now matches SVE log).

(cherry picked from commit 8eb5ad2ebc94cc5bedbac57c226c02ec254479c7)
2025-02-27 17:06:54 +00:00
Pierre Blanchard
4148940836 AArch64: Improve codegen in AdvSIMD pow
Remove spurious ADRP. Improve memory access by shuffling constants and
using more indexed MLAs.

A few more optimisations with no impact on accuracy:
- force FMA contraction
- switch from shift-aided rint to the rint instruction

Between 1 and 5% throughput improvement on Neoverse
V1 depending on benchmark.

(cherry picked from commit 569cfaaf4984ae70b23c61ee28a609b5aef93fea)
2025-02-27 17:06:45 +00:00
Joe Ramsay
76c923fe9d AArch64: Remove SVE erf and erfc tables
By using a combination of mask and add instead of the shift-based
index calculation, the routines can share the same table as other
variants with no performance degradation.

The tables change name because of other changes in downstream AOR.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 2d82d781a539ce8e82178fc1fa2c99ae1884e7fe)
2025-02-27 17:06:35 +00:00
Joe Ramsay
9ff7559b27 AArch64: Small optimisation in AdvSIMD erf and erfc
In both routines, reduce register pressure such that GCC 14 emits no
spills for erf and fewer spills for erfc.  Also use more efficient
comparison for the special-case in erf.

Benchtests show erf improves by 6.4%, erfc by 1.0%.

(cherry picked from commit 1cf29fbc5be23db775d1dfa6b332ded6e6554252)
2025-02-27 17:06:25 +00:00
Joe Ramsay
68f2eb20de AArch64: Simplify rounding-multiply pattern in several AdvSIMD routines
This operation can be simplified to use simpler multiply-round-convert
sequence, which uses fewer instructions and constants.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 16a59571e4e9fd019d3fc23a2e7d73c1df8bb5cb)
2025-02-27 17:06:18 +00:00
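
A hedged illustration of the multiply-round-convert sequence with AdvSIMD intrinsics (function and constant names are placeholders, not the library's code): the scaled argument is rounded with FRINTN and then converted, replacing the older add-large-constant-and-subtract rounding trick.

    #include <arm_neon.h>

    /* Sketch: n = round-to-nearest (x * inv_ln2), needed as a float for
       the reduction and as an integer for the exponent bits.  */
    static inline int32x4_t
    mul_round_convert (float32x4_t x, float32x4_t inv_ln2, float32x4_t *n_f)
    {
      float32x4_t n = vrndnq_f32 (vmulq_f32 (x, inv_ln2));  /* FRINTN */
      *n_f = n;
      return vcvtq_s32_f32 (n);   /* exact: n is already integral */
    }
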
Joe Ramsay
a947a43b95 AArch64: Improve codegen in users of ADVSIMD expm1f helper
Rearrange operations so MOV is not necessary in reduction or around
the special-case handler.  Reduce memory access by using more indexed
MLAs in polynomial.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 7900ac490db32f6bccff812733f00280dde34e27)
2025-02-27 17:06:09 +00:00
Joe Ramsay
5202401730 AArch64: Improve codegen in users of AdvSIMD log1pf helper
log1pf is quite register-intensive - use fewer registers for the
polynomial, and make various changes to shorten dependency chains in
parent routines.  There is now no spilling with GCC 14.  Accuracy moves
around a little - comments adjusted accordingly but does not require
regen-ulps.

Use the helper in log1pf as well, instead of having separate
implementations.  The more accurate polynomial means special-casing can
be simplified, and the shorter dependency chain avoids the usual dance
around v0, which is otherwise difficult.

There is a small duplication of vectors containing 1.0f (or 0x3f800000) -
GCC is not currently able to efficiently handle values which fit in FMOV
but not MOVI, and are reinterpreted to integer.  There may be potential
for more optimisation if this is fixed.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 5bc100bd4b7e00db3009ae93d25d303341545d23)
2025-02-27 17:06:01 +00:00
Joe Ramsay
c4373426e3 AArch64: Improve codegen in SVE F32 logs
Reduce MOVPRFXs by using unpredicated (non-destructive) instructions
where possible.  Similar to the recent change to AdvSIMD F32 logs,
adjust special-case arguments and bounds to allow for more optimal
register usage.  For all 3 routines one MOVPRFX remains in the
reduction, which cannot be avoided as immediate AND and ASR are both
destructive.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit a15b1394b5eba98ffe28a02a392b587e4fe13c0d)
2025-02-27 17:05:53 +00:00
Joe Ramsay
354aeaf213 AArch64: Improve codegen in SVE expf & related routines
Reduce MOV and MOVPRFX by improving special-case handling.  Use inline
helper to duplicate the entire computation between the special- and
non-special case branches, removing the contention for z0 between x
and the return value.

Also rearrange some MLAs and MLSs - by making the multiplicand the
destination we can avoid a MOVPRFX in several cases.  Also change which
constants go in the vector used for lanewise ops - the last lane is no
longer wasted.

Spotted that shift was incorrect in exp2f and exp10f, w.r.t. the
comment that explains it.  Fixed - worst-case ULP for exp2f moves
around but it doesn't change significantly for either routine.

Worst-case error for coshf increases due to passing x to exp rather
than abs(x) - updated the comment, but does not require regen-ulps.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 7b8c134b5460ed933d610fa92ed1227372b68fdc)
2025-02-27 17:05:45 +00:00
Joe Ramsay
a991a0fc7c AArch64: Add vector logp1 alias for log1p
This enables vectorisation of C23 logp1, which is an alias for log1p.
There are no new tests or ulp entries because the new symbols are simply
aliases.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 751a5502bea1d13551c62c47bb9bd25bff870cda)
2025-02-27 17:05:38 +00:00
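
A hedged sketch of the aliasing idea outside glibc's internal macros (symbol names below are made up; the real change adds the vector logp1 names as aliases of the existing log1p symbols):

    #include <math.h>

    /* C23 logp1 is just log1p under another name, so the new symbol can be
       an alias of the existing implementation rather than new code.  */
    double
    my_log1p (double x)
    {
      return log1p (x);
    }

    double my_logp1 (double) __attribute__ ((alias ("my_log1p")));
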
Joe Ramsay
ff10623706 aarch64: Avoid redundant MOVs in AdvSIMD F32 logs
Since the last operation is destructive, the first argument to the FMA
also has to be the first argument to the special-case in order to
avoid unnecessary MOVs. Reorder arguments and adjust special-case
bounds to facilitate this.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
(cherry picked from commit 8b09af572b208bfde4d31c6abbae047dcc217675)
2025-02-27 17:05:29 +00:00
John David Anglin
523f855581 math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP.  Eventually I found that the value of
a1 + u.d in this block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:

    if (TININESS_AFTER_ROUNDING)
      {
        w.d = a1 + u.d;
        if (w.ieee.exponent == 109)
          return w.d * 0x1p-108;
      }

This caused the exponent value to be wrong and the wrong return path
to be used.

Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2025-02-25 16:00:49 -05:00
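
For reference, a hedged sketch of such a barrier, modelled on glibc's generic math_opt_barrier (sysdeps/generic/math-barriers.h); the exact barrier used in the fix may differ. The empty asm marks the value as potentially modified, so the compiler cannot reuse a copy of a1 + u.d computed under the earlier rounding mode.

    /* Sketch: the "+m" constraint tells the compiler the value may have
       changed behind its back, forcing a recompute/reload afterwards.  */
    #define math_opt_barrier(x)                 \
      ({ __typeof (x) __x = (x);                \
         __asm__ ("" : "+m" (__x));             \
         __x; })
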
Siddhesh Poyarekar
d6c156c326 assert: Add test for CVE-2025-0395
Use the __progname symbol to override the program name to induce the
failure that CVE-2025-0395 describes.

This is related to BZ #32582

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit cdb9ba84191ce72e86346fb8b1d906e7cd930ea2)
2025-02-13 12:46:50 -05:00
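
A hedged stand-alone sketch of the approach (the actual test is written against glibc's test harness and checks that the child aborts cleanly): point __progname at a very long string, then trigger an assertion failure so the failure-message path has to cope with the oversized name.

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    extern char *__progname;   /* glibc's program-name symbol */

    int
    main (void)
    {
      size_t len = 1 << 20;                /* oversized program name */
      char *name = malloc (len + 1);
      if (name == NULL)
        return 1;
      memset (name, 'A', len);
      name[len] = '\0';
      __progname = name;

      assert (1 == 2);   /* expected to abort; must not overflow buffers */
      return 0;
    }
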
John David Anglin
e899ca3651 nptl: Correct stack size attribute when stack grows up [BZ #32574]
Set stack size attribute to the size of the mmap'd region only
when the size of the remaining stack space is less than the size
of the mmap'd region.

This was reversed.  As a result, the initial stack size was only
135168 bytes.  On architectures where the stack grows down, the
initial stack size is approximately 8384512 bytes with the default
rlimit settings.  The small main stack size on hppa broke
applications like ruby that check for stack overflows.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2025-02-02 17:47:15 -05:00
H.J. Lu
8566822197 stdlib: Test using setenv with updated environ [BZ #32588]
Add a test for setenv with updated environ.  Verify that BZ #32588 is
fixed.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 8ab34497de14e35aff09b607222fe1309ef156da)
2025-01-25 10:01:50 +08:00
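
A hedged stand-alone sketch of the scenario behind BZ #32588 (the glibc test itself uses the test harness): the program replaces environ with its own array and then calls setenv, which must handle the updated pointer.

    #include <stdlib.h>
    #include <string.h>

    extern char **environ;

    int
    main (void)
    {
      static char *new_environ[] = { NULL };
      environ = new_environ;               /* application-updated environ */

      if (setenv ("TST_SETENV_VAR", "1", 1) != 0)
        return 1;
      const char *v = getenv ("TST_SETENV_VAR");
      return v == NULL || strcmp (v, "1") != 0;
    }
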
Sam James
be48b8f6ad malloc: obscure calloc use in tst-calloc
Similar to a9944a52c967ce76a5894c30d0274b824df43c7a and
f9493a15ea9cfb63a815c00c23142369ec09d8ce, we need to hide calloc use from
the compiler to accommodate GCC's r15-6566-g804e9d55d9e54c change.

First, include tst-malloc-aux.h, but then use `volatile` variables
for size.

The test passes without the tst-malloc-aux.h change but IMO we want
it there for consistency and to avoid future problems (possibly silent).

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c3d1dac96bdd10250aa37bb367d5ef8334a093a1)
2025-01-24 01:08:12 +00:00
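
A hedged sketch of the obscuring technique (the sizes are arbitrary): volatile size operands keep GCC from treating them as known constants, so the calloc call and the checks built around it are not folded away by newer optimizations.

    #include <stdlib.h>

    int
    main (void)
    {
      /* volatile prevents constant propagation into the calloc call.  */
      volatile size_t nmemb = 4;
      volatile size_t size = 16;

      void *p = calloc (nmemb, size);
      if (p == NULL)
        return 1;
      free (p);
      return 0;
    }
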