This commit fixes the bug mentioned in the previous commit.
The previous implementations of wmemchr in these files relied
on n * sizeof(wchar_t) which was not guranteed by the standard.
The new overflow tests added in the previous commit now
pass (As well as all the other tests).
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 645a158978f9520e74074e8c14047503be4db0f0)
No bug. This commit optimizes memchr-avx2.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
asaving a few instructions the in loop return loop. test-memchr,
test-rawmemchr, and test-wmemchr are all passing.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit acfd088a1963ba51cd83c78f95c0ab25ead79e04)
Place strings ending at page boundary without the null byte. If an
implementation goes beyond EXP_LEN, it will trigger the segfault.
(cherry picked from commit cb882b21b63606aabd6e55afe23b42434d95f2ef)
Since __strlen_evex and __strnlen_evex added by
commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Mar 5 06:24:52 2021 -0800
x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
use sarx:
c4 e2 6a f7 c0 sarx %edx,%eax,%eax
require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.
(cherry picked from commit 55bf411b451c13f0fb7ff3d3bf9a820020b45df1)
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.
(cherry picked from commit 595c22ecd8e87a27fd19270ed30fdbae9ad25426)
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
(cherry picked from commit e4fda4631017e49d4ee5a2755db34289b6860fa4)
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
(cherry picked from commit 4e2d8f352774b56078c34648b14a2412c38384f4)
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
(cherry picked from commit 4bd660be40967cd69072f69ebc2ad32bfcc1f206)
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
(cherry picked from commit 7ebba91361badf7531d4e75050627a88d424872f)
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
(cherry picked from commit 91264fe3577fe887b4860923fa6142b5274c8965)
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
(cherry picked from commit 1b968b6b9b3aac702ac2f133e0dd16cfdbb415ee)
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
(cherry picked from commit 63ad43566f7a25d140dc723598aeb441ad657eed)
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
(cherry picked from commit 525bc2a32c9710df40371f951217c6ae7a923aee)
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
(cherry picked from commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77)
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
(cherry picked from commit 1da50d4bda07f04135dca39f40e79fc9eabed1f8)
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_avx2. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87)
Whenever a sub-make is created, it inherits the variable subdirs from its
parent. This is also true when make check is called with a restricted
list of subdirs. In this scenario, make install is executed "partially"
and testroot.pristine ends up with an incomplete installation.
[BZ #24794]
* Makefile (testroot.pristine/install.stamp): Pass
subdirs='$(all-subdirs)' to make install.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 35e038c1d2ccb3a75395662f9c4f28d85a61444f)
When parse_tunables tries to erase a tunable marked as SXID_ERASE for
setuid programs, it ends up setting the envvar string iterator
incorrectly, because of which it may parse the next tunable
incorrectly. Given that currently the implementation allows malformed
and unrecognized tunables pass through, it may even allow SXID_ERASE
tunables to go through.
This change revamps the SXID_ERASE implementation so that:
- Only valid tunables are written back to the tunestr string, because
of which children of SXID programs will only inherit a clean list of
identified tunables that are not SXID_ERASE.
- Unrecognized tunables get scrubbed off from the environment and
subsequently from the child environment.
- This has the side-effect that a tunable that is not identified by
the setxid binary, will not be passed on to a non-setxid child even
if the child could have identified that tunable. This may break
applications that expect this behaviour but expecting such tunables
to cross the SXID boundary is wrong.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 2ed18c5b534d9e92fc006202a5af0df6b72e7aca)
Instead of passing GLIBC_TUNABLES via the environment, pass the
environment variable from parent to child. This allows us to test
multiple variables to ensure better coverage.
The test list currently only includes the case that's already being
tested. More tests will be added later.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 061fe3f8add46a89b7453e87eabb9c4695005ced)
Use the support_capture_subprogram_self_sgid to spawn an sgid child.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ca335281068a1ed549a75ee64f90a8310755956f)
Add a new function support_capture_subprogram_self_sgid that spawns an
sgid child of the running program with its own image and returns the
exit code of the child process. This functionality is used by at
least three tests in the testsuite at the moment, so it makes sense to
consolidate.
There is also a new function support_subprogram_wait which should
provide simple system() like functionality that does not set up file
actions. This is useful in cases where only the return code of the
spawned subprocess is interesting.
This patch also ports tst-secure-getenv to this new function. A
subsequent patch will port other tests. This also brings an important
change to tst-secure-getenv behaviour. Now instead of succeeding, the
test fails as UNSUPPORTED if it is unable to spawn a setgid child,
which is how it should have been in the first place.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 716a3bdc41b2b4b864dc64475015ba51e35e1273)
- Add a newline to the end of error messages in transfer().
- Fixed the name of support_subprocess_init().
(cherry picked from commit 95c68080a3ded882789b1629f872c3ad531efda0)
Pass environ to posix_spawn so that the child process can inherit
environment of the test.
(cherry picked from commit e958490f8c74e660bd93c128b3bea746e268f3f6)
In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free
was fixed, but this led to an occasional double-free. This patch
tracks the "live" allocation better.
Tested manually by a third party.
Related: RHBZ 1927877
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit dca565886b5e8bd7966e15f0ca42ee5cff686673)
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely. This
is enforced by the first of the two asserts in iconv/skeleton.c:
/* We must run out of output buffer space in this
rerun. */
assert (outbuf == outerr);
assert (nstatus == __GCONV_FULL_OUTPUT);
This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).
Reported-by: Tavis Ormandy <taviso@gmail.com>
(cherry picked from commit 7d88c6142c6efc160c0ee5e4f85cde382c072888)
Calling an IFUNC function defined in unrelocated executable also leads to
segfault. Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.
On x86, ifuncmain6pie failed with:
[hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency.
[hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
00003ff4 00000706 R_386_GLOB_DAT 0000400c foo_ptr
00003ff8 00000406 R_386_GLOB_DAT 00000000 foo
0000400c 00000401 R_386_32 00000000 foo
[hjl@gnu-cfl-2 build-i686-linux]$
Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which
trigger the circular IFUNC dependency, and build ifuncmain6pie with
-Wl,-z,lazy.
(cherry picked from commits 6ea5b57afa5cdc9ce367d2b69a2cebfb273e4617
and 7137d682ebfcb6db5dfc5f39724718699922f06c)
Update dl_cet_check() to set header.feature_1 in TCB when both IBT and
SHSTK are always on.
(cherry picked from commit 2ef23b520597f4ea1790a669b83e608f24f4cf12)
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
cmpl $63, %ecx
// Don't use "rep movsb" if ECX <= 63
jbe L(Don't use rep movsb")
Use "rep movsb"
Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
(cherry picked from commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5)
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Fixes bug 26798.
(cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)
Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as
the memcpy/memmove ifunc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit e11ed9d2b4558eeacff81557dc9557001af42a6b)
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
(cherry picked from commit 700065132744e0dfa6d4d9142d63f6e3a1934726)
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S
Tested against GLIBC testsuite and randomized tests.
(cherry picked from commit b9f145df85145506f8e61bac38b792584a38d88f)
Rename IS_ARES to IS_NEOVERSE_N1 since that is a bit clearer.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 0f6278a8793a5d04ea31878119eccf99f469a02d)
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit bd394d131c10c9ec22c6424197b79410042eed99)
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize. Similar to the generic memcpy, small cases
include copies up to 32 bytes. 64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies. Large copies align
the source rather than the destination.
bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).
Passes GLIBC regression tests.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 4a733bf375238a6a595033b5785cea7f27d61307)
Given almost all uses of ENTRY are for string/memory functions,
align ENTRY to a cacheline to simplify things.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 34f0d01d5e43c7dedd002ab47f6266dfb5b79c22)
strcmp-avx2.S: In avx2 strncmp function, strings are compared in
chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4
vector size comparison, code must check whether it already passed
the given offset. This patch implement avx2 offset check condition
for strncmp function, if both string compare same for first 4 vector
size.
(cherry picked from commit 75870237ff3bb363447b03f4b0af100227570910)
During cleanup, before returning from get*_r functions, the end*ent
calls must not change errno. Otherwise, an ERANGE error from the
underlying implementation can be hidden, causing unexpected lookup
failures. This commit introduces an internal_end*ent_noerror
function which saves and restore errno, and marks the original
internal_end*ent function as warn_unused_result, so that it is used
only in contexts were errors from it can be handled explicitly.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 790b8dda4455865cb8c3a47801f4304c1a43baf6)
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit
d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
(cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)
The value of `end_name' points into the value of `dirname', thus don't
deallocate the latter before the last use of the former.
(cherry picked from commit ddc650e9b3dc916eab417ce9f79e67337b05035c)
This fixes commit 9333498794cde1d5cca518bad ("Avoid ldbl-96 stack
corruption from range reduction of pseudo-zero (bug 25487).").
(cherry picked from commit c10acd40262486dac597001aecc20ad9d3bd0e4a)
Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero
argument (an representation where all the significand bits, including
the explicit high bit, are zero, but the exponent is not zero, which
is not a valid representation for the long double type).
Although this is not a valid long double representation, existing
practice in this area (see bug 4586, originally marked invalid but
subsequently fixed) is that we still seek to avoid invalid memory
accesses as a result, in case of programs that treat arbitrary binary
data as long double representations, although the invalid
representations of the ldbl-96 format do not need to be consistently
handled the same as any particular valid representation.
This patch makes the range reduction detect pseudo-zero and unnormal
representations that would otherwise go to __kernel_rem_pio2, and
returns a NaN for them instead of continuing with the range reduction
process. (Pseudo-zero and unnormal representations whose unbiased
exponent is less than -1 have already been safely returned from the
function before this point without going through the rest of range
reduction.) Pseudo-zero representations would previously result in
the value passed to __kernel_rem_pio2 being all-zero, which is
definitely unsafe; unnormal representations would previously result in
a value passed whose high bit is zero, which might well be unsafe
since that is not a form of input expected by __kernel_rem_pio2.
Tested for x86_64.
(cherry picked from commit 9333498794cde1d5cca518badf79533a24114b6f)
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 1f7525d924b608a3e43b10fcfb3d46b8a6e9e4f9)
Without the asm redirects, strchr et al. are not const-correct.
libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.
This change does not impact current GCC (with libstdc++ or libc++).
(cherry picked from commit 953ceff17a4a15b10cfdd5edc3c8cae4884c8ec3)