Based on the glibc microbenchmark, only a few short inputs with this
strncmp-aligned and strncmp-lsx implementation experience performance
degradation, overall, strncmp-aligned could reduce the runtime 0%-10%
for aligned comparision, 10%-25% for unaligend comparision, strncmp-lsx
could reduce the runtime about 0%-60%.
Based on the glibc microbenchmark, strcmp-aligned implementation could
reduce the runtime 0%-10% for aligned comparison, 10%-20% for unaligned
comparison, strcmp-lsx implemenation could reduce the runtime 0%-50%.
Based on the glibc microbenchmark, strnlen-aligned implementation could
reduce the runtime more than 10%, strnlen-lsx implementation could reduce
the runtime about 50%-78%, strnlen-lasx implementation could reduce the
runtime about 50%-88%.
The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value. Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On Skylake, it changes log1p bench performance by:
Before After Improvement
max 63.349 58.347 8%
min 4.448 5.651 -30%
mean 12.0674 10.336 14%
The minimum code path is
if (hx < 0x3FDA827A) /* x < 0.41422 */
{
if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */
{
...
}
if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */
{
math_force_eval (two54 + x); /* raise inexact */
if (ax < 0x3c900000) /* |x| < 2**-54 */
{
...
}
else
return x - x * x * 0.5;
FMA and non-FMA code sequences look similar. Non-FMA version is slightly
faster. Since log1p is called by asinh and atanh, it improves asinh
performance by:
Before After Improvement
max 75.645 63.135 16%
min 10.074 10.071 0%
mean 15.9483 14.9089 6%
and improves atanh performance by:
Before After Improvement
max 91.768 75.081 18%
min 15.548 13.883 10%
mean 18.3713 16.8011 8%
When building with fortify enabled, GCC < 12 issues a warning on the
fortify strncat wrapper might overflow the destination buffer (the
failure is tied to -Werror).
Checked on ppc64 and x86_64.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The static PIE configure check uses link tests. When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so. As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.
See also commit 368b7c614b
S390: Use compile-only instead of also link-tests in configure.
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx reduces the runtime about 8%-76%
memcpy-lsx reduces the runtime about 8%-72%
memcpy-unaligned reduces the runtime of unaligned data copying up to 40%
memcpy-aligned reduece the runtime of unaligned data copying up to 25%
memmove-lasx reduces the runtime about 20%-73%
memmove-lsx reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned reduces the runtime of unaligned data moving up to 25%
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx reduces the runtime about 50%-83%
strchr-lsx reduces the runtime about 30%-67%
strchr-aligned reduces the runtime about 10%-20%
strchrnul-lasx reduces the runtime about 50%-83%
strchrnul-lsx reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
On i686 f_type is an i32 so the test fails when that has the top bit set.
Explicitly cast to u32.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Commit 78ceef25d6 ("configure: Remove
--enable-all-warnings option") removed it due to a missing +.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
On the test workload (mpv --cache=yes with VP9 video decoding), the
bin scanning has a very poor success rate (less than 2%). The tcache
scanning has about 50% success rate, so keep that.
Update comments in malloc/tst-memalign-2 to indicate the purpose
of the tests. Even with the scanning removed, the additional
merging opportunities since commit 542b110585
("malloc: Enable merging of remainders in memalign (bug 30723)")
are sufficient to pass the existing large bins test.
Remove leftover variables from _int_free from refactoring in the
same commit.
Reviewed-by: DJ Delorie <dj@redhat.com>
On Skylake, it improves expm1 bench performance by:
Before After Improvement
max 70.204 68.054 3%
min 20.709 16.2 22%
mean 22.1221 16.7367 24%
NB: Add
extern long double __expm1l (long double);
extern long double __expm1f128 (long double);
for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess
LoongArch glibc can add some LASX/LSX vector instructions codes,
change the required minimum binutils version to 2.41 which could
support vector instructions. HAVE_LOONGARCH_VEC_ASM is removed
accordingly.
The following usage of macro LEAF/ENTRY are all feasible:
1. LEAF(fcn) -- the align value of fcn is .align 3(default value)
2. LEAF(fcn, 6) -- the align value of fcn is .align 6
The:
```
if (shared_per_thread > 0 && threads > 0)
shared_per_thread /= threads;
```
Code was accidentally moved to inside the else scope. This doesn't
match how it was previously (before af992e7abd).
This patch fixes that by putting the division after the `else` block.
Previously, calling _int_free from _int_memalign could put remainders
into the tcache or into fastbins, where they are invisible to the
low-level allocator. This results in missed merge opportunities
because once these freed chunks become available to the low-level
allocator, further memalign allocations (even of the same size are)
likely obstructing merges.
Furthermore, during forwards merging in _int_memalign, do not
completely give up when the remainder is too small to serve as a
chunk on its own. We can still give it back if it can be merged
with the following unused chunk. This makes it more likely that
memalign calls in a loop achieve a compact memory layout,
independently of initial heap layout.
Drop some useless (unsigned long) casts along the way, and tweak
the style to more closely match GNU on changed lines.
Reviewed-by: DJ Delorie <dj@redhat.com>
The nscd daemon caches hosts data from NSS modules verbatim, without
filtering protocol families or sorting them (otherwise separate caches
would be needed for certain ai_flags combinations). The cache
implementation is complete separate from the getaddrinfo code. This
means that rebuilding getaddrinfo is not needed. The only function
actually used is __bump_nl_timestamp from check_pf.c, and this change
moves it into nscd/connections.c.
Tested on x86_64-linux-gnu with -fexceptions, built with
build-many-glibcs.py. I also backported this patch into a distribution
that still supports nscd and verified manually that caching still works.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Since i686 provides the fortified wrappers for memcpy, mempcpy,
memmove, and memset on the same string implementation, the static
build tries to optimized it by not tying the fortified wrappers
to string routine (to avoid pulling the fortify function if
they are not required).
Checked on i686-linux-gnu building with different option:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
With multiarch disabled, the default memmove implementation provides
the fortify routines for memcpy, mempcpy, and memmove. However, it
does not provide the internal hidden definitions used when building
with fortify enabled. The memset has a similar issue.
Checked on x86_64-linux-gnu building with different options:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Linux 6.4 adds new constants PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
and PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG. Add those to all
relevant sys/ptrace.h headers, along with adding the associated
argument structure to bits/ptrace-shared.h (named struct
__ptrace_sud_config there following the usual convention for such
structures).
Tested for x86_64 and with build-many-glibcs.py.
Making error_t defined to enum __error_t_codes conveniently makes the
debugger print symbolic values, but in C++ int is not interoperable with
enum __error_t_codes, leading to C++ application build issues, so let's
revert error_t to int in C++.
This is the only missing part in struct statvfs.
The LSB calls [f]statfs() deprecated, and its weird types are definitely
off-putting. However, its use is required to get f_type.
Instead, allocate one of the six spares to f_type,
copied directly from struct statfs.
This then becomes a small glibc extension to the standard interface
on Linux and the Hurd, instead of two different interfaces, one of which
is quite odd due to being an ABI type, and there no longer is any reason
to use statfs().
The underlying kernel type is a mess, but all architectures agree on u32
(or more) for the ABI, and all filesystem magicks are 32-bit integers.
We don't lose any generality by using u32, and by doing so we both make
the API consistent with the Hurd, and allow C++
switch(f_type) { case RAMFS_MAGIC: ...; }
Also fix tst-statvfs so that it actually fails;
as it stood, all it did was return 0 always.
Test statfs()' and statvfs()' f_types are the same.
Link: https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>