2031 Commits

Author SHA1 Message Date
Florian Weimer
53b3a3a26f elf: Support recursive use of dynamic TLS in interposed malloc
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS).  The previous workaround from
commit afe42e935b3ee97bac9a7064157587777259c60e ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.

This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update.  This
is possible because an interposed malloc has a fixed module ID and
TLS slot.  (It cannot be unloaded.)  If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc.  It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.

The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.

All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.

Fixes commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow
tls access after dlopen [BZ #19924]").

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 018f0fc3b818d4d1460a4e2384c24802504b1d20)
2025-01-11 08:19:41 -08:00
Szabolcs Nagy
147a830307 elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit d2123d68275acc0f061e73d5f86ca504e0d5a344)
2025-01-10 08:49:31 -08:00
Florian Weimer
b41034cebf Add mremap tests
Add tests for MREMAP_MAYMOVE and MREMAP_FIXED.  On Linux, also test
MREMAP_DONTUNMAP.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit ff0320bec2810192d453c579623482fab87bfa01)
2024-08-01 17:15:36 +02:00
Florian Weimer
eeeaf0fe2d login: Check default sizes of structs utmp, utmpx, lastlog
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)
2024-05-02 19:30:48 +02:00
Florian Weimer
dc3b5b9048 Revert "elf: Always call destructors in reverse constructor order (bug 30785)"
This reverts commit 5d83a52a4905405792418d6a0f3afa3601a98c37.

Reason for revert: Incompatibility with existing applications.
2023-10-18 14:31:02 +02:00
Florian Weimer
5d83a52a49 elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order.  These destructors
can call dlopen.  Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe5f5905ffd5d100e5e9ca8ad62 ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 6985865bc3ad5b23147ee73466583dd7fdf65892)
2023-09-11 09:36:01 +02:00
Florian Weimer
b6c7135576 elf: Introduce to _dl_call_fini
This consolidates the destructor invocations from _dl_fini and
dlclose.  Remove the micro-optimization that avoids
calling _dl_call_fini if they are no destructors (as dlclose is quite
expensive anyway).  The debug log message is now printed
unconditionally.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-11 09:35:47 +02:00
Adhemerval Zanella
2628500f5d m68k: Enforce 4-byte alignment on internal locks (BZ #29537)
A new internal definition, __LIBC_LOCK_ALIGNMENT, is used to force
the 4-byte alignment only for m68k, other architecture keep the
natural alignment of the type used internally (and hppa does not
require 16-byte alignment for kernel-assisted CAS).

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit aeb4d2e9815d459e2640a31f5abb8ef803830107)
2022-09-21 12:01:47 -03:00
Florian Weimer
d1241cf001 elf: Rename _dl_sort_maps parameter from skip to force_first
The new implementation will not be able to skip an arbitrary number
of objects.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit dbb75513f5cf9285c77c9e55777c5c35b653f890)
2022-09-20 11:04:44 +02:00
Jason A. Donenfeld
eaad4f9e8f arc4random: simplify design for better safety
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-07-27 08:58:27 -03:00
Adhemerval Zanella Netto
4c128c7823 aarch64: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used as default and only
little-endian is supported (BE uses generic code).

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               380.89
arc4random_buf(16) [single-thread]       500.73
arc4random_buf(32) [single-thread]       552.61
arc4random_buf(48) [single-thread]       566.82
arc4random_buf(64) [single-thread]       574.01
arc4random_buf(80) [single-thread]       581.02
arc4random_buf(96) [single-thread]       591.19
arc4random_buf(112) [single-thread]      592.29
arc4random_buf(128) [single-thread]      596.43
-----------------------------------------------

OPTIMIZED                                  MB/s
-----------------------------------------------
arc4random [single-thread]               569.60
arc4random_buf(16) [single-thread]       825.78
arc4random_buf(32) [single-thread]       987.03
arc4random_buf(48) [single-thread]      1042.39
arc4random_buf(64) [single-thread]      1075.50
arc4random_buf(80) [single-thread]      1094.68
arc4random_buf(96) [single-thread]      1130.16
arc4random_buf(112) [single-thread]     1129.58
arc4random_buf(128) [single-thread]     1137.91
-----------------------------------------------

Checked on aarch64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
6f4e0fcfa2 stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.

To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.  This strategy is similar to what OpenBSD arc4random
does.

The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.

The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit.  It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte.  Depending of the upper bound requested,
it might lead to better CPU utilization.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>

[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-22 11:58:27 -03:00
Adhemerval Zanella
8ee2c043cf Fix hurd namespace issues for internal signal functions
It was introduced by "Refactor internal-signals.h
(a1bdd81664aa681364d)".  Use the internal symbols instead.

Checked with a build for i686-gnu.
2022-07-04 11:10:06 -03:00
Adhemerval Zanella
a1bdd81664 Refactor internal-signals.h
The main drive is to optimize the internal usage and required size
when sigset_t is embedded in other data structures.  On Linux, the
current supported signal set requires up to 8 bytes (16 on mips),
was lower than the user defined sigset_t (128 bytes).

A new internal type internal_sigset_t is added, along with the
functions to operate on it similar to the ones for sigset_t.  The
internal-signals.h is also refactored to remove unused functions

Besides small stack usage on some functions (posix_spawn, abort)
it lower the struct pthread by about 120 bytes (112 on mips).

Checked on x86_64-linux-gnu.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-06-30 14:56:21 -03:00
Fangrui Song
de38b2a343 elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
If an executable has copy relocations for extern protected data, that
can only work if the library containing the definition is built with
assumptions (a) the compiler emits GOT-generating relocations (b) the
linker produces R_*_GLOB_DAT instead of R_*_RELATIVE.  Otherwise the
library uses its own definition directly and the executable accesses a
stale copy.  Note: the GOT relocations defeat the purpose of protected
visibility as an optimization, but allow rtld to make the executable and
library use the same copy when copy relocations are present, but it
turns out this never worked perfectly.

ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both
a.so and b.so define protected var and the executable copy relocates
var: b.so accesses its own copy even with GLOB_DAT.  The behavior change
is from commit 62da1e3b00b51383ffa7efc89d8addda0502e107 (x86) and then
copied to nios2 (ae5eae7cfc9c4a8297ff82ec6b794faca1976ecc) and arc
(0e7d930c4c11de896fe807f67fa1eb756c9c1e05).

Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy
relocated data like a.so.

There is now a warning for copy relocation on protected symbol since
commit 7374c02b683b7110b853a32496a619410364d70b.  It's extremely
unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
behavior, so let's remove it: this removes a check in the symbol lookup
code.
2022-06-15 11:29:55 -07:00
Fangrui Song
7374c02b68 elf: Refine direct extern access diagnostics to protected symbol
Refine commit 349b0441dab375099b1d7f6909c1742286a67da9:

1. Copy relocations for extern protected data do not work properly,
regardless whether GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS is used.
It makes sense to produce a warning unconditionally.

2. Non-zero value of an undefined function symbol may break pointer
equality, but may be benign in many cases (many programs don't take the
address in the shared object then compare it with the address in the
executable).  Reword the diagnostic to be clearer.

3. Remove the unneeded condition !(undef_map->l_1_needed &
GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS). If the executable does
not not have GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (can only
occur in error cases), the diagnostic should be emitted as well.

When the defining shared object has
GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS, report an error to apply
the intended enforcement.
2022-06-14 13:07:27 -07:00
Adhemerval Zanella
81e7fdd7cc elf: Remove _dl_skip_args
Now that no architecture uses it anymore.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-05-30 16:33:54 -03:00
Adhemerval Zanella
efeb2bd1ab math: Add math-use-builtins-fabs (BZ#29027)
Both float, double, and _Float128 are assumed to be supported
(float and double already only uses builtins).  Only long double
is parametrized due GCC bug 29253 which prevents its usage on
powerpc.

It allows to remove i686, ia64, x86_64, powerpc, and sparc arch
specific implementation.

On ia64 it also fixes the sNAN handling:

  math/test-float64x-fabs
  math/test-ldouble-fabs

Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
2022-05-23 17:49:18 -03:00
Noah Goldstein
9a421348cd elf: Optimize _dl_new_hash in dl-new-hash.h
Unroll slightly and enforce good instruction scheduling. This improves
performance on out-of-order machines. The unrolling allows for
pipelined multiplies.

As well, as an optional sysdep, reorder the operations and prevent
reassosiation for better scheduling and higher ILP. This commit
only adds the barrier for x86, although it should be either no
change or a win for any architecture.

Unrolling further started to induce slowdowns for sizes [0, 4]
but can help the loop so if larger sizes are the target further
unrolling can be beneficial.

Results for _dl_new_hash
Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

Time as Geometric Mean of N=30 runs
Geometric of all benchmark New / Old: 0.674
  type, length, New Time, Old Time, New Time / Old Time
 fixed,      0,    2.865,     2.72,               1.053
 fixed,      1,    3.567,    2.489,               1.433
 fixed,      2,    2.577,    3.649,               0.706
 fixed,      3,    3.644,    5.983,               0.609
 fixed,      4,    4.211,    6.833,               0.616
 fixed,      5,    4.741,    9.372,               0.506
 fixed,      6,    5.415,    9.561,               0.566
 fixed,      7,    6.649,   10.789,               0.616
 fixed,      8,    8.081,   11.808,               0.684
 fixed,      9,    8.427,   12.935,               0.651
 fixed,     10,    8.673,   14.134,               0.614
 fixed,     11,    10.69,   15.408,               0.694
 fixed,     12,   10.789,   16.982,               0.635
 fixed,     13,   12.169,   18.411,               0.661
 fixed,     14,   12.659,   19.914,               0.636
 fixed,     15,   13.526,   21.541,               0.628
 fixed,     16,   14.211,   23.088,               0.616
 fixed,     32,   29.412,   52.722,               0.558
 fixed,     64,    65.41,  142.351,               0.459
 fixed,    128,  138.505,  295.625,               0.469
 fixed,    256,  291.707,  601.983,               0.485
random,      2,   12.698,   12.849,               0.988
random,      4,   16.065,   15.857,               1.013
random,      8,   19.564,   21.105,               0.927
random,     16,   23.919,   26.823,               0.892
random,     32,   31.987,   39.591,               0.808
random,     64,   49.282,   71.487,               0.689
random,    128,    82.23,  145.364,               0.566
random,    256,  152.209,  298.434,                0.51

Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-05-23 10:38:40 -05:00
Adhemerval Zanella
32dd8c251a linux: Add pidfd_getfd
This was added on Linux 5.6 (8649c322f75c96e7ced2fec201e123b2b073bf09)
as a way to retrieve a file descriptors for another process though
pidfd (created either with CLONE_PIDFD or pidfd_getfd).  The
functionality is similar to recvmmsg SCM_RIGHTS.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-05-17 10:33:07 -03:00
Szabolcs Nagy
86147bbeec rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const
_dl_skip_args is always 0, so the target specific code that modifies
argv after relro protection is applied is no longer used.

After the patch relro protection is applied to _dl_argv consistently
on all targets.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-17 10:14:03 +01:00
Adhemerval Zanella
d2db60d8d8 Remove dl-librecon.h header.
The Linux version used by i686 and m68k provide three overrrides for
generic code:

  1. DISTINGUISH_LIB_VERSIONS to print additional information when
     libc5 is used by a dependency.

  2. EXTRA_LD_ENVVARS to that enabled LD_LIBRARY_VERSION environment
     variable.

  3. EXTRA_UNSECURE_ENVVARS to add two environment variables related
     to aout support.

None are really requires, it has some decades since libc5 or aout
suppported was removed and Linux even remove support for aout files.
The LD_LIBRARY_VERSION is also dead code, dl_correct_cache_id is not
used anywhere.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-05-16 15:03:49 -03:00
Adhemerval Zanella
c628c22963 elf: Remove ldconfig kernel version check
Now that it was removed on libc.so.
2022-05-16 15:03:49 -03:00
Adhemerval Zanella
b46d250656 Remove kernel version check
The kernel version check is used to avoid glibc to run on older
kernels where some syscall are not available and fallback code are
not enabled to handle graciously fail.  However, it does not prevent
if the kernel does not correctly advertise its version through
vDSO note, uname or procfs.

Also kernel version checks are sometime not desirable by users,
where they want to deploy on different system with different kernel
version knowing the minimum set of syscall is always presented on
such systems.

The kernel version check has been removed along with the
LD_ASSUME_KERNEL environment variable.  The minimum kernel used to
built glibc is still provided through NT_GNU_ABI_TAG ELF note and
also printed when libc.so is issued.

Checked on x86_64-linux-gnu.
2022-05-16 15:03:49 -03:00
Florian Weimer
f787e138aa csu: Implement and use _dl_early_allocate during static startup
This implements mmap fallback for a brk failure during TLS
allocation.

scripts/tls-elf-edit.py is updated to support the new patching method.
The script no longer requires that in the input object is of ET_DYN
type.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-05-16 18:42:03 +02:00
Adhemerval Zanella
6fad891dfd stdio: Remove the usage of $(fno-unit-at-a-time) for siglist.c
The siglist.c is built with -fno-toplevel-reorder to avoid compiler
to reorder the compat assembly directives due an assembler
issue [1] (fixed on 2.39).

This patch removes the compiler flags by split the compat symbol
generation in two phases.  First the __sys_siglist and __sys_sigabbrev
without any compat symbol directive is preprocessed to generate an
assembly source code.  This generate assembly is then used as input
on a platform agnostic siglist.S which then creates the compat
definitions.  This prevents compiler to move any compat directive
prior the _sys_errlist definition itself.

Checked on a make check run-built-tests=no on all affected ABIs.

Reviewed-by: Fangrui Song <maskray@google.com>
2022-05-13 10:54:41 -03:00
Noah Goldstein
911c63a51c sysdeps: Add 'get_fast_jitter' interace in fast-jitter.h
'get_fast_jitter' is meant to be used purely for performance
purposes. In all cases it's used it should be acceptable to get no
randomness (see default case). An example use case is in setting
jitter for retries between threads at a lock. There is a
performance benefit to having jitter, but only if the jitter can
be generated very quickly and ultimately there is no serious issue
if no jitter is generated.

The implementation generally uses 'HP_TIMING_NOW' iff it is
inlined (avoid any potential syscall paths).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-27 17:17:43 -05:00
Fangrui Song
693517b922 elf: Remove unused enum allowmask
Unused since 52a01100ad011293197637e42b5be1a479a2f4ae
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-25 01:01:02 -07:00
Fangrui Song
3e9acce8c5 elf: Remove __libc_init_secure
After 73fc4e28b9464f0e13edc719a5372839970e7ddb,
__libc_enable_secure_decided is always 0 and a statically linked
executable may overwrite __libc_enable_secure without considering
AT_SECURE.

The __libc_enable_secure has been correctly initialized in _dl_aux_init,
so just remove __libc_enable_secure_decided and __libc_init_secure.
This allows us to remove some startup_get*id functions from
22b79ed7f413cd980a7af0cf258da5bf82b6d5e5.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-19 15:52:27 -07:00
Szabolcs Nagy
707efc2955 Remove _dl_skip_args_internal declaration
It does not seem to be used.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-12 14:42:26 +01:00
Adhemerval Zanella
f60e45ba10 elf: Remove inline _dl_dprintf
It is not used on rtld and ldsodef interfaces are meant to be used
solely on loader.  It also removes the only usage of gcc extension
__builtin_va_arg_pack.
2022-03-23 10:42:01 -03:00
Adhemerval Zanella
144761540a elf: Remove LD_USE_LOAD_BIAS
It is solely for prelink with PIE executables [1].

[1] https://sourceware.org/legacy-ml/libc-hacker/2003-11/msg00127.html

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 09:18:15 -03:00
Adhemerval Zanella
6628c742b2 elf: Remove prelink support
Prelinked binaries and libraries still work, the dynamic tags
DT_GNU_PRELINKED, DT_GNU_LIBLIST, DT_GNU_CONFLICT just ignored
(meaning the process is reallocated as default).

The loader environment variable TRACE_PRELINKING is also removed,
since it used solely on prelink.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 09:16:12 -03:00
Samuel Thibault
355bc7f736 SET_RELHOOK: merge i386 and x86_64, and move to sysdeps/mach/hurd/x86
It is not Hurd-specific, but H.J. Lu wants it there.

Also, dc.a can be used to avoid hardcoding .long vs .quad and thus use
the same implementation for i386 and x86_64.
2022-02-01 20:08:25 +00:00
Ben Woodard
ce9a68c57c elf: Fix runtime linker auditing on aarch64 (BZ #26643)
The rtld audit support show two problems on aarch64:

  1. _dl_runtime_resolve does not preserve x8, the indirect result
      location register, which might generate wrong result calls
      depending of the function signature.

  2. The NEON Q registers pushed onto the stack by _dl_runtime_resolve
     were twice the size of D registers extracted from the stack frame by
     _dl_runtime_profile.

While 2. might result in wrong information passed on the PLT tracing,
1. generates wrong runtime behaviour.

The aarch64 rtld audit support is changed to:

  * Both La_aarch64_regs and La_aarch64_retval are expanded to include
    both x8 and the full sized NEON V registers, as defined by the
    ABI.

  * dl_runtime_profile needed to extract registers saved by
    _dl_runtime_resolve and put them into the new correctly sized
    La_aarch64_regs structure.

  * The LAV_CURRENT check is change to only accept new audit modules
    to avoid the undefined behavior of not save/restore x8.

  * Different than other architectures, audit modules older than
    LAV_CURRENT are rejected (both La_aarch64_regs and La_aarch64_retval
    changed their layout and there are no requirements to support multiple
    audit interface with the inherent aarch64 issues).

  * A new field is also reserved on both La_aarch64_regs and
    La_aarch64_retval to support variant pcs symbols.

Similar to x86, a new La_aarch64_vector type to represent the NEON
register is added on the La_aarch64_regs (so each type can be accessed
directly).

Since LAV_CURRENT was already bumped to support bind-now, there is
no need to increase it again.

Checked on aarch64-linux-gnu.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-02-01 14:49:46 -03:00
Adhemerval Zanella
32612615c5 elf: Issue la_symbind for bind-now (BZ #23734)
The audit symbind callback is not called for binaries built with
-Wl,-z,now or when LD_BIND_NOW=1 is used, nor the PLT tracking callbacks
(plt_enter and plt_exit) since this would change the expected
program semantics (where no PLT is expected) and would have performance
implications (such as for BZ#15533).

LAV_CURRENT is also bumped to indicate the audit ABI change (where
la_symbind flags are set by the loader to indicate no possible PLT
trace).

To handle powerpc64 ELFv1 function descriptor, _dl_audit_symbind
requires to know whether bind-now is used so the symbol value is
updated to function text segment instead of the OPD (for lazy binding
this is done by PPC64_LOAD_FUNCPTR on _dl_runtime_resolve).

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
powerpc64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-02-01 14:49:46 -03:00
Adhemerval Zanella
254d3d5aef elf: Fix initial-exec TLS access on audit modules (BZ #28096)
For audit modules and dependencies with initial-exec TLS, we can not
set the initial TLS image on default loader initialization because it
would already be set by the audit setup.  However, subsequent thread
creation would need to follow the default behaviour.

This patch fixes it by setting l_auditing link_map field not only
for the audit modules, but also for all its dependencies.  This is
used on _dl_allocate_tls_init to avoid the static TLS initialization
at load time.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-02-01 14:49:46 -03:00
H.J. Lu
3fb18fd80c elf: Add <dl-r_debug.h>
Add <dl-r_debug.h> to get the adddress of the r_debug structure after
relocation and its offset before relocation from the PT_DYNAMIC segment
to support DT_DEBUG, DT_MIPS_RLD_MAP_REL and DT_MIPS_RLD_MAP.

Co-developed-by: Xi Ruoyao <xry111@mengyan1223.wang>
2022-01-31 07:05:48 -08:00
Adhemerval Zanella
9fe6f63638 elf: Fix 64 time_t support for installed statically binaries
The usage of internal static symbol for statically linked binaries
does not work correctly for objects built with -D_TIME_BITS=64,
since the internal definition does not provide the expected aliases.

This patch makes it to use the default stat functions instead (which
uses the default 64 time_t alias and types).

Checked on i686-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-01-17 10:57:09 -03:00
Adhemerval Zanella
cedd498dbc Revert "elf: Fix 64 time_t support for installed statically binaries"
This reverts commit 0b8e83eb1455f3c0332eeb1f96fbc262fbd054e0.
2022-01-17 10:56:58 -03:00
Adhemerval Zanella
0b8e83eb14 elf: Fix 64 time_t support for installed statically binaries
The usage of internal static symbol for statically linked binaries
does not work correctly for objects built with -D_TIME_BITS=64,
since the internal definition does not provide the expected aliases.

This patch makes it to use the default stat functions instead (which
uses the default 64 time_t alias and types).

Checked on i686-linux-gnu.
2022-01-12 10:30:10 -03:00
Szabolcs Nagy
347a5b592c math: Fix float conversion regressions with gcc-12 [BZ #28713]
Converting double precision constants to float is now affected by the
runtime dynamic rounding mode instead of being evaluated at compile
time with default rounding mode (except static object initializers).

This can change the computed result and cause performance regression.
The known correctness issues (increased ulp errors) are already fixed,
this patch fixes remaining cases of unnecessary runtime conversions.

Add float M_* macros to math.h as new GNU extension API.  To avoid
conversions the new M_* macros are used and instead of casting double
literals to float, use float literals (only required if the conversion
is inexact).

The patch was tested on aarch64 where the following symbols had new
spurious conversion instructions that got fixed:

  __clog10f
  __gammaf_r_finite@GLIBC_2.17
  __j0f_finite@GLIBC_2.17
  __j1f_finite@GLIBC_2.17
  __jnf_finite@GLIBC_2.17
  __kernel_casinhf
  __lgamma_negf
  __log1pf
  __y0f_finite@GLIBC_2.17
  __y1f_finite@GLIBC_2.17
  cacosf
  cacoshf
  casinhf
  catanf
  catanhf
  clogf
  gammaf_positive

Fixes bug 28713.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2022-01-10 14:27:17 +00:00
Samuel Thibault
d5b0046e3d ttydefaults.h: Fix CSTATUS to control-t
4.4BSD actually defaults CSTATUS to control-t, so our generic header should
as well.
2022-01-07 00:23:05 +01:00
Adhemerval Zanella
65ccd641ba debug: Remove catchsegv and libSegfault (BZ #14913)
Trapping SIGSEGV within the process is error-prone, adds security
issues, and modern analysis design tends to happen out of the
process (either by attaching a debugger or by post-mortem analysis).

The libSegfault also has some design problems, it uses non
async-signal-safe function (backtrace) on signal handler.

There are multiple alternatives if users do want to use similar
functionality, such as sigsegv gnulib module or libsegfault.
2022-01-06 07:59:49 -03:00
H.J. Lu
9288c92d00 elf: Add <dl-debug.h>
Add <dl-debug.h> to setup debugging entry in PT_DYNAMIC segment to support
DT_DEBUG, DT_MIPS_RLD_MAP_REL and DT_MIPS_RLD_MAP.

Tested on x86-64, x32 and i686 as well as with build-many-glibcs.py.
2022-01-03 05:16:03 -08:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Florian Weimer
5d28a8962d elf: Add _dl_find_object function
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-28 22:52:56 +01:00
Adhemerval Zanella
83b8d5027d malloc: Remove memusage.h
And use machine-sp.h instead.  The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
a75b1e35c5 malloc: Use hp-timing on libmemusage
Instead of reimplemeting on GETTIME macro.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
5a5f7a160d malloc: Remove atomic_* usage
These typedef are used solely on memusage and can be replaced with
generic types.
2021-12-28 14:57:57 -03:00