This patch remove the PID cache and usage in current GLIBC code. Current
usage is mainly used a performance optimization to avoid the syscall,
however it adds some issues:
- The exposed clone syscall will try to set pid/tid to make the new
thread somewhat compatible with current GLIBC assumptions. This cause
a set of issue with new workloads and usecases (such as BZ#17214 and
[1]) as well for new internal usage of clone to optimize other algorithms
(such as clone plus CLONE_VM for posix_spawn, BZ#19957).
- The caching complexity also added some bugs in the past [2] [3] and
requires more effort of each port to handle such requirements (for
both clone and vfork implementation).
- Caching performance gain in mainly on getpid and some specific
code paths. The getpid performance leverage is questionable [4],
either by the idea of getpid being a hotspot as for the getpid
implementation itself (if it is indeed a justifiable hotspot a
vDSO symbol could let to a much more simpler solution).
Other usage is mainly for non usual code paths, such as pthread
cancellation signal and handling.
For thread creation (on stack allocation) the code simplification in fact
adds some performance gain due the no need of transverse the stack cache
and invalidate each element pid.
Other thread usages will require a direct getpid syscall, such as
cancellation/setxid signal, thread cancellation, thread fail path (at
create_thread), and thread signal (pthread_kill and pthread_sigqueue).
However these are hardly usual hotspots and I think adding a syscall is
justifiable.
It also simplifies both the clone and vfork arch-specific implementation.
And by review each fork implementation there are some discrepancies that
this patch also solves:
- microblaze clone/vfork does not set/reset the pid/tid field
- hppa uses the default vfork implementation that fallback to fork.
Since vfork is deprecated I do not think we should bother with it.
The patch also removes the TID caching in clone. My understanding for
such semantic is try provide some pthread usage after a user program
issue clone directly (as done by thread creation with CLONE_PARENT_SETTID
and pthread tid member). However, as stated before in multiple discussions
threads, GLIBC provides clone syscalls without further supporting all this
semantics.
I ran a full make check on x86_64, x32, i686, armhf, aarch64, and powerpc64le.
For sparc32, sparc64, and mips I ran the basic fork and vfork tests from
posix/ folder (on a qemu system). So it would require further testing
on alpha, hppa, ia64, m68k, nios2, s390, sh, and tile (I excluded microblaze
because it is already implementing the patch semantic regarding clone/vfork).
[1] https://codereview.chromium.org/800183004/
[2] https://sourceware.org/ml/libc-alpha/2006-07/msg00123.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15368
[4] http://yarchive.net/comp/linux/getpid_caching.html
* sysdeps/nptl/fork.c (__libc_fork): Remove pid cache setting.
* nptl/allocatestack.c (allocate_stack): Likewise.
(__reclaim_stacks): Likewise.
(setxid_signal_thread): Obtain pid through syscall.
* nptl/nptl-init.c (sigcancel_handler): Likewise.
(sighandle_setxid): Likewise.
* nptl/pthread_cancel.c (pthread_cancel): Likewise.
* sysdeps/unix/sysv/linux/pthread_kill.c (__pthread_kill): Likewise.
* sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue):
Likewise.
* sysdeps/unix/sysv/linux/createthread.c (create_thread): Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Remove file.
* nptl/descr.h (struct pthread): Change comment about pid value.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Remove thread
pid assert.
* sysdeps/unix/sysv/linux/pthread-pids.h (__pthread_initialize_pids):
Do not set pid value.
* nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove thread
pid cache check.
* nptl_db/td_thr_validate.c (td_thr_validate): Likewise.
* sysdeps/aarch64/nptl/tcb-offsets.sym: Remove pid offset.
* sysdeps/alpha/nptl/tcb-offsets.sym: Likewise.
* sysdeps/arm/nptl/tcb-offsets.sym: Likewise.
* sysdeps/hppa/nptl/tcb-offsets.sym: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/ia64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/m68k/nptl/tcb-offsets.sym: Likewise.
* sysdeps/microblaze/nptl/tcb-offsets.sym: Likewise.
* sysdeps/mips/nptl/tcb-offsets.sym: Likewise.
* sysdeps/nios2/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/s390/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sparc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/tile/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/unix/sysv/linux/aarch64/clone.S: Remove pid and tid caching.
* sysdeps/unix/sysv/linux/alpha/clone.S: Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/hppa/clone.S: Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/clone2.S: Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sh/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/tile/clone.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/aarch64/vfork.S: Remove pid set and reset.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/i386/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/clone.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/mips/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tile/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tst-clone2.c (f): Remove direct pthread
struct access.
(clone_test): Remove function.
(do_test): Rewrite to take in consideration pid is not cached anymore.
Many headers are expected to expose a subset of the type definitions
in time.h. time.h has a whole bunch of messy logic for conditionally
defining some its types and structs, but, as best I can tell, this
has never worked 100%. In particular, __need_timespec is ineffective
if _TIME_H has already been defined, which means that if you compile
#include <time.h>
#include <sched.h>
with e.g. -fsyntax-only -std=c89 -Wall -Wsystem-headers, you will get
In file included from test.c:2:0:
/usr/include/sched.h:74:57: warning: "struct timespec" declared inside
parameter list will not be visible outside of this definition or declaration
extern int sched_rr_get_interval (__pid_t __pid, struct timespec *__t) __THROW;
^~~~~~~~
And if you want to _use_ sched_rr_get_interval in a TU compiled that
way, you're hosed.
This patch replaces all of that with small bits/types/TYPE.h headers
as introduced earlier. time.h and bits/time.h are now *much* simpler,
and a lot of other headers are slightly simpler.
* time/time.h, bits/time.h, sysdeps/unix/sysv/linux/bits/time.h:
Remove all logic conditional on __need macros. Move all the
conditionally defined types to their own headers...
* time/bits/types/clock_t.h: Define clock_t here.
* time/bits/types/clockid_t.h: Define clockid_t here.
* time/bits/types/struct_itimerspec.h: Define struct itimerspec here.
* time/bits/types/struct_timespec.h: Define struct timespec here.
* time/bits/types/struct_timeval.h: Define struct timeval here.
* time/bits/types/struct_tm.h: Define struct tm here.
* time/bits/types/time_t.h: Define time_t here.
* time/bits/types/timer_t.h: Define timer_t here.
* time/Makefile: Install the new headers.
* bits/resource.h, io/fcntl.h, io/sys/poll.h, io/sys/stat.h
* io/utime.h, misc/sys/select.h, posix/sched.h, posix/sys/times.h
* posix/sys/types.h, resolv/netdb.h, rt/aio.h, rt/mqueue.h
* signal/signal.h, pthread/semaphore.h, sysdeps/nptl/pthread.h
* sysdeps/unix/sysv/linux/alpha/bits/resource.h
* sysdeps/unix/sysv/linux/alpha/sys/acct.h
* sysdeps/unix/sysv/linux/bits/resource.h
* sysdeps/unix/sysv/linux/bits/timex.h
* sysdeps/unix/sysv/linux/mips/bits/resource.h
* sysdeps/unix/sysv/linux/net/ppp_defs.h
* sysdeps/unix/sysv/linux/sparc/bits/resource.h
* sysdeps/unix/sysv/linux/sys/acct.h
* sysdeps/unix/sysv/linux/sys/timerfd.h
* sysvipc/sys/msg.h, sysvipc/sys/sem.h, sysvipc/sys/shm.h
* time/sys/time.h, time/sys/timeb.h
Use the new bits/types headers.
* include/time.h: Remove __need logic.
* include/bits/time.h
* include/bits/types/clock_t.h, include/bits/types/clockid_t.h
* include/bits/types/time_t.h, include/bits/types/timer_t.h
* include/bits/types/struct_itimerspec.h
* include/bits/types/struct_timespec.h
* include/bits/types/struct_timeval.h
* include/bits/types/struct_tm.h:
New wrapper headers.
Nothing depends on the PTW macro anymore, so the mechanism to define
PTW for recompliations of libc routines is no longer needed. The
source files are still recompiled for the nptl directory, just without
the “ptw-” prefix.
(Reducing the number of pattern rules in sysd-rules is critical for
improving make performance.)
Existing interposed mallocs do not define the glibc-internal
fork callbacks (and they should not), so statically interposed
mallocs lead to link failures because the strong reference from
fork pulls in glibc's malloc, resulting in multiple definitions
of malloc-related symbols.
This provides a band-aid and addresses the scenario where fork is
called from a signal handler while the process is in the malloc
subsystem (or has acquired the libio list lock). It does not
address the general issue of async-signal-safety of fork;
multi-threaded processes are not covered, and some glibc
subsystems have fork handlers which are not async-signal-safe.
Previously, a thread M invoking fork would acquire locks in this order:
(M1) malloc arena locks (in the registered fork handler)
(M2) libio list lock
A thread F invoking flush (NULL) would acquire locks in this order:
(F1) libio list lock
(F2) individual _IO_FILE locks
A thread G running getdelim would use this order:
(G1) _IO_FILE lock
(G2) malloc arena lock
After executing (M1), (F1), (G1), none of the threads can make progress.
This commit changes the fork lock order to:
(M'1) libio list lock
(M'2) malloc arena locks
It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.
The previous barrier implementation did not fulfill the POSIX requirements
for when a barrier can be destroyed. Specifically, it was possible that
threads that haven't noticed yet that their round is complete still access
the barrier's memory, and that those accesses can happen after the barrier
has been legally destroyed.
The new algorithm does not have this issue, and it avoids using a lock
internally.
POSIX and C++11 require that a thread can destroy a mutex if no other
thread owns the mutex, is blocked on the mutex, or will try to acquire
it in the future. After destroying the mutex, it can reuse or unmap the
underlying memory. Thus, we must not access a mutex' memory after
releasing it. Currently, we can load the private flag after releasing
the mutex, which is fixed by this patch.
See https://sourceware.org/bugzilla/show_bug.cgi?id=13690 for more
background.
We need to call futex_wake on the lock after releasing it, however. This
is by design, and can lead to spurious wake-ups on unrelated futex words
(e.g., when the mutex memory is reused for another mutex). This behavior
is documented in the glibc-internal futex API and in recent drafts of the
Linux kernel's futex documentation (see the draft_futex branch of
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git).
sysdeps/nptl/configure.ac tests for forced unwind support and the C
cleanup attribute, giving errors if either is unsupported. It does
nothing beyond running those two tests.
Both the attribute, and _Unwind_GetCFA which is used in the forced
unwind test, were added in GCC 3.3. Thus these tests are long
obsolete, and this patch removes the configure fragment running them,
along with associated conditionals.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* sysdeps/nptl/configure.ac: Remove file.
* sysdeps/nptl/configure: Remove generated file.
* configure.ac (libc_cv_forced_unwind): Do not substitute.
* configure: Regenerated.
* config.h.in (HAVE_FORCED_UNWIND): Remove #undef.
* config.make.in (have-forced-unwind): Remove variable.
* nptl/Makefile [$(have-forced-unwind) = yes]: Make code
unconditional.
* nptl/descr.h [HAVE_FORCED_UNWIND]: Likewise.
* nptl/unwind.c [HAVE_FORCED_UNWIND]: Likewise.
(__pthread_unwind) [!HAVE_FORCED_UNWIND]: Remove conditional code.
* nptl/version.c [HAVE_FORCED_UNWIND]: Make code unconditional.
* sysdeps/nptl/Makefile [$(have-forced-unwind) = yes]: Make code
unconditional.
sysdeps/nptl/configure.ac has code to give errors if certain tests in
the top-level configure failed. However, all those failure conditions
also produce errors in the top-level configure, so the errors in the
NPTL configure are completely redundant; this patch removes them.
(As suggested in
<https://sourceware.org/ml/libc-alpha/2015-10/msg00510.html>, I think
the top-level tests in question can be completely removed as
unnecessary given the version tests. But even without that there is
clearly no point in duplicating code that gives an error if the test
fails.)
Tested for x86_64 (testsuite, and that installed shared libraries are
unchanged by the patch).
* sysdeps/nptl/configure.ac: Do not give errors based on the
results of top-level configure tests.
* sysdeps/nptl/configure: Regenerated.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/stdio-lock.h to plain stdio-lock.h to follow
that convention.
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* bits/stdio-lock.h: Move to ...
* sysdeps/generic/stdio-lock.h: ...here.
(_BITS_STDIO_LOCK_H): Rename macro to _STDIO_LOCK_H.
* sysdeps/nptl/bits/stdio-lock.h: Move to ...
* sysdeps/nptl/stdio-lock.h: ...here.
(_BITS_STDIO_LOCK_H): Rename macro to _STDIO_LOCK_H.
* include/libio.h: Include <stdio-lock.h> instead of
<bits/stdio-lock.h>.
* sysdeps/nptl/fork.c: Likewise.
* sysdeps/pthread/flockfile.c: Likewise.
* sysdeps/pthread/ftrylockfile.c: Likewise.
* sysdeps/pthread/funlockfile.c: Likewise.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/libc-tsd.h to plain libc-tsd.h to follow that
convention.
Tested for x86_64 (testing, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* bits/libc-tsd.h: Move to ...
* sysdeps/generic/libc-tsd.h: ...here.
(_GENERIC_BITS_LIBC_TSD_H): Rename macro to _GENERIC_LIBC_TSD_H.
* sysdeps/mach/hurd/bits/libc-tsd.h: Move to ...
* sysdeps/mach/hurd/libc-tsd.h: ...here.
(_BITS_LIBC_TSD_H): Rename macro to _LIBC_TSD_H.
* include/ctype.h: Include <libc-tsd.h> instead of
<bits/libc-tsd.h>.
* include/rpc/rpc.h: Likewise.
* locale/localeinfo.h: Likewise.
* sunrpc/rpc_thread.c: Likewise.
* sysdeps/mach/hurd/malloc-machine.h: Likewise.
* sysdeps/nptl/malloc-machine.h: Likewise.
This adds new functions for futex operations, starting with wait,
abstimed_wait, reltimed_wait, wake. They add documentation and error
checking according to the current draft of the Linux kernel futex manpage.
Waiting with absolute or relative timeouts is split into separate functions.
This allows for removing a few cases of code duplication in pthreads code,
which uses absolute timeouts; also, it allows us to put platform-specific
code to go from an absolute to a relative timeout into the platform-specific
futex abstractions..
Futex operations that can be canceled are also split out into separate
functions suffixed by "_cancelable".
There are separate versions for both Linux and NaCl; while they currently
differ only slightly, my expectation is that the separate versions of
lowlevellock-futex.h will eventually be merged into futex-internal.h
when we get to move the lll_ functions over to the new futex API.
This patch replaces unsigned long int and 1UL with uint64_t and
(uint64_t) 1 to support ILP32 targets like x32.
[BZ #17870]
* nptl/sem_post.c (__new_sem_post): Replace unsigned long int
with uint64_t.
* nptl/sem_waitcommon.c (__sem_wait_cleanup): Replace 1UL with
(uint64_t) 1.
(__new_sem_wait_slow): Replace unsigned long int with uint64_t.
Replace 1UL with (uint64_t) 1.
* sysdeps/nptl/internaltypes.h (new_sem): Replace unsigned long
int with uint64_t.
This commit fixes semaphore destruction by either using 64b atomic
operations (where available), or by using two separate fields when only
32b atomic operations are available. In the latter case, we keep a
conservative estimate of whether there are any waiting threads in one
bit of the field that counts the number of available tokens, thus
allowing sem_post to atomically both add a token and determine whether
it needs to call futex_wake.
See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00155.html
2014-08-12 Bernard Ogden <bernie.ogden@linaro.org>
[BZ #16892]
* sysdeps/nptl/lowlevellock.h (__lll_timedlock): Use
atomic_compare_and_exchange_bool_acq rather than atomic_exchange_acq.