Refactor the sincos implementation - rather than rely on odd partial inlining
of preprocessed portions from sin and cos, explicitly write out the cases.
This makes sincos much easier to maintain and provides an additional 16-20%
speedup between 0 and 2^27. The overall speedup of sincos is 48% over this range.
Between 0 and PI it is 66% faster.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same
logic as sin and cos.
Refactor duplicated code into do_sin. Since all calls to do_sin use copysign to
set the sign of the result, move it inside do_sin. Small inputs use a separate
polynomial, so move this into do_sin as well (the check is based on the more
conservative case when doing large range reduction, but could be relaxed).
* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small
inputs. Return correct sign.
(do_sincos): Remove small input check before do_sin, let do_sin set
the sign.
(__sin): Likewise.
(__cos): Likewise.
For huge inputs use the improved do_sincos function as well. Now no cases use
the correction factor returned by do_sin, do_cos and TAYLOR_SIN, so remove it.
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
(do_cos): Remove corp parameter and calculations.
(do_sin): Likewise.
(do_sincos): Remove cor variable.
(__sin): Use do_sincos for huge inputs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
(reduce_and_compute_sincos): Remove unused function.
This patch improves the accuracy of the range reduction. When the input is
large (2^27) and very close to a multiple of PI/2, using 110 bits of PI is not
enough. Improve range reduction accuracy to 136 bits. As a result the special
checks for results close to zero can be removed. The ULP of the polynomials is
at worst 0.55ULP, so there is no reason for the slow functions, and they can be
removed.
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to
reduce_sincos, improve accuracy to 136 bits.
(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
(__sin): Use improved reduction and simplified do_sincos calculation.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
This patch removes the large range reduction code and defers to the huge range
reduction code. The first level range reducer supports inputs up to 2^27,
which is way too large given that inputs for sin/cos are typically small
(< 10), and optimizing for a smaller range would give a significant speedup.
Input values above 2^27 are practically never used, so there is no reason for
supporting range reduction between 2^27 and 2^48. Removing it significantly
simplifies code and enables further speedups. There is about a 2.3x slowdown
in this range due to __branred being extremely slow (a better algorithm could
easily more than double performance).
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
(do_sincos_2): Likewise.
(__sin): Remove middle range reduction case.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
reduction case.
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).
ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos. The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.
Tested on AArch64 and x86_64 with no regressions.
The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction. Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
This patch assumes O_DIRECTORY works as defined by POSIX on opendir
implementation (aligning with other glibc code, for instance pwd). This
allows remove both the fallback code to handle system with missing or
broken O_DIRECTORY along with the Linux specific opendir.c which just
advertise the working flag.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu,
sparcv9-linux-gnu, sparc64-linux-gnu, powerpc-linux-gnu, and
powerpc64le-linux-gnu.
* sysdeps/posix/opendir.c (o_directory_works, tryopen_o_directory):
Remove definitions.
(opendir_oflags): Use O_DIRECTORY regardless.
(__opendir, __opendirat): Remove need_isdir_precheck usage.
* sysdeps/unix/sysv/linux/opendir.c: Remove file.
* htl/cthreads-compat.c (__cthread_detach): Call __pthread_detach
instead of pthread_detach.
(__cthread_fork): Call __pthread_create instead of pthread_create.
(__cthread_keycreate): Call __pthread_key_create instead of
pthread_key_create.
(__cthread_getspecific): Call __pthread_getspecific instead of
pthread_getspecific.
(__cthread_setspecific): Call __pthread_setspecific instead of
pthread_setspecific.
* htl/pt-alloc.c (__pthread_alloc): Call __pthread_mutex_lock and
__pthread_mutex_unlock instead of pthread_mutex_lock and
pthread_mutex_unlock.
* htl/pt-cleanup.c (__pthread_get_cleanup_stack): Rename to
___pthread_get_cleanup_stack.
(__pthread_get_cleanup_stack): New strong alias.
* htl/pt-create.c: Include <pthreadP.h>.
(entry_point): Call __pthread_exit instead of pthread_exit.
(pthread_create): Rename to __pthread_create.
(pthread_create): New strong alias.
* htl/pt-detach.c (pthread_detach): Rename to __pthread_detach.
(pthread_detach): New strong alias.
(__pthread_detach): Call __pthread_cond_broadcast instead of
pthread_cond_broadcast.
* htl/pt-exit.c (__pthread_exit): Call __pthread_setcancelstate
instead of pthread_setcancelstate.
* htl/pt-testcancel.c: Include <pthreadP.h>.
(pthread_testcancel): Call __pthread_exit instead of pthread_exit.
* sysdeps/htl/pt-attr-getstack.c: Include <pthreadP.h>
(__pthread_attr_getstack): Call __pthread_attr_getstackaddr and
__pthread_attr_getstacksize instead of pthread_attr_getstackaddr and
pthread_attr_getstacksize.
* sysdeps/htl/pt-attr-getstackaddr.c (pthread_attr_getstackaddr):
Rename to __pthread_attr_getstackaddr.
(pthread_attr_getstackaddr): New strong alias.
* sysdeps/htl/pt-attr-getstacksize.c (pthread_attr_getstacksize):
Rename to __pthread_attr_getstacksize.
(pthread_attr_getstacksize): New strong alias.
* sysdeps/htl/pt-attr-setstack.c: Include <pthreadP.h>.
(pthread_attr_setstack): Rename to __pthread_attr_setstack.
(pthread_attr_setstack): New strong alias.
(__pthread_attr_setstack): Call __pthread_attr_getstacksize,
__pthread_attr_setstacksize and __pthread_attr_setstackaddr instead of
pthread_attr_getstacksize, pthread_attr_setstacksize and
pthread_attr_setstackaddr.
* sysdeps/htl/pt-attr-setstackaddr.c (pthread_attr_setstackaddr):
Rename to __pthread_attr_setstackaddr.
(pthread_attr_setstackaddr): New strong alias.
* sysdeps/htl/pt-attr-setstacksize.c (pthread_attr_setstacksize):
Rename to __pthread_attr_setstacksize.
(pthread_attr_setstacksize): New strong alias.
* sysdeps/htl/pt-cond-timedwait.c: Include <pthreadP.h>.
(__pthread_cond_timedwait_internal): Use __pthread_exit instead of
pthread_exit.
* sysdeps/htl/pt-key-create.c: Include <pthreadP.h>.
(__pthread_key_create): New hidden def.
* sysdeps/htl/pt-key.h: Include <pthreadP.h>.
* sysdeps/htl/pthreadP.h (_pthread_mutex_init,
__pthread_cond_broadcast, __pthread_create, __pthread_detach,
__pthread_exit, __pthread_key_create, __pthread_getspecific,
__pthread_setspecific, __pthread_setcancelstate,
__pthread_attr_getstackaddr, __pthread_attr_setstackaddr,
__pthread_attr_getstacksize, __pthread_attr_setstacksize,
__pthread_attr_setstack, ___pthread_get_cleanup_stack): New
declarations.
(__pthread_key_create, _pthread_mutex_init): New hidden declarations.
* sysdeps/mach/hurd/htl/pt-attr-setstackaddr.c
(pthread_attr_setstackaddr): Rename to __pthread_attr_setstackaddr.
(pthread_attr_setstackaddr): New strong alias.
* sysdeps/mach/hurd/htl/pt-attr-setstacksize.c
(pthread_attr_setstacksize): Rename to __pthread_attr_setstacksize.
(pthread_attr_setstacksize): New strong alias.
* sysdeps/mach/hurd/htl/pt-docancel.c: Include <pthreadP.h>.
(call_exit): Call __pthread_exit instead of pthread_exit.
* sysdeps/mach/hurd/htl/pt-mutex-init.c: Include <pthreadP.h>.
(_pthread_mutex_init): New hidden definition.
* sysdeps/mach/hurd/htl/pt-sysdep.c: Include <pthreadP.h>.
(_init_routine): Call __pthread_attr_init and __pthread_attr_setstack
instead of pthread_attr_init and pthread_attr_setstack.
Contributed by
Agustina Arzille <avarzille@riseup.net>
Amos Jeffries <squid3@treenet.co.nz>
David Michael <fedora.dm0@gmail.com>
Marco Gerards <marco@gnu.org>
Marcus Brinkmann <marcus@gnu.org>
Neal H. Walfield <neal@gnu.org>
Pino Toscano <toscano.pino@tiscali.it>
Richard Braun <rbraun@sceen.net>
Roland McGrath <roland@gnu.org>
Samuel Thibault <samuel.thibault@ens-lyon.org>
Thomas DiModica <ricinwich@yahoo.com>
Thomas Schwinge <tschwinge@gnu.org>
* htl: New directory.
* sysdeps/htl: New directory.
* sysdeps/hurd/htl: New directory.
* sysdeps/i386/htl: New directory.
* sysdeps/mach/htl: New directory.
* sysdeps/mach/hurd/htl: New directory.
* sysdeps/mach/hurd/i386/htl: New directory.
* nscd/Depend, resolv/Depend, rt/Depend: Add htl dependency.
* sysdeps/mach/hurd/i386/Implies: Add mach/hurd/i386/htl imply.
* sysdeps/mach/hurd/i386/libpthread.abilist: New file.
This patch fixes 3dc214977 for sparc. Different than other architectures
SPARC kernel Kconfig does not define CONFIG_CLONE_BACKWARDS, however it
has the same ABI as if it did, implemented by sparc-specific code
(sparc_do_fork).
It also has a unique return value convention for clone:
Parent --> %o0 == child's pid, %o1 == 0
Child --> %o0 == parent's pid, %o1 == 1
Which required a special macro to correct issue the syscall
(INLINE_CLONE_SYSCALL).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/unix/sysv/linux/arch-fork.h [__ASSUME_CLONE_BACKWARDS]
(arch_fork): Issue INLINE_CLONE_SYSCALL if defined.
* sysdeps/unix/sysv/linux/sparc/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Define.
When there is no login uid Linux sets /proc/self/loginid to the sentinel
value of, (uid_t) -1. If this is set we can return early and avoid
needlessly looking up the sentinel value in any configured nss
databases.
Checked on aarch64-linux-gnu.
* sysdeps/unix/sysv/linux/getlogin_r.c (__getlogin_r_loginuid): Return
early when linux sentinel value is set.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linux kernel architectures have various arrangements for umount
syscalls. There is a syscall that takes flags, and an older one that
does not. Newer architectures have only the one taking flags, under
the name umount2 (or under the name umount, in the ia64 case). Older
architectures may have both, under the names umount2 and umount (or
under the names umount and oldumount, in the alpha case). glibc then
has several similar implementations of the umount function (no flags)
in terms of either the __umount2 function, or the corresponding
syscall, or in terms of the old syscall under either of its names.
This patch simplifies the implementations in glibc by always using the
__umount2 function to implement the umount function on all systems
using the Linux kernel. The linux/generic implementation is moved to
sysdeps/unix/sysv/linux (without any changes to code or comments) and
all the other variants are removed. (This will have the effect of
causing the new syscall to be used in some cases that previously used
the old one, but as discussed for previous changes, such a change to
the underlying syscalls used is OK.)
There remain two variants of how the __umount2 function is
implemented, either in umount2.S, or, for ia64, in syscalls.list.
Tested with build-many-glibcs.py.
[BZ #16552]
* sysdeps/unix/sysv/linux/generic/umount.c: Move to ....
* sysdeps/unix/sysv/linux/umount.c: ... here.
* sysdeps/unix/sysv/linux/arm/umount.c: Remove file.
* sysdeps/unix/sysv/linux/hppa/umount.c: Likewise.
* sysdeps/unix/sysv/linux/ia64/umount.c: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/umount.c: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/umount.c: Likewise.
* sysdeps/unix/sysv/linux/umount.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/umount.c: Likewise.
* sysdeps/generic/libc-start.h [!SHARED] (ARCH_SETUP_TLS): Define to
__libc_setup_tls.
* sysdeps/unix/sysv/linux/powerpc/libc-start.h [!SHARED]
(ARCH_SETUP_TLS): Likewise.
* sysdeps/mach/hurd/libc-start.h: New file copied from
sysdeps/generic/libc-start.h, but define ARCH_SETUP_TLS to empty.
* csu/libc-start.c [!SHARED] (LIBC_START_MAIN): Call ARCH_SETUP_TLS instead
of __libc_setup_tls.
* sysdeps/mach/hurd/i386/init-first.c [!SHARED] (init1): Call
__libc_setup_tls before initializing libpthread and running _hurd_init which
starts the signal thread.
Letting rtld access errno through TLS can not work at early stages since
TLS will not be initialized yet. When a private errno is not possible,
we thus have no other way than going through __errno_location.
* include/errno.h [IS_IN(rtld) && !RTLD_PRIVATE_ERRNO]: Do not use the
TLS declaration of errno.
When $(tests-execstack-$(have-z-execstack)) is added to tests before
it is defined, it is empty. This patch adds it to tests after it is
defined.
[BZ #22998]
* elf/Makefile (tests): Add $(tests-execstack-$(have-z-execstack))
after it is defined.
No glibc configuration uses the present debug/backtrace.c, whereas
several #include the x86_64 version. The x86_64 version is
effectively a generic one (using _Unwind_Backtrace from libgcc, which
works much more reliably than the built-in functions used by
debug/backtrace.c). This patch moves it to debug/backtrace.c and
removes all the #includes of the x86_64 version from other
architectures which are no longer required.
I do not know whether all the other architecture-specific backtrace
implementations that are based on _Unwind_Backtrace are required, or
whether, where their differences from the generic version do something
useful, suitable hooks could be added to the generic version to reduce
the duplication involved.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.
* sysdeps/x86_64/backtrace.c: Move to ....
* debug/backtrace.c: ... here.
* sysdeps/aarch64/backtrace.c: Remove file.
* sysdeps/alpha/backtrace.c: Likewise.
* sysdeps/hppa/backtrace.c: Likewise.
* sysdeps/ia64/backtrace.c: Likewise.
* sysdeps/mips/backtrace.c: Likewise.
* sysdeps/nios2/backtrace.c: Likewise.
* sysdeps/riscv/backtrace.c: Likewise.
* sysdeps/sh/backtrace.c: Likewise.
* sysdeps/tile/backtrace.c: Likewise.
The powerpc and sparc bits/mathinline.h include inlines of fdim and
fdimf. These are not restricted to -fno-math-errno, but do not set
errno, and wrongly use ordered <= comparisons instead of the required
islessequal comparisons (this latter issue is latent on powerpc
because GCC wrongly uses unordered comparison instructions for
operations that should use ordered comparison instructions).
Since we wish to avoid such header inlines anyway, leaving it to the
compiler to inline such standard functions under appropriate
conditions, this patch fixes those issues by removing the inlines in
question (and thus removing the sparc bits/mathinline.h header which
had no other inlines left in it). I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85003> for adding
correct fdim inlines to GCC, since the function is simple enough that
a correct inline is a perfectly reasonable architecture-independent
optimization with -fno-math-errno and in the absence of implicit
excess precision.
Tested with build-many-glibcs.py for all its powerpc and sparc
configurations.
[BZ #22987]
* sysdeps/powerpc/bits/mathinline.h (fdim): Remove inline
function.
(fdimf): Likewise.
* sysdeps/sparc/fpu/bits/mathinline.h: Remove file.