glibc

mirror of git://sourceware.org/git/glibc.git synced 2025-04-12 14:21:18 +08:00

Author	SHA1	Message	Date
Arjun Shankar	6abb4002df	Fix deadlock when pthread_atfork handler calls pthread_atfork or dlclose In multi-threaded programs, registering via pthread_atfork, de-registering implicitly via dlclose, or running pthread_atfork handlers during fork was protected by an internal lock. This meant that a pthread_atfork handler attempting to register another handler or dlclose a dynamically loaded library would lead to a deadlock. This commit fixes the deadlock in the following way: During the execution of handlers at fork time, the atfork lock is released prior to the execution of each handler and taken again upon its return. Any handler registrations or de-registrations that occurred during the execution of the handler are accounted for before proceeding with further handler execution. If a handler that hasn't been executed yet gets de-registered by another handler during fork, it will not be executed. If a handler gets registered by another handler during fork, it will not be executed during that particular fork. The possibility that handlers may now be registered or deregistered during handler execution means that identifying the next handler to be run after a given handler may register/de-register others requires some bookkeeping. The fork_handler struct has an additional field, 'id', which is assigned sequentially during registration. Thus, handlers are executed in ascending order of 'id' during 'prepare', and descending order of 'id' during parent/child handler execution after the fork. Two tests are included: * tst-atfork3: Adhemerval Zanella <adhemerval.zanella@linaro.org> This test exercises calling dlclose from prepare, parent, and child handlers. * tst-atfork4: This test exercises calling pthread_atfork and dlclose from the prepare handler. [BZ #24595, BZ #27054] Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 52a103e237329b9f88a28513fe7506ffc3bd8ced)	2022-05-30 12:37:58 +02:00
Noah Goldstein	ac87df8d75	x86: Fallback {str\|wcs}cmp RTM in the ncmp overflow case [BZ #29127 ] Re-cherry-pick commit c627209832 for strcmp-avx2.S change which was omitted in intial cherry pick because at the time this bug was not present on release branch. Fixes BZ #29127. In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have not checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)	2022-05-25 14:30:20 -07:00
Sergei Trofimovich	478cd506ea	string.h: fix __fortified_attr_access macro call [BZ #29162 ] commit e938c0274 "Don't add access size hints to fortifiable functions" converted a few '__attr_access ((...))' into '__fortified_attr_access (...)' calls. But one of conversions had double parentheses of '__fortified_attr_access (...)'. Noticed as a gnat6 build failure: /<<NIX>>-glibc-2.34-210-dev/include/bits/string_fortified.h:110:50: error: macro "__fortified_attr_access" requires 3 arguments, but only 1 given The change fixes parentheses. This is seen when using compilers that do not support __builtin___stpncpy_chk, e.g. gcc older than 4.7, clang older than 2.6 or some compiler not derived from gcc or clang. Signed-off-by: Sergei Trofimovich <slyich@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit 5a5f94af0542f9a35aaa7992c18eb4e2403a29b9)	2022-05-23 14:00:31 +05:30
Szabolcs Nagy	2b128a7d30	linux: Add a getauxval test [BZ #23293 ] This is for bug 23293 and it relies on the glibc test system running tests via explicit ld.so invokation by default. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 9faf5262c77487c96da8a3e961b88c0b1879e186)	2022-05-19 10:48:52 +01:00
Szabolcs Nagy	f5f7144dfc	rtld: Use generic argv adjustment in ld.so [BZ #23293 ] When an executable is invoked as ./ld.so [ld.so-args] ./exe [exe-args] then the argv is adujusted in ld.so before calling the entry point of the executable so ld.so args are not visible to it. On most targets this requires moving argv, env and auxv on the stack to ensure correct stack alignment at the entry point. This had several issues: - The code for this adjustment on the stack is written in asm as part of the target specific ld.so _start code which is hard to maintain. - The adjustment is done after _dl_start returns, where it's too late to update GLRO(dl_auxv), as it is already readonly, so it points to memory that was clobbered by the adjustment. This is bug 23293. - _environ is also wrong in ld.so after the adjustment, but it is likely not used after _dl_start returns so this is not user visible. - _dl_argv was updated, but for this it was moved out of relro, which changes security properties across targets unnecessarily. This patch introduces a generic _dl_start_args_adjust function that handles the argument adjustments after ld.so processed its own args and before relro protection is applied. The same algorithm is used on all targets, _dl_skip_args is now 0, so existing target specific adjustment code is no longer used. The bug affects aarch64, alpha, arc, arm, csky, ia64, nios2, s390-32 and sparc, other targets don't need the change in principle, only for consistency. The GNU Hurd start code relied on _dl_skip_args after dl_main returned, now it checks directly if args were adjusted and fixes the Hurd startup data accordingly. Follow up patches can remove _dl_skip_args and DL_ARGV_NOT_RELRO. Tested on aarch64-linux-gnu and cross tested on i686-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit ad43cac44a6860eaefcadadfb2acb349921e96bf)	2022-05-19 10:48:52 +01:00
Stefan Liebler	04892c543e	S390: Enable static PIE This commit enables static PIE on 64bit. On 31bit, static PIE is not supported. A new configure check in sysdeps/s390/s390-64/configure.ac also performs a minimal test for requirements in ld: Ensure you also have those patches for: - binutils (ld) - "[PR ld/22263] s390: Avoid dynamic TLS relocs in PIE" https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26b1426577b5dcb32d149c64cca3e603b81948a9 (Tested by configure check above) Otherwise there will be a R_390_TLS_TPOFF relocation, which fails to be processed in _dl_relocate_static_pie() as static TLS map is not setup. - "s390: Add DT_JMPREL pointing to .rela.[i]plt with static-pie" https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d942d8db12adf4c9e5c7d9ed6496a779ece7149e (We can't test it in configure as we are not able to link a static PIE executable if the system glibc lacks static PIE support) Otherwise there won't be DT_JMPREL, DT_PLTRELA, DT_PLTRELASZ entries and the IFUNC symbols are not processed, which leads to crashes. - kernel (the mentioned links to the commits belong to 5.19 merge window): - "s390/mmap: increase stack/mmap gap to 128MB" https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=f2f47d0ef72c30622e62471903ea19446ea79ee2 - "s390/vdso: move vdso mapping to its own function" https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=57761da4dc5cd60bed2c81ba0edb7495c3c740b8 - "s390/vdso: map vdso above stack" https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=9e37a2e8546f9e48ea76c839116fa5174d14e033 - "s390/vdso: add vdso randomization" https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=41cd81abafdc4e58a93fcb677712a76885e3ca25 (We can't test the kernel of the target system) Otherwise if /proc/sys/kernel/randomize_va_space is turned off (0), static PIE executables like ldconfig will crash. While startup sbrk is used to enlarge the HEAP. Unfortunately the underlying brk syscall fails as there is not enough space after the HEAP. Then the address of the TLS image is invalid and the following memcpy in __libc_setup_tls() leads to a segfault. If /proc/sys/kernel/randomize_va_space is activated (default: 2), there is enough space after HEAP. - glibc - "Linux: Define MMAP_CALL_INTERNAL" https://sourceware.org/git/?p=glibc.git;a=commit;h=c1b68685d438373efe64e5f076f4215723004dfb - "i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S" https://sourceware.org/git/?p=glibc.git;a=commit;h=6e5c7a1e262961adb52443ab91bd2c9b72316402 - "i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls" https://sourceware.org/git/?p=glibc.git;a=commit;h=60f0f2130d30cfd008ca39743027f1e200592dff - "ia64: Always define IA64_USE_NEW_STUB as a flag macro" https://sourceware.org/git/?p=glibc.git;a=commit;h=18bd9c3d3b1b6a9182698c85354578d1d58e9d64 - "Linux: Implement a useful version of _startup_fatal" https://sourceware.org/git/?p=glibc.git;a=commit;h=a2a6bce7d7e52c1c34369a7da62c501cc350bc31 - "Linux: Introduce __brk_call for invoking the brk system call" https://sourceware.org/git/?p=glibc.git;a=commit;h=b57ab258c1140bc45464b4b9908713e3e0ee35aa - "csu: Implement and use _dl_early_allocate during static startup" https://sourceware.org/git/?p=glibc.git;a=commit;h=f787e138aa0bf677bf74fa2a08595c446292f3d7 The mentioned patch series by Florian Weimer avoids the mentioned failing sbrk syscall by falling back to mmap. This commit also adjusts startup code in start.S to be ready for static PIE. We have to add a wrapper function for main as we are not allowed to use GOT relocations before __libc_start_main is called. (Compare also to: - commit 14d886edbd3d80b771e1c42fbd9217f9074de9c6 "aarch64: fix start code for static pie" - commit 3d1d79283e6de4f7c434cb67fb53a4fd28359669 "aarch64: fix static pie enabled libc when main is in a shared library" ) (cherry picked from commit 728894dba4a19578bd803906de184a8dd51ed13c)	2022-05-19 09:46:56 +02:00
Florian Weimer	72d9dcfd16	csu: Implement and use _dl_early_allocate during static startup This implements mmap fallback for a brk failure during TLS allocation. scripts/tls-elf-edit.py is updated to support the new patching method. The script no longer requires that in the input object is of ET_DYN type. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit f787e138aa0bf677bf74fa2a08595c446292f3d7)	2022-05-17 08:08:52 +02:00
Florian Weimer	b5ddf33c6e	Linux: Introduce __brk_call for invoking the brk system call Alpha and sparc can now use the generic implementation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit b57ab258c1140bc45464b4b9908713e3e0ee35aa)	2022-05-17 08:08:52 +02:00
Florian Weimer	2d05ba7f8e	Linux: Implement a useful version of _startup_fatal On i386 and ia64, the TCB is not available at this point. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit a2a6bce7d7e52c1c34369a7da62c501cc350bc31)	2022-05-17 08:08:52 +02:00
Florian Weimer	55ee3afa0d	ia64: Always define IA64_USE_NEW_STUB as a flag macro And keep the previous definition if it exists. This allows disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 18bd9c3d3b1b6a9182698c85354578d1d58e9d64)	2022-05-17 08:08:52 +02:00
Florian Weimer	d66cca3fbb	Linux: Define MMAP_CALL_INTERNAL Unlike MMAP_CALL, this avoids a TCB dependency for an errno update on failure. <mmap_internal.h> cannot be included as is on several architectures due to the definition of page_unit, so introduce a separate header file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL, <mmap_call.h>. Reviewed-by: Stefan Liebler <stli@linux.ibm.com> (cherry picked from commit c1b68685d438373efe64e5f076f4215723004dfb)	2022-05-17 08:08:52 +02:00
Florian Weimer	a7b122a7b4	i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls Introduce an int-80h-based version of __libc_do_syscall and use it if I386_USE_SYSENTER is defined as 0. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 60f0f2130d30cfd008ca39743027f1e200592dff)	2022-05-17 08:08:52 +02:00
Florian Weimer	d1772c9376	i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S After commit a78e6a10d0b50d0ca80309775980fc99944b1727 ("i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771)"), it is never defined. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 6e5c7a1e262961adb52443ab91bd2c9b72316402)	2022-05-17 08:08:52 +02:00
Fangrui Song	58bb3aeaae	elf: Remove __libc_init_secure After 73fc4e28b9464f0e13edc719a5372839970e7ddb, __libc_enable_secure_decided is always 0 and a statically linked executable may overwrite __libc_enable_secure without considering AT_SECURE. The __libc_enable_secure has been correctly initialized in _dl_aux_init, so just remove __libc_enable_secure_decided and __libc_init_secure. This allows us to remove some startup_get*id functions from 22b79ed7f413cd980a7af0cf258da5bf82b6d5e5. Reviewed-by: Florian Weimer <fweimer@redhat.com> (cherry picked from commit 3e9acce8c50883b6cd8a3fb653363d9fa21e1608)	2022-05-17 08:08:52 +02:00
Florian Weimer	0a5c6c9d99	Linux: Consolidate auxiliary vector parsing (redo) And optimize it slightly. This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised. In _dl_aux_init in elf/dl-support.c, use an explicit loop and -fno-tree-loop-distribute-patterns to avoid memset. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 73fc4e28b9464f0e13edc719a5372839970e7ddb)	2022-05-17 08:08:52 +02:00
Florian Weimer	76304dfdaf	Linux: Include <dl-auxv.h> in dl-sysdep.c only for SHARED Otherwise, <dl-auxv.h> on POWER ends up being included twice, once in dl-sysdep.c, once in dl-support.c. That leads to a linker failure due to multiple definitions of _dl_cache_line_size. Fixes commit d96d2995c1121d3310102afda2deb1f35761b5e6 ("Revert "Linux: Consolidate auxiliary vector parsing"). (cherry picked from commit 098c795e85fbd05c5ef59c2d0ce59529331bea27)	2022-05-17 08:08:52 +02:00
Florian Weimer	788eb21ff0	Revert "Linux: Consolidate auxiliary vector parsing" This reverts commit 8c8510ab2790039e58995ef3a22309582413d3ff. The revert is not perfect because the commit included a bug fix for _dl_sysdep_start with an empty argv, introduced in commit 2d47fa68628e831a692cba8fc9050cef435afc5e ("Linux: Remove DL_FIND_ARG_COMPONENTS"), and this bug fix is kept. The revert is necessary because the reverted commit introduced an early memset call on aarch64, which leads to crash due to lack of TCB initialization. (cherry picked from commit d96d2995c1121d3310102afda2deb1f35761b5e6)	2022-05-17 08:08:52 +02:00
Florian Weimer	150039ff07	Linux: Consolidate auxiliary vector parsing And optimize it slightly. The large switch statement in _dl_sysdep_start can be replaced with a large array. This reduces source code and binary size. On i686-linux-gnu: Before: text data bss dec hex filename 7791 12 0 7803 1e7b elf/dl-sysdep.os After: text data bss dec hex filename 7135 12 0 7147 1beb elf/dl-sysdep.os Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 8c8510ab2790039e58995ef3a22309582413d3ff)	2022-05-17 08:08:52 +02:00
Florian Weimer	3948c6ca89	Linux: Assume that NEED_DL_SYSINFO_DSO is always defined The definition itself is still needed for generic code. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit f19fc997a5754a6c0bb9e43618f0597e878061f7)	2022-05-17 08:08:52 +02:00
Florian Weimer	29f833f5ab	Linux: Remove DL_FIND_ARG_COMPONENTS The generic definition is always used since the Native Client port has been removed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 2d47fa68628e831a692cba8fc9050cef435afc5e)	2022-05-17 08:08:52 +02:00
Florian Weimer	1695c5e0f6	Linux: Remove HAVE_AUX_SECURE, HAVE_AUX_XID, HAVE_AUX_PAGESIZE They are always defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit b9c3d3382f6f50e9723002deb2dc8127de720fa6)	2022-05-17 08:08:52 +02:00
Florian Weimer	756d583c9e	elf: Merge dl-sysdep.c into the Linux version The generic version is the de-facto Linux implementation. It requires an auxiliary vector, so Hurd does not use it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 91c0a47ffb66e7cd802de870686465db3b3976a0)	2022-05-17 08:08:52 +02:00
Noah Goldstein	2c4fc8e5ca	x86: Optimize {str\|wcs}rchr-evex The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.755 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit c966099cdc3e0fdf92f63eac09b22fa7e5f5f02d)	2022-05-16 12:42:46 -07:00
Noah Goldstein	fdbc8439ac	x86: Optimize {str\|wcs}rchr-avx2 The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.832 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit df7e295d18ffa34f629578c0017a9881af7620f6)	2022-05-16 12:40:38 -07:00
Noah Goldstein	b05c0c8b28	x86: Optimize {str\|wcs}rchr-sse2 The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.741 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 5307aa9c1800f36a64c183c091c9af392c1fa75c)	2022-05-16 12:37:03 -07:00
H.J. Lu	bc35e22be4	x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32 Clear the upper 32 bits in RDX (memory size) for x32 to fix FAIL: string/tst-size_t-memcmp FAIL: string/tst-size_t-memcmp-2 FAIL: string/tst-size_t-memcpy FAIL: wcsmbs/tst-size_t-wmemcmp on x32 introduced by 8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S 26b2478322 x86: Reduce code size of mem{move\|pcpy\|cpy}-ssse3 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit 8ea20ee5f6145de4bff9481d3e09ac36ba9df8f3)	2022-05-16 12:34:52 -07:00
Noah Goldstein	4d1841deb7	x86: Fix missing __wmemcmp def for disable-multiarch build commit 8804157ad9da39631703b92315460808eac86b0c Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Fri Apr 15 12:27:59 2022 -0500 x86: Optimize memcmp SSE2 in memcmp.S Only defined wmemcmp and missed __wmemcmp. This commit fixes that by defining __wmemcmp and setting wmemcmp as a weak alias to __wmemcmp. Both multiarch and disable-multiarch builds succeed and full xchecks pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit c72a1a062a1ded52719802c07ab459e1fd54d2a6)	2022-05-16 12:31:39 -07:00
Noah Goldstein	cee9939f67	x86: Cleanup page cross code in memcmp-avx2-movbe.S Old code was both inefficient and wasted code size. New code (-62 bytes) and comparable or better performance in the page cross case. geometric_mean(N=20) of page cross cases New / Original: 0.960 size, align0, align1, ret, New Time/Old Time 1, 4095, 0, 0, 1.001 1, 4095, 0, 1, 0.999 1, 4095, 0, -1, 1.0 2, 4094, 0, 0, 1.0 2, 4094, 0, 1, 1.0 2, 4094, 0, -1, 1.0 3, 4093, 0, 0, 1.0 3, 4093, 0, 1, 1.0 3, 4093, 0, -1, 1.0 4, 4092, 0, 0, 0.987 4, 4092, 0, 1, 1.0 4, 4092, 0, -1, 1.0 5, 4091, 0, 0, 0.984 5, 4091, 0, 1, 1.002 5, 4091, 0, -1, 1.005 6, 4090, 0, 0, 0.993 6, 4090, 0, 1, 1.001 6, 4090, 0, -1, 1.003 7, 4089, 0, 0, 0.991 7, 4089, 0, 1, 1.0 7, 4089, 0, -1, 1.001 8, 4088, 0, 0, 0.875 8, 4088, 0, 1, 0.881 8, 4088, 0, -1, 0.888 9, 4087, 0, 0, 0.872 9, 4087, 0, 1, 0.879 9, 4087, 0, -1, 0.883 10, 4086, 0, 0, 0.878 10, 4086, 0, 1, 0.886 10, 4086, 0, -1, 0.873 11, 4085, 0, 0, 0.878 11, 4085, 0, 1, 0.881 11, 4085, 0, -1, 0.879 12, 4084, 0, 0, 0.873 12, 4084, 0, 1, 0.889 12, 4084, 0, -1, 0.875 13, 4083, 0, 0, 0.873 13, 4083, 0, 1, 0.863 13, 4083, 0, -1, 0.863 14, 4082, 0, 0, 0.838 14, 4082, 0, 1, 0.869 14, 4082, 0, -1, 0.877 15, 4081, 0, 0, 0.841 15, 4081, 0, 1, 0.869 15, 4081, 0, -1, 0.876 16, 4080, 0, 0, 0.988 16, 4080, 0, 1, 0.99 16, 4080, 0, -1, 0.989 17, 4079, 0, 0, 0.978 17, 4079, 0, 1, 0.981 17, 4079, 0, -1, 0.98 18, 4078, 0, 0, 0.981 18, 4078, 0, 1, 0.98 18, 4078, 0, -1, 0.985 19, 4077, 0, 0, 0.977 19, 4077, 0, 1, 0.979 19, 4077, 0, -1, 0.986 20, 4076, 0, 0, 0.977 20, 4076, 0, 1, 0.986 20, 4076, 0, -1, 0.984 21, 4075, 0, 0, 0.977 21, 4075, 0, 1, 0.983 21, 4075, 0, -1, 0.988 22, 4074, 0, 0, 0.983 22, 4074, 0, 1, 0.994 22, 4074, 0, -1, 0.993 23, 4073, 0, 0, 0.98 23, 4073, 0, 1, 0.992 23, 4073, 0, -1, 0.995 24, 4072, 0, 0, 0.989 24, 4072, 0, 1, 0.989 24, 4072, 0, -1, 0.991 25, 4071, 0, 0, 0.99 25, 4071, 0, 1, 0.999 25, 4071, 0, -1, 0.996 26, 4070, 0, 0, 0.993 26, 4070, 0, 1, 0.995 26, 4070, 0, -1, 0.998 27, 4069, 0, 0, 0.993 27, 4069, 0, 1, 0.999 27, 4069, 0, -1, 1.0 28, 4068, 0, 0, 0.997 28, 4068, 0, 1, 1.0 28, 4068, 0, -1, 0.999 29, 4067, 0, 0, 0.996 29, 4067, 0, 1, 0.999 29, 4067, 0, -1, 0.999 30, 4066, 0, 0, 0.991 30, 4066, 0, 1, 1.001 30, 4066, 0, -1, 0.999 31, 4065, 0, 0, 0.988 31, 4065, 0, 1, 0.998 31, 4065, 0, -1, 0.998 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 23102686ec67b856a2d4fd25ddaa1c0b8d175c4f)	2022-05-16 12:28:37 -07:00
Noah Goldstein	0909286ffa	x86: Remove memcmp-sse4.S Code didn't actually use any sse4 instructions since `ptest` was removed in: commit 2f9062d7171850451e6044ef78d91ff8c017b9c0 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Nov 10 16:18:56 2021 -0600 x86: Shrink memcmp-sse4.S code size The new memcmp-sse2 implementation is also faster. geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905 Note there are two regressions preferring SSE2 for Size = 1 and Size = 65. Size = 1: size, align0, align1, ret, New Time/Old Time 1, 1, 1, 0, 1.2 1, 1, 1, 1, 1.197 1, 1, 1, -1, 1.2 This is intentional. Size == 1 is significantly less hot based on profiles of GCC11 and Python3 than sizes [4, 8] (which is made hotter). Python3 Size = 1 -> 13.64% Python3 Size = [4, 8] -> 60.92% GCC11 Size = 1 -> 1.29% GCC11 Size = [4, 8] -> 33.86% size, align0, align1, ret, New Time/Old Time 4, 4, 4, 0, 0.622 4, 4, 4, 1, 0.797 4, 4, 4, -1, 0.805 5, 5, 5, 0, 0.623 5, 5, 5, 1, 0.777 5, 5, 5, -1, 0.802 6, 6, 6, 0, 0.625 6, 6, 6, 1, 0.813 6, 6, 6, -1, 0.788 7, 7, 7, 0, 0.625 7, 7, 7, 1, 0.799 7, 7, 7, -1, 0.795 8, 8, 8, 0, 0.625 8, 8, 8, 1, 0.848 8, 8, 8, -1, 0.914 9, 9, 9, 0, 0.625 Size = 65: size, align0, align1, ret, New Time/Old Time 65, 0, 0, 0, 1.103 65, 0, 0, 1, 1.216 65, 0, 0, -1, 1.227 65, 65, 0, 0, 1.091 65, 0, 65, 1, 1.19 65, 65, 65, -1, 1.215 This is because A) the checks in range [65, 96] are now unrolled 2x and B) because smaller values <= 16 are now given a hotter path. By contrast the SSE4 version has a branch for Size = 80. The unrolled version has get better performance for returns which need both comparisons. size, align0, align1, ret, New Time/Old Time 128, 4, 8, 0, 0.858 128, 4, 8, 1, 0.879 128, 4, 8, -1, 0.888 As well, out of microbenchmark environments that are not full predictable the branch will have a real-cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 7cbc03d03091d5664060924789afe46d30a5477e)	2022-05-16 12:25:18 -07:00
Noah Goldstein	5a8df6485c	x86: Optimize memcmp SSE2 in memcmp.S New code save size (-303 bytes) and has significantly better performance. geometric_mean(N=20) of page cross cases New / Original: 0.634 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 8804157ad9da39631703b92315460808eac86b0c)	2022-05-16 12:19:59 -07:00
Noah Goldstein	af0865571a	x86: Small improvements for wcslen Just a few QOL changes. 1. Prefer `add` > `lea` as it has high execution units it can run on. 2. Don't break macro-fusion between `test` and `jcc` 3. Reduce code size by removing gratuitous padding bytes (-90 bytes). geometric_mean(N=20) of all benchmarks New / Original: 0.959 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 244b415d386487521882debb845a040a4758cb18)	2022-05-16 12:17:42 -07:00
Noah Goldstein	3b710e32d8	x86: Remove AVX str{n}casecmp The rational is: 1. SSE42 has nearly identical logic so any benefit is minimal (3.4% regression on Tigerlake using SSE42 versus AVX across the benchtest suite). 2. AVX2 version covers the majority of targets that previously prefered it. 3. The targets where AVX would still be best (SnB and IVB) are becoming outdated. All in all the saving the code size is worth it. All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 305769b2a15c2e96f9e1b5195d3c4e0d6f0f4b68)	2022-05-16 12:12:10 -07:00
Noah Goldstein	fc5d42bf82	x86: Add EVEX optimized str{n}casecmp geometric_mean(N=40) of all benchmarks EVEX / SSE42: .621 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 84e7c46df4086873eae28a1fb87d2cf5388b1e16)	2022-05-16 12:07:56 -07:00
Noah Goldstein	33fcf8344f	x86: Add AVX2 optimized str{n}casecmp geometric_mean(N=40) of all benchmarks AVX2 / SSE42: .702 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit bbf81222343fed5cd704001a2ae0d86c71544151)	2022-05-16 12:00:53 -07:00
Noah Goldstein	3496d64d69	x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S Slightly faster method of doing TOLOWER that saves an instruction. Also replace the hard coded 5-byte no with .p2align 4. On builds with CET enabled this misaligned entry to strcasecmp. geometric_mean(N=40) of all benchmarks New / Original: .920 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit d154758e618ec9324f5d339c46db0aa27e8b1226)	2022-05-16 11:59:00 -07:00
Noah Goldstein	283982b362	x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S Slightly faster method of doing TOLOWER that saves an instruction. Also replace the hard coded 5-byte no with .p2align 4. On builds with CET enabled this misaligned entry to strcasecmp. geometric_mean(N=40) of all benchmarks New / Original: .894 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 670b54bc585ea4a94f3b2e9272ba44aa6b730b73)	2022-05-16 11:56:16 -07:00
Noah Goldstein	420cd6f155	x86: Remove strspn-sse2.S and use the generic implementation The generic implementation is faster. geometric_mean(N=20) of all benchmarks New / Original: .710 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 9c8a6ad620b49a27120ecdd7049c26bf05900397)	2022-05-16 11:52:47 -07:00
Noah Goldstein	4b61d76521	x86: Remove strpbrk-sse2.S and use the generic implementation The generic implementation is faster (see strcspn commit). All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 653358535280a599382cb6c77538a187dac6a87f)	2022-05-16 11:48:09 -07:00
Noah Goldstein	2fef1961a7	x86: Remove strcspn-sse2.S and use the generic implementation The generic implementation is faster. geometric_mean(N=20) of all benchmarks New / Original: .678 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit fe28e7d9d9535ebab4081d195c553b4fbf39d9ae)	2022-05-16 10:56:35 -07:00
Noah Goldstein	1ed2813eb1	x86: Optimize strspn in strspn-c.c Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of _mm_cmpistri. Also change offset to unsigned to avoid unnecessary sign extensions. geometric_mean(N=20) of all benchmarks that dont fallback on sse2; New / Original: .901 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 412d10343168b05b8cf6c3683457cf9711d28046)	2022-05-16 10:51:41 -07:00
Noah Goldstein	3214c878f2	x86: Optimize strcspn and strpbrk in strcspn-c.c Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of _mm_cmpistri. Also change offset to unsigned to avoid unnecessary sign extensions. geometric_mean(N=20) of all benchmarks that dont fallback on sse2/strlen; New / Original: .928 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 30d627d477d7255345a4b713cf352ac32d644d61)	2022-05-16 10:45:15 -07:00
Noah Goldstein	ff9772ac19	x86: Code cleanup in strchr-evex and comment justifying branch Small code cleanup for size: -81 bytes. Add comment justifying using a branch to do NULL/non-null return. All string/memory tests pass and no regressions in benchtests. geometric_mean(N=20) of all benchmarks New / Original: .985 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit ec285ea90415458225623ddc0492ae3f705af043)	2022-05-16 10:42:53 -07:00
Noah Goldstein	424bbd4d25	x86: Code cleanup in strchr-avx2 and comment justifying branch Small code cleanup for size: -53 bytes. Add comment justifying using a branch to do NULL/non-null return. All string/memory tests pass and no regressions in benchtests. geometric_mean(N=20) of all benchmarks Original / New: 1.00 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit a6fbf4d51e9ba8063c4f8331564892ead9c67344)	2022-05-16 10:39:59 -07:00
Adhemerval Zanella	0a10b8b181	x86_64: Remove bcopy optimizations The symbols is not present in current POSIX specification and compiler already generates memmove call. (cherry picked from commit bf92893a14ebc161b08b28acc24fa06ae6be19cb)	2022-05-16 10:36:04 -07:00
H.J. Lu	f0a53588da	x86-64: Define __memcmpeq in ld.so Define __memcmpeq in ld.so so that compiler can generate __memcmpeq call when compiling for ld.so. (cherry picked from commit a5659cf27d3ce6101c1632715d18ab6321755340)	2022-05-16 10:33:14 -07:00
H.J. Lu	a133623048	x86-64: Remove bzero weak alias in SS2 memset commit 3d9f171bfb5325bd5f427e9fc386453358c6e840 Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Feb 7 05:55:15 2022 -0800 x86-64: Optimize bzero added the optimized bzero. Remove bzero weak alias in SS2 memset to avoid undefined __bzero in memset-sse2-unaligned-erms. (cherry picked from commit 0fb8800029d230b3711bf722b2a47db92d0e273f)	2022-05-16 10:29:20 -07:00
H.J. Lu	18baf86f51	x86_64/multiarch: Sort sysdep_routines and put one entry per line (cherry picked from commit c328d0152d4b14cca58407ec68143894c8863004)	2022-05-16 10:26:53 -07:00
H.J. Lu	d422197a69	x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) (cherry picked from commit 1283948f236f209b7d3f44b69a42b96806fa6da0)	2022-05-16 10:20:57 -07:00
Siddhesh Poyarekar	58947e1fa5	fortify: Ensure that __glibc_fortify condition is a constant [BZ #29141 ] The fix c8ee1c85 introduced a -1 check for object size without also checking that object size is a constant. Because of this, the tree optimizer passes in gcc fail to fold away one of the branches in __glibc_fortify and trips on a spurious Wstringop-overflow. The warning itself is incorrect and the branch does go away eventually in DCE in the rtl passes in gcc, but the constant check is a helpful hint to simplify code early, so add it in. Resolves: BZ #29141 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2022-05-16 21:28:38 +05:30
Florian Weimer	28ea43f8d6	dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo The information is theoretically available via dl_iterate_phdr as well, but that approach is very slow if there are many shared objects. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@rehdat.com> (cherry picked from commit d056c212130280c0a54d9a4f72170ec621b70ce5)	2022-05-11 19:17:47 +02:00

1 2 3 4 5 ...

38476 Commits