glibc

mirror of git://sourceware.org/git/glibc.git synced 2025-03-31 14:01:18 +08:00

Author	SHA1	Message	Date
Alex Butler	6d69c4aad4	aarch64: MTE compatible strncmp Add support for MTE to strncmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com> (cherry picked from commit 03e1378f94173fc192a81e421457198f7b8a34a0)	2024-11-04 17:15:10 +00:00
Andreas Schwab	73886db621	wordexp: handle overflow in positional parameter number (bug 28011) Use strtoul instead of atoi so that overflow can be detected. (cherry picked from commit 5adda61f62b77384718b4c0d8336ade8f2b4b35c)	2021-07-06 19:04:26 +00:00
DJ Delorie	c49cbcdc32	nscd: Fix double free in netgroupcache [BZ #27462] In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free was fixed, but this led to an occasional double-free. This patch tracks the "live" allocation better. Tested manually by a third party. Related: RHBZ 1927877 Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit dca565886b5e8bd7966e15f0ca42ee5cff686673)	2021-03-08 10:12:28 +00:00
Florian Weimer	2777e19c05	nscd: Fix use-after-free in addgetnetgrentX [BZ #23520] addinnetgrX may use the heap-allocated buffer, so free the buffer in this function. (cherry picked from commit 745664bd798ec8fd50438605948eea594179fba1)	2021-03-08 10:12:28 +00:00
Siddhesh Poyarekar	3668134a9e	Add NEWS entry for CVE-2020-29562 (BZ #26923) BZ #26923 now has a CVE entry, so add a NEWS entry for it. (cherry picked from commit 38a9e93cb1c58e3c899d638480e6d6e42af8e6fc)	2021-01-03 12:21:12 +00:00
Michael Colavita	5fa7884e25	iconv: Fix incorrect UCS4 inner loop bounds (BZ#26923) Previously, in UCS4 conversion routines we limit the number of characters we examine to the minimum of the number of characters in the input and the number of characters in the output. This is not the correct behavior when __GCONV_IGNORE_ERRORS is set, as we do not consume an output character when we skip a code unit. Instead, track the input and output pointers and terminate the loop when either reaches its limit. This resolves assertion failures when resetting the input buffer in a step of iconv, which assumes that the input will be fully consumed given sufficient output space. (cherry picked from commit 228edd356f03bf62dcf2b1335f25d43c602ee68d)	2021-01-03 12:21:12 +00:00
Arjun Shankar	d8ae6c00a9	iconv: Accept redundant shift sequences in IBM1364 [BZ #26224] The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets share converter logic (iconvdata/ibm1364.c) which would reject redundant shift sequences when processing input in these character sets. This led to a hang in the iconv program (CVE-2020-27618). This commit adjusts the converter to ignore redundant shift sequences and adds test cases for iconv_prog hangs that would be triggered upon their rejection. This brings the implementation in line with other converters that also ignore redundant shift sequences (e.g. IBM930 etc., fixed in commit 692de4b3960d). Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 9a99c682144bdbd40792ebf822fe9264e0376fb5)	2020-11-30 22:59:53 +00:00
Arjun Shankar	670c675da3	intl: Handle translation output codesets with suffixes [BZ #26383] Commit 91927b7c7643 (Rewrite iconv option parsing [BZ #19519]) did not handle cases where the output codeset for translations (via the `gettext' family of functions) might have a caller specified encoding suffix such as TRANSLIT or IGNORE. This led to a regression where translations did not work when the codeset had a suffix. This commit fixes the above issue by parsing any suffixes passed to __dcigettext and adds two new test-cases to intl/tst-codeset.c to verify correct behaviour. The iconv-internal function __gconv_create_spec and the static iconv-internal function gconv_destroy_spec are now visible internally within glibc and used in intl/dcigettext.c. (cherry picked from commit 7d4ec75e111291851620c6aa2c4460647b7fd50d)	2020-11-30 22:59:53 +00:00
Aurelien Jarno	8fb94f8824	Add NEWS entry for CVE-2016-10228 (bug 19519) (cherry picked from commit 17a0126abf02955cabf6256c67f8f9462a64163f)	2020-11-30 22:59:53 +00:00
Arjun Shankar	ec51be40c7	Rewrite iconv option parsing [BZ #19519] This commit replaces string manipulation during `iconv_open' and iconv_prog option parsing with a structured, flag based conversion specification. In doing so, it alters the internal `__gconv_open' interface and accordingly adjusts its uses. This change fixes several hangs in the iconv program and therefore includes a new test to exercise iconv_prog options that originally led to these hangs. It also includes a new regression test for option handling in the iconv function. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 91927b7c76437db860cd86a7714476b56bb39d07)	2020-11-30 22:59:53 +00:00
Arjun Shankar	bdaa594506	support: Add xsetlocale function (cherry picked from commit cce35a50c1de0cec5cd1f6c18979ff6ee3ea1dd1)	2020-11-30 22:59:53 +00:00
Arjun Shankar	68d583bb1c	Add Transliterations for Unicode Misc. Mathematical Symbols-A/B [BZ #23132] This commit adds previously missing transliterations for several code points in the Unicode blocks "Miscellaneous Mathematical Symbols-A/B" - transliterated to their approximate ASCII representations. It also adds a corresponding iconv transliteration test. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 513aaa0d782f8fae36732d06ca59d658149f0139)	2020-11-30 22:59:53 +00:00
Florian Weimer	222b1517cd	support: Correct error message for TEST_COMPARE_STRING It should say "string", not "blob". (cherry picked from commit 6175507c06de56e03407004bd2f289ed2cce034d)	2020-11-30 22:59:53 +00:00
Adhemerval Zanella	62df32d604	support: Fix printf format for TEST_COMPARE_STRING Fix the following on 32 bits targets: support_test_compare_string.c: In function 'support_test_compare_string': support_test_compare_string.c:80:37: error: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] printf (" string length: %lu bytes\n", left_length); ~~^ ~~~~~~~~~~~ %u Checked on arm-linux-gnueabihf. * support/support_test_compare_string.c (support_test_compare_string): Fix printf format. (cherry picked from commit 00c86a37d1b63044e3169d1f2ebec23447c73f79)	2020-11-30 22:59:53 +00:00
Florian Weimer	711f183508	support: Implement TEST_COMPARE_STRING (cherry picked from commit 1df872fd74f730bcae3df201a229195445d2e18a)	2020-11-30 22:59:53 +00:00
Andreas Schwab	8b84316420	Fix iconv buffer handling with IGNORE error handler (bug #18830) (cherry picked from commit 4802be92c891903caaf8cae47f685da6f26d4b9a)	2020-11-30 22:59:53 +00:00
Aurelien Jarno	477587c75e	Add NEWS entry for CVE-2020-10029 (bug 25487) (cherry picked from commit 15ab195229dc288d1d49612c3de14a33b88065ed)	2020-11-30 22:59:53 +00:00
Florian Weimer	4b8628acab	math/test-sinl-pseudo: Use stack protector only if available This fixes commit 9333498794cde1d5cca518bad ("Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487)."). (cherry picked from commit c10acd40262486dac597001aecc20ad9d3bd0e4a)	2020-11-30 22:59:53 +00:00
Joseph Myers	59420258af	Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487). Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero argument (a representation where all the significand bits, including the explicit high bit, are zero, but the exponent is not zero, which is not a valid representation for the long double type). Although this is not a valid long double representation, existing practice in this area (see bug 4586, originally marked invalid but subsequently fixed) is that we still seek to avoid invalid memory accesses as a result, in case of programs that treat arbitrary binary data as long double representations, although the invalid representations of the ldbl-96 format do not need to be consistently handled the same as any particular valid representation. This patch makes the range reduction detect pseudo-zero and unnormal representations that would otherwise go to __kernel_rem_pio2, and returns a NaN for them instead of continuing with the range reduction process. (Pseudo-zero and unnormal representations whose unbiased exponent is less than -1 have already been safely returned from the function before this point without going through the rest of range reduction.) Pseudo-zero representations would previously result in the value passed to __kernel_rem_pio2 being all-zero, which is definitely unsafe; unnormal representations would previously result in a value passed whose high bit is zero, which might well be unsafe since that is not a form of input expected by __kernel_rem_pio2. Tested for x86_64. (cherry picked from commit 9333498794cde1d5cca518badf79533a24114b6f)	2020-11-30 22:59:53 +00:00
Aurelien Jarno	e075046743	Add NEWS entry for CVE-2020-1751 (bug 25423) Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 07d16a6debc830ebcf9533da5396edd2eff688e0)	2020-11-16 08:00:00 +00:00
Aurelien Jarno	daf88b1dd1	Add NEWS entry for CVE-2020-6096 (bug 25620) Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 17400c4bcd57d84add1da3aa93248ef2efdb0ccb)	2020-11-16 08:00:00 +00:00
Alexander Anisimov	b29853702e	arm: CVE-2020-6096: Fix multiarch memcpy for negative length [BZ #25620] Unsigned branch instructions could be used for r2 to fix the wrong behavior when a negative length is passed to memcpy. This commit fixes the armv7 version. (cherry picked from commit beea361050728138b82c57dda0c4810402d342b9)	2020-11-16 08:00:00 +00:00
Evgeny Eremin	bad8d5ff60	arm: CVE-2020-6096: fix memcpy and memmove for negative length [BZ #25620] Unsigned branch instructions could be used for r2 to fix the wrong behavior when a negative length is passed to memcpy and memmove. This commit fixes the generic arm implementation of memcpy and memmove. (cherry picked from commit 79a4fa341b8a89cb03f84564fd72abaa1a2db394)	2020-11-16 08:00:00 +00:00
Andreas Schwab	d64ad0a517	Fix use-after-free in glob when expanding ~user (bug 25414) The value of `end_name' points into the value of `dirname', thus don't deallocate the latter before the last use of the former. (cherry picked from commit ddc650e9b3dc916eab417ce9f79e67337b05035c) (cherry picked from commit 39a05214fe14ff722d4d92e697fb71ff15e84e70)	2020-11-16 08:00:00 +00:00
Andreas Schwab	34ce87638c	Fix array overflow in backtrace on PowerPC (bug 25423) When unwinding through a signal frame the backtrace function on PowerPC didn't check array bounds when storing the frame address. Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines"). (cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)	2020-11-16 08:00:00 +00:00
Florian Weimer	0df8ecff9e	misc/test-errno-linux: Handle EINVAL from quotactl In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags argument, causing the test to fail due to too restrictive test expectations. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 1f7525d924b608a3e43b10fcfb3d46b8a6e9e4f9)	2020-11-16 08:00:00 +00:00
Kamlesh Kumar	26f5442ec1	<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232] Without the asm redirects, strchr et al. are not const-correct. libc++ has a wrapper header that works with and without __CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when Clang is used with libstdc++ or just C headers, the overloaded functions with the correct types are not declared. This change does not impact current GCC (with libstdc++ or libc++). (cherry picked from commit 953ceff17a4a15b10cfdd5edc3c8cae4884c8ec3)	2020-11-16 08:00:00 +00:00
Aurelien Jarno	4b64a4245c	intl/tst-gettext: fix failure with newest msgfmt Since upstream gettext commit d13f165b83 (msgfmt: Remove POT-Creation-Date field from the header in the output.), msgfmt does not copy the POT-Creation-Date field in the header entry from the po file to the mo file anymore. This breaks the assumption that we can test gettext by comparing each message in the po files with the corresponding string returned by gettext. This makes intl/tst-gettext fail. While it would have been possible to modify the po2test.awk script to also strip the line POT-Creation-Date field when creating the msgs.h file, it would not work with both the old and new msgfmt. Instead create a tst-gettext-de.po file from de.po by removing the POT-Creation-Date line. Another alternative would be to use a static tst-gettext-de.po file, but I guess the reason for using de.po is to also catch issues caused by newly added strings. As tst-catgets also uses msg.h, it should also be updated. Instead of using the new tst-gettext-de.po file, the patch modifies xopen-msg.awk to avoid creating a second catgets->intl dependency. Changelog: [BZ #21508] * catgets/xopen-msg.awk: Ignore POT-Creation-Date line. * intl/Makefile ($(objpfx)tst-gettext-de.po): Generate intl/tst-gettext-de.po from po/de.po by removing the POT-Creation-Date line. ($(objpfx)msgs.h): Depend on $(objpfx)tst-gettext-de.po instead of ../po/de.po. * intl/tst-gettext.sh: Use ${objpfx}tst-gettext-de.po instead of ../po/de.po. (cherry picked from commit 56456a2aadf0522b51ea55be1291ca832c5d0524)	2020-11-16 08:00:00 +00:00
Szabolcs Nagy	dc7f51bda9	aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798] The variant PCS support was ineffective because in the common case linkmap->l_mach.plt == 0 but then the symbol table flags were ignored and normal lazy binding was used instead of resolving the relocs early. (This was a misunderstanding about how GOT[1] is setup by the linker.) In practice this mainly affects SVE calls when the vector length is more than 128 bits, then the top bits of the argument registers get clobbered during lazy binding. Fixes bug 26798. (cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)	2020-11-04 12:30:21 +00:00
Szabolcs Nagy	8edc96aa33	aarch64: add HWCAP_ATOMICS to HWCAP_IMPORTANT This enables searching shared libraries in atomics/ when the hardware supports LSE atomics of armv8.1 so one can provide optimized variants of libraries in a portable way. LSE atomics does not affect library abi, the new instructions can interoperate with old ones. I considered the earlier comments on the patch https://sourceware.org/ml/libc-alpha/2018-04/msg00400.html https://sourceware.org/ml/libc-alpha/2018-04/msg00625.html It turns out that the way glibc dynamic linker decides on the search path is not very flexible: it wants to use hwcap bits and associated strings. So some targets reuse hwcap bits for glibc internal purposes to affect the search logic. But hwcap is an interface with the kernel, glibc should not allocate bits in it for its internal logic as that limits future hwcap extensions and is confusing to users who expect to see hwcap bits in ifunc resolvers. Instead of rewriting the dynamic linker path logic (which affects all targets) this patch just uses the existing mechanism, however this means that the path name has to be the hwcap name "atomics" and cannot be changed to something more meaningful to users. It is hard to tell how much performance benefit this can give, in principle armv8.1 atomics can be better optimized in the hardware, so it can make a difference for synchronization heavy code. On some systems such multilib setup may be the only viable way to get optimized libraries used. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add HWCAP_ATOMICS. (cherry picked from commit 397c54c1afa531242602fe3ac7bb47eff0e909f9)	2020-03-25 16:24:32 +00:00
Szabolcs Nagy	599ebfacc0	aarch64: Remove HWCAP_CPUID from HWCAP_IMPORTANT This partially reverts commit f82e9672ad89ea1ef40bbe1af71478e255e87c5e Author: Siddhesh Poyarekar <siddhesh@sourceware.org> aarch64: Allow overriding HWCAP_CPUID feature check using HWCAP_MASK The idea was to make it possible to disable cpuid based ifunc resolution in glibc by changing the hwcap mask which the user could already control. However the hwcap mask has an orthogonal role: it specifies additional library search paths for the dynamic linker. So "cpuid" got added to the search paths when it was set in the default mask (HWCAP_IMPORTANT), which is not useful behaviour, the hwcap masking should not be reused in the cpu features code. Meanwhile there is a tunable to set the cpu explicitly so it is possible to disable the cpuid based dispatch without using a hwcap mask: GLIBC_TUNABLES=glibc.tune.cpu=generic * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features): Use dl_hwcap without masking. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Remove HWCAP_CPUID. (cherry picked from commit d0cd79807157e399ff58e67cb51651f90442122e)	2020-03-25 16:24:01 +00:00
Florian Weimer	bef0b1cb31	libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203] Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable vtable validation in case of interposition [BZ #23313]") only covered the interposable glibc 2.1 handles, in libio/stdfiles.c. The parallel code in libio/oldstdfiles.c needs similar detection logic. Fixes (again) commit db3476aff19b75c4fdefbe65fcd5f0a90588ba51 ("libio: Implement vtable verification [BZ #20191]"). Change-Id: Ief6f9f17e91d1f7263421c56a7dc018f4f595c21 (cherry picked from commit cb61630ed712d033f54295f776967532d3f4b46a)	2019-11-28 14:42:32 +01:00
Marcin Kościelnicki	4d5cfeb510	rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204] The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9 ("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT"). (cherry picked from commit d5dfad4326fc683c813df1e37bbf5cf920591c8e)	2019-11-22 13:40:07 +01:00
Dragan Mladjenovic	92f04eedb5	mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels Linux/Mips kernels prior to 4.8 could potentially crash the user process when doing FPU emulation while running on non-executable user stack. Currently, gcc doesn't emit .note.GNU-stack for mips, but that will change in the future. To ensure that glibc can be used with such future gcc, without silently resulting in binaries that might crash in runtime, this patch forces RWX stack for all built objects if configured to run against minimum kernel version less than 4.8. * sysdeps/unix/sysv/linux/mips/Makefile (test-xfail-check-execstack): Move under mips-has-gnustack != yes. (CFLAGS-.o, ASFLAGS-.o): New rules. Apply -Wa,-execstack if mips-force-execstack == yes. * sysdeps/unix/sysv/linux/mips/configure: Regenerated. * sysdeps/unix/sysv/linux/mips/configure.ac (mips-force-execstack): New var. Set to yes for hard-float builds with minimum_kernel < 4.8.0 or minimum_kernel not set at all. (mips-has-gnustack): New var. Use value of libc_cv_as_noexecstack if mips-force-execstack != yes, otherwise set to no. (cherry picked from commit 33bc9efd91de1b14354291fc8ebd5bce96379f12)	2019-11-05 14:25:38 -03:00
Wilco Dijkstra	5b4f7382af	Add undef to fix test failure.	2019-09-13 16:35:12 +01:00
Wilco Dijkstra	9456483fb2	Improve performance of memmem This patch significantly improves performance of memmem using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly. By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear. Small needles up to size 2 use a dedicated linear search. Very long needles use the Two-Way algorithm (to avoid increasing stack size or slowing down the common case, inlining is disabled). The performance gain is 6.6 times on English text on AArch64 using random needles with average size 8. Tested against GLIBC testsuite and randomized tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> * string/memmem.c (__memmem): Rewrite to improve performance. (cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)	2019-09-13 15:50:08 +01:00
Wilco Dijkstra	373f8b06a3	Improve performance of strstr This patch significantly improves performance of strstr using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly. By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear. Small needles up to size 3 use a dedicated linear search. Very long needles use the Two-Way algorithm. The performance gain using the improved bench-strstr on Cortex-A72 is 5.8 times basic_strstr and 3.7 times twoway_strstr. Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> * string/str-two-way.h (two_way_short_needle): Add inline to avoid warning. (two_way_long_needle): Block inlining. * string/strstr.c (strstr2): Add new function. (strstr3): Likewise. (STRSTR): Completely rewrite strstr to improve performance. (cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)	2019-09-13 15:50:02 +01:00
Wilco Dijkstra	4ec1b9e913	Fix strstr bug with huge needles (bug 23637) The generic strstr in GLIBC 2.28 fails to match huge needles. The optimized AVAILABLE macro reads ahead a large fixed amount to reduce the overhead of repeatedly checking for the end of the string. However if the needle length is larger than this, two_way_long_needle may confuse this as meaning the end of the string and return NULL. This is fixed by adding the needle length to the amount to read ahead. [BZ #23637] * string/test-strstr.c (pr23637): New function. (test_main): Add tests with longer needles. * string/strcasestr.c (AVAILABLE): Fix readahead distance. * string/strstr.c (AVAILABLE): Likewise. (cherry picked from commit 83a552b0bb9fc2a5e80a0ab3723c0a80ce1db9f2)	2019-09-13 15:49:20 +01:00
Rajalakshmi Srinivasaraghavan	ecd6271ed8	Speedup first memmem match As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp can be used after memchr to avoid the initialization overhead of the two-way algorithm for the first match. This has shown improvement >40% for first match. (cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)	2019-09-13 15:48:19 +01:00
Wilco Dijkstra	bba6b9288f	Simplify and speedup strstr/strcasestr first match Looking at the benchtests, both strstr and strcasestr spend a lot of time in a slow initialization loop handling one character per iteration. This can be simplified and use the much faster strlen/strnlen/strchr/memcmp. Read ahead a few cachelines to reduce the number of strnlen calls, which improves performance by ~3-4%. This patch improves the time taken for the full strstr benchtest by >40%. * string/strcasestr.c (STRCASESTR): Simplify and speedup first match. * string/strstr.c (AVAILABLE): Likewise. (cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)	2019-09-13 15:47:22 +01:00
Wilco Dijkstra	7a4da6ef7a	Improve strstr performance Improve strstr performance. Strstr tends to be slow because it uses many calls to memchr and a slow byte loop to scan for the next match. Performance is significantly improved by using strnlen on larger blocks and using strchr to search for the next matching character. strcasestr can also use strnlen to scan ahead, and memmem can use memchr to check for the next match. On the GLIBC bench tests the performance gains on Cortex-A72 are: strstr: +25% strcasestr: +4.3% memmem: +18% On a 256KB dataset strstr performance improves by 67%, strcasestr by 47%. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 3ae725dfb6d7f61447d27d00ed83e573bd5454f4)	2019-09-13 15:46:16 +01:00
Wilco Dijkstra	5f0d2e0491	[AArch64] Add ifunc support for Ares Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares supports 2 128-bit loads/stores, use Neon registers for memcpy by selecting __memcpy_falkor by default (we should rename this to __memcpy_simd or similar). * manual/tunables.texi (glibc.cpu.name): Add ares tunable. * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use __memcpy_falkor for ares. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES): Add new define. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add ares cpu. (cherry picked from commit 02f440c1ef5d5d79552a524065aa3e2fabe469b9)	2019-09-06 18:53:37 +01:00
Siddhesh Poyarekar	e6b7252040	aarch64,falkor: Use vector registers for memcpy Vector registers perform better than scalar register pairs for copying data so prefer them instead. This results in a time reduction of over 50% (i.e. 2x speed improvement) for some smaller sizes for memcpy-walk. Larger sizes show improvements of around 1% to 2%. memcpy-random shows a very small improvement, in the range of 1-2%. * sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor): Use vector registers. (cherry picked from commit 0aec4c1d1801e8016ebe89281d16597e0557b8be)	2019-09-06 18:38:56 +01:00
Siddhesh Poyarekar	c74b884f70	aarch64,falkor: Ignore prefetcher tagging for smaller copies For smaller and medium sized copies, the effects of hardware prefetching are not as dominant as instruction level parallelism. Hence it makes more sense to load data into multiple registers than to try and route them to the same prefetch unit. This is also the case for the loop exit where we are unable to latch on to the same prefetch unit anyway so it makes more sense to have data loaded in parallel. The performance results are a bit mixed with memcpy-random, with numbers jumping between -1% and +3%, i.e. the numbers don't seem repeatable. memcpy-walk sees a 70% improvement (i.e. > 2x) for 128 bytes and that improvement reduces down as the impact of the tail copy decreases in comparison to the loop. * sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor): Use multiple registers to copy data in loop tail. (cherry picked from commit db725a458e1cb0e17204daa543744faf08bb2e06)	2019-09-06 18:36:23 +01:00
Siddhesh Poyarekar	0fc5934ebd	aarch64/strncmp: Use lsr instead of mov+lsr A lsr can do what the mov and lsr did. (cherry picked from commit b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1)	2019-09-06 17:00:32 +01:00
Siddhesh Poyarekar	e0a0bd3acc	aarch64/strncmp: Unbreak builds with old binutils Binutils 2.26.* and older do not support moves with shifted registers, so use a separate shift instruction instead. (cherry picked from commit d46f84de745db8f3f06a37048261f4e5ceacf0a3)	2019-09-06 16:59:34 +01:00
Siddhesh Poyarekar	638caf3000	aarch64: Improve strncmp for mutually misaligned inputs The mutually misaligned inputs on aarch64 are compared with a simple byte copy, which is not very efficient. Enhance the comparison similar to strcmp by loading a double-word at a time. The peak performance improvement (i.e. 4k maxlen comparisons) due to this on the strncmp microbenchmark is as follows: falkor: 3.5x (up to 72% time reduction) cortex-a73: 3.5x (up to 71% time reduction) cortex-a53: 3.5x (up to 71% time reduction) All mutually misaligned inputs from 16 bytes maxlen onwards show upwards of 15% improvement and there is no measurable effect on the performance of aligned/mutually aligned inputs. * sysdeps/aarch64/strncmp.S (count): New macro. (strncmp): Store misaligned length in SRC1 in COUNT. (mutual_align): Adjust. (misaligned8): Load dword at a time when it is safe. (cherry picked from commit 7108f1f944792ac68332967015d5e6418c5ccc88)	2019-09-06 16:58:29 +01:00
Siddhesh Poyarekar	d5f45a29ff	aarch64/strcmp: fix misaligned loop jump target I accidentally set the loop jump back label as misaligned8 instead of do_misaligned. The typo is harmless but it's always nice to not have to unnecessarily execute those two instructions. * sysdeps/aarch64/strcmp.S (do_misaligned): Jump back to do_misaligned, not misaligned8. (cherry picked from commit 6ca24c43481e2c93a6eec362b04c3e77a35b28e3)	2019-09-06 16:57:46 +01:00
Siddhesh Poyarekar	7f690fafad	aarch64: Improve strcmp unaligned performance Replace the simple byte-wise compare in the misaligned case with a dword compare with page boundary checks in place. For simplicity I've chosen a 4K page boundary so that we don't have to query the actual page size on the system. This results in up to 3x improvement in performance in the unaligned case on falkor and about 2.5x improvement on mustang as measured using bench-strcmp. * sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a time whenever possible. (cherry picked from commit 2bce01ebbaf8db52ba4a5635eb5744f989cdbf69)	2019-09-06 16:56:45 +01:00
Siddhesh Poyarekar	40df047b3b	aarch64: Fix branch target to loop16 I goofed up when changing the loop8 name to loop16 and missed out on the branch instance. Fixed and actually build tested this time. * sysdeps/aarch64/memcmp.S (more16): Fix branch target loop16. (cherry picked from commit 4e54d918630ea53e29dd70d3bdffcb00d29ed3d4)	2019-09-06 16:20:12 +01:00
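
Note on commit 73886db621 (wordexp, bug 28011): the message says strtoul is used instead of atoi so that overflow can be detected. The following is a minimal sketch of that general pattern, not the code from the glibc patch. The helper name parse_positional and its return convention are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not glibc code): parse TEXT as a decimal
   positional parameter number.  Returns 0 and stores the value in
   *OUT on success; returns -1 if TEXT is not a plain decimal number
   or the value does not fit in an int.  atoi cannot report the
   second case, because its behavior on overflow is undefined.  */
int
parse_positional (const char *text, int *out)
{
  char *end;
  unsigned long value;

  /* strtoul would accept leading whitespace or a sign, so insist
     on a digit as the first character.  */
  if (text[0] < '0' || text[0] > '9')
    return -1;

  errno = 0;
  value = strtoul (text, &end, 10);

  /* Reject trailing characters that are not part of the number.  */
  if (*end != '\0')
    return -1;

  /* strtoul sets errno to ERANGE when the value exceeds ULONG_MAX;
     the second test covers values that fit in unsigned long but
     not in int.  */
  if (errno == ERANGE || value > INT_MAX)
    return -1;

  *out = (int) value;
  return 0;
}
```

With this sketch, parse_positional ("12", &n) succeeds and stores 12, while a digit string too long for unsigned long is rejected rather than silently wrapped.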


33345 Commits