glibc

mirror of git://sourceware.org/git/glibc.git synced 2025-04-12 14:21:18 +08:00

Author	SHA1	Message	Date
Wilco Dijkstra	e059e458b8	AArch64: Optimize strlen Optimize strlen by unrolling the main loop. Large strings are 64% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 03c8ce5000198947a4dd7b2c14e5131738fda62b)	2024-04-09 19:10:57 +01:00
Wilco Dijkstra	ce9a4f6a3c	AArch64: Optimize strcpy Unroll the main loop. Large strings are around 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 349e48c01e85bd96006860084e76d322e6ca02f1)	2024-04-09 19:10:57 +01:00
Wilco Dijkstra	bb36cb21ef	AArch64: Improve strchrnul Unroll the main loop, which improves performance slightly. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 09ebd8549b2ce5a3a6c0c7c5f3e62227faf50a99)	2024-04-09 19:10:56 +01:00
Wilco Dijkstra	196458764f	AArch64: Optimize strchr Simplify calculation of the mask using shrn. Unroll the main loop. Small strings are 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 51541a229740801882490177fa178e49264b13fb)	2024-04-09 19:10:56 +01:00
Wilco Dijkstra	2a4c4043d0	AArch64: Improve strlen_asimd Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment. Performance improves slightly as a result. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 1bbb1a2022e126f21810d3d0ebe0a975d5243e43)	2024-04-09 19:10:56 +01:00
Wilco Dijkstra	f55ba2fedc	AArch64: Optimize memrchr Optimize the main loop - large strings are 43% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 00776241776e67fc666b896c1e85770f4f3ec1e1)	2024-04-09 19:10:56 +01:00
Wilco Dijkstra	91680682e5	AArch64: Optimize memchr Optimize the main loop - large strings are 40% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit ce758d4f063820c2bc743e12797d7454c66be718)	2024-04-09 19:10:48 +01:00
Danila Kutenin	e213c2205e	aarch64: Optimize string functions with shrn instruction We found that string functions were using AND+ADDP to find the nibble/syndrome mask, but there is a simpler option: `SHRN dst.8b, src.8h, 4` (shift right every 2 bytes by 4 and narrow to 1 byte), which has the same latency as ADDP on all SIMD ARMv8 targets. There are also possible gaps for memcmp but that's for another patch. We see 10-20% savings for small-mid size cases (<=128) which are primary cases for general workloads. (cherry picked from commit 3c9980698988ef64072f1fac339b180f52792faf)	2024-04-09 19:10:46 +01:00
Wilco Dijkstra	d30d8bb5ca	AArch64: Optimize memcmp Rewrite memcmp to improve performance. On small and medium inputs performance is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 30-50% faster depending on the size. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit b51eb35c572b015641f03e3682c303f7631279b7)	2024-04-09 19:10:21 +01:00
Wilco Dijkstra	405dd5b536	AArch64: Improve strnlen performance Optimize strnlen by avoiding UMINV which is slow on most cores. On Neoverse N1 large strings are 1.8x faster than the current version, and bench-strnlen is 50% faster overall. This version is MTE compatible. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 252cad02d4c63540501b9b8c988cb91248563224)	2024-04-09 19:10:15 +01:00
Szabolcs Nagy	0dc9ac6d5c	aarch64: use PTR_ARG and SIZE_ARG instead of DELOUSE DELOUSE was added to asm code to make them compatible with non-LP64 ABIs, but it is an unfortunate name and the code was not compatible with ABIs where pointer and size_t are different. Glibc currently only supports the LP64 ABI so these macros are not really needed or tested, but for now the name is changed to be more meaningful instead of removing them completely. Some DELOUSE macros were dropped: clone, strlen and strnlen used it unnecessarily. The out of tree ILP32 patches are currently not maintained and will likely need a rework to rebase them on top of the time64 changes. (cherry picked from commit 45b1e17e9150dbd9ac2d578579063fbfa8e1b327)	2024-04-09 19:10:06 +01:00
Sunil K Pandey	bd1ded3d05	x86_64: Optimize ffsll function code size. The ffsll function randomly regresses by ~20%, depending on how its code is aligned in memory. The ffsll code size is 17 bytes. Since the default function alignment is 16 bytes, it can be loaded at 16, 32, 48 or 64 byte aligned memory. When ffsll is loaded at 16, 32 or 64 byte aligned memory, the entire code fits in a single 64-byte cache line. When ffsll is loaded at 48 byte aligned memory, it is split across two cache lines, hence the random regression. Reducing the ffsll code size from 17 bytes to 12 bytes ensures that it always fits in a single 64-byte cache line. This patch fixes the random ffsll performance regression. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 9d94997b5f9445afd4f2bccc5fa60ff7c4361ec1)	2024-01-31 18:57:10 -08:00
Noah Goldstein	077f1f78bb	x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before af992e7abd). This patch fixes that by putting the division after the `else` block. (cherry picked from commit 084fb31bc2c5f95ae0b9e6df4d3cf0ff43471ede)	2023-09-11 22:47:26 -05:00
Noah Goldstein	ed4ceabea1	x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold. On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 8b9a0af8ca012217bf90d1dc0694f85b49ae09da)	2023-09-11 22:47:26 -05:00
Noah Goldstein	05c2893095	x86: Fix slight bug in `shared_per_thread` cache size calculation. After: ``` commit af992e7abdc9049714da76cae1e5e18bc4838fb8 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` can be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 47f747217811db35854ea06741be3685e8bbd44d)	2023-09-11 22:47:26 -05:00
Noah Goldstein	b462b80b08	x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` The current `non_temporal_threshold` is set to roughly `3/4 * sizeof_L3 / ncores_per_socket`. This patch updates that value to roughly `sizeof_L3 / 4`. The original value (specifically dividing by `ncores_per_socket`) was chosen to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by `ncores_per_socket`, however, leads to exceedingly low non-temporal thresholds and to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory, so using them at a size much smaller than L3 can place soon-to-be-accessed data much further away than it otherwise would be. In addition, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in effect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that led to this small threshold were done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to REP MOVSB which, on the measured systems, is nearly 2x faster than non-temporal stores at the low end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle ground.
Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores were done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available in pdf on the github): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit af992e7abdc9049714da76cae1e5e18bc4838fb8)	2023-09-11 22:47:26 -05:00
Florian Weimer	bf4a99baed	debug: Mark libSegFault.so as NODELETE The signal handler installed in the ELF constructor cannot easily be removed again (because the program may have changed handlers in the meantime). Mark the object as NODELETE so that the registered handler function is never unloaded. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 23ee92deea4c99d0e6a5f48fa7b942909b123ec5)	2023-07-21 16:40:14 +02:00
Noah Goldstein	0a888ff9bd	x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591 ] Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch. (cherry picked from commit b0969fa53a28b4ab2159806bf6c99a98999502ee)	2022-11-24 16:28:58 -08:00
Florian Weimer	0c9137a444	CVE-2022-23218: Buffer overflow in sunrpc svcunix_create (bug 28768) The sunrpc function svcunix_create suffers from a stack-based buffer overflow with overlong pathname arguments. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit f545ad4928fa1f27a3075265182b38a4f939a5f7)	2022-10-04 08:00:00 +00:00
Florian Weimer	7a9e8a984a	<shlib-compat.h>: Support compat_symbol_reference for _ISOMAC This is helpful for testing compat symbols in cases where _ISOMAC is activated implicitly due to -DMODULE_NAME=testsuite and cannot be disabled easily. (cherry picked from commit 36f6e408845c8c539128f3fb9cb132bf1845a2c8)	2022-10-04 08:00:00 +00:00
Martin Sebor	76e807f5f1	sunrpc: Test case for clnt_create "unix" buffer overflow (bug 22542) Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit ef972a4c50014a16132b5c75571cfb6b30bef136)	2022-10-04 08:00:00 +00:00
Florian Weimer	52d57fc76d	CVE-2022-23219: Buffer overflow in sunrpc clnt_create for "unix" (bug 22542) Processing an overlong pathname in the sunrpc clnt_create function results in a stack-based buffer overflow. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit 226b46770c82899b555986583294b049c6ec9b40)	2022-10-04 08:00:00 +00:00
Florian Weimer	b10d5e62a6	socket: Add the __sockaddr_un_set function Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit e368b12f6c16b6888dda99ba641e999b9c9643c8)	2022-10-04 08:00:00 +00:00
Siddhesh Poyarekar	6fd634e9b9	NEWS: Mention CVE-2020-29562 (BZ #26923 ) BZ #26923 now has a CVE entry, so add a NEWS entry for it. (cherry picked from commit 38a9e93cb1c58e3c899d638480e6d6e42af8e6fc)	2022-10-04 08:00:00 +00:00
Michael Colavita	1896ace580	iconv: Fix incorrect UCS4 inner loop bounds (BZ#26923) Previously, in UCS4 conversion routines we limit the number of characters we examine to the minimum of the number of characters in the input and the number of characters in the output. This is not the correct behavior when __GCONV_IGNORE_ERRORS is set, as we do not consume an output character when we skip a code unit. Instead, track the input and output pointers and terminate the loop when either reaches its limit. This resolves assertion failures when resetting the input buffer in a step of iconv, which assumes that the input will be fully consumed given sufficient output space. (cherry picked from commit 228edd356f03bf62dcf2b1335f25d43c602ee68d)	2022-10-04 08:00:00 +00:00
Dmitry V. Levin	cdf5ee727d	NEWS: Mention CVE-2021-35942 Add a NEWS entry for the fix that was backported by commit 27e892f6608e9d0da71884bb1422a735f6062850.	2022-10-04 08:00:00 +00:00
DJ Delorie	aa510aa276	NEWS: Mention CVE-2021-27645 (cherry picked from commit 24eb3be5db5befefe4bcf0f438bf6629a9c3a608)	2022-10-04 08:00:00 +00:00
Florian Weimer	3299ce69c5	NEWS: Mention CVE-2021-3326 (iconv assertion with ISO-2022-JP-3) (cherry picked from commit d7f4f3f5fb1275f0b3d9f4e1b3d9d7b75a5a9e26)	2022-10-04 08:00:00 +00:00
Siddhesh Poyarekar	b2229db87d	NEWS: Mention CVE-2019-25013 (cherry picked from commit 18b640c57094236e6c991ba16f87467085a1d55a)	2022-10-04 08:00:00 +00:00
Dmitry V. Levin	32022774db	NEWS: Move CVE-2021-33574 entry from 2.32 section to 2.32.1 The fix was backported by commit ff75390ef59823193351ae77584c397c503b7b58 ("Use __pthread_attr_copy in mq_notify (bug 27896)") after glibc 2.32 release.	2022-10-04 08:00:00 +00:00
Dmitry V. Levin	09c113cf00	NEWS: Move CVE-2020-27618 entry from 2.32 section to 2.32.1 The fix was backported by commit 050022910be1d1f5c11cd5168f1685ad4f9580d2 ("iconv: Accept redundant shift sequences in IBM1364 [BZ #26224]") after glibc 2.32 release.	2022-10-04 08:00:00 +00:00
Dmitry V. Levin	f6f96a16e6	NEWS: add entries for fixed bugs Add NEWS entries to the list of bugs that were fixed after glibc 2.32 release: 24973, 25399, 26383, 26690, 26798, 26831, 26926, 26988, 27024, 27068, 27256, 27398, 27462, 27471, 27476, 27511, 27609, 27655, 27896, 28011, 28033, 28064, 28213, 29304, and 29611.	2022-10-04 08:00:00 +00:00
Paul Zimmermann	ede8acfdee	Fix typos in "NEWS for version 2.32" (cherry picked from commit 4d3a77c73594c3704992f8d5b779c8be053cff35)	2022-10-04 08:00:00 +00:00
Shuo Wang	51e00fc5aa	Fix typos in NEWS file (cherry picked from commit fdb724f9032ff73310be0e51549f494a3eaa7495)	2022-10-04 08:00:00 +00:00
Sunil K Pandey	6bbc1a3a35	x86-64: Require BMI2 for avx2 functions [BZ #29611 ] This patch fixes BZ #29611	2022-09-28 18:06:04 -07:00
H.J. Lu	f9e29095fc	x86-64: Require BMI2 for strchr-avx2.S [BZ #29611 ] Since strchr-avx2.S updated by commit 1f745ecc2109890886b161d4791e1406fdfc29b8 Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h. This fixes BZ #29611. (cherry picked from commit 83c5b368226c34a2f0a5287df40fc290b2b34359)	2022-09-28 18:05:48 -07:00
Florian Weimer	51b72ac04b	elf: Call __libc_early_init for reused namespaces (bug 29528) libc_map is never reset to NULL, neither during dlclose nor on a dlopen call which reuses the namespace structure. As a result, if a namespace is reused, its libc is not initialized properly. The most visible result is a crash in the <ctype.h> functions. To prevent similar bugs on namespace reuse from surfacing, unconditionally initialize the chosen namespace to zero using memset. (cherry picked from commit d0e357ff45a75553dee3b17ed7d303bfa544f6fe)	2022-08-30 17:09:57 +02:00
Adhemerval Zanella	6f8c9dc8bb	linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304) On success, mq_receive() and mq_timedreceive() return the number of bytes in the received message, so the code must check whether the value is larger than 0. Checked on i686-linux-gnu. (cherry picked from commit 71d87d85bf54f6522813aec97c19bdd24997341e)	2022-06-30 10:46:55 -03:00
H.J. Lu	443e146ce7	NEWS: Add a bug fix entry for BZ #28896	2022-02-18 19:10:42 -08:00
Noah Goldstein	7bbad8e3cf	x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c Previously TEST_NAME was passing a function pointer. This didn't fail because of the -Wno-error flag (to allow for overflow sizes passed to strncmp/wcsncmp) Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit b98d0bbf747f39770e0caba7e984ce9f8f900330)	2022-02-18 18:08:10 -08:00
Noah Goldstein	720263fcb8	x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. These would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 7835d611af0854e69a0c71e3806f8fe379282d6f)	2022-02-18 18:08:01 -08:00
Noah Goldstein	e6251366b6	x86: Fallback {str\|wcs}cmp RTM in the ncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. These would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)	2022-02-18 18:07:56 -08:00
H.J. Lu	0f8a239000	string: Add a testcase for wcsncmp with SIZE_MAX [BZ #28755 ] Verify that wcsncmp (L("abc"), L("abd"), SIZE_MAX) == 0. The new test fails without commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] and commit 7e08db3359c86c94918feb33a1182cd0ff3bb10b Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:28 2022 -0600 x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755] This is for BZ #28755. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com> (cherry picked from commit aa5a720056d37cf24924c138a3dbe6dace98e97c)	2022-02-17 11:32:14 -08:00
H.J. Lu	9d1cd8bd7a	x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064 ] commit 6f573a27b6c8b4236445810a44660612323f5a73 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 added wcsnlen-sse4.1 to the wcslen ifunc implementation list. Since the random value in the RSI register is larger than the wide-character string length in the existing wcslen test, it didn't trigger the wcslen test failure. Add a test to force 0 into the RSI register before calling wcslen. (cherry picked from commit a6e7c3745d73ff876b4ba6991fb00768a938aef5)	2022-02-01 11:55:33 -08:00
Noah Goldstein	d528cb5165	x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064 ] The following commit commit 6f573a27b6c8b4236445810a44660612323f5a73 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc implementation list and adding wcslen-sse4.1 to the ifunc implementation list. Testing: test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as well as all other tests in wcsmbs and string. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 0679442defedf7e52a94264975880ab8674736b2)	2022-02-01 11:55:24 -08:00
H.J. Lu	b1fcaf14fe	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit 1e000d3d33211d5a954300e2a69b90f93f18a1a1)	2022-02-01 06:57:39 -08:00
H.J. Lu	77317b3b0d	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033 ] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcode update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033. (cherry picked from commit ea8e465a6b8d0f26c72bcbe453a854de3abf68ec)	2022-02-01 06:57:15 -08:00
H.J. Lu	f996f678b9	x86-64: Require BMI2 for __strlen_evex and __strnlen_evex Since __strlen_evex and __strnlen_evex added by commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 5 06:24:52 2021 -0800 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX use sarx: c4 e2 6a f7 c0 sarx %edx,%eax,%eax require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for EVEX implementation. (cherry picked from commit 55bf411b451c13f0fb7ff3d3bf9a820020b45df1)	2022-01-27 15:31:17 -08:00
H.J. Lu	bee0b69a58	NEWS: Add a bug fix entry for BZ #27974	2022-01-27 14:46:15 -08:00
Noah Goldstein	63c84a82a3	String: Add overflow tests for strnlen, memchr, and strncat [BZ #27974 ] This commit adds tests for a bug in the wide char variant of the functions where the implementation may assume that maxlen for wcsnlen or n for wmemchr/strncat will not overflow when multiplied by sizeof(wchar_t). These tests show the following implementations failing on x86_64: wcsnlen-sse4_1 wcsnlen-avx2 wmemchr-sse2 wmemchr-avx2 strncat would fail as well if it were on a system that preferred either of the wcsnlen implementations that failed as it relies on wcsnlen. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit da5a6fba0febbfc90896ce1b2eb75c6d8a88a72d)	2022-01-27 14:45:39 -08:00


36279 Commits