glibc

mirror of git://sourceware.org/git/glibc.git synced 2024-11-21 01:12:26 +08:00

Author	SHA1	Message	Date
Romain Geissler	8006457ab7	Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843 ] This patch fixes a very recently added leak in getaddrinfo. This was assigned CVE-2023-5156. Resolves: BZ #30884 Related: BZ #30842 Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `ec6b95c330`)	2023-09-26 15:38:49 -04:00
Siddhesh Poyarekar	e09ee267c0	getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806) When an NSS plugin only implements the _gethostbyname2_r and _getcanonname_r callbacks, getaddrinfo could use memory that was freed during tmpbuf resizing, through h_name in a previous query response. The backing store for res->at->name when doing a query with gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in gethosts during the query. For AF_INET6 lookup with AI_ALL \| AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second for a v4 lookup. In this case, if the first call reallocates tmpbuf enough number of times, resulting in a malloc, th->h_name (that res->at->name refers to) ends up on a heap allocated storage in tmpbuf. Now if the second call to gethosts also causes the plugin callback to return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF reference in res->at->name. This then gets dereferenced in the getcanonname_r plugin call, resulting in the use after free. Fix this by copying h_name over and freeing it at the end. This resolves BZ #30843, which is assigned CVE-2023-4806. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `973fe93a56`)	2023-09-15 20:00:04 -04:00
Siddhesh Poyarekar	cc4544ef80	gethosts: Return EAI_MEMORY on allocation failure All other cases of failures due to lack of memory return EAI_MEMORY, so it seems wrong to return EAI_SYSTEM here. The only reason convert_hostent_to_gaih_addrtuple could fail is on calloc failure. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `b587456c0e`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	92478a808f	gaih_inet: Split result generation into its own function Simplify the loop a wee bit and clean up variable names too. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `ac4653ef50`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	6e3fed9d20	gaih_inet: split loopback lookup into its own function Flatten the condition nesting and replace the alloca for RET.AT/ATR with a single array LOCAL_AT[2]. This gets rid of alloca and alloca accounting. `git diff -b` is probably the best way to view this change since much of the diff is whitespace changes. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `657472b2a5`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	4d59769087	gaih_inet: make gethosts into a function The macro is quite a pain to debug, so make gethosts into a function to make it easier to maintain. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `cfa3bd48cb`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	ec71cb9611	gaih_inet: separate nss lookup loop into its own function Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `906cecbe08`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	5914a1d55b	gaih_inet: Split nscd lookup code into its own function. Add a new member got_ipv6 to indicate if the results have an IPv6 result and use it instead of the local got_ipv6. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `e7e5315b7f`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	3b5a3e5009	gaih_inet: Split simple gethostbyname into its own function Add a free_at flag in gaih_result to indicate if res.at needs to be freed by the caller. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `b44389cb7f`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	922f2614d6	gaih_inet: make numeric lookup a separate routine Introduce the gaih_result structure and general paradigm for cleanups that follow to process the lookup request and return a result. A lookup function (like text_to_binary_address), should return an integer error code and set members of gaih_result based on what it finds. If the function does not have a result and no errors have occurred during the lookup, it should return 0 and res.at should be set to NULL, allowing a subsequent function to do the lookup until we run out of options. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `26dea46119`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	e05e5889b8	gaih_inet: Simplify service resolution Refactor the code to split out the service resolution code into a separate function. Allocate the service tuples array just once to the size of the typeproto array, thus avoiding the unnecessary pointer chasing and stack allocations. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `8d6cf99f2f`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	f7efb43738	getaddrinfo: Fix leak with AI_ALL [BZ #28852 ] Use realloc in convert_hostent_to_gaih_addrtuple and fix up pointers in the result list so that a single block is maintained for hostbyname3_r/hostbyname2_r and freed in gaih_inet. This result is never merged with any other results, since the hosts database does not permit merging. Resolves BZ #28852. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `3004604607`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	b195fd86c6	gaih_inet: Simplify canon name resolution Simplify logic for allocation of canon to remove the canonbuf variable; canon now always points to an allocated block. Also pull the canon name set into a separate function. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `d01411f6bc`)	2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar	01671608a3	gethosts: Remove unused argument _type The generated code is unchanged. (cherry picked from commit `b17e842a60`)	2023-09-15 19:53:18 -04:00
Siddhesh Poyarekar	228cdb00a0	Simplify allocations and fix merge and continue actions [BZ #28931 ] Allocations for address tuples is currently a bit confusing because of the pointer chasing through PAT, making it hard to observe the sequence in which allocations have been made. Narrow scope of the pointer chasing through PAT so that it is only used where necessary. This also tightens actions behaviour with the hosts database in getaddrinfo to comply with the manual text. The "continue" action discards previous results and the "merge" action results in an immedate lookup failure. Consequently, chaining of allocations across modules is no longer necessary, thus opening up cleanup opportunities. A test has been added that checks some combinations to ensure that they work correctly. Resolves: BZ #28931 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `1c37b8022e`)	2023-09-14 22:46:01 -04:00
Noah Goldstein	7a6b1f06e7	x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before `af992e7abd`). This patch fixes that by putting the division after the `else` block. (cherry picked from commit `084fb31bc2`)	2023-09-05 17:17:52 -05:00
Noah Goldstein	a07ab67a88	x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold. On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `8b9a0af8ca`)	2023-09-05 17:17:52 -05:00
Noah Goldstein	521afc9637	x86: Fix slight bug in `shared_per_thread` cache size calculation. After: ``` commit `af992e7abd` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` can be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `47f7472178`)	2023-09-05 17:17:52 -05:00
Noah Goldstein	c8c0aac68f	x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 / ncores_per_socket'. This patch updates that value to roughly 'sizeof_L3 / 4` The original value (specifically dividing the `ncores_per_socket`) was done to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by 'ncores_per_socket', however leads to exceedingly low non-temporal thresholds and leads to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory so using it at a size much smaller than L3 can place soon to be accessed data much further away than it otherwise could be. As well, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in affect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that lead this small threshold where done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to be REP MOVSB which, on the measure systems, is nearly 2x faster than non-temporal stores at the low-end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle-ground. Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores where done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available in pdf on the github): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `af992e7abd`)	2023-09-05 17:17:52 -05:00
H.J. Lu	1cd6626a89	__check_pf: Add a cancellation cleanup handler [BZ #20975 ] There are reports for hang in __check_pf: https://github.com/JoeDog/siege/issues/4 It is reproducible only under specific configurations: 1. Large number of cores (>= 64) and large number of threads (> 3X of the number of cores) with long lived socket connection. 2. Low power (frequency) mode. 3. Power management is enabled. While holding lock, __check_pf calls make_request which calls __sendto and __recvmsg. Since __sendto and __recvmsg are cancellation points, lock held by __check_pf won't be released and can cause deadlock when thread cancellation happens in __sendto or __recvmsg. Add a cancellation cleanup handler for __check_pf to unlock the lock when cancelled by another thread. This fixes BZ #20975 and the siege hang issue. (cherry picked from commit `a443bd3fb2`)	2023-05-23 14:53:23 -07:00
Adam Yi	567f7413fb	posix: Fix system blocks SIGCHLD erroneously [BZ #30163 ] Fix bug that SIGCHLD is erroneously blocked forever in the following scenario: 1. Thread A calls system but hasn't returned yet 2. Thread B calls another system but returns SIGCHLD would be blocked forever in thread B after its system() returns, even after the system() in thread A returns. Although POSIX does not require, glibc system implementation aims to be thread and cancellation safe. This bug was introduced in `5fb7fc9635` when we moved reverting signal mask to happen when the last concurrently running system returns, despite that signal mask is per thread. This commit reverts this logic and adds a test. Signed-off-by: Adam Yi <ayi@janestreet.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `436a604b7d`)	2023-04-28 16:36:24 +02:00
Florian Weimer	71eb9cc1ff	x86_64: Fix asm constraints in feraiseexcept (bug 30305) The divss instruction clobbers its first argument, and the constraints need to reflect that. Fortunately, with GCC 12, generated code does not actually change, so there is no externally visible bug. Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `5d1ccdda7b`)	2023-04-24 16:00:13 +02:00
H.J. Lu	89c017de2f	x86: Check minimum/maximum of non_temporal_threshold [BZ #29953 ] The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may be set to less than the minimum value when the shared cache size isn't available (e.g., in an emulator) or by the tunable. Add checks for minimum and maximum of non_temporal_threshold. This fixes BZ #29953. (cherry picked from commit `48b74865c6`)	2023-04-20 15:50:34 +02:00
Adhemerval Zanella	11ad405fd4	elf: Fix 64 time_t support for installed statically binaries The usage of internal static symbol for statically linked binaries does not work correctly for objects built with -D_TIME_BITS=64, since the internal definition does not provide the expected aliases. This patch makes it to use the default stat functions instead (which uses the default 64 time_t alias and types). Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `9fe6f63638`)	2023-02-08 09:32:42 +01:00
H.J. Lu	c5c666f349	s_sincosf.h: Change pio4 type to float [BZ #28713 ] s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> (cherry picked from commit `d3e4f5a101`)	2023-01-11 15:29:37 +01:00
H.J. Lu	dd0c72fb46	Regenerate ulps on x86_64 with GCC 12 Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12. (cherry picked from commit `de8a0897e3`)	2023-01-11 14:35:10 +01:00
Fangrui Song	bbe4bbb6e8	elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152 ] [BZ #28205 ] elf/tls-macros.h was added for TLS testing when GCC did not support __thread. __thread and tls_model attributes are mature now and have been used by many newer tests. Also delete tst-tls2.c which tests .tls_common (unused by modern GCC and unsupported by Clang/LLD). .tls_common and .tbss definition are almost identical after linking, so the runtime test doesn't add additional coverage. Assembler and linker tests should be on the binutils side. When LLD 13.0.0 is allowed in configure.ac (https://sourceware.org/pipermail/libc-alpha/2021-August/129866.html), `make check` result is on par with glibc built with GNU ld on aarch64 and x86_64. As a future clean-up, TLS_GD/TLS_LD/TLS_IE/TLS_IE macros can be removed from sysdeps/*/tls-macros.h. We can add optional -mtls-dialect={gnu2,trad} tests to ensure coverage. Tested on aarch64-linux-gnu, powerpc64le-linux-gnu, and x86_64-linux-gnu. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit `33c50ef428`)	2023-01-10 19:09:21 +01:00
Alan Modra	d0e2ac0c59	elf/tst-tlsopt-powerpc fails when compiled with -mcpu=power10 (BZ# 29776) Supports pcrel addressing of TLS GOT entry. Also tweak the non-pcrel asm constraint to better reflect how the reg is used. (cherry picked from commit `94628de778`)	2023-01-10 17:33:30 +01:00
Noah Goldstein	e3255e7d21	x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591 ] Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch. (cherry picked from commit `b0969fa53a`)	2022-11-24 14:42:41 -08:00
Vladislav Khmelevsky	691f70b84a	elf: Fix rtld-audit trampoline for aarch64 This patch fixes two problems with audit: 1. The DL_OFFSET_RV_VPCS offset was mixed up with DL_OFFSET_RG_VPCS, resulting in x2 register value nulling in RG structure. 2. We need to preserve the x8 register before function call, but don't have to save it's new value and restore it before return. Anyway the final restore was using OFFSET_RV instead of OFFSET_RG value which is wrong (althoug doesn't affect anything). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `eb4181e9f4`)	2022-11-22 10:34:14 -03:00
Florian Weimer	d57cdc1b5a	Linux: Support __IPC_64 in sysvctl *ctl command arguments (bug 29771) Old applications pass __IPC_64 as part of the command argument because old glibc did not check for unknown commands, and passed through the arguments directly to the kernel, without adding __IPC_64. Applications need to continue doing that for old glibc compatibility, so this commit enables this approach in current glibc. For msgctl and shmctl, if no translation is required, make direct system calls, as we did before the time64 changes. If translation is required, mask __IPC_64 from the command argument. For semctl, the union-in-vararg argument handling means that translation is needed on all architectures. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `22a46dee24`)	2022-11-11 18:06:25 +01:00
Adhemerval Zanella	ca5df79545	linux: Fix generic struct_stat for 64 bit time (BZ# 29657) The generic Linux struct_stat misses the conditionals to use bits/struct_stat_time64_helper.h in the __USE_TIME_BITS64 for architecture that uses __TIMESIZE == 32 (currently csky and nios2). Since newer ports should not support 32 bit time_t, the generic implementation should be used as default. For arm, hppa, and sh a copy of default struct_stat is added, while for csky and nios a new one based on generic is used, along with conditionals to use bits/struct_stat_time64_helper.h. The default struct_stat is also replaced with the generic one. Checked on aarch64-linux-gnu and arm-linux-gnueabihf. (cherry picked from commit `7a6ca82f80`)	2022-10-25 16:12:20 -03:00
Aurelien Jarno	e570b865b5	x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations The AVX2 strrchr and wcsrchr implementation uses the 'blsmsk' instruction which belongs to the BMI1 CPU feature and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: `df7e295d18` ("x86: Optimize {str\|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `7e8283170c`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	36d6b9be3d	x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. Fixes: `af5306a735` ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `3c0c78afab`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	94b9c1b640	x86-64: Require BMI2 for AVX2 (raw\|w)memchr implementations The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi' and 'sarx' instructions, which belongs to the BMI2 CPU feature. Fixes: `acfd088a19` ("x86: Optimize memchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `e3e7fab7fe`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	67e863742d	x86-64: Require BMI2 for AVX2 wcs(n)cmp implementations The AVX2 wcs(n)cmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `f31a5a884e`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	b9cbb8dd48	x86-64: Require BMI2 for AVX2 strncmp implementation The AVX2 strncmp implementations uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `fc7de1d9b9`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	e1561d8cf0	x86-64: Require BMI2 for AVX2 strcmp implementation The AVX2 strcmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `4d64c64457`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	414fc856ff	x86-64: Require BMI2 for AVX2 str(n)casecmp implementations The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `10f79d3670`)	2022-10-04 00:00:58 +02:00
Aurelien Jarno	95f5089d4a	x86: include BMI1 and BMI2 in x86-64-v3 level The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `b80f16adbd`)	2022-10-04 00:00:58 +02:00
Wangyang Guo	ea69248445	nptl: Add backoff mechanism to spinlock loop When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `8162147872`)	2022-09-28 07:34:53 -07:00
Noah Goldstein	04efdcfac4	sysdeps: Add 'get_fast_jitter' interace in fast-jitter.h 'get_fast_jitter' is meant to be used purely for performance purposes. In all cases it's used it should be acceptable to get no randomness (see default case). An example use case is in setting jitter for retries between threads at a lock. There is a performance benefit to having jitter, but only if the jitter can be generated very quickly and ultimately there is no serious issue if no jitter is generated. The implementation generally uses 'HP_TIMING_NOW' iff it is inlined (avoid any potential syscall paths). Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `911c63a51c`)	2022-09-28 07:34:31 -07:00
Arjun Shankar	68507377f2	socket: Check lengths before advancing pointer in CMSG_NXTHDR The inline and library functions that the CMSG_NXTHDR macro may expand to increment the pointer to the header before checking the stride of the increment against available space. Since C only allows incrementing pointers to one past the end of an array, the increment must be done after a length check. This commit fixes that and includes a regression test for CMSG_FIRSTHDR and CMSG_NXTHDR. The Linux, Hurd, and generic headers are all changed. Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x. [BZ #28846] Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `9c443ac455`)	2022-08-22 18:59:26 +02:00
Florian Weimer	1fcc7bfee2	alpha: Fix generic brk system call emulation in __brk_call (bug 29490) The kernel special-cases the zero argument for alpha brk, and we can use that to restore the generic Linux error handling behavior. Fixes commit `b57ab258c1` ("Linux: Introduce __brk_call for invoking the brk system call"). (cherry picked from commit `e7ad26ee3c`)	2022-08-22 11:12:40 +02:00
Joseph Myers	4ab59ce4e5	Update syscall lists for Linux 5.19 Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi headers for RISC-V. Update the version number in syscall-names.list to reflect that it is still current for 5.19 and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py. (cherry picked from commit `fccadcdf5b`)	2022-08-08 07:54:02 +02:00
Joseph Myers	b991af5063	Update syscall-names.list for Linux 5.18 Linux 5.18 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.18. Tested with build-many-glibcs.py. (cherry picked from commit `3d9926663c`)	2022-07-21 17:06:27 +02:00
Noah Goldstein	ccc54bd61c	x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S Was missing to for the multiarch build rtld-strncmp-sse4_2.os was being built and exporting symbols: build/glibc/string/rtld-strncmp-sse4_2.os: 0000000000000000 T __strncmp_sse42 Introduced in: commit `11ffcacb64` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 21 12:10:50 2017 -0700 x86-64: Implement strcmp family IFUNC selectors in C (cherry picked from commit `96ac447d91`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	35f9c72c8b	x86: Move mem{p}{mov\|cpy}_{chk_}erms to its own file The primary memmove_{impl}_unaligned_erms implementations don't interact with this function. Putting them in same file both wastes space and unnecessarily bloats a hot code section. (cherry picked from commit `21925f6473`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	7079931c51	x86: Move and slightly improve memset_erms Implementation wise: 1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not use the L(stosb) label that was previously defined. 2. Don't give the hotpath (fallthrough) to zero size. Code positioning wise: Move memset_{chk}_erms to its own file. Leaving it in between the memset_{impl}_unaligned both adds unnecessary complexity to the file and wastes space in a relatively hot cache section. (cherry picked from commit `4a3f29e7e4`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	f4598f0351	x86: Add definition for __wmemset_chk AVX2 RTM in ifunc impl list This was simply missing and meant we weren't testing it properly. (cherry picked from commit `2a1099020c`)	2022-07-18 20:45:21 -07:00

1 2 3 4 5 ...

14580 Commits