x86: Do not prefer ERMS for memset on Zen3+

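For AMD Zen3+ architecture, the performance of the vectorized memset
loop is slightly better than ERMS (rep stosb).  Set rep_stosb_threshold
to SIZE_MAX on AMD CPUs unless the x86_rep_stosb_threshold tunable has
been set explicitly.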

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 272708884c)
Adhemerval Zanella 2024-02-08 10:08:39 -03:00 committed by Arjun Shankar
parent aa4249266e
commit 6484a92698

@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed. */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS. */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
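
The effect of the hunk above can be modelled outside glibc with a minimal sketch. The helper names pick_rep_stosb_threshold and uses_rep_stosb are hypothetical and not part of glibc. The sketch assumes that memset takes the rep stosb path only for lengths strictly greater than the threshold, so a threshold of SIZE_MAX means the vectorized loop is always used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patched selection logic: keep the value read
   from the tunable unless the CPU is AMD and the user left the tunable
   unset, in which case the threshold is raised to SIZE_MAX.  */
static size_t
pick_rep_stosb_threshold (bool is_amd, bool tunable_initialized,
                          size_t tunable_value)
{
  if (is_amd && !tunable_initialized)
    return SIZE_MAX;
  return tunable_value;
}

/* Assumption: rep stosb is chosen only when the length is strictly above
   the threshold; otherwise the vectorized loop handles the fill.  */
static bool
uses_rep_stosb (size_t len, size_t threshold)
{
  return len > threshold;
}
```

With these helpers, an AMD CPU with no user-set tunable yields SIZE_MAX, so uses_rep_stosb returns false for every length. An explicitly set tunable is passed through unchanged, which matches the !TUNABLE_IS_INITIALIZED guard in the hunk.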