x86: Update __x86_shared_non_temporal_threshold

__x86_shared_non_temporal_threshold was set to 6 times the per-core
shared cache size, based on the large memcpy micro benchmark in glibc
on an 8-core processor.  For a processor with more than 8 cores, the
threshold is too low.  Set __x86_shared_non_temporal_threshold to
3/4 of the total shared cache size so that it is unchanged on 8-core
processors (6 times 1/8 of the total is 3/4 of the total).  On
processors with fewer than 8 cores, the threshold is lower.

	* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
	Set to the 3/4 of the total shared cache size.
H.J. Lu 2017-06-02 17:32:21 -07:00
parent 3e6def237a
commit 808fd9e6fe
2 changed files with 9 additions and 2 deletions

ChangeLog

@@ -1,3 +1,8 @@
+2017-06-02  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
+	Set to the 3/4 of the total shared cache size.
+
 2017-06-02  Rical Jasan  <ricaljasan@pacific.net>
 
 	* manual/errno.texi: Remove redundant error strings.

sysdeps/x86/cacheinfo.c

@@ -767,8 +767,10 @@ intel_bug_no_cache_info:
   /* The large memcpy micro benchmark in glibc shows that 6 times of
      shared cache size is the approximate value above which non-temporal
-     store becomes faster.  */
-  __x86_shared_non_temporal_threshold = __x86_shared_cache_size * 6;
+     store becomes faster on a 8-core processor.  This is the 3/4 of the
+     total shared cache size.  */
+  __x86_shared_non_temporal_threshold
+    = __x86_shared_cache_size * threads * 3 / 4;
 }
 #endif
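
The arithmetic behind the change can be checked with a small standalone sketch. It is not glibc code: the function names `old_threshold`, `new_threshold` and `check_commit_message_claims`, and the 1 MiB per-thread share, are hypothetical and chosen only for illustration. It assumes, as the patched expression suggests, that `__x86_shared_cache_size` holds the per-thread share of the shared cache and `threads` is the number of threads sharing it.

```c
#include <assert.h>

/* Old rule: 6 times the per-thread share of the shared cache.  */
long int
old_threshold (long int per_thread_share)
{
  return per_thread_share * 6;
}

/* New rule: 3/4 of the total shared cache size, where the total is
   the per-thread share multiplied by the number of sharing threads.  */
long int
new_threshold (long int per_thread_share, unsigned int threads)
{
  return per_thread_share * threads * 3 / 4;
}

/* Check the three claims of the commit message for a hypothetical
   1 MiB per-thread share (an assumed value, not a measured one).  */
void
check_commit_message_claims (void)
{
  const long int share = 1024 * 1024;

  /* Unchanged with 8 threads: 8 * 3 / 4 == 6.  */
  assert (new_threshold (share, 8) == old_threshold (share));
  /* Lower with fewer than 8 threads.  */
  assert (new_threshold (share, 4) < old_threshold (share));
  /* Higher with more than 8 threads.  */
  assert (new_threshold (share, 16) > old_threshold (share));
}
```

With that assumed share, the old rule gives 6 MiB whatever the thread count. The new rule gives 3 MiB for 4 threads, 6 MiB for 8 and 12 MiB for 16.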