Intel Core i3, i5 and i7 processors have fast unaligned copy, so copy
backward is ignored on them. Remove Fast_Copy_Backward from Intel Core
processors to avoid confusion.
* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
bit_arch_Fast_Copy_Backward for Intel Core processors.
On AMD processors, memcpy optimized with unaligned SSE load is
slower than memcpy optimized with aligned SSSE3, while other string
functions are faster with unaligned SSE load. A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.
[BZ #19583]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
processors. Set Fast_Copy_Backward for AMD Excavator
processors.
* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
New.
(index_arch_Fast_Unaligned_Copy): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
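The memcpy selection order described above can be sketched in C as follows. This is a simplified, hypothetical model, not glibc's memcpy.S: the SKETCH_* flag macros and the enum are illustrative names, and checks such as AVX_Fast_Unaligned_Load are omitted.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; glibc keeps these in its
   cpu_features arrays with its own index/bit values.  */
#define SKETCH_FAST_UNALIGNED_COPY (1u << 0)
#define SKETCH_SSSE3               (1u << 1)
#define SKETCH_FAST_COPY_BACKWARD  (1u << 2)
#define SKETCH_ALL_FLAGS \
  (SKETCH_FAST_UNALIGNED_COPY | SKETCH_SSSE3 | SKETCH_FAST_COPY_BACKWARD)

enum memcpy_impl
{
  MEMCPY_SSE2,            /* baseline */
  MEMCPY_SSE2_UNALIGNED,  /* unaligned SSE loads */
  MEMCPY_SSSE3,           /* aligned SSSE3 */
  MEMCPY_SSSE3_BACK       /* aligned SSSE3, copying backward */
};

/* Fast_Unaligned_Copy, not Fast_Unaligned_Load, gates the unaligned
   variant, so a CPU can prefer unaligned loads for other string
   functions while memcpy still uses SSSE3.  */
static enum memcpy_impl
select_memcpy (unsigned int features)
{
  assert ((features & ~SKETCH_ALL_FLAGS) == 0);  /* no unknown bits */

  if (features & SKETCH_FAST_UNALIGNED_COPY)
    return MEMCPY_SSE2_UNALIGNED;
  if (features & SKETCH_SSSE3)
    return (features & SKETCH_FAST_COPY_BACKWARD)
           ? MEMCPY_SSSE3_BACK : MEMCPY_SSSE3;
  return MEMCPY_SSE2;
}
```

With this order, a CPU that sets Fast_Unaligned_Copy gets the unaligned variant whatever its Fast_Copy_Backward bit says. A CPU with only SSSE3 and Fast_Copy_Backward gets the backward SSSE3 variant.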
Since only Intel processors with AVX2 have fast unaligned load, we
should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors.
Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces
and call get_common_indeces for other processors.
Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading
GLRO(dl_x86_cpu_features) in cpu-features.c.
[BZ #19583]
* sysdeps/x86/cpu-features.c (get_common_indeces): Remove
inline. Check family before setting family, model and
extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable
bits here.
(init_cpu_features): Replace HAS_CPU_FEATURE and
HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and
CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load
for Intel processors with usable AVX2. Call get_common_indeces
for other processors with family == NULL.
* sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
(CPU_FEATURES_ARCH_P): Likewise.
(HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
(HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
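The split between vendor-neutral detection and the Intel-only hint can be modelled roughly as below. This is a hypothetical sketch, not glibc's get_common_indeces: the vendor enum and ARCH_* bits are illustrative. The "usable" rule reflects the general x86 requirement that AVX2 needs both CPUID support and OS-saved YMM state (checked with XGETBV).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor tags and arch bits for this sketch.  */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

#define ARCH_AVX2_USABLE              (1u << 0)
#define ARCH_AVX_FAST_UNALIGNED_LOAD  (1u << 1)

/* Common part, run for every vendor: AVX2 is usable only when CPUID
   reports it and the OS saves the YMM state.  */
static unsigned int
common_arch_bits (bool cpuid_avx2, bool os_saves_ymm)
{
  return (cpuid_avx2 && os_saves_ymm) ? ARCH_AVX2_USABLE : 0;
}

/* Vendor-specific part: only Intel gets AVX_Fast_Unaligned_Load.  */
static unsigned int
init_arch_bits (enum cpu_vendor vendor, bool cpuid_avx2, bool os_saves_ymm)
{
  unsigned int arch = common_arch_bits (cpuid_avx2, os_saves_ymm);

  if (vendor == VENDOR_INTEL && (arch & ARCH_AVX2_USABLE))
    arch |= ARCH_AVX_FAST_UNALIGNED_LOAD;

  /* The fast-load hint must never appear without usable AVX2.  */
  assert (!(arch & ARCH_AVX_FAST_UNALIGNED_LOAD)
          || (arch & ARCH_AVX2_USABLE));
  return arch;
}
```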
index_* and bit_* macros are used to access cpuid and feature arrays of
struct cpu_features. It is very easy to mistakenly use bits and indices
of the cpuid array on the feature array, especially in assembly code.
For example,
sysdeps/i386/i686/multiarch/bcopy.S has
HAS_CPU_FEATURE (Fast_Rep_String)
which should be
HAS_ARCH_FEATURE (Fast_Rep_String)
We change index_* and bit_* to index_cpu_*/index_arch_* and
bit_cpu_*/bit_arch_* so that we can catch such errors at build time.
[BZ #19762]
* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
(EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
(bit_arch_*): This for feature array.
(bit_*): Renamed to ...
(bit_cpu_*): This for cpu array.
(index_*): Renamed to ...
(index_arch_*): This for feature array.
(index_*): Renamed to ...
(index_cpu_*): This for cpu array.
[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
[__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE.
[__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE.
[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
bit_##name with index_cpu_##name and bit_cpu_##name.
[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
bit_##name with index_arch_##name and bit_arch_##name.
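Why the prefixed names catch mistakes at build time can be shown with a small token-pasting sketch. The struct layout, the SKETCH_* macro names and the Fast_Rep_String bit value are hypothetical simplifications; only the SSE2 bit (CPUID leaf 1, EDX bit 26) is a real CPUID position.

```c
#include <assert.h>

/* Hypothetical, flattened stand-in for struct cpu_features: one word
   of CPUID output and one word of derived "arch" feature bits.  */
struct cpu_features_sketch
{
  unsigned int cpuid[1];
  unsigned int feature[1];
};

/* CPUID-derived bit: SSE2 is CPUID leaf 1, EDX bit 26.  */
#define index_cpu_SSE2              0
#define bit_cpu_SSE2                (1u << 26)

/* Derived preference bit; the value here is illustrative.  */
#define index_arch_Fast_Rep_String  0
#define bit_arch_Fast_Rep_String    (1u << 0)

/* The prefix is pasted into the name, so each macro only sees the
   identifiers belonging to its own array.  */
#define SKETCH_CPU_P(ptr, name) \
  (((ptr)->cpuid[index_cpu_##name] & (bit_cpu_##name)) != 0)
#define SKETCH_ARCH_P(ptr, name) \
  (((ptr)->feature[index_arch_##name] & (bit_arch_##name)) != 0)

/* SKETCH_CPU_P (p, Fast_Rep_String) does not compile: it expands to
   index_cpu_Fast_Rep_String, which is never defined.  With shared
   index_* and bit_* names, the same mistake would silently read the
   wrong bit.  */
```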
GLIBC benchtest test cases show that SSE2_Unaligned-based implementations
perform faster than SSE2-based implementations for the routines strcmp,
strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr. The
index_Fast_Unaligned_Load flag is set for Excavator (family 0x15) CPUs,
which makes the SSE2_Unaligned-based implementations the default for
these routines.
[BZ #19467]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
index_Fast_Unaligned_Load flag for Excavator family CPUs.
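The Excavator check might look like the sketch below. The model range 0x60 through 0x7f is an assumption for illustration (the entry above says only "Excavator family"), and ARCH_FAST_UNALIGNED_LOAD is a hypothetical bit value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical arch bit for this sketch.  */
#define ARCH_FAST_UNALIGNED_LOAD (1u << 0)

/* Assumption: Excavator parts report family 0x15 with models 0x60
   through 0x7f once extended family and model are folded in.  */
static bool
amd_is_excavator (unsigned int family, unsigned int model)
{
  return family == 0x15 && model >= 0x60 && model <= 0x7f;
}

static unsigned int
amd_arch_bits (unsigned int family, unsigned int model, unsigned int arch)
{
  assert (model <= 0xff);  /* the combined model fits in 8 bits */

  /* Prefer the SSE2_Unaligned string functions on Excavator.  */
  if (amd_is_excavator (family, model))
    arch |= ARCH_FAST_UNALIGNED_LOAD;
  return arch;
}
```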
The new AVX-512 memset without VZEROUPPER shows an improvement of up to
28% over AVX2 memset (performance results
attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>).
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
index_Prefer_No_VZEROUPPER): New.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set the
Prefer_No_VZEROUPPER for Knights Landing.
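The new IFUNC branch can be pictured roughly as the selector below. This is a hypothetical simplification: the SKETCH_* bits and enum are illustrative, other selection criteria are omitted, and the sketch assumes AVX512F_Usable implies AVX2_Usable.

```c
#include <assert.h>

/* Hypothetical feature bits for this sketch.  */
#define SKETCH_AVX2_USABLE          (1u << 0)
#define SKETCH_AVX512F_USABLE       (1u << 1)
#define SKETCH_PREFER_NO_VZEROUPPER (1u << 2)

enum memset_impl
{
  MEMSET_SSE2,
  MEMSET_AVX2,
  MEMSET_AVX512_NO_VZEROUPPER
};

static enum memset_impl
select_memset (unsigned int features)
{
  /* Sketch assumption: AVX512F usable implies AVX2 usable.  */
  assert (!(features & SKETCH_AVX512F_USABLE)
          || (features & SKETCH_AVX2_USABLE));

  if (!(features & SKETCH_AVX2_USABLE))
    return MEMSET_SSE2;
  /* Only CPUs flagged Prefer_No_VZEROUPPER (Knights Landing in the
     entry above) take the AVX-512 variant.  */
  if ((features & SKETCH_AVX512F_USABLE)
      && (features & SKETCH_PREFER_NO_VZEROUPPER))
    return MEMSET_AVX512_NO_VZEROUPPER;
  return MEMSET_AVX2;
}
```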
The Knights Landing processor is based on Silvermont. This patch enables
Silvermont optimizations for Knights Landing.
* sysdeps/x86/cpu-features.c (init_cpu_features): Enable
Silvermont optimizations for Knights Landing.
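One natural way to express "treat Knights Landing like Silvermont" is to let its model number share the Silvermont case labels, as sketched below. The model numbers (0x57 for Knights Landing; 0x37, 0x4a, 0x4d, 0x5a, 0x5d for Silvermont) are assumptions for illustration, and use_silvermont_tuning is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true when Silvermont tuning should apply to an Intel CPU with
   the given family and (extended) model number.  */
static bool
use_silvermont_tuning (unsigned int family, unsigned int model)
{
  assert (model <= 0xff);

  if (family != 0x06)
    return false;

  switch (model)
    {
    case 0x57:  /* Knights Landing (assumed): shares Silvermont's path.  */
    case 0x37:
    case 0x4a:
    case 0x4d:
    case 0x5a:
    case 0x5d:  /* Silvermont models (assumed list).  */
      return true;
    default:
      return false;
    }
}
```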
AMD CPUs use a similar encoding scheme for extended family and model
to Intel CPUs, as shown in:
http://support.amd.com/TechDocs/25481.pdf
This patch updates get_common_indeces to get family and model for both
Intel and AMD CPUs when family == 0x0f.
[BZ #19214]
* sysdeps/x86/cpu-features.c (get_common_indeces): Add an
argument to return extended model. Update family and model
with extended family and model when family == 0x0f.
(init_cpu_features): Updated.
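The family/model decoding described above can be sketched as follows, using the standard CPUID leaf 1 EAX layout. The struct and function names are hypothetical; the rule (fold in extended family and model only when the base family is 0x0f) follows the entry above.

```c
#include <assert.h>

struct cpu_signature
{
  unsigned int family;
  unsigned int model;
  unsigned int extended_model;  /* already shifted into bits 7:4 */
};

/* Decode CPUID leaf 1 EAX: stepping in bits 3:0, model in 7:4,
   family in 11:8, extended model in 19:16, extended family in 27:20.  */
static struct cpu_signature
decode_signature (unsigned int eax)
{
  struct cpu_signature s;

  s.family = (eax >> 8) & 0x0f;
  s.model = (eax >> 4) & 0x0f;
  s.extended_model = (eax >> 12) & 0xf0;

  /* Same rule for Intel and AMD: when the base family is 0x0f, add
     the extended family and the extended model.  */
  if (s.family == 0x0f)
    {
      s.family += (eax >> 20) & 0xff;
      s.model += s.extended_model;
    }

  assert (s.model <= 0xff);
  return s;
}
```

Returning extended_model lets the caller also fold it in for other families, for example Intel family 0x06 parts, whose model numbers exceed 0x0f.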
We detect i586 and i686 features at run-time by checking CX8 and CMOV
CPUID feature bits. We can use this information to select the best
implementation in ix86 multiarch. HAS_I586/HAS_I686 is true if i586/i686
instructions are available on the processor.
Due to the reordering and the other nifty extensions in i686, it is not
a good idea to run heavily i586-optimized code on an i686. It's better
to use i486 code if it isn't an i586. USE_I586/USE_I686 is true if
i586/i686 implementation should be used for the processor. USE_I586
is true only if i686 instructions aren't available. If i686 instructions
are available, we always choose i686 or i486 implementation, in that order,
and we never choose i586 implementation for i686-class processors.
* sysdeps/i386/init-arch.h: New file.
* sysdeps/i386/i586/init-arch.h: Likewise.
* sysdeps/i386/i686/init-arch.h: Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_I586
bit if CX8 is available. Set bit_I686 bit if CMOV is available.
* sysdeps/x86/cpu-features.h (bit_I586): New.
(bit_I686): Likewise.
(bit_CX8): Likewise.
(bit_CMOV): Likewise.
(index_CX8): Likewise.
(index_CMOV): Likewise.
(index_I586): Likewise.
(index_I686): Likewise.
(reg_CX8): Likewise.
(reg_CMOV): Likewise.
(HAS_I586): Defined as HAS_ARCH_FEATURE (I586) if i586 isn't
available at compile-time.
(HAS_I686): Defined as HAS_ARCH_FEATURE (I686) if i686 isn't
available at compile-time.
* sysdeps/x86/init-arch.h (USE_I586): New macro.
(USE_I686): Likewise.
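The USE_I586/USE_I686 policy above can be modelled as a small chooser. The names are hypothetical, and the have_i686_variant parameter is an addition for illustration: it models a function that has no i686 version, in which case an i686-class CPU falls back to i486 code rather than i586 code.

```c
#include <assert.h>
#include <stdbool.h>

enum x86_32_impl { IMPL_I486, IMPL_I586, IMPL_I686 };

/* cx8 stands in for HAS_I586, cmov for HAS_I686.  */
static enum x86_32_impl
choose_impl (bool cx8, bool cmov, bool have_i686_variant)
{
  bool has_i586 = cx8;
  bool has_i686 = cmov;
  bool use_i586 = has_i586 && !has_i686;  /* USE_I586 */
  bool use_i686 = has_i686;               /* USE_I686 */

  assert (!(use_i586 && use_i686));       /* at most one is true */

  if (use_i686 && have_i686_variant)
    return IMPL_I686;
  if (use_i586)
    return IMPL_I586;
  /* i686-class CPUs without an i686 variant get i486 code, never i586.  */
  return IMPL_I486;
}
```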
cpuid, i586 and i686 instructions are available if the processor
specified by -march= supports them. We can use this information
to determine whether those instructions can be used safely.
* sysdeps/x86/cpu-features.c (init_cpu_features): Check
whether cpuid is available only if HAS_CPUID is 0.
* sysdeps/x86/cpu-features.h (HAS_CPUID): New.
(HAS_I586): Likewise.
(HAS_I686): Likewise.
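A compile-time HAS_CPUID along these lines might look like the sketch below. It is deliberately simplified: __x86_64__ is predefined on 64-bit targets, and GCC predefines __i586__ or __i686__ for some -march values such as -march=i586 and -march=i686. Other -march values define CPU-specific macros instead, which a complete check would also list. SKETCH_HAS_CPUID and cpuid_usable are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined __x86_64__ || defined __i586__ || defined __i686__
# define SKETCH_HAS_CPUID 1
#else
# define SKETCH_HAS_CPUID 0
#endif

/* runtime_probe stands in for a real check such as one built on
   __get_cpuid_max from GCC's <cpuid.h>; it is consulted only when the
   -march target does not already guarantee cpuid.  */
static bool
cpuid_usable (bool (*runtime_probe) (void))
{
  if (SKETCH_HAS_CPUID)
    return true;              /* guaranteed at compile time; skip probe */
  assert (runtime_probe != NULL);
  return runtime_probe ();
}
```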
Since not all i486 processors support cpuid, we call __get_cpuid_max to
check whether cpuid is available before using it when not compiling for
i586, i686 or x86-64.
* sysdeps/x86/cpu-features.c (init_cpu_features): Call
__get_cpuid_max if not compiling for i586, i686 or x86-64.
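A caller-side guard built on GCC's __get_cpuid_max (from <cpuid.h>) could look like this. The helper names are hypothetical. On 32-bit x86, GCC's implementation first tries to toggle the EFLAGS.ID bit and returns 0 when it cannot, which is how a CPU without cpuid is detected.

```c
#include <assert.h>
#include <stdbool.h>

#if defined __i386__ || defined __x86_64__
# include <cpuid.h>
#endif

/* Highest basic CPUID leaf, or 0 when cpuid cannot be used.  */
static unsigned int
max_basic_cpuid_leaf (void)
{
#if defined __i386__ || defined __x86_64__
  unsigned int sig = 0;
  /* Leaf 0 query; returns 0 if the EFLAGS.ID probe fails on i386.  */
  unsigned int max = __get_cpuid_max (0, &sig);
# ifdef __x86_64__
  assert (max != 0);  /* every x86-64 CPU implements cpuid */
# endif
  return max;
#else
  return 0;           /* not an x86 target */
#endif
}

static bool
cpuid_available (void)
{
  return max_basic_cpuid_leaf () != 0;
}
```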