glibc/sysdeps/x86_64/multiarch
Ling Ma 5c74e47cd6 Add x86_64 memset optimized for AVX2
This patch takes advantage of Haswell (HSW) memory bandwidth, reduces
branch mispredictions by avoiding branch instructions, and forces the
destination to be aligned when storing with AVX and AVX2 instructions.
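
The idea described above can be sketched in C with intrinsics. This is an illustration only, not the contents of memset-avx2.S: the names memset_sketch, memset_avx2_sketch and memset_bytes are hypothetical, the size threshold of 32 and the byte-loop fallback are simplifications, and the runtime check merely stands in for glibc's IFUNC selection. It assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback, also used for short lengths (a simplification). */
static void *memset_bytes(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)c;
    return dst;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Hypothetical AVX2 path; requires n >= 32. */
__attribute__((target("avx2")))
static void *memset_avx2_sketch(void *dst, int c, size_t n)
{
    assert(n >= 32);
    unsigned char *p = dst;
    unsigned char *end = p + n;
    __m256i v = _mm256_set1_epi8((char)c);

    /* One unaligned store for the first 32 bytes and one for the last
       32 bytes. They may overlap each other or the aligned stores
       below, which removes the need to branch on the exact sizes of
       the unaligned head and tail. */
    _mm256_storeu_si256((__m256i *)p, v);
    _mm256_storeu_si256((__m256i *)(end - 32), v);

    /* First 32-byte-aligned address strictly after p; it is at most
       p + 32, so the head store leaves no gap before it. */
    unsigned char *q =
        (unsigned char *)(((uintptr_t)p + 32) & ~(uintptr_t)31);

    /* Aligned stores cover the middle of the buffer. */
    while ((size_t)(end - q) >= 32) {
        _mm256_store_si256((__m256i *)q, v);
        q += 32;
    }
    return dst;
}
#endif

/* Runtime selection, standing in for the IFUNC resolver. */
void *memset_sketch(void *dst, int c, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (n >= 32 && __builtin_cpu_supports("avx2"))
        return memset_avx2_sketch(dst, c, n);
#endif
    return memset_bytes(dst, c, n);
}
```

The only data-dependent branch left in the AVX2 path is the loop over aligned 32-byte blocks; the overlapping head and tail stores handle whatever the loop does not cover.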

The CPU2006 403.gcc benchmark indicates this patch improves performance
by 26% to 59%.

	* sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
	* sysdeps/x86_64/multiarch/memset-avx2.S: New file.
	* sysdeps/x86_64/multiarch/memset.S: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
2014-06-19 15:14:08 -07:00
bcopy.S Use IFUNC memmove/memset in x86-64 bcopy/bzero 2012-10-11 13:58:16 -07:00
cacheinfo.c Use IFUNC on x86-64 memset 2010-11-08 03:41:34 -05:00
ifunc-defines.sym Detect if AVX2 is usable 2014-04-17 08:00:21 -07:00
ifunc-impl-list.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
init-arch.c Detect if AVX2 is usable 2014-04-17 08:00:21 -07:00
init-arch.h Fix -Wundef warning for FEATURE_INDEX_1. 2014-05-03 00:25:21 -04:00
Makefile Add x86_64 memset optimized for AVX2 2014-06-19 15:14:08 -07:00
memcmp-sse4.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcmp-ssse3.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcmp.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy_chk.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy-sse2-unaligned.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy-ssse3-back.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy-ssse3.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memmove_chk.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memmove-ssse3-back.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
memmove-ssse3.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
memmove.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
mempcpy_chk.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
mempcpy-ssse3-back.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
mempcpy-ssse3.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
mempcpy.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memset_chk.S Add x86_64 memset optimized for AVX2 2014-06-19 15:14:08 -07:00
memset-avx2.S Add x86_64 memset optimized for AVX2 2014-06-19 15:14:08 -07:00
memset.S Add x86_64 memset optimized for AVX2 2014-06-19 15:14:08 -07:00
rtld-memcmp.c x86-64 SSE4 optimized memcmp 2010-04-14 00:12:53 -07:00
rtld-memset.S Add x86_64 memset optimized for AVX2 2014-06-19 15:14:08 -07:00
rtld-strlen.S Make sure no code in ld.so uses xmm/ymm registers on x86-64. 2009-07-26 16:10:00 -07:00
sched_cpucount.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
stpcpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpcpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpcpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
stpncpy-c.c SSSE3 strcpy/stpcpy for x86-64 2009-07-02 03:39:03 -07:00
stpncpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpncpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcasecmp_l-ssse3.S Fix x86-64 build without multiarch. 2010-08-14 14:56:32 -07:00
strcasecmp_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcasestr.c Add strstr with unaligned loads. Fixes bug 12100. 2013-12-14 20:08:13 +01:00
strcat-sse2-unaligned.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcat-ssse3.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcat.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strchr-sse2-no-bsf.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strchr.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcmp-sse2-unaligned.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcmp-sse42.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcmp-ssse3.S Avoid compiling unneeded file in ld.so. 2010-07-27 21:12:59 -07:00
strcmp.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcpy-sse2-unaligned.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcpy-ssse3.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcpy.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcspn-c.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcspn.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strncase_l-ssse3.S Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
strncase_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncat-c.c Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat-sse2-unaligned.S Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat-ssse3.S Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncmp-ssse3.S Don't define x86-64 __strncmp_ssse3 in libc.a 2012-09-27 07:43:03 -07:00
strncmp.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncpy-c.c SSSE3 strcpy/stpcpy for x86-64 2009-07-02 03:39:03 -07:00
strncpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
strncpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
strncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strpbrk-c.c Don't define __strpbrk_sse42 in static library 2010-03-24 12:16:24 -07:00
strpbrk.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strspn-c.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strspn.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strstr-sse2-unaligned.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strstr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-multiarch.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
varshift.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
varshift.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
Versions Add support for x86-64 fma instruction. 2009-07-29 15:26:06 -07:00
wcscpy-c.c Optimized wcschr and wcscpy for x86-64 and x86-32 2011-12-17 14:39:23 -05:00
wcscpy-ssse3.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
wcscpy.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
wmemcmp-c.c Fix more warnings 2011-12-03 21:49:35 -05:00
wmemcmp-sse4.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp-ssse3.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp.S Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00