glibc/sysdeps/x86_64/multiarch
H.J. Lu 88b57b8ed4 Add x86-64 memmove with unaligned load/store and rep movsb
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes.  When
size <= 8 times the vector register size, there is no check for
address overlap between source and destination.  Since the overhead of
the overlap check is small when size > 8 times the vector register
size, memcpy is an alias of memmove.
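
To illustrate why the overlap check is cheap, here is a minimal C sketch.
The helper name regions_overlap is hypothetical and does not appear in
this commit; the real check is done in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: returns 1 when the n-byte
   regions starting at dst and src share at least one byte.  The test
   costs one subtraction and one comparison on the address distance.  */
static int
regions_overlap (const void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;
  uintptr_t distance = d > s ? d - s : s - d;
  return distance < n;
}
```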

A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb.  They share the same code when size is
between 2 times the vector register size and REP_MOVSB_THRESHOLD, which
is 2KB for the 16-byte vector register size and scaled up for larger
vector register sizes.
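
As a rough illustration of the threshold scaling, the C sketch below
assumes the scaling is linear in the vector register size.  That
assumption, and the function name rep_movsb_threshold, are not taken
from this message; the authoritative definition is in
memmove-vec-unaligned-erms.S.

```c
#include <stddef.h>

/* Hypothetical helper: 2KB for 16-byte vectors, assumed to scale
   linearly with the vector register size (an assumption, see above).  */
static size_t
rep_movsb_threshold (size_t vec_size)
{
  return 2048 * (vec_size / 16);
}
```

Under that assumption, 32-byte vectors would give 4KB and 64-byte
vectors 8KB.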

Key features:

1. Use overlapping load and store to avoid branch.
2. For size <= 8 times the vector register size, load all sources into
registers and store them together.
3. If there is no address overlap between source and destination, copy
from both ends with 4 times the vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
the vector register size at a time.
5. Otherwise, forward copy 8 times the vector register size at a time.
6. Use rep movsb only for forward copy.  Avoid slow backward rep movsb
by falling back to backward copy 8 times the vector register size at a
time.
7. Skip when address of destination == address of source.
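
Features 1 and 2 can be sketched in portable C, using 8-byte integers in
place of vector registers.  The helper copy_8_to_16 is hypothetical and
is not part of this commit; it only illustrates the idea that two
possibly overlapping loads cover any size in a range without branching
on the exact size, and that loading everything before storing makes the
copy safe for overlapping buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, where 8 <= n <= 16.  The head and
   tail loads overlap whenever n < 16, so no branch on n is needed.
   Both loads complete before either store, so the result is correct
   even when src and dst overlap.  */
static void
copy_8_to_16 (void *dst, const void *src, size_t n)
{
  uint64_t head, tail;
  memcpy (&head, src, 8);
  memcpy (&tail, (const unsigned char *) src + n - 8, 8);
  memcpy (dst, &head, 8);
  memcpy ((unsigned char *) dst + n - 8, &tail, 8);
}
```

The same pattern extends to larger sizes by using more registers, which
is what feature 2 describes for sizes up to 8 times the vector register
size.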

	[BZ #19776]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
	memmove-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test
	__memmove_chk_avx512_unaligned_2,
	__memmove_chk_avx512_unaligned_erms,
	__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
	__memmove_chk_sse2_unaligned_2,
	__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
	__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
	__memmove_avx512_unaligned_erms, __memmove_erms,
	__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
	__memcpy_chk_avx512_unaligned_2,
	__memcpy_chk_avx512_unaligned_erms,
	__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
	__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
	__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
	__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
	__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
	__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
	__mempcpy_chk_avx512_unaligned_erms,
	__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
	__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
	__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
	__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
	__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
	__mempcpy_erms.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	Likewise.
2016-03-31 10:04:40 -07:00
..
bcopy.S
ifunc-defines.sym Add _dl_x86_cpu_features to rtld_global 2015-08-13 03:41:22 -07:00
ifunc-impl-list.c Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
Makefile Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
memcmp-sse4.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memcmp-ssse3.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memcmp.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memcpy_chk.S Fixed typos in __memcpy_chk. 2016-01-16 14:42:26 +03:00
memcpy-avx-unaligned.S Implement x86-64 multiarch mempcpy in memcpy 2016-03-28 13:13:51 -07:00
memcpy-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memcpy-ssse3-back.S Implement x86-64 multiarch mempcpy in memcpy 2016-03-28 13:13:51 -07:00
memcpy-ssse3.S Implement x86-64 multiarch mempcpy in memcpy 2016-03-28 13:13:51 -07:00
memcpy.S [x86] Add a feature bit: Fast_Unaligned_Copy 2016-03-28 04:40:03 -07:00
memmove_chk.c Added memcpy/memmove family optimized with AVX512 for KNL hardware. 2016-01-16 00:49:45 +03:00
memmove-avx512-no-vzeroupper.S Make __memcpy_avx512_no_vzeroupper an alias 2016-03-28 13:16:22 -07:00
memmove-avx512-unaligned-erms.S Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
memmove-avx-unaligned-erms.S Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
memmove-avx-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memmove-sse2-unaligned-erms.S Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
memmove-ssse3-back.S
memmove-ssse3.S
memmove-vec-unaligned-erms.S Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00
memmove.c Added memcpy/memmove family optimized with AVX512 for KNL hardware. 2016-01-16 00:49:45 +03:00
mempcpy_chk.S Added memcpy/memmove family optimized with AVX512 for KNL hardware. 2016-01-16 00:49:45 +03:00
mempcpy.S Added memcpy/memmove family optimized with AVX512 for KNL hardware. 2016-01-16 00:49:45 +03:00
memset_chk.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memset-avx2.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
memset-avx512-no-vzeroupper.S Group AVX512 functions in .text.avx512 section 2016-03-06 16:48:11 -08:00
memset.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
sched_cpucount.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
stpcpy-sse2-unaligned.S
stpcpy-ssse3.S
stpcpy.S
stpncpy-c.c
stpncpy-sse2-unaligned.S
stpncpy-ssse3.S
stpncpy.S
strcasecmp_l-ssse3.S
strcasecmp_l.S
strcat-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcat-ssse3.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcat.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strchr-sse2-no-bsf.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strchr.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcmp-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcmp-sse42.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcmp-ssse3.S
strcmp.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcpy-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcpy.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcspn-c.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strcspn.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strncase_l-ssse3.S
strncase_l.S
strncat-c.c
strncat-sse2-unaligned.S
strncat-ssse3.S
strncat.S
strncmp-ssse3.S
strncmp.S
strncpy-c.c
strncpy-sse2-unaligned.S
strncpy-ssse3.S
strncpy.S
strpbrk-c.c
strpbrk.S
strspn-c.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strspn.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strstr-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
strstr.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
test-multiarch.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
varshift.c Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
varshift.h Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wcscpy-c.c
wcscpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wcscpy.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wmemcmp-c.c
wmemcmp-sse4.S
wmemcmp-ssse3.S
wmemcmp.S Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00