H.J. Lu
ef9c4cb6c7
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int. Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
SSE2 wmemset:
	shl    $0x2,%rdx
	movd   %esi,%xmm0
	mov    %rdi,%rax
	pshufd $0x0,%xmm0,%xmm0
	jmp    entry_from_wmemset

SSE2 memset:
	movd   %esi,%xmm0
	mov    %rdi,%rax
	punpcklbw %xmm0,%xmm0
	punpcklwd %xmm0,%xmm0
	pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:
Since the ERMS versions of wmemset require "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
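
For illustration only, the idea behind the wmemset stub can be sketched in
portable C: scale the element count to a byte count, replicate the wide
character across a 16-byte pattern, and then run the same byte-counted
fill that memset uses. The name sketch_wmemset and the memcpy-based tail
handling are assumptions made for this sketch; they are not the glibc
implementation, which uses vector stores in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not a glibc symbol: fill N wide characters at S
   with C, mirroring the steps of the SSE2 wmemset stub in plain C.  */
wchar_t *
sketch_wmemset (wchar_t *s, wchar_t c, size_t n)
{
  unsigned char pattern[16];
  unsigned char *dst = (unsigned char *) s;
  /* Element count to byte count; this is "shl $0x2,%rdx" when
     wchar_t is 4 bytes wide, as on x86-64 Linux.  */
  size_t bytes = n * sizeof (wchar_t);
  size_t i;

  assert (s != NULL || n == 0);

  /* Replicate C across the 16-byte pattern, the role played by
     "movd %esi,%xmm0" followed by "pshufd $0x0,%xmm0,%xmm0".  */
  for (i = 0; i < sizeof pattern; i += sizeof (wchar_t))
    memcpy (pattern + i, &c, sizeof (wchar_t));

  /* From here on the work is a byte-counted fill with a 16-byte
     pattern, which is the part shared with memset.  */
  for (i = 0; i + sizeof pattern <= bytes; i += sizeof pattern)
    memcpy (dst + i, pattern, sizeof pattern);
  if (i < bytes)
    memcpy (dst + i, pattern, bytes - i);

  /* Return the destination, as "mov %rdi,%rax" does.  */
  return s;
}
```

Calling sketch_wmemset (buf, L'x', 37) should leave buf identical to what
wmemset (buf, L'x', 37) produces; only the way the stores are issued
differs.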
* include/wchar.h (__wmemset_chk): New.
* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_CHK_SYMBOL): Likewise.
(WMEMSET_SYMBOL): Likewise.
(__wmemset): Add hidden definition.
(wmemset): Add weak hidden definition.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
wmemset_chk-nonshared.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
and __wmemset_chk_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
(WMEMSET_CHK_SYMBOL): New.
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Also define __GI_wmemset and
__GI___wmemset.
(weak_alias): New.
* sysdeps/x86_64/multiarch/wmemset.c: New file.
* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/wmemset.c: Likewise.
* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 11:09:59 -07:00