x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
On Mon, Jun 5, 2017 at 10:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 2, 2017 at 12:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, May 31, 2017 at 4:29 AM, Rodriguez Bahena, Victor
>> <victor.rodriguez.bahena@intel.com> wrote:
>>> +1
>>>
>>> -----Original Message-----
>>> From: <libc-alpha-owner@sourceware.org> on behalf of "H.J. Lu"
>>> <hjl.tools@gmail.com>
>>> Date: Tuesday, May 30, 2017 at 6:41 PM
>>> To: GNU C Library <libc-alpha@sourceware.org>
>>> Subject: Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
>>>
>>>>On Sun, May 21, 2017 at 1:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> The only difference between memset and wmemset is the fill element:
>>>>> byte vs. int.  Add stubs to the SSE2/AVX2/AVX512 memset implementations
>>>>> for wmemset with an updated fill constant and a byte-scaled size:
>>>>>
>>>>> SSE2 wmemset:
>>>>> shl $0x2,%rdx
>>>>> movd %esi,%xmm0
>>>>> mov %rdi,%rax
>>>>> pshufd $0x0,%xmm0,%xmm0
>>>>> jmp entry_from_wmemset
>>>>>
>>>>> SSE2 memset:
>>>>> movd %esi,%xmm0
>>>>> mov %rdi,%rax
>>>>> punpcklbw %xmm0,%xmm0
>>>>> punpcklwd %xmm0,%xmm0
>>>>> pshufd $0x0,%xmm0,%xmm0
>>>>> entry_from_wmemset:
>>>>>
>>>>> Since the ERMS versions of wmemset require "rep stosl" instead of
>>>>> "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
>>>>> are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
>>>>> is about 6X faster on Haswell.
>>>>>
>>>>> OK for master?
>>>>
>>>>Any objections?
>>>>
>>>>> H.J.
>>>>> ---
>>>>> * include/wchar.h (__wmemset_chk): New.
>>>>> * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
>>>>> to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
>>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>> (WMEMSET_CHK_SYMBOL): Likewise.
>>>>> (WMEMSET_SYMBOL): Likewise.
>>>>> (__wmemset): Add hidden definition.
>>>>> (wmemset): Add weak hidden definition.
>>>>> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>>>> (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
>>>>> __wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
>>>>> __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
>>>>> and __wmemset_chk_avx512_unaligned.
>>>>> * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
>>>>> (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>> (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>> (WMEMSET_SYMBOL): Likewise.
>>>>> * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
>>>>> (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>> (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>> (WMEMSET_SYMBOL): Likewise.
>>>>> * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
>>>>> (WMEMSET_CHK_SYMBOL): New.
>>>>> (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
>>>>> (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
>>>>> * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
>>>>> (libc_hidden_builtin_def): Also define __GI_wmemset and
>>>>> __GI___wmemset.
>>>>> (weak_alias): New.
>>>>> * sysdeps/x86_64/multiarch/wmemset.S: New file.
>>>>> * sysdeps/x86_64/multiarch/wmemset_chk.S: Likewise.
>>>>> * sysdeps/x86_64/wmemset.S: Likewise.
>>>>> * sysdeps/x86_64/wmemset_chk.S: Likewise.
>>
>> Here is the updated patch to implement IFUNC wmemset in C.
>>
>>
>
> I will check it in today.
>
I am checking in this patch to rename wmemset.h to ifunc-wmemset.h.
From 5e1122827a9f9751922b25c332f8f0272de60719 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Wed, 7 Jun 2017 14:09:59 -0700
Subject: [PATCH] x86-64: Rename wmemset.h to ifunc-wmemset.h
No code changes.
* sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h
instead of wmemset.h.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/multiarch/wmemset.h: Renamed to ...
* sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.
---
ChangeLog | 10 ++++++++++
sysdeps/x86_64/multiarch/ifunc-impl-list.c | 4 ++--
sysdeps/x86_64/multiarch/{wmemset.h => ifunc-wmemset.h} | 0
sysdeps/x86_64/multiarch/wmemset.c | 2 +-
sysdeps/x86_64/multiarch/wmemset_chk.c | 2 +-
5 files changed, 14 insertions(+), 4 deletions(-)
rename sysdeps/x86_64/multiarch/{wmemset.h => ifunc-wmemset.h} (100%)
@@ -1,3 +1,13 @@
+2017-06-07 H.J. Lu <hongjiu.lu@intel.com>
+
+ * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update
+ comments.
+ * sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h
+ instead of wmemset.h.
+ * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
+ * sysdeps/x86_64/multiarch/wmemset.h: Renamed to ...
+ * sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.
+
2017-06-07 Gabriel F. T. Gomes <gftg@linux.vnet.ibm.com>
* stdio-common/printf-parsemb.c (__parse_one_specmb): Initialize
@@ -315,7 +315,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
__wmemcmp_ssse3)
IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_sse2))
- /* Support sysdeps/x86_64/multiarch/wmemset.S. */
+ /* Support sysdeps/x86_64/multiarch/wmemset.c. */
IFUNC_IMPL (i, name, wmemset,
IFUNC_IMPL_ADD (array, i, wmemset, 1,
__wmemset_sse2_unaligned)
@@ -444,7 +444,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
__strncmp_ssse3)
IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_sse2))
- /* Support sysdeps/x86_64/multiarch/wmemset_chk.S. */
+ /* Support sysdeps/x86_64/multiarch/wmemset_chk.c. */
IFUNC_IMPL (i, name, __wmemset_chk,
IFUNC_IMPL_ADD (array, i, __wmemset_chk, 1,
__wmemset_chk_sse2_unaligned)
similarity index 100%
rename from sysdeps/x86_64/multiarch/wmemset.h
rename to sysdeps/x86_64/multiarch/ifunc-wmemset.h
@@ -26,7 +26,7 @@
# undef __wmemset
# define SYMBOL_NAME wmemset
-# include "wmemset.h"
+# include "ifunc-wmemset.h"
libc_ifunc_redirected (__redirect_wmemset, __wmemset, IFUNC_SELECTOR ());
weak_alias (__wmemset, wmemset)
@@ -24,7 +24,7 @@
# undef __wmemset_chk
# define SYMBOL_NAME wmemset_chk
-# include "wmemset.h"
+# include "ifunc-wmemset.h"
libc_ifunc_redirected (__redirect_wmemset_chk, __wmemset_chk,
IFUNC_SELECTOR ());
--
2.9.4