x86-64: Add wmemset optimized with SSE2/AVX2/AVX512

Message ID CAMe9rOos2j_w+ONa77Q=JZv4M4wogXuBs_Vy79OWZ2_weyhr-A@mail.gmail.com

Commit Message

H.J. Lu June 7, 2017, 9:50 p.m. UTC
On Mon, Jun 5, 2017 at 10:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 2, 2017 at 12:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, May 31, 2017 at 4:29 AM, Rodriguez Bahena, Victor
>> <victor.rodriguez.bahena@intel.com> wrote:
>>> +1
>>>
>>> -----Original Message-----
>>> From: <libc-alpha-owner@sourceware.org> on behalf of "H.J. Lu"
>>> <hjl.tools@gmail.com>
>>> Date: Tuesday, May 30, 2017 at 6:41 PM
>>> To: GNU C Library <libc-alpha@sourceware.org>
>>> Subject: Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
>>>
>>>>On Sun, May 21, 2017 at 1:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> The difference between memset and wmemset is byte vs int.  Add stubs
>>>>> to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
>>>>>
>>>>> SSE2 wmemset:
>>>>>         shl    $0x2,%rdx
>>>>>         movd   %esi,%xmm0
>>>>>         mov    %rdi,%rax
>>>>>         pshufd $0x0,%xmm0,%xmm0
>>>>>         jmp     entry_from_wmemset
>>>>>
>>>>> SSE2 memset:
>>>>>         movd   %esi,%xmm0
>>>>>         mov    %rdi,%rax
>>>>>         punpcklbw %xmm0,%xmm0
>>>>>         punpcklwd %xmm0,%xmm0
>>>>>         pshufd $0x0,%xmm0,%xmm0
>>>>> entry_from_wmemset:
>>>>>
>>>>> Since the ERMS versions of wmemset would require "rep stosl" instead of
>>>>> "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
>>>>> are added.  The SSE2 wmemset is about 3X faster and the AVX2 wmemset
>>>>> is about 6X faster on Haswell.
>>>>>
>>>>> OK for master?
>>>>
>>>>Any objections?
>>>>
>>>>> H.J.
>>>>> ---
>>>>>         * include/wchar.h (__wmemset_chk): New.
>>>>>         * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
>>>>>         to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
>>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>>         (WMEMSET_CHK_SYMBOL): Likewise.
>>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>>         (__wmemset): Add hidden definition.
>>>>>         (wmemset): Add weak hidden definition.
>>>>>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>>>>         (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
>>>>>         __wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
>>>>>         __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
>>>>>         and __wmemset_chk_avx512_unaligned.
>>>>>         * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
>>>>>         (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>>         (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>>         * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
>>>>>         (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>>>         (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>>>         (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>>>         (WMEMSET_SYMBOL): Likewise.
>>>>>         * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
>>>>>         (WMEMSET_CHK_SYMBOL): New.
>>>>>         (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
>>>>>         (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
>>>>>         * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
>>>>>         (libc_hidden_builtin_def): Also define __GI_wmemset and
>>>>>         __GI___wmemset.
>>>>>         (weak_alias): New.
>>>>>         * sysdeps/x86_64/multiarch/wmemset.S: New file.
>>>>>         * sysdeps/x86_64/multiarch/wmemset_chk.S: Likewise.
>>>>>         * sysdeps/x86_64/wmemset.S: Likewise.
>>>>>         * sysdeps/x86_64/wmemset_chk.S: Likewise.
>>
>> Here is the updated patch to implement IFUNC wmemset in C.
>>
>
> I will check it in today.
>
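
The core idea in the quoted patch (wmemset broadcasts a 32-bit int instead of a byte, scales the element count to a byte count with `shl $0x2,%rdx`, and then jumps into the shared memset body) can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not glibc's code: `fill_bytes`, `sketch_memset` and `sketch_wmemset` are hypothetical names, it assumes x86-64 with a 4-byte `wchar_t` (as on GNU/Linux), and it uses a plain store loop instead of the hand-tuned store sequences in memset-vec-unaligned-erms.S.

```c
#include <emmintrin.h> /* SSE2: _mm_set1_epi8, _mm_set1_epi32, _mm_storeu_si128 */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Assumption: 4-byte wchar_t, which is what "shl $0x2,%rdx" relies on.  */
_Static_assert(sizeof(wchar_t) == 4, "sketch assumes a 4-byte wchar_t");

/* Hypothetical shared body: fill LEN bytes at DST with the 16-byte
   pattern in V using unaligned stores, like the common memset path.  */
static void fill_bytes(unsigned char *dst, __m128i v, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16)
        _mm_storeu_si128((__m128i *)(dst + i), v);
    if (i < len) {
        /* Tail: i is a multiple of 16, so the pattern is still in phase.  */
        unsigned char tmp[16];
        _mm_storeu_si128((__m128i *)tmp, v);
        memcpy(dst + i, tmp, len - i);
    }
}

/* memset entry: broadcast the low byte of C to all 16 lanes
   (movd + punpcklbw + punpcklwd + pshufd in the SSE2 assembly).  */
void *sketch_memset(void *s, int c, size_t n)
{
    fill_bytes(s, _mm_set1_epi8((char)c), n);
    return s;
}

/* wmemset entry: broadcast the 32-bit value (movd + pshufd $0x0),
   turn the element count into a byte count (shl $0x2, i.e. n * 4),
   then reuse the same byte-oriented body.  */
wchar_t *sketch_wmemset(wchar_t *s, wchar_t c, size_t n)
{
    fill_bytes((unsigned char *)s, _mm_set1_epi32((int)c), n * sizeof(wchar_t));
    return s;
}
```

Because the byte count for wmemset is always a multiple of 4 and every store starts at a multiple of 16 bytes from the buffer start, the 4-byte pattern never goes out of phase, which is why one byte-oriented body can serve both entry points.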

I am checking in this patch to rename wmemset.h to ifunc-wmemset.h.
  

Patch

From 5e1122827a9f9751922b25c332f8f0272de60719 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Wed, 7 Jun 2017 14:09:59 -0700
Subject: [PATCH] x86-64: Rename wmemset.h to ifunc-wmemset.h

No code changes.

	* sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h
	instead of wmemset.h.
	* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemset.h: Renamed to ...
	* sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.
---
 ChangeLog                                               | 10 ++++++++++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c              |  4 ++--
 sysdeps/x86_64/multiarch/{wmemset.h => ifunc-wmemset.h} |  0
 sysdeps/x86_64/multiarch/wmemset.c                      |  2 +-
 sysdeps/x86_64/multiarch/wmemset_chk.c                  |  2 +-
 5 files changed, 14 insertions(+), 4 deletions(-)
 rename sysdeps/x86_64/multiarch/{wmemset.h => ifunc-wmemset.h} (100%)

diff --git a/ChangeLog b/ChangeLog
index 88483ae..bbc21ab 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@ 
+2017-06-07  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update
+	comments.
+	* sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h
+	instead of wmemset.h.
+	* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
+	* sysdeps/x86_64/multiarch/wmemset.h: Renamed to ...
+	* sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.
+
 2017-06-07  Gabriel F. T. Gomes  <gftg@linux.vnet.ibm.com>
 
 	* stdio-common/printf-parsemb.c (__parse_one_specmb): Initialize
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index ee4243a..5627183 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -315,7 +315,7 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __wmemcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_sse2))
 
-  /* Support sysdeps/x86_64/multiarch/wmemset.S.  */
+  /* Support sysdeps/x86_64/multiarch/wmemset.c.  */
   IFUNC_IMPL (i, name, wmemset,
 	      IFUNC_IMPL_ADD (array, i, wmemset, 1,
 			      __wmemset_sse2_unaligned)
@@ -444,7 +444,7 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __strncmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_sse2))
 
-  /* Support sysdeps/x86_64/multiarch/wmemset_chk.S.  */
+  /* Support sysdeps/x86_64/multiarch/wmemset_chk.c.  */
   IFUNC_IMPL (i, name, __wmemset_chk,
 	      IFUNC_IMPL_ADD (array, i, __wmemset_chk, 1,
 			      __wmemset_chk_sse2_unaligned)
diff --git a/sysdeps/x86_64/multiarch/wmemset.h b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
similarity index 100%
rename from sysdeps/x86_64/multiarch/wmemset.h
rename to sysdeps/x86_64/multiarch/ifunc-wmemset.h
diff --git a/sysdeps/x86_64/multiarch/wmemset.c b/sysdeps/x86_64/multiarch/wmemset.c
index 61626a9..dd35be6 100644
--- a/sysdeps/x86_64/multiarch/wmemset.c
+++ b/sysdeps/x86_64/multiarch/wmemset.c
@@ -26,7 +26,7 @@ 
 # undef __wmemset
 
 # define SYMBOL_NAME wmemset
-# include "wmemset.h"
+# include "ifunc-wmemset.h"
 
 libc_ifunc_redirected (__redirect_wmemset, __wmemset, IFUNC_SELECTOR ());
 weak_alias (__wmemset, wmemset)
diff --git a/sysdeps/x86_64/multiarch/wmemset_chk.c b/sysdeps/x86_64/multiarch/wmemset_chk.c
index 2c039a5..d3ded55 100644
--- a/sysdeps/x86_64/multiarch/wmemset_chk.c
+++ b/sysdeps/x86_64/multiarch/wmemset_chk.c
@@ -24,7 +24,7 @@ 
 # undef __wmemset_chk
 
 # define SYMBOL_NAME wmemset_chk
-# include "wmemset.h"
+# include "ifunc-wmemset.h"
 
 libc_ifunc_redirected (__redirect_wmemset_chk, __wmemset_chk,
 		       IFUNC_SELECTOR ());
-- 
2.9.4
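
After this rename, wmemset.c and wmemset_chk.c still choose an implementation through `IFUNC_SELECTOR ()` from ifunc-wmemset.h and bind it with `libc_ifunc_redirected`. Those are glibc-internal macros, so the sketch below only approximates the idea outside glibc: a function pointer chosen once at startup with GCC/Clang's `__builtin_cpu_supports` and `target("avx2")` attribute, rather than a real IFUNC relocation. All function names are hypothetical, only the SSE2 and AVX2 tiers are modelled (no AVX-512, no `_chk` variant), and the real selector may check additional CPU properties.

```c
#include <immintrin.h> /* SSE2 and AVX2 intrinsics */
#include <stddef.h>
#include <wchar.h>

/* Assumption: 4-byte wchar_t, as on x86-64 GNU/Linux.  */
_Static_assert(sizeof(wchar_t) == 4, "sketch assumes a 4-byte wchar_t");

typedef wchar_t *(*wmemset_fn)(wchar_t *, wchar_t, size_t);

/* Baseline tier: SSE2 is always available on x86-64.  */
static wchar_t *wmemset_sse2_sketch(wchar_t *s, wchar_t c, size_t n)
{
    __m128i v = _mm_set1_epi32((int)c);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_si128((__m128i *)(s + i), v);
    for (; i < n; i++)
        s[i] = c;
    return s;
}

/* AVX2 tier: compiled for AVX2, only called if the CPU supports it.  */
__attribute__((target("avx2")))
static wchar_t *wmemset_avx2_sketch(wchar_t *s, wchar_t c, size_t n)
{
    __m256i v = _mm256_set1_epi32((int)c);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_si256((__m256i *)(s + i), v);
    for (; i < n; i++)
        s[i] = c;
    return s;
}

/* Plays the role of IFUNC_SELECTOR (): pick the best tier for this CPU.  */
static wmemset_fn select_wmemset_sketch(void)
{
    __builtin_cpu_init(); /* needed before __builtin_cpu_supports in a constructor */
    if (__builtin_cpu_supports("avx2"))
        return wmemset_avx2_sketch;
    return wmemset_sse2_sketch;
}

static wmemset_fn wmemset_impl;

/* Resolve once before main; a real IFUNC is instead resolved by the
   dynamic linker when the symbol is relocated.  */
__attribute__((constructor))
static void resolve_wmemset_sketch(void)
{
    wmemset_impl = select_wmemset_sketch();
}

/* Hypothetical public entry point.  */
wchar_t *my_wmemset(wchar_t *s, wchar_t c, size_t n)
{
    return wmemset_impl(s, c, n);
}
```

Resolving once keeps CPU detection off the hot path, which is the point of IFUNC; the remaining cost in this approximation is one indirect call per invocation.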