RFC: Rewrite x86-64 IFUNC selector in C

Message ID CAOVZoANe43WXzTKaOpyPUdgW0H3kxPTfw67afXg2OgPK36NDdQ@mail.gmail.com
State New, archived
Headers

Commit Message

Erich Elsen June 7, 2017, 8:39 p.m. UTC
  Here's a patch that also moves memset.S to memset.c.

On Tue, May 30, 2017 at 1:32 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 30/05/2017 17:13, H.J. Lu wrote:
>> On Tue, May 30, 2017 at 12:50 PM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 30/05/2017 13:24, H.J. Lu wrote:
>>>> On Mon, May 29, 2017 at 1:34 PM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>> I think we can simplify it further and use the already existent ifunc macros on
>>>>> libc-symbols.h.  Also, for memmove I think we can organize the code better (at
>>>>> least for ifunc) and build a extra object with a more meaningful name.  I used
>>>>> your logic for the ifunc selection and extended for memmove as well.
>>>>>
>>>> Here is the combined patch.
>>>
>>>
>>>> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
>>>> index 3736f54..8a6ff00 100644
>>>> --- a/sysdeps/x86_64/multiarch/Makefile
>>>> +++ b/sysdeps/x86_64/multiarch/Makefile
>>>> @@ -19,6 +19,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
>>>>                  strchr-sse2-no-bsf memcmp-ssse3 strstr-sse2-unaligned \
>>>>                  strcspn-c strpbrk-c strspn-c varshift \
>>>>                  memset-avx512-no-vzeroupper \
>>>> +                memmove-sse2-unaligned-erms \
>>>
>>> I still think this is a misleading name file since it build not only memmove but
>>> also memcpy and mempcpy implementations.  I would prefer to rename to a more
>>> general name as in my suggestion.
>>
>> I'd like to keep it consistent with memmove-avx-unaligned-erms.S and
>> memmove-avx512-unaligned-erms.S.   As for making memcpy an alias
>> of memmove, and embedding mempcpy, __memcpy_chk,
>> __mempcpy_chk, __memmove_chk is an implementation detail.
>>
>
> Alright, at least I think it should add some comments that the object
> is building multiple symbols (it wasn't clear when I started to work
> on the patch).
  

Comments

H.J. Lu June 7, 2017, 10:21 p.m. UTC | #1
On Wed, Jun 7, 2017 at 1:39 PM, Erich Elsen <eriche@google.com> wrote:
> Here's a patch that also moves memset.S to memset.c.

I integrated your patch into hjl/ifunc/c branch.  Thanks.

BTW, have you investigated using IFUNC internally in libc,so?

> On Tue, May 30, 2017 at 1:32 PM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 30/05/2017 17:13, H.J. Lu wrote:
>>> On Tue, May 30, 2017 at 12:50 PM, Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>> On 30/05/2017 13:24, H.J. Lu wrote:
>>>>> On Mon, May 29, 2017 at 1:34 PM, Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>> I think we can simplify it further and use the already existent ifunc macros on
>>>>>> libc-symbols.h.  Also, for memmove I think we can organize the code better (at
>>>>>> least for ifunc) and build a extra object with a more meaningful name.  I used
>>>>>> your logic for the ifunc selection and extended for memmove as well.
>>>>>>
>>>>> Here is the combined patch.
>>>>
>>>>
>>>>> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
>>>>> index 3736f54..8a6ff00 100644
>>>>> --- a/sysdeps/x86_64/multiarch/Makefile
>>>>> +++ b/sysdeps/x86_64/multiarch/Makefile
>>>>> @@ -19,6 +19,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
>>>>>                  strchr-sse2-no-bsf memcmp-ssse3 strstr-sse2-unaligned \
>>>>>                  strcspn-c strpbrk-c strspn-c varshift \
>>>>>                  memset-avx512-no-vzeroupper \
>>>>> +                memmove-sse2-unaligned-erms \
>>>>
>>>> I still think this is a misleading name file since it build not only memmove but
>>>> also memcpy and mempcpy implementations.  I would prefer to rename to a more
>>>> general name as in my suggestion.
>>>
>>> I'd like to keep it consistent with memmove-avx-unaligned-erms.S and
>>> memmove-avx512-unaligned-erms.S.   As for making memcpy an alias
>>> of memmove, and embedding mempcpy, __memcpy_chk,
>>> __mempcpy_chk, __memmove_chk is an implementation detail.
>>>
>>
>> Alright, at least I think it should add some comments that the object
>> is building multiple symbols (it wasn't clear when I started to work
>> on the patch).
  
H.J. Lu June 8, 2017, 3:42 a.m. UTC | #2
On Wed, Jun 7, 2017 at 3:21 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 1:39 PM, Erich Elsen <eriche@google.com> wrote:
>> Here's a patch that also moves memset.S to memset.c.
>
> I integrated your patch into hjl/ifunc/c branch.  Thanks.

The memset IFUNC selector can't share with memmove since
they have different algorithms and implementations.  I fixed
in on hjl/ifunc/c branch.

> BTW, have you investigated using IFUNC internally in libc,so?
>
>> On Tue, May 30, 2017 at 1:32 PM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 30/05/2017 17:13, H.J. Lu wrote:
>>>> On Tue, May 30, 2017 at 12:50 PM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>> On 30/05/2017 13:24, H.J. Lu wrote:
>>>>>> On Mon, May 29, 2017 at 1:34 PM, Adhemerval Zanella
>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>
>>>>>>> I think we can simplify it further and use the already existent ifunc macros on
>>>>>>> libc-symbols.h.  Also, for memmove I think we can organize the code better (at
>>>>>>> least for ifunc) and build a extra object with a more meaningful name.  I used
>>>>>>> your logic for the ifunc selection and extended for memmove as well.
>>>>>>>
>>>>>> Here is the combined patch.
>>>>>
>>>>>
>>>>>> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
>>>>>> index 3736f54..8a6ff00 100644
>>>>>> --- a/sysdeps/x86_64/multiarch/Makefile
>>>>>> +++ b/sysdeps/x86_64/multiarch/Makefile
>>>>>> @@ -19,6 +19,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
>>>>>>                  strchr-sse2-no-bsf memcmp-ssse3 strstr-sse2-unaligned \
>>>>>>                  strcspn-c strpbrk-c strspn-c varshift \
>>>>>>                  memset-avx512-no-vzeroupper \
>>>>>> +                memmove-sse2-unaligned-erms \
>>>>>
>>>>> I still think this is a misleading name file since it build not only memmove but
>>>>> also memcpy and mempcpy implementations.  I would prefer to rename to a more
>>>>> general name as in my suggestion.
>>>>
>>>> I'd like to keep it consistent with memmove-avx-unaligned-erms.S and
>>>> memmove-avx512-unaligned-erms.S.   As for making memcpy an alias
>>>> of memmove, and embedding mempcpy, __memcpy_chk,
>>>> __mempcpy_chk, __memmove_chk is an implementation detail.
>>>>
>>>
>>> Alright, at least I think it should add some comments that the object
>>> is building multiple symbols (it wasn't clear when I started to work
>>> on the patch).
>
>
>
> --
> H.J.
  

Patch

From 6655fc03ff8306b6c752113ee826de818639c953 Mon Sep 17 00:00:00 2001
From: Erich Elsen <eriche@google.com>
Date: Wed, 7 Jun 2017 13:37:11 -0700
Subject: [PATCH 1/1] move memset.S -> memset.c

---
 sysdeps/x86_64/multiarch/Makefile                  |  1 +
 sysdeps/x86_64/multiarch/ifunc-memmove.h           |  8 ++---
 sysdeps/x86_64/multiarch/memcpy.c                  |  2 +-
 sysdeps/x86_64/multiarch/memmove.c                 |  2 +-
 sysdeps/x86_64/multiarch/mempcpy.c                 |  2 +-
 .../{memset.S => memset-sse2-unaligned-erms.S}     | 37 +---------------------
 sysdeps/x86_64/multiarch/memset.c                  | 34 ++++++++++++++++++++
 7 files changed, 43 insertions(+), 43 deletions(-)
 rename sysdeps/x86_64/multiarch/{memset.S => memset-sse2-unaligned-erms.S} (64%)
 create mode 100644 sysdeps/x86_64/multiarch/memset.c

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 1fcde18c72..9d2583ad6f 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -23,6 +23,7 @@  sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
 		   memmove-sse2-unaligned-erms \
 		   memmove-avx-unaligned-erms \
 		   memmove-avx512-unaligned-erms \
+		   memset-sse2-unaligned-erms \
 		   memset-avx2-unaligned-erms \
 		   memset-avx512-unaligned-erms
 CFLAGS-varshift.c += -msse4
diff --git a/sysdeps/x86_64/multiarch/ifunc-memmove.h b/sysdeps/x86_64/multiarch/ifunc-memmove.h
index 3034e33920..22c370a3c5 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memmove.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memmove.h
@@ -37,7 +37,7 @@  extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_no_vzeroupper)
   attribute_hidden;
 
 static inline void *
-IFUNC_SELECTOR (void)
+IFUNC_SELECTOR (bool has_avx_impl, bool has_ssse3_impl)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
@@ -56,7 +56,7 @@  IFUNC_SELECTOR (void)
       return OPTIMIZE (avx512_unaligned);
     }
 
-  if (CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+  if (has_avx_impl && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
     {
       if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
 	return OPTIMIZE (avx_unaligned_erms);
@@ -64,8 +64,8 @@  IFUNC_SELECTOR (void)
       return OPTIMIZE (avx_unaligned);
     }
 
-  if (!CPU_FEATURES_CPU_P (cpu_features, SSSE3)
-      || CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
+  if (!has_ssse3_impl || (!CPU_FEATURES_CPU_P (cpu_features, SSSE3)
+      || CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy)))
     {
       if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
 	return OPTIMIZE (sse2_unaligned_erms);
diff --git a/sysdeps/x86_64/multiarch/memcpy.c b/sysdeps/x86_64/multiarch/memcpy.c
index 730168ac96..b610adcaaa 100644
--- a/sysdeps/x86_64/multiarch/memcpy.c
+++ b/sysdeps/x86_64/multiarch/memcpy.c
@@ -30,7 +30,7 @@ 
 
 extern __typeof (__redirect_memcpy) __new_memcpy;
 
-libc_ifunc (__new_memcpy, IFUNC_SELECTOR ());
+libc_ifunc (__new_memcpy, IFUNC_SELECTOR (true, true));
 
 # include <shlib-compat.h>
 versioned_symbol (libc, __new_memcpy, memcpy, GLIBC_2_14);
diff --git a/sysdeps/x86_64/multiarch/memmove.c b/sysdeps/x86_64/multiarch/memmove.c
index e2fa151d3c..74a71be185 100644
--- a/sysdeps/x86_64/multiarch/memmove.c
+++ b/sysdeps/x86_64/multiarch/memmove.c
@@ -29,6 +29,6 @@ 
 
 extern __typeof (__redirect_memmove) __libc_memmove;
 
-libc_ifunc (__libc_memmove, IFUNC_SELECTOR ());
+libc_ifunc (__libc_memmove, IFUNC_SELECTOR (true, true));
 strong_alias (__libc_memmove, memmove);
 #endif
diff --git a/sysdeps/x86_64/multiarch/mempcpy.c b/sysdeps/x86_64/multiarch/mempcpy.c
index ac3fec5613..cbc722ac20 100644
--- a/sysdeps/x86_64/multiarch/mempcpy.c
+++ b/sysdeps/x86_64/multiarch/mempcpy.c
@@ -30,6 +30,6 @@ 
 # define SYMBOL_NAME mempcpy
 # include "ifunc-memmove.h"
 
-libc_ifunc_redirected (__redirect_mempcpy, __mempcpy, IFUNC_SELECTOR ());
+libc_ifunc_redirected (__redirect_mempcpy, __mempcpy, IFUNC_SELECTOR (true, true));
 weak_alias (__mempcpy, mempcpy)
 #endif
diff --git a/sysdeps/x86_64/multiarch/memset.S b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
similarity index 64%
rename from sysdeps/x86_64/multiarch/memset.S
rename to sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
index 11f27378b0..c69cd74895 100644
--- a/sysdeps/x86_64/multiarch/memset.S
+++ b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
@@ -1,4 +1,4 @@ 
-/* Multiple versions of memset
+/* memset with SSE2.
    All versions must be listed in ifunc-impl-list.c.
    Copyright (C) 2014-2017 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
@@ -21,41 +21,6 @@ 
 #include <shlib-compat.h>
 #include <init-arch.h>
 
-/* Define multiple versions only for the definition in lib.  */
-#if IS_IN (libc)
-ENTRY(memset)
-	.type	memset, @gnu_indirect_function
-	LOAD_RTLD_GLOBAL_RO_RDX
-	lea	__memset_erms(%rip), %RAX_LP
-	HAS_ARCH_FEATURE (Prefer_ERMS)
-	jnz	2f
-	lea	__memset_sse2_unaligned_erms(%rip), %RAX_LP
-	HAS_CPU_FEATURE (ERMS)
-	jnz	1f
-	lea	__memset_sse2_unaligned(%rip), %RAX_LP
-1:
-	HAS_ARCH_FEATURE (AVX2_Usable)
-	jz	2f
-	lea	__memset_avx2_unaligned_erms(%rip), %RAX_LP
-	HAS_CPU_FEATURE (ERMS)
-	jnz	L(AVX512F)
-	lea	__memset_avx2_unaligned(%rip), %RAX_LP
-L(AVX512F):
-	HAS_ARCH_FEATURE (Prefer_No_AVX512)
-	jnz	2f
-	HAS_ARCH_FEATURE (AVX512F_Usable)
-	jz	2f
-	lea	__memset_avx512_no_vzeroupper(%rip), %RAX_LP
-	HAS_ARCH_FEATURE (Prefer_No_VZEROUPPER)
-	jnz	2f
-	lea	__memset_avx512_unaligned_erms(%rip), %RAX_LP
-	HAS_CPU_FEATURE (ERMS)
-	jnz	2f
-	lea	__memset_avx512_unaligned(%rip), %RAX_LP
-2:	ret
-END(memset)
-#endif
-
 #if IS_IN (libc)
 # define MEMSET_SYMBOL(p,s)	p##_sse2_##s
 # define WMEMSET_SYMBOL(p,s)	p##_sse2_##s
diff --git a/sysdeps/x86_64/multiarch/memset.c b/sysdeps/x86_64/multiarch/memset.c
new file mode 100644
index 0000000000..5a04192837
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memset.c
@@ -0,0 +1,34 @@ 
+/* Multiple versions of memset.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in lib and for
+   DSO.  In static binaries we need memset before the initialization
+   happened.  */
+#if IS_IN (libc)
+# define memset __redirect_memset
+# include <string.h>
+# undef memset
+
+# define SYMBOL_NAME memset
+# include "ifunc-memmove.h"
+
+extern __typeof (__redirect_memset) memset;
+
+libc_ifunc (memset, IFUNC_SELECTOR (false, false));
+#endif
-- 
2.13.0.506.g27d5fe0cd-goog