[x86_64] Update memcpy, mempcpy and memmove selection order for Excavator CPU BZ #19583

Message ID CAMe9rOqpbUF2m40pxjyr+O8pSrA9EmUgNsOcBPuP-wDaMqn+RQ@mail.gmail.com
State New, archived

Commit Message

H.J. Lu March 18, 2016, 1:51 p.m. UTC
  On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index 8882590..a5afaf4 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>         ret
>  #endif
>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +       jnz     3f
>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>         jnz     2f
>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>         jnz     2f
>         lea     __memcpy_ssse3(%rip), %RAX_LP
>  2:     ret
> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
> +       ret
>  END(__new_memcpy)
>
>  # undef ENTRY
>
> Will update all IFUNCs if this is OK; otherwise please suggest.
>

Better, but not OK.  Try something like

diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
index ab5998c..2abe2fd 100644
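
(The corresponding hunk is shown in full under "Patch" at the end of this page.)  For reference, the selection order that the suggested change produces can be sketched in C.  This is only a sketch, not the committed code: the real selector is the assembly IFUNC routine, and HAS_ARCH_FEATURE, HAS_CPU_FEATURE and the __memcpy_* variants are the glibc-internal names that already appear in the diffs above.

  /* Resulting selection order (sketch).  With
     Avoid_AVX_Fast_Unaligned_Load set, the AVX variant is skipped and
     selection falls through to the usual SSE2-unaligned/SSSE3 logic
     instead of being forced to one fixed variant.  */
  if (HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
      && !HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load))
    return __memcpy_avx_unaligned;
  if (HAS_ARCH_FEATURE (Fast_Unaligned_Load))
    return __memcpy_sse2_unaligned;
  if (HAS_CPU_FEATURE (SSSE3))
    return __memcpy_ssse3;
  return __memcpy_sse2;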
  

Comments

Adhemerval Zanella March 18, 2016, 1:55 p.m. UTC | #1
On 18-03-2016 10:51, H.J. Lu wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>> No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>         jnz     2f
>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>  2:     ret
>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>> +       ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNCs if this is OK; otherwise please suggest.
>>
> 
> Better, but not OK.  Try something like
> 
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>    ret
>  #endif
>  1:   lea   __memcpy_avx_unaligned(%rip), %RAX_LP
> +  HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +  jnz   3f
>    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>    jnz   2f
> -  lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:   lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
>    HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>    jnz   2f
>    lea   __memcpy_sse2(%rip), %RAX_LP
> 
> 

I know this is not related to this patch, but is there any reason not to code
the resolver using the libc_ifunc macros?
  
H.J. Lu March 18, 2016, 2:42 p.m. UTC | #2
On Fri, Mar 18, 2016 at 6:55 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 18-03-2016 10:51, H.J. Lu wrote:
>> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>>> No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>>>
>>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>>> index 8882590..a5afaf4 100644
>>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>>         ret
>>>  #endif
>>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>>> +       jnz     3f
>>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>>         jnz     2f
>>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>>         jnz     2f
>>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>>  2:     ret
>>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>>> +       ret
>>>  END(__new_memcpy)
>>>
>>>  # undef ENTRY
>>>
>>> Will update all IFUNCs if this is OK; otherwise please suggest.
>>>
>>
>> Better, but not OK.  Try something like
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index ab5998c..2abe2fd 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>>    ret
>>  #endif
>>  1:   lea   __memcpy_avx_unaligned(%rip), %RAX_LP
>> +  HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +  jnz   3f
>>    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>    jnz   2f
>> -  lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
>> +3:   lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
>>    HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>>    jnz   2f
>>    lea   __memcpy_sse2(%rip), %RAX_LP
>>
>>
>
> I know this is not related to this patch, but is there any reason not to code
> the resolver using the libc_ifunc macros?

Did you mean writing them in C?  It can be done.  Someone
needs to write patches.
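
For illustration, a C resolver built on the libc_ifunc macro might look roughly like the sketch below.  It assumes the same glibc-internal feature macros and variant symbols that the assembly selectors use, and it omits the symbol-versioning handling around __new_memcpy; it is a sketch of the idea, not a committed implementation.

  extern __typeof (memcpy) __memcpy_avx_unaligned attribute_hidden;
  extern __typeof (memcpy) __memcpy_sse2_unaligned attribute_hidden;
  extern __typeof (memcpy) __memcpy_ssse3 attribute_hidden;
  extern __typeof (memcpy) __memcpy_sse2 attribute_hidden;

  /* Resolve memcpy once, at IFUNC resolution time, using the same
     feature checks as the assembly version.  */
  libc_ifunc (memcpy,
              (HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
               && !HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load))
              ? __memcpy_avx_unaligned
              : (HAS_ARCH_FEATURE (Fast_Unaligned_Load)
                 ? __memcpy_sse2_unaligned
                 : (HAS_CPU_FEATURE (SSSE3)
                    ? __memcpy_ssse3
                    : __memcpy_sse2)));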
  
H.J. Lu March 18, 2016, 2:45 p.m. UTC | #3
On Fri, Mar 18, 2016 at 6:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>>No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>         jnz     2f
>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>  2:     ret
>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>> +       ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNCs if this is OK; otherwise please suggest.
>>
>
> Better, but not OK.  Try something like
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>    ret
>  #endif
>  1:   lea   __memcpy_avx_unaligned(%rip), %RAX_LP
> +  HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +  jnz   3f
>    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>    jnz   2f
> -  lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:   lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
>    HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>    jnz   2f
>    lea   __memcpy_sse2(%rip), %RAX_LP
>

One question: if you don't want __memcpy_avx_unaligned,
why do you set AVX_Fast_Unaligned_Load?
  
Pawar, Amit March 18, 2016, 3:19 p.m. UTC | #4
>One question: if you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?

Do you know whether any other string or memory functions are currently being implemented based on this feature? If not, then let me just verify it.

Also, this feature is enabled in the generic code, so to disable it for these CPUs the setting would need to be changed after that point.

--Amit Pawar
  
H.J. Lu March 18, 2016, 3:24 p.m. UTC | #5
On Fri, Mar 18, 2016 at 8:19 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>One question: if you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
> Do you know whether any other string or memory functions are currently being implemented based on this feature? If not, then let me just verify it.
>
> Also, this feature is enabled in the generic code, so to disable it for these CPUs the setting would need to be changed after that point.
>

It was done based on the assumption that an AVX-enabled machine has
fast AVX unaligned loads.  If that isn't true for AMD CPUs, we can enable it
only for Intel AVX CPUs, and you can then set it for AMD CPUs as appropriate.
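
Concretely, that would mean setting AVX_Fast_Unaligned_Load on the Intel path of the CPU feature initialization instead of in the common code, and letting the AMD path mark the affected Excavator parts.  A rough sketch of the shape such a change could take; the vendor branches, the cpu_is_amd_excavator helper and the exact index_/bit_ macro spellings are illustrative assumptions, not the actual cpu-features.c code:

  if (kind == arch_kind_intel)
    {
      /* Intel AVX CPUs handle unaligned 256-bit loads well.  */
      if (CPU_FEATURES_CPU_P (cpu_features, AVX))
        cpu_features->feature[index_AVX_Fast_Unaligned_Load]
          |= bit_AVX_Fast_Unaligned_Load;
    }
  else if (kind == arch_kind_amd)
    {
      /* Hypothetical check for the Excavator models affected by
         BZ #19583: leave AVX_Fast_Unaligned_Load clear and set the
         avoidance bit so the selectors skip __memcpy_avx_unaligned.  */
      if (cpu_is_amd_excavator (family, model))
        cpu_features->feature[index_Avoid_AVX_Fast_Unaligned_Load]
          |= bit_Avoid_AVX_Fast_Unaligned_Load;
    }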
  

Patch

--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -42,9 +42,11 @@  ENTRY(__new_memcpy)
   ret
 #endif
 1:   lea   __memcpy_avx_unaligned(%rip), %RAX_LP
+  HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
+  jnz   3f
   HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
   jnz   2f
-  lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
+3:   lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
   HAS_ARCH_FEATURE (Fast_Unaligned_Load)
   jnz   2f
   lea   __memcpy_sse2(%rip), %RAX_LP