Group AVX512 functions in .text.avx512 section

Message ID 1457289968-8965-1-git-send-email-hjl.tools@gmail.com
State New, archived

Commit Message

H.J. Lu March 6, 2016, 6:46 p.m. UTC
  * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
	Replace .text with .text.avx512.
	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
	Likewise.
---
 sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S | 2 +-
 sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer March 7, 2016, 3 p.m. UTC | #1
On 03/06/2016 07:46 PM, H.J. Lu wrote:
> 	* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
> 	Replace .text with .text.avx512.
> 	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
> 	Likewise.

What's the rationale for this change?

Thanks,
Florian
  
H.J. Lu March 7, 2016, 3:54 p.m. UTC | #2
On Mon, Mar 7, 2016 at 7:00 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 03/06/2016 07:46 PM, H.J. Lu wrote:
>>       * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
>>       Replace .text with .text.avx512.
>>       * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
>>       Likewise.
>
> What's the rationale for this change?
>

All multiarch functions are grouped in .text.ISA sections so that
the most likely selected implementations are next to each other
in memory.  It will improve cache performance.
  
Florian Weimer March 7, 2016, 3:57 p.m. UTC | #3
On 03/07/2016 04:54 PM, H.J. Lu wrote:
> On Mon, Mar 7, 2016 at 7:00 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 03/06/2016 07:46 PM, H.J. Lu wrote:
>>>       * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
>>>       Replace .text with .text.avx512.
>>>       * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
>>>       Likewise.
>>
>> What's the rationale for this change?

> All multiarch functions are grouped in .text.ISA sections so that
> the most likely selected implementations are next to each other
> in memory.  It will improve cache performance.

Makes sense (except the benefit is more avoiding page faults because
these functions are quite large).

Thanks,
Florian
  

Patch

diff --git a/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
index 1bb12e8..3d567fc 100644
--- a/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
+++ b/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
@@ -29,7 +29,7 @@ 
 # define MEMCPY_CHK	__memcpy_chk_avx512_no_vzeroupper
 #endif
 
-	.section .text,"ax",@progbits
+	.section .text.avx512,"ax",@progbits
 #if !defined USE_AS_BCOPY
 ENTRY (MEMCPY_CHK)
 	cmpq	%rdx, %rcx
diff --git a/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
index 1e638d7..eab8c5a 100644
--- a/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
+++ b/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
@@ -26,7 +26,7 @@ 
 # define MEMSET_CHK __memset_chk_avx512_no_vzeroupper
 #endif
 
-	.section .text,"ax",@progbits
+	.section .text.avx512,"ax",@progbits
 #if defined PIC
 ENTRY (MEMSET_CHK)
 	cmpq	%rdx, %rcx