Patchwork: Group AVX512 functions in .text.avx512 section

Submitter H.J. Lu
Date March 6, 2016, 6:46 p.m.
Message ID <1457289968-8965-1-git-send-email-hjl.tools@gmail.com>
Permalink /patch/11224/
State New

Comments

H.J. Lu - March 6, 2016, 6:46 p.m.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
	Replace .text with .text.avx512.
	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
	Likewise.
---
 sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S | 2 +-
 sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Florian Weimer - March 7, 2016, 3 p.m.
On 03/06/2016 07:46 PM, H.J. Lu wrote:
> 	* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
> 	Replace .text with .text.avx512.
> 	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
> 	Likewise.

What's the rationale for this change?

Thanks,
Florian
H.J. Lu - March 7, 2016, 3:54 p.m.
On Mon, Mar 7, 2016 at 7:00 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 03/06/2016 07:46 PM, H.J. Lu wrote:
>>       * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
>>       Replace .text with .text.avx512.
>>       * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
>>       Likewise.
>
> What's the rationale for this change?
>

All multiarch functions are grouped in .text.ISA sections so that
the implementations most likely to be selected are next to each other
in memory.  This improves cache performance.
Florian Weimer - March 7, 2016, 3:57 p.m.
On 03/07/2016 04:54 PM, H.J. Lu wrote:
> On Mon, Mar 7, 2016 at 7:00 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 03/06/2016 07:46 PM, H.J. Lu wrote:
>>>       * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
>>>       Replace .text with .text.avx512.
>>>       * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
>>>       Likewise.
>>
>> What's the rationale for this change?

> All multiarch functions are grouped in .text.ISA sections so that
> the implementations most likely to be selected are next to each other
> in memory.  This improves cache performance.

Makes sense (although the benefit is more about avoiding page faults,
because these functions are quite large).

Thanks,
Florian

Patch

diff --git a/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
index 1bb12e8..3d567fc 100644
--- a/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
+++ b/sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
@@ -29,7 +29,7 @@ 
 # define MEMCPY_CHK	__memcpy_chk_avx512_no_vzeroupper
 #endif
 
-	.section .text,"ax",@progbits
+	.section .text.avx512,"ax",@progbits
 #if !defined USE_AS_BCOPY
 ENTRY (MEMCPY_CHK)
 	cmpq	%rdx, %rcx
diff --git a/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
index 1e638d7..eab8c5a 100644
--- a/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
+++ b/sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S
@@ -26,7 +26,7 @@ 
 # define MEMSET_CHK __memset_chk_avx512_no_vzeroupper
 #endif
 
-	.section .text,"ax",@progbits
+	.section .text.avx512,"ax",@progbits
 #if defined PIC
 ENTRY (MEMSET_CHK)
 	cmpq	%rdx, %rcx