From patchwork Fri Feb 19 22:09:55 2016
X-Patchwork-Submitter: "Lu, Hongjiu"
X-Patchwork-Id: 10944
Date: Fri, 19 Feb 2016 14:09:55 -0800
From: "H.J. Lu"
To: GNU C Library
Subject: [PATCH] [BZ #19679] [x86_64] Set DL_RUNIME_UNALIGNED_VEC_SIZE to 8
Message-ID: <20160219220955.GA25540@intel.com>

Due to GCC bug:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

__tls_get_addr may be called with 8-byte stack alignment.  Although
this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume that
the stack will always be aligned to 16 bytes.  Since SSE optimized
memory/string functions with aligned SSE register load and store are
used in the dynamic linker, we must set DL_RUNIME_UNALIGNED_VEC_SIZE
to 8 so that _dl_runtime_resolve_sse will align the stack before
calling _dl_fixup:

Dump of assembler code for function _dl_runtime_resolve_sse:
   0x00007ffff7deea90 <+0>:	push   %rbx
   0x00007ffff7deea91 <+1>:	mov    %rsp,%rbx
   0x00007ffff7deea94 <+4>:	and    $0xfffffffffffffff0,%rsp
					^^^^^^^^^^^ Align stack to 16 bytes
   0x00007ffff7deea98 <+8>:	sub    $0x100,%rsp
   0x00007ffff7deea9f <+15>:	mov    %rax,0xc0(%rsp)
   0x00007ffff7deeaa7 <+23>:	mov    %rcx,0xc8(%rsp)
   0x00007ffff7deeaaf <+31>:	mov    %rdx,0xd0(%rsp)
   0x00007ffff7deeab7 <+39>:	mov    %rsi,0xd8(%rsp)
   0x00007ffff7deeabf <+47>:	mov    %rdi,0xe0(%rsp)
   0x00007ffff7deeac7 <+55>:	mov    %r8,0xe8(%rsp)
   0x00007ffff7deeacf <+63>:	mov    %r9,0xf0(%rsp)
   0x00007ffff7deead7 <+71>:	movaps %xmm0,(%rsp)
   0x00007ffff7deeadb <+75>:	movaps %xmm1,0x10(%rsp)
   0x00007ffff7deeae0 <+80>:	movaps %xmm2,0x20(%rsp)
   0x00007ffff7deeae5 <+85>:	movaps %xmm3,0x30(%rsp)
   0x00007ffff7deeaea <+90>:	movaps %xmm4,0x40(%rsp)
   0x00007ffff7deeaef <+95>:	movaps %xmm5,0x50(%rsp)
   0x00007ffff7deeaf4 <+100>:	movaps %xmm6,0x60(%rsp)
   0x00007ffff7deeaf9 <+105>:	movaps %xmm7,0x70(%rsp)

Tested on x86-64.  Any comments?

H.J.
---
	[BZ #19679]
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNIME_UNALIGNED_VEC_SIZE):
	Set to 8.
---
 sysdeps/x86_64/dl-trampoline.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index 9fb6b13..0a480b6 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -34,8 +34,11 @@
 #endif
 
 #ifndef DL_RUNIME_UNALIGNED_VEC_SIZE
-/* The maximum size of unaligned vector load and store.  */
-# define DL_RUNIME_UNALIGNED_VEC_SIZE 16
+/* The maximum size of unaligned vector load and store in the dynamic
+   linker.  Since SSE optimized memory/string functions with aligned
+   SSE register load and store are used in the dynamic linker, we must
+   set this to 8.  */
+# define DL_RUNIME_UNALIGNED_VEC_SIZE 8
 #endif
 
 /* True if _dl_runtime_resolve should align stack to VEC_SIZE bytes.  */
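
For readers unfamiliar with the idiom, the realignment sequence in the
dump above can be written out as a small standalone example.  This is a
minimal sketch only, not the glibc code: the label name, frame size and
choice of %rbx as the scratch register are illustrative.

	.text
	.globl	realign_and_spill
	.type	realign_and_spill, @function
realign_and_spill:
	push	%rbx			# preserve a callee-saved register
	mov	%rsp, %rbx		# remember the incoming (possibly only
					# 8-byte aligned) stack pointer
	and	$-16, %rsp		# force 16-byte alignment so aligned
					# SSE accesses cannot fault
	sub	$0x100, %rsp		# reserve a register-save area
	movaps	%xmm0, (%rsp)		# movaps needs a 16-byte aligned address
	movaps	%xmm1, 0x10(%rsp)
	# ... call C code that may use aligned SSE memory/string functions ...
	movaps	0x10(%rsp), %xmm1	# restore the spilled registers
	movaps	(%rsp), %xmm0
	mov	%rbx, %rsp		# undo the realignment
	pop	%rbx
	ret
	.size	realign_and_spill, .-realign_and_spill

Because %rbx holds the original %rsp, the epilogue restores the caller's
exact stack pointer no matter how far the "and" rounded it down.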