From patchwork Mon Feb 1 11:36:54 2016
X-Patchwork-Submitter: Torvald Riegel
X-Patchwork-Id: 10681
Message-ID: <1454326614.4592.293.camel@localhost.localdomain>
Subject: Re: [PATCH][BZ #19490] Add unwind descriptors to pthread_spin_init, etc. on i386
From: Torvald Riegel
To: Paul Pluzhnikov
Cc: GLIBC Devel
Date: Mon, 01 Feb 2016 12:36:54 +0100
References: <1453727190.4592.91.camel@localhost.localdomain>

On Sun, 2016-01-31 at 15:09 -0800, Paul Pluzhnikov wrote:
> On Mon, Jan 25, 2016 at 5:06 AM, Torvald Riegel wrote:
>
> > For the spinlocks, I'd really prefer if we could just remove the asm
> > files.  The generic implementation should basically produce the same
> > code; if not, we should try to fix that instead of keeping the asm
> > files.
>
> Using gcc-4.8.4 (4.8.4-2ubuntu1~14.04):
>
> $ objdump -d nptl/pthread_spin_unlock.o
>
> nptl/pthread_spin_unlock.o:     file format elf32-i386
>
>
> Disassembly of section .text:
>
> 00000000 <pthread_spin_unlock>:
>    0:	f0 83 0c 24 00       	lock orl $0x0,(%esp)
>    5:	8b 44 24 04          	mov    0x4(%esp),%eax
>    9:	c7 00 00 00 00 00    	movl   $0x0,(%eax)
>    f:	31 c0                	xor    %eax,%eax
>   11:	c3                   	ret
>
> This isn't quite the same as sysdeps/i386/nptl/pthread_spin_unlock.S

This is because nptl/pthread_spin_unlock.c still issues a full barrier
(that is the "lock orl $0x0,(%esp)" at offset 0).  If this is changed to
an atomic_store_release, one gets this on x86_64:

0000000000000000 <pthread_spin_unlock>:
   0:	c7 07 00 00 00 00    	movl   $0x0,(%rdi)
   6:	31 c0                	xor    %eax,%eax
   8:	c3                   	retq

Perhaps now is a good time to finally get this done.  Most archs already
use just acquire/release semantics for spinlocks, IIRC.  I think aarch64
and arm are the only major ones that happen to use the current generic
unlock with the full barrier -- but their lock operation uses only
acquire MO, so there's really no consistent pattern that would justify
keeping the barrier.

Note that there's an ongoing debate about whether POSIX requires
pthread_spin_unlock to be a full barrier, whether it should or should not
be one, and whether that makes any difference for all "sane" programs.
But given that we never implemented full barriers on almost all of the
major archs and nobody complained about it, I think we should continue to
not slow down spinlocks just to make weird use cases work (and the ones
that are indeed correct under POSIX are pretty complex pieces of code).
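Spelled out in C, the whole generic unlock would then be just a release
store.  A minimal sketch, written against glibc's internal <atomic.h>
macros (the pthread_spin_unlock.c hunk in the patch at the end of this
mail is the actual change):

/* Sketch only.  The current generic code does
     atomic_full_barrier ();
     *lock = 0;
   which is what produces the "lock orl" above on i386.  With a release
   store instead, the function collapses to a plain store on x86.  */
#include <atomic.h>
#include "pthreadP.h"

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  atomic_store_release (lock, 0);
  return 0;
}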
> For pthread_spin_lock it's much worse:
>
> $ objdump -d nptl/pthread_spin_lock.o
>
> nptl/pthread_spin_lock.o:     file format elf32-i386
>
>
> Disassembly of section .text:
>
> 00000000 <pthread_spin_lock>:
>    0:	57                   	push   %edi
>    1:	b8 01 00 00 00       	mov    $0x1,%eax
>    6:	56                   	push   %esi
>    7:	53                   	push   %ebx
>    8:	83 ec 10             	sub    $0x10,%esp
>    b:	8b 5c 24 20          	mov    0x20(%esp),%ebx
>    f:	87 03                	xchg   %eax,(%ebx)
>   11:	89 44 24 0c          	mov    %eax,0xc(%esp)
>   15:	8b 44 24 0c          	mov    0xc(%esp),%eax
>   19:	31 ff                	xor    %edi,%edi
>   1b:	be 01 00 00 00       	mov    $0x1,%esi
>   20:	85 c0                	test   %eax,%eax
>   22:	74 29                	je     4d
>   24:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
>   28:	8b 03                	mov    (%ebx),%eax
>   2a:	85 c0                	test   %eax,%eax
>   2c:	74 15                	je     43
>   2e:	ba e8 03 00 00       	mov    $0x3e8,%edx
>   33:	eb 08                	jmp    3d
>   35:	8d 76 00             	lea    0x0(%esi),%esi
>   38:	83 ea 01             	sub    $0x1,%edx
>   3b:	74 06                	je     43
>   3d:	8b 0b                	mov    (%ebx),%ecx
>   3f:	85 c9                	test   %ecx,%ecx
>   41:	75 f5                	jne    38
>   43:	89 f8                	mov    %edi,%eax
>   45:	f0 0f b1 33          	lock cmpxchg %esi,(%ebx)
>   49:	85 c0                	test   %eax,%eax
>   4b:	75 db                	jne    28
>   4d:	83 c4 10             	add    $0x10,%esp
>   50:	31 c0                	xor    %eax,%eax
>   52:	5b                   	pop    %ebx
>   53:	5e                   	pop    %esi
>   54:	5f                   	pop    %edi
>   55:	c3                   	ret

I wouldn't say it's worse.  It's mostly just different; only the
uncontended path may be a little worse.

In the generic version, we added spinning.  This isn't really well-tuned
yet, but it's something we want to do eventually.  If we assume the
uncontended case, the initial xchg should be fast; maybe we need to add a
__glibc_likely here or such to make GCC do what we expect, and outlining
the contended path (i.e., the spinning and the cmpxchg) could also help
work around GCC code-generation deficiencies.

However, on x86_64 I get the following (adding __glibc_likely to the
atomic_exchange_acq only moves the return up):

0000000000000000 <pthread_spin_lock>:
   0:	b8 01 00 00 00       	mov    $0x1,%eax
   5:	87 07                	xchg   %eax,(%rdi)
   7:	89 44 24 fc          	mov    %eax,-0x4(%rsp)
   b:	8b 44 24 fc          	mov    -0x4(%rsp),%eax
   f:	85 c0                	test   %eax,%eax
  11:	75 03                	jne    16
  13:	31 c0                	xor    %eax,%eax
  15:	c3                   	retq
  16:	45 31 c0             	xor    %r8d,%r8d
  19:	be 01 00 00 00       	mov    $0x1,%esi
  1e:	8b 17                	mov    (%rdi),%edx
  20:	85 d2                	test   %edx,%edx
  22:	74 17                	je     3b
  24:	ba e8 03 00 00       	mov    $0x3e8,%edx
  29:	eb 0a                	jmp    35
  2b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  30:	83 ea 01             	sub    $0x1,%edx
  33:	74 06                	je     3b
  35:	8b 0f                	mov    (%rdi),%ecx
  37:	85 c9                	test   %ecx,%ecx
  39:	75 f5                	jne    30
  3b:	44 89 c0             	mov    %r8d,%eax
  3e:	f0 0f b1 37          	lock cmpxchg %esi,(%rdi)
  42:	85 c0                	test   %eax,%eax
  44:	75 d8                	jne    1e
  46:	eb cb                	jmp    13

The fast path of this doesn't look bad to me (except for the store and
reload of the xchg result at 7: and b:, for which I don't see a reason).
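For reference, the generic algorithm both disassemblies come from is
roughly the following -- a sketch of nptl/pthread_spin_lock.c from
memory, not the verbatim source, with the __glibc_likely hint from the
patch below already applied.  SPIN_LOCK_READS_BETWEEN_CMPXCHG is 1000 in
the generic code, which is the 0x3e8 constant visible in both loops:

#include <atomic.h>
#include "pthreadP.h"

/* Number of plain reads to spin on between CAS attempts; normally
   provided by a machine-specific file.  */
#ifndef SPIN_LOCK_READS_BETWEEN_CMPXCHG
# define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
#endif

int
pthread_spin_lock (pthread_spinlock_t *lock)
{
  /* Fast path: grab the lock with a single acquire exchange (the xchg
     at the top of the disassembly).  */
  if (__glibc_likely (atomic_exchange_acq (lock, 1) == 0))
    return 0;

  do
    {
      /* Contended: spin on plain reads for a while instead of hammering
         the cache line with atomic read-modify-writes (the inner loop
         counting down from 0x3e8).  */
      int wait = SPIN_LOCK_READS_BETWEEN_CMPXCHG;
      while (*lock != 0 && wait > 0)
        --wait;
    }
  /* Then retry with an acquire CAS (the lock cmpxchg).  */
  while (atomic_compare_and_exchange_val_acq (lock, 1, 0) != 0);

  return 0;
}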
See attached untested patch for what I played with.

commit f9a5437b0c0150bac4c5afd769dd6eba09fed1de
Author: Torvald Riegel
Date:   Mon Feb 1 12:35:50 2016 +0100

    generic spinlock cleanup and x86_64 customization removal.

diff --git a/nptl/pthread_spin_lock.c b/nptl/pthread_spin_lock.c
index fb9bcc1..2209341 100644
--- a/nptl/pthread_spin_lock.c
+++ b/nptl/pthread_spin_lock.c
@@ -38,7 +38,7 @@ pthread_spin_lock (pthread_spinlock_t *lock)
      We assume that the first try mostly will be successful, and we use
      atomic_exchange.  For the subsequent tries we use
      atomic_compare_and_exchange.  */
-  if (atomic_exchange_acq (lock, 1) == 0)
+  if (__glibc_likely (atomic_exchange_acq (lock, 1) == 0))
     return 0;
 
   do
diff --git a/nptl/pthread_spin_unlock.c b/nptl/pthread_spin_unlock.c
index d4b63ac..ca534a0 100644
--- a/nptl/pthread_spin_unlock.c
+++ b/nptl/pthread_spin_unlock.c
@@ -23,7 +23,6 @@
 int
 pthread_spin_unlock (pthread_spinlock_t *lock)
 {
-  atomic_full_barrier ();
-  *lock = 0;
+  atomic_store_release (lock, 0);
   return 0;
 }
diff --git a/sysdeps/x86_64/nptl/pthread_spin_init.c b/sysdeps/x86_64/nptl/pthread_spin_init.c
deleted file mode 100644
index f249c6f..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_init.c
+++ /dev/null
@@ -1 +0,0 @@
-#include
diff --git a/sysdeps/x86_64/nptl/pthread_spin_lock.S b/sysdeps/x86_64/nptl/pthread_spin_lock.S
deleted file mode 100644
index b871241..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_lock.S
+++ /dev/null
@@ -1,34 +0,0 @@
-/* Copyright (C) 2012-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include
-#include
-
-ENTRY(pthread_spin_lock)
-1:	LOCK
-	decl	0(%rdi)
-	jne	2f
-	xor	%eax, %eax
-	ret
-
-	.align	16
-2:	rep
-	nop
-	cmpl	$0, 0(%rdi)
-	jg	1b
-	jmp	2b
-END(pthread_spin_lock)
diff --git a/sysdeps/x86_64/nptl/pthread_spin_trylock.S b/sysdeps/x86_64/nptl/pthread_spin_trylock.S
deleted file mode 100644
index c9c5317..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_trylock.S
+++ /dev/null
@@ -1,37 +0,0 @@
-/* Copyright (C) 2002-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper , 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include
-#include
-
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-ENTRY(pthread_spin_trylock)
-	movl	$1, %eax
-	xorl	%ecx, %ecx
-	LOCK
-	cmpxchgl %ecx, (%rdi)
-	movl	$EBUSY, %eax
-	cmovel	%ecx, %eax
-	retq
-END(pthread_spin_trylock)
diff --git a/sysdeps/x86_64/nptl/pthread_spin_unlock.S b/sysdeps/x86_64/nptl/pthread_spin_unlock.S
deleted file mode 100644
index 188de2e..0000000
--- a/sysdeps/x86_64/nptl/pthread_spin_unlock.S
+++ /dev/null
@@ -1,29 +0,0 @@
-/* Copyright (C) 2002-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper , 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include
-
-ENTRY(pthread_spin_unlock)
-	movl	$1, (%rdi)
-	xorl	%eax, %eax
-	retq
-END(pthread_spin_unlock)
-
-	/* The implementation of pthread_spin_init is identical.  */
-	.globl	pthread_spin_init
-pthread_spin_init = pthread_spin_unlock
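For completeness: with this patch, pthread_spin_trylock also comes from
the generic C file on x86_64.  If I remember the source correctly, it is
just the following (a sketch, not verbatim) -- note that it uses an
acquire exchange where the deleted assembly used cmpxchg, so the
generated code will differ slightly there as well:

#include <errno.h>
#include <atomic.h>
#include "pthreadP.h"

int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  /* Try to grab the lock with one acquire exchange; if it was already
     held, report EBUSY without spinning.  */
  return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
}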