Message ID | 535A078F.3050003@huawei.com |
---|---|
State | Not applicable |
Headers | Message-ID: <535A078F.3050003@huawei.com> Date: Fri, 25 Apr 2014 14:58:23 +0800 From: Yang Yingliang <yangyingliang@huawei.com> To: <libc-help@sourceware.org> CC: <libc-alpha@sourceware.org>, <yangyingliang@huawei.com> Subject: shared data protection failed in pthread_cond_timedwait List-Id: <libc-alpha.sourceware.org> |
Commit Message
Yang Yingliang
April 25, 2014, 6:58 a.m. UTC
Hi,

I have 22 threads waiting in pthread_cond_timedwait. When they are all woken up, I found that more than one thread can access the shared data in pthread_cond_timedwait at the same time.

I added print messages as in the patch below, around the __nwaiters decrement in nptl/pthread_cond_timedwait.c.

I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux.

Here is the result:

    start do sub :45, lock:1 0xb6d9a460
    end do sub :43, lock:1 0xb6d9a460
    start do sub :43, lock:1 0xb6d9e460
    end do sub :41, lock:2 0xb6d9e460
    start do sub :43, lock:2 0xb6dbe460 //two threads both access the shared data
    start do sub :41, lock:1 0xb6daa460
    end do sub :39, lock:2 0xb6daa460
    start do sub :39, lock:2 0xb6de6460
    end do sub :37, lock:2 0xb6de6460
    start do sub :37, lock:2 0xb6db6460
    end do sub :35, lock:2 0xb6db6460
    start do sub :35, lock:2 0xb6dc2460
    end do sub :33, lock:2 0xb6dc2460
    end do sub :37, lock:2 0xb6dbe460
    start do sub :33, lock:2 0xb6dc6460
    end do sub :31, lock:0 0xb6dc6460
    start do sub :31, lock:2 0xb6dae460
    end do sub :29, lock:2 0xb6dae460
    start do sub :29, lock:2 0xb6db2460
    end do sub :27, lock:2 0xb6db2460
    start do sub :27, lock:2 0xb6dba460
    end do sub :25, lock:2 0xb6dba460
    start do sub :25, lock:2 0xb6da2460
    end do sub :23, lock:2 0xb6da2460

Did lll_lock (cond->__data.__lock, pshared) fail?

pshared is LLL_SHARED.
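The printed values can be decoded by hand. The sketch below assumes that COND_NWAITERS_SHIFT is 1 in this glibc version (the steps of 2 per decrement in the log are consistent with that), so that the low bit of `__nwaiters` holds the condvar's clock ID and the remaining bits count waiters. It also assumes the `lock:` field is the usual low-level lock word (0 unlocked, 1 locked, 2 locked with possible waiters). All helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: COND_NWAITERS_SHIFT is 1, so the low bit of __nwaiters holds
   the clock ID and the remaining bits count the waiters. */
#define COND_NWAITERS_SHIFT 1

/* Number of waiters encoded in a raw __nwaiters value. */
unsigned int nwaiters_count (unsigned int raw)
{
  return raw >> COND_NWAITERS_SHIFT;
}

/* Clock ID kept in the low bits (0 = CLOCK_REALTIME, 1 = CLOCK_MONOTONIC). */
unsigned int nwaiters_clock (unsigned int raw)
{
  return raw & ((1u << COND_NWAITERS_SHIFT) - 1);
}

/* Mirror of the instrumented statement: remove one waiter from RAW. */
unsigned int nwaiters_after_sub (unsigned int raw)
{
  assert (nwaiters_count (raw) > 0);  /* Removing from zero would underflow. */
  return raw - (1u << COND_NWAITERS_SHIFT);
}

/* Meaning of the low-level lock word printed as "lock:". */
const char *lll_state (int lock)
{
  switch (lock)
    {
    case 0: return "unlocked";
    case 1: return "locked, no waiters";
    case 2: return "locked, possibly with waiters";
    default: return "unexpected value";
    }
}

/* Decode one log line, e.g. describe_entry ("start", 45, 1). */
void describe_entry (const char *phase, unsigned int raw, int lock)
{
  printf ("%s: %u waiters, clock %u, lock=%d (%s)\n", phase,
          nwaiters_count (raw), nwaiters_clock (raw), lock, lll_state (lock));
}
```

Under these assumptions the first line's raw value 45 decodes to 22 waiters with clock bit 1, which matches the 22 threads in the report, and the final value 23 decodes to 11 waiters.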
Comments
On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote:
> Hi,
>
> I have 22 threads wait in pthread_cond_timedwait. When they are all woke up, I found
> there are more than one threads can access shared data in pthread_cond_timedwait.
[...]
> Is lll_lock (cond->__data.__lock, pshared) failed?
>
> pshared is LLL_SHARED.

I have had a quick look at this and there is no obvious reason I can see for this behaviour, unless there is some way that IO buffering could cause the messages to be strangely interleaved. The other alternative that may be worth investigating is whether or not ldrex/strex is working correctly in your SMP system.
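One way to probe the ldrex/strex question without reading disassembly is a stress test of compare-and-swap itself. The sketch below uses C11 `<stdatomic.h>` rather than glibc's internal macros, so it is an assumption that it exercises the same hardware path: it tests the compiler's and CPU's CAS, not `atomic_compare_and_exchange_val_acq` directly. With GCC on ARMv7, such CAS loops are normally compiled to ldrex/strex sequences. `cas_stress` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared counter that is only ever updated through a CAS retry loop. */
static atomic_int cas_counter;

static void *cas_worker (void *arg)
{
  assert (arg != NULL);
  const int iterations = *(const int *) arg;
  for (int i = 0; i < iterations; i++)
    {
      int old = atomic_load_explicit (&cas_counter, memory_order_relaxed);
      /* On failure OLD is refreshed with the current value; retry with it. */
      while (!atomic_compare_exchange_weak_explicit (&cas_counter, &old, old + 1,
                                                     memory_order_acq_rel,
                                                     memory_order_relaxed))
        ;
    }
  return NULL;
}

/* Run NTHREADS threads that each perform ITERATIONS CAS increments.
   Returns the number of lost increments (0 if CAS behaves atomically),
   or -1 if the arguments are invalid or a thread could not be started. */
long cas_stress (int nthreads, int iterations)
{
  enum { MAX_THREADS = 64 };
  pthread_t tids[MAX_THREADS];
  int started = 0;

  /* The counter is an int, so the expected total must fit in one. */
  if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0
      || iterations > INT_MAX / nthreads)
    return -1;
  atomic_store (&cas_counter, 0);
  while (started < nthreads
         && pthread_create (&tids[started], NULL, cas_worker, &iterations) == 0)
    started++;
  for (int i = 0; i < started; i++)
    pthread_join (tids[i], NULL);
  if (started != nthreads)
    return -1;
  return (long) nthreads * iterations - atomic_load (&cas_counter);
}
```

On a correctly working SMP system, `cas_stress (22, 20000)` should return 0; any positive result means increments were lost between the load and the store of a supposedly atomic CAS.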
On 2014/4/25 17:43, Will Newton wrote:
> I have had a quick look at this and there is no obvious reason I can
> see for this behaviour, unless there is some way that IO buffering
> could cause the messages to be strangely interleaved. The other
> alternative that may be worth investigating is whether or not
> ldrex/strex is working correctly in your SMP system.

After doing some investigation, it looks like atomic_compare_and_exchange_val_acq is not atomic, so two threads can both acquire the lock when the futex is 0. Is there something wrong with atomic_compare_and_exchange_val_acq?

    #define __lll_lock(futex, private) \
      ((void) ({ \
        int *__futex = (futex); \
        if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex, \
                                                                   1, 0), 0)) \
          { \
            if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
              __lll_lock_wait_private (__futex); \
            else \
              __lll_lock_wait (__futex, private); \
          } \
      }))
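For reference, the contract that `__lll_lock` relies on can be sketched with C11 atomics and the Linux futex system call. This is a minimal, Linux-only sketch in the style of the classic three-state futex mutex, not glibc's implementation. Assumptions: `cas_val_acq` only mimics the argument order and return value of `atomic_compare_and_exchange_val_acq (mem, newval, oldval)`; the lock word uses the 0/1/2 states seen in the log; and it uses private futex operations, so it is meant for threads within one process. All names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word states: 0 = unlocked, 1 = locked, 2 = locked and contended. */
typedef struct { atomic_int word; } sketch_lock;

/* Raw futex call.  Assumes atomic_int has the same layout as int, which
   holds for GCC/Clang on Linux. */
static long futex_op (atomic_int *addr, int op, int val)
{
  return syscall (SYS_futex, (int *) addr, op, val, NULL, NULL, 0);
}

/* Value-returning CAS with acquire ordering: stores NEWVAL only if
   *MEM == OLDVAL, and always returns the value *MEM held beforehand. */
static int cas_val_acq (atomic_int *mem, int newval, int oldval)
{
  int prev = oldval;
  atomic_compare_exchange_strong_explicit (mem, &prev, newval,
                                           memory_order_acquire,
                                           memory_order_relaxed);
  return prev;  /* Equal to OLDVAL exactly when the swap happened. */
}

void sketch_lock_acquire (sketch_lock *l)
{
  /* Fast path, as in __lll_lock: 0 -> 1.  A non-zero result means the
     lock was already held, so fall through to the waiting path. */
  int c = cas_val_acq (&l->word, 1, 0);
  if (c == 0)
    return;
  /* Mark the lock contended (2), then sleep until an exchange sees 0. */
  if (c != 2)
    c = atomic_exchange_explicit (&l->word, 2, memory_order_acquire);
  while (c != 0)
    {
      futex_op (&l->word, FUTEX_WAIT_PRIVATE, 2);
      c = atomic_exchange_explicit (&l->word, 2, memory_order_acquire);
    }
}

void sketch_lock_release (sketch_lock *l)
{
  /* 1 -> 0 needs no wake-up; a previous value of 2 means sleepers may exist. */
  if (atomic_fetch_sub_explicit (&l->word, 1, memory_order_release) != 1)
    {
      atomic_store_explicit (&l->word, 0, memory_order_release);
      futex_op (&l->word, FUTEX_WAKE_PRIVATE, 1);
    }
}

/* Self-test: threads increment a plain int under the lock.  If two threads
   could ever hold the lock at once, increments would be lost. */
static sketch_lock test_lock;
static int test_counter;
static int test_iters;

static void *locked_worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < test_iters; i++)
    {
      sketch_lock_acquire (&test_lock);
      test_counter++;  /* Non-atomic on purpose: protected only by the lock. */
      sketch_lock_release (&test_lock);
    }
  return NULL;
}

/* Returns 1 if no increments were lost, 0 otherwise. */
int sketch_lock_selftest (int nthreads, int iters)
{
  enum { MAX_THREADS = 32 };
  pthread_t tids[MAX_THREADS];

  assert (nthreads > 0 && nthreads <= MAX_THREADS);
  test_counter = 0;
  test_iters = iters;
  for (int i = 0; i < nthreads; i++)
    {
      int rc = pthread_create (&tids[i], NULL, locked_worker, NULL);
      assert (rc == 0);
      (void) rc;
    }
  for (int i = 0; i < nthreads; i++)
    pthread_join (tids[i], NULL);
  return test_counter == nthreads * iters;
}
```

If the CAS in the fast path were not atomic, two threads could both observe 0 and both take the fast path, and `sketch_lock_selftest` would report lost increments; that is the failure mode suspected above.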
On 2014/4/26 14:45, Yang Yingliang wrote:
> After doing some investigation, it looks like atomic_compare_and_exchange_val_acq
> not doing atomic. So two threads can both acquire lock when futex is 0. Is there
> something wrong in atomic_compare_and_exchange_val_acq ?

I noticed that atomic_compare_and_exchange_val_acq is not atomic for ARM in glibc-2.18. I will try glibc-2.19.
On 26 April 2014 10:48, Yang Yingliang <yangyingliang@huawei.com> wrote:
> I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18.
> I will try glibc-2.19.

Which version of gcc are you using? Does the disassembly of the code look reasonably correct, i.e. ldrex/strex?
On Sat, 2014-04-26 at 17:48 +0800, Yang Yingliang wrote:
> I noticed that atomic_compare_and_exchange_val_acq are not atomic for ARM in glibc-2.18.
> I will try glibc-2.19.

Could you share the full test case for this, please? Did you just test whether some mutex can be acquired several times?
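The thread does not include the reporter's full test program. The following is a hypothetical sketch of such a test: a number of threads block in `pthread_cond_timedwait`, one broadcast wakes them all, and each woken thread checks that it is alone while holding the mutex. The use of CLOCK_MONOTONIC, the 10-second timeout, and all names are assumptions. It checks the user-visible mutex, not the condvar's internal `__lock` directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv;
static int waiting;        /* Threads that reached the wait loop (under MTX). */
static int released;       /* Set by the broadcaster (under MTX). */
static atomic_int inside;  /* Threads currently past the wait, holding MTX. */
static atomic_int violations;

static void *waiter (void *arg)
{
  struct timespec deadline;
  (void) arg;
  clock_gettime (CLOCK_MONOTONIC, &deadline);
  deadline.tv_sec += 10;   /* Generous timeout; normally never reached. */

  pthread_mutex_lock (&mtx);
  waiting++;
  while (!released)
    if (pthread_cond_timedwait (&cv, &mtx, &deadline) == ETIMEDOUT)
      break;
  /* pthread_cond_timedwait returned with MTX held: we must be alone here. */
  if (atomic_fetch_add (&inside, 1) != 0)
    atomic_fetch_add (&violations, 1);
  atomic_fetch_sub (&inside, 1);
  pthread_mutex_unlock (&mtx);
  return NULL;
}

/* Returns the number of mutual-exclusion violations seen (0 expected). */
int run_cond_test (int nthreads)
{
  enum { MAX_THREADS = 64 };
  pthread_t tids[MAX_THREADS];
  pthread_condattr_t attr;

  assert (nthreads > 0 && nthreads <= MAX_THREADS);
  /* Assumption: the condvar measures its timeouts on CLOCK_MONOTONIC. */
  pthread_condattr_init (&attr);
  pthread_condattr_setclock (&attr, CLOCK_MONOTONIC);
  pthread_cond_init (&cv, &attr);
  pthread_condattr_destroy (&attr);
  waiting = 0;
  released = 0;
  atomic_store (&inside, 0);
  atomic_store (&violations, 0);

  for (int i = 0; i < nthreads; i++)
    {
      int rc = pthread_create (&tids[i], NULL, waiter, NULL);
      assert (rc == 0);
      (void) rc;
    }
  /* Once WAITING == NTHREADS while we hold MTX, every waiter has released
     MTX inside pthread_cond_timedwait, i.e. all of them are waiting. */
  pthread_mutex_lock (&mtx);
  while (waiting < nthreads)
    {
      pthread_mutex_unlock (&mtx);
      sched_yield ();
      pthread_mutex_lock (&mtx);
    }
  released = 1;
  pthread_cond_broadcast (&cv);
  pthread_mutex_unlock (&mtx);

  for (int i = 0; i < nthreads; i++)
    pthread_join (tids[i], NULL);
  pthread_cond_destroy (&cv);
  return atomic_load (&violations);
}
```

`run_cond_test (22)` mirrors the 22 waiters from the report and should return 0 on a correct system.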
    --- libc/nptl/pthread_cond_timedwait.c
    +++ libc/nptl/pthread_cond_timedwait.c
    @@ -34,6 +34,7 @@
     #else
     # include <bits/libc-vdso.h>
     #endif
    +#include <stdio.h>
     
     /* Cleanup handler, defined in pthread_cond_wait.c. */
     extern void __condvar_cleanup (void *arg)
    @@ -235,7 +239,9 @@
     
     bc_out:
     
    +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
     cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT;
    +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
     
     /* If pthread_cond_destroy was called on this variable already,
     notify the pthread_cond_destroy caller all waiters have left
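The remark about IO buffering relates to a limitation of `printf` instrumentation: each call evaluates its arguments first and only then competes for the stdout lock, so the printed order need not match the order in which the values were read. A hypothetical alternative is to record events into a preallocated buffer and print them after the threads finish. The names, the buffer size, and the `unsigned long` thread field are assumptions; this is a standalone sketch, not something taken from libc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

enum { TRACE_MAX = 1024 };

struct trace_event
{
  const char *phase;       /* "start" or "end", as in the printf patch. */
  unsigned int nwaiters;   /* Raw __nwaiters value that was read. */
  int lock;                /* Raw __lock value that was read. */
  unsigned long thread;    /* e.g. (unsigned long) pthread_self () on Linux. */
};

static struct trace_event trace_buf[TRACE_MAX];
static atomic_uint trace_next;

/* Record one event; returns its slot index, or -1 if the buffer is full.
   The slot is claimed with one atomic increment right after the values were
   read: this narrows, but does not eliminate, the gap between sampling and
   ordering, and keeps stdio locking out of it. */
int trace_record (const char *phase, unsigned int nwaiters, int lock,
                  unsigned long thread)
{
  assert (phase != NULL);
  unsigned int slot = atomic_fetch_add (&trace_next, 1);
  if (slot >= TRACE_MAX)
    return -1;
  trace_buf[slot] = (struct trace_event) { phase, nwaiters, lock, thread };
  return (int) slot;
}

/* Print the recorded events in slot order; call only after all traced
   threads have finished, so every claimed slot has been filled in. */
void trace_dump (FILE *out)
{
  unsigned int n = atomic_load (&trace_next);
  if (n > TRACE_MAX)
    n = TRACE_MAX;
  for (unsigned int i = 0; i < n; i++)
    fprintf (out, "%u: %s do sub :%u, lock:%d %#lx\n", i, trace_buf[i].phase,
             trace_buf[i].nwaiters, trace_buf[i].lock, trace_buf[i].thread);
}
```

In the instrumented function, each `printf` would become a call such as `trace_record ("start", cond->__data.__nwaiters, cond->__data.__lock, (unsigned long) pthread_self ())`, followed by a single `trace_dump (stdout)` once the test has finished.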