Message ID | 53F26769.20801@linux.vnet.ibm.com |
---|---|
State | Dropped |
Delegated to: | Adhemerval Zanella Netto |
Headers |
Received: (qmail 16755 invoked by alias); 18 Aug 2014 20:52:06 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 16699 invoked by uid 89); 18 Aug 2014 20:52:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: e24smtp05.br.ibm.com Message-ID: <53F26769.20801@linux.vnet.ibm.com> Date: Mon, 18 Aug 2014 17:51:53 -0300 From: Adhemerval Zanella <azanella@linux.vnet.ibm.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: libc-alpha@sourceware.org Subject: Re: Possible fix for bug #13165 References: <20140818195344.GA1187@brightrain.aerifal.cx> In-Reply-To: <20140818195344.GA1187@brightrain.aerifal.cx> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14081820-1798-0000-0000-00000041CE92 |
Commit Message
Adhemerval Zanella Netto
Aug. 18, 2014, 8:51 p.m. UTC
On 18-08-2014 16:53, Rich Felker wrote: > A couple days ago I posted ideas for a fix for this issue on the bug > tracker: > > https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 > > Anybody who does glibc development/builds/testing up for trying my > idea and seeing if it works? > > Rich > If I understood correctly, you are proposing something like: correct? I saw not NPTL issues in nether powerpc64 or x86_64.
Comments
On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: > On 18-08-2014 16:53, Rich Felker wrote: > > A couple days ago I posted ideas for a fix for this issue on the bug > > tracker: > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 > > > > Anybody who does glibc development/builds/testing up for trying my > > idea and seeing if it works? > > > > Rich > > > If I understood correctly, you are proposing something like: > > diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c > index fc5eac4..a16c5d5 100644 > --- a/nptl/pthread_cond_wait.c > +++ b/nptl/pthread_cond_wait.c > @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) > /* Make sure we are alone. */ > lll_lock (cond->__data.__lock, pshared); > > - /* Now we can release the mutex. */ > - err = __pthread_mutex_unlock_usercnt (mutex, 0); > - if (__glibc_unlikely (err)) > - { > - lll_unlock (cond->__data.__lock, pshared); > - return err; > - } > - > /* We have one new user of the condvar. */ > ++cond->__data.__total_seq; > ++cond->__data.__futex; > @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) > /* Remember the broadcast counter. */ > cbuffer.bc_seq = cond->__data.__broadcast_seq; > > + /* Now we can release the mutex. */ > + err = __pthread_mutex_unlock_usercnt (mutex, 0); > + if (__glibc_unlikely (err)) > + { > + lll_unlock (cond->__data.__lock, pshared); > + return err; > + } > + > do > { > unsigned int futex_val = cond->__data.__futex; > > correct? Yes, that looks like what I had in mind. But on second thought I'm not sure it should be necessary; I missed the internal lock that's being taken before the mutex is unlocked. > I saw not NPTL issues in nether powerpc64 or x86_64. You mean you can't reproduce the issue on these targets? Maybe it's only present on targets that are using non-default versions of the code. If so, and if the offending target-specific versions have been removed, maybe the bug is fixed now? I haven't had a chance to try the test case on latest glibc, but the bug was present on x86 (32-bit) last time I checked. It would be nice if this bug has already been fixed without taking any specific action to do so... Rich
On Mon, 2014-08-18 at 17:27 -0400, Rich Felker wrote: > On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: > > On 18-08-2014 16:53, Rich Felker wrote: > > > A couple days ago I posted ideas for a fix for this issue on the bug > > > tracker: > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 > > > > > > Anybody who does glibc development/builds/testing up for trying my > > > idea and seeing if it works? > > > > > > Rich > > > > > If I understood correctly, you are proposing something like: > > > > diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c > > index fc5eac4..a16c5d5 100644 > > --- a/nptl/pthread_cond_wait.c > > +++ b/nptl/pthread_cond_wait.c > > @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) > > /* Make sure we are alone. */ > > lll_lock (cond->__data.__lock, pshared); > > > > - /* Now we can release the mutex. */ > > - err = __pthread_mutex_unlock_usercnt (mutex, 0); > > - if (__glibc_unlikely (err)) > > - { > > - lll_unlock (cond->__data.__lock, pshared); > > - return err; > > - } > > - > > /* We have one new user of the condvar. */ > > ++cond->__data.__total_seq; > > ++cond->__data.__futex; > > @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) > > /* Remember the broadcast counter. */ > > cbuffer.bc_seq = cond->__data.__broadcast_seq; > > > > + /* Now we can release the mutex. */ > > + err = __pthread_mutex_unlock_usercnt (mutex, 0); > > + if (__glibc_unlikely (err)) > > + { > > + lll_unlock (cond->__data.__lock, pshared); > > + return err; > > + } > > + > > do > > { > > unsigned int futex_val = cond->__data.__futex; > > > > correct? > > Yes, that looks like what I had in mind. But on second thought I'm not > sure it should be necessary; I missed the internal lock that's being > taken before the mutex is unlocked. > > > I saw not NPTL issues in nether powerpc64 or x86_64. > > You mean you can't reproduce the issue on these targets? Maybe it's > only present on targets that are using non-default versions of the > code. If so, and if the offending target-specific versions have been > removed, maybe the bug is fixed now? I haven't had a chance to try the > test case on latest glibc, but the bug was present on x86 (32-bit) > last time I checked. > > It would be nice if this bug has already been fixed without taking any > specific action to do so... The bug is not fixed. It is an algorithmic issue -- though of course it may or may not trigger in a particular execution depending on the respective interleavings.
On 18-08-2014 19:18, Torvald Riegel wrote: > On Mon, 2014-08-18 at 17:27 -0400, Rich Felker wrote: >> On Mon, Aug 18, 2014 at 05:51:53PM -0300, Adhemerval Zanella wrote: >>> On 18-08-2014 16:53, Rich Felker wrote: >>>> A couple days ago I posted ideas for a fix for this issue on the bug >>>> tracker: >>>> >>>> https://sourceware.org/bugzilla/show_bug.cgi?id=13165#c38 >>>> >>>> Anybody who does glibc development/builds/testing up for trying my >>>> idea and seeing if it works? >>>> >>>> Rich >>>> >>> If I understood correctly, you are proposing something like: >>> >>> diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c >>> index fc5eac4..a16c5d5 100644 >>> --- a/nptl/pthread_cond_wait.c >>> +++ b/nptl/pthread_cond_wait.c >>> @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) >>> /* Make sure we are alone. */ >>> lll_lock (cond->__data.__lock, pshared); >>> >>> - /* Now we can release the mutex. */ >>> - err = __pthread_mutex_unlock_usercnt (mutex, 0); >>> - if (__glibc_unlikely (err)) >>> - { >>> - lll_unlock (cond->__data.__lock, pshared); >>> - return err; >>> - } >>> - >>> /* We have one new user of the condvar. */ >>> ++cond->__data.__total_seq; >>> ++cond->__data.__futex; >>> @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) >>> /* Remember the broadcast counter. */ >>> cbuffer.bc_seq = cond->__data.__broadcast_seq; >>> >>> + /* Now we can release the mutex. */ >>> + err = __pthread_mutex_unlock_usercnt (mutex, 0); >>> + if (__glibc_unlikely (err)) >>> + { >>> + lll_unlock (cond->__data.__lock, pshared); >>> + return err; >>> + } >>> + >>> do >>> { >>> unsigned int futex_val = cond->__data.__futex; >>> >>> correct? >> Yes, that looks like what I had in mind. But on second thought I'm not >> sure it should be necessary; I missed the internal lock that's being >> taken before the mutex is unlocked. >> >>> I saw not NPTL issues in nether powerpc64 or x86_64. >> You mean you can't reproduce the issue on these targets? Maybe it's >> only present on targets that are using non-default versions of the >> code. If so, and if the offending target-specific versions have been >> removed, maybe the bug is fixed now? I haven't had a chance to try the >> test case on latest glibc, but the bug was present on x86 (32-bit) >> last time I checked. >> >> It would be nice if this bug has already been fixed without taking any >> specific action to do so... > The bug is not fixed. It is an algorithmic issue -- though of course it > may or may not trigger in a particular execution depending on the > respective interleavings. > I just noted I wasn't specific, by 'not issues' I meant NPTL testcase ran fine, however the testcase from bug stills fails.
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c index fc5eac4..a16c5d5 100644 --- a/nptl/pthread_cond_wait.c +++ b/nptl/pthread_cond_wait.c @@ -118,14 +118,6 @@ __pthread_cond_wait (cond, mutex) /* Make sure we are alone. */ lll_lock (cond->__data.__lock, pshared); - /* Now we can release the mutex. */ - err = __pthread_mutex_unlock_usercnt (mutex, 0); - if (__glibc_unlikely (err)) - { - lll_unlock (cond->__data.__lock, pshared); - return err; - } - /* We have one new user of the condvar. */ ++cond->__data.__total_seq; ++cond->__data.__futex; @@ -153,6 +145,14 @@ __pthread_cond_wait (cond, mutex) /* Remember the broadcast counter. */ cbuffer.bc_seq = cond->__data.__broadcast_seq; + /* Now we can release the mutex. */ + err = __pthread_mutex_unlock_usercnt (mutex, 0); + if (__glibc_unlikely (err)) + { + lll_unlock (cond->__data.__lock, pshared); + return err; + } + do { unsigned int futex_val = cond->__data.__futex;