Message ID | 20160504140217.GZ2861@var.bordeaux.inria.fr |
---|---|
State | Superseded, archived |
Headers |
Received: (qmail 105787 invoked by alias); 4 May 2016 14:02:23 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 105775 invoked by uid 89); 4 May 2016 14:02:23 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=BAYES_00, SPF_NEUTRAL autolearn=no version=3.3.2 spammy=held, samuel X-HELO: mail3-relais-sop.national.inria.fr Date: Wed, 4 May 2016 16:02:17 +0200 From: Samuel Thibault <samuel.thibault@ens-lyon.org> To: GNU C Library <libc-alpha@sourceware.org> Subject: [PATCH] aio: fix newp->running data race Message-ID: <20160504140217.GZ2861@var.bordeaux.inria.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21+34 (58baf7c9f32f) (2010-12-30) |
Commit Message
Samuel Thibault
May 4, 2016, 2:02 p.m. UTC
Hello, As reported by helgrind: ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5 ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0 ==9363== at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so) ==9363== by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so) which is the write in if (result == 0) newp->running = running; ==9363== This conflicts with a previous read of size 4 by thread #6 ==9363== Locks held: none ==9363== at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so) which is the read in /* Hopefully this request is marked as running. */ assert (runp->running == allocated); So what is happening is that __aio_enqueue_request writes newp->running after having possibly started a thread worker to handle the request, thus in concurrence with the thread worker, which not only reads runp->running as shown here, but also writes to it later (and does not take __aio_requests_mutex since it's already given a request). But actually when we do start a new thread worker, there is no need to set newp->running, since it was already set by the "running = newp->running = allocated;" line along the thread creation. So to avoid the race we can just set newp->running in the cases where we do not start the thread. Samuel * sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write `running` field of `newp` when a thread was started to process it, since that thread will not take `__aio_requests_mutex`, and the field already has the proper value actually.
Comments
Ping? Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote: > Hello, > > As reported by helgrind: > > ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5 > ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0 > ==9363== at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so) > ==9363== by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so) > > which is the write in > > if (result == 0) > newp->running = running; > > ==9363== This conflicts with a previous read of size 4 by thread #6 > ==9363== Locks held: none > ==9363== at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so) > > which is the read in > > /* Hopefully this request is marked as running. */ > assert (runp->running == allocated); > > So what is happening is that __aio_enqueue_request writes newp->running > after having possibly started a thread worker to handle the request, > thus in concurrence with the thread worker, which not only reads > runp->running as shown here, but also writes to it later (and does not > take __aio_requests_mutex since it's already given a request). > > But actually when we do start a new thread worker, there is no need > to set newp->running, since it was already set by the "running = > newp->running = allocated;" line along the thread creation. So to avoid > the race we can just set newp->running in the cases where we do not > start the thread. > > Samuel > > > * sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write > `running` field of `newp` when a thread was started to process it, > since that thread will not take `__aio_requests_mutex`, and the field > already has the proper value actually. > > diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c > index f55570d..faf139d 100644 > --- a/sysdeps/pthread/aio_misc.c > +++ b/sysdeps/pthread/aio_misc.c > @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > result = 0; > } > } > + else > + newp->running = running; > } > + else > + newp->running = running; > > /* Enqueue the request in the run queue if it is not yet running. */ > if (running == yes && result == 0) > @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > pthread_cond_signal (&__aio_new_request_notification); > } > > - if (result == 0) > - newp->running = running; > - else > + if (result != 0) > { > /* Something went wrong. */ > __aio_free_request (newp);
Ping? Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote: > Hello, > > As reported by helgrind: > > ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5 > ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0 > ==9363== at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so) > ==9363== by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so) > > which is the write in > > if (result == 0) > newp->running = running; > > ==9363== This conflicts with a previous read of size 4 by thread #6 > ==9363== Locks held: none > ==9363== at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so) > > which is the read in > > /* Hopefully this request is marked as running. */ > assert (runp->running == allocated); > > So what is happening is that __aio_enqueue_request writes newp->running > after having possibly started a thread worker to handle the request, > thus in concurrence with the thread worker, which not only reads > runp->running as shown here, but also writes to it later (and does not > take __aio_requests_mutex since it's already given a request). > > But actually when we do start a new thread worker, there is no need > to set newp->running, since it was already set by the "running = > newp->running = allocated;" line along the thread creation. So to avoid > the race we can just set newp->running in the cases where we do not > start the thread. > > Samuel > > > * sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write > `running` field of `newp` when a thread was started to process it, > since that thread will not take `__aio_requests_mutex`, and the field > already has the proper value actually. > > diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c > index f55570d..faf139d 100644 > --- a/sysdeps/pthread/aio_misc.c > +++ b/sysdeps/pthread/aio_misc.c > @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > result = 0; > } > } > + else > + newp->running = running; > } > + else > + newp->running = running; > > /* Enqueue the request in the run queue if it is not yet running. */ > if (running == yes && result == 0) > @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > pthread_cond_signal (&__aio_new_request_notification); > } > > - if (result == 0) > - newp->running = running; > - else > + if (result != 0) > { > /* Something went wrong. */ > __aio_free_request (newp);
On Fri, 2016-07-08 at 16:28 +0200, Samuel Thibault wrote: > Ping? I've just looked quickly at the patch. Fixes for concurrency issues should be accompanied with detailed comments, so that other developers can easily grasp how synchronization is supposed to work. In this case, for example, it should be documented why runp->running can't be accessed concurrently from several threads. Doing this properly may add a lot of comments to the code if there are missing elsewhere, so we might need to find some middle ground if this patch is meant to become part of 2.24. But it will need something. See the concurrency notes in ./stdlib/cxa_thread_atexit_impl.c for an example, or the code comments in nptl/sem_waitcommon.c. > Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote: > > Hello, > > > > As reported by helgrind: > > > > ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5 > > ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0 > > ==9363== at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so) > > ==9363== by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so) > > > > which is the write in > > > > if (result == 0) > > newp->running = running; > > > > ==9363== This conflicts with a previous read of size 4 by thread #6 > > ==9363== Locks held: none > > ==9363== at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so) > > > > which is the read in > > > > /* Hopefully this request is marked as running. */ > > assert (runp->running == allocated); > > > > So what is happening is that __aio_enqueue_request writes newp->running > > after having possibly started a thread worker to handle the request, > > thus in concurrence with the thread worker, which not only reads > > runp->running as shown here, but also writes to it later (and does not > > take __aio_requests_mutex since it's already given a request). > > > > But actually when we do start a new thread worker, there is no need > > to set newp->running, since it was already set by the "running = > > newp->running = allocated;" line along the thread creation. So to avoid > > the race we can just set newp->running in the cases where we do not > > start the thread. > > > > Samuel > > > > > > * sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write > > `running` field of `newp` when a thread was started to process it, > > since that thread will not take `__aio_requests_mutex`, and the field > > already has the proper value actually. > > > > diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c > > index f55570d..faf139d 100644 > > --- a/sysdeps/pthread/aio_misc.c > > +++ b/sysdeps/pthread/aio_misc.c > > @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > > result = 0; > > } > > } > > + else > > + newp->running = running; > > } > > + else > > + newp->running = running; > > > > /* Enqueue the request in the run queue if it is not yet running. */ > > if (running == yes && result == 0) > > @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) > > pthread_cond_signal (&__aio_new_request_notification); > > } > > > > - if (result == 0) > > - newp->running = running; > > - else > > + if (result != 0) > > { > > /* Something went wrong. */ > > __aio_free_request (newp);
diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c index f55570d..faf139d 100644 --- a/sysdeps/pthread/aio_misc.c +++ b/sysdeps/pthread/aio_misc.c @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) result = 0; } } + else + newp->running = running; } + else + newp->running = running; /* Enqueue the request in the run queue if it is not yet running. */ if (running == yes && result == 0) @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation) pthread_cond_signal (&__aio_new_request_notification); } - if (result == 0) - newp->running = running; - else + if (result != 0) { /* Something went wrong. */ __aio_free_request (newp);