aio: fix newp->running data race

Message ID 20160504140217.GZ2861@var.bordeaux.inria.fr
State Superseded, archived
Headers

Commit Message

Samuel Thibault May 4, 2016, 2:02 p.m. UTC
  Hello,

As reported by helgrind:

==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5
==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0
==9363==    at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so)
==9363==    by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so)

which is the write in

  if (result == 0)
    newp->running = running;

==9363== This conflicts with a previous read of size 4 by thread #6
==9363== Locks held: none
==9363==    at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so)

which is the read in 

	  /* Hopefully this request is marked as running.  */
	  assert (runp->running == allocated);

So what is happening is that __aio_enqueue_request writes newp->running
after having possibly started a thread worker to handle the request,
thus in concurrence with the thread worker, which not only reads
runp->running as shown here, but also writes to it later (and does not
take __aio_requests_mutex since it's already given a request).

But actually when we do start a new thread worker, there is no need
to set newp->running, since it was already set by the "running =
newp->running = allocated;" line along the thread creation.  So to avoid
the race we can just set newp->running in the cases where we do not
start the thread.

Samuel


    	* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write
    	`running` field of `newp` when a thread was started to process it,
    	since that thread will not take `__aio_requests_mutex`, and the field
    	already has the proper value actually.
  

Comments

Samuel Thibault May 18, 2016, 1:02 p.m. UTC | #1
Ping?

Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote:
> Hello,
> 
> As reported by helgrind:
> 
> ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5
> ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0
> ==9363==    at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so)
> ==9363==    by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so)
> 
> which is the write in
> 
>   if (result == 0)
>     newp->running = running;
> 
> ==9363== This conflicts with a previous read of size 4 by thread #6
> ==9363== Locks held: none
> ==9363==    at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so)
> 
> which is the read in 
> 
> 	  /* Hopefully this request is marked as running.  */
> 	  assert (runp->running == allocated);
> 
> So what is happening is that __aio_enqueue_request writes newp->running
> after having possibly started a thread worker to handle the request,
> thus in concurrence with the thread worker, which not only reads
> runp->running as shown here, but also writes to it later (and does not
> take __aio_requests_mutex since it's already given a request).
> 
> But actually when we do start a new thread worker, there is no need
> to set newp->running, since it was already set by the "running =
> newp->running = allocated;" line along the thread creation.  So to avoid
> the race we can just set newp->running in the cases where we do not
> start the thread.
> 
> Samuel
> 
> 
>     	* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write
>     	`running` field of `newp` when a thread was started to process it,
>     	since that thread will not take `__aio_requests_mutex`, and the field
>     	already has the proper value actually.
> 
> diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c
> index f55570d..faf139d 100644
> --- a/sysdeps/pthread/aio_misc.c
> +++ b/sysdeps/pthread/aio_misc.c
> @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
>  		result = 0;
>  	    }
>  	}
> +      else
> +	newp->running = running;
>      }
> +  else
> +    newp->running = running;
>  
>    /* Enqueue the request in the run queue if it is not yet running.  */
>    if (running == yes && result == 0)
> @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
>  	pthread_cond_signal (&__aio_new_request_notification);
>      }
>  
> -  if (result == 0)
> -    newp->running = running;
> -  else
> +  if (result != 0)
>      {
>        /* Something went wrong.  */
>        __aio_free_request (newp);
  
Samuel Thibault July 8, 2016, 2:28 p.m. UTC | #2
Ping?

Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote:
> Hello,
> 
> As reported by helgrind:
> 
> ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5
> ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0
> ==9363==    at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so)
> ==9363==    by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so)
> 
> which is the write in
> 
>   if (result == 0)
>     newp->running = running;
> 
> ==9363== This conflicts with a previous read of size 4 by thread #6
> ==9363== Locks held: none
> ==9363==    at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so)
> 
> which is the read in 
> 
> 	  /* Hopefully this request is marked as running.  */
> 	  assert (runp->running == allocated);
> 
> So what is happening is that __aio_enqueue_request writes newp->running
> after having possibly started a thread worker to handle the request,
> thus in concurrence with the thread worker, which not only reads
> runp->running as shown here, but also writes to it later (and does not
> take __aio_requests_mutex since it's already given a request).
> 
> But actually when we do start a new thread worker, there is no need
> to set newp->running, since it was already set by the "running =
> newp->running = allocated;" line along the thread creation.  So to avoid
> the race we can just set newp->running in the cases where we do not
> start the thread.
> 
> Samuel
> 
> 
>     	* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write
>     	`running` field of `newp` when a thread was started to process it,
>     	since that thread will not take `__aio_requests_mutex`, and the field
>     	already has the proper value actually.
> 
> diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c
> index f55570d..faf139d 100644
> --- a/sysdeps/pthread/aio_misc.c
> +++ b/sysdeps/pthread/aio_misc.c
> @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
>  		result = 0;
>  	    }
>  	}
> +      else
> +	newp->running = running;
>      }
> +  else
> +    newp->running = running;
>  
>    /* Enqueue the request in the run queue if it is not yet running.  */
>    if (running == yes && result == 0)
> @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
>  	pthread_cond_signal (&__aio_new_request_notification);
>      }
>  
> -  if (result == 0)
> -    newp->running = running;
> -  else
> +  if (result != 0)
>      {
>        /* Something went wrong.  */
>        __aio_free_request (newp);
  
Torvald Riegel July 8, 2016, 4:05 p.m. UTC | #3
On Fri, 2016-07-08 at 16:28 +0200, Samuel Thibault wrote:
> Ping?

I've just looked quickly at the patch.  Fixes for concurrency issues
should be accompanied with detailed comments, so that other developers
can easily grasp how synchronization is supposed to work.  In this case,
for example, it should be documented why runp->running can't be accessed
concurrently from several threads.
Doing this properly may add a lot of comments to the code if there are
missing elsewhere, so we might need to find some middle ground if this
patch is meant to become part of 2.24.  But it will need something.  See
the concurrency notes in ./stdlib/cxa_thread_atexit_impl.c for an
example, or the code comments in nptl/sem_waitcommon.c.

> Samuel Thibault, on Wed 04 May 2016 16:02:17 +0200, wrote:
> > Hello,
> > 
> > As reported by helgrind:
> > 
> > ==9363== Possible data race during write of size 4 at 0x6B45D18 by thread #5
> > ==9363== Locks held: 2, at addresses 0x557A1E0 0x6A149E0
> > ==9363==    at 0x5375E0D: __aio_enqueue_request (in /lib/x86_64-linux-gnu/librt-2.21.so)
> > ==9363==    by 0x53764FD: aio_write (in /lib/x86_64-linux-gnu/librt-2.21.so)
> > 
> > which is the write in
> > 
> >   if (result == 0)
> >     newp->running = running;
> > 
> > ==9363== This conflicts with a previous read of size 4 by thread #6
> > ==9363== Locks held: none
> > ==9363==    at 0x53758FE: handle_fildes_io (in /lib/x86_64-linux-gnu/librt-2.21.so)
> > 
> > which is the read in 
> > 
> > 	  /* Hopefully this request is marked as running.  */
> > 	  assert (runp->running == allocated);
> > 
> > So what is happening is that __aio_enqueue_request writes newp->running
> > after having possibly started a thread worker to handle the request,
> > thus in concurrence with the thread worker, which not only reads
> > runp->running as shown here, but also writes to it later (and does not
> > take __aio_requests_mutex since it's already given a request).
> > 
> > But actually when we do start a new thread worker, there is no need
> > to set newp->running, since it was already set by the "running =
> > newp->running = allocated;" line along the thread creation.  So to avoid
> > the race we can just set newp->running in the cases where we do not
> > start the thread.
> > 
> > Samuel
> > 
> > 
> >     	* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write
> >     	`running` field of `newp` when a thread was started to process it,
> >     	since that thread will not take `__aio_requests_mutex`, and the field
> >     	already has the proper value actually.
> > 
> > diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c
> > index f55570d..faf139d 100644
> > --- a/sysdeps/pthread/aio_misc.c
> > +++ b/sysdeps/pthread/aio_misc.c
> > @@ -453,7 +453,11 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
> >  		result = 0;
> >  	    }
> >  	}
> > +      else
> > +	newp->running = running;
> >      }
> > +  else
> > +    newp->running = running;
> >  
> >    /* Enqueue the request in the run queue if it is not yet running.  */
> >    if (running == yes && result == 0)
> > @@ -466,9 +470,7 @@ __aio_enqueue_request (aiocb_union *aiocbp, int operation)
> >  	pthread_cond_signal (&__aio_new_request_notification);
> >      }
> >  
> > -  if (result == 0)
> > -    newp->running = running;
> > -  else
> > +  if (result != 0)
> >      {
> >        /* Something went wrong.  */
> >        __aio_free_request (newp);
  

Patch

diff --git a/sysdeps/pthread/aio_misc.c b/sysdeps/pthread/aio_misc.c
index f55570d..faf139d 100644
--- a/sysdeps/pthread/aio_misc.c
+++ b/sysdeps/pthread/aio_misc.c
@@ -453,7 +453,11 @@  __aio_enqueue_request (aiocb_union *aiocbp, int operation)
 		result = 0;
 	    }
 	}
+      else
+	newp->running = running;
     }
+  else
+    newp->running = running;
 
   /* Enqueue the request in the run queue if it is not yet running.  */
   if (running == yes && result == 0)
@@ -466,9 +470,7 @@  __aio_enqueue_request (aiocb_union *aiocbp, int operation)
 	pthread_cond_signal (&__aio_new_request_notification);
     }
 
-  if (result == 0)
-    newp->running = running;
-  else
+  if (result != 0)
     {
       /* Something went wrong.  */
       __aio_free_request (newp);