[2/2] New pthread rwlock that is more scalable.

Message ID: 1483385174.13143.124.camel@redhat.com
State: New, archived

Commit Message

Torvald Riegel Jan. 2, 2017, 7:26 p.m. UTC
  On Wed, 2016-07-27 at 23:44 +0200, Torvald Riegel wrote:
> This replaces the pthread rwlock with a new implementation that uses a
> more scalable algorithm (primarily through not using a critical section
> anymore to make state changes).  The fast path for rdlock acquisition
> and release is now basically a single atomic read-modify write or CAS
> and a few branches.  See nptl/pthread_rwlock_common.c for details.

I have noticed two small oversights, which are taken care of in the
attached patch.  The first is a missing overflow check (a lock acquired
too often as a reader) in one of the tryrdlock branches.  The second is
that I had forgotten to apply a cleanup (no correctness change; the
former code did more than it had to).
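
As a rough sketch of the fast path described above (an illustration
only, not the actual implementation; see nptl/pthread_rwlock_common.c
for the real code, which also handles lock preferences and reader
overflow), using the glibc-internal atomic wrappers that also appear in
the patch below:

  /* Sketch of a CAS-based rdlock fast path; not the actual glibc
     code.  */
  unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
  while ((r & PTHREAD_RWLOCK_WRPHASE) == 0)
    {
      /* We are in a read phase: register as one more reader.  Acquire
	 MO so we synchronize with prior writers.  */
      if (atomic_compare_exchange_weak_acquire
	  (&rwlock->__data.__readers, &r,
	   r + (1 << PTHREAD_RWLOCK_READER_SHIFT)))
	return 0;	/* Acquired; on failure, r was reloaded.  */
    }
  /* Otherwise, fall through to the slow path and wait for the write
     phase to end.  */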
  

Comments

Carlos O'Donell Jan. 10, 2017, 8:45 a.m. UTC | #1
On 01/02/2017 02:26 PM, Torvald Riegel wrote:
> On Wed, 2016-07-27 at 23:44 +0200, Torvald Riegel wrote:
>> This replaces the pthread rwlock with a new implementation that uses a
>> more scalable algorithm (primarily through not using a critical section
>> anymore to make state changes).  The fast path for rdlock acquisition
>> and release is now basically a single atomic read-modify write or CAS
>> and a few branches.  See nptl/pthread_rwlock_common.c for details.
> I have noticed two small oversights, which are taken care of in the
> attached patch.  The first is a missing overflow check (a lock acquired
> too often as a reader) in one of the tryrdlock branches.  The second is
> that I had forgotten to apply a cleanup (no correctness change; the
> former code did more than it had to).

These look good to me.

> 
> commit 59c2c0dafb1c1460a457037f222032ade9fc5a74
> Author: Torvald Riegel <triegel@redhat.com>
> Date:   Mon Jan 2 17:50:37 2017 +0100
> 
>     Fix a minor issue and an oversight (not a correctness bug) in tryrdlock
> 
> diff --git a/nptl/pthread_rwlock_tryrdlock.c b/nptl/pthread_rwlock_tryrdlock.c
> index e002f15..6c3014c 100644
> --- a/nptl/pthread_rwlock_tryrdlock.c
> +++ b/nptl/pthread_rwlock_tryrdlock.c
> @@ -51,12 +51,6 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
>  		  == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP))
>  	    return EBUSY;
>  	  rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);
> -	  /* If we could have caused an overflow or take effect during an
> -	     overflow, we just can / need to return EAGAIN.  There is no need
> -	     to have modified the number of readers because we could have
> -	     done that and cleaned up immediately.  */
> -	  if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
> -	    return EAGAIN;
>  	}
>        else
>  	{
> @@ -72,6 +66,12 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
>  		  ^ PTHREAD_RWLOCK_WRPHASE;
>  	    }
>  	}
> +      /* If we could have caused an overflow or take effect during an
> +	 overflow, we just can / need to return EAGAIN.  There is no need to
> +	 have actually modified the number of readers because we could have
> +	 done that and cleaned up immediately.  */
> +      if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
> +	return EAGAIN;

OK. Moving the overflow check later in the code catches all cases, since
it now guards both branches.
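
Condensed from the diff, the resulting control flow is roughly the
following (a sketch; the write-phase branch is abbreviated relative to
the full source):

  unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
  unsigned int rnew;
  if ((r & PTHREAD_RWLOCK_WRPHASE) == 0)
    /* Read phase: just register one more reader.  */
    rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);
  else
    /* Write phase: register the reader and also try to start a new
       read phase (abbreviated; see the full source).  */
    rnew = (r + (1 << PTHREAD_RWLOCK_READER_SHIFT))
	   ^ PTHREAD_RWLOCK_WRPHASE;
  /* Moved after both branches: previously only the read-phase branch
     checked for reader-count overflow, so the write-phase branch could
     miss it.  */
  if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
    return EAGAIN;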

>      }
>    /* If the CAS fails, we retry; this prevents that tryrdlock fails spuriously
>       (i.e., fails to acquire the lock although there is no writer), which is
> @@ -84,16 +84,25 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
>       readers or writers that acquire and release in the meantime.  Using
>       randomized exponential back-off to make a live-lock unlikely should be
>       sufficient.
> +     TODO Back-off.
>       Acquire MO so we synchronize with prior writers.  */
>    while (!atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
>        &r, rnew));
>  
>    if ((r & PTHREAD_RWLOCK_WRPHASE) != 0)
>      {
> -      //FIXME / TODO same as in rdlock_full
> -      int private = __pthread_rwlock_get_private (rwlock);
> -      atomic_store_release (&rwlock->__data.__wrphase_futex, 0);
> -      futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);
> +      /* Same as in __pthread_rwlock_rdlock_full:
> +	 We started the read phase, so we are also responsible for
> +	 updating the write-phase futex.  Relaxed MO is sufficient.
> +	 Note that there can be no other reader that we have to wake
> +	 because all other readers will see the read phase started by us
> +	 (or they will try to start it themselves); if a writer started
> +	 the read phase, we cannot have started it.  Furthermore, we
> +	 cannot discard a PTHREAD_RWLOCK_FUTEX_USED flag because we will
> +	 overwrite the value set by the most recent writer (or the readers
> +	 before it in case of explicit hand-over) and we know that there
> +	 are no waiting readers.  */
> +      atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);

OK.
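
Condensed, the change this hunk makes is the following (assembled from
the removed and added lines; all names appear in the patch itself):

  /* Before: release store plus an unconditional wake of potential
     waiters on the write-phase futex.  */
  int private = __pthread_rwlock_get_private (rwlock);
  atomic_store_release (&rwlock->__data.__wrphase_futex, 0);
  futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);

  /* After: no reader can be blocked on __wrphase_futex here (see the
     comment in the hunk), so a relaxed store without a wake suffices.  */
  atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);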

>      }
>  
>    return 0;
  

Patch

commit 59c2c0dafb1c1460a457037f222032ade9fc5a74
Author: Torvald Riegel <triegel@redhat.com>
Date:   Mon Jan 2 17:50:37 2017 +0100

    Fix a minor issue and an oversight (not a correctness bug) in tryrdlock

diff --git a/nptl/pthread_rwlock_tryrdlock.c b/nptl/pthread_rwlock_tryrdlock.c
index e002f15..6c3014c 100644
--- a/nptl/pthread_rwlock_tryrdlock.c
+++ b/nptl/pthread_rwlock_tryrdlock.c
@@ -51,12 +51,6 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
 		  == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP))
 	    return EBUSY;
 	  rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);
-	  /* If we could have caused an overflow or take effect during an
-	     overflow, we just can / need to return EAGAIN.  There is no need
-	     to have modified the number of readers because we could have
-	     done that and cleaned up immediately.  */
-	  if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
-	    return EAGAIN;
 	}
       else
 	{
@@ -72,6 +66,12 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
 		  ^ PTHREAD_RWLOCK_WRPHASE;
 	    }
 	}
+      /* If we could have caused an overflow or take effect during an
+	 overflow, we just can / need to return EAGAIN.  There is no need to
+	 have actually modified the number of readers because we could have
+	 done that and cleaned up immediately.  */
+      if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
+	return EAGAIN;
     }
   /* If the CAS fails, we retry; this prevents that tryrdlock fails spuriously
      (i.e., fails to acquire the lock although there is no writer), which is
@@ -84,16 +84,25 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
      readers or writers that acquire and release in the meantime.  Using
      randomized exponential back-off to make a live-lock unlikely should be
      sufficient.
+     TODO Back-off.
      Acquire MO so we synchronize with prior writers.  */
   while (!atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
       &r, rnew));
 
   if ((r & PTHREAD_RWLOCK_WRPHASE) != 0)
     {
-      //FIXME / TODO same as in rdlock_full
-      int private = __pthread_rwlock_get_private (rwlock);
-      atomic_store_release (&rwlock->__data.__wrphase_futex, 0);
-      futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);
+      /* Same as in __pthread_rwlock_rdlock_full:
+	 We started the read phase, so we are also responsible for
+	 updating the write-phase futex.  Relaxed MO is sufficient.
+	 Note that there can be no other reader that we have to wake
+	 because all other readers will see the read phase started by us
+	 (or they will try to start it themselves); if a writer started
+	 the read phase, we cannot have started it.  Furthermore, we
+	 cannot discard a PTHREAD_RWLOCK_FUTEX_USED flag because we will
+	 overwrite the value set by the most recent writer (or the readers
+	 before it in case of explicit hand-over) and we know that there
+	 are no waiting readers.  */
+      atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);
     }
 
   return 0;
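
For reference, the "TODO Back-off." note left in the CAS retry loop
refers to randomized exponential back-off between retries.  A generic,
self-contained C11 sketch of such a loop (not glibc code; glibc would
use its internal atomic wrappers, a proper PRNG, and a pause/yield
hint, and reader_inc would correspond to
1 << PTHREAD_RWLOCK_READER_SHIFT) could look like:

  #include <stdatomic.h>
  #include <stdlib.h>

  /* Retry a reader-count increment instead of failing spuriously,
     backing off on contention to make a live-lock unlikely.  */
  static int
  try_increment_readers (atomic_uint *readers, unsigned int reader_inc)
  {
    unsigned int delay = 1;
    unsigned int r = atomic_load_explicit (readers, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit (readers, &r,
						   r + reader_inc,
						   memory_order_acquire,
						   memory_order_relaxed))
      {
	/* Wait a random number of iterations in [0, delay), doubling
	   the bound up to a fixed cap.  */
	for (unsigned int spins = rand () % delay; spins > 0; spins--)
	  ;	/* A real implementation would use a pause hint here.  */
	if (delay < 1024)
	  delay <<= 1;
	/* On CAS failure, r has been reloaded, so the new value is
	   recomputed on the next iteration.  */
      }
    return 0;
  }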