Fix rwlock stall with PREFER_WRITER_NONRECURSIVE_NP (bug 23861)

Message ID mvmh8fj6k3m.fsf@suse.de
State New, archived
Headers

Commit Message

Andreas Schwab Dec. 12, 2018, 11:49 a.m. UTC
  [BZ #23861]
	* nptl/pthread_rwlock_common.c (__pthread_rwlock_rdlock_full):
	Update expected value for __readers while waiting on
	PTHREAD_RWLOCK_RWAITING.
	* nptl/tst-rwlock-pwn.c: New file.
	* nptl/Makefile (tests): Add tst-rwlock-pwn.
---
 nptl/Makefile                |  3 +-
 nptl/pthread_rwlock_common.c |  2 +-
 nptl/tst-rwlock-pwn.c        | 82 ++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 nptl/tst-rwlock-pwn.c
  

Comments

Carlos O'Donell Dec. 12, 2018, 9:17 p.m. UTC | #1
On 12/12/18 6:49 AM, Andreas Schwab wrote:
> 	[BZ #23861]
> 	* nptl/pthread_rwlock_common.c (__pthread_rwlock_rdlock_full):
> 	Update expected value for __readers while waiting on
> 	PTHREAD_RWLOCK_RWAITING.
> 	* nptl/tst-rwlock-pwn.c: New file.
> 	* nptl/Makefile (tests): Add tst-rwlock-pwn.

OK for master with:
- the following commit message (or a variant)
- more comments added to the test case (see suggestions below).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Thank you for the fix. In the future please provide a more detailed 
description of the problem and the fix. The more detailed your description
the more valuable you'll find my review as I'm able to review intent and
changes together with my own understanding of the code, without having 
to replicate all of the work that went into it.

I have tested this version on my own systems, and find that tst-rwlock-pwn
passes and doesn't fail after several days of testing. Testing was only
done on x86_64.

I suggest the following commit message:
~~~
Fix rwlock stall with PREFER_WRITER_NONRECURSIVE_NP (bug 23861)

In the read lock function (__pthread_rwlock_rdlock_full) there was a
code path which would fail to reload __readers while waiting for
PTHREAD_RWLOCK_RWAITING to change. This failure to reload __readers
into a local value meant that various conditionals used the old value
of __readers and with only two threads left it could result in an
indefinite stall of one of the readers (waiting for PTHREAD_RWLOCK_RWAITING
to go to zero, but it never would).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
~~~
Feel free to add or modify the message.

> ---
>  nptl/Makefile                |  3 +-
>  nptl/pthread_rwlock_common.c |  2 +-
>  nptl/tst-rwlock-pwn.c        | 82 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 85 insertions(+), 2 deletions(-)
>  create mode 100644 nptl/tst-rwlock-pwn.c
> 
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 34ae830276..b01f2b0626 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -318,7 +318,8 @@ tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
>  	tst-minstack-throw \
>  	tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
>  	tst-cnd-timedwait tst-thrd-detach tst-mtx-basic tst-thrd-sleep \
> -	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock
> +	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock \
> +	tst-rwlock-pwn

OK.

>  
>  tests-internal := tst-rwlock19 tst-rwlock20 \
>  		  tst-sem11 tst-sem12 tst-sem13 \
> diff --git a/nptl/pthread_rwlock_common.c b/nptl/pthread_rwlock_common.c
> index 5dd534271a..85fc1bcfc7 100644
> --- a/nptl/pthread_rwlock_common.c
> +++ b/nptl/pthread_rwlock_common.c
> @@ -314,7 +314,7 @@ __pthread_rwlock_rdlock_full (pthread_rwlock_t *rwlock,
>  		 harmless because the flag is just about the state of
>  		 __readers, and all threads set the flag under the same
>  		 conditions.  */
> -	      while ((atomic_load_relaxed (&rwlock->__data.__readers)
> +	      while (((r = atomic_load_relaxed (&rwlock->__data.__readers))
>  		      & PTHREAD_RWLOCK_RWAITING) != 0)

This is OK.

(a) Issues with atomic.h API.

It took me a while to review the entire algorithm, and this particular readlock
section of the algorithm has some interesting choices which were made in atomic.h
which I think lead to this defect. I wonder if we can't improve atomic.h to fix this.

There is a mix of atomic operations that upon completion load the result of the most
recent read into the expected memory location. In fact this is what the previous
call to atomic_compare_exchange_weak_relaxed() does because it uses
__atomic_compare_exchange_n() e.g. "If they are not equal, the operation is a read 
and the current contents of *ptr are written into *expected."

Then in the loop that is waiting for RWAITING to clear (this is the other half of
__pthread_rwlock_rdunlock, which ensures readers are able to make progress) the
algorithm uses atomic_load_relaxed which *returns* the value rather than having
any it written into a pointer as __atomic_load() would have done.

It is this mixing of "some atomic operations return the most recently read value"
and "some atomic operations write the most recently read value into expected" that
I think is hard to keep track of and actually requires looking at the underlying
implementation in atomic.h to determine which of the two operations is going on.

(b) rwlock stall problem.

In the inner-most loop of the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP within
__pthread_rwlock_rdlock_full, we fail to reload the value of r, and this means the
following:

313               /* Wait for as long as the flag is set.  An ABA situation is
314                  harmless because the flag is just about the state of
315                  __readers, and all threads set the flag under the same
316                  conditions.  */
317               while ((atomic_load_relaxed (&rwlock->__data.__readers)
318                   & PTHREAD_RWLOCK_RWAITING) != 0)

This does not reload r.

319                 {
320                   int private = __pthread_rwlock_get_private (rwlock);
321                   int err = futex_abstimed_wait (&rwlock->__data.__readers,
322                       r, abstime, private);

This will fail continuously if r is different.

323                   /* We ignore EAGAIN and EINTR.  On time-outs, we can just
324                      return because we don't need to clean up anything.  */
325                   if (err == ETIMEDOUT)
326                     return err;
327                 }

However, the atomic_load_relaxed reads and returns the real value of __readers
and eventually it should succeed and exit the loop after consuming a lot of cpu
to busy-wait for r to change i.e. RWAITING to clear.

299   if (rwlock->__data.__flags == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP)
300     { 
301       r = atomic_load_relaxed (&rwlock->__data.__readers);
302       while (((r & PTHREAD_RWLOCK_WRPHASE) == 0)
303               && ((r & PTHREAD_RWLOCK_WRLOCKED) != 0)
304               && ((r >> PTHREAD_RWLOCK_READER_SHIFT) > 0))

It's very likely this is true with old data in r.

305         {
306           /* TODO Spin first.  */
307           /* Try setting the flag signaling that we are waiting without having
308              incremented the number of readers.  Relaxed MO is fine because
309              this is just about waiting for a state change in __readers.  */
310           if (atomic_compare_exchange_weak_relaxed
311               (&rwlock->__data.__readers, &r, r | PTHREAD_RWLOCK_RWAITING))

When this fails, because r is old data, it updates r with the most recent value
because it uses __atomic_compare_exchange_n() which updates *expected with the
most recently read value.

312             { 

This loop can go on forever if the failure occurs in the second to last reader.
The second to last reader will clear RWAITING, and if the rest of r matches,
the if on line 310 can succeed and set RWAITING *again*, but this time the last
reader should have succeeded and didn't and therefore it waits forever for an
unlock that will not arrive.

This also supports why 3 is the required minimum number of threads, you need one
writer, and two readers, to trigger the stall. With the minimum number of threads
you're more likely to get them on distinct CPUs and trigger the stall case.

>  		{
>  		  int private = __pthread_rwlock_get_private (rwlock);
> diff --git a/nptl/tst-rwlock-pwn.c b/nptl/tst-rwlock-pwn.c
> new file mode 100644
> index 0000000000..91fa452610
> --- /dev/null
> +++ b/nptl/tst-rwlock-pwn.c
> @@ -0,0 +1,82 @@
> +/* Test rwlock with PREFER_WRITER_NONRECURSIVE_NP.

This is related to a bug.

Please use:

Bug 23861: Test rwlock with PREFER_WRITER_NONRECURSIVE_NP.

> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <support/xthread.h>
> +
> +/* We choose 10 iterations and 3 threads because this happens to be able
> +   to trigger the stall today on modern hardware.  */

This should say:

/* We choose 10 iterations because this happens to be able
   to trigger the stall today on modern hardware.  */

> +#define LOOPS 10

Please add:

/* We need 3 threads to trigger bug 23861.  One thread as a writer, and
   two reader threads.  The test verifies that the second-to-last reader
   is able to notify the *last* reader that it should be done waiting.
   If the second-to-last reader fails to notify the last reader or does
   so incorrectly then the last reader may stall indefinitely.  */

> +#define NTHREADS 3
> +
> +_Atomic int do_exit;
> +pthread_rwlockattr_t mylock_attr;
> +pthread_rwlock_t mylock;
> +
> +void *
> +run_loop (void *a)
> +{
> +  while (!do_exit)
> +    {
> +      if (random () & 1)
> +	{
> +	  xpthread_rwlock_wrlock (&mylock);
> +	  xpthread_rwlock_unlock (&mylock);
> +	}
> +      else
> +	{
> +	  xpthread_rwlock_rdlock (&mylock);
> +	  xpthread_rwlock_unlock (&mylock);
> +	}	

OK.

> +    }
> +  return NULL;
> +}
> +
> +int
> +do_test (void)
> +{
> +  xpthread_rwlockattr_init (&mylock_attr);
> +  xpthread_rwlockattr_setkind_np (&mylock_attr,
> +				  PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
> +  xpthread_rwlock_init (&mylock, &mylock_attr);
> +
> +  for (int n = 0; n < LOOPS; n++)
> +    {
> +      pthread_t tids[NTHREADS];
> +      do_exit = 0;
> +      for (int i = 0; i < NTHREADS; i++)
> +	tids[i] = xpthread_create (NULL, run_loop, NULL);
> +      /* Let the threads run for some time.  */
> +      sleep (1);
> +      printf ("Exiting...");
> +      fflush (stdout);
> +      do_exit = 1;
> +      for (int i = 0; i < NTHREADS; i++)
> +	xpthread_join (tids[i]);
> +      printf ("done.\n");
> +    }
> +  pthread_rwlock_destroy (&mylock);
> +  pthread_rwlockattr_destroy (&mylock_attr);
> +  return 0;
> +}
> +
> +#define TIMEOUT (DEFAULT_TIMEOUT + 3 * LOOPS)
> +#include <support/test-driver.c>
> 

OK.
  

Patch

diff --git a/nptl/Makefile b/nptl/Makefile
index 34ae830276..b01f2b0626 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -318,7 +318,8 @@  tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
 	tst-minstack-throw \
 	tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
 	tst-cnd-timedwait tst-thrd-detach tst-mtx-basic tst-thrd-sleep \
-	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock
+	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock \
+	tst-rwlock-pwn
 
 tests-internal := tst-rwlock19 tst-rwlock20 \
 		  tst-sem11 tst-sem12 tst-sem13 \
diff --git a/nptl/pthread_rwlock_common.c b/nptl/pthread_rwlock_common.c
index 5dd534271a..85fc1bcfc7 100644
--- a/nptl/pthread_rwlock_common.c
+++ b/nptl/pthread_rwlock_common.c
@@ -314,7 +314,7 @@  __pthread_rwlock_rdlock_full (pthread_rwlock_t *rwlock,
 		 harmless because the flag is just about the state of
 		 __readers, and all threads set the flag under the same
 		 conditions.  */
-	      while ((atomic_load_relaxed (&rwlock->__data.__readers)
+	      while (((r = atomic_load_relaxed (&rwlock->__data.__readers))
 		      & PTHREAD_RWLOCK_RWAITING) != 0)
 		{
 		  int private = __pthread_rwlock_get_private (rwlock);
diff --git a/nptl/tst-rwlock-pwn.c b/nptl/tst-rwlock-pwn.c
new file mode 100644
index 0000000000..91fa452610
--- /dev/null
+++ b/nptl/tst-rwlock-pwn.c
@@ -0,0 +1,82 @@ 
+/* Test rwlock with PREFER_WRITER_NONRECURSIVE_NP.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <support/xthread.h>
+
+/* We choose 10 iterations and 3 threads because this happens to be able
+   to trigger the stall today on modern hardware.  */
+#define LOOPS 10
+#define NTHREADS 3
+
+_Atomic int do_exit;
+pthread_rwlockattr_t mylock_attr;
+pthread_rwlock_t mylock;
+
+void *
+run_loop (void *a)
+{
+  while (!do_exit)
+    {
+      if (random () & 1)
+	{
+	  xpthread_rwlock_wrlock (&mylock);
+	  xpthread_rwlock_unlock (&mylock);
+	}
+      else
+	{
+	  xpthread_rwlock_rdlock (&mylock);
+	  xpthread_rwlock_unlock (&mylock);
+	}
+    }
+  return NULL;
+}
+
+int
+do_test (void)
+{
+  xpthread_rwlockattr_init (&mylock_attr);
+  xpthread_rwlockattr_setkind_np (&mylock_attr,
+				  PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
+  xpthread_rwlock_init (&mylock, &mylock_attr);
+
+  for (int n = 0; n < LOOPS; n++)
+    {
+      pthread_t tids[NTHREADS];
+      do_exit = 0;
+      for (int i = 0; i < NTHREADS; i++)
+	tids[i] = xpthread_create (NULL, run_loop, NULL);
+      /* Let the threads run for some time.  */
+      sleep (1);
+      printf ("Exiting...");
+      fflush (stdout);
+      do_exit = 1;
+      for (int i = 0; i < NTHREADS; i++)
+	xpthread_join (tids[i]);
+      printf ("done.\n");
+    }
+  pthread_rwlock_destroy (&mylock);
+  pthread_rwlockattr_destroy (&mylock_attr);
+  return 0;
+}
+
+#define TIMEOUT (DEFAULT_TIMEOUT + 3 * LOOPS)
+#include <support/test-driver.c>