[07/11] nptl: Remove CANCELING_BITMASK

Message ID 20210526165728.1772546-8-adhemerval.zanella@linaro.org
State Superseded
Headers
Series nptl: pthread cancellation refactor |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Netto May 26, 2021, 4:57 p.m. UTC
  The CANCELING_BITMASK is used as an optimization to avoid sending
the signal when pthread_cancel is called in a concurrent manner.

This requires then to put both the cancellation state and type on
a shared state (cancelhandling), since 'pthread_cancel' checks
whether cancellation is enabled and asynchrnous to either cancel
itself of sending the signal.

It also requires handle the CANCELING_BITMASK on
__pthread_disable_asynccancel, however this is incurs in the same
issues described on BZ#12683: the cancellation is acting even *after*
the syscalls returns with user visible side-effects.

This patch removes this optimization and simplifies the pthread
cancellation implementation: pthread_cancel now first check if
cancellation is already pending and if not always send a signal
if the target is not itself.  The SIGCANCEL handler is also simpified
since there is not need to setup a CAS loop.

It also alows to move both the cancellation state and mode out of
'cancelhadling' (it is done in subsequent patches).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
---
 nptl/cancellation.c        |  12 ----
 nptl/descr.h               |   3 -
 nptl/pthread_cancel.c      | 124 ++++++++++---------------------------
 nptl/pthread_join_common.c |   2 +-
 4 files changed, 35 insertions(+), 106 deletions(-)
  

Comments

Florian Weimer May 26, 2021, 6:02 p.m. UTC | #1
* Adhemerval Zanella via Libc-alpha:

> The CANCELING_BITMASK is used as an optimization to avoid sending
> the signal when pthread_cancel is called in a concurrent manner.
>
> This requires then to put both the cancellation state and type on
> a shared state (cancelhandling), since 'pthread_cancel' checks
> whether cancellation is enabled and asynchrnous to either cancel
> itself of sending the signal.
>
> It also requires handle the CANCELING_BITMASK on
> __pthread_disable_asynccancel, however this is incurs in the same
> issues described on BZ#12683: the cancellation is acting even *after*
> the syscalls returns with user visible side-effects.
>
> This patch removes this optimization and simplifies the pthread
> cancellation implementation: pthread_cancel now first check if
> cancellation is already pending and if not always send a signal
> if the target is not itself.  The SIGCANCEL handler is also simpified
> since there is not need to setup a CAS loop.

“and, if not, always sends a signal” ?

> +  int ch = atomic_load_relaxed (&self->cancelhandling);
> +  /* Cancelation not enabled, not cancelled, or already exitting.  */

Typo: exitting

The change itself looks good to me, and I agree with the direction.

Thanks,
Florian
  
Florian Weimer June 15, 2021, 10:07 p.m. UTC | #2
* Adhemerval Zanella via Libc-alpha:

> The CANCELING_BITMASK is used as an optimization to avoid sending
> the signal when pthread_cancel is called in a concurrent manner.
>
> This requires then to put both the cancellation state and type on
> a shared state (cancelhandling), since 'pthread_cancel' checks
> whether cancellation is enabled and asynchrnous to either cancel
> itself of sending the signal.
>
> It also requires handle the CANCELING_BITMASK on
> __pthread_disable_asynccancel, however this is incurs in the same
> issues described on BZ#12683: the cancellation is acting even *after*
> the syscalls returns with user visible side-effects.
>
> This patch removes this optimization and simplifies the pthread
> cancellation implementation: pthread_cancel now first check if
> cancellation is already pending and if not always send a signal
> if the target is not itself.  The SIGCANCEL handler is also simpified
> since there is not need to setup a CAS loop.
>
> It also alows to move both the cancellation state and mode out of
> 'cancelhadling' (it is done in subsequent patches).

It looks like this causes sporadic failures in nptl/tst-sem16 because
sem_wait fails with EINTR, revealing that cancellation is implemented
using a signal (something that must not be visible according to POSIX).

This was not a problem before because pthread_cancel sent the signal
only when asynchronous cancellation was enabled.

We either have to revert this, or push forward with the transition so
that we can install the SIGCANCEL handler with SA_RESTART.

Thanks,
Florian
  
Adhemerval Zanella Netto June 15, 2021, 11:33 p.m. UTC | #3
On 15/06/2021 19:07, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> The CANCELING_BITMASK is used as an optimization to avoid sending
>> the signal when pthread_cancel is called in a concurrent manner.
>>
>> This requires then to put both the cancellation state and type on
>> a shared state (cancelhandling), since 'pthread_cancel' checks
>> whether cancellation is enabled and asynchrnous to either cancel
>> itself of sending the signal.
>>
>> It also requires handle the CANCELING_BITMASK on
>> __pthread_disable_asynccancel, however this is incurs in the same
>> issues described on BZ#12683: the cancellation is acting even *after*
>> the syscalls returns with user visible side-effects.
>>
>> This patch removes this optimization and simplifies the pthread
>> cancellation implementation: pthread_cancel now first check if
>> cancellation is already pending and if not always send a signal
>> if the target is not itself.  The SIGCANCEL handler is also simpified
>> since there is not need to setup a CAS loop.
>>
>> It also alows to move both the cancellation state and mode out of
>> 'cancelhadling' (it is done in subsequent patches).
> 
> It looks like this causes sporadic failures in nptl/tst-sem16 because
> sem_wait fails with EINTR, revealing that cancellation is implemented
> using a signal (something that must not be visible according to POSIX).
> 
> This was not a problem before because pthread_cancel sent the signal
> only when asynchronous cancellation was enabled.
> 
> We either have to revert this, or push forward with the transition so
> that we can install the SIGCANCEL handler with SA_RESTART.

Indeed now that SIGCANCEL is always sent by pthread_cancel we must
install the signal handle with SA_RESTART, the signal handle will
either act on cancellation or it should be not visible by the 
application.

I don't think there is a need to revert this change, running nptl/tst-sem16
with SA_RESTART I don't see the regression anymore with multiple thousands
runs.
  
Adhemerval Zanella Netto June 16, 2021, 12:46 p.m. UTC | #4
On 15/06/2021 20:33, Adhemerval Zanella wrote:
> 
> 
> On 15/06/2021 19:07, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>>
>>> The CANCELING_BITMASK is used as an optimization to avoid sending
>>> the signal when pthread_cancel is called in a concurrent manner.
>>>
>>> This requires then to put both the cancellation state and type on
>>> a shared state (cancelhandling), since 'pthread_cancel' checks
>>> whether cancellation is enabled and asynchrnous to either cancel
>>> itself of sending the signal.
>>>
>>> It also requires handle the CANCELING_BITMASK on
>>> __pthread_disable_asynccancel, however this is incurs in the same
>>> issues described on BZ#12683: the cancellation is acting even *after*
>>> the syscalls returns with user visible side-effects.
>>>
>>> This patch removes this optimization and simplifies the pthread
>>> cancellation implementation: pthread_cancel now first check if
>>> cancellation is already pending and if not always send a signal
>>> if the target is not itself.  The SIGCANCEL handler is also simpified
>>> since there is not need to setup a CAS loop.
>>>
>>> It also alows to move both the cancellation state and mode out of
>>> 'cancelhadling' (it is done in subsequent patches).
>>
>> It looks like this causes sporadic failures in nptl/tst-sem16 because
>> sem_wait fails with EINTR, revealing that cancellation is implemented
>> using a signal (something that must not be visible according to POSIX).
>>
>> This was not a problem before because pthread_cancel sent the signal
>> only when asynchronous cancellation was enabled.
>>
>> We either have to revert this, or push forward with the transition so
>> that we can install the SIGCANCEL handler with SA_RESTART.
> 
> Indeed now that SIGCANCEL is always sent by pthread_cancel we must
> install the signal handle with SA_RESTART, the signal handle will
> either act on cancellation or it should be not visible by the 
> application.
> 
> I don't think there is a need to revert this change, running nptl/tst-sem16
> with SA_RESTART I don't see the regression anymore with multiple thousands
> runs.
> 

I will send a patch to use SA_RESTART on cancellation.
  

Patch

diff --git a/nptl/cancellation.c b/nptl/cancellation.c
index c20845adc0..b15f25d8f6 100644
--- a/nptl/cancellation.c
+++ b/nptl/cancellation.c
@@ -88,17 +88,5 @@  __pthread_disable_asynccancel (int oldtype)
       /* Prepare the next round.  */
       oldval = curval;
     }
-
-  /* We cannot return when we are being canceled.  Upon return the
-     thread might be things which would have to be undone.  The
-     following loop should loop until the cancellation signal is
-     delivered.  */
-  while (__builtin_expect ((newval & (CANCELING_BITMASK | CANCELED_BITMASK))
-			   == CANCELING_BITMASK, 0))
-    {
-      futex_wait_simple ((unsigned int *) &self->cancelhandling, newval,
-			 FUTEX_PRIVATE);
-      newval = THREAD_GETMEM (self, cancelhandling);
-    }
 }
 libc_hidden_def (__pthread_disable_asynccancel)
diff --git a/nptl/descr.h b/nptl/descr.h
index 9d8297b45f..a120365f88 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -283,9 +283,6 @@  struct pthread
   /* Bit set if asynchronous cancellation mode is selected.  */
 #define CANCELTYPE_BIT		1
 #define CANCELTYPE_BITMASK	(0x01 << CANCELTYPE_BIT)
-  /* Bit set if canceling has been initiated.  */
-#define CANCELING_BIT		2
-#define CANCELING_BITMASK	(0x01 << CANCELING_BIT)
   /* Bit set if canceled.  */
 #define CANCELED_BIT		3
 #define CANCELED_BITMASK	(0x01 << CANCELED_BIT)
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index f264a4096a..c11a2b4551 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -43,35 +43,18 @@  sigcancel_handler (int sig, siginfo_t *si, void *ctx)
 
   struct pthread *self = THREAD_SELF;
 
-  int oldval = THREAD_GETMEM (self, cancelhandling);
-  while (1)
-    {
-      /* We are canceled now.  When canceled by another thread this flag
-	 is already set but if the signal is directly send (internally or
-	 from another process) is has to be done here.  */
-      int newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK;
-
-      if (oldval == newval || (oldval & EXITING_BITMASK) != 0)
-	/* Already canceled or exiting.  */
-	break;
-
-      int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
-					      oldval);
-      if (curval == oldval)
-	{
-	  /* Set the return value.  */
-	  THREAD_SETMEM (self, result, PTHREAD_CANCELED);
-
-	  /* Make sure asynchronous cancellation is still enabled.  */
-	  if ((newval & CANCELTYPE_BITMASK) != 0)
-	    /* Run the registered destructors and terminate the thread.  */
-	    __do_cancel ();
-
-	  break;
-	}
-
-      oldval = curval;
-    }
+  int ch = atomic_load_relaxed (&self->cancelhandling);
+  /* Cancelation not enabled, not cancelled, or already exitting.  */
+  if ((ch & CANCELSTATE_BITMASK) != 0
+      || (ch & CANCELED_BITMASK) == 0
+      || (ch & EXITING_BITMASK) != 0)
+    return;
+
+  /* Set the return value.  */
+  THREAD_SETMEM (self, result, PTHREAD_CANCELED);
+  /* Make sure asynchronous cancellation is still enabled.  */
+  if ((ch & CANCELTYPE_BITMASK) != 0)
+    __do_cancel ();
 }
 
 int
@@ -104,72 +87,33 @@  __pthread_cancel (pthread_t th)
 		    " must be installed for pthread_cancel to work\n");
   }
 #endif
-  int result = 0;
-  int oldval;
-  int newval;
-  do
+
+  int oldch = atomic_fetch_or_acquire (&pd->cancelhandling, CANCELED_BITMASK);
+  if ((oldch & CANCELED_BITMASK) != 0)
+    return 0;
+
+  if (pd == THREAD_SELF)
     {
-    again:
-      oldval = pd->cancelhandling;
-      newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK;
-
-      /* Avoid doing unnecessary work.  The atomic operation can
-	 potentially be expensive if the bug has to be locked and
-	 remote cache lines have to be invalidated.  */
-      if (oldval == newval)
-	break;
-
-      /* If the cancellation is handled asynchronously just send a
-	 signal.  We avoid this if possible since it's more
-	 expensive.  */
-      if (CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS (newval))
-	{
-	  /* Mark the cancellation as "in progress".  */
-	  if (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling,
-						    oldval | CANCELING_BITMASK,
-						    oldval))
-	    goto again;
-
-	  if (pd == THREAD_SELF)
-	    /* This is not merely an optimization: An application may
-	       call pthread_cancel (pthread_self ()) without calling
-	       pthread_create, so the signal handler may not have been
-	       set up for a self-cancel.  */
-            {
-              THREAD_SETMEM (pd, result, PTHREAD_CANCELED);
-              if ((newval & CANCELTYPE_BITMASK) != 0)
-	        __do_cancel ();
-            }
-	  else
-	    {
-	      /* The cancellation handler will take care of marking the
-		 thread as canceled.  */
-	      pid_t pid = __getpid ();
-
-	      int val = INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid,
-					       SIGCANCEL);
-	      if (INTERNAL_SYSCALL_ERROR_P (val))
-		result = INTERNAL_SYSCALL_ERRNO (val);
-	    }
-
-	  break;
-	}
-
-	/* A single-threaded process should be able to kill itself, since
-	   there is nothing in the POSIX specification that says that it
-	   cannot.  So we set multiple_threads to true so that cancellation
-	   points get executed.  */
-	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
+      /* A single-threaded process should be able to kill itself, since there
+         is nothing in the POSIX specification that says that it cannot.  So
+         we set multiple_threads to true so that cancellation points get
+         executed.  */
+      THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	__libc_multiple_threads = 1;
+      __libc_multiple_threads = 1;
 #endif
+
+      THREAD_SETMEM (pd, result, PTHREAD_CANCELED);
+      if ((oldch & CANCELTYPE_BITMASK) != 0)
+        __do_cancel ();
+      return 0;
     }
-  /* Mark the thread as canceled.  This has to be done
-     atomically since other bits could be modified as well.  */
-  while (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling, newval,
-					       oldval));
 
-  return result;
+  pid_t pid = __getpid ();
+  int val = INTERNAL_SYSCALL_CALL (tgkill, pid, pd->tid, SIGCANCEL);
+  return INTERNAL_SYSCALL_ERROR_P (val)
+         ? INTERNAL_SYSCALL_ERRNO (val)
+         : 0;
 }
 versioned_symbol (libc, __pthread_cancel, pthread_cancel, GLIBC_2_34);
 
diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
index e87801b5a3..f842c91a08 100644
--- a/nptl/pthread_join_common.c
+++ b/nptl/pthread_join_common.c
@@ -57,7 +57,7 @@  __pthread_clockjoin_ex (pthread_t threadid, void **thread_return,
   if ((pd == self
        || (self->joinid == pd
 	   && (pd->cancelhandling
-	       & (CANCELING_BITMASK | CANCELED_BITMASK | EXITING_BITMASK
+	       & (CANCELED_BITMASK | EXITING_BITMASK
 		  | TERMINATED_BITMASK)) == 0))
       && !CANCEL_ENABLED_AND_CANCELED (self->cancelhandling))
     /* This is a deadlock situation.  The threads are waiting for each