[v2,9/9] nptl: Avoid async cancellation to wrongly update __nptl_nthreads (BZ #19366)

Message ID 20210527172823.3461314-10-adhemerval.zanella@linaro.org
State Committed
Commit 8fe503f74e0a2ab41eec9bbae1e0ea8f5203716b
Delegated to: Florian Weimer
Headers
Series nptl: pthread cancellation refactor |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Netto May 27, 2021, 5:28 p.m. UTC
  The testcase provided on BZ#19366 may update __nptl_nthreads in a wrong
order, triggering an early process exit because the thread decrement
the value twice.

The issue is once the thread exits without acting on cancellation,
it decreaments '__nptl_nthreads' and then atomically set
 'cancelhandling' with EXITING_BIT (thus preventing further cancellation
handler to act).  The issue happens if a SIGCANCEL is received between
checking '__ntpl_nthreads' and setting EXITING_BIT.  To avoid it, the
'__nptl_nthreads' decrement is moved after EXITING_BIT.

It does not fully follow the POSIX XSH 2.9.5 Thread Cancellation under
the heading Thread Cancellation Cleanup Handlers that states that
when a cancellation request is acted upon, or when a thread calls
pthread_exit(), the thread first disables cancellation by setting its
cancelability state to PTHREAD_CANCEL_DISABLE and its cancelability type
to PTHREAD_CANCEL_DEFERRED.  The issue is '__pthread_enable_asynccancel'
explicit enabled assynchrnous cancellation, so an interrupted syscall
within the cancellation cleanup handlers might see an invalid cancelling
type (a possible fix might be possible with my proposed solution to
BZ#12683).

Trying to come up with a test is quite hard since it requires to
mimic the timing issue described below, however I see that the
buz report reproducer does not early exit anymore.

Checked on x86_64-linux-gnu.
---
 nptl/pthread_create.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)
  

Comments

Florian Weimer June 1, 2021, 2:29 p.m. UTC | #1
* Adhemerval Zanella via Libc-alpha:

> Trying to come up with a test is quite hard since it requires to
> mimic the timing issue described below, however I see that the
> buz report reproducer does not early exit anymore.

Typo: buz

Patch looks okay to me, thanks.

Florian
  
Adhemerval Zanella Netto June 2, 2021, 1:15 p.m. UTC | #2
On 01/06/2021 11:29, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> Trying to come up with a test is quite hard since it requires to
>> mimic the timing issue described below, however I see that the
>> buz report reproducer does not early exit anymore.
> 
> Typo: buz

Ack.

> 
> Patch looks okay to me, thanks.
> 
> Florian
>
  

Patch

diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 15ce5ad4a1..79af3412cf 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -442,13 +442,6 @@  start_thread (void *arg)
   /* Clean up any state libc stored in thread-local variables.  */
   __libc_thread_freeres ();
 
-  /* If this is the last thread we terminate the process now.  We
-     do not notify the debugger, it might just irritate it if there
-     is no thread left.  */
-  if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
-    /* This was the last thread.  */
-    exit (0);
-
   /* Report the death of the thread if this is wanted.  */
   if (__glibc_unlikely (pd->report_events))
     {
@@ -483,6 +476,10 @@  start_thread (void *arg)
      the breakpoint reports TD_THR_RUN state rather than TD_THR_ZOMBIE.  */
   atomic_bit_set (&pd->cancelhandling, EXITING_BIT);
 
+  if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
+    /* This was the last thread.  */
+    exit (0);
+
 #ifndef __ASSUME_SET_ROBUST_LIST
   /* If this thread has any robust mutexes locked, handle them now.  */
 # if __PTHREAD_MUTEX_HAVE_PREV