[v2,09/19] nptl: Fix race between pthread_kill and thread exit (bug 12889)
Checks
Context |
Check |
Description |
dj/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
Commit Message
From: Florian Weimer <fweimer@redhat.com>
Now that the 'tid' pthread member is not used to advertise the thread
lifetime through set_tid_address syscall, we can use a simple lock to
guarantee it is valid on pthread_kill().
This patch adds a new lock, tidlock, which is used to synchronize the
pthread 'tid' field access. Also, there is no need to access the
'tid' member using atomic operations.
Checked on x86_64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
---
nptl/allocatestack.c | 1 +
nptl/descr.h | 4 +
nptl/pthread_create.c | 7 ++
nptl/pthread_kill.c | 14 ++-
sysdeps/pthread/Makefile | 2 +
sysdeps/pthread/tst-kill4.c | 90 --------------
.../pthread/tst-pthread_cancel-select-loop.c | 87 ++++++++++++++
sysdeps/pthread/tst-pthread_kill-exiting.c | 110 ++++++++++++++++++
8 files changed, 221 insertions(+), 94 deletions(-)
delete mode 100644 sysdeps/pthread/tst-kill4.c
create mode 100644 sysdeps/pthread/tst-pthread_cancel-select-loop.c
create mode 100644 sysdeps/pthread/tst-pthread_kill-exiting.c
Comments
* Adhemerval Zanella:
> diff --git a/nptl/pthread_kill.c b/nptl/pthread_kill.c
> index 5d4c86f920..63fe8c44ca 100644
> --- a/nptl/pthread_kill.c
> +++ b/nptl/pthread_kill.c
> @@ -26,15 +26,18 @@ __pthread_kill_internal (pthread_t threadid, int signo)
> pid_t tid;
> struct pthread *pd = (struct pthread *) threadid;
>
> + /* Block all signal, since the lock is recursive and used on pthread_cancel
> + (which should be async-signal-safe). */
> + sigset_t oldmask;
> + __libc_signal_block_all (&oldmask);
> + lll_lock (pd->tidlock, LLL_PRIVATE);
> +
> if (pd == THREAD_SELF)
> /* It is a special case to handle raise() implementation after a vfork
> call (which does not update the PD tid field). */
> tid = INLINE_SYSCALL_CALL (gettid);
> else
> - /* Force load of pd->tid into local variable or register. Otherwise
> - if a thread exits between ESRCH test and tgkill, we might return
> - EINVAL, because pd->tid would be cleared by the kernel. */
> - tid = atomic_forced_read (pd->tid);
> + tid = pd->tid;
>
> int val;
> if (__glibc_likely (tid > 0))
> @@ -53,6 +56,9 @@ __pthread_kill_internal (pthread_t threadid, int signo)
> as an error. */
> val = 0;
>
> + lll_unlock (pd->tidlock, LLL_PRIVATE);
> + __libc_signal_restore_set (&oldmask);
> +
> return val;
> }
This needs a comment explaining that *all* pending signals are delivered
at the point of the __libc_signal_restore_set call. I hope that this is
actually what happens in Linux; POSIX only guarantees that for one
pending signal that is unblocked.
The problem here is that pthread_kill (pthread_self (), …) must generate
synchronous signal, and due to the signal-blocking, it is not
immediately obvious if that's the case.
If we aren't sure that all signals are flushed, we need to check for the
send-signal-to-self case and avoid blocking the signal there. We don't
need the tidlock for that because the running thread won't go away.
Thanks,
Florian
On 26/08/2021 11:23, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> diff --git a/nptl/pthread_kill.c b/nptl/pthread_kill.c
>> index 5d4c86f920..63fe8c44ca 100644
>> --- a/nptl/pthread_kill.c
>> +++ b/nptl/pthread_kill.c
>> @@ -26,15 +26,18 @@ __pthread_kill_internal (pthread_t threadid, int signo)
>> pid_t tid;
>> struct pthread *pd = (struct pthread *) threadid;
>>
>> + /* Block all signal, since the lock is recursive and used on pthread_cancel
>> + (which should be async-signal-safe). */
>> + sigset_t oldmask;
>> + __libc_signal_block_all (&oldmask);
>> + lll_lock (pd->tidlock, LLL_PRIVATE);
>> +
>> if (pd == THREAD_SELF)
>> /* It is a special case to handle raise() implementation after a vfork
>> call (which does not update the PD tid field). */
>> tid = INLINE_SYSCALL_CALL (gettid);
>> else
>> - /* Force load of pd->tid into local variable or register. Otherwise
>> - if a thread exits between ESRCH test and tgkill, we might return
>> - EINVAL, because pd->tid would be cleared by the kernel. */
>> - tid = atomic_forced_read (pd->tid);
>> + tid = pd->tid;
>>
>> int val;
>> if (__glibc_likely (tid > 0))
>> @@ -53,6 +56,9 @@ __pthread_kill_internal (pthread_t threadid, int signo)
>> as an error. */
>> val = 0;
>>
>> + lll_unlock (pd->tidlock, LLL_PRIVATE);
>> + __libc_signal_restore_set (&oldmask);
>> +
>> return val;
>> }
>
> This needs a comment explaining that *all* pending signals are delivered
> at the point of the __libc_signal_restore_set call. I hope that this is
> actually what happens in Linux; POSIX only guarantees that for one
> pending signal that is unblocked.
My understanding from kernel/signal.c all pending signals are delivered
with the signal mask restore, but only real-time one are queued (and
subjected to RLIMIT_SIGPENDING, which causes another issue [1]).
>
> The problem here is that pthread_kill (pthread_self (), …) must generate
> synchronous signal, and due to the signal-blocking, it is not
> immediately obvious if that's the case.
>
> If we aren't sure that all signals are flushed, we need to check for the
> send-signal-to-self case and avoid blocking the signal there. We don't
> need the tidlock for that because the running thread won't go away.
>
> Thanks,
> Florian
>
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=21108
* Adhemerval Zanella:
>> This needs a comment explaining that *all* pending signals are delivered
>> at the point of the __libc_signal_restore_set call. I hope that this is
>> actually what happens in Linux; POSIX only guarantees that for one
>> pending signal that is unblocked.
>
> My understanding from kernel/signal.c all pending signals are delivered
> with the signal mask restore, but only real-time one are queued (and
> subjected to RLIMIT_SIGPENDING, which causes another issue [1]).
If real-time signals are special, I think we need to change the way we
handle the self-signal case and avoid the lock there.
Thanks,
Florian
@@ -124,6 +124,7 @@ get_cached_stack (size_t *sizep, void **memp)
result->canceltype = PTHREAD_CANCEL_DEFERRED;
result->cleanup = NULL;
result->setup_failed = 0;
+ result->tidlock = LLL_LOCK_INITIALIZER;
/* No pending event. */
result->nextevent = NULL;
@@ -181,6 +181,10 @@ struct pthread
/* Thread ID set by the kernel with CLONE_PARENT_SETTID. */
pid_t tid;
+ /* Lock to synchronize access to TID value (used on pthread_kill and thread
+ exit). */
+ int tidlock;
+
/* Ununsed. */
pid_t pid_ununsed;
@@ -531,6 +531,12 @@ start_thread (void *arg)
/* This was the last thread. */
exit (0);
+ /* This prevents sending a signal from this thread to itself during its
+ final stages. This must come after the exit call above because
+ atexit handlers must not run with signals blocked. */
+ __libc_signal_block_all (NULL);
+ lll_lock (pd->tidlock, LLL_PRIVATE);
+
if (!pd->user_stack)
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);
@@ -555,6 +561,7 @@ start_thread (void *arg)
__nptl_free_tcb (pd);
pd->tid = 0;
+ lll_unlock (pd->tidlock, LLL_PRIVATE);
out:
/* We cannot call '_exit' here. '_exit' will terminate the process.
@@ -26,15 +26,18 @@ __pthread_kill_internal (pthread_t threadid, int signo)
pid_t tid;
struct pthread *pd = (struct pthread *) threadid;
+ /* Block all signal, since the lock is recursive and used on pthread_cancel
+ (which should be async-signal-safe). */
+ sigset_t oldmask;
+ __libc_signal_block_all (&oldmask);
+ lll_lock (pd->tidlock, LLL_PRIVATE);
+
if (pd == THREAD_SELF)
/* It is a special case to handle raise() implementation after a vfork
call (which does not update the PD tid field). */
tid = INLINE_SYSCALL_CALL (gettid);
else
- /* Force load of pd->tid into local variable or register. Otherwise
- if a thread exits between ESRCH test and tgkill, we might return
- EINVAL, because pd->tid would be cleared by the kernel. */
- tid = atomic_forced_read (pd->tid);
+ tid = pd->tid;
int val;
if (__glibc_likely (tid > 0))
@@ -53,6 +56,9 @@ __pthread_kill_internal (pthread_t threadid, int signo)
as an error. */
val = 0;
+ lll_unlock (pd->tidlock, LLL_PRIVATE);
+ __libc_signal_restore_set (&oldmask);
+
return val;
}
@@ -119,7 +119,9 @@ tests += tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
tst-unwind-thread \
tst-pt-vfork1 tst-pt-vfork2 tst-vfork1x tst-vfork2x \
tst-pthread_cancel-exited \
+ tst-pthread_cancel-select-loop \
tst-pthread_kill-exited \
+ tst-pthread_kill-exiting \
# tests
tests-time64 := \
deleted file mode 100644
@@ -1,90 +0,0 @@
-/* Copyright (C) 2003-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
- Contributed by Ulrich Drepper <drepper@redhat.com>, 2003.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#include <errno.h>
-#include <pthread.h>
-#include <signal.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-
-
-static void *
-tf (void *a)
-{
- return NULL;
-}
-
-
-int
-do_test (void)
-{
- pthread_attr_t at;
- if (pthread_attr_init (&at) != 0)
- {
- puts ("attr_create failed");
- exit (1);
- }
-
- /* Limit thread stack size, because if it is too large, pthread_join
- will free it immediately rather than put it into stack cache. */
- if (pthread_attr_setstacksize (&at, 2 * 1024 * 1024) != 0)
- {
- puts ("setstacksize failed");
- exit (1);
- }
-
- pthread_t th;
- if (pthread_create (&th, &at, tf, NULL) != 0)
- {
- puts ("create failed");
- exit (1);
- }
-
- pthread_attr_destroy (&at);
-
- if (pthread_join (th, NULL) != 0)
- {
- puts ("join failed");
- exit (1);
- }
-
- /* The following only works because we assume here something about
- the implementation. Namely, that the memory allocated for the
- thread descriptor is not going away, that the TID field is
- cleared and therefore the signal is sent to process 0, and that
- we can savely assume there is no other process with this ID at
- that time. */
- int e = pthread_kill (th, 0);
- if (e == 0)
- {
- puts ("pthread_kill succeeded");
- exit (1);
- }
- if (e != ESRCH)
- {
- puts ("pthread_kill didn't return ESRCH");
- exit (1);
- }
-
- return 0;
-}
-
-
-#define TEST_FUNCTION do_test ()
-#include "../test-skeleton.c"
new file mode 100644
@@ -0,0 +1,87 @@
+/* Test that pthread_cancel succeeds during thread exit.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* This test tries to trigger an internal race condition in
+ pthread_cancel, where the cancellation signal is sent after the
+ thread has begun the cancellation process. This can result in a
+ spurious ESRCH error. For the original bug 12889, the window is
+ quite small, so the bug was not reproduced in every run. */
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <support/check.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <sys/select.h>
+#include <unistd.h>
+
+/* Set to true by timeout_thread_function when the test should
+ terminate. */
+static bool timeout;
+
+static void *
+timeout_thread_function (void *unused)
+{
+ usleep (5 * 1000 * 1000);
+ __atomic_store_n (&timeout, true, __ATOMIC_RELAXED);
+ return NULL;
+}
+
+/* Used for blocking the select function below. */
+static int pipe_fds[2];
+
+static void *
+canceled_thread_function (void *unused)
+{
+ while (true)
+ {
+ fd_set rfs;
+ fd_set wfs;
+ fd_set efs;
+ FD_ZERO (&rfs);
+ FD_ZERO (&wfs);
+ FD_ZERO (&efs);
+ FD_SET (pipe_fds[0], &rfs);
+
+ /* If the cancellation request is recognized early, the thread
+ begins exiting while the cancellation signal arrives. */
+ select (FD_SETSIZE, &rfs, &wfs, &efs, NULL);
+ }
+ return NULL;
+}
+
+static int
+do_test (void)
+{
+ xpipe (pipe_fds);
+ pthread_t thr_timeout = xpthread_create (NULL, timeout_thread_function, NULL);
+
+ while (!__atomic_load_n (&timeout, __ATOMIC_RELAXED))
+ {
+ pthread_t thr = xpthread_create (NULL, canceled_thread_function, NULL);
+ xpthread_cancel (thr);
+ TEST_VERIFY (xpthread_join (thr) == PTHREAD_CANCELED);
+ }
+
+ xpthread_join (thr_timeout);
+ xclose (pipe_fds[0]);
+ xclose (pipe_fds[1]);
+ return 0;
+}
+
+#include <support/test-driver.c>
new file mode 100644
@@ -0,0 +1,110 @@
+/* Test that pthread_kill succeeds during thread exit.
+ Copyright (C) 2021 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* This test verifies that pthread_kill for a thread that is exiting
+ succeeds (with or without actually delivering the signal). */
+
+#include <array_length.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <support/xsignal.h>
+#include <support/xthread.h>
+#include <unistd.h>
+
+/* Set to true by timeout_thread_function when the test should
+ terminate. */
+static bool timeout;
+
+static void *
+timeout_thread_function (void *unused)
+{
+ usleep (1000 * 1000);
+ __atomic_store_n (&timeout, true, __ATOMIC_RELAXED);
+ return NULL;
+}
+
+/* Used to synchronize the sending threads with the target thread and
+ main thread. */
+static pthread_barrier_t barrier_1;
+static pthread_barrier_t barrier_2;
+
+/* The target thread to which signals are to be sent. */
+static pthread_t target_thread;
+
+static void *
+sender_thread_function (void *unused)
+{
+ while (!__atomic_load_n (&timeout, __ATOMIC_RELAXED))
+ {
+ /* Wait until target_thread has been initialized. The target
+ thread and main thread participate in this barrier. */
+ xpthread_barrier_wait (&barrier_1);
+
+ xpthread_kill (target_thread, SIGUSR1);
+
+ /* Communicate that the signal has been sent. The main thread
+ participates in this barrier. */
+ xpthread_barrier_wait (&barrier_2);
+ }
+ return NULL;
+}
+
+static void *
+target_thread_function (void *unused)
+{
+ target_thread = pthread_self ();
+ xpthread_barrier_wait (&barrier_1);
+ return NULL;
+}
+
+static int
+do_test (void)
+{
+ xsignal (SIGUSR1, SIG_IGN);
+
+ pthread_t thr_timeout = xpthread_create (NULL, timeout_thread_function, NULL);
+
+ pthread_t threads[4];
+ xpthread_barrier_init (&barrier_1, NULL, array_length (threads) + 2);
+ xpthread_barrier_init (&barrier_2, NULL, array_length (threads) + 1);
+
+ for (int i = 0; i < array_length (threads); ++i)
+ threads[i] = xpthread_create (NULL, sender_thread_function, NULL);
+
+ while (!__atomic_load_n (&timeout, __ATOMIC_RELAXED))
+ {
+ xpthread_create (NULL, target_thread_function, NULL);
+
+ /* Wait for the target thread to be set up and signal sending to
+ start. */
+ xpthread_barrier_wait (&barrier_1);
+
+ /* Wait for signal sending to complete. */
+ xpthread_barrier_wait (&barrier_2);
+
+ xpthread_join (target_thread);
+ }
+
+ for (int i = 0; i < array_length (threads); ++i)
+ xpthread_join (threads[i]);
+ xpthread_join (thr_timeout);
+
+ return 0;
+}
+
+#include <support/test-driver.c>