misc/tst-clone3: Fix waiting for exited thread.
Commit Message
Attached the updated patch.
See also notes below.
On 02/08/2019 07:37 PM, Adhemerval Zanella wrote:
>
>
> On 08/02/2019 13:12, Stefan Liebler wrote:
>> Hi,
>>
>> from time to time the test misc/tst-clone3 fails with an timeout. Then futex_wait is blocking. Usually ctid should be set to zero due to CLONE_CHILD_CLEARTID and the futex should be waken up. But the fail occures if the thread has already exited before ctid is set to the return value of clone(). Then futex_wait() will block as there will be nobody who wakes the futex up again.
>>
>> This patch initializes ctid to a known value before calling clone and the kernel is the only one who updates the value to zero after clone. If futex_wait is called then it is either waked up due to the exited thread or the futex syscall fails as *ctid_ptr is already zero instead of the specified value 1.
>>
>> Okay to commit?
>>
>> Bye
>> Stefan
>>
>> ChangeLog:
>>
>> * sysdeps/unix/sysv/linux/tst-clone3.c (do_test):
>> Initialize ctid with a known value and remove update of ctid
>> after clone.
>> (wait_tid): Adjust arguments and call futex_wait with ctid_val
>> as assumed current value of ctid_ptr.
>>
>> 20190208_misc_tst-clone3.patch
>>
>> commit 5a8f80973dbbf06a0eebc2064d2a14521c6a6131
>> Author: Stefan Liebler <stli@linux.ibm.com>
>> Date: Fri Feb 8 12:42:52 2019 +0100
>>
>> misc/tst-clone3: Fix waiting for exited thread.
>>
>> From time to time the test misc/tst-clone3 fails with an timeout.
>> Then futex_wait is blocking. Usually ctid should be set to zero
>> due to CLONE_CHILD_CLEARTID and the futex should be waken up.
>> But the fail occures if the thread has already exited before
>> ctid is set to the return value of clone(). Then futex_wait() will
>> block as there will be nobody who wakes the futex up again.
>>
>> This patch initializes ctid to a known value before calling clone
>> and the kernel is the only one who updates the value to zero after clone.
>> If futex_wait is called then it is either waked up due to the exited thread
>> or the futex syscall fails as *ctid_ptr is already zero instead of the
>> specified value 1.
>>
>> ChangeLog:
>>
>> * sysdeps/unix/sysv/linux/tst-clone3.c (do_test):
>> Initialize ctid with a known value and remove update of ctid
>> after clone.
>> (wait_tid): Adjust arguments and call futex_wait with ctid_val
>> as assumed current value of ctid_ptr.
>
> Thanks for catching it.
>
>>
>> diff --git a/sysdeps/unix/sysv/linux/tst-clone3.c b/sysdeps/unix/sysv/linux/tst-clone3.c
>> index aa8e718afe..ffa2056eb6 100644
>> --- a/sysdeps/unix/sysv/linux/tst-clone3.c
>> +++ b/sysdeps/unix/sysv/linux/tst-clone3.c
>> @@ -42,11 +42,11 @@ f (void *a)
>>
>> /* Futex wait for TID argument, similar to pthread_join internal
>> implementation. */
>> -#define wait_tid(tid) \
>> +#define wait_tid(ctid_ptr, ctid_val) \
>> do { \
>> - __typeof (tid) __tid; \
>> - while ((__tid = (tid)) != 0) \
>> - futex_wait (&(tid), __tid); \
>> + __typeof (*(ctid_ptr)) __tid; \
>> + while ((__tid = *(ctid_ptr)) != 0) \
>> + futex_wait (ctid_ptr, ctid_val); \
>> } while (0)
>
> lll_wait_tid uses an atomic load with acquire semantic, I think we should
> use it as well. We can either include the atomic header or use c11 atomic.
>
I've first tried to include atomic.h, but it failed building on x86_64.
Thus I'm using the c11 atomic load in the updated patch.
Okay to commit?
As information, I've observed those gcc errors on x86_64:
In file included from ../sysdeps/unix/sysv/linux/x86_64/sysdep.h:30,
from ../sysdeps/x86_64/nptl/tls.h:28,
from ../sysdeps/x86/atomic-machine.h:23,
from ../include/atomic.h:50,
from ../sysdeps/unix/sysv/linux/tst-clone3.c:29:
../sysdeps/unix/sysv/linux/dl-sysdep.h: In function
‘_dl_discover_osversion’:
../sysdeps/unix/sysv/linux/dl-sysdep.h:31:42: error: expected
declaration specifiers before ‘attribute_hidden’
extern int _dl_discover_osversion (void) attribute_hidden;
^~~~~~~~~~~~~~~~
In file included from ../sysdeps/x86_64/nptl/tls.h:31,
from ../sysdeps/x86/atomic-machine.h:23,
from ../include/atomic.h:50,
from ../sysdeps/unix/sysv/linux/tst-clone3.c:29:
../sysdeps/generic/dl-dtv.h:22:1: error: empty declaration [-Werror]
struct dtv_pointer
^~~~~~
../sysdeps/generic/dl-dtv.h:33:3: error: storage class specified for
parameter ‘dtv_t’
} dtv_t;
^~~~~
...
many many more errors
...
I've also tried to add tst-clone3 to tests_internal instead of tests in
the Makefile, but then I've got:
In file included from ../sysdeps/x86_64/nptl/tls.h:130,
from ../sysdeps/x86/atomic-machine.h:23,
from ../include/atomic.h:50,
from ../sysdeps/unix/sysv/linux/tst-clone3.c:29:
../nptl/descr.h:104:8: error: redefinition of ‘struct robust_list_head’
struct robust_list_head
^~~~~~~~~~~~~~~~
In file included from ../sysdeps/unix/sysv/linux/tst-clone3.c:26:
/usr/include/linux/futex.h:70:8: note: originally defined here
struct robust_list_head {
^~~~~~~~~~~~~~~~
>>
>> static inline int
>> @@ -64,7 +64,11 @@ do_test (void)
>> clone_flags |= CLONE_VM | CLONE_SIGHAND;
>> /* We will used ctid to call on futex to wait for thread exit. */
>> clone_flags |= CLONE_CHILD_CLEARTID;
>> - pid_t ctid, tid;
>> + /* Initialize with a known value. ctid is set to zero by the kernel after the
>> + cloned thread has exited. */
>> +#define CTID_INIT_VAL 1
>> + pid_t ctid = CTID_INIT_VAL;
>> + pid_t tid;
>>
>> #ifdef __ia64__
>> extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base,
>> @@ -86,8 +90,7 @@ do_test (void)
>> if (tid == -1)
>> FAIL_EXIT1 ("clone failed: %m");
>>
>> - ctid = tid;
>> - wait_tid (ctid);
>> + wait_tid (&ctid, CTID_INIT_VAL);
>>
>> return 2;
>> }
>>
>
Comments
* Stefan Liebler:
> I've first tried to include atomic.h, but it failed building on
> x86_64. Thus I'm using the c11 atomic load in the updated patch.
> Okay to commit?
>
>
> As information, I've observed those gcc errors on x86_64:
> In file included from ../sysdeps/unix/sysv/linux/x86_64/sysdep.h:30,
> from ../sysdeps/x86_64/nptl/tls.h:28,
> from ../sysdeps/x86/atomic-machine.h:23,
> from ../include/atomic.h:50,
> from ../sysdeps/unix/sysv/linux/tst-clone3.c:29:
> ../sysdeps/unix/sysv/linux/dl-sysdep.h: In function
> ‘_dl_discover_osversion’:
> ../sysdeps/unix/sysv/linux/dl-sysdep.h:31:42: error: expected
> declaration specifiers before ‘attribute_hidden’
> extern int _dl_discover_osversion (void) attribute_hidden;
> ^~~~~~~~~~~~~~~~
That's because the test isn't in tests-internal.
> + while ((__tid = __atomic_load_n (ctid_ptr, __ATOMIC_ACQUIRE)) != 0) \
Actually, that's not a C11 atomic construct, but I think it's okay to
use that here. (The C11 stuff lives in <stdatomic.h> and should be
functionally equivalent.)
Sorry, this is a pet peeve of mine. We have three different atomic
access facilities that people refer to as C11 atomics: Our own
<atomic.h>, the GCC __atomic builtins, and <stdatomic.h>.
I still think this contributes to cognitive load, and we should
eliminate all but one (leaving us with new-style atomics and the old
macros in <atomic.h>). The GCC __atomic builtins have the best freely
available documentation, so they are a natural candidate IMHO.
Thanks,
Florian
On 11/02/2019 08:59, Florian Weimer wrote:
> * Stefan Liebler:
>
>> I've first tried to include atomic.h, but it failed building on
>> x86_64. Thus I'm using the c11 atomic load in the updated patch.
>> Okay to commit?
>>
>>
>> As information, I've observed those gcc errors on x86_64:
>> In file included from ../sysdeps/unix/sysv/linux/x86_64/sysdep.h:30,
>> from ../sysdeps/x86_64/nptl/tls.h:28,
>> from ../sysdeps/x86/atomic-machine.h:23,
>> from ../include/atomic.h:50,
>> from ../sysdeps/unix/sysv/linux/tst-clone3.c:29:
>> ../sysdeps/unix/sysv/linux/dl-sysdep.h: In function
>> ‘_dl_discover_osversion’:
>> ../sysdeps/unix/sysv/linux/dl-sysdep.h:31:42: error: expected
>> declaration specifiers before ‘attribute_hidden’
>> extern int _dl_discover_osversion (void) attribute_hidden;
>> ^~~~~~~~~~~~~~~~
>
> That's because the test isn't in tests-internal.
>
>> + while ((__tid = __atomic_load_n (ctid_ptr, __ATOMIC_ACQUIRE)) != 0) \
>
> Actually, that's not a C11 atomic construct, but I think it's okay to
> use that here. (The C11 stuff lives in <stdatomic.h> and should be
> functionally equivalent.)
>
> Sorry, this is a pet peeve of mine. We have three different atomic
> access facilities that people refer to as C11 atomics: Our own
> <atomic.h>, the GCC __atomic builtins, and <stdatomic.h>.
>
> I still think this contributes to cognitive load, and we should
> eliminate all but one (leaving us with new-style atomics and the old
> macros in <atomic.h>). The GCC __atomic builtins have the best freely
> available documentation, so they are a natural candidate IMHO.
It really annoying that C11 standard is not freely available from ISO,
although the working draft is still open [1]. I don't have a strong
opinion, but since there is support in the language itself and they
are fully supported by the compiler I also don't see why not prefer it.
[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
commit 3672d05bd3c5c51831ef8b91ee2b0ff491fd2986
Author: Stefan Liebler <stli@linux.ibm.com>
Date: Fri Feb 8 12:42:52 2019 +0100
misc/tst-clone3: Fix waiting for exited thread.
From time to time the test misc/tst-clone3 fails with a timeout.
Then futex_wait is blocking. Usually ctid should be set to zero
due to CLONE_CHILD_CLEARTID and the futex should be waken up.
But the fail occures if the thread has already exited before
ctid is set to the return value of clone(). Then futex_wait() will
block as there will be nobody who wakes the futex up again.
This patch initializes ctid to a known value before calling clone
and the kernel is the only one who updates the value to zero after clone.
If futex_wait is called then it is either waked up due to the exited thread
or the futex syscall fails as *ctid_ptr is already zero instead of the
specified value 1.
ChangeLog:
* sysdeps/unix/sysv/linux/tst-clone3.c (do_test):
Initialize ctid with a known value and remove update of ctid
after clone.
(wait_tid): Adjust arguments and call futex_wait with ctid_val
as assumed current value of ctid_ptr.
@@ -42,11 +42,13 @@ f (void *a)
/* Futex wait for TID argument, similar to pthread_join internal
implementation. */
-#define wait_tid(tid) \
- do { \
- __typeof (tid) __tid; \
- while ((__tid = (tid)) != 0) \
- futex_wait (&(tid), __tid); \
+#define wait_tid(ctid_ptr, ctid_val) \
+ do { \
+ __typeof (*(ctid_ptr)) __tid; \
+ /* We need acquire MO here so that we synchronize with the \
+ kernel's store to 0 when the clone terminates. */ \
+ while ((__tid = __atomic_load_n (ctid_ptr, __ATOMIC_ACQUIRE)) != 0) \
+ futex_wait (ctid_ptr, ctid_val); \
} while (0)
static inline int
@@ -64,7 +66,11 @@ do_test (void)
clone_flags |= CLONE_VM | CLONE_SIGHAND;
/* We will used ctid to call on futex to wait for thread exit. */
clone_flags |= CLONE_CHILD_CLEARTID;
- pid_t ctid, tid;
+ /* Initialize with a known value. ctid is set to zero by the kernel after the
+ cloned thread has exited. */
+#define CTID_INIT_VAL 1
+ pid_t ctid = CTID_INIT_VAL;
+ pid_t tid;
#ifdef __ia64__
extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base,
@@ -86,8 +92,7 @@ do_test (void)
if (tid == -1)
FAIL_EXIT1 ("clone failed: %m");
- ctid = tid;
- wait_tid (ctid);
+ wait_tid (&ctid, CTID_INIT_VAL);
return 2;
}