nptl: Fix join/tryjoin comments.
Commit Message
Florian,
I noticed these comment mistakes while reviewing your patch for BZ #24211.
OK for master?
~~~
This commit fixes several inaccurate comments in the implementation of
__pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of
which are used to implement POSIX thread joining.
Firstly, in pthread_tryjoin_np() we only attempt the join if a read of
pd->tid == 0, because that means the thread has already been reaped by
the kernel and we can safely join it without blocking.
Secondly, all join implementations call the common
__pthread_timedjoin_ex and only if abstime is NULL (block) do we
actually need to use cancellation (to cancel the potentially infinite
wait).
Lastly, in lll_wait_tid we only call the canellation futex operations if
abstime is non-NULL.
With the comments corrected the implementation notes are consistent and
logical.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
---
ChangeLog | 7 +++++++
nptl/lll_timedwait_tid.c | 3 +--
nptl/pthread_join_common.c | 4 ++--
nptl/pthread_tryjoin.c | 2 +-
sysdeps/nptl/lowlevellock.h | 2 +-
5 files changed, 12 insertions(+), 6 deletions(-)
Comments
* Carlos O'Donell:
> This commit fixes several inaccurate comments in the implementation of
> __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of
> which are used to implement POSIX thread joining.
GNU style is not to use () after function names.
> Firstly, in pthread_tryjoin_np() we only attempt the join if a read of
> pd->tid == 0, because that means the thread has already been reaped by
> the kernel and we can safely join it without blocking.
>
> Secondly, all join implementations call the common
> __pthread_timedjoin_ex and only if abstime is NULL (block) do we
> actually need to use cancellation (to cancel the potentially infinite
> wait).
I think the change regarding abstime == NULL is wrong. This is about
pthread_join/pthread_timedjoin_np (blocking) on the one hand and
pthread_tryjoin_np on the other (not blocking).
Thanks,
Florian
On 2/11/19 4:37 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> This commit fixes several inaccurate comments in the implementation of
>> __pthread_timedjoin_ex(), pthread_tryjoin_np, and lll_wait_tid, all of
>> which are used to implement POSIX thread joining.
>
> GNU style is not to use () after function names.
Commit message fixed (by removing ()).
>> Firstly, in pthread_tryjoin_np() we only attempt the join if a read of
>> pd->tid == 0, because that means the thread has already been reaped by
>> the kernel and we can safely join it without blocking.
>>
>> Secondly, all join implementations call the common
>> __pthread_timedjoin_ex and only if abstime is NULL (block) do we
>> actually need to use cancellation (to cancel the potentially infinite
>> wait).
>
> I think the change regarding abstime == NULL is wrong. This is about
> pthread_join/pthread_timedjoin_np (blocking) on the one hand and
> pthread_tryjoin_np on the other (not blocking).
Why is the comment wrong? Or do you mean to say the comment could be
made more accurate?
The same core __pthread_timedjoin_ex is used for everything, so it handles
3 cases:
* join
* tryjoin
* timedjoin
In a join with abstime NULL we switch to asynchronous cancellation, and
this is done behind the scenes by lll_wait_tid calling lll_futex_wait_cancel
when abstime is NULL.
In a tryjoin we only call __pthread_timedjoin_ex when __tid is 0, and we
optimize away the lll_wait_tid by setting block to true. It doesn't change
the fact that if we had called lll_wait_tid, we would not have blocked
anyway, we would have stopped at the first atomic_load_acquire and never
waited.
Only in timedjoin, with abstime non-NULL, do we ever reach the call to
lll_futex_wait_cancel. It's the only time we switch to asychronous ancellation
because it's what we do around the *_cancel variants (until we have a better
method for doing cancellation).
Given this information:
(a) For join, block == true, and abstime == NULL, so we do use async cancel
for the futex wait. And the old comment was fine, but the new comment
is clearer. The old comment also was for when we *unconditionally*
called CANCEL_ASYNC() (which means "turn on async cancel"), but this
code was removed in 2018 by ce7eb0e9031, making this not something we
do unconditionally anymore.
(b) For tryjoin, block == false, and abstime == NULL, in theory the comment
in pthread_tryjoin should be deleted because block == false will mean
we *never* call lll_wait_tid...
29 /* If pd->tid != 0 then lll_wait_tid will not block on futex
30 operation. */
^^^^ With the block == false optimization we never call lll_wait_tid!
But the comment is wrong, probably just a typo in 2017 by
4735850f7a4.
31 return __pthread_timedjoin_ex (threadid, thread_return, NULL, false);
but I like the comment because it holds regardless of the optimization.
(c) For timedjoin, block == true, and abstime != NULL, and the comment is
wrong, there is no async cancel involved. We call lll_wait_tid and
the abstime != NULL case calls __lll_timedwait_tid which is not a
cancellation point. No cancellation for timedjoin.
So in (c) we show that it's not about block vs non-block, it's about
making only pthread_join a cancellation point.
Is this a bug? Should __lll_timedwait_tid for very long timeouts have used
cancellation? No. Since the timedjoin is a GNU addition it is ours to define
as a cancellation point or not, and we choose not to. While join is mandated
by POSIX to be a cancellation point and it is (the only one of the joins to
be so).
* Carlos O'Donell:
> Why is the comment wrong? Or do you mean to say the comment could be
> made more accurate?
Let's go back a step. My understanding is that pthread_join and
pthread_timedjoin_np are both cancellation points, but
pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is
not a cancellation point?
I assume that would be a bug.
Thanks,
Florian
On 12/02/2019 07:55, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> Why is the comment wrong? Or do you mean to say the comment could be
>> made more accurate?
>
> Let's go back a step. My understanding is that pthread_join and
> pthread_timedjoin_np are both cancellation points, but
> pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is
> not a cancellation point?
>
> I assume that would be a bug.
This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092,
I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is
a cancellation point. The lll_wait_tid is only used on
__pthread_timedjoin_ex and replicated on arch lowlevellock.h
implementation, I think it would be simpler to just move it to
nptl/pthread_join_common.c.
On 12/02/2019 09:39, Adhemerval Zanella wrote:
>
>
> On 12/02/2019 07:55, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> Why is the comment wrong? Or do you mean to say the comment could be
>>> made more accurate?
>>
>> Let's go back a step. My understanding is that pthread_join and
>> pthread_timedjoin_np are both cancellation points, but
>> pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is
>> not a cancellation point?
>>
>> I assume that would be a bug.
>
> This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092,
> I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is
> a cancellation point. The lll_wait_tid is only used on
> __pthread_timedjoin_ex and replicated on arch lowlevellock.h
> implementation, I think it would be simpler to just move it to
> nptl/pthread_join_common.c.
>
>
I am working on fixing it.
* Adhemerval Zanella:
> On 12/02/2019 09:39, Adhemerval Zanella wrote:
>>
>>
>> On 12/02/2019 07:55, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>
>>>> Why is the comment wrong? Or do you mean to say the comment could be
>>>> made more accurate?
>>>
>>> Let's go back a step. My understanding is that pthread_join and
>>> pthread_timedjoin_np are both cancellation points, but
>>> pthread_tryjoin_np is not. Are you saying that pthread_timedjoin_np is
>>> not a cancellation point?
>>>
>>> I assume that would be a bug.
>>
>> This is my mistake done by ce7eb0e90315eb1939ac29f656d39b3db858c092,
>> I refactored lll_wait_tid and wrongly assumed __lll_timedwait_tid is
>> a cancellation point. The lll_wait_tid is only used on
>> __pthread_timedjoin_ex and replicated on arch lowlevellock.h
>> implementation, I think it would be simpler to just move it to
>> nptl/pthread_join_common.c.
>>
>>
>
> I am working on fixing it.
Thanks.
Meanwhile, I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=24215>
with an out-of-tree test case.
Florian
@@ -1,3 +1,10 @@
+2019-02-11 Carlos O'Donell <carlos@redhat.com>
+
+ * nptl/lll_timedwait_tid.c: Use GNU-style comment.
+ * nptl/pthread_join_common.c: Fix comment.
+ * nptl/pthread_tryjoin.c: Likewise.
+ * sysdeps/nptl/lowlevellock.h: Likewise.
+
2019-02-11 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrtf):
@@ -60,8 +60,7 @@ __lll_timedwait_tid (int *tidp, const struct timespec *abstime)
/* If *tidp == tid, wait until thread terminates or the wait times out.
The kernel up to version 3.16.3 does not use the private futex
- operations for futex wake-up when the clone terminates.
- */
+ operations for futex wake-up when the clone terminates. */
if (lll_futex_timed_wait (tidp, tid, &rt, LLL_SHARED) == -ETIMEDOUT)
return ETIMEDOUT;
}
@@ -76,8 +76,8 @@ __pthread_timedjoin_ex (pthread_t threadid, void **thread_return,
if (block)
{
- /* During the wait we change to asynchronous cancellation. If we
- are cancelled the thread we are waiting for must be marked as
+ /* If abstime is NULL we switch to asynchronous cancellation. If we
+ are cancelled then the thread we are waiting for must be marked as
un-wait-ed for again. */
pthread_cleanup_push (cleanup, &pd->joinid);
@@ -26,7 +26,7 @@ pthread_tryjoin_np (pthread_t threadid, void **thread_return)
if (pd->tid != 0)
return EBUSY;
- /* If pd->tid != 0 then lll_wait_tid will not block on futex
+ /* If pd->tid == 0 then lll_wait_tid will not block on futex
operation. */
return __pthread_timedjoin_ex (threadid, thread_return, NULL, false);
}
@@ -185,7 +185,7 @@ extern int __lll_timedwait_tid (int *, const struct timespec *)
operations for futex wake-up when the clone terminates.
If ABSTIME is not NULL, is used a timeout for futex call. If the timeout
occurs then return ETIMEOUT, if ABSTIME is invalid, return EINVAL.
- The futex operation are issues with cancellable versions. */
+ If abstime is NULL we use a cancellable futex wait operation. */
#define lll_wait_tid(tid, abstime) \
({ \
int __res = 0; \