[v2,2/4] manual: Add documentation for pthread_tryjoin_np and pthread_timedjoin_np

Message ID f6aea6d8eb36a982f7e42e7b571e6d7dd6919a8a.1568809830.git-series.mac@mcrowe.com
State Committed
Headers

Commit Message

Mike Crowe Sept. 18, 2019, 12:30 p.m. UTC
  This documentation isn't perfect, but it's better than nothing and can
hopefully act as a basis for future improvement. In particular, although
I'm certain that the functions are MT-Safe, I'm not sure about the rest of
the @safety line.

Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
the timeout parameter is passed as NULL. Since it's now too late to change
that, I've documented it.

	* manual/threads.texi: Add brief documentation for
	pthread_tryjoin_np and pthread_timedjoin_np.
---
 ChangeLog           |  5 +++++
 manual/threads.texi | 26 ++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)
  

Comments

Adhemerval Zanella Sept. 26, 2019, 9:26 p.m. UTC | #1
On 18/09/2019 05:30, Mike Crowe wrote:
> This documentation isn't perfect, but it's better than nothing and can
> hopefully act as a basis for future improvement. In particular, although
> I'm certain that the functions are MT-Safe, I'm not sure about the rest of
> the @safety line.
> 
> Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
> the timeout parameter is passed as NULL. Since it's now too late to change
> that, I've documented it.

To which semantic do you think it would be better to change? Since this
symbol are not yet in POSIX there still the possibility to change it
by using a new plus a compat symbol.

> 
> 	* manual/threads.texi: Add brief documentation for
> 	pthread_tryjoin_np and pthread_timedjoin_np.

LGTM, thanks.

> ---
>  ChangeLog           |  5 +++++
>  manual/threads.texi | 26 ++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/ChangeLog b/ChangeLog
> index d7943ff..94ee860 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,10 @@
>  2019-09-18  Mike Crowe  <mac@mcrowe.com>
>  
> +	* manual/threads.texi: Add brief documentation for
> +	pthread_tryjoin_np and pthread_timedjoin_np.
> +
> +2019-09-18  Mike Crowe  <mac@mcrowe.com>
> +
>  	* nptl/tst-join3.c: Use libsupport.
>  
>  2019-09-18  Stefan Liebler  <stli@linux.ibm.com>
> diff --git a/manual/threads.texi b/manual/threads.texi
> index 0e5e84a..8dcfc53 100644
> --- a/manual/threads.texi
> +++ b/manual/threads.texi
> @@ -727,6 +727,30 @@ rather than @code{CLOCK_REALTIME}.  Currently, @var{clockid} must be either
>  returned.
>  @end deftypefun
>  
> +@comment pthread.h
> +@comment GNU extension
> +@deftypefun int pthread_tryjoin_np (pthread_t *@var{thread},
> +				      void **@var{thread_return})
> +@standards{GNU, pthread.h}
> +@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}}
> +Behaves like @code{pthread_join} except that it will return @code{EBUSY}
> +immediately if the thread specified by @var{thread} has not yet terminated.
> +@end deftypefun
> +

Ok.

> +@comment pthread.h
> +@comment GNU extension
> +@deftypefun int pthread_timedjoin_np (pthread_t *@var{thread},
> +				      void **@var{thread_return},
> +				      const struct timespec *@var{abstime})
> +@standards{GNU, pthread.h}
> +@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}}
> +Behaves like @code{pthread_tryjoin_np} except that it will block until the
> +absolute time @var{abstime} measured against @code{CLOCK_REALTIME} is
> +reached if the thread has not terminated by that time and return
> +@code{EBUSY}. If @var{abstime} is equal to @code{NULL} then the function
> +will wait forever in the same way as @code{pthread_join}.
> +@end deftypefun
> +
>  @c FIXME these are undocumented:
>  @c pthread_atfork
>  @c pthread_attr_destroy
> @@ -844,6 +868,4 @@ returned.
>  @c pthread_spin_trylock
>  @c pthread_spin_unlock
>  @c pthread_testcancel
> -@c pthread_timedjoin_np
> -@c pthread_tryjoin_np
>  @c pthread_yield
> 

Ok.
  
Yann Droneaud Sept. 27, 2019, 9:08 a.m. UTC | #2
Hi,

Le jeudi 26 septembre 2019 à 14:26 -0700, Adhemerval Zanella a écrit :
> 
> On 18/09/2019 05:30, Mike Crowe wrote:
> > This documentation isn't perfect, but it's better than nothing and can
> > hopefully act as a basis for future improvement. In particular, although
> > I'm certain that the functions are MT-Safe, I'm not sure about the rest of
> > the @safety line.
> > 
> > Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
> > the timeout parameter is passed as NULL. Since it's now too late to change
> > that, I've documented it.
> 
> To which semantic do you think it would be better to change? Since this
> symbol are not yet in POSIX there still the possibility to change it
> by using a new plus a compat symbol.
> 

I think it's the behavior we want in order to accomodate for every
needs with a single interface:

- wait forever (aka. pthread_join())        : pass NULL
- wait:        (aka. pthread_timedjoin_np()): pass deadline
- don't wait   (aka. pthread_tryjoin_np())  : pass current time

The later could be made more usuable if (struct timespec){ 0, 0 } would
have a special meaning, but it cannot, as timestamp can overflow ...

Regards.
  
Mike Crowe Sept. 28, 2019, 8:53 a.m. UTC | #3
On Thursday 26 September 2019 at 14:26:00 -0700, Adhemerval Zanella wrote:
> On 18/09/2019 05:30, Mike Crowe wrote:
> > This documentation isn't perfect, but it's better than nothing and can
> > hopefully act as a basis for future improvement. In particular, although
> > I'm certain that the functions are MT-Safe, I'm not sure about the rest of
> > the @safety line.
> > 
> > Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
> > the timeout parameter is passed as NULL. Since it's now too late to change
> > that, I've documented it.
> 
> To which semantic do you think it would be better to change? Since this
> symbol are not yet in POSIX there still the possibility to change it
> by using a new plus a compat symbol.

The alternative would be to mark the timeout parameter as __nonnull and not
permit it to be NULL, as is the case for the other pthread timed*/clock*
functions. I imagine that if the functions were ever standardised then this
would be the case - but that doesn't look like it's on the horizon.

Although we wouldn't break existing binaries if we change the behaviour and
add a compat symbol, we could break people compiling existing source. I
don't think it's a serious enough problem for it to be worth even doing
that.

However, if others feel strongly about it then I'm prepared to make that
change.

Thanks.

Mike.
  
Adhemerval Zanella Oct. 14, 2019, 7:15 p.m. UTC | #4
On 28/09/2019 05:53, Mike Crowe wrote:
> On Thursday 26 September 2019 at 14:26:00 -0700, Adhemerval Zanella wrote:
>> On 18/09/2019 05:30, Mike Crowe wrote:
>>> This documentation isn't perfect, but it's better than nothing and can
>>> hopefully act as a basis for future improvement. In particular, although
>>> I'm certain that the functions are MT-Safe, I'm not sure about the rest of
>>> the @safety line.
>>>
>>> Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
>>> the timeout parameter is passed as NULL. Since it's now too late to change
>>> that, I've documented it.
>>
>> To which semantic do you think it would be better to change? Since this
>> symbol are not yet in POSIX there still the possibility to change it
>> by using a new plus a compat symbol.
> 
> The alternative would be to mark the timeout parameter as __nonnull and not
> permit it to be NULL, as is the case for the other pthread timed*/clock*
> functions. I imagine that if the functions were ever standardised then this
> would be the case - but that doesn't look like it's on the horizon.
> 
> Although we wouldn't break existing binaries if we change the behaviour and
> add a compat symbol, we could break people compiling existing source. I
> don't think it's a serious enough problem for it to be worth even doing
> that.
> 
> However, if others feel strongly about it then I'm prepared to make that
> change.

At least other pthread interfaces that uses a timespec as input argument
add a nonnull on their interfaces. However they handle a null not equal:

  - pthread_mutex_{clock,timed}lock will trigger a SEGFAULT due 
    lll_timedlock_wait.c:31 for the contended case, it works for
    uncontended case (where an userspace atomic operation is suffice).

  - pthread_cond_timedwait will also trigger a SEGFAULT due
    nptl/pthread_cond_wait.c:648.

  - pthread_{rw,rd}lock_{clock,timed}lock will handle it as unbounded
    time (since both pthread_rwlock_common.c and futexes call check if
    abstime is NULL).

> - wait forever (aka. pthread_join())        : pass NULL
> - wait:        (aka. pthread_timedjoin_np()): pass deadline
> - don't wait   (aka. pthread_tryjoin_np())  : pass current time

I don't see that making pthread_timedjoin_np behaves differently than
other pthread interfaces a good move forward. So I would avoid support
both 'wait forever' and 'don't wait' semantic in this case.

And POSIX also is not specific regarding timeout handling in aforementioned
pthread interfaces. And systems that implement them do differ, for instance:

  - FreeBSD 11.1 does not trigger any build warning or runtime error for
    NULL argument for neither pthread_mutex_timedlock or
    pthread_{rw,rd}lock_lock.  The pthread_cond_timedwait does return
    EINVAL though.

  - Solaris 11.4 seems to behave as glibc, it triggers a SEGFAULT for
    pthread_mutex_timedlock (for *both* contended and non-contended case),
    pthread_{rw,rd}lock_timedlock, *and* for pthread_cond_timedwait.

  - AIX 7.2 returns EINVAL for contended pthread_mutex_timedlock case and
    works for non-contended case. It does not trigger any runtime issue
    for {rw,rd}lock and also returns EINVAL for pthread_cond_timedwait.

So trying to come up with a portable error handling does not seem to
that valuable, it leads to believe common usage is to *not* use NULL 
value for timeout arguments.

So I would suggest to sync with current practice and fail early for
invalid inputs either by EINVAL or by assuming valid pointer (which
trigger an invalid memory access and warns user that this is an invalid
usage).
  
Mike Crowe Oct. 18, 2019, 4:28 p.m. UTC | #5
On Monday 14 October 2019 at 16:15:11 -0300, Adhemerval Zanella wrote:
> On 28/09/2019 05:53, Mike Crowe wrote:
> > On Thursday 26 September 2019 at 14:26:00 -0700, Adhemerval Zanella wrote:
> >> On 18/09/2019 05:30, Mike Crowe wrote:
> >>> This documentation isn't perfect, but it's better than nothing and can
> >>> hopefully act as a basis for future improvement. In particular, although
> >>> I'm certain that the functions are MT-Safe, I'm not sure about the rest of
> >>> the @safety line.
> >>>
> >>> Yann Droneaud pointed out that pthread_timedjoin_np would wait forever if
> >>> the timeout parameter is passed as NULL. Since it's now too late to change
> >>> that, I've documented it.
> >>
> >> To which semantic do you think it would be better to change? Since this
> >> symbol are not yet in POSIX there still the possibility to change it
> >> by using a new plus a compat symbol.
> > 
> > The alternative would be to mark the timeout parameter as __nonnull and not
> > permit it to be NULL, as is the case for the other pthread timed*/clock*
> > functions. I imagine that if the functions were ever standardised then this
> > would be the case - but that doesn't look like it's on the horizon.
> > 
> > Although we wouldn't break existing binaries if we change the behaviour and
> > add a compat symbol, we could break people compiling existing source. I
> > don't think it's a serious enough problem for it to be worth even doing
> > that.
> > 
> > However, if others feel strongly about it then I'm prepared to make that
> > change.
> 
> At least other pthread interfaces that uses a timespec as input argument
> add a nonnull on their interfaces. However they handle a null not equal:
> 
>   - pthread_mutex_{clock,timed}lock will trigger a SEGFAULT due 
>     lll_timedlock_wait.c:31 for the contended case, it works for
>     uncontended case (where an userspace atomic operation is suffice).
> 
>   - pthread_cond_timedwait will also trigger a SEGFAULT due
>     nptl/pthread_cond_wait.c:648.
> 
>   - pthread_{rw,rd}lock_{clock,timed}lock will handle it as unbounded
>     time (since both pthread_rwlock_common.c and futexes call check if
>     abstime is NULL).
> 
> > - wait forever (aka. pthread_join())        : pass NULL
> > - wait:        (aka. pthread_timedjoin_np()): pass deadline
> > - don't wait   (aka. pthread_tryjoin_np())  : pass current time
> 
> I don't see that making pthread_timedjoin_np behaves differently than
> other pthread interfaces a good move forward. So I would avoid support
> both 'wait forever' and 'don't wait' semantic in this case.
>
> And POSIX also is not specific regarding timeout handling in aforementioned
> pthread interfaces. And systems that implement them do differ, for instance:
> 
>   - FreeBSD 11.1 does not trigger any build warning or runtime error for
>     NULL argument for neither pthread_mutex_timedlock or
>     pthread_{rw,rd}lock_lock.  The pthread_cond_timedwait does return
>     EINVAL though.
> 
>   - Solaris 11.4 seems to behave as glibc, it triggers a SEGFAULT for
>     pthread_mutex_timedlock (for *both* contended and non-contended case),
>     pthread_{rw,rd}lock_timedlock, *and* for pthread_cond_timedwait.
> 
>   - AIX 7.2 returns EINVAL for contended pthread_mutex_timedlock case and
>     works for non-contended case. It does not trigger any runtime issue
>     for {rw,rd}lock and also returns EINVAL for pthread_cond_timedwait.
> 
> So trying to come up with a portable error handling does not seem to
> that valuable, it leads to believe common usage is to *not* use NULL 
> value for timeout arguments.
> 
> So I would suggest to sync with current practice and fail early for
> invalid inputs either by EINVAL or by assuming valid pointer (which
> trigger an invalid memory access and warns user that this is an invalid
> usage).

Since pthread_timedjoin_np is not new, we risk changing the behaviour of
existing code that may have come to rely on the behaviour when passing
NULL. Even if we use symbol versioning to ensure that code compiled against
old glibc continues to behave as it always did, we still risk changing the
behaviour when code is compiled against the newer glibc. If that code is
compiled with -Wnonnull and the compiler is able to determine that NULL is
being passed (which it may not if the code in question is using a wrapper
function) then a warning would be emitted which may help, but does not
guarantee that the change will be noticed.

pthread_clockjoin_np is new, so it doesn't have any legacy behaviour. We
can mark the timeout as __nonnull. We could check the argument against NULL
and return EINVAL, but it seems that the compiler warns when checking
__nonnull arguments and just throws the check away[1] so we're probably
stuck with potentially faulting in that situation.

In my view, changing the behaviour of pthread_timedjoin_np now is of little
benefit compared to the risk of breaking existing code. That ship has
sailed. Having decided that, I thought that making pthread_clockjoin_np
behave differently to pthread_timedjoin_np would be confusing, so I settled
on it behaving the same way.

So, what are you proposing we should do? Change the behaviour of the
existing pthread_timedjoin_np function and the new pthread_clockjoin_np
function or only of the latter?

Thanks.

Mike.

[1] https://godbolt.org/z/Msw9vk
  
Adhemerval Zanella Oct. 21, 2019, 6:15 p.m. UTC | #6
On 18/10/2019 13:28, Mike Crowe wrote:
> Since pthread_timedjoin_np is not new, we risk changing the behaviour of
> existing code that may have come to rely on the behaviour when passing
> NULL. Even if we use symbol versioning to ensure that code compiled against
> old glibc continues to behave as it always did, we still risk changing the
> behaviour when code is compiled against the newer glibc. If that code is
> compiled with -Wnonnull and the compiler is able to determine that NULL is
> being passed (which it may not if the code in question is using a wrapper
> function) then a warning would be emitted which may help, but does not
> guarantee that the change will be noticed.

At least for code built against a new symbol we can not only add the nonnull
but at result EINVAL for invalid abstime. It would be semantic change, but
it is what compat symbol are used mainly.

> 
> pthread_clockjoin_np is new, so it doesn't have any legacy behaviour. We
> can mark the timeout as __nonnull. We could check the argument against NULL
> and return EINVAL, but it seems that the compiler warns when checking
> __nonnull arguments and just throws the check away[1] so we're probably
> stuck with potentially faulting in that situation.
> 
> In my view, changing the behaviour of pthread_timedjoin_np now is of little
> benefit compared to the risk of breaking existing code. That ship has
> sailed. Having decided that, I thought that making pthread_clockjoin_np
> behave differently to pthread_timedjoin_np would be confusing, so I settled
> on it behaving the same way.
> 
> So, what are you proposing we should do? Change the behaviour of the
> existing pthread_timedjoin_np function and the new pthread_clockjoin_np
> function or only of the latter?

I agree with your rationale that we should not change pthread_timedjoin_np
and mimic its behaviour for pthread_clockjoin_np (in fact I acked the patches
already).

This discussion was mostly due the proposition Yann Droneaud brought to you
and you mentioned in the email.  I think the patchset is good to be pushed
upstream.


> 
> Thanks.
> 
> Mike.
> 
> [1] https://godbolt.org/z/Msw9vk
>
  

Patch

diff --git a/ChangeLog b/ChangeLog
index d7943ff..94ee860 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@ 
 2019-09-18  Mike Crowe  <mac@mcrowe.com>
 
+	* manual/threads.texi: Add brief documentation for
+	pthread_tryjoin_np and pthread_timedjoin_np.
+
+2019-09-18  Mike Crowe  <mac@mcrowe.com>
+
 	* nptl/tst-join3.c: Use libsupport.
 
 2019-09-18  Stefan Liebler  <stli@linux.ibm.com>
diff --git a/manual/threads.texi b/manual/threads.texi
index 0e5e84a..8dcfc53 100644
--- a/manual/threads.texi
+++ b/manual/threads.texi
@@ -727,6 +727,30 @@  rather than @code{CLOCK_REALTIME}.  Currently, @var{clockid} must be either
 returned.
 @end deftypefun
 
+@comment pthread.h
+@comment GNU extension
+@deftypefun int pthread_tryjoin_np (pthread_t *@var{thread},
+				      void **@var{thread_return})
+@standards{GNU, pthread.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}}
+Behaves like @code{pthread_join} except that it will return @code{EBUSY}
+immediately if the thread specified by @var{thread} has not yet terminated.
+@end deftypefun
+
+@comment pthread.h
+@comment GNU extension
+@deftypefun int pthread_timedjoin_np (pthread_t *@var{thread},
+				      void **@var{thread_return},
+				      const struct timespec *@var{abstime})
+@standards{GNU, pthread.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}}
+Behaves like @code{pthread_tryjoin_np} except that it will block until the
+absolute time @var{abstime} measured against @code{CLOCK_REALTIME} is
+reached if the thread has not terminated by that time and return
+@code{EBUSY}. If @var{abstime} is equal to @code{NULL} then the function
+will wait forever in the same way as @code{pthread_join}.
+@end deftypefun
+
 @c FIXME these are undocumented:
 @c pthread_atfork
 @c pthread_attr_destroy
@@ -844,6 +868,4 @@  returned.
 @c pthread_spin_trylock
 @c pthread_spin_unlock
 @c pthread_testcancel
-@c pthread_timedjoin_np
-@c pthread_tryjoin_np
 @c pthread_yield