nptl: Provide a way to block all signals for the timer helper thread

Message ID 87a72d5h85.fsf@oldenburg2.str.redhat.com
State Dropped
Delegated to: Carlos O'Donell
Headers
Series nptl: Provide a way to block all signals for the timer helper thread |

Commit Message

Florian Weimer May 12, 2020, 3:02 p.m. UTC
  timer_create needs to create threads with all signals blocked,
including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
internal interface which provides an explicit way to achieve that.

Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
new threads with all signals blocked [BZ #25098]").

Tested on x86_64-linux-gnu.

-----
 nptl/Versions                            |  1 +
 nptl/pthreadP.h                          | 10 +++++++++
 nptl/pthread_create.c                    | 36 +++++++++++++++++++++++---------
 sysdeps/unix/sysv/linux/timer_routines.c | 16 ++++++--------
 4 files changed, 43 insertions(+), 20 deletions(-)
  

Comments

Florian Weimer May 12, 2020, 3:03 p.m. UTC | #1
* Florian Weimer:

> timer_create needs to create threads with all signals blocked,
> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
> internal interface which provides an explicit way to achieve that.
>
> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
> new threads with all signals blocked [BZ #25098]").
>
> Tested on x86_64-linux-gnu.

I have not seen the rt/tst-timer-sigmask failure myself, so I'd
appreciate if someone who saw it could verify that it is gone with this
patch.

Thanks,
Florian
  
H.J. Lu May 12, 2020, 4:09 p.m. UTC | #2
On Tue, May 12, 2020 at 8:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Florian Weimer:
>
> > timer_create needs to create threads with all signals blocked,
> > including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
> > internal interface which provides an explicit way to achieve that.
> >
> > Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
> > new threads with all signals blocked [BZ #25098]").
> >
> > Tested on x86_64-linux-gnu.
>
> I have not seen the rt/tst-timer-sigmask failure myself, so I'd
> appreciate if someone who saw it could verify that it is gone with this
> patch.
>

I usually see it when I do "make -j8 check" in 3 glibc builds, i386, x32
and x86-64, on a 8-core machine.
  
Carlos O'Donell May 12, 2020, 6:38 p.m. UTC | #3
On 5/12/20 12:09 PM, H.J. Lu wrote:
> On Tue, May 12, 2020 at 8:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Florian Weimer:
>>
>>> timer_create needs to create threads with all signals blocked,
>>> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
>>> internal interface which provides an explicit way to achieve that.
>>>
>>> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
>>> new threads with all signals blocked [BZ #25098]").
>>>
>>> Tested on x86_64-linux-gnu.
>>
>> I have not seen the rt/tst-timer-sigmask failure myself, so I'd
>> appreciate if someone who saw it could verify that it is gone with this
>> patch.
>>
> 
> I usually see it when I do "make -j8 check" in 3 glibc builds, i386, x32
> and x86-64, on a 8-core machine.

On a 4 CPU VM I see it with an x86-64 and i686 build going simultaneously
in Fedora 32 with "make -j4 check" roughly going at the same time.
  
Adhemerval Zanella May 12, 2020, 7:02 p.m. UTC | #4
On 12/05/2020 12:02, Florian Weimer wrote:
> timer_create needs to create threads with all signals blocked,
> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
> internal interface which provides an explicit way to achieve that.
> 
> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
> new threads with all signals blocked [BZ #25098]").
> 
> Tested on x86_64-linux-gnu.

The patch looks go, thanks.  As a side note, I am wondering if it would
be better to de-couple POSIX timer from cancellation handling, it should
avoid require a special pthread create symbols as this patch.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> -----
>  nptl/Versions                            |  1 +
>  nptl/pthreadP.h                          | 10 +++++++++
>  nptl/pthread_create.c                    | 36 +++++++++++++++++++++++---------
>  sysdeps/unix/sysv/linux/timer_routines.c | 16 ++++++--------
>  4 files changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/nptl/Versions b/nptl/Versions
> index f7140277f5..17a711dfb1 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -285,5 +285,6 @@ libpthread {
>      __pthread_barrier_init; __pthread_barrier_wait;
>      __shm_directory;
>      __libpthread_freeres;
> +    __pthread_create_internal;
>    }
>  }

Ok.

> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index c4e72f57a9..4e065ace58 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -339,6 +339,16 @@ extern void __pthread_cleanup_upto (__jmp_buf target, char *targetframe);
>  hidden_proto (__pthread_cleanup_upto)
>  #endif
>  
> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
> +   not use the current signal mask for the new thread, but set it to
> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
> +extern int __pthread_create_internal (pthread_t *newthread,
> +				      const pthread_attr_t *attr,
> +				      void *(*start_routine) (void *),
> +				      void *arg, const sigset_t *new_sigmask);
> +#if IS_IN (libpthread)
> +hidden_proto (__pthread_create_internal)
> +#endif
>  
>  /* Functions with versioned interfaces.  */
>  extern int __pthread_create_2_1 (pthread_t *newthread,

Ok.

> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index afd379e89a..2430d65723 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -603,10 +603,10 @@ report_thread_creation (struct pthread *pd)
>    return false;
>  }
>  
> -
>  int
> -__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> -		      void *(*start_routine) (void *), void *arg)
> +__pthread_create_internal (pthread_t *newthread, const pthread_attr_t *attr,
> +			   void *(*start_routine) (void *), void *arg,
> +			   const sigset_t *new_sigmask)
>  {
>    STACK_VARIABLES;
>  
> @@ -762,14 +762,21 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>    sigset_t original_sigmask;
>    __libc_signal_block_all (&original_sigmask);
>  
> -  /* Conceptually, the new thread needs to inherit the signal mask of
> -     this thread.  Therefore, it needs to restore the saved signal
> -     mask of this thread, so save it in the startup information.  */
> -  pd->sigmask = original_sigmask;
> +  if (new_sigmask != NULL)
> +    /* The caller supplied the signal mask for the new thread.  */
> +    pd->sigmask = *new_sigmask;
> +  else
> +    {
> +      /* Conceptually, the new thread needs to inherit the signal mask
> +	 of this thread.  Therefore, it needs to restore the saved
> +	 signal mask of this thread, so save it in the startup
> +	 information.  */
> +      pd->sigmask = original_sigmask;
>  
> -  /* Reset the cancellation signal mask in case this thread is running
> -     cancellation.  */
> -  __sigdelset (&pd->sigmask, SIGCANCEL);
> +      /* Reset the cancellation signal mask in case this thread is
> +	 running cancellation.  */
> +      __sigdelset (&pd->sigmask, SIGCANCEL);
> +    }
>  
>    /* Start the thread.  */
>    if (__glibc_unlikely (report_thread_creation (pd)))

Ok.

> @@ -873,6 +880,15 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  
>    return retval;
>  }
> +hidden_def (__pthread_create_internal)
> +
> +int
> +__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> +		      void *(*start_routine) (void *), void *arg)
> +{
> +  return __pthread_create_internal (newthread, attr, start_routine, arg,
> +				    false);
> +}
>  versioned_symbol (libpthread, __pthread_create_2_1, pthread_create, GLIBC_2_1);
>  
>  

Ok.

> diff --git a/sysdeps/unix/sysv/linux/timer_routines.c b/sysdeps/unix/sysv/linux/timer_routines.c
> index 63083f6f91..7a5fa3dbb2 100644
> --- a/sysdeps/unix/sysv/linux/timer_routines.c
> +++ b/sysdeps/unix/sysv/linux/timer_routines.c
> @@ -136,23 +136,19 @@ __start_helper_thread (void)
>    (void) pthread_attr_init (&attr);
>    (void) pthread_attr_setstacksize (&attr, __pthread_get_minstack (&attr));
>  
> -  /* Block all signals in the helper thread but SIGSETXID.  To do this
> -     thoroughly we temporarily have to block all signals here.  The
> -     helper can lose wakeups if SIGTIMER is not blocked throughout.  */
> -  sigset_t ss;
> -  __libc_signal_block_app (&ss);
> -  __libc_signal_block_sigtimer (NULL);
> +  /* Block all signals in the helper thread but SIGSETXID.  */
> +  sigset_t new_sigmask;
> +  __sigfillset (&new_sigmask);
> +  __sigdelset (&new_sigmask, SIGSETXID);
>  
>    /* Create the helper thread for this timer.  */
>    pthread_t th;
> -  int res = pthread_create (&th, &attr, timer_helper_thread, NULL);
> +  int res = __pthread_create_internal (&th, &attr, timer_helper_thread, NULL,
> +				       &new_sigmask);
>    if (res == 0)
>      /* We managed to start the helper thread.  */
>      __helper_tid = ((struct pthread *) th)->tid;
>  
> -  /* Restore the signal mask.  */
> -  __libc_signal_restore_set (&ss);
> -
>    /* No need for the attribute anymore.  */
>    (void) pthread_attr_destroy (&attr);
>  
> 

Ok.
  
Carlos O'Donell May 12, 2020, 7:09 p.m. UTC | #5
On 5/12/20 11:02 AM, Florian Weimer wrote:
> timer_create needs to create threads with all signals blocked,
> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
> internal interface which provides an explicit way to achieve that.
> 
> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
> new threads with all signals blocked [BZ #25098]").

Please post v2.

Testing is clean for x86_64 and i686, the tst-timer-sigmask failure is gone,
and I confirm that logically now we don't unblock SIGCANCEL (whose semantic
overlap with SIGTIMER is the root cause of this failure e.g. poor design).

My request below is that we need document the internal interface quirk and
we need to explain why it's there and why __pthread_create_internal has this
quirk.

I agree with Adhemerval that a future cleanup would be good to somehow decouple
SIGTIMER from SIGCANCEL, but I don't want to block this regression fix.

Tested-by: Carlos O'Donell <carlos@redhat.com>

> Tested on x86_64-linux-gnu.
> 
> -----

Needs to be 3 dashes, otherwise this is included in the commit message
by git-am. Please adjust your scripts.

>  nptl/Versions                            |  1 +
>  nptl/pthreadP.h                          | 10 +++++++++
>  nptl/pthread_create.c                    | 36 +++++++++++++++++++++++---------
>  sysdeps/unix/sysv/linux/timer_routines.c | 16 ++++++--------
>  4 files changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/nptl/Versions b/nptl/Versions
> index f7140277f5..17a711dfb1 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -285,5 +285,6 @@ libpthread {
>      __pthread_barrier_init; __pthread_barrier_wait;
>      __shm_directory;
>      __libpthread_freeres;
> +    __pthread_create_internal;

OK. Provide new __pthread_create_internal@GLIBC_PRIVATE that takes a signal mask.

>    }
>  }
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index c4e72f57a9..4e065ace58 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -339,6 +339,16 @@ extern void __pthread_cleanup_upto (__jmp_buf target, char *targetframe);
>  hidden_proto (__pthread_cleanup_upto)
>  #endif
>  
> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
> +   not use the current signal mask for the new thread, but set it to
> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
> +extern int __pthread_create_internal (pthread_t *newthread,
> +				      const pthread_attr_t *attr,
> +				      void *(*start_routine) (void *),
> +				      void *arg, const sigset_t *new_sigmask);
> +#if IS_IN (libpthread)
> +hidden_proto (__pthread_create_internal)
> +#endif

OK.

>  
>  /* Functions with versioned interfaces.  */
>  extern int __pthread_create_2_1 (pthread_t *newthread,
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index afd379e89a..2430d65723 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -603,10 +603,10 @@ report_thread_creation (struct pthread *pd)
>    return false;
>  }
>  
> -
>  int
> -__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> -		      void *(*start_routine) (void *), void *arg)

Could you please provide a detailed explanation of why we have a signal 
mask here?

Future reviewers of this code will be left wondering why the interface was
designed as it is, and that's difficult to explain without a more thorough
code audit.

I would like to see a detailed paragraph here that explains the arguments
(most of which are trivial) and particularly new_sigmask and why it's
needed.

If you don't want to explain this here, because you think this is generic
infrastructure, then we need to explain this at the call site that uses
new_sigmask != NULL.

> +__pthread_create_internal (pthread_t *newthread, const pthread_attr_t *attr,
> +			   void *(*start_routine) (void *), void *arg,
> +			   const sigset_t *new_sigmask)
>  {
>    STACK_VARIABLES;
>  
> @@ -762,14 +762,21 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>    sigset_t original_sigmask;
>    __libc_signal_block_all (&original_sigmask);

OK. At this point all signals are blocked. SIGCANCEL and SIGSETXID. Later
on we will call _libc_signal_restore_set (&pd->sigmask); from START_THREAD_DEFN
to restore the signal mask (and unblock signals, potentially delivering some).

>  
> -  /* Conceptually, the new thread needs to inherit the signal mask of
> -     this thread.  Therefore, it needs to restore the saved signal
> -     mask of this thread, so save it in the startup information.  */
> -  pd->sigmask = original_sigmask;
> +  if (new_sigmask != NULL)
> +    /* The caller supplied the signal mask for the new thread.  */
> +    pd->sigmask = *new_sigmask;

OK. The if clause does the new work, which is just to set the signal mask
exactly as expected and don't play with SIGCANCEL, since this is not a
normal thread.

> +  else
> +    {

OK. We have the else clause do the normal work.

> +      /* Conceptually, the new thread needs to inherit the signal mask
> +	 of this thread.  Therefore, it needs to restore the saved
> +	 signal mask of this thread, so save it in the startup
> +	 information.  */
> +      pd->sigmask = original_sigmask;

OK.

>  
> -  /* Reset the cancellation signal mask in case this thread is running
> -     cancellation.  */
> -  __sigdelset (&pd->sigmask, SIGCANCEL);
> +      /* Reset the cancellation signal mask in case this thread is
> +	 running cancellation.  */
> +      __sigdelset (&pd->sigmask, SIGCANCEL);

OK.

> +    }
>  
>    /* Start the thread.  */
>    if (__glibc_unlikely (report_thread_creation (pd)))
> @@ -873,6 +880,15 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>  
>    return retval;
>  }
> +hidden_def (__pthread_create_internal)
> +
> +int
> +__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
> +		      void *(*start_routine) (void *), void *arg)
> +{
> +  return __pthread_create_internal (newthread, attr, start_routine, arg,
> +				    false);

My preference is to use NULL with an informative cast.

Unless you can show existing practice. I'm not a fan of using false like this.

> +}
>  versioned_symbol (libpthread, __pthread_create_2_1, pthread_create, GLIBC_2_1);
>  
>  
> diff --git a/sysdeps/unix/sysv/linux/timer_routines.c b/sysdeps/unix/sysv/linux/timer_routines.c
> index 63083f6f91..7a5fa3dbb2 100644
> --- a/sysdeps/unix/sysv/linux/timer_routines.c
> +++ b/sysdeps/unix/sysv/linux/timer_routines.c
> @@ -136,23 +136,19 @@ __start_helper_thread (void)
>    (void) pthread_attr_init (&attr);
>    (void) pthread_attr_setstacksize (&attr, __pthread_get_minstack (&attr));
>  
> -  /* Block all signals in the helper thread but SIGSETXID.  To do this
> -     thoroughly we temporarily have to block all signals here.  The
> -     helper can lose wakeups if SIGTIMER is not blocked throughout.  */
> -  sigset_t ss;
> -  __libc_signal_block_app (&ss);
> -  __libc_signal_block_sigtimer (NULL);

OK. Yup, we don't need to do any of this work because we have fixed
__pthread_create_internal to handle the sigmaks case for us (a good refactoring)
and reduces signal latency in calling thread.

> +  /* Block all signals in the helper thread but SIGSETXID.  */
> +  sigset_t new_sigmask;
> +  __sigfillset (&new_sigmask);
> +  __sigdelset (&new_sigmask, SIGSETXID);

OK. SIGTIMER/SIGCANCEL is blocked. SIGSETXID is not for setxid calls.
The helper thread synchronously calls sigwaitinfo to retrieve the pending
signal.

>    /* Create the helper thread for this timer.  */
>    pthread_t th;
> -  int res = pthread_create (&th, &attr, timer_helper_thread, NULL);
> +  int res = __pthread_create_internal (&th, &attr, timer_helper_thread, NULL,
> +				       &new_sigmask);
>    if (res == 0)
>      /* We managed to start the helper thread.  */
>      __helper_tid = ((struct pthread *) th)->tid;
>  
> -  /* Restore the signal mask.  */
> -  __libc_signal_restore_set (&ss);

OK.

> -
>    /* No need for the attribute anymore.  */
>    (void) pthread_attr_destroy (&attr);
>  
>
  
Florian Weimer May 12, 2020, 7:17 p.m. UTC | #6
* Carlos O'Donell:

>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>> +   not use the current signal mask for the new thread, but set it to
>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>> +extern int __pthread_create_internal (pthread_t *newthread,
>> +				      const pthread_attr_t *attr,
>> +				      void *(*start_routine) (void *),
>> +				      void *arg, const sigset_t *new_sigmask);
>> +#if IS_IN (libpthread)
>> +hidden_proto (__pthread_create_internal)
>> +#endif
>
> OK.
>
>>  
>>  /* Functions with versioned interfaces.  */
>>  extern int __pthread_create_2_1 (pthread_t *newthread,
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index afd379e89a..2430d65723 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -603,10 +603,10 @@ report_thread_creation (struct pthread *pd)
>>    return false;
>>  }
>>  
>> -
>>  int
>> -__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>> -		      void *(*start_routine) (void *), void *arg)
>
> Could you please provide a detailed explanation of why we have a signal 
> mask here?

Why isn't the comment in the header file sufficient?

I find your requirement puzzling.

> Future reviewers of this code will be left wondering why the interface was
> designed as it is, and that's difficult to explain without a more thorough
> code audit.

Hmm.  NULL == use some default is pretty standard?  It's already used
for the attr argument.

>> @@ -873,6 +880,15 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>  
>>    return retval;
>>  }
>> +hidden_def (__pthread_create_internal)
>> +
>> +int
>> +__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>> +		      void *(*start_routine) (void *), void *arg)
>> +{
>> +  return __pthread_create_internal (newthread, attr, start_routine, arg,
>> +				    false);
>
> My preference is to use NULL with an informative cast.
>
> Unless you can show existing practice. I'm not a fan of using false
> like this.

No, this is an accident.

Thanks,
Florian
  
Carlos O'Donell May 13, 2020, 3:25 p.m. UTC | #7
On 5/12/20 11:02 AM, Florian Weimer wrote:
> timer_create needs to create threads with all signals blocked,
> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
> internal interface which provides an explicit way to achieve that.
> 
> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
> new threads with all signals blocked [BZ #25098]").
> 
> Tested on x86_64-linux-gnu.

Overnight continuous tester showed no sporadic fails with tst-timer-sigmask.

Thanks again for fixing this quickly.
  
Florian Weimer May 13, 2020, 4:22 p.m. UTC | #8
* Carlos O'Donell:

> On 5/12/20 11:02 AM, Florian Weimer wrote:
>> timer_create needs to create threads with all signals blocked,
>> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
>> internal interface which provides an explicit way to achieve that.
>> 
>> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
>> new threads with all signals blocked [BZ #25098]").
>> 
>> Tested on x86_64-linux-gnu.
>
> Overnight continuous tester showed no sporadic fails with tst-timer-sigmask.
>
> Thanks again for fixing this quickly.

So is it okay to push this patch, then?  With the false → NULL fix?

Thanks,
Florian
  
Florian Weimer May 14, 2020, 3:46 p.m. UTC | #9
* Adhemerval Zanella:

> On 12/05/2020 12:02, Florian Weimer wrote:
>> timer_create needs to create threads with all signals blocked,
>> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
>> internal interface which provides an explicit way to achieve that.
>> 
>> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
>> new threads with all signals blocked [BZ #25098]").
>> 
>> Tested on x86_64-linux-gnu.
>
> The patch looks go, thanks.  As a side note, I am wondering if it would
> be better to de-couple POSIX timer from cancellation handling, it should
> avoid require a special pthread create symbols as this patch.
>
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Carlos, is it okay to check this in?  (With the false → NULL change, of
course.)

Thanks,
Florian
  
Carlos O'Donell May 14, 2020, 5:52 p.m. UTC | #10
On 5/12/20 3:17 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>> +   not use the current signal mask for the new thread, but set it to
>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */


Suggest:

/* Exactly like pthread_create if NEW_SIGMASK is NULL.
   Create the new thread using the thread descriptor at NEWTHREAD,
   and the thread attributes from *ATTR, executing provided START_ROUTINE
   with a single void argument ARG.  The last argument is not present in
   the public API for pthread_create, but is part of the internal glibc API.
   The last argument is the signal mask to be restored in the new thread,
   and is normally NULL, in which case the parent's signal mask is restored
   as mandated by the standard's requirement for pthread_create.  In some cases
   though you may wish to keep certain signals blocked to avoid race cases,
   and today this includes the helper thread created by SIGEV_THREAD where all
   signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
   primarily intended for the creation of helper threads not user threads.  */

>>> +extern int __pthread_create_internal (pthread_t *newthread,
>>> +				      const pthread_attr_t *attr,
>>> +				      void *(*start_routine) (void *),
>>> +				      void *arg, const sigset_t *new_sigmask);
>>> +#if IS_IN (libpthread)
>>> +hidden_proto (__pthread_create_internal)
>>> +#endif
>>
>> OK.
>>
>>>  
>>>  /* Functions with versioned interfaces.  */
>>>  extern int __pthread_create_2_1 (pthread_t *newthread,
>>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>>> index afd379e89a..2430d65723 100644
>>> --- a/nptl/pthread_create.c
>>> +++ b/nptl/pthread_create.c
>>> @@ -603,10 +603,10 @@ report_thread_creation (struct pthread *pd)
>>>    return false;
>>>  }
>>>  
>>> -


>>>  int
>>> -__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>> -		      void *(*start_routine) (void *), void *arg)
>>
>> Could you please provide a detailed explanation of why we have a signal 
>> mask here?
> 
> Why isn't the comment in the header file sufficient?
> 
> I find your requirement puzzling.

I am asking for task oriented documentation for our internal APIs.

Task oriented documentation allows a non-expert to carry out an operation without
the need of an expert.

Simple API documentation lacks sufficient detail for a non-expert to use the API 
and to know what they would need to pass in this argument.

See my suggestion above.

>> Future reviewers of this code will be left wondering why the interface was
>> designed as it is, and that's difficult to explain without a more thorough
>> code audit.
> 
> Hmm.  NULL == use some default is pretty standard?  It's already used
> for the attr argument.

I hope my clarification of task-oriented vs. API helps.

>>> @@ -873,6 +880,15 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>>  
>>>    return retval;
>>>  }
>>> +hidden_def (__pthread_create_internal)
>>> +
>>> +int
>>> +__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>> +		      void *(*start_routine) (void *), void *arg)
>>> +{
>>> +  return __pthread_create_internal (newthread, attr, start_routine, arg,
>>> +				    false);
>>
>> My preference is to use NULL with an informative cast.
>>
>> Unless you can show existing practice. I'm not a fan of using false
>> like this.
> 
> No, this is an accident.

Perfect.
  
Carlos O'Donell May 14, 2020, 5:54 p.m. UTC | #11
On 5/14/20 11:46 AM, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 12/05/2020 12:02, Florian Weimer wrote:
>>> timer_create needs to create threads with all signals blocked,
>>> including SIGTIMER (which happens to equal SIGCANCEL).  Add a new
>>> internal interface which provides an explicit way to achieve that.
>>>
>>> Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8 ("nptl: Start
>>> new threads with all signals blocked [BZ #25098]").
>>>
>>> Tested on x86_64-linux-gnu.
>>
>> The patch looks go, thanks.  As a side note, I am wondering if it would
>> be better to de-couple POSIX timer from cancellation handling, it should
>> avoid require a special pthread create symbols as this patch.
>>
>> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
> 
> Carlos, is it okay to check this in?  (With the false → NULL change, of
> course.)

No. I still want task-oriented internal comments to explain when and why
the new_sigmask parameter would be set. I gave a suggested paragraph in
the follow-on review. Please have a look and post a v2 or discuss further.

I know we are under pressure to fix this because it's a regression, but we
can take some time to write better task-oriented comments for the API so
future readers know how to use it.
  
Florian Weimer May 14, 2020, 6:30 p.m. UTC | #12
* Carlos O'Donell:

> On 5/12/20 3:17 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>> 
>>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>>> +   not use the current signal mask for the new thread, but set it to
>>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>
>
> Suggest:
>
> /* Exactly like pthread_create if NEW_SIGMASK is NULL.
>    Create the new thread using the thread descriptor at NEWTHREAD,
>    and the thread attributes from *ATTR, executing provided START_ROUTINE
>    with a single void argument ARG.  The last argument is not present in
>    the public API for pthread_create, but is part of the internal glibc API.
>    The last argument is the signal mask to be restored in the new thread,
>    and is normally NULL, in which case the parent's signal mask is restored
>    as mandated by the standard's requirement for pthread_create.  In some cases
>    though you may wish to keep certain signals blocked to avoid race cases,
>    and today this includes the helper thread created by SIGEV_THREAD where all
>    signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
>    primarily intended for the creation of helper threads not user threads.  */

Does this mean you think this interface is more generally useful?

I'm still trying to make sense of this request.

Thanks,
Florian
  
Carlos O'Donell May 14, 2020, 7:33 p.m. UTC | #13
On 5/14/20 2:30 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> On 5/12/20 3:17 PM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>
>>>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>>>> +   not use the current signal mask for the new thread, but set it to
>>>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>>
>>
>> Suggest:
>>
>> /* Exactly like pthread_create if NEW_SIGMASK is NULL.
>>    Create the new thread using the thread descriptor at NEWTHREAD,
>>    and the thread attributes from *ATTR, executing provided START_ROUTINE
>>    with a single void argument ARG.  The last argument is not present in
>>    the public API for pthread_create, but is part of the internal glibc API.
>>    The last argument is the signal mask to be restored in the new thread,
>>    and is normally NULL, in which case the parent's signal mask is restored
>>    as mandated by the standard's requirement for pthread_create.  In some cases
>>    though you may wish to keep certain signals blocked to avoid race cases,
>>    and today this includes the helper thread created by SIGEV_THREAD where all
>>    signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
>>    primarily intended for the creation of helper threads not user threads.  */
> 
> Does this mean you think this interface is more generally useful?

It might. I haven't reviewed C11 threads, getaddrinfo_a or aio uses.

Do we need to judge that today?

My opinion is that if a patch creates new internal API, then that API should
receive documentation in the form of a comment.

My preference for such documentation is that it be task-oriented, that is that
it should explain how the interface operates (what you documented), and what
task you would use it for (the task-based aspect).

The alternative is adding a comment that says this internal API is not to be
used by anyone except the two current caller. In which case you're off the
hook for documenting the internal interface.... someone else will have to do
that when they extend it again or want to use it a third time.

> I'm still trying to make sense of this request.
Review: "How Developers Use API Documentation: An Observation Study"
http://sigdoc.acm.org/wp-content/uploads/2019/01/CDQ18002_Meng_Steinhardt_Schubert.pdf
See: "Present conceptual information integrated with related tasks."

The concept I think we should represent is that the signal mask here is to avoid
races in internally created threads that do specific tasks. Future developers may
find it useful, and leverage that to do other things. We tie a bunch of things
together in one paragraph: race cases with signals, intended purpose, and current
uses.

If you object on the groups of cost/value for such documentation, then I'm OK with
that, but then you have to mark these internal APIs as off-limits in some way.
  
Adhemerval Zanella May 14, 2020, 8:32 p.m. UTC | #14
On 14/05/2020 16:33, Carlos O'Donell wrote:
> On 5/14/20 2:30 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> On 5/12/20 3:17 PM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>>>>> +   not use the current signal mask for the new thread, but set it to
>>>>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>>>
>>>
>>> Suggest:
>>>
>>> /* Exactly like pthread_create if NEW_SIGMASK is NULL.
>>>    Create the new thread using the thread descriptor at NEWTHREAD,
>>>    and the thread attributes from *ATTR, executing provided START_ROUTINE
>>>    with a single void argument ARG.  The last argument is not present in
>>>    the public API for pthread_create, but is part of the internal glibc API.
>>>    The last argument is the signal mask to be restored in the new thread,
>>>    and is normally NULL, in which case the parent's signal mask is restored
>>>    as mandated by the standard's requirement for pthread_create.  In some cases
>>>    though you may wish to keep certain signals blocked to avoid race cases,
>>>    and today this includes the helper thread created by SIGEV_THREAD where all
>>>    signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
>>>    primarily intended for the creation of helper threads not user threads.  */
>>
>> Does this mean you think this interface is more generally useful?
> 
> It might. I haven't reviewed C11 threads, getaddrinfo_a or aio uses.

If we eventually decouple SIGCANCEL and SIGTIMER this interface will most
likely be removed as well.  Not sure if it would be better to add extra
comments to document the behavior of if the VCS itself is a better tool.

> 
> Do we need to judge that tod> 
> My opinion is that if a patch creates new internal API, then that API should
> receive documentation in the form of a comment.
> 
> My preference for such documentation is that it be task-oriented, that is that
> it should explain how the interface operates (what you documented), and what
> task you would use it for (the task-based aspect).
> 
> The alternative is adding a comment that says this internal API is not to be
> used by anyone except the two current caller. In which case you're off the
> hook for documenting the internal interface.... someone else will have to do
> that when they extend it again or want to use it a third time.
> 
>> I'm still trying to make sense of this request.
> Review: "How Developers Use API Documentation: An Observation Study"
> http://sigdoc.acm.org/wp-content/uploads/2019/01/CDQ18002_Meng_Steinhardt_Schubert.pdf
> See: "Present conceptual information integrated with related tasks."
> 
> The concept I think we should represent is that the signal mask here is to avoid
> races in internally created threads that do specific tasks. Future developers may
> find it useful, and leverage that to do other things. We tie a bunch of things
> together in one paragraph: race cases with signals, intended purpose, and current
> uses.
> 
> If you object on the groups of cost/value for such documentation, then I'm OK with
> that, but then you have to mark these internal APIs as off-limits in some way.
>
  
Carlos O'Donell May 14, 2020, 8:40 p.m. UTC | #15
On 5/14/20 4:32 PM, Adhemerval Zanella wrote:
> 
> 
> On 14/05/2020 16:33, Carlos O'Donell wrote:
>> On 5/14/20 2:30 PM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>
>>>> On 5/12/20 3:17 PM, Florian Weimer wrote:
>>>>> * Carlos O'Donell:
>>>>>
>>>>>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>>>>>> +   not use the current signal mask for the new thread, but set it to
>>>>>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>>>>
>>>>
>>>> Suggest:
>>>>
>>>> /* Exactly like pthread_create if NEW_SIGMASK is NULL.
>>>>    Create the new thread using the thread descriptor at NEWTHREAD,
>>>>    and the thread attributes from *ATTR, executing provided START_ROUTINE
>>>>    with a single void argument ARG.  The last argument is not present in
>>>>    the public API for pthread_create, but is part of the internal glibc API.
>>>>    The last argument is the signal mask to be restored in the new thread,
>>>>    and is normally NULL, in which case the parent's signal mask is restored
>>>>    as mandated by the standard's requirement for pthread_create.  In some cases
>>>>    though you may wish to keep certain signals blocked to avoid race cases,
>>>>    and today this includes the helper thread created by SIGEV_THREAD where all
>>>>    signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
>>>>    primarily intended for the creation of helper threads not user threads.  */
>>>
>>> Does this mean you think this interface is more generally useful?
>>
>> It might. I haven't reviewed C11 threads, getaddrinfo_a or aio uses.
> 
> If we eventually decouple SIGCANCEL and SIGTIMER this interface will most
> likely be removed as well.  Not sure if it would be better to add extra
> comments to document the behavior of if the VCS itself is a better tool.

I'm happy if we limit the scope.

Suggest:

/* This interface is *only* for use by pthread_create or SIGEV_THREAD
   helper threads.  We expect it will go away when SIGCANCEL and
   SIGTIMER are decoupled.  */

Then it's clear nobody should use it.
  
Florian Weimer May 19, 2020, 11:08 a.m. UTC | #16
* Carlos O'Donell:

> On 5/14/20 2:30 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>> 
>>> On 5/12/20 3:17 PM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>>> +/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
>>>>>> +   not use the current signal mask for the new thread, but set it to
>>>>>> +   *NEW_SIGMASK instead (without unblocking internal signals).  */
>>>
>>>
>>> Suggest:
>>>
>>> /* Exactly like pthread_create if NEW_SIGMASK is NULL.
>>>    Create the new thread using the thread descriptor at NEWTHREAD,
>>>    and the thread attributes from *ATTR, executing provided START_ROUTINE
>>>    with a single void argument ARG.  The last argument is not present in
>>>    the public API for pthread_create, but is part of the internal glibc API.
>>>    The last argument is the signal mask to be restored in the new thread,
>>>    and is normally NULL, in which case the parent's signal mask is restored
>>>    as mandated by the standard's requirement for pthread_create.  In some cases
>>>    though you may wish to keep certain signals blocked to avoid race cases,
>>>    and today this includes the helper thread created by SIGEV_THREAD where all
>>>    signals are blocked except SIGCANCEL.  Thus the new_sigmask argument is
>>>    primarily intended for the creation of helper threads not user threads.  */
>> 
>> Does this mean you think this interface is more generally useful?
>
> It might. I haven't reviewed C11 threads, getaddrinfo_a or aio uses.
>
> Do we need to judge that today?
>
> My opinion is that if a patch creates new internal API, then that API should
> receive documentation in the form of a comment.

I've now turned this into a real API:

  <https://sourceware.org/pipermail/libc-alpha/2020-May/114065.html>

Thanks,
Florian
  

Patch

diff --git a/nptl/Versions b/nptl/Versions
index f7140277f5..17a711dfb1 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -285,5 +285,6 @@  libpthread {
     __pthread_barrier_init; __pthread_barrier_wait;
     __shm_directory;
     __libpthread_freeres;
+    __pthread_create_internal;
   }
 }
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index c4e72f57a9..4e065ace58 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -339,6 +339,16 @@  extern void __pthread_cleanup_upto (__jmp_buf target, char *targetframe);
 hidden_proto (__pthread_cleanup_upto)
 #endif
 
+/* Exactly like pthread_create if NEW_SIGMASK == NULL.  Otherwise, do
+   not use the current signal mask for the new thread, but set it to
+   *NEW_SIGMASK instead (without unblocking internal signals).  */
+extern int __pthread_create_internal (pthread_t *newthread,
+				      const pthread_attr_t *attr,
+				      void *(*start_routine) (void *),
+				      void *arg, const sigset_t *new_sigmask);
+#if IS_IN (libpthread)
+hidden_proto (__pthread_create_internal)
+#endif
 
 /* Functions with versioned interfaces.  */
 extern int __pthread_create_2_1 (pthread_t *newthread,
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index afd379e89a..2430d65723 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -603,10 +603,10 @@  report_thread_creation (struct pthread *pd)
   return false;
 }
 
-
 int
-__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
-		      void *(*start_routine) (void *), void *arg)
+__pthread_create_internal (pthread_t *newthread, const pthread_attr_t *attr,
+			   void *(*start_routine) (void *), void *arg,
+			   const sigset_t *new_sigmask)
 {
   STACK_VARIABLES;
 
@@ -762,14 +762,21 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   sigset_t original_sigmask;
   __libc_signal_block_all (&original_sigmask);
 
-  /* Conceptually, the new thread needs to inherit the signal mask of
-     this thread.  Therefore, it needs to restore the saved signal
-     mask of this thread, so save it in the startup information.  */
-  pd->sigmask = original_sigmask;
+  if (new_sigmask != NULL)
+    /* The caller supplied the signal mask for the new thread.  */
+    pd->sigmask = *new_sigmask;
+  else
+    {
+      /* Conceptually, the new thread needs to inherit the signal mask
+	 of this thread.  Therefore, it needs to restore the saved
+	 signal mask of this thread, so save it in the startup
+	 information.  */
+      pd->sigmask = original_sigmask;
 
-  /* Reset the cancellation signal mask in case this thread is running
-     cancellation.  */
-  __sigdelset (&pd->sigmask, SIGCANCEL);
+      /* Reset the cancellation signal mask in case this thread is
+	 running cancellation.  */
+      __sigdelset (&pd->sigmask, SIGCANCEL);
+    }
 
   /* Start the thread.  */
   if (__glibc_unlikely (report_thread_creation (pd)))
@@ -873,6 +880,15 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 
   return retval;
 }
+hidden_def (__pthread_create_internal)
+
+int
+__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
+		      void *(*start_routine) (void *), void *arg)
+{
+  return __pthread_create_internal (newthread, attr, start_routine, arg,
+				    false);
+}
 versioned_symbol (libpthread, __pthread_create_2_1, pthread_create, GLIBC_2_1);
 
 
diff --git a/sysdeps/unix/sysv/linux/timer_routines.c b/sysdeps/unix/sysv/linux/timer_routines.c
index 63083f6f91..7a5fa3dbb2 100644
--- a/sysdeps/unix/sysv/linux/timer_routines.c
+++ b/sysdeps/unix/sysv/linux/timer_routines.c
@@ -136,23 +136,19 @@  __start_helper_thread (void)
   (void) pthread_attr_init (&attr);
   (void) pthread_attr_setstacksize (&attr, __pthread_get_minstack (&attr));
 
-  /* Block all signals in the helper thread but SIGSETXID.  To do this
-     thoroughly we temporarily have to block all signals here.  The
-     helper can lose wakeups if SIGTIMER is not blocked throughout.  */
-  sigset_t ss;
-  __libc_signal_block_app (&ss);
-  __libc_signal_block_sigtimer (NULL);
+  /* Block all signals in the helper thread but SIGSETXID.  */
+  sigset_t new_sigmask;
+  __sigfillset (&new_sigmask);
+  __sigdelset (&new_sigmask, SIGSETXID);
 
   /* Create the helper thread for this timer.  */
   pthread_t th;
-  int res = pthread_create (&th, &attr, timer_helper_thread, NULL);
+  int res = __pthread_create_internal (&th, &attr, timer_helper_thread, NULL,
+				       &new_sigmask);
   if (res == 0)
     /* We managed to start the helper thread.  */
     __helper_tid = ((struct pthread *) th)->tid;
 
-  /* Restore the signal mask.  */
-  __libc_signal_restore_set (&ss);
-
   /* No need for the attribute anymore.  */
   (void) pthread_attr_destroy (&attr);