[v2,00/19] Fix various NPTL synchronization issues

Message ID 20210823195047.543237-1-adhemerval.zanella@linaro.org
Headers
Series Fix various NPTL synchronization issues |

Message

Adhemerval Zanella Aug. 23, 2021, 7:50 p.m. UTC
  This is an update of my previous set to fix some NPTL issues [1].
The main changes are:

  - Rebased against master and adjusted the __clone_internal usage.
  - Adapted Florian's ESRCH fixes [2]
  - Add fixes for various function that access the 'tid'.

Patch 01 to 03 are general nptl fixes and they are independent of the
other fixes.

Patch 04 is the main change of this patchset, it uses a different
field instead of the pthread 'tid' to synchrnonize the internal
thread state (BZ#19951).

It allows to both move the thread setxid internal state out of
'cancelhandling' (used on setuid() call in multi-thread information),
and remove the EXITING_BIT and TERMINATED_BIT (since 'joinstate' now
track such it).  This is done on patch 05 and 06.

Patches 08 and 09 fixes two long standing issues regarding
pthread_kill() and thread exit (BZ#12889 and BZ#19193).  Now that]
'tid' is setting explicitly by pthread_create(), a simple lock can be
used instead of more complex futex operation.

Patches 10 to 18 extend the same 'tid' access fix to other pthread
functions that uses the member.

[1] https://patchwork.sourceware.org/project/glibc/list/?series=2304
[2] https://patchwork.sourceware.org/project/glibc/list/?series=2696

Adhemerval Zanella (16):
  nptl: Fix tst-cancel7 and tst-cancelx7 race condition (BZ #14232)
  nptl: Set cancellation type and state on pthread_exit
  nptl: Handle robust PI mutexes for !__ASSUME_SET_ROBUST_LIST
  nptl: Do not use pthread set_tid_address as state synchronization (BZ
    #19951)
  nptl: Move setxid flag out of cancelhandling
  nptl: Replace struct thread cancelhandling field
  nptl: Use tidlock when accessing TID on pthread_getaffinity_np
  nptl: Use tidlock when accessing TID on pthread_setaffinity
  nptl: Use tidlock when accessing TID on pthread_getcpuclockid
  nptl: Use tidlock when accessing TID on pthread_getschedparam
  nptl: Use tidlock when accessing TID on pthread_setschedparam
  nptl: Use tidlock when accessing TID on pthread_getname_np
  nptl: Use tidlock when accessing TID on pthread_setname_np
  nptl: Use tidlock when accessing TID on pthread_sigqueue
  nptl: Use tidlock when accessing TID on pthread_setschedprio
  nptl: Remove INVALID_TD_P

Florian Weimer (3):
  support: Add support_wait_for_thread_exit
  nptl: pthread_kill, pthread_cancel should fail after exit (bug 19193)
  nptl: Fix race between pthread_kill and thread exit (bug 12889)

 nptl/Makefile                                 |   3 +-
 nptl/Versions                                 |   1 +
 nptl/allocatestack.c                          |   6 +-
 nptl/cancellation.c                           |  17 +-
 nptl/descr.h                                  |  48 +++---
 nptl/nptl-stack.h                             |   2 +-
 nptl/nptl_free_tcb.c                          |  22 +--
 nptl/nptl_setxid.c                            |  49 ++----
 nptl/pthread_cancel.c                         |  20 +--
 nptl/pthread_clockjoin.c                      |   2 +-
 nptl/pthread_create.c                         | 124 ++++++++------
 nptl/pthread_detach.c                         |  36 ++--
 nptl/pthread_exit.c                           |   4 +-
 nptl/pthread_getaffinity.c                    |  22 ++-
 nptl/pthread_getattr_np.c                     |   2 +-
 nptl/pthread_getcpuclockid.c                  |  27 +--
 nptl/pthread_getname.c                        |  45 +++--
 nptl/pthread_getschedparam.c                  |  16 +-
 nptl/pthread_join.c                           |   2 +-
 nptl/pthread_join_common.c                    | 126 +++++---------
 nptl/pthread_kill.c                           |  21 ++-
 nptl/pthread_setaffinity.c                    |  22 ++-
 nptl/pthread_setname.c                        |  34 ++--
 nptl/pthread_setschedparam.c                  |  16 +-
 nptl/pthread_setschedprio.c                   |  17 +-
 nptl/pthread_sigqueue.c                       |  52 +++---
 nptl/pthread_testcancel.c                     |  11 +-
 nptl/pthread_timedjoin.c                      |   2 +-
 nptl/pthread_tryjoin.c                        |  18 +-
 nptl/tst-cancel7.c                            | 114 ++++++-------
 nptl/tst-cleanup5.c                           | 157 ++++++++++++++++++
 nptl_db/structs.def                           |   2 +-
 nptl_db/td_thr_get_info.c                     |  16 +-
 nptl_db/td_thr_getfpregs.c                    |   9 +-
 nptl_db/td_thr_getgregs.c                     |   9 +-
 nptl_db/td_thr_setfpregs.c                    |   9 +-
 nptl_db/td_thr_setgregs.c                     |   9 +-
 support/Makefile                              |   3 +-
 support/support.h                             |   4 +
 support/support_wait_for_thread_exit.c        |  72 ++++++++
 sysdeps/hppa/nptl/tcb-offsets.sym             |   1 -
 sysdeps/i386/nptl/tcb-offsets.sym             |   1 -
 sysdeps/nptl/dl-tls_init_tp.c                 |   4 +-
 sysdeps/nptl/libc_start_call_main.h           |   7 +
 sysdeps/nptl/pthreadP.h                       |  27 +--
 sysdeps/pthread/Makefile                      |   7 +-
 sysdeps/pthread/tst-kill4.c                   |  90 ----------
 sysdeps/pthread/tst-pthread_cancel-exited.c   |  45 +++++
 .../pthread/tst-pthread_cancel-select-loop.c  |  87 ++++++++++
 sysdeps/pthread/tst-pthread_kill-exited.c     |  46 +++++
 sysdeps/pthread/tst-pthread_kill-exiting.c    | 110 ++++++++++++
 sysdeps/pthread/tst-thrd-detach.c             |  16 +-
 sysdeps/sh/nptl/tcb-offsets.sym               |   1 -
 sysdeps/x86_64/nptl/tcb-offsets.sym           |   4 -
 54 files changed, 1037 insertions(+), 580 deletions(-)
 create mode 100644 nptl/tst-cleanup5.c
 create mode 100644 support/support_wait_for_thread_exit.c
 delete mode 100644 sysdeps/pthread/tst-kill4.c
 create mode 100644 sysdeps/pthread/tst-pthread_cancel-exited.c
 create mode 100644 sysdeps/pthread/tst-pthread_cancel-select-loop.c
 create mode 100644 sysdeps/pthread/tst-pthread_kill-exited.c
 create mode 100644 sysdeps/pthread/tst-pthread_kill-exiting.c
  

Comments

Florian Weimer Aug. 26, 2021, 2:47 p.m. UTC | #1
* Adhemerval Zanella:

> This is an update of my previous set to fix some NPTL issues [1].
> The main changes are:
>
>   - Rebased against master and adjusted the __clone_internal usage.
>   - Adapted Florian's ESRCH fixes [2]
>   - Add fixes for various function that access the 'tid'.
>
> Patch 01 to 03 are general nptl fixes and they are independent of the
> other fixes.
>
> Patch 04 is the main change of this patchset, it uses a different
> field instead of the pthread 'tid' to synchrnonize the internal
> thread state (BZ#19951).
>
> It allows to both move the thread setxid internal state out of
> 'cancelhandling' (used on setuid() call in multi-thread information),
> and remove the EXITING_BIT and TERMINATED_BIT (since 'joinstate' now
> track such it).  This is done on patch 05 and 06.
>
> Patches 08 and 09 fixes two long standing issues regarding
> pthread_kill() and thread exit (BZ#12889 and BZ#19193).  Now that]
> 'tid' is setting explicitly by pthread_create(), a simple lock can be
> used instead of more complex futex operation.
>
> Patches 10 to 18 extend the same 'tid' access fix to other pthread
> functions that uses the member.

I don't think this series of patches is suitable for backport to glibc
2.34 once completed.  The libpthreaddb changes look particularly
cumbersome because you'll need two versions of the library depending
which coredumps you are investigating.  However, I expect that we need
to fix the pthread_cancel race in glibc 2.34.

I can send my previous attempt with a straightforward lock (and perhaps
with the callback-based function removed).

However, I'd like to know what people think about relying on signal
unblocking delivering the signal that was sent to the thread itself.
Do we need to special-case the pthread_self case or not?

Thanks,
Florian
  
Adhemerval Zanella Aug. 26, 2021, 6:19 p.m. UTC | #2
On 26/08/2021 11:47, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> This is an update of my previous set to fix some NPTL issues [1].
>> The main changes are:
>>
>>   - Rebased against master and adjusted the __clone_internal usage.
>>   - Adapted Florian's ESRCH fixes [2]
>>   - Add fixes for various function that access the 'tid'.
>>
>> Patch 01 to 03 are general nptl fixes and they are independent of the
>> other fixes.
>>
>> Patch 04 is the main change of this patchset, it uses a different
>> field instead of the pthread 'tid' to synchrnonize the internal
>> thread state (BZ#19951).
>>
>> It allows to both move the thread setxid internal state out of
>> 'cancelhandling' (used on setuid() call in multi-thread information),
>> and remove the EXITING_BIT and TERMINATED_BIT (since 'joinstate' now
>> track such it).  This is done on patch 05 and 06.
>>
>> Patches 08 and 09 fixes two long standing issues regarding
>> pthread_kill() and thread exit (BZ#12889 and BZ#19193).  Now that]
>> 'tid' is setting explicitly by pthread_create(), a simple lock can be
>> used instead of more complex futex operation.
>>
>> Patches 10 to 18 extend the same 'tid' access fix to other pthread
>> functions that uses the member.
> 
> I don't think this series of patches is suitable for backport to glibc
> 2.34 once completed.  The libpthreaddb changes look particularly
> cumbersome because you'll need two versions of the library depending
> which coredumps you are investigating.  However, I expect that we need
> to fix the pthread_cancel race in glibc 2.34.

I thin for fix the pthread_cancel race on glibc 2.34 only the fix
for BZ#19951 along with your patch should be suffice.  I don't think
this is cumbersome, but I didn't considered the coredump (which I
also don't think it should be blocker though).

> 
> I can send my previous attempt with a straightforward lock (and perhaps
> with the callback-based function removed).
> 
> However, I'd like to know what people think about relying on signal
> unblocking delivering the signal that was sent to the thread itself.
> Do we need to special-case the pthread_self case or not?
> 
> Thanks,
> Florian
>