[v2,00/14] Dynamic TLS related data race fixes

Message ID cover.1618301209.git.szabolcs.nagy@arm.com
Headers
Series Dynamic TLS related data race fixes |

Message

Szabolcs Nagy April 13, 2021, 8:17 a.m. UTC
  This series is trying to address some long standing dynamic
linker races:

19329 - race between tls access, pthread_create and dlopen
27111 - race between tls access, pthread_create and dlclose
27137 - x86: lazy tlsdesc relocation is racy

and there are a number of related issues fixed:

27135 - dtv gaps are not reused
27136 - dtv off-by-1 error
19924 - slow tls access after dlopen
27721 - x86: ld audit does not work with bind now tlsdesc

there are related issues that are not fixed, but tried to
not make the situation worse:

16133 - tls access is not as-safe
16134 - tls allocation aborts on oom
27112 - generation count may overflow on 32 bit targets

some of the patches can be handled independently.

The set is also available in the nsz/bug19329-v2 branch.

A common theme between the races is that the GL(dl_load_lock)
lock protects various dynamic linker shared global state that
are written under that lock, but in various situations (lazy
binding, tls access, thread creation) they are read without
the lock.

There seems to be consensus to not require relaxed atomic loads
for non-racy reads of objects that may be accessed atomically
elsewhere. The wiki with the concurrency rules have not been
updated yet. In this patch relaxed atomics is used if an access
is racy but synchronization is only missing when that is not
relied on (e.g. tls access must be synchronized with dlopen by
the user so unsynchronized dtv updates are not relied on).

I ran the glibc tests on x86_64, i386 and aarch64 linux (but
this may not cover the x86 tlsdesc changes completely).

v2:
- committed the fix for bug 27403 (arch64 memleak on dlclose)
- committed the fix for bug 21349 (a lazy symbol binding race)
- added an RFC fix for bug 19924 (slow tls after dlopen)
- reordered test patches after bug fixes.
- addressed previous review comments (see individual patches).

Szabolcs Nagy (14):
  elf: Fix a DTV setup issue [BZ #27136]
  elf: Add a DTV setup test [BZ #27136]
  elf: Fix comments and logic in _dl_add_to_slotinfo
  elf: Refactor _dl_update_slotinfo to avoid use after free
  elf: Fix data races in pthread_create and TLS access [BZ #19329]
  elf: Use relaxed atomics for racy accesses [BZ #19329]
  elf: Add test case for [BZ #19329]
  elf: Fix DTV gap reuse logic [BZ #27135]
  x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
  i386: Avoid lazy relocation of tlsdesc [BZ #27137]
  x86_64: Remove lazy tlsdesc relocation related code
  i386: Remove lazy tlsdesc relocation related code
  elf: Remove lazy tlsdesc relocation related code
  RFC elf: Fix slow tls access after dlopen [BZ #19924]

 elf/Makefile                |  15 ++-
 elf/dl-close.c              |  26 ++--
 elf/dl-open.c               |  21 ++--
 elf/dl-reloc.c              |   5 +-
 elf/dl-tls.c                | 161 ++++++++++++++-----------
 elf/tlsdeschtab.h           |  53 +--------
 elf/tst-tls20.c             |  98 +++++++++++++++
 elf/tst-tls20mod-bad.c      |   2 +
 elf/tst-tls21.c             |  68 +++++++++++
 elf/tst-tls21mod.c          |   1 +
 sysdeps/aarch64/tlsdesc.c   |   1 -
 sysdeps/arm/tlsdesc.c       |   1 -
 sysdeps/generic/ldsodefs.h  |   3 +-
 sysdeps/i386/dl-machine.h   |  76 ++++++------
 sysdeps/i386/dl-tlsdesc.S   | 156 ------------------------
 sysdeps/i386/dl-tlsdesc.h   |   6 +-
 sysdeps/i386/tlsdesc.c      | 230 ------------------------------------
 sysdeps/x86_64/dl-machine.h |  23 ++--
 sysdeps/x86_64/dl-tls.c     |   5 +-
 sysdeps/x86_64/dl-tlsdesc.S | 104 ----------------
 sysdeps/x86_64/dl-tlsdesc.h |   4 +-
 sysdeps/x86_64/tlsdesc.c    | 108 -----------------
 22 files changed, 361 insertions(+), 806 deletions(-)
 create mode 100644 elf/tst-tls20.c
 create mode 100644 elf/tst-tls20mod-bad.c
 create mode 100644 elf/tst-tls21.c
 create mode 100644 elf/tst-tls21mod.c