[v2,0/4] Simplify internal single-threaded usage

Message ID 20220610163552.3587064-1-adhemerval.zanella@linaro.org

Adhemerval Zanella Netto June 10, 2022, 4:35 p.m. UTC
glibc currently has three different internal ways to check whether a
process is single-threaded: the exported global variable
__libc_single_threaded, the internal-only __libc_multiple_threads, and
multiple_threads, a variant used by some architectures and allocated in
the TCB.  In addition, each port can define SINGLE_THREAD_BY_GLOBAL to
use either __libc_multiple_threads or multiple_threads.

__libc_single_threaded and __libc_multiple_threads have essentially
the same semantics: both are global variables whose value is not reset
once the process becomes multi-threaded.  The issue with using
__libc_single_threaded internally is that, since it is accessed through
a copy relocation, both values must be updated.  This is fixed in the
first patch.
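
For context, the exported variable is typically consumed along these
lines by code outside glibc.  This is only a minimal sketch, assuming
glibc 2.32 or later (which provides <sys/single_threaded.h>); the
refcount type and its helpers are hypothetical names used purely for
illustration and are not part of this series.

```c
#include <stdatomic.h>
#include <sys/single_threaded.h>   /* Assumes glibc 2.32 or later.  */

/* Hypothetical reference counter, for illustration only.  */
struct refcount
{
  atomic_uint value;
};

/* Increment the counter, skipping the atomic read-modify-write while
   the process is known to be single-threaded.  */
static inline void
refcount_inc (struct refcount *r)
{
  if (__libc_single_threaded)
    {
      /* No other thread exists, so a relaxed load followed by a
         relaxed store is sufficient.  */
      unsigned int old
        = atomic_load_explicit (&r->value, memory_order_relaxed);
      atomic_store_explicit (&r->value, old + 1, memory_order_relaxed);
    }
  else
    atomic_fetch_add_explicit (&r->value, 1, memory_order_acq_rel);
}

/* Read the current counter value.  */
static inline unsigned int
refcount_get (struct refcount *r)
{
  return atomic_load_explicit (&r->value, memory_order_relaxed);
}
```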

The second patch replaces __libc_multiple_threads with
__libc_single_threaded, while also fixing a bug where architectures that
define SINGLE_THREAD_BY_GLOBAL do not enable the optimization.

The third patch replaces multiple_threads with __libc_single_threaded,
to simplify a possible single-thread lock optimization.  On most
architectures, accessing an internal global variable should be as fast
as going through the TCB (it seems that using the TCB would be faster
only on legacy ABIs that require an extra code sequence to materialize
global access, such as i686 and sparc; however, this is mitigated when
SINGLE_THREAD_P is accessed in large code blocks).

x86 is the only architecture that optimizes the lock access directly by
reimplementing atomic operations.  In this case, the affected
implementations are rewritten to use the SINGLE_THREAD_P macro, while
some unused macros are just removed (for instance atomic_add_zero).  The idea
is to just phase out this specific atomic implementation in favor of
compiler builtins and move the single-thread optimization to be
arch-neutral.
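
As a rough illustration of that direction (not the code from this
series), an arch-neutral operation could branch on a single-thread
check and fall back to a compiler builtin.  The sketch below assumes
GCC or Clang and glibc 2.32 or later; SKETCH_SINGLE_THREAD_P and
sketch_exchange_and_add are hypothetical stand-ins, since the internal
SINGLE_THREAD_P macro is not visible outside glibc.

```c
#include <sys/single_threaded.h>   /* Assumes glibc 2.32 or later.  */

/* Hypothetical stand-in for glibc's internal SINGLE_THREAD_P, built on
   the exported variable.  */
#define SKETCH_SINGLE_THREAD_P (__libc_single_threaded != 0)

/* Hypothetical helper: add VALUE to *MEM and return the previous
   contents.  Uses a plain load and store while single-threaded and the
   __atomic_fetch_add compiler builtin otherwise.  */
static inline int
sketch_exchange_and_add (int *mem, int value)
{
  if (SKETCH_SINGLE_THREAD_P)
    {
      int old = *mem;
      *mem = old + value;
      return old;
    }
  return __atomic_fetch_add (mem, value, __ATOMIC_ACQ_REL);
}
```

Because the fast path is ordinary C and the slow path is a builtin,
nothing in such a helper is tied to a particular architecture.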

The last patch just removes the single-thread.h header and moves the
definition to the internal sys/single_threaded.h, so there is now only
one place to add such optimizations.

v2:
* Add RTLD_DEFAULT support for __libc_dlsym and use it instead of
  ___dlsym.
* Simplify the x86 atomic macros.

Adhemerval Zanella (4):
  misc: Optimize internal usage of __libc_single_threaded
  Replace __libc_multiple_threads with __libc_single_threaded
  Remove usage of TLS_MULTIPLE_THREADS_IN_TCB
  Remove single-thread.h

 elf/dl-libc.c                               |  20 +-
 elf/libc_early_init.c                       |   9 +
 include/sys/single_threaded.h               |  20 +-
 misc/single_threaded.c                      |   2 +
 misc/tst-atomic.c                           |   1 +
 nptl/Makefile                               |   1 -
 nptl/allocatestack.c                        |  12 -
 nptl/descr.h                                |  17 +-
 nptl/libc_multiple_threads.c                |  28 --
 nptl/pthread_cancel.c                       |   9 +-
 nptl/pthread_create.c                       |  11 +-
 sysdeps/generic/single-thread.h             |  25 -
 sysdeps/i386/htl/tcb-offsets.sym            |   1 -
 sysdeps/i386/nptl/tcb-offsets.sym           |   1 -
 sysdeps/i386/nptl/tls.h                     |   4 +-
 sysdeps/ia64/nptl/tcb-offsets.sym           |   1 -
 sysdeps/ia64/nptl/tls.h                     |   2 -
 sysdeps/mach/hurd/i386/tls.h                |   4 +-
 sysdeps/mach/hurd/sysdep-cancel.h           |   5 -
 sysdeps/nios2/nptl/tcb-offsets.sym          |   1 -
 sysdeps/or1k/nptl/tls.h                     |   2 -
 sysdeps/powerpc/nptl/tcb-offsets.sym        |   3 -
 sysdeps/powerpc/nptl/tls.h                  |   3 -
 sysdeps/s390/nptl/tcb-offsets.sym           |   1 -
 sysdeps/s390/nptl/tls.h                     |   6 +-
 sysdeps/sh/nptl/tcb-offsets.sym             |   1 -
 sysdeps/sh/nptl/tls.h                       |   2 -
 sysdeps/sparc/nptl/tcb-offsets.sym          |   1 -
 sysdeps/sparc/nptl/tls.h                    |   2 +-
 sysdeps/unix/sysdep.h                       |   2 +-
 sysdeps/unix/sysv/linux/aarch64/sysdep.h    |   2 -
 sysdeps/unix/sysv/linux/alpha/sysdep.h      |   2 -
 sysdeps/unix/sysv/linux/arc/sysdep.h        |   2 -
 sysdeps/unix/sysv/linux/arm/sysdep.h        |   2 -
 sysdeps/unix/sysv/linux/hppa/sysdep.h       |   2 -
 sysdeps/unix/sysv/linux/microblaze/sysdep.h |   2 -
 sysdeps/unix/sysv/linux/s390/sysdep.h       |   3 -
 sysdeps/unix/sysv/linux/single-thread.h     |  44 --
 sysdeps/unix/sysv/linux/x86_64/sysdep.h     |   2 -
 sysdeps/x86/atomic-machine.h                | 484 ++++++--------------
 sysdeps/x86_64/nptl/tcb-offsets.sym         |   1 -
 41 files changed, 199 insertions(+), 544 deletions(-)
 delete mode 100644 nptl/libc_multiple_threads.c
 delete mode 100644 sysdeps/generic/single-thread.h
 delete mode 100644 sysdeps/unix/sysv/linux/single-thread.h