Patchwork RFC: Implement __libc_single_threaded support

Submitter Florian Weimer
Date Feb. 4, 2019, 11:08 a.m.
Message ID <87munbu7z3.fsf@oldenburg2.str.redhat.com>
Permalink /patch/31295/
State New

Comments

Florian Weimer - Feb. 4, 2019, 11:08 a.m.
I'd like to see some feedback on the patch below.  I think this is a
useful addition, among other things it's important for a decent
std::shared_ptr implementation.

It will allow us to replace __libc_multiple_threads inside glibc.
The new mechanism is both more accurate (joining the last thread brings
us back to single-threaded mode) and more reliable (the broadcast works
across dlmopen).  It also works in any shared object.

We could pick a different name for the header file, such as
<gnu/single_threaded.h>.  Having a dedicated header file is relevant to
the libstdc++ headers, I think, because they can then use __has_include
to determine whether the installed glibc version supports this facility.
If we snuck this into <pthread.h>, that would not be possible.
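
A minimal sketch of the __has_include detection described above, assuming
the header keeps the proposed name <sys/single_threaded.h>.  The macro
HAVE_LIBC_SINGLE_THREADED and the helper process_is_single_threaded are
hypothetical names used only for illustration:

```c
#include <stdbool.h>

/* Probe for the dedicated header.  The nested #if keeps compilers
   without __has_include from seeing the function-like use.  */
#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define HAVE_LIBC_SINGLE_THREADED 1   /* hypothetical macro name */
# endif
#endif

/* Hypothetical helper: reports single-threaded mode only when the
   installed C library provides the facility.  */
static bool
process_is_single_threaded (void)
{
#ifdef HAVE_LIBC_SINGLE_THREADED
  return __libc_single_threaded;
#else
  /* Without the header, conservatively assume multiple threads.  */
  return false;
#endif
}
```

The fallback returns false because treating a process as multi-threaded
is always safe, only slower.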

The internal namespace (the __libc_ prefix) is a bit awkward, but I
think it's the only way to keep things namespace-clean for use in C++
templates.

The variable could be of type int instead of _Bool, but I note that
Richard Henderson's LSE atomics optimization uses _Bool as well.

Thanks,
Florian
-----
Implement __libc_single_threaded support

The variable is defined in libc_nonshared.a as a hidden symbol, so
that it can be read efficiently even on architectures which have
limited support for position-independent code.  To update it when the
process goes multi-threaded, its address is registered in the link map
for the corresponding object, via an ELF constructor which is supposed
to run early.  The variable starts out as false, so that it is
conservatively correct even if the ELF constructor has not yet run.

__libc_single_threaded is in the reserved namespace, so that other
toolchain libraries (such as libstdc++ in the implementation of
std::shared_ptr) can use it, without causing namespace issues.
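
As a rough illustration of the kind of use this is aimed at, here is a
sketch of a reference count that skips atomic read-modify-write
operations while the process is single-threaded.  This is not libstdc++
code; demo_single_threaded is a local stand-in for the proposed
variable, and struct refcounted, ref_acquire, ref_release and
demo_refcount are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-in for __libc_single_threaded (hypothetical).  */
static bool demo_single_threaded = true;

struct refcounted
{
  atomic_long refs;
};

static void
ref_acquire (struct refcounted *obj)
{
  if (demo_single_threaded)
    {
      /* No other thread exists: a plain load and store is enough.  */
      long n = atomic_load_explicit (&obj->refs, memory_order_relaxed);
      atomic_store_explicit (&obj->refs, n + 1, memory_order_relaxed);
    }
  else
    atomic_fetch_add_explicit (&obj->refs, 1, memory_order_relaxed);
}

/* Returns true when the last reference has been dropped.  */
static bool
ref_release (struct refcounted *obj)
{
  if (demo_single_threaded)
    {
      long n = atomic_load_explicit (&obj->refs, memory_order_relaxed) - 1;
      atomic_store_explicit (&obj->refs, n, memory_order_relaxed);
      return n == 0;
    }
  return atomic_fetch_sub_explicit (&obj->refs, 1,
                                    memory_order_acq_rel) == 1;
}

/* Start at one reference, take two more, drop one.  */
static long
demo_refcount (void)
{
  struct refcounted obj;
  atomic_init (&obj.refs, 1);
  ref_acquire (&obj);
  ref_acquire (&obj);
  ref_release (&obj);
  return atomic_load (&obj.refs);
}
```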

For static and static dlopen scenarios, slightly different code is
used.  In the static dlopen case, nothing can be done, and the process
is marked as multi-threaded in both the inner and outer libcs.

In the nptl implementation, a stronger memory order is required for
access to the __nptl_nthreads variable because we must be sure that a
thread exit (including a detached thread exit) happens before the read
of a value of 1, indicating single-threaded mode.
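
The ordering requirement can be sketched with C11 atomics.  This is a
simplified model under stated assumptions, not the nptl code:
demo_nthreads stands in for __nptl_nthreads, and all function names are
hypothetical.  The release decrement on thread exit pairs with the
acquire load, so reading 1 implies the exiting thread's earlier writes
are visible:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint demo_nthreads = 1;      /* stand-in for __nptl_nthreads */
static bool demo_single_threaded = true;   /* stand-in for the indicator */

/* Run by the creating thread before the new thread starts.  */
static void
demo_before_thread_create (void)
{
  if (atomic_fetch_add_explicit (&demo_nthreads, 1,
                                 memory_order_relaxed) == 1)
    demo_single_threaded = false;
}

/* Run by an exiting thread as its last action.  */
static void
demo_thread_exit (void)
{
  unsigned int old = atomic_fetch_sub_explicit (&demo_nthreads, 1,
                                                memory_order_release);
  assert (old > 1);
}

/* Run by the remaining thread, for example after a join.  */
static void
demo_after_join (void)
{
  if (atomic_load_explicit (&demo_nthreads, memory_order_acquire) == 1)
    demo_single_threaded = true;
}

/* Simulates create, exit and join on one thread.  Bit 0 records the
   indicator after creation, bit 1 the indicator after the join.  */
static int
demo_transitions (void)
{
  int result = 0;
  demo_before_thread_create ();
  result |= demo_single_threaded ? 1 : 0;
  demo_thread_exit ();
  demo_after_join ();
  result |= demo_single_threaded ? 2 : 0;
  return result;
}
```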

The idea for a hidden variable that indicates single-threaded mode was
inspired by Richard Henderson's out-of-line LSE atomics for AArch64
and the following discussion:

  <https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01484.html>

2019-02-04  Florian Weimer  <fweimer@redhat.com>

	Add the __libc_single_threaded variable.
	* misc/sys/single_threaded.h: New file.
	* misc/Makefile (headers): Add sys/single_threaded.h.
	(routines, static-only-routines): Add single_threaded.
	(CFLAGS-single_threaded.c): Compile with -Wno-error.
	* dlfcn/dlmopen.c [!SHARED] (dlmopen): Call
	__dl_single_threaded_dlopen_called.
	* dlfcn/dlopen.c [!SHARED] (dlopen): Likewise.
	* elf/Makefile (routines, elide-routines.os): Add
	dl-single_threaded_broadcast-static.
	(rtld-routines): Add dl-single_threaded_register.
	(tests-static-normal): Add tst-single_threaded-static,
	tst-single_threaded-pthread-static.
	[$(build-shared)] (tests-static): Add
	tst-single_threaded-static-dlopen.
	[$(build-shared)] (static-dlopen-environment): New variable.
	[$(build-shared)] (tst-tls9-static-ENV): Use it.
	[$(build-shared)] (tst-single_threaded-static-dlopen-ENV): Likewise.
	(tests): Add tst-single_threaded, tst-single_threaded-pthread.
	(modules-names): Add tst-single_threaded-mod1,
	tst-single_threaded-mod2, tst-single_threaded-mod3,
	tst-single_threaded-mod4.
	(tst-single_threaded): Link with tst-single_threaded-mod1.so,
	libdl.so.
	(tst-single_threaded.out): Depend on tst-single_threaded-mod2.so,
	tst-single_threaded-mod3.so.
	(tst-single_threaded-static-dlopen): Link with
	tst-single_threaded-mod1.o, libdl.a.
	(tst-single_threaded-static-dlopen.out): Depend on
	tst-single_threaded-mod2.so.
	(tst-single_threaded-pthread): Link with
	tst-single_threaded-mod1.so, libdl.so, libpthread.
	(tst-single_threaded-pthread.out): Depend on
	tst-single_threaded-mod2.so, tst-single_threaded-mod3.so,
	tst-single_threaded-mod4.so.
	(tst-single_threaded-pthread-static): Link with libpthread.
	* elf/Versions (ld GLIBC_2.30): Add
	__libc_single_threaded_register.
	* elf/dl-libc.c (__libc_dlopen_mode): Call
	__dl_single_threaded_dlopen_called.
	* elf/dl-single_threaded_broadcast.c: New file.
	* elf/dl-single_threaded_broadcast-static.c: Likewise.
	* elf/dl-single_threaded_register.c: Likewise.
	* elf/rtld.c (__libc_single_threaded): Define.
	(_dl_start_final): Initialize __libc_single_threaded and
	GL(dl_rtld_map).l_single_threaded.
	* elf/tst-single_threaded.c: New file.
	* elf/tst-single_threaded-pthread.c: Likewise.
	* elf/tst-single_threaded-pthread-static.c: Likewise.
	* elf/tst-single_threaded-static.c: Likewise.
	* elf/tst-single_threaded-static-dlopen.c: Likewise.
	* elf/tst-single_threaded-mod1.c: Likewise.
	* elf/tst-single_threaded-mod2.c: Likewise.
	* elf/tst-single_threaded-mod3.c: Likewise.
	* elf/tst-single_threaded-mod4.c: Likewise.
	* include/link.h (struct link_map): Add l_single_threaded.
	* include/sys/single_threaded.h: New file.
	* manual/threads.texi (Single-Threaded): New node.
	* nptl/Makefile (libpthread-routines): Add
	nptl-single_threaded_broadcast.
	* nptl/allocatestack.c (__reclaim_stacks): Call
	__dl_single_threaded_broadcast.
	* nptl/nptl-single_threaded_broadcast.c: New file.
	* nptl/pthread_create.c (__nptl_deallocate_tsd): Use stronger
	memory order for __nptl_nthreads access.
	(START_THREAD_DEFN): Likewise.
	(__pthread_create_2_1): Likewise.  Call
	__dl_single_threaded_broadcast when transitioning to and from
	multi-threaded.
	* nptl/pthread_join_common.c (__pthread_timedjoin_ex): Call
	__dl_single_threaded_broadcast when transitioning to
	single-threaded.
	* sysdeps/mach/hurd/i386/ld.abilist (GLIBC_2.30): Add
	__libc_single_threaded_register.
	* sysdeps/unix/sysv/linux/aarch64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/alpha/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/arm/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/csky/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/hppa/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/i386/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/ia64/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/microblaze/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/mips/mips32/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/nios2/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
	(GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
	(GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
	(GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/sh/ld.abilist (GLIBC_2.30): Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/ld.abilist (GLIBC_2.30):
	Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist (GLIBC_2.30):
	Likewise.
Rich Felker - Feb. 4, 2019, 6:09 p.m.
On Mon, Feb 04, 2019 at 12:08:16PM +0100, Florian Weimer wrote:
> I'd like to see some feedback on the patch below.  I think this is a
> useful addition, among other things it's important for a decent
> std::shared_ptr implementation.
> 
> It will allow us to replace __libc_multiple_threads inside glibc.
> The new mechanism is both more accurate (joining the last thread brings
> us back to single-threaded mode) and more reliable (the broadcast works
> across dlmopen).  It also works in any shared object.
> 
> We could pick a different name for the header file, such as
> <gnu/single_threaded.h>.  Having a dedicated header file is relevant to
> the libstdc++ headers, I think, because they can then use __has_include
> to determine whether the installed glibc version supports this facility.
> If we snuck this into <pthread.h>, that would not be possible.
> 
> The internal namespace (the __libc_ prefix) is a bit awkward, but I
> think it's the only way to keep things namespace-clean for use in C++
> templates.
> 
> The variable could be of type int instead of _Bool, but I note that
> Richard Henderson's LSE atomics optimization uses _Bool as well.
> 
> Thanks,
> Florian
> -----
> Implement __libc_single_threaded support
> 
> The variable is defined in libc_nonshared.a as a hidden symbol, so
> that it can be read efficiently even on architectures which have
> limited support for position-independent code.  To update it when the

I think this assessment of where it helps is incorrect. Let's look at
what would happen if it were just a normal external data object from
libc.so:

1. If you're accessing it from a non-PIE main program, there is no
   overhead because it's just a copy relocation and gets a fixed
   address.

2. If you're accessing it from a PIE main program, it can still be a
   copy relocation, but the access is PC-relative. Or it can use:

3. If you're accessing it from a shared library, the address of the
   object is loaded via a PC-relative load from the GOT.

In case 1, your optimization makes no difference at all. In case 2, it
also makes no difference; either way there's a PC-relative access.
Only in case 3 is there a difference; with your optimization, a
PC-relative access to the data replaces a PC-relative access to the
GOT slot followed by an indirection.

So all your optimization is doing is saving one level of indirection
for access from shared libraries. It does not save any expensive setup
for PC-relative access on archs where it's expensive.

If you do away with this, there is no complex init-time setup, no O(n)
updating of local copies when threads are created/destroyed, no
dynamic management of a list of copies, no synchronization with
dlopen/dlclose, etc. The standard relocation mechanisms just do their
thing and the only cost is replacing:

	mov __libc_single_threaded(%rip),%eax

with:

	mov __libc_single_threaded@GOT(%rip),%rax
	mov (%rax),%eax

and equivalent on all other archs. Saving one indirection does not
seem to be worth all this heavy machinery that imposes constraints on
the implementation and various costs, failure modes, and bug surface.
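
For reference, the two access patterns compared above can be reproduced
with a small C file built as position-independent code for a shared
object.  This is only a sketch; the variable and function names are
hypothetical, and the exact instructions depend on the compiler and
target:

```c
/* Default visibility: in a shared object the symbol may be interposed,
   so the compiler typically loads its address from the GOT first.  */
int demo_flag_default = 1;

/* Hidden visibility: the symbol binds locally, so the compiler can
   typically use a direct PC-relative access.  */
__attribute__ ((visibility ("hidden"))) int demo_flag_hidden = 1;

int
read_default (void)
{
  return demo_flag_default;
}

int
read_hidden (void)
{
  return demo_flag_hidden;
}
```

Compiling with -O2 -fPIC -S and comparing read_default with read_hidden
should show the extra indirection in the former.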

Rich
Florian Weimer - Feb. 4, 2019, 6:54 p.m.
* Rich Felker:

> So all your optimization is doing is saving one level of indirection
> for access from shared libraries. It does not save any expensive setup
> for PC-relative access on archs where it's expensive.

That's correct.  I'd like to hear the point of view of architecture
maintainers on this matter.

I can prepare a patch using just one variable if someone wants to run
actual performance measurements.  Other aspects of the complexity will
remain, though.  I don't think the thread counter should be read
directly, but rather an infrequently updated variable, as in the current
patch.  Registration would go away, but the complicated rules for
updates of the single-threaded indicator in statically linked processes
which use dlopen would remain.  However, dlmopen support would come
naturally as long as we put the variable into the (globally shared)
dynamic loader.

Thanks,
Florian
Rich Felker - Feb. 4, 2019, 11:16 p.m.
On Mon, Feb 04, 2019 at 07:54:09PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > So all your optimization is doing is saving one level of indirection
> > for access from shared libraries. It does not save any expensive setup
> > for PC-relative access on archs where it's expensive.
> 
> That's correct.  I'd like to hear the point of view of architecture
> maintainers on this matter.
> 
> I can prepare a patch using just one variable if someone wants to run
> actual performance measurements.  Other aspects of the complexity will
> remain, though.  I don't think the thread counter should be read
> directly, but rather an infrequently updated variable, as in the current
> patch.  Registration would go away, but the complicated rules for
> updates of the single-threaded indicator in statically linked processes
> which use dlopen would remain.

I don't see how dlopen is involved at all once there's no more per-DSO
data object. The only complexity is synchronization around the change
from 1->0 or 0->1 when threads are created or exit.

Rich
Florian Weimer - Feb. 6, 2019, 11:59 a.m.
* Rich Felker:

> On Mon, Feb 04, 2019 at 07:54:09PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > So all your optimization is doing is saving one level of indirection
>> > for access from shared libraries. It does not save any expensive setup
>> > for PC-relative access on archs where it's expensive.
>> 
>> That's correct.  I'd like to hear the point of view of architecture
>> maintainers on this matter.
>> 
>> I can prepare a patch using just one variable if someone wants to run
>> actual performance measurements.  Other aspects of the complexity will
>> remain, though.  I don't think the thread counter should be read
>> directly, but rather an infrequently updated variable, as in the current
>> patch.  Registration would go away, but the complicated rules for
>> updates of the single-threaded indicator in statically linked processes
>> which use dlopen would remain.
>
> I don't see how dlopen is involved at all once there's no more per-DSO
> data object. The only complexity is synchronization around the change
> from 1->0 or 0->1 when threads are created or exit.

With static dlopen, there would still be two variables because the
static variable does not interpose the definition on the dynamic side.
(The dlmopen case also needs some more work, I just realized, due to the
current duplication of the __nptl_nthreads variable.)

Thanks,
Florian
Szabolcs Nagy - Feb. 6, 2019, 6:04 p.m.
On 04/02/2019 18:54, Florian Weimer wrote:
> * Rich Felker:
>
>> So all your optimization is doing is saving one level of indirection
>> for access from shared libraries. It does not save any expensive setup
>> for PC-relative access on archs where it's expensive.
>
> That's correct.  I'd like to hear the point of view of architecture
> maintainers on this matter.

exposing a __libc_single_threaded api sounds useful to me.

doing the libc_nonshared.a trick that makes it local so an
indirection can be saved, but requires updates to all dsos
on thread creation/exit, sounds like a hairy abi to expose,
but such optimization is certainly ok inside glibc (as i
understand libc.so and libpthread.so can already access
this info locally now, which we don't want to regress)

(on some targets tp-relative access may be even faster than
pc-relative access, so if we want this super optimized then
we could put it at a fixed magic tls offset and then there
is no need for O(n) operation to update it.. but a magic tls
offset abi is even worse so i think i prefer to just use a
normal extern object symbol in libc.so)

> I can prepare a patch using just one variable if someone wants to run
> actual performance measurements.  Other aspects of the complexity will
> remain, though.  I don't think the thread counter should be read
> directly, but rather an infrequently updated variable, as in the current
> patch.  Registration would go away, but the complicated rules for
> updates of the single-threaded indicator in statically linked processes
> which use dlopen would remain.  However, dlmopen support would come
> naturally as long as we put the variable into the (globally shared)
> dynamic loader.
>
> Thanks,
> Florian

Patch

diff --git a/dlfcn/dlmopen.c b/dlfcn/dlmopen.c
index db1d0922d5..5aa458e2a4 100644
--- a/dlfcn/dlmopen.c
+++ b/dlfcn/dlmopen.c
@@ -22,6 +22,7 @@ 
 #include <stddef.h>
 #include <unistd.h>
 #include <ldsodefs.h>
+#include <sys/single_threaded.h>
 
 #if !defined SHARED && IS_IN (libdl)
 
@@ -92,6 +93,7 @@  __dlmopen (Lmid_t nsid, const char *file, int mode DL_CALLER_DECL)
 # ifdef SHARED
   return _dlerror_run (dlmopen_doit, &args) ? NULL : args.new;
 # else
+  __dl_single_threaded_dlopen_called ();
   if (_dlerror_run (dlmopen_doit, &args))
     return NULL;
 
diff --git a/dlfcn/dlopen.c b/dlfcn/dlopen.c
index 7fd33480d3..67f08561f7 100644
--- a/dlfcn/dlopen.c
+++ b/dlfcn/dlopen.c
@@ -21,6 +21,7 @@ 
 #include <stddef.h>
 #include <unistd.h>
 #include <ldsodefs.h>
+#include <sys/single_threaded.h>
 
 #if !defined SHARED && IS_IN (libdl)
 
@@ -86,6 +87,7 @@  __dlopen (const char *file, int mode DL_CALLER_DECL)
 # ifdef SHARED
   return _dlerror_run (dlopen_doit, &args) ? NULL : args.new;
 # else
+  __dl_single_threaded_dlopen_called ();
   if (_dlerror_run (dlopen_doit, &args))
     return NULL;
 
diff --git a/elf/Makefile b/elf/Makefile
index 9cf5cd8dfd..a7634ba672 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -25,7 +25,7 @@  headers		= elf.h bits/elfclass.h link.h bits/link.h
 routines	= $(all-dl-routines) dl-support dl-iteratephdr \
 		  dl-addr dl-addr-obj enbl-secure dl-profstub \
 		  dl-origin dl-libc dl-sym dl-sysdep dl-error \
-		  dl-reloc-static-pie
+		  dl-reloc-static-pie dl-single_threaded_broadcast-static
 
 # The core dynamic linking functions are in libc for the static and
 # profiled libraries.
@@ -53,12 +53,13 @@  endif
 all-dl-routines = $(dl-routines) $(sysdep-dl-routines)
 # But they are absent from the shared libc, because that code is in ld.so.
 elide-routines.os = $(all-dl-routines) dl-support enbl-secure dl-origin \
-		    dl-sysdep dl-exception dl-reloc-static-pie
+		    dl-sysdep dl-exception dl-reloc-static-pie \
+		    dl-single_threaded_broadcast-static
 
 # ld.so uses those routines, plus some special stuff for being the program
 # interpreter and operating independent of libc.
 rtld-routines	= rtld $(all-dl-routines) dl-sysdep dl-environ dl-minimal \
-  dl-error-minimal dl-conflict
+  dl-error-minimal dl-conflict dl-single_threaded_register
 all-rtld-routines = $(rtld-routines) $(sysdep-rtld-routines)
 
 CFLAGS-dl-runtime.c += -fexceptions -fasynchronous-unwind-tables
@@ -147,7 +148,9 @@  endif
 tests-static-normal := tst-leaks1-static tst-array1-static tst-array5-static \
 	       tst-dl-iter-static \
 	       tst-tlsalign-static tst-tlsalign-extern-static \
-	       tst-linkall-static tst-env-setuid tst-env-setuid-tunables
+	       tst-linkall-static tst-env-setuid tst-env-setuid-tunables \
+	       tst-single_threaded-static tst-single_threaded-pthread-static
+
 tests-static-internal := tst-tls1-static tst-tls2-static \
 	       tst-ptrguard1-static tst-stackguard1-static \
 	       tst-tls1-static-non-pie tst-libc_dlvsym-static
@@ -162,9 +165,11 @@  tests-internal := tst-tls1 tst-tls2 $(tests-static-internal)
 tests-static := $(tests-static-normal) $(tests-static-internal)
 
 ifeq (yes,$(build-shared))
-tests-static += tst-tls9-static
-tst-tls9-static-ENV = \
-       LD_LIBRARY_PATH=$(objpfx):$(common-objpfx):$(common-objpfx)dlfcn
+tests-static += tst-tls9-static tst-single_threaded-static-dlopen
+static-dlopen-environment = \
+  LD_LIBRARY_PATH=$(objpfx):$(common-objpfx):$(common-objpfx)dlfcn
+tst-tls9-static-ENV = $(static-dlopen-environment)
+tst-single_threaded-static-dlopen-ENV = $(static-dlopen-environment)
 
 tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 constload1 order noload filter \
@@ -187,7 +192,8 @@  tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-nodelete2 tst-audit11 tst-audit12 tst-dlsym-error tst-noload \
 	 tst-latepthread tst-tls-manydynamic tst-nodelete-dlclose \
 	 tst-debug1 tst-main1 tst-absolute-sym tst-absolute-zero tst-big-note \
-	 tst-unwind-ctor tst-unwind-main
+	 tst-unwind-ctor tst-unwind-main tst-single_threaded \
+	 tst-single_threaded-pthread
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -275,7 +281,9 @@  modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		tst-latepthreadmod $(tst-tls-many-dynamic-modules) \
 		tst-nodelete-dlclose-dso tst-nodelete-dlclose-plugin \
 		tst-main1mod tst-libc_dlvsym-dso tst-absolute-sym-lib \
-		tst-absolute-zero-lib tst-big-note-lib tst-unwind-ctor-lib
+		tst-absolute-zero-lib tst-big-note-lib tst-unwind-ctor-lib \
+		tst-single_threaded-mod1 tst-single_threaded-mod2 \
+		tst-single_threaded-mod3 tst-single_threaded-mod4
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
 modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
@@ -1498,3 +1506,17 @@  $(objpfx)tst-big-note: $(objpfx)tst-big-note-lib.so
 $(objpfx)tst-unwind-ctor: $(objpfx)tst-unwind-ctor-lib.so
 
 CFLAGS-tst-unwind-main.c += -funwind-tables
+
+$(objpfx)tst-single_threaded: $(objpfx)tst-single_threaded-mod1.so $(libdl)
+$(objpfx)tst-single_threaded.out: \
+  $(objpfx)tst-single_threaded-mod2.so $(objpfx)tst-single_threaded-mod3.so
+$(objpfx)tst-single_threaded-static-dlopen: \
+  $(objpfx)tst-single_threaded-mod1.o $(common-objpfx)dlfcn/libdl.a
+$(objpfx)tst-single_threaded-static-dlopen.out: \
+  $(objpfx)tst-single_threaded-mod2.so
+$(objpfx)tst-single_threaded-pthread: \
+  $(objpfx)tst-single_threaded-mod1.so $(libdl) $(shared-thread-library)
+$(objpfx)tst-single_threaded-pthread.out: \
+  $(objpfx)tst-single_threaded-mod2.so $(objpfx)tst-single_threaded-mod3.so \
+  $(objpfx)tst-single_threaded-mod4.so
+$(objpfx)tst-single_threaded-pthread-static: $(static-thread-library)
diff --git a/elf/Versions b/elf/Versions
index 3b09901f6c..7dd624eca0 100644
--- a/elf/Versions
+++ b/elf/Versions
@@ -54,6 +54,9 @@  ld {
     # stack canary
     __stack_chk_guard;
   }
+  GLIBC_2.30 {
+    __libc_single_threaded_register;
+  }
   GLIBC_PRIVATE {
     # Those are in the dynamic linker, but used by libc.so.
     __libc_enable_secure;
diff --git a/elf/dl-libc.c b/elf/dl-libc.c
index 0fedee7ef5..bc8da916b0 100644
--- a/elf/dl-libc.c
+++ b/elf/dl-libc.c
@@ -21,6 +21,7 @@ 
 #include <stdlib.h>
 #include <ldsodefs.h>
 #include <dl-hash.h>
+#include <sys/single_threaded.h>
 
 extern int __libc_argc attribute_hidden;
 extern char **__libc_argv attribute_hidden;
@@ -194,6 +195,7 @@  __libc_dlopen_mode (const char *name, int mode)
     return _dl_open_hook->dlopen_mode (name, mode);
   return (dlerror_run (do_dlopen, &args) ? NULL : (void *) args.map);
 #else
+  __dl_single_threaded_dlopen_called ();
   if (dlerror_run (do_dlopen, &args))
     return NULL;
 
diff --git a/elf/dl-single_threaded_broadcast-static.c b/elf/dl-single_threaded_broadcast-static.c
new file mode 100644
index 0000000000..023583693b
--- /dev/null
+++ b/elf/dl-single_threaded_broadcast-static.c
@@ -0,0 +1,44 @@ 
+/* Support for single-thread optimizations; broadcasting in static libc.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <atomic.h>
+#include <stdbool.h>
+#include <sys/single_threaded.h>
+
+/* Keep track of calls to dlopen.  */
+static _Bool __static_dlopen_called;
+
+void
+__dl_single_threaded_dlopen_called (void)
+{
+  /* Conservatively assume that the process is no longer
+     single-threaded after dlopen.  The inner libc may load
+     libpthread.  */
+  if (__libc_single_threaded)
+    __libc_single_threaded = false;
+  atomic_store_relaxed (&__static_dlopen_called, true);
+}
+
+void
+__dl_single_threaded_broadcast (_Bool single_threaded)
+{
+  /* If static dlopen has been called, we must conservatively assume
+     that the process is multi-threaded.  */
+  __libc_single_threaded
+    = single_threaded && !atomic_load_relaxed (&__static_dlopen_called);
+}
diff --git a/elf/dl-single_threaded_broadcast.c b/elf/dl-single_threaded_broadcast.c
new file mode 100644
index 0000000000..8c7985fcd9
--- /dev/null
+++ b/elf/dl-single_threaded_broadcast.c
@@ -0,0 +1,49 @@ 
+/* Support for single-thread optimizations; broadcasting changes.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file is supposed to be compiled for and linked into the
+   threading library that needs it.  */
+
+#include <ldsodefs.h>
+#include <sys/single_threaded.h>
+
+#ifndef SHARED
+# error "For dynamic linking only, see dl-single_threaded_broadcast-static.c."
+#endif
+
+void
+__dl_single_threaded_broadcast (_Bool single_threaded)
+{
+  /* No locking is necessary in this function because we perform the
+     update either before we become multi-threaded or after we have
+     become single-threaded.  */
+
+  /* No need to broadcast with static dlopen because we must
+     conservatively assume that there are multiple threads.  */
+  if (!rtld_active ())
+    return;
+
+  /* If the value has not changed, do not broadcast it.  */
+  if (single_threaded == *GL(dl_rtld_map).l_single_threaded)
+    return;
+  for (Lmid_t ns = 0; ns < GL(dl_nns); ++ns)
+    for (struct link_map *l = GL(dl_ns)[ns]._ns_loaded;
+         l != NULL; l = l->l_next)
+      if (l->l_single_threaded != NULL)
+        *l->l_single_threaded = single_threaded;
+}
diff --git a/elf/dl-single_threaded_register.c b/elf/dl-single_threaded_register.c
new file mode 100644
index 0000000000..0031435224
--- /dev/null
+++ b/elf/dl-single_threaded_register.c
@@ -0,0 +1,58 @@ 
+/* Support for single-thread optimizations; registration of local variables.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <elf.h>
+#include <ldsodefs.h>
+#include <stdio.h>
+#include <sys/single_threaded.h>
+
+void
+__libc_single_threaded_register (_Bool *single_threaded_ptr)
+{
+  /* In a nested libc context, conservatively assume that multiple
+     threads are present.  The code below needs an active dynamic
+     loader.  */
+  if (!rtld_active ())
+    return;
+
+  /* Protect against concurrent loads and unloads.  Assume that the
+     registration for a single DSO does not run in parallel (i.e.,
+     that the loader serializes ELF constructor invocations).  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
+  /* Since __libc_single_threaded is a data symbol, it is not
+     necessary to convert it to a function descriptor.  */
+  struct link_map *l
+    = _dl_find_dso_for_object ((ElfW(Addr)) single_threaded_ptr);
+  if (l == NULL)
+    __libc_fatal ("\
+Fatal glibc error: cannot register __libc_single_threaded variable\n");
+
+  if (l->l_single_threaded == NULL)
+    {
+      l->l_single_threaded = single_threaded_ptr;
+      /* Only if the registration was successful, we can inherit the
+         global single-threaded value.  Use the value from the main
+         link map.  The value of l_single_threaded changes only while
+         single-threaded, so no synchronization is required.  */
+      *single_threaded_ptr = *GL(dl_rtld_map).l_single_threaded;
+    }
+
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+rtld_hidden_def (__libc_single_threaded_register)
diff --git a/elf/rtld.c b/elf/rtld.c
index 5d97f41b7b..9ae41d3216 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -43,6 +43,7 @@ 
 #include <stap-probe.h>
 #include <stackinfo.h>
 #include <not-cancel.h>
+#include <sys/single_threaded.h>
 
 #include <assert.h>
 
@@ -348,6 +349,12 @@  extern char _begin[] attribute_hidden;
 extern char _etext[] attribute_hidden;
 extern char _end[] attribute_hidden;
 
+/* _dl_start_final below sets the local copy of __libc_single_threaded
+   to true and registers the variable directly in the link map, so no
+   ELF constructor is needed.  This avoids an incorrect value in case
+   of static dlopen (although the copy of ld.so should be completely
+   dormant).  */
+_Bool __libc_single_threaded attribute_hidden __attribute__ ((nocommon));
 
 #ifdef RTLD_START
 RTLD_START
@@ -396,6 +403,11 @@  _dl_start_final (void *arg, struct dl_start_final_info *info)
   GL(dl_rtld_map).l_map_start = (ElfW(Addr)) _begin;
   GL(dl_rtld_map).l_map_end = (ElfW(Addr)) _end;
   GL(dl_rtld_map).l_text_end = (ElfW(Addr)) _etext;
+
+  /* We start with just one thread.  */
+  __libc_single_threaded = true;
+  GL(dl_rtld_map).l_single_threaded = &__libc_single_threaded;
+
   /* Copy the TLS related data if necessary.  */
 #ifndef DONT_USE_BOOTSTRAP_MAP
 # if NO_TLS_OFFSET != 0
diff --git a/elf/tst-single_threaded-mod1.c b/elf/tst-single_threaded-mod1.c
new file mode 100644
index 0000000000..9fe94b2526
--- /dev/null
+++ b/elf/tst-single_threaded-mod1.c
@@ -0,0 +1,25 @@ 
+/* Test support for single-thread optimizations.  Shared object 1.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_1 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod2.c b/elf/tst-single_threaded-mod2.c
new file mode 100644
index 0000000000..a5166c9ebc
--- /dev/null
+++ b/elf/tst-single_threaded-mod2.c
@@ -0,0 +1,25 @@ 
+/* Test support for single-thread optimizations.  Shared object 2.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_2 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod3.c b/elf/tst-single_threaded-mod3.c
new file mode 100644
index 0000000000..53df13e3a7
--- /dev/null
+++ b/elf/tst-single_threaded-mod3.c
@@ -0,0 +1,25 @@ 
+/* Test support for single-thread optimizations.  Shared object 3.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_3 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod4.c b/elf/tst-single_threaded-mod4.c
new file mode 100644
index 0000000000..3bf5e555a4
--- /dev/null
+++ b/elf/tst-single_threaded-mod4.c
@@ -0,0 +1,25 @@ 
+/* Test support for single-thread optimizations.  Shared object 4.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_4 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-pthread-static.c b/elf/tst-single_threaded-pthread-static.c
new file mode 100644
index 0000000000..cf35075bd3
--- /dev/null
+++ b/elf/tst-single_threaded-pthread-static.c
@@ -0,0 +1,84 @@ 
+/* Test support for single-thread optimizations.  With threads, static version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This test is a stripped-down version of
+   tst-single_threaded-pthread.c, without any loading of dynamic
+   objects.  */
+
+#include <support/check.h>
+#include <support/xthread.h>
+#include <sys/single_threaded.h>
+
+/* First barrier synchronizes main thread, thread 1, thread 2.  */
+static pthread_barrier_t barrier1;
+
+/* Second barrier synchronizes main thread, thread 2.  */
+static pthread_barrier_t barrier2;
+
+static void *
+threadfunc (void *closure)
+{
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Wait for the main thread and the other thread.  */
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Second thread waits on second barrier, too.  */
+  if (closure != NULL)
+    xpthread_barrier_wait (&barrier2);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+
+  /* Two threads plus main thread.  */
+  xpthread_barrier_init (&barrier1, NULL, 3);
+
+  /* Main thread and second thread.  */
+  xpthread_barrier_init (&barrier2, NULL, 2);
+
+  pthread_t thr1 = xpthread_create (NULL, threadfunc, NULL);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  pthread_t thr2 = xpthread_create (NULL, threadfunc, &thr2);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Join first thread.  This should not bring us back into
+     single-threaded mode.  */
+  xpthread_join (thr1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* We should be back in single-threaded mode after joining both
+     threads.  */
+  xpthread_barrier_wait (&barrier2);
+  xpthread_join (thr2);
+  TEST_VERIFY (__libc_single_threaded);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-pthread.c b/elf/tst-single_threaded-pthread.c
new file mode 100644
index 0000000000..1de82cbe97
--- /dev/null
+++ b/elf/tst-single_threaded-pthread.c
@@ -0,0 +1,169 @@ 
+/* Test support for single-thread optimizations.  With threads.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+#include <sys/single_threaded.h>
+
+/* First barrier synchronizes main thread, thread 1, thread 2.  */
+static pthread_barrier_t barrier1;
+
+/* Second barrier synchronizes main thread, thread 2.  */
+static pthread_barrier_t barrier2;
+
+/* Defined in tst-single_threaded-mod1.so.  */
+_Bool single_threaded_1 (void);
+
+/* Initialized via dlsym.  */
+static _Bool (*single_threaded_2) (void);
+static _Bool (*single_threaded_3) (void);
+static _Bool (*single_threaded_4) (void);
+
+static void *
+threadfunc (void *closure)
+{
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+
+  /* Wait until the main thread loads more functions.  */
+  xpthread_barrier_wait (&barrier1);
+
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+
+  /* Second thread waits on second barrier, too.  */
+  if (closure != NULL)
+    xpthread_barrier_wait (&barrier2);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+
+  return NULL;
+}
+
+/* A subprocess is always single-threaded at first.  */
+static void
+subprocess (void *closure)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  if (single_threaded_2 != NULL)
+    TEST_VERIFY (single_threaded_2 ());
+  if (single_threaded_3 != NULL)
+    TEST_VERIFY (single_threaded_3 ());
+  if (single_threaded_4 != NULL)
+    TEST_VERIFY (single_threaded_4 ());
+}
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  single_threaded_2 = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (single_threaded_2 ());
+
+  /* Two threads plus main thread.  */
+  xpthread_barrier_init (&barrier1, NULL, 3);
+
+  /* Main thread and second thread.  */
+  xpthread_barrier_init (&barrier2, NULL, 2);
+
+  pthread_t thr1 = xpthread_create (NULL, threadfunc, NULL);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  pthread_t thr2 = xpthread_create (NULL, threadfunc, &thr2);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* Delayed library load, while already multi-threaded.  */
+  void *handle_mod3 = xdlopen ("tst-single_threaded-mod3.so", RTLD_LAZY);
+  single_threaded_3 = xdlsym (handle_mod3, "single_threaded_3");
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* Same with dlmopen.  */
+  void *handle_mod4 = dlmopen (LM_ID_NEWLM, "tst-single_threaded-mod4.so",
+                               RTLD_LAZY);
+  single_threaded_4 = xdlsym (handle_mod4, "single_threaded_4");
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* Run the newly loaded functions from the other threads as
+     well.  */
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* Join first thread.  This should not bring us back into
+     single-threaded mode.  */
+  xpthread_join (thr1);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* We should be back in single-threaded mode after joining both
+     threads.  */
+  xpthread_barrier_wait (&barrier2);
+  xpthread_join (thr2);
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  TEST_VERIFY (single_threaded_2 ());
+  TEST_VERIFY (single_threaded_3 ());
+  TEST_VERIFY (single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  xdlclose (handle_mod4);
+  xdlclose (handle_mod3);
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-static-dlopen.c b/elf/tst-single_threaded-static-dlopen.c
new file mode 100644
index 0000000000..7e7847e7ab
--- /dev/null
+++ b/elf/tst-single_threaded-static-dlopen.c
@@ -0,0 +1,55 @@ 
+/* Test support for single-thread optimizations.  No threads, static dlopen.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* In a static dlopen scenario, the single-threaded optimization is
+   not possible because there is no globally shared dynamic linker
+   across all namespaces.  */
+
+#include <stddef.h>
+#include <support/check.h>
+#include <support/xdlfcn.h>
+#include <sys/single_threaded.h>
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+
+  /* Defined in tst-single_threaded-mod1.o.  */
+  extern _Bool single_threaded_1 (void);
+  TEST_VERIFY (single_threaded_1 ());
+
+  /* Even after a failed dlopen, assume multi-threaded mode.  */
+  TEST_VERIFY (dlopen ("tst-single_threaded-does-not-exist.so", RTLD_LAZY)
+               == NULL);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  _Bool (*single_threaded_2) (void)
+    = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-static.c b/elf/tst-single_threaded-static.c
new file mode 100644
index 0000000000..29d7ab2731
--- /dev/null
+++ b/elf/tst-single_threaded-static.c
@@ -0,0 +1,29 @@ 
+/* Test support for single-thread optimizations.  Static, no threads.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <sys/single_threaded.h>
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded.c b/elf/tst-single_threaded.c
new file mode 100644
index 0000000000..bcdcdefec9
--- /dev/null
+++ b/elf/tst-single_threaded.c
@@ -0,0 +1,67 @@ 
+/* Test support for single-thread optimizations.  No threads.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/xdlfcn.h>
+#include <sys/single_threaded.h>
+
+/* Defined in tst-single_threaded-mod1.so.  */
+extern _Bool single_threaded_1 (void);
+
+/* Initialized via dlsym.  */
+_Bool (*single_threaded_2) (void);
+_Bool (*single_threaded_3) (void);
+
+static void
+subprocess (void *closure)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  if (single_threaded_2 != NULL)
+    TEST_VERIFY (single_threaded_2 ());
+  if (single_threaded_3 != NULL)
+    TEST_VERIFY (single_threaded_3 ());
+}
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  single_threaded_2 = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  void *handle_mod3 = dlmopen (LM_ID_NEWLM, "tst-single_threaded-mod3.so",
+                               RTLD_LAZY);
+  single_threaded_3 = xdlsym (handle_mod3, "single_threaded_3");
+  TEST_VERIFY (single_threaded_3 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  xdlclose (handle_mod3);
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/include/link.h b/include/link.h
index 736e1d72ae..714e44d24c 100644
--- a/include/link.h
+++ b/include/link.h
@@ -324,6 +324,11 @@  struct link_map
     ElfW(Addr) l_relro_addr;
     size_t l_relro_size;
 
+    /* Address of the __libc_single_threaded variable for this object.
+       Initialized by __libc_single_threaded_register, used by
+       __dl_single_threaded_broadcast.  */
+    _Bool *l_single_threaded;
+
     unsigned long long int l_serial;
 
     /* Audit information.  This array apparent must be the last in the
diff --git a/include/sys/single_threaded.h b/include/sys/single_threaded.h
new file mode 100644
index 0000000000..c5aacaa7e8
--- /dev/null
+++ b/include/sys/single_threaded.h
@@ -0,0 +1,48 @@ 
+/* Support for single-thread optimizations; wrapper header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <misc/sys/single_threaded.h>
+
+#ifndef _ISOMAC
+
+/* Called with a pointer to __libc_single_threaded by an ELF
+   constructor.  (See misc/single_threaded.c.)  SINGLE_THREAD_PTR is
+   stored in the l_single_threaded member of the link map that
+   contains the object at *SINGLE_THREAD_PTR.  Does nothing after a
+   static dlopen.  */
+void __libc_single_threaded_register (_Bool *single_thread_ptr);
+rtld_hidden_proto (__libc_single_threaded_register)
+
+/* Called in the threading library to adjust all the copies of the
+   single-threaded flag.  This must be called only while the process
+   is known to be (still) single-threaded.  In a dynamically linked
+   program, traverse all link maps and update all the registered
+   __libc_single_threaded variables.  Do nothing after static
+   dlopen.  In a statically linked program, if dlopen is not possible,
+   update the single __libc_single_threaded variable
+   directly.  */
+void __dl_single_threaded_broadcast (_Bool single_threaded) attribute_hidden;
+
+#ifndef SHARED
+/* If static dlopen has been called, we must be more conservative
+   about the single-thread optimization.  This function must be called
+   from dlopen, dlmopen, and internal dlopen.  */
+void __dl_single_threaded_dlopen_called (void) attribute_hidden;
+#endif
+
+#endif
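
Note on the two internal entry points declared above: the following is a
minimal, self-contained model of the register/broadcast idea, not the
code from this patch.  All names (model_link_map, model_register,
model_broadcast, MAX_OBJECTS) are hypothetical, and the fixed-size array
stands in for the real link map list.  It assumes that registration
copies the current process state into the newly registered flag, which
the header comment does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound; the real loader walks its link map list.  */
#define MAX_OBJECTS 8

/* Stand-in for struct link_map: one flag pointer per loaded object.  */
struct model_link_map
{
  bool *l_single_threaded;
};

static struct model_link_map model_maps[MAX_OBJECTS];
static size_t model_map_count;

/* Process-wide state that the threading library would maintain.  */
static bool model_process_single_threaded = true;

/* Models the ELF constructor calling into the loader.  Records the
   address of the object's private flag and (assumption) brings the
   flag in line with the current process state.  On failure the flag
   keeps its conservative initial value.  */
static bool
model_register (bool *flag)
{
  if (model_map_count == MAX_OBJECTS)
    return false;
  model_maps[model_map_count++].l_single_threaded = flag;
  *flag = model_process_single_threaded;
  return true;
}

/* Models the broadcast: update every registered copy of the flag.
   As in the patch, this is meant to run only while the process is
   known to be single-threaded, so plain stores are sufficient.  */
static void
model_broadcast (bool single_threaded)
{
  model_process_single_threaded = single_threaded;
  for (size_t i = 0; i < model_map_count; ++i)
    if (model_maps[i].l_single_threaded != NULL)
      *model_maps[i].l_single_threaded = single_threaded;
}
```

Each object keeps its own hidden flag so that reads stay cheap, at the
cost of one store per registered object whenever the state changes.
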
diff --git a/manual/threads.texi b/manual/threads.texi
index 87fda7d8e7..50232d61d9 100644
--- a/manual/threads.texi
+++ b/manual/threads.texi
@@ -11,6 +11,7 @@  POSIX threads.
 @menu
 * ISO C Threads::	Threads based on the ISO C specification.
 * POSIX Threads::	Threads based on the POSIX specification.
+* Single-Threaded::     Detecting single-threaded execution.
 @end menu
 
 
@@ -669,6 +670,93 @@  The system does not have sufficient memory.
 @end table
 @end deftypefun
 
+@node Single-Threaded
+@section Detecting Single-Threaded Execution
+
+Multi-threaded programs require synchronization among threads.  This
+synchronization can be costly even if there is just a single thread
+and no data is shared between multiple processors.  For this reason,
+@theglibc{} offers an interface to detect whether the process is in
+single-threaded mode.  Applications can use this information to avoid
+synchronization, for example by using regular instructions to load and
+store memory instead of atomic instructions, or using relaxed memory
+ordering instead of stronger memory ordering.
+
+@deftypevar {_Bool} __libc_single_threaded
+@standards{GNU, sys/single_threaded.h}
+This variable is true if the current process is definitely
+single-threaded.  If it is false, the process can be multi-threaded,
+or @theglibc{} cannot determine at this point of the program execution
+whether the process is single-threaded or not.  Among other things,
+this can happen if termination of a detached thread returns the
+process to single-threaded mode.
+@c The value can also be incorrect when ELF constructors run.
+@c With static dlopen, __libc_single_threaded is conservatively
+@c set to false always because the inner and outer thread libraries
+@c are uncoordinated.
+
+Applications must never write to this variable.
+
+Applications should perform the same actions whether or not
+@code{__libc_single_threaded} is true, but only switch to a weaker
+memory ordering.  As a result, a process that becomes multi-threaded
+afterwards is already in the correct state.  For example, in order to
+increment a reference counter, the following code can be used:
+
+@smallexample
+if (__libc_single_threaded)
+  atomic_fetch_add_explicit (&reference_counter, 1, memory_order_relaxed);
+else
+  atomic_fetch_add_explicit (&reference_counter, 1, memory_order_acq_rel);
+@end smallexample
+
+This still requires some form of synchronization on the
+single-threaded branch, so it can be beneficial not to declare the
+reference counter as @code{_Atomic}, and use the GCC @code{__atomic}
+built-ins:
+
+@smallexample
+if (__libc_single_threaded)
+  ++reference_counter;
+else
+  __atomic_fetch_add (&reference_counter, 1, __ATOMIC_ACQ_REL);
+@end smallexample
+
+Several functions in @theglibc{} can change the value of the
+@code{__libc_single_threaded} variable.  Creating new threads using
+the @code{pthread_create} or @code{thrd_create} function sets the
+variable to false.  Less obvious is that the @code{dlopen} function
+may also cause threads to be created if any of the loaded objects
+creates a thread from an ELF constructor.  Therefore, applications
+need to make a copy of the value of @code{__libc_single_threaded} if
+after such a function call, behavior must match the value as it was
+before the call, like this:
+
+@smallexample
+bool single_threaded = __libc_single_threaded;
+if (single_threaded)
+  prepare_single_threaded ();
+else
+  prepare_multi_thread ();
+
+void *handle = dlopen (shared_library_name, RTLD_NOW);
+lookup_symbols (handle);
+
+if (single_threaded)
+  cleanup_single_threaded ();
+else
+  cleanup_multi_thread ();
+@end smallexample
+
+Since the variable @code{__libc_single_threaded} can change from true
+to false during the execution of the program, it is not useful for
+selecting optimized function implementations in IFUNC resolvers.
+
+For performance reasons, multiple copies of the
+@code{__libc_single_threaded} variable, with different addresses, can
+exist in a program.
+@end deftypevar
+
 @c FIXME these are undocumented:
 @c pthread_atfork
 @c pthread_attr_destroy
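
Note on the manual text above: the following is a sketch of how an
application might wrap the reference-counter pattern so that it still
builds where <sys/single_threaded.h> is not installed.  The helper names
(process_is_single_threaded, ref_acquire, ref_release) and the
HAVE_SINGLE_THREADED macro are hypothetical.  It assumes a compiler that
provides __has_include and the GCC __atomic built-ins, and it falls back
to the atomic path unconditionally when the header is missing.

```c
#include <stdbool.h>

/* Use the header only if the installed C library provides it.  */
#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define HAVE_SINGLE_THREADED 1
# endif
#endif

/* Hypothetical helper: conservatively reports "multi-threaded" when
   the facility is unavailable.  */
static inline bool
process_is_single_threaded (void)
{
#ifdef HAVE_SINGLE_THREADED
  return __libc_single_threaded;
#else
  return false;
#endif
}

/* Increment a reference counter.  The counter is a plain object, not
   _Atomic, so the single-threaded branch is an ordinary increment.  */
static void
ref_acquire (unsigned int *counter)
{
  if (process_is_single_threaded ())
    ++*counter;
  else
    __atomic_fetch_add (counter, 1, __ATOMIC_ACQ_REL);
}

/* Decrement a reference counter.  Returns true if this call dropped
   the last reference.  */
static bool
ref_release (unsigned int *counter)
{
  if (process_is_single_threaded ())
    return --*counter == 0;
  return __atomic_sub_fetch (counter, 1, __ATOMIC_ACQ_REL) == 0;
}
```

Both branches perform the same logical update, so a process that becomes
multi-threaded between two calls sees a consistent counter.
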
diff --git a/misc/Makefile b/misc/Makefile
index cf0daa1161..d17135cd0f 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -36,7 +36,8 @@  headers	:= sys/uio.h bits/uio-ext.h bits/uio_lim.h \
 	   syslog.h sys/syslog.h \
 	   bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \
 	   bits/select2.h bits/hwcap.h sys/auxv.h \
-	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h
+	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \
+	   sys/single_threaded.h
 
 routines := brk sbrk sstk ioctl \
 	    readv writev preadv preadv64 pwritev pwritev64 \
@@ -71,7 +72,13 @@  routines := brk sbrk sstk ioctl \
 	    fgetxattr flistxattr fremovexattr fsetxattr getxattr \
 	    listxattr lgetxattr llistxattr lremovexattr lsetxattr \
 	    removexattr setxattr getauxval ifunc-impl-list makedev \
-	    allocate_once
+	    allocate_once single_threaded
+
+# single_threaded contains an ELF constructor and a hidden definition
+# and must be linked into applications.  The hidden variable
+# definition of __libc_single_threaded is an important performance
+# optimization for this functionality.
+static-only-routines += single_threaded
 
 generated += tst-error1.mtrace tst-error1-mem.out \
   tst-allocate_once.mtrace tst-allocate_once-mem.out
@@ -128,6 +135,10 @@  CFLAGS-msync.c += -fexceptions -fasynchronous-unwind-tables
 CFLAGS-fdatasync.c += -fexceptions -fasynchronous-unwind-tables
 CFLAGS-fsync.c += -fexceptions -fasynchronous-unwind-tables
 
+# This file needs a high-priority ELF constructor, which causes a GCC
+# warning.
+CFLAGS-single_threaded.c += -Wno-error
+
 # Called during static library initialization, so turn stack-protection
 # off for non-shared builds.
 CFLAGS-sbrk.o = $(no-stack-protector)
diff --git a/misc/single_threaded.c b/misc/single_threaded.c
new file mode 100644
index 0000000000..051e9565a9
--- /dev/null
+++ b/misc/single_threaded.c
@@ -0,0 +1,46 @@ 
+/* Support for single-thread optimizations; application-side part.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+#ifdef LIBC_NONSHARED
+#include <ldsodefs.h>
+
+/* Prevent multiple definitions because registering more than one
+   definition is not possible.  Conservatively assume that
+   multi-threading is active, in case the constructor runs too late,
+   after threads have been created.  */
+_Bool __libc_single_threaded attribute_hidden __attribute__ ((nocommon));
+
+/* This runs as a high-priority constructor.  We do not have a
+   dependency on anything else outside the dynamic linker, so
+   performing the initialization early is safe.  */
+static __attribute__ ((constructor (0))) void
+init (void)
+{
+  __libc_single_threaded_register (&__libc_single_threaded);
+}
+
+#else /* !LIBC_NONSHARED */
+
+/* In the static case, the variable is updated directly, so we can
+   default to single-threaded mode because the program is initially
+   single-threaded.  */
+_Bool __libc_single_threaded attribute_hidden = 1;
+
+#endif /* !LIBC_NONSHARED */
diff --git a/misc/sys/single_threaded.h b/misc/sys/single_threaded.h
new file mode 100644
index 0000000000..37f94a9b70
--- /dev/null
+++ b/misc/sys/single_threaded.h
@@ -0,0 +1,39 @@ 
+/* Support for single-thread optimizations.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_SINGLE_THREAD_H
+#define _SYS_SINGLE_THREAD_H
+
+#include <features.h>
+
+__BEGIN_DECLS
+
+/* If this variable is 1, then the current thread is the only thread
+   in the process image.  If it is 0, the process can be
+   multi-threaded, or this variable has not yet been initialized.  */
+#ifdef __cplusplus
+extern bool __libc_single_threaded
+  __attribute__ ((__visibility__ ("hidden")));
+#else
+extern _Bool __libc_single_threaded
+  __attribute__ ((__visibility__ ("hidden")));
+#endif
+
+__END_DECLS
+
+#endif /* _SYS_SINGLE_THREAD_H */
diff --git a/nptl/Makefile b/nptl/Makefile
index 0e316edfac..5831feb02b 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -145,7 +145,8 @@  libpthread-routines = nptl-init nptlfreeres vars events version pt-interp \
 		      mtx_destroy mtx_init mtx_lock mtx_timedlock \
 		      mtx_trylock mtx_unlock call_once cnd_broadcast \
 		      cnd_destroy cnd_init cnd_signal cnd_timedwait cnd_wait \
-		      tss_create tss_delete tss_get tss_set pthread_mutex_conf
+		      tss_create tss_delete tss_get tss_set \
+		      pthread_mutex_conf nptl-single_threaded_broadcast
 #		      pthread_setuid pthread_seteuid pthread_setreuid \
 #		      pthread_setresuid \
 #		      pthread_setgid pthread_setegid pthread_setregid \
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 0dbb155f70..664b7d2c4f 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -32,6 +32,7 @@ 
 #include <futex-internal.h>
 #include <kernel-features.h>
 #include <stack-aliasing.h>
+#include <sys/single_threaded.h>
 
 
 #ifndef NEED_SEPARATE_REGISTER_STACK
@@ -953,6 +954,10 @@  __reclaim_stacks (void)
 
   /* There is one thread running.  */
   atomic_store_relaxed (&__nptl_nthreads, 1);
+  /* The fork implementation acquired the loader lock before forking,
+     so the dynamic linker data structures are in a usable state and
+     we can perform the broadcast.  */
+  __dl_single_threaded_broadcast (true);
 
   in_flight_stack = 0;
 
diff --git a/nptl/nptl-single_threaded_broadcast.c b/nptl/nptl-single_threaded_broadcast.c
new file mode 100644
index 0000000000..1f7c3a689e
--- /dev/null
+++ b/nptl/nptl-single_threaded_broadcast.c
@@ -0,0 +1,21 @@ 
+/* Support for single-thread optimizations; broadcasting changes from NPTL.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef SHARED
+# include <elf/dl-single_threaded_broadcast.c>
+#endif
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 15fbeceac7..329f27b8a8 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -34,6 +34,7 @@ 
 #include <futex-internal.h>
 #include <tls-setup.h>
 #include "libioP.h"
+#include <sys/single_threaded.h>
 
 #include <shlib-compat.h>
 
@@ -341,7 +342,7 @@  __nptl_main_thread_exited (void)
 {
   __nptl_deallocate_tsd ();
 
-  if (atomic_fetch_add_relaxed (&__nptl_nthreads, -1) > 1)
+  if (atomic_fetch_add_acq_rel (&__nptl_nthreads, -1) > 1)
     /* Exit the thread, but not the process.  */
     __exit_thread ();
 }
@@ -510,7 +511,7 @@  START_THREAD_DEFN
   /* If this is the last thread we terminate the process now.  We
      do not notify the debugger, it might just irritate it if there
      is no thread left.  */
-  if (atomic_fetch_add_relaxed (&__nptl_nthreads, -1) == 1)
+  if (atomic_fetch_add_acq_rel (&__nptl_nthreads, -1) == 1)
     /* This was the last thread.  */
     exit (0);
 
@@ -769,8 +770,15 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
         collect_default_sched (pd);
     }
 
+  /* Relaxed memory order is sufficient because we increment the
+     variable before actually creating the thread below, while still
+     single-threaded.  */
   if (__glibc_unlikely (atomic_load_relaxed (&__nptl_nthreads) == 1))
-    _IO_enable_locks ();
+    {
+      /* Transition to multi-threaded mode.  */
+      __dl_single_threaded_broadcast (false);
+      _IO_enable_locks ();
+    }
 
   /* Pass the descriptor to the caller.  */
   *newthread = (pthread_t) pd;
@@ -785,7 +793,7 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
      we momentarily store a false value; this doesn't matter because there
      is no kosher thing a signal handler interrupting us right here can do
      that cares whether the thread count is correct.  */
-  atomic_fetch_add_relaxed (&__nptl_nthreads, 1);
+  atomic_fetch_add_acq_rel (&__nptl_nthreads, 1);
 
   /* Our local value of stopped_start and thread_ran can be accessed at
      any time. The PD->stopped_start may only be accessed if we have
@@ -849,8 +857,12 @@  __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 	  /* State (e) and we have ownership of PD (see CONCURRENCY
 	     NOTES above).  */
 
-	  /* Oops, we lied for a second.  */
-	  atomic_fetch_add_relaxed (&__nptl_nthreads, -1);
+	  /* Restore the old __nptl_nthreads value.  If the process
+	     temporarily had two threads, it is single-threaded now,
+	     so go back into single-threaded mode.  */
+	  if (atomic_fetch_add_acq_rel (&__nptl_nthreads, -1) == 2)
+	    __dl_single_threaded_broadcast (true);
+
 
 	  /* Perhaps a thread wants to change the IDs and is waiting for this
 	     stillborn thread.  */
diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
index ecb78ffba5..9ae1b8817d 100644
--- a/nptl/pthread_join_common.c
+++ b/nptl/pthread_join_common.c
@@ -19,6 +19,7 @@ 
 #include "pthreadP.h"
 #include <atomic.h>
 #include <stap-probe.h>
+#include <sys/single_threaded.h>
 
 static void
 cleanup (void *arg)
@@ -101,6 +102,14 @@  __pthread_timedjoin_ex (pthread_t threadid, void **thread_return,
   else
     pd->joinid = NULL;
 
+  /* Synchronizes with the thread counter updates in pthread_create.c.
+     The thread we just joined may not be the same thread that brought
+     down the thread count to 1 (a detached thread might have exited
+     as well), so we need another form of synchronization.  */
+  if (atomic_load_acquire (&__nptl_nthreads) == 1)
+    /* We are back in single-threaded mode.  */
+    __dl_single_threaded_broadcast (true);
+
   LIBC_PROBE (pthread_join_ret, 3, threadid, result, pd->result);
 
   return result;
diff --git a/sysdeps/mach/hurd/i386/ld.abilist b/sysdeps/mach/hurd/i386/ld.abilist
index c76b913486..cac95fc1ca 100644
--- a/sysdeps/mach/hurd/i386/ld.abilist
+++ b/sysdeps/mach/hurd/i386/ld.abilist
@@ -22,4 +22,5 @@  GLIBC_2.2.6 malloc F
 GLIBC_2.2.6 realloc F
 GLIBC_2.3 ___tls_get_addr F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/aarch64/ld.abilist b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
index 4ffe688649..2a4cb969dc 100644
--- a/sysdeps/unix/sysv/linux/aarch64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.17 calloc F
 GLIBC_2.17 free F
 GLIBC_2.17 malloc F
 GLIBC_2.17 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/alpha/ld.abilist b/sysdeps/unix/sysv/linux/alpha/ld.abilist
index 98b66edabf..2a15dd0d65 100644
--- a/sysdeps/unix/sysv/linux/alpha/ld.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x8
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/arm/ld.abilist b/sysdeps/unix/sysv/linux/arm/ld.abilist
index a301c6ebc4..85908675b8 100644
--- a/sysdeps/unix/sysv/linux/arm/ld.abilist
+++ b/sysdeps/unix/sysv/linux/arm/ld.abilist
@@ -1,3 +1,4 @@ 
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/csky/ld.abilist b/sysdeps/unix/sysv/linux/csky/ld.abilist
index 71576160ed..3d113a9e02 100644
--- a/sysdeps/unix/sysv/linux/csky/ld.abilist
+++ b/sysdeps/unix/sysv/linux/csky/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.29 calloc F
 GLIBC_2.29 free F
 GLIBC_2.29 malloc F
 GLIBC_2.29 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/hppa/ld.abilist b/sysdeps/unix/sysv/linux/hppa/ld.abilist
index 0387614d8f..2dd0b30ccf 100644
--- a/sysdeps/unix/sysv/linux/hppa/ld.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/i386/ld.abilist b/sysdeps/unix/sysv/linux/i386/ld.abilist
index edb7307228..137ccfa6fe 100644
--- a/sysdeps/unix/sysv/linux/i386/ld.abilist
+++ b/sysdeps/unix/sysv/linux/i386/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 ___tls_get_addr F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/ia64/ld.abilist b/sysdeps/unix/sysv/linux/ia64/ld.abilist
index 82042472c3..3a22e06d46 100644
--- a/sysdeps/unix/sysv/linux/ia64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
index a301c6ebc4..85908675b8 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
@@ -1,3 +1,4 @@ 
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
index c9ec45cf1c..c6a92839d5 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/microblaze/ld.abilist b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
index aa0d71150a..30bfa0dfc8 100644
--- a/sysdeps/unix/sysv/linux/microblaze/ld.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.18 calloc F
 GLIBC_2.18 free F
 GLIBC_2.18 malloc F
 GLIBC_2.18 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
index 55d48868e8..88d258477a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
index 55d48868e8..88d258477a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
index 44b345b7cf..aa8c957124 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/nios2/ld.abilist b/sysdeps/unix/sysv/linux/nios2/ld.abilist
index 110f1039fa..05e1e42d75 100644
--- a/sysdeps/unix/sysv/linux/nios2/ld.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.21 calloc F
 GLIBC_2.21 free F
 GLIBC_2.21 malloc F
 GLIBC_2.21 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
index e8b0ea3a9b..44bc2f1ba6 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
@@ -8,3 +8,4 @@  GLIBC_2.1 _dl_mcount F
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
index edfc9ca56f..3c5d5b29da 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
@@ -8,3 +8,4 @@  GLIBC_2.3 calloc F
 GLIBC_2.3 free F
 GLIBC_2.3 malloc F
 GLIBC_2.3 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
index 37c8f6684b..3b664de3af 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
@@ -8,3 +8,4 @@  GLIBC_2.17 malloc F
 GLIBC_2.17 realloc F
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
index b411871d06..6325d16b6b 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
@@ -7,3 +7,4 @@  GLIBC_2.27 calloc F
 GLIBC_2.27 free F
 GLIBC_2.27 malloc F
 GLIBC_2.27 realloc F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
index 0576c9575e..3e932a57be 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
index 1fbb890d1d..7f6ff99a10 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/sh/ld.abilist b/sysdeps/unix/sysv/linux/sh/ld.abilist
index 0387614d8f..2dd0b30ccf 100644
--- a/sysdeps/unix/sysv/linux/sh/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sh/ld.abilist
@@ -6,4 +6,5 @@  GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
index fd0b33f86d..c229ffdec4 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
index 82042472c3..3a22e06d46 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
index 0dc9430611..6c288150e2 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.2.5 free F
 GLIBC_2.2.5 malloc F
 GLIBC_2.2.5 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __libc_single_threaded_register F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
index 80f3161586..6fb9ec245a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
@@ -6,3 +6,4 @@  GLIBC_2.16 calloc F
 GLIBC_2.16 free F
 GLIBC_2.16 malloc F
 GLIBC_2.16 realloc F
+GLIBC_2.30 __libc_single_threaded_register F