Avoid using up static TLS surplus for optimizations [BZ #25051]

Message ID 20200522144917.17999-1-szabolcs.nagy@arm.com
State Superseded
Delegated to: Carlos O'Donell

Commit Message

Szabolcs Nagy May 22, 2020, 2:49 p.m. UTC
On some targets static TLS surplus area can be used opportunistically
for dynamically loaded modules such that the TLS access then becomes
faster (TLSDESC and powerpc TLS optimization). However we don't want
all surplus TLS to be used for this optimization because dynamically
loaded modules with initial-exec model TLS can only use surplus TLS.

Currently TLS_STATIC_SURPLUS is 1664 bytes which is not even enough to
cover the IE TLS needed for libc.so when DL_NNS (== 16) namespaces are
created, so to allow reliable use of dlmopen as well as dynamic TLS
optimizations a new contract is specified for use of static TLS:

- libc.so can have up to 192 bytes of IE TLS,
- other system libraries together can have up to 144 bytes of IE TLS.
- By default 512 bytes of "optional" static TLS is available for
  opportunistic use.
- By default at most 4 dlmopen namespaces are supported.

So the surplus TLS requirement is 3*192 + 4*144 + 512 = 1664 bytes
with dynamic linking (i.e. the same as before: the externally visible
behaviour is not changed, other than limiting static TLS use for
optimizations on affected targets.)
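
As a rough illustration of the arithmetic above, the contract can be written
as a small function. This is only a sketch: the names surplus_tls,
LIBC_IE_TLS_BYTES and OTHER_IE_TLS_BYTES are made up for the example, and
only the byte counts come from the contract.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts taken from the contract above; the macro names are
   illustrative only.  */
#define LIBC_IE_TLS_BYTES 192   /* IE TLS budget for libc.so.  */
#define OTHER_IE_TLS_BYTES 144  /* IE TLS budget for other system libraries.  */

/* Surplus static TLS for NNS namespaces plus OPT_TLS optional bytes.
   libc.so in the initial namespace is loaded at startup, so it does
   not consume surplus; hence the (nns - 1) factor.  */
static size_t
surplus_tls (size_t nns, size_t opt_tls)
{
  assert (nns >= 1);
  return (nns - 1) * LIBC_IE_TLS_BYTES + nns * OTHER_IE_TLS_BYTES + opt_tls;
}
```

With the defaults, surplus_tls (4, 512) gives 1664 bytes. Raising the
namespace count to the maximum, surplus_tls (16, 512), gives 5696 bytes.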

The optional TLS available for opportunistic use is now tunable
(dl.optional_static_tls), so users can directly affect the allocated
static TLS size. (Note that module unloading with dlclose does not
reclaim static TLS. After the optional TLS runs out, TLS access
is no longer optimized.)

Since users may need more dlmopen namespaces (5 .. 16) the maximum
supported number is now a tunable (dl.nns), when it is increased the
static TLS allocation increases according to the contract. If users
use more namespaces than the tunable setting, static TLS may run out.
Or if users dynamically load libraries with IE TLS beyond what's
allowed by the contract, static TLS may run out. These conditions are
not checked or enforced, but the user's responsibility.
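
A minimal sketch of how a launcher might build a GLIBC_TUNABLES value for
the two settings. It assumes the tunables are spelled glibc.dl.nns and
glibc.dl.optional_static_tls, as in this version of the patch; the helper
name format_tls_tunables is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a GLIBC_TUNABLES value selecting NNS namespaces and OPT_TLS
   bytes of optional static TLS.  Returns 0 on success, -1 if BUF is
   too small.  The tunable names follow this version of the patch.  */
static int
format_tls_tunables (char *buf, size_t len, size_t nns, size_t opt_tls)
{
  int n = snprintf (buf, len,
                    "glibc.dl.nns=%zu:glibc.dl.optional_static_tls=%zu",
                    nns, opt_tls);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The resulting string has to be in the environment before the target
program starts, since tunables are read during process startup.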

Static linking used fixed 2048 bytes surplus TLS, this is changed
so the same contract is used as for dynamic linking.  However with
static linking DL_NNS == 1 so dl.nns tunable is forced to 1, so by
default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
change is not expected to cause problems.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

---

v4:
- Rebased and moved this log out of the commit message.
- Minor commit message wording changes.
v3:
- archived at
  https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
- Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
  simplify related logic.
- In case of static linking, replace GL(dl_tls_static_size) with
  GLRO(dl_tls_static_surplus) in the code paths before the
  GL(dl_tls_static_size) value is actually computed.
- Update comments and the test code.
- Document the new tunables.
- Update description, mention static linking.
v2:
- Add dl.nns tunable.
- Add dl.optional_static_tls tunable.
- New surplus TLS usage contract that works reliably up to dl.nns
  namespaces.
---
 csu/libc-start.c           |   2 +
 csu/libc-tls.c             |  28 +++++------
 elf/Makefile               |  17 ++++++-
 elf/dl-reloc.c             |  37 ++++++++++----
 elf/dl-sysdep.c            |   2 +
 elf/dl-tls.c               |  56 +++++++++++++++++++--
 elf/dl-tunables.h          |   4 ++
 elf/dl-tunables.list       |  13 +++++
 elf/dynamic-link.h         |   5 +-
 elf/tst-tls-ie-mod.h       |  40 +++++++++++++++
 elf/tst-tls-ie-mod0.c      |   4 ++
 elf/tst-tls-ie-mod1.c      |   4 ++
 elf/tst-tls-ie-mod2.c      |   4 ++
 elf/tst-tls-ie-mod3.c      |   4 ++
 elf/tst-tls-ie-mod4.c      |   4 ++
 elf/tst-tls-ie-mod5.c      |   4 ++
 elf/tst-tls-ie.c           | 100 +++++++++++++++++++++++++++++++++++++
 manual/tunables.texi       |  28 +++++++++++
 sysdeps/generic/ldsodefs.h |   8 +++
 19 files changed, 332 insertions(+), 32 deletions(-)
 create mode 100644 elf/tst-tls-ie-mod.h
 create mode 100644 elf/tst-tls-ie-mod0.c
 create mode 100644 elf/tst-tls-ie-mod1.c
 create mode 100644 elf/tst-tls-ie-mod2.c
 create mode 100644 elf/tst-tls-ie-mod3.c
 create mode 100644 elf/tst-tls-ie-mod4.c
 create mode 100644 elf/tst-tls-ie-mod5.c
 create mode 100644 elf/tst-tls-ie.c
  

Comments

Florian Weimer June 4, 2020, 1:54 p.m. UTC | #1
* Szabolcs Nagy:

> +void
> +_dl_static_tls_tunables_init (void)
> +{
> +  size_t nns, opt_tls;
> +
> +#if HAVE_TUNABLES
> +  nns = TUNABLE_GET (nns, size_t, NULL);
> +  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
> +#else
> +  /* Default values of the tunables.  */
> +  nns = 4;
> +  opt_tls = 512;
> +#endif
> +  if (nns > DL_NNS)
> +    nns = DL_NNS;
> +  GL(dl_tls_static_optional) = opt_tls;
> +  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
> +				 + nns * OTHER_IE_TLS
> +				 + opt_tls);
> +}

I think the default 4 is incompatible with elf/tst-manyaudit.  The test
doesn't check that all specified audit modules have been loaded.

I think we can automatically increase the number of namespaces based on
the count of audit modules.  This can be a follow-up patch, and I can
work on this.

Thanks,
Florian
  
Carlos O'Donell June 4, 2020, 2:05 p.m. UTC | #2
On 6/4/20 9:54 AM, Florian Weimer via Libc-alpha wrote:
> * Szabolcs Nagy:
> 
>> +void
>> +_dl_static_tls_tunables_init (void)
>> +{
>> +  size_t nns, opt_tls;
>> +
>> +#if HAVE_TUNABLES
>> +  nns = TUNABLE_GET (nns, size_t, NULL);
>> +  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
>> +#else
>> +  /* Default values of the tunables.  */
>> +  nns = 4;
>> +  opt_tls = 512;
>> +#endif
>> +  if (nns > DL_NNS)
>> +    nns = DL_NNS;
>> +  GL(dl_tls_static_optional) = opt_tls;
>> +  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
>> +				 + nns * OTHER_IE_TLS
>> +				 + opt_tls);
>> +}
> 
> I think the default 4 is incompatible with elf/tst-manyaudit.  The test
> doesn't check that all specified audit modules have been loaded.
> 
> I think we can automatically increase the number of namespaces based on
> the count of audit modules.  This can be a follow-up patch, and I can
> work on this.

For context Florian and I were discussing this patch, along with the changes
Florian was making for late failure audit modules crashing (which itself
was related to rseq changes).

I agree with Florian that 4 is incompatible with elf/tst-manyaudit, and
my opinion is that if we can't load an audit module we should fail to start
the process. The user, via env vars, or via DT_AUDIT, has requested auditing
and auditing is an integral part of some customer workflows because it allows
modifying the loader's behaviour. To fail to load an audit module is a critical
process startup failure in my opinion. I could be convinced otherwise, but I'd
have to hear some fairly cogent arguments in that direction.

As a follow-up patch I'd like to see the number of audit modules get counted,
and then the number of namespaces increased, and then static tls adjusted
accordingly to make it work. Note that you'll have to set the nns tunable's
minimum value based on number of modules being requested, and it should
override the tunable if the tunable is too low. That is to say that a low
tunable value for nns is ignored if you specify lots of LD_AUDIT or DT_AUDIT
modules. The tunables are only a hint and if it's too low it will be ignored.
Setting the dynamic minimum is important because it allows you to run ld.so
with --print-tunables (something I've asked HJ to fix for the x86
tunables that also have a dynamic minimum) and see how many namespaces
is the minimum.
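
To make the proposed follow-up concrete, here is a hedged sketch of the
"dynamic minimum" idea. It is not code from the patch: the helper names are
hypothetical, the list format is assumed to be colon-separated as for
LD_AUDIT, and it assumes each audit module needs one namespace of its own in
addition to the application's namespace.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAMESPACES 16  /* Mirrors DL_NNS == 16 from the patch description.  */

/* Count the non-empty entries of a colon-separated list such as the
   value of LD_AUDIT.  NULL or "" yields 0.  */
static size_t
count_audit_modules (const char *list)
{
  size_t count = 0;
  int in_entry = 0;

  if (list == NULL)
    return 0;
  for (const char *p = list; *p != '\0'; p++)
    {
      if (*p == ':')
        in_entry = 0;
      else if (!in_entry)
        {
          in_entry = 1;
          count++;
        }
    }
  return count;
}

/* Raise the requested namespace count to the dynamic minimum: one
   namespace for the application plus one per audit module (an
   assumption of this sketch), capped at MAX_NAMESPACES.  */
static size_t
effective_nns (size_t tunable_nns, size_t audit_modules)
{
  assert (tunable_nns >= 1);
  size_t min_nns = 1 + audit_modules;
  size_t nns = tunable_nns < min_nns ? min_nns : tunable_nns;
  return nns > MAX_NAMESPACES ? MAX_NAMESPACES : nns;
}
```

Under these assumptions a tunable value of 4 with seven audit modules would
be overridden to 8, while a tunable value above the minimum is kept as is.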
  
Florian Weimer June 4, 2020, 4:23 p.m. UTC | #3
One more high-level comment:

Would it be much work to separate the namespace number tuning from the
static TLS changes as far as optimizations are concerned?

I'd feel comfortable reviewing the namespace changes, but not the TLS
changes.

Thanks,
Florian
  
Szabolcs Nagy June 4, 2020, 7:55 p.m. UTC | #4
* Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> [2020-06-04 18:23:03 +0200]:
> One more high-level comment:
> 
> Would it be much work to separate the namespace number tuning from the
> static TLS changes as far as optimizations are concerned?
> 
> I'd feel comfortable reviewing the namespace changes, but not the TLS
> changes.

i will try to break up the patch.
  
Carlos O'Donell June 18, 2020, 7:50 p.m. UTC | #5
On 5/22/20 10:49 AM, Szabolcs Nagy wrote:
> On some targets static TLS surplus area can be used opportunistically
> for dynamically loaded modules such that the TLS access then becomes
> faster (TLSDESC and powerpc TLS optimization). However we don't want
> all surplus TLS to be used for this optimization because dynamically
> loaded modules with initial-exec model TLS can only use surplus TLS.
> 
> Currently TLS_STATIC_SURPLUS is 1664 bytes which is not even enough to
> cover the IE TLS needed for libc.so when DL_NNS (== 16) namespaces are
> created, so to allow reliable use of dlmopen as well as dynamic TLS
> optimizations a new contract is specified for use of static TLS:
> 
> - libc.so can have up to 192 bytes of IE TLS,
> - other system libraries together can have up to 144 bytes of IE TLS.
> - By default 512 bytes of "optional" static TLS is available for
>   opportunistic use.
> - By default at most 4 dlmopen namespaces are supported.
> 
> So the surplus TLS requirement is 3*192 + 4*144 + 512 = 1664 bytes
> with dynamic linking (i.e. the same as before: the externally visible
> behaviour is not changed, other than limiting static TLS use for
> optimizations on affected targets.)
> 
> The optional TLS available for opportunistic use is now tunable
> (dl.optional_static_tls), so users can directly affect the allocated
> static TLS size. (Note that module unloading with dlclose does not
> reclaim static TLS. After the optional TLS runs out, TLS access
> is no longer optimized.)
> 
> Since users may need more dlmopen namespaces (5 .. 16) the maximum
> supported number is now a tunable (dl.nns), when it is increased the
> static TLS allocation increases according to the contract. If users
> use more namespaces than the tunable setting, static TLS may run out.
> Or if users dynamically load libraries with IE TLS beyond what's
> allowed by the contract, static TLS may run out. These conditions are
> not checked or enforced, but the user's responsibility.
> 
> Static linking used fixed 2048 bytes surplus TLS, this is changed
> so the same contract is used as for dynamic linking.  However with
> static linking DL_NNS == 1 so dl.nns tunable is forced to 1, so by
> default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
> change is not expected to cause problems.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Tested on x86_64 and i686 and passes without regression.

I know you're splitting this patch into two patches, but I went ahead
and reviewed this anyway because it's in my queue. You can still split
the patches if you want and I can review that again quickly and ACK.
I wanted to get the bulk of the review done.

Subsequent followup after committing this:
- We need to fix tst-manyaudit.
- We should be able to count how many spaces we need based on LD_AUDIT
  or DT_AUDIT and enable up to that amount.

Post a v2 and I think we're almost done.

See my recommendations below.

Tested-by: Carlos O'Donell <carlos@redhat.com>
 
> ---
> 
> v4:
> - Rebased and moved this log out of the commit message.
> - Minor commit message wording changes.
> v3:
> - archived at
>   https://sourceware.org/pipermail/libc-alpha/2020-March/111660.html
> - Replace TLS_STATIC_SURPLUS with GLRO(dl_tls_static_surplus) and
>   simplify related logic.
> - In case of static linking, replace GL(dl_tls_static_size) with
>   GLRO(dl_tls_static_surplus) in the code paths before the
>   GL(dl_tls_static_size) value is actually computed.
> - Update comments and the test code.
> - Document the new tunables.
> - Update description, mention static linking.
> v2:
> - Add dl.nns tunable.
> - Add dl.optional_static_tls tunable.
> - New surplus TLS usage contract that works reliably up to dl.nns
>   namespaces.
> ---
>  csu/libc-start.c           |   2 +
>  csu/libc-tls.c             |  28 +++++------
>  elf/Makefile               |  17 ++++++-
>  elf/dl-reloc.c             |  37 ++++++++++----
>  elf/dl-sysdep.c            |   2 +
>  elf/dl-tls.c               |  56 +++++++++++++++++++--
>  elf/dl-tunables.h          |   4 ++
>  elf/dl-tunables.list       |  13 +++++
>  elf/dynamic-link.h         |   5 +-
>  elf/tst-tls-ie-mod.h       |  40 +++++++++++++++
>  elf/tst-tls-ie-mod0.c      |   4 ++
>  elf/tst-tls-ie-mod1.c      |   4 ++
>  elf/tst-tls-ie-mod2.c      |   4 ++
>  elf/tst-tls-ie-mod3.c      |   4 ++
>  elf/tst-tls-ie-mod4.c      |   4 ++
>  elf/tst-tls-ie-mod5.c      |   4 ++
>  elf/tst-tls-ie.c           | 100 +++++++++++++++++++++++++++++++++++++
>  manual/tunables.texi       |  28 +++++++++++
>  sysdeps/generic/ldsodefs.h |   8 +++
>  19 files changed, 332 insertions(+), 32 deletions(-)
>  create mode 100644 elf/tst-tls-ie-mod.h
>  create mode 100644 elf/tst-tls-ie-mod0.c
>  create mode 100644 elf/tst-tls-ie-mod1.c
>  create mode 100644 elf/tst-tls-ie-mod2.c
>  create mode 100644 elf/tst-tls-ie-mod3.c
>  create mode 100644 elf/tst-tls-ie-mod4.c
>  create mode 100644 elf/tst-tls-ie-mod5.c
>  create mode 100644 elf/tst-tls-ie.c
> 
> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index 4005caf84a..2396956266 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -190,6 +190,8 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>  
>    __tunables_init (__environ);
>  
> +  _dl_static_tls_tunables_init ();

OK. We're starting to get an interesting set of initialization functions in LIBC_START_MAIN.

> +
>    ARCH_INIT_CPU_FEATURES ();
>  
>    /* Perform IREL{,A} relocations.  */
> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
> index 73ade0fec5..62f0b0c8c3 100644
> --- a/csu/libc-tls.c
> +++ b/csu/libc-tls.c
> @@ -46,13 +46,19 @@ bool _dl_tls_dtv_gaps;
>  struct dtv_slotinfo_list *_dl_tls_dtv_slotinfo_list;
>  /* Number of modules in the static TLS block.  */
>  size_t _dl_tls_static_nelem;
> -/* Size of the static TLS block.  Giving this initialized value
> -   preallocates some surplus bytes in the static TLS area.  */
> -size_t _dl_tls_static_size = 2048;
> +/* Size of the static TLS block.  */
> +size_t _dl_tls_static_size;

OK. The size is no longer a static amount of space.

>  /* Size actually allocated in the static TLS block.  */
>  size_t _dl_tls_static_used;
>  /* Alignment requirement of the static TLS block.  */
>  size_t _dl_tls_static_align;
> +/* Size of surplus space in the static TLS area for dynamically
> +   loaded modules with IE-model TLS or for TLSDESC optimization.
> +   See comments in elf/dl-tls.c where it is initialized.  */
> +size_t _dl_tls_static_surplus;

OK. Good comment.

> +/* Remaining amount of static TLS that may be used for optimizing
> +   dynamic TLS access (e.g. with TLSDESC).  */
> +size_t _dl_tls_static_optional;

OK. Great comment. Good reference to TLSDESC.

>  
>  /* Generation counter for the dtv.  */
>  size_t _dl_tls_generation;
> @@ -81,10 +87,8 @@ init_slotinfo (void)
>  static void
>  init_static_tls (size_t memsz, size_t align)
>  {
> -  /* That is the size of the TLS memory for this object.  The initialized
> -     value of _dl_tls_static_size is provided by dl-open.c to request some
> -     surplus that permits dynamic loading of modules with IE-model TLS.  */
> -  GL(dl_tls_static_size) = roundup (memsz + GL(dl_tls_static_size),
> +  /* That is the size of the TLS memory for this object.  */
> +  GL(dl_tls_static_size) = roundup (memsz + GLRO(dl_tls_static_surplus),
>  				    TLS_TCB_ALIGN);

OK.

For the record it is OK to use TLS_TCB_ALIGN here because there will be a
TCB placed into this space and it should be large enough for that even
if the stack isn't.

However, later on we keep using TLS_TCB_SIZE as a kind of minimum, which
I find wrong: we need as much space as the module needs. But that's existing
code and we're not fixing that here.
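
For readers unfamiliar with the rounding step, a small sketch of what the
size computation in init_static_tls amounts to. The helper names and the
sample numbers are assumptions for illustration, and the extra TLS_TCB_SIZE
added under TLS_TCB_AT_TP is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Round X up to the next multiple of Y.  */
static size_t
round_up (size_t x, size_t y)
{
  assert (y != 0);
  return (x + y - 1) / y * y;
}

/* Static TLS size for a statically linked program: the program's own
   TLS (MEMSZ) plus the surplus, rounded up to TCB_ALIGN.  */
static size_t
static_tls_size (size_t memsz, size_t surplus, size_t tcb_align)
{
  return round_up (memsz + surplus, tcb_align);
}
```

For example, 100 bytes of program TLS plus the 656 byte static-linking
surplus, with a hypothetical 64 byte TCB alignment, rounds up from 756 to
768 bytes.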

>  #if TLS_TCB_AT_TP
>    GL(dl_tls_static_size) += TLS_TCB_SIZE;
> @@ -129,21 +133,17 @@ __libc_setup_tls (void)
>       'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
>       Instead we use 'sbrk' which would only uses 'errno' if it fails.
>       In this case we are right away out of memory and the user gets
> -     what she/he deserves.
> -
> -     The initialized value of _dl_tls_static_size is provided by dl-open.c
> -     to request some surplus that permits dynamic loading of modules with
> -     IE-model TLS.  */
> +     what she/he deserves.  */

OK.

>  #if TLS_TCB_AT_TP
>    /* Align the TCB offset to the maximum alignment, as
>       _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
>       and dl_tls_static_align.  */
> -  tcb_offset = roundup (memsz + GL(dl_tls_static_size), max_align);
> +  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);

OK.

>    tlsblock = __sbrk (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
>  #elif TLS_DTV_AT_TP
>    tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
>    tlsblock = __sbrk (tcb_offset + memsz + max_align
> -		     + TLS_PRE_TCB_SIZE + GL(dl_tls_static_size));
> +		     + TLS_PRE_TCB_SIZE + GLRO(dl_tls_static_surplus));

OK.

>    tlsblock += TLS_PRE_TCB_SIZE;
>  #else
>    /* In case a model with a different layout for the TCB and DTV
> diff --git a/elf/Makefile b/elf/Makefile
> index 6fe1df90bb..b8bde1f47d 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -204,7 +204,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
>  	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
>  	 tst-dlopenfail-2 \
>  	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
> -	 tst-audit14 tst-audit15 tst-audit16
> +	 tst-audit14 tst-audit15 tst-audit16 \
> +	 tst-tls-ie

OK. New test tst-tls-ie.

>  #	 reldep9
>  tests-internal += loadtest unload unload2 circleload1 \
>  	 neededtest neededtest2 neededtest3 neededtest4 \
> @@ -317,7 +318,10 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
>  		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
>  		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
>  		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
> -		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
> +		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
> +		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
> +		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5

OK. Six new modules. Tests the new limit at 4 namespaces.

> +
>  # Most modules build with _ISOMAC defined, but those filtered out
>  # depend on internal headers.
>  modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
> @@ -1748,3 +1752,12 @@ $(objpfx)tst-auxobj: $(objpfx)tst-filterobj-aux.so
>  $(objpfx)tst-auxobj-dlopen: $(libdl)
>  $(objpfx)tst-auxobj.out: $(objpfx)tst-filterobj-filtee.so
>  $(objpfx)tst-auxobj-dlopen.out: $(objpfx)tst-filterobj-filtee.so
> +
> +$(objpfx)tst-tls-ie: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-tls-ie.out: \
> +  $(objpfx)tst-tls-ie-mod0.so \
> +  $(objpfx)tst-tls-ie-mod1.so \
> +  $(objpfx)tst-tls-ie-mod2.so \
> +  $(objpfx)tst-tls-ie-mod3.so \
> +  $(objpfx)tst-tls-ie-mod4.so \
> +  $(objpfx)tst-tls-ie-mod5.so

OK.

> diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
> index ffcc84d396..6d32e49467 100644
> --- a/elf/dl-reloc.c
> +++ b/elf/dl-reloc.c
> @@ -39,13 +39,16 @@
>  /* We are trying to perform a static TLS relocation in MAP, but it was
>     dynamically loaded.  This can only work if there is enough surplus in
>     the static TLS area already allocated for each running thread.  If this
> -   object's TLS segment is too big to fit, we fail.  If it fits,
> -   we set MAP->l_tls_offset and return.
> -   This function intentionally does not return any value but signals error
> -   directly, as static TLS should be rare and code handling it should
> -   not be inlined as much as possible.  */
> +   object's TLS segment is too big to fit, we fail with -1.  If it fits,
> +   we set MAP->l_tls_offset and return 0.
> +   A portion of the surplus static TLS can be optionally used to optimize
> +   dynamic TLS access (with TLSDESC or powerpc TLS optimizations).
> +   If OPTIONAL is true then TLS is allocated for such optimization and
> +   the caller must have a fallback in case the optional portion of surplus
> +   TLS runs out.  If OPTIONAL is false then the entire surplus TLS area is
> +   considered and the allocation only fails if that runs out.  */

OK. Good mention of both TLSDESC and powerpc TLS optimizations (which are also
unique).

>  int
> -_dl_try_allocate_static_tls (struct link_map *map)
> +_dl_try_allocate_static_tls (struct link_map *map, bool optional)

OK. New optional.

>  {
>    /* If we've already used the variable with dynamic access, or if the
>       alignment requirements are too high, fail.  */
> @@ -68,8 +71,14 @@ _dl_try_allocate_static_tls (struct link_map *map)
>  
>    size_t n = (freebytes - blsize) / map->l_tls_align;

This is the hard part to review. There isn't much information on the exact
way in which we calculate this layout and how we make use of the space.
I'm not holding you accountable here, you're just fixing something which is
wrong by making some minimal changes, but it is difficult to audit all the
surrounding code when it all looks very odd :-)

We do it here for TLS_TCB_AT_TP.

>  
> -  size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
> -					    - map->l_tls_firstbyte_offset);
> +  /* Account optional static TLS surplus usage.  */
> +  size_t use = freebytes - n * map->l_tls_align - map->l_tls_firstbyte_offset;

OK. In the previous calculation we computed offset and stored this into the
adjusted GL(dl_tls_static_used) which is actually an offset (needs renaming).

All you do here is compute "use" first, then fail if the allocation is
optional and the requirement is greater than the remaining optional
static space.

> +  if (optional && use > GL(dl_tls_static_optional))
> +    goto fail;
> +  else if (optional)
> +    GL(dl_tls_static_optional) -= use;

If we use this space up we account for it by reducing optional.

> +
> +  size_t offset = GL(dl_tls_static_used) + use;

Then we adjust offset like we were doing before. All good so far.
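
The accounting described here can be modelled in a few lines. This is a toy
sketch, not the loader code: struct tls_budget and try_allocate are
hypothetical names, and alignment padding is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the loader state; field names are illustrative.  */
struct tls_budget
{
  size_t free_bytes;      /* Unused surplus static TLS overall.  */
  size_t optional_bytes;  /* Portion still usable for optimizations.  */
};

/* Returns 0 on success and -1 on failure, following the convention of
   _dl_try_allocate_static_tls.  An optional request must fit in the
   optional budget and is charged against it; a non-optional request is
   only limited by the total free space.  */
static int
try_allocate (struct tls_budget *b, size_t use, bool optional)
{
  if (use > b->free_bytes)
    return -1;
  if (optional)
    {
      if (use > b->optional_bytes)
        return -1;
      b->optional_bytes -= use;
    }
  b->free_bytes -= use;
  return 0;
}
```

Starting from 1664 free bytes with 512 optional, an optional request for
480 bytes succeeds and leaves 32 optional bytes. A following optional
request for 120 bytes fails, so the caller must use its fallback, while a
non-optional request of the same size still succeeds.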

>  
>    map->l_tls_offset = GL(dl_tls_static_used) = offset;
>  #elif TLS_DTV_AT_TP
> @@ -83,6 +92,13 @@ _dl_try_allocate_static_tls (struct link_map *map)
>    if (used > GL(dl_tls_static_size))
>      goto fail;

The code just above checks whether we would use more space than the
static reserve provides.

>  
> +  /* Account optional static TLS surplus usage.  */

We do it again for TLS_DTV_AT_TP

> +  size_t use = used - GL(dl_tls_static_used);
> +  if (optional && use > GL(dl_tls_static_optional))
> +    goto fail;
> +  else if (optional)
> +    GL(dl_tls_static_optional) -= use;

Simpler math here to compute the value.

> +
>    map->l_tls_offset = offset;
>    map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
>    GL(dl_tls_static_used) = used;
> @@ -110,12 +126,15 @@ _dl_try_allocate_static_tls (struct link_map *map)
>    return 0;
>  }
>  
> +/* This function intentionally does not return any value but signals error
> +   directly, as static TLS should be rare and code handling it should
> +   not be inlined as much as possible.  */

OK.

>  void
>  __attribute_noinline__
>  _dl_allocate_static_tls (struct link_map *map)
>  {
>    if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
> -      || _dl_try_allocate_static_tls (map))
> +      || _dl_try_allocate_static_tls (map, false))

OK.

>      {
>        _dl_signal_error (0, map->l_name, NULL, N_("\
>  cannot allocate memory in static TLS block"));
> diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
> index 854570821c..68a780dcbd 100644
> --- a/elf/dl-sysdep.c
> +++ b/elf/dl-sysdep.c
> @@ -222,6 +222,8 @@ _dl_sysdep_start (void **start_argptr,
>  
>    __tunables_init (_environ);
>  
> +  _dl_static_tls_tunables_init ();

OK.

> +
>  #ifdef DL_SYSDEP_INIT
>    DL_SYSDEP_INIT;
>  #endif
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index fa03234610..740e33ea91 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -29,10 +29,55 @@
>  #include <dl-tls.h>
>  #include <ldsodefs.h>
>  
> -/* Amount of excess space to allocate in the static TLS area
> -   to allow dynamic loading of modules defining IE-model TLS data.  */
> -#define TLS_STATIC_SURPLUS	64 + DL_NNS * 100
> +#define TUNABLE_NAMESPACE dl
> +#include <dl-tunables.h>
> +
> +/* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
> +
> +   - IE TLS in libc.so for all dlmopen namespaces except in the initial
> +     one where libc.so is not loaded dynamically but at startup time,
> +   - IE TLS in other libraries which may be dynamically loaded even in the
> +     initial namespace,
> +   - and optionally for optimizing dynamic TLS access.
> +
> +   The maximum number of namespaces is DL_NNS, but to support that many
> +   namespaces correctly the static TLS allocation should be significantly
> +   increased, which may cause problems with small thread stacks due to the
> +   way static TLS is accounted (bug 11787).
> +
> +   So there is a dl.nns tunable limit on the number of supported namespaces
> +   that affects the size of the static TLS and by default it's small enough
> +   not to cause problems with existing applications. The limit is not
> +   enforced or checked: it is the user's responsibility to increase dl.nns
> +   if more dlmopen namespaces are used.  */
> +
> +/* Size of initial-exec TLS in libc.so.  */
> +#define LIBC_IE_TLS 192

OK.

> +/* Size of initial-exec TLS in libraries other than libc.so.
> +   This should be large enough to cover runtime libraries of the
> +   compiler such as libgomp and libraries in libc other than libc.so.  */
> +#define OTHER_IE_TLS 144

OK.

>  
> +void
> +_dl_static_tls_tunables_init (void)
> +{
> +  size_t nns, opt_tls;
> +
> +#if HAVE_TUNABLES
> +  nns = TUNABLE_GET (nns, size_t, NULL);
> +  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
> +#else
> +  /* Default values of the tunables.  */
> +  nns = 4;
> +  opt_tls = 512;
> +#endif
> +  if (nns > DL_NNS)
> +    nns = DL_NNS;

OK. We don't yet cleanup DL_NNS to make this dynamic. I'm OK with that.

> +  GL(dl_tls_static_optional) = opt_tls;

OK.

> +  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
> +				 + nns * OTHER_IE_TLS
> +				 + opt_tls);

OK.

> +}
>  
>  /* Out-of-memory handler.  */
>  static void
> @@ -218,7 +263,8 @@ _dl_determine_tlsoffset (void)
>      }
>  
>    GL(dl_tls_static_used) = offset;
> -  GL(dl_tls_static_size) = (roundup (offset + TLS_STATIC_SURPLUS, max_align)
> +  GL(dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
> +				     max_align)

OK.

>  			    + TLS_TCB_SIZE);
>  #elif TLS_DTV_AT_TP
>    /* The TLS blocks start right after the TCB.  */
> @@ -262,7 +308,7 @@ _dl_determine_tlsoffset (void)
>      }
>  
>    GL(dl_tls_static_used) = offset;
> -  GL(dl_tls_static_size) = roundup (offset + TLS_STATIC_SURPLUS,
> +  GL(dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),

OK.

>  				    TLS_TCB_ALIGN);
>  #else
>  # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
> index 969e50327b..678f447e09 100644
> --- a/elf/dl-tunables.h
> +++ b/elf/dl-tunables.h
> @@ -128,4 +128,8 @@ tunable_is_name (const char *orig, const char *envname)
>  }
>  
>  #endif
> +
> +/* Initializers of tunables in the dl tunable namespace.  */
> +void _dl_static_tls_tunables_init (void);

OK.

> +
>  #endif
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 0d398dd251..ce46f28c7a 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -126,4 +126,17 @@ glibc {
>        default: 3
>      }
>    }
> +
> +  dl {
> +    nns {
> +      type: SIZE_T
> +      minval: 1
> +      maxval: 16

OK. Matches DL_NNS still used in the code in a lot of places.

> +      default: 4
> +    }
> +    optional_static_tls {
> +      type: SIZE_T
> +      default: 512

OK, though minval: 0 might be a good limit. No maximum. SXID_ERASE is good.

> +    }
> +  }
>  }
> diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
> index bb7a66f4cd..6727233e1a 100644
> --- a/elf/dynamic-link.h
> +++ b/elf/dynamic-link.h
> @@ -40,9 +40,10 @@
>      (__builtin_expect ((sym_map)->l_tls_offset				\
>  		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
>       && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
> -	 || _dl_try_allocate_static_tls (sym_map) == 0))
> +	 || _dl_try_allocate_static_tls (sym_map, true) == 0))

OK, yes TRY_STATIC_TLS is the only place we should set optional==true.

>  
> -int _dl_try_allocate_static_tls (struct link_map *map) attribute_hidden;
> +int _dl_try_allocate_static_tls (struct link_map *map, bool optional)
> +  attribute_hidden;

OK.

>  
>  #include <elf.h>
>  
> diff --git a/elf/tst-tls-ie-mod.h b/elf/tst-tls-ie-mod.h
> new file mode 100644
> index 0000000000..46b362a9b7
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod.h
> @@ -0,0 +1,40 @@
> +/* Module with specified TLS size and model.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This file is parameterized by macros N, SIZE and MODEL.  */

OK.

> +
> +#include <stdio.h>
> +#include <string.h>
> +
> +#define CONCATX(x, y) x ## y
> +#define CONCAT(x, y) CONCATX (x, y)
> +#define STRX(x) #x
> +#define STR(x) STRX (x)
> +
> +#define VAR CONCAT (var, N)
> +
> +__attribute__ ((aligned (8), tls_model (MODEL)))
> +__thread char VAR[SIZE];
> +
> +void
> +CONCAT (access, N) (void)
> +{
> +  printf (STR (VAR) "[%d]:\t %p .. %p " MODEL "\n", SIZE, VAR, VAR + SIZE);
> +  fflush (stdout);
> +  memset (VAR, 1, SIZE);
> +}

OK.
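
A short sketch of why the header uses two levels of macros. It reuses the
CONCAT and STR helpers from the header, but the variable is a plain static
array rather than a __thread one, and size_of_var and var_name are names
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-level helpers: the outer macro expands its arguments before the
   inner one pastes or stringifies them.  */
#define CONCATX(x, y) x ## y
#define CONCAT(x, y) CONCATX (x, y)
#define STRX(x) #x
#define STR(x) STRX (x)

#define N 3
#define SIZE 16

/* Expands to: static char var3[16];  */
static char CONCAT (var, N)[SIZE];

/* Expands to a function named size_of_var3.  */
static size_t
CONCAT (size_of_var, N) (void)
{
  return sizeof CONCAT (var, N);
}

/* STR expands its argument first, so this returns "var3".  Using STRX
   directly would stringify the unexpanded macro call instead.  */
static const char *
var_name (void)
{
  return STR (CONCAT (var, N));
}
```

Each tst-tls-ie-modN.c file only has to define N, SIZE and MODEL before
including the header to get a distinct variable and accessor.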

> diff --git a/elf/tst-tls-ie-mod0.c b/elf/tst-tls-ie-mod0.c
> new file mode 100644
> index 0000000000..2450686e40
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod0.c
> @@ -0,0 +1,4 @@
> +#define N 0
> +#define SIZE 480
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"

OK.

> diff --git a/elf/tst-tls-ie-mod1.c b/elf/tst-tls-ie-mod1.c
> new file mode 100644
> index 0000000000..849ff91e53
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod1.c
> @@ -0,0 +1,4 @@
> +#define N 1
> +#define SIZE 120
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"

OK.

> diff --git a/elf/tst-tls-ie-mod2.c b/elf/tst-tls-ie-mod2.c
> new file mode 100644
> index 0000000000..23915ab67b
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod2.c
> @@ -0,0 +1,4 @@
> +#define N 2
> +#define SIZE 24
> +#define MODEL "global-dynamic"

OK.

> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie-mod3.c b/elf/tst-tls-ie-mod3.c
> new file mode 100644
> index 0000000000..5395f844a5
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod3.c
> @@ -0,0 +1,4 @@
> +#define N 3
> +#define SIZE 16
> +#define MODEL "global-dynamic"
> +#include "tst-tls-ie-mod.h"

OK.

> diff --git a/elf/tst-tls-ie-mod4.c b/elf/tst-tls-ie-mod4.c
> new file mode 100644
> index 0000000000..93ac2eacae
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod4.c
> @@ -0,0 +1,4 @@
> +#define N 4
> +#define SIZE 1024
> +#define MODEL "initial-exec"
> +#include "tst-tls-ie-mod.h"

OK. Large IE size allocation.

> diff --git a/elf/tst-tls-ie-mod5.c b/elf/tst-tls-ie-mod5.c
> new file mode 100644
> index 0000000000..84b3fd285b
> --- /dev/null
> +++ b/elf/tst-tls-ie-mod5.c
> @@ -0,0 +1,4 @@
> +#define N 5
> +#define SIZE 128
> +#define MODEL "initial-exec"

OK. Smaller IE size allocation. Exactly 1152 with this allocation.

> +#include "tst-tls-ie-mod.h"
> diff --git a/elf/tst-tls-ie.c b/elf/tst-tls-ie.c
> new file mode 100644
> index 0000000000..2f00a2936d
> --- /dev/null
> +++ b/elf/tst-tls-ie.c
> @@ -0,0 +1,100 @@
> +/* Test dlopen of modules with initial-exec TLS.
> +   Copyright (C) 2016-2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* This test tries to check that surplus static TLS is not used up for
> +   dynamic TLS optimizations and 3*192 + 4*144 = 1152 bytes of static
> +   TLS is available for dlopening modules with initial-exec TLS.  It
> +   depends on dl.nns=4 and dl.optional_static_tls=512 tunable setting.  */

OK. Good comment.

> +
> +#include <errno.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +static int do_test (void);
> +#include <support/xthread.h>
> +#include <support/xdlfcn.h>
> +#include <support/test-driver.c>
> +
> +/* Have some big TLS in the main exe: should not use surplus TLS.  */

OK. Correct, static linker will size this appropriately for the binary.

> +__thread char maintls[1000];

OK. This will be IE-mode.

> +
> +static pthread_barrier_t barrier;
> +
> +/* Forces multi-threaded behaviour.  */
> +static void *
> +blocked_thread_func (void *closure)
> +{
> +  xpthread_barrier_wait (&barrier);
> +  /* TLS load and access tests run here in the main thread.  */
> +  xpthread_barrier_wait (&barrier);
> +  return NULL;
> +}
> +
> +static void *
> +load_and_access (const char *mod, const char *func)
> +{
> +  /* Load module with TLS.  */
> +  void *p = xdlopen (mod, RTLD_NOW);
> +  /* Access the TLS variable to ensure it is allocated.  */
> +  void (*f) (void) = (void (*) (void))xdlsym (p, func);
> +  f ();
> +  return p;

OK.

> +}
> +
> +static int
> +do_test (void)
> +{
> +  void *mods[6];
> +
> +  {
> +    int ret = pthread_barrier_init (&barrier, NULL, 2);
> +    if (ret != 0)
> +      {
> +        errno = ret;
> +        printf ("error: pthread_barrier_init: %m\n");
> +        exit (1);
> +      }
> +  }
> +
> +  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
> +  xpthread_barrier_wait (&barrier);
> +
> +  printf ("maintls[%zu]:\t %p .. %p\n",
> +	   sizeof maintls, maintls, maintls + sizeof maintls);
> +  memset (maintls, 1, sizeof maintls);
> +
> +  /* Load modules with dynamic TLS (may use surplus TLS opportunistically).  */

Better to say "may use surplus *static* TLS ..."

> +  mods[0] = load_and_access ("tst-tls-ie-mod0.so", "access0");
> +  mods[1] = load_and_access ("tst-tls-ie-mod1.so", "access1");
> +  mods[2] = load_and_access ("tst-tls-ie-mod2.so", "access2");
> +  mods[3] = load_and_access ("tst-tls-ie-mod3.so", "access3");

OK.

> +  /* Load modules with initial-exec TLS (can only use surplus TLS).  */

Better to say "use surplus static TLS..."

> +  mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
> +  mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");

Why can't we load a 6th module and test that it fails?

Does it terminate the process or just return a dlopen failure?

> +
> +  xpthread_barrier_wait (&barrier);
> +  xpthread_join (blocked_thread);
> +
> +  /* Close the modules.  */
> +  for (int i = 0; i < 6; ++i)
> +    xdlclose (mods[i]);
> +
> +  return 0;
> +}
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index ec18b10834..437fdadff0 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -31,6 +31,7 @@ their own namespace.
>  @menu
>  * Tunable names::  The structure of a tunable name
>  * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
> +* Dynamic Linking Tunables:: Tunables in the dynamic linking subsystem

OK.

>  * Elision Tunables::  Tunables in elision subsystem
>  * POSIX Thread Tunables:: Tunables in the POSIX thread subsystem
>  * Hardware Capability Tunables::  Tunables that modify the hardware
> @@ -226,6 +227,33 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>  passed to @code{malloc} for the largest bin size to enable.
>  @end deftp
>  
> +@node Dynamic Linking Tunables
> +@section Dynamic Linking Tunables
> +@cindex dynamic linking tunables
> +@cindex dl tunables
> +
> +@deftp {Tunable namespace} glibc.dl

OK.

> +Dynamic linker behavior can be modified by setting the
> +following tunables in the @code{dl} namespace:
> +@end deftp
> +
> +@deftp Tunable glibc.dl.nns
> +Sets the number of supported dynamic link namespaces for which enough
> +static TLS is allocated (see @code{dlmopen}).  If more namespaces are
> +created then static TLS may run out at @code{dlopen} or @code{dlmopen}
> +time which is a non-recoverable failure.  Currently this limit can be
> +set between 1 and 16 inclusive, the default is 4. If the limit is
> +increased then internally more static TLS is allocated to accomodate
> +system libraries with initial-exec TLS in all namespaces.

Suggest:

Sets the number of supported dynamic link namespaces (see @code{dlmopen}).
Currently this limit can be set between 1 and 16 inclusive, the default is 4.
Each link namespace consumes some memory in every thread, and thus raising the
limit will increase the amount of memory each thread uses. Raising the limit
is useful when your application uses more than 4 dynamic linker audit modules
e.g. LD_AUDIT, or will use more than 4 dynamic link namespaces as created
by @code{dlmopen} with an lmid argument of @code{LM_ID_NEWLM}.

---

IMO we should avoid talking about the implementation details of IE TLS. We
need only talk about the fact that using more dynamic link namespaces takes
up more memory and explain why you would lift the limit.
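
To make the memory cost concrete, here is a minimal sketch of how the per-thread surplus grows with the namespace limit. The constants and the formula are copied from the patch (elf/dl-tls.c); the helper name static_tls_surplus is hypothetical and only illustrates the computation done in _dl_static_tls_tunables_init. It assumes nns is at least 1, which the tunable's minval guarantees.

```c
#include <stddef.h>

/* Constants taken from the patch under review (elf/dl-tls.c).  */
#define LIBC_IE_TLS 192   /* IE TLS allowance for libc.so.  */
#define OTHER_IE_TLS 144  /* IE TLS allowance for other system libraries.  */
#define DL_NNS 16         /* Upper bound on link namespaces.  */

/* Hypothetical helper mirroring the computation in
   _dl_static_tls_tunables_init: surplus static TLS in bytes for a given
   dl.nns and dl.optional_static_tls setting.  NNS must be >= 1.  */
size_t
static_tls_surplus (size_t nns, size_t optional_static_tls)
{
  if (nns > DL_NNS)
    nns = DL_NNS;
  /* libc.so in the initial namespace is not loaded dynamically, so it
     needs no surplus: hence (nns - 1).  */
  return (nns - 1) * LIBC_IE_TLS + nns * OTHER_IE_TLS + optional_static_tls;
}
```

With the defaults (4, 512) this gives the 1664 bytes from the commit message, with nns forced to 1 (static linking) it gives 656, and at the maximum of 16 namespaces it gives 5696 bytes.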

> +@end deftp
> +
> +@deftp Tunable glibc.dl.optional_static_tls
> +Sets the amount of surplus static TLS that may be used for optimizing
> +dynamic TLS access (only works on certain platforms, e.g. TLSDESC can
> +be optimized this way). The internal allocation of static TLS is
> +increased by this amount, the default is 512.

Suggest:

Sets the amount of surplus static TLS in bytes to allocate at program
startup.  Every thread created allocates this amount of specified surplus
static TLS. This is a minimum value and additional space may be allocated
for internal purposes including alignment.  Optional static TLS is used for
optimizing dynamic TLS access for platforms that support such optimizations
e.g. TLS descriptors or optimized TLS access for POWER (@code{DT_PPC64_OPT}
and @code{DT_PPC_OPT}).  In order to make the best use of such optimizations
the value should be as many bytes as would be required to hold all TLS
variables in all dynamic loaded shared libraries.  The value cannot be known
by the dynamic loader because it doesn't know the expected set of shared
libraries which will be loaded.  The existing static TLS space cannot be
changed once allocated at process startup.  The default allocation of
optional static TLS is 512 bytes and is allocated in every thread.
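
The accounting behind this tunable can be pictured with a small model. This is only a sketch: struct tls_budget and try_allocate are hypothetical names, alignment padding is ignored, and the real logic lives in _dl_try_allocate_static_tls. The point it illustrates is that an optional request is refused once the optional portion is exhausted (the caller then falls back to dynamic TLS), while a non-optional request only fails when the whole static TLS block is full.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of static TLS accounting (hypothetical names,
   alignment padding ignored).  */
struct tls_budget
{
  size_t size;      /* Total static TLS available.  */
  size_t used;      /* Bytes already handed out.  */
  size_t optional;  /* Remaining bytes usable for optimizations.  */
};

/* Return 0 and account NEED bytes on success, -1 on failure.  */
int
try_allocate (struct tls_budget *b, size_t need, bool optional)
{
  /* Both kinds of request fail when the block itself is full.  */
  if (need > b->size - b->used)
    return -1;
  if (optional)
    {
      /* Optimizations may only eat into the optional portion.  */
      if (need > b->optional)
        return -1;
      b->optional -= need;
    }
  b->used += need;
  return 0;
}
```

In this model, with 1664 bytes of surplus and 512 optional, optional requests of 480, 120, 24 and 16 bytes (the sizes of the test's global-dynamic modules) succeed, fail, succeed and fail respectively, and the later non-optional requests of 1024 and 128 bytes both still fit.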

> +@end deftp
> +
>  @node Elision Tunables
>  @section Elision Tunables
>  @cindex elision tunables
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index 5ff4a2831b..cc7c682a81 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -442,6 +442,9 @@ struct rtld_global
>    EXTERN size_t _dl_tls_static_used;
>    /* Alignment requirement of the static TLS block.  */
>    EXTERN size_t _dl_tls_static_align;
> +  /* Remaining amount of static TLS that may be used for optimizing
> +     dynamic TLS access (e.g. with TLSDESC).  */
> +  EXTERN size_t _dl_tls_static_optional;

OK.

>  
>  /* Number of additional entries in the slotinfo array of each slotinfo
>     list element.  A large number makes it almost certain take we never
> @@ -583,6 +586,11 @@ struct rtld_global_ro
>       binaries, don't honor for PIEs).  */
>    EXTERN ElfW(Addr) _dl_use_load_bias;
>  
> +  /* Size of surplus space in the static TLS area for dynamically
> +     loaded modules with IE-model TLS or for TLSDESC optimization.
> +     See comments in elf/dl-tls.c where it is initialized.  */
> +  EXTERN size_t _dl_tls_static_surplus;

OK.

> +
>    /* Name of the shared object to be profiled (if any).  */
>    EXTERN const char *_dl_profile;
>    /* Filename of the output file.  */
>
  
Szabolcs Nagy June 19, 2020, 4:44 p.m. UTC | #6
The 06/18/2020 15:50, Carlos O'Donell wrote:
> On 5/22/20 10:49 AM, Szabolcs Nagy wrote:
> > On some targets static TLS surplus area can be used opportunistically
> > for dynamically loaded modules such that the TLS access then becomes
> > faster (TLSDESC and powerpc TLS optimization). However we don't want
> > all surplus TLS to be used for this optimization because dynamically
> > loaded modules with initial-exec model TLS can only use surplus TLS.
> > 
> > Currently TLS_STATIC_SURPLUS is 1664 bytes which is not even enough to
> > cover the IE TLS needed for libc.so when DL_NNS (== 16) namespaces are
> > created, so to allow reliable use of dlmopen as well as dynamic TLS
> > optimizations a new contract is specified for use of static TLS:
> > 
> > - libc.so can have up to 192 bytes of IE TLS,
> > - other system libraries together can have up to 144 bytes of IE TLS.
> > - By default 512 bytes of "optional" static TLS is available for
> >   opportunistic use.
> > - By default at most 4 dlmopen namespaces are supported.
> > 
> > So the surplus TLS requirement is 3*192 + 4*144 + 512 = 1664 bytes
> > with dynamic linking (i.e. the same as before: the externally visible
> > behaviour is not changed, other than limiting static TLS use for
> > optimizations on affected targets.)
> > 
> > The optional TLS available for opportunistic use is now tunable
> > (dl.optional_static_tls), so users can directly affect the allocated
> > static TLS size. (Note that module unloading with dlclose does not
> > reclaim static TLS. After the optional TLS runs out, TLS access
> > is no longer optimized.)
> > 
> > Since users may need more dlmopen namespaces (5 .. 16) the maximum
> > supported number is now a tunable (dl.nns), when it is increased the
> > static TLS allocation increases according to the contract. If users
> > use more namespaces than the tunable setting, static TLS may run out.
> > Or if users dynamically load libraries with IE TLS beyond what's
> > allowed by the contract, static TLS may run out. These conditions are
> > not checked or enforced, but the user's responsibility.
> > 
> > Static linking used fixed 2048 bytes surplus TLS, this is changed
> > so the same contract is used as for dynamic linking.  However with
> > static linking DL_NNS == 1 so dl.nns tunable is forced to 1, so by
> > default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
> > change is not expected to cause problems.
> > 
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> Tested on x86_64 and i686 and passes without regression.
> 
> I know you're splitting this patch into two patches, but I went ahead
> and reviewed this anyway because it's in my queue. You can still split
> the patches if you want and I can review that again quickly and ACK.
> I wanted to get the bulk of the review done.

thanks. i did not make much progress with the splitting
but i will try to do it on monday, i think that's useful.

> 
> Subsequent followup after committing this:
> - We need to fix tst-manyaudit.
> - We should be able to count how many spaces we need based on LD_AUDIT
>   or DT_AUDIT and enable up to that amount.
> 
> Post a v2 and I think we're almost done.
> 
> See my recommendations below.
> 
> Tested-by: Carlos O'Donell <carlos@redhat.com>
...
> > --- a/csu/libc-start.c
> > +++ b/csu/libc-start.c
> > @@ -190,6 +190,8 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> >  
> >    __tunables_init (__environ);
> >  
> > +  _dl_static_tls_tunables_init ();
> 
> OK. We're starting to get an interesting set of initialization functions in LIBC_START_MAIN.

this init has to be after __tunables_init
and before _dl_determine_tlsoffset in the
shared case and before/in __libc_tls_setup
in the static case.

previously i moved it as early as possible
but now i think it should be together with
other tls setup code where the initialized
values are first used.

and i will move the declaration to
sysdeps/generic/ldsodefs.h since other
tls related declarations live there.

> > @@ -317,7 +318,10 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
> >  		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
> >  		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
> >  		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
> > -		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
> > +		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
> > +		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
> > +		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5
> 
> OK. Five new modules. Tests the new limit at 4 namespaces.

actually i use dlopen, not dlmopen, so it's
one namespace. i only used more modules so
the test remains effective in case the
tls optimization logic varies a bit.

but it makes sense to have another test
that uses 3 dlmopen (then the ie tls modules
need smaller tls size so they fit with the
3 additional libc.so)
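
A rough sketch of why the IE modules would need to shrink in such a test: each dlmopen namespace brings its own libc.so, which eats into the non-optional surplus. The helper ie_tls_left is hypothetical, and it assumes each extra libc.so consumes its full 192-byte allowance and that nothing else has used surplus yet.

```c
#include <stddef.h>

/* Constants taken from the patch under review (elf/dl-tls.c).  */
#define LIBC_IE_TLS 192
#define OTHER_IE_TLS 144

/* Hypothetical helper: bytes of non-optional surplus left for IE-model
   modules after EXTRA_NS dlmopen namespaces have each loaded libc.so.
   Assumes every extra libc.so uses its whole LIBC_IE_TLS allowance.  */
size_t
ie_tls_left (size_t nns, size_t extra_ns)
{
  size_t total = (nns - 1) * LIBC_IE_TLS + nns * OTHER_IE_TLS;
  size_t libc_use = extra_ns * LIBC_IE_TLS;
  return libc_use > total ? 0 : total - libc_use;
}
```

With dl.nns=4 and no extra namespaces this leaves the 1152 bytes that mod4 (1024) and mod5 (128) fill exactly; with 3 dlmopen namespaces only 576 bytes remain under these assumptions, so the same two modules would no longer fit.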

> > +      default: 4
> > +    }
> > +    optional_static_tls {
> > +      type: SIZE_T
> > +      default: 512
> 
> OK, though minval: 0 might be a good limit. No maximum. SXID_ERASE is good.

i will add the min.

> > +  mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
> > +  mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");
> 
> Why can't we load a 6th module and test that it fails?
> 
> Does it terminate the process or just return a dlopen failure?

hm it should not crash, but report the error,
so i think that's a reasonable check to do.
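
As a minimal sketch of what such a check could look like, the helper below (try_load is a hypothetical name) treats a NULL return from dlopen as the reported error rather than using the aborting xdlopen wrapper. It is not the test that was eventually written, just an illustration of the failure path; on older glibc it may need linking with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load MOD.  Return 0 if it loads, -1 if
   dlopen reports an error (the process keeps running either way).  */
int
try_load (const char *mod)
{
  void *handle = dlopen (mod, RTLD_NOW);
  if (handle == NULL)
    {
      /* dlerror describes the most recent failure, e.g. the
         "cannot allocate memory in static TLS block" message.  */
      const char *msg = dlerror ();
      printf ("dlopen %s failed: %s\n", mod, msg != NULL ? msg : "(unknown)");
      return -1;
    }
  dlclose (handle);
  return 0;
}
```

A test built on this would expect -1 for a module whose IE TLS no longer fits in the remaining surplus, and 0 for the modules that are within the contract.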

and i will do the other suggested wording
changes in the comments and documentation.
  

Patch

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 4005caf84a..2396956266 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -190,6 +190,8 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __tunables_init (__environ);
 
+  _dl_static_tls_tunables_init ();
+
   ARCH_INIT_CPU_FEATURES ();
 
   /* Perform IREL{,A} relocations.  */
diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 73ade0fec5..62f0b0c8c3 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -46,13 +46,19 @@  bool _dl_tls_dtv_gaps;
 struct dtv_slotinfo_list *_dl_tls_dtv_slotinfo_list;
 /* Number of modules in the static TLS block.  */
 size_t _dl_tls_static_nelem;
-/* Size of the static TLS block.  Giving this initialized value
-   preallocates some surplus bytes in the static TLS area.  */
-size_t _dl_tls_static_size = 2048;
+/* Size of the static TLS block.  */
+size_t _dl_tls_static_size;
 /* Size actually allocated in the static TLS block.  */
 size_t _dl_tls_static_used;
 /* Alignment requirement of the static TLS block.  */
 size_t _dl_tls_static_align;
+/* Size of surplus space in the static TLS area for dynamically
+   loaded modules with IE-model TLS or for TLSDESC optimization.
+   See comments in elf/dl-tls.c where it is initialized.  */
+size_t _dl_tls_static_surplus;
+/* Remaining amount of static TLS that may be used for optimizing
+   dynamic TLS access (e.g. with TLSDESC).  */
+size_t _dl_tls_static_optional;
 
 /* Generation counter for the dtv.  */
 size_t _dl_tls_generation;
@@ -81,10 +87,8 @@  init_slotinfo (void)
 static void
 init_static_tls (size_t memsz, size_t align)
 {
-  /* That is the size of the TLS memory for this object.  The initialized
-     value of _dl_tls_static_size is provided by dl-open.c to request some
-     surplus that permits dynamic loading of modules with IE-model TLS.  */
-  GL(dl_tls_static_size) = roundup (memsz + GL(dl_tls_static_size),
+  /* That is the size of the TLS memory for this object.  */
+  GL(dl_tls_static_size) = roundup (memsz + GLRO(dl_tls_static_surplus),
 				    TLS_TCB_ALIGN);
 #if TLS_TCB_AT_TP
   GL(dl_tls_static_size) += TLS_TCB_SIZE;
@@ -129,21 +133,17 @@  __libc_setup_tls (void)
      'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
      Instead we use 'sbrk' which would only uses 'errno' if it fails.
      In this case we are right away out of memory and the user gets
-     what she/he deserves.
-
-     The initialized value of _dl_tls_static_size is provided by dl-open.c
-     to request some surplus that permits dynamic loading of modules with
-     IE-model TLS.  */
+     what she/he deserves.  */
 #if TLS_TCB_AT_TP
   /* Align the TCB offset to the maximum alignment, as
      _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
      and dl_tls_static_align.  */
-  tcb_offset = roundup (memsz + GL(dl_tls_static_size), max_align);
+  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
   tlsblock = __sbrk (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
 #elif TLS_DTV_AT_TP
   tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
   tlsblock = __sbrk (tcb_offset + memsz + max_align
-		     + TLS_PRE_TCB_SIZE + GL(dl_tls_static_size));
+		     + TLS_PRE_TCB_SIZE + GLRO(dl_tls_static_surplus));
   tlsblock += TLS_PRE_TCB_SIZE;
 #else
   /* In case a model with a different layout for the TCB and DTV
diff --git a/elf/Makefile b/elf/Makefile
index 6fe1df90bb..b8bde1f47d 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -204,7 +204,8 @@  tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
 	 tst-dlopenfail-2 \
 	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
-	 tst-audit14 tst-audit15 tst-audit16
+	 tst-audit14 tst-audit15 tst-audit16 \
+	 tst-tls-ie
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -317,7 +318,10 @@  modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
 		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
 		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
-		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
+		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
+		tst-tls-ie-mod0 tst-tls-ie-mod1 tst-tls-ie-mod2 \
+		tst-tls-ie-mod3 tst-tls-ie-mod4 tst-tls-ie-mod5
+
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
 modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
@@ -1748,3 +1752,12 @@  $(objpfx)tst-auxobj: $(objpfx)tst-filterobj-aux.so
 $(objpfx)tst-auxobj-dlopen: $(libdl)
 $(objpfx)tst-auxobj.out: $(objpfx)tst-filterobj-filtee.so
 $(objpfx)tst-auxobj-dlopen.out: $(objpfx)tst-filterobj-filtee.so
+
+$(objpfx)tst-tls-ie: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls-ie.out: \
+  $(objpfx)tst-tls-ie-mod0.so \
+  $(objpfx)tst-tls-ie-mod1.so \
+  $(objpfx)tst-tls-ie-mod2.so \
+  $(objpfx)tst-tls-ie-mod3.so \
+  $(objpfx)tst-tls-ie-mod4.so \
+  $(objpfx)tst-tls-ie-mod5.so
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index ffcc84d396..6d32e49467 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -39,13 +39,16 @@ 
 /* We are trying to perform a static TLS relocation in MAP, but it was
    dynamically loaded.  This can only work if there is enough surplus in
    the static TLS area already allocated for each running thread.  If this
-   object's TLS segment is too big to fit, we fail.  If it fits,
-   we set MAP->l_tls_offset and return.
-   This function intentionally does not return any value but signals error
-   directly, as static TLS should be rare and code handling it should
-   not be inlined as much as possible.  */
+   object's TLS segment is too big to fit, we fail with -1.  If it fits,
+   we set MAP->l_tls_offset and return 0.
+   A portion of the surplus static TLS can be optionally used to optimize
+   dynamic TLS access (with TLSDESC or powerpc TLS optimizations).
+   If OPTIONAL is true then TLS is allocated for such optimization and
+   the caller must have a fallback in case the optional portion of surplus
+   TLS runs out.  If OPTIONAL is false then the entire surplus TLS area is
+   considered and the allocation only fails if that runs out.  */
 int
-_dl_try_allocate_static_tls (struct link_map *map)
+_dl_try_allocate_static_tls (struct link_map *map, bool optional)
 {
   /* If we've already used the variable with dynamic access, or if the
      alignment requirements are too high, fail.  */
@@ -68,8 +71,14 @@  _dl_try_allocate_static_tls (struct link_map *map)
 
   size_t n = (freebytes - blsize) / map->l_tls_align;
 
-  size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
-					    - map->l_tls_firstbyte_offset);
+  /* Account optional static TLS surplus usage.  */
+  size_t use = freebytes - n * map->l_tls_align - map->l_tls_firstbyte_offset;
+  if (optional && use > GL(dl_tls_static_optional))
+    goto fail;
+  else if (optional)
+    GL(dl_tls_static_optional) -= use;
+
+  size_t offset = GL(dl_tls_static_used) + use;
 
   map->l_tls_offset = GL(dl_tls_static_used) = offset;
 #elif TLS_DTV_AT_TP
@@ -83,6 +92,13 @@  _dl_try_allocate_static_tls (struct link_map *map)
   if (used > GL(dl_tls_static_size))
     goto fail;
 
+  /* Account optional static TLS surplus usage.  */
+  size_t use = used - GL(dl_tls_static_used);
+  if (optional && use > GL(dl_tls_static_optional))
+    goto fail;
+  else if (optional)
+    GL(dl_tls_static_optional) -= use;
+
   map->l_tls_offset = offset;
   map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
   GL(dl_tls_static_used) = used;
@@ -110,12 +126,15 @@  _dl_try_allocate_static_tls (struct link_map *map)
   return 0;
 }
 
+/* This function intentionally does not return any value but signals error
+   directly, as static TLS should be rare and code handling it should
+   not be inlined as much as possible.  */
 void
 __attribute_noinline__
 _dl_allocate_static_tls (struct link_map *map)
 {
   if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
-      || _dl_try_allocate_static_tls (map))
+      || _dl_try_allocate_static_tls (map, false))
     {
       _dl_signal_error (0, map->l_name, NULL, N_("\
 cannot allocate memory in static TLS block"));
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
index 854570821c..68a780dcbd 100644
--- a/elf/dl-sysdep.c
+++ b/elf/dl-sysdep.c
@@ -222,6 +222,8 @@  _dl_sysdep_start (void **start_argptr,
 
   __tunables_init (_environ);
 
+  _dl_static_tls_tunables_init ();
+
 #ifdef DL_SYSDEP_INIT
   DL_SYSDEP_INIT;
 #endif
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index fa03234610..740e33ea91 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -29,10 +29,55 @@ 
 #include <dl-tls.h>
 #include <ldsodefs.h>
 
-/* Amount of excess space to allocate in the static TLS area
-   to allow dynamic loading of modules defining IE-model TLS data.  */
-#define TLS_STATIC_SURPLUS	64 + DL_NNS * 100
+#define TUNABLE_NAMESPACE dl
+#include <dl-tunables.h>
+
+/* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
+
+   - IE TLS in libc.so for all dlmopen namespaces except in the initial
+     one where libc.so is not loaded dynamically but at startup time,
+   - IE TLS in other libraries which may be dynamically loaded even in the
+     initial namespace,
+   - and optionally for optimizing dynamic TLS access.
+
+   The maximum number of namespaces is DL_NNS, but to support that many
+   namespaces correctly the static TLS allocation should be significantly
+   increased, which may cause problems with small thread stacks due to the
+   way static TLS is accounted (bug 11787).
+
+   So there is a dl.nns tunable limit on the number of supported namespaces
+   that affects the size of the static TLS and by default it's small enough
+   not to cause problems with existing applications. The limit is not
+   enforced or checked: it is the user's responsibility to increase dl.nns
+   if more dlmopen namespaces are used.  */
+
+/* Size of initial-exec TLS in libc.so.  */
+#define LIBC_IE_TLS 192
+/* Size of initial-exec TLS in libraries other than libc.so.
+   This should be large enough to cover runtime libraries of the
+   compiler such as libgomp and libraries in libc other than libc.so.  */
+#define OTHER_IE_TLS 144
 
+void
+_dl_static_tls_tunables_init (void)
+{
+  size_t nns, opt_tls;
+
+#if HAVE_TUNABLES
+  nns = TUNABLE_GET (nns, size_t, NULL);
+  opt_tls = TUNABLE_GET (optional_static_tls, size_t, NULL);
+#else
+  /* Default values of the tunables.  */
+  nns = 4;
+  opt_tls = 512;
+#endif
+  if (nns > DL_NNS)
+    nns = DL_NNS;
+  GL(dl_tls_static_optional) = opt_tls;
+  GLRO(dl_tls_static_surplus) = ((nns - 1) * LIBC_IE_TLS
+				 + nns * OTHER_IE_TLS
+				 + opt_tls);
+}
 
 /* Out-of-memory handler.  */
 static void
@@ -218,7 +263,8 @@  _dl_determine_tlsoffset (void)
     }
 
   GL(dl_tls_static_used) = offset;
-  GL(dl_tls_static_size) = (roundup (offset + TLS_STATIC_SURPLUS, max_align)
+  GL(dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
+				     max_align)
 			    + TLS_TCB_SIZE);
 #elif TLS_DTV_AT_TP
   /* The TLS blocks start right after the TCB.  */
@@ -262,7 +308,7 @@  _dl_determine_tlsoffset (void)
     }
 
   GL(dl_tls_static_used) = offset;
-  GL(dl_tls_static_size) = roundup (offset + TLS_STATIC_SURPLUS,
+  GL(dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),
 				    TLS_TCB_ALIGN);
 #else
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index 969e50327b..678f447e09 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -128,4 +128,8 @@  tunable_is_name (const char *orig, const char *envname)
 }
 
 #endif
+
+/* Initializers of tunables in the dl tunable namespace.  */
+void _dl_static_tls_tunables_init (void);
+
 #endif
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 0d398dd251..ce46f28c7a 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -126,4 +126,17 @@  glibc {
       default: 3
     }
   }
+
+  dl {
+    nns {
+      type: SIZE_T
+      minval: 1
+      maxval: 16
+      default: 4
+    }
+    optional_static_tls {
+      type: SIZE_T
+      default: 512
+    }
+  }
 }
diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
index bb7a66f4cd..6727233e1a 100644
--- a/elf/dynamic-link.h
+++ b/elf/dynamic-link.h
@@ -40,9 +40,10 @@ 
     (__builtin_expect ((sym_map)->l_tls_offset				\
 		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
      && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
-	 || _dl_try_allocate_static_tls (sym_map) == 0))
+	 || _dl_try_allocate_static_tls (sym_map, true) == 0))
 
-int _dl_try_allocate_static_tls (struct link_map *map) attribute_hidden;
+int _dl_try_allocate_static_tls (struct link_map *map, bool optional)
+  attribute_hidden;
 
 #include <elf.h>
 
diff --git a/elf/tst-tls-ie-mod.h b/elf/tst-tls-ie-mod.h
new file mode 100644
index 0000000000..46b362a9b7
--- /dev/null
+++ b/elf/tst-tls-ie-mod.h
@@ -0,0 +1,40 @@ 
+/* Module with specified TLS size and model.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This file is parameterized by macros N, SIZE and MODEL.  */
+
+#include <stdio.h>
+#include <string.h>
+
+#define CONCATX(x, y) x ## y
+#define CONCAT(x, y) CONCATX (x, y)
+#define STRX(x) #x
+#define STR(x) STRX (x)
+
+#define VAR CONCAT (var, N)
+
+__attribute__ ((aligned (8), tls_model (MODEL)))
+__thread char VAR[SIZE];
+
+void
+CONCAT (access, N) (void)
+{
+  printf (STR (VAR) "[%d]:\t %p .. %p " MODEL "\n", SIZE, VAR, VAR + SIZE);
+  fflush (stdout);
+  memset (VAR, 1, SIZE);
+}
diff --git a/elf/tst-tls-ie-mod0.c b/elf/tst-tls-ie-mod0.c
new file mode 100644
index 0000000000..2450686e40
--- /dev/null
+++ b/elf/tst-tls-ie-mod0.c
@@ -0,0 +1,4 @@ 
+#define N 0
+#define SIZE 480
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod1.c b/elf/tst-tls-ie-mod1.c
new file mode 100644
index 0000000000..849ff91e53
--- /dev/null
+++ b/elf/tst-tls-ie-mod1.c
@@ -0,0 +1,4 @@ 
+#define N 1
+#define SIZE 120
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod2.c b/elf/tst-tls-ie-mod2.c
new file mode 100644
index 0000000000..23915ab67b
--- /dev/null
+++ b/elf/tst-tls-ie-mod2.c
@@ -0,0 +1,4 @@ 
+#define N 2
+#define SIZE 24
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod3.c b/elf/tst-tls-ie-mod3.c
new file mode 100644
index 0000000000..5395f844a5
--- /dev/null
+++ b/elf/tst-tls-ie-mod3.c
@@ -0,0 +1,4 @@ 
+#define N 3
+#define SIZE 16
+#define MODEL "global-dynamic"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod4.c b/elf/tst-tls-ie-mod4.c
new file mode 100644
index 0000000000..93ac2eacae
--- /dev/null
+++ b/elf/tst-tls-ie-mod4.c
@@ -0,0 +1,4 @@ 
+#define N 4
+#define SIZE 1024
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie-mod5.c b/elf/tst-tls-ie-mod5.c
new file mode 100644
index 0000000000..84b3fd285b
--- /dev/null
+++ b/elf/tst-tls-ie-mod5.c
@@ -0,0 +1,4 @@ 
+#define N 5
+#define SIZE 128
+#define MODEL "initial-exec"
+#include "tst-tls-ie-mod.h"
diff --git a/elf/tst-tls-ie.c b/elf/tst-tls-ie.c
new file mode 100644
index 0000000000..2f00a2936d
--- /dev/null
+++ b/elf/tst-tls-ie.c
@@ -0,0 +1,100 @@ 
+/* Test dlopen of modules with initial-exec TLS.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This test tries to check that surplus static TLS is not used up for
+   dynamic TLS optimizations and 3*192 + 4*144 = 1152 bytes of static
+   TLS is available for dlopening modules with initial-exec TLS.  It
+   depends on dl.nns=4 and dl.optional_static_tls=512 tunable setting.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static int do_test (void);
+#include <support/xthread.h>
+#include <support/xdlfcn.h>
+#include <support/test-driver.c>
+
+/* Have some big TLS in the main exe: should not use surplus TLS.  */
+__thread char maintls[1000];
+
+static pthread_barrier_t barrier;
+
+/* Forces multi-threaded behaviour.  */
+static void *
+blocked_thread_func (void *closure)
+{
+  xpthread_barrier_wait (&barrier);
+  /* TLS load and access tests run here in the main thread.  */
+  xpthread_barrier_wait (&barrier);
+  return NULL;
+}
+
+static void *
+load_and_access (const char *mod, const char *func)
+{
+  /* Load module with TLS.  */
+  void *p = xdlopen (mod, RTLD_NOW);
+  /* Access the TLS variable to ensure it is allocated.  */
+  void (*f) (void) = (void (*) (void))xdlsym (p, func);
+  f ();
+  return p;
+}
+
+static int
+do_test (void)
+{
+  void *mods[6];
+
+  {
+    int ret = pthread_barrier_init (&barrier, NULL, 2);
+    if (ret != 0)
+      {
+        errno = ret;
+        printf ("error: pthread_barrier_init: %m\n");
+        exit (1);
+      }
+  }
+
+  pthread_t blocked_thread = xpthread_create (NULL, blocked_thread_func, NULL);
+  xpthread_barrier_wait (&barrier);
+
+  printf ("maintls[%zu]:\t %p .. %p\n",
+	   sizeof maintls, maintls, maintls + sizeof maintls);
+  memset (maintls, 1, sizeof maintls);
+
+  /* Load modules with dynamic TLS (may use surplus TLS opportunistically).  */
+  mods[0] = load_and_access ("tst-tls-ie-mod0.so", "access0");
+  mods[1] = load_and_access ("tst-tls-ie-mod1.so", "access1");
+  mods[2] = load_and_access ("tst-tls-ie-mod2.so", "access2");
+  mods[3] = load_and_access ("tst-tls-ie-mod3.so", "access3");
+  /* Load modules with initial-exec TLS (can only use surplus TLS).  */
+  mods[4] = load_and_access ("tst-tls-ie-mod4.so", "access4");
+  mods[5] = load_and_access ("tst-tls-ie-mod5.so", "access5");
+
+  xpthread_barrier_wait (&barrier);
+  xpthread_join (blocked_thread);
+
+  /* Close the modules.  */
+  for (int i = 0; i < 6; ++i)
+    xdlclose (mods[i]);
+
+  return 0;
+}
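The accounting that tst-tls-ie.c exercises can be modelled roughly as below. This is a minimal sketch and not glibc code: struct tls_budget, place_dynamic and place_initial_exec are hypothetical names, alignment padding is ignored, and the surplus is assumed to be a single pool in which only the "optional" part may be consumed by dynamic TLS optimizations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the surplus static TLS area; not glibc code.  */
struct tls_budget
{
  size_t free_bytes;  /* surplus static TLS still unallocated */
  size_t optional;    /* part of it that optimizations may still use */
};

/* Try to place SIZE bytes of dynamic-model TLS in static TLS.
   Returns false when the optional budget is too small; in that case
   the module still loads and uses ordinary dynamic TLS.  */
bool
place_dynamic (struct tls_budget *b, size_t size)
{
  if (size > b->optional || size > b->free_bytes)
    return false;
  b->optional -= size;
  b->free_bytes -= size;
  return true;
}

/* Place SIZE bytes of initial-exec TLS, which can only live in static
   TLS.  Returns false when the surplus is exhausted; this stands for
   a failing dlopen.  The optional counter is left untouched.  */
bool
place_initial_exec (struct tls_budget *b, size_t size)
{
  if (size > b->free_bytes)
    return false;
  b->free_bytes -= size;
  return true;
}
```

Starting from { 1664, 512 } and feeding in the test module sizes in load order, the model places 480 and 24 bytes in the optional area and rejects 120 and 16, so at most 512 bytes go to optimizations. The two initial-exec requests of 1024 and 128 bytes (1152 in total) then both succeed, which is the property the test is meant to check.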
diff --git a/manual/tunables.texi b/manual/tunables.texi
index ec18b10834..437fdadff0 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -31,6 +31,7 @@  their own namespace.
 @menu
 * Tunable names::  The structure of a tunable name
 * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
+* Dynamic Linking Tunables:: Tunables in the dynamic linking subsystem
 * Elision Tunables::  Tunables in elision subsystem
 * POSIX Thread Tunables:: Tunables in the POSIX thread subsystem
 * Hardware Capability Tunables::  Tunables that modify the hardware
@@ -226,6 +227,33 @@  pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
 passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
+@node Dynamic Linking Tunables
+@section Dynamic Linking Tunables
+@cindex dynamic linking tunables
+@cindex dl tunables
+
+@deftp {Tunable namespace} glibc.dl
+Dynamic linker behavior can be modified by setting the
+following tunables in the @code{dl} namespace:
+@end deftp
+
+@deftp Tunable glibc.dl.nns
+Sets the number of supported dynamic link namespaces for which enough
+static TLS is allocated (see @code{dlmopen}).  If more namespaces are
+created then static TLS may run out at @code{dlopen} or @code{dlmopen}
+time, which is a non-recoverable failure.  Currently this limit can be
+set between 1 and 16 inclusive; the default is 4.  If the limit is
+increased then internally more static TLS is allocated to accommodate
+system libraries with initial-exec TLS in all namespaces.
+@end deftp
+
+@deftp Tunable glibc.dl.optional_static_tls
+Sets the amount of surplus static TLS, in bytes, that may be used for
+optimizing dynamic TLS access (only works on certain platforms, e.g.
+TLSDESC can be optimized this way).  The internal allocation of static
+TLS is increased by this amount; the default is 512.
+@end deftp
+
 @node Elision Tunables
 @section Elision Tunables
 @cindex elision tunables
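The effect of the two tunables on the static TLS surplus can be sketched as follows. The formula is an assumption generalized from the commit message's 3*192 + 4*144 + 512 = 1664 example: surplus_static_tls is a hypothetical helper, and counting only nns - 1 copies of the libc.so budget presumes that libc.so in the main namespace is already covered by the initial static TLS block.

```c
#include <stddef.h>

/* Per-namespace initial-exec TLS budgets from the commit message.  */
#define LIBC_IE_TLS 192   /* libc.so, in bytes */
#define OTHER_IE_TLS 144  /* other system libraries together, in bytes */

/* Hypothetical helper: surplus static TLS in bytes for NNS namespaces
   plus OPT_TLS bytes of optional static TLS.  Returns 0 when NNS is
   outside the documented range of 1 to 16.  */
size_t
surplus_static_tls (size_t nns, size_t opt_tls)
{
  if (nns < 1 || nns > 16)
    return 0;
  return (nns - 1) * LIBC_IE_TLS + nns * OTHER_IE_TLS + opt_tls;
}
```

With the defaults, surplus_static_tls (4, 512) evaluates to 1664, the figure given in the commit message. Under the same assumption, raising glibc.dl.nns adds 192 + 144 = 336 bytes per extra namespace, and raising glibc.dl.optional_static_tls adds its increase byte for byte.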
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 5ff4a2831b..cc7c682a81 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -442,6 +442,9 @@  struct rtld_global
   EXTERN size_t _dl_tls_static_used;
   /* Alignment requirement of the static TLS block.  */
   EXTERN size_t _dl_tls_static_align;
+  /* Remaining amount of static TLS that may be used for optimizing
+     dynamic TLS access (e.g. with TLSDESC).  */
+  EXTERN size_t _dl_tls_static_optional;
 
 /* Number of additional entries in the slotinfo array of each slotinfo
    list element.  A large number makes it almost certain take we never
@@ -583,6 +586,11 @@  struct rtld_global_ro
      binaries, don't honor for PIEs).  */
   EXTERN ElfW(Addr) _dl_use_load_bias;
 
+  /* Size of surplus space in the static TLS area for dynamically
+     loaded modules with IE-model TLS or for TLSDESC optimization.
+     See comments in elf/dl-tls.c where it is initialized.  */
+  EXTERN size_t _dl_tls_static_surplus;
+
   /* Name of the shared object to be profiled (if any).  */
   EXTERN const char *_dl_profile;
   /* Filename of the output file.  */