[PR19826] fix non-LE TLS in static programs

Message ID or60pqngb7.fsf@livre.home
State New, archived
Headers

Commit Message

Alexandre Oliva Sept. 21, 2016, 2:21 a.m. UTC
  An earlier fix for TLS dropped early initialization of DTV entries for
modules using static TLS, leaving it for __tls_get_addr to set them
up.  That worked on platforms that require the GD access model to be
relaxed to LE in the main executable, but it caused a regression on
platforms that allow GD in the main executable, particularly in
statically-linked programs: they use a custom __tls_get_addr that does
not update the DTV, which fails when the DTV early initialization is
not performed.

In static programs, __libc_setup_tls performs the DTV initialization
for the main thread, but the DTV of other threads is set up in
_dl_allocate_tls_init, so that's the fix that matters.

Restoring the initialization in the remaining functions modified by
this patch was just for uniformity.  It's not clear that it is ever
needed: even on platforms that allow GD in the main executable, the
dynamically-linked version of __tls_get_addr would set up the DTV
entries, even for static TLS modules, while updating the DTV counter.


Thanks to Andreas Schwab for the hints and the testing on ARM, and
Joseph Myers for the hint.  I've regression-tested this myself on
x86_64-linux-gnu, but the problem couldn't be triggered on that platform
to begin with.

Ok to install?


for  ChangeLog

	[BZ #19826]
	* elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
	initialization of static TLS entries.
	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
	* nptl/allocatestack.c (init_one_static_tls): Likewise.
---
 elf/dl-reloc.c       |    6 ++++++
 elf/dl-tls.c         |    4 ++++
 nptl/allocatestack.c |    9 ++++++---
 3 files changed, 16 insertions(+), 3 deletions(-)
  

Comments

Andreas Schwab Sept. 21, 2016, 2:34 p.m. UTC | #1
On Sep 21 2016, Alexandre Oliva <aoliva@redhat.com> wrote:

> 	[BZ #19826]
> 	* elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
> 	initialization of static TLS entries.
> 	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
> 	* nptl/allocatestack.c (init_one_static_tls): Likewise.

Ok.

Andreas.
  
Carlos O'Donell Sept. 29, 2016, 7:04 p.m. UTC | #2
On 09/20/2016 10:21 PM, Alexandre Oliva wrote:
> Ok to install?
> 
> for  ChangeLog
> 
> 	[BZ #19826]
> 	* elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
> 	initialization of static TLS entries.
> 	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
> 	* nptl/allocatestack.c (init_one_static_tls): Likewise.

Alex is going to hand this patch off to me to champion and finish
iterating on with the rest of the interested developers (Andreas, Torvald,
Florian).

I'll work up a version 2 right now and post it shortly.
  
Florian Weimer Dec. 2, 2016, 3:25 p.m. UTC | #3
On 09/29/2016 09:04 PM, Carlos O'Donell wrote:
> On 09/20/2016 10:21 PM, Alexandre Oliva wrote:
>> Ok to install?
>>
>> for  ChangeLog
>>
>> 	[BZ #19826]
>> 	* elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
>> 	initialization of static TLS entries.
>> 	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
>> 	* nptl/allocatestack.c (init_one_static_tls): Likewise.
>
> Alex is going to hand this patch off to me to champion and finish
> iterating on with the rest of the interested developers (Andreas, Torvald,
> Florian).
>
> I'll work up a version 2 right now and post it shortly.

What's the status here?  It seems that Alexandre committed it on 
September 22nd, despite Torvald's objections.

This patch broke non-optimized global-dynamic TLS on aarch64 (for shared 
builds), as shown by my new tst-tls-manydynamic test case.

Florian
  
Carlos O'Donell Dec. 2, 2016, 4:25 p.m. UTC | #4
On 12/02/2016 10:25 AM, Florian Weimer wrote:
> On 09/29/2016 09:04 PM, Carlos O'Donell wrote:
>> On 09/20/2016 10:21 PM, Alexandre Oliva wrote:
>>> Ok to install?
>>>
>>> for  ChangeLog
>>>
>>>     [BZ #19826]
>>>     * elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
>>>     initialization of static TLS entries.
>>>     * elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
>>>     * nptl/allocatestack.c (init_one_static_tls): Likewise.
>>
>> Alex is going to hand this patch off to me to champion and finish
>> iterating on with the rest of the interested developers (Andreas, Torvald,
>> Florian).
>>
>> I'll work up a version 2 right now and post it shortly.
> 
> What's the status here? It seems that Alexandre committed it on
> September 22nd, despite Torvald's objections.

Alex should not have committed this patch without consensus.

It was my intent to rework the patches to the point where they
meet the community standard for P&C work. In fact I'm reviewing
Szabolcs Nagy's high-level design on how to fix the P&C issues
in TLS.

In this case Andreas Schwab gave an ACK for the patch but Torvald
had requested further clarification. Alex should have waited to
resolve the issues with Torvald or asked Torvald if his objections
constituted a sustained objection (blocks consensus and the patch).

> This patch broke non-optimized global-dynamic TLS on aarch64 (for
> shared builds), as shown by my new tst-tls-manydynamic test case.

Does reverting Alex's patch fix tst-tls-manydynamic?
  
Florian Weimer Dec. 2, 2016, 4:32 p.m. UTC | #5
On 12/02/2016 05:25 PM, Carlos O'Donell wrote:

>> This patch broke non-optimized global-dynamic TLS on aarch64 (for
>> shared builds), as shown by my new tst-tls-manydynamic test case.
>
> Does reverting Alex's patch fix tst-tls-manydynamic?

It's sufficient to revert the patch to nptl/allocatestack.c.  I don't 
know if it is really needed, I and I have no idea how to make 
initialization at this point safe because the threads themselves could 
be using the DTV concurrently, so we cannot enlarge it.

I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=20915> so that 
we keep track of this regression and fix it before the release.

Thanks,
Florian
  
Carlos O'Donell Dec. 2, 2016, 4:45 p.m. UTC | #6
On 12/02/2016 11:32 AM, Florian Weimer wrote:
> On 12/02/2016 05:25 PM, Carlos O'Donell wrote:
> 
>>> This patch broke non-optimized global-dynamic TLS on aarch64
>>> (for shared builds), as shown by my new tst-tls-manydynamic test
>>> case.
>> 
>> Does reverting Alex's patch fix tst-tls-manydynamic?
> 
> It's sufficient to revert the patch to nptl/allocatestack.c.  I don't
> know if it is really needed, I and I have no idea how to make
> initialization at this point safe because the threads themselves
> could be using the DTV concurrently, so we cannot enlarge it.
> 
> I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=20915> so
> that we keep track of this regression and fix it before the release.

You don't quite answer my question.

If bug 20915 is a release blocker because of the new regression, and
I think it should be, then we should immediately add it to the release
blockers list for 2.25:

https://sourceware.org/glibc/wiki/Release/2.25
  
Florian Weimer Dec. 2, 2016, 4:55 p.m. UTC | #7
On 12/02/2016 05:45 PM, Carlos O'Donell wrote:
> On 12/02/2016 11:32 AM, Florian Weimer wrote:
>> On 12/02/2016 05:25 PM, Carlos O'Donell wrote:
>>
>>>> This patch broke non-optimized global-dynamic TLS on aarch64
>>>> (for shared builds), as shown by my new tst-tls-manydynamic test
>>>> case.
>>>
>>> Does reverting Alex's patch fix tst-tls-manydynamic?
>>
>> It's sufficient to revert the patch to nptl/allocatestack.c.  I don't
>> know if it is really needed, I and I have no idea how to make
>> initialization at this point safe because the threads themselves
>> could be using the DTV concurrently, so we cannot enlarge it.
>>
>> I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=20915> so
>> that we keep track of this regression and fix it before the release.
>
> You don't quite answer my question.

Well, obviously, because as I wrote, the patch broke the 
tst-tls-manydynamic test case.

> If bug 20915 is a release blocker because of the new regression, and
> I think it should be, then we should immediately add it to the release
> blockers list for 2.25:
>
> https://sourceware.org/glibc/wiki/Release/2.25

Thanks, I added it.

Florian
  
Alexandre Oliva Dec. 6, 2016, 6:56 a.m. UTC | #8
On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:

> On 09/29/2016 09:04 PM, Carlos O'Donell wrote:
>> On 09/20/2016 10:21 PM, Alexandre Oliva wrote:
>>> Ok to install?
>>> 
>>> for  ChangeLog
>>> 
>>> [BZ #19826]
>>> * elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
>>> initialization of static TLS entries.
>>> * elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
>>> * nptl/allocatestack.c (init_one_static_tls): Likewise.
>> 
>> Alex is going to hand this patch off to me to champion and finish
>> iterating on with the rest of the interested developers (Andreas, Torvald,
>> Florian).
>> 
>> I'll work up a version 2 right now and post it shortly.

> What's the status here?  It seems that Alexandre committed it on
> September 22nd, despite Torvald's objections.

No, I committed the *first* of two patches, that I'd posted on 2016-09-20.
https://sourceware.org/ml/libc-alpha/2016-09/msg00365.html

That patch was approved by Andreas on 2016-09-21.
https://sourceware.org/ml/libc-alpha/2016-09/msg00378.html

Not having got any objections, I pushed it on 2016-09-22.
https://sourceware.org/bugzilla/show_bug.cgi?id=19826#c14

On 2016-09-24, *I* was the first and only person to object to this first
patch, when I realized it reintroduced the bug that an earlier (zeroth)
patch of mine had fixed back in March, 2015.  Along with the objection,
I posted a second patch in this thread, containing a comment that
explained why that part of the first patch I'd installed was incorrect,
in more detail than what I'd incorrently taken out in the first patch,
so that we wouldn't make the same mistake again.
https://sourceware.org/ml/libc-alpha/2016-09/msg00512.html

Here's the zeroth patch.
https://sourceware.org/ml/libc-alpha/2015-03/msg00563.html

Also on 2016-09-24, Torvald objected to the *second* patch, because he
wasn't happy with the wording of the comments, and that I wasn't adding
documentation for all concurrency issues in the TLS implementation along
with it, and later than I wasn't revamping it into using atomics, and
instead I just wanted to revert this small change that I had mistakenly
made but later realized why I shouldn't have.
https://sourceware.org/ml/libc-alpha/2016-09/msg00513.html

When Carlos announced he'd pick that up, after I asked him for help
because the thread was turning into yet another infinite distraction, he
responded to the first patch, rather than the second.  I suppose that's
the source of your confusion.  Maybe he was already confused about which
patch was pending back then, or he just went for the top of the thread
or something.
https://sourceware.org/ml/libc-alpha/2016-09/msg00593.html

> This patch broke non-optimized global-dynamic TLS on aarch64 (for
> shared builds), as shown by my new tst-tls-manydynamic test case.

Would you please try the second patch and confirm that it fixes the
problem?  What you're hitting is most likely the DTV resizing race
condition described in the comments there, at the '(**)' note.

It's quite unfortunate that this known race has been left unfixed for so
long because of objections to the wording of comments.
  
Alexandre Oliva Dec. 6, 2016, 7:06 a.m. UTC | #9
On Dec  2, 2016, "Carlos O'Donell" <carlos@redhat.com> wrote:

>> This patch broke non-optimized global-dynamic TLS on aarch64 (for
>> shared builds), as shown by my new tst-tls-manydynamic test case.

> Does reverting Alex's patch fix tst-tls-manydynamic?

That would break TLS on static programs on platforms that don't mandate
GD-to-LE relaxations again.

We just have to revert the two pieces of the first patch that are
broken, which is what the second patch I posted does.  It was a follow
up to Andreas' approval for the first.

We wouldn't really have revert both of them, as Florian afonud out, but
since they play similar roles in threaded and non-threaded programs, it
would make more sense to keep their behavior in sync.
  
Alexandre Oliva Dec. 6, 2016, 7:15 a.m. UTC | #10
On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:

> I don't know if it is really needed,

No, it's wrong, it had been fixed before.

> I and I have no idea how to make initialization at this point safe

It's safe to leave that initialization out.  The checks for whether the
DTV needs updating are "safe enough".  As in, a thread will take care of
its own DTV, so there won't be races with other threads.  However, it's
not AS-Safe: if DTV resizing is interrupted by a signal whose handler
triggers another DTV resizing, we'll have trouble, as we always had.
That's one of the reasons why GD TLS is not AS-Safe.  Another is that
dynamic TLS blocks may have to be allocated, and we use(d?)  AS-Unsafe
means for the allocation.  There may be other reasons that don't come to
mind right now.

> I filed <https://sourceware.org/bugzilla/show_bug.cgi?id=20915> so
> that we keep track of this regression and fix it before the release.

Thanks.  Carlos, please add a '[BZ #20915]' to the ChangeLog of the
revised version of the second patch when you get to it.
  
Florian Weimer Dec. 6, 2016, 7:45 a.m. UTC | #11
On 12/06/2016 08:15 AM, Alexandre Oliva wrote:
> On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:
>
>> I don't know if it is really needed,
>
> No, it's wrong, it had been fixed before.
>
>> I and I have no idea how to make initialization at this point safe
>
> It's safe to leave that initialization out.  The checks for whether the
> DTV needs updating are "safe enough".  As in, a thread will take care of
> its own DTV, so there won't be races with other threads.

But it's also called for the thread which invoked dlopen.  Does 
_dl_update_slotinfo (called from _dl_try_allocate_static_tls) take care 
of that?

> However, it's
> not AS-Safe: if DTV resizing is interrupted by a signal whose handler
> triggers another DTV resizing, we'll have trouble, as we always had.

Can we block signals during DTV resizing?

> That's one of the reasons why GD TLS is not AS-Safe.  Another is that
> dynamic TLS blocks may have to be allocated, and we use(d?)  AS-Unsafe
> means for the allocation.  There may be other reasons that don't come to
> mind right now.

We still have malloc calls in there.  It's not just a problem for 
asynchronous signals, we do not have a way to report allocation 
failures, either.

Thanks,
Florian
  
Florian Weimer Dec. 6, 2016, 8:03 a.m. UTC | #12
On 12/06/2016 07:56 AM, Alexandre Oliva wrote:
> On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:
>> What's the status here?  It seems that Alexandre committed it on
>> September 22nd, despite Torvald's objections.
>
> No, I committed the *first* of two patches, that I'd posted on 2016-09-20.
> https://sourceware.org/ml/libc-alpha/2016-09/msg00365.html

I'm sorry I reconstructed the history here incorrectly.

>> This patch broke non-optimized global-dynamic TLS on aarch64 (for
>> shared builds), as shown by my new tst-tls-manydynamic test case.
>
> Would you please try the second patch and confirm that it fixes the
> problem?

If I revert the change to allocatestack.c, the test case passes.

> What you're hitting is most likely the DTV resizing race
> condition described in the comments there, at the '(**)' note.

It's not even a race.  The test case serializes DTV access externally, 
and still fails.  So the comment about a race is rather misleading.

Thanks,
Florian
  
Alexandre Oliva Dec. 6, 2016, 9:13 a.m. UTC | #13
On Dec  6, 2016, Florian Weimer <fweimer@redhat.com> wrote:

> On 12/06/2016 07:56 AM, Alexandre Oliva wrote:
>> On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:
>>> What's the status here?  It seems that Alexandre committed it on
>>> September 22nd, despite Torvald's objections.
>> 
>> No, I committed the *first* of two patches, that I'd posted on 2016-09-20.
>> https://sourceware.org/ml/libc-alpha/2016-09/msg00365.html

> I'm sorry I reconstructed the history here incorrectly.

No problem, I recall the whole thing was already confusing back when I
tried to explain it to Carlos for him to pick it up ;-)

> If I revert the change to allocatestack.c, the test case passes.

Thanks

>> What you're hitting is most likely the DTV resizing race
>> condition described in the comments there, at the '(**)' note.

> It's not even a race.  The test case serializes DTV access externally,
> and still fails.  So the comment about a race is rather misleading.

Ah, sorry, I haven't looked at the testcase yet, but from your
description, I suppose you eliminated the race.  We still access the DTV
past the end, though.  Earlier (before the zeroth patch installed on
March, 2015), we had a limit on the Static TLS modid, so that the DTV
entry initialization wouldn't write past the end of the DTV.  That patch
lifted that limit, and removed the DTV entry initialization from that
point.

The first patch (2016-09-20's) mistakenly brought that code back.  We
have to revert that.  We should probably mention the writing past the
end of the DTV in the comments too.
  
Alexandre Oliva Dec. 6, 2016, 9:30 a.m. UTC | #14
On Dec  6, 2016, Florian Weimer <fweimer@redhat.com> wrote:

> On 12/06/2016 08:15 AM, Alexandre Oliva wrote:
>> On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:
>> 
>>> I don't know if it is really needed,
>> 
>> No, it's wrong, it had been fixed before.
>> 
>>> I and I have no idea how to make initialization at this point safe
>> 
>> It's safe to leave that initialization out.  The checks for whether the
>> DTV needs updating are "safe enough".  As in, a thread will take care of
>> its own DTV, so there won't be races with other threads.

> But it's also called for the thread which invoked dlopen.  Does
> _dl_update_slotinfo (called from _dl_try_allocate_static_tls) take
> care of that?

Yeah, exactly.  The DTV update performed by the thread itself will take
care of it, we don't need to set up the DTV entries (other than those
for the LE module, that the code in the first patch that's not reverted
in the second takes care of) as we bump the DTV generation: that will
cause tls_get_addr to perform the update.

>> However, it's
>> not AS-Safe: if DTV resizing is interrupted by a signal whose handler
>> triggers another DTV resizing, we'll have trouble, as we always had.

> Can we block signals during DTV resizing?

That would be a reasonable way to address this problem.

>> That's one of the reasons why GD TLS is not AS-Safe.  Another is that
>> dynamic TLS blocks may have to be allocated, and we use(d?)  AS-Unsafe
>> means for the allocation.  There may be other reasons that don't come to
>> mind right now.

> We still have malloc calls in there.

*nod*.  I think we could make malloc AS-Safe by having an alternate
arena to fallback to, if the preferred arena is already locked, and if
we realize we have to fallback to the alternate arena, we block signals
before locking it and unblock them once the allocation is done.  This
would probably be better than using an alternate allocator for the DTV
and for Static TLS and whatnot.

> It's not just a problem for asynchronous signals, we do not have a way
> to report allocation failures, either.

Yeah, that's an essential consequence of deferred allocation of dynamic
TLS blocks and of per-thread DTV resizing.  We could avoid part of it by
always allocating TLS blocks statically, with some additional
synchronization at dlopen, but dealing with the DTV could be a lot more
interesting.  Ensuring one thread won't access its DTV while it's being
resized could be tricky, and ensuring the accesses by the owner thread
are well-defined after another thread resized and updated the DTV,
rather than races under the chosen memory model, could become a
performance nightmare.
  
Florian Weimer Dec. 6, 2016, 9:41 a.m. UTC | #15
On 12/06/2016 10:30 AM, Alexandre Oliva wrote:

> *nod*.  I think we could make malloc AS-Safe by having an alternate
> arena to fallback to, if the preferred arena is already locked, and if
> we realize we have to fallback to the alternate arena, we block signals
> before locking it and unblock them once the allocation is done.  This
> would probably be better than using an alternate allocator for the DTV
> and for Static TLS and whatnot.

We need a separate allocator for thread-local data to get predictable 
performance, especially in NUMA environments.  Otherwise, application 
could get unlucky and put TLS data structures on the same page as 
application data used by a completely different thread, and performance 
will be horrible.

>> It's not just a problem for asynchronous signals, we do not have a way
>> to report allocation failures, either.
>
> Yeah, that's an essential consequence of deferred allocation of dynamic
> TLS blocks and of per-thread DTV resizing.  We could avoid part of it by
> always allocating TLS blocks statically, with some additional
> synchronization at dlopen, but dealing with the DTV could be a lot more
> interesting.

But we know its size at dlopen time, right?  So we could perform the 
allocation then and just defer installing it.

Thanks,
Florian
  
Torvald Riegel Dec. 6, 2016, 9:50 a.m. UTC | #16
On Tue, 2016-12-06 at 04:56 -0200, Alexandre Oliva wrote:
> It's quite unfortunate that this known race has been left unfixed for so
> long because of objections to the wording of comments.

Don't you think it would be easier to understand what's going on for all
of us if the comments were better?  There would likely be less need to
have a long discussions -- such as the one we're having right now --
about what is what and why things are the way they are.
  
Carlos O'Donell Dec. 6, 2016, 1:01 p.m. UTC | #17
On 12/06/2016 03:03 AM, Florian Weimer wrote:
> On 12/06/2016 07:56 AM, Alexandre Oliva wrote:
>> On Dec  2, 2016, Florian Weimer <fweimer@redhat.com> wrote:
>>> What's the status here?  It seems that Alexandre committed it on
>>> September 22nd, despite Torvald's objections.
>>
>> No, I committed the *first* of two patches, that I'd posted on 2016-09-20.
>> https://sourceware.org/ml/libc-alpha/2016-09/msg00365.html
> 
> I'm sorry I reconstructed the history here incorrectly.

So did I. 

My deepest apologies Alex, you did indeed have consensus from Andreas
over the first of two patches.
  
Alexandre Oliva Dec. 6, 2016, 7:41 p.m. UTC | #18
On Dec  6, 2016, Florian Weimer <fweimer@redhat.com> wrote:

> On 12/06/2016 10:30 AM, Alexandre Oliva wrote:
>> *nod*.  I think we could make malloc AS-Safe by having an alternate
>> arena to fallback to, if the preferred arena is already locked, and if
>> we realize we have to fallback to the alternate arena, we block signals
>> before locking it and unblock them once the allocation is done.  This
>> would probably be better than using an alternate allocator for the DTV
>> and for Static TLS and whatnot.

> We need a separate allocator for thread-local data to get predictable
> performance, especially in NUMA environments.

Sure, but that's not an argument against making malloc AS-Safe, is it?

There may be an argument for making malloc behave in a different way
within signal handlers, assuming allocations within them might end up
being frequently used by them in other threads as well, except when
allocating for TLS, because then we know the allocation is thread local.

>>> It's not just a problem for asynchronous signals, we do not have a way
>>> to report allocation failures, either.

>> Yeah, that's an essential consequence of deferred allocation of dynamic
>> TLS blocks and of per-thread DTV resizing.  We could avoid part of it by
>> always allocating TLS blocks statically, with some additional
>> synchronization at dlopen, but dealing with the DTV could be a lot more
>> interesting.

> But we know its size at dlopen time, right?  So we could perform the
> allocation then and just defer installing it.

Yeah, I suppose that may work.  It could even make TLS allocations more
efficient and thread-aware, say, by allocating pages and using the same
layout for the DTV and for corresponding TLS blocks for all threads.
I'm not sure how to make such allocations are placed efficiently in NUMA
settings, but that can be worked out.  Synchronization with threads
created or terminated while the creation is ongoing might be an
interesting nut to crack, especially in order to avoid priority
inversion, but my intuition suggests none of these issues are
insurmountable.

I'm just a bit worried about the overhead of allocating and deallocating
TLS blocks for all threads, in processes with thousands of threads, when
a temporarily loaded library brings in a TLS block that is to be
discarded shortly thereafter, when the library is unloaded.  I don't
know how common that is, or whether we want to discourage that, but it
would surely be a major change to the execution profile.  Not much more
different than the change I made years ago that turned formerly dynamic
TLS blocks into late-loaded static TLS ones, with similar consequences,
and I didn't see anyone complaining about that...  So maybe it's not
something to worry too much over, after all.  I suppose dynamic TLS not
promoted to static TLS has become such an uncommon case on platforms
that use TLS descriptors that it's nearly a corner case that those who
end up exercising it get what they deserve ;-) (though we could offer
means to adjust the initial static TLS block to enable undeserving users
to avoid this undesirable scenario :-)

Anyway...  This is all way outside the work I'm scoped to do, or my
responsibility to fixing the bugs I introduced, so I'll leave it to you
all and get my focus back on the things I'm supposed to be doing.
  
Alexandre Oliva Dec. 6, 2016, 7:43 p.m. UTC | #19
On Dec  6, 2016, "Carlos O'Donell" <carlos@redhat.com> wrote:

> My deepest apologies Alex, you did indeed have consensus from Andreas
> over the first of two patches.

Apologies accepted, no hard feelings, and thanks for the confirmation.
  

Patch

diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index 42bddc1..dcab666 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -137,6 +137,12 @@  _dl_nothread_init_static_tls (struct link_map *map)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
 
+  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
+  dtv_t *dtv = THREAD_DTV ();
+  assert (map->l_tls_modid <= dtv[-1].counter);
+  dtv[map->l_tls_modid].pointer.to_free = NULL;
+  dtv[map->l_tls_modid].pointer.val = dest;
+
   /* Initialize the memory.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 17567ad..60f4c1d 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -538,6 +538,10 @@  _dl_allocate_tls_init (void *result)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
 
+	  /* Set up the DTV entry.  The simplified __tls_get_addr that
+	     some platforms use in static programs requires it.  */
+	  dtv[map->l_tls_modid].pointer.val = dest;
+
 	  /* Copy the initialization image and clear the BSS part.  */
 	  memset (__mempcpy (dest, map->l_tls_initimage,
 			     map->l_tls_initimage_size), '\0',
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 60b34dc..3016a2e 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -1207,9 +1207,12 @@  init_one_static_tls (struct pthread *curp, struct link_map *map)
 #  error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 # endif
 
-  /* We cannot delay the initialization of the Static TLS area, since
-     it can be accessed with LE or IE, but since the DTV is only used
-     by GD and LD, we can delay its update to avoid a race.  */
+  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
+  dtv_t *dtv = GET_DTV (TLS_TPADJ (curp));
+  dtv[map->l_tls_modid].pointer.to_free = NULL;
+  dtv[map->l_tls_modid].pointer.val = dest;
+
+  /* Initialize the memory.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
 }