[BZ#17090/17620/17621] : fix DTV race, assert, and DTV_SURPLUS Static TLS limit

Message ID orzimyknzx.fsf@livre.home
State New, archived

Commit Message

Alexandre Oliva Sept. 24, 2016, 2:52 a.m. UTC
  On Sep 23, 2016, Florian Weimer <fw@deneb.enyo.de> wrote:

> * Alexandre Oliva:
>> On Sep 15, 2016, Andreas Schwab <schwab@suse.de> wrote:
>>> This breaks TLS in static programs on armv7, see BZ #19826.

>> Do we want the fix for BZ#19826 backported to older branches?

> That would make sense, yes.

Here's a backport of the patch.  It applies cleanly on 2.24, 2.23, and
2.22 (the first release affected by the regression), and I tested it by
applying it on top of the tag of each of these releases.

Would it make sense to install it in all 3 release branches?


Attentive readers will notice that the patch removes a comment that
alludes to a race, and brings back the code that had been replaced by
the comment.  When I posted the patch for the trunk, I restored the DTV
initializers to all 3 functions, although only one of them actually
needed them.  I'm afraid my sense of aesthetics overruled my sense of
thread-safety, because I couldn't figure out what the race was,
investigated the wrong potential race and got convinced it wouldn't hit.
But now, just before posting this message, I see I was mistaken: the
patch reintroduced a race on the trunk, which I'll proceed to fix.


The race amounts to the following scenario: one thread loads a module
that has a TLS block that happens to be assigned to static TLS, getting
nptl/allocatestack.c's __pthread_init_static_tls called through
GL(dl_init_static_tls) to initialize the TLS block for the module in all
threads' static TLS blocks.

__pthread_init_static_tls calls init_one_static_tls for each active
thread, initializing the DTV entry and copying the TLS initialization
block to the static TLS block of each thread.

IIRC the race arises when one thread decides to resize its DTV while the
initializer above is modifying it.  (if it decides to update it without
resizing, we also have a race, but both threads end up writing the same
final value, which AFAIK is not so much of a problem)

The end result is that the initializer may write to stale/freed memory,
that may even have already been reassigned to some other purpose.
That's very bad indeed.


So, although I post the entire backported patch, because that's what I
tested, and the comment I refer to above is in a part of the patch that,
if applied, would restore the described race, I immediately withdraw it,
and propose instead to revert the changes to elf/dl-reloc.c and
nptl/allocatestack.c that I'd installed earlier this week to the master
branch.  They're not necessary for use with the regular __tls_get_addr,
and they won't affect the alternate __tls_get_addr used in static
programs, because the Static TLS block for the main program (the only
module it cares about) and its DTV entry are always set up by
__libc_setup_tls (main thread) or _dl_allocate_tls_init (other threads):
it will never be the case that we will dlopen the main program after the
program starts and have to initialize its static TLS blocks (and DTV
entry) for all running threads, because we never dlopen the main
program.

For the master branch, I'll post a separate patch momentarily, reverting
the incorrect portion of the recent change, and expanding the comments
about the race condition, so that neither myself nor anyone else ever
mistakes another (harmless?) race for that one: that one thread
initializes other threads' TLS blocks while holding the rtld lock, but
later on the other threads access the initialized TLS blocks without
taking any locks or any form of memory synchronization that would
clearly establish, memory-wise, the implied happens-before.


For the earlier branches, I propose we install only the elf/dl-tls.c
change from the patch below; the elf/dl-reloc.c change could stay, but
I'm now more inclined to take it out, for uniformity.

I'll retest and repost the patches for master and branches early next
week, if not over the weekend.


Andreas, could you please confirm that you tested the dl-tls.c change by
itself on ARM, and that it was enough to fix the problem on its own?
Thanks,


WARNING: the patch below is broken, do not use!

[PR19826] fix non-LE TLS in static programs

An earlier fix for TLS dropped early initialization of DTV entries for
modules using static TLS, leaving it for __tls_get_addr to set them
up.  That worked on platforms that require the GD access model to be
relaxed to LE in the main executable, but it caused a regression on
platforms that allow GD in the main executable, particularly in
statically-linked programs: they use a custom __tls_get_addr that does
not update the DTV, which fails when the DTV early initialization is
not performed.

In static programs, __libc_setup_tls performs the DTV initialization
for the main thread, but the DTV of other threads is set up in
_dl_allocate_tls_init, so that's the fix that matters.

Restoring the initialization in the remaining functions modified by
this patch was just for uniformity.  It's not clear that it is ever
needed: even on platforms that allow GD in the main executable, the
dynamically-linked version of __tls_get_addr would set up the DTV
entries, even for static TLS modules, while updating the DTV counter.

for ChangeLog

	[BZ #19826]
	* elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
	initialization of static TLS entries.
	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
	* nptl/allocatestack.c (init_one_static_tls): Likewise.
---
 elf/dl-reloc.c       |    6 ++++++
 elf/dl-tls.c         |    5 +++++
 nptl/allocatestack.c |    9 ++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)
  

Comments

Torvald Riegel Sept. 24, 2016, 10:52 a.m. UTC | #1
On Fri, 2016-09-23 at 23:52 -0300, Alexandre Oliva wrote:
> (if it decides to update it without
> resizing, we also have a race, but both threads end up writing the same
> final value, which AFAIK is not so much of a problem)

Can this still happen after your patches?  If so, then this needs to be
fixed.  Data races are a problem.  You must use atomics to actually
make writes of the same value, concurrently by more than one thread,
well-defined.

Also, for code comments at least, we need to be less ambiguous regarding
our use of the word "race".  I think we should avoid it unless we
specifically mean a data race (as defined by C11), or something that's
bad and needs to be fixed.  If we want to say that something is not
ordered by happens-before, we should say exactly that.  This allows us
to see the term "race" as a red flag.
If we need to use atomics to avoid data races, they will not be ordered
by happens-before.  But that's okay, because that's in the nature of
cases where we have to synchronize.

The final state that we should head towards is code that is either
data-race-free or uses proper, well-documented synchronization (e.g.,
atomics).

> The end result is that the initializer may write to stale/freed memory,
> that may even have already been reassigned to some other purpose.
> That's very bad indeed.
> 
> 
> So, although I post the entire backported patch, because that's what I
> tested, and the comment I refer to above is in a part of the patch that,
> if applied, would restore the described race, I immediately withdraw it,
> and propose instead to revert the changes to elf/dl-reloc.c and
> nptl/allocatestack.c that I'd installed earlier this week to the master
> branch.  They're not necessary for use with the regular __tls_get_addr,
> and they won't affect the alternate __tls_get_addr used in static
> programs, because the Static TLS block for the main program (the only
> module it cares about) and its DTV entry are always set up by
> __libc_setup_tls (main thread) or _dl_allocate_tls_init (other threads):
> it will never be the case that we will dlopen the main program after the
> program starts and have to initialize its static TLS blocks (and DTV
> entry) for all running threads, because we never dlopen the main
> program.

ISTM that all this should become a code comment, and not just be in an
email on the list.

> For the master branch, I'll post a separate patch momentarily, reverting
> the incorrect portion of the recent change, and expanding the comments
> about the race condition, so that neither myself nor anyone else ever
> mistakes another (harmless?) race for that one: that one thread
> initializes other threads' TLS blocks while holding the rtld lock, but
> later on the other threads access the initialized TLS blocks without
> taking any locks or any form of memory synchronization that would
> clearly establish, memory-wise, the implied happens-before.

IIUC, that's where the assumption about dlopen vs. usage comes in that
you included in the other patch.  If so, this is not a data race (nor a
race condition).  Or are you referring to something else?

BTW, there are no harmless data races ;)

> I'll retest and repost the patches for master and branches early next
> week, if not over the weekend.

Please also remember that with the tools we have, it's pretty unlikely
that we can test the absence of concurrency bugs.  Therefore, it is very
important that our reasoning about concurrency is thorough.  And we need
to do that collectively, which means:
* Shared terminology
* Proper, detailed documentation
Otherwise, we can't communicate properly about it, which means we can't
review it, and that we're not decreasing bus factors.
  
Alexandre Oliva Sept. 25, 2016, 10:09 p.m. UTC | #2
On Sep 24, 2016, Torvald Riegel <triegel@redhat.com> wrote:

> On Fri, 2016-09-23 at 23:52 -0300, Alexandre Oliva wrote:
>> (if it decides to update it without
>> resizing, we also have a race, but both threads end up writing the same
>> final value, which AFAIK is not so much of a problem)

> Can this still happen after your patches?

After removing (again) one of the possibly-concurrent writes, I don't
see how it could still happen, since there's only one writer to a
thread's DTV at a time: at thread creation there's one, and while the
thread is active there's another.  But that's not the result of a
thorough global analysis from scratch, it's just my intuition based on
my limited understanding of the whole TLS architecture.  If you want a
thorough analysis, I'm afraid you'll have to resort to someone else,
probably someone who speaks the same language as you, and that reasons
at the same abstraction layer as you.  I'm not that person, as our many
attempts at conversations have shown.

> ISTM that all this should become a code comment, and not just be in an
> email on the list.

It's there in the patch.

> IIUC, that's where the assumption about dlopen vs. usage comes in that
> you included in the other patch.

Yeah, that too.

> If so, this is not a data race (nor a race condition).

It is a potential data race, yes, but I'll sustain it is probably
harmless, although I know you'll take that as heresy ;-)  Please look
for the details in the other thread.
  
Torvald Riegel Sept. 29, 2016, 11:27 a.m. UTC | #3
On Sun, 2016-09-25 at 19:09 -0300, Alexandre Oliva wrote:
> On Sep 24, 2016, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > On Fri, 2016-09-23 at 23:52 -0300, Alexandre Oliva wrote:
> >> (if it decides to update it without
> >> resizing, we also have a race, but both threads end up writing the same
> >> final value, which AFAIK is not so much of a problem)
> 
> > Can this still happen after your patches?
> 
> After removing (again) one of the possibly-concurrent writes, I don't
> see how it could still happen, since there's only one writer to a
> thread's DTV at a time: at thread creation there's one, and while the
> thread is active there's another.  But that's not the result of a
> thorough global analysis from scratch, it's just my intuition based on
> my limited understanding of the whole TLS architecture.  If you want a
> thorough analysis, I'm afraid you'll have to resort to someone else,
> probably someone who speaks the same language as you, and that reasons
> at the same abstraction layer as you.  I'm not that person, as our many
> attempts at conversations have shown.
> 
> > ISTM that all this should become a code comment, and not just be in an
> > email on the list.
> 
> It's there in the patch.

Apparently, I thought it wasn't.

At the very least please include this sentence of yours (see above) in
the code comments:
"But that's not the result of a thorough global analysis from scratch,
it's just my intuition based on my limited understanding of the whole
TLS architecture."

What I want to avoid is that somebody else tries to work on the code and
believes in your comment as-is, when you already know that you haven't
checked this completely.

> > IIUC, that's where the assumption about dlopen vs. usage comes in that
> > you included in the other patch.
> 
> Yeah, that too.
> 
> > If so, this is not a data race (nor a race condition).
> 
> It is a potential data race, yes,

I'm saying that when making the assumption, which is essentially a
precondition we (should) specify, this isn't a data race.  If violating
the precondition, then that's already undefined behavior, but a fault of
the caller.
  
Alexandre Oliva Sept. 29, 2016, 10:07 p.m. UTC | #4
On Sep 29, 2016, Torvald Riegel <triegel@redhat.com> wrote:

> On Sun, 2016-09-25 at 19:09 -0300, Alexandre Oliva wrote:
>> On Sep 24, 2016, Torvald Riegel <triegel@redhat.com> wrote:
>> 
>> > On Fri, 2016-09-23 at 23:52 -0300, Alexandre Oliva wrote:
>> >> (if it decides to update it without
>> >> resizing, we also have a race, but both threads end up writing the same
>> >> final value, which AFAIK is not so much of a problem)
>> 
>> > Can this still happen after your patches?
>> 
>> After removing (again) one of the possibly-concurrent writes, I don't
>> see how it could still happen, since there's only one writer to a
>> thread's DTV at a time: at thread creation there's one, and while the
>> thread is active there's another.  But that's not the result of a
>> thorough global analysis from scratch, it's just my intuition based on
>> my limited understanding of the whole TLS architecture.  If you want a
>> thorough analysis, I'm afraid you'll have to resort to someone else,
>> probably someone who speaks the same language as you, and that reasons
>> at the same abstraction layer as you.  I'm not that person, as our many
>> attempts at conversations have shown.
>> 
>> > ISTM that all this should become a code comment, and not just be in an
>> > email on the list.
>> 
>> It's there in the patch.

> Apparently, I thought it wasn't.

> At the very least please include this sentence of yours (see above) in
> the code comments:
> "But that's not the result of a thorough global analysis from scratch,
> it's just my intuition based on my limited understanding of the whole
> TLS architecture."

> What I want to avoid is that somebody else tries to work on the code and
> believes in your comment as-is, when you already know that you haven't
> checked this completely.

I checked thoroughly what I wrote in the comment, namely, that there'd
be a DTV race if we did what the removed code used to do before.

What I meant is that I cannot affirm that there aren't any other DTV- or
TLS-related races, before or after the fix, because I haven't ever done
a thorough review of the TLS implementation, and I surely haven't done
one for purposes of posting the patch.

I just know enough about the code in general, and about concurrent
programming in general, to know that if I remove that code (which was
unnecessary, and which caused the race I had detected a year or so ago),
that specific race will be gone, and the remaining code will work the
way it did before the patch I installed last week.

What I know about these topics would allow me to make numerous other
claims, of course.  But what would the point be of my spelling any of
them out?  You wouldn't believe them anyway!  Even if I said I'd made a
thorough analysis, would you trust it?


>> > IIUC, that's where the assumption about dlopen vs. usage comes in that
>> > you included in the other patch.
>> 
>> Yeah, that too.
>> 
>> > If so, this is not a data race (nor a race condition).
>> 
>> It is a potential data race, yes,

> I'm saying that when making the assumption, which is essentially a
> precondition we (should) specify, this isn't a data race.  If violating
> the precondition, then that's already undefined behavior, but a fault of
> the caller.

Are you saying it ceases to be a data race just because some earlier
precondition failed to be observed?  I don't think I agree with that.
But I don't think I'm interested in being part of such a discussion with
you either.
  
Torvald Riegel Sept. 30, 2016, 9:22 a.m. UTC | #5
On Thu, 2016-09-29 at 19:07 -0300, Alexandre Oliva wrote:
> What I know about these topics would allow me to make numerous other
> claims, of course.  But what would the point be of my spelling any of
> them out?  You wouldn't believe them anyway!  Even if I said I'd made a
> thorough analysis, would you trust it?

What I asked for is that you make it clear if you are not confident in a
comment you add.  Of course, I'd trust such a comment less than one that
you are confident in.

> 
> >> > IIUC, that's where the assumption about dlopen vs. usage comes in that
> >> > you included in the other patch.
> >> 
> >> Yeah, that too.
> >> 
> >> > If so, this is not a data race (nor a race condition).
> >> 
> >> It is a potential data race, yes,
> 
> > I'm saying that when making the assumption, which is essentially a
> > precondition we (should) specify, this isn't a data race.  If violating
> > the precondition, then that's already undefined behavior, but a fault of
> > the caller.
> 
> Are you saying it ceases to be a data race just because some earlier
> precondition failed to be observed?  I don't think I agree with that.

I would not classify it as a data race in the code because we make
statements about the code based on what it's supposed to do.  If the
caller violates the preconditions, everything can happen.
Nonetheless, when speculating about what would happen if the caller
were allowed to call in a way that violates the precondition, we would
say that there's a data race there.  However, speculating in code
comments about behavior that violates preconditions is just confusing
IMO, unless it's explicitly stated that this is essentially a thought
experiment.
  

Patch

diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index 42bddc1..64e3aea 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -137,6 +137,12 @@  _dl_nothread_init_static_tls (struct link_map *map)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
 
+  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
+  dtv_t *dtv = THREAD_DTV ();
+  assert (map->l_tls_modid <= dtv[-1].counter);
+  dtv[map->l_tls_modid].pointer.is_static = true;
+  dtv[map->l_tls_modid].pointer.val = dest;
+
   /* Initialize the memory.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index ed13fd9..0add960 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -511,6 +511,11 @@  _dl_allocate_tls_init (void *result)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
 
+	  /* Set up the DTV entry.  The simplified __tls_get_addr that
+	     some platforms use in static programs requires it.  */
+	  dtv[map->l_tls_modid].pointer.is_static = true;
+	  dtv[map->l_tls_modid].pointer.val = dest;
+
 	  /* Copy the initialization image and clear the BSS part.  */
 	  memset (__mempcpy (dest, map->l_tls_initimage,
 			     map->l_tls_initimage_size), '\0',
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 6b42b11..99002b2 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -1209,9 +1209,12 @@  init_one_static_tls (struct pthread *curp, struct link_map *map)
 #  error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 # endif
 
-  /* We cannot delay the initialization of the Static TLS area, since
-     it can be accessed with LE or IE, but since the DTV is only used
-     by GD and LD, we can delay its update to avoid a race.  */
+  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
+  dtv_t *dtv = GET_DTV (TLS_TPADJ (curp));
+  dtv[map->l_tls_modid].pointer.is_static = true;
+  dtv[map->l_tls_modid].pointer.val = dest;
+
+  /* Initialize the memory.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
 }