From patchwork Sat Sep 24 02:52:50 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 15968 Received: (qmail 16643 invoked by alias); 24 Sep 2016 02:53:39 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 15705 invoked by uid 89); 24 Sep 2016 02:53:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.0 required=5.0 tests=BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS, UNSUBSCRIBE_BODY autolearn=ham version=3.3.2 spammy=investigated, amounts, locks, 1376 X-HELO: mx1.redhat.com From: Alexandre Oliva To: Florian Weimer Cc: Andreas Schwab , Richard Henderson , Alan Modra , Roland McGrath , codonell@redhat.com, libc-alpha@sourceware.org, rth@redhat.com Subject: Re: [BZ#17090/17620/17621]: fix DTV race, assert, and DTV_SURPLUS Static TLS limit References: <20141118224048.600312C3B23@topped-with-meat.com> <20141120021703.86F032C3B18@topped-with-meat.com> <20150304050529.GD26435@bubble.grove.modra.org> <20150304110430.GE26435@bubble.grove.modra.org> <55072871.3060104@twiddle.net> <87lgyi9wq3.fsf@mid.deneb.enyo.de> Date: Fri, 23 Sep 2016 23:52:50 -0300 In-Reply-To: <87lgyi9wq3.fsf@mid.deneb.enyo.de> (Florian Weimer's message of "Fri, 23 Sep 2016 22:39:48 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 On Sep 23, 2016, Florian Weimer wrote: > * Alexandre Oliva: >> On Sep 15, 2016, Andreas Schwab wrote: >>> This breaks TLS in static programs on armv7, see BZ #19826. >> Do we want the fix for BZ#19826 backported to older branches? > That would make sense, yes. Here's a backport of the patch, that applies cleanly on 2.24, 2.23 and 2.22 (the first affected by the regression), and that I tested by applying it on top of the tag of each of these releases. Would it make sense to install it in all 3 release branches? Attentive readers will notice that the patch removes a comment that alludes to a race, and brings back the code that had been replaced by the comment. When I posted the patch for the trunk, I restored the DTV initializers to all 3 functions, although only one of them actually needed them. I'm afraid my sense of aesthetics overruled my sense of thread-safety, because I couldn't figure out what the race was, investigated the wrong potential race and got convinced it wouldn't hit. But now, just before posting this message, I see I was mistaken: the patch reintroduced a race in the trunk, that I'll proceed to fix. The race amounts to the following scenario: one thread loads a module that has a TLS block that happens to be assigned to static TLS, getting nptl/allocatestack.c's __pthread_init_static_tls called through GL(dl_init_static_tls) to initialize the TLS block for the module in all threads' static TLS blocks. __pthread_init_static_tls calls init_one_static_tls for each active thread, initializing the DTV entry and copying the TLS initialization block to the static TLS block of each thread. IIRC the race arises when one thread decides to resize its DTV while the initializer above is modifying it. (if it decides to update it without resizing, we also have a race, but both threads end up writing the same final value, which AFAIK is not so much of a problem) The end result is that the initializer may write to stale/freed memory, that may even have already been reassigned to some other purpose. That's very bad indeed. So, although I post the entire backported patch, because that's what I tested, and the comment I refer to above is in a part of the patch that, if applied, would restore the described race, I immediately withdraw it, and propose instead to revert the changes to elf/dl-reloc.c and nptl/allocatestack.c that I'd installed earlier this week to the master branch. They're not necessary for use with the regular __tls_get_addr, and they won't affect the alternate __tls_get_addr used in static programs, because the Static TLS block for the main program (the only module it cares about) and its DTV entry are always set up by __libc_setup_tls (main thread) or _dl_allocate_tls_init (other threads): it will never be the case that we will dlopen the main program after the program starts and have to initialize its static TLS blocks (and DTV entry) for all running threads, because we never dlopen the main program. For the master branch, I'll post a separate patch momentarily, reverting the incorrect portion of the recent change, and expanding the comments about the race condition, so that neither myself nor anyone else ever mistakes another (harmless?) race for that one: that one thread initializes other threads' TLS blocks while holding the rtld lock, but later on the other threads access the initialized TLS blocks without taking any locks or any form of memory synchronization that would clearly establish, memory-wise, the implied happens-before. For the earlier branches, I propose we install only the elf/dl-tls.c change from the patch below; the elf/dl-reloc.c change could stay, but I'm now more inclined to take it out, for uniformity. I'll retest and repost the patches for master and branches early next week, if not over the weekend. Andreas, could you please confirm that you tested the dl-tls.c change by itself on ARM, and that it was enough to fix the problem on its own? Thanks, WARNING: the patch below is broken, do not use! [PR19826] fix non-LE TLS in static programs An earlier fix for TLS dropped early initialization of DTV entries for modules using static TLS, leaving it for __tls_get_addr to set them up. That worked on platforms that require the GD access model to be relaxed to LE in the main executable, but it caused a regression on platforms that allow GD in the main executable, particularly in statically-linked programs: they use a custom __tls_get_addr that does not update the DTV, which fails when the DTV early initialization is not performed. In static programs, __libc_setup_tls performs the DTV initialization for the main thread, but the DTV of other threads is set up in _dl_allocate_tls_init, so that's the fix that matters. Restoring the initialization in the remaining functions modified by this patch was just for uniformity. It's not clear that it is ever needed: even on platforms that allow GD in the main executable, the dynamically-linked version of __tls_get_addr would set up the DTV entries, even for static TLS modules, while updating the DTV counter. for ChangeLog [BZ #19826] * elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early initialization of static TLS entries. * elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise. * nptl/allocatestack.c (init_one_static_tls): Likewise. --- elf/dl-reloc.c | 6 ++++++ elf/dl-tls.c | 5 +++++ nptl/allocatestack.c | 9 ++++++--- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c index 42bddc1..64e3aea 100644 --- a/elf/dl-reloc.c +++ b/elf/dl-reloc.c @@ -137,6 +137,12 @@ _dl_nothread_init_static_tls (struct link_map *map) # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" #endif + /* Fill in the DTV slot so that a later LD/GD access will find it. */ + dtv_t *dtv = THREAD_DTV (); + assert (map->l_tls_modid <= dtv[-1].counter); + dtv[map->l_tls_modid].pointer.is_static = true; + dtv[map->l_tls_modid].pointer.val = dest; + /* Initialize the memory. */ memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); diff --git a/elf/dl-tls.c b/elf/dl-tls.c index ed13fd9..0add960 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -511,6 +511,11 @@ _dl_allocate_tls_init (void *result) # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" #endif + /* Set up the DTV entry. The simplified __tls_get_addr that + some platforms use in static programs requires it. */ + dtv[map->l_tls_modid].pointer.is_static = true; + dtv[map->l_tls_modid].pointer.val = dest; + /* Copy the initialization image and clear the BSS part. */ memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 6b42b11..99002b2 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -1209,9 +1209,12 @@ init_one_static_tls (struct pthread *curp, struct link_map *map) # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" # endif - /* We cannot delay the initialization of the Static TLS area, since - it can be accessed with LE or IE, but since the DTV is only used - by GD and LD, we can delay its update to avoid a race. */ + /* Fill in the DTV slot so that a later LD/GD access will find it. */ + dtv_t *dtv = GET_DTV (TLS_TPADJ (curp)); + dtv[map->l_tls_modid].pointer.is_static = true; + dtv[map->l_tls_modid].pointer.val = dest; + + /* Initialize the memory. */ memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); }