From patchwork Thu Nov 27 23:55:32 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 3984 Received: (qmail 17243 invoked by alias); 27 Nov 2014 23:55:40 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 17231 invoked by uid 89); 27 Nov 2014 23:55:39 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f175.google.com X-Received: by 10.170.61.202 with SMTP id d193mr45999301ykd.32.1417132534986; Thu, 27 Nov 2014 15:55:34 -0800 (PST) Date: Thu, 27 Nov 2014 15:55:32 -0800 From: "H.J. Lu" To: Carlos O'Donell Cc: Joseph Myers , Paul Archard , Ond??ej B??lka , MyungJoo Ham , GNU C Library , Torvald Riegel Subject: Re: [PATCH][BZ #13862] Reuse of cached stack can cause bounds overrun of thread DTV Message-ID: <20141127235532.GA18235@gmail.com> References: <20141127140654.GA31381@gmail.com> <5477512E.1080403@redhat.com> <54779CCC.8030505@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <54779CCC.8030505@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) On Thu, Nov 27, 2014 at 04:51:08PM -0500, Carlos O'Donell wrote: > > >> Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another > >> thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than > >> once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx) > >> require an atomic load? We are trying to be explicit about this in glibc. > > > > There is a race between DTV access and dlclose: > > That doesn't answer my question. > > Alex Oliva and I went through the code and we have come to the conclusion > that it is *not* safe to read GL(dl_tls_max_dtv_idx) more than once if the > code doing the reading expects to get the same value. The same value is > only guaranteed if you hold GL(dl_load_lock). However, it is safe to read > an old value, since _dl_update_slotinfo will resize the DTV again if you need > to access a TLS variable of a dlopen that was in-flight during thread creation. > > Therefore, please make the read of GL(dl_tls_max_dtv_idx) an atomic access > to prevent the compiler from reloading this and to document that it is > written to by other threads concurrently. Done. I used atomic_load_acquire now. > > >> Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup > >> does it make sense to reallocate the dtv? Why not leave it uninitialized > >> and allow _dl_update_slotinfo to reallocate it? > > > > Are you suggesting folding _dl_allocate_tls_init into _dl_update_slotinfo > > and remove _dl_allocate_tls_init? I can give it a try. > > I had not thought of that, but it's a good idea. > > I don't know how you'd make it work though. > > I think _dl_allocate_tls_init needs to do some book keeping, but the > allocation of the DTV could be deferred to __tls_get_addr time > (_dl_update_slotinfo). I tried and it doesn't work. > > > > There is a race condition. But it is a separate issue. They should > > be fixed together. > > You are adding new code. That new code should be correct. This needs > an atomic load. Done. See above. > >> Missing license. > > > > What is the license policy on testcases? > > Same license as the project with the appropriate header. Added. > >>> + > >>> +#define DSO_SHARED_FILES 20 > >>> +#define DSO_OPEN_THREADS 20 > >>> +#define DSO_EXEC_THREADS 2 > >> > >> Why these particular constants? If you chose them because > >> they happened to work on a particular system, please comment > >> that. > > > > They came from: > > > > https://sourceware.org/bugzilla/attachment.cgi?id=6290 > > You don't answer the question. > > If you are submitting the patch you have to be ready to defend it. > > If you can't defend it you must at least add a comment saying the > choices are arbitrary. > > /* The choices of thread count, and file counts are arbitary. > The point is simply to run enough threads that an exiting > thread has it's stack reused by another thread at the same > time as new libraries have been loaded. */ Thanks for the suggestion. I copied it to the testcase. > >> Use /* */. > > > > We have been using // for comments in glibc for a while. Why not here? > > glibc follows the GNU Coding Standard which says to use /* */? > Done. Changed to /**/. > >> > >> This test needs a verbose, minimum one paragraph, comment explaining > >> what the test is testing. > > No comment? I added one paragraph for each file. > >>> + > >>> + for (j = 0; j < 100; j++) > >> > >> Why 100 times? Please comment. > > Comment stating this is arbitrary. Done. > >>> +void > >>> +function (void) > >>> +{ > >>> + int i; > >>> + for (i = 0; i < 256; i++) > >> > >> Why 256 times? Please comment. > >>> + var[i] = i; > >>> +} > >>> > > > > I don't have answers for any particular values in this > > testcase. > > Then add a comment saying they are arbitrary. Such that > future readers know they are not special values. > I added some comments. Here is the updated patch. OK to install? Thanks. H.J. --- From f08bbe49c8820576e2149a5be44af0ce2ddb355b Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Thu, 27 Nov 2014 05:42:23 -0800 Subject: [PATCH] Resize DTV if the current DTV isn't big enough This patch changes _dl_allocate_tls_init to resize DTV if the current DTV isn't big enough. Tested on X86-64, x32 and ia32. [BZ #13862] * elf/dl-tls.c: Include . (oom): Remove #ifdef SHARED/#endif. (_dl_static_dtv, _dl_initial_dtv): Moved before ... (_dl_resize_dtv): This. Extracted from _dl_update_slotinfo. (_dl_allocate_tls_init): Resize DTV if the current DTV isn't big enough. (_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV. * nptl/Makefile (tests): Add tst-stack4. (modules-names): Add tst-stack4mod. ($(objpfx)tst-stack4): New. (tst-stack4mod.sos): Likewise. ($(objpfx)tst-stack4.out): Likewise. ($(tst-stack4mod.sos)): Likewise. (clean): Likewise. * nptl/tst-stack4.c: New file. * nptl/tst-stack4mod.c: Likewise. --- ChangeLog | 20 +++++++ elf/dl-tls.c | 102 ++++++++++++++++++++------------- nptl/Makefile | 17 +++++- nptl/tst-stack4.c | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++ nptl/tst-stack4mod.c | 28 +++++++++ 5 files changed, 283 insertions(+), 43 deletions(-) create mode 100644 nptl/tst-stack4.c create mode 100644 nptl/tst-stack4mod.c diff --git a/ChangeLog b/ChangeLog index be49dba..6a535e7 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,23 @@ +2014-11-27 H.J. Lu + + [BZ #13862] + * elf/dl-tls.c: Include . + (oom): Remove #ifdef SHARED/#endif. + (_dl_static_dtv, _dl_initial_dtv): Moved before ... + (_dl_resize_dtv): This. Extracted from _dl_update_slotinfo. + (_dl_allocate_tls_init): Resize DTV if the current DTV isn't + big enough. + (_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV. + * nptl/Makefile (tests): Add tst-stack4. + (modules-names): Add tst-stack4mod. + ($(objpfx)tst-stack4): New. + (tst-stack4mod.sos): Likewise. + ($(objpfx)tst-stack4.out): Likewise. + ($(tst-stack4mod.sos)): Likewise. + (clean): Likewise. + * nptl/tst-stack4.c: New file. + * nptl/tst-stack4mod.c: Likewise. + 2014-11-27 J. Brown * sysdeps/x86/bits/string.h: Add recent CPUs. diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 5204fda..8dd4992 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -34,14 +35,12 @@ /* Out-of-memory handler. */ -#ifdef SHARED static void __attribute__ ((__noreturn__)) oom (void) { _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n"); } -#endif size_t @@ -397,6 +396,53 @@ _dl_allocate_tls_storage (void) } +#ifndef SHARED +extern dtv_t _dl_static_dtv[]; +# define _dl_initial_dtv (&_dl_static_dtv[1]) +#endif + +static dtv_t * +_dl_resize_dtv (dtv_t *dtv) +{ + /* Resize the dtv. */ + dtv_t *newp; + /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by + other threads oncurrently. */ + size_t newsize + = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS; + size_t oldsize = dtv[-1].counter; + + if (dtv == GL(dl_initial_dtv)) + { + /* This is the initial dtv that was either statically allocated in + __libc_setup_tls or allocated during rtld startup using the + dl-minimal.c malloc instead of the real malloc. We can't free + it, we have to abandon the old storage. */ + + newp = malloc ((2 + newsize) * sizeof (dtv_t)); + if (newp == NULL) + oom (); + memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t)); + } + else + { + newp = realloc (&dtv[-1], + (2 + newsize) * sizeof (dtv_t)); + if (newp == NULL) + oom (); + } + + newp[0].counter = newsize; + + /* Clear the newly allocated part. */ + memset (newp + 2 + oldsize, '\0', + (newsize - oldsize) * sizeof (dtv_t)); + + /* Return the generation counter. */ + return &newp[1]; +} + + void * internal_function _dl_allocate_tls_init (void *result) @@ -410,6 +456,16 @@ _dl_allocate_tls_init (void *result) size_t total = 0; size_t maxgen = 0; + /* Check if the current dtv is big enough. */ + if (dtv[-1].counter < GL(dl_tls_max_dtv_idx)) + { + /* Resize the dtv. */ + dtv = _dl_resize_dtv (dtv); + + /* Install this new dtv in the thread data structures. */ + INSTALL_DTV (result, &dtv[-1]); + } + /* We have to prepare the dtv for all currently loaded modules using TLS. For those which are dynamically loaded we add the values indicating deferred allocation. */ @@ -492,11 +548,6 @@ _dl_allocate_tls (void *mem) rtld_hidden_def (_dl_allocate_tls) -#ifndef SHARED -extern dtv_t _dl_static_dtv[]; -# define _dl_initial_dtv (&_dl_static_dtv[1]) -#endif - void internal_function _dl_deallocate_tls (void *tcb, bool dealloc_tcb) @@ -645,41 +696,10 @@ _dl_update_slotinfo (unsigned long int req_modid) assert (total + cnt == modid); if (dtv[-1].counter < modid) { - /* Reallocate the dtv. */ - dtv_t *newp; - size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS; - size_t oldsize = dtv[-1].counter; - - assert (map->l_tls_modid <= newsize); - - if (dtv == GL(dl_initial_dtv)) - { - /* This is the initial dtv that was allocated - during rtld startup using the dl-minimal.c - malloc instead of the real malloc. We can't - free it, we have to abandon the old storage. */ - - newp = malloc ((2 + newsize) * sizeof (dtv_t)); - if (newp == NULL) - oom (); - memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t)); - } - else - { - newp = realloc (&dtv[-1], - (2 + newsize) * sizeof (dtv_t)); - if (newp == NULL) - oom (); - } - - newp[0].counter = newsize; - - /* Clear the newly allocated part. */ - memset (newp + 2 + oldsize, '\0', - (newsize - oldsize) * sizeof (dtv_t)); + /* Resize the dtv. */ + dtv = _dl_resize_dtv (dtv); - /* Point dtv to the generation counter. */ - dtv = &newp[1]; + assert (modid <= dtv[-1].counter); /* Install this new dtv in the thread data structures. */ diff --git a/nptl/Makefile b/nptl/Makefile index 86a1b0c..ac76596 100644 --- a/nptl/Makefile +++ b/nptl/Makefile @@ -254,7 +254,7 @@ tests = tst-typesizes \ tst-exec1 tst-exec2 tst-exec3 tst-exec4 \ tst-exit1 tst-exit2 tst-exit3 \ tst-stdio1 tst-stdio2 \ - tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \ + tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \ tst-pthread-attr-affinity \ tst-unload \ tst-dlsym1 \ @@ -311,7 +311,7 @@ endif modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \ tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \ - tst-tls5modd tst-tls5mode tst-tls5modf \ + tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \ tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o test-extras += $(modules-names) tst-cleanup4aux @@ -479,6 +479,19 @@ $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out $(evaluate-test) generated += tst-stack3-mem.out tst-stack3.mtrace +$(objpfx)tst-stack4: $(libdl) $(shared-thread-library) +tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \ + 11 12 13 14 15 16 17 18 19; do \ + for j in 0 1 2 3 4 5 6 7 8 9 10 \ + 11 12 13 14 15 16 17 18 19; do \ + echo $(objpfx)tst-stack4mod-$$i-$$j.so; \ + done; done) +$(objpfx)tst-stack4.out: $(tst-stack4mod.sos) +$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so + cp -f $< $@ +clean: + rm -f $(tst-stack4mod.sos) + $(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library) $(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library) diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c new file mode 100644 index 0000000..344c658 --- /dev/null +++ b/nptl/tst-stack4.c @@ -0,0 +1,159 @@ +/* Test DTV size overflow with pthread_create and TLS in dlopened shared + object. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include + +/* The choices of thread count, and file counts are arbitary. + The point is simply to run enough threads that an exiting + thread has it's stack reused by another thread at the same + time as new libraries have been loaded. */ +#define DSO_SHARED_FILES 20 +#define DSO_OPEN_THREADS 20 +#define DSO_EXEC_THREADS 2 + +/* Used to make sure that only one thread is calling dlopen and dlclose + at a time. */ +pthread_mutex_t g_lock; + +typedef void (*function) (void); + +void * +dso_invoke(void *dso_fun) +{ + function *fun_vec = (function *) dso_fun; + int dso; + + for (dso = 0; dso < DSO_SHARED_FILES; dso++) + (*fun_vec[dso]) (); + + pthread_exit (NULL); +} + +void * +dso_process (void * p) +{ + void *handle[DSO_SHARED_FILES]; + function fun_vec[DSO_SHARED_FILES]; + char dso_path[DSO_SHARED_FILES][100]; + int dso; + uintptr_t t = (uintptr_t) p; + + /* Open DSOs and get a function. */ + for (dso = 0; dso < DSO_SHARED_FILES; dso++) + { + sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso); + + pthread_mutex_lock (&g_lock); + + handle[dso] = dlopen (dso_path[dso], RTLD_NOW); + assert (handle[dso]); + + fun_vec[dso] = (function) dlsym (handle[dso], "function"); + assert (fun_vec[dso]); + + pthread_mutex_unlock (&g_lock); + } + + /* Spawn workers. */ + pthread_t thread[DSO_EXEC_THREADS]; + int i, ret; + uintptr_t result = 0; + for (i = 0; i < DSO_EXEC_THREADS; i++) + { + pthread_mutex_lock (&g_lock); + ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec); + if (ret != 0) + { + printf ("pthread_create failed: %d\n", ret); + result = 1; + } + pthread_mutex_unlock (&g_lock); + } + + if (!result) + for (i = 0; i < DSO_EXEC_THREADS; i++) + { + ret = pthread_join (thread[i], NULL); + if (ret != 0) + { + printf ("pthread_join failed: %d\n", ret); + result = 1; + } + } + + /* Close all DSOs. */ + for (dso = 0; dso < DSO_SHARED_FILES; dso++) + { + pthread_mutex_lock (&g_lock); + dlclose (handle[dso]); + pthread_mutex_unlock (&g_lock); + } + + /* Exit. */ + pthread_exit ((void *) result); +} + +static int +do_test (void) +{ + pthread_t thread[DSO_OPEN_THREADS]; + int i,j; + int ret; + int result = 0; + + pthread_mutex_init (&g_lock, NULL); + + /* 100 is arbitrary here and is known to trigger PR 13862. */ + for (j = 0; j < 100; j++) + { + for (i = 0; i < DSO_OPEN_THREADS; i++) + { + ret = pthread_create (&thread[i], NULL, dso_process, + (void *) (uintptr_t) i); + if (ret != 0) + { + printf ("pthread_create failed: %d\n", ret); + result = 1; + } + } + + if (result) + break; + + for (i = 0; i < DSO_OPEN_THREADS; i++) + { + ret = pthread_join (thread[i], NULL); + if (ret != 0) + { + printf ("pthread_join failed: %d\n", ret); + result = 1; + } + } + } + + return result; +} + +#define TEST_FUNCTION do_test () +#define TIMEOUT 100 +#include "../test-skeleton.c" diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c new file mode 100644 index 0000000..a75aa09 --- /dev/null +++ b/nptl/tst-stack4mod.c @@ -0,0 +1,28 @@ +/* This tests DTV usage with TLS in dlopened shared object. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* 256 is arbitrary here and is known to trigger PR 13862. */ +__thread int var[256] attribute_hidden = {0}; + +void +function (void) +{ + int i; + for (i = 0; i < sizeof (var) / sizeof (int); i++) + var[i] = i; +}