[BZ,#13862] Reuse of cached stack can cause bounds overrun of thread DTV

Message ID 20141127235532.GA18235@gmail.com
State Superseded
Headers

Commit Message

H.J. Lu Nov. 27, 2014, 11:55 p.m. UTC
  On Thu, Nov 27, 2014 at 04:51:08PM -0500, Carlos O'Donell wrote:
> 
> >> Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another
> >>    thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than
> >>    once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx)
> >>    require an atomic load? We are trying to be explicit about this in glibc.
> > 
> > There is a race between DTV access and dlclose:
> 
> That doesn't answer my question.
> 
> Alex Oliva and I went through the code and we have come to the conclusion
> that it is *not* safe to read GL(dl_tls_max_dtv_idx) more than once if the
> code doing the reading expects to get the same value. The same value is
> only guaranteed if you hold GL(dl_load_lock). However, it is safe to read
> an old value, since _dl_update_slotinfo will resize the DTV again if you need
> to access a TLS variable of a dlopen that was in-flight during thread creation.
> 
> Therefore, please make the read of GL(dl_tls_max_dtv_idx) an atomic access
> to prevent the compiler from reloading this and to document that it is
> written to by other threads concurrently.

Done.  I used atomic_load_acquire now.

> 
> >> Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup
> >>    does it make sense to reallocate the dtv? Why not leave it uninitialized
> >>    and allow _dl_update_slotinfo to reallocate it?
> > 
> > Are you suggesting folding  _dl_allocate_tls_init into _dl_update_slotinfo
> > and remove _dl_allocate_tls_init?  I can give it a try.
> 
> I had not thought of that, but it's a good idea.
> 
> I don't know how you'd make it work though.
> 
> I think _dl_allocate_tls_init needs to do some book keeping, but the
> allocation of the DTV could be deferred to __tls_get_addr time 
> (_dl_update_slotinfo).

I tried and it doesn't work.

> > 
> > There is a race condition. But it is a separate issue.  They should
> > be fixed together.
> 
> You are adding new code. That new code should be correct. This needs
> an atomic load.

Done.  See above.
 
> >> Missing license.
> > 
> > What is the license policy on testcases?
> 
> Same license as the project with the appropriate header.

Added.

> >>> +
> >>> +#define DSO_SHARED_FILES 20
> >>> +#define DSO_OPEN_THREADS 20
> >>> +#define DSO_EXEC_THREADS 2
> >>
> >> Why these particular constants? If you chose them because
> >> they happened to work on a particular system, please comment
> >> that.
> > 
> > They came from:
> > 
> > https://sourceware.org/bugzilla/attachment.cgi?id=6290
> 
> You don't answer the question.
> 
> If you are submitting the patch you have to be ready to defend it.
> 
> If you can't defend it you must at least add a comment saying the
> choices are arbitrary.
> 
> /* The choices of thread count, and file counts are arbitary.
>    The point is simply to run enough threads that an exiting
>    thread has it's stack reused by another thread at the same
>    time as new libraries have been loaded.  */

Thanks for the suggestion.  I copied it to the testcase.

> >> Use /* */.
> > 
> > We have been using // for comments in glibc for a while.  Why not here?
> 
> glibc follows the GNU Coding Standard which says to use /* */?
> 

Done.  Changed to /**/.

> >>
> >> This test needs a verbose, minimum one paragraph, comment explaining
> >> what the test is testing.
> 
> No comment?
 
I added one paragraph for each file.

> >>> +
> >>> +  for (j = 0; j < 100; j++)
> >>
> >> Why 100 times? Please comment.
> 
> Comment stating this is arbitrary.

Done.
 
> >>> +void
> >>> +function (void)
> >>> +{
> >>> +  int i;
> >>> +  for (i = 0; i < 256; i++)
> >>
> >> Why 256 times? Please comment.
> >>> +    var[i] = i;
> >>> +}
> >>>
> > 
> > I don't have answers for any particular values in this
> > testcase.
> 
> Then add a comment saying they are arbitrary. Such that
> future readers know they are not special values.
> 

I added some comments.

Here is the updated patch.  OK to install?

Thanks.


H.J.
---
From f08bbe49c8820576e2149a5be44af0ce2ddb355b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 27 Nov 2014 05:42:23 -0800
Subject: [PATCH] Resize DTV if the current DTV isn't big enough

This patch changes _dl_allocate_tls_init to resize DTV if the current DTV
isn't big enough.  Tested on X86-64, x32 and ia32.

	[BZ #13862]
	* elf/dl-tls.c: Include <atomic.h>.
	(oom): Remove #ifdef SHARED/#endif.
	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
	(_dl_resize_dtv): This.  Extracted from _dl_update_slotinfo.
	(_dl_allocate_tls_init): Resize DTV if the current DTV isn't
	big enough.
	(_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
	* nptl/Makefile (tests): Add tst-stack4.
	(modules-names): Add tst-stack4mod.
	($(objpfx)tst-stack4): New.
	(tst-stack4mod.sos): Likewise.
	($(objpfx)tst-stack4.out): Likewise.
	($(tst-stack4mod.sos)): Likewise.
	(clean): Likewise.
	* nptl/tst-stack4.c: New file.
	* nptl/tst-stack4mod.c: Likewise.
---
 ChangeLog            |  20 +++++++
 elf/dl-tls.c         | 102 ++++++++++++++++++++-------------
 nptl/Makefile        |  17 +++++-
 nptl/tst-stack4.c    | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++
 nptl/tst-stack4mod.c |  28 +++++++++
 5 files changed, 283 insertions(+), 43 deletions(-)
 create mode 100644 nptl/tst-stack4.c
 create mode 100644 nptl/tst-stack4mod.c
  

Comments

Carlos O'Donell Nov. 28, 2014, 3:37 p.m. UTC | #1
On 11/27/2014 06:55 PM, H.J. Lu wrote:
> On Thu, Nov 27, 2014 at 04:51:08PM -0500, Carlos O'Donell wrote:
>>
>>>> Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another
>>>>    thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than
>>>>    once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx)
>>>>    require an atomic load? We are trying to be explicit about this in glibc.
>>>
>>> There is a race between DTV access and dlclose:
>>
>> That doesn't answer my question.
>>
>> Alex Oliva and I went through the code and we have come to the conclusion
>> that it is *not* safe to read GL(dl_tls_max_dtv_idx) more than once if the
>> code doing the reading expects to get the same value. The same value is
>> only guaranteed if you hold GL(dl_load_lock). However, it is safe to read
>> an old value, since _dl_update_slotinfo will resize the DTV again if you need
>> to access a TLS variable of a dlopen that was in-flight during thread creation.
>>
>> Therefore, please make the read of GL(dl_tls_max_dtv_idx) an atomic access
>> to prevent the compiler from reloading this and to document that it is
>> written to by other threads concurrently.
> 
> Done.  I used atomic_load_acquire now.

Thanks.
 
>>
>>>> Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup
>>>>    does it make sense to reallocate the dtv? Why not leave it uninitialized
>>>>    and allow _dl_update_slotinfo to reallocate it?
>>>
>>> Are you suggesting folding  _dl_allocate_tls_init into _dl_update_slotinfo
>>> and remove _dl_allocate_tls_init?  I can give it a try.
>>
>> I had not thought of that, but it's a good idea.
>>
>> I don't know how you'd make it work though.
>>
>> I think _dl_allocate_tls_init needs to do some book keeping, but the
>> allocation of the DTV could be deferred to __tls_get_addr time 
>> (_dl_update_slotinfo).
> 
> I tried and it doesn't work.

Thanks for trying. I appreciate that.

>>>
>>> There is a race condition. But it is a separate issue.  They should
>>> be fixed together.
>>
>> You are adding new code. That new code should be correct. This needs
>> an atomic load.
> 
> Done.  See above.

Thanks.

>>>> Missing license.
>>>
>>> What is the license policy on testcases?
>>
>> Same license as the project with the appropriate header.
> 
> Added.

Thanks.

>>>>> +
>>>>> +#define DSO_SHARED_FILES 20
>>>>> +#define DSO_OPEN_THREADS 20
>>>>> +#define DSO_EXEC_THREADS 2
>>>>
>>>> Why these particular constants? If you chose them because
>>>> they happened to work on a particular system, please comment
>>>> that.
>>>
>>> They came from:
>>>
>>> https://sourceware.org/bugzilla/attachment.cgi?id=6290
>>
>> You don't answer the question.
>>
>> If you are submitting the patch you have to be ready to defend it.
>>
>> If you can't defend it you must at least add a comment saying the
>> choices are arbitrary.
>>
>> /* The choices of thread count, and file counts are arbitary.
>>    The point is simply to run enough threads that an exiting
>>    thread has it's stack reused by another thread at the same
>>    time as new libraries have been loaded.  */
> 
> Thanks for the suggestion.  I copied it to the testcase.

Thanks for applying.

>>>> Use /* */.
>>>
>>> We have been using // for comments in glibc for a while.  Why not here?
>>
>> glibc follows the GNU Coding Standard which says to use /* */?
>>
> 
> Done.  Changed to /**/.

Thanks.

>>>>
>>>> This test needs a verbose, minimum one paragraph, comment explaining
>>>> what the test is testing.
>>
>> No comment?
>  
> I added one paragraph for each file.

Thanks.

>>>>> +
>>>>> +  for (j = 0; j < 100; j++)
>>>>
>>>> Why 100 times? Please comment.
>>
>> Comment stating this is arbitrary.
> 
> Done.
>  
>>>>> +void
>>>>> +function (void)
>>>>> +{
>>>>> +  int i;
>>>>> +  for (i = 0; i < 256; i++)
>>>>
>>>> Why 256 times? Please comment.
>>>>> +    var[i] = i;
>>>>> +}
>>>>>
>>>
>>> I don't have answers for any particular values in this
>>> testcase.
>>
>> Then add a comment saying they are arbitrary. Such that
>> future readers know they are not special values.
>>
> 
> I added some comments.
> 
> Here is the updated patch.  OK to install?

Yes, but only after you fix one typo. I made a suggested change
to the test case descriptive sentence, but it's your choice to
accept tha tor not.

Thank you for the very quick responses.

> Thanks.
> 
> 
> H.J.
> ---
> From f08bbe49c8820576e2149a5be44af0ce2ddb355b Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Thu, 27 Nov 2014 05:42:23 -0800
> Subject: [PATCH] Resize DTV if the current DTV isn't big enough
> 
> This patch changes _dl_allocate_tls_init to resize DTV if the current DTV
> isn't big enough.  Tested on X86-64, x32 and ia32.
> 
> 	[BZ #13862]
> 	* elf/dl-tls.c: Include <atomic.h>.
> 	(oom): Remove #ifdef SHARED/#endif.
> 	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
> 	(_dl_resize_dtv): This.  Extracted from _dl_update_slotinfo.
> 	(_dl_allocate_tls_init): Resize DTV if the current DTV isn't
> 	big enough.
> 	(_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
> 	* nptl/Makefile (tests): Add tst-stack4.
> 	(modules-names): Add tst-stack4mod.
> 	($(objpfx)tst-stack4): New.
> 	(tst-stack4mod.sos): Likewise.
> 	($(objpfx)tst-stack4.out): Likewise.
> 	($(tst-stack4mod.sos)): Likewise.
> 	(clean): Likewise.
> 	* nptl/tst-stack4.c: New file.
> 	* nptl/tst-stack4mod.c: Likewise.
> ---
>  ChangeLog            |  20 +++++++
>  elf/dl-tls.c         | 102 ++++++++++++++++++++-------------
>  nptl/Makefile        |  17 +++++-
>  nptl/tst-stack4.c    | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  nptl/tst-stack4mod.c |  28 +++++++++
>  5 files changed, 283 insertions(+), 43 deletions(-)
>  create mode 100644 nptl/tst-stack4.c
>  create mode 100644 nptl/tst-stack4mod.c
> 
> diff --git a/ChangeLog b/ChangeLog
> index be49dba..6a535e7 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,23 @@
> +2014-11-27  H.J. Lu  <hongjiu.lu@intel.com>
> +
> +	[BZ #13862]
> +	* elf/dl-tls.c: Include <atomic.h>.
> +	(oom): Remove #ifdef SHARED/#endif.
> +	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
> +	(_dl_resize_dtv): This.  Extracted from _dl_update_slotinfo.
> +	(_dl_allocate_tls_init): Resize DTV if the current DTV isn't
> +	big enough.
> +	(_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
> +	* nptl/Makefile (tests): Add tst-stack4.
> +	(modules-names): Add tst-stack4mod.
> +	($(objpfx)tst-stack4): New.
> +	(tst-stack4mod.sos): Likewise.
> +	($(objpfx)tst-stack4.out): Likewise.
> +	($(tst-stack4mod.sos)): Likewise.
> +	(clean): Likewise.
> +	* nptl/tst-stack4.c: New file.
> +	* nptl/tst-stack4mod.c: Likewise.
> +
>  2014-11-27  J. Brown  <jb999@gmx.de>
>  
>  	* sysdeps/x86/bits/string.h: Add recent CPUs.
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 5204fda..8dd4992 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -23,6 +23,7 @@
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <sys/param.h>
> +#include <atomic.h>
>  
>  #include <tls.h>
>  #include <dl-tls.h>
> @@ -34,14 +35,12 @@
>  
>  
>  /* Out-of-memory handler.  */
> -#ifdef SHARED
>  static void
>  __attribute__ ((__noreturn__))
>  oom (void)
>  {
>    _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
>  }
> -#endif
>  
>  
>  size_t
> @@ -397,6 +396,53 @@ _dl_allocate_tls_storage (void)
>  }
>  
>  
> +#ifndef SHARED
> +extern dtv_t _dl_static_dtv[];
> +# define _dl_initial_dtv (&_dl_static_dtv[1])
> +#endif
> +
> +static dtv_t *
> +_dl_resize_dtv (dtv_t *dtv)
> +{
> +  /* Resize the dtv.  */
> +  dtv_t *newp;
> +  /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by
> +     other threads oncurrently.  */

s/oncurrently/concurrently/g

> +  size_t newsize
> +    = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS;
> +  size_t oldsize = dtv[-1].counter;
> +
> +  if (dtv == GL(dl_initial_dtv))
> +    {
> +      /* This is the initial dtv that was either statically allocated in
> +	 __libc_setup_tls or allocated during rtld startup using the
> +	 dl-minimal.c malloc instead of the real malloc.  We can't free
> +	 it, we have to abandon the old storage.  */
> +
> +      newp = malloc ((2 + newsize) * sizeof (dtv_t));
> +      if (newp == NULL)
> +	oom ();
> +      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
> +    }
> +  else
> +    {
> +      newp = realloc (&dtv[-1],
> +		      (2 + newsize) * sizeof (dtv_t));
> +      if (newp == NULL)
> +	oom ();
> +    }
> +
> +  newp[0].counter = newsize;
> +
> +  /* Clear the newly allocated part.  */
> +  memset (newp + 2 + oldsize, '\0',
> +	  (newsize - oldsize) * sizeof (dtv_t));
> +
> +  /* Return the generation counter.  */
> +  return &newp[1];
> +}
> +
> +
>  void *
>  internal_function
>  _dl_allocate_tls_init (void *result)
> @@ -410,6 +456,16 @@ _dl_allocate_tls_init (void *result)
>    size_t total = 0;
>    size_t maxgen = 0;
>  
> +  /* Check if the current dtv is big enough.   */
> +  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
> +    {
> +      /* Resize the dtv.  */
> +      dtv = _dl_resize_dtv (dtv);
> +
> +      /* Install this new dtv in the thread data structures.  */
> +      INSTALL_DTV (result, &dtv[-1]);
> +    }
> +
>    /* We have to prepare the dtv for all currently loaded modules using
>       TLS.  For those which are dynamically loaded we add the values
>       indicating deferred allocation.  */
> @@ -492,11 +548,6 @@ _dl_allocate_tls (void *mem)
>  rtld_hidden_def (_dl_allocate_tls)
>  
>  
> -#ifndef SHARED
> -extern dtv_t _dl_static_dtv[];
> -# define _dl_initial_dtv (&_dl_static_dtv[1])
> -#endif
> -
>  void
>  internal_function
>  _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
> @@ -645,41 +696,10 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	      assert (total + cnt == modid);
>  	      if (dtv[-1].counter < modid)
>  		{
> -		  /* Reallocate the dtv.  */
> -		  dtv_t *newp;
> -		  size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
> -		  size_t oldsize = dtv[-1].counter;
> -
> -		  assert (map->l_tls_modid <= newsize);
> -
> -		  if (dtv == GL(dl_initial_dtv))
> -		    {
> -		      /* This is the initial dtv that was allocated
> -			 during rtld startup using the dl-minimal.c
> -			 malloc instead of the real malloc.  We can't
> -			 free it, we have to abandon the old storage.  */
> -
> -		      newp = malloc ((2 + newsize) * sizeof (dtv_t));
> -		      if (newp == NULL)
> -			oom ();
> -		      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
> -		    }
> -		  else
> -		    {
> -		      newp = realloc (&dtv[-1],
> -				      (2 + newsize) * sizeof (dtv_t));
> -		      if (newp == NULL)
> -			oom ();
> -		    }
> -
> -		  newp[0].counter = newsize;
> -
> -		  /* Clear the newly allocated part.  */
> -		  memset (newp + 2 + oldsize, '\0',
> -			  (newsize - oldsize) * sizeof (dtv_t));
> +		  /* Resize the dtv.  */
> +		  dtv = _dl_resize_dtv (dtv);
>  
> -		  /* Point dtv to the generation counter.  */
> -		  dtv = &newp[1];
> +		  assert (modid <= dtv[-1].counter);
>  
>  		  /* Install this new dtv in the thread data
>  		     structures.  */
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 86a1b0c..ac76596 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -254,7 +254,7 @@ tests = tst-typesizes \
>  	tst-exec1 tst-exec2 tst-exec3 tst-exec4 \
>  	tst-exit1 tst-exit2 tst-exit3 \
>  	tst-stdio1 tst-stdio2 \
> -	tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \
> +	tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \
>  	tst-pthread-attr-affinity \
>  	tst-unload \
>  	tst-dlsym1 \
> @@ -311,7 +311,7 @@ endif
>  
>  modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
>  		tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
> -		tst-tls5modd tst-tls5mode tst-tls5modf \
> +		tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
>  		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
>  extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
>  test-extras += $(modules-names) tst-cleanup4aux
> @@ -479,6 +479,19 @@ $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
>  	$(evaluate-test)
>  generated += tst-stack3-mem.out tst-stack3.mtrace
>  
> +$(objpfx)tst-stack4: $(libdl) $(shared-thread-library)
> +tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \
> +				   11 12 13 14 15 16 17 18 19; do \
> +			    for j in 0 1 2 3 4 5 6 7 8 9 10 \
> +				     11 12 13 14 15 16 17 18 19; do \
> +			      echo $(objpfx)tst-stack4mod-$$i-$$j.so; \
> +			    done; done)
> +$(objpfx)tst-stack4.out: $(tst-stack4mod.sos)
> +$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so
> +	cp -f $< $@
> +clean:
> +	rm -f $(tst-stack4mod.sos)
> +
>  $(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>  $(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>  
> diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c
> new file mode 100644
> index 0000000..344c658
> --- /dev/null
> +++ b/nptl/tst-stack4.c
> @@ -0,0 +1,159 @@
> +/* Test DTV size overflow with pthread_create and TLS in dlopened shared
> +   object.

Suggest:
Test DTV size oveflow when pthread_create reuses old DTV and TLS is used 
by dlopened shared object.

> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <dlfcn.h>
> +#include <assert.h>
> +#include <pthread.h>
> +
> +/* The choices of thread count, and file counts are arbitary.
> +   The point is simply to run enough threads that an exiting
> +   thread has it's stack reused by another thread at the same
> +   time as new libraries have been loaded.  */
> +#define DSO_SHARED_FILES 20
> +#define DSO_OPEN_THREADS 20
> +#define DSO_EXEC_THREADS 2
> +
> +/* Used to make sure that only one thread is calling dlopen and dlclose
> +   at a time.  */
> +pthread_mutex_t g_lock;
> +
> +typedef void (*function) (void);
> +
> +void *
> +dso_invoke(void *dso_fun)
> +{
> +  function *fun_vec = (function *) dso_fun;
> +  int dso;
> +
> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    (*fun_vec[dso]) ();
> +
> +  pthread_exit (NULL);
> +}
> +
> +void *
> +dso_process (void * p)
> +{
> +  void *handle[DSO_SHARED_FILES];
> +  function fun_vec[DSO_SHARED_FILES];
> +  char dso_path[DSO_SHARED_FILES][100];
> +  int dso;
> +  uintptr_t t = (uintptr_t) p;
> +
> +  /* Open DSOs and get a function.  */
> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    {
> +      sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso);
> +
> +      pthread_mutex_lock (&g_lock);
> +
> +      handle[dso] = dlopen (dso_path[dso], RTLD_NOW);
> +      assert (handle[dso]);
> +
> +      fun_vec[dso] = (function) dlsym (handle[dso], "function");
> +      assert (fun_vec[dso]);
> +
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  /* Spawn workers.  */
> +  pthread_t thread[DSO_EXEC_THREADS];
> +  int i, ret;
> +  uintptr_t result = 0;
> +  for (i = 0; i < DSO_EXEC_THREADS; i++)
> +    {
> +      pthread_mutex_lock (&g_lock);
> +      ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec);
> +      if (ret != 0)
> +	{
> +	  printf ("pthread_create failed: %d\n", ret);
> +	  result = 1;
> +	}
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  if (!result) 
> +    for (i = 0; i < DSO_EXEC_THREADS; i++)
> +      {
> +	ret = pthread_join (thread[i], NULL);
> +	if (ret != 0)
> +	  {
> +	    printf ("pthread_join failed: %d\n", ret);
> +	    result = 1;
> +	  }
> +      }
> +
> +  /* Close all DSOs.  */
> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    {
> +      pthread_mutex_lock (&g_lock);
> +      dlclose (handle[dso]);
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  /* Exit.  */
> +  pthread_exit ((void *) result);
> +}
> +
> +static int
> +do_test (void)
> +{
> +  pthread_t thread[DSO_OPEN_THREADS];
> +  int i,j;
> +  int ret;
> +  int result = 0;
> +
> +  pthread_mutex_init (&g_lock, NULL);
> +
> +  /* 100 is arbitrary here and is known to trigger PR 13862.  */
> +  for (j = 0; j < 100; j++)
> +    {
> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
> +	{
> +	  ret = pthread_create (&thread[i], NULL, dso_process,
> +				(void *) (uintptr_t) i);
> +	  if (ret != 0)
> +	    {
> +	      printf ("pthread_create failed: %d\n", ret);
> +	      result = 1;
> +	    }
> +	}
> +
> +      if (result)
> +	break;
> +
> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
> +	{
> +	  ret = pthread_join (thread[i], NULL);
> +	  if (ret != 0)
> +	    {
> +	      printf ("pthread_join failed: %d\n", ret);
> +	      result = 1;
> +	    }
> +	}
> +    }
> +
> +  return result;
> +}
> +
> +#define TEST_FUNCTION do_test ()
> +#define TIMEOUT 100
> +#include "../test-skeleton.c"
> diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
> new file mode 100644
> index 0000000..a75aa09
> --- /dev/null
> +++ b/nptl/tst-stack4mod.c
> @@ -0,0 +1,28 @@
> +/* This tests DTV usage with TLS in dlopened shared object.
> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* 256 is arbitrary here and is known to trigger PR 13862.  */
> +__thread int var[256] attribute_hidden = {0};
> +
> +void
> +function (void)
> +{
> +  int i;
> +  for (i = 0; i < sizeof (var) / sizeof (int); i++)
> +    var[i] = i;
> +}
> 

Cheers,
Carlos.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index be49dba..6a535e7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@ 
+2014-11-27  H.J. Lu  <hongjiu.lu@intel.com>
+
+	[BZ #13862]
+	* elf/dl-tls.c: Include <atomic.h>.
+	(oom): Remove #ifdef SHARED/#endif.
+	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
+	(_dl_resize_dtv): This.  Extracted from _dl_update_slotinfo.
+	(_dl_allocate_tls_init): Resize DTV if the current DTV isn't
+	big enough.
+	(_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
+	* nptl/Makefile (tests): Add tst-stack4.
+	(modules-names): Add tst-stack4mod.
+	($(objpfx)tst-stack4): New.
+	(tst-stack4mod.sos): Likewise.
+	($(objpfx)tst-stack4.out): Likewise.
+	($(tst-stack4mod.sos)): Likewise.
+	(clean): Likewise.
+	* nptl/tst-stack4.c: New file.
+	* nptl/tst-stack4mod.c: Likewise.
+
 2014-11-27  J. Brown  <jb999@gmx.de>
 
 	* sysdeps/x86/bits/string.h: Add recent CPUs.
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 5204fda..8dd4992 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -23,6 +23,7 @@ 
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/param.h>
+#include <atomic.h>
 
 #include <tls.h>
 #include <dl-tls.h>
@@ -34,14 +35,12 @@ 
 
 
 /* Out-of-memory handler.  */
-#ifdef SHARED
 static void
 __attribute__ ((__noreturn__))
 oom (void)
 {
   _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
 }
-#endif
 
 
 size_t
@@ -397,6 +396,53 @@  _dl_allocate_tls_storage (void)
 }
 
 
+#ifndef SHARED
+extern dtv_t _dl_static_dtv[];
+# define _dl_initial_dtv (&_dl_static_dtv[1])
+#endif
+
+static dtv_t *
+_dl_resize_dtv (dtv_t *dtv)
+{
+  /* Resize the dtv.  */
+  dtv_t *newp;
+  /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by
+     other threads oncurrently.  */
+  size_t newsize
+    = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS;
+  size_t oldsize = dtv[-1].counter;
+
+  if (dtv == GL(dl_initial_dtv))
+    {
+      /* This is the initial dtv that was either statically allocated in
+	 __libc_setup_tls or allocated during rtld startup using the
+	 dl-minimal.c malloc instead of the real malloc.  We can't free
+	 it, we have to abandon the old storage.  */
+
+      newp = malloc ((2 + newsize) * sizeof (dtv_t));
+      if (newp == NULL)
+	oom ();
+      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
+    }
+  else
+    {
+      newp = realloc (&dtv[-1],
+		      (2 + newsize) * sizeof (dtv_t));
+      if (newp == NULL)
+	oom ();
+    }
+
+  newp[0].counter = newsize;
+
+  /* Clear the newly allocated part.  */
+  memset (newp + 2 + oldsize, '\0',
+	  (newsize - oldsize) * sizeof (dtv_t));
+
+  /* Return the generation counter.  */
+  return &newp[1];
+}
+
+
 void *
 internal_function
 _dl_allocate_tls_init (void *result)
@@ -410,6 +456,16 @@  _dl_allocate_tls_init (void *result)
   size_t total = 0;
   size_t maxgen = 0;
 
+  /* Check if the current dtv is big enough.   */
+  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+    {
+      /* Resize the dtv.  */
+      dtv = _dl_resize_dtv (dtv);
+
+      /* Install this new dtv in the thread data structures.  */
+      INSTALL_DTV (result, &dtv[-1]);
+    }
+
   /* We have to prepare the dtv for all currently loaded modules using
      TLS.  For those which are dynamically loaded we add the values
      indicating deferred allocation.  */
@@ -492,11 +548,6 @@  _dl_allocate_tls (void *mem)
 rtld_hidden_def (_dl_allocate_tls)
 
 
-#ifndef SHARED
-extern dtv_t _dl_static_dtv[];
-# define _dl_initial_dtv (&_dl_static_dtv[1])
-#endif
-
 void
 internal_function
 _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
@@ -645,41 +696,10 @@  _dl_update_slotinfo (unsigned long int req_modid)
 	      assert (total + cnt == modid);
 	      if (dtv[-1].counter < modid)
 		{
-		  /* Reallocate the dtv.  */
-		  dtv_t *newp;
-		  size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
-		  size_t oldsize = dtv[-1].counter;
-
-		  assert (map->l_tls_modid <= newsize);
-
-		  if (dtv == GL(dl_initial_dtv))
-		    {
-		      /* This is the initial dtv that was allocated
-			 during rtld startup using the dl-minimal.c
-			 malloc instead of the real malloc.  We can't
-			 free it, we have to abandon the old storage.  */
-
-		      newp = malloc ((2 + newsize) * sizeof (dtv_t));
-		      if (newp == NULL)
-			oom ();
-		      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
-		    }
-		  else
-		    {
-		      newp = realloc (&dtv[-1],
-				      (2 + newsize) * sizeof (dtv_t));
-		      if (newp == NULL)
-			oom ();
-		    }
-
-		  newp[0].counter = newsize;
-
-		  /* Clear the newly allocated part.  */
-		  memset (newp + 2 + oldsize, '\0',
-			  (newsize - oldsize) * sizeof (dtv_t));
+		  /* Resize the dtv.  */
+		  dtv = _dl_resize_dtv (dtv);
 
-		  /* Point dtv to the generation counter.  */
-		  dtv = &newp[1];
+		  assert (modid <= dtv[-1].counter);
 
 		  /* Install this new dtv in the thread data
 		     structures.  */
diff --git a/nptl/Makefile b/nptl/Makefile
index 86a1b0c..ac76596 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -254,7 +254,7 @@  tests = tst-typesizes \
 	tst-exec1 tst-exec2 tst-exec3 tst-exec4 \
 	tst-exit1 tst-exit2 tst-exit3 \
 	tst-stdio1 tst-stdio2 \
-	tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \
+	tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \
 	tst-pthread-attr-affinity \
 	tst-unload \
 	tst-dlsym1 \
@@ -311,7 +311,7 @@  endif
 
 modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
 		tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
-		tst-tls5modd tst-tls5mode tst-tls5modf \
+		tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
 		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
 extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
 test-extras += $(modules-names) tst-cleanup4aux
@@ -479,6 +479,19 @@  $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
 	$(evaluate-test)
 generated += tst-stack3-mem.out tst-stack3.mtrace
 
+$(objpfx)tst-stack4: $(libdl) $(shared-thread-library)
+tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \
+				   11 12 13 14 15 16 17 18 19; do \
+			    for j in 0 1 2 3 4 5 6 7 8 9 10 \
+				     11 12 13 14 15 16 17 18 19; do \
+			      echo $(objpfx)tst-stack4mod-$$i-$$j.so; \
+			    done; done)
+$(objpfx)tst-stack4.out: $(tst-stack4mod.sos)
+$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so
+	cp -f $< $@
+clean:
+	rm -f $(tst-stack4mod.sos)
+
 $(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
 $(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
 
diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c
new file mode 100644
index 0000000..344c658
--- /dev/null
+++ b/nptl/tst-stack4.c
@@ -0,0 +1,159 @@ 
+/* Test DTV size overflow with pthread_create and TLS in dlopened shared
+   object.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <dlfcn.h>
+#include <assert.h>
+#include <pthread.h>
+
+/* The choices of thread count, and file counts are arbitary.
+   The point is simply to run enough threads that an exiting
+   thread has it's stack reused by another thread at the same
+   time as new libraries have been loaded.  */
+#define DSO_SHARED_FILES 20
+#define DSO_OPEN_THREADS 20
+#define DSO_EXEC_THREADS 2
+
+/* Used to make sure that only one thread is calling dlopen and dlclose
+   at a time.  */
+pthread_mutex_t g_lock;
+
+typedef void (*function) (void);
+
+void *
+dso_invoke(void *dso_fun)
+{
+  function *fun_vec = (function *) dso_fun;
+  int dso;
+
+  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+    (*fun_vec[dso]) ();
+
+  pthread_exit (NULL);
+}
+
+void *
+dso_process (void * p)
+{
+  void *handle[DSO_SHARED_FILES];
+  function fun_vec[DSO_SHARED_FILES];
+  char dso_path[DSO_SHARED_FILES][100];
+  int dso;
+  uintptr_t t = (uintptr_t) p;
+
+  /* Open DSOs and get a function.  */
+  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+    {
+      sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso);
+
+      pthread_mutex_lock (&g_lock);
+
+      handle[dso] = dlopen (dso_path[dso], RTLD_NOW);
+      assert (handle[dso]);
+
+      fun_vec[dso] = (function) dlsym (handle[dso], "function");
+      assert (fun_vec[dso]);
+
+      pthread_mutex_unlock (&g_lock);
+    }
+
+  /* Spawn workers.  */
+  pthread_t thread[DSO_EXEC_THREADS];
+  int i, ret;
+  uintptr_t result = 0;
+  for (i = 0; i < DSO_EXEC_THREADS; i++)
+    {
+      pthread_mutex_lock (&g_lock);
+      ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec);
+      if (ret != 0)
+	{
+	  printf ("pthread_create failed: %d\n", ret);
+	  result = 1;
+	}
+      pthread_mutex_unlock (&g_lock);
+    }
+
+  if (!result) 
+    for (i = 0; i < DSO_EXEC_THREADS; i++)
+      {
+	ret = pthread_join (thread[i], NULL);
+	if (ret != 0)
+	  {
+	    printf ("pthread_join failed: %d\n", ret);
+	    result = 1;
+	  }
+      }
+
+  /* Close all DSOs.  */
+  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+    {
+      pthread_mutex_lock (&g_lock);
+      dlclose (handle[dso]);
+      pthread_mutex_unlock (&g_lock);
+    }
+
+  /* Exit.  */
+  pthread_exit ((void *) result);
+}
+
+static int
+do_test (void)
+{
+  pthread_t thread[DSO_OPEN_THREADS];
+  int i,j;
+  int ret;
+  int result = 0;
+
+  pthread_mutex_init (&g_lock, NULL);
+
+  /* 100 is arbitrary here and is known to trigger PR 13862.  */
+  for (j = 0; j < 100; j++)
+    {
+      for (i = 0; i < DSO_OPEN_THREADS; i++)
+	{
+	  ret = pthread_create (&thread[i], NULL, dso_process,
+				(void *) (uintptr_t) i);
+	  if (ret != 0)
+	    {
+	      printf ("pthread_create failed: %d\n", ret);
+	      result = 1;
+	    }
+	}
+
+      if (result)
+	break;
+
+      for (i = 0; i < DSO_OPEN_THREADS; i++)
+	{
+	  ret = pthread_join (thread[i], NULL);
+	  if (ret != 0)
+	    {
+	      printf ("pthread_join failed: %d\n", ret);
+	      result = 1;
+	    }
+	}
+    }
+
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#define TIMEOUT 100
+#include "../test-skeleton.c"
diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
new file mode 100644
index 0000000..a75aa09
--- /dev/null
+++ b/nptl/tst-stack4mod.c
@@ -0,0 +1,28 @@ 
+/* This tests DTV usage with TLS in dlopened shared object.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* 256 is arbitrary here and is known to trigger PR 13862.  */
+__thread int var[256] attribute_hidden = {0};
+
+void
+function (void)
+{
+  int i;
+  for (i = 0; i < sizeof (var) / sizeof (int); i++)
+    var[i] = i;
+}