[07/15] elf: Refactor _dl_update_slotinfo to avoid use after free

Message ID 3ecdb956cbf6d1b46e36311ffe7f491ce186cdbc.1613390045.git.szabolcs.nagy@arm.com
State Superseded
Delegated to: Adhemerval Zanella Netto
Headers
Series Dynamic TLS related data race fixes |

Commit Message

Szabolcs Nagy Feb. 15, 2021, noon UTC
  map is not valid to access here because it can be freed by a
concurrent dlclose, so don't check the modid.

The map == 0 and map != 0 code paths can be shared (avoiding
the dtv resize in case of map == 0 is just an optimization:
larger dtv than necessary would be fine too).
---
 elf/dl-tls.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)
  

Comments

Adhemerval Zanella April 6, 2021, 7:40 p.m. UTC | #1
On 15/02/2021 09:00, Szabolcs Nagy via Libc-alpha wrote:
> map is not valid to access here because it can be freed by a
> concurrent dlclose, so don't check the modid.

Won't it be protected by the recursive GL(dl_load_lock) in such case?
I think the concurrency issue is between dlopen and _dl_deallocate_tls
called by pthread stack handling (nptl/allocatestack.c).  Am I missing
something here?

> 
> The map == 0 and map != 0 code paths can be shared (avoiding
> the dtv resize in case of map == 0 is just an optimization:
> larger dtv than necessary would be fine too).
> ---
>  elf/dl-tls.c | 21 +++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 24d00c14ef..f8b32b3ecb 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	{
>  	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
>  	    {
> +	      size_t modid = total + cnt;
> +
>  	      size_t gen = listp->slotinfo[cnt].gen;
>  
>  	      if (gen > new_gen)
> @@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  
>  	      /* If there is no map this means the entry is empty.  */
>  	      struct link_map *map = listp->slotinfo[cnt].map;
> -	      if (map == NULL)
> -		{
> -		  if (dtv[-1].counter >= total + cnt)
> -		    {
> -		      /* If this modid was used at some point the memory
> -			 might still be allocated.  */
> -		      free (dtv[total + cnt].pointer.to_free);
> -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> -		      dtv[total + cnt].pointer.to_free = NULL;
> -		    }
> -
> -		  continue;
> -		}
> -
>  	      /* Check whether the current dtv array is large enough.  */
> -	      size_t modid = map->l_tls_modid;
> -	      assert (total + cnt == modid);
>  	      if (dtv[-1].counter < modid)
>  		{
> +		  if (map == NULL)
> +		    continue;
> +
>  		  /* Resize the dtv.  */
>  		  dtv = _dl_resize_dtv (dtv);
>  
>
  
Szabolcs Nagy April 7, 2021, 8:01 a.m. UTC | #2
The 04/06/2021 16:40, Adhemerval Zanella wrote:
> On 15/02/2021 09:00, Szabolcs Nagy via Libc-alpha wrote:
> > map is not valid to access here because it can be freed by a
> > concurrent dlclose, so don't check the modid.
> 
> Won't it be protected by the recursive GL(dl_load_lock) in such case?
> I think the concurrency issue is between dlopen and _dl_deallocate_tls
> called by pthread stack handling (nptl/allocatestack.c).  Am I missing
> something here?


_dl_update_slotinfo is called both with and without
the dlopen lock held: during dynamic tls access
the lock is not held (see the __tls_get_addr path)

we cannot add a lock there because that would cause
new deadlocks, dealing with this is the tricky part
of the patchset.

> > 
> > The map == 0 and map != 0 code paths can be shared (avoiding
> > the dtv resize in case of map == 0 is just an optimization:
> > larger dtv than necessary would be fine too).
> > ---
> >  elf/dl-tls.c | 21 +++++----------------
> >  1 file changed, 5 insertions(+), 16 deletions(-)
> > 
> > diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> > index 24d00c14ef..f8b32b3ecb 100644
> > --- a/elf/dl-tls.c
> > +++ b/elf/dl-tls.c
> > @@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
> >  	{
> >  	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
> >  	    {
> > +	      size_t modid = total + cnt;
> > +
> >  	      size_t gen = listp->slotinfo[cnt].gen;
> >  
> >  	      if (gen > new_gen)
> > @@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
> >  
> >  	      /* If there is no map this means the entry is empty.  */
> >  	      struct link_map *map = listp->slotinfo[cnt].map;
> > -	      if (map == NULL)
> > -		{
> > -		  if (dtv[-1].counter >= total + cnt)
> > -		    {
> > -		      /* If this modid was used at some point the memory
> > -			 might still be allocated.  */
> > -		      free (dtv[total + cnt].pointer.to_free);
> > -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> > -		      dtv[total + cnt].pointer.to_free = NULL;
> > -		    }
> > -
> > -		  continue;
> > -		}
> > -
> >  	      /* Check whether the current dtv array is large enough.  */
> > -	      size_t modid = map->l_tls_modid;
> > -	      assert (total + cnt == modid);
> >  	      if (dtv[-1].counter < modid)
> >  		{
> > +		  if (map == NULL)
> > +		    continue;
> > +
> >  		  /* Resize the dtv.  */
> >  		  dtv = _dl_resize_dtv (dtv);
> >  
> >
  
Adhemerval Zanella April 7, 2021, 2:28 p.m. UTC | #3
On 07/04/2021 05:01, Szabolcs Nagy wrote:
> The 04/06/2021 16:40, Adhemerval Zanella wrote:
>> On 15/02/2021 09:00, Szabolcs Nagy via Libc-alpha wrote:
>>> map is not valid to access here because it can be freed by a
>>> concurrent dlclose, so don't check the modid.
>>
>> Won't it be protected by the recursive GL(dl_load_lock) in such case?
>> I think the concurrency issue is between dlopen and _dl_deallocate_tls
>> called by pthread stack handling (nptl/allocatestack.c).  Am I missing
>> something here?
> 
> 
> _dl_update_slotinfo is called both with and without
> the dlopen lock held: during dynamic tls access
> the lock is not held (see the __tls_get_addr path)


Right, revising the patch I mapped the calls (not sure if it is 
fully complete):

| _dl_open
|   __rtld_lock_lock_recursive (GL(dl_load_lock));
|   dl_open_worker
|     update_tls_slotinfo
|       _dl_update_slotinfo
|   __rtld_lock_unlock_recursive (GL(dl_load_lock));
  
| __tls_get_addr   
|   update_get_addr
|     _dl_update_slotinfo 

| rtld
|   _dl_resolve_conflicts
|     elf_machine_rela
|       TRY_STATIC_TLS
|         _dl_try_allocate_static_tls
|          _dl_update_slotinfo
|    
|    elf_machine_rela 
|      CHECK_STATIC_TLS   
|        _dl_allocate_static_tls
|          _dl_try_allocate_static_tls
|            _dl_update_slotinfo

The rtld part should not matter, since it is done before thread
is supported. 

> 
> we cannot add a lock there because that would cause
> new deadlocks, dealing with this is the tricky part
> of the patchset.

I understand this patch from previous discussion about it.  The
part is confusing me is "because it can be freed by a concurrent
dlclose".  My understanding is '_dl_deallocate_tls' might be called
in thread exit / deallocation without the GL(dl_load_lock) (which
is a potential issue); what I can't see is how concurrent dlclose 
might trigger this issue (since it should be synchronized with dlopen
through the lock).


> 
>>>
>>> The map == 0 and map != 0 code paths can be shared (avoiding
>>> the dtv resize in case of map == 0 is just an optimization:
>>> larger dtv than necessary would be fine too).
>>> ---
>>>  elf/dl-tls.c | 21 +++++----------------
>>>  1 file changed, 5 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>>> index 24d00c14ef..f8b32b3ecb 100644
>>> --- a/elf/dl-tls.c
>>> +++ b/elf/dl-tls.c
>>> @@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>>>  	{
>>>  	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
>>>  	    {
>>> +	      size_t modid = total + cnt;
>>> +
>>>  	      size_t gen = listp->slotinfo[cnt].gen;
>>>  
>>>  	      if (gen > new_gen)
>>> @@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
>>>  
>>>  	      /* If there is no map this means the entry is empty.  */
>>>  	      struct link_map *map = listp->slotinfo[cnt].map;
>>> -	      if (map == NULL)
>>> -		{
>>> -		  if (dtv[-1].counter >= total + cnt)
>>> -		    {
>>> -		      /* If this modid was used at some point the memory
>>> -			 might still be allocated.  */
>>> -		      free (dtv[total + cnt].pointer.to_free);
>>> -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
>>> -		      dtv[total + cnt].pointer.to_free = NULL;
>>> -		    }
>>> -
>>> -		  continue;
>>> -		}
>>> -
>>>  	      /* Check whether the current dtv array is large enough.  */
>>> -	      size_t modid = map->l_tls_modid;
>>> -	      assert (total + cnt == modid);
>>>  	      if (dtv[-1].counter < modid)
>>>  		{
>>> +		  if (map == NULL)
>>> +		    continue;
>>> +
>>>  		  /* Resize the dtv.  */
>>>  		  dtv = _dl_resize_dtv (dtv);
>>>  
>>>
  
Adhemerval Zanella April 7, 2021, 2:36 p.m. UTC | #4
On 07/04/2021 11:28, Adhemerval Zanella wrote:
> 
> 
> On 07/04/2021 05:01, Szabolcs Nagy wrote:
>> The 04/06/2021 16:40, Adhemerval Zanella wrote:
>>> On 15/02/2021 09:00, Szabolcs Nagy via Libc-alpha wrote:
>>>> map is not valid to access here because it can be freed by a
>>>> concurrent dlclose, so don't check the modid.
>>>
>>> Won't it be protected by the recursive GL(dl_load_lock) in such case?
>>> I think the concurrency issue is between dlopen and _dl_deallocate_tls
>>> called by pthread stack handling (nptl/allocatestack.c).  Am I missing
>>> something here?
>>
>>
>> _dl_update_slotinfo is called both with and without
>> the dlopen lock held: during dynamic tls access
>> the lock is not held (see the __tls_get_addr path)
> 
> 
> Right, revising the patch I mapped the calls (not sure if it is 
> fully complete):
> 
> | _dl_open
> |   __rtld_lock_lock_recursive (GL(dl_load_lock));
> |   dl_open_worker
> |     update_tls_slotinfo
> |       _dl_update_slotinfo
> |   __rtld_lock_unlock_recursive (GL(dl_load_lock));
>   
> | __tls_get_addr   
> |   update_get_addr
> |     _dl_update_slotinfo 
> 
> | rtld
> |   _dl_resolve_conflicts
> |     elf_machine_rela
> |       TRY_STATIC_TLS
> |         _dl_try_allocate_static_tls
> |          _dl_update_slotinfo
> |    
> |    elf_machine_rela 
> |      CHECK_STATIC_TLS   
> |        _dl_allocate_static_tls
> |          _dl_try_allocate_static_tls
> |            _dl_update_slotinfo
> 
> The rtld part should not matter, since it is done before thread
> is supported. 
> 
>>
>> we cannot add a lock there because that would cause
>> new deadlocks, dealing with this is the tricky part
>> of the patchset.
> 
> I understand this patch from previous discussion about it.  The
> part is confusing me is "because it can be freed by a concurrent
> dlclose".  My understanding is '_dl_deallocate_tls' might be called
> in thread exit / deallocation without the GL(dl_load_lock) (which
> is a potential issue); what I can't see is how concurrent dlclose 
> might trigger this issue (since it should be synchronized with dlopen
> through the lock).

I think I got what you meant: the concurrency issues is not related to 
dlopen open, but rather to __tls_get_addr and dclose.  Maybe making this
explicit on the commit message.
  
Adhemerval Zanella April 7, 2021, 5:05 p.m. UTC | #5
On 15/02/2021 09:00, Szabolcs Nagy via Libc-alpha wrote:
> map is not valid to access here because it can be freed by a
> concurrent dlclose, so don't check the modid.
> 
> The map == 0 and map != 0 code paths can be shared (avoiding
> the dtv resize in case of map == 0 is just an optimization:
> larger dtv than necessary would be fine too).

Please extend a bit the patch description and add that __tls_get_addr
is the public interface that show concurrency issues with dlclose.

The patch looks ok, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/dl-tls.c | 21 +++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 24d00c14ef..f8b32b3ecb 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	{
>  	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
>  	    {
> +	      size_t modid = total + cnt;
> +
>  	      size_t gen = listp->slotinfo[cnt].gen;
>  
>  	      if (gen > new_gen)
> @@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  
>  	      /* If there is no map this means the entry is empty.  */
>  	      struct link_map *map = listp->slotinfo[cnt].map;
> -	      if (map == NULL)
> -		{
> -		  if (dtv[-1].counter >= total + cnt)
> -		    {
> -		      /* If this modid was used at some point the memory
> -			 might still be allocated.  */
> -		      free (dtv[total + cnt].pointer.to_free);
> -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> -		      dtv[total + cnt].pointer.to_free = NULL;
> -		    }
> -
> -		  continue;
> -		}
> -
>  	      /* Check whether the current dtv array is large enough.  */
> -	      size_t modid = map->l_tls_modid;
> -	      assert (total + cnt == modid);
>  	      if (dtv[-1].counter < modid)
>  		{
> +		  if (map == NULL)
> +		    continue;
> +
>  		  /* Resize the dtv.  */
>  		  dtv = _dl_resize_dtv (dtv);
>  
>
  

Patch

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 24d00c14ef..f8b32b3ecb 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -743,6 +743,8 @@  _dl_update_slotinfo (unsigned long int req_modid)
 	{
 	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
 	    {
+	      size_t modid = total + cnt;
+
 	      size_t gen = listp->slotinfo[cnt].gen;
 
 	      if (gen > new_gen)
@@ -758,25 +760,12 @@  _dl_update_slotinfo (unsigned long int req_modid)
 
 	      /* If there is no map this means the entry is empty.  */
 	      struct link_map *map = listp->slotinfo[cnt].map;
-	      if (map == NULL)
-		{
-		  if (dtv[-1].counter >= total + cnt)
-		    {
-		      /* If this modid was used at some point the memory
-			 might still be allocated.  */
-		      free (dtv[total + cnt].pointer.to_free);
-		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
-		      dtv[total + cnt].pointer.to_free = NULL;
-		    }
-
-		  continue;
-		}
-
 	      /* Check whether the current dtv array is large enough.  */
-	      size_t modid = map->l_tls_modid;
-	      assert (total + cnt == modid);
 	      if (dtv[-1].counter < modid)
 		{
+		  if (map == NULL)
+		    continue;
+
 		  /* Resize the dtv.  */
 		  dtv = _dl_resize_dtv (dtv);