[06/15] elf: Fix comments and logic in _dl_add_to_slotinfo

Message ID 068770faf123b7c227f5f1e130812f7976e74cef.1613390045.git.szabolcs.nagy@arm.com
State Superseded
Delegated to: Adhemerval Zanella Netto
Headers
Series Dynamic TLS related data race fixes |

Commit Message

Szabolcs Nagy Feb. 15, 2021, 11:59 a.m. UTC
  From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>

Since

  commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
  Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]

the generation counter update is not needed in the failure path.
---
 elf/dl-tls.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)
  

Comments

Adhemerval Zanella April 2, 2021, 8:50 p.m. UTC | #1
On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote:
> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
> 
> Since
> 
>   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
>   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
> 
> the generation counter update is not needed in the failure path.

It is not clear to me from just the commit reference why it would
be safe to remove the GL(dl_tls_generation) update on 
_dl_add_to_slotinfo.

The dl_open_worker calls update_tls_slotinfo which in turn call
might call _dl_add_to_slotinfo *after* the demarcation point.  Will
it terminate the process?

> ---
>  elf/dl-tls.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 79b93ad91b..24d00c14ef 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
>  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
>        if (listp == NULL)
>  	{
> -	  /* We ran out of memory.  We will simply fail this
> -	     call but don't undo anything we did so far.  The
> -	     application will crash or be terminated anyway very
> -	     soon.  */
> -
> -	  /* We have to do this since some entries in the dtv
> -	     slotinfo array might already point to this
> -	     generation.  */
> -	  ++GL(dl_tls_generation);
> -
> +	  /* We ran out of memory while resizing the dtv slotinfo list.  */
>  	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
>  cannot create TLS data structures"));
>  	}
>
  
Szabolcs Nagy April 6, 2021, 3:48 p.m. UTC | #2
The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote:
> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote:
> > From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
> > 
> > Since
> > 
> >   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
> >   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
> > 
> > the generation counter update is not needed in the failure path.
> 
> It is not clear to me from just the commit reference why it would
> be safe to remove the GL(dl_tls_generation) update on 
> _dl_add_to_slotinfo.
> 
> The dl_open_worker calls update_tls_slotinfo which in turn call
> might call _dl_add_to_slotinfo *after* the demarcation point.  Will
> it terminate the process?

in that commit the logic got changed such that allocations
happen before the demarcation point in resize_tls_slotinfo.

this is the reason for the do_add bool argument in
_dl_add_to_slotinfo: it's called twice and the first call
with do_add==false is only there to ensure everything is
allocated before the demarcation point (so module loading
can be reverted, no need to bump the generation count).

i guess this is not visible by just looking at the
_dl_add_to_slotinfo code.

note that adding some asserts to ensure there is no allocation
when do_add==true does not work: rtld uses the same api, but
without the do_add==false preallocation step since at startup
time allocation failure is fatal anyway.

> > ---
> >  elf/dl-tls.c | 11 +----------
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> > index 79b93ad91b..24d00c14ef 100644
> > --- a/elf/dl-tls.c
> > +++ b/elf/dl-tls.c
> > @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
> >  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
> >        if (listp == NULL)
> >  	{
> > -	  /* We ran out of memory.  We will simply fail this
> > -	     call but don't undo anything we did so far.  The
> > -	     application will crash or be terminated anyway very
> > -	     soon.  */
> > -
> > -	  /* We have to do this since some entries in the dtv
> > -	     slotinfo array might already point to this
> > -	     generation.  */
> > -	  ++GL(dl_tls_generation);
> > -
> > +	  /* We ran out of memory while resizing the dtv slotinfo list.  */
> >  	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
> >  cannot create TLS data structures"));
> >  	}
  
Adhemerval Zanella April 6, 2021, 5:47 p.m. UTC | #3
On 06/04/2021 12:48, Szabolcs Nagy wrote:
> The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote:
>> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote:
>>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
>>>
>>> Since
>>>
>>>   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
>>>   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
>>>
>>> the generation counter update is not needed in the failure path.
>>
>> It is not clear to me from just the commit reference why it would
>> be safe to remove the GL(dl_tls_generation) update on 
>> _dl_add_to_slotinfo.
>>
>> The dl_open_worker calls update_tls_slotinfo which in turn call
>> might call _dl_add_to_slotinfo *after* the demarcation point.  Will
>> it terminate the process?
> 
> in that commit the logic got changed such that allocations
> happen before the demarcation point in resize_tls_slotinfo.
> 
> this is the reason for the do_add bool argument in
> _dl_add_to_slotinfo: it's called twice and the first call
> with do_add==false is only there to ensure everything is
> allocated before the demarcation point (so module loading
> can be reverted, no need to bump the generation count).
> 
> i guess this is not visible by just looking at the
> _dl_add_to_slotinfo code.

Right, so if I reading correctly once _dl_add_to_slotinfo (..., true)
is called by update_tls_slotinfo, the malloc that create a new
dtv_slotinfo_list won't be called (since it was already allocated
previously) since the entry was already pre-allocated and thus the
search part at line 978-987 will find.  Is that correct?

> 
> note that adding some asserts to ensure there is no allocation
> when do_add==true does not work: rtld uses the same api, but
> without the do_add==false preallocation step since at startup
> time allocation failure is fatal anyway.

Right, the _dl_signal_error will trigger a fatal_error since
lcatch won't be override yet.

Thanks for the explanation.

> 
>>> ---
>>>  elf/dl-tls.c | 11 +----------
>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>
>>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>>> index 79b93ad91b..24d00c14ef 100644
>>> --- a/elf/dl-tls.c
>>> +++ b/elf/dl-tls.c
>>> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
>>>  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
>>>        if (listp == NULL)
>>>  	{
>>> -	  /* We ran out of memory.  We will simply fail this
>>> -	     call but don't undo anything we did so far.  The
>>> -	     application will crash or be terminated anyway very
>>> -	     soon.  */
>>> -
>>> -	  /* We have to do this since some entries in the dtv
>>> -	     slotinfo array might already point to this
>>> -	     generation.  */
>>> -	  ++GL(dl_tls_generation);
>>> -
>>> +	  /* We ran out of memory while resizing the dtv slotinfo list.  */
>>>  	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
>>>  cannot create TLS data structures"));
>>>  	}
  
Szabolcs Nagy April 7, 2021, 7:57 a.m. UTC | #4
The 04/06/2021 14:47, Adhemerval Zanella wrote:
> On 06/04/2021 12:48, Szabolcs Nagy wrote:
> > The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote:
> >> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote:
> >>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
> >>>
> >>> Since
> >>>
> >>>   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
> >>>   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
> >>>
> >>> the generation counter update is not needed in the failure path.
> >>
> >> It is not clear to me from just the commit reference why it would
> >> be safe to remove the GL(dl_tls_generation) update on 
> >> _dl_add_to_slotinfo.
> >>
> >> The dl_open_worker calls update_tls_slotinfo which in turn call
> >> might call _dl_add_to_slotinfo *after* the demarcation point.  Will
> >> it terminate the process?
> > 
> > in that commit the logic got changed such that allocations
> > happen before the demarcation point in resize_tls_slotinfo.
> > 
> > this is the reason for the do_add bool argument in
> > _dl_add_to_slotinfo: it's called twice and the first call
> > with do_add==false is only there to ensure everything is
> > allocated before the demarcation point (so module loading
> > can be reverted, no need to bump the generation count).
> > 
> > i guess this is not visible by just looking at the
> > _dl_add_to_slotinfo code.
> 
> Right, so if I reading correctly once _dl_add_to_slotinfo (..., true)
> is called by update_tls_slotinfo, the malloc that create a new
> dtv_slotinfo_list won't be called (since it was already allocated
> previously) since the entry was already pre-allocated and thus the
> search part at line 978-987 will find.  Is that correct?

yes

if you have an idea how to make things clearer let me know.

> > 
> > note that adding some asserts to ensure there is no allocation
> > when do_add==true does not work: rtld uses the same api, but
> > without the do_add==false preallocation step since at startup
> > time allocation failure is fatal anyway.
> 
> Right, the _dl_signal_error will trigger a fatal_error since
> lcatch won't be override yet.
> 
> Thanks for the explanation.
> 
> > 
> >>> ---
> >>>  elf/dl-tls.c | 11 +----------
> >>>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>>
> >>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> >>> index 79b93ad91b..24d00c14ef 100644
> >>> --- a/elf/dl-tls.c
> >>> +++ b/elf/dl-tls.c
> >>> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
> >>>  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
> >>>        if (listp == NULL)
> >>>  	{
> >>> -	  /* We ran out of memory.  We will simply fail this
> >>> -	     call but don't undo anything we did so far.  The
> >>> -	     application will crash or be terminated anyway very
> >>> -	     soon.  */
> >>> -
> >>> -	  /* We have to do this since some entries in the dtv
> >>> -	     slotinfo array might already point to this
> >>> -	     generation.  */
> >>> -	  ++GL(dl_tls_generation);
> >>> -
> >>> +	  /* We ran out of memory while resizing the dtv slotinfo list.  */
> >>>  	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
> >>>  cannot create TLS data structures"));
> >>>  	}

--
  
Adhemerval Zanella April 7, 2021, 2:20 p.m. UTC | #5
On 07/04/2021 04:57, Szabolcs Nagy wrote:
> The 04/06/2021 14:47, Adhemerval Zanella wrote:
>> On 06/04/2021 12:48, Szabolcs Nagy wrote:
>>> The 04/02/2021 17:50, Adhemerval Zanella via Libc-alpha wrote:
>>>> On 15/02/2021 08:59, Szabolcs Nagy via Libc-alpha wrote:
>>>>> From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
>>>>>
>>>>> Since
>>>>>
>>>>>   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
>>>>>   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
>>>>>
>>>>> the generation counter update is not needed in the failure path.
>>>>
>>>> It is not clear to me from just the commit reference why it would
>>>> be safe to remove the GL(dl_tls_generation) update on 
>>>> _dl_add_to_slotinfo.
>>>>
>>>> The dl_open_worker calls update_tls_slotinfo which in turn call
>>>> might call _dl_add_to_slotinfo *after* the demarcation point.  Will
>>>> it terminate the process?
>>>
>>> in that commit the logic got changed such that allocations
>>> happen before the demarcation point in resize_tls_slotinfo.
>>>
>>> this is the reason for the do_add bool argument in
>>> _dl_add_to_slotinfo: it's called twice and the first call
>>> with do_add==false is only there to ensure everything is
>>> allocated before the demarcation point (so module loading
>>> can be reverted, no need to bump the generation count).
>>>
>>> i guess this is not visible by just looking at the
>>> _dl_add_to_slotinfo code.
>>
>> Right, so if I reading correctly once _dl_add_to_slotinfo (..., true)
>> is called by update_tls_slotinfo, the malloc that create a new
>> dtv_slotinfo_list won't be called (since it was already allocated
>> previously) since the entry was already pre-allocated and thus the
>> search part at line 978-987 will find.  Is that correct?
> 
> yes
> 
> if you have an idea how to make things clearer let me know.
> 

Maybe add it on the commit list.  The patch looks ok to me
thanks.
  

Patch

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 79b93ad91b..24d00c14ef 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -998,16 +998,7 @@  _dl_add_to_slotinfo (struct link_map *l, bool do_add)
 		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
       if (listp == NULL)
 	{
-	  /* We ran out of memory.  We will simply fail this
-	     call but don't undo anything we did so far.  The
-	     application will crash or be terminated anyway very
-	     soon.  */
-
-	  /* We have to do this since some entries in the dtv
-	     slotinfo array might already point to this
-	     generation.  */
-	  ++GL(dl_tls_generation);
-
+	  /* We ran out of memory while resizing the dtv slotinfo list.  */
 	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
 cannot create TLS data structures"));
 	}