Reinitialize dl_load_write_lock on fork (bug 19282)

Message ID mvm4lrd39p8.fsf@suse.de
State New, archived
Headers

Commit Message

Andreas Schwab Oct. 5, 2017, 3:17 p.m. UTC
  * sysdeps/nptl/fork.c (__libc_fork): Reinitialize
	GL(dl_load_write_lock).
---
 sysdeps/nptl/fork.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Florian Weimer Oct. 5, 2017, 3:25 p.m. UTC | #1
On 10/05/2017 05:17 PM, Andreas Schwab wrote:
> -      /* Reset the lock the dynamic loader uses to protect its data.  */
> +      /* Reset the locks the dynamic loader uses to protect its data.  */
>         __rtld_lock_initialize (GL(dl_load_lock));
> +      __rtld_lock_initialize (GL(dl_load_write_lock));

I'm pretty sure that the child now sees corrupted a dynamic linker state 
instead of hanging.  If not, this needs a comment why this does not happen.

Thanks,
Florian
  
Andreas Schwab Oct. 5, 2017, 3:39 p.m. UTC | #2
On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:

> On 10/05/2017 05:17 PM, Andreas Schwab wrote:
>> -      /* Reset the lock the dynamic loader uses to protect its data.  */
>> +      /* Reset the locks the dynamic loader uses to protect its data.  */
>>         __rtld_lock_initialize (GL(dl_load_lock));
>> +      __rtld_lock_initialize (GL(dl_load_write_lock));
>
> I'm pretty sure that the child now sees corrupted a dynamic linker state
> instead of hanging.

That's a good point, though this problem already existed before the
introduction of dl_load_write_lock.

Andreas.
  
Florian Weimer Oct. 5, 2017, 3:46 p.m. UTC | #3
On 10/05/2017 05:39 PM, Andreas Schwab wrote:
> On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:
> 
>> On 10/05/2017 05:17 PM, Andreas Schwab wrote:
>>> -      /* Reset the lock the dynamic loader uses to protect its data.  */
>>> +      /* Reset the locks the dynamic loader uses to protect its data.  */
>>>          __rtld_lock_initialize (GL(dl_load_lock));
>>> +      __rtld_lock_initialize (GL(dl_load_write_lock));
>>
>> I'm pretty sure that the child now sees corrupted a dynamic linker state
>> instead of hanging.
> 
> That's a good point, though this problem already existed before the
> introduction of dl_load_write_lock.

Why does this make a difference?

Thanks,
Florian
  
Andreas Schwab Oct. 5, 2017, 3:50 p.m. UTC | #4
On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:

> On 10/05/2017 05:39 PM, Andreas Schwab wrote:
>> On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:
>>
>>> On 10/05/2017 05:17 PM, Andreas Schwab wrote:
>>>> -      /* Reset the lock the dynamic loader uses to protect its data.  */
>>>> +      /* Reset the locks the dynamic loader uses to protect its data.  */
>>>>          __rtld_lock_initialize (GL(dl_load_lock));
>>>> +      __rtld_lock_initialize (GL(dl_load_write_lock));
>>>
>>> I'm pretty sure that the child now sees corrupted a dynamic linker state
>>> instead of hanging.
>>
>> That's a good point, though this problem already existed before the
>> introduction of dl_load_write_lock.
>
> Why does this make a difference?

Difference to what?

Andreas.
  
Florian Weimer Oct. 5, 2017, 3:54 p.m. UTC | #5
On 10/05/2017 05:50 PM, Andreas Schwab wrote:
> On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:
> 
>> On 10/05/2017 05:39 PM, Andreas Schwab wrote:
>>> On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:
>>>
>>>> On 10/05/2017 05:17 PM, Andreas Schwab wrote:
>>>>> -      /* Reset the lock the dynamic loader uses to protect its data.  */
>>>>> +      /* Reset the locks the dynamic loader uses to protect its data.  */
>>>>>           __rtld_lock_initialize (GL(dl_load_lock));
>>>>> +      __rtld_lock_initialize (GL(dl_load_write_lock));
>>>>
>>>> I'm pretty sure that the child now sees corrupted a dynamic linker state
>>>> instead of hanging.
>>>
>>> That's a good point, though this problem already existed before the
>>> introduction of dl_load_write_lock.
>>
>> Why does this make a difference?
> 
> Difference to what?

I assumed you were arguing the patch was somehow okay because we used to 
have the corruption bug.  If not and you are retracting your patch, 
that's fine, too.

Thanks,
Florian
  
Andreas Schwab Oct. 5, 2017, 4:02 p.m. UTC | #6
On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:

> I assumed you were arguing the patch was somehow okay because we used to
> have the corruption bug.

I was just stating a fact.  Both problems are bad enough.

Andreas.
  
Carlos O'Donell Oct. 5, 2017, 6:10 p.m. UTC | #7
On 10/05/2017 09:02 AM, Andreas Schwab wrote:
> On Okt 05 2017, Florian Weimer <fweimer@redhat.com> wrote:
> 
>> I assumed you were arguing the patch was somehow okay because we used to
>> have the corruption bug.
> 
> I was just stating a fact.  Both problems are bad enough.

Either the user coordinated the threads to avoid the situation of having
a thread in dlopen while another thread calls fork, *or* it did not avoid
this problem and you end up with a deadlock.

Either way there is no reason to apply your patch? Why do you want to clear
the lock on fork? Just so you can access potentially corrupt data?
  
Andreas Schwab Oct. 9, 2017, 7:58 a.m. UTC | #8
So this bug is actually INVALID but for a different reason: in a
multi-threaded program the forked child may only call async-signal-safe
functions.

Andreas.
  
Florian Weimer Oct. 9, 2017, 8:06 a.m. UTC | #9
On 10/09/2017 09:58 AM, Andreas Schwab wrote:
> So this bug is actually INVALID but for a different reason: in a
> multi-threaded program the forked child may only call async-signal-safe
> functions.

I think we really need to support dlopen-after-fork in multi-threaded 
processes as an extension, in the same we way need to support 
malloc-after-fork.  I expect that there are quite a few programs which 
do this.

Florian
  
Carlos O'Donell Oct. 9, 2017, 3:25 p.m. UTC | #10
On 10/09/2017 01:06 AM, Florian Weimer wrote:
> On 10/09/2017 09:58 AM, Andreas Schwab wrote:
>> So this bug is actually INVALID but for a different reason: in a 
>> multi-threaded program the forked child may only call
>> async-signal-safe functions.
> 
> I think we really need to support dlopen-after-fork in multi-threaded
> processes as an extension, in the same we way need to support
> malloc-after-fork.  I expect that there are quite a few programs
> which do this.

I agree with your sentiment, some of this needs to be supported, but
it will also need a new safety classification in the manual to describe
which interfaces are safe. Until we do this work, I agree with Andreas
that these uses are in theory INVALID. The same issues apply to SR-safety
which we discussed a year or two ago i.e. when is it safe for a function
to re-enter libc e.g. dlopen->malloc->dlopen (happens mostly with interposed
malloc).
  
Zack Weinberg Oct. 9, 2017, 5:39 p.m. UTC | #11
On Mon, Oct 9, 2017 at 11:25 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>
> I agree with your sentiment, some of this needs to be supported, but
> it will also need a new safety classification in the manual to describe
> which interfaces are safe. Until we do this work, I agree with Andreas
> that these uses are in theory INVALID. The same issues apply to SR-safety
> which we discussed a year or two ago i.e. when is it safe for a function
> to re-enter libc e.g. dlopen->malloc->dlopen (happens mostly with interposed
> malloc).

Every time this stuff comes up I wonder whether it would be feasible
to reduce the scope of the locks held by dlopen, especially to avoid
holding locks while user-controlled code is running.

zw
  
Carlos O'Donell Oct. 10, 2017, 9:16 p.m. UTC | #12
On 10/09/2017 10:39 AM, Zack Weinberg wrote:
> On Mon, Oct 9, 2017 at 11:25 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> I agree with your sentiment, some of this needs to be supported, but
>> it will also need a new safety classification in the manual to describe
>> which interfaces are safe. Until we do this work, I agree with Andreas
>> that these uses are in theory INVALID. The same issues apply to SR-safety
>> which we discussed a year or two ago i.e. when is it safe for a function
>> to re-enter libc e.g. dlopen->malloc->dlopen (happens mostly with interposed
>> malloc).
> 
> Every time this stuff comes up I wonder whether it would be feasible
> to reduce the scope of the locks held by dlopen, especially to avoid
> holding locks while user-controlled code is running.

It is feasible, and it's exactly what we want to do IMO. I keep thinking
we want something like RCU in the dynamic loader, because that might allow
us to drop locks when calling foreign functions. The slow case is dlopen,
as it should be, though we could make startup just as fast as we are now,
in the single-threaded case.

We've been circling this drain for a while, and the only impediment
is fixing all the other broken stuff that is slightly higher priority.
As I may have mentioned already, I have plans to work on more dynamic
loader internals to try solve the TLS access problems we have. That won't
be until early next year though. So anyone else who wants to look at it is
more than welcome.
  

Patch

diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index 4bb87e2331..a3ed9e8945 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -194,8 +194,9 @@  __libc_fork (void)
 	  _IO_list_resetlock ();
 	}
 
-      /* Reset the lock the dynamic loader uses to protect its data.  */
+      /* Reset the locks the dynamic loader uses to protect its data.  */
       __rtld_lock_initialize (GL(dl_load_lock));
+      __rtld_lock_initialize (GL(dl_load_write_lock));
 
       /* Run the handlers registered for the child.  */
       while (allp != NULL)