Bug #20116: Clarify barrier-like and mutex-like behaviours of PD->lock.

Message ID 3d358d5a-66eb-cf97-ad98-16061a838f1a@redhat.com
State Superseded
Headers

Commit Message

Carlos O'Donell Feb. 13, 2017, 1:29 p.m. UTC
  Florian Weimer rightly pointed out to me that we use PD->lock in two 
distinct ways, at startup in a barrier-like fashion, and later in a 
mutex-like fashion. This should be clarified in the concurrency notes
because it implies two different algorithms are at play using the same 
concurrency structure. Also note that a normal POSIX lock would not be 
valid to use in this case because it is not valid for another thread 
to unlock it (the barrier-like use-case at startup).

OK to commit?

2017-02-13  Carlos O'Donell  <carlos@redhat.com>

	[BZ #20116]
	* nptl/pthrad_create.c: Expand comments to describe barrier-like
	and mutex-like uses of PD->lock.

---
  

Comments

Florian Weimer Feb. 13, 2017, 6:49 p.m. UTC | #1
On 02/13/2017 02:29 PM, Carlos O'Donell wrote:
> +   It is important to point out that PD->lock is being used as a POSIX
> +   barrier and a POSIX mutex.  The lock is taken in the parent to force
> +   the child to wait, and then the child releases the lock, in effect a
> +   barrier.  However, this barrier-like effect is used only for
> +   synchronizing the parent and child.  After startup the lock is used
> +   like a mutex to create a critical region during which a single owner
> +   modifies the thread parameters.

I had missed that the lock was reused for the scheduler parameter.

But the current code still does not make sense to me.  Why do we need to 
keep a copy of the scheduler parameters at all?  Is this just a cache to 
improve performance, similar to what we used to do for the PID?

Thanks,
Florian
  
Florian Weimer Feb. 14, 2017, 11:32 a.m. UTC | #2
On 02/13/2017 07:49 PM, Florian Weimer wrote:
> On 02/13/2017 02:29 PM, Carlos O'Donell wrote:
>> +   It is important to point out that PD->lock is being used as a POSIX
>> +   barrier and a POSIX mutex.  The lock is taken in the parent to force
>> +   the child to wait, and then the child releases the lock, in effect a
>> +   barrier.  However, this barrier-like effect is used only for
>> +   synchronizing the parent and child.  After startup the lock is used
>> +   like a mutex to create a critical region during which a single owner
>> +   modifies the thread parameters.
>
> I had missed that the lock was reused for the scheduler parameter.
>
> But the current code still does not make sense to me.  Why do we need to
> keep a copy of the scheduler parameters at all?  Is this just a cache to
> improve performance, similar to what we used to do for the PID?

The cache is used in the implementation of PTHREAD_PRIO_PROTECT mutexes. 
  There are data races:

   https://sourceware.org/bugzilla/show_bug.cgi?id=21160

I expect that the use of ->lock to protect these members will go away 
eventually.

Thanks,
Florian
  
Carlos O'Donell March 14, 2017, 1:07 a.m. UTC | #3
On 02/14/2017 06:32 AM, Florian Weimer wrote:
> On 02/13/2017 07:49 PM, Florian Weimer wrote:
>> On 02/13/2017 02:29 PM, Carlos O'Donell wrote:
>>> +   It is important to point out that PD->lock is being used as a POSIX
>>> +   barrier and a POSIX mutex.  The lock is taken in the parent to force
>>> +   the child to wait, and then the child releases the lock, in effect a
>>> +   barrier.  However, this barrier-like effect is used only for
>>> +   synchronizing the parent and child.  After startup the lock is used
>>> +   like a mutex to create a critical region during which a single owner
>>> +   modifies the thread parameters.
>>
>> I had missed that the lock was reused for the scheduler parameter.
>>
>> But the current code still does not make sense to me.  Why do we need to
>> keep a copy of the scheduler parameters at all?  Is this just a cache to
>> improve performance, similar to what we used to do for the PID?
> 
> The cache is used in the implementation of PTHREAD_PRIO_PROTECT mutexes.  There are data races:
> 
>   https://sourceware.org/bugzilla/show_bug.cgi?id=21160
> 
> I expect that the use of ->lock to protect these members will go away eventually.

Yes, it's used in tpp.

Given your current understand is the above additional text sufficient
to clarify the situation?
  
Florian Weimer March 14, 2017, 6:58 a.m. UTC | #4
On 03/14/2017 02:07 AM, Carlos O'Donell wrote:
> On 02/14/2017 06:32 AM, Florian Weimer wrote:
>> On 02/13/2017 07:49 PM, Florian Weimer wrote:
>>> On 02/13/2017 02:29 PM, Carlos O'Donell wrote:
>>>> +   It is important to point out that PD->lock is being used as a POSIX
>>>> +   barrier and a POSIX mutex.  The lock is taken in the parent to force
>>>> +   the child to wait, and then the child releases the lock, in effect a
>>>> +   barrier.  However, this barrier-like effect is used only for
>>>> +   synchronizing the parent and child.  After startup the lock is used
>>>> +   like a mutex to create a critical region during which a single owner
>>>> +   modifies the thread parameters.
>>>
>>> I had missed that the lock was reused for the scheduler parameter.
>>>
>>> But the current code still does not make sense to me.  Why do we need to
>>> keep a copy of the scheduler parameters at all?  Is this just a cache to
>>> improve performance, similar to what we used to do for the PID?
>>
>> The cache is used in the implementation of PTHREAD_PRIO_PROTECT mutexes.  There are data races:
>>
>>   https://sourceware.org/bugzilla/show_bug.cgi?id=21160
>>
>> I expect that the use of ->lock to protect these members will go away eventually.
>
> Yes, it's used in tpp.
>
> Given your current understand is the above additional text sufficient
> to clarify the situation?

Yes, I'm mostly fine with it.  I wouldn't call it a “POSIX barrier” and 
“POSIX mutex” though, it clearly is not.

Maybe use “as a barrier (acquired and released by different threads and 
as a mutex (acquired and released by the same thread, providing mutual 
exclusion)”.

Thanks,
Florian
  
Torvald Riegel March 14, 2017, 12:26 p.m. UTC | #5
On Tue, 2017-03-14 at 07:58 +0100, Florian Weimer wrote:
> On 03/14/2017 02:07 AM, Carlos O'Donell wrote:
> > On 02/14/2017 06:32 AM, Florian Weimer wrote:
> >> On 02/13/2017 07:49 PM, Florian Weimer wrote:
> >>> On 02/13/2017 02:29 PM, Carlos O'Donell wrote:
> >>>> +   It is important to point out that PD->lock is being used as a POSIX
> >>>> +   barrier and a POSIX mutex.

... is being used both similar to a one-shot semaphore and,
subsequently, as a mutex.

> The lock is taken in the parent to force
> >>>> +   the child to wait, and then the child releases the lock, in effect a
> >>>> +   barrier.  However, this barrier-like effect is used only for

s/, in effect a barrier//
s/barrier-like effect is used/semaphore-like use is employed/

> >>>> +   synchronizing the parent and child.  After startup the lock is used
> >>>> +   like a mutex to create a critical region during which a single owner

critical section

> >>>> +   modifies the thread parameters.
> >>>
> >>> I had missed that the lock was reused for the scheduler parameter.
> >>>
> >>> But the current code still does not make sense to me.  Why do we need to
> >>> keep a copy of the scheduler parameters at all?  Is this just a cache to
> >>> improve performance, similar to what we used to do for the PID?
> >>
> >> The cache is used in the implementation of PTHREAD_PRIO_PROTECT mutexes.  There are data races:
> >>
> >>   https://sourceware.org/bugzilla/show_bug.cgi?id=21160
> >>
> >> I expect that the use of ->lock to protect these members will go away eventually.
> >
> > Yes, it's used in tpp.
> >
> > Given your current understand is the above additional text sufficient
> > to clarify the situation?
> 
> Yes, I'm mostly fine with it.  I wouldn't call it a “POSIX barrier” and 
> “POSIX mutex” though, it clearly is not.
> 
> Maybe use “as a barrier (acquired and released by different threads and 
> as a mutex (acquired and released by the same thread, providing mutual 
> exclusion)”.

It's not a barrier actually, but closer to a semaphore (the parent does
not wait for the child to arrive, but continues immediately after
posting...).
  

Patch

diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2ef2bcb..b522638 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -94,8 +94,17 @@  unsigned int __nptl_nthreads = 1;
    exactly which of the four ownership states we are in and therefore
    what actions can be taken.  For example after (2) we cannot read or
    write from PD anymore since the thread may no longer exist and the
-   memory may be unmapped.  The most complicated cases happen during
-   thread startup:
+   memory may be unmapped.
+
+   It is important to point out that PD->lock is being used as a POSIX
+   barrier and a POSIX mutex.  The lock is taken in the parent to force
+   the child to wait, and then the child releases the lock, in effect a
+   barrier.  However, this barrier-like effect is used only for
+   synchronizing the parent and child.  After startup the lock is used
+   like a mutex to create a critical region during which a single owner
+   modifies the thread parameters.
+
+   The most complicated cases happen during thread startup:
 
    (a) If the created thread is in a detached (PTHREAD_CREATE_DETACHED),
        or joinable (default PTHREAD_CREATE_JOINABLE) state and