Don't divide by zero when trying to destroy an uninitialised barrier.

Message ID 5717B2F4.9050105@starleaf.com
State New, archived
Headers

Commit Message

Mark Thompson April 20, 2016, 4:48 p.m. UTC
  ---
 nptl/pthread_barrier_destroy.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Szabolcs Nagy April 20, 2016, 5:03 p.m. UTC | #1
On 20/04/16 17:48, Mark Thompson wrote:
> ---
>  nptl/pthread_barrier_destroy.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
> index 92d2027..d114084 100644
> --- a/nptl/pthread_barrier_destroy.c
> +++ b/nptl/pthread_barrier_destroy.c
> @@ -36,6 +36,15 @@ pthread_barrier_destroy (pthread_barrier_t *barrier)
>       they have exited as well.  To get the notification, pretend that we have
>       reached the reset threshold.  */
>    unsigned int count = bar->count;
> +
> +  /* For an initialised barrier, count must be greater than zero here.  An
> +     uninitialised barrier may still have zero, however, and in this case it is
> +     preferable to fail immediately rather than to invoke undefined behaviour
> +     by dividing by zero on the next line.  (POSIX allows the implementation to
> +     diagnose invalid state and return EINVAL in this case.)  */
> +  if (__glibc_unlikely (count == 0))
> +    return EINVAL;
> +

this case is undefined behaviour in posix, and
i think the best way to handle that is crashing.
(because no behaviour can be portably relied upon)

nowadays posix says
"The [EINVAL] error for an uninitialized barrier
attributes object is removed; this condition
results in undefined behavior."

>    unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
>                                    - BARRIER_IN_THRESHOLD % count;
>    /* Relaxed MO sufficient because the program must have ensured that all
>
  
Mark Thompson April 20, 2016, 7:16 p.m. UTC | #2
On 20/04/16 18:03, Szabolcs Nagy wrote:
> On 20/04/16 17:48, Mark Thompson wrote:
>> ---
>>  nptl/pthread_barrier_destroy.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
>> index 92d2027..d114084 100644
>> --- a/nptl/pthread_barrier_destroy.c
>> +++ b/nptl/pthread_barrier_destroy.c
>> @@ -36,6 +36,15 @@ pthread_barrier_destroy (pthread_barrier_t *barrier)
>>       they have exited as well.  To get the notification, pretend that we have
>>       reached the reset threshold.  */
>>    unsigned int count = bar->count;
>> +
>> +  /* For an initialised barrier, count must be greater than zero here.  An
>> +     uninitialised barrier may still have zero, however, and in this case it is
>> +     preferable to fail immediately rather than to invoke undefined behaviour
>> +     by dividing by zero on the next line.  (POSIX allows the implementation to
>> +     diagnose invalid state and return EINVAL in this case.)  */
>> +  if (__glibc_unlikely (count == 0))
>> +    return EINVAL;
>> +
> 
> this case is undefined behaviour in posix, and
> i think the best way to handle that is crashing.
> (because no behaviour can be portably relied upon)

The undefined behaviour is not necessarily crashing - systems which do not trap on divide by zero (such as aarch64) will do something else, such as returning success or hanging forever.  Would you prefer an abort() be added to make the behavior consistent?

> nowadays posix says
> "The [EINVAL] error for an uninitialized barrier
> attributes object is removed; this condition
> results in undefined behavior."

It also says:

"If an implementation detects that the value specified by the barrier argument to pthread_barrier_destroy() does not refer to an initialized barrier object, it is recommended that the function should fail and report an [EINVAL] error."

(<http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_destroy.html>)
  
Adhemerval Zanella Netto April 20, 2016, 7:46 p.m. UTC | #3
On 20-04-2016 16:16, Mark Thompson wrote:
> On 20/04/16 18:03, Szabolcs Nagy wrote:
>> On 20/04/16 17:48, Mark Thompson wrote:
>>> ---
>>>  nptl/pthread_barrier_destroy.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
>>> index 92d2027..d114084 100644
>>> --- a/nptl/pthread_barrier_destroy.c
>>> +++ b/nptl/pthread_barrier_destroy.c
>>> @@ -36,6 +36,15 @@ pthread_barrier_destroy (pthread_barrier_t *barrier)
>>>       they have exited as well.  To get the notification, pretend that we have
>>>       reached the reset threshold.  */
>>>    unsigned int count = bar->count;
>>> +
>>> +  /* For an initialised barrier, count must be greater than zero here.  An
>>> +     uninitialised barrier may still have zero, however, and in this case it is
>>> +     preferable to fail immediately rather than to invoke undefined behaviour
>>> +     by dividing by zero on the next line.  (POSIX allows the implementation to
>>> +     diagnose invalid state and return EINVAL in this case.)  */
>>> +  if (__glibc_unlikely (count == 0))
>>> +    return EINVAL;
>>> +
>>
>> this case is undefined behaviour in posix, and
>> i think the best way to handle that is crashing.
>> (because no behaviour can be portably relied upon)
> 
> The undefined behaviour is not necessarily crashing - systems which do not trap on divide by zero (such as aarch64) will do something else, such as returning success or hanging forever.  Would you prefer an abort() be added to make the behavior consistent?
> 
>> nowadays posix says
>> "The [EINVAL] error for an uninitialized barrier
>> attributes object is removed; this condition
>> results in undefined behavior."
> 
> It also says:
> 
> "If an implementation detects that the value specified by the barrier argument to pthread_barrier_destroy() does not refer to an initialized barrier object, it is recommended that the function should fail and report an [EINVAL] error."
> 
> (<http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_destroy.html>)
> 

I do not see a compelling reason to not return EINVAL if the UB 
could be detected and if POSIX stated this behaviour is recommended.
  
Torvald Riegel April 21, 2016, 4:23 p.m. UTC | #4
On Wed, 2016-04-20 at 20:16 +0100, Mark Thompson wrote:
> On 20/04/16 18:03, Szabolcs Nagy wrote:
> > On 20/04/16 17:48, Mark Thompson wrote:
> >> ---
> >>  nptl/pthread_barrier_destroy.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
> >> index 92d2027..d114084 100644
> >> --- a/nptl/pthread_barrier_destroy.c
> >> +++ b/nptl/pthread_barrier_destroy.c
> >> @@ -36,6 +36,15 @@ pthread_barrier_destroy (pthread_barrier_t *barrier)
> >>       they have exited as well.  To get the notification, pretend that we have
> >>       reached the reset threshold.  */
> >>    unsigned int count = bar->count;
> >> +
> >> +  /* For an initialised barrier, count must be greater than zero here.  An
> >> +     uninitialised barrier may still have zero, however, and in this case it is
> >> +     preferable to fail immediately rather than to invoke undefined behaviour
> >> +     by dividing by zero on the next line.  (POSIX allows the implementation to
> >> +     diagnose invalid state and return EINVAL in this case.)  */
> >> +  if (__glibc_unlikely (count == 0))
> >> +    return EINVAL;
> >> +
> > 
> > this case is undefined behaviour in posix, and
> > i think the best way to handle that is crashing.
> > (because no behaviour can be portably relied upon)
> 
> The undefined behaviour is not necessarily crashing - systems which do not trap on divide by zero (such as aarch64) will do something else, such as returning success or hanging forever.  Would you prefer an abort() be added to make the behavior consistent?

IMO, abort() would be better than returning EINVAL.  See
https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program

> > nowadays posix says
> > "The [EINVAL] error for an uninitialized barrier
> > attributes object is removed; this condition
> > results in undefined behavior."
> 
> It also says:
> 
> "If an implementation detects that the value specified by the barrier argument to pthread_barrier_destroy() does not refer to an initialized barrier object, it is recommended that the function should fail and report an [EINVAL] error."
> 
> (<http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_destroy.html>)

But there are no error conditions listed for pthread_barrier_destroy,
which I read as no errors being allowed in a correct program.
Therefore, this is really UB, and we should call abort().
  
Florian Weimer April 26, 2016, 2:38 p.m. UTC | #5
On 04/20/2016 09:46 PM, Adhemerval Zanella wrote:
> I do not see a compelling reason to not return EINVAL if the UB
> could be detected and if POSIX stated this behaviour is recommended.

It would result in silent loss of synchronization if the return value is 
not checked.  Such bugs are difficult to track down.

Florian
  
Adhemerval Zanella Netto April 26, 2016, 2:44 p.m. UTC | #6
On 26/04/2016 11:38, Florian Weimer wrote:
> On 04/20/2016 09:46 PM, Adhemerval Zanella wrote:
>> I do not see a compelling reason to not return EINVAL if the UB
>> could be detected and if POSIX stated this behaviour is recommended.
> 
> It would result in silent loss of synchronization if the return value is not checked.  Such bugs are difficult to track down.
> 
> Florian

But the check is user responsibility and getting such error means the
program is doing something fuzzy.

But thinking twice seems that abort in such cases seems a better
alternative, it gives the user a more straightforward indication
he should check his code.
  
Mike Frysinger April 26, 2016, 5:24 p.m. UTC | #7
On 26 Apr 2016 11:44, Adhemerval Zanella wrote:
> On 26/04/2016 11:38, Florian Weimer wrote:
> > On 04/20/2016 09:46 PM, Adhemerval Zanella wrote:
> >> I do not see a compelling reason to not return EINVAL if the UB
> >> could be detected and if POSIX stated this behaviour is recommended.
> > 
> > It would result in silent loss of synchronization if the return value is not checked.  Such bugs are difficult to track down.
> > 
> > Florian
> 
> But the check is user responsibility and getting such error means the
> program is doing something fuzzy.
> 
> But thinking twice seems that abort in such cases seems a better
> alternative, it gives the user a more straightforward indication
> he should check his code. 

in principal, i tend to agree with you.  but as an active counter-point,
i think we can agree that the heap corruption checks which trigger aborts
have improved software in the wider ecosystem.
	... malloc(): memory corruption (fast) ...

whether we'll see as much use in this API, it's hard to say.  but if we
already have the code in place to detect a bad/invalid scenario, then an
abort doesn't seem bad to me.  when you start adding more checks though,
then due consideration to overhead/fast paths make sense.
-mike
  
Torvald Riegel April 26, 2016, 6:03 p.m. UTC | #8
On Tue, 2016-04-26 at 11:44 -0300, Adhemerval Zanella wrote:
> 
> On 26/04/2016 11:38, Florian Weimer wrote:
> > On 04/20/2016 09:46 PM, Adhemerval Zanella wrote:
> >> I do not see a compelling reason to not return EINVAL if the UB
> >> could be detected and if POSIX stated this behaviour is recommended.
> > 
> > It would result in silent loss of synchronization if the return value is not checked.  Such bugs are difficult to track down.
> > 
> > Florian
> 
> But the check is user responsibility and getting such error means the
> program is doing something fuzzy.

EINVAL is not listed as an error code, so there is no user
responsibility.
  
Adhemerval Zanella Netto April 26, 2016, 7:47 p.m. UTC | #9
On 26/04/2016 15:03, Torvald Riegel wrote:
> On Tue, 2016-04-26 at 11:44 -0300, Adhemerval Zanella wrote:
>>
>> On 26/04/2016 11:38, Florian Weimer wrote:
>>> On 04/20/2016 09:46 PM, Adhemerval Zanella wrote:
>>>> I do not see a compelling reason to not return EINVAL if the UB
>>>> could be detected and if POSIX stated this behaviour is recommended.
>>>
>>> It would result in silent loss of synchronization if the return value is not checked.  Such bugs are difficult to track down.
>>>
>>> Florian
>>
>> But the check is user responsibility and getting such error means the
>> program is doing something fuzzy.
> 
> EINVAL is not listed as an error code, so there is no user
> responsibility.
> 

Alright, so abort seems the best solution then.
  
Carlos O'Donell April 27, 2016, 2:27 p.m. UTC | #10
On 04/21/2016 12:23 PM, Torvald Riegel wrote:
> On Wed, 2016-04-20 at 20:16 +0100, Mark Thompson wrote:
>> On 20/04/16 18:03, Szabolcs Nagy wrote:
>>> On 20/04/16 17:48, Mark Thompson wrote:
>>>> ---
>>>>  nptl/pthread_barrier_destroy.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
>>>> index 92d2027..d114084 100644
>>>> --- a/nptl/pthread_barrier_destroy.c
>>>> +++ b/nptl/pthread_barrier_destroy.c
>>>> @@ -36,6 +36,15 @@ pthread_barrier_destroy (pthread_barrier_t *barrier)
>>>>       they have exited as well.  To get the notification, pretend that we have
>>>>       reached the reset threshold.  */
>>>>    unsigned int count = bar->count;
>>>> +
>>>> +  /* For an initialised barrier, count must be greater than zero here.  An
>>>> +     uninitialised barrier may still have zero, however, and in this case it is
>>>> +     preferable to fail immediately rather than to invoke undefined behaviour
>>>> +     by dividing by zero on the next line.  (POSIX allows the implementation to
>>>> +     diagnose invalid state and return EINVAL in this case.)  */
>>>> +  if (__glibc_unlikely (count == 0))
>>>> +    return EINVAL;
>>>> +
>>>
>>> this case is undefined behaviour in posix, and
>>> i think the best way to handle that is crashing.
>>> (because no behaviour can be portably relied upon)
>>
>> The undefined behaviour is not necessarily crashing - systems which
>> do not trap on divide by zero (such as aarch64) will do something
>> else, such as returning success or hanging forever. Would you
>> prefer an abort() be added to make the behavior consistent?
> 
> IMO, abort() would be better than returning EINVAL.  See
> https://sourceware.org/glibc/wiki/Style_and_Conventions#Bugs_in_the_user_program

Agreed.

It's easy to detect. We should abort().
  

Patch

diff --git a/nptl/pthread_barrier_destroy.c b/nptl/pthread_barrier_destroy.c
index 92d2027..d114084 100644
--- a/nptl/pthread_barrier_destroy.c
+++ b/nptl/pthread_barrier_destroy.c
@@ -36,6 +36,15 @@  pthread_barrier_destroy (pthread_barrier_t *barrier)
      they have exited as well.  To get the notification, pretend that we have
      reached the reset threshold.  */
   unsigned int count = bar->count;
+
+  /* For an initialised barrier, count must be greater than zero here.  An
+     uninitialised barrier may still have zero, however, and in this case it is
+     preferable to fail immediately rather than to invoke undefined behaviour
+     by dividing by zero on the next line.  (POSIX allows the implementation to
+     diagnose invalid state and return EINVAL in this case.)  */
+  if (__glibc_unlikely (count == 0))
+    return EINVAL;
+
   unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
                                   - BARRIER_IN_THRESHOLD % count;
   /* Relaxed MO sufficient because the program must have ensured that all