pthread_once hangs when init routine throws an exception [BZ #18435]

Message ID 557741C5.5060203@redhat.com
State New, archived
Headers

Commit Message

Martin Sebor June 9, 2015, 7:43 p.m. UTC
  Attached is an updated version of the patch that addresses
the LDFLAGS -> LDLIBS comment. Retested on ppc64.

Is it okay to commit?

Martin

On 05/31/2015 03:37 PM, Martin Sebor wrote:
> The C++ 2011 std::call_once function is specified to allow
> the initialization routine to exit by throwing an exception.
> Such an execution, termed exceptional, requires call_once to
> propagate the exception to its caller. A program may contain
> any number of exceptional executions but only one returning
> execution (which, if it exists, must be the last execution
> with the same once flag).
>
> On POSIX systems such as Linux std::call_once is implemented
> in terms of pthread_once. However, as discussed in libstdc++
> bug 66146 - "call_once not C++11-compliant on ppc64le," GLIBC's
> pthread_once hangs when the initialization function exits by
> throwing an exception on at least arm and ppc64 (though
> apparently not on x86_64). This effectively prevents call_once
> from conforming to the C++ requirements since there doesn't
> appear to be a thread-safe way to work around this problem in
> libstdc++.
>
> The attached patch changes pthread_once to handle gracefully
> init functions that exit by throwing exceptions. It has been
> tested on ppc64, ppc64le, and x86_64 with no regressions.
>
> During the discussion of the bug concerns were raised about
> whether the use case of throwing exceptions from the
> pthread_once init routine is intended to be supported either
> by POSIX, or by GLIBC. After some research I believe that both
> POSIX and GLIBC have, in fact, intended to support it, for at
> least two reasons:
>
> First, the POSIX Rationale states in section Thread Cancellation
> Overview, under Thread Cancellation Cleanup Handlers, that:
>
>    it is an explicit goal of POSIX.1-2008 to be compatible with
>    existing exception facilities and languages having exceptions.
>
> Second, as is evident from the comment above the pthread_once
> declaration in GLIBC (quoted below), GLIBC too has intended
> to support this use case since 2004 when the comment was
> added (and the __THROW specification removed from the API):
>
>     ...
>     The initialization functions might throw exception which
>     is why this function is not marked with __THROW.  */
>
> Martin
  

Comments

Martin Sebor June 15, 2015, 7:23 p.m. UTC | #1
On 06/09/2015 01:43 PM, Martin Sebor wrote:
> Attached is an updated version of the patch that addresses
> the LDFLAGS -> LDLIBS comment. Retested on ppc64.
>
> Is it okay to commit?

Are there any outstanding concerns with this version of
the patch or is it good to commit?

Martin

>
> Martin
>
> On 05/31/2015 03:37 PM, Martin Sebor wrote:
>> The C++ 2011 std::call_once function is specified to allow
>> the initialization routine to exit by throwing an exception.
>> Such an execution, termed exceptional, requires call_once to
>> propagate the exception to its caller. A program may contain
>> any number of exceptional executions but only one returning
>> execution (which, if it exists, must be the last execution
>> with the same once flag).
>>
>> On POSIX systems such as Linux std::call_once is implemented
>> in terms of pthread_once. However, as discussed in libstdc++
>> bug 66146 - "call_once not C++11-compliant on ppc64le," GLIBC's
>> pthread_once hangs when the initialization function exits by
>> throwing an exception on at least arm and ppc64 (though
>> apparently not on x86_64). This effectively prevents call_once
>> from conforming to the C++ requirements since there doesn't
>> appear to be a thread-safe way to work around this problem in
>> libstdc++.
>>
>> The attached patch changes pthread_once to handle gracefully
>> init functions that exit by throwing exceptions. It has been
>> tested on ppc64, ppc64le, and x86_64 with no regressions.
>>
>> During the discussion of the bug concerns were raised about
>> whether the use case of throwing exceptions from the
>> pthread_once init routine is intended to be supported either
>> by POSIX, or by GLIBC. After some research I believe that both
>> POSIX and GLIBC have, in fact, intended to support it, for at
>> least two reasons:
>>
>> First, the POSIX Rationale states in section Thread Cancellation
>> Overview, under Thread Cancellation Cleanup Handlers, that:
>>
>>    it is an explicit goal of POSIX.1-2008 to be compatible with
>>    existing exception facilities and languages having exceptions.
>>
>> Second, as is evident from the comment above the pthread_once
>> declaration in GLIBC (quoted below), GLIBC too has intended
>> to support this use case since 2004 when the comment was
>> added (and the __THROW specification removed from the API):
>>
>>     ...
>>     The initialization functions might throw exception which
>>     is why this function is not marked with __THROW.  */
>>
>> Martin
>
  
Martin Sebor June 22, 2015, 7:55 p.m. UTC | #2
Are there any concerns with this patch?

   https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html

Martin

On 06/15/2015 01:23 PM, Martin Sebor wrote:
> On 06/09/2015 01:43 PM, Martin Sebor wrote:
>> Attached is an updated version of the patch that addresses
>> the LDFLAGS -> LDLIBS comment. Retested on ppc64.
>>
>> Is it okay to commit?
>
> Are there any outstanding concerns with this version of
> the patch or is it good to commit?
>
> Martin
>
>>
>> Martin
>>
>> On 05/31/2015 03:37 PM, Martin Sebor wrote:
>>> The C++ 2011 std::call_once function is specified to allow
>>> the initialization routine to exit by throwing an exception.
>>> Such an execution, termed exceptional, requires call_once to
>>> propagate the exception to its caller. A program may contain
>>> any number of exceptional executions but only one returning
>>> execution (which, if it exists, must be the last execution
>>> with the same once flag).
>>>
>>> On POSIX systems such as Linux std::call_once is implemented
>>> in terms of pthread_once. However, as discussed in libstdc++
>>> bug 66146 - "call_once not C++11-compliant on ppc64le," GLIBC's
>>> pthread_once hangs when the initialization function exits by
>>> throwing an exception on at least arm and ppc64 (though
>>> apparently not on x86_64). This effectively prevents call_once
>>> from conforming to the C++ requirements since there doesn't
>>> appear to be a thread-safe way to work around this problem in
>>> libstdc++.
>>>
>>> The attached patch changes pthread_once to handle gracefully
>>> init functions that exit by throwing exceptions. It has been
>>> tested on ppc64, ppc64le, and x86_64 with no regressions.
>>>
>>> During the discussion of the bug concerns were raised about
>>> whether the use case of throwing exceptions from the
>>> pthread_once init routine is intended to be supported either
>>> by POSIX, or by GLIBC. After some research I believe that both
>>> POSIX and GLIBC have, in fact, intended to support it, for at
>>> least two reasons:
>>>
>>> First, the POSIX Rationale states in section Thread Cancellation
>>> Overview, under Thread Cancellation Cleanup Handlers, that:
>>>
>>>    it is an explicit goal of POSIX.1-2008 to be compatible with
>>>    existing exception facilities and languages having exceptions.
>>>
>>> Second, as is evident from the comment above the pthread_once
>>> declaration in GLIBC (quoted below), GLIBC too has intended
>>> to support this use case since 2004 when the comment was
>>> added (and the __THROW specification removed from the API):
>>>
>>>     ...
>>>     The initialization functions might throw exception which
>>>     is why this function is not marked with __THROW.  */
>>>
>>> Martin
>>
>
  
Martin Sebor June 30, 2015, 11:25 p.m. UTC | #3
Are there any unresolved concerns with or objections to this
patch?

   https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html

On 06/22/2015 01:55 PM, Martin Sebor wrote:
> Are there any concerns with this patch?
>
>    https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html
>
> Martin
>
> On 06/15/2015 01:23 PM, Martin Sebor wrote:
>> On 06/09/2015 01:43 PM, Martin Sebor wrote:
>>> Attached is an updated version of the patch that addresses
>>> the LDFLAGS -> LDLIBS comment. Retested on ppc64.
>>>
>>> Is it okay to commit?
>>
>> Are there any outstanding concerns with this version of
>> the patch or is it good to commit?
>>
>> Martin
>>
>>>
>>> Martin
>>>
>>> On 05/31/2015 03:37 PM, Martin Sebor wrote:
>>>> The C++ 2011 std::call_once function is specified to allow
>>>> the initialization routine to exit by throwing an exception.
>>>> Such an execution, termed exceptional, requires call_once to
>>>> propagate the exception to its caller. A program may contain
>>>> any number of exceptional executions but only one returning
>>>> execution (which, if it exists, must be the last execution
>>>> with the same once flag).
>>>>
>>>> On POSIX systems such as Linux std::call_once is implemented
>>>> in terms of pthread_once. However, as discussed in libstdc++
>>>> bug 66146 - "call_once not C++11-compliant on ppc64le," GLIBC's
>>>> pthread_once hangs when the initialization function exits by
>>>> throwing an exception on at least arm and ppc64 (though
>>>> apparently not on x86_64). This effectively prevents call_once
>>>> from conforming to the C++ requirements since there doesn't
>>>> appear to be a thread-safe way to work around this problem in
>>>> libstdc++.
>>>>
>>>> The attached patch changes pthread_once to handle gracefully
>>>> init functions that exit by throwing exceptions. It has been
>>>> tested on ppc64, ppc64le, and x86_64 with no regressions.
>>>>
>>>> During the discussion of the bug concerns were raised about
>>>> whether the use case of throwing exceptions from the
>>>> pthread_once init routine is intended to be supported either
>>>> by POSIX, or by GLIBC. After some research I believe that both
>>>> POSIX and GLIBC have, in fact, intended to support it, for at
>>>> least two reasons:
>>>>
>>>> First, the POSIX Rationale states in section Thread Cancellation
>>>> Overview, under Thread Cancellation Cleanup Handlers, that:
>>>>
>>>>    it is an explicit goal of POSIX.1-2008 to be compatible with
>>>>    existing exception facilities and languages having exceptions.
>>>>
>>>> Second, as is evident from the comment above the pthread_once
>>>> declaration in GLIBC (quoted below), GLIBC too has intended
>>>> to support this use case since 2004 when the comment was
>>>> added (and the __THROW specification removed from the API):
>>>>
>>>>     ...
>>>>     The initialization functions might throw exception which
>>>>     is why this function is not marked with __THROW.  */
>>>>
>>>> Martin
>>>
>>
>
  
Rich Felker July 1, 2015, 12:06 a.m. UTC | #4
On Tue, Jun 30, 2015 at 05:25:31PM -0600, Martin Sebor wrote:
> Are there any unresolved concerns with or objections to this
> patch?
> 
>   https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html

I still object to enshrining this usage as a supported feature, but I
don't object to the patch. In particular I think GCC should fix its
invalid usage of pthread_once to implement the 'equivalent' C++
threads primitive.

Rich
  
Martin Sebor July 1, 2015, 8:18 p.m. UTC | #5
On 06/30/2015 06:06 PM, Rich Felker wrote:
> On Tue, Jun 30, 2015 at 05:25:31PM -0600, Martin Sebor wrote:
>> Are there any unresolved concerns with or objections to this
>> patch?
>>
>>    https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html
>
> I still object to enshrining this usage as a supported feature, but I
> don't object to the patch. In particular I think GCC should fix its
> invalid usage of pthread_once to implement the 'equivalent' C++
> threads primitive.

Thank you. The patch has been committed.

Martin
  
Joseph Myers July 1, 2015, 9:27 p.m. UTC | #6
On Wed, 1 Jul 2015, Martin Sebor wrote:

> On 06/30/2015 06:06 PM, Rich Felker wrote:
> > On Tue, Jun 30, 2015 at 05:25:31PM -0600, Martin Sebor wrote:
> > > Are there any unresolved concerns with or objections to this
> > > patch?
> > > 
> > >    https://sourceware.org/ml/libc-alpha/2015-06/msg00348.html
> > 
> > I still object to enshrining this usage as a supported feature, but I
> > don't object to the patch. In particular I think GCC should fix its
> > invalid usage of pthread_once to implement the 'equivalent' C++
> > threads primitive.
> 
> Thank you. The patch has been committed.

If the patch fully fixes the bug, please close it as fixed.
  
Szabolcs Nagy July 6, 2015, 1:18 p.m. UTC | #7
On 09/06/15 20:43, Martin Sebor wrote:
> Attached is an updated version of the patch that addresses
> the LDFLAGS -> LDLIBS comment. Retested on ppc64.
> 
> Is it okay to commit?
> 
> Martin
> 
> On 05/31/2015 03:37 PM, Martin Sebor wrote:
>> > The C++ 2011 std::call_once function is specified to allow
>> > the initialization routine to exit by throwing an exception.
>> > Such an execution, termed exceptional, requires call_once to
>> > propagate the exception to its caller. A program may contain
>> > any number of exceptional executions but only one returning
>> > execution (which, if it exists, must be the last execution
>> > with the same once flag).
>> >
>> > On POSIX systems such as Linux std::call_once is implemented
>> > in terms of pthread_once. However, as discussed in libstdc++
>> > bug 66146 - "call_once not C++11-compliant on ppc64le," GLIBC's
>> > pthread_once hangs when the initialization function exits by
>> > throwing an exception on at least arm and ppc64 (though
>> > apparently not on x86_64). This effectively prevents call_once
>> > from conforming to the C++ requirements since there doesn't
>> > appear to be a thread-safe way to work around this problem in
>> > libstdc++.
>> >
>> > The attached patch changes pthread_once to handle gracefully
>> > init functions that exit by throwing exceptions. It has been
>> > tested on ppc64, ppc64le, and x86_64 with no regressions.
...
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 84a7105..72d3e23 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -536,16 +536,9 @@ extern void __librt_disable_asynccancel (int oldtype)
>  extern void __pthread_cleanup_push (struct _pthread_cleanup_buffer *buffer,
>  				    void (*routine) (void *), void *arg)
>       attribute_hidden;
> -# undef pthread_cleanup_push
> -# define pthread_cleanup_push(routine,arg) \
> -  { struct _pthread_cleanup_buffer _buffer;				      \
> -    __pthread_cleanup_push (&_buffer, (routine), (arg));
>  
>  extern void __pthread_cleanup_pop (struct _pthread_cleanup_buffer *buffer,
>  				   int execute) attribute_hidden;
> -# undef pthread_cleanup_pop
> -# define pthread_cleanup_pop(execute) \
> -    __pthread_cleanup_pop (&_buffer, (execute)); }
>  #endif
>  
>  extern void __pthread_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer,

this broke

nptl/tst-join5
nptl/tst-once3

tests on aarch64.

the cleanup handler of the pthread_once and pthread_join
implementation don't run when they are canceled.
  
Martin Sebor July 6, 2015, 2:16 p.m. UTC | #8
> this broke
>
> nptl/tst-join5
> nptl/tst-once3
>
> tests on aarch64.
>
> the cleanup handler of the pthread_once and pthread_join
> implementation don't run when they are canceled.

I'll look into it as soon as I get access to an aarch64 machine.

Martin
  
Adhemerval Zanella July 6, 2015, 2:58 p.m. UTC | #9
On 06-07-2015 11:16, Martin Sebor wrote:
>> this broke
>>
>> nptl/tst-join5
>> nptl/tst-once3
>>
>> tests on aarch64.
>>
>> the cleanup handler of the pthread_once and pthread_join
>> implementation don't run when they are canceled.
> 
> I'll look into it as soon as I get access to an aarch64 machine.
> 
> Martin
> 

And I see a regression with

nptl/tst-once3

for armhf.
  
Szabolcs Nagy July 6, 2015, 4:33 p.m. UTC | #10
On 06/07/15 15:58, Adhemerval Zanella wrote:
> On 06-07-2015 11:16, Martin Sebor wrote:
>>> this broke
>>>
>>> nptl/tst-join5
>>> nptl/tst-once3
>>>
>>> tests on aarch64.
>>>
>>> the cleanup handler of the pthread_once and pthread_join
>>> implementation don't run when they are canceled.
>>
>> I'll look into it as soon as I get access to an aarch64 machine.
>>
>> Martin
>>
> 
> And I see a regression with
> 
> nptl/tst-once3
> 
> for armhf.
> 

in case of aarch64 the bug is somewhere in __pthread_unwind
(called from __do_cancel) so probably a libgcc issue.
  
Szabolcs Nagy July 6, 2015, 5:08 p.m. UTC | #11
On 06/07/15 17:33, Szabolcs Nagy wrote:
> On 06/07/15 15:58, Adhemerval Zanella wrote:
>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>> this broke
>>>>
>>>> nptl/tst-join5
>>>> nptl/tst-once3
>>>>
>>>> tests on aarch64.
>>>>
>>>> the cleanup handler of the pthread_once and pthread_join
>>>> implementation don't run when they are canceled.
>>>
>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>
>>> Martin
>>>
>>
>> And I see a regression with
>>
>> nptl/tst-once3
>>
>> for armhf.
>>
> 
> in case of aarch64 the bug is somewhere in __pthread_unwind
> (called from __do_cancel) so probably a libgcc issue.
> 

the problem seems to be that gcc on x86_64 turns on
-fasynchronous-unwind-tables by default, but not on
aarch64 or arm.

now i added -fasynchronous-unwind-tables to the cflags
of the relevant tests, will send a patch if they pass.
  
Szabolcs Nagy July 8, 2015, 11 a.m. UTC | #12
On 06/07/15 18:08, Szabolcs Nagy wrote:
> On 06/07/15 17:33, Szabolcs Nagy wrote:
>> On 06/07/15 15:58, Adhemerval Zanella wrote:
>>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>>> this broke
>>>>>
>>>>> nptl/tst-join5
>>>>> nptl/tst-once3
>>>>>
>>>>> tests on aarch64.
>>>>>
>>>>> the cleanup handler of the pthread_once and pthread_join
>>>>> implementation don't run when they are canceled.
>>>>
>>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>>
>>>> Martin
>>>>
>>>
>>> And I see a regression with
>>>
>>> nptl/tst-once3
>>>
>>> for armhf.
>>>
>>
>> in case of aarch64 the bug is somewhere in __pthread_unwind
>> (called from __do_cancel) so probably a libgcc issue.
>>
> 
> the problem seems to be that gcc on x86_64 turns on
> -fasynchronous-unwind-tables by default, but not on
> aarch64 or arm.
> 
> now i added -fasynchronous-unwind-tables to the cflags
> of the relevant tests, will send a patch if they pass.
> 

This uncovered a serious issue that affects other archs too.

Both test failures are caused by glibc switching the internal
mechanism of pthread cancellation clean up handling to use
__attribute__((cleanup(f))) and -fexceptions, but the two test
failures are independent:

(1) Should -fasynchronous-unwind-tables be on by default in gcc?

nptl/tst-once3 fails because the callback passed to pthread_once
now has to be compiled with -fasynchronous-unwind-tables which
is not on by default on arm and aarch64 gcc.  So does glibc
expect the users to use this flag correctly or does glibc
requires the compiler to have it on by default?

(My understanding: posix conforming c code cannot observe the
presence of -fasynchronous-unwind-tables without invoking UB, but
the glibc implementation of cancellation cleanup and backtrace
from signal handlers makes this detail observable.  Any function
which may be canceled needs this flag to make cleanup work, so
glibc seems to impose this as a requirement on the compiler: the
user may not be in control of all the code that may be canceled).


(2) Should gcc support exceptions from async signal handlers?

nptl/tst-join5 failure is more problematic: it fails because gcc
does not seem to implement -fexceptions with the assumption that
signal handlers can throw, in particular it assumes inline asm
does not throw exceptions.  If the syscall that is a cancellation
point appears between pthread_cleanup_push and pthread_cleanup_pop
in glibc internal code, the cleanup handler may not get run on
cancellation depending on where gcc moved the syscall inline asm.
(It is free to move it outside the code range that is marked for
exception handling, this is what happens on aarch64 in pthread_join).
This affects all archs, but some may get lucky.

(My understanding: gcc must be very strict about how it marks
the code range for exception handling and assume any instruction
may throw if it wants -fexceptions -fasynchronous-unwind-tables to
work from signal handlers.  Current compilers do not seem to support
this so glibc internal code should not rely on it, which means the
cancellation mechanism should not rely on exception handling at
least not when the exception is thrown from the cancel signal
handler.  I think the gnu toolchain should not try to make pthread
cancellation to interoperate with C++ exceptions nor to make
exceptions work from signal handlers: no standard requires this
behaviour and seems to cause problems).


Both issues cause silent omission of cleanup handlers running
on cancellation, leaving libc internal state inconsistent.

The second issue may be worth discussing on the gcc list.
  
Carlos O'Donell July 8, 2015, 4:09 p.m. UTC | #13
On 07/08/2015 07:00 AM, Szabolcs Nagy wrote:
> On 06/07/15 18:08, Szabolcs Nagy wrote:
>> On 06/07/15 17:33, Szabolcs Nagy wrote:
>>> On 06/07/15 15:58, Adhemerval Zanella wrote:
>>>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>>>> this broke
>>>>>>
>>>>>> nptl/tst-join5
>>>>>> nptl/tst-once3
>>>>>>
>>>>>> tests on aarch64.
>>>>>>
>>>>>> the cleanup handler of the pthread_once and pthread_join
>>>>>> implementation don't run when they are canceled.
>>>>>
>>>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> And I see a regression with
>>>>
>>>> nptl/tst-once3
>>>>
>>>> for armhf.
>>>>
>>>
>>> in case of aarch64 the bug is somewhere in __pthread_unwind
>>> (called from __do_cancel) so probably a libgcc issue.
>>>
>>
>> the problem seems to be that gcc on x86_64 turns on
>> -fasynchronous-unwind-tables by default, but not on
>> aarch64 or arm.
>>
>> now i added -fasynchronous-unwind-tables to the cflags
>> of the relevant tests, will send a patch if they pass.
>>
> 
> This uncovered a serious issue that affects other archs too.

Thanks.

> Both test failures are caused by glibc switching the internal
> mechanism of pthread cancellation clean up handling to use
> __attribute__((cleanup(f))) and -fexceptions, but the two test
> failures are independent:
> 
> (1) Should -fasynchronous-unwind-tables be on by default in gcc?
> 
> nptl/tst-once3 fails because the callback passed to pthread_once
> now has to be compiled with -fasynchronous-unwind-tables which
> is not on by default on arm and aarch64 gcc.  So does glibc
> expect the users to use this flag correctly or does glibc
> requires the compiler to have it on by default?

This is bad.

> (My understanding: posix conforming c code cannot observe the
> presence of -fasynchronous-unwind-tables without invoking UB, but
> the glibc implementation of cancellation cleanup and backtrace
> from signal handlers makes this detail observable.  Any function
> which may be canceled needs this flag to make cleanup work, so
> glibc seems to impose this as a requirement on the compiler: the
> user may not be in control of all the code that may be canceled).
 
We already impose the requirement that all such called code be
cancel safe anyway and it might not be unless all called code
uses cancel handlers to cleanup during cancellation. This would
be another requirement that imposes -fasynchronous-unwind-tables
on cancellation users. However, this is a new requirement and
old code can't be fixed, and thus we have problem that requires
versioning and documentation. All for the purposes of implementing
C++ std::call_once via pthread_once, which seems like is going
to be problematic.
 
> (2) Should gcc support exceptions from async signal handlers?

No. I don't think you can support it safely.

> nptl/tst-join5 failure is more problematic: it fails because gcc
> does not seem to implement -fexceptions with the assumption that
> signal handlers can throw, in particular it assumes inline asm
> does not throw exceptions.  If the syscall that is a cancellation
> point appears between pthread_cleanup_push and pthread_cleanup_pop
> in glibc internal code, the cleanup handler may not get run on
> cancellation depending on where gcc moved the syscall inline asm.
> (It is free to move it outside the code range that is marked for
> exception handling, this is what happens on aarch64 in pthread_join).
> This affects all archs, but some may get lucky.

Ah! That's truly a terrible scenario.

> (My understanding: gcc must be very strict about how it marks
> the code range for exception handling and assume any instruction
> may throw if it wants -fexceptions -fasynchronous-unwind-tables to
> work from signal handlers.  Current compilers do not seem to support
> this so glibc internal code should not rely on it, which means the
> cancellation mechanism should not rely on exception handling at
> least not when the exception is thrown from the cancel signal
> handler.  I think the gnu toolchain should not try to make pthread
> cancellation to interoperate with C++ exceptions nor to make
> exceptions work from signal handlers: no standard requires this
> behaviour and seems to cause problems).

No, we just need to revert this patch and have C++ implement
std::call_once by itself.
 
> Both issues cause silent omission of cleanup handlers running
> on cancellation, leaving libc internal state inconsistent.
> 
> The second issue may be worth discussing on the gcc list.
> 

Cheers,
Carlos.
  
Torvald Riegel July 8, 2015, 4:33 p.m. UTC | #14
On Wed, 2015-07-08 at 12:09 -0400, Carlos O'Donell wrote:
> On 07/08/2015 07:00 AM, Szabolcs Nagy wrote:
> > (2) Should gcc support exceptions from async signal handlers?
> 
> No. I don't think you can support it safely.
> 
> > nptl/tst-join5 failure is more problematic: it fails because gcc
> > does not seem to implement -fexceptions with the assumption that
> > signal handlers can throw, in particular it assumes inline asm
> > does not throw exceptions.  If the syscall that is a cancellation
> > point appears between pthread_cleanup_push and pthread_cleanup_pop
> > in glibc internal code, the cleanup handler may not get run on
> > cancellation depending on where gcc moved the syscall inline asm.
> > (It is free to move it outside the code range that is marked for
> > exception handling, this is what happens on aarch64 in pthread_join).
> > This affects all archs, but some may get lucky.
> 
> Ah! That's truly a terrible scenario.
> 
> > (My understanding: gcc must be very strict about how it marks
> > the code range for exception handling and assume any instruction
> > may throw if it wants -fexceptions -fasynchronous-unwind-tables to
> > work from signal handlers.  Current compilers do not seem to support
> > this so glibc internal code should not rely on it, which means the
> > cancellation mechanism should not rely on exception handling at
> > least not when the exception is thrown from the cancel signal
> > handler.  I think the gnu toolchain should not try to make pthread
> > cancellation to interoperate with C++ exceptions nor to make
> > exceptions work from signal handlers: no standard requires this
> > behaviour and seems to cause problems).
> 
> No, we just need to revert this patch and have C++ implement
> std::call_once by itself.

Would point (2) be taken care of by Adhemerval's cancellation changes?
  
Szabolcs Nagy July 8, 2015, 4:52 p.m. UTC | #15
On 08/07/15 17:33, Torvald Riegel wrote:
> On Wed, 2015-07-08 at 12:09 -0400, Carlos O'Donell wrote:
>> On 07/08/2015 07:00 AM, Szabolcs Nagy wrote:
>>> (2) Should gcc support exceptions from async signal handlers?
>>
>> No. I don't think you can support it safely.
>>
>>> nptl/tst-join5 failure is more problematic: it fails because gcc
>>> does not seem to implement -fexceptions with the assumption that
>>> signal handlers can throw, in particular it assumes inline asm
>>> does not throw exceptions.  If the syscall that is a cancellation
>>> point appears between pthread_cleanup_push and pthread_cleanup_pop
>>> in glibc internal code, the cleanup handler may not get run on
>>> cancellation depending on where gcc moved the syscall inline asm.
>>> (It is free to move it outside the code range that is marked for
>>> exception handling, this is what happens on aarch64 in pthread_join).
>>> This affects all archs, but some may get lucky.
>>
>> Ah! That's truly a terrible scenario.
>>
>>> (My understanding: gcc must be very strict about how it marks
>>> the code range for exception handling and assume any instruction
>>> may throw if it wants -fexceptions -fasynchronous-unwind-tables to
>>> work from signal handlers.  Current compilers do not seem to support
>>> this so glibc internal code should not rely on it, which means the
>>> cancellation mechanism should not rely on exception handling at
>>> least not when the exception is thrown from the cancel signal
>>> handler.  I think the gnu toolchain should not try to make pthread
>>> cancellation to interoperate with C++ exceptions nor to make
>>> exceptions work from signal handlers: no standard requires this
>>> behaviour and seems to cause problems).
>>
>> No, we just need to revert this patch and have C++ implement
>> std::call_once by itself.
> 
> Would point (2) be taken care of by Adhemerval's cancellation changes?
> 

yes: if the cancel point syscall is not inline asm,
but extern call (that is not marked with nothrow)
then gcc -fexceptions should handle it correctly.

asynchronous cancellation is still problematic,
but that is a special case.
  
Carlos O'Donell July 8, 2015, 5:14 p.m. UTC | #16
On 07/08/2015 12:52 PM, Szabolcs Nagy wrote:
> On 08/07/15 17:33, Torvald Riegel wrote:
>> On Wed, 2015-07-08 at 12:09 -0400, Carlos O'Donell wrote:
>>> On 07/08/2015 07:00 AM, Szabolcs Nagy wrote:
>>>> (2) Should gcc support exceptions from async signal handlers?
>>>
>>> No. I don't think you can support it safely.
>>>
>>>> nptl/tst-join5 failure is more problematic: it fails because gcc
>>>> does not seem to implement -fexceptions with the assumption that
>>>> signal handlers can throw, in particular it assumes inline asm
>>>> does not throw exceptions.  If the syscall that is a cancellation
>>>> point appears between pthread_cleanup_push and pthread_cleanup_pop
>>>> in glibc internal code, the cleanup handler may not get run on
>>>> cancellation depending on where gcc moved the syscall inline asm.
>>>> (It is free to move it outside the code range that is marked for
>>>> exception handling, this is what happens on aarch64 in pthread_join).
>>>> This affects all archs, but some may get lucky.
>>>
>>> Ah! That's truly a terrible scenario.
>>>
>>>> (My understanding: gcc must be very strict about how it marks
>>>> the code range for exception handling and assume any instruction
>>>> may throw if it wants -fexceptions -fasynchronous-unwind-tables to
>>>> work from signal handlers.  Current compilers do not seem to support
>>>> this so glibc internal code should not rely on it, which means the
>>>> cancellation mechanism should not rely on exception handling at
>>>> least not when the exception is thrown from the cancel signal
>>>> handler.  I think the gnu toolchain should not try to make pthread
>>>> cancellation to interoperate with C++ exceptions nor to make
>>>> exceptions work from signal handlers: no standard requires this
>>>> behaviour and seems to cause problems).
>>>
>>> No, we just need to revert this patch and have C++ implement
>>> std::call_once by itself.
>>
>> Would point (2) be taken care of by Adhemerval's cancellation changes?
>>
> 
> yes: if the cancel point syscall is not inline asm,
> but extern call (that is not marked with nothrow)
> then gcc -fexceptions should handle it correctly.
> 
> asynchronous cancellation is still problematic,
> but that is a special case.

And we still have to support that case which makes this change
a net loss of functionality. Therefore I think we need to revert
this and try again 2.23.

Cheers,
Carlos.
  
Jakub Jelinek March 2, 2021, 4:43 p.m. UTC | #17
On Wed, Jul 08, 2015 at 12:09:49PM -0400, Carlos O'Donell wrote:
> > (My understanding: gcc must be very strict about how it marks
> > the code range for exception handling and assume any instruction
> > may throw if it wants -fexceptions -fasynchronous-unwind-tables to
> > work from signal handlers.  Current compilers do not seem to support
> > this so glibc internal code should not rely on it, which means the
> > cancellation mechanism should not rely on exception handling at
> > least not when the exception is thrown from the cancel signal
> > handler.  I think the gnu toolchain should not try to make pthread
> > cancellation to interoperate with C++ exceptions nor to make
> > exceptions work from signal handlers: no standard requires this
> > behaviour and seems to cause problems).
> 
> No, we just need to revert this patch and have C++ implement
> std::call_once by itself.

As has been found, it is unfortunately impossible for C++ to implement it by
itself without breaking ABI.
What libstdc++ needs is either that a modified version of Martin's patch
is committed again, or a new entry-point which is ABI compatible with
pthread_once but ensures the exception behavior is added (or both,
make pthread_once work with exceptions and add an alias to pthread_once
that guarantees that behavior).

The only bug I see in Martin's patch is that he has removed the overrides
altogether.
sysdeps/generic/nptl/pthread.h
defines 3 different implementations of pthread_cleanup_{push,pop}:
1) C++ -fexceptions (using dtor)
2) C -fexceptions (using cleanup attribute)
3) -fno-exceptions one
The override in pthreadP.h which is used only if IS_IN_libpthread is
meant as a better replacement for 3), but isn't a good replacement for 2).

The overrides in pthreadP.h are only used by pthread_once.c, sem_*wait.c and
pthread_join_common.c.
Seems currently everything but pthread_join_common.c is compiled with
-fexceptions -fasynchronous-unwind-tables, so guarding the pthreadP.h
2 #defines (but probably not the function declarations) with
#if !defined(__GNU__) || !defined(__EXCEPTIONS)
seems the way to go to me.

And, perhaps incrementally consider compiling pthread_join_common.c
with -fexceptions -fasynchronous-unwind-tables too.
The pthread_once.c stuff is most important, sure, but if my fuzzy memory
serves me a little, I think for pthread_cancel once it hits a first
non-dtor/cleanup registered cleanup it will just handle the
3) method registered ones and not others, while if it uses the 1)/2)
cleanups, it will continue doing that until it reaches some 3) registered
ones if any.

	Jakub
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 353b383..1cd79e8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@ 
+2015-06-09  Martin Sebor  <msebor@redhat.com>
+
+	[BZ #18435]
+	* nptl/Makefile: Add tst-once5.cc.
+	* nptl/pthreadP.h (pthread_cleanup_push, pthread_cleanup_pop):
+	Remove macro redefinitions.
+	* nptl/tst-once5.cc: New test.
+
 2015-06-09  Andrew Senkevich  <andrew.senkevich@intel.com>
 
 	* sysdeps/x86_64/fpu/Makefile: New file.
diff --git a/NEWS b/NEWS
index 53f244d..196cf3f 100644
--- a/NEWS
+++ b/NEWS
@@ -19,8 +19,8 @@  Version 2.22
   18043, 18046, 18047, 18049, 18068, 18080, 18093, 18100, 18104, 18110,
   18111, 18116, 18125, 18128, 18138, 18185, 18196, 18197, 18206, 18210,
   18211, 18217, 18220, 18221, 18234, 18244, 18247, 18287, 18319, 18324,
-  18333, 18346, 18397, 18409, 18410, 18412, 18418, 18422, 18434, 18444,
-  18468, 18469, 18470, 18483, 18495, 18496, 18498.
+  18333, 18346, 18397, 18409, 18410, 18412, 18418, 18422, 18434, 18435,
+  18444, 18468, 18469, 18470, 18483, 18495, 18496, 18498.
 
 * Cache information can be queried via sysconf() function on s390 e.g. with
   _SC_LEVEL1_ICACHE_SIZE as argument.
diff --git a/nptl/Makefile b/nptl/Makefile
index d58324d..00aedae 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -210,6 +210,7 @@  CFLAGS-recvfrom.c = -fexceptions -fasynchronous-unwind-tables
 
 CFLAGS-pt-system.c = -fexceptions
 
+LDLIBS-tst-once5 = -lstdc++
 
 tests = tst-typesizes \
 	tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
@@ -232,7 +233,7 @@  tests = tst-typesizes \
 	tst-rwlock5 tst-rwlock6 tst-rwlock7 tst-rwlock8 tst-rwlock9 \
 	tst-rwlock10 tst-rwlock11 tst-rwlock12 tst-rwlock13 tst-rwlock14 \
 	tst-rwlock15 tst-rwlock16 \
-	tst-once1 tst-once2 tst-once3 tst-once4 \
+	tst-once1 tst-once2 tst-once3 tst-once4 tst-once5 \
 	tst-key1 tst-key2 tst-key3 tst-key4 \
 	tst-sem1 tst-sem2 tst-sem3 tst-sem4 tst-sem5 tst-sem6 tst-sem7 \
 	tst-sem8 tst-sem9 tst-sem10 tst-sem11 tst-sem12 tst-sem13 tst-sem14 \
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 84a7105..72d3e23 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -536,16 +536,9 @@  extern void __librt_disable_asynccancel (int oldtype)
 extern void __pthread_cleanup_push (struct _pthread_cleanup_buffer *buffer,
 				    void (*routine) (void *), void *arg)
      attribute_hidden;
-# undef pthread_cleanup_push
-# define pthread_cleanup_push(routine,arg) \
-  { struct _pthread_cleanup_buffer _buffer;				      \
-    __pthread_cleanup_push (&_buffer, (routine), (arg));
 
 extern void __pthread_cleanup_pop (struct _pthread_cleanup_buffer *buffer,
 				   int execute) attribute_hidden;
-# undef pthread_cleanup_pop
-# define pthread_cleanup_pop(execute) \
-    __pthread_cleanup_pop (&_buffer, (execute)); }
 #endif
 
 extern void __pthread_cleanup_push_defer (struct _pthread_cleanup_buffer *buffer,
diff --git a/nptl/tst-once5.cc b/nptl/tst-once5.cc
new file mode 100644
index 0000000..60bc78a
--- /dev/null
+++ b/nptl/tst-once5.cc
@@ -0,0 +1,80 @@ 
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <string.h>
+
+
+static pthread_once_t once = PTHREAD_ONCE_INIT;
+
+// Exception type thrown from the pthread_once init routine.
+struct OnceException { };
+
+// Test iteration counter.
+static int niter;
+
+static void
+init_routine (void)
+{
+  if (niter < 2)
+    throw OnceException ();
+}
+
+// Verify that an exception thrown from the pthread_once init routine
+// is propagated to the pthread_once caller and that the function can
+// be subsequently invoked to attempt the initialization again.
+static int
+do_test (void)
+{
+  int result = 1;
+
+  // Repeat three times, having the init routine throw the first two
+  // times and succeed on the final attempt.
+  for (niter = 0; niter != 3; ++niter) {
+
+    try {
+      int rc = pthread_once (&once, init_routine);
+      if (rc)
+        fprintf (stderr, "pthread_once failed: %i (%s)\n",
+                 rc, strerror (rc));
+
+      if (niter < 2)
+        fputs ("pthread_once unexpectedly returned without"
+               " throwing an exception", stderr);
+    }
+    catch (OnceException) {
+      if (1 < niter)
+        fputs ("pthread_once unexpectedly threw", stderr);
+      result = 0;
+    }
+    catch (...) {
+      fputs ("pthread_once threw an unknown exception", stderr);
+    }
+
+    // Abort the test on the first failure.
+    if (result)
+      break;
+  }
+
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"