Tests for minimal signal handler functionality in MINSIGSTKSZ space.

Message ID CAMe9rOqCO0A8cCzqi2DjZqq+1HM9wHsx0t97WGq2N3vuGp=Hmg@mail.gmail.com
State Committed
Headers

Commit Message

H.J. Lu Jan. 18, 2019, 2:53 p.m. UTC
  On Thu, Jan 17, 2019 at 7:12 PM Zack Weinberg <zackw@panix.com> wrote:
>
> On Wed, Jan 16, 2019 at 10:31 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > The new tests failed on AVX512 machines:
> >
> > Program received signal SIGUSR1, User defined signal 1.
> > __GI_raise (sig=sig@entry=10) at ../sysdeps/unix/sysv/linux/raise.c:50
> > 50   return ret;
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:93
> > 93 movq %rax, REGISTER_SAVE_RAX(%rsp)
> > (gdb) bt
> > #0  _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:93
> > #1  0x0040248d in handler (unused=<optimized out>) at tst-minsigstksz-4.c:44
> > #2  <signal handler called>
> > #3  __GI_raise (sig=sig@entry=10) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #4  0x004024da in do_test () at tst-minsigstksz-4.c:59
> > #5  0x00402cd6 in support_test_main (argc=1, argv=0xffffcef8,
> >     config=config@entry=0xffffcdf0) at support_test_main.c:350
> > #6  0x00402348 in main (argc=<optimized out>, argv=<optimized out>)
> >     at ../support/test-driver.c:168
> > (gdb)
> >
> > AVX512 needs 2560 bytes to save processor state.
>
> Well, this is the problem that we knew existed and can't fix quickly.
> If I'm reading http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
> correctly, we can't introduce new sysconf constants unilaterally
> (there is no license to define system-specific _SC_* symbols).
>
> I wonder if this test passes if you run it with LD_BIND_NOW=t in the
> environment.  Forcing -z now for these tests might be the best we can
> do for 2.29.
>

This works:
  

Comments

Carlos O'Donell Jan. 18, 2019, 4:40 p.m. UTC | #1
On 1/18/19 9:53 AM, H.J. Lu wrote:
> On Thu, Jan 17, 2019 at 7:12 PM Zack Weinberg <zackw@panix.com> wrote:
>>
>> On Wed, Jan 16, 2019 at 10:31 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>> The new tests failed on AVX512 machines:
>>>
>>> Program received signal SIGUSR1, User defined signal 1.
>>> __GI_raise (sig=sig@entry=10) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> 50   return ret;
>>> (gdb) c
>>> Continuing.
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:93
>>> 93 movq %rax, REGISTER_SAVE_RAX(%rsp)
>>> (gdb) bt
>>> #0  _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:93
>>> #1  0x0040248d in handler (unused=<optimized out>) at tst-minsigstksz-4.c:44
>>> #2  <signal handler called>
>>> #3  __GI_raise (sig=sig@entry=10) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> #4  0x004024da in do_test () at tst-minsigstksz-4.c:59
>>> #5  0x00402cd6 in support_test_main (argc=1, argv=0xffffcef8,
>>>     config=config@entry=0xffffcdf0) at support_test_main.c:350
>>> #6  0x00402348 in main (argc=<optimized out>, argv=<optimized out>)
>>>     at ../support/test-driver.c:168
>>> (gdb)
>>>
>>> AVX512 needs 2560 bytes to save processor state.
>>
>> Well, this is the problem that we knew existed and can't fix quickly.
>> If I'm reading http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
>> correctly, we can't introduce new sysconf constants unilaterally
>> (there is no license to define system-specific _SC_* symbols).
>>
>> I wonder if this test passes if you run it with LD_BIND_NOW=t in the
>> environment.  Forcing -z now for these tests might be the best we can
>> do for 2.29.
>>
> 
> This works:
> 
> diff --git a/signal/Makefile b/signal/Makefile
> index 9597287bca..9c65c26887 100644
> --- a/signal/Makefile
> +++ b/signal/Makefile
> @@ -59,3 +59,9 @@ CFLAGS-sigwait.c += -fexceptions -fasynchronous-unwind-tables
>  CFLAGS-sigwaitinfo.c += -fexceptions -fasynchronous-unwind-tables
> 
>  CFLAGS-sigreturn.c += $(no-stack-protector)
> +

OK with comment:

# We don't want to test the lazy resolution stack usage
# just the execution of the handler and the functions
> +LDFLAGS-tst-minsigstksz-1 = -Wl,-z,now
> +LDFLAGS-tst-minsigstksz-2 = -Wl,-z,now
> +LDFLAGS-tst-minsigstksz-3 = -Wl,-z,now
> +LDFLAGS-tst-minsigstksz-3a = -Wl,-z,now
> +LDFLAGS-tst-minsigstksz-4 = -Wl,-z,now
> 

I'd say this is OK for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

All of this raises an interesting point. Should MINSIGSTKSZ
have included enough space for the lazy resolution to happen?
I think we have to, because you're never going to have already
called abort, quick_exit, or _exit, so they will all go through
lazy binding resolution if you're not BIND_NOW. Which means we
need an average estimate from all arches about the lazy binding
stack usage.
  
Zack Weinberg Jan. 18, 2019, 6:08 p.m. UTC | #2
On Fri, Jan 18, 2019 at 11:40 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> All of this raises an interesting point. Should MINSIGSTKSZ
> have included enough space for the lazy resolution to happen?
> I think we have to, because you're never going to have already
> called abort, quick_exit, or _exit, so they will all go through
> lazy binding resolution if you're not BIND_NOW. Which means we
> need an average estimate from all arches about the lazy binding
> stack usage.

in the long term, this strikes me as another reason we should be
thinking about making eager symbol resolution the default ... (and
then we could start thinking about moving the dynamic loader out of
process ...)

in the medium term, though, I completely agree

in the 2.29 term, though, perhaps the best we can do is some
documentation, I'll have a go at writing that in the next couple days

zw
  
H.J. Lu Jan. 18, 2019, 7:49 p.m. UTC | #3
On Fri, Jan 18, 2019 at 10:08 AM Zack Weinberg <zackw@panix.com> wrote:
>
> On Fri, Jan 18, 2019 at 11:40 AM Carlos O'Donell <carlos@redhat.com> wrote:
> >
> > All of this raises an interesting point. Should MINSIGSTKSZ
> > have included enough space for the lazy resolution to happen?
> > I think we have to, because you're never going to have already
> > called abort, quick_exit, or _exit, so they will all go through
> > lazy binding resolution if you're not BIND_NOW. Which means we
> > need an average estimate from all arches about the lazy binding
> > stack usage.
>
> in the long term, this strikes me as another reason we should be
> thinking about making eager symbol resolution the default ... (and
> then we could start thinking about moving the dynamic loader out of
> process ...)
>
> in the medium term, though, I completely agree
>
> in the 2.29 term, though, perhaps the best we can do is some
> documentation, I'll have a go at writing that in the next couple days
>

The 32-bit signal/tst-minsigstksz-1 failed on AVX512 machine:

Program received signal SIGSEGV, Segmentation fault.
0xf7e2e366 in __GI___libc_sigaction (sig=10, act=0xf7cf70a8, oact=0xf7cf7134)
    at ../sysdeps/unix/sysv/linux/sigaction.c:48
48   if (act)
(gdb) disass
Dump of assembler code for function __GI___libc_sigaction:
   0xf7e2e350 <+0>: sub    $0x14c,%esp
   0xf7e2e356 <+6>: xor    %ecx,%ecx
   0xf7e2e358 <+8>: mov    %esi,0x140(%esp)
   0xf7e2e35f <+15>: mov    0x154(%esp),%edx
=> 0xf7e2e366 <+22>: call   0xf7f44c2f <__x86.get_pc_thunk.si>

For 64-bit:

Breakpoint 3, __GI___libc_sigaction (sig=sig@entry=10,
    act=act@entry=0x7fffffffdb40, oact=oact@entry=0x0)
    at ../sysdeps/unix/sysv/linux/sigaction.c:48
48   if (act)
(gdb) disass
Dump of assembler code for function __GI___libc_sigaction:
=> 0x00007ffff7e4dc50 <+0>: sub    $0xd0,%rsp
   0x00007ffff7e4dc57 <+7>: mov    %rdx,%r8
   0x00007ffff7e4dc5a <+10>: test   %rsi,%rsi
   0x00007ffff7e4dc5d <+13>: je     0x7ffff7e4ddb0 <__GI___libc_sigaction+352>
   0x00007ffff7e4dc63 <+19>: mov    (%rsi),%rax

64-bit allocates smaller stack and it doesn't need to call
__x86.get_pc_thunk.si.  On AVX512, kernel needs larger
stack space to save signal context.
  
Carlos O'Donell Jan. 21, 2019, 6:45 p.m. UTC | #4
On 1/18/19 2:49 PM, H.J. Lu wrote:
> On Fri, Jan 18, 2019 at 10:08 AM Zack Weinberg <zackw@panix.com> wrote:
>>
>> On Fri, Jan 18, 2019 at 11:40 AM Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>> All of this raises an interesting point. Should MINSIGSTKSZ
>>> have included enough space for the lazy resolution to happen?
>>> I think we have to, because you're never going to have already
>>> called abort, quick_exit, or _exit, so they will all go through
>>> lazy binding resolution if you're not BIND_NOW. Which means we
>>> need an average estimate from all arches about the lazy binding
>>> stack usage.
>>
>> in the long term, this strikes me as another reason we should be
>> thinking about making eager symbol resolution the default ... (and
>> then we could start thinking about moving the dynamic loader out of
>> process ...)
>>
>> in the medium term, though, I completely agree
>>
>> in the 2.29 term, though, perhaps the best we can do is some
>> documentation, I'll have a go at writing that in the next couple days
>>
> 
> The 32-bit signal/tst-minsigstksz-1 failed on AVX512 machine:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0xf7e2e366 in __GI___libc_sigaction (sig=10, act=0xf7cf70a8, oact=0xf7cf7134)
>     at ../sysdeps/unix/sysv/linux/sigaction.c:48
> 48   if (act)
> (gdb) disass
> Dump of assembler code for function __GI___libc_sigaction:
>    0xf7e2e350 <+0>: sub    $0x14c,%esp
>    0xf7e2e356 <+6>: xor    %ecx,%ecx
>    0xf7e2e358 <+8>: mov    %esi,0x140(%esp)
>    0xf7e2e35f <+15>: mov    0x154(%esp),%edx
> => 0xf7e2e366 <+22>: call   0xf7f44c2f <__x86.get_pc_thunk.si>
> 
> For 64-bit:
> 
> Breakpoint 3, __GI___libc_sigaction (sig=sig@entry=10,
>     act=act@entry=0x7fffffffdb40, oact=oact@entry=0x0)
>     at ../sysdeps/unix/sysv/linux/sigaction.c:48
> 48   if (act)
> (gdb) disass
> Dump of assembler code for function __GI___libc_sigaction:
> => 0x00007ffff7e4dc50 <+0>: sub    $0xd0,%rsp
>    0x00007ffff7e4dc57 <+7>: mov    %rdx,%r8
>    0x00007ffff7e4dc5a <+10>: test   %rsi,%rsi
>    0x00007ffff7e4dc5d <+13>: je     0x7ffff7e4ddb0 <__GI___libc_sigaction+352>
>    0x00007ffff7e4dc63 <+19>: mov    (%rsi),%rax
> 
> 64-bit allocates smaller stack and it doesn't need to call
> __x86.get_pc_thunk.si.  On AVX512, kernel needs larger
> stack space to save signal context.
 
I think these should be marked as XFAIL for x86, if the tests
can't pass with BIND_NOW then there is a real problem here
that needs fixing in the next release cycle.
  

Patch

diff --git a/signal/Makefile b/signal/Makefile
index 9597287bca..9c65c26887 100644
--- a/signal/Makefile
+++ b/signal/Makefile
@@ -59,3 +59,9 @@  CFLAGS-sigwait.c += -fexceptions -fasynchronous-unwind-tables
 CFLAGS-sigwaitinfo.c += -fexceptions -fasynchronous-unwind-tables

 CFLAGS-sigreturn.c += $(no-stack-protector)
+
+LDFLAGS-tst-minsigstksz-1 = -Wl,-z,now
+LDFLAGS-tst-minsigstksz-2 = -Wl,-z,now
+LDFLAGS-tst-minsigstksz-3 = -Wl,-z,now
+LDFLAGS-tst-minsigstksz-3a = -Wl,-z,now
+LDFLAGS-tst-minsigstksz-4 = -Wl,-z,now