Supress SIGTTOU when handling errors

Message ID B18A0439-729A-4F8E-AB7E-F4173A1D9BC6@arm.com
State New, archived
Headers

Commit Message

Alan Hayward May 17, 2019, 12:46 p.m. UTC
  > On 16 May 2019, at 19:06, Andrew Burgess <andrew.burgess@embecosm.com> wrote:
> 
> * Alan Hayward <Alan.Hayward@arm.com> [2019-05-16 15:51:53 +0000]:
> 
>> [I've seen this on and off over many months on AArch64 and Arm, and am
>> assuming it isn't the intended behaviour. Not sure if this should be at
>> tcdrain or it should be done at a higher level - eg in the terminal
>> handling code]
>> 
>> Calls to error () can cause SIGTTOU to send gdb to the background.
>> 
>> For example, on an Arm build:
>>  (gdb) b main
>>  Breakpoint 1 at 0x10774: file /build/gdb/testsuite/../../../src/binutils-gdb/gdb/testsuite/gdb.base/watchpoint.c, line 174.
>>  (gdb) r
>>  Starting program: /build/gdb/testsuite/outputs/gdb.base/watchpoint/watchpoint
>> 
>>  [1]+  Stopped                 ../gdb ./outputs/gdb.base/watchpoint/watchpoint
>>  localhost$ fg
>>  ../gdb ./outputs/gdb.base/watchpoint/watchpoint
>>  Cannot parse expression `.L1199 4@r4'.
>>  warning: Probes-based dynamic linker interface failed.
>>  Reverting to original interface.
>> 
>> The SIGTTOU is raised whilst inside a syscall during the call to tcdrain.
>> Fix is to use scoped_ignore_sigttou to ensure SIGTTOU is blocked.
>> 
>> In addition fix include comments - job_control is not included via
>> terminal.h
> 
> Maybe someone else knows this better, but this feels like the wrong
> solution to me.
> 
> As I understand it the problem you're seeing could be resolved by
> making sure that the terminal is correctly claimed by GDB at the point
> tcdrain is called.  This would require a call to either
> 'target_terminal::ours ()' or 'target_terminal::ours_for_output ()’.

That makes sense. Wasn’t aware about that code before. 

> 
> I've made several attempts to fix this in the past (non posted
> though), one problem I've previously run into that I've not yet
> understood is that calling ::ours_for_output doesn't seem to be enough
> to prevent the SIGTTOU (I assume from the tcdrain) and only calling
> ::ours is enough.


Playing about with the patch you posted, I also found that ::ours prevents
the signal, but ::ours_for_output doesn’t.  Looks like it’s due to the
following tcsetpgrp not happening:

inflow.c:child_terminal_ours_1 ()

if (job_control && desired_state == target_terminal_state::is_ours)
  {
#ifdef HAVE_TERMIOS_H
    result = tcsetpgrp (0, our_terminal_info.process_group);

> What I've been trying to figure out is what the intended difference
> between ::ours_for_output and ::ours actually is, if we can't always
> write output after calling ::ours_for_output.  Part of me wonders if
> we should just switch to using ::ours in all cases….

Agreed, I’m not sure of the intended differences here.

> 
> Anyway, I think most of the problems you're seeing should be fixed by
> claiming the terminal either permanently (calling ::ours or
> ::ours_for_output) or temporarily using
> target_terminal::scoped_restore_terminal_state.

The problem with that is finding all possible error cases and ensuring
they all all fixed up.

For example, my error was coming from the try/catch/exception_print
in solid-svr4.c:solib_event_probe_action ()

> 
> Like I said, I'm not an expert on this code, so maybe I've
> misunderstood the problem you're solving.
> 

We’re definitely looking at the same issue.

Working back up the call trace from where my change was, all the error
prints first end up in exceptions.c::print_flush () which already has
a call to ::ours_for_output.

I’ve posted two patches below. Both of them fix all my issues, including
your GDB_FAKE_ERROR case.

If ::ours_for_output and ::ours are working as intended, then the first
patch is probably the correct fix.

But, if ::ours_for_output and ::ours are not working as intended, then
possibly more investigation is required to know why patch 2 works.

Alan.



PATCH 1:

gdb/ChangeLog:

2019-05-17  Alan Hayward  <alan.hayward@arm.com>

        * exceptions.c (print_flush): Use target_terminal::ours.
  

Comments

Andrew Burgess May 18, 2019, 9:10 a.m. UTC | #1
* Alan Hayward <Alan.Hayward@arm.com> [2019-05-17 12:46:56 +0000]:

> 
> 
> > On 16 May 2019, at 19:06, Andrew Burgess <andrew.burgess@embecosm.com> wrote:
> > 
> > * Alan Hayward <Alan.Hayward@arm.com> [2019-05-16 15:51:53 +0000]:
> > 
> >> [I've seen this on and off over many months on AArch64 and Arm, and am
> >> assuming it isn't the intended behaviour. Not sure if this should be at
> >> tcdrain or it should be done at a higher level - eg in the terminal
> >> handling code]
> >> 
> >> Calls to error () can cause SIGTTOU to send gdb to the background.
> >> 
> >> For example, on an Arm build:
> >>  (gdb) b main
> >>  Breakpoint 1 at 0x10774: file /build/gdb/testsuite/../../../src/binutils-gdb/gdb/testsuite/gdb.base/watchpoint.c, line 174.
> >>  (gdb) r
> >>  Starting program: /build/gdb/testsuite/outputs/gdb.base/watchpoint/watchpoint
> >> 
> >>  [1]+  Stopped                 ../gdb ./outputs/gdb.base/watchpoint/watchpoint
> >>  localhost$ fg
> >>  ../gdb ./outputs/gdb.base/watchpoint/watchpoint
> >>  Cannot parse expression `.L1199 4@r4'.
> >>  warning: Probes-based dynamic linker interface failed.
> >>  Reverting to original interface.
> >> 
> >> The SIGTTOU is raised whilst inside a syscall during the call to tcdrain.
> >> Fix is to use scoped_ignore_sigttou to ensure SIGTTOU is blocked.
> >> 
> >> In addition fix include comments - job_control is not included via
> >> terminal.h
> > 
> > Maybe someone else knows this better, but this feels like the wrong
> > solution to me.
> > 
> > As I understand it the problem you're seeing could be resolved by
> > making sure that the terminal is correctly claimed by GDB at the point
> > tcdrain is called.  This would require a call to either
> > 'target_terminal::ours ()' or 'target_terminal::ours_for_output ()’.
> 
> That makes sense. Wasn’t aware about that code before. 
> 
> > 
> > I've made several attempts to fix this in the past (non posted
> > though), one problem I've previously run into that I've not yet
> > understood is that calling ::ours_for_output doesn't seem to be enough
> > to prevent the SIGTTOU (I assume from the tcdrain) and only calling
> > ::ours is enough.
> 
> 
> Playing about with the patch you posted, I also found that ::ours prevents
> the signal, but ::ours_for_output doesn’t.  Looks like it’s due to the
> following tcsetpgrp not happening:
> 
> inflow.c:child_terminal_ours_1 ()
> 
> if (job_control && desired_state == target_terminal_state::is_ours)
>   {
> #ifdef HAVE_TERMIOS_H
>     result = tcsetpgrp (0, our_terminal_info.process_group);
> 
> > What I've been trying to figure out is what the intended difference
> > between ::ours_for_output and ::ours actually is, if we can't always
> > write output after calling ::ours_for_output.  Part of me wonders if
> > we should just switch to using ::ours in all cases….
> 
> Agreed, I’m not sure of the intended differences here.
> 
> > 
> > Anyway, I think most of the problems you're seeing should be fixed by
> > claiming the terminal either permanently (calling ::ours or
> > ::ours_for_output) or temporarily using
> > target_terminal::scoped_restore_terminal_state.
> 
> The problem with that is finding all possible error cases and ensuring
> they all all fixed up.

This has bothered me too.  The requirement really is that we can't
have a call to error when the terminal is not ours _unless_ there's a
catch handler in place that will reacquire the terminal.

I've been trying to figure out a good way we could test this
condition, but so far I've not come up with anything good.

I think you're correct - in the short term we should assume that GDB
doesn't hold the above requirement, and ensure we grab the terminal at
the point the error is caught / printed.

> 
> For example, my error was coming from the try/catch/exception_print
> in solid-svr4.c:solib_event_probe_action ()
> 
> > 
> > Like I said, I'm not an expert on this code, so maybe I've
> > misunderstood the problem you're solving.
> > 
> 
> We’re definitely looking at the same issue.
> 
> Working back up the call trace from where my change was, all the error
> prints first end up in exceptions.c::print_flush () which already has
> a call to ::ours_for_output.
> 
> I’ve posted two patches below. Both of them fix all my issues, including
> your GDB_FAKE_ERROR case.
> 
> If ::ours_for_output and ::ours are working as intended, then the first
> patch is probably the correct fix.
> 
> But, if ::ours_for_output and ::ours are not working as intended, then
> possibly more investigation is required to know why patch 2 works.
> 
> Alan.
> 
> 
> 
> PATCH 1:
> 
> gdb/ChangeLog:
> 
> 2019-05-17  Alan Hayward  <alan.hayward@arm.com>
> 
>         * exceptions.c (print_flush): Use target_terminal::ours.
> 
> diff --git a/gdb/exceptions.c b/gdb/exceptions.c
> index ebdc71d98d..47bfb95153 100644
> --- a/gdb/exceptions.c
> +++ b/gdb/exceptions.c
> @@ -46,7 +46,7 @@ print_flush (void)
>    if (current_top_target () != NULL && target_supports_terminal_ours ())
>      {
>        term_state.emplace ();
> -      target_terminal::ours_for_output ();
> +      target_terminal::ours ();
>      }
> 
>    /* We want all output to appear now, before we print the error.  We
> 

I think that this is my preferred patch, though I think using 'ours'
instead of the more obvious 'ours_for_output' is worth a comment.

> 
> 
> PATCH 2:
> 
> 
>     gdb/ChangeLog:
> 
>     2019-05-17  Alan Hayward  <alan.hayward@arm.com>
> 
>             * inflow.c (child_terminal_ours_1): Call tcsetpgrp for
>             is_ours_for_output.
> 
> diff --git a/gdb/inflow.c b/gdb/inflow.c
> index 339b55c0bc..b376e24e86 100644
> --- a/gdb/inflow.c
> +++ b/gdb/inflow.c
> @@ -509,7 +509,9 @@ child_terminal_ours_1 (target_terminal_state desired_state)
>        /* If we only want output, then leave the inferior's pgrp in the
>          foreground, so that Ctrl-C/Ctrl-Z reach the inferior
>          directly.  */
> -      if (job_control && desired_state == target_terminal_state::is_ours)
> +      if (job_control
> +         && (desired_state == target_terminal_state::is_ours
> +             || desired_state == target_terminal_state::is_ours_for_output))
>         {
>  #ifdef HAVE_TERMIOS_H
> 
> 

The only reason I don't like this is that I don't really understand
the wider impact it might have.  I guess there are places in GDB where
we call 'ours_for_output' and assume that Ctrl-C / Ctrl-Z will be
delivered to the inferior.  If these suddenly start arriving in GDB
then it's not clear if we'll have the correct infrastructure in place
to pass these signals on to the inferior.

You should probably wait for a while to see if any terminal experts
want to comment, but if nobody else comments within a week I think you
should go ahead with patch #1.

Thanks,
Andrew
  

Patch

diff --git a/gdb/exceptions.c b/gdb/exceptions.c
index ebdc71d98d..47bfb95153 100644
--- a/gdb/exceptions.c
+++ b/gdb/exceptions.c
@@ -46,7 +46,7 @@  print_flush (void)
   if (current_top_target () != NULL && target_supports_terminal_ours ())
     {
       term_state.emplace ();
-      target_terminal::ours_for_output ();
+      target_terminal::ours ();
     }

   /* We want all output to appear now, before we print the error.  We




PATCH 2:


    gdb/ChangeLog:

    2019-05-17  Alan Hayward  <alan.hayward@arm.com>

            * inflow.c (child_terminal_ours_1): Call tcsetpgrp for
            is_ours_for_output.

diff --git a/gdb/inflow.c b/gdb/inflow.c
index 339b55c0bc..b376e24e86 100644
--- a/gdb/inflow.c
+++ b/gdb/inflow.c
@@ -509,7 +509,9 @@  child_terminal_ours_1 (target_terminal_state desired_state)
       /* If we only want output, then leave the inferior's pgrp in the
         foreground, so that Ctrl-C/Ctrl-Z reach the inferior
         directly.  */
-      if (job_control && desired_state == target_terminal_state::is_ours)
+      if (job_control
+         && (desired_state == target_terminal_state::is_ours
+             || desired_state == target_terminal_state::is_ours_for_output))
        {
 #ifdef HAVE_TERMIOS_H