[v5,2/8] Handle gdb SIGTERM by throwing / catching gdb_exception_force_quit

Message ID 20230222234613.29662-3-kevinb@redhat.com
State Committed
Commit 63509715af867d635ad0e8cfe5a662bfc67b4ddf
Headers
Series Fix gdb.base/gdb-sigterm.exp failure/error |

Commit Message

Kevin Buettner Feb. 22, 2023, 11:46 p.m. UTC
  When a GDB process receives the SIGTERM signal, handle_sigterm() in
event-top.c is called.  The global variable 'sync_quit_force_run' is
set by this signal handler.  It does some other things too, but the
setting of this global is the important bit for the SIGTERM part of
this discussion.

GDB will periodically check to see whether a Ctrl-C or SIGTERM has
been received.  This is performed via use of the QUIT macro in
GDB's code.  QUIT is defined to invoke maybe_quit(), which will be
periodically called during any lengthy operation.  This is supposed to
ensure that the user won't have to wait too long for a Ctrl-C or
SIGTERM to be acted upon.

When a Ctrl-C / SIGINT is received, quit_handler() will decide whether
to pass the SIGINT onto the inferior or to call quit() which causes
gdb_exception_quit to be thrown.  This exception (usually) propagates
to the top level.  Control is then returned to the top level event
loop.

At the moment, SIGTERM is handled very differently.  Instead of
throwing an exception, quit_force() is called.  This does eventually
cause GDB to exit(), but prior to that happening, the inferiors
are killed or detached and other target related cleanup occurs.
As shown in this discussion between Pedro Alves and myself...

https://sourceware.org/pipermail/gdb-patches/2021-July/180802.html
https://sourceware.org/pipermail/gdb-patches/2021-July/180902.html
https://sourceware.org/pipermail/gdb-patches/2021-July/180903.html

...we found that it is possible for inferior_ptid and current_thread_
to get out of sync.  When that happens, the "current_thread_ != nullptr"
assertion in inferior_thread() can fail resulting in a GDB internal
error.

Pedro recommended that we "let the normal quit exception propagate all
the way to the top level, and then have the top level call quit_force
if sync_quit_force_run is set."  However, after the v2 series for this
patch set, we tweaked that idea by introducing a new exception for
handling SIGTERM.

This commit implements the obvious part of Pedro's suggestion:
Instead of calling quit_force from quit(), throw_forced_quit() is now
called instead.  This causes the new exception 'gdb_exception_forced_quit'
to be thrown.

At the top level, I changed catch_command_errors() and captured_main()
to catch gdb_exception_forced_quit and then call quit_force() from the
catch block.  I also changed start_event_loop() to also catch
gdb_exception_forced_quit; while we could also call quit_force() from
that catch block, it's sufficient to simply rethrow the exception
since it'll be caught by the newly added code in captured_main().

Making these changes fixed the failure / regression that I was seeing
for gdb.base/gdb-sigterm.exp when run on a machine with glibc-2.34.
However, there are many other paths back to the top level which this
test case does not test.  I did an audit of all of the try / catch
code in GDB in which calls in the try-block might (eventually) call
QUIT.  I found many cases where gdb_exception_quit and the new
gdb_exception_forced_quit will be swallowed.  (When using GDB, have
you ever hit Ctrl-C and not have it do anything; if so, it could be
due to a swallowed gdb_exception_quit in one of the cases I've
identified.)  The rest of the patches in this series deal with this
concern.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=26761
Tested-by: Tom de Vries <tdevries@suse.de>
Approved-by: Pedro Alves <pedro@palves.net>
---
 gdb/main.c  | 12 ++++++++++++
 gdb/utils.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
  

Comments

Simon Marchi March 2, 2023, 8:14 p.m. UTC | #1
On 2/22/23 18:46, Kevin Buettner via Gdb-patches wrote:
> When a GDB process receives the SIGTERM signal, handle_sigterm() in
> event-top.c is called.  The global variable 'sync_quit_force_run' is
> set by this signal handler.  It does some other things too, but the
> setting of this global is the important bit for the SIGTERM part of
> this discussion.
> 
> GDB will periodically check to see whether a Ctrl-C or SIGTERM has
> been received.  This is performed via use of the QUIT macro in
> GDB's code.  QUIT is defined to invoke maybe_quit(), which will be
> periodically called during any lengthy operation.  This is supposed to
> ensure that the user won't have to wait too long for a Ctrl-C or
> SIGTERM to be acted upon.
> 
> When a Ctrl-C / SIGINT is received, quit_handler() will decide whether
> to pass the SIGINT onto the inferior or to call quit() which causes
> gdb_exception_quit to be thrown.  This exception (usually) propagates
> to the top level.  Control is then returned to the top level event
> loop.
> 
> At the moment, SIGTERM is handled very differently.  Instead of
> throwing an exception, quit_force() is called.  This does eventually
> cause GDB to exit(), but prior to that happening, the inferiors
> are killed or detached and other target related cleanup occurs.
> As shown in this discussion between Pedro Alves and myself...
> 
> https://sourceware.org/pipermail/gdb-patches/2021-July/180802.html
> https://sourceware.org/pipermail/gdb-patches/2021-July/180902.html
> https://sourceware.org/pipermail/gdb-patches/2021-July/180903.html
> 
> ...we found that it is possible for inferior_ptid and current_thread_
> to get out of sync.  When that happens, the "current_thread_ != nullptr"
> assertion in inferior_thread() can fail resulting in a GDB internal
> error.
> 
> Pedro recommended that we "let the normal quit exception propagate all
> the way to the top level, and then have the top level call quit_force
> if sync_quit_force_run is set."  However, after the v2 series for this
> patch set, we tweaked that idea by introducing a new exception for
> handling SIGTERM.
> 
> This commit implements the obvious part of Pedro's suggestion:
> Instead of calling quit_force from quit(), throw_forced_quit() is now
> called instead.  This causes the new exception 'gdb_exception_forced_quit'
> to be thrown.
> 
> At the top level, I changed catch_command_errors() and captured_main()
> to catch gdb_exception_forced_quit and then call quit_force() from the
> catch block.  I also changed start_event_loop() to also catch
> gdb_exception_forced_quit; while we could also call quit_force() from
> that catch block, it's sufficient to simply rethrow the exception
> since it'll be caught by the newly added code in captured_main().
> 
> Making these changes fixed the failure / regression that I was seeing
> for gdb.base/gdb-sigterm.exp when run on a machine with glibc-2.34.
> However, there are many other paths back to the top level which this
> test case does not test.  I did an audit of all of the try / catch
> code in GDB in which calls in the try-block might (eventually) call
> QUIT.  I found many cases where gdb_exception_quit and the new
> gdb_exception_forced_quit will be swallowed.  (When using GDB, have
> you ever hit Ctrl-C and not have it do anything; if so, it could be
> due to a swallowed gdb_exception_quit in one of the cases I've
> identified.)  The rest of the patches in this series deal with this
> concern.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=26761
> Tested-by: Tom de Vries <tdevries@suse.de>
> Approved-by: Pedro Alves <pedro@palves.net>

I see this failure following this commit:

    $ make check TESTS="gdb.base/quit-live.exp" RUNTESTFLAGS="--target_board=native-gdbserver"
    FAIL: gdb.base/quit-live.exp: appear_how=run: extra_inferior=0: quit_how=sigterm: quit with SIGTERM (GDB internal error)
    FAIL: gdb.base/quit-live.exp: appear_how=run: extra_inferior=1: quit_how=sigterm: quit with SIGTERM (GDB internal error)

In gdb.log:

    (gdb) continue
    Continuing.

    Breakpoint 1, main () at /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.base/quit-live.c:23
    23        int secs = 30;
    (gdb) Executing on host: kill -TERM 1989873    (timeout = 300)
    builtin_spawn -ignore SIGHUP kill -TERM 1989873
    SIGTERM
    /home/smarchi/src/binutils-gdb/gdb/exceptions.c:100: internal-error: Bad switch.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    ----- Backtrace -----
    FAIL: gdb.base/quit-live.exp: appear_how=run: extra_inferior=0: quit_how=sigterm: quit with SIGTERM (GDB internal error)
    Resyncing due to internal error.
    0x557cb89b93f0 gdb_internal_backtrace_1
            /home/smarchi/src/binutils-gdb/gdb/bt-utils.c:122
    0x557cb89b93f0 _Z22gdb_internal_backtracev
            /home/smarchi/src/binutils-gdb/gdb/bt-utils.c:168
    0x557cb8e118f4 internal_vproblem
            /home/smarchi/src/binutils-gdb/gdb/utils.c:401
    0x557cb8e11bb0 _Z15internal_verrorPKciS0_P13__va_list_tag
            /home/smarchi/src/binutils-gdb/gdb/utils.c:481
    0x557cb8f666c4 _Z18internal_error_locPKciS0_z
            /home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:58
    0x557cb8aece58 print_exception
            /home/smarchi/src/binutils-gdb/gdb/exceptions.c:100
    0x557cb8d9f363 _Z10quit_forcePii
            /home/smarchi/src/binutils-gdb/gdb/top.c:1849
    0x557cb8977cdb _Z28invoke_async_signal_handlersv
            /home/smarchi/src/binutils-gdb/gdb/async-event.c:233
    0x557cb8f67ad7 _Z16gdb_do_one_eventi
            /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:199
    0x557cb8bed709 start_event_loop
            /home/smarchi/src/binutils-gdb/gdb/main.c:411
    0x557cb8bed709 captured_command_loop
            /home/smarchi/src/binutils-gdb/gdb/main.c:475
    0x557cb8bef2f4 captured_main
            /home/smarchi/src/binutils-gdb/gdb/main.c:1318
    0x557cb8bef2f4 _Z8gdb_mainP18captured_main_args
            /home/smarchi/src/binutils-gdb/gdb/main.c:1337
    0x557cb890d87f main
            /home/smarchi/src/binutils-gdb/gdb/gdb.c:32
    ---------------------
    /home/smarchi/src/binutils-gdb/gdb/exceptions.c:100: internal-error: Bad switch.

Simon
  

Patch

diff --git a/gdb/main.c b/gdb/main.c
index c04d37a45f9..0bf6f98a797 100644
--- a/gdb/main.c
+++ b/gdb/main.c
@@ -410,6 +410,10 @@  start_event_loop ()
 	{
 	  result = gdb_do_one_event ();
 	}
+      catch (const gdb_exception_forced_quit &ex)
+	{
+	  throw;
+	}
       catch (const gdb_exception &ex)
 	{
 	  exception_print (gdb_stderr, ex);
@@ -518,6 +522,10 @@  catch_command_errors (catch_command_errors_const_ftype command,
       if (do_bp_actions)
 	bpstat_do_actions ();
     }
+  catch (const gdb_exception_forced_quit &e)
+    {
+      quit_force (NULL, 0);
+    }
   catch (const gdb_exception &e)
     {
       return handle_command_errors (e);
@@ -1309,6 +1317,10 @@  captured_main (void *data)
 	{
 	  captured_command_loop ();
 	}
+      catch (const gdb_exception_forced_quit &ex)
+	{
+	  quit_force (NULL, 0);
+	}
       catch (const gdb_exception &ex)
 	{
 	  exception_print (gdb_stderr, ex);
diff --git a/gdb/utils.c b/gdb/utils.c
index 91e6974b976..08cc41b4ef8 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -642,7 +642,7 @@  quit (void)
   if (sync_quit_force_run)
     {
       sync_quit_force_run = 0;
-      quit_force (NULL, 0);
+      throw_forced_quit ("SIGTERM");
     }
 
 #ifdef __MSDOS__