[v4,2/4] gdb, infrun, record: fix hang when step-over fails with no-history

Message ID 20240312113423.3543956-3-markus.t.metzger@intel.com
State New
Series btrace: infrun fixes

Checks

Context                                          Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm       success  Testing passed

Commit Message

Metzger, Markus T March 12, 2024, 11:34 a.m. UTC
When trying to step over a breakpoint at the end of the trace while
another thread is replaying, the step-over will fail with no-history.
This does not clear step_over_info, so on a subsequent resume GDB will
not resume the thread and will instead expect a SIGTRAP to complete the
step-over.  The SIGTRAP never arrives, causing GDB to hang in the
wait-for-event poll.

This is a variant of the issue fixed in the parent commit.  That commit
addressed the issue for a single-threaded process and fixed an issue with
reverse/replay stepping in general.

This commit addresses the issue for a multi-threaded process.  In this
case, the single-step does not complete.

Finish an in-flight step-over when a thread stopped with NO_HISTORY.
Since we did not move, we will simply start the step-over again.
---
 gdb/infrun.c                                  | 14 ++++
 .../gdb.btrace/multi-thread-break-hang.exp    | 84 +++++++++++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
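
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyInfrun`, `StopKind`, `resume` and `handle_event` are hypothetical names invented for illustration and are not GDB's real types or functions. The single flag stands in for step_over_info, and `resume` returning false stands in for GDB waiting on an event that never arrives.

```cpp
// Toy model only: the names below are hypothetical and do not exist in GDB.
enum class StopKind { Sigtrap, NoHistory };

struct ToyInfrun {
    bool step_over_in_flight = false;  // stands in for step_over_info
    bool finish_on_no_history;         // false: old behaviour, true: patched

    explicit ToyInfrun(bool patched) : finish_on_no_history(patched) {}

    // Resume a thread that sits on a breakpoint at the end of the trace
    // while another thread is replaying.  Returns true if a stop event
    // arrives, false if we would wait forever (the hang).
    bool resume() {
        if (step_over_in_flight) {
            // A step-over is believed to be running, so the thread is not
            // resumed and we wait for a SIGTRAP that never comes.
            return false;
        }
        step_over_in_flight = true;        // start the step-over
        handle_event(StopKind::NoHistory); // the step fails with no-history
        return true;
    }

    void handle_event(StopKind ev) {
        // Before the patch only a completed step cleared the state; the
        // patch also finishes the step-over on a no-history stop.
        if (ev == StopKind::Sigtrap || finish_on_no_history)
            step_over_in_flight = false;
    }
};
```

In this model, an unpatched `ToyInfrun(false)` reports an event on the first `resume()` and returns false on the second, matching the hang at the second stepi that the new test exercises. A patched `ToyInfrun(true)` reports an event on every call.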
  

Comments

Guinevere Larsen April 4, 2024, 6:27 p.m. UTC | #1
On 3/12/24 08:34, Markus Metzger wrote:
> When trying to step over a breakpoint at the end of the trace while
> another thread is replaying, the step-over will fail with no-history.
> This does not clear step_over_info, so on a subsequent resume GDB will
> not resume the thread and will instead expect a SIGTRAP to complete the
> step-over.  The SIGTRAP never arrives, causing GDB to hang in the
> wait-for-event poll.
>
> This is a variant of the issue fixed in the parent commit.  That commit
> addressed the issue for a single-threaded process and fixed an issue with
> reverse/replay stepping in general.
>
> This commit addresses the issue for a multi-threaded process.  In this
> case, the single-step does not complete.
>
> Finish an in-flight step-over when a thread stopped with NO_HISTORY.
> Since we did not move, we will simply start the step-over again.

Apart from a very minor nit inlined, this patch LGTM. I can't test the 
fix directly, but I tested for regressions with special care for clang 
and see no new failures, so FWIW:

Reviewed-By: Guinevere Larsen <blarsen@redhat.com>

> ---
>   gdb/infrun.c                                  | 14 ++++
>   .../gdb.btrace/multi-thread-break-hang.exp    | 84 +++++++++++++++++++
>   2 files changed, 98 insertions(+)
>   create mode 100644 gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
>
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index f38d96b64df..eb5164b4066 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -78,6 +78,8 @@
>   #include "disasm.h"
>   #include "interps.h"
>   
> +struct execution_control_state;
> +
>   /* Prototypes for local functions */
>   
>   static void sig_print_info (enum gdb_signal);
> @@ -109,6 +111,8 @@ static bool step_over_info_valid_p (void);
>   
>   static bool schedlock_applies (struct thread_info *tp);
>   
> +static int finish_step_over (execution_control_state *ecs);
> +

This declaration already exists at line 4253. If you want it higher in
the file, I think it would be good to remove the existing one.

>   /* Asynchronous signal handler registered as event loop source for
>      when we have pending events ready to be passed to the core.  */
>   static struct async_event_handler *infrun_async_inferior_event_token;
> @@ -6535,6 +6539,16 @@ handle_inferior_event (struct execution_control_state *ecs)
>   	return;
>   
>         interps_notify_no_history ();
> +
> +      /* Cancel an in-flight step-over.  It will not succeed since we
> +	 won't be able to step at the end of the execution history.  */
> +      {
> +	/* finish_step_over may call restart_threads, which may change the
> +	   current thread.  Make sure we leave the event thread as the
> +	   current thread.  */
> +	scoped_restore_current_thread restore_thread;
> +	finish_step_over (ecs);
> +      }
>         stop_waiting (ecs);
>         return;
>       }
> diff --git a/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp b/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
> new file mode 100644
> index 00000000000..33edcf05612
> --- /dev/null
> +++ b/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
> @@ -0,0 +1,84 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2024 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test that we cancel an in-flight step-over at the end of the execution
> +# history as long as some other thread is still replaying.
> +#
> +# This used to cause GDB to hang in poll ().
> +
> +require allow_btrace_tests
> +
> +standard_testfile multi-thread-step.c
> +if [prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] {
> +    return -1
> +}
> +
> +if ![runto_main] {
> +    return -1
> +}
> +
> +# Set up breakpoints.
> +set bp_1 [gdb_get_line_number "bp.1" $srcfile]
> +set bp_2 [gdb_get_line_number "bp.2" $srcfile]
> +
> +# Trace the code between the two breakpoints.
> +gdb_breakpoint $srcfile:$bp_1
> +gdb_continue_to_breakpoint "continue to bp.1" ".*$srcfile:$bp_1\r\n.*"
> +
> +gdb_test_no_output "record btrace"
> +
> +# We have two threads at or close to bp.1 but handled only one stop event.
> +# Remove the breakpoint so we do not need to deal with the 2nd event.
> +delete_breakpoints
> +gdb_breakpoint $srcfile:$bp_2
> +gdb_continue_to_breakpoint "continue to bp.2" ".*$srcfile:$bp_2\r\n.*"
> +
> +# Determine the thread that reported the breakpoint.
> +set thread [get_integer_valueof "\$_thread" bad]
> +
> +# Determine the other thread.
> +set other "bad"
> +if { $thread == 1 } {
> +    set other 2
> +} elseif { $thread == 2 } {
> +    set other 1
> +}
> +
> +# This test requires scheduler-locking 'on' or 'step'; 'replay' would
> +# implicitly stop replaying, avoiding the problem; 'off' would step one
> +# and resume the other.
> +#
> +# With the current record-btrace implementation that steps all resumed
> +# threads in lock-step, 'off' might actually pass but we don't want to
> +# bake that behavior into tests.
> +gdb_test_no_output "set scheduler-locking step"
> +
> +# Start replaying the other thread.  This will prevent stepping the thread
> +# that reported the event.
> +gdb_test "thread apply $other record goto begin"
> +gdb_test "thread apply $other info record" "Replay in progress.*"
> +
> +# We're at a breakpoint so this triggers step-over.  Since we're at the
> +# end of the trace, the step will fail.
> +gdb_test "stepi" "No more reverse-execution history.*" "stepi.1"
> +
> +# We used to hang at the second step since step-over insisted on polling
> +# the next event.
> +gdb_test "stepi" "No more reverse-execution history.*" "stepi.2"
> +
> +# Do one more just in case.
> +gdb_test "stepi" "No more reverse-execution history.*" "stepi.3"
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index f38d96b64df..eb5164b4066 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -78,6 +78,8 @@ 
 #include "disasm.h"
 #include "interps.h"
 
+struct execution_control_state;
+
 /* Prototypes for local functions */
 
 static void sig_print_info (enum gdb_signal);
@@ -109,6 +111,8 @@  static bool step_over_info_valid_p (void);
 
 static bool schedlock_applies (struct thread_info *tp);
 
+static int finish_step_over (execution_control_state *ecs);
+
 /* Asynchronous signal handler registered as event loop source for
    when we have pending events ready to be passed to the core.  */
 static struct async_event_handler *infrun_async_inferior_event_token;
@@ -6535,6 +6539,16 @@  handle_inferior_event (struct execution_control_state *ecs)
 	return;
 
       interps_notify_no_history ();
+
+      /* Cancel an in-flight step-over.  It will not succeed since we
+	 won't be able to step at the end of the execution history.  */
+      {
+	/* finish_step_over may call restart_threads, which may change the
+	   current thread.  Make sure we leave the event thread as the
+	   current thread.  */
+	scoped_restore_current_thread restore_thread;
+	finish_step_over (ecs);
+      }
       stop_waiting (ecs);
       return;
     }
diff --git a/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp b/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
new file mode 100644
index 00000000000..33edcf05612
--- /dev/null
+++ b/gdb/testsuite/gdb.btrace/multi-thread-break-hang.exp
@@ -0,0 +1,84 @@ 
+# This testcase is part of GDB, the GNU debugger.
+#
+# Copyright 2024 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test that we cancel an in-flight step-over at the end of the execution
+# history as long as some other thread is still replaying.
+#
+# This used to cause GDB to hang in poll ().
+
+require allow_btrace_tests
+
+standard_testfile multi-thread-step.c
+if [prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] {
+    return -1
+}
+
+if ![runto_main] {
+    return -1
+}
+
+# Set up breakpoints.
+set bp_1 [gdb_get_line_number "bp.1" $srcfile]
+set bp_2 [gdb_get_line_number "bp.2" $srcfile]
+
+# Trace the code between the two breakpoints.
+gdb_breakpoint $srcfile:$bp_1
+gdb_continue_to_breakpoint "continue to bp.1" ".*$srcfile:$bp_1\r\n.*"
+
+gdb_test_no_output "record btrace"
+
+# We have two threads at or close to bp.1 but handled only one stop event.
+# Remove the breakpoint so we do not need to deal with the 2nd event.
+delete_breakpoints
+gdb_breakpoint $srcfile:$bp_2
+gdb_continue_to_breakpoint "continue to bp.2" ".*$srcfile:$bp_2\r\n.*"
+
+# Determine the thread that reported the breakpoint.
+set thread [get_integer_valueof "\$_thread" bad]
+
+# Determine the other thread.
+set other "bad"
+if { $thread == 1 } {
+    set other 2
+} elseif { $thread == 2 } {
+    set other 1
+}
+
+# This test requires scheduler-locking 'on' or 'step'; 'replay' would
+# implicitly stop replaying, avoiding the problem; 'off' would step one
+# and resume the other.
+#
+# With the current record-btrace implementation that steps all resumed
+# threads in lock-step, 'off' might actually pass but we don't want to
+# bake that behavior into tests.
+gdb_test_no_output "set scheduler-locking step"
+
+# Start replaying the other thread.  This will prevent stepping the thread
+# that reported the event.
+gdb_test "thread apply $other record goto begin"
+gdb_test "thread apply $other info record" "Replay in progress.*"
+
+# We're at a breakpoint so this triggers step-over.  Since we're at the
+# end of the trace, the step will fail.
+gdb_test "stepi" "No more reverse-execution history.*" "stepi.1"
+
+# We used to hang at the second step since step-over insisted on polling
+# the next event.
+gdb_test "stepi" "No more reverse-execution history.*" "stepi.2"
+
+# Do one more just in case.
+gdb_test "stepi" "No more reverse-execution history.*" "stepi.3"
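
The braced block added to handle_inferior_event relies on a scoped restore: the current thread is saved when the guard is constructed and put back when the block ends, whatever the callee did in between. A minimal sketch of that pattern follows. It assumes a plain integer as the "current thread", and `ScopedRestoreThread`, `toy_finish_step_over` and `handle_no_history` are hypothetical stand-ins, not GDB's actual scoped_restore_current_thread implementation.

```cpp
// Hypothetical stand-in for GDB's notion of the current thread.
static int current_thread_id = 1;

// Minimal RAII guard: remembers the current thread on construction and
// restores it on destruction.
class ScopedRestoreThread {
public:
    ScopedRestoreThread() : saved_(current_thread_id) {}
    ~ScopedRestoreThread() { current_thread_id = saved_; }

    // Copying a guard would restore twice, so forbid it.
    ScopedRestoreThread(const ScopedRestoreThread&) = delete;
    ScopedRestoreThread& operator=(const ScopedRestoreThread&) = delete;

private:
    int saved_;
};

// Stand-in for a callee that switches threads as a side effect, as the
// patch comment says finish_step_over may do via restart_threads.
static void toy_finish_step_over() { current_thread_id = 2; }

// Mirrors the shape of the block in the patch: the guard's scope ends
// before the caller continues, so the event thread is current again.
static int handle_no_history() {
    {
        ScopedRestoreThread restore;
        toy_finish_step_over();
    }
    return current_thread_id;
}
```

The inner braces matter in this sketch: they end the guard's lifetime right after the call, so whatever runs next in the function already sees the restored thread.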