Fix assert in delete_breakpoint

Message ID 20231113152609.32726-1-tdevries@suse.de
State New
Headers
Series Fix assert in delete_breakpoint |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed

Commit Message

Tom de Vries Nov. 13, 2023, 3:26 p.m. UTC
  On a pinebook (aarch64 with 64-bit kernel and 32-bit userland) with test-case
gdb.base/gdb-sigterm.exp I run into:
...
intrusive_list.h:458: internal-error: erase_element: \
  Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
...

Fix this similar to:
- commit c0afd99439f ("[gdb/tui] Fix assert in ~gdbpy_tui_window_maker"), and
- commit 995a34b1772 ("Guard against frame.c destructors running before
  frame-info.c's"
by checking is_linked () before attempting to remove from breakpoint_chain.

Tested on the pinebook and x86_64-linux.

PR gdb/31061
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061
---
 gdb/breakpoint.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


base-commit: 6b682bbf86f37982ce1d270fb47f363413490bda
  

Comments

Simon Marchi Nov. 14, 2023, 3:09 p.m. UTC | #1
On 11/13/23 10:26, Tom de Vries wrote:
> On a pinebook (aarch64 with 64-bit kernel and 32-bit userland) with test-case
> gdb.base/gdb-sigterm.exp I run into:
> ...
> intrusive_list.h:458: internal-error: erase_element: \
>   Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
> ...
> 
> Fix this similar to:
> - commit c0afd99439f ("[gdb/tui] Fix assert in ~gdbpy_tui_window_maker"), and
> - commit 995a34b1772 ("Guard against frame.c destructors running before
>   frame-info.c's"
> by checking is_linked () before attempting to remove from breakpoint_chain.
> 
> Tested on the pinebook and x86_64-linux.
> 
> PR gdb/31061
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061

Hi Tom,

Did you dig a little bit to understand why this happened specifically
with this setup, and that it's indeed expected that the breakpoint is
not in the breakpoint chain?  The stack trace you showed contains (with
names demangled):

    0xaae1d42b delete_breakpoint(breakpoint*)^M
            /home/rock/gdb/src/gdb/breakpoint.c:12636^M
    0xab1ed045 delete_thread_breakpoint^M
            /home/rock/gdb/src/gdb/thread.c:98^M
    0xab1ed0ab delete_single_step_breakpoints(thread_info*)^M
            /home/rock/gdb/src/gdb/thread.c:123^M
    0xab021e81 delete_thread_infrun_breakpoints^M
            /home/rock/gdb/src/gdb/infrun.c:3733^M
    0xab021edd for_each_just_stopped_thread^M
            /home/rock/gdb/src/gdb/infrun.c:3752^M
    0xab021f79 delete_just_stopped_threads_infrun_breakpoints^M
            /home/rock/gdb/src/gdb/infrun.c:3768^M

It looks like this is specific to architectures that use software
single stepping?  Does "32-bit userland on 64-bit kernel on aarch64" use
software single stepping, like 32-bit ARM?

Software single step breakpoints are inserted with:

  if (tp->control.single_step_breakpoints == NULL)
    {
      std::unique_ptr<breakpoint> b
	(new momentary_breakpoint (gdbarch, bp_single_step,
				   current_program_space,
				   null_frame_id,
				   tp->global_num));

      tp->control.single_step_breakpoints
	= add_to_breakpoint_chain (std::move (b));
    }

They are added to the global breakpoint_chain by
add_to_breakpoint_chain.  So how come the breakpoint is not linked when
comes the time to delete it?

Simon
  
Tom de Vries Nov. 15, 2023, 11:12 a.m. UTC | #2
On 11/14/23 16:09, Simon Marchi wrote:
> On 11/13/23 10:26, Tom de Vries wrote:
>> On a pinebook (aarch64 with 64-bit kernel and 32-bit userland) with test-case
>> gdb.base/gdb-sigterm.exp I run into:
>> ...
>> intrusive_list.h:458: internal-error: erase_element: \
>>    Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
>> ...
>>
>> Fix this similar to:
>> - commit c0afd99439f ("[gdb/tui] Fix assert in ~gdbpy_tui_window_maker"), and
>> - commit 995a34b1772 ("Guard against frame.c destructors running before
>>    frame-info.c's"
>> by checking is_linked () before attempting to remove from breakpoint_chain.
>>
>> Tested on the pinebook and x86_64-linux.
>>
>> PR gdb/31061
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061
> 
> Hi Tom,
> 
> Did you dig a little bit to understand why this happened specifically
> with this setup, 

No, I just applied the usual recipe and observed that the problem was fixed.

But now that you asked, I did (and should have done of course).

> and that it's indeed expected that the breakpoint is
> not in the breakpoint chain?  The stack trace you showed contains (with
> names demangled):
> 
>      0xaae1d42b delete_breakpoint(breakpoint*)^M
>              /home/rock/gdb/src/gdb/breakpoint.c:12636^M
>      0xab1ed045 delete_thread_breakpoint^M
>              /home/rock/gdb/src/gdb/thread.c:98^M
>      0xab1ed0ab delete_single_step_breakpoints(thread_info*)^M
>              /home/rock/gdb/src/gdb/thread.c:123^M
>      0xab021e81 delete_thread_infrun_breakpoints^M
>              /home/rock/gdb/src/gdb/infrun.c:3733^M
>      0xab021edd for_each_just_stopped_thread^M
>              /home/rock/gdb/src/gdb/infrun.c:3752^M
>      0xab021f79 delete_just_stopped_threads_infrun_breakpoints^M
>              /home/rock/gdb/src/gdb/infrun.c:3768^M
> 
> It looks like this is specific to architectures that use software
> single stepping?  Does "32-bit userland on 64-bit kernel on aarch64" use
> software single stepping, like 32-bit ARM?
> 

System gdb is configured with --host=arm-linux-gnueabihf 
--target=arm-linux-gnueabihf (and I've copied that for my gdb build) so 
I suppose it's similar to 32-bit arm.

> Software single step breakpoints are inserted with:
> 
>    if (tp->control.single_step_breakpoints == NULL)
>      {
>        std::unique_ptr<breakpoint> b
> 	(new momentary_breakpoint (gdbarch, bp_single_step,
> 				   current_program_space,
> 				   null_frame_id,
> 				   tp->global_num));
> 
>        tp->control.single_step_breakpoints
> 	= add_to_breakpoint_chain (std::move (b));
>      }
> 
> They are added to the global breakpoint_chain by
> add_to_breakpoint_chain.  So how come the breakpoint is not linked when
> comes the time to delete it?

Because it's deleted twice.

The last thing we see in gdb.log before the internal error is:
...
    [infrun] handle_signal_stop: [13633.13633.0] hit its single-step 
breakpoint^M
...

And the statement following the infrun_debug_printf printing that 
message is:
...
   delete_just_stopped_threads_single_step_breakpoints ();
...

So, the following events happen:
- the single-step breakpoint is hit
- delete_just_stopped_threads_single_step_breakpoints is called
- during delete_just_stopped_threads_single_step_breakpoints,
   delete_breakpoint is called which removes the breakpoint from the
   breakpoint chain
- gdb is interrupted by SIGTERM before finish delete_breakpoint
- the SIGTERM triggers a SCOPE_EXIT cleanup, calling
   delete_just_stopped_threads_infrun_breakpoints which ends up
   calling delete_breakpoint again for the same breakpoint.

The proposed fix tries to handle delete_breakpoint being interrupted, 
and called again.

This is an alternative fix:
...
diff --git a/gdb/thread.c b/gdb/thread.c
index 47cc5c9cd14..7f029f2414c 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -93,11 +93,12 @@ inferior_thread (void)
  static void
  delete_thread_breakpoint (struct breakpoint **bp_p)
  {
-  if (*bp_p != NULL)
-    {
-      delete_breakpoint (*bp_p);
-      *bp_p = NULL;
-    }
+  struct breakpoint *bp = *bp_p;
+
+  *bp_p = nullptr;
+
+  if (bp != nullptr)
+    delete_breakpoint (bp);
  }

  void
...

It makes sure delete_breakpoint is called only once, but I don't think 
this is the way to go, since it prevents the cleanup.

Thanks,
- Tom
  
Tom de Vries Nov. 15, 2023, 12:12 p.m. UTC | #3
On 11/15/23 12:12, Tom de Vries wrote:
> Because it's deleted twice.

Just to confirm this theory, I added this patch:
...
diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index fe09dbcbeee..12a3a2260d7 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -12583,6 +12583,8 @@ void
  delete_breakpoint (struct breakpoint *bpt)
  {
    gdb_assert (bpt != NULL);
+  fprintf (stderr, "delete_breakpoint: %p\n", bpt);
+  fflush (stderr);

    /* Has this bp already been deleted?  This can happen because
       multiple lists can hold pointers to bp's.  bpstat lists are
...
and got:
...
   [infrun] handle_signal_stop: [17091.17091.0] hit its single-step 
breakpoint^M
delete_breakpoint: 0xabc756a8^M
delete_breakpoint: 0xabc756a8^M
/home/rock/gdb/src/gdb/../gdbsupport/intrusive_list.h:458: 
internal-error: erase_element: Assertion `elem_node->prev \
!= INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
...

Thanks,
- Tom
  
Tom de Vries Nov. 21, 2023, 12:22 p.m. UTC | #4
On 11/15/23 12:12, Tom de Vries wrote:
> On 11/14/23 16:09, Simon Marchi wrote:
>> On 11/13/23 10:26, Tom de Vries wrote:
>>> On a pinebook (aarch64 with 64-bit kernel and 32-bit userland) with 
>>> test-case
>>> gdb.base/gdb-sigterm.exp I run into:
>>> ...
>>> intrusive_list.h:458: internal-error: erase_element: \
>>>    Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' 
>>> failed.^M
>>> ...
>>>
>>> Fix this similar to:
>>> - commit c0afd99439f ("[gdb/tui] Fix assert in 
>>> ~gdbpy_tui_window_maker"), and
>>> - commit 995a34b1772 ("Guard against frame.c destructors running before
>>>    frame-info.c's"
>>> by checking is_linked () before attempting to remove from 
>>> breakpoint_chain.
>>>
>>> Tested on the pinebook and x86_64-linux.
>>>
>>> PR gdb/31061
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061
>>
>> Hi Tom,
>>
>> Did you dig a little bit to understand why this happened specifically
>> with this setup, 
> 
> No, I just applied the usual recipe and observed that the problem was 
> fixed.
> 
> But now that you asked, I did (and should have done of course).
> 
>> and that it's indeed expected that the breakpoint is
>> not in the breakpoint chain?  The stack trace you showed contains (with
>> names demangled):
>>
>>      0xaae1d42b delete_breakpoint(breakpoint*)^M
>>              /home/rock/gdb/src/gdb/breakpoint.c:12636^M
>>      0xab1ed045 delete_thread_breakpoint^M
>>              /home/rock/gdb/src/gdb/thread.c:98^M
>>      0xab1ed0ab delete_single_step_breakpoints(thread_info*)^M
>>              /home/rock/gdb/src/gdb/thread.c:123^M
>>      0xab021e81 delete_thread_infrun_breakpoints^M
>>              /home/rock/gdb/src/gdb/infrun.c:3733^M
>>      0xab021edd for_each_just_stopped_thread^M
>>              /home/rock/gdb/src/gdb/infrun.c:3752^M
>>      0xab021f79 delete_just_stopped_threads_infrun_breakpoints^M
>>              /home/rock/gdb/src/gdb/infrun.c:3768^M
>>
>> It looks like this is specific to architectures that use software
>> single stepping?  Does "32-bit userland on 64-bit kernel on aarch64" use
>> software single stepping, like 32-bit ARM?
>>
> 
> System gdb is configured with --host=arm-linux-gnueabihf 
> --target=arm-linux-gnueabihf (and I've copied that for my gdb build) so 
> I suppose it's similar to 32-bit arm.
> 
>> Software single step breakpoints are inserted with:
>>
>>    if (tp->control.single_step_breakpoints == NULL)
>>      {
>>        std::unique_ptr<breakpoint> b
>>     (new momentary_breakpoint (gdbarch, bp_single_step,
>>                    current_program_space,
>>                    null_frame_id,
>>                    tp->global_num));
>>
>>        tp->control.single_step_breakpoints
>>     = add_to_breakpoint_chain (std::move (b));
>>      }
>>
>> They are added to the global breakpoint_chain by
>> add_to_breakpoint_chain.  So how come the breakpoint is not linked when
>> comes the time to delete it?
> 
> Because it's deleted twice.
> 
> The last thing we see in gdb.log before the internal error is:
> ...
>     [infrun] handle_signal_stop: [13633.13633.0] hit its single-step 
> breakpoint^M
> ...
> 
> And the statement following the infrun_debug_printf printing that 
> message is:
> ...
>    delete_just_stopped_threads_single_step_breakpoints ();
> ...
> 
> So, the following events happen:
> - the single-step breakpoint is hit
> - delete_just_stopped_threads_single_step_breakpoints is called
> - during delete_just_stopped_threads_single_step_breakpoints,
>    delete_breakpoint is called which removes the breakpoint from the
>    breakpoint chain
> - gdb is interrupted by SIGTERM before finish delete_breakpoint
> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>    delete_just_stopped_threads_infrun_breakpoints which ends up
>    calling delete_breakpoint again for the same breakpoint.
> 
> The proposed fix tries to handle delete_breakpoint being interrupted, 
> and called again.
> 
> This is an alternative fix:
> ...
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 47cc5c9cd14..7f029f2414c 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -93,11 +93,12 @@ inferior_thread (void)
>   static void
>   delete_thread_breakpoint (struct breakpoint **bp_p)
>   {
> -  if (*bp_p != NULL)
> -    {
> -      delete_breakpoint (*bp_p);
> -      *bp_p = NULL;
> -    }
> +  struct breakpoint *bp = *bp_p;
> +
> +  *bp_p = nullptr;
> +
> +  if (bp != nullptr)
> +    delete_breakpoint (bp);
>   }
> 
>   void
> ...
> 
> It makes sure delete_breakpoint is called only once, but I don't think 
> this is the way to go, since it prevents the cleanup.

Hi,

did you have any further comments?

I'm inclined to apply the patch as is, but I'm not sure if I addressed 
your questions sufficiently.

Thanks,
- Tom
  
Simon Marchi Nov. 21, 2023, 4:11 p.m. UTC | #5
On 11/15/23 06:12, Tom de Vries wrote:
> Because it's deleted twice.
> 
> The last thing we see in gdb.log before the internal error is:
> ...
>    [infrun] handle_signal_stop: [13633.13633.0] hit its single-step breakpoint^M
> ...
> 
> And the statement following the infrun_debug_printf printing that message is:
> ...
>   delete_just_stopped_threads_single_step_breakpoints ();
> ...
> 
> So, the following events happen:
> - the single-step breakpoint is hit
> - delete_just_stopped_threads_single_step_breakpoints is called
> - during delete_just_stopped_threads_single_step_breakpoints,
>   delete_breakpoint is called which removes the breakpoint from the
>   breakpoint chain
> - gdb is interrupted by SIGTERM before finish delete_breakpoint
> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>   delete_just_stopped_threads_infrun_breakpoints which ends up
>   calling delete_breakpoint again for the same breakpoint.
> 
> The proposed fix tries to handle delete_breakpoint being interrupted, and called again.
> 
> This is an alternative fix:
> ...
> diff --git a/gdb/thread.c b/gdb/thread.c
> index 47cc5c9cd14..7f029f2414c 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -93,11 +93,12 @@ inferior_thread (void)
>  static void
>  delete_thread_breakpoint (struct breakpoint **bp_p)
>  {
> -  if (*bp_p != NULL)
> -    {
> -      delete_breakpoint (*bp_p);
> -      *bp_p = NULL;
> -    }
> +  struct breakpoint *bp = *bp_p;
> +
> +  *bp_p = nullptr;
> +
> +  if (bp != nullptr)
> +    delete_breakpoint (bp);
>  }
> 
>  void
> ...
> 
> It makes sure delete_breakpoint is called only once, but I don't think this is the way to go, since it prevents the cleanup.

Err, my intuition is that it doesn't make sense for operations like
that, that update GDB's state (especially the tricky infrun / breakpoint
state) in reaction to target events, to be interruptible by SIGTERM.
It's just a recipe for ending up with half-baked state that is not
up-to-date with the reality.

I suppose that after receiving the SIGTERM, execution is interrupted by
a QUIT macro and a gdb_exception_quit is thrown?  Are you able to tell
which QUIT causes this to happen, and show what the stack looks like at
this point?

Simon
  
Luis Machado Nov. 21, 2023, 4:40 p.m. UTC | #6
On 11/14/23 15:09, Simon Marchi wrote:
> On 11/13/23 10:26, Tom de Vries wrote:
>> On a pinebook (aarch64 with 64-bit kernel and 32-bit userland) with test-case
>> gdb.base/gdb-sigterm.exp I run into:
>> ...
>> intrusive_list.h:458: internal-error: erase_element: \
>>   Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
>> ...
>>
>> Fix this similar to:
>> - commit c0afd99439f ("[gdb/tui] Fix assert in ~gdbpy_tui_window_maker"), and
>> - commit 995a34b1772 ("Guard against frame.c destructors running before
>>   frame-info.c's"
>> by checking is_linked () before attempting to remove from breakpoint_chain.
>>
>> Tested on the pinebook and x86_64-linux.
>>
>> PR gdb/31061
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061
> 
> Hi Tom,
> 
> Did you dig a little bit to understand why this happened specifically
> with this setup, and that it's indeed expected that the breakpoint is
> not in the breakpoint chain?  The stack trace you showed contains (with
> names demangled):
> 
>     0xaae1d42b delete_breakpoint(breakpoint*)^M
>             /home/rock/gdb/src/gdb/breakpoint.c:12636^M
>     0xab1ed045 delete_thread_breakpoint^M
>             /home/rock/gdb/src/gdb/thread.c:98^M
>     0xab1ed0ab delete_single_step_breakpoints(thread_info*)^M
>             /home/rock/gdb/src/gdb/thread.c:123^M
>     0xab021e81 delete_thread_infrun_breakpoints^M
>             /home/rock/gdb/src/gdb/infrun.c:3733^M
>     0xab021edd for_each_just_stopped_thread^M
>             /home/rock/gdb/src/gdb/infrun.c:3752^M
>     0xab021f79 delete_just_stopped_threads_infrun_breakpoints^M
>             /home/rock/gdb/src/gdb/infrun.c:3768^M
> 
> It looks like this is specific to architectures that use software
> single stepping?  Does "32-bit userland on 64-bit kernel on aarch64" use
> software single stepping, like 32-bit ARM?

Given AArch64 supports hardware single-stepping, I'd expect 32-bit userspace
running on 64-bit Linux Kernel to also support it.
  
Tom de Vries Nov. 21, 2023, 4:52 p.m. UTC | #7
On 11/21/23 17:11, Simon Marchi wrote:
> On 11/15/23 06:12, Tom de Vries wrote:
>> Because it's deleted twice.
>>
>> The last thing we see in gdb.log before the internal error is:
>> ...
>>     [infrun] handle_signal_stop: [13633.13633.0] hit its single-step breakpoint^M
>> ...
>>
>> And the statement following the infrun_debug_printf printing that message is:
>> ...
>>    delete_just_stopped_threads_single_step_breakpoints ();
>> ...
>>
>> So, the following events happen:
>> - the single-step breakpoint is hit
>> - delete_just_stopped_threads_single_step_breakpoints is called
>> - during delete_just_stopped_threads_single_step_breakpoints,
>>    delete_breakpoint is called which removes the breakpoint from the
>>    breakpoint chain
>> - gdb is interrupted by SIGTERM before finish delete_breakpoint
>> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>>    delete_just_stopped_threads_infrun_breakpoints which ends up
>>    calling delete_breakpoint again for the same breakpoint.
>>
>> The proposed fix tries to handle delete_breakpoint being interrupted, and called again.
>>
>> This is an alternative fix:
>> ...
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 47cc5c9cd14..7f029f2414c 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -93,11 +93,12 @@ inferior_thread (void)
>>   static void
>>   delete_thread_breakpoint (struct breakpoint **bp_p)
>>   {
>> -  if (*bp_p != NULL)
>> -    {
>> -      delete_breakpoint (*bp_p);
>> -      *bp_p = NULL;
>> -    }
>> +  struct breakpoint *bp = *bp_p;
>> +
>> +  *bp_p = nullptr;
>> +
>> +  if (bp != nullptr)
>> +    delete_breakpoint (bp);
>>   }
>>
>>   void
>> ...
>>
>> It makes sure delete_breakpoint is called only once, but I don't think this is the way to go, since it prevents the cleanup.
> 
> Err, my intuition is that it doesn't make sense for operations like
> that, that update GDB's state (especially the tricky infrun / breakpoint
> state) in reaction to target events, to be interruptible by SIGTERM.
> It's just a recipe for ending up with half-baked state that is not
> up-to-date with the reality.
> 
> I suppose that after receiving the SIGTERM, execution is interrupted by
> a QUIT macro and a gdb_exception_quit is thrown?  Are you able to tell
> which QUIT causes this to happen, and show what the stack looks like at
> this point?


It's the QUIT in target_write_with_progress.

Backtrace:
...
(gdb) bt
#0  0xf75906d2 in __cxa_throw () from 
/usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#1  0xab1ce512 in throw_it (reason=RETURN_FORCED_QUIT, 
error=GDB_NO_ERROR, fmt=0xab303268 "SIGTERM", ap=...)
     at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:202
#2  0xab1ce622 in throw_forced_quit (fmt=0xab303268 "SIGTERM")
     at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:247
#3  0xab022576 in quit () at /home/rock/gdb/src/gdb/utils.c:639
#4  0xab0225f0 in maybe_quit () at /home/rock/gdb/src/gdb/utils.c:666
#5  0xaaf867e4 in target_write_with_progress (ops=0xab43d7b0 
<the_arm_linux_nat_target>,
     object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0xab51f4dc 
"\376\347", offset=2863310110, len=2, progress=0x0,
     baton=0x0) at /home/rock/gdb/src/gdb/target.c:2213
#6  0xaaf86828 in target_write (ops=0xab43d7b0 
<the_arm_linux_nat_target>, object=TARGET_OBJECT_RAW_MEMORY,
     annex=0x0, buf=0xab51f4dc "\376\347", offset=2863310110, len=2) at 
/home/rock/gdb/src/gdb/target.c:2226
#7  0xaaf85d30 in target_write_raw_memory (memaddr=2863310110, 
myaddr=0xab51f4dc "\376\347", len=2)
     at /home/rock/gdb/src/gdb/target.c:1849
#8  0xaae48d64 in default_memory_remove_breakpoint (gdbarch=0xab519290, 
bp_tgt=0xab51f4c0)
     at /home/rock/gdb/src/gdb/mem-break.c:83
#9  0xaab57bee in gdbarch_memory_remove_breakpoint (gdbarch=0xab519290, 
bp_tgt=0xab51f4c0)
     at /home/rock/gdb/src/gdb/gdbarch.c:2892
#10 0xaae48da2 in memory_remove_breakpoint (ops=0xab43d7b0 
<the_arm_linux_nat_target>, gdbarch=0xab519290,
     bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT) at 
/home/rock/gdb/src/gdb/mem-break.c:100
#11 0xaab6a9b8 in 
memory_breakpoint_target<process_stratum_target>::remove_breakpoint (
     this=0xab43d7b0 <the_arm_linux_nat_target>, gdbarch=0xab519290, 
bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
     at /home/rock/gdb/src/gdb/target.h:2440
#12 0xaaf86b70 in target_remove_breakpoint (gdbarch=0xab519290, 
bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
     at /home/rock/gdb/src/gdb/target.c:2381
#13 0xaabc6db8 in code_breakpoint::remove_location (this=0xab51f3a8, 
bl=0xab51f450, reason=REMOVE_BREAKPOINT)
     at /home/rock/gdb/src/gdb/breakpoint.c:12002
#14 0xaabb65f8 in remove_breakpoint_1 (bl=0xab51f450, 
reason=REMOVE_BREAKPOINT)
     at /home/rock/gdb/src/gdb/breakpoint.c:4109
#15 0xaabb688a in remove_breakpoint (bl=0xab51f450) at 
/home/rock/gdb/src/gdb/breakpoint.c:4214
#16 0xaabc6136 in update_global_location_list (insert_mode=UGLL_DONT_INSERT)
     at /home/rock/gdb/src/gdb/breakpoint.c:11554
#17 0xaabc7ff8 in delete_breakpoint (bpt=0xab51f3a8) at 
/home/rock/gdb/src/gdb/breakpoint.c:12657
#18 0xaaf9d556 in delete_thread_breakpoint (bp_p=0xab4d3650) at 
/home/rock/gdb/src/gdb/thread.c:98
#19 0xaaf9d5bc in delete_single_step_breakpoints (tp=0xab4d3618) at 
/home/rock/gdb/src/gdb/thread.c:123
#20 0xaadce6ee in for_each_just_stopped_thread (func=0xaaf9d5a5 
<delete_single_step_breakpoints(thread_info*)>)
     at /home/rock/gdb/src/gdb/infrun.c:3920
#21 0xaadce7a6 in delete_just_stopped_threads_single_step_breakpoints () 
at /home/rock/gdb/src/gdb/infrun.c:3945
#22 0xaadd4a52 in handle_signal_stop (ecs=0xfffeeda0) at 
/home/rock/gdb/src/gdb/infrun.c:6908
#23 0xaadd39b2 in handle_inferior_event (ecs=0xfffeeda0) at 
/home/rock/gdb/src/gdb/infrun.c:6494
#24 0xaadd008e in fetch_inferior_event () at 
/home/rock/gdb/src/gdb/infrun.c:4654
#25 0xaadb3554 in inferior_event_handler (event_type=INF_REG_EVENT) at 
/home/rock/gdb/src/gdb/inf-loop.c:42
#26 0xaae0c700 in handle_target_event (error=0, client_data=0x0) at 
/home/rock/gdb/src/gdb/linux-nat.c:4316
#27 0xab1d2b80 in handle_file_event (file_ptr=0xab5e3348, ready_mask=1)
     at /home/rock/gdb/src/gdbsupport/event-loop.cc:573
#28 0xab1d2e74 in gdb_wait_for_event (block=0) at 
/home/rock/gdb/src/gdbsupport/event-loop.cc:694
#29 0xab1d205e in gdb_do_one_event (mstimeout=-1) at 
/home/rock/gdb/src/gdbsupport/event-loop.cc:217
#30 0xaafa6f8e in wait_sync_command_done () at 
/home/rock/gdb/src/gdb/top.c:427
#31 0xaafa7012 in maybe_wait_sync_command_done (was_sync=0) at 
/home/rock/gdb/src/gdb/top.c:444
#32 0xaafa757a in execute_command (p=0xfffef148 "", from_tty=0) at 
/home/rock/gdb/src/gdb/top.c:577
#33 0xaad5cb96 in command_handler (command=0xfffef144 "step") at 
/home/rock/gdb/src/gdb/event-top.c:566
#34 0xaafa6da4 in read_command_file (stream=0xab4ba590) at 
/home/rock/gdb/src/gdb/top.c:342
#35 0xaac3657e in script_from_file (stream=0xab4ba590, file=0xfffef7e2 
"outputs/gdb.base/gdb-sigterm/gdb.in.1")
     at /home/rock/gdb/src/gdb/cli/cli-script.c:1642
#36 0xaac203ca in source_script_from_stream (stream=0xab4ba590,
     file=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1",
     file_to_open=0xab466da8 "outputs/gdb.base/gdb-sigterm/gdb.in.1") at 
/home/rock/gdb/src/gdb/cli/cli-cmds.c:730
#37 0xaac204d0 in source_script_with_search (file=0xfffef7e2 
"outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0,
     search_path=0) at /home/rock/gdb/src/gdb/cli/cli-cmds.c:775
#38 0xaac20524 in source_script (file=0xfffef7e2 
"outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0)
     at /home/rock/gdb/src/gdb/cli/cli-cmds.c:784
#39 0xaae33760 in catch_command_errors (command=0xaac20511 
<source_script(char const*, int)>,
     arg=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0, 
do_bp_actions=false)
     at /home/rock/gdb/src/gdb/main.c:513
#40 0xaae338da in execute_cmdargs (cmdarg_vec=0xfffef3d4, 
file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
     ret=0xfffef3c4) at /home/rock/gdb/src/gdb/main.c:609
#41 0xaae34c9a in captured_main_1 (context=0xfffef514) at 
/home/rock/gdb/src/gdb/main.c:1293
#42 0xaae34e52 in captured_main (data=0xfffef514) at 
/home/rock/gdb/src/gdb/main.c:1314
#43 0xaae34ebc in gdb_main (args=0xfffef514) at 
/home/rock/gdb/src/gdb/main.c:1343
#44 0xaaaee67a in main (argc=5, argv=0xfffef684) at 
/home/rock/gdb/src/gdb/gdb.c:39
(gdb)
...

Thanks,
- Tom
  
Tom de Vries Nov. 22, 2023, 12:44 p.m. UTC | #8
On 11/21/23 17:52, Tom de Vries wrote:
> On 11/21/23 17:11, Simon Marchi wrote:
>> On 11/15/23 06:12, Tom de Vries wrote:
>>> Because it's deleted twice.
>>>
>>> The last thing we see in gdb.log before the internal error is:
>>> ...
>>>     [infrun] handle_signal_stop: [13633.13633.0] hit its single-step 
>>> breakpoint^M
>>> ...
>>>
>>> And the statement following the infrun_debug_printf printing that 
>>> message is:
>>> ...
>>>    delete_just_stopped_threads_single_step_breakpoints ();
>>> ...
>>>
>>> So, the following events happen:
>>> - the single-step breakpoint is hit
>>> - delete_just_stopped_threads_single_step_breakpoints is called
>>> - during delete_just_stopped_threads_single_step_breakpoints,
>>>    delete_breakpoint is called which removes the breakpoint from the
>>>    breakpoint chain
>>> - gdb is interrupted by SIGTERM before finish delete_breakpoint
>>> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>>>    delete_just_stopped_threads_infrun_breakpoints which ends up
>>>    calling delete_breakpoint again for the same breakpoint.
>>>
>>> The proposed fix tries to handle delete_breakpoint being interrupted, 
>>> and called again.
>>>
>>> This is an alternative fix:
>>> ...
>>> diff --git a/gdb/thread.c b/gdb/thread.c
>>> index 47cc5c9cd14..7f029f2414c 100644
>>> --- a/gdb/thread.c
>>> +++ b/gdb/thread.c
>>> @@ -93,11 +93,12 @@ inferior_thread (void)
>>>   static void
>>>   delete_thread_breakpoint (struct breakpoint **bp_p)
>>>   {
>>> -  if (*bp_p != NULL)
>>> -    {
>>> -      delete_breakpoint (*bp_p);
>>> -      *bp_p = NULL;
>>> -    }
>>> +  struct breakpoint *bp = *bp_p;
>>> +
>>> +  *bp_p = nullptr;
>>> +
>>> +  if (bp != nullptr)
>>> +    delete_breakpoint (bp);
>>>   }
>>>
>>>   void
>>> ...
>>>
>>> It makes sure delete_breakpoint is called only once, but I don't 
>>> think this is the way to go, since it prevents the cleanup.
>>
>> Err, my intuition is that it doesn't make sense for operations like
>> that, that update GDB's state (especially the tricky infrun / breakpoint
>> state) in reaction to target events, to be interruptible by SIGTERM.
>> It's just a recipe for ending up with half-baked state that is not
>> up-to-date with the reality.
>>
>> I suppose that after receiving the SIGTERM, execution is interrupted by
>> a QUIT macro and a gdb_exception_quit is thrown?  Are you able to tell
>> which QUIT causes this to happen, and show what the stack looks like at
>> this point?
> 
> 
> It's the QUIT in target_write_with_progress.
> 
> Backtrace:
> ...
> (gdb) bt
> #0  0xf75906d2 in __cxa_throw () from 
> /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
> #1  0xab1ce512 in throw_it (reason=RETURN_FORCED_QUIT, 
> error=GDB_NO_ERROR, fmt=0xab303268 "SIGTERM", ap=...)
>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:202
> #2  0xab1ce622 in throw_forced_quit (fmt=0xab303268 "SIGTERM")
>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:247
> #3  0xab022576 in quit () at /home/rock/gdb/src/gdb/utils.c:639
> #4  0xab0225f0 in maybe_quit () at /home/rock/gdb/src/gdb/utils.c:666
> #5  0xaaf867e4 in target_write_with_progress (ops=0xab43d7b0 
> <the_arm_linux_nat_target>,
>      object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0xab51f4dc 
> "\376\347", offset=2863310110, len=2, progress=0x0,
>      baton=0x0) at /home/rock/gdb/src/gdb/target.c:2213
> #6  0xaaf86828 in target_write (ops=0xab43d7b0 
> <the_arm_linux_nat_target>, object=TARGET_OBJECT_RAW_MEMORY,
>      annex=0x0, buf=0xab51f4dc "\376\347", offset=2863310110, len=2) at 
> /home/rock/gdb/src/gdb/target.c:2226
> #7  0xaaf85d30 in target_write_raw_memory (memaddr=2863310110, 
> myaddr=0xab51f4dc "\376\347", len=2)
>      at /home/rock/gdb/src/gdb/target.c:1849
> #8  0xaae48d64 in default_memory_remove_breakpoint (gdbarch=0xab519290, 
> bp_tgt=0xab51f4c0)
>      at /home/rock/gdb/src/gdb/mem-break.c:83
> #9  0xaab57bee in gdbarch_memory_remove_breakpoint (gdbarch=0xab519290, 
> bp_tgt=0xab51f4c0)
>      at /home/rock/gdb/src/gdb/gdbarch.c:2892
> #10 0xaae48da2 in memory_remove_breakpoint (ops=0xab43d7b0 
> <the_arm_linux_nat_target>, gdbarch=0xab519290,
>      bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT) at 
> /home/rock/gdb/src/gdb/mem-break.c:100
> #11 0xaab6a9b8 in 
> memory_breakpoint_target<process_stratum_target>::remove_breakpoint (
>      this=0xab43d7b0 <the_arm_linux_nat_target>, gdbarch=0xab519290, 
> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>      at /home/rock/gdb/src/gdb/target.h:2440
> #12 0xaaf86b70 in target_remove_breakpoint (gdbarch=0xab519290, 
> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>      at /home/rock/gdb/src/gdb/target.c:2381
> #13 0xaabc6db8 in code_breakpoint::remove_location (this=0xab51f3a8, 
> bl=0xab51f450, reason=REMOVE_BREAKPOINT)
>      at /home/rock/gdb/src/gdb/breakpoint.c:12002
> #14 0xaabb65f8 in remove_breakpoint_1 (bl=0xab51f450, 
> reason=REMOVE_BREAKPOINT)
>      at /home/rock/gdb/src/gdb/breakpoint.c:4109
> #15 0xaabb688a in remove_breakpoint (bl=0xab51f450) at 
> /home/rock/gdb/src/gdb/breakpoint.c:4214
> #16 0xaabc6136 in update_global_location_list 
> (insert_mode=UGLL_DONT_INSERT)
>      at /home/rock/gdb/src/gdb/breakpoint.c:11554
> #17 0xaabc7ff8 in delete_breakpoint (bpt=0xab51f3a8) at 
> /home/rock/gdb/src/gdb/breakpoint.c:12657
> #18 0xaaf9d556 in delete_thread_breakpoint (bp_p=0xab4d3650) at 
> /home/rock/gdb/src/gdb/thread.c:98
> #19 0xaaf9d5bc in delete_single_step_breakpoints (tp=0xab4d3618) at 
> /home/rock/gdb/src/gdb/thread.c:123
> #20 0xaadce6ee in for_each_just_stopped_thread (func=0xaaf9d5a5 
> <delete_single_step_breakpoints(thread_info*)>)
>      at /home/rock/gdb/src/gdb/infrun.c:3920
> #21 0xaadce7a6 in delete_just_stopped_threads_single_step_breakpoints () 
> at /home/rock/gdb/src/gdb/infrun.c:3945
> #22 0xaadd4a52 in handle_signal_stop (ecs=0xfffeeda0) at 
> /home/rock/gdb/src/gdb/infrun.c:6908
> #23 0xaadd39b2 in handle_inferior_event (ecs=0xfffeeda0) at 
> /home/rock/gdb/src/gdb/infrun.c:6494
> #24 0xaadd008e in fetch_inferior_event () at 
> /home/rock/gdb/src/gdb/infrun.c:4654
> #25 0xaadb3554 in inferior_event_handler (event_type=INF_REG_EVENT) at 
> /home/rock/gdb/src/gdb/inf-loop.c:42
> #26 0xaae0c700 in handle_target_event (error=0, client_data=0x0) at 
> /home/rock/gdb/src/gdb/linux-nat.c:4316
> #27 0xab1d2b80 in handle_file_event (file_ptr=0xab5e3348, ready_mask=1)
>      at /home/rock/gdb/src/gdbsupport/event-loop.cc:573
> #28 0xab1d2e74 in gdb_wait_for_event (block=0) at 
> /home/rock/gdb/src/gdbsupport/event-loop.cc:694
> #29 0xab1d205e in gdb_do_one_event (mstimeout=-1) at 
> /home/rock/gdb/src/gdbsupport/event-loop.cc:217
> #30 0xaafa6f8e in wait_sync_command_done () at 
> /home/rock/gdb/src/gdb/top.c:427
> #31 0xaafa7012 in maybe_wait_sync_command_done (was_sync=0) at 
> /home/rock/gdb/src/gdb/top.c:444
> #32 0xaafa757a in execute_command (p=0xfffef148 "", from_tty=0) at 
> /home/rock/gdb/src/gdb/top.c:577
> #33 0xaad5cb96 in command_handler (command=0xfffef144 "step") at 
> /home/rock/gdb/src/gdb/event-top.c:566
> #34 0xaafa6da4 in read_command_file (stream=0xab4ba590) at 
> /home/rock/gdb/src/gdb/top.c:342
> #35 0xaac3657e in script_from_file (stream=0xab4ba590, file=0xfffef7e2 
> "outputs/gdb.base/gdb-sigterm/gdb.in.1")
>      at /home/rock/gdb/src/gdb/cli/cli-script.c:1642
> #36 0xaac203ca in source_script_from_stream (stream=0xab4ba590,
>      file=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1",
>      file_to_open=0xab466da8 "outputs/gdb.base/gdb-sigterm/gdb.in.1") at 
> /home/rock/gdb/src/gdb/cli/cli-cmds.c:730
> #37 0xaac204d0 in source_script_with_search (file=0xfffef7e2 
> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0,
>      search_path=0) at /home/rock/gdb/src/gdb/cli/cli-cmds.c:775
> #38 0xaac20524 in source_script (file=0xfffef7e2 
> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0)
>      at /home/rock/gdb/src/gdb/cli/cli-cmds.c:784
> #39 0xaae33760 in catch_command_errors (command=0xaac20511 
> <source_script(char const*, int)>,
>      arg=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0, 
> do_bp_actions=false)
>      at /home/rock/gdb/src/gdb/main.c:513
> #40 0xaae338da in execute_cmdargs (cmdarg_vec=0xfffef3d4, 
> file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
>      ret=0xfffef3c4) at /home/rock/gdb/src/gdb/main.c:609
> #41 0xaae34c9a in captured_main_1 (context=0xfffef514) at 
> /home/rock/gdb/src/gdb/main.c:1293
> #42 0xaae34e52 in captured_main (data=0xfffef514) at 
> /home/rock/gdb/src/gdb/main.c:1314
> #43 0xaae34ebc in gdb_main (args=0xfffef514) at 
> /home/rock/gdb/src/gdb/main.c:1343
> #44 0xaaaee67a in main (argc=5, argv=0xfffef684) at 
> /home/rock/gdb/src/gdb/gdb.c:39
> (gdb)
> ...

Hmm, looking at this backtrace, it seems that the forced_quit is thrown 
during update_global_location_list.

Removing the QUIT from target_write_with_progress makes the test-case 
pass, but the impact of that is likely to broad.

This patch removes the QUIT from default_memory_remove_breakpoint only, 
which also fixes the test-case.

Is this an acceptible approach?

Thanks,
- Tom
  
Tom de Vries Nov. 22, 2023, 2:49 p.m. UTC | #9
On 11/22/23 13:44, Tom de Vries wrote:
> On 11/21/23 17:52, Tom de Vries wrote:
>> On 11/21/23 17:11, Simon Marchi wrote:
>>> On 11/15/23 06:12, Tom de Vries wrote:
>>>> Because it's deleted twice.
>>>>
>>>> The last thing we see in gdb.log before the internal error is:
>>>> ...
>>>>     [infrun] handle_signal_stop: [13633.13633.0] hit its single-step 
>>>> breakpoint^M
>>>> ...
>>>>
>>>> And the statement following the infrun_debug_printf printing that 
>>>> message is:
>>>> ...
>>>>    delete_just_stopped_threads_single_step_breakpoints ();
>>>> ...
>>>>
>>>> So, the following events happen:
>>>> - the single-step breakpoint is hit
>>>> - delete_just_stopped_threads_single_step_breakpoints is called
>>>> - during delete_just_stopped_threads_single_step_breakpoints,
>>>>    delete_breakpoint is called which removes the breakpoint from the
>>>>    breakpoint chain
>>>> - gdb is interrupted by SIGTERM before finish delete_breakpoint
>>>> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>>>>    delete_just_stopped_threads_infrun_breakpoints which ends up
>>>>    calling delete_breakpoint again for the same breakpoint.
>>>>
>>>> The proposed fix tries to handle delete_breakpoint being 
>>>> interrupted, and called again.
>>>>
>>>> This is an alternative fix:
>>>> ...
>>>> diff --git a/gdb/thread.c b/gdb/thread.c
>>>> index 47cc5c9cd14..7f029f2414c 100644
>>>> --- a/gdb/thread.c
>>>> +++ b/gdb/thread.c
>>>> @@ -93,11 +93,12 @@ inferior_thread (void)
>>>>   static void
>>>>   delete_thread_breakpoint (struct breakpoint **bp_p)
>>>>   {
>>>> -  if (*bp_p != NULL)
>>>> -    {
>>>> -      delete_breakpoint (*bp_p);
>>>> -      *bp_p = NULL;
>>>> -    }
>>>> +  struct breakpoint *bp = *bp_p;
>>>> +
>>>> +  *bp_p = nullptr;
>>>> +
>>>> +  if (bp != nullptr)
>>>> +    delete_breakpoint (bp);
>>>>   }
>>>>
>>>>   void
>>>> ...
>>>>
>>>> It makes sure delete_breakpoint is called only once, but I don't 
>>>> think this is the way to go, since it prevents the cleanup.
>>>
>>> Err, my intuition is that it doesn't make sense for operations like
>>> that, that update GDB's state (especially the tricky infrun / breakpoint
>>> state) in reaction to target events, to be interruptible by SIGTERM.
>>> It's just a recipe for ending up with half-baked state that is not
>>> up-to-date with the reality.
>>>
>>> I suppose that after receiving the SIGTERM, execution is interrupted by
>>> a QUIT macro and a gdb_exception_quit is thrown?  Are you able to tell
>>> which QUIT causes this to happen, and show what the stack looks like at
>>> this point?
>>
>>
>> It's the QUIT in target_write_with_progress.
>>
>> Backtrace:
>> ...
>> (gdb) bt
>> #0  0xf75906d2 in __cxa_throw () from 
>> /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
>> #1  0xab1ce512 in throw_it (reason=RETURN_FORCED_QUIT, 
>> error=GDB_NO_ERROR, fmt=0xab303268 "SIGTERM", ap=...)
>>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:202
>> #2  0xab1ce622 in throw_forced_quit (fmt=0xab303268 "SIGTERM")
>>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:247
>> #3  0xab022576 in quit () at /home/rock/gdb/src/gdb/utils.c:639
>> #4  0xab0225f0 in maybe_quit () at /home/rock/gdb/src/gdb/utils.c:666
>> #5  0xaaf867e4 in target_write_with_progress (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>,
>>      object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0xab51f4dc 
>> "\376\347", offset=2863310110, len=2, progress=0x0,
>>      baton=0x0) at /home/rock/gdb/src/gdb/target.c:2213
>> #6  0xaaf86828 in target_write (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>, object=TARGET_OBJECT_RAW_MEMORY,
>>      annex=0x0, buf=0xab51f4dc "\376\347", offset=2863310110, len=2) 
>> at /home/rock/gdb/src/gdb/target.c:2226
>> #7  0xaaf85d30 in target_write_raw_memory (memaddr=2863310110, 
>> myaddr=0xab51f4dc "\376\347", len=2)
>>      at /home/rock/gdb/src/gdb/target.c:1849
>> #8  0xaae48d64 in default_memory_remove_breakpoint 
>> (gdbarch=0xab519290, bp_tgt=0xab51f4c0)
>>      at /home/rock/gdb/src/gdb/mem-break.c:83
>> #9  0xaab57bee in gdbarch_memory_remove_breakpoint 
>> (gdbarch=0xab519290, bp_tgt=0xab51f4c0)
>>      at /home/rock/gdb/src/gdb/gdbarch.c:2892
>> #10 0xaae48da2 in memory_remove_breakpoint (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>, gdbarch=0xab519290,
>>      bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT) at 
>> /home/rock/gdb/src/gdb/mem-break.c:100
>> #11 0xaab6a9b8 in 
>> memory_breakpoint_target<process_stratum_target>::remove_breakpoint (
>>      this=0xab43d7b0 <the_arm_linux_nat_target>, gdbarch=0xab519290, 
>> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/target.h:2440
>> #12 0xaaf86b70 in target_remove_breakpoint (gdbarch=0xab519290, 
>> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/target.c:2381
>> #13 0xaabc6db8 in code_breakpoint::remove_location (this=0xab51f3a8, 
>> bl=0xab51f450, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:12002
>> #14 0xaabb65f8 in remove_breakpoint_1 (bl=0xab51f450, 
>> reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:4109
>> #15 0xaabb688a in remove_breakpoint (bl=0xab51f450) at 
>> /home/rock/gdb/src/gdb/breakpoint.c:4214
>> #16 0xaabc6136 in update_global_location_list 
>> (insert_mode=UGLL_DONT_INSERT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:11554
>> #17 0xaabc7ff8 in delete_breakpoint (bpt=0xab51f3a8) at 
>> /home/rock/gdb/src/gdb/breakpoint.c:12657
>> #18 0xaaf9d556 in delete_thread_breakpoint (bp_p=0xab4d3650) at 
>> /home/rock/gdb/src/gdb/thread.c:98
>> #19 0xaaf9d5bc in delete_single_step_breakpoints (tp=0xab4d3618) at 
>> /home/rock/gdb/src/gdb/thread.c:123
>> #20 0xaadce6ee in for_each_just_stopped_thread (func=0xaaf9d5a5 
>> <delete_single_step_breakpoints(thread_info*)>)
>>      at /home/rock/gdb/src/gdb/infrun.c:3920
>> #21 0xaadce7a6 in delete_just_stopped_threads_single_step_breakpoints 
>> () at /home/rock/gdb/src/gdb/infrun.c:3945
>> #22 0xaadd4a52 in handle_signal_stop (ecs=0xfffeeda0) at 
>> /home/rock/gdb/src/gdb/infrun.c:6908
>> #23 0xaadd39b2 in handle_inferior_event (ecs=0xfffeeda0) at 
>> /home/rock/gdb/src/gdb/infrun.c:6494
>> #24 0xaadd008e in fetch_inferior_event () at 
>> /home/rock/gdb/src/gdb/infrun.c:4654
>> #25 0xaadb3554 in inferior_event_handler (event_type=INF_REG_EVENT) at 
>> /home/rock/gdb/src/gdb/inf-loop.c:42
>> #26 0xaae0c700 in handle_target_event (error=0, client_data=0x0) at 
>> /home/rock/gdb/src/gdb/linux-nat.c:4316
>> #27 0xab1d2b80 in handle_file_event (file_ptr=0xab5e3348, ready_mask=1)
>>      at /home/rock/gdb/src/gdbsupport/event-loop.cc:573
>> #28 0xab1d2e74 in gdb_wait_for_event (block=0) at 
>> /home/rock/gdb/src/gdbsupport/event-loop.cc:694
>> #29 0xab1d205e in gdb_do_one_event (mstimeout=-1) at 
>> /home/rock/gdb/src/gdbsupport/event-loop.cc:217
>> #30 0xaafa6f8e in wait_sync_command_done () at 
>> /home/rock/gdb/src/gdb/top.c:427
>> #31 0xaafa7012 in maybe_wait_sync_command_done (was_sync=0) at 
>> /home/rock/gdb/src/gdb/top.c:444
>> #32 0xaafa757a in execute_command (p=0xfffef148 "", from_tty=0) at 
>> /home/rock/gdb/src/gdb/top.c:577
>> #33 0xaad5cb96 in command_handler (command=0xfffef144 "step") at 
>> /home/rock/gdb/src/gdb/event-top.c:566
>> #34 0xaafa6da4 in read_command_file (stream=0xab4ba590) at 
>> /home/rock/gdb/src/gdb/top.c:342
>> #35 0xaac3657e in script_from_file (stream=0xab4ba590, file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1")
>>      at /home/rock/gdb/src/gdb/cli/cli-script.c:1642
>> #36 0xaac203ca in source_script_from_stream (stream=0xab4ba590,
>>      file=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1",
>>      file_to_open=0xab466da8 "outputs/gdb.base/gdb-sigterm/gdb.in.1") 
>> at /home/rock/gdb/src/gdb/cli/cli-cmds.c:730
>> #37 0xaac204d0 in source_script_with_search (file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0,
>>      search_path=0) at /home/rock/gdb/src/gdb/cli/cli-cmds.c:775
>> #38 0xaac20524 in source_script (file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0)
>>      at /home/rock/gdb/src/gdb/cli/cli-cmds.c:784
>> #39 0xaae33760 in catch_command_errors (command=0xaac20511 
>> <source_script(char const*, int)>,
>>      arg=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1", 
>> from_tty=0, do_bp_actions=false)
>>      at /home/rock/gdb/src/gdb/main.c:513
>> #40 0xaae338da in execute_cmdargs (cmdarg_vec=0xfffef3d4, 
>> file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
>>      ret=0xfffef3c4) at /home/rock/gdb/src/gdb/main.c:609
>> #41 0xaae34c9a in captured_main_1 (context=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1293
>> #42 0xaae34e52 in captured_main (data=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1314
>> #43 0xaae34ebc in gdb_main (args=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1343
>> #44 0xaaaee67a in main (argc=5, argv=0xfffef684) at 
>> /home/rock/gdb/src/gdb/gdb.c:39
>> (gdb)
>> ...
> 
> Hmm, looking at this backtrace, it seems that the forced_quit is thrown 
> during update_global_location_list.
> 
> Removing the QUIT from target_write_with_progress makes the test-case 
> pass, but the impact of that is likely to broad.
> 
> This patch removes the QUIT from default_memory_remove_breakpoint only, 
> which also fixes the test-case.
> 

Update: I've tested it on x86_64-linux and pinebook, no issues found.

Thanks,
- Tom

> Is this an acceptible approach?
> 
> Thanks,
> - Tom
  
Tom de Vries Nov. 27, 2023, 10:19 a.m. UTC | #10
On 11/22/23 13:44, Tom de Vries wrote:
> On 11/21/23 17:52, Tom de Vries wrote:
>> On 11/21/23 17:11, Simon Marchi wrote:
>>> On 11/15/23 06:12, Tom de Vries wrote:
>>>> Because it's deleted twice.
>>>>
>>>> The last thing we see in gdb.log before the internal error is:
>>>> ...
>>>>     [infrun] handle_signal_stop: [13633.13633.0] hit its single-step 
>>>> breakpoint^M
>>>> ...
>>>>
>>>> And the statement following the infrun_debug_printf printing that 
>>>> message is:
>>>> ...
>>>>    delete_just_stopped_threads_single_step_breakpoints ();
>>>> ...
>>>>
>>>> So, the following events happen:
>>>> - the single-step breakpoint is hit
>>>> - delete_just_stopped_threads_single_step_breakpoints is called
>>>> - during delete_just_stopped_threads_single_step_breakpoints,
>>>>    delete_breakpoint is called which removes the breakpoint from the
>>>>    breakpoint chain
>>>> - gdb is interrupted by SIGTERM before finish delete_breakpoint
>>>> - the SIGTERM triggers a SCOPE_EXIT cleanup, calling
>>>>    delete_just_stopped_threads_infrun_breakpoints which ends up
>>>>    calling delete_breakpoint again for the same breakpoint.
>>>>
>>>> The proposed fix tries to handle delete_breakpoint being 
>>>> interrupted, and called again.
>>>>
>>>> This is an alternative fix:
>>>> ...
>>>> diff --git a/gdb/thread.c b/gdb/thread.c
>>>> index 47cc5c9cd14..7f029f2414c 100644
>>>> --- a/gdb/thread.c
>>>> +++ b/gdb/thread.c
>>>> @@ -93,11 +93,12 @@ inferior_thread (void)
>>>>   static void
>>>>   delete_thread_breakpoint (struct breakpoint **bp_p)
>>>>   {
>>>> -  if (*bp_p != NULL)
>>>> -    {
>>>> -      delete_breakpoint (*bp_p);
>>>> -      *bp_p = NULL;
>>>> -    }
>>>> +  struct breakpoint *bp = *bp_p;
>>>> +
>>>> +  *bp_p = nullptr;
>>>> +
>>>> +  if (bp != nullptr)
>>>> +    delete_breakpoint (bp);
>>>>   }
>>>>
>>>>   void
>>>> ...
>>>>
>>>> It makes sure delete_breakpoint is called only once, but I don't 
>>>> think this is the way to go, since it prevents the cleanup.
>>>
>>> Err, my intuition is that it doesn't make sense for operations like
>>> that, that update GDB's state (especially the tricky infrun / breakpoint
>>> state) in reaction to target events, to be interruptible by SIGTERM.
>>> It's just a recipe for ending up with half-baked state that is not
>>> up-to-date with the reality.
>>>
>>> I suppose that after receiving the SIGTERM, execution is interrupted by
>>> a QUIT macro and a gdb_exception_quit is thrown?  Are you able to tell
>>> which QUIT causes this to happen, and show what the stack looks like at
>>> this point?
>>
>>
>> It's the QUIT in target_write_with_progress.
>>
>> Backtrace:
>> ...
>> (gdb) bt
>> #0  0xf75906d2 in __cxa_throw () from 
>> /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
>> #1  0xab1ce512 in throw_it (reason=RETURN_FORCED_QUIT, 
>> error=GDB_NO_ERROR, fmt=0xab303268 "SIGTERM", ap=...)
>>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:202
>> #2  0xab1ce622 in throw_forced_quit (fmt=0xab303268 "SIGTERM")
>>      at /home/rock/gdb/src/gdbsupport/common-exceptions.cc:247
>> #3  0xab022576 in quit () at /home/rock/gdb/src/gdb/utils.c:639
>> #4  0xab0225f0 in maybe_quit () at /home/rock/gdb/src/gdb/utils.c:666
>> #5  0xaaf867e4 in target_write_with_progress (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>,
>>      object=TARGET_OBJECT_RAW_MEMORY, annex=0x0, buf=0xab51f4dc 
>> "\376\347", offset=2863310110, len=2, progress=0x0,
>>      baton=0x0) at /home/rock/gdb/src/gdb/target.c:2213
>> #6  0xaaf86828 in target_write (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>, object=TARGET_OBJECT_RAW_MEMORY,
>>      annex=0x0, buf=0xab51f4dc "\376\347", offset=2863310110, len=2) 
>> at /home/rock/gdb/src/gdb/target.c:2226
>> #7  0xaaf85d30 in target_write_raw_memory (memaddr=2863310110, 
>> myaddr=0xab51f4dc "\376\347", len=2)
>>      at /home/rock/gdb/src/gdb/target.c:1849
>> #8  0xaae48d64 in default_memory_remove_breakpoint 
>> (gdbarch=0xab519290, bp_tgt=0xab51f4c0)
>>      at /home/rock/gdb/src/gdb/mem-break.c:83
>> #9  0xaab57bee in gdbarch_memory_remove_breakpoint 
>> (gdbarch=0xab519290, bp_tgt=0xab51f4c0)
>>      at /home/rock/gdb/src/gdb/gdbarch.c:2892
>> #10 0xaae48da2 in memory_remove_breakpoint (ops=0xab43d7b0 
>> <the_arm_linux_nat_target>, gdbarch=0xab519290,
>>      bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT) at 
>> /home/rock/gdb/src/gdb/mem-break.c:100
>> #11 0xaab6a9b8 in 
>> memory_breakpoint_target<process_stratum_target>::remove_breakpoint (
>>      this=0xab43d7b0 <the_arm_linux_nat_target>, gdbarch=0xab519290, 
>> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/target.h:2440
>> #12 0xaaf86b70 in target_remove_breakpoint (gdbarch=0xab519290, 
>> bp_tgt=0xab51f4c0, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/target.c:2381
>> #13 0xaabc6db8 in code_breakpoint::remove_location (this=0xab51f3a8, 
>> bl=0xab51f450, reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:12002
>> #14 0xaabb65f8 in remove_breakpoint_1 (bl=0xab51f450, 
>> reason=REMOVE_BREAKPOINT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:4109
>> #15 0xaabb688a in remove_breakpoint (bl=0xab51f450) at 
>> /home/rock/gdb/src/gdb/breakpoint.c:4214
>> #16 0xaabc6136 in update_global_location_list 
>> (insert_mode=UGLL_DONT_INSERT)
>>      at /home/rock/gdb/src/gdb/breakpoint.c:11554
>> #17 0xaabc7ff8 in delete_breakpoint (bpt=0xab51f3a8) at 
>> /home/rock/gdb/src/gdb/breakpoint.c:12657
>> #18 0xaaf9d556 in delete_thread_breakpoint (bp_p=0xab4d3650) at 
>> /home/rock/gdb/src/gdb/thread.c:98
>> #19 0xaaf9d5bc in delete_single_step_breakpoints (tp=0xab4d3618) at 
>> /home/rock/gdb/src/gdb/thread.c:123
>> #20 0xaadce6ee in for_each_just_stopped_thread (func=0xaaf9d5a5 
>> <delete_single_step_breakpoints(thread_info*)>)
>>      at /home/rock/gdb/src/gdb/infrun.c:3920
>> #21 0xaadce7a6 in delete_just_stopped_threads_single_step_breakpoints 
>> () at /home/rock/gdb/src/gdb/infrun.c:3945
>> #22 0xaadd4a52 in handle_signal_stop (ecs=0xfffeeda0) at 
>> /home/rock/gdb/src/gdb/infrun.c:6908
>> #23 0xaadd39b2 in handle_inferior_event (ecs=0xfffeeda0) at 
>> /home/rock/gdb/src/gdb/infrun.c:6494
>> #24 0xaadd008e in fetch_inferior_event () at 
>> /home/rock/gdb/src/gdb/infrun.c:4654
>> #25 0xaadb3554 in inferior_event_handler (event_type=INF_REG_EVENT) at 
>> /home/rock/gdb/src/gdb/inf-loop.c:42
>> #26 0xaae0c700 in handle_target_event (error=0, client_data=0x0) at 
>> /home/rock/gdb/src/gdb/linux-nat.c:4316
>> #27 0xab1d2b80 in handle_file_event (file_ptr=0xab5e3348, ready_mask=1)
>>      at /home/rock/gdb/src/gdbsupport/event-loop.cc:573
>> #28 0xab1d2e74 in gdb_wait_for_event (block=0) at 
>> /home/rock/gdb/src/gdbsupport/event-loop.cc:694
>> #29 0xab1d205e in gdb_do_one_event (mstimeout=-1) at 
>> /home/rock/gdb/src/gdbsupport/event-loop.cc:217
>> #30 0xaafa6f8e in wait_sync_command_done () at 
>> /home/rock/gdb/src/gdb/top.c:427
>> #31 0xaafa7012 in maybe_wait_sync_command_done (was_sync=0) at 
>> /home/rock/gdb/src/gdb/top.c:444
>> #32 0xaafa757a in execute_command (p=0xfffef148 "", from_tty=0) at 
>> /home/rock/gdb/src/gdb/top.c:577
>> #33 0xaad5cb96 in command_handler (command=0xfffef144 "step") at 
>> /home/rock/gdb/src/gdb/event-top.c:566
>> #34 0xaafa6da4 in read_command_file (stream=0xab4ba590) at 
>> /home/rock/gdb/src/gdb/top.c:342
>> #35 0xaac3657e in script_from_file (stream=0xab4ba590, file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1")
>>      at /home/rock/gdb/src/gdb/cli/cli-script.c:1642
>> #36 0xaac203ca in source_script_from_stream (stream=0xab4ba590,
>>      file=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1",
>>      file_to_open=0xab466da8 "outputs/gdb.base/gdb-sigterm/gdb.in.1") 
>> at /home/rock/gdb/src/gdb/cli/cli-cmds.c:730
>> #37 0xaac204d0 in source_script_with_search (file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0,
>>      search_path=0) at /home/rock/gdb/src/gdb/cli/cli-cmds.c:775
>> #38 0xaac20524 in source_script (file=0xfffef7e2 
>> "outputs/gdb.base/gdb-sigterm/gdb.in.1", from_tty=0)
>>      at /home/rock/gdb/src/gdb/cli/cli-cmds.c:784
>> #39 0xaae33760 in catch_command_errors (command=0xaac20511 
>> <source_script(char const*, int)>,
>>      arg=0xfffef7e2 "outputs/gdb.base/gdb-sigterm/gdb.in.1", 
>> from_tty=0, do_bp_actions=false)
>>      at /home/rock/gdb/src/gdb/main.c:513
>> #40 0xaae338da in execute_cmdargs (cmdarg_vec=0xfffef3d4, 
>> file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
>>      ret=0xfffef3c4) at /home/rock/gdb/src/gdb/main.c:609
>> #41 0xaae34c9a in captured_main_1 (context=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1293
>> #42 0xaae34e52 in captured_main (data=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1314
>> #43 0xaae34ebc in gdb_main (args=0xfffef514) at 
>> /home/rock/gdb/src/gdb/main.c:1343
>> #44 0xaaaee67a in main (argc=5, argv=0xfffef684) at 
>> /home/rock/gdb/src/gdb/gdb.c:39
>> (gdb)
>> ...
> 
> Hmm, looking at this backtrace, it seems that the forced_quit is thrown 
> during update_global_location_list.
> 
> Removing the QUIT from target_write_with_progress makes the test-case 
> pass, but the impact of that is likely to broad.
> 
> This patch removes the QUIT from default_memory_remove_breakpoint only, 
> which also fixes the test-case.
> 
> Is this an acceptible approach?

This uses the same approach, but implements it in a less intrusive way, 
by adding a suppress_quit variable.

Any comments about this approach?

Thanks,
- Tom

diff --git a/gdb/mem-break.c b/gdb/mem-break.c
index abff4b47081..ce61e055945 100644
--- a/gdb/mem-break.c
+++ b/gdb/mem-break.c
@@ -71,6 +71,9 @@ default_memory_insert_breakpoint (struct gdbarch *gdbarch,
    return val;
  }

+extern bool suppress_quit;
+#define SCOPED_SUPPRESS_QUIT \
+  scoped_restore save_suppress_quit = make_scoped_restore 
(&suppress_quit, true)

  int
  default_memory_remove_breakpoint (struct gdbarch *gdbarch,
@@ -80,6 +83,7 @@ default_memory_remove_breakpoint (struct gdbarch *gdbarch,

    gdbarch_sw_breakpoint_from_kind (gdbarch, bp_tgt->kind, &bplen);

+  SCOPED_SUPPRESS_QUIT;
    return target_write_raw_memory (bp_tgt->placed_address, 
bp_tgt->shadow_contents,
  				  bplen);
  }
diff --git a/gdb/utils.c b/gdb/utils.c
index 7a1841ba21e..b5e6d792383 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -654,11 +654,17 @@ quit (void)
  #endif
  }

+extern bool suppress_quit;
+bool suppress_quit;
+
  /* See defs.h.  */

  void
  maybe_quit (void)
  {
+  if (suppress_quit)
+    return;
+
    if (!is_main_thread ())
      return;
  
Simon Marchi Nov. 27, 2023, 4:29 p.m. UTC | #11
On 11/27/23 05:19, Tom de Vries wrote:
> On 11/22/23 13:44, Tom de Vries wrote:
>> Hmm, looking at this backtrace, it seems that the forced_quit is thrown during update_global_location_list.
>>
>> Removing the QUIT from target_write_with_progress makes the test-case pass, but the impact of that is likely to broad.
>>
>> This patch removes the QUIT from default_memory_remove_breakpoint only, which also fixes the test-case.
>>
>> Is this an acceptible approach?
> 
> This uses the same approach, but implements it in a less intrusive way, by adding a suppress_quit variable.
> 
> Any comments about this approach?
> 
> Thanks,
> - Tom
> 
> diff --git a/gdb/mem-break.c b/gdb/mem-break.c
> index abff4b47081..ce61e055945 100644
> --- a/gdb/mem-break.c
> +++ b/gdb/mem-break.c
> @@ -71,6 +71,9 @@ default_memory_insert_breakpoint (struct gdbarch *gdbarch,
>    return val;
>  }
> 
> +extern bool suppress_quit;
> +#define SCOPED_SUPPRESS_QUIT \
> +  scoped_restore save_suppress_quit = make_scoped_restore (&suppress_quit, true)
> 
>  int
>  default_memory_remove_breakpoint (struct gdbarch *gdbarch,
> @@ -80,6 +83,7 @@ default_memory_remove_breakpoint (struct gdbarch *gdbarch,
> 
>    gdbarch_sw_breakpoint_from_kind (gdbarch, bp_tgt->kind, &bplen);
> 
> +  SCOPED_SUPPRESS_QUIT;
>    return target_write_raw_memory (bp_tgt->placed_address, bp_tgt->shadow_contents,
>                    bplen);
>  }
> diff --git a/gdb/utils.c b/gdb/utils.c
> index 7a1841ba21e..b5e6d792383 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -654,11 +654,17 @@ quit (void)
>  #endif
>  }
> 
> +extern bool suppress_quit;
> +bool suppress_quit;
> +
>  /* See defs.h.  */
> 
>  void
>  maybe_quit (void)
>  {
> +  if (suppress_quit)
> +    return;
> +
>    if (!is_main_thread ())
>      return;
> 
> 

I believe that something like this (suppress_quit) would be good to
have.  As you said, removing some QUITs from functions that do memory
reads and writes would be too broad.  In some cases, you want those to
be interruptible, and in other cases you don't.

For instance, you ask GDB to print a lot of memory, it's taking too
long, you want to stop.  There's not risk in stopping that operation in
the middle.  Then let's say you ask GDB to write a big object in memory,
it's taking too long, you want to interrupt it.  It might leave you with
a half-written object, but you know the risks of interrupting your write
operation, that's fine.

On the other hand, I suspect that any memory read / write used to
implement GDB's internal logic should be considered by default unsafe to
interrupt.  Otherwise, that code should be written in a way that ensures
that should exceptions get thrown, the internal state is left in a
coherent... state, either by rolling back changes or otherwise.  That
sounds difficult, if not impossible.  So in those cases, I think the
only safe thing to do is suppress QUITs.

The question is, how to decide where QUITs should be suppressed.  Here's
a random backtrace I just grabbed from a maybe_quit call under an
update_global_location_list call:

    #0  maybe_quit () at /home/smarchi/src/binutils-gdb/gdb/utils.c:662
    #1  0x000056366ecaa8ee in target_read (ops=0x56367c2f4240 <the_amd64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7ffe481b40e0 "HA\033H\376\177", offset=93824992235889,
        len=1) at /home/smarchi/src/binutils-gdb/gdb/target.c:1972
    #2  0x000056366eca95a0 in target_read_memory (memaddr=0x555555555171, myaddr=0x7ffe481b40e0 "HA\033H\376\177", len=1) at /home/smarchi/src/binutils-gdb/gdb/target.c:1755
    #3  0x000056366dc0f6be in default_memory_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/mem-break.c:54
    #4  0x000056366c17574a in gdbarch_memory_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/gdbarch.c:2875
    #5  0x000056366dc0fb94 in memory_insert_breakpoint (ops=0x56367c2f4240 <the_amd64_linux_nat_target>, gdbarch=0x61c00001d080, bp_tgt=0x616000073f28)
        at /home/smarchi/src/binutils-gdb/gdb/mem-break.c:92
    #6  0x000056366c0067d5 in memory_breakpoint_target<process_stratum_target>::insert_breakpoint (this=0x56367c2f4240 <the_amd64_linux_nat_target>, gdbarch=0x61c00001d080,
        bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/target.h:2435
    #7  0x000056366ecacf34 in target_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/target.c:2359
    #8  0x000056366c52ebd5 in code_breakpoint::insert_location (this=0x611000045dc0, bl=0x616000073e80) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:11977
    #9  0x000056366c477a64 in insert_bp_location (bl=0x616000073e80, tmp_error_stream=0x7ffe481b48f0, disabled_breaks=0x7ffe481b4700, hw_breakpoint_error=0x7ffe481b4710,
        hw_bp_error_explained_already=0x7ffe481b4720) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:2901
    #10 0x000056366c481f9c in insert_breakpoint_locations () at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:3291
    #11 0x000056366c529ae4 in update_global_location_list (insert_mode=UGLL_MAY_INSERT) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:11720
    #12 0x000056366c54998d in breakpoint_re_set () at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:13271
    #13 0x000056366deccb27 in objfile_relocate (objfile=0x614000005440, new_offsets=std::__debug::vector of length 38, capacity 38 = {...}) at /home/smarchi/src/binutils-gdb/gdb/objfiles.c:707
    #14 0x000056366e8aaa6b in svr4_relocate_main_executable () at /home/smarchi/src/binutils-gdb/gdb/solib-svr4.c:3004
    #15 0x000056366e8aaefe in svr4_solib_create_inferior_hook (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/solib-svr4.c:3048
    #16 0x000056366e904f3c in solib_create_inferior_hook (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/solib.c:1217
    #17 0x000056366d717f57 in post_create_inferior (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:275
    #18 0x000056366d71a14e in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:486
    #19 0x000056366d71a5b2 in run_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:512
    #20 0x000056366c8d66d5 in do_simple_func (During symbol reading: macro `CTRL' redefined at /usr/include/x86_64-linux-gnu/sys/ttydefaults.h:55; original definition at /usr/include/readline/chardefs.h:51
    args=0x0, from_tty=1, c=0x61200005bcc0) at /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95
    #21 0x000056366c8ebe3b in cmd_func (cmd=0x61200005bcc0, args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735
    #22 0x000056366ee07837 in execute_command (p=0x6020000393d1 "", from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/top.c:575
    #23 0x000056366d2e9cdb in command_handler (command=0x6020000393d0 "") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:566
    #24 0x000056366d2eafd1 in command_line_handler (rl=...) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:802
    #25 0x000056366ef64bdb in tui_command_line_handler (rl=...) at /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104
    #26 0x000056366d2e7d40 in gdb_rl_callback_handler (rl=0x602000039370 "r") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:259
    #27 0x00007f3ad551246d in rl_callback_read_char () from /lib/x86_64-linux-gnu/libreadline.so.8
    #28 0x000056366d2e7764 in gdb_rl_callback_read_char_wrapper_noexcept () at /home/smarchi/src/binutils-gdb/gdb/event-top.c:195
    #29 0x000056366d2e7990 in gdb_rl_callback_read_char_wrapper (client_data=0x611000001440) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:234
    #30 0x000056366f0b730c in stdin_event_handler (error=0, client_data=0x611000001440) at /home/smarchi/src/binutils-gdb/gdb/ui.c:155
    #31 0x0000563670708345 in handle_file_event (file_ptr=0x60700000dd10, ready_mask=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573
    #32 0x0000563670708c58 in gdb_wait_for_event (block=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694
    #33 0x00005636707069ce in gdb_do_one_event (mstimeout=-1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:264
    #34 0x000056366db776a8 in start_event_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:407
    #35 0x000056366db77e4b in captured_command_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:471
    #36 0x000056366db7d3d4 in captured_main (data=0x7ffe481b6780) at /home/smarchi/src/binutils-gdb/gdb/main.c:1324
    #37 0x000056366db7d511 in gdb_main (During symbol reading: const value length mismatch for 'sigall_set', got 8, expected 0
    args=0x7ffe481b6780) at /home/smarchi/src/binutils-gdb/gdb/main.c:1343
    #38 0x000056366bbf0703 in main (argc=5, argv=0x7ffe481b68f8) at /home/smarchi/src/binutils-gdb/gdb/gdb.c:39

If an exception gets thrown at that maybe_quit, I guess it goes
directly through that stack to start_event_loop (ignoring the special
readline handling)?  So, we end up with an half-started inferior?
So, should run_command_1 suppress QUITs during all its execution?

Simon
  
Tom de Vries Nov. 28, 2023, 3:22 p.m. UTC | #12
On 11/27/23 17:29, Simon Marchi wrote:
> On 11/27/23 05:19, Tom de Vries wrote:
>> On 11/22/23 13:44, Tom de Vries wrote:
>>> Hmm, looking at this backtrace, it seems that the forced_quit is thrown during update_global_location_list.
>>>
>>> Removing the QUIT from target_write_with_progress makes the test-case pass, but the impact of that is likely to broad.
>>>
>>> This patch removes the QUIT from default_memory_remove_breakpoint only, which also fixes the test-case.
>>>
>>> Is this an acceptible approach?
>>
>> This uses the same approach, but implements it in a less intrusive way, by adding a suppress_quit variable.
>>
>> Any comments about this approach?
>>
>> Thanks,
>> - Tom
>>
>> diff --git a/gdb/mem-break.c b/gdb/mem-break.c
>> index abff4b47081..ce61e055945 100644
>> --- a/gdb/mem-break.c
>> +++ b/gdb/mem-break.c
>> @@ -71,6 +71,9 @@ default_memory_insert_breakpoint (struct gdbarch *gdbarch,
>>     return val;
>>   }
>>
>> +extern bool suppress_quit;
>> +#define SCOPED_SUPPRESS_QUIT \
>> +  scoped_restore save_suppress_quit = make_scoped_restore (&suppress_quit, true)
>>
>>   int
>>   default_memory_remove_breakpoint (struct gdbarch *gdbarch,
>> @@ -80,6 +83,7 @@ default_memory_remove_breakpoint (struct gdbarch *gdbarch,
>>
>>     gdbarch_sw_breakpoint_from_kind (gdbarch, bp_tgt->kind, &bplen);
>>
>> +  SCOPED_SUPPRESS_QUIT;
>>     return target_write_raw_memory (bp_tgt->placed_address, bp_tgt->shadow_contents,
>>                     bplen);
>>   }
>> diff --git a/gdb/utils.c b/gdb/utils.c
>> index 7a1841ba21e..b5e6d792383 100644
>> --- a/gdb/utils.c
>> +++ b/gdb/utils.c
>> @@ -654,11 +654,17 @@ quit (void)
>>   #endif
>>   }
>>
>> +extern bool suppress_quit;
>> +bool suppress_quit;
>> +
>>   /* See defs.h.  */
>>
>>   void
>>   maybe_quit (void)
>>   {
>> +  if (suppress_quit)
>> +    return;
>> +
>>     if (!is_main_thread ())
>>       return;
>>
>>
> 
> I believe that something like this (suppress_quit) would be good to
> have.  As you said, removing some QUITs from functions that do memory
> reads and writes would be too broad.  In some cases, you want those to
> be interruptible, and in other cases you don't.
> 
> For instance, you ask GDB to print a lot of memory, it's taking too
> long, you want to stop.  There's not risk in stopping that operation in
> the middle.  Then let's say you ask GDB to write a big object in memory,
> it's taking too long, you want to interrupt it.  It might leave you with
> a half-written object, but you know the risks of interrupting your write
> operation, that's fine.

I've submitted a v2 ( 
https://sourceware.org/pipermail/gdb-patches/2023-November/204583.html ) 
suppressing QUIT during delete_breakpoint.

> On the other hand, I suspect that any memory read / write used to
> implement GDB's internal logic should be considered by default unsafe to
> interrupt.  Otherwise, that code should be written in a way that ensures
> that should exceptions get thrown, the internal state is left in a
> coherent... state, either by rolling back changes or otherwise.  That
> sounds difficult, if not impossible.  So in those cases, I think the
> only safe thing to do is suppress QUITs.
> 
> The question is, how to decide where QUITs should be suppressed.  Here's
> a random backtrace I just grabbed from a maybe_quit call under an
> update_global_location_list call:
> 
>      #0  maybe_quit () at /home/smarchi/src/binutils-gdb/gdb/utils.c:662
>      #1  0x000056366ecaa8ee in target_read (ops=0x56367c2f4240 <the_amd64_linux_nat_target>, object=TARGET_OBJECT_MEMORY, annex=0x0, buf=0x7ffe481b40e0 "HA\033H\376\177", offset=93824992235889,
>          len=1) at /home/smarchi/src/binutils-gdb/gdb/target.c:1972
>      #2  0x000056366eca95a0 in target_read_memory (memaddr=0x555555555171, myaddr=0x7ffe481b40e0 "HA\033H\376\177", len=1) at /home/smarchi/src/binutils-gdb/gdb/target.c:1755
>      #3  0x000056366dc0f6be in default_memory_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/mem-break.c:54
>      #4  0x000056366c17574a in gdbarch_memory_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/gdbarch.c:2875
>      #5  0x000056366dc0fb94 in memory_insert_breakpoint (ops=0x56367c2f4240 <the_amd64_linux_nat_target>, gdbarch=0x61c00001d080, bp_tgt=0x616000073f28)
>          at /home/smarchi/src/binutils-gdb/gdb/mem-break.c:92
>      #6  0x000056366c0067d5 in memory_breakpoint_target<process_stratum_target>::insert_breakpoint (this=0x56367c2f4240 <the_amd64_linux_nat_target>, gdbarch=0x61c00001d080,
>          bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/target.h:2435
>      #7  0x000056366ecacf34 in target_insert_breakpoint (gdbarch=0x61c00001d080, bp_tgt=0x616000073f28) at /home/smarchi/src/binutils-gdb/gdb/target.c:2359
>      #8  0x000056366c52ebd5 in code_breakpoint::insert_location (this=0x611000045dc0, bl=0x616000073e80) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:11977
>      #9  0x000056366c477a64 in insert_bp_location (bl=0x616000073e80, tmp_error_stream=0x7ffe481b48f0, disabled_breaks=0x7ffe481b4700, hw_breakpoint_error=0x7ffe481b4710,
>          hw_bp_error_explained_already=0x7ffe481b4720) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:2901
>      #10 0x000056366c481f9c in insert_breakpoint_locations () at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:3291
>      #11 0x000056366c529ae4 in update_global_location_list (insert_mode=UGLL_MAY_INSERT) at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:11720
>      #12 0x000056366c54998d in breakpoint_re_set () at /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:13271
>      #13 0x000056366deccb27 in objfile_relocate (objfile=0x614000005440, new_offsets=std::__debug::vector of length 38, capacity 38 = {...}) at /home/smarchi/src/binutils-gdb/gdb/objfiles.c:707
>      #14 0x000056366e8aaa6b in svr4_relocate_main_executable () at /home/smarchi/src/binutils-gdb/gdb/solib-svr4.c:3004
>      #15 0x000056366e8aaefe in svr4_solib_create_inferior_hook (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/solib-svr4.c:3048
>      #16 0x000056366e904f3c in solib_create_inferior_hook (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/solib.c:1217
>      #17 0x000056366d717f57 in post_create_inferior (from_tty=0) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:275
>      #18 0x000056366d71a14e in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:486
>      #19 0x000056366d71a5b2 in run_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/infcmd.c:512
>      #20 0x000056366c8d66d5 in do_simple_func (During symbol reading: macro `CTRL' redefined at /usr/include/x86_64-linux-gnu/sys/ttydefaults.h:55; original definition at /usr/include/readline/chardefs.h:51
>      args=0x0, from_tty=1, c=0x61200005bcc0) at /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95
>      #21 0x000056366c8ebe3b in cmd_func (cmd=0x61200005bcc0, args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735
>      #22 0x000056366ee07837 in execute_command (p=0x6020000393d1 "", from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/top.c:575
>      #23 0x000056366d2e9cdb in command_handler (command=0x6020000393d0 "") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:566
>      #24 0x000056366d2eafd1 in command_line_handler (rl=...) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:802
>      #25 0x000056366ef64bdb in tui_command_line_handler (rl=...) at /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104
>      #26 0x000056366d2e7d40 in gdb_rl_callback_handler (rl=0x602000039370 "r") at /home/smarchi/src/binutils-gdb/gdb/event-top.c:259
>      #27 0x00007f3ad551246d in rl_callback_read_char () from /lib/x86_64-linux-gnu/libreadline.so.8
>      #28 0x000056366d2e7764 in gdb_rl_callback_read_char_wrapper_noexcept () at /home/smarchi/src/binutils-gdb/gdb/event-top.c:195
>      #29 0x000056366d2e7990 in gdb_rl_callback_read_char_wrapper (client_data=0x611000001440) at /home/smarchi/src/binutils-gdb/gdb/event-top.c:234
>      #30 0x000056366f0b730c in stdin_event_handler (error=0, client_data=0x611000001440) at /home/smarchi/src/binutils-gdb/gdb/ui.c:155
>      #31 0x0000563670708345 in handle_file_event (file_ptr=0x60700000dd10, ready_mask=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573
>      #32 0x0000563670708c58 in gdb_wait_for_event (block=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694
>      #33 0x00005636707069ce in gdb_do_one_event (mstimeout=-1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:264
>      #34 0x000056366db776a8 in start_event_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:407
>      #35 0x000056366db77e4b in captured_command_loop () at /home/smarchi/src/binutils-gdb/gdb/main.c:471
>      #36 0x000056366db7d3d4 in captured_main (data=0x7ffe481b6780) at /home/smarchi/src/binutils-gdb/gdb/main.c:1324
>      #37 0x000056366db7d511 in gdb_main (During symbol reading: const value length mismatch for 'sigall_set', got 8, expected 0
>      args=0x7ffe481b6780) at /home/smarchi/src/binutils-gdb/gdb/main.c:1343
>      #38 0x000056366bbf0703 in main (argc=5, argv=0x7ffe481b68f8) at /home/smarchi/src/binutils-gdb/gdb/gdb.c:39
> 
> If an exception gets thrown at that maybe_quit, I guess it goes
> directly through that stack to start_event_loop (ignoring the special
> readline handling)?  So, we end up with an half-started inferior?
> So, should run_command_1 suppress QUITs during all its execution?

I don't have a good answer for this.  I suppose each function is 
responsible for handling cleanup, and if each function in the callstack 
handles that appropriately, then perhaps there's no need to suppress 
QUIT during run_command_1?

Thanks,
- Tom
  
Tom Tromey Nov. 28, 2023, 3:30 p.m. UTC | #13
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:

Simon> On the other hand, I suspect that any memory read / write used to
Simon> implement GDB's internal logic should be considered by default unsafe to
Simon> interrupt.  Otherwise, that code should be written in a way that ensures
Simon> that should exceptions get thrown, the internal state is left in a
Simon> coherent... state, either by rolling back changes or otherwise.  That
Simon> sounds difficult, if not impossible.

Maybe I don't understand the context or something, because this seems
like a pretty big departure from historical practice.  In gdb,
exceptions can happen nearly anywhere and code must ordinarily be
exception-safe.

In this situation I think the issue is that some code was not, in a
subtle way.

Simon> The question is, how to decide where QUITs should be suppressed.  Here's
Simon> a random backtrace I just grabbed from a maybe_quit call under an
Simon> update_global_location_list call:

The danger with suppressing these quits is that then gdb can enter
uninterruptible states if something bad happens to the remote.  That
seems unfortunate.  At least assuming I understand correctly.

Tom
  
Tom de Vries Nov. 29, 2023, 12:08 p.m. UTC | #14
Hi,

just to mention, in case I'm the only one who can reproduce this 
assertion failure, I've attached a patch to the PR ( 
https://sourceware.org/bugzilla/show_bug.cgi?id=31061#c4 ) that 
implements software single step on amd64 sufficiently for the test-case 
(with some modifications to make sure software single step is being used 
when the SIGTERM arrives) to reproduce it.

Thanks,
- Tom
  
Tom de Vries Nov. 29, 2023, 8:46 p.m. UTC | #15
On 11/29/23 13:08, Tom de Vries wrote:
> Hi,
> 
> just to mention, in case I'm the only one who can reproduce this 
> assertion failure, I've attached a patch to the PR ( 
> https://sourceware.org/bugzilla/show_bug.cgi?id=31061#c4 ) that 
> implements software single step on amd64 sufficiently for the test-case 
> (with some modifications to make sure software single step is being used 
> when the SIGTERM arrives) to reproduce it.
> 

Now somewhat more structured, and submitted as RFC ( 
https://sourceware.org/pipermail/gdb-patches/2023-November/204624.html ).

Thanks,
- Tom
  
Tom de Vries Nov. 29, 2023, 9:33 p.m. UTC | #16
On 11/28/23 16:30, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:
> 
> Simon> On the other hand, I suspect that any memory read / write used to
> Simon> implement GDB's internal logic should be considered by default unsafe to
> Simon> interrupt.  Otherwise, that code should be written in a way that ensures
> Simon> that should exceptions get thrown, the internal state is left in a
> Simon> coherent... state, either by rolling back changes or otherwise.  That
> Simon> sounds difficult, if not impossible.
> 
> Maybe I don't understand the context or something, because this seems
> like a pretty big departure from historical practice.  In gdb,
> exceptions can happen nearly anywhere and code must ordinarily be
> exception-safe.
> 
> In this situation I think the issue is that some code was not, in a
> subtle way.
> 

The problem is that the function delete_breakpoint is first interrupted 
by a forced_quit (triggered by a SIGTERM), and then called again during 
the cleanup that happens while handling the forced_quit.

So, should we:
- suppress the force_quit during delete_breakpoint, to make sure it
   finishes and won't be called again?
- make delete_breakpoint handle being called twice?
   This seems to be the current strategy, which is broken but can be
   fixed using is_linked.
- make sure delete_breakpoint bails out early if it's called the second
   time, by adding a SCOPE_EXIT { bpt->type = bp_none }?

I'm not sure which approach is better.

Thanks,
- Tom

> Simon> The question is, how to decide where QUITs should be suppressed.  Here's
> Simon> a random backtrace I just grabbed from a maybe_quit call under an
> Simon> update_global_location_list call:
> 
> The danger with suppressing these quits is that then gdb can enter
> uninterruptible states if something bad happens to the remote.  That
> seems unfortunate.  At least assuming I understand correctly.
> 
> Tom
  
Simon Marchi Nov. 30, 2023, 5:08 p.m. UTC | #17
On 11/28/23 10:30, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:
> 
> Simon> On the other hand, I suspect that any memory read / write used to
> Simon> implement GDB's internal logic should be considered by default unsafe to
> Simon> interrupt.  Otherwise, that code should be written in a way that ensures
> Simon> that should exceptions get thrown, the internal state is left in a
> Simon> coherent... state, either by rolling back changes or otherwise.  That
> Simon> sounds difficult, if not impossible.
> 
> Maybe I don't understand the context or something, because this seems
> like a pretty big departure from historical practice.  In gdb,
> exceptions can happen nearly anywhere and code must ordinarily be
> exception-safe.

Thanks for chiming in, I'm learning about this as we go.

It's true that errors can happen nearly everywhere, in the case of
remote debugging, every time we try to send or receive something, so
it's true that pretty much all code should be written in a way that can
deal with exceptions.  I think there's a difference between exceptions
from errors (like the remote connection was broken) and exceptions from
ctrl-Cs.  In the case of severe errors, we don't expect to be able to
continue debugging after that.  The goal would be to leave without
crashing, leaving the state in a good enough shape so that it can be
cleaned up later when the target is unpushed, inferiors mourned, etc.
In the case of ctrl-Cs, the bar is higher: we would need to leave a
coherent state that allows the user to keep debugging.  That's
impossible, I think.

And looking a bit more closely at the backtrace shown earlier I noticed
that it's not a user ctrl-c interruption, but a "forced quit", resulting
in a gdb_exception_forced_quit, not gdb_exception_quit (should have
caught this earlier, but I'm not that familiar with this area).  I guess
that's the consequence of receiving a SIGTERM?

So, actually I see that QUITs of the ctrl-c kind, throwing
gdb_exception_quit, are already disabled during handle_inferior_event,
since:

  Don't throw quit while handling inferior events
  https://gitlab.com/gnutools/binutils-gdb/-/commit/0ace6ace1bfc2982f62ec684cdb26de64aec3366

So that's good.

But apparently, QUITs of the "received SIGTERM" kind, throwing
gdb_exception_forced_quit, are not disabled during
handle_inferior_event.  Perhaps they should be disabled too?  If
receiving a SIGTERM, we want GDB to exit relatively quickly.  But even
if we're going to exit soon, it's not like we can abort an event
handling in the middle: we still need the internal state of GDB to be
coherent, because we're going to either kill or detach inferiors before
exiting - two operations that need the internal state to be in good
shapre.

So, if receiving a SIGTERM while in the middle of handle_inferior_event,
it would be safer to finish handling the current event and process the
"forced quit" before going back to the event loop (like we do for
SIGINTs during handle_inferior_event).  But perhaps _some_ operations
during event handling are safe to be interrupted following a SIGTERM.
Like, if the event handling causes some DWARF symtabs to be expanded,
and that takes a lot of time, it might be safe to interrupt the
expansion of the symtab, but not the whole event handling.  We're going
to exit, that debug info is probably not going to be useful, and it
shouldn't impact the killing or detaching operations.

> Simon> The question is, how to decide where QUITs should be suppressed.  Here's
> Simon> a random backtrace I just grabbed from a maybe_quit call under an
> Simon> update_global_location_list call:
> 
> The danger with suppressing these quits is that then gdb can enter
> uninterruptible states if something bad happens to the remote.  That
> seems unfortunate.  At least assuming I understand correctly.

I think the remote target has some features to mitigate that? When doing
I/O, it installs its own quit handler that allows the user to interrupt
things (remote_target::remote_serial_quit_handler).

So my understanding is that it goes like this:

 - fetch_inferior_event installs infrun_quit_handler as the quit
   handler, effectively disabling ctrl-c exceptions.
 - If, while handling the event, the remote target needs to do some
   I/O, installs its own quit handler that allows things to be
   interrupted

If the user interrupts things in that state, an exception is thrown
during event handling, but it is a gdb_exception_error, not
gdb_exception_quit.  So that falls in the "severe error" category where
we just want to tear things down without crashing, not in the user
interrupt category where we would expect to keep debugging.

Simon
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index fe09dbcbeee..0a58821fc00 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -12633,7 +12633,8 @@  delete_breakpoint (struct breakpoint *bpt)
   if (bpt->number)
     notify_breakpoint_deleted (bpt);
 
-  breakpoint_chain.erase (breakpoint_chain.iterator_to (*bpt));
+  if (bpt->is_linked ())
+    breakpoint_chain.erase (breakpoint_chain.iterator_to (*bpt));
 
   /* Be sure no bpstat's are pointing at the breakpoint after it's
      been freed.  */