[v2] Fix assert in delete_breakpoint

Message ID 20231128150121.23760-1-tdevries@suse.de
State New
Series [v2] Fix assert in delete_breakpoint

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64  success  Testing passed

Commit Message

Tom de Vries Nov. 28, 2023, 3:01 p.m. UTC
On a pinebook (aarch64 with a 64-bit kernel and 32-bit userland), with
test-case gdb.base/gdb-sigterm.exp, I run into:
...
intrusive_list.h:458: internal-error: erase_element: \
  Assertion `elem_node->prev != INTRUSIVE_LIST_UNLINKED_VALUE' failed.^M
...
which happens while executing this statement in delete_breakpoint:
...
  breakpoint_chain.erase (breakpoint_chain.iterator_to (*bpt));
...

The following events lead to the assertion failure:
- a single-step breakpoint is hit,
- delete_just_stopped_threads_single_step_breakpoints is called,
- during delete_just_stopped_threads_single_step_breakpoints,
  delete_breakpoint is called,
- breakpoint_chain.erase is called,
- gdb is interrupted by SIGTERM before finishing delete_breakpoint.
  The interrupt happens due to a QUIT during a target_write_with_progress
  call, which is called during default_memory_remove_breakpoint,
- the SIGTERM triggers a SCOPE_EXIT cleanup, calling
  delete_just_stopped_threads_infrun_breakpoints which ends up
  calling delete_breakpoint again for the same breakpoint, and
- breakpoint_chain.erase is called the second time, and the assert triggers.

There is an ad-hoc reentrancy control mechanism in delete_breakpoint:
...
void
delete_breakpoint (struct breakpoint *bpt)
{
  ...
  if (bpt->type == bp_none)
    return;
  ...
  bpt->type = bp_none;
  ...
}
...
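which partitions the function into three parts:
- an initial part, before the bp_none check, that is always executed,
- a second part, between the check and the assignment, that can be executed
  more than once per breakpoint if an earlier call was interrupted there, and
- a third part, after the assignment, that can be executed only once per
  breakpoint.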
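
The partition can be illustrated with a small standalone model.  This is only
a sketch: model_breakpoint, model_quit and model_delete are hypothetical
stand-ins, not GDB code, and the interruption is simulated by throwing an
exception in the second part.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for illustration; these are not GDB's types.
enum model_bp_type { model_bp_breakpoint, model_bp_none };

struct model_quit : std::runtime_error
{
  model_quit () : std::runtime_error ("QUIT") {}
};

struct model_breakpoint
{
  model_bp_type type = model_bp_breakpoint;
  int part1_runs = 0;
  int part2_runs = 0;
  int part3_runs = 0;
};

// Mirrors the shape of the guard: check for bp_none, do some work,
// then mark the breakpoint as bp_none.
void
model_delete (model_breakpoint &b, bool interrupt_in_part2)
{
  ++b.part1_runs;		// Part 1: runs on every call.

  if (b.type == model_bp_none)
    return;

  ++b.part2_runs;		// Part 2: reruns if interrupted before the mark.
  if (interrupt_in_part2)
    throw model_quit ();	// Simulated QUIT.

  b.type = model_bp_none;

  ++b.part3_runs;		// Part 3: runs at most once.
}

// An interrupted call, the cleanup's retry, and one further redundant call.
model_breakpoint
run_interrupted_then_retry ()
{
  model_breakpoint b;
  try
    {
      model_delete (b, true);
    }
  catch (const model_quit &)
    {
      model_delete (b, false);	// Simulated SCOPE_EXIT cleanup.
    }
  model_delete (b, false);
  return b;
}
```

In this model the second part runs twice for the same breakpoint, which is
the situation in which a non-repeatable operation placed there goes wrong.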

One way of looking at the assertion failure is that commit e2a1578868a ("gdb:
link breakpoints with intrusive_list") introduced code in the second part that
cannot be executed more than once, which can be fixed by guarding the
breakpoint_chain.erase call with is_linked.
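
A minimal sketch of that first option, using a hypothetical hand-rolled list
rather than GDB's intrusive_list (model_node, model_list and erase_if_linked
are made-up names, and a null prev pointer stands in for
INTRUSIVE_LIST_UNLINKED_VALUE):

```cpp
#include <cassert>

// Hypothetical node: a null prev pointer means "not linked".
struct model_node
{
  model_node *prev = nullptr;
  model_node *next = nullptr;

  bool is_linked () const { return prev != nullptr; }
};

// Hypothetical circular doubly-linked list with a sentinel head.
struct model_list
{
  model_node head;

  model_list () { head.prev = head.next = &head; }

  void push_back (model_node &n)
  {
    n.prev = head.prev;
    n.next = &head;
    head.prev->next = &n;
    head.prev = &n;
  }

  // Like the asserting erase: the node must currently be linked.
  void erase (model_node &n)
  {
    assert (n.is_linked ());
    n.prev->next = n.next;
    n.next->prev = n.prev;
    n.prev = n.next = nullptr;
  }

  int size () const
  {
    int count = 0;
    for (const model_node *p = head.next; p != &head; p = p->next)
      ++count;
    return count;
  }
};

// Guarded erase: a second call for the same node becomes a no-op.
bool
erase_if_linked (model_list &list, model_node &n)
{
  if (!n.is_linked ())
    return false;
  list.erase (n);
  return true;
}
```

With the guard, erasing the same node twice is harmless; without it, the
second erase would hit the assertion.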

Another way of looking at it is to ask the question: if we're so eager to
delete the breakpoint that we call delete_breakpoint from a SCOPE_EXIT
cleanup, why do we allow the deletion to be interrupted in the first place?

Fix this by suppressing QUIT during delete_breakpoint.
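
The idea can be modelled outside GDB roughly as follows.  This is a simplified
sketch under assumptions: all names are hypothetical, a pending quit is a plain
flag, and, unlike the patch, the "seen" flag is only recorded while suppression
is active.

```cpp
#include <cassert>
#include <stdexcept>

struct model_quit : std::runtime_error
{
  model_quit () : std::runtime_error ("QUIT") {}
};

static bool quit_requested = false;	  // Stands in for a pending SIGTERM.
static int suppress_depth = 0;		  // Nesting level of active guards.
static bool suppressed_quit_seen = false; // A quit check was skipped.

// Stand-in for the QUIT check: throws if a quit is pending and
// suppression is not active.
void
model_maybe_quit ()
{
  if (suppress_depth > 0)
    {
      suppressed_quit_seen = true;
      return;
    }

  if (quit_requested)
    {
      quit_requested = false;
      throw model_quit ();
    }
}

// RAII guard: defers quit checks until the outermost guard goes away.
class model_suppress_quit
{
public:
  model_suppress_quit () { ++suppress_depth; }

  ~model_suppress_quit () noexcept (false)
  {
    if (--suppress_depth != 0)
      return;

    if (!suppressed_quit_seen)
      return;

    suppressed_quit_seen = false;
    model_maybe_quit ();	// Replay the deferred check; may throw.
  }
};

// Stand-in for a deletion that must not be interrupted halfway.
void
model_delete (bool &completed)
{
  model_suppress_quit guard;
  model_maybe_quit ();		// Would throw here without the guard.
  completed = true;
}

// Returns true if the deletion completed and the quit was still delivered.
bool
run_model ()
{
  bool completed = false;
  bool quit_delivered = false;
  quit_requested = true;
  try
    {
      model_delete (completed);
    }
  catch (const model_quit &)
    {
      quit_delivered = true;
    }
  return completed && quit_delivered;
}
```

In this model the deletion runs to completion and the quit is delivered
afterwards, when the guard is destroyed, rather than being lost.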

Tested on the pinebook and x86_64-linux.

PR gdb/31061
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31061
---
 gdb/breakpoint.c |  3 +++
 gdb/utils.c      | 45 +++++++++++++++++++++++++++++++++++++++++++++
 gdb/utils.h      | 21 +++++++++++++++++++++
 3 files changed, 69 insertions(+)


base-commit: 14414227bfac8ef1803715b3b642f8ba0ab6fff8
  

Comments

Thiago Jung Bauermann Jan. 2, 2024, 5 p.m. UTC | #1
Hello,

Nice patch. As the discussion in v1 shows, QUIT handling has some
subtleties, so I don't think a Reviewed-by from me would be very
meaningful. I just have one small nit, if you agree with it:

Tom de Vries <tdevries@suse.de> writes:

> +scoped_suppress_quit::~scoped_suppress_quit () noexcept(false)
> +{
> +  suppress_quit_enabled_cnt--;
> +  gdb_assert (suppress_quit_enabled_cnt >= 0);
> +
> +  if (suppress_quit_enabled_cnt != 0)
> +    return;
> +
> +  bool tmp_suppress_quit_seen = suppress_quit_seen;
> +  suppress_quit_seen = false;
> +
> +  if (!tmp_suppress_quit_seen)
> +    return;

Is the tmp variable needed? It confused me a bit and made me think
harder to see if it was addressing some concern I wasn't aware of. But I
couldn't think of anything.

IMHO the code would be more straightforward if written as:

     if (!suppress_quit_seen)
       return;

     suppress_quit_seen = false;

> +
> +  /* A QUIT was suppressed, unsuppress it here.  This may throw.  */
> +  QUIT;
> +}
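
For comparison, the two ways of consuming the flag can be written as
standalone helpers.  This is only an illustrative sketch with hypothetical
names; each helper returns whether the deferred QUIT should be replayed and
leaves the flag cleared.

```cpp
#include <cassert>

// Form used in the patch: copy the flag, clear it, then test the copy.
bool
consume_with_tmp (bool &seen)
{
  bool tmp = seen;
  seen = false;
  return tmp;
}

// Suggested form: test first, clear only when the flag was set.
bool
consume_without_tmp (bool &seen)
{
  if (!seen)
    return false;

  seen = false;
  return true;
}
```

Both helpers give the same result and the same final flag value for either
input, and both clear the flag before the point where QUIT would be called.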
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 699919e32b3..768b92676f8 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -12582,6 +12582,9 @@  notify_breakpoint_deleted (breakpoint *b)
 void
 delete_breakpoint (struct breakpoint *bpt)
 {
+  /* Make sure that the function cannot be interrupted by QUIT.  */
+  scoped_suppress_quit do_scoped_suppress_quit;
+
   gdb_assert (bpt != NULL);
 
   /* Has this bp already been deleted?  This can happen because
diff --git a/gdb/utils.c b/gdb/utils.c
index 7a1841ba21e..405f5de29c4 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -654,6 +654,48 @@  quit (void)
 #endif
 }
 
+/* See utils.h.  */
+
+int scoped_suppress_quit::suppress_quit_enabled_cnt;
+bool scoped_suppress_quit::suppress_quit_seen;
+
+/* See utils.h.  */
+
+scoped_suppress_quit::scoped_suppress_quit ()
+{
+  gdb_assert (is_main_thread ());
+  suppress_quit_enabled_cnt++;
+}
+
+/* See utils.h.  */
+
+scoped_suppress_quit::~scoped_suppress_quit () noexcept(false)
+{
+  suppress_quit_enabled_cnt--;
+  gdb_assert (suppress_quit_enabled_cnt >= 0);
+
+  if (suppress_quit_enabled_cnt != 0)
+    return;
+
+  bool tmp_suppress_quit_seen = suppress_quit_seen;
+  suppress_quit_seen = false;
+
+  if (!tmp_suppress_quit_seen)
+    return;
+
+  /* A QUIT was suppressed, unsuppress it here.  This may throw.  */
+  QUIT;
+}
+
+/* See utils.h.  */
+
+bool
+scoped_suppress_quit::suppress_quit_enabled ()
+{
+  suppress_quit_seen = true;
+  return suppress_quit_enabled_cnt > 0;
+}
+
 /* See defs.h.  */
 
 void
@@ -662,6 +704,9 @@  maybe_quit (void)
   if (!is_main_thread ())
     return;
 
+  if (scoped_suppress_quit::suppress_quit_enabled ())
+    return;
+
   if (sync_quit_force_run)
     quit ();
 
diff --git a/gdb/utils.h b/gdb/utils.h
index f646b300530..03a76a035b4 100644
--- a/gdb/utils.h
+++ b/gdb/utils.h
@@ -425,4 +425,25 @@  struct deferred_warnings
   std::vector<string_file> m_warnings;
 };
 
+/* RAII-style class to suppress the effect of calling QUIT until scope exit.  */
+
+class scoped_suppress_quit
+{
+public:
+
+  scoped_suppress_quit ();
+  ~scoped_suppress_quit () noexcept(false);
+
+  /* Whether suppression is currently active.  */
+  static bool suppress_quit_enabled ();
+
+private:
+
+  /* Nesting level.  */
+  static int suppress_quit_enabled_cnt;
+
+  /* Whether QUIT was called during suppression.  */
+  static bool suppress_quit_seen;
+};
+
 #endif /* UTILS_H */