[PATCHv4,5/7] gdb: don't clear inserted flag in disable_breakpoints_in_unloaded_shlib

Message ID e71a655a8311c66adec3d7bbbc9c194ad4d9138e.1733499582.git.aburgess@redhat.com
State New
Headers
Series disabled_by_cond fixes for breakpoints |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed

Commit Message

Andrew Burgess Dec. 6, 2024, 3:42 p.m. UTC
  This commit removes the clearing of bp_location::inserted from
disable_breakpoints_in_unloaded_shlib, my claim is that this call is
not needed (any more), and with the next commit, this line actually
causes some problems.

The disable_breakpoints_in_unloaded_shlib function was added back in
2004 with commit 84acb35a5a97c, and from this first version the
function cleared the bp_location::inserted flag.  The motivation for
this is that the shared library might have already been unmapped, in
which case, a later attempt to remove the location could fail.

In 2013 a similar function disable_breakpoints_in_freed_objfile was
added.  This function also cleared bp_location::inserted for similar
reasons.  This code was added in commit:

  commit 63644780babdca3f40e1978a236b6cd78473c91b
  Date:   Tue Mar 12 11:10:18 2013 +0100

      New remove-symbol-file command.

Then in 2014 the clearing of bp_location::inserted was removed from
disable_breakpoints_in_freed_objfile in the commit:

  commit 08351840eabb44799e3d01026610420758f4fa40
  Date:   Tue Apr 22 23:19:19 2014 +0100

      Stale breakpoint instructions, spurious SIGTRAPS.

The reason that clearing the ::inserted flag was removed in this
commit is that if the disable_breakpoints_in_freed_objfile function
was called when the b/p were actually inserted, and the memory for the
associated objfile wasn't actually unmapped, then we could end up
leaving breakpoints inserted into the inferior, which leads to
spurious SIGTRAPs.

In the next commit I'll change disable_breakpoints_in_unloaded_shlib
so that all breakpoints, not just user breakpoints, will be
disabled (via shlib_disabled) when a shared library is unloaded.  This
addresses PR gdb/32079, see the next commit for a fuller justification
for this change.

The problem is that when I tested the next commit I ran into some
regressions from the gdb.base/nostdlib.exp test when run on an AArch64
GNU/Linux system where executables are compiled as PIE by default.
This test compiles a simple binary with the -nostdlib flag.

What happens is this:

  - The executable is compiled as PIE, this means that we get a
    dynamically linked executable, the dynamic linker is used to
    perform the PIE relocation, but the executable uses no other
    shared libraries.

  - When GDB starts the inferior, initially the dynamic linker is
    discovered as a shared library being used by the application, GDB
    loads in the library and its debug symbols, placing the internal
    "shlib event" breakpoints so that future shared library events can
    be tracked.

  - For the target I tested on systemtap probes were not used, instead
    GDB fell back to the old style even breakpoint.

  - As the inferior progresses, after the PIE relocation has been
    performed, the dynamic linker performs some house keeping on the
    list of shared libraries being used by the application.  During
    this process the dynamic linker is removed from the list of shared
    libraries being used by the inferior, this causes GDB see a shared
    library event, which GDB understands to mean that it should unload
    the dynamic linker from the inferior.

    I spoke with the glibc engineers at RH, and the feeling is that
    this is likely a bug (it's still being investigated).  But I don't
    think it really matters if this is a bug or not.  There are
    versions of glibc in the wild that have this behaviour, so GDB
    should (if the cost is not too great) be updated to handle this.

    Obviously after removing the dynamic linker from the list of
    shared libraries, the dynamic linker is not actually unmapped,
    that would not be possible, it's the dynamic linker that does the
    unmapping, so the dynamic linker is left mapped into the
    inferior's address space.

  - With the next patch in place all breakpoints (user and internal)
    within the dynamic linker are disabled (shlib_disabled) and
    currently marked as not inserted (bp_location::inserted flag is
    cleared).

  - Having processed the shared library event GDB then resumes the
    inferior.  As the shared library event is not a full stop of the
    inferior (i.e. we don't remove all breakpoints before handling the
    event), all of the breakpoints in the dynamic linker are still
    inserted, but are now marked as not-inserted.

  - GDB then resumes the inferior and immediately hits the breakpoint
    that is still inserted.  As GDB thinks this breakpoint is not
    inserted, this is reported to the user as a SIGTRAP.

The fix I think is just to not clear the bp_location::inserted flag in
disable_breakpoints_in_unloaded_shlib.  This will leave the breakpoint
as inserted in the case above.  GDB will now be able to successfully
resume the inferior after the shared library event (knowing there is a
breakpoint inserted GDB will step over it and continue as expected).
The next time the inferior performs a full stop the now shlib_disabled
breakpoint will be removed from the inferior we would want.

For the usual case, where a shared library is being unloaded due to
say a dlclose, the breakpoints in the library will be marked as
disabled, but will be left inserted.  The next time remove_breakpoints
is called GDB will try to remove those breakpoint locations.  If the
removal fails, as the breakpoint is marked shlib_disabled, GDB will
hide the error message from the user and just assume that the shared
library has been unmapped.  This functionality was first added in 2008
in commit 879d1e6b4674bc8.

There are two aspects to testing this change.  First whether no
clearing the ::inserted flag causes general problems.  That is tested
by running the full testsuite (I see no regressions).

Then there is the specific problem that caused me to make this
change.  That issue only occurs on AArch64, with GNU/Linux using
glibc, when the executable is compiled as PIE, and doesn't use any
shared libraries other than the dynamic linker (which can be the
gdb.base/nostdlib.exp test if run on the right system).  What I don't
know is how to recreate this setup in a more general form.

We can't use add-symbol-file/remove-symbol-file as that passes through
disable_breakpoints_in_freed_objfile instead, which the ::installed
flag is already not adjusted.

Also the bug doesn't trigger on x86 targets due to code in
handle_signal_stop which sees the inserted breakpoint, and decides
this must be a breakpoint that actually exists in the program, and
then because gdbarch_decr_pc_after_break returns non-zero for x86, GDB
steps the inferior past the breakpoint.  This is the big difference
from AArch64 where gdbarch_decr_pc_after_break returns zero, and so
the inferior gets stuck hitting the unexpectedly inserted breakpoint.
---
 gdb/breakpoint.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)
  

Patch

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index d0f4cbe18a9..456d5bbcc59 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -8102,10 +8102,21 @@  disable_breakpoints_in_unloaded_shlib (program_space *pspace, const solib &solib
 
 	  loc.shlib_disabled = 1;
 
-	  /* At this point, we cannot rely on remove_breakpoint
-	     succeeding so we must mark the breakpoint as not inserted
-	     to prevent future errors occurring in remove_breakpoints.  */
-	  loc.inserted = 0;
+	  /* At this point, we don't know whether the shared library
+	     was unmapped from the inferior or not, so leave the
+	     inserted flag alone.  We'll handle failure to uninsert
+	     quietly, in case the library was indeed unmapped.
+
+	     The test gdb.base/nostdlib.exp when run on AArch64
+	     GNU/Linux using glibc will cause the dynamic linker to be
+	     unloaded from the inferior, but the linker will never be
+	     unmapped.  Additionally, at the time the dynamic linker
+	     is unloaded the inferior will be stopped within the
+	     dynamic linker.
+
+	     If we clear the inserted flag here then GDB will fail to
+	     remove the internal breakpoints from the dynamic linker
+	     leading to unexpected SIGTRAPs.  */
 
 	  bp_modified = true;