Do not close BFDs, breaking deleted inferior shlibs

Message ID 20150216210958.GA9701@host1.jankratochvil.net
State New, archived

Commit Message

Jan Kratochvil Feb. 16, 2015, 9:09 p.m. UTC
  Hi,

------------------------------------------------------------------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1170861

Ulrich Drepper 2014-12-05 19:02:47 CET

Here is a self-contained reproducer:

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
  system("echo 'int foo() { return foo(); }' | gcc -x c -shared -fpic -o u.so -");
  void *d = dlopen("./u.so", RTLD_LAZY);
  if (d == NULL) {
    puts(dlerror());
    return 1;
  }
  unlink("u.so");
  int (*fp)(void) = (int(*)(void)) dlsym(d, "foo");
  return fp();
}

Run this under gdb and then just execute "p $pc".  Notice that the program crashes due to a stack overrun.  If you have this error in the executable itself it'll work fine.

The problem is that the DSO goes away before gdb reads the debug info.  This happens, for instance, with gcc's JIT.

------------------------------------------------------------------------------

continue
Continuing.
Program received signal SIGUSR1, User defined signal 1.
0x0000003424e348c7 in __GI_raise (BFD: reopening .../gdb/testsuite/gdb.base/close-deleted-bfd-solib.so: No such file or directory
BFD: reopening .../gdb/testsuite/gdb.base/close-deleted-bfd-solib.so: No such file or directory
sig=10, sig@entry=<error reading variable: Can't read data for section '.eh_frame' in file '.../gdb/testsuite/gdb.base/close-deleted-bfd-solib.so'>) at ../sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) PASS: gdb.base/close-deleted-bfd.exp: continue
bt
#0  0x0000003424e348c7 in __GI_raise (sig=10) at ../sysdeps/unix/sysv/linux/raise.c:55
(gdb) FAIL: gdb.base/close-deleted-bfd.exp: bt

->

continue
Continuing.
Program received signal SIGUSR1, User defined signal 1.
0x0000003424e348c7 in __GI_raise (sig=10) at ../sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) PASS: gdb.base/close-deleted-bfd.exp: continue
bt
#0  0x0000003424e348c7 in __GI_raise (sig=10) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff7dd36ee in foo () at ./gdb.base/close-deleted-bfd-lib.c:21
#2  0x000000000040071c in main () at ./gdb.base/close-deleted-bfd-main.c:31
(gdb) PASS: gdb.base/close-deleted-bfd.exp: bt

------------------------------------------------------------------------------

The bfd_cache_close_all() calls have been there since:
	commit ce7d45220e4ed342d4a77fcd2f312e85e1100971
	Author: Jerome Guitton <guitton@adacore.com>
	Date:   Fri Jul 30 12:05:45 2004 +0000
	Message-ID: <20040730130834.GA3855@act-europe.fr>
	Date: Fri, 30 Jul 2004 15:08:34 +0200
	https://sourceware.org/ml/gdb-patches/2004-05/msg00678.html
	https://sourceware.org/ml/gdb-patches/2004-06/msg00008.html
	https://sourceware.org/ml/gdb-patches/2004-07/msg00439.html

While I do not know how to fix it on MS-Windows, at least Linux systems do not
need to be disadvantaged by the workaround for the MS-Windows problem.

I do not know whether the #ifdefs used here match the host systems that needed
this workaround in 2004.

Additionally, I believe such a patch is useful for performance reasons.
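
The "No such file or directory" failure in the log above follows from POSIX
file semantics: a descriptor opened before the file is removed keeps working,
while a close followed by a reopen by name does not.  Below is a minimal
standalone sketch of that behavior.  It is not GDB or BFD code; the function
name read_after_unlink and the sample data are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo, not GDB code.  Returns 0 if, after PATH is
   unlinked, reopening it by name fails with ENOENT while the
   descriptor opened earlier can still read the data; returns -1
   otherwise.  */
int
read_after_unlink (const char *path)
{
  static const char data[] = "debuginfo";
  char buf[sizeof data];
  int result = -1;
  int reopened;
  int fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);

  if (fd < 0)
    return -1;
  if (write (fd, data, sizeof data) != (ssize_t) sizeof data)
    goto done;

  /* Stands in for the inferior deleting its shared library.  */
  if (unlink (path) != 0)
    goto done;

  /* Reopening by name is what a closed file cache would have to do.  */
  reopened = open (path, O_RDONLY);
  if (reopened >= 0)
    {
      close (reopened);
      goto done;
    }
  if (errno != ENOENT)
    goto done;

  /* The descriptor that was never closed still sees the contents.  */
  if (lseek (fd, 0, SEEK_SET) != 0)
    goto done;
  if (read (fd, buf, sizeof buf) != (ssize_t) sizeof buf)
    goto done;
  if (memcmp (buf, data, sizeof data) == 0)
    result = 0;

done:
  close (fd);
  return result;
}
```

Calling read_after_unlink with any writable scratch path should return 0 on a
POSIX host.  MS-Windows hosts behave differently, which is why the 2004
workaround exists.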

No regressions on {x86_64,x86_64-m32,i686}-fedora23pre-linux-gnu.


Thanks,
Jan

gdb/ChangeLog
2015-02-16  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* corefile.c (reopen_exec_file): Wrap bfd_cache_close_all call into
	MS-Windows conditional.
	* exec.c (exec_file_attach): Likewise.
	* symfile.c (symbol_file_add_with_addrs): Likewise.

gdb/testsuite/ChangeLog
2015-02-16  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* gdb.base/close-deleted-bfd-lib.c: New file.
	* gdb.base/close-deleted-bfd-main.c: New file.
	* gdb.base/close-deleted-bfd.exp: New file.
  

Comments

Doug Evans Feb. 17, 2015, 7:44 a.m. UTC | #1
On Mon, Feb 16, 2015 at 1:09 PM, Jan Kratochvil
<jan.kratochvil@redhat.com> wrote:
> gdb/ChangeLog
> 2015-02-16  Jan Kratochvil  <jan.kratochvil@redhat.com>
>
>         * corefile.c (reopen_exec_file): Wrap bfd_cache_close_all call into
>         MS-Windows conditional.
>         * exec.c (exec_file_attach): Likewise.
>         * symfile.c (symbol_file_add_with_addrs): Likewise.

Hi.  The comments in the code for the call to bfd_cache_close_all
explain the Windows situation well enough.  However, IIUC it is necessary
to *not* call bfd_cache_close_all here on Linux, and there are no
comments in the code explaining this.
Can you add something?  Thanks.

E.g., something like

  /* It is necessary to not call bfd_cache_close_all here on
     Linux because ... */

Also, IIUC there's a fragility here that we still need to address.
AFAICT there's nothing in the bfd cache API that prevents bfd
from randomly closing the file in the future, and yet from reading
the bug report it is necessary that we do not close the file.

E.g., does gdb need to keep open, for example, every shared library
used by the inferior?
Perhaps a better way to go would be to teach gdb to handle
the file going away.
  

Patch

diff --git a/gdb/corefile.c b/gdb/corefile.c
index a042e6d..d8de0a2 100644
--- a/gdb/corefile.c
+++ b/gdb/corefile.c
@@ -148,10 +148,14 @@  reopen_exec_file (void)
   if (exec_bfd_mtime && exec_bfd_mtime != st.st_mtime)
     exec_file_attach (filename, 0);
   else
-    /* If we accessed the file since last opening it, close it now;
-       this stops GDB from holding the executable open after it
-       exits.  */
-    bfd_cache_close_all ();
+    {
+      /* If we accessed the file since last opening it, close it now;
+	 this stops GDB from holding the executable open after it
+	 exits.  */
+#if defined _WIN32 || defined __CYGWIN__
+      bfd_cache_close_all ();
+#endif
+    }
 
   do_cleanups (cleanups);
 }
diff --git a/gdb/exec.c b/gdb/exec.c
index 124074f..dba75a7 100644
--- a/gdb/exec.c
+++ b/gdb/exec.c
@@ -258,7 +258,12 @@  exec_file_attach (const char *filename, int from_tty)
 
   do_cleanups (cleanups);
 
+  /* Users also complain that they can't rebuild their executable
+     while GDB is debugging.  */
+#if defined _WIN32 || defined __CYGWIN__
   bfd_cache_close_all ();
+#endif
+
   observer_notify_executable_changed ();
 }
 
diff --git a/gdb/symfile.c b/gdb/symfile.c
index c2a71ec..c433be8 100644
--- a/gdb/symfile.c
+++ b/gdb/symfile.c
@@ -1238,7 +1238,12 @@  symbol_file_add_with_addrs (bfd *abfd, const char *name, int add_flags,
 
   observer_notify_new_objfile (objfile);
 
+  /* Users also complain that they can't rebuild their executable
+     while GDB is debugging.  */
+#if defined _WIN32 || defined __CYGWIN__
   bfd_cache_close_all ();
+#endif
+
   return (objfile);
 }
 
diff --git a/gdb/testsuite/gdb.base/close-deleted-bfd-lib.c b/gdb/testsuite/gdb.base/close-deleted-bfd-lib.c
new file mode 100644
index 0000000..782302c
--- /dev/null
+++ b/gdb/testsuite/gdb.base/close-deleted-bfd-lib.c
@@ -0,0 +1,22 @@ 
+/* Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <signal.h>
+
+void
+foo (void)
+{
+  raise (SIGUSR1);
+}
diff --git a/gdb/testsuite/gdb.base/close-deleted-bfd-main.c b/gdb/testsuite/gdb.base/close-deleted-bfd-main.c
new file mode 100644
index 0000000..669a275
--- /dev/null
+++ b/gdb/testsuite/gdb.base/close-deleted-bfd-main.c
@@ -0,0 +1,34 @@ 
+/* Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <assert.h>
+#include <stddef.h>
+
+int
+main (void)
+{
+  void *handle = dlopen (SHLIB_NAME, RTLD_LAZY);
+  void (*fp) (void);
+
+  assert (handle != NULL);
+
+  fp = (void (*) (void)) dlsym (handle, "foo");
+  assert (fp != NULL); /* break-here */
+
+  fp ();
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.base/close-deleted-bfd.exp b/gdb/testsuite/gdb.base/close-deleted-bfd.exp
new file mode 100644
index 0000000..cdbbd33
--- /dev/null
+++ b/gdb/testsuite/gdb.base/close-deleted-bfd.exp
@@ -0,0 +1,47 @@ 
+# Copyright 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if { [skip_shlib_tests] || [is_remote target] } {
+    return 0
+}
+
+standard_testfile close-deleted-bfd-main.c close-deleted-bfd-lib.c
+
+set libname $testfile-solib
+set binfile_lib [standard_output_file $libname.so]
+
+if { [gdb_compile_shlib $srcdir/$subdir/$srcfile2 $binfile_lib debug] != "" } {
+    untested "Could not compile $binfile_lib."
+    return -1
+}
+
+if { [prepare_for_testing $testfile.exp $testfile $srcfile \
+      [list debug shlib_load additional_flags=-DSHLIB_NAME=\"$binfile_lib\"]] } {
+    return -1
+}
+
+if ![runto_main] then {
+    return 0
+}
+
+gdb_breakpoint [gdb_get_line_number "break-here"]
+gdb_continue_to_breakpoint "break-here" ".* break-here .*"
+
+file rename -force -- $binfile_lib $binfile_lib-moved
+
+gdb_test "continue" "\r\nProgram received signal SIGUSR1, .*"
+
+# FAIL was: BFD: reopening $$binfile_lib: No such file or directory
+gdb_test "bt" " in foo \[^\r\n\]*\r\n\[^\r\n\]* in main \[^\r\n\]*"