[rfc] PR 20569, segv in follow_exec

Message ID 57F6D57D.8070603@codesourcery.com
State New, archived
Headers

Commit Message

Sandra Loosemore Oct. 6, 2016, 10:51 p.m. UTC
  As I noted in PR20569, several exec-related tests cause GDB to segv when 
sysroot translation fails on the executable pathname reported by 
gdbserver.  The immediate cause of the segv is that follow_exec is 
passing a NULL argument (the result of exec_file_find) to strlen, but as 
I looked at the code in more detail it seemed like follow_exec simply 
isn't prepared for the case where sysroot translation fails to locate 
the new executable, and there is no obvious recovery strategy.

I thought I could copy logic from the other caller of exec_file_find, 
exec_file_locate_attach, but I think it's doing the wrong thing there as 
well.  Plus, from reading the code I found other bugs in both callers of 
exec_file_find due to confusion between host and target pathnames.

The attached patch attempts to fix all the bugs.  In terms of the 
testcases that were formerly segv'ing, GDB now prints a warning but 
continues execution of the new program, so that the tests now mostly 
FAIL instead.  You could argue the FAILs are due to a legitimate problem 
with the test environment setting up the sysroot translation 
incorrectly, but I'm not sure continuing execution rather than leaving 
the target stopped is the most user-friendly fallback behavior, either. 
  E.g. the GDB manual suggests that you can set a breakpoint on main and 
GDB will stop on main of the newly exec'ed program, too, but it can't do 
that if it can't find the symbol information, and there's no way for the 
user to specify the executable file to GDB explicitly before it resumes 
execution.  But, seemed better not to complicate this 
already-too-complicated patch further by trying to address that in the 
first cut.

Comments?  Suggestions?  Etc.

-Sandra
  

Comments

Luis Machado Oct. 18, 2016, 6:11 p.m. UTC | #1
On 10/06/2016 05:51 PM, Sandra Loosemore wrote:
> As I noted in PR20569, several exec-related tests cause GDB to segv when
> sysroot translation fails on the executable pathname reported by
> gdbserver.  The immediate cause of the segv is that follow_exec is
> passing a NULL argument (the result of exec_file_find) to strlen, but as
> I looked at the code in more detail it seemed like follow_exec simply
> isn't prepared for the case where sysroot translation fails to locate
> the new executable, and there is no obvious recovery strategy.
>
> I thought I could copy logic from the other caller of exec_file_find,
> exec_file_locate_attach, but I think it's doing the wrong thing there as
> well.  Plus, from reading the code I found other bugs in both callers of
> exec_file_find due to confusion between host and target pathnames.
>
> The attached patch attempts to fix all the bugs.  In terms of the
> testcases that were formerly segv'ing, GDB now prints a warning but
> continues execution of the new program, so that the tests now mostly
> FAIL instead.  You could argue the FAILs are due to a legitimate problem
> with the test environment setting up the sysroot translation
> incorrectly, but I'm not sure continuing execution rather than leaving
> the target stopped is the most user-friendly fallback behavior, either.
>  E.g. the GDB manual suggests that you can set a breakpoint on main and
> GDB will stop on main of the newly exec'ed program, too, but it can't do
> that if it can't find the symbol information, and there's no way for the
> user to specify the executable file to GDB explicitly before it resumes
> execution.  But, seemed better not to complicate this
> already-too-complicated patch further by trying to address that in the
> first cut.
>
> Comments?  Suggestions?  Etc.
>
> -Sandra
>
>
> gdb-segv2.log
>
>
> 2016-10-06  Sandra Loosemore  <sandra@codesourcery.com>
>
> 	PR gdb/20569
> 	gdb/
> 	* exceptions.c (exception_print_same): Moved here from exec.c.
> 	Fixed message comparison.
> 	* exceptions.h (exception_print_same): Declare.
> 	* exec_file_locate_attach (exception_print_same): Delete copy here.
> 	(exec_file_locate_attach): Rename exec_file and full_exec_path
> 	variables to avoid confusion between target and host pathnames.
> 	Move pathname processing logic to exec_file_find.  Do not return
> 	early if pathname lookup fails;	guard symbol_file_add_main call
> 	instead.
> 	* infrun.c (follow_exec): Split and rename execd_pathname variable
> 	to avoid confusion between target and host pathnames.  Replace
> 	brokenpathname copy with cleanup to free malloc'ed string.  Warn
> 	if pathname lookup fails.  Pass target pathname to
> 	target_follow_exec, not hostpathname.  Borrow exception-handling
> 	logic from exec_file_locate_attach.
> 	* solib.c (exec_file_find): Incorporate fallback logic for relative
> 	pathnames formerly in exec_file_locate_attach.
>
>
> gdb-segv2.patch
>
>

I went through the patch and, although this code as a whole is a bit on 
the convoluted side, it looks reasonable to me.

Segfaults are not supposed to happen, so allowing the session to 
continue is a good thing IMO.

Sounds like a good candidate for master and even stable branches.
  
Pedro Alves Oct. 19, 2016, 1:37 p.m. UTC | #2
On 10/18/2016 07:11 PM, Luis Machado wrote:

> I went through the patch and, although this code as a whole is a bit on
> the convoluted side, it looks reasonable to me.
> 
> Segfaults are not supposed to happen, so allowing the session to
> continue is a good thing IMO.
> 
> Sounds like a good candidate for master and even stable branches.

I didn't look at the patch in detail yet, but I think it'd be
very good to have tests?

Thanks,
Pedro Alves
  
Luis Machado Oct. 19, 2016, 4:14 p.m. UTC | #3
On 10/19/2016 08:37 AM, Pedro Alves wrote:
> On 10/18/2016 07:11 PM, Luis Machado wrote:
>
>> I went through the patch and, although this code as a whole is a bit on
>> the convoluted side, it looks reasonable to me.
>>
>> Segfaults are not supposed to happen, so allowing the session to
>> continue is a good thing IMO.
>>
>> Sounds like a good candidate for master and even stable branches.
>
> I didn't look at the patch in detail yet, but I think it'd be
> very good to have tests?
>
> Thanks,
> Pedro Alves
>

I fixed a gotcha with the patch and i have a reproducer that makes GDB 
crash on x86-64. I'll craft a test.
  
Luis Machado Oct. 19, 2016, 8:19 p.m. UTC | #4
On 10/19/2016 11:14 AM, Luis Machado wrote:
> On 10/19/2016 08:37 AM, Pedro Alves wrote:
>> On 10/18/2016 07:11 PM, Luis Machado wrote:
>>
>>> I went through the patch and, although this code as a whole is a bit on
>>> the convoluted side, it looks reasonable to me.
>>>
>>> Segfaults are not supposed to happen, so allowing the session to
>>> continue is a good thing IMO.
>>>
>>> Sounds like a good candidate for master and even stable branches.
>>
>> I didn't look at the patch in detail yet, but I think it'd be
>> very good to have tests?
>>
>> Thanks,
>> Pedro Alves
>>
>
> I fixed a gotcha with the patch and i have a reproducer that makes GDB
> crash on x86-64. I'll craft a test.
>

I was thinking of a way to test this and decided to exercise everything 
against an invalid sysroot (by always passing 'set sysroot 
<something_invalid>' and i noticed quite a few segmentation faults 
ocurring in 10+ tests.

Now we know things are broken and we know how to show that, but i'm 
wondering if we want to re-run tests with an invalid sysroot or if the 
manual testing with a sysroot override is enough.

I could add a loop to each test that is failing, but, though that 
exercises and shows the failure, it sounds like a waste of time to 
repeat those tests.

I could also pick one candidate and isolate that in a test, but i'm not 
yet sure if all those 10+ failures fail for the same exact reason.

Suggestions?

These are the failing tests:

gdb.base/catch-syscall.exp
gdb.base/execl-update-breakpoints.exp
gdb.base/foll-exec-mode.exp
gdb.base/foll-exec.exp
gdb.base/foll-vfork.exp
gdb.base/pie-execl.exp
gdb.linespec/explicit.exp
gdb.multi/bkpt-multi-exec.exp
gdb.python/py-finish-breakpoint.exp
gdb.threads/execl.exp
gdb.threads/non-ldr-exc-1.exp
gdb.threads/non-ldr-exc-2.exp
gdb.threads/non-ldr-exc-3.exp
gdb.threads/non-ldr-exc-4.exp
gdb.threads/thread-execl.exp
  
Pedro Alves Oct. 20, 2016, 11:27 p.m. UTC | #5
On 10/19/2016 09:19 PM, Luis Machado wrote:

> I was thinking of a way to test this and decided to exercise everything
> against an invalid sysroot (by always passing 'set sysroot
> <something_invalid>' and i noticed quite a few segmentation faults
> ocurring in 10+ tests.
> 
> Now we know things are broken and we know how to show that, but i'm
> wondering if we want to re-run tests with an invalid sysroot or if the
> manual testing with a sysroot override is enough.

I think we should have some representative test that always runs,
without requiring manual testing.

> 
> I could add a loop to each test that is failing, but, though that
> exercises and shows the failure, it sounds like a waste of time to
> repeat those tests.

Yeah.

> 
> I could also pick one candidate and isolate that in a test, but i'm not
> yet sure if all those 10+ failures fail for the same exact reason.
> 
> Suggestions?

I think it is sufficient to have one representative test for
each reasons (or reasons).  Whether that is a new separate testcase
or whether we reuse some existing testcase, I guess depends on how
complicated the test needs to be.  If trivial, maybe go for separate,
focused test.   If a lot of test set up is needed, e.g., to get the
inferior to the state that triggers the bug), might make sense to
reuse some existing testcase.


> 
> These are the failing tests:
> 
> gdb.base/catch-syscall.exp
> gdb.base/execl-update-breakpoints.exp
> gdb.base/foll-exec-mode.exp
> gdb.base/foll-exec.exp
> gdb.base/foll-vfork.exp
> gdb.base/pie-execl.exp
> gdb.linespec/explicit.exp
> gdb.multi/bkpt-multi-exec.exp
> gdb.python/py-finish-breakpoint.exp
> gdb.threads/execl.exp
> gdb.threads/non-ldr-exc-1.exp
> gdb.threads/non-ldr-exc-2.exp
> gdb.threads/non-ldr-exc-3.exp
> gdb.threads/non-ldr-exc-4.exp
> gdb.threads/thread-execl.exp

The obvious pattern here is that these are tests that exec.  :-)
  

Patch

diff --git a/gdb/exceptions.c b/gdb/exceptions.c
index 9a10f66..402a5af 100644
--- a/gdb/exceptions.c
+++ b/gdb/exceptions.c
@@ -256,3 +256,16 @@  catch_errors (catch_errors_ftype *func, void *func_args, char *errstring,
     return 0;
   return val;
 }
+
+/* Returns non-zero if exceptions E1 and E2 are equal.  Returns zero
+   otherwise.  */
+
+int
+exception_print_same (struct gdb_exception e1, struct gdb_exception e2)
+{
+  return (e1.reason == e2.reason
+	  && e1.error == e2.error
+	  && (e1.message == e2.message
+	      || (e1.message && e2.message
+		  && strcmp (e1.message, e2.message) == 0)));
+}
diff --git a/gdb/exceptions.h b/gdb/exceptions.h
index 5711c1a..a2d39d7 100644
--- a/gdb/exceptions.h
+++ b/gdb/exceptions.h
@@ -88,4 +88,7 @@  extern int catch_exceptions_with_msg (struct ui_out *uiout,
 typedef int (catch_errors_ftype) (void *);
 extern int catch_errors (catch_errors_ftype *, void *, char *, return_mask);
 
+/* Compare two exception objects for print equality.  */
+extern int exception_print_same (struct gdb_exception e1,
+				 struct gdb_exception e2);
 #endif
diff --git a/gdb/exec.c b/gdb/exec.c
index d858e99..b418ef3 100644
--- a/gdb/exec.c
+++ b/gdb/exec.c
@@ -139,39 +139,23 @@  exec_file_clear (int from_tty)
 /* Returns non-zero if exceptions E1 and E2 are equal.  Returns zero
    otherwise.  */
 
-static int
-exception_print_same (struct gdb_exception e1, struct gdb_exception e2)
-{
-  const char *msg1 = e1.message;
-  const char *msg2 = e2.message;
-
-  if (msg1 == NULL)
-    msg1 = "";
-  if (msg2 == NULL)
-    msg2 = "";
-
-  return (e1.reason == e2.reason
-	  && e1.error == e2.error
-	  && strcmp (e1.message, e2.message) == 0);
-}
-
 /* See gdbcore.h.  */
 
 void
 exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
 {
-  char *exec_file, *full_exec_path = NULL;
+  char *exec_file_target, *exec_file_host = NULL;
   struct cleanup *old_chain;
   struct gdb_exception prev_err = exception_none;
 
   /* Do nothing if we already have an executable filename.  */
-  exec_file = (char *) get_exec_file (0);
-  if (exec_file != NULL)
+  exec_file_target = (char *) get_exec_file (0);
+  if (exec_file_target != NULL)
     return;
 
   /* Try to determine a filename from the process itself.  */
-  exec_file = target_pid_to_exec_file (pid);
-  if (exec_file == NULL)
+  exec_file_target = target_pid_to_exec_file (pid);
+  if (exec_file_target == NULL)
     {
       warning (_("No executable has been specified and target does not "
 		 "support\n"
@@ -180,28 +164,8 @@  exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
       return;
     }
 
-  /* If gdb_sysroot is not empty and the discovered filename
-     is absolute then prefix the filename with gdb_sysroot.  */
-  if (*gdb_sysroot != '\0' && IS_ABSOLUTE_PATH (exec_file))
-    {
-      full_exec_path = exec_file_find (exec_file, NULL);
-      if (full_exec_path == NULL)
-	return;
-    }
-  else
-    {
-      /* It's possible we don't have a full path, but rather just a
-	 filename.  Some targets, such as HP-UX, don't provide the
-	 full path, sigh.
-
-	 Attempt to qualify the filename against the source path.
-	 (If that fails, we'll just fall back on the original
-	 filename.  Not much more we can do...)  */
-      if (!source_full_path_of (exec_file, &full_exec_path))
-	full_exec_path = xstrdup (exec_file);
-    }
-
-  old_chain = make_cleanup (xfree, full_exec_path);
+  exec_file_host = exec_file_find (exec_file_target, NULL);
+  old_chain = make_cleanup (xfree, exec_file_target);
   make_cleanup (free_current_contents, &prev_err.message);
 
   /* exec_file_attach and symbol_file_add_main may throw an error if the file
@@ -217,7 +181,9 @@  exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
      errors/exceptions in the following code.  */
   TRY
     {
-      exec_file_attach (full_exec_path, from_tty);
+      /* We must do this step even if exec_file_host is NULL, so that
+	 exec_file_attach will clear state.  */
+      exec_file_attach (exec_file_host, from_tty);
     }
   CATCH (err, RETURN_MASK_ERROR)
     {
@@ -231,19 +197,22 @@  exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
     }
   END_CATCH
 
-  TRY
-    {
-      if (defer_bp_reset)
-	current_inferior ()->symfile_flags |= SYMFILE_DEFER_BP_RESET;
-      symbol_file_add_main (full_exec_path, from_tty);
-    }
-  CATCH (err, RETURN_MASK_ERROR)
+  if (exec_file_host)
     {
-      if (!exception_print_same (prev_err, err))
-	warning ("%s", err.message);
-    }
-  END_CATCH
-  current_inferior ()->symfile_flags &= ~SYMFILE_DEFER_BP_RESET;
+      TRY
+	{
+	  if (defer_bp_reset)
+	    current_inferior ()->symfile_flags |= SYMFILE_DEFER_BP_RESET;
+	  symbol_file_add_main (exec_file_host, from_tty);
+	}
+      CATCH (err, RETURN_MASK_ERROR)
+	{
+	  if (!exception_print_same (prev_err, err))
+	    warning ("%s", err.message);
+	}
+      END_CATCH
+    current_inferior ()->symfile_flags &= ~SYMFILE_DEFER_BP_RESET;
+  }
 
   do_cleanups (old_chain);
 }
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 2636a19..a365a6b 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1077,15 +1077,18 @@  show_follow_exec_mode_string (struct ui_file *file, int from_tty,
   fprintf_filtered (file, _("Follow exec mode is \"%s\".\n"),  value);
 }
 
-/* EXECD_PATHNAME is assumed to be non-NULL.  */
+/* EXEC_FILE_TARGET is assumed to be non-NULL.  */
 
 static void
-follow_exec (ptid_t ptid, char *execd_pathname)
+follow_exec (ptid_t ptid, char *exec_file_target)
 {
   struct thread_info *th, *tmp;
   struct inferior *inf = current_inferior ();
   int pid = ptid_get_pid (ptid);
   ptid_t process_ptid;
+  char *exec_file_host = NULL;
+  struct cleanup *old_chain;
+  struct gdb_exception prev_err = exception_none;
 
   /* This is an exec event that we actually wish to pay attention to.
      Refresh our symbol table to the newly exec'd program, remove any
@@ -1155,7 +1158,7 @@  follow_exec (ptid_t ptid, char *execd_pathname)
   process_ptid = pid_to_ptid (pid);
   printf_unfiltered (_("%s is executing new program: %s\n"),
 		     target_pid_to_str (process_ptid),
-		     execd_pathname);
+		     exec_file_target);
 
   /* We've followed the inferior through an exec.  Therefore, the
      inferior has essentially been killed & reborn.  */
@@ -1164,14 +1167,18 @@  follow_exec (ptid_t ptid, char *execd_pathname)
 
   breakpoint_init_inferior (inf_execd);
 
-  if (*gdb_sysroot != '\0')
-    {
-      char *name = exec_file_find (execd_pathname, NULL);
+  exec_file_host = exec_file_find (exec_file_target, NULL);
+  old_chain = make_cleanup (xfree, exec_file_host);
+  make_cleanup (free_current_contents, &prev_err.message);
 
-      execd_pathname = (char *) alloca (strlen (name) + 1);
-      strcpy (execd_pathname, name);
-      xfree (name);
-    }
+  /* If we were unable to map the executable target pathname onto a host
+     pathname, tell the user that.  Otherwise GDB's subsequent behavior
+     is confusing.  Maybe it would even be better to stop at this point
+     so that the user can specify a file manually before continuing.  */
+  if (!exec_file_host)
+    warning (_("Could not load symbols for executable %s.\n"
+	       "Do you need \"set sysroot\"?"),
+	     exec_file_target);
 
   /* Reset the shared library package.  This ensures that we get a
      shlib event when the child reaches "_start", at which point the
@@ -1193,7 +1200,7 @@  follow_exec (ptid_t ptid, char *execd_pathname)
 
       inf = add_inferior_with_spaces ();
       inf->pid = pid;
-      target_follow_exec (inf, execd_pathname);
+      target_follow_exec (inf, exec_file_target);
 
       set_current_inferior (inf);
       set_current_program_space (inf->pspace);
@@ -1212,22 +1219,62 @@  follow_exec (ptid_t ptid, char *execd_pathname)
 
   gdb_assert (current_program_space == inf->pspace);
 
-  /* That a.out is now the one to use.  */
-  exec_file_attach (execd_pathname, 0);
+  /* exec_file_attach and symbol_file_add_main may throw an error if the file
+     cannot be opened either locally or remotely.
+
+     This happens for example, when the file is first found in the local
+     sysroot (above), and then disappears (a TOCTOU race), or when it doesn't
+     exist in the target filesystem, or when the file does exist, but
+     is not readable.
+
+     Even without a symbol file, the remote-based debugging session should
+     continue normally instead of ending abruptly.  Hence we catch thrown
+     errors/exceptions in the following code.  */
+  TRY
+    {
+      /* We must do this step even if exec_file_host is NULL, so that
+	 exec_file_attach will clear state.  */
+      exec_file_attach (exec_file_host, 0);
+    }
+  CATCH (err, RETURN_MASK_ERROR)
+    {
+      if (err.message != NULL)
+	warning ("%s", err.message);
+
+      prev_err = err;
+
+      /* Save message so it doesn't get trashed by the catch below.  */
+      prev_err.message = xstrdup (err.message);
+    }
+  END_CATCH
+
+  if (exec_file_host)
+    {
+      TRY
+	{
+	  symbol_file_add (exec_file_host,
+			   (inf->symfile_flags
+			    | SYMFILE_MAINLINE | SYMFILE_DEFER_BP_RESET),
+			   NULL, 0);
+	  
+	  if ((inf->symfile_flags & SYMFILE_NO_READ) == 0)
+	    set_initial_language ();
+	}
+      CATCH (err, RETURN_MASK_ERROR)
+	{
+	  if (!exception_print_same (prev_err, err))
+	    warning ("%s", err.message);
+	}
+      END_CATCH
+    }
+
+  do_cleanups (old_chain);
 
   /* SYMFILE_DEFER_BP_RESET is used as the proper displacement for PIE
      (Position Independent Executable) main symbol file will get applied by
      solib_create_inferior_hook below.  breakpoint_re_set would fail to insert
      the breakpoints with the zero displacement.  */
 
-  symbol_file_add (execd_pathname,
-		   (inf->symfile_flags
-		    | SYMFILE_MAINLINE | SYMFILE_DEFER_BP_RESET),
-		   NULL, 0);
-
-  if ((inf->symfile_flags & SYMFILE_NO_READ) == 0)
-    set_initial_language ();
-
   /* If the target can specify a description, read it.  Must do this
      after flipping to the new executable (because the target supplied
      description must be compatible with the executable's
diff --git a/gdb/solib.c b/gdb/solib.c
index 2235505..cfcbd5d 100644
--- a/gdb/solib.c
+++ b/gdb/solib.c
@@ -388,13 +388,17 @@  solib_find_1 (char *in_pathname, int *fd, int is_solib)
 char *
 exec_file_find (char *in_pathname, int *fd)
 {
-  char *result = solib_find_1 (in_pathname, fd, 0);
+  char *result;
+  const char *fskind = effective_target_file_system_kind ();
+
+  if (!in_pathname)
+    return NULL;
 
-  if (result == NULL)
+  if (*gdb_sysroot != '\0' && IS_TARGET_ABSOLUTE_PATH (fskind, in_pathname))
     {
-      const char *fskind = effective_target_file_system_kind ();
+      result = solib_find_1 (in_pathname, fd, 0);
 
-      if (fskind == file_system_kind_dos_based)
+      if (result == NULL && fskind == file_system_kind_dos_based)
 	{
 	  char *new_pathname;
 
@@ -405,6 +409,21 @@  exec_file_find (char *in_pathname, int *fd)
 	  result = solib_find_1 (new_pathname, fd, 0);
 	}
     }
+  else
+    {
+      /* It's possible we don't have a full path, but rather just a
+	 filename.  Some targets, such as HP-UX, don't provide the
+	 full path, sigh.
+
+	 Attempt to qualify the filename against the source path.
+	 (If that fails, we'll just fall back on the original
+	 filename.  Not much more we can do...)  */
+
+      if (!source_full_path_of (in_pathname, &result))
+	result = xstrdup (in_pathname);
+      if (fd != NULL)
+	*fd = -1;
+    }
 
   return result;
 }