[rfc] PR 20569, segv in follow_exec
Commit Message
As I noted in PR20569, several exec-related tests cause GDB to segv when
sysroot translation fails on the executable pathname reported by
gdbserver. The immediate cause of the segv is that follow_exec is
passing a NULL argument (the result of exec_file_find) to strlen, but as
I looked at the code in more detail it seemed like follow_exec simply
isn't prepared for the case where sysroot translation fails to locate
the new executable, and there is no obvious recovery strategy.
I thought I could copy logic from the other caller of exec_file_find,
exec_file_locate_attach, but I think it's doing the wrong thing there as
well. Plus, from reading the code I found other bugs in both callers of
exec_file_find due to confusion between host and target pathnames.
The attached patch attempts to fix all the bugs. In terms of the
testcases that were formerly segv'ing, GDB now prints a warning but
continues execution of the new program, so that the tests now mostly
FAIL instead. You could argue the FAILs are due to a legitimate problem
with the test environment setting up the sysroot translation
incorrectly, but I'm not sure continuing execution rather than leaving
the target stopped is the most user-friendly fallback behavior, either.
E.g. the GDB manual suggests that you can set a breakpoint on main and
GDB will stop on main of the newly exec'ed program, too, but it can't do
that if it can't find the symbol information, and there's no way for the
user to specify the executable file to GDB explicitly before it resumes
execution. But, seemed better not to complicate this
already-too-complicated patch further by trying to address that in the
first cut.
Comments? Suggestions? Etc.
-Sandra
Comments
On 10/06/2016 05:51 PM, Sandra Loosemore wrote:
> As I noted in PR20569, several exec-related tests cause GDB to segv when
> sysroot translation fails on the executable pathname reported by
> gdbserver. The immediate cause of the segv is that follow_exec is
> passing a NULL argument (the result of exec_file_find) to strlen, but as
> I looked at the code in more detail it seemed like follow_exec simply
> isn't prepared for the case where sysroot translation fails to locate
> the new executable, and there is no obvious recovery strategy.
>
> I thought I could copy logic from the other caller of exec_file_find,
> exec_file_locate_attach, but I think it's doing the wrong thing there as
> well. Plus, from reading the code I found other bugs in both callers of
> exec_file_find due to confusion between host and target pathnames.
>
> The attached patch attempts to fix all the bugs. In terms of the
> testcases that were formerly segv'ing, GDB now prints a warning but
> continues execution of the new program, so that the tests now mostly
> FAIL instead. You could argue the FAILs are due to a legitimate problem
> with the test environment setting up the sysroot translation
> incorrectly, but I'm not sure continuing execution rather than leaving
> the target stopped is the most user-friendly fallback behavior, either.
> E.g. the GDB manual suggests that you can set a breakpoint on main and
> GDB will stop on main of the newly exec'ed program, too, but it can't do
> that if it can't find the symbol information, and there's no way for the
> user to specify the executable file to GDB explicitly before it resumes
> execution. But, seemed better not to complicate this
> already-too-complicated patch further by trying to address that in the
> first cut.
>
> Comments? Suggestions? Etc.
>
> -Sandra
>
>
> gdb-segv2.log
>
>
> 2016-10-06 Sandra Loosemore <sandra@codesourcery.com>
>
> PR gdb/20569
> gdb/
> * exceptions.c (exception_print_same): Moved here from exec.c.
> Fixed message comparison.
> * exceptions.h (exception_print_same): Declare.
> * exec_file_locate_attach (exception_print_same): Delete copy here.
> (exec_file_locate_attach): Rename exec_file and full_exec_path
> variables to avoid confusion between target and host pathnames.
> Move pathname processing logic to exec_file_find. Do not return
> early if pathname lookup fails; guard symbol_file_add_main call
> instead.
> * infrun.c (follow_exec): Split and rename execd_pathname variable
> to avoid confusion between target and host pathnames. Replace
> brokenpathname copy with cleanup to free malloc'ed string. Warn
> if pathname lookup fails. Pass target pathname to
> target_follow_exec, not hostpathname. Borrow exception-handling
> logic from exec_file_locate_attach.
> * solib.c (exec_file_find): Incorporate fallback logic for relative
> pathnames formerly in exec_file_locate_attach.
>
>
> gdb-segv2.patch
>
>
I went through the patch and, although this code as a whole is a bit on
the convoluted side, it looks reasonable to me.
Segfaults are not supposed to happen, so allowing the session to
continue is a good thing IMO.
Sounds like a good candidate for master and even stable branches.
On 10/18/2016 07:11 PM, Luis Machado wrote:
> I went through the patch and, although this code as a whole is a bit on
> the convoluted side, it looks reasonable to me.
>
> Segfaults are not supposed to happen, so allowing the session to
> continue is a good thing IMO.
>
> Sounds like a good candidate for master and even stable branches.
I didn't look at the patch in detail yet, but I think it'd be
very good to have tests?
Thanks,
Pedro Alves
On 10/19/2016 08:37 AM, Pedro Alves wrote:
> On 10/18/2016 07:11 PM, Luis Machado wrote:
>
>> I went through the patch and, although this code as a whole is a bit on
>> the convoluted side, it looks reasonable to me.
>>
>> Segfaults are not supposed to happen, so allowing the session to
>> continue is a good thing IMO.
>>
>> Sounds like a good candidate for master and even stable branches.
>
> I didn't look at the patch in detail yet, but I think it'd be
> very good to have tests?
>
> Thanks,
> Pedro Alves
>
I fixed a gotcha with the patch and i have a reproducer that makes GDB
crash on x86-64. I'll craft a test.
On 10/19/2016 11:14 AM, Luis Machado wrote:
> On 10/19/2016 08:37 AM, Pedro Alves wrote:
>> On 10/18/2016 07:11 PM, Luis Machado wrote:
>>
>>> I went through the patch and, although this code as a whole is a bit on
>>> the convoluted side, it looks reasonable to me.
>>>
>>> Segfaults are not supposed to happen, so allowing the session to
>>> continue is a good thing IMO.
>>>
>>> Sounds like a good candidate for master and even stable branches.
>>
>> I didn't look at the patch in detail yet, but I think it'd be
>> very good to have tests?
>>
>> Thanks,
>> Pedro Alves
>>
>
> I fixed a gotcha with the patch and i have a reproducer that makes GDB
> crash on x86-64. I'll craft a test.
>
I was thinking of a way to test this and decided to exercise everything
against an invalid sysroot (by always passing 'set sysroot
<something_invalid>' and i noticed quite a few segmentation faults
ocurring in 10+ tests.
Now we know things are broken and we know how to show that, but i'm
wondering if we want to re-run tests with an invalid sysroot or if the
manual testing with a sysroot override is enough.
I could add a loop to each test that is failing, but, though that
exercises and shows the failure, it sounds like a waste of time to
repeat those tests.
I could also pick one candidate and isolate that in a test, but i'm not
yet sure if all those 10+ failures fail for the same exact reason.
Suggestions?
These are the failing tests:
gdb.base/catch-syscall.exp
gdb.base/execl-update-breakpoints.exp
gdb.base/foll-exec-mode.exp
gdb.base/foll-exec.exp
gdb.base/foll-vfork.exp
gdb.base/pie-execl.exp
gdb.linespec/explicit.exp
gdb.multi/bkpt-multi-exec.exp
gdb.python/py-finish-breakpoint.exp
gdb.threads/execl.exp
gdb.threads/non-ldr-exc-1.exp
gdb.threads/non-ldr-exc-2.exp
gdb.threads/non-ldr-exc-3.exp
gdb.threads/non-ldr-exc-4.exp
gdb.threads/thread-execl.exp
On 10/19/2016 09:19 PM, Luis Machado wrote:
> I was thinking of a way to test this and decided to exercise everything
> against an invalid sysroot (by always passing 'set sysroot
> <something_invalid>' and i noticed quite a few segmentation faults
> ocurring in 10+ tests.
>
> Now we know things are broken and we know how to show that, but i'm
> wondering if we want to re-run tests with an invalid sysroot or if the
> manual testing with a sysroot override is enough.
I think we should have some representative test that always runs,
without requiring manual testing.
>
> I could add a loop to each test that is failing, but, though that
> exercises and shows the failure, it sounds like a waste of time to
> repeat those tests.
Yeah.
>
> I could also pick one candidate and isolate that in a test, but i'm not
> yet sure if all those 10+ failures fail for the same exact reason.
>
> Suggestions?
I think it is sufficient to have one representative test for
each reasons (or reasons). Whether that is a new separate testcase
or whether we reuse some existing testcase, I guess depends on how
complicated the test needs to be. If trivial, maybe go for separate,
focused test. If a lot of test set up is needed, e.g., to get the
inferior to the state that triggers the bug), might make sense to
reuse some existing testcase.
>
> These are the failing tests:
>
> gdb.base/catch-syscall.exp
> gdb.base/execl-update-breakpoints.exp
> gdb.base/foll-exec-mode.exp
> gdb.base/foll-exec.exp
> gdb.base/foll-vfork.exp
> gdb.base/pie-execl.exp
> gdb.linespec/explicit.exp
> gdb.multi/bkpt-multi-exec.exp
> gdb.python/py-finish-breakpoint.exp
> gdb.threads/execl.exp
> gdb.threads/non-ldr-exc-1.exp
> gdb.threads/non-ldr-exc-2.exp
> gdb.threads/non-ldr-exc-3.exp
> gdb.threads/non-ldr-exc-4.exp
> gdb.threads/thread-execl.exp
The obvious pattern here is that these are tests that exec. :-)
@@ -256,3 +256,16 @@ catch_errors (catch_errors_ftype *func, void *func_args, char *errstring,
return 0;
return val;
}
+
+/* Returns non-zero if exceptions E1 and E2 are equal. Returns zero
+ otherwise. */
+
+int
+exception_print_same (struct gdb_exception e1, struct gdb_exception e2)
+{
+ return (e1.reason == e2.reason
+ && e1.error == e2.error
+ && (e1.message == e2.message
+ || (e1.message && e2.message
+ && strcmp (e1.message, e2.message) == 0)));
+}
@@ -88,4 +88,7 @@ extern int catch_exceptions_with_msg (struct ui_out *uiout,
typedef int (catch_errors_ftype) (void *);
extern int catch_errors (catch_errors_ftype *, void *, char *, return_mask);
+/* Compare two exception objects for print equality. */
+extern int exception_print_same (struct gdb_exception e1,
+ struct gdb_exception e2);
#endif
@@ -139,39 +139,23 @@ exec_file_clear (int from_tty)
/* Returns non-zero if exceptions E1 and E2 are equal. Returns zero
otherwise. */
-static int
-exception_print_same (struct gdb_exception e1, struct gdb_exception e2)
-{
- const char *msg1 = e1.message;
- const char *msg2 = e2.message;
-
- if (msg1 == NULL)
- msg1 = "";
- if (msg2 == NULL)
- msg2 = "";
-
- return (e1.reason == e2.reason
- && e1.error == e2.error
- && strcmp (e1.message, e2.message) == 0);
-}
-
/* See gdbcore.h. */
void
exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
{
- char *exec_file, *full_exec_path = NULL;
+ char *exec_file_target, *exec_file_host = NULL;
struct cleanup *old_chain;
struct gdb_exception prev_err = exception_none;
/* Do nothing if we already have an executable filename. */
- exec_file = (char *) get_exec_file (0);
- if (exec_file != NULL)
+ exec_file_target = (char *) get_exec_file (0);
+ if (exec_file_target != NULL)
return;
/* Try to determine a filename from the process itself. */
- exec_file = target_pid_to_exec_file (pid);
- if (exec_file == NULL)
+ exec_file_target = target_pid_to_exec_file (pid);
+ if (exec_file_target == NULL)
{
warning (_("No executable has been specified and target does not "
"support\n"
@@ -180,28 +164,8 @@ exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
return;
}
- /* If gdb_sysroot is not empty and the discovered filename
- is absolute then prefix the filename with gdb_sysroot. */
- if (*gdb_sysroot != '\0' && IS_ABSOLUTE_PATH (exec_file))
- {
- full_exec_path = exec_file_find (exec_file, NULL);
- if (full_exec_path == NULL)
- return;
- }
- else
- {
- /* It's possible we don't have a full path, but rather just a
- filename. Some targets, such as HP-UX, don't provide the
- full path, sigh.
-
- Attempt to qualify the filename against the source path.
- (If that fails, we'll just fall back on the original
- filename. Not much more we can do...) */
- if (!source_full_path_of (exec_file, &full_exec_path))
- full_exec_path = xstrdup (exec_file);
- }
-
- old_chain = make_cleanup (xfree, full_exec_path);
+ exec_file_host = exec_file_find (exec_file_target, NULL);
+ old_chain = make_cleanup (xfree, exec_file_target);
make_cleanup (free_current_contents, &prev_err.message);
/* exec_file_attach and symbol_file_add_main may throw an error if the file
@@ -217,7 +181,9 @@ exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
errors/exceptions in the following code. */
TRY
{
- exec_file_attach (full_exec_path, from_tty);
+ /* We must do this step even if exec_file_host is NULL, so that
+ exec_file_attach will clear state. */
+ exec_file_attach (exec_file_host, from_tty);
}
CATCH (err, RETURN_MASK_ERROR)
{
@@ -231,19 +197,22 @@ exec_file_locate_attach (int pid, int defer_bp_reset, int from_tty)
}
END_CATCH
- TRY
- {
- if (defer_bp_reset)
- current_inferior ()->symfile_flags |= SYMFILE_DEFER_BP_RESET;
- symbol_file_add_main (full_exec_path, from_tty);
- }
- CATCH (err, RETURN_MASK_ERROR)
+ if (exec_file_host)
{
- if (!exception_print_same (prev_err, err))
- warning ("%s", err.message);
- }
- END_CATCH
- current_inferior ()->symfile_flags &= ~SYMFILE_DEFER_BP_RESET;
+ TRY
+ {
+ if (defer_bp_reset)
+ current_inferior ()->symfile_flags |= SYMFILE_DEFER_BP_RESET;
+ symbol_file_add_main (exec_file_host, from_tty);
+ }
+ CATCH (err, RETURN_MASK_ERROR)
+ {
+ if (!exception_print_same (prev_err, err))
+ warning ("%s", err.message);
+ }
+ END_CATCH
+ current_inferior ()->symfile_flags &= ~SYMFILE_DEFER_BP_RESET;
+ }
do_cleanups (old_chain);
}
@@ -1077,15 +1077,18 @@ show_follow_exec_mode_string (struct ui_file *file, int from_tty,
fprintf_filtered (file, _("Follow exec mode is \"%s\".\n"), value);
}
-/* EXECD_PATHNAME is assumed to be non-NULL. */
+/* EXEC_FILE_TARGET is assumed to be non-NULL. */
static void
-follow_exec (ptid_t ptid, char *execd_pathname)
+follow_exec (ptid_t ptid, char *exec_file_target)
{
struct thread_info *th, *tmp;
struct inferior *inf = current_inferior ();
int pid = ptid_get_pid (ptid);
ptid_t process_ptid;
+ char *exec_file_host = NULL;
+ struct cleanup *old_chain;
+ struct gdb_exception prev_err = exception_none;
/* This is an exec event that we actually wish to pay attention to.
Refresh our symbol table to the newly exec'd program, remove any
@@ -1155,7 +1158,7 @@ follow_exec (ptid_t ptid, char *execd_pathname)
process_ptid = pid_to_ptid (pid);
printf_unfiltered (_("%s is executing new program: %s\n"),
target_pid_to_str (process_ptid),
- execd_pathname);
+ exec_file_target);
/* We've followed the inferior through an exec. Therefore, the
inferior has essentially been killed & reborn. */
@@ -1164,14 +1167,18 @@ follow_exec (ptid_t ptid, char *execd_pathname)
breakpoint_init_inferior (inf_execd);
- if (*gdb_sysroot != '\0')
- {
- char *name = exec_file_find (execd_pathname, NULL);
+ exec_file_host = exec_file_find (exec_file_target, NULL);
+ old_chain = make_cleanup (xfree, exec_file_host);
+ make_cleanup (free_current_contents, &prev_err.message);
- execd_pathname = (char *) alloca (strlen (name) + 1);
- strcpy (execd_pathname, name);
- xfree (name);
- }
+ /* If we were unable to map the executable target pathname onto a host
+ pathname, tell the user that. Otherwise GDB's subsequent behavior
+ is confusing. Maybe it would even be better to stop at this point
+ so that the user can specify a file manually before continuing. */
+ if (!exec_file_host)
+ warning (_("Could not load symbols for executable %s.\n"
+ "Do you need \"set sysroot\"?"),
+ exec_file_target);
/* Reset the shared library package. This ensures that we get a
shlib event when the child reaches "_start", at which point the
@@ -1193,7 +1200,7 @@ follow_exec (ptid_t ptid, char *execd_pathname)
inf = add_inferior_with_spaces ();
inf->pid = pid;
- target_follow_exec (inf, execd_pathname);
+ target_follow_exec (inf, exec_file_target);
set_current_inferior (inf);
set_current_program_space (inf->pspace);
@@ -1212,22 +1219,62 @@ follow_exec (ptid_t ptid, char *execd_pathname)
gdb_assert (current_program_space == inf->pspace);
- /* That a.out is now the one to use. */
- exec_file_attach (execd_pathname, 0);
+ /* exec_file_attach and symbol_file_add_main may throw an error if the file
+ cannot be opened either locally or remotely.
+
+ This happens for example, when the file is first found in the local
+ sysroot (above), and then disappears (a TOCTOU race), or when it doesn't
+ exist in the target filesystem, or when the file does exist, but
+ is not readable.
+
+ Even without a symbol file, the remote-based debugging session should
+ continue normally instead of ending abruptly. Hence we catch thrown
+ errors/exceptions in the following code. */
+ TRY
+ {
+ /* We must do this step even if exec_file_host is NULL, so that
+ exec_file_attach will clear state. */
+ exec_file_attach (exec_file_host, 0);
+ }
+ CATCH (err, RETURN_MASK_ERROR)
+ {
+ if (err.message != NULL)
+ warning ("%s", err.message);
+
+ prev_err = err;
+
+ /* Save message so it doesn't get trashed by the catch below. */
+ prev_err.message = xstrdup (err.message);
+ }
+ END_CATCH
+
+ if (exec_file_host)
+ {
+ TRY
+ {
+ symbol_file_add (exec_file_host,
+ (inf->symfile_flags
+ | SYMFILE_MAINLINE | SYMFILE_DEFER_BP_RESET),
+ NULL, 0);
+
+ if ((inf->symfile_flags & SYMFILE_NO_READ) == 0)
+ set_initial_language ();
+ }
+ CATCH (err, RETURN_MASK_ERROR)
+ {
+ if (!exception_print_same (prev_err, err))
+ warning ("%s", err.message);
+ }
+ END_CATCH
+ }
+
+ do_cleanups (old_chain);
/* SYMFILE_DEFER_BP_RESET is used as the proper displacement for PIE
(Position Independent Executable) main symbol file will get applied by
solib_create_inferior_hook below. breakpoint_re_set would fail to insert
the breakpoints with the zero displacement. */
- symbol_file_add (execd_pathname,
- (inf->symfile_flags
- | SYMFILE_MAINLINE | SYMFILE_DEFER_BP_RESET),
- NULL, 0);
-
- if ((inf->symfile_flags & SYMFILE_NO_READ) == 0)
- set_initial_language ();
-
/* If the target can specify a description, read it. Must do this
after flipping to the new executable (because the target supplied
description must be compatible with the executable's
@@ -388,13 +388,17 @@ solib_find_1 (char *in_pathname, int *fd, int is_solib)
char *
exec_file_find (char *in_pathname, int *fd)
{
- char *result = solib_find_1 (in_pathname, fd, 0);
+ char *result;
+ const char *fskind = effective_target_file_system_kind ();
+
+ if (!in_pathname)
+ return NULL;
- if (result == NULL)
+ if (*gdb_sysroot != '\0' && IS_TARGET_ABSOLUTE_PATH (fskind, in_pathname))
{
- const char *fskind = effective_target_file_system_kind ();
+ result = solib_find_1 (in_pathname, fd, 0);
- if (fskind == file_system_kind_dos_based)
+ if (result == NULL && fskind == file_system_kind_dos_based)
{
char *new_pathname;
@@ -405,6 +409,21 @@ exec_file_find (char *in_pathname, int *fd)
result = solib_find_1 (new_pathname, fd, 0);
}
}
+ else
+ {
+ /* It's possible we don't have a full path, but rather just a
+ filename. Some targets, such as HP-UX, don't provide the
+ full path, sigh.
+
+ Attempt to qualify the filename against the source path.
+ (If that fails, we'll just fall back on the original
+ filename. Not much more we can do...) */
+
+ if (!source_full_path_of (in_pathname, &result))
+ result = xstrdup (in_pathname);
+ if (fd != NULL)
+ *fd = -1;
+ }
return result;
}