Message ID | bd7d83f4-49e9-787c-cc36-6fa97e73f6e7@redhat.com |
---|---|
State | New |
Headers | show |
On 2020-01-24 10:11 a.m., Pedro Alves wrote: > On 1/24/20 2:59 PM, Simon Marchi wrote: >> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote: >>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote: >>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa >>>> "Multi-target support). >>> >>> The patch led to the assertion violation below when running >>> gdb.threads/vfork-follow-child-exit.exp. >>> >>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed. >> >> >> Oh, I see that too. > > Me too. No idea how I missed this. Sorry about that... > There's actually yet another internal error in addition > to that one. > > Here's the updated patch, which fixes all. > > From 2880f692d69cf54ef9fb41f84ac00a1d04f28447 Mon Sep 17 00:00:00 2001 > From: Pedro Alves <palves@redhat.com> > Date: Fri, 24 Jan 2020 14:55:43 +0000 > Subject: [PATCH] Fix re-runs of a second inferior (PR gdb/25410) > > This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa > "Multi-target support), and then fixes two other latent bugs exposed > by fixing that first latent bug. > > The symptom described in the bug report is that starting a first > inferior, then trying to run a second (multi-threaded) inferior twice, > causes libthread_db to fail to load, along with other erratic > behavior: > > (gdb) run > Starting program: /tmp/foo > warning: td_ta_new failed: generic error > > Going a bit deeply, I found that if the two inferiors have different > symbols, we can see that just after inferior 2 exits, we are left with > inferior 2 selected, which is correct, but the symbols in scope belong > to inferior 1, which is obviously incorrect... > > This problem is that there's a path in > scoped_restore_current_thread::restore() that switches to no thread > selected, and switches the current inferior, but leaves the current > program space as is, resulting in leaving the program space pointing > to the wrong program space (the one of the other inferior). This was > happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event > that triggers after TARGET_WAITKIND_EXITED for the previous inferior > exit. Subsequent symbol lookups find the symbols of the wrong > inferior. > > The fix is to use switch_to_inferior_no_thread in that problem spot. > This function was recently added along with the multi-target work > exactly for these situations. > > As for testing, this patch adds a new testcase that tests symbol > printing just after inferior exit, which exercises the root cause of > the problem more directly. And then, to cover the use case described > in the bug too, it also exercises the lithread_db.so mis-loading, by > using TLS printing as a proxy for being sure that threaded debugging > was activated sucessfully. The testcase fails without the fix like > this, for the "print symbol just after exit" bits: > > ... > [Inferior 1 (process 8719) exited normally] > (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit > print re_run_var_1 > No symbol "re_run_var_1" in current context. > (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1 > ... > > And like this for the "libthread_db.so loading" bits: > > (gdb) run > Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run > warning: td_ta_new failed: generic error > [New LWP 27001] > > Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44 > 44 } > (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto > print tls_var > Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run: > Cannot find thread-local variables on this target > (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var > > > As mentioned, that fix above goes on to expose a couple other latent > bugs. This commit fixes those as well. > > The first latent bug exposed is in > infrun.c:handle_vfork_child_exec_or_exit. The current code is leaving > inf->pspace == NULL while calling clone_program_space. The idea was > to make it so that the breakpoints module doesn't use this inferior's > pspace to set breakpoints. With that, any > scoped_restore_current_thread use from within clone_program_space > tries to restore a NULL program space, which hits an assertion: > > Attaching after Thread 0x7ffff74b8700 (LWP 27276) vfork to child process 27277] > [New inferior 2 (process 27277)] > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' faile > d. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Quit this debugging session? (y or n) FAIL: gdb.threads/vfork-follow-child-exit.exp: detach-on-fork=off: continue (GDB internal error) > > That NULL pspace idea was legitimate, but it's no longer necessary, > since commit b2e586e850db ("Defer breakpoint reset when cloning > progspace for fork child"). So the fix is to just set the inferior's > program space earlier. > > > The other latent bug exposed is in exec.c. When exec_close is called > from the program_space destructor, it is purposedly called with a > current program space that is not current inferior's program space. > The problem is that the multi-target work added some code to > remove_target_sections that loops over all inferiors, and uses > scoped_restore_current_thread to save/restore the previous > thread/inferior/frame state. This makes it so that exec_close returns > with the current program space set to the current inferior's program > space, which is exactly what we did not want. Then the program_space > destructor continues into free_all_objfiles, but it is not running > that method on the wrong program space, resulting in: > > Reading symbols from /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads... > Reading symbols from /usr/lib/debug/usr/lib64/libpthread-2.26.so.debug... > Reading symbols from /usr/lib/debug/usr/lib64/libm-2.26.so.debug... > Reading symbols from /usr/lib/debug/usr/lib64/libc-2.26.so.debug... > Reading symbols from /usr/lib/debug/usr/lib64/ld-2.26.so.debug... > [Inferior 3 (process 9583) exited normally] > /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:170: internal-error: void program_space::free_all_objfiles(): Assertion `so->objfile == NULL' failed. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Quit this debugging session? (y or n) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited (GDB internal error) > > The fix is to use scoped_restore_current_pspace_and_thread instead of > scoped_restore_current_thread. I did not look at the code in depth, but your explanations make sense to me, so if the testsuite shows no regression, this LGTM. Simon
On Friday, January 24, 2020 6:05 PM, Simon Marchi wrote: > On 2020-01-24 10:11 a.m., Pedro Alves wrote: > > On 1/24/20 2:59 PM, Simon Marchi wrote: > >> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote: > >>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote: > >>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa > >>>> "Multi-target support). > >>> > >>> The patch led to the assertion violation below when running > >>> gdb.threads/vfork-follow-child-exit.exp. > >>> > >>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' > failed. > >> > >> > >> Oh, I see that too. > > > > Me too. No idea how I missed this. Sorry about that... > > There's actually yet another internal error in addition > > to that one. > > > > Here's the updated patch, which fixes all. > > <snipped> > > The fix is to use scoped_restore_current_pspace_and_thread instead of > > scoped_restore_current_thread. > > I did not look at the code in depth, but your explanations make sense to me, > so if the testsuite shows no regression, this LGTM. > > Simon I ran the patch locally and did not see any regression. Thank you. -Baris Intel Deutschland GmbH Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de Managing Directors: Christin Eisenschmid, Gary Kershaw Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928
diff --git a/gdb/exec.c b/gdb/exec.c index eb07b51dda..2506e84157 100644 --- a/gdb/exec.c +++ b/gdb/exec.c @@ -33,6 +33,7 @@ #include "arch-utils.h" #include "gdbthread.h" #include "progspace.h" +#include "progspace-and-thread.h" #include "gdb_bfd.h" #include "gcore.h" #include "source.h" @@ -547,7 +548,7 @@ add_target_sections (void *owner, table->sections[space + i].owner = owner; } - scoped_restore_current_thread restore_thread; + scoped_restore_current_pspace_and_thread restore_pspace_thread; program_space *curr_pspace = current_program_space; /* If these are the first file sections we can provide memory @@ -645,7 +646,7 @@ remove_target_sections (void *owner) inferior sharing the program space. */ if (old_count + (dest - src) == 0) { - scoped_restore_current_thread restore_thread; + scoped_restore_current_pspace_and_thread restore_pspace_thread; program_space *curr_pspace = current_program_space; for (inferior *inf : all_inferiors ()) diff --git a/gdb/infrun.c b/gdb/infrun.c index 9c4a07daba..22de42c2ae 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1020,8 +1020,6 @@ handle_vfork_child_exec_or_exit (int exec) } else { - struct program_space *pspace; - /* If this is a vfork child exiting, then the pspace and aspaces were shared with the parent. Since we're reporting the process exit, we'll be mourning all that is @@ -1037,18 +1035,12 @@ handle_vfork_child_exec_or_exit (int exec) scoped_restore restore_ptid = make_scoped_restore (&inferior_ptid, null_ptid); - /* This inferior is dead, so avoid giving the breakpoints - module the option to write through to it (cloning a - program space resets breakpoints). */ - inf->aspace = NULL; - inf->pspace = NULL; - pspace = new program_space (maybe_new_address_space ()); - set_current_program_space (pspace); + inf->pspace = new program_space (maybe_new_address_space ()); + inf->aspace = inf->pspace->aspace; + set_current_program_space (inf->pspace); inf->removable = 1; inf->symfile_flags = SYMFILE_NO_READ; - clone_program_space (pspace, vfork_parent->pspace); - inf->pspace = pspace; - inf->aspace = pspace->aspace; + clone_program_space (inf->pspace, vfork_parent->pspace); resume_parent = vfork_parent->pid; } diff --git a/gdb/testsuite/gdb.multi/multi-re-run-1.c b/gdb/testsuite/gdb.multi/multi-re-run-1.c new file mode 100644 index 0000000000..91b0dbce30 --- /dev/null +++ b/gdb/testsuite/gdb.multi/multi-re-run-1.c @@ -0,0 +1,61 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2020 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include <stdlib.h> +#include <stdio.h> +#include <unistd.h> +#include <limits.h> +#include <string.h> +#include <pthread.h> + +int re_run_var_1 = 1; + +#define NUM_THREADS 1 + +__thread int tls_var = 1; + +static pthread_barrier_t barrier; + +static void * +thread_start (void *arg) +{ + pthread_barrier_wait (&barrier); + + while (1) + sleep (1); + return NULL; +} + +static void +all_started (void) +{ +} + +int +main (int argc, char ** argv) +{ + pthread_t thread; + int len; + + pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1); + pthread_create (&thread, NULL, thread_start, NULL); + + pthread_barrier_wait (&barrier); + all_started (); + + return 0; +} diff --git a/gdb/testsuite/gdb.multi/multi-re-run-2.c b/gdb/testsuite/gdb.multi/multi-re-run-2.c new file mode 100644 index 0000000000..6925e0cb2d --- /dev/null +++ b/gdb/testsuite/gdb.multi/multi-re-run-2.c @@ -0,0 +1,61 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2020 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include <stdlib.h> +#include <stdio.h> +#include <unistd.h> +#include <limits.h> +#include <string.h> +#include <pthread.h> + +int re_run_var_2 = 2; + +#define NUM_THREADS 1 + +__thread int tls_var = 1; + +static pthread_barrier_t barrier; + +static void * +thread_start (void *arg) +{ + pthread_barrier_wait (&barrier); + + while (1) + sleep (1); + return NULL; +} + +static void +all_started (void) +{ +} + +int +main (int argc, char ** argv) +{ + pthread_t thread; + int len; + + pthread_barrier_init (&barrier, NULL, NUM_THREADS + 1); + pthread_create (&thread, NULL, thread_start, NULL); + + pthread_barrier_wait (&barrier); + all_started (); + + return 0; +} diff --git a/gdb/testsuite/gdb.multi/multi-re-run.exp b/gdb/testsuite/gdb.multi/multi-re-run.exp new file mode 100644 index 0000000000..93cd709b5c --- /dev/null +++ b/gdb/testsuite/gdb.multi/multi-re-run.exp @@ -0,0 +1,115 @@ +# Copyright 2020 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# Test loading two inferiors into GDB, and running one of them twice +# in a row. GDB used to have a bug that made it so that after an +# inferior exit, the current program space was left pointing to the +# wrong inferior's pspace, causing subsequent symbol lookups to +# misbehave, including failing to load libthread_db.so. See PR +# gdb/25410. + +# Build two executables, with different symbols. + +set exec1 "multi-re-run-1" +set srcfile1 multi-re-run-1.c +set binfile1 [standard_output_file ${exec1}] + +set exec2 "multi-re-run-2" +set srcfile2 multi-re-run-2.c +set binfile2 [standard_output_file ${exec2}] + +with_test_prefix "exec1" { + if { [prepare_for_testing "failed to prepare" ${exec1} "${srcfile1}" \ + [list pthreads debug]] } { + return -1 + } +} + +with_test_prefix "exec2" { + if { [prepare_for_testing "failed to prepare" ${exec2} "${srcfile2}" \ + [list pthreads debug]] } { + return -1 + } +} + +# Start two inferiors, leave one stopped, and run the other a couple +# times. RE_RUN_INF is the inferior that is re-run. + +proc test_re_run {re_run_inf} { + global binfile1 binfile2 + global inferior_exited_re + global gdb_prompt + + clean_restart ${binfile1} + + delete_breakpoints + + # Start another inferior. + gdb_test "add-inferior" "Added inferior 2.*" \ + "add empty inferior 2" + gdb_test "inferior 2" "Switching to inferior 2.*" \ + "switch to inferior 2" + gdb_load ${binfile2} + + if {$re_run_inf == 1} { + set steady_inf 2 + } else { + set steady_inf 1 + } + + gdb_test "inferior $steady_inf" "Switching to inferior $steady_inf.*" \ + "switch to steady inferior" + + # Run the steady inferior to a breakpoint, and let it stay stopped + # there. + if ![runto all_started message] then { + untested "setup failed" + return 0 + } + + gdb_test "inferior $re_run_inf" "Switching to inferior $re_run_inf.*" \ + "switch to re-run inferior" + + # Now run the RE_RUN_INF inferior a couple times. GDB used to + # have a bug that caused the second run to fail to load + # libthread_db.so. + foreach_with_prefix iter {1 2} { + delete_breakpoints + + if ![runto all_started message] { + return 0 + } + + # If a thread_stratum target fails to load, then TLS debugging + # fails too. + gdb_test "print tls_var" " = 1" + + gdb_continue_to_end "" continue 1 + + # In the original bug, after an inferior exit, GDB would leave + # the current program space pointing to the wrong inferior's + # pspace, and thus the wrong symbols were visible. + if {$re_run_inf == 1} { + gdb_test "print re_run_var_1" " = 1" + } else { + gdb_test "print re_run_var_2" " = 2" + } + } +} + +# For completeness, test re-running either inferior 1 or inferior 2. +foreach_with_prefix re_run_inf {1 2} { + test_re_run $re_run_inf +} diff --git a/gdb/thread.c b/gdb/thread.c index 001fdd42c5..302a49e984 100644 --- a/gdb/thread.c +++ b/gdb/thread.c @@ -1429,10 +1429,7 @@ scoped_restore_current_thread::restore () && m_inf->pid != 0) switch_to_thread (m_thread); else - { - switch_to_no_thread (); - set_current_inferior (m_inf); - } + switch_to_inferior_no_thread (m_inf); /* The running state of the originally selected thread may have changed, so we have to recheck it here. */