Message ID | 1464965361-25399-1-git-send-email-antoine.tremblay@ericsson.com |
---|---|
State | New |
Headers | show |
On 06/03/2016 03:49 PM, Antoine Tremblay wrote: > @@ -1491,7 +1523,28 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty) > linux_fork_detach (args, from_tty); > } > else > - linux_ops->to_detach (ops, args, from_tty); > + { > + TRY > + { > + linux_ops->to_detach (ops, args, from_tty); > + } > + CATCH (ex, RETURN_MASK_ERROR) > + { > + if (!check_ptrace_stopped_lwp_gone (main_lwp)) > + { > + delete_lwp (main_lwp->ptid); Is deleting the right thing to do? I think we'll likely crash, the next time we try to do something (e.g., try to detach again). Seems like on the gdbserver side, the lwp isn't deleted in this case. > + throw_exception (ex); > + } > + /* Ignore the error since the thread is gone already. */ > + else > + { > + inferior_ptid = null_ptid; > + detach_inferior (pid); > + inf_child_maybe_unpush_target (ops); Please factor these three lines out of inf_ptrace_attach into a separate function, exported from inf-ptrace.h and called from both places. E.g.: extern void inf_ptrace_attach_success (struct target_ops *ops); > + } > + } > + } > + delete_lwp (main_lwp->ptid); > } > > +clean_restart ${testfile} > + > +if ![runto_main] { > + fail "Can't run to main to check for trace support" "to check for trace support" is stale. > + return -1 > +} > + > +gdb_breakpoint "_exit" > +gdb_continue_to_breakpoint "_exit" ".*_exit.*" > + > +# Since the gdbserver-native board ends remote debugging on detach, > +# we can't use gdb_test as this function intercepts the Ending remote > +# debugging message and tries to restart gdb. Eh, that looks like a bogus think for gdb_test_multiple to be doing. Seems like it's been there since "forever": commit 19fa4a0af34ff07f260fc75e903ab92d8ca03d33 Author: Mike Werner <mtw@cygnus> AuthorDate: Sun Feb 21 20:03:55 1993 +0000 * gdb/testsuite: Initial creation of gdb/testsuite. Migrated dejagnu testcases and support files for testing nm to gdb/testsuite from deja-gnu. These files were moved "as is" with no modifications. This migration is part of a major overhaul of dejagnu. The modifications to these testcases, etc., which will allow them to work with the new version of dejagnu will be made in a future update. I'd support just removing that bit. In any case, there's no need to avoid gdb_test/gdb_test_multiple. Any user-specified pattern overrides gdb_test_multiple's internal patterns. So you just need to intercept the "Ending remote debugging" message yourself. > However, it can't do that > +# as gdb_exit on this board will call monitor exit which will fail as > +# we're not remote debugging anymore after a detach in this case. > +send_gdb "detach\n" > +gdb_expect { set test "detach" gdb_test_multiple $test $test { -re "Detaching from .*, process $decimal\r\nEnding remote debugging\.\r\n$gdb_prompt $" { # This is what you get with "target remote". pass $test } -re "Detaching from .*, process $decimal\r\n$gdb_prompt $" { pass $test } } > + -re "Detaching from .*, process $decimal\r\nEnding remote debugging\." { Missing $gdb_prompt match. > + #Don't try to exit gdbserver see above comment. > + set gdbserver_reconnect_p 1 I don't understand this one. It shouldn't be needed. > + pass "detach non-extended" Please use the same test message for all branches: pass $test > + } > + -re "Detaching from .*, process $decimal\r\n$gdb_prompt $" { > + pass "detach" > + } > + -re ".*$gdb_prompt $" { > + fail "detach" > + } > + timeout { > + fail "detach timeout" FYI, this would have been the standard: > + fail "$test (timeout)" > + } > +} > Thanks, Pedro Alves
Pedro Alves writes: > On 06/03/2016 03:49 PM, Antoine Tremblay wrote: > >> @@ -1491,7 +1523,28 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty) >> linux_fork_detach (args, from_tty); >> } >> else >> - linux_ops->to_detach (ops, args, from_tty); >> + { >> + TRY >> + { >> + linux_ops->to_detach (ops, args, from_tty); >> + } >> + CATCH (ex, RETURN_MASK_ERROR) >> + { >> + if (!check_ptrace_stopped_lwp_gone (main_lwp)) >> + { >> + delete_lwp (main_lwp->ptid); > > Is deleting the right thing to do? I think we'll likely > crash, the next time we try to do something (e.g., try > to detach again). Seems like on the gdbserver side, > the lwp isn't deleted in this case. > Indeed, OK. >> + throw_exception (ex); >> + } >> + /* Ignore the error since the thread is gone already. */ >> + else >> + { >> + inferior_ptid = null_ptid; >> + detach_inferior (pid); >> + inf_child_maybe_unpush_target (ops); > > Please factor these three lines out of inf_ptrace_attach into > a separate function, exported from inf-ptrace.h and called > from both places. E.g.: > OK. > extern void inf_ptrace_attach_success (struct target_ops *ops); > OK. (detach_success) >> + } >> + } >> + } >> + delete_lwp (main_lwp->ptid); >> } >> > >> +clean_restart ${testfile} >> + >> +if ![runto_main] { >> + fail "Can't run to main to check for trace support" > > "to check for trace support" is stale. > Oops thanks. >> + return -1 >> +} >> + >> +gdb_breakpoint "_exit" >> +gdb_continue_to_breakpoint "_exit" ".*_exit.*" >> + >> +# Since the gdbserver-native board ends remote debugging on detach, >> +# we can't use gdb_test as this function intercepts the Ending remote >> +# debugging message and tries to restart gdb. > > Eh, that looks like a bogus think for gdb_test_multiple to be doing. > > Seems like it's been there since "forever": > > commit 19fa4a0af34ff07f260fc75e903ab92d8ca03d33 > Author: Mike Werner <mtw@cygnus> > AuthorDate: Sun Feb 21 20:03:55 1993 +0000 > > * gdb/testsuite: Initial creation of gdb/testsuite. > Migrated dejagnu testcases and support files for testing nm to > gdb/testsuite from deja-gnu. These files were moved "as is" > with no modifications. This migration is part of a major overhaul > of dejagnu. The modifications to these testcases, etc., which > will allow them to work with the new version of dejagnu will be > made in a future update. > > I'd support just removing that bit. > > In any case, there's no need to avoid gdb_test/gdb_test_multiple. > Any user-specified pattern overrides gdb_test_multiple's internal > patterns. So you just need to intercept the "Ending remote debugging" > message yourself. > Right, I tested that before without sucess but had not the perfect regex yet... I'll use this now. thanks. >> However, it can't do that >> +# as gdb_exit on this board will call monitor exit which will fail as >> +# we're not remote debugging anymore after a detach in this case. >> +send_gdb "detach\n" >> +gdb_expect { > > set test "detach" > gdb_test_multiple $test $test { > -re "Detaching from .*, process $decimal\r\nEnding remote debugging\.\r\n$gdb_prompt $" { > # This is what you get with "target remote". > pass $test > } > -re "Detaching from .*, process $decimal\r\n$gdb_prompt $" { > pass $test > } > } > >> + -re "Detaching from .*, process $decimal\r\nEnding remote debugging\." { > > Missing $gdb_prompt match. > OK. >> + #Don't try to exit gdbserver see above comment. >> + set gdbserver_reconnect_p 1 > > I don't understand this one. It shouldn't be needed. > This one is needed since on the teardown of the .exp test gdb_exit is called and calls monitor_exit which fails with a timeout this avoids this and the original gdb_exit is called. >> + pass "detach non-extended" > > Please use the same test message for all branches: > > pass $test > >> + } >> + -re "Detaching from .*, process $decimal\r\n$gdb_prompt $" { >> + pass "detach" >> + } >> + -re ".*$gdb_prompt $" { >> + fail "detach" >> + } >> + timeout { >> + fail "detach timeout" > > FYI, this would have been the standard: > >> + fail "$test (timeout)" > >> + } >> +} >> Noted, thx.
On 06/03/2016 04:32 PM, Antoine Tremblay wrote: >>> >> + #Don't try to exit gdbserver see above comment. >>> >> + set gdbserver_reconnect_p 1 >> > >> > I don't understand this one. It shouldn't be needed. >> > > This one is needed since on the teardown of the .exp test gdb_exit is > called and calls monitor_exit which fails with a timeout this avoids > this and the original gdb_exit is called. Then that strikes me as the wrong thing to do. We should instead expect that gdbserver exits cleanly, and then clear server_spawn_id. E.g.: proc test_server_exit {} { global server_spawn_id if ![info exists server_spawn_id] { return } set test "server exits" gdb_expect { -i $server_spawn_id eof { pass $test wait -i $server_spawn_id unset server_spawn_id } timeout { fail "$test (timeout)" } } } } and then: -re "Detaching from .*, process $decimal\r\nEnding remote debugging\.\r\n$gdb_prompt $" { # This is what you get with "target remote". pass $test # If testing with gdbserver, make sure it manages to exit. test_server_exit } Thanks, Pedro Alves
Pedro Alves writes: > On 06/03/2016 04:32 PM, Antoine Tremblay wrote: >>>> >> + #Don't try to exit gdbserver see above comment. >>>> >> + set gdbserver_reconnect_p 1 >>> > >>> > I don't understand this one. It shouldn't be needed. >>> > >> This one is needed since on the teardown of the .exp test gdb_exit is >> called and calls monitor_exit which fails with a timeout this avoids >> this and the original gdb_exit is called. > > Then that strikes me as the wrong thing to do. We should instead > expect that gdbserver exits cleanly, and then clear server_spawn_id. > > E.g.: > > proc test_server_exit {} { > global server_spawn_id > if ![info exists server_spawn_id] { > return > } > > set test "server exits" > gdb_expect { > -i $server_spawn_id > eof { > pass $test > wait -i $server_spawn_id > unset server_spawn_id > } > timeout { > fail "$test (timeout)" > } > } > } > } > > and then: > > -re "Detaching from .*, process $decimal\r\nEnding remote debugging\.\r\n$gdb_prompt $" { > # This is what you get with "target remote". > pass $test > # If testing with gdbserver, make sure it manages to exit. > test_server_exit > } > Thanks does seems much better ! However, testing it it triggers the test_server_exit timeout case, I'm guessing because eof already happened and we're waiting for nothing ? I'm tempted just to unset the var... but ideas are welcome.. Regards, Antoine
On 06/03/2016 04:58 PM, Antoine Tremblay wrote: > However, testing it it triggers the test_server_exit timeout case, I'm > guessing because eof already happened and we're waiting for nothing ? > > I'm tempted just to unset the var... but ideas are welcome.. Would be easier to try if I had access to an updated patch.. :-) Thanks, Pedro Alves
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index 81134b0..5f02dab 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -1447,6 +1447,39 @@ get_detach_signal (struct thread_info *thread) } } +/* Called when we try to resume or detach a stopped LWP and that errors + out. If the LWP is no longer in ptrace-stopped state (meaning it's + zombie, or about to become), discard the error, clear any pending + status the LWP may have, and return true (we'll collect the exit status + soon enough). Otherwise, return false. */ + +static int +check_ptrace_stopped_lwp_gone (struct lwp_info *lp) +{ + struct thread_info *thread = get_lwp_thread (lp); + + /* If we get an error after resuming the LWP successfully, we'd + confuse !T state for the LWP being gone. */ + gdb_assert (lp->stopped); + + /* We can't just check whether the LWP is in 'Z (Zombie)' state, + because even if ptrace failed with ESRCH, the tracee may be "not + yet fully dead", but already refusing ptrace requests. In that + case the tracee has 'R (Running)' state for a little bit + (observed in Linux 3.18). See also the note on ESRCH in the + ptrace(2) man page. Instead, check whether the LWP has any state + other than ptrace-stopped. */ + + /* Don't assume anything if /proc/PID/status can't be read. */ + if (linux_proc_pid_is_trace_stopped_nowarn (lwpid_of (thread)) == 0) + { + lp->stop_reason = TARGET_STOPPED_BY_NO_REASON; + lp->status_pending_p = 0; + return 1; + } + return 0; +} + static int linux_detach_one_lwp (struct inferior_list_entry *entry, void *args) { @@ -1480,9 +1513,10 @@ linux_detach_one_lwp (struct inferior_list_entry *entry, void *args) the_low_target.prepare_to_resume (lwp); if (ptrace (PTRACE_DETACH, lwpid_of (thread), (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) (long) sig) < 0) - error (_("Can't detach %s: %s"), - target_pid_to_str (ptid_of (thread)), - strerror (errno)); + if (!check_ptrace_stopped_lwp_gone (lwp)) + error (_("Can't detach %s: %s"), + target_pid_to_str (ptid_of (thread)), + strerror (errno)); delete_lwp (lwp); return 0; @@ -4331,39 +4365,6 @@ linux_resume_one_lwp_throw (struct lwp_info *lwp, lwp->stop_reason = TARGET_STOPPED_BY_NO_REASON; } -/* Called when we try to resume a stopped LWP and that errors out. If - the LWP is no longer in ptrace-stopped state (meaning it's zombie, - or about to become), discard the error, clear any pending status - the LWP may have, and return true (we'll collect the exit status - soon enough). Otherwise, return false. */ - -static int -check_ptrace_stopped_lwp_gone (struct lwp_info *lp) -{ - struct thread_info *thread = get_lwp_thread (lp); - - /* If we get an error after resuming the LWP successfully, we'd - confuse !T state for the LWP being gone. */ - gdb_assert (lp->stopped); - - /* We can't just check whether the LWP is in 'Z (Zombie)' state, - because even if ptrace failed with ESRCH, the tracee may be "not - yet fully dead", but already refusing ptrace requests. In that - case the tracee has 'R (Running)' state for a little bit - (observed in Linux 3.18). See also the note on ESRCH in the - ptrace(2) man page. Instead, check whether the LWP has any state - other than ptrace-stopped. */ - - /* Don't assume anything if /proc/PID/status can't be read. */ - if (linux_proc_pid_is_trace_stopped_nowarn (lwpid_of (thread)) == 0) - { - lp->stop_reason = TARGET_STOPPED_BY_NO_REASON; - lp->status_pending_p = 0; - return 1; - } - return 0; -} - /* Like linux_resume_one_lwp_throw, but no error is thrown if the LWP disappears while we try to resume it. */ diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index e6d525f..1c8c82c 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -1382,6 +1382,38 @@ get_pending_status (struct lwp_info *lp, int *status) return 0; } +/* Called when we try to resume or detach a stopped LWP and that errors + out. If the LWP is no longer in ptrace-stopped state (meaning it's + zombie, or about to become), discard the error, clear any pending + status the LWP may have, and return true (we'll collect the exit status + soon enough). Otherwise, return false. */ + +static int +check_ptrace_stopped_lwp_gone (struct lwp_info *lp) +{ + /* If we get an error after resuming the LWP successfully, we'd + confuse !T state for the LWP being gone. */ + gdb_assert (lp->stopped); + + /* We can't just check whether the LWP is in 'Z (Zombie)' state, + because even if ptrace failed with ESRCH, the tracee may be "not + yet fully dead", but already refusing ptrace requests. In that + case the tracee has 'R (Running)' state for a little bit + (observed in Linux 3.18). See also the note on ESRCH in the + ptrace(2) man page. Instead, check whether the LWP has any state + other than ptrace-stopped. */ + + /* Don't assume anything if /proc/PID/status can't be read. */ + if (linux_proc_pid_is_trace_stopped_nowarn (ptid_get_lwp (lp->ptid)) == 0) + { + lp->stop_reason = TARGET_STOPPED_BY_NO_REASON; + lp->status = 0; + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + return 1; + } + return 0; +} + static int detach_callback (struct lwp_info *lp, void *data) { @@ -1418,8 +1450,9 @@ detach_callback (struct lwp_info *lp, void *data) errno = 0; if (ptrace (PTRACE_DETACH, ptid_get_lwp (lp->ptid), 0, WSTOPSIG (status)) < 0) - error (_("Can't detach %s: %s"), target_pid_to_str (lp->ptid), - safe_strerror (errno)); + if (!check_ptrace_stopped_lwp_gone (lp)) + error (_("Can't detach %s: %s"), target_pid_to_str (lp->ptid), + safe_strerror (errno)); if (debug_linux_nat) fprintf_unfiltered (gdb_stdlog, @@ -1480,7 +1513,6 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty) if (linux_nat_prepare_to_resume != NULL) linux_nat_prepare_to_resume (main_lwp); - delete_lwp (main_lwp->ptid); if (forks_exist_p ()) { @@ -1491,7 +1523,28 @@ linux_nat_detach (struct target_ops *ops, const char *args, int from_tty) linux_fork_detach (args, from_tty); } else - linux_ops->to_detach (ops, args, from_tty); + { + TRY + { + linux_ops->to_detach (ops, args, from_tty); + } + CATCH (ex, RETURN_MASK_ERROR) + { + if (!check_ptrace_stopped_lwp_gone (main_lwp)) + { + delete_lwp (main_lwp->ptid); + throw_exception (ex); + } + /* Ignore the error since the thread is gone already. */ + else + { + inferior_ptid = null_ptid; + detach_inferior (pid); + inf_child_maybe_unpush_target (ops); + } + } + } + delete_lwp (main_lwp->ptid); } /* Resume execution of the inferior process. If STEP is nonzero, @@ -1531,38 +1584,6 @@ linux_resume_one_lwp_throw (struct lwp_info *lp, int step, registers_changed_ptid (lp->ptid); } -/* Called when we try to resume a stopped LWP and that errors out. If - the LWP is no longer in ptrace-stopped state (meaning it's zombie, - or about to become), discard the error, clear any pending status - the LWP may have, and return true (we'll collect the exit status - soon enough). Otherwise, return false. */ - -static int -check_ptrace_stopped_lwp_gone (struct lwp_info *lp) -{ - /* If we get an error after resuming the LWP successfully, we'd - confuse !T state for the LWP being gone. */ - gdb_assert (lp->stopped); - - /* We can't just check whether the LWP is in 'Z (Zombie)' state, - because even if ptrace failed with ESRCH, the tracee may be "not - yet fully dead", but already refusing ptrace requests. In that - case the tracee has 'R (Running)' state for a little bit - (observed in Linux 3.18). See also the note on ESRCH in the - ptrace(2) man page. Instead, check whether the LWP has any state - other than ptrace-stopped. */ - - /* Don't assume anything if /proc/PID/status can't be read. */ - if (linux_proc_pid_is_trace_stopped_nowarn (ptid_get_lwp (lp->ptid)) == 0) - { - lp->stop_reason = TARGET_STOPPED_BY_NO_REASON; - lp->status = 0; - lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; - return 1; - } - return 0; -} - /* Like linux_resume_one_lwp_throw, but no error is thrown if the LWP disappears while we try to resume it. */ diff --git a/gdb/testsuite/gdb.threads/detach-gone-thread.c b/gdb/testsuite/gdb.threads/detach-gone-thread.c new file mode 100644 index 0000000..19f6fc2 --- /dev/null +++ b/gdb/testsuite/gdb.threads/detach-gone-thread.c @@ -0,0 +1,47 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2016 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include <pthread.h> +#include <unistd.h> +#include <stdlib.h> + +pthread_barrier_t barrier; + +#define NTHREADS 256 + +void * +child_function (void *arg) +{ + pthread_barrier_wait (&barrier); + _exit (0); +} + +int +main () +{ + pthread_t threads[NTHREADS]; + int res; + int i; + + pthread_barrier_init (&barrier, NULL, NTHREADS + 1); + + for (i = 0; i < NTHREADS; i++) + res = pthread_create (&threads[i], NULL, child_function, NULL); + + pthread_barrier_wait (&barrier); + exit (0); +} diff --git a/gdb/testsuite/gdb.threads/detach-gone-thread.exp b/gdb/testsuite/gdb.threads/detach-gone-thread.exp new file mode 100644 index 0000000..812246c --- /dev/null +++ b/gdb/testsuite/gdb.threads/detach-gone-thread.exp @@ -0,0 +1,52 @@ +# Copyright 2015-2016 Free Software Foundation, Inc. +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +standard_testfile + +if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } { + return -1 +} + +clean_restart ${testfile} + +if ![runto_main] { + fail "Can't run to main to check for trace support" + return -1 +} + +gdb_breakpoint "_exit" +gdb_continue_to_breakpoint "_exit" ".*_exit.*" + +# Since the gdbserver-native board ends remote debugging on detach, +# we can't use gdb_test as this function intercepts the Ending remote +# debugging message and tries to restart gdb. However, it can't do that +# as gdb_exit on this board will call monitor exit which will fail as +# we're not remote debugging anymore after a detach in this case. +send_gdb "detach\n" +gdb_expect { + -re "Detaching from .*, process $decimal\r\nEnding remote debugging\." { + #Don't try to exit gdbserver see above comment. + set gdbserver_reconnect_p 1 + pass "detach non-extended" + } + -re "Detaching from .*, process $decimal\r\n$gdb_prompt $" { + pass "detach" + } + -re ".*$gdb_prompt $" { + fail "detach" + } + timeout { + fail "detach timeout" + } +}