[0/9] Fixes for multiprocess for FreeBSD's native target

Message ID 20230228181845.99936-1-jhb@FreeBSD.org

John Baldwin Feb. 28, 2023, 6:18 p.m. UTC
  When I originally added multiprocess support to the FreeBSD target I
did not fully understand what that meant.  This series seeks to fix
several of my incorrect assumptions (e.g. the need to resume multiple
inferiors for a given call to resume, as well as the need to stop
other inferiors when reporting an event back from wait).
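
As a rough illustration of the deferral idea behind the pending-event
patches later in this series, here is a minimal standalone sketch.  It
is not the code from fbsd-nat.c: the names event, event_queue, raw and
pending_count are hypothetical, and it assumes that an "ineligible"
event is simply one that belongs to a process the caller of wait did
not ask about.  Such an event is parked on a pending list instead of
forcing other inferiors to stop, and is handed back by a later wait.

```cpp
#include <cstddef>
#include <deque>
#include <list>

// Hypothetical stand-in for an event reported by the kernel.
struct event
{
  int pid;
  int lwp;
  int status;
};

// Hypothetical model of a wait path that defers ineligible events.
class event_queue
{
public:
  // Events the "kernel" has ready, in the order they would be reported.
  std::deque<event> raw;

  // Store the next event for WANTED_PID in *OUT and return true.
  // A WANTED_PID of -1 matches any process.  Events for other
  // processes are parked on the pending list rather than dropped.
  // Return false if no matching event is available.
  bool wait (int wanted_pid, event *out)
  {
    // Previously deferred events are considered first.
    for (auto it = m_pending.begin (); it != m_pending.end (); ++it)
      if (matches (*it, wanted_pid))
        {
          *out = *it;
          m_pending.erase (it);
          return true;
        }

    // Then drain new events, deferring the ones that do not match.
    while (!raw.empty ())
      {
        event ev = raw.front ();
        raw.pop_front ();
        if (matches (ev, wanted_pid))
          {
            *out = ev;
            return true;
          }
        m_pending.push_back (ev);
      }
    return false;
  }

  std::size_t pending_count () const { return m_pending.size (); }

private:
  static bool matches (const event &ev, int wanted_pid)
  {
    return wanted_pid == -1 || ev.pid == wanted_pid;
  }

  std::list<event> m_pending;
};
```

In this model a wait for one process never loses an event that
arrived for another; the deferred event stays queued until a wait
that accepts it comes along.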

This does not add support for non-stop mode, but probably gets closer
to being able to support it.

In terms of test results, the summary of a testsuite run on
FreeBSD/amd64 before this series is:

                === gdb Summary ===

# of unexpected core files      11
# of expected passes            98135
# of unexpected failures        3104
# of expected failures          58
# of known failures             125
# of unresolved testcases       74
# of untested testcases         58
# of unsupported tests          818
# of duplicate test names       10

and the summary after this series is:

                === gdb Summary ===

# of unexpected core files      9
# of expected passes            98586
# of unexpected failures        3002
# of expected failures          58
# of known failures             125
# of unresolved testcases       54
# of untested testcases         58
# of unsupported tests          818
# of duplicate test names       9

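That is 451 more expected passes, 102 fewer unexpected failures, 20
fewer unresolved testcases, and 2 fewer unexpected core files.  Some
of the remaining failures are improved further by the other testsuite
patches I posted earlier, which address races in tests and matching
on the output of 'info threads'.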

I do still see a few failures of the new assertions added in this
series.  Namely, a handful of tests do something I'm not sure is
expected: they resume multiple individual threads from a single
process via multiple resume() calls.  I think, though, that this may
be a case of a test assuming the target supports non-stop when it
doesn't.  For example, here are the debugging logs from one test I
caught doing this:

Running /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.exp ...
...
(gdb) file /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
Reading symbols from /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads...
(gdb) builtin_spawn /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads
attach 44426
attach: running_lwps 1
Attaching to program: /usr/home/john/work/git/gdb/obj/gdb/testsuite/outputs/gdb.threads/attach-many-short-lived-threads/attach-many-short-lived-threads, process 44426
[New LWP 959198 of process 44426]
[New LWP 959404 of process 44426]
[New LWP 959532 of process 44426]
[New LWP 959544 of process 44426]
[New LWP 959568 of process 44426]
...
[Switching to LWP 432744 of process 44426]
_nanosleep () at _nanosleep.S:4
4       PSEUDO(nanosleep)
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
info threads
  Id   Target Id                              Frame 
* 1    LWP 432744 of process 44426            _nanosleep () at _nanosleep.S:4
  2    LWP 959198 of process 44426 "detached" _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
...
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
set breakpoint always-inserted on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
break break_fn
Breakpoint 1 at 0x400ebc: file /home/john/work/git/gdb/gdb/testsuite/gdb.threads/attach-many-short-lived-threads.c, line 57.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: break break_fn
set debug infrun on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug infrun on
set debug fbsd-nat on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set debug fbsd-nat on
continue
Continuing.
[infrun] clear_proceed_status_thread: 44426.432744.0
[infrun] clear_proceed_status_thread: 44426.959198.0
[infrun] clear_proceed_status_thread: 44426.959404.0
...
[infrun] clear_proceed_status_thread: 44426.960473.0
[infrun] clear_proceed_status_thread: 44426.960474.0
[infrun] clear_proceed_status_thread: 44426.960475.0
[infrun] proceed: enter
  [infrun] user_visible_resume_ptid: resuming all threads of current process
  [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
  [infrun] scoped_disable_commit_resumed: reason=proceeding
  [infrun] start_step_over: enter
    [infrun] start_step_over: stealing global queue of threads to step, length = 0
    [infrun] operator(): step-over queue now empty
  [infrun] start_step_over: exit
  [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
    [infrun] proceed: resuming 44426.432744.0
    [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.432744.0] at 0x823baf64a
    [infrun] internal_resume_ptid: non_stop uses inferior_ptid
    [infrun] do_target_resume: resume_ptid=44426.432744.0, step=0, sig=GDB_SIGNAL_0
    [fbsd-nat] resume: start: [LWP 432744 of process 44426], step 0, signo 0 (0)
      [fbsd-nat] resume_one_process: [LWP 432744 of process 44426], step 0, signo 0 (0)
    [fbsd-nat] resume: end: [LWP 432744 of process 44426], step 0, signo 0 (0)
    [infrun] infrun_async: enable=1
    [infrun] prepare_to_wait: prepare_to_wait
    [infrun] proceed: resuming 44426.959198.0
    [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [44426.959198.0] at 0x821d32ccc
    [infrun] internal_resume_ptid: non_stop uses inferior_ptid
    [infrun] do_target_resume: resume_ptid=44426.959198.0, step=0, sig=GDB_SIGNAL_0
    [fbsd-nat] resume: start: [LWP 959198 of process 44426], step 0, signo 0 (0)
      [fbsd-nat] resume_one_process: [LWP 959198 of process 44426], step 0, signo 0 (0)
pid 44426: running_lwps 1
/home/john/work/git/gdb/gdb/fbsd-nat.c:1199: internal-error: resume_one_process: Assertion `info->running_lwps == 0' failed.

I've started on a patch to defer actual process resumes until either
wait or commit_resumed is called, but that has so far caused more
regressions than fixes, so I'm not including it here.  Also, the
logged line of 'start: resuming threads, all-stop-on-top-of-non-stop',
even though the target only supports all-stop, makes me think the
test is perhaps broken in this case.
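
To make the deferral idea concrete, here is a minimal standalone
sketch.  It is not the unfinished patch mentioned above and does not
use the real target interface: deferred_resumer, requested and
process_resumes are hypothetical names, and running_lwps is only a
model of the counter seen in the log.  The assumption is that
resume() merely records which LWPs were asked to run, and that the
process is actually resumed once, in commit_resumed(), where the
"fully stopped" invariant can be checked a single time per process.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical model of deferring per-LWP resume requests.
struct deferred_resumer
{
  // pid -> LWPs asked to run since the last commit.
  std::map<int, std::set<int>> requested;

  // pid -> number of LWPs currently running (models running_lwps).
  std::map<int, int> running_lwps;

  // Number of actual per-process resume operations performed.
  int process_resumes = 0;

  // Record the request only; nothing is resumed yet.
  void resume (int pid, int lwp)
  {
    requested[pid].insert (lwp);
  }

  // Resume each process once, with all of its requested LWPs.
  void commit_resumed ()
  {
    for (const auto &entry : requested)
      {
        int pid = entry.first;

        // The process must be fully stopped before it is resumed.
        // With deferral this is checked once per process, not once
        // per resume() call.
        assert (running_lwps[pid] == 0);

        running_lwps[pid] = static_cast<int> (entry.second.size ());
        ++process_resumes;
      }
    requested.clear ();
  }
};
```

With this shape, the two resume() calls for LWPs 432744 and 959198 of
process 44426 shown in the log would collapse into one process resume
at commit time instead of tripping the assertion on the second call.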

John Baldwin (9):
  fbsd-nat: Add missing spaces.
  fbsd-nat: Avoid a direct write to target_waitstatus::kind.
  fbsd-nat: Use correct constant for target_waitstatus::sig.
  fbsd-nat: Add a list of pending events.
  fbsd-nat: Defer any ineligible events reported by wait.
  fbsd-nat: Fix resuming and waiting with multiple processes.
  fbsd-nat: Fix several issues with detaching.
  fbsd-nat: Fix thread_alive against a running thread.
  fbsd-nat: Stop a process if it is running before killing it.

 gdb/fbsd-nat.c | 892 +++++++++++++++++++++++++++++++++++++++++--------
 gdb/fbsd-nat.h |  14 +
 2 files changed, 769 insertions(+), 137 deletions(-)