[00/34] Windows non-stop mode

Message ID 20240507234233.371123-1-pedro@palves.net

Message

Pedro Alves May 7, 2024, 11:41 p.m. UTC
  This series adds non-stop mode support to the Windows native backend,
on Windows 10 and above.  Earlier Windows versions lack the necessary
feature, so those keep working in all-stop mode only.

After the series, the Windows target backend defaults to working in
non-stop mode (as in, "maint set target-non-stop on"), even if the
user-visible mode is all-stop ("set non-stop off").  This is the same
as the Linux backend.
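
For illustration, a minimal sketch of how the two settings can be
inspected and overridden from a GDB session.  These are existing GDB
commands; output is omitted since it depends on the build and target,
and using the "off" setting to force the backend into all-stop mode is
an assumption about how one would exercise that path, not something
this series prescribes:

    (gdb) maint show target-non-stop
    (gdb) show non-stop
    (gdb) maint set target-non-stop off
    (gdb) set non-stop off

The first two commands report the backend-level and user-visible
settings respectively; the last two select all-stop at each level.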

I've been testing this on Cygwin native with the GDB testsuite as I've
been developing this.  Running the testsuite on Cygwin is a pain, and
many testcases run into cascading timeouts still, and some even hang
the test run forever until you kill them manually.  I've got another
series of patches to improve such tests and skip others, and that's
what I've been testing with.  I've also tested the series with the
windows-nat backend forced to all-stop mode, but admittedly not yet
since the last rebase (as it's so painful).

Pedro Alves (34):
  Windows gdb: Dead code in windows_nat_target::do_initial_windows_stuff
  Windows gdb: Eliminate global current_process.dr[8] global
  Windows gdb+gdbserver: New find_thread, replaces
    thread_rec(DONT_INVALIDATE_CONTEXT)
  Windows gdb: handle_output_debug_string return type
  Windows gdb: Eliminate reload_context
  Windows gdb+gdbserver: Eliminate thread_rec(INVALIDATE_CONTEXT) calls
  Windows gdb+gdbserver: Eliminate DONT_SUSPEND
  Windows gdb+gdbserver: Eliminate windows_process_info::thread_rec
  Windows gdb: Simplify windows_nat_target::wait
  Windows gdb+gdbserver: Move suspending thread to when returning event
  Windows gdb: Introduce continue_last_debug_event_main_thread
  Windows gdb: Introduce windows_continue_flags
  Windows gdb: Factor code out of windows_nat_target::windows_continue
  Windows gdb: Pending stop and current_event
  Windows gdb+gdbserver: Elim desired_stop_thread_id / rework
    pending_stops
  Windows gdb+gdbserver: Introduce get_last_debug_event_ptid
  Windows gdb: Can't pass signal to thread other than last stopped
    thread
  Windows gdbserver: Fix scheduler-locking
  Windows gdb: Enable "set scheduler-locking on"
  Windows gdbserver: Eliminate soft-interrupt mechanism
  Windows gdb+gdbserver: Make current_event per-thread state
  Windows gdb+gdbserver: Make last_sig per-thread state
  Windows gdb+gdbserver: Make siginfo_er per-thread state
  Add backpointer from windows_thread_info to windows_process_info
  Windows gdb+gdbserver: Share $_siginfo reading code
  Windows gdb+gdbserver: Eliminate struct pending_stop
  Windows gdb: Change serial_event management
  Windows gdb: cygwin_set_dr => windows_set_dr, etc.
  windows_per_inferior::continue_one_thread, unify WoW64/non-WoW64 paths
  windows-nat.c: Avoid writing debug registers if watchpoint hit pending
  Windows gdb+gdbserver: Check whether DBG_REPLY_LATER is available
  Windows gdb: Add non-stop support
  Windows gdb: Watchpoints while running (internal vs external stops)
  Mention Windows non-stop support in NEWS

 gdb/NEWS               |    3 +
 gdb/nat/windows-nat.c  |  207 +++--
 gdb/nat/windows-nat.h  |  219 +++---
 gdb/windows-nat.c      | 1616 ++++++++++++++++++++++++++++------------
 gdbserver/win32-low.cc |  401 +++++-----
 gdbserver/win32-low.h  |   19 +-
 6 files changed, 1621 insertions(+), 844 deletions(-)


base-commit: d68f983f88c7469befddcf221228070990cf25e1
  

Comments

Tom Tromey May 8, 2024, 3:40 p.m. UTC | #1
>>>>> "Pedro" == Pedro Alves <pedro@palves.net> writes:

Pedro> This series adds non-stop mode support to the Windows native backend,
Pedro> on Windows 10 and above.  Earlier Windows versions lack the necessary
Pedro> feature, so those keep working in all-stop mode only.

Pedro> After the series, the Windows target backend defaults to working in
Pedro> non-stop mode (as in, "maint set target-non-stop on"), even if the
Pedro> user-visible mode is all-stop ("set non-stop off").  This is the same
Pedro> as the Linux backend.

Thanks for doing this.  This implements a pretty big subset of my
windows-nat wish-list.

I reviewed the easy subset of the patches.  I'll do some more later.

I'd suggest not landing the more major changes until after GDB 15.
It seems pretty close to the release to add a big change like this.

Pedro> I've been testing this on Cygwin native with the GDB testsuite as I've
Pedro> been developing this.  Running the testsuite on Cygwin is a pain, and
Pedro> many testcases run into cascading timeouts still, and some even hang
Pedro> the test run forever until you kill them manually.

At some point I'll apply the patches and run them through the AdaCore
test suite.

Tom
  
Tom Tromey May 15, 2024, 5:35 p.m. UTC | #2
Tom> At some point I'll apply the patches and run them through the AdaCore
Tom> test suite.

I did this today and I found a few failures.

Now, one thing to note is that I did this by merging your branch into
the AdaCore branch; and AdaCore carries a few local changes.

In particular AdaCore still has the "random thread switch" change that I
submitted a long time ago -- and one of the problems seems to be related
to that.  Without really debugging I don't know if that's a problem in
the series or with the local changes.

Anyway, there are some other problems as well.  For instance:

    (gdb) start
    Temporary breakpoint 1 at 0x140001aab: file p.adb, line 4.
    Starting program: C:\Users\itmgr\sandbox\x86_64-windows64\gdb_version-head_test\tmp\test\gdb-TS-f539jxg_\J225-024__attach_detach_task\p.exe

    Thread 1 hit Temporary breakpoint 1, p () at p.adb:4
    4          Barrier : Integer := 0;
    PASSED:J225-024__attach_detach_task:start
    (gdb) detach
    Detaching from program: C:\Users\itmgr\sandbox\x86_64-windows64\gdb_version-head_test\tmp\test\gdb-TS-f539jxg_\J225-024__attach_detach_task\p.exe, process 4272
    [Inferior 1 (process 4272) detached]
    PASSED:J225-024__attach_detach_task:detach
    (gdb) attach 4272
    Can't attach to process 4272 (error 87): The parameter is incorrect.

Here the test detaches and then re-attaches -- but attach fails.


Another test runs gdb without a symbol file and then does an attach.
The expected output (happens with gdb head) is:

    (gdb) attach 2328
    Attaching to process 2328
    Reading symbols from C:\Users\itmgr\sandbox\x86_64-windows64\gdb_version-head_test\tmp\test\gdb-TS-bgt06sfm\N203-009__attach_no_exe\foo_n203_009.EXE...

However, with the branch, gdb doesn't try to read the symbols and
instead reports a location in a system DLL:

    (gdb) attach 1276
    Attaching to process 1276
    0x00007ff8b9b50274 in ntdll!ZwDelayExecution ()
       from C:\Windows\SYSTEM32\ntdll.dll


There was also a timeout when detaching.  The inferior being run is
fairly ordinary -- nothing special, just a spot to set a breakpoint:

    (gdb) tbreak break_me
    Temporary breakpoint 1 at 0x140001ca4: file pck.adb, line 37.
    PASSED:CA30-017__detach:tbreak break_me
    (gdb) run
    Starting program: C:\Users\itmgr\sandbox\x86_64-windows64\gdb_version-head_test\tmp\test\gdb-TS-f539jxg_\CA30-017__detach\foo.exe 

    Thread 1 hit Temporary breakpoint 1, pck.break_me () at pck.adb:37
    37	      null;
    PASSED:CA30-017__detach:runto 'break_me'
    (gdb) detach
    TIMEOUT:CA30-017__detach:detach (timeout)
    [killing GDB (pid = N/A (already dead?))]



FWIW I ran the tests before and after the series, on the same machine,
so there shouldn't be any machine configuration issues or anything like
that.

hope this helps,
Tom
  
Pedro Alves May 15, 2024, 8:39 p.m. UTC | #3
Thank you very much!  I will try to reproduce the issues and see if I can fix them.

To understand whether non-stop or all-stop mode was being used in the
tests you ran: was this on a Windows >= 10 machine or on an older
Windows?

Pedro Alves
  
Tom Tromey May 16, 2024, 3:53 p.m. UTC | #4
>>>>> "Pedro" == Pedro Alves <pedro@palves.net> writes:

Pedro> Thank you very much!  I will try to reproduce the issues and see if I can fix them.
Pedro> To understand whether non-stop or all-stop modes were being used in
Pedro> the tests used below -- was this with a Windows >= 10 machine or older Windows?

I think it is a "Windows 2019" machine, which searching tells me is
based on Windows 10.  I'm not sure I have access to anything older.

Tom