Cygwin/testsuite: Avoid infinite hang

Message ID 20240412152121.418780-1-pedro@palves.net
State New
Series Cygwin/testsuite: Avoid infinite hang

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64  warning  Patch is already merged
linaro-tcwg-bot/tcwg_gdb_build--master-arm      warning  Patch is already merged

Commit Message

Pedro Alves April 12, 2024, 3:21 p.m. UTC
On Cygwin, the gdb.base/fork-no-detach-follow-child-dlopen.exp
testcase hits a sequence of cascading FAILs:

 (gdb) run
 Starting program: ..../gdb.base/fork-no-detach-follow-child-dlopen/fork-no-detach-follow-child-dlopen
 [New Thread 12672.0x318c]
 [New Thread 12672.0x2844]
 [New Thread 12672.0x714]
 FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: runto: run to add (timeout)
 frame
 FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: frame (timeout)
 list
 FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: list (timeout)

The test program never makes progress, and at this point Cygwin is
completely stuck: I can't run any other Cygwin program.

However, if we run the test program outside DejaGnu, we see something
different:

  (gdb) b add
  Function "add" not defined.
  Make breakpoint pending on future shared library load? (y or [n]) y
  Breakpoint 1 (add) pending.
  (gdb) r
  Starting program: ..../gdb.base/fork-no-detach-follow-child-dlopen/fork-no-detach-follow-child-dlopen
  [New Thread 10968.0x834]
  [New Thread 10968.0x29a4]
  [New Thread 10968.0x16b8]
  [New Thread 10968.0xf9c]
  [Switching to Thread 10968.0x16b8]

  Thread 4 "sig" hit Breakpoint 1.2, pending_signals::add (pack=..., this=0x7ffa1e748a40 <sigq>) at /usr/src/debug/cygwin-3.4.9-1/winsup/cygwin/sigproc.cc:1304
  1304      se = sigs + pack.si.si_signo;
  (gdb)

Ah, the test wanted to run to a global "add" function, but managed to
stop at an internal Cygwin method called "add" (pending_signals::add).
And stopping there deadlocks every Cygwin process on the system.  (I
believe some cygwin1.dll mechanisms use cross-process synchronization
or communication; we're probably blocking something like that.)

Fix this by using "break -q" (the "qualified" option to runto), so
that only the global "add" function matches.  The tests still FAIL
because we don't support follow-fork for Cygwin, but at least we no
longer deadlock the machine.
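
To illustrate the name matching involved, here is a minimal C++ sketch.
It is not taken from the testcase or from Cygwin: pending_queue is a
hypothetical stand-in for a library-internal class that has an "add"
method, sitting next to a global "add" function.  With GDB's default
matching, "break add" covers functions named "add" in any scope, while
"break -q add" covers only the global one.

```cpp
#include <vector>

// Hypothetical stand-in for a library-internal class (compare
// pending_signals in the session above); this is not Cygwin's code.
struct pending_queue
{
  std::vector<int> items;

  // A method that merely happens to be named "add".  A plain
  // "break add" also matches it, because GDB by default matches an
  // unqualified function name in every class and namespace scope.
  void add (int item) { items.push_back (item); }
};

// The global function a test would actually want to stop in.
// "break -qualified add" (short form "break -q add") treats the name
// as fully qualified, so only this function matches.
int
add (int a, int b)
{
  return a + b;
}
```

Assuming this is built with debug info into a program that calls both
functions, "break add" should produce one breakpoint with two
locations, and "break -q add" a breakpoint with a single location.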

Change-Id: I7181d8481c2ae1024b0d73e3bb194f9a4f0a7eb9
---
 gdb/testsuite/gdb.base/fork-no-detach-follow-child-dlopen.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 02d02fc7924992ddd98073b95810b957efdc421a
prerequisite-patch-id: f69cbdabf27c6a6b022b9bbb4244a313146c5d42
prerequisite-patch-id: ea1c719c5c15c52231fd0f75436a90b15f55ff40
prerequisite-patch-id: 7774d24814a00d54d128607cfb12e57aa095a15a
  

Comments

Kevin Buettner April 12, 2024, 4:54 p.m. UTC | #1
On Fri, 12 Apr 2024 16:21:21 +0100
Pedro Alves <pedro@palves.net> wrote:

> On Cygwin, the gdb.base/fork-no-detach-follow-child-dlopen.exp
> testcase hits a sequence of cascading FAILs:
> 
>  (gdb) run
>  Starting program: ..../gdb.base/fork-no-detach-follow-child-dlopen/fork-no-detach-follow-child-dlopen
>  [New Thread 12672.0x318c]
>  [New Thread 12672.0x2844]
>  [New Thread 12672.0x714]
>  FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: runto: run to add (timeout)
>  frame
>  FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: frame (timeout)
>  list
>  FAIL: gdb.base/fork-no-detach-follow-child-dlopen.exp: list (timeout)
> 
> The test program never makes progress, and at this point Cygwin is
> completely stuck: I can't run any other Cygwin program.
> 
> However, if we run the test program outside DejaGnu, we see something
> different:
> 
>   (gdb) b add
>   Function "add" not defined.
>   Make breakpoint pending on future shared library load? (y or [n]) y
>   Breakpoint 1 (add) pending.
>   (gdb) r
>   Starting program: ..../gdb.base/fork-no-detach-follow-child-dlopen/fork-no-detach-follow-child-dlopen
>   [New Thread 10968.0x834]
>   [New Thread 10968.0x29a4]
>   [New Thread 10968.0x16b8]
>   [New Thread 10968.0xf9c]
>   [Switching to Thread 10968.0x16b8]
> 
>   Thread 4 "sig" hit Breakpoint 1.2, pending_signals::add (pack=..., this=0x7ffa1e748a40 <sigq>) at /usr/src/debug/cygwin-3.4.9-1/winsup/cygwin/sigproc.cc:1304
>   1304      se = sigs + pack.si.si_signo;
>   (gdb)
> 
> Ah, the test wanted to run to a global "add" function, but managed to
> stop at an internal Cygwin method called "add" (pending_signals::add).
> And stopping there deadlocks every Cygwin process on the system.  (I
> believe some cygwin1.dll mechanisms use cross-process synchronization
> or communication; we're probably blocking something like that.)
> 
> Fix this by using "break -q" (the "qualified" option to runto), so
> that only the global "add" function matches.  The tests still FAIL
> because we don't support follow-fork for Cygwin, but at least we no
> longer deadlock the machine.

Thanks for the detailed explanation!

Approved-by: Kevin Buettner <kevinb@redhat.com>
  

Patch

diff --git a/gdb/testsuite/gdb.base/fork-no-detach-follow-child-dlopen.exp b/gdb/testsuite/gdb.base/fork-no-detach-follow-child-dlopen.exp
index d56148d79e8..de339c531bd 100644
--- a/gdb/testsuite/gdb.base/fork-no-detach-follow-child-dlopen.exp
+++ b/gdb/testsuite/gdb.base/fork-no-detach-follow-child-dlopen.exp
@@ -45,7 +45,7 @@  proc do_test {} {
     gdb_test_no_output "set follow-fork-mode child"
     gdb_test_no_output "set detach-on-fork off"
 
-    runto "add" allow-pending
+    runto "add" qualified allow-pending
 
     # Since we have debug info in the shlib, we should have the file name available.
     gdb_test "frame" "add \(.*\) at .*$::srcfile2:\[0-9\]+.*"