[v2] Fix random dejagnu test abort with simulator target

Message ID AS8P193MB1285F810C8C9E340FCAAE304E4122@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
State New
Headers
Series [v2] Fix random dejagnu test abort with simulator target |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed

Commit Message

Bernd Edlinger April 22, 2024, 9:55 a.m. UTC
  This is probably a dejagnu issue with the gdb testsuite
when a simulator target is used.  I observed random
testrun aborts with dejagnu 1.6.2-1 from ubuntu 20.04
The problem starts when the test case gdb.base/sigwinch-notty.exp
tries to execute "sleep", although that is impossible with a
simulator.  And for unknown reason the test case completes
(with errors) before the "after 1000" block is run.

Then in a totally different test this happens with 50% likelihood:

ERROR: (DejaGnu) proc "bgerror {can't read "gdb_pid": no such variable}" does not exist.
The error code is TCL LOOKUP COMMAND bgerror
The info on the error is:
invalid command name "bgerror"
    while executing
"::tcl_unknown bgerror {can't read "gdb_pid": no such variable}"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

                === gdb Summary ===

 # of expected passes            30815
 # of unexpected failures        241
 # of expected failures          3
 # of known failures             23
 # of unresolved testcases       241
 # of untested testcases         96
 # of unsupported tests          532
 # of paths in test names        1

So the whole test run is aborted in the middle.

This patch should fix the issue.

Co-Authored-By: Tom de Vries <tdevries@suse.de>
---
 gdb/testsuite/gdb.base/sigwinch-notty.exp | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

v2: I took over Tom's suggestion 1:1, and gave it a few test runs with
no unexpected test aborts so far.
So this looks quite good to me, and has also a nice improvement of giving
an UNSUPPORTED message, with what exactly was the reason why the test
did not run.
  

Comments

Andrew Burgess April 22, 2024, 1:49 p.m. UTC | #1
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> This is probably a dejagnu issue with the gdb testsuite
> when a simulator target is used.  I observed random
> testrun aborts with dejagnu 1.6.2-1 from ubuntu 20.04
> The problem starts when the test case gdb.base/sigwinch-notty.exp
> tries to execute "sleep", although that is impossible with a
> simulator.  And for unknown reason the test case completes
> (with errors) before the "after 1000" block is run.
>
> Then in a totally different test this happens with 50% likelihood:
>
> ERROR: (DejaGnu) proc "bgerror {can't read "gdb_pid": no such variable}" does not exist.
> The error code is TCL LOOKUP COMMAND bgerror
> The info on the error is:
> invalid command name "bgerror"
>     while executing
> "::tcl_unknown bgerror {can't read "gdb_pid": no such variable}"
>     ("uplevel" body line 1)
>     invoked from within
> "uplevel 1 ::tcl_unknown $args"
>
>                 === gdb Summary ===
>
>  # of expected passes            30815
>  # of unexpected failures        241
>  # of expected failures          3
>  # of known failures             23
>  # of unresolved testcases       241
>  # of untested testcases         96
>  # of unsupported tests          532
>  # of paths in test names        1
>
> So the whole test run is aborted in the middle.
>
> This patch should fix the issue.
>
> Co-Authored-By: Tom de Vries <tdevries@suse.de>
> ---
>  gdb/testsuite/gdb.base/sigwinch-notty.exp | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> v2: I took over Tom's suggestion 1:1, and gave it a few test runs with
> no unexpected test aborts so far.
> So this looks quite good to me, and has also a nice improvement of giving
> an UNSUPPORTED message, with what exactly was the reason why the test
> did not run.
>
> diff --git a/gdb/testsuite/gdb.base/sigwinch-notty.exp b/gdb/testsuite/gdb.base/sigwinch-notty.exp
> index cef21c07c59..621231df6af 100644
> --- a/gdb/testsuite/gdb.base/sigwinch-notty.exp
> +++ b/gdb/testsuite/gdb.base/sigwinch-notty.exp
> @@ -19,11 +19,17 @@
>  
>  require {!target_info exists gdb,nosignals}
>  
> -# The testfile relies on "run" from the command line, so only works
> -# with "target native".
> -if { [target_info gdb_protocol] != "" } {
> -    return
> -}
> +# The test-case relies on "run" from the command line, so it only works
> +# with "target native", so we need host == target.

I'm currently reading and rereaching the V1 series to try and understand
what's going on here, but I don't understand this comment.

  # The test-case relies on "run" from the command line, so it only works
  # with "target native", so we need host == target.

Just doesn't make sense to me: 'target extended-remote' will also
support the `run` command, as will 'target sim'.

The conclusion might well be valid, but I think the logic used to
justify it is incorrect.   Can we rephrase this comment so it makes
sense?

Thanks,
Andrew


> +#
> +# The test-case uses "exp_pid -i $gdb_spawn_id" which doesn't work with
> +# remote host, so we need build == host.
> +#
> +# In other words, we need build == host == target.
> +require {!is_remote host} {!is_remote target}
> +
> +# Check that we have "target native" as opposed to native-gdbserver etc.
> +require {string equal [target_info gdb_protocol] ""}
>  
>  gdb_exit
>  
> -- 
> 2.39.2
  
Bernd Edlinger April 22, 2024, 2:53 p.m. UTC | #2
On 4/22/24 15:49, Andrew Burgess wrote:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> +# The test-case relies on "run" from the command line, so it only works
>> +# with "target native", so we need host == target.
> 
> I'm currently reading and rereaching the V1 series to try and understand
> what's going on here, but I don't understand this comment.
> 
>   # The test-case relies on "run" from the command line, so it only works
>   # with "target native", so we need host == target.
> 
> Just doesn't make sense to me: 'target extended-remote' will also
> support the `run` command, as will 'target sim'.
> 
> The conclusion might well be valid, but I think the logic used to
> justify it is incorrect.   Can we rephrase this comment so it makes
> sense?
> 

I probably don't know how to properly explain that.
But would very much appreciate any advice how to.

The test case ties to do this:
The gdb host/build environment is x86_64-pc-linux-gnu,
and the target environment is riscv-unknown-elf
with newlib so the run command would work but only
to run a program that was built with my riscv-unknown-elf-gcc.
But the test case ties to run "/usr/bin/sleep 3",
therefore the gdb is unable to run that, and since
the stdin is redirected to /dev/null the gdb terminates
immediately, so the test script terminates with one FAIL
and one PASS test case, BUT the after 1000 is still hanging
on, and interrupts the whole test run, or just one test case
that depends on some kind of race condition, and maybe also
a bit on the dejagnu version.
It is interesting that the gdb_pid is no longer known, when
the after 1000 executes, and creates the cryptic message
ERROR: (DejaGnu) proc "bgerror {can't read "gdb_pid": no such variable}" does not exist.

Please feel free to ask if I can give more information.


Thanks
Bernd.
  

Patch

diff --git a/gdb/testsuite/gdb.base/sigwinch-notty.exp b/gdb/testsuite/gdb.base/sigwinch-notty.exp
index cef21c07c59..621231df6af 100644
--- a/gdb/testsuite/gdb.base/sigwinch-notty.exp
+++ b/gdb/testsuite/gdb.base/sigwinch-notty.exp
@@ -19,11 +19,17 @@ 
 
 require {!target_info exists gdb,nosignals}
 
-# The testfile relies on "run" from the command line, so only works
-# with "target native".
-if { [target_info gdb_protocol] != "" } {
-    return
-}
+# The test-case relies on "run" from the command line, so it only works
+# with "target native", so we need host == target.
+#
+# The test-case uses "exp_pid -i $gdb_spawn_id" which doesn't work with
+# remote host, so we need build == host.
+#
+# In other words, we need build == host == target.
+require {!is_remote host} {!is_remote target}
+
+# Check that we have "target native" as opposed to native-gdbserver etc.
+require {string equal [target_info gdb_protocol] ""}
 
 gdb_exit