[v4,6/6] gdb/testsuite: Add kfail in gdb.threads/threadcrash.exp on 32-bit arm targets

Message ID 20240308093342.26745-7-tdevries@suse.de
State Dropped
Headers
Series Fixes to gdb.threads/threadcrash.exp |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Tom de Vries March 8, 2024, 9:33 a.m. UTC
  There is an issue with the test gdb.threads/threadcrash.exp on arm
targets, relating to how the targets handles gcores.  The issue is that GDB
can't properly backtrace from a gcore.

Add a kfail referring back to PR corefiles/31294, which tracks the issues with
gcores in 32-bit arm targets.

I cannot reproduce the problem, so this is a best effort based on the logs in
the PR.

Tested on arm-linux and x86_64-linux.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31294
---
 gdb/testsuite/gdb.threads/threadcrash.exp | 27 +++++++++++++++--------
 1 file changed, 18 insertions(+), 9 deletions(-)
  

Comments

Tom de Vries March 8, 2024, 10:24 a.m. UTC | #1
On 3/8/24 10:33, Tom de Vries wrote:
> +	if { $have_kfail } {
> +	    kfail $test
> +	} else {

This needs and additional fixup:
...
diff --git a/gdb/testsuite/gdb.threads/threadcrash.exp 
b/gdb/testsuite/gdb.threads/threadcrash.exp
index 7839f5c358e..3679cdc69fa 100644
--- a/gdb/testsuite/gdb.threads/threadcrash.exp
+++ b/gdb/testsuite/gdb.threads/threadcrash.exp
@@ -123,7 +123,8 @@ proc do_full_test { phase } {
      } else {
  	set have_kfail [expr [string equal $phase gcore] && [is_aarch32_target]]
  	if { $have_kfail } {
-	    kfail $test
+	    kfail corefiles/31294 $test
+	    return
  	} else {
  	    fail $test
  	}
...

Thiago, you mentioned you could reproduce the problem, can you check 
whether this catches it?

Thanks,
- Tom
  
Thiago Jung Bauermann March 8, 2024, 11:57 p.m. UTC | #2
Hello Tom,

Thank you for this series!

Tom de Vries <tdevries@suse.de> writes:

> On 3/8/24 10:33, Tom de Vries wrote:
>> +	if { $have_kfail } {
>> +	    kfail $test
>> +	} else {
>
> This needs and additional fixup:
> ...
> diff --git a/gdb/testsuite/gdb.threads/threadcrash.exp
> b/gdb/testsuite/gdb.threads/threadcrash.exp
> index 7839f5c358e..3679cdc69fa 100644
> --- a/gdb/testsuite/gdb.threads/threadcrash.exp
> +++ b/gdb/testsuite/gdb.threads/threadcrash.exp
> @@ -123,7 +123,8 @@ proc do_full_test { phase } {
>      } else {
>  	set have_kfail [expr [string equal $phase gcore] && [is_aarch32_target]]
>  	if { $have_kfail } {
> -	    kfail $test
> +	    kfail corefiles/31294 $test
> +	    return
>  	} else {
>  	    fail $test
>  	}
> ...
>
> Thiago, you mentioned you could reproduce the problem, can you check whether this catches
> it?

I applied the patch series plus the fixup above to upstream commit
2755241d02d7 ("Add return value to DAP scope"), and I actually can't
reproduce the problem anymore, not sure why. The threads' backtraces
look reasonable. The only "??" are in linux/arm/clone.S:74 from
libc.so.6.

I put the gdb.log here, if there's interest:

https://people.linaro.org/~thiago.bauermann/gdb-pr-31294/gdb.log

--
Thiago
  
Tom de Vries March 11, 2024, 10:15 a.m. UTC | #3
On 3/9/24 00:57, Thiago Jung Bauermann wrote:
> 
> Hello Tom,
> 
> Thank you for this series!
> 
> Tom de Vries <tdevries@suse.de> writes:
> 
>> On 3/8/24 10:33, Tom de Vries wrote:
>>> +	if { $have_kfail } {
>>> +	    kfail $test
>>> +	} else {
>>
>> This needs and additional fixup:
>> ...
>> diff --git a/gdb/testsuite/gdb.threads/threadcrash.exp
>> b/gdb/testsuite/gdb.threads/threadcrash.exp
>> index 7839f5c358e..3679cdc69fa 100644
>> --- a/gdb/testsuite/gdb.threads/threadcrash.exp
>> +++ b/gdb/testsuite/gdb.threads/threadcrash.exp
>> @@ -123,7 +123,8 @@ proc do_full_test { phase } {
>>       } else {
>>   	set have_kfail [expr [string equal $phase gcore] && [is_aarch32_target]]
>>   	if { $have_kfail } {
>> -	    kfail $test
>> +	    kfail corefiles/31294 $test
>> +	    return
>>   	} else {
>>   	    fail $test
>>   	}
>> ...
>>
>> Thiago, you mentioned you could reproduce the problem, can you check whether this catches
>> it?
> 
> I applied the patch series plus the fixup above to upstream commit
> 2755241d02d7 ("Add return value to DAP scope"), and I actually can't
> reproduce the problem anymore, not sure why. The threads' backtraces
> look reasonable. The only "??" are in linux/arm/clone.S:74 from
> libc.so.6.
> 
> I put the gdb.log here, if there's interest:
> 
> https://people.linaro.org/~thiago.bauermann/gdb-pr-31294/gdb.log

Hi Thiago,

thanks for checking.

Ideally, we need to either:
- confirm that the problem still exists, and test and commit the kfail,
   or
- confirm that the problem doesn't exist, drop the kfail patch and close
   the PR.

So, can you try to confirm one or the other, f.i. going back to an 
earlier commit where you were able to reproduce the problem, running in 
a loop, try with/without debug info, etc.  AFAIU, you're the only one 
who can do this because the behaviour hasn't been spotted outside linaro.

If it turns out that we can't confirm either existence or non-existence, 
I suppose we'll have to drop the kfail patch, leave the PR open and wait 
for it to appear again.

I went ahead and committed the series apart from this patch, which 
reflects that situation.

Thanks,
- Tom
  
Thiago Jung Bauermann March 11, 2024, 6:10 p.m. UTC | #4
Hello Tom,

Tom de Vries <tdevries@suse.de> writes:
> On 3/9/24 00:57, Thiago Jung Bauermann wrote:
>>> On 3/8/24 10:33, Tom de Vries wrote:
>>>
>>> Thiago, you mentioned you could reproduce the problem, can you check whether this catches
>>> it?
>> I applied the patch series plus the fixup above to upstream commit
>> 2755241d02d7 ("Add return value to DAP scope"), and I actually can't
>> reproduce the problem anymore, not sure why. The threads' backtraces
>> look reasonable. The only "??" are in linux/arm/clone.S:74 from
>> libc.so.6.
>> I put the gdb.log here, if there's interest:
>> https://people.linaro.org/~thiago.bauermann/gdb-pr-31294/gdb.log
>
> thanks for checking.
>
> Ideally, we need to either:
> - confirm that the problem still exists, and test and commit the kfail,
>   or
> - confirm that the problem doesn't exist, drop the kfail patch and close
>   the PR.
>
> So, can you try to confirm one or the other, f.i. going back to an earlier commit where
> you were able to reproduce the problem, running in a loop, try with/without debug info,
> etc.  AFAIU, you're the only one who can do this because the behaviour hasn't been spotted
> outside linaro.

Yes, of course. I'll see what's going on and determine whether the
problem has really been fixed.

> I went ahead and committed the series apart from this patch, which reflects that
> situation.

Thank you!

--
Thiago
  
Thiago Jung Bauermann March 11, 2024, 11:03 p.m. UTC | #5
Thiago Jung Bauermann <thiago.bauermann@linaro.org> writes:
> Tom de Vries <tdevries@suse.de> writes:
>> Ideally, we need to either:
>> - confirm that the problem still exists, and test and commit the kfail,
>>   or
>> - confirm that the problem doesn't exist, drop the kfail patch and close
>>   the PR.
>>
>> So, can you try to confirm one or the other, f.i. going back to an earlier commit where
>> you were able to reproduce the problem, running in a loop, try with/without debug info,
>> etc.  AFAIU, you're the only one who can do this because the behaviour hasn't been
>> spotted
>> outside linaro.
>
> Yes, of course. I'll see what's going on and determine whether the
> problem has really been fixed.

I confirmed that the problem doesn't happen anymore, and a git bisect
found that commit 9c0aa4c53104 ("Fix disabling of year 2038 support on
32-bit hosts by default") fixed it.

I updated PR 31294 with this information.

--
Thiago
  

Patch

diff --git a/gdb/testsuite/gdb.threads/threadcrash.exp b/gdb/testsuite/gdb.threads/threadcrash.exp
index c72ce73fd6d..7839f5c358e 100644
--- a/gdb/testsuite/gdb.threads/threadcrash.exp
+++ b/gdb/testsuite/gdb.threads/threadcrash.exp
@@ -94,14 +94,14 @@  proc thread_apply_all {} {
 	}
     }
 
-    gdb_assert {$unwind_fail == false}
-
     if { $have_sleep == -1 } {
 	set have_sleep 0
     }
     if { $have_pthread_kill == -1 } {
 	set have_pthread_kill 0
     }
+
+    return [expr !$unwind_fail]
 }
 
 # Perform all the tests we're interested in.  They are:
@@ -109,16 +109,25 @@  proc thread_apply_all {} {
 # * Creating the list of backtraces for all threads seen
 # * testing if GDB recreated the full backtrace we expect for all threads
 
-proc do_full_test {} {
+proc do_full_test { phase } {
     global have_sleep
     global have_pthread_kill
     global test_list
     set thread_count [get_valueof "" "\$_inferior_thread_count" 0]
     gdb_assert {$thread_count == 7}
 
-    thread_apply_all
-
-    gdb_assert {$thread_count == [llength $test_list]}
+    set test thread_apply_all
+    set res [$test]
+    if { $res && $thread_count == [llength $test_list] } {
+	pass $test
+    } else {
+	set have_kfail [expr [string equal $phase gcore] && [is_aarch32_target]]
+	if { $have_kfail } {
+	    kfail $test
+	} else {
+	    fail $test
+	}
+    }
 
     if { $have_sleep } {
 	set sleep ".*sleep.*"
@@ -209,7 +218,7 @@  proc_with_prefix test_live_inferior {} {
     gdb_breakpoint "breakpt"
     gdb_continue_to_breakpoint "running to breakpoint" ".*"
 
-    do_full_test
+    do_full_test live_inferior
 }
 
 # Do all preparation steps for running the corefile tests, then
@@ -229,7 +238,7 @@  proc_with_prefix test_corefile {} {
 	     "A program is being debugged already\\\.  Kill it\\\? \\\(y or n\\\) " \
 	     "y"
 
-    do_full_test
+    do_full_test corefile
 }
 
 # Do all preparation steps for running the gcore tests, then
@@ -267,7 +276,7 @@  proc_with_prefix test_gcore {} {
 	     "A program is being debugged already\\\.  Kill it\\\? \\\(y or n\\\) " \
 	     "y"
 
-    do_full_test
+    do_full_test gcore
 }
 
 standard_testfile