gdb/testsuite: Avoid infinite loop in gdb.reverse/step-reverse.exp

Message ID 20230531200602.165033-1-thiago.bauermann@linaro.org
State New
Headers
Series gdb/testsuite: Avoid infinite loop in gdb.reverse/step-reverse.exp |

Commit Message

Thiago Jung Bauermann May 31, 2023, 8:06 p.m. UTC
  This testcase sometimes gets stuck in a loop for hours when running in our
CI. The problem is that due to an issue unrelated to reverse debugging the
inferior exits early, and because of the overly generic ".*" pattern the
testcase keeps sending the "next" command without noticing that the
inferior is gone.

gdb_test_multiple has a pattern to detect that "The program is not being
run.", but since it is placed after the patterns from the caller it won't
be triggered. It also has a timeout pattern, but for some reason I don't
understand it often doesn't trigger.

Since the test binary is compiled with debug information, fix by changing
the pattern to match the source code line number that is shown by GDB right
after the "step" command.
---
 gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Guinevere Larsen June 1, 2023, 7:30 a.m. UTC | #1
On 31/05/2023 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
> This testcase sometimes gets stuck in a loop for hours when running in our
> CI. The problem is that due to an issue unrelated to reverse debugging the
> inferior exits early, and because of the overly generic ".*" pattern the
> testcase keeps sending the "next" command without noticing that the
> inferior is gone.
>
> gdb_test_multiple has a pattern to detect that "The program is not being
> run.", but since it is placed after the patterns from the caller it won't
> be triggered. It also has a timeout pattern, but for some reason I don't
> understand it often doesn't trigger.
>
> Since the test binary is compiled with debug information, fix by changing
> the pattern to match the source code line number that is shown by GDB right
> after the "step" command.
> ---

Oops, sorry about that, that never even crossed my mind. This change 
looks pretty simple IMO.

Reviewed-By: Bruno Larsen <blarsen@redhat.com>
  
Tom de Vries June 1, 2023, 9:15 a.m. UTC | #2
On 5/31/23 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
> This testcase sometimes gets stuck in a loop for hours when running in our
> CI. The problem is that due to an issue unrelated to reverse debugging the
> inferior exits early, and because of the overly generic ".*" pattern the
> testcase keeps sending the "next" command without noticing that the
> inferior is gone.
> 
> gdb_test_multiple has a pattern to detect that "The program is not being
> run.", but since it is placed after the patterns from the caller it won't
> be triggered. It also has a timeout pattern, but for some reason I don't
> understand it often doesn't trigger.
> 
> Since the test binary is compiled with debug information, fix by changing
> the pattern to match the source code line number that is shown by GDB right
> after the "step" command.
> ---
>   gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp b/gdb/testsuite/gdb.reverse/step-reverse.exp
> index 729218d4cb8c..766ca02910af 100644
> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>   	-re -wrap "NEXT OVER THIS RECURSION.*" {
>   	    set step_out 0
>   	}
> -	-re -wrap ".*" {
> +	-re -wrap "^\[0-9\].*"  {
>   	    send_gdb "next\n"
>   	    exp_continue
>   	}
> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion inside the recursion" {
>   	gdb_assert {"$seen_recursive_call" == 1} \
>   	    "step over recursion inside the recursion"
>       }
> -    -re -wrap ".*" {
> +    -re -wrap "^\[0-9\].*" {
>   	send_gdb "next\n"
>   	exp_continue
>       }

Hi,

the fix as is addresses the immediate problem.

Introducing an iteration bound as a safety net is probably a good idea 
as well, given the problem you ran into.

Also, I prefer a different style of fix.  I think that this 
gdb_test_multiple is overly complicated by dealing with the prompt in 
two clauses.

I came up with this as first version which removes the need for any .* 
usage (though to be clear, -wrap "" and -wrap ".*" is the same thing):
...
if { "$step_out" == 1 } {
     gdb_test_multiple "next" "stepping out of recursion" {
         -re "NEXT OVER THIS RECURSION" {
             set step_out 0
             exp_continue
         }
         -re -wrap "" {
             if { $step_out == 1 } {
                 send_gdb "next\n"
                 exp_continue
             }
             pass $gdb_test_name
         }
     }
}
...
and this after adding the iteration bound:
...
if { "$step_out" == 1 } {
     set iterations 0
     set max_iterations 10
     gdb_test_multiple "next" "stepping out of recursion" {
         -re "NEXT OVER THIS RECURSION" {
             set step_out 0
             exp_continue
         }
         -re -wrap "" {
             if { $step_out == 1 } {
                 incr iterations
                 if { $iterations == $max_iterations } {
                     fail $gdb_test_name
                     return
                 }

                 send_gdb "next\n"
                 exp_continue
             }
             pass $gdb_test_name
         }
     }
}
...

[ The gdb_test_multiple can be even further simplified, by wrapping it 
in a while that takes care of the iteration. ]

This code doesn't actually trigger for me, so I added this:
...
-gdb_test_multiple "next" "reverse next over recursion" { 

+gdb_test_multiple "step" "reverse next over recursion" { 

...

That gave me the default fail:
...
(gdb) PASS: gdb.reverse/step-reverse.exp: reverse next over call
step^M
recursive_callee (val=17) at 
/data/vries/gdb/src/gdb/testsuite/gdb.reverse/step-reverse.c:41^M
41      } /* EXIT RECURSIVE FUNCTION */^M
(gdb) FAIL: gdb.reverse/step-reverse.exp: reverse next over recursion
...
which did not set step_out to 1, so still the code didn't get 
triggered,so I did:
...
-    -re -wrap ".*RECURSIVE CALL.*" { 

+    -re -wrap ".*(RECURSIVE CALL|EXIT RECURSIVE FUNCTION).*" { 

...

ISTM that the test-case contains complicated and hard-to-trigger code, 
which could be fixed by merging the code for the next and step scenario.

Anyway, I'm fine with your fix, if an iteration bound is added.

But I think at least this part of the test-case could do with a rewrite.

Thanks,
- Tom
  
Andrew Burgess June 1, 2023, 9:32 a.m. UTC | #3
Thiago Jung Bauermann via Gdb-patches <gdb-patches@sourceware.org>
writes:

> This testcase sometimes gets stuck in a loop for hours when running in our
> CI. The problem is that due to an issue unrelated to reverse debugging the
> inferior exits early, and because of the overly generic ".*" pattern the
> testcase keeps sending the "next" command without noticing that the
> inferior is gone.
>
> gdb_test_multiple has a pattern to detect that "The program is not being
> run.", but since it is placed after the patterns from the caller it won't
> be triggered. It also has a timeout pattern, but for some reason I don't
> understand it often doesn't trigger.

The timeout is triggered between successful matches.  So each time the
test matches the '-re -wrap ".*"' this counts as a successful match and
the timeout is reset.

Thanks,
Andrew

>
> Since the test binary is compiled with debug information, fix by changing
> the pattern to match the source code line number that is shown by GDB right
> after the "step" command.
> ---
>  gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp b/gdb/testsuite/gdb.reverse/step-reverse.exp
> index 729218d4cb8c..766ca02910af 100644
> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>  	-re -wrap "NEXT OVER THIS RECURSION.*" {
>  	    set step_out 0
>  	}
> -	-re -wrap ".*" {
> +	-re -wrap "^\[0-9\].*"  {
>  	    send_gdb "next\n"
>  	    exp_continue
>  	}
> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion inside the recursion" {
>  	gdb_assert {"$seen_recursive_call" == 1} \
>  	    "step over recursion inside the recursion"
>      }
> -    -re -wrap ".*" {
> +    -re -wrap "^\[0-9\].*" {
>  	send_gdb "next\n"
>  	exp_continue
>      }
  
Guinevere Larsen June 1, 2023, 9:40 a.m. UTC | #4
On 01/06/2023 11:15, Tom de Vries wrote:
> On 5/31/23 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
>> This testcase sometimes gets stuck in a loop for hours when running 
>> in our
>> CI. The problem is that due to an issue unrelated to reverse 
>> debugging the
>> inferior exits early, and because of the overly generic ".*" pattern the
>> testcase keeps sending the "next" command without noticing that the
>> inferior is gone.
>>
>> gdb_test_multiple has a pattern to detect that "The program is not being
>> run.", but since it is placed after the patterns from the caller it 
>> won't
>> be triggered. It also has a timeout pattern, but for some reason I don't
>> understand it often doesn't trigger.
>>
>> Since the test binary is compiled with debug information, fix by 
>> changing
>> the pattern to match the source code line number that is shown by GDB 
>> right
>> after the "step" command.
>> ---
>>   gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp 
>> b/gdb/testsuite/gdb.reverse/step-reverse.exp
>> index 729218d4cb8c..766ca02910af 100644
>> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
>> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
>> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>>       -re -wrap "NEXT OVER THIS RECURSION.*" {
>>           set step_out 0
>>       }
>> -    -re -wrap ".*" {
>> +    -re -wrap "^\[0-9\].*"  {
>>           send_gdb "next\n"
>>           exp_continue
>>       }
>> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion 
>> inside the recursion" {
>>       gdb_assert {"$seen_recursive_call" == 1} \
>>           "step over recursion inside the recursion"
>>       }
>> -    -re -wrap ".*" {
>> +    -re -wrap "^\[0-9\].*" {
>>       send_gdb "next\n"
>>       exp_continue
>>       }
>
> Hi,
>
> the fix as is addresses the immediate problem.
>
> Introducing an iteration bound as a safety net is probably a good idea 
> as well, given the problem you ran into.
>
> Also, I prefer a different style of fix.  I think that this 
> gdb_test_multiple is overly complicated by dealing with the prompt in 
> two clauses.
>
> I came up with this as first version which removes the need for any .* 
> usage (though to be clear, -wrap "" and -wrap ".*" is the same thing):
> ...
> if { "$step_out" == 1 } {
>     gdb_test_multiple "next" "stepping out of recursion" {
>         -re "NEXT OVER THIS RECURSION" {
>             set step_out 0
>             exp_continue
>         }
>         -re -wrap "" {
>             if { $step_out == 1 } {
>                 send_gdb "next\n"
>                 exp_continue
>             }
>             pass $gdb_test_name
>         }
>     }
> }
> ...
> and this after adding the iteration bound:
> ...
> if { "$step_out" == 1 } {
>     set iterations 0
>     set max_iterations 10
>     gdb_test_multiple "next" "stepping out of recursion" {
>         -re "NEXT OVER THIS RECURSION" {
>             set step_out 0
>             exp_continue
>         }
>         -re -wrap "" {
>             if { $step_out == 1 } {
>                 incr iterations
>                 if { $iterations == $max_iterations } {
>                     fail $gdb_test_name
>                     return
>                 }
>
>                 send_gdb "next\n"
>                 exp_continue
>             }
>             pass $gdb_test_name
>         }
>     }
> }
> ...
>
> [ The gdb_test_multiple can be even further simplified, by wrapping it 
> in a while that takes care of the iteration. ]
>
> This code doesn't actually trigger for me, so I added this:
> ...
> -gdb_test_multiple "next" "reverse next over recursion" {
> +gdb_test_multiple "step" "reverse next over recursion" {
> ...
>
> That gave me the default fail:
> ...
> (gdb) PASS: gdb.reverse/step-reverse.exp: reverse next over call
> step^M
> recursive_callee (val=17) at 
> /data/vries/gdb/src/gdb/testsuite/gdb.reverse/step-reverse.c:41^M
> 41      } /* EXIT RECURSIVE FUNCTION */^M
> (gdb) FAIL: gdb.reverse/step-reverse.exp: reverse next over recursion
> ...
> which did not set step_out to 1, so still the code didn't get 
> triggered,so I did:
> ...
> -    -re -wrap ".*RECURSIVE CALL.*" {
> +    -re -wrap ".*(RECURSIVE CALL|EXIT RECURSIVE FUNCTION).*" {
> ...
>
> ISTM that the test-case contains complicated and hard-to-trigger code, 
> which could be fixed by merging the code for the next and step scenario.
>
Some of the changes you introduced are not correct with the original 
idea of the test. As a refresher, the problem that is being tested is 
that GDB would not be able to reverse-next and skip recursive functions 
(details: PR 16678). It would skip the first call then stop, which in 
this test case would still be 4 calls deep into the recursion when we 
wanted to stop on main.

The reason the test works in 2 clauses is because only the first part is 
actually testing the fix (that we can reverse-next without stopping at a 
recursive call site), the second case is just resetting the inferior's 
state into a known position in case the first test fails, so we dont get 
the whole test being useless if this feature ever regresses. That's why 
the variable I used is called "step_out", it knows if we have to 
manually step out of all recursive calls. You could simplify this case 
into a single test by only emitting a pass/fail when you leave the loop 
and asserting that step_out is 0, but I thought it would be even harder 
to figure out what the test was doing in the future.

Another problem with adding a maximum iteration count (and the reason I 
didn't go for it originally) is because you have to have enough 
iterations to step out of 4 function calls, and I didn't want to hard 
code that number because changes to the function would not be 
immediately obvious while the use case continued to work. Another way to 
know when the inferior is at the expected state is to look for the 
message that we're back at the main function, so we avoid using ".*" in 
the match.
  
Thiago Jung Bauermann June 1, 2023, 1:21 p.m. UTC | #5
Hello Andrew,

Andrew Burgess <aburgess@redhat.com> writes:

> Thiago Jung Bauermann via Gdb-patches <gdb-patches@sourceware.org>
> writes:
>
>> This testcase sometimes gets stuck in a loop for hours when running in our
>> CI. The problem is that due to an issue unrelated to reverse debugging the
>> inferior exits early, and because of the overly generic ".*" pattern the
>> testcase keeps sending the "next" command without noticing that the
>> inferior is gone.
>>
>> gdb_test_multiple has a pattern to detect that "The program is not being
>> run.", but since it is placed after the patterns from the caller it won't
>> be triggered. It also has a timeout pattern, but for some reason I don't
>> understand it often doesn't trigger.
>
> The timeout is triggered between successful matches.  So each time the
> test matches the '-re -wrap ".*"' this counts as a successful match and
> the timeout is reset.

Thank you for the explanation, it was very helpful. If you don't mind,
I will incorporate it in the patch description:

> gdb_test_multiple has a pattern to detect that "The program is not being
> run.", but since it is placed after the patterns from the caller it won't
> be triggered.  It also has a timeout pattern but because it is triggered
> between successful matches, each time the test matches the '-re -wrap ".*"'
> this counts as a successful match and the timeout is reset.
  
Thiago Jung Bauermann June 1, 2023, 1:27 p.m. UTC | #6
Hello Tom, Hello Bruno,

Bruno Larsen <blarsen@redhat.com> writes:

> On 01/06/2023 11:15, Tom de Vries wrote:
>> On 5/31/23 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
>>> This testcase sometimes gets stuck in a loop for hours when running in our
>>> CI. The problem is that due to an issue unrelated to reverse debugging the
>>> inferior exits early, and because of the overly generic ".*" pattern the
>>> testcase keeps sending the "next" command without noticing that the
>>> inferior is gone.
>>>
>>> gdb_test_multiple has a pattern to detect that "The program is not being
>>> run.", but since it is placed after the patterns from the caller it won't
>>> be triggered. It also has a timeout pattern, but for some reason I don't
>>> understand it often doesn't trigger.
>>>
>>> Since the test binary is compiled with debug information, fix by changing
>>> the pattern to match the source code line number that is shown by GDB right
>>> after the "step" command.
>>> ---
>>>   gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>> b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>> index 729218d4cb8c..766ca02910af 100644
>>> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>>>       -re -wrap "NEXT OVER THIS RECURSION.*" {
>>>           set step_out 0
>>>       }
>>> -    -re -wrap ".*" {
>>> +    -re -wrap "^\[0-9\].*"  {
>>>           send_gdb "next\n"
>>>           exp_continue
>>>       }
>>> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion inside the recursion"
>>> {
>>>       gdb_assert {"$seen_recursive_call" == 1} \
>>>           "step over recursion inside the recursion"
>>>       }
>>> -    -re -wrap ".*" {
>>> +    -re -wrap "^\[0-9\].*" {
>>>       send_gdb "next\n"
>>>       exp_continue
>>>       }
>>
>> Hi,
>>
>> the fix as is addresses the immediate problem.
>>
>> Introducing an iteration bound as a safety net is probably a good idea as well, given
>> the problem you ran into.
>>
>> Also, I prefer a different style of fix.  I think that this gdb_test_multiple is overly
>> complicated by dealing with the prompt in two clauses.
>>
>> I came up with this as first version which removes the need for any .* usage (though to
>> be clear, -wrap "" and -wrap ".*" is the same thing):
>> ...
>> if { "$step_out" == 1 } {
>>     gdb_test_multiple "next" "stepping out of recursion" {
>>         -re "NEXT OVER THIS RECURSION" {
>>             set step_out 0
>>             exp_continue
>>         }
>>         -re -wrap "" {
>>             if { $step_out == 1 } {
>>                 send_gdb "next\n"
>>                 exp_continue
>>             }
>>             pass $gdb_test_name
>>         }
>>     }
>> }
>> ...
>> and this after adding the iteration bound:
>> ...
>> if { "$step_out" == 1 } {
>>     set iterations 0
>>     set max_iterations 10
>>     gdb_test_multiple "next" "stepping out of recursion" {
>>         -re "NEXT OVER THIS RECURSION" {
>>             set step_out 0
>>             exp_continue
>>         }
>>         -re -wrap "" {
>>             if { $step_out == 1 } {
>>                 incr iterations
>>                 if { $iterations == $max_iterations } {
>>                     fail $gdb_test_name
>>                     return
>>                 }
>>
>>                 send_gdb "next\n"
>>                 exp_continue
>>             }
>>             pass $gdb_test_name
>>         }
>>     }
>> }
>> ...
>>
>> [ The gdb_test_multiple can be even further simplified, by wrapping it in a while that
>> takes care of the iteration. ]
>>
>> This code doesn't actually trigger for me, so I added this:
>> ...
>> -gdb_test_multiple "next" "reverse next over recursion" {
>> +gdb_test_multiple "step" "reverse next over recursion" {
>> ...
>>
>> That gave me the default fail:
>> ...
>> (gdb) PASS: gdb.reverse/step-reverse.exp: reverse next over call
>> step^M
>> recursive_callee (val=17) at
>> /data/vries/gdb/src/gdb/testsuite/gdb.reverse/step-reverse.c:41^M
>> 41      } /* EXIT RECURSIVE FUNCTION */^M
>> (gdb) FAIL: gdb.reverse/step-reverse.exp: reverse next over recursion
>> ...
>> which did not set step_out to 1, so still the code didn't get triggered,so I did:
>> ...
>> -    -re -wrap ".*RECURSIVE CALL.*" {
>> +    -re -wrap ".*(RECURSIVE CALL|EXIT RECURSIVE FUNCTION).*" {
>> ...
>>
>> ISTM that the test-case contains complicated and hard-to-trigger code, which could be
>> fixed by merging the code for the next and step scenario.
>>
> Some of the changes you introduced are not correct with the original idea of the test. As
> a refresher, the problem that is being tested is that GDB would not be able to
> reverse-next and skip recursive functions (details: PR 16678). It would skip the first
> call then stop, which in this test case would still be 4 calls deep into the recursion
> when we wanted to stop on main.
>
> The reason the test works in 2 clauses is because only the first part is actually testing
> the fix (that we can reverse-next without stopping at a recursive call site), the second
> case is just resetting the inferior's state into a known position in case the first test
> fails, so we dont get the whole test being useless if this feature ever regresses. That's
> why the variable I used is called "step_out", it knows if we have to manually step out of
> all recursive calls. You could simplify this case into a single test by only emitting a
> pass/fail when you leave the loop and asserting that step_out is 0, but I thought it would
> be even harder to figure out what the test was doing in the future.
>
> Another problem with adding a maximum iteration count (and the reason I didn't go for it
> originally) is because you have to have enough iterations to step out of 4 function calls,
> and I didn't want to hard code that number because changes to the function would not be
> immediately obvious while the use case continued to work. Another way to know when the
> inferior is at the expected state is to look for the message that we're back at the main
> function, so we avoid using ".*" in the match.

Thank you both for the quick review and feedback. Regarding adding a
maximum iteration count, I agree with Tom that some ceiling is useful to
avoid the possibility of the testcase running for hours, as has been
happening in my case. Would it be ok if the maximum number of iterations
was very big, such as 100 or 1000? That would leave room for future
changes to the testcase while providing a bounded run time.
  
Guinevere Larsen June 1, 2023, 2:04 p.m. UTC | #7
On 01/06/2023 15:27, Thiago Jung Bauermann wrote:
> Hello Tom, Hello Bruno,
>
> Bruno Larsen <blarsen@redhat.com> writes:
>
>> On 01/06/2023 11:15, Tom de Vries wrote:
>>> On 5/31/23 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
>>>> This testcase sometimes gets stuck in a loop for hours when running in our
>>>> CI. The problem is that due to an issue unrelated to reverse debugging the
>>>> inferior exits early, and because of the overly generic ".*" pattern the
>>>> testcase keeps sending the "next" command without noticing that the
>>>> inferior is gone.
>>>>
>>>> gdb_test_multiple has a pattern to detect that "The program is not being
>>>> run.", but since it is placed after the patterns from the caller it won't
>>>> be triggered. It also has a timeout pattern, but for some reason I don't
>>>> understand it often doesn't trigger.
>>>>
>>>> Since the test binary is compiled with debug information, fix by changing
>>>> the pattern to match the source code line number that is shown by GDB right
>>>> after the "step" command.
>>>> ---
>>>>    gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>> b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>> index 729218d4cb8c..766ca02910af 100644
>>>> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>>>>        -re -wrap "NEXT OVER THIS RECURSION.*" {
>>>>            set step_out 0
>>>>        }
>>>> -    -re -wrap ".*" {
>>>> +    -re -wrap "^\[0-9\].*"  {
>>>>            send_gdb "next\n"
>>>>            exp_continue
>>>>        }
>>>> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion inside the recursion"
>>>> {
>>>>        gdb_assert {"$seen_recursive_call" == 1} \
>>>>            "step over recursion inside the recursion"
>>>>        }
>>>> -    -re -wrap ".*" {
>>>> +    -re -wrap "^\[0-9\].*" {
>>>>        send_gdb "next\n"
>>>>        exp_continue
>>>>        }
>>> Hi,
>>>
>>> the fix as is addresses the immediate problem.
>>>
>>> Introducing an iteration bound as a safety net is probably a good idea as well, given
>>> the problem you ran into.
>>>
>>> Also, I prefer a different style of fix.  I think that this gdb_test_multiple is overly
>>> complicated by dealing with the prompt in two clauses.
>>>
>>> I came up with this as first version which removes the need for any .* usage (though to
>>> be clear, -wrap "" and -wrap ".*" is the same thing):
>>> ...
>>> if { "$step_out" == 1 } {
>>>      gdb_test_multiple "next" "stepping out of recursion" {
>>>          -re "NEXT OVER THIS RECURSION" {
>>>              set step_out 0
>>>              exp_continue
>>>          }
>>>          -re -wrap "" {
>>>              if { $step_out == 1 } {
>>>                  send_gdb "next\n"
>>>                  exp_continue
>>>              }
>>>              pass $gdb_test_name
>>>          }
>>>      }
>>> }
>>> ...
>>> and this after adding the iteration bound:
>>> ...
>>> if { "$step_out" == 1 } {
>>>      set iterations 0
>>>      set max_iterations 10
>>>      gdb_test_multiple "next" "stepping out of recursion" {
>>>          -re "NEXT OVER THIS RECURSION" {
>>>              set step_out 0
>>>              exp_continue
>>>          }
>>>          -re -wrap "" {
>>>              if { $step_out == 1 } {
>>>                  incr iterations
>>>                  if { $iterations == $max_iterations } {
>>>                      fail $gdb_test_name
>>>                      return
>>>                  }
>>>
>>>                  send_gdb "next\n"
>>>                  exp_continue
>>>              }
>>>              pass $gdb_test_name
>>>          }
>>>      }
>>> }
>>> ...
>>>
>>> [ The gdb_test_multiple can be even further simplified, by wrapping it in a while that
>>> takes care of the iteration. ]
>>>
>>> This code doesn't actually trigger for me, so I added this:
>>> ...
>>> -gdb_test_multiple "next" "reverse next over recursion" {
>>> +gdb_test_multiple "step" "reverse next over recursion" {
>>> ...
>>>
>>> That gave me the default fail:
>>> ...
>>> (gdb) PASS: gdb.reverse/step-reverse.exp: reverse next over call
>>> step^M
>>> recursive_callee (val=17) at
>>> /data/vries/gdb/src/gdb/testsuite/gdb.reverse/step-reverse.c:41^M
>>> 41      } /* EXIT RECURSIVE FUNCTION */^M
>>> (gdb) FAIL: gdb.reverse/step-reverse.exp: reverse next over recursion
>>> ...
>>> which did not set step_out to 1, so still the code didn't get triggered,so I did:
>>> ...
>>> -    -re -wrap ".*RECURSIVE CALL.*" {
>>> +    -re -wrap ".*(RECURSIVE CALL|EXIT RECURSIVE FUNCTION).*" {
>>> ...
>>>
>>> ISTM that the test-case contains complicated and hard-to-trigger code, which could be
>>> fixed by merging the code for the next and step scenario.
>>>
>> Some of the changes you introduced are not correct with the original idea of the test. As
>> a refresher, the problem that is being tested is that GDB would not be able to
>> reverse-next and skip recursive functions (details: PR 16678). It would skip the first
>> call then stop, which in this test case would still be 4 calls deep into the recursion
>> when we wanted to stop on main.
>>
>> The reason the test works in 2 clauses is because only the first part is actually testing
>> the fix (that we can reverse-next without stopping at a recursive call site), the second
>> case is just resetting the inferior's state into a known position in case the first test
>> fails, so we dont get the whole test being useless if this feature ever regresses. That's
>> why the variable I used is called "step_out", it knows if we have to manually step out of
>> all recursive calls. You could simplify this case into a single test by only emitting a
>> pass/fail when you leave the loop and asserting that step_out is 0, but I thought it would
>> be even harder to figure out what the test was doing in the future.
>>
>> Another problem with adding a maximum iteration count (and the reason I didn't go for it
>> originally) is because you have to have enough iterations to step out of 4 function calls,
>> and I didn't want to hard code that number because changes to the function would not be
>> immediately obvious while the use case continued to work. Another way to know when the
>> inferior is at the expected state is to look for the message that we're back at the main
>> function, so we avoid using ".*" in the match.
> Thank you both for the quick review and feedback. Regarding adding a
> maximum iteration count, I agree with Tom that some ceiling is useful to
> avoid the possibility of the testcase running for hours, as has been
> happening in my case. Would it be ok if the maximum number of iterations
> was very big, such as 100 or 1000? That would leave room for future
> changes to the testcase while providing a bounded run time.
>
I'm not against this idea in itself, but I think having the regex 
contain "main" so that it picks up the following output:

(gdb) reverse-next
main ()
     at 
/home/blarsen/Documents/fsf_build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.reverse/step-reverse.c:69
69         recursive_callee (32); /* NEXT OVER THIS RECURSION */

which is when we left recursive_callee. This would make it so 
gdb_test_multiple picks up the fact that the inferior finished running.
  
Thiago Jung Bauermann June 1, 2023, 9:50 p.m. UTC | #8
Bruno Larsen <blarsen@redhat.com> writes:

> On 01/06/2023 15:27, Thiago Jung Bauermann wrote:
>> Hello Tom, Hello Bruno,
>>
>> Bruno Larsen <blarsen@redhat.com> writes:
>>
>>> On 01/06/2023 11:15, Tom de Vries wrote:
>>>> On 5/31/23 22:06, Thiago Jung Bauermann via Gdb-patches wrote:
>>>>> This testcase sometimes gets stuck in a loop for hours when running in our
>>>>> CI. The problem is that due to an issue unrelated to reverse debugging the
>>>>> inferior exits early, and because of the overly generic ".*" pattern the
>>>>> testcase keeps sending the "next" command without noticing that the
>>>>> inferior is gone.
>>>>>
>>>>> gdb_test_multiple has a pattern to detect that "The program is not being
>>>>> run.", but since it is placed after the patterns from the caller it won't
>>>>> be triggered. It also has a timeout pattern, but for some reason I don't
>>>>> understand it often doesn't trigger.
>>>>>
>>>>> Since the test binary is compiled with debug information, fix by changing
>>>>> the pattern to match the source code line number that is shown by GDB right
>>>>> after the "step" command.
>>>>> ---
>>>>>    gdb/testsuite/gdb.reverse/step-reverse.exp | 4 ++--
>>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>>> b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>>> index 729218d4cb8c..766ca02910af 100644
>>>>> --- a/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>>> +++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
>>>>> @@ -261,7 +261,7 @@ if { "$step_out" == 1 } {
>>>>>        -re -wrap "NEXT OVER THIS RECURSION.*" {
>>>>>            set step_out 0
>>>>>        }
>>>>> -    -re -wrap ".*" {
>>>>> +    -re -wrap "^\[0-9\].*"  {
>>>>>            send_gdb "next\n"
>>>>>            exp_continue
>>>>>        }
>>>>> @@ -286,7 +286,7 @@ gdb_test_multiple "next" "step over recursion inside the recursion"
>>>>> {
>>>>>        gdb_assert {"$seen_recursive_call" == 1} \
>>>>>            "step over recursion inside the recursion"
>>>>>        }
>>>>> -    -re -wrap ".*" {
>>>>> +    -re -wrap "^\[0-9\].*" {
>>>>>        send_gdb "next\n"
>>>>>        exp_continue
>>>>>        }
>>>> Hi,
>>>>
>>>> the fix as is addresses the immediate problem.
>>>>
>>>> Introducing an iteration bound as a safety net is probably a good idea as well, given
>>>> the problem you ran into.
>>>>
>>>> Also, I prefer a different style of fix.  I think that this gdb_test_multiple is overly
>>>> complicated by dealing with the prompt in two clauses.
>>>>
>>>> I came up with this as first version which removes the need for any .* usage (though to
>>>> be clear, -wrap "" and -wrap ".*" is the same thing):
>>>> ...
>>>> if { "$step_out" == 1 } {
>>>>      gdb_test_multiple "next" "stepping out of recursion" {
>>>>          -re "NEXT OVER THIS RECURSION" {
>>>>              set step_out 0
>>>>              exp_continue
>>>>          }
>>>>          -re -wrap "" {
>>>>              if { $step_out == 1 } {
>>>>                  send_gdb "next\n"
>>>>                  exp_continue
>>>>              }
>>>>              pass $gdb_test_name
>>>>          }
>>>>      }
>>>> }
>>>> ...
>>>> and this after adding the iteration bound:
>>>> ...
>>>> if { "$step_out" == 1 } {
>>>>      set iterations 0
>>>>      set max_iterations 10
>>>>      gdb_test_multiple "next" "stepping out of recursion" {
>>>>          -re "NEXT OVER THIS RECURSION" {
>>>>              set step_out 0
>>>>              exp_continue
>>>>          }
>>>>          -re -wrap "" {
>>>>              if { $step_out == 1 } {
>>>>                  incr iterations
>>>>                  if { $iterations == $max_iterations } {
>>>>                      fail $gdb_test_name
>>>>                      return
>>>>                  }
>>>>
>>>>                  send_gdb "next\n"
>>>>                  exp_continue
>>>>              }
>>>>              pass $gdb_test_name
>>>>          }
>>>>      }
>>>> }
>>>> ...
>>>>
>>>> [ The gdb_test_multiple can be even further simplified, by wrapping it in a while that
>>>> takes care of the iteration. ]
>>>>
>>>> This code doesn't actually trigger for me, so I added this:
>>>> ...
>>>> -gdb_test_multiple "next" "reverse next over recursion" {
>>>> +gdb_test_multiple "step" "reverse next over recursion" {
>>>> ...
>>>>
>>>> That gave me the default fail:
>>>> ...
>>>> (gdb) PASS: gdb.reverse/step-reverse.exp: reverse next over call
>>>> step^M
>>>> recursive_callee (val=17) at
>>>> /data/vries/gdb/src/gdb/testsuite/gdb.reverse/step-reverse.c:41^M
>>>> 41      } /* EXIT RECURSIVE FUNCTION */^M
>>>> (gdb) FAIL: gdb.reverse/step-reverse.exp: reverse next over recursion
>>>> ...
>>>> which did not set step_out to 1, so still the code didn't get triggered,so I did:
>>>> ...
>>>> -    -re -wrap ".*RECURSIVE CALL.*" {
>>>> +    -re -wrap ".*(RECURSIVE CALL|EXIT RECURSIVE FUNCTION).*" {
>>>> ...
>>>>
>>>> ISTM that the test-case contains complicated and hard-to-trigger code, which could be
>>>> fixed by merging the code for the next and step scenario.
>>>>
>>> Some of the changes you introduced are not correct with the original idea of the test. As
>>> a refresher, the problem that is being tested is that GDB would not be able to
>>> reverse-next and skip recursive functions (details: PR 16678). It would skip the first
>>> call then stop, which in this test case would still be 4 calls deep into the recursion
>>> when we wanted to stop on main.
>>>
>>> The reason the test works in 2 clauses is because only the first part is actually testing
>>> the fix (that we can reverse-next without stopping at a recursive call site), the second
>>> case is just resetting the inferior's state into a known position in case the first test
>>> fails, so we dont get the whole test being useless if this feature ever regresses. That's
>>> why the variable I used is called "step_out", it knows if we have to manually step out of
>>> all recursive calls. You could simplify this case into a single test by only emitting a
>>> pass/fail when you leave the loop and asserting that step_out is 0, but I thought it would
>>> be even harder to figure out what the test was doing in the future.
>>>
>>> Another problem with adding a maximum iteration count (and the reason I didn't go for it
>>> originally) is because you have to have enough iterations to step out of 4 function calls,
>>> and I didn't want to hard code that number because changes to the function would not be
>>> immediately obvious while the use case continued to work. Another way to know when the
>>> inferior is at the expected state is to look for the message that we're back at the main
>>> function, so we avoid using ".*" in the match.
>> Thank you both for the quick review and feedback. Regarding adding a
>> maximum iteration count, I agree with Tom that some ceiling is useful to
>> avoid the possibility of the testcase running for hours, as has been
>> happening in my case. Would it be ok if the maximum number of iterations
>> was very big, such as 100 or 1000? That would leave room for future
>> changes to the testcase while providing a bounded run time.
>>
> I'm not against this idea in itself, but I think having the regex contain "main" so that
> it picks up the following output:
>
> (gdb) reverse-next
> main ()
>     at
> /home/blarsen/Documents/fsf_build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.reverse/step-reverse.c:69
> 69         recursive_callee (32); /* NEXT OVER THIS RECURSION */
>
> which is when we left recursive_callee. This would make it so gdb_test_multiple picks up
> the fact that the inferior finished running.

I just sent a v2, with one of the patterns changed to match the main
function for the case above. Please let me know if that's what you had
in mind.

It also adds a maximum iteration count of 1000 (thank you Tom, for
providing the code). With the new regexes, I don't hit it because
gdb_test_multiple matches the message "The program is not being run."
but if I change to the old regex again then I get, as expected:

FAIL: gdb.reverse/step-reverse.exp: step over recursion inside the recursion (reached 1000 iterations)
  

Patch

diff --git a/gdb/testsuite/gdb.reverse/step-reverse.exp b/gdb/testsuite/gdb.reverse/step-reverse.exp
index 729218d4cb8c..766ca02910af 100644
--- a/gdb/testsuite/gdb.reverse/step-reverse.exp
+++ b/gdb/testsuite/gdb.reverse/step-reverse.exp
@@ -261,7 +261,7 @@  if { "$step_out" == 1 } {
 	-re -wrap "NEXT OVER THIS RECURSION.*" {
 	    set step_out 0
 	}
-	-re -wrap ".*" {
+	-re -wrap "^\[0-9\].*"  {
 	    send_gdb "next\n"
 	    exp_continue
 	}
@@ -286,7 +286,7 @@  gdb_test_multiple "next" "step over recursion inside the recursion" {
 	gdb_assert {"$seen_recursive_call" == 1} \
 	    "step over recursion inside the recursion"
     }
-    -re -wrap ".*" {
+    -re -wrap "^\[0-9\].*" {
 	send_gdb "next\n"
 	exp_continue
     }