exceptions.KeyboardInterrupt is thrown in gdb.base/random-signal.exp

Message ID 86ziy2xdt7.fsf@gmail.com
State New, archived
Headers

Commit Message

Yao Qi Nov. 25, 2015, 4:07 p.m. UTC
  Hi,
I am fixing a fail in gdb.base/random-signal.exp like this,

Continuing.^M
PASS: gdb.base/random-signal.exp: continue
^CPython Exception <type 'exceptions.KeyboardInterrupt'> <type 'exceptions.KeyboardInterrupt'>: ^M
FAIL: gdb.base/random-signal.exp: stop with control-c (timeout)

I only see this fail out of 15~20 runs each time.  Is it because GDB
received SIGINT before async_handle_remote_sigint is installed?  so
handle_sigint is still the SIGINT handler, set_quit_flag will call
python stuff, and KeyboardInterrupt is raised as a result.

In the test, we've already been aware of that the signal handler isn't
ready, so "Continuing" is consumed and ctrl-c is delayed in 500ms.

# For this to work we must be sure to consume the "Continuing."
# message first, or GDB's signal handler may not be in place.
after 500 {send_gdb "\003"}

After I read the tcl manul about "after", I feel the usage of "after"
above isn't 100% correct.  As the manual says, the "after" command
returns immediately, and the tcl command {send_gdb "\003"} will be
executed 500 ms later.  It is an asynchronous flavour, but what we want is
a synchronous operation, like this,

after 500
send_gdb "\003"

with this change, I don't see the timeout fail again.  Is it a fix or a
hack?
  

Comments

Pedro Alves Nov. 25, 2015, 4:26 p.m. UTC | #1
On 11/25/2015 04:07 PM, Yao Qi wrote:
> Hi,
> I am fixing a fail in gdb.base/random-signal.exp like this,
> 
> Continuing.^M
> PASS: gdb.base/random-signal.exp: continue
> ^CPython Exception <type 'exceptions.KeyboardInterrupt'> <type 'exceptions.KeyboardInterrupt'>: ^M
> FAIL: gdb.base/random-signal.exp: stop with control-c (timeout)
> 
> I only see this fail out of 15~20 runs each time.  Is it because GDB
> received SIGINT before async_handle_remote_sigint is installed?  so
> handle_sigint is still the SIGINT handler, set_quit_flag will call
> python stuff, and KeyboardInterrupt is raised as a result.
> 
> In the test, we've already been aware of that the signal handler isn't
> ready, so "Continuing" is consumed and ctrl-c is delayed in 500ms.
> 
> # For this to work we must be sure to consume the "Continuing."
> # message first, or GDB's signal handler may not be in place.
> after 500 {send_gdb "\003"}
> 
> After I read the tcl manul about "after", I feel the usage of "after"
> above isn't 100% correct.  As the manual says, the "after" command
> returns immediately, and the tcl command {send_gdb "\003"} will be
> executed 500 ms later.  It is an asynchronous flavour, but what we want is
> a synchronous operation, like this,
> 
> after 500
> send_gdb "\003"
> 
> with this change, I don't see the timeout fail again.  Is it a fix or a
> hack?
> 

Seems like a hack -- I don't see how that can make a difference.  In both
cases, we send \003 after 500ms.

The test sets a software watchpoint, and resumes the target.  That means
the program will be constantly single-stepping, and gdb will be evaluating
the watched expression at each single-step.  I'd suspect that the problem
is likely that while the program is stopped to evaluate the watched
expression, something is calling target_terminal_ours, which restores
handle_sigint as SIGINT handler.  Then somehow you're unlucky to manage to
ctrl-c at that exact time.  The fix in that case is likely to be to call
target_terminal_ours_for_output instead, which doesn't touch the SIGINT
handler.

Thanks,
Pedro Alves
  
Yao Qi Nov. 25, 2015, 5:16 p.m. UTC | #2
Pedro Alves <palves@redhat.com> writes:

> Seems like a hack -- I don't see how that can make a difference.  In both
> cases, we send \003 after 500ms.

The only difference is that {send_gdb "\003"} will be executed in an
event handler later while tcl has already been in gdb_test.

>
> The test sets a software watchpoint, and resumes the target.  That means
> the program will be constantly single-stepping, and gdb will be evaluating
> the watched expression at each single-step.  I'd suspect that the problem
> is likely that while the program is stopped to evaluate the watched
> expression, something is calling target_terminal_ours, which restores
> handle_sigint as SIGINT handler.  Then somehow you're unlucky to manage to
> ctrl-c at that exact time.  The fix in that case is likely to be to call
> target_terminal_ours_for_output instead, which doesn't touch the SIGINT
> handler.

That was one of my clues...  I set breakpoint on target_terminal_ours,
but it isn't hit.  Anyway, I'll look into it further.
  
Yao Qi Dec. 1, 2015, 5:14 p.m. UTC | #3
Pedro Alves <palves@redhat.com> writes:

> The test sets a software watchpoint, and resumes the target.  That means
> the program will be constantly single-stepping, and gdb will be evaluating
> the watched expression at each single-step.  I'd suspect that the problem
> is likely that while the program is stopped to evaluate the watched
> expression, something is calling target_terminal_ours, which restores
> handle_sigint as SIGINT handler.  Then somehow you're unlucky to manage to
> ctrl-c at that exact time.  The fix in that case is likely to be to call
> target_terminal_ours_for_output instead, which doesn't touch the SIGINT
> handler.

The cause of this problem is the SIGINT is handled by python.

As you said, program is being single-stepped constantly, and in each
stop, python unwinder sniffer is used,

 #0  pyuw_sniffer (self=<optimised out>, this_frame=<optimised out>, cache_ptr=0xd554f8) at /home/yao/SourceCode/gnu/gdb/git/gdb/python/py-unwind.c:608
 #1  0x00000000006a10ae in frame_unwind_try_unwinder (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8, unwinder=0xecd540)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:107
 #2  0x00000000006a143f in frame_unwind_find_by_frame (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:163
 #3  0x000000000069dc6b in compute_frame_id (fi=0xd554e0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:454
 #4  get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1781
 #5  0x000000000069fdb9 in get_prev_frame_always_1 (this_frame=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1955
 #6  get_prev_frame_always (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1971
 #7  0x00000000006a04b1 in get_prev_frame (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2213

and the extension language is set, so GDB changes to cooperative SIGINT
handling by the extension language.  See extension.c:set_active_ext_lang.
If ctrl-c is pressed at that time, it is reasonable to me that python
exception KeyboardInterrupt is thrown out.  In the previous discussion
https://www.sourceware.org/ml/gdb-patches/2014-01/msg00106.html

Tom Tromey wrote:
> The basic idea is that if some extension code is running, then C-c ought
> to interrupt that code in the way expected by programmers writing code
> in that language.

As a GDB developer, I can understand that when python is running, Ctrl-c
should trigger KeyboardInterrupt, and when the inferior is running,
Ctrl-c should send interrupt sequence to the remote target.  If we
accept this fact, we can fix test case, since output of
KeyboardInterrupt is also correct.  However, as a GDB user, it is quite
confusing on the expected behavior for ctrl-c, which depends on the time
ctrl-c is pressed, but I don't have a good idea to fix it, because there
is no such clear boundary of GDB code and python code from both
developers' and users' point of view.
  
Pedro Alves Dec. 3, 2015, 12:46 p.m. UTC | #4
On 12/01/2015 05:14 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> The test sets a software watchpoint, and resumes the target.  That means
>> the program will be constantly single-stepping, and gdb will be evaluating
>> the watched expression at each single-step.  I'd suspect that the problem
>> is likely that while the program is stopped to evaluate the watched
>> expression, something is calling target_terminal_ours, which restores
>> handle_sigint as SIGINT handler.  Then somehow you're unlucky to manage to
>> ctrl-c at that exact time.  The fix in that case is likely to be to call
>> target_terminal_ours_for_output instead, which doesn't touch the SIGINT
>> handler.
> 
> The cause of this problem is the SIGINT is handled by python.
> 
> As you said, program is being single-stepped constantly, and in each
> stop, python unwinder sniffer is used,
> 
>  #0  pyuw_sniffer (self=<optimised out>, this_frame=<optimised out>, cache_ptr=0xd554f8) at /home/yao/SourceCode/gnu/gdb/git/gdb/python/py-unwind.c:608
>  #1  0x00000000006a10ae in frame_unwind_try_unwinder (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8, unwinder=0xecd540)
>      at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:107
>  #2  0x00000000006a143f in frame_unwind_find_by_frame (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8)
>      at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:163
>  #3  0x000000000069dc6b in compute_frame_id (fi=0xd554e0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:454
>  #4  get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1781
>  #5  0x000000000069fdb9 in get_prev_frame_always_1 (this_frame=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1955
>  #6  get_prev_frame_always (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1971
>  #7  0x00000000006a04b1 in get_prev_frame (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2213
> 
> and the extension language is set, so GDB changes to cooperative SIGINT
> handling by the extension language.  See extension.c:set_active_ext_lang.
> If ctrl-c is pressed at that time, it is reasonable to me that python
> exception KeyboardInterrupt is thrown out.  In the previous discussion
> https://www.sourceware.org/ml/gdb-patches/2014-01/msg00106.html
> 
> Tom Tromey wrote:
>> The basic idea is that if some extension code is running, then C-c ought
>> to interrupt that code in the way expected by programmers writing code
>> in that language.
> 
> As a GDB developer, I can understand that when python is running, Ctrl-c
> should trigger KeyboardInterrupt, and when the inferior is running,
> Ctrl-c should send interrupt sequence to the remote target.  If we
> accept this fact, we can fix test case, since output of
> KeyboardInterrupt is also correct.  However, as a GDB user, it is quite
> confusing on the expected behavior for ctrl-c, which depends on the time
> ctrl-c is pressed, but I don't have a good idea to fix it, because there
> is no such clear boundary of GDB code and python code from both
> developers' and users' point of view.

IMO, if the inferior is running and target_terminal_inferior is not in
effect (*) then the ctrl-c should _not_ trigger a Python KeyboardInterrupt, but instead
be sent to the target -- if the target is running and we're running some
not-supposed-to-be-interactive Python unwinder code while processing some internal stop
event, we know that the Python code will finish quickly and the target should stop
for the SIGINT very soon.  IOW, we treat the ctrl-c at exactly the wrong time as if
it had been pressed a little sooner or later, outside Python.

(*) - and it shouldn't, while an internal event is being processed.

To handle the case of something going wrong and gdb getting stuck in a loop too
long that the target's SIGINT takes forever to be processed, we could make gdb
react to a _second_ (impatient) ctrl-c as "okay, I'm sick of waiting, please stop
whatever you're doing".  This is like how remote.c handles ctrl-c at exactly the
wrong moment (while an internal event is processed) nowadays:

  https://sourceware.org/ml/gdb-patches/2015-08/msg00574.html

Thanks,
Pedro Alves
  
Yao Qi Dec. 3, 2015, 5:09 p.m. UTC | #5
Pedro Alves <palves@redhat.com> writes:

> IMO, if the inferior is running and target_terminal_inferior is not in
> effect (*) then the ctrl-c should _not_ trigger a Python
> KeyboardInterrupt, but instead
> be sent to the target -- if the target is running and we're running some
> not-supposed-to-be-interactive Python unwinder code while processing
> some internal stop
> event, we know that the Python code will finish quickly and the target
> should stop
> for the SIGINT very soon.  IOW, we treat the ctrl-c at exactly the
> wrong time as if
> it had been pressed a little sooner or later, outside Python.
>
> (*) - and it shouldn't, while an internal event is being processed.

If I understand you correctly, ctrl-c shouldn't trigger a Python
KeyboardInterrupt, and we should fix it somewhere in GDB.

>
> To handle the case of something going wrong and gdb getting stuck in a loop too
> long that the target's SIGINT takes forever to be processed, we could make gdb
> react to a _second_ (impatient) ctrl-c as "okay, I'm sick of waiting,
> please stop
> whatever you're doing".  This is like how remote.c handles ctrl-c at exactly the
> wrong moment (while an internal event is processed) nowadays:
>
>   https://sourceware.org/ml/gdb-patches/2015-08/msg00574.html

I don't know how is this problem related to the second ctrl-c.  It is
expected that single ctrl-c should interrupt the target, why do we need
the second ctrl-c in this case?
  
Pedro Alves Dec. 3, 2015, 5:19 p.m. UTC | #6
On 12/03/2015 05:09 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> IMO, if the inferior is running and target_terminal_inferior is not in
>> effect (*) then the ctrl-c should _not_ trigger a Python
>> KeyboardInterrupt, but instead
>> be sent to the target -- if the target is running and we're running some
>> not-supposed-to-be-interactive Python unwinder code while processing
>> some internal stop
>> event, we know that the Python code will finish quickly and the target
>> should stop
>> for the SIGINT very soon.  IOW, we treat the ctrl-c at exactly the
>> wrong time as if
>> it had been pressed a little sooner or later, outside Python.
>>
>> (*) - and it shouldn't, while an internal event is being processed.
> 
> If I understand you correctly, ctrl-c shouldn't trigger a Python
> KeyboardInterrupt, and we should fix it somewhere in GDB.

Yes.

> 
>>
>> To handle the case of something going wrong and gdb getting stuck in a loop too
>> long that the target's SIGINT takes forever to be processed, we could make gdb
>> react to a _second_ (impatient) ctrl-c as "okay, I'm sick of waiting,
>> please stop
>> whatever you're doing".  This is like how remote.c handles ctrl-c at exactly the
>> wrong moment (while an internal event is processed) nowadays:
>>
>>   https://sourceware.org/ml/gdb-patches/2015-08/msg00574.html
> 
> I don't know how is this problem related to the second ctrl-c.  It is
> expected that single ctrl-c should interrupt the target, why do we need
> the second ctrl-c in this case?
> 

Say the Python code has a bug and ends up stuck in a loop and
doesn't finish quickly what it's supposed to do (the target is never
re-resumed and the SIGINT stop never processed).  The user types
ctrl-c, and, nothing happens.  After a while, the user gets sick of waiting,
and presses ctrl-c again.  At this point, we could ask the user what
do to, raise KeyboardInterrupt, Quit, etc.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/testsuite/gdb.base/random-signal.exp b/gdb/testsuite/gdb.base/random-signal.exp
index 566668a..59c8f5b 100644
--- a/gdb/testsuite/gdb.base/random-signal.exp
+++ b/gdb/testsuite/gdb.base/random-signal.exp
@@ -38,5 +38,6 @@  gdb_test_multiple "continue" "continue" {
 
 # For this to work we must be sure to consume the "Continuing."
 # message first, or GDB's signal handler may not be in place.
-after 500 {send_gdb "\003"}
+after 500
+send_gdb "\003"