Fix fail in gdb.base/interrupt-noterm.exp

Message ID 1453480183-5131-1-git-send-email-yao.qi@linaro.org
State New, archived
Headers

Commit Message

Yao Qi Jan. 22, 2016, 4:29 p.m. UTC
  Hi,
In my testing, I see the following fail intermittently,

interrupt
(gdb) PASS: gdb.base/interrupt-noterm.exp: interrupt
[Inferior 1 (process 13407) exited normally]

Child exited with status 0
FAIL: gdb.base/interrupt-noterm.exp: inferior received SIGINT (timeout)

because the interrupt packet may be sent to GDBserver before the SIGIO
handler is installed.  The fix in this patch is to let GDB wait
for 500 ms between "continue &" and "interrupt" to make sure
SIGIO handler is installed already in GDBserver side.

gdb/testsuite:

2016-01-22  Yao Qi  <yao.qi@linaro.org>

	* gdb.base/interrupt-noterm.exp: Add "after 500".
---
 gdb/testsuite/gdb.base/interrupt-noterm.exp | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Pedro Alves Jan. 22, 2016, 4:47 p.m. UTC | #1
On 01/22/2016 04:29 PM, Yao Qi wrote:
> Hi,
> In my testing, I see the following fail intermittently,
> 
> interrupt
> (gdb) PASS: gdb.base/interrupt-noterm.exp: interrupt
> [Inferior 1 (process 13407) exited normally]
> 
> Child exited with status 0
> FAIL: gdb.base/interrupt-noterm.exp: inferior received SIGINT (timeout)
> 
> because the interrupt packet may be sent to GDBserver before the SIGIO
> handler is installed.  The fix in this patch is to let GDB wait
> for 500 ms between "continue &" and "interrupt" to make sure
> SIGIO handler is installed already in GDBserver side.

Can you expand the rationale some more?

E.g., why is this not a gdbserver bug?  Instintively I'd say it is.

Thanks,
Pedro Alves

> 
> gdb/testsuite:
> 
> 2016-01-22  Yao Qi  <yao.qi@linaro.org>
> 
> 	* gdb.base/interrupt-noterm.exp: Add "after 500".
> ---
>  gdb/testsuite/gdb.base/interrupt-noterm.exp | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gdb/testsuite/gdb.base/interrupt-noterm.exp b/gdb/testsuite/gdb.base/interrupt-noterm.exp
> index 05f6076..9b5bb17 100644
> --- a/gdb/testsuite/gdb.base/interrupt-noterm.exp
> +++ b/gdb/testsuite/gdb.base/interrupt-noterm.exp
> @@ -55,6 +55,9 @@ if { $async_supported < 0 } {
>      return 1
>  }
>  
> +# Wait a while so that GDBserver's SIGIO handler is in place.
> +after 500
> +
>  # With native debugging, and no terminal (emulated by interactive-mode
>  # off, above), GDB had a bug where "interrupt" would send SIGINT to
>  # its own process group, instead of the inferior's.
>
  
Yao Qi Jan. 22, 2016, 5:14 p.m. UTC | #2
Pedro Alves <palves@redhat.com> writes:

> Can you expand the rationale some more?
>
> E.g., why is this not a gdbserver bug?  Instintively I'd say it is.

The interaction between GDB and GDBserver is like this,

  1. GDB sends vCont;c and doesn't wait for the stop reply because
  "continue &" is background command,
  2. GDBserver receives vCont;c, enables the async i/o (by
  enable_async_io) and resumes the inferior.
  3. GDB sends interrupt packet,

#1 happens before #2 and #3, but the order of #2 and #3 is not
determined.  If #2 happens before #3, it is fine, otherwise, the
GDBserver doesn't know the interrupt from GDB.
  
Pedro Alves Jan. 22, 2016, 5:35 p.m. UTC | #3
On 01/22/2016 05:14 PM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> Can you expand the rationale some more?
>>
>> E.g., why is this not a gdbserver bug?  Instintively I'd say it is.
> 
> The interaction between GDB and GDBserver is like this,
> 
>   1. GDB sends vCont;c and doesn't wait for the stop reply because
>   "continue &" is background command,
>   2. GDBserver receives vCont;c, enables the async i/o (by
>   enable_async_io) and resumes the inferior.
>   3. GDB sends interrupt packet,
> 
> #1 happens before #2 and #3, but the order of #2 and #3 is not
> determined.  If #2 happens before #3, it is fine, otherwise, the
> GDBserver doesn't know the interrupt from GDB.

If 1. is followed by 3., then the \\003 is always read by gdb
after the vCont;c.  We call enable_async_io before reaching
mywait.  Since we're in all-stop, that means we'll block
inside mywait -> waitpid, all the while \\003 is already available to
read in the socket.  Since we're blocked in waitpid, we won't see
the \\003 until after the next time the program happens to stop.

Agree?

It still seems to me like a gdbserver bug.

I think that after calling enable_async_io, we need to check whether
input is already pending from GDB, and if so, process it immediately -- we
know the only input coming from GDB at this point is a \\003.  IOW, I think
we need to call input_interrupt after calling enable_async_io.  input_interrupt
already uses select before reading, so it handles the case of there
being no input available without blocking.

However, we need to be careful, because a SIGIO can race with calling
input_interrupt from mainline code...

Thanks,
Pedro Alves
  
Pedro Alves Jan. 22, 2016, 6:30 p.m. UTC | #4
On 01/22/2016 05:35 PM, Pedro Alves wrote:
> On 01/22/2016 05:14 PM, Yao Qi wrote:
>> Pedro Alves <palves@redhat.com> writes:
>>
>>> Can you expand the rationale some more?
>>>
>>> E.g., why is this not a gdbserver bug?  Instintively I'd say it is.
>>
>> The interaction between GDB and GDBserver is like this,
>>
>>   1. GDB sends vCont;c and doesn't wait for the stop reply because
>>   "continue &" is background command,
>>   2. GDBserver receives vCont;c, enables the async i/o (by
>>   enable_async_io) and resumes the inferior.
>>   3. GDB sends interrupt packet,
>>
>> #1 happens before #2 and #3, but the order of #2 and #3 is not
>> determined.  If #2 happens before #3, it is fine, otherwise, the
>> GDBserver doesn't know the interrupt from GDB.
> 
> If 1. is followed by 3., then the \\003 is always read by gdb
> after the vCont;c.  We call enable_async_io before reaching
> mywait.  Since we're in all-stop, that means we'll block
> inside mywait -> waitpid, all the while \\003 is already available to
> read in the socket.  Since we're blocked in waitpid, we won't see
> the \\003 until after the next time the program happens to stop.
> 
> Agree?
> 
> It still seems to me like a gdbserver bug.
> 
> I think that after calling enable_async_io, we need to check whether
> input is already pending from GDB, and if so, process it immediately -- we
> know the only input coming from GDB at this point is a \\003.  IOW, I think
> we need to call input_interrupt after calling enable_async_io.  input_interrupt
> already uses select before reading, so it handles the case of there
> being no input available without blocking.
> 
> However, we need to be careful, because a SIGIO can race with calling
> input_interrupt from mainline code...

Might be simpler to always have the SIGIO handler installed (install
it early), and change enable_async_io/disable_async_io to use
sigprocmask to block/unblock the signal.  That way, if input comes
before the signal is unblocked, the handler is called immediately
when enable_async_io is called.

Thanks,
Pedro Alves
  
Yao Qi Jan. 25, 2016, 9:30 a.m. UTC | #5
Pedro Alves <palves@redhat.com> writes:

> If 1. is followed by 3., then the \\003 is always read by gdb

s/ready by/sent by/ ?

> after the vCont;c.  We call enable_async_io before reaching
> mywait.  Since we're in all-stop, that means we'll block

Although we call enable_async_io earlier, so the window between received
vCont;c and calling enable_async_io is tiny, it is still possible that
\\003 arrives at that period.

> inside mywait -> waitpid, all the while \\003 is already available to
> read in the socket.  Since we're blocked in waitpid, we won't see
> the \\003 until after the next time the program happens to stop.
>
> Agree?

Yes, I agree.

>
> It still seems to me like a gdbserver bug.
>
> I think that after calling enable_async_io, we need to check whether
> input is already pending from GDB, and if so, process it immediately -- we
> know the only input coming from GDB at this point is a \\003.  IOW, I think
> we need to call input_interrupt after calling enable_async_io.  input_interrupt
> already uses select before reading, so it handles the case of there
> being no input available without blocking.
>
> However, we need to be careful, because a SIGIO can race with calling
> input_interrupt from mainline code...

What you mean here is that we can call input_interrupt after calling
enable_async_io, but meanwhile, \\0003 arrives, and input_interrupt is
invoked as a SIGIO handler, so there is a race.  Is it correct?

I agree your next email about the approach of block/unblock SIGIO is
better.  I'll give a fix that way.
  
Pedro Alves Jan. 25, 2016, 10:43 a.m. UTC | #6
On 01/25/2016 09:30 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> If 1. is followed by 3., then the \\003 is always read by gdb
> 
> s/ready by/sent by/ ?

I meant, s/by gdb/by gdbserver/.

>>
>> It still seems to me like a gdbserver bug.
>>
>> I think that after calling enable_async_io, we need to check whether
>> input is already pending from GDB, and if so, process it immediately -- we
>> know the only input coming from GDB at this point is a \\003.  IOW, I think
>> we need to call input_interrupt after calling enable_async_io.  input_interrupt
>> already uses select before reading, so it handles the case of there
>> being no input available without blocking.
>>
>> However, we need to be careful, because a SIGIO can race with calling
>> input_interrupt from mainline code...
> 
> What you mean here is that we can call input_interrupt after calling
> enable_async_io, but meanwhile, \\0003 arrives, and input_interrupt is
> invoked as a SIGIO handler, so there is a race.  Is it correct?

That's correct.

> 
> I agree your next email about the approach of block/unblock SIGIO is
> better.  I'll give a fix that way.
> 

Great, thanks!
  
Yao Qi Jan. 26, 2016, 9:58 a.m. UTC | #7
Hi,
In my test, I see the following fail intermittently,

 interrupt
 (gdb) PASS: gdb.base/interrupt-noterm.exp: interrupt
 [Inferior 1 (process 13407) exited normally]

 Child exited with status 0
 FAIL: gdb.base/interrupt-noterm.exp: inferior received SIGINT (timeout)

because interrupt character isn't processed by GDBserver in time due to
two different problems:

 1. GDBserver may receive vCont;c and '\003' together, and read them
    into the buffer.  GDBserver will wait for the inferior but there
    won't be SIGIO because GDBserver has already read '\003' into the
    buffer.

 2. GDBserver may receive vCont;c only.  GDBserver will resume the
    inferior and wait for it.  If '\003' arrives at the moment between
    GDBserver received vCont;c and install SIGIO handler, GDBserver
    can't receive SIGIO.

these two problems above cause GDB wait for the inferior and leave the
interrupt there, so the test is timeout.

Two patches in the series fix these two problems separately.  Patch 1
fixes the problem 1 by checking the buffer *after* reading in each
packet.  Patch 2 fixes the problem 2 by block/unblock SIGIO rather than
install/uninstall signal handler.

These two patches are tested on {x86_64,arm,aarch64}-linux.  No
regression.

*** BLURB HERE ***

Yao Qi (2):
  [GDBserver] Check input interrupt after reading in a packet
  [GDBserver] Block and unblock SIGIO

 gdb/gdbserver/remote-utils.c | 60 ++++++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 21 deletions(-)
  

Patch

diff --git a/gdb/testsuite/gdb.base/interrupt-noterm.exp b/gdb/testsuite/gdb.base/interrupt-noterm.exp
index 05f6076..9b5bb17 100644
--- a/gdb/testsuite/gdb.base/interrupt-noterm.exp
+++ b/gdb/testsuite/gdb.base/interrupt-noterm.exp
@@ -55,6 +55,9 @@  if { $async_supported < 0 } {
     return 1
 }
 
+# Wait a while so that GDBserver's SIGIO handler is in place.
+after 500
+
 # With native debugging, and no terminal (emulated by interactive-mode
 # off, above), GDB had a bug where "interrupt" would send SIGINT to
 # its own process group, instead of the inferior's.