Prelimit number of bytes to read in "vFile:pread:"

Message ID	1439980862-21305-1-git-send-email-gbenson@redhat.com
State	New, archived
Headers	Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk Sender: gdb-patches-owner@sourceware.org From: Gary Benson <gbenson@redhat.com> To: gdb-patches@sourceware.org Cc: Pedro Alves <palves@redhat.com>, Sandra Loosemore <sandra@codesourcery.com>, Joel Brobecker <brobecker@adacore.com>, Doug Evans <dje@google.com>, Jan Kratochvil <jan.kratochvil@redhat.com>, =?UTF-8?q?Andr=C3=A9=20P=C3=B6nitz?= <apoenitz@t-online.de>, Paul Koning <Paul_Koning@dell.com> Subject: [PATCH] Prelimit number of bytes to read in "vFile:pread:" Date: Wed, 19 Aug 2015 11:41:02 +0100 Message-Id: <1439980862-21305-1-git-send-email-gbenson@redhat.com> In-Reply-To: <55D3DB83.4050204@redhat.com> References: <55D3DB83.4050204@redhat.com>

Message ID

1439980862-21305-1-git-send-email-gbenson@redhat.com

State

New, archived

Headers

Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
From: Gary Benson <gbenson@redhat.com>
To: gdb-patches@sourceware.org
Cc: Pedro Alves <palves@redhat.com>,
	Sandra Loosemore <sandra@codesourcery.com>,
	Joel Brobecker <brobecker@adacore.com>, Doug Evans <dje@google.com>,
	Jan Kratochvil <jan.kratochvil@redhat.com>,
	=?UTF-8?q?Andr=C3=A9=20P=C3=B6nitz?= <apoenitz@t-online.de>,
	Paul Koning <Paul_Koning@dell.com>
Subject: [PATCH] Prelimit number of bytes to read in "vFile:pread:"
Date: Wed, 19 Aug 2015 11:41:02 +0100
Message-Id: <1439980862-21305-1-git-send-email-gbenson@redhat.com>
In-Reply-To: <55D3DB83.4050204@redhat.com>
References: <55D3DB83.4050204@redhat.com>

Commit Message

Gary Benson Aug. 19, 2015, 10:41 a.m. UTC

  Pedro Alves wrote:
> The fact that Gary's chunk size limiting patch made things much
> faster on the nios2 board is still mysterious to me.  I'd expect the
> slowness to be latency bound, given the request/response nature of
> the RSP, but then I'd expect that more chunking would slow things
> down, not speed it up.

I think I figured this out...

While handling "vFile:pread:" packets, gdbserver would read the
number of bytes requested regardless of whether this would fit
into the reply packet.  gdbserver would then return a packet's
worth of data and discard the remainder.  When accessing large
binaries GDB (via BFD) routinely makes large "vFile:pread:"
requests, resulting in gdbserver allocating large unnecessary
buffers and reading some portions of the file many times over.

This commit causes gdbserver to limit the number of bytes to be
read to a sensible maximum prior to allocating buffers and reading
data.

Built and regtested on RHEL 6.6 x86_64.

May I push this to HEAD and to the branch?

Thanks,
Gary

---
gdb/gdbserver/ChangeLog:

	* hostio.c (handle_pread): Do not attempt to read more data
	than hostio_reply_with_data can fit in a packet.
---
 gdb/gdbserver/ChangeLog |    5 +++++
 gdb/gdbserver/hostio.c  |   12 ++++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

Comments

Gary Benson Aug. 19, 2015, 10:50 a.m. UTC | #1

Sandra, could you please try this patch on your Altera 3c120 and
on your PandaBoard?  I'm interested to know what the times are
now.

Cheers,
Gary

Gary Benson wrote:
> Pedro Alves wrote:
> > The fact that Gary's chunk size limiting patch made things much
> > faster on the nios2 board is still mysterious to me.  I'd expect
> > the slowness to be latency bound, given the request/response
> > nature of the RSP, but then I'd expect that more chunking would
> > slow things down, not speed it up.
> 
> I think I figured this out...
> 
> While handling "vFile:pread:" packets, gdbserver would read the
> number of bytes requested regardless of whether this would fit
> into the reply packet.  gdbserver would then return a packet's
> worth of data and discard the remainder.  When accessing large
> binaries GDB (via BFD) routinely makes large "vFile:pread:"
> requests, resulting in gdbserver allocating large unnecessary
> buffers and reading some portions of the file many times over.
> 
> This commit causes gdbserver to limit the number of bytes to be
> read to a sensible maximum prior to allocating buffers and reading
> data.
> 
> Built and regtested on RHEL 6.6 x86_64.
> 
> May I push this to HEAD and to the branch?
> 
> Thanks,
> Gary
> 
> ---
> gdb/gdbserver/ChangeLog:
> 
> 	* hostio.c (handle_pread): Do not attempt to read more data
> 	than hostio_reply_with_data can fit in a packet.
> ---
>  gdb/gdbserver/ChangeLog |    5 +++++
>  gdb/gdbserver/hostio.c  |   12 ++++++++++++
>  2 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/gdb/gdbserver/hostio.c b/gdb/gdbserver/hostio.c
> index b38a6bd..8788f07 100644
> --- a/gdb/gdbserver/hostio.c
> +++ b/gdb/gdbserver/hostio.c
> @@ -344,6 +344,7 @@ handle_pread (char *own_buf, int *new_packet_len)
>  {
>    int fd, ret, len, offset, bytes_sent;
>    char *p, *data;
> +  static int max_reply_size = -1;
>  
>    p = own_buf + strlen ("vFile:pread:");
>  
> @@ -359,6 +360,17 @@ handle_pread (char *own_buf, int *new_packet_len)
>        return;
>      }
>  
> +  /* Do not attempt to read more than the maximum number of bytes
> +     hostio_reply_with_data can fit in a packet.  We may still read
> +     too much because of escaping, but this is handled below.  */
> +  if (max_reply_size == -1)
> +    {
> +      sprintf (own_buf, "F%x;", PBUFSIZ);
> +      max_reply_size = PBUFSIZ - strlen (own_buf);
> +    }
> +  if (len > max_reply_size)
> +    len = max_reply_size;
> +
>    data = xmalloc (len);
>  #ifdef HAVE_PREAD
>    ret = pread (fd, data, len, offset);
> -- 
> 1.7.1
>

Pedro Alves Aug. 19, 2015, 11:44 a.m. UTC | #2

On 08/19/2015 11:41 AM, Gary Benson wrote:
> Pedro Alves wrote:
>> The fact that Gary's chunk size limiting patch made things much
>> faster on the nios2 board is still mysterious to me.  I'd expect the
>> slowness to be latency bound, given the request/response nature of
>> the RSP, but then I'd expect that more chunking would slow things
>> down, not speed it up.
> 
> I think I figured this out...
> 
> While handling "vFile:pread:" packets, gdbserver would read the
> number of bytes requested regardless of whether this would fit
> into the reply packet.  gdbserver would then return a packet's
> worth of data and discard the remainder.  When accessing large
> binaries GDB (via BFD) routinely makes large "vFile:pread:"
> requests, resulting in gdbserver allocating large unnecessary
> buffers and reading some portions of the file many times over.
> 
> This commit causes gdbserver to limit the number of bytes to be
> read to a sensible maximum prior to allocating buffers and reading
> data.
> 
> Built and regtested on RHEL 6.6 x86_64.
> 
> May I push this to HEAD and to the branch?

OK.

Thanks,
Pedro Alves

Pedro Alves Aug. 19, 2015, noon UTC | #3

On 08/19/2015 11:50 AM, Gary Benson wrote:
> Sandra, could you please try this patch on your Altera 3c120 and
> on your PandaBoard?  I'm interested to know what the times are
> now.

Yes please.  If you could then also test the readahead cache patch
I sent, I'd be also interested in whether it further makes a
difference for you.

Thanks,
Pedro Alves

Gary Benson Aug. 19, 2015, 1:07 p.m. UTC | #4

Pedro Alves wrote:
> On 08/19/2015 11:41 AM, Gary Benson wrote:
> > Pedro Alves wrote:
> > > The fact that Gary's chunk size limiting patch made things much
> > > faster on the nios2 board is still mysterious to me.  I'd expect
> > > the slowness to be latency bound, given the request/response
> > > nature of the RSP, but then I'd expect that more chunking would
> > > slow things down, not speed it up.
> > 
> > I think I figured this out...
> > 
> > While handling "vFile:pread:" packets, gdbserver would read the
> > number of bytes requested regardless of whether this would fit
> > into the reply packet.  gdbserver would then return a packet's
> > worth of data and discard the remainder.  When accessing large
> > binaries GDB (via BFD) routinely makes large "vFile:pread:"
> > requests, resulting in gdbserver allocating large unnecessary
> > buffers and reading some portions of the file many times over.
> > 
> > This commit causes gdbserver to limit the number of bytes to be
> > read to a sensible maximum prior to allocating buffers and reading
> > data.
> > 
> > Built and regtested on RHEL 6.6 x86_64.
> > 
> > May I push this to HEAD and to the branch?
> 
> OK.

Pushed to both.

Thanks,
Gary

Sandra Loosemore Aug. 19, 2015, 4:40 p.m. UTC | #5

On 08/19/2015 04:50 AM, Gary Benson wrote:
> Sandra, could you please try this patch on your Altera 3c120 and
> on your PandaBoard?  I'm interested to know what the times are
> now.

Wow, this patch made a big improvement!  On the nios2 board the startup 
took 18 seconds the first time and 10 seconds on subsequent attempts -- 
probably some NFS-level caching?  On the PandaBoard it was 3 seconds or 
less.

On 08/19/2015 07:42 AM, Gary Benson wrote:
> Pedro Alves wrote:
>> BTW, the transfers seem to be always interruptible for me, even without
>> Gary's patch, and even the slow ones.
>
> Ok, I'm glad I'm not the only one :)

Unfortunately, I still can't see that ^C is doing anything useful for 
me.  It is not coming back to a gdb prompt any sooner and "info 
sharedlibrary" afterwards prints the same thing whether or not I've 
tried to interrupt it.  This was with unpatched FSF trunk.  How am I 
supposed to tell whether ^C did anything?  How are you guys telling that 
it is doing something useful?  Is there supposed to be some sort of 
message?  If the file transfer from the target is aborted, I think it 
should say that.

I'm also not seeing the warning on the initial sysroot file transfer 
from the target.  I've lost track of this; was that patch not 
approved/committed yet?

-Sandra

Gary Benson Aug. 19, 2015, 5:20 p.m. UTC | #6

Sandra Loosemore wrote:
> On 08/19/2015 04:50 AM, Gary Benson wrote:
> > Sandra, could you please try this patch on your Altera 3c120 and
> > on your PandaBoard?  I'm interested to know what the times are
> > now.
> 
> Wow, this patch made a big improvement!  On the nios2 board the
> startup took 18 seconds the first time and 10 seconds on subsequent
> attempts -- probably some NFS-level caching?  On the PandaBoard it
> was 3 seconds or less.

Great :)  It's in master and 7.10 now.

Could you try Pedro's readahead patch too?  It's the third one here:

https://sourceware.org/ml/gdb-patches/2015-08/msg00489.html

Maybe with the readahead_cache_invalidate in
remote_hostio_set_filesystem removed?  I'm interested to see if that
helps any.  You don't need the other two patches in that message.

> On 08/19/2015 07:42 AM, Gary Benson wrote:
> > Pedro Alves wrote:
> > > BTW, the transfers seem to be always interruptible for me, even
> > > without Gary's patch, and even the slow ones.
> >
> > Ok, I'm glad I'm not the only one :)
> 
> Unfortunately, I still can't see that ^C is doing anything useful
> for me.  It is not coming back to a gdb prompt any sooner and "info
> sharedlibrary" afterwards prints the same thing whether or not I've
> tried to interrupt it.  This was with unpatched FSF trunk.  How am I
> supposed to tell whether ^C did anything?  How are you guys telling
> that it is doing something useful?  Is there supposed to be some
> sort of message?  If the file transfer from the target is aborted, I
> think it should say that.

It stops immediately when I try it.  I'm not familiar with ^C handling
either, so I don't know what would affect it.  Pedro, could this be
a sync/async thing, or something to do with all-stop/non-stop?

> I'm also not seeing the warning on the initial sysroot file transfer
> from the target.  I've lost track of this; was that patch not
> approved/committed yet?

It's not committed yet.  It was part of a pair of patches, the other
being the interruptibility patch that didn't work for you.

If somebody approves the warning patch alone it I'll push it to master
and the branch.  I'm slightly wary about adding a warning as one of
the Valgrind people said their testsuite started tripping over another
warning I recently added, but I think making sure users know what's
happening is more important... testsuites can be fixed :)

Thanks,
Gary

Sandra Loosemore Aug. 19, 2015, 9:12 p.m. UTC | #7

On 08/19/2015 11:20 AM, Gary Benson wrote:
> Sandra Loosemore wrote:
>> On 08/19/2015 04:50 AM, Gary Benson wrote:
>>> Sandra, could you please try this patch on your Altera 3c120 and
>>> on your PandaBoard?  I'm interested to know what the times are
>>> now.
>>
>> Wow, this patch made a big improvement!  On the nios2 board the
>> startup took 18 seconds the first time and 10 seconds on subsequent
>> attempts -- probably some NFS-level caching?  On the PandaBoard it
>> was 3 seconds or less.
>
> Great :)  It's in master and 7.10 now.
>
> Could you try Pedro's readahead patch too?  It's the third one here:
>
> https://sourceware.org/ml/gdb-patches/2015-08/msg00489.html
>
> Maybe with the readahead_cache_invalidate in
> remote_hostio_set_filesystem removed?  I'm interested to see if that
> helps any.  You don't need the other two patches in that message.

This didn't do anything to help; the startup time is still 10-11 seconds.

>> On 08/19/2015 07:42 AM, Gary Benson wrote:
>>> Pedro Alves wrote:
>>>> BTW, the transfers seem to be always interruptible for me, even
>>>> without Gary's patch, and even the slow ones.
>>>
>>> Ok, I'm glad I'm not the only one :)
>>
>> Unfortunately, I still can't see that ^C is doing anything useful
>> for me.  It is not coming back to a gdb prompt any sooner and "info
>> sharedlibrary" afterwards prints the same thing whether or not I've
>> tried to interrupt it.  This was with unpatched FSF trunk.  How am I
>> supposed to tell whether ^C did anything?  How are you guys telling
>> that it is doing something useful?  Is there supposed to be some
>> sort of message?  If the file transfer from the target is aborted, I
>> think it should say that.
>
> It stops immediately when I try it.  I'm not familiar with ^C handling
> either, so I don't know what would affect it.  Pedro, could this be
> a sync/async thing, or something to do with all-stop/non-stop?

Well, here is a clue.  I tried logging the RSP traffic so I could 
compare the interrupted and non-interrupted behavior.  Aside from 
differences in PID numbers (etc), the *only* difference is that the log 
from the interrupted run shows that it's sending a \x03 character to the 
remote target after it does a vCont, after it has already read the 
entire contents of libc.so.  Here's the relevant snippet from the 
transcript:

w $vFile:pread:5,3fff,b77987#e0
r $F3fca; [...]
w $qSymbol::#5b
r $qSymbol:6e70746c5f76657273696f6e#13
w $qSymbol::6e70746c5f76657273696f6e#4d
r $OK#9a
w $m2760,4#9c
r $fa6f3b00#58
w $m2ab0374c,4#f3
r $04fcffde#c2
w $m2ab0374c,4#f3
r $04fcffde#c2
w $m2ab0374c,4#f3
r $04fcffde#c2
w $m2aab6d98,4#2e
r $fa6f3b00#58
w $m2aab6d94,4#2a
r $3ae83e10#2a
w $X2aab6d98,4::(\x00\xf8#ad
r $OK#9a
w $m2aab6d98,4#2e
r $3a2800f8#fc
w $g#67
r $0*,2c84ac2a4084ac2a0*-10**58020100986dab2a0866aa2a030*"9a080* 
1c70ac2a3ceab42
a84b7c22a0*"004084ac2a8883ac2a4084ac2a646eac2a487bac2a0070ac2a3827c32a0*4f8f6fe7
f98f7fe7f986dab2a0*"00d0b1aa2a986dab2a0*}0*;#15
w $m2aaab1d0,4#49
r $17a082b0#f5
w $m2aaab1d0,4#49
r $17a082b0#f5
w $X2aaab1d0,4:\xfao;\x00#12
r $OK#9a
w $QPassSignals:#f3
r $OK#9a
w $vCont;c:p33e.33e#16\x03  <=== here
r $T05swbreak:;1b:f8f6fe7f;20:d0b1aa2a;thread:p33e.33e;core:0;#89

FAOD, this is a trunk checkout with the patch described above applied 
and nothing else.  And, the exact sequence of commands I'm using to try 
to reproduce this is

<target>-gdb a.out
target remote ...
break main
c
^C

-Sandra

Pedro Alves Aug. 20, 2015, 2:48 p.m. UTC | #8

On 08/19/2015 10:12 PM, Sandra Loosemore wrote:
> On 08/19/2015 11:20 AM, Gary Benson wrote:
>> Sandra Loosemore wrote:
>>> On 08/19/2015 04:50 AM, Gary Benson wrote:
>>>> Sandra, could you please try this patch on your Altera 3c120 and
>>>> on your PandaBoard?  I'm interested to know what the times are
>>>> now.
>>>
>>> Wow, this patch made a big improvement!  On the nios2 board the
>>> startup took 18 seconds the first time and 10 seconds on subsequent
>>> attempts -- probably some NFS-level caching?  On the PandaBoard it
>>> was 3 seconds or less.
>>
>> Great :)  It's in master and 7.10 now.
>>
>> Could you try Pedro's readahead patch too?  It's the third one here:
>>
>> https://sourceware.org/ml/gdb-patches/2015-08/msg00489.html
>>
>> Maybe with the readahead_cache_invalidate in
>> remote_hostio_set_filesystem removed?  I'm interested to see if that
>> helps any.  You don't need the other two patches in that message.
> 
> This didn't do anything to help; the startup time is still 10-11 seconds.

OK, that shows that the issue with that board is not really
roundtrip latency.  Not that surprising given you're connected
through ethernet.  Knowing it didn't slow down is already very
useful.

Thanks,
Pedro Alves

Pedro Alves Aug. 20, 2015, 3:51 p.m. UTC | #9

On 08/19/2015 10:12 PM, Sandra Loosemore wrote:

>>> Unfortunately, I still can't see that ^C is doing anything useful
>>> for me.  It is not coming back to a gdb prompt any sooner and "info
>>> sharedlibrary" afterwards prints the same thing whether or not I've
>>> tried to interrupt it.  This was with unpatched FSF trunk.  How am I
>>> supposed to tell whether ^C did anything?  How are you guys telling
>>> that it is doing something useful?  Is there supposed to be some
>>> sort of message?  If the file transfer from the target is aborted, I
>>> think it should say that.

For me, it just says the usual "Quit":

$ gdb -ex "tar rem :9999"
...
Remote debugging using :9999
^CQuit
(gdb)

That interrupts the main executable transfer.  But from below, I
see we weren't testing the same exact thing.

> Well, here is a clue.  I tried logging the RSP traffic so I could 
> compare the interrupted and non-interrupted behavior.  Aside from 
> differences in PID numbers (etc), the *only* difference is that the log 
> from the interrupted run shows that it's sending a \x03 character to the 
> remote target after it does a vCont, after it has already read the 
> entire contents of libc.so.  Here's the relevant snippet from the 
> transcript:

...

> w $QPassSignals:#f3
> r $OK#9a
> w $vCont;c:p33e.33e#16\x03  <=== here
> r $T05swbreak:;1b:f8f6fe7f;20:d0b1aa2a;thread:p33e.33e;core:0;#89

OK, so ctrl-c while the program is running just causes the target
to report a SIGINT stop.  If you ctrl-c at that time, then it should
make no difference whether GDB is retrieving remote files or not.

The trouble is ctrl-c after the target has stopped for the shared
library load event, and gdb is busy parsing the shared library
(elf headers, debug info, etc.), which involves reading chunks of
the file out of the target side with the vFile:pread packets.

> 
> FAOD, this is a trunk checkout with the patch described above applied 
> and nothing else.  And, the exact sequence of commands I'm using to try 
> to reproduce this is
> 
> <target>-gdb a.out
> target remote ...
> break main
> c
> ^C
> 

Thanks.  Given the recipe, it's easy to reproduce.  Gary, can you
check whether you also see this?

(this is again remote debugging gcc76 on the gcc compile farm
from my office machine)

$ gdb -ex "set verbose on" /tmp/gdbserver -ex "tar rem :9999"
GNU gdb (GDB) 7.10.50.20150820-cvs
Reading symbols from /tmp/gdbserver...done.
Remote debugging using :9999
Created trace state variable $trace_timestamp for target's variable 1.
Reading symbols from target:/lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from system-supplied DSO at 0x7ffff7ff8000...(no debugging symbols found)...done.
0x00007ffff7ddd190 in ?? () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b main
Breakpoint 1 at 0x41200c: file ../../../src/gdb/gdbserver/server.c, line 3635.
(gdb) c
Continuing.
^CReading symbols from target:/lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Reading symbols from target:/lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.

Breakpoint 1, main (argc=1, argv=0x7fffffffe3d8) at ../../../src/gdb/gdbserver/server.c:3635
3635    ../../../src/gdb/gdbserver/server.c: No such file or directory.
(gdb)

There's a "^C" right after "Continuing.", but it was ignored.

Trying again, but this time debugging gdb, I get:

$ gdb --args $g -ex "set verbose on" /tmp/gdbserver  -ex "tar rem :9999" -ex "b main"
...
Breakpoint 1 at 0x41200c: file ../../../src/gdb/gdbserver/server.c, line 3635.
(gdb) c
Continuing.
Reading symbols from target:/lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Reading symbols from target:/lib/x86_64-linux-gnu/libc.so.6...^C
Program received signal SIGINT, Interrupt.
0x0000003615eec5d3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

stepping into the signal handler is a good way to find what
handler is installed:

(top-gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal        Stop      Print   Pass to program Description
SIGINT        Yes       Yes     Yes             Interrupt
(top-gdb) s
async_handle_remote_sigint (sig=0) at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:5202
5202    {

That is going to send a ctrl-c request to the target.  Let's resume.

(top-gdb) c
Continuing.
(no debugging symbols found)...done.

Breakpoint 1, main (argc=1, argv=0x7fffffffe3d8) at ../../../src/gdb/gdbserver/server.c:3635
3635    ../../../src/gdb/gdbserver/server.c: No such file or directory.
(gdb)

Again, now with getting a backtrace on that SIGINT:

(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000003615eec5d3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install python-libs-2.7.5-16.fc20.x86_64
(top-gdb) bt
#0  0x0000003615eec5d3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000057849c in gdb_select (n=10, readfds=0x7fffffffc310, writefds=0x0, exceptfds=0x7fffffffc390, timeout=0x7fffffffc410)
    at /home/pedro/gdb/mygit/build/../src/gdb/posix-hdep.c:31
#2  0x00000000004b685a in ser_base_wait_for (scb=0x191e2f0, timeout=1) at /home/pedro/gdb/mygit/build/../src/gdb/ser-base.c:224
#3  0x00000000004b6af2 in do_ser_base_readchar (scb=0x191e2f0, timeout=10000) at /home/pedro/gdb/mygit/build/../src/gdb/ser-base.c:335
#4  0x00000000004b6c36 in generic_readchar (scb=0x191e2f0, timeout=10000, do_readchar=0x4b6a9d <do_ser_base_readchar>)
    at /home/pedro/gdb/mygit/build/../src/gdb/ser-base.c:410
#5  0x00000000004b6cb0 in ser_base_readchar (scb=0x191e2f0, timeout=10000) at /home/pedro/gdb/mygit/build/../src/gdb/ser-base.c:437
#6  0x000000000075d5dd in serial_readchar (scb=0x191e2f0, timeout=10000) at /home/pedro/gdb/mygit/build/../src/gdb/serial.c:382
#7  0x00000000004e2174 in readchar (timeout=10000) at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:7593
#8  0x00000000004e2cdd in getpkt_or_notif_sane_1 (buf=0xec2650, sizeof_buf=0xec2658, forever=0, expecting_notif=0, is_notif=0x0)
    at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:8134
#9  0x00000000004e2fa1 in getpkt_sane (buf=0xec2650, sizeof_buf=0xec2658, forever=0) at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:8234
#10 0x00000000004e71ad in remote_hostio_send_command (command_bytes=20, which_packet=12, remote_errno=0x7fffffffc7a4, attachment=0x7fffffffc680,
    attachment_len=0x7fffffffc678) at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:10319
#11 0x00000000004e77d0 in remote_hostio_pread_vFile (self=0xdcf120 <remote_ops>, fd=5, read_buf=0x19ed770 "\024", len=16383, offset=0, remote_errno=0x7fffffffc7a4)
    at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:10508
#12 0x00000000004e7a76 in remote_hostio_pread (self=0xdcf120 <remote_ops>, fd=5, read_buf=0x7fffffffc940 "", len=64, offset=0, remote_errno=0x7fffffffc7a4)
    at /home/pedro/gdb/mygit/build/../src/gdb/remote.c:10581
#13 0x0000000000691b98 in target_fileio_pread (fd=0, read_buf=0x7fffffffc940 "", len=64, offset=0, target_errno=0x7fffffffc7a4)
    at /home/pedro/gdb/mygit/build/../src/gdb/target.c:2853
#14 0x0000000000678c68 in gdb_bfd_iovec_fileio_pread (abfd=0x1a63240, stream=0x19203a0, buf=0x7fffffffc940, nbytes=64, offset=0)
    at /home/pedro/gdb/mygit/build/../src/gdb/gdb_bfd.c:294
#15 0x00000000007f7c14 in opncls_bread (abfd=0x1a63240, buf=0x7fffffffc940, nbytes=64) at /home/pedro/gdb/mygit/build/../src/bfd/opncls.c:503
#16 0x00000000007f21a8 in bfd_bread (ptr=0x7fffffffc940, size=64, abfd=0x1a63240) at /home/pedro/gdb/mygit/build/../src/bfd/bfdio.c:196
#17 0x000000000080fe2f in bfd_elf64_object_p (abfd=0x1a63240) at /home/pedro/gdb/mygit/build/../src/bfd/elfcode.h:503
#18 0x00000000007f4acb in bfd_check_format_matches (abfd=0x1a63240, format=bfd_object, matching=0x0) at /home/pedro/gdb/mygit/build/../src/bfd/format.c:305
#19 0x00000000007f4545 in bfd_check_format (abfd=0x1a63240, format=bfd_object) at /home/pedro/gdb/mygit/build/../src/bfd/format.c:94
#20 0x00000000007872f2 in solib_bfd_open (pathname=0x1a5dac0 "/lib/x86_64-linux-gnu/libdl.so.2") at /home/pedro/gdb/mygit/build/../src/gdb/solib.c:504
#21 0x0000000000787418 in solib_map_sections (so=0x1a62ba0) at /home/pedro/gdb/mygit/build/../src/gdb/solib.c:544
#22 0x0000000000787c8f in update_solib_list (from_tty=0, target=0xded2a0 <current_target>) at /home/pedro/gdb/mygit/build/../src/gdb/solib.c:892
#23 0x0000000000787e7c in solib_add (pattern=0x0, from_tty=0, target=0xded2a0 <current_target>, readsyms=1) at /home/pedro/gdb/mygit/build/../src/gdb/solib.c:991
#24 0x00000000007886ba in handle_solib_event () at /home/pedro/gdb/mygit/build/../src/gdb/solib.c:1340
#25 0x00000000005b0c19 in bpstat_stop_status (aspace=0x1095140, bp_addr=140737351955008, ptid=..., ws=0x7fffffffd110)
    at /home/pedro/gdb/mygit/build/../src/gdb/breakpoint.c:5612
#26 0x0000000000638379 in handle_signal_stop (ecs=0x7fffffffd0f0) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:5599
#27 0x0000000000637167 in handle_inferior_event_1 (ecs=0x7fffffffd0f0) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:5028
#28 0x0000000000637209 in handle_inferior_event (ecs=0x7fffffffd0f0) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:5053
...

That is, the target stopped for the shared library event, and GDB is busy
reading the shared libraries.

While this is ongoing, the inferior still owns terminal _input_
(including ctrl-c):

(top-gdb) p terminal_state
$1 = terminal_is_ours_for_output

The "step into signal handler" above shows that at the time the problematic
ctrl-c is pressed, the sigint handler that is registered is
async_handle_remote_sigint.  That's the function that sends the remote-interrupt
packet (\x03) to the remote side.  But the target won't be able to report
a SIGINT until the next time it is resumed (and the kernel reports a SIGINT
to ptrace).  So supposedly, the ctrl-c should be ignored until the
whole file is transferred and gdb re-resumes the target.

I don't see a quick, easy and good way around waiting for the whole file to read.
Having GDB handle ctrl-c differently while it is handling internal events
is already a source of trouble, and the fix always seemed to me to be to make ctrl-c
work like what is happening here.  Specifically, I mean that e.g., say that the
user sets a conditional breakpoint that is evaluated on gdb's side, and then the
target stops for that conditional breakpoint, gdb evaluates the expression,
and then resumes the target again, on and on.  If the user presses ctrl-c just
while the conditional breakpoint's expression is being evaluated, and something along
that code path called target_terminal_ours (thus pulling input/ctrl-c to the
regular "Quit" SIGINT handler), gdb behaves differently (shows a "Quit" and the
debug session ends up likely broken) from when the ctrl-c is pressed while the target
is really running.  I'd argue that the ctrl-c in both cases should be passed
down all the way to the target the same way, and that any internal stop and
breakpoint condition evaluation is magic that should be hidden.  Just like
what is happening here with file reading.

Though having said that, it does look like even that isn't working properly,
as I'd expect this:

(top-gdb) c
Continuing.
(no debugging symbols found)...done.

Breakpoint 1, main (argc=1, argv=0x7fffffffe3d8) at ../../../src/gdb/gdbserver/server.c:3635
3635    ../../../src/gdb/gdbserver/server.c: No such file or directory.
(gdb)

to be slow (that is, the file reading isn't really interrupted), but then
the target stops with SIGINT as soon as gdb resumes it again after reading
the DSOs.  But it is reaching main anyway, and there's no sign
of "Program stopped with SIGINT"...

Not sure where the bug is.  It may be on the gdbserver side.

Thanks,
Pedro Alves

diff mbox

Patch

diff --git a/gdb/gdbserver/hostio.c b/gdb/gdbserver/hostio.c
index b38a6bd..8788f07 100644
--- a/gdb/gdbserver/hostio.c
+++ b/gdb/gdbserver/hostio.c
@@ -344,6 +344,7 @@  handle_pread (char *own_buf, int *new_packet_len)
 {
   int fd, ret, len, offset, bytes_sent;
   char *p, *data;
+  static int max_reply_size = -1;
 
   p = own_buf + strlen ("vFile:pread:");
 
@@ -359,6 +360,17 @@  handle_pread (char *own_buf, int *new_packet_len)
       return;
     }
 
+  /* Do not attempt to read more than the maximum number of bytes
+     hostio_reply_with_data can fit in a packet.  We may still read
+     too much because of escaping, but this is handled below.  */
+  if (max_reply_size == -1)
+    {
+      sprintf (own_buf, "F%x;", PBUFSIZ);
+      max_reply_size = PBUFSIZ - strlen (own_buf);
+    }
+  if (len > max_reply_size)
+    len = max_reply_size;
+
   data = xmalloc (len);
 #ifdef HAVE_PREAD
   ret = pread (fd, data, len, offset);