Remote debugging without a binary (regression)

Message ID 1455200365-5270-1-git-send-email-lgustavo@codesourcery.com
State New, archived
Headers

Commit Message

Luis Machado Feb. 11, 2016, 2:19 p.m. UTC
  cc-ing Gary.

It looks like this is fallout from the changes that were added to make
GDB a bit smarter about locating the binary that is being debugged.

When one attempts to do gdbserver-based debugging in the same
machine/filesystem, there is no problem at all.

If the user wants to have the files transfered over the wire, GDB will
handle it. If the user sets a local sysroot path and doesn't want the
file coming through the wire, GDB will use process information to
attempt to locate the binary in the local filesystem.

Now, considering we have a GDB instance running on a local machine and
a gdbserver instance running on a remote machine with a completely separate
filesystem, having the sysroot set will prevent the file from being
downloaded.

GDB will then attempt to be smart and locate the binary through the path that
is reported by gdbserver. This path is from the remote filesystem though, so
there is a chance this file won't even exist in the local filesystem.

In a normal native session (where we start the process from scratch) this would
result in a "No such file or directory" error. And that is fine, because we
really need a binary to get the process started.

But with a local GDB plus a remote gdbserver on a different filesystem, we will
see the same error and the debugging session will end abruptly, giving the user
no chance of doing some debugging without a symbol file.

--
Remote debugging using some_machine:12345
<some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
--

I tracked this down to remote_add_inferior and its call to (mainly)
exec_file_locate_attach. This specific function will call other functions
that may throw an error, causing everything to stop dead on its tracks.

The following patch guards such a call to prevent those errors from
disrupting a potential debugging session, and display only a warning.

--
Remote debugging using some_machine:12345
warning: <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
--

I tried to come up with a valid testcase that would fail with a local
gdb/gdbserver combination, but it seems GDB is smart enough to recognize
a deleted binary with the help of /proc, thus foiling my attempts.

Any ideas?

gdb/ChangeLog:

2016-02-11  Luis Machado  <lgustavo@codesourcery.com>

	* remote.c (remote_add_inferior): Guard block that can throw
	errors.
---
 gdb/remote.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)
  

Comments

Gary Benson Feb. 11, 2016, 4:35 p.m. UTC | #1
Hi Luis,

Luis Machado wrote:
> cc-ing Gary.
> 
> It looks like this is fallout from the changes that were added to
> make GDB a bit smarter about locating the binary that is being
> debugged.
> 
> When one attempts to do gdbserver-based debugging in the same
> machine/filesystem, there is no problem at all.
> 
> If the user wants to have the files transfered over the wire, GDB
> will handle it. If the user sets a local sysroot path and doesn't
> want the file coming through the wire, GDB will use process
> information to attempt to locate the binary in the local filesystem.
> 
> Now, considering we have a GDB instance running on a local machine
> and a gdbserver instance running on a remote machine with a
> completely separate filesystem, having the sysroot set will prevent
> the file from being downloaded.
> 
> GDB will then attempt to be smart and locate the binary through the
> path that is reported by gdbserver. This path is from the remote
> filesystem though, so there is a chance this file won't even exist
> in the local filesystem.
> 
> In a normal native session (where we start the process from scratch)
> this would result in a "No such file or directory" error. And that
> is fine, because we really need a binary to get the process started.
> 
> But with a local GDB plus a remote gdbserver on a different
> filesystem, we will see the same error and the debugging session
> will end abruptly, giving the user no chance of doing some debugging
> without a symbol file.
> 
> --
> Remote debugging using some_machine:12345
> <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
> --
> 
> I tracked this down to remote_add_inferior and its call to (mainly)
> exec_file_locate_attach. This specific function will call other
> functions that may throw an error, causing everything to stop dead
> on its tracks.
> 
> The following patch guards such a call to prevent those errors from
> disrupting a potential debugging session, and display only a warning.
> 
> --
> Remote debugging using some_machine:12345
> warning: <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
> --
> 
> I tried to come up with a valid testcase that would fail with a
> local gdb/gdbserver combination, but it seems GDB is smart enough to
> recognize a deleted binary with the help of /proc, thus foiling my
> attempts.

I don't have any fundamental objection to your patch but I'm not
really sure I understand what's going on here.  You have the sysroot
set to some path that does not exist?  What are you trying to do and
what are you expecting to be able to do?  What did GDB do before?

Thanks,
Gary

--
http://gbenson.net/
  
Luis Machado Feb. 11, 2016, 5:06 p.m. UTC | #2
On 02/11/2016 02:35 PM, Gary Benson wrote:
> Hi Luis,
>
> Luis Machado wrote:
>> cc-ing Gary.
>>
>> It looks like this is fallout from the changes that were added to
>> make GDB a bit smarter about locating the binary that is being
>> debugged.
>>
>> When one attempts to do gdbserver-based debugging in the same
>> machine/filesystem, there is no problem at all.
>>
>> If the user wants to have the files transfered over the wire, GDB
>> will handle it. If the user sets a local sysroot path and doesn't
>> want the file coming through the wire, GDB will use process
>> information to attempt to locate the binary in the local filesystem.
>>
>> Now, considering we have a GDB instance running on a local machine
>> and a gdbserver instance running on a remote machine with a
>> completely separate filesystem, having the sysroot set will prevent
>> the file from being downloaded.
>>
>> GDB will then attempt to be smart and locate the binary through the
>> path that is reported by gdbserver. This path is from the remote
>> filesystem though, so there is a chance this file won't even exist
>> in the local filesystem.
>>
>> In a normal native session (where we start the process from scratch)
>> this would result in a "No such file or directory" error. And that
>> is fine, because we really need a binary to get the process started.
>>
>> But with a local GDB plus a remote gdbserver on a different
>> filesystem, we will see the same error and the debugging session
>> will end abruptly, giving the user no chance of doing some debugging
>> without a symbol file.
>>
>> --
>> Remote debugging using some_machine:12345
>> <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
>> --
>>
>> I tracked this down to remote_add_inferior and its call to (mainly)
>> exec_file_locate_attach. This specific function will call other
>> functions that may throw an error, causing everything to stop dead
>> on its tracks.
>>
>> The following patch guards such a call to prevent those errors from
>> disrupting a potential debugging session, and display only a warning.
>>
>> --
>> Remote debugging using some_machine:12345
>> warning: <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
>> --
>>
>> I tried to come up with a valid testcase that would fail with a
>> local gdb/gdbserver combination, but it seems GDB is smart enough to
>> recognize a deleted binary with the help of /proc, thus foiling my
>> attempts.
>
> I don't have any fundamental objection to your patch but I'm not
> really sure I understand what's going on here.  You have the sysroot
> set to some path that does not exist?  What are you trying to do and
> what are you expecting to be able to do?  What did GDB do before?

No. The sysroot being anything other than "target:" is needed is order 
to prevent gdbserver from transfering the files over (too slow). Plus, 
i'm not loading any symbol file on GDB's side.

So i'm trying to connect to a gdbserver running on a remote system with 
a separate filesystem. gdbserver will now report the full path to the 
binary on the remote end via the new qXfer:exec-file packet, even if i 
force the sysroot to be empty.

In summary, GDB (running on a local machine) is attempting to use that 
path provided by qXfer:exec-file to open a symbol file that only exists 
on the remote end's filesystem, not in the local filesystem where GDB is 
running.

If GDB fails to locate that file, it will drop the connection due to a 
error that is thrown from within exec_file_locate_attach and its callees.

The correct behavior is for GDB to ignore the lack of a symbol file and 
carry on connecting to the remote target, allowing a symbol-less 
debugging session.

Does that make it clear?
  
Pedro Alves Feb. 11, 2016, 5:31 p.m. UTC | #3
On 02/11/2016 05:06 PM, Luis Machado wrote:

> Does that make it clear?

Sounds like we're missing a test.  Say, a test in gdb.server/
that:

 #1 - does "set sysroot /dev/null"
 #2 - connects
 #3 - does "disassemble", "si", "info registers", or something.

Sounds like we currently fail step #2.  Steps #3 would be there just
to make sure the session is not semi-borked even if we connected
successfully.

Thanks,
Pedro Alves
  
Luis Machado Feb. 11, 2016, 5:42 p.m. UTC | #4
On 02/11/2016 03:31 PM, Pedro Alves wrote:
> On 02/11/2016 05:06 PM, Luis Machado wrote:
>
>> Does that make it clear?
>
> Sounds like we're missing a test.  Say, a test in gdb.server/
> that:
>
>   #1 - does "set sysroot /dev/null"
>   #2 - connects
>   #3 - does "disassemble", "si", "info registers", or something.
>
> Sounds like we currently fail step #2.  Steps #3 would be there just
> to make sure the session is not semi-borked even if we connected
> successfully.

Yeah. Unfortunately that does not work well because GDB is smart enough 
to try and load the binary gdbserver loaded, which happens to be sitting 
in the same filesystem, so it won't fail.

I also attempted to delete the binary right after firing gdbserver up, 
but it also does not fail, though GDB acknowledges now that the file has 
been deleted.

The problematic scenario appears when one has two distinct filesystems. 
Say, one in each machine. Then GDB won't be able to load the symbol file 
and will error out.

This is usually the case when one is doing Linux cross debugging.
  
Gary Benson Feb. 12, 2016, 10:31 a.m. UTC | #5
Luis Machado wrote:
> On 02/11/2016 02:35 PM, Gary Benson wrote:
> >Luis Machado wrote:
> > > It looks like this is fallout from the changes that were added to
> > > make GDB a bit smarter about locating the binary that is being
> > > debugged.
> > > 
> > > When one attempts to do gdbserver-based debugging in the same
> > > machine/filesystem, there is no problem at all.
> > > 
> > > If the user wants to have the files transfered over the wire, GDB
> > > will handle it. If the user sets a local sysroot path and doesn't
> > > want the file coming through the wire, GDB will use process
> > > information to attempt to locate the binary in the local filesystem.
> > > 
> > > Now, considering we have a GDB instance running on a local machine
> > > and a gdbserver instance running on a remote machine with a
> > > completely separate filesystem, having the sysroot set will prevent
> > > the file from being downloaded.
> > > 
> > > GDB will then attempt to be smart and locate the binary through the
> > > path that is reported by gdbserver. This path is from the remote
> > > filesystem though, so there is a chance this file won't even exist
> > > in the local filesystem.
> > > 
> > > In a normal native session (where we start the process from scratch)
> > > this would result in a "No such file or directory" error. And that
> > > is fine, because we really need a binary to get the process started.
> > > 
> > > But with a local GDB plus a remote gdbserver on a different
> > > filesystem, we will see the same error and the debugging session
> > > will end abruptly, giving the user no chance of doing some debugging
> > > without a symbol file.
> > > 
> > > --
> > > Remote debugging using some_machine:12345
> > > <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
> > > --
> > > 
> > > I tracked this down to remote_add_inferior and its call to (mainly)
> > > exec_file_locate_attach. This specific function will call other
> > > functions that may throw an error, causing everything to stop dead
> > > on its tracks.
> > > 
> > > The following patch guards such a call to prevent those errors from
> > > disrupting a potential debugging session, and display only a warning.
> > > 
> > > --
> > > Remote debugging using some_machine:12345
> > > warning: <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
> > > --
> > > 
> > > I tried to come up with a valid testcase that would fail with a
> > > local gdb/gdbserver combination, but it seems GDB is smart enough to
> > > recognize a deleted binary with the help of /proc, thus foiling my
> > > attempts.
> >
> > I don't have any fundamental objection to your patch but I'm not
> > really sure I understand what's going on here.  You have the
> > sysroot set to some path that does not exist?  What are you trying
> > to do and what are you expecting to be able to do?  What did GDB
> > do before?
> 
> No. The sysroot being anything other than "target:" is needed is
> order to prevent gdbserver from transfering the files over (too
> slow). Plus, i'm not loading any symbol file on GDB's side.
> 
> So i'm trying to connect to a gdbserver running on a remote system
> with a separate filesystem. gdbserver will now report the full path
> to the binary on the remote end via the new qXfer:exec-file packet,
> even if i force the sysroot to be empty.
> 
> In summary, GDB (running on a local machine) is attempting to use
> that path provided by qXfer:exec-file to open a symbol file that
> only exists on the remote end's filesystem, not in the local
> filesystem where GDB is running.
> 
> If GDB fails to locate that file, it will drop the connection due to
> a error that is thrown from within exec_file_locate_attach and its
> callees.
> 
> The correct behavior is for GDB to ignore the lack of a symbol file
> and carry on connecting to the remote target, allowing a symbol-less
> debugging session.
> 
> Does that make it clear?

I'm getting there, but I have a couple more questions:

1) What exactly are you setting sysroot to?  Is it:
    - the empty string
    - a directory full of shared libraries but not the main executable
    - an empty directory
    - a non-existent directory?

2) What exactly is the error being thrown within exec_file_locate_attach?

FWIW I tried this (both on the same machine):

  gdbserver :9999 /bin/ls
  gdb -q -ex "set sysroot /whatever" -ex "target remote :9999"

and got this:

  Reading symbols from /bin/ls...(no debugging symbols found)...done.

which I think is an error: the sysroot is being ignored.

Once again, I have no fundamental problem with your patch, but I want
to make sure we're not papering over some deeper issue.

Thanks,
Gary
  
Luis Machado Feb. 12, 2016, 10:59 a.m. UTC | #6
On 02/12/2016 08:31 AM, Gary Benson wrote:
> Luis Machado wrote:
>> On 02/11/2016 02:35 PM, Gary Benson wrote:
>>> Luis Machado wrote:
>>>> It looks like this is fallout from the changes that were added to
>>>> make GDB a bit smarter about locating the binary that is being
>>>> debugged.
>>>>
>>>> When one attempts to do gdbserver-based debugging in the same
>>>> machine/filesystem, there is no problem at all.
>>>>
>>>> If the user wants to have the files transfered over the wire, GDB
>>>> will handle it. If the user sets a local sysroot path and doesn't
>>>> want the file coming through the wire, GDB will use process
>>>> information to attempt to locate the binary in the local filesystem.
>>>>
>>>> Now, considering we have a GDB instance running on a local machine
>>>> and a gdbserver instance running on a remote machine with a
>>>> completely separate filesystem, having the sysroot set will prevent
>>>> the file from being downloaded.
>>>>
>>>> GDB will then attempt to be smart and locate the binary through the
>>>> path that is reported by gdbserver. This path is from the remote
>>>> filesystem though, so there is a chance this file won't even exist
>>>> in the local filesystem.
>>>>
>>>> In a normal native session (where we start the process from scratch)
>>>> this would result in a "No such file or directory" error. And that
>>>> is fine, because we really need a binary to get the process started.
>>>>
>>>> But with a local GDB plus a remote gdbserver on a different
>>>> filesystem, we will see the same error and the debugging session
>>>> will end abruptly, giving the user no chance of doing some debugging
>>>> without a symbol file.
>>>>
>>>> --
>>>> Remote debugging using some_machine:12345
>>>> <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
>>>> --
>>>>
>>>> I tracked this down to remote_add_inferior and its call to (mainly)
>>>> exec_file_locate_attach. This specific function will call other
>>>> functions that may throw an error, causing everything to stop dead
>>>> on its tracks.
>>>>
>>>> The following patch guards such a call to prevent those errors from
>>>> disrupting a potential debugging session, and display only a warning.
>>>>
>>>> --
>>>> Remote debugging using some_machine:12345
>>>> warning: <some_remote_filesystem_path/gdb.d/gdb.base/break: No such file or directory.
>>>> --
>>>>
>>>> I tried to come up with a valid testcase that would fail with a
>>>> local gdb/gdbserver combination, but it seems GDB is smart enough to
>>>> recognize a deleted binary with the help of /proc, thus foiling my
>>>> attempts.
>>>
>>> I don't have any fundamental objection to your patch but I'm not
>>> really sure I understand what's going on here.  You have the
>>> sysroot set to some path that does not exist?  What are you trying
>>> to do and what are you expecting to be able to do?  What did GDB
>>> do before?
>>
>> No. The sysroot being anything other than "target:" is needed is
>> order to prevent gdbserver from transfering the files over (too
>> slow). Plus, i'm not loading any symbol file on GDB's side.
>>
>> So i'm trying to connect to a gdbserver running on a remote system
>> with a separate filesystem. gdbserver will now report the full path
>> to the binary on the remote end via the new qXfer:exec-file packet,
>> even if i force the sysroot to be empty.
>>
>> In summary, GDB (running on a local machine) is attempting to use
>> that path provided by qXfer:exec-file to open a symbol file that
>> only exists on the remote end's filesystem, not in the local
>> filesystem where GDB is running.
>>
>> If GDB fails to locate that file, it will drop the connection due to
>> a error that is thrown from within exec_file_locate_attach and its
>> callees.
>>
>> The correct behavior is for GDB to ignore the lack of a symbol file
>> and carry on connecting to the remote target, allowing a symbol-less
>> debugging session.
>>
>> Does that make it clear?
>
> I'm getting there, but I have a couple more questions:
>
> 1) What exactly are you setting sysroot to?  Is it:
>      - the empty string
>      - a directory full of shared libraries but not the main executable
>      - an empty directory
>      - a non-existent directory?
>

It doesn't matter, as long as it is not "target:", meaning we really 
don't want to load files using the help of gdbserver.

> 2) What exactly is the error being thrown within exec_file_locate_attach?
>

The one i mentioned above:

<some_remote_filesystem_path>/gdb.d/gdb.base/break: No such file or 
directory.

This comes from exec_file_attach. In my specific case above, this is 
thrown from gdb/exec.c:268, a call to perror_with_name.

           if (scratch_chan < 0)
             perror_with_name (filename);

It seems this was introduced with commit 
1b6e6f5c7ffba559a681d11852acf38ef48dceff, with the addition of a call to 
exec_file_locate_attach from within remote_add_inferior.

That unguarded call to exec_file_locate_attach doesn't look safe since 
its callees can throw errors and potentially disrupt a connection attempt.

> FWIW I tried this (both on the same machine):
>

Doing so with both on the same machine/filesystem will not reproduce the 
problem, as i mentioned in my original post. GDB and gdbserver need to 
be on separate filesystems.

>    gdbserver :9999 /bin/ls
>    gdb -q -ex "set sysroot /whatever" -ex "target remote :9999"
>
> and got this:
>
>    Reading symbols from /bin/ls...(no debugging symbols found)...done.
>
> which I think is an error: the sysroot is being ignored.
>

Isn't it being ignored because GDB managed to figure out the path and 
successfully open the symbol file?

If it should honor the sysroot in that case, that looks like a different 
problem than the one i was originally chasing. There may be more bugs. :-)

> Once again, I have no fundamental problem with your patch, but I want
> to make sure we're not papering over some deeper issue.
>
> Thanks,
> Gary
>
  
Pedro Alves Feb. 12, 2016, 3:24 p.m. UTC | #7
On 02/12/2016 10:31 AM, Gary Benson wrote:

> FWIW I tried this (both on the same machine):
> 
>   gdbserver :9999 /bin/ls
>   gdb -q -ex "set sysroot /whatever" -ex "target remote :9999"
> 
> and got this:
> 
>   Reading symbols from /bin/ls...(no debugging symbols found)...done.
> 
> which I think is an error: the sysroot is being ignored.

I agree.  If you tell gdb about a sysroot, then I can't think why
you'd want it to try opening an absolute filename on the host, outside
the sysroot.

(caching and buildid matching aside)

Thanks,
Pedro Alves
  
Pedro Alves Feb. 12, 2016, 3:29 p.m. UTC | #8
On 02/11/2016 02:19 PM, Luis Machado wrote:

> gdb/ChangeLog:
> 
> 2016-02-11  Luis Machado  <lgustavo@codesourcery.com>
> 
> 	* remote.c (remote_add_inferior): Guard block that can throw
> 	errors.

So the question is: why guard this call, and not the others?

E.g., I'd think that failing to find the executable in the sysroot
shouldn't error out of "attach" either.

Thanks,
Pedro Alves
  
Luis Machado Feb. 12, 2016, 4:08 p.m. UTC | #9
On 02/12/2016 01:29 PM, Pedro Alves wrote:
> On 02/11/2016 02:19 PM, Luis Machado wrote:
>
>> gdb/ChangeLog:
>>
>> 2016-02-11  Luis Machado  <lgustavo@codesourcery.com>
>>
>> 	* remote.c (remote_add_inferior): Guard block that can throw
>> 	errors.
>
> So the question is: why guard this call, and not the others?
>
> E.g., I'd think that failing to find the executable in the sysroot
> shouldn't error out of "attach" either.

I did not exercise that, but did not discard it either. I was attempting 
to solve one problem at a time. There may be a failure there too.

If you're attempting to attach to a remote process via a remote 
gdbserver, then GDB is already connected to gdbserver by the time it 
issues a vAttach request, no?

If so, then it is not as problematic as not being able to connect to the 
target at all, but still wrong.

In any case, GDB should honor the user option of not using a symbol file 
at all for debugging processes that are already running.
  
Pedro Alves Feb. 12, 2016, 4:36 p.m. UTC | #10
On 02/12/2016 04:08 PM, Luis Machado wrote:

> I did not exercise that, but did not discard it either. I was attempting 
> to solve one problem at a time. There may be a failure there too.

I matters to determine where the TRY/CATCH should go, or even whether
the fix should be to not throw in the first place.

Like, with:

(gdb) define foo
>  attach PID
>  info threads
> end
(gdb) foo

should failing to load the executable error out before
reaching "info threads", or continue.

I think it should not error out, like we don't error out
if the target doesn't support target_pid_to_exec_file
at all.

Thanks,
Pedro Alves
  
Luis Machado Feb. 12, 2016, 5:30 p.m. UTC | #11
On 02/12/2016 02:36 PM, Pedro Alves wrote:
> On 02/12/2016 04:08 PM, Luis Machado wrote:
>
>> I did not exercise that, but did not discard it either. I was attempting
>> to solve one problem at a time. There may be a failure there too.
>
> I matters to determine where the TRY/CATCH should go, or even whether
> the fix should be to not throw in the first place.
>

Ok, i crafted a test application that i can attach to via a remote 
gdbserver and through the extended remote protocol.

I see the same error as before. On GDB's side, for example:

--
/lib64/ld-2.22.so: No such file or directory.
(gdb)
--

But it seems we have already attached to the process by the time we 
error out. So we remain attached to it, just not sure if in a sane state:

--
(gdb) info proc
process 5464
warning: target file /proc/5464/cmdline contained unexpected null characters
cmdline = '/lib64/ld.so.1'
cwd = '/home/tester'
exe = '/lib64/ld-2.22.so'
(gdb) i r
           zero       at       v0       v1       a0       a1       a2 
     a3
  R0   000013aa 00000014 00000204 00000000 ffa831f8 ffa831f8 00000000 
00000001
             t0       t1       t2       t3       t4       t5       t6 
     t7
  R8   ffa83060 557da7e0 00000000 015555f7 00000000 00000000 801123d8 
00000000
             s0       s1       s2       s3       s4       s5       s6 
     s7
  R16  ffa83178 00020000 ffa830f8 00000000 00503bd8 00000000 004ee308 
00000000
             t8       t9       k0       k1       gp       sp       s8 
     ra
  R24  00000038 55700e4c 00000000 00000000 557da7e0 ffa83000 ffa83240 
55700ba0
         status       lo       hi badvaddr    cause       pc
       0400a4f3 00000000 00000017 557a20e4 00800020 55700e78
fcsr: 0x0
fir: 0x173890c
restart: 0x13aa
(gdb)
--

> Like, with:
>
> (gdb) define foo
>>   attach PID
>>   info threads
>> end
> (gdb) foo
>
> should failing to load the executable error out before
> reaching "info threads", or continue.
>
> I think it should not error out, like we don't error out
> if the target doesn't support target_pid_to_exec_file
> at all.

Agreed. In this case "info threads" doesn't run at all.

>
> Thanks,
> Pedro Alves
>
  
Pedro Alves Feb. 17, 2016, 11:46 a.m. UTC | #12
On 02/12/2016 05:30 PM, Luis Machado wrote:
>> > I think it should not error out, like we don't error out
>> > if the target doesn't support target_pid_to_exec_file
>> > at all.
> Agreed. In this case "info threads" doesn't run at all.
> 

That then suggests to me that either the TRY/CATCH should be
somewhere inside exec_file_locate_attach instead of wrapping one
particular call, or, exec_file_locate_attach shouldn't be throwing
in the first place.

Thanks,
Pedro Alves
  
Gary Benson Feb. 17, 2016, 1:53 p.m. UTC | #13
Pedro Alves wrote:
> On 02/12/2016 10:31 AM, Gary Benson wrote:
> > FWIW I tried this (both on the same machine):
> > 
> >   gdbserver :9999 /bin/ls
> >   gdb -q -ex "set sysroot /whatever" -ex "target remote :9999"
> > 
> > and got this:
> > 
> >   Reading symbols from /bin/ls...(no debugging symbols found)...done.
> > 
> > which I think is an error: the sysroot is being ignored.
> 
> I agree.  If you tell gdb about a sysroot, then I can't think why
> you'd want it to try opening an absolute filename on the host,
> outside the sysroot.
> 
> (caching and buildid matching aside)

There isn't any caching or buildid matching, the logic in
exec_file_locate_attach is wrong.  It might even be the cause
of the exception Luis is seeing so with luck we can avoid a
try-catch in remote_add_inferior.  I'm putting together a patch
now.

Cheers,
Gary
  
Pedro Alves Feb. 17, 2016, 2:40 p.m. UTC | #14
On 02/17/2016 01:53 PM, Gary Benson wrote:
> Pedro Alves wrote:
>> On 02/12/2016 10:31 AM, Gary Benson wrote:
>>> FWIW I tried this (both on the same machine):
>>>
>>>   gdbserver :9999 /bin/ls
>>>   gdb -q -ex "set sysroot /whatever" -ex "target remote :9999"
>>>
>>> and got this:
>>>
>>>   Reading symbols from /bin/ls...(no debugging symbols found)...done.
>>>
>>> which I think is an error: the sysroot is being ignored.
>>
>> I agree.  If you tell gdb about a sysroot, then I can't think why
>> you'd want it to try opening an absolute filename on the host,
>> outside the sysroot.
>>
>> (caching and buildid matching aside)
> 
> There isn't any caching or buildid matching, the logic in
> exec_file_locate_attach is wrong.

Yes, what I meant was -- a valid reason I'd see gdb trying to open a file
on the host outside the sysroot would be for smarter buildid matching and
local caching of (hunks of) files.  But we don't have that.

> It might even be the cause
> of the exception Luis is seeing so with luck we can avoid a
> try-catch in remote_add_inferior.  I'm putting together a patch
> now.

Yes, if GDB ends up not trying to open any file, then it'll mask off
the exception Luis is seeing.

However, even if the sysroot points at the location with a
valid copy of the file, if the file is NOT readable, then we'll
still throw and close the connection.

Wait, actually, that's actually a good way to add a test to the testsuite
to cover Luis's use case, even without separate filesystems:

gdbserver:

 $ ./gdbserver/gdbserver :9999 ~/gdb/tests/threads&
 $ chmod 000 ~/gdb/tests/threads

gdb:

 (gdb) tar remote :9999
 Remote debugging using :9999
 Reading /home/pedro/gdb/tests/threads from remote target...
 warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
 "target:/home/pedro/gdb/tests/threads": could not open as an executable file: Permission denied.
 (gdb) maint print target-stack
 The current target stack is:
   - None (None)
 (gdb)

Above we dropped the connection...

Thanks,
Pedro Alves
  
Gary Benson Feb. 18, 2016, 12:30 p.m. UTC | #15
Hi Luis,

Luis Machado wrote:
> The following patch guards such a call to prevent those errors
> from disrupting a potential debugging session, and display only
> a warning.

After looking into this it seems your patch is the right way to
go.  There is a separate bug in exec_file_locate_attach in that
in some cases the sysroot is ignored, but fixing that will not
remove the need for what you are proposing.

My only nit is that I would prefer the TRY-CATCH block smaller,
covering just exec_file_locate_attach, like so:

  /* If no main executable is currently open then attempt to
     open the file that was executed to create this inferior.  */
  if (try_open_exec && get_exec_file (0) == NULL)
    {
      /* exec_file_locate_attach may throw an error...
      TRY
        {
          exec_file_locate_attach (pid, 1);
        }
      CATCH...

Other than that I am ok with this change.

Thanks for tracking this down.

Cheers,
Gary
  
Luis Machado Feb. 18, 2016, 12:40 p.m. UTC | #16
Hi Gary,

On 02/18/2016 10:30 AM, Gary Benson wrote:
> Hi Luis,
>
> Luis Machado wrote:
>> The following patch guards such a call to prevent those errors
>> from disrupting a potential debugging session, and display only
>> a warning.
>
> After looking into this it seems your patch is the right way to
> go.  There is a separate bug in exec_file_locate_attach in that
> in some cases the sysroot is ignored, but fixing that will not
> remove the need for what you are proposing.
>
> My only nit is that I would prefer the TRY-CATCH block smaller,
> covering just exec_file_locate_attach, like so:
>
>    /* If no main executable is currently open then attempt to
>       open the file that was executed to create this inferior.  */
>    if (try_open_exec && get_exec_file (0) == NULL)
>      {
>        /* exec_file_locate_attach may throw an error...
>        TRY
>          {
>            exec_file_locate_attach (pid, 1);
>          }
>        CATCH...
>
> Other than that I am ok with this change.
>
> Thanks for tracking this down.

No problem. I'll come up with a new version of the patch and, while at 
it, i'll try Pedro's suggestion for the testcase.
  

Patch

diff --git a/gdb/remote.c b/gdb/remote.c
index 6d56f19..3b63e9f 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -1789,10 +1789,32 @@  remote_add_inferior (int fake_pid_p, int pid, int attached,
   inf->attach_flag = attached;
   inf->fake_pid_p = fake_pid_p;
 
-  /* If no main executable is currently open then attempt to
-     open the file that was executed to create this inferior.  */
-  if (try_open_exec && get_exec_file (0) == NULL)
-    exec_file_locate_attach (pid, 1);
+
+  /* exec_file_locate_attach may throw an error if the file cannot be
+     opened either locally or remotely.  This happens when, for example,
+     GDB connects to a gdbserver that is running on a different
+     filesystem and the sysroot is set to non-target-based (no "target:").
+
+     Then GDB will neither load the binary from the target nor be able to
+     load a binary from the local filesystem (it may not exist in the local
+     filesystem in the same path as in the remote filesystem).
+
+     Even without a binary, the remote-based debugging session should
+     continue normally instead of ending abruptly.  */
+
+  TRY
+    {
+      /* If no main executable is currently open then attempt to
+	 open the file that was executed to create this inferior.  */
+      if (try_open_exec && get_exec_file (0) == NULL)
+	exec_file_locate_attach (pid, 1);
+    }
+  CATCH (err, RETURN_MASK_ALL)
+    {
+      if (err.message != NULL)
+	warning ("%s", err.message);
+    }
+  END_CATCH
 
   return inf;
 }