Fix lookup of separate debug file on MS-Windows

Message ID 835zr68kiw.fsf@gnu.org
State New, archived
Headers

Commit Message

Eli Zaretskii April 22, 2019, 9:19 a.m. UTC
  > From: Simon Marchi <simark@simark.ca>
> Date: Sun, 21 Apr 2019 08:55:07 -0400
> 
> On 2019-04-21 8:05 a.m., Eli Zaretskii wrote:
> > Ping!
> > 
> > Would people please voice their opinions regarding preservation of the
> > drive letter (and removing the colon) vs just dropping the drive
> > letter altogether?  I'd like to fix this issue for the upcoming
> > release of GDB 8.3.
> 
> I first read your patch (which initiated this thread), and my first reaction was
> also to be surprised that we would lose the drive letter.  The risk of clash
> is low, but why take that risk at all when it's easy enough to keep the drive letter
> and just strip the colon?

OK, so how about the patch below?

> I don't know the complete history of this, so if you know about distributions that
> place debug files in a path without the drive letter (what your patch implements),
> then I would be fine if GDB looked in both places, starting with the path including
> the drive letter.

I'm not aware of any Windows distributions that put debug info in the
directory suggested by my previous patch.  So I think we can safely
use just the /X/ subdirectory method.
  

Comments

Eli Zaretskii April 27, 2019, 10:56 a.m. UTC | #1
Ping!  OK to commit the below to both branches?

> Date: Mon, 22 Apr 2019 12:19:19 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> CC: gdb-patches@sourceware.org
> 
> > From: Simon Marchi <simark@simark.ca>
> > Date: Sun, 21 Apr 2019 08:55:07 -0400
> > 
> > On 2019-04-21 8:05 a.m., Eli Zaretskii wrote:
> > > Ping!
> > > 
> > > Would people please voice their opinions regarding preservation of the
> > > drive letter (and removing the colon) vs just dropping the drive
> > > letter altogether?  I'd like to fix this issue for the upcoming
> > > release of GDB 8.3.
> > 
> > I first read your patch (which initiated this thread), and my first reaction was
> > also to be surprised that we would lose the drive letter.  The risk of clash
> > is low, but why take that risk at all when it's easy enough to keep the drive letter
> > and just strip the colon?
> 
> OK, so how about the patch below?
> 
> > I don't know the complete history of this, so if you know about distributions that
> > place debug files in a path without the drive letter (what your patch implements),
> > then I would be fine if GDB looked in both places, starting with the path including
> > the drive letter.
> 
> I'm not aware of any Windows distributions that put debug info in the
> directory suggested by my previous patch.  So I think we can safely
> use just the /X/ subdirectory method.
> 
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index a3a5f3e..533a52c 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -19917,7 +19917,11 @@
>  the directory of the executable file, then in a subdirectory of that
>  directory named @file{.debug}, and finally under each one of the global debug
>  directories, in a subdirectory whose name is identical to the leading
> -directories of the executable's absolute file name.
> +directories of the executable's absolute file name.  (On MS-Windows,
> +the drive letter of the executable's leading directories is converted
> +to a one-letter subdirectory, i.e.@: @file{d:/usr/bin/} is converted
> +to @file{/d/usr/bin/}, because Windows filesystems disallow colons in
> +file names.)
>  
>  @item
>  For the ``build ID'' method, @value{GDBN} looks in the
> diff --git a/gdb/symfile.c b/gdb/symfile.c
> index 5736666..5749c61 100644
> --- a/gdb/symfile.c
> +++ b/gdb/symfile.c
> @@ -1443,11 +1443,24 @@ find_separate_debug_file (const char *dir,
>      = dirnames_to_char_ptr_vec (debug_file_directory);
>    gdb::unique_xmalloc_ptr<char> canon_sysroot = gdb_realpath (gdb_sysroot);
>  
> + /* MS-Windows/MS-DOS don't allow colons in file names; we must
> +    convert the drive letter into a one-letter directory, so that the
> +    file name resulting from splicing below will be valid.  */
> +  std::string drive;
> +  if (HAS_DRIVE_SPEC (dir_notarget))
> +    {
> +      drive = std::string (1, dir_notarget[0]);
> +      dir_notarget = STRIP_DRIVE_SPEC (dir_notarget);
> +    }
> +  else
> +    drive = "";
> +
>    for (const gdb::unique_xmalloc_ptr<char> &debugdir : debugdir_vec)
>      {
>        debugfile = target_prefix ? "target:" : "";
>        debugfile += debugdir.get ();
>        debugfile += "/";
> +      debugfile += drive;
>        debugfile += dir_notarget;
>        debugfile += debuglink;
>  
>
  
Simon Marchi April 27, 2019, 4:16 p.m. UTC | #2
On 2019-04-22 5:19 a.m., Eli Zaretskii wrote:
>> From: Simon Marchi <simark@simark.ca>
>> Date: Sun, 21 Apr 2019 08:55:07 -0400
>>
>> On 2019-04-21 8:05 a.m., Eli Zaretskii wrote:
>>> Ping!
>>>
>>> Would people please voice their opinions regarding preservation of the
>>> drive letter (and removing the colon) vs just dropping the drive
>>> letter altogether?  I'd like to fix this issue for the upcoming
>>> release of GDB 8.3.
>>
>> I first read your patch (which initiated this thread), and my first reaction was
>> also to be surprised that we would lose the drive letter.  The risk of clash
>> is low, but why take that risk at all when it's easy enough to keep the drive letter
>> and just strip the colon?
> 
> OK, so how about the patch below?
> 
>> I don't know the complete history of this, so if you know about distributions that
>> place debug files in a path without the drive letter (what your patch implements),
>> then I would be fine if GDB looked in both places, starting with the path including
>> the drive letter.
> 
> I'm not aware of any Windows distributions that put debug info in the
> directory suggested by my previous patch.  So I think we can safely
> use just the /X/ subdirectory method.
> 
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index a3a5f3e..533a52c 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -19917,7 +19917,11 @@
>  the directory of the executable file, then in a subdirectory of that
>  directory named @file{.debug}, and finally under each one of the global debug
>  directories, in a subdirectory whose name is identical to the leading
> -directories of the executable's absolute file name.
> +directories of the executable's absolute file name.  (On MS-Windows,
> +the drive letter of the executable's leading directories is converted
> +to a one-letter subdirectory, i.e.@: @file{d:/usr/bin/} is converted
> +to @file{/d/usr/bin/}, because Windows filesystems disallow colons in
> +file names.)
>  
>  @item
>  For the ``build ID'' method, @value{GDBN} looks in the
> diff --git a/gdb/symfile.c b/gdb/symfile.c
> index 5736666..5749c61 100644
> --- a/gdb/symfile.c
> +++ b/gdb/symfile.c
> @@ -1443,11 +1443,24 @@ find_separate_debug_file (const char *dir,
>      = dirnames_to_char_ptr_vec (debug_file_directory);
>    gdb::unique_xmalloc_ptr<char> canon_sysroot = gdb_realpath (gdb_sysroot);
>  
> + /* MS-Windows/MS-DOS don't allow colons in file names; we must
> +    convert the drive letter into a one-letter directory, so that the
> +    file name resulting from splicing below will be valid.  */
> +  std::string drive;
> +  if (HAS_DRIVE_SPEC (dir_notarget))

I think we should consider the case where we build GDB on GNU/Linux, to remotely
debug a Windows program.  When building on GNU/Linux, HAS_DRIVE_SPEC always return false,
since it's defined as (see include/filenames.h):

  #define HAS_DRIVE_SPEC(f) (0)

Let's suppose DEBUGDIR is "D:/my/debug", DIR is "target:E:/the/directory/" and DEBUGLINK
is "program.debug".  On GNU/Linux, we would build the path

  target:D:/my/debug/E:/the/directory/program.debug

And I suppose that the "E:" would result in the debug file not being found.

So I think we should be using HAS_DRIVE_SPEC_1, which allows us to do the same check.  We
just need to pass to the macro whether the target filesystem id DOS-based.  The only problem
is, how do we know whether the target filesystem is DOS-based?  We wouldn't want
HAS_DRIVE_SPEC_1 to do this stripping erroneously when debugging natively on GNU/Linux...

If we can't find a practical solution to this, then I would say it's better to keep using
HAS_DRIVE_SPEC, and it will be a known deficiency for the moment that it is wrong when debugging
remotely from GNU/Linux to Windows.  It wouldn't be a regression, as it is already wrong today).
But you will have fixed it for the native Windows case, which is a net gain.

> +    {
> +      drive = std::string (1, dir_notarget[0]);

This should be equivalent to:

    drive = dir_notarget[0];

> +      dir_notarget = STRIP_DRIVE_SPEC (dir_notarget);
> +    }
> +  else
> +    drive = "";

The else clause is unnecessary, as drive is already the empty string;

So you could reduce it to:

  std::string drive;
  if (HAS_DRIVE_SPEC (dir_notarget))
    {
      drive = dir_notarget[0];
      dir_notarget = STRIP_DRIVE_SPEC(dir_notarget);
    }

Simon
  
Eli Zaretskii April 27, 2019, 4:39 p.m. UTC | #3
> Cc: gdb-patches@sourceware.org
> From: Simon Marchi <simark@simark.ca>
> Date: Sat, 27 Apr 2019 12:16:23 -0400
> 
> I think we should consider the case where we build GDB on GNU/Linux, to remotely
> debug a Windows program.  When building on GNU/Linux, HAS_DRIVE_SPEC always return false,
> since it's defined as (see include/filenames.h):
> 
>   #define HAS_DRIVE_SPEC(f) (0)
> 
> Let's suppose DEBUGDIR is "D:/my/debug", DIR is "target:E:/the/directory/" and DEBUGLINK
> is "program.debug".  On GNU/Linux, we would build the path
> 
>   target:D:/my/debug/E:/the/directory/program.debug
> 
> And I suppose that the "E:" would result in the debug file not being found.

When debugging remotely, is the debug info on a Windows or on a
GNU/Linux filesystem?  If the latter, the above will work.  I always
thought that in remote debugging, GDB itself runs on the local host,
i.e. on GNU/Linux in this case, and the part that runs on the remote
is gdbserver.  Isn't that correct?

> So I think we should be using HAS_DRIVE_SPEC_1, which allows us to do the same check.  We
> just need to pass to the macro whether the target filesystem id DOS-based.  The only problem
> is, how do we know whether the target filesystem is DOS-based?  We wouldn't want
> HAS_DRIVE_SPEC_1 to do this stripping erroneously when debugging natively on GNU/Linux...

But if we do that, how do we distinguish between the use case you
describe above and a use case where the we debug locally and the file
names just happen to include colons?  Are we willing to restrict file
names on GNU/Linux to support the remote debugging on Windows?

Thanks for the other comments, I will implement them when we agree
about the above issue.
  
Simon Marchi April 27, 2019, 6:56 p.m. UTC | #4
On 2019-04-27 12:39, Eli Zaretskii wrote:
>> Cc: gdb-patches@sourceware.org
>> From: Simon Marchi <simark@simark.ca>
>> Date: Sat, 27 Apr 2019 12:16:23 -0400
>> 
>> I think we should consider the case where we build GDB on GNU/Linux, 
>> to remotely
>> debug a Windows program.  When building on GNU/Linux, HAS_DRIVE_SPEC 
>> always return false,
>> since it's defined as (see include/filenames.h):
>> 
>>   #define HAS_DRIVE_SPEC(f) (0)
>> 
>> Let's suppose DEBUGDIR is "D:/my/debug", DIR is 
>> "target:E:/the/directory/" and DEBUGLINK
>> is "program.debug".  On GNU/Linux, we would build the path
>> 
>>   target:D:/my/debug/E:/the/directory/program.debug
>> 
>> And I suppose that the "E:" would result in the debug file not being 
>> found.
> 
> When debugging remotely, is the debug info on a Windows or on a
> GNU/Linux filesystem?  If the latter, the above will work.  I always
> thought that in remote debugging, GDB itself runs on the local host,
> i.e. on GNU/Linux in this case, and the part that runs on the remote
> is gdbserver.  Isn't that correct?

The latter part is correct, gdbserver would run on the Windows machine, 
while GDB would run on the local GNU/Linux machine.

But the debug info can come from either places, depending on the setup.  
If the executable read by GDB comes from the "target:" filesystem, we 
will search for the separate debug files on the target: filesystem 
(hence all this complexity with target_prefix in this function).  If you 
are debugging remotely, the target: filesystem represents the remote 
filesystem, which GDB can read through requests serviced by gdbserver.  
If you have a local copy of the remote filesystem (a sysroot) and made 
"set sysroot" point to it, then GDB will read from there.

If you are debugging locally on Windows, you want the path to end up 
like:

   target:D:/my/debug/E/the/directory/program.debug

If you are debugging remotely your Windows from another Windows, using 
"set sysroot target:" (the default), you also want the path to end up 
like

   target:D:/my/debug/E/the/directory/program.debug

If you are debugging remotely from a Windows machine to another Windows 
machine, but installed a sysroot of the debugged machine in F:/sysroot, 
then I guess you'll want the path to end up like

   F:/sysroot/D/my/debug/E/the/directory/program.debug

So maybe we would want to apply the same treatment to the sysroot drive 
letter?

If you are debugging remotely your Windows from GNU/Linux, using "set 
sysroot target:", we would want the path to end up like

   target:D:/my/debug/E/the/directory/program.debug

Since the target filesystem wouldn't support a "E:" directory.

If you are debugging remotely your Windows from GNU/Linux, but installed 
a sysroot of the debugged machine in /sysroot, then either of these 
could work:

   /sysroot/D:/my/debug/E:/the/directory/program.debug
   /sysroot/D:/my/debug/E/the/directory/program.debug
   /sysroot/D/my/debug/E:/the/directory/program.debug
   /sysroot/D/my/debug/E/the/directory/program.debug

This leads me to say that for simplicity, when we convert a Windows 
drive letter into a directory component, we should always convert it to 
the letter without the colon.  This way, whatever platform GDB itself 
runs on, the searched path will be the same and predictable.  Also, if 
you were to copy D:/my/debug over to your GNU/Linux machine, you 
wouldn't have to rename the "E" directory to "E:".


? I would suggest the first one,

If you are debugging remotely from GNU/Linux and installed a sysroot of 
your Windows machine in /sysroot, then I suppose you would want the path 
to end up like:

   /sysroot/D:/my/debug/E/the/directory/program.debug

?  Or

   /sysroot/D/my/debug/E/the/directory/program.debug

?

In this last example, the file is lookup up on the GNU/Linux filesystem, 
so the path technically could include the E:.  However, since the

So if you are using a local sysroot of your
In the first case since you are reading from the GNU/Linux filesystem, 
so colons would be supported.  But in the latter case, they wouldn't.

One use case to consider is: you start debugging your Windows machine 
from your GNU/Linux machine, we do strip the colon, such that the 
separate debug info is searched at:

   target:D:/my/debug/E/the/directory/program.debug

and it is found.  However, it is quite slow, because everytime you start 
debugging, GDB needs to download this information.  So you make a 
sysroot of your Windows machine on your GNU/Linux machine, at /sysroot, 
such that you have locally:

   /sysroot/E:/the/directory/program.exe
   /sysroot/D:/my/debug/E/the/directory/program.debug

Then, you point GDB to the executable in your local sysroot, and it 
should At this point, the local filesystem

> 
>> So I think we should be using HAS_DRIVE_SPEC_1, which allows us to do 
>> the same check.  We
>> just need to pass to the macro whether the target filesystem id 
>> DOS-based.  The only problem
>> is, how do we know whether the target filesystem is DOS-based?  We 
>> wouldn't want
>> HAS_DRIVE_SPEC_1 to do this stripping erroneously when debugging 
>> natively on GNU/Linux...
> 
> But if we do that, how do we distinguish between the use case you
> describe above and a use case where the we debug locally and the file
> names just happen to include colons?  Are we willing to restrict file
> names on GNU/Linux to support the remote debugging on Windows?
> 
> Thanks for the other comments, I will implement them when we agree
> about the above issue.
On 2019-04-27 12:39, Eli Zaretskii wrote:
>> Cc: gdb-patches@sourceware.org
>> From: Simon Marchi <simark@simark.ca>
>> Date: Sat, 27 Apr 2019 12:16:23 -0400
>> 
>> I think we should consider the case where we build GDB on GNU/Linux, 
>> to remotely
>> debug a Windows program.  When building on GNU/Linux, HAS_DRIVE_SPEC 
>> always return false,
>> since it's defined as (see include/filenames.h):
>> 
>>   #define HAS_DRIVE_SPEC(f) (0)
>> 
>> Let's suppose DEBUGDIR is "D:/my/debug", DIR is 
>> "target:E:/the/directory/" and DEBUGLINK
>> is "program.debug".  On GNU/Linux, we would build the path
>> 
>>   target:D:/my/debug/E:/the/directory/program.debug
>> 
>> And I suppose that the "E:" would result in the debug file not being 
>> found.
> 
> When debugging remotely, is the debug info on a Windows or on a
> GNU/Linux filesystem?  If the latter, the above will work.  I always
> thought that in remote debugging, GDB itself runs on the local host,
> i.e. on GNU/Linux in this case, and the part that runs on the remote
> is gdbserver.  Isn't that correct?

The latter part is correct, gdbserver would run on the Windows machine, 
while GDB would run on the local GNU/Linux machine.

But the debug info can come from either places, depending on the setup.  
If the executable read by GDB comes from the "target:" filesystem, we 
will search for the separate debug files on the target: filesystem 
(hence all this complexity with target_prefix in this function).  If you 
are debugging remotely, the target: filesystem represents the remote 
filesystem, which GDB can read through requests serviced by gdbserver.  
If you have a local copy of the remote filesystem (a sysroot) and made 
"set sysroot" point to it, then GDB will read from there.

I tried to think about different scenarios, it brought more questions 
than answers, but here it is.

If you are debugging locally on Windows, you want the path to end up 
like:

   target:D:/my/debug/E/the/directory/program.debug

If you are debugging remotely your Windows from another Windows, using 
"set sysroot target:" (the default), you also want the path to end up 
like

   target:D:/my/debug/E/the/directory/program.debug

If you are debugging remotely from a Windows machine to another Windows 
machine, but installed a sysroot of the debugged machine in F:/sysroot, 
then I guess you'll want the path to end up like

   F:/sysroot/D/my/debug/E/the/directory/program.debug

We couldn't have a "D:" directory, so maybe we would want to apply the 
same treatment to the sysroot drive letter?

If you are debugging remotely your Windows from GNU/Linux, using "set 
sysroot target:", we would want the path to end up like

   target:D:/my/debug/E/the/directory/program.debug

... since the target filesystem wouldn't support a "E:" directory.

If you are debugging remotely your Windows from GNU/Linux, but installed 
a sysroot of the debugged machine in /sysroot, then either of these 
could work:

   /sysroot/D:/my/debug/E:/the/directory/program.debug
   /sysroot/D:/my/debug/E/the/directory/program.debug
   /sysroot/D/my/debug/E:/the/directory/program.debug
   /sysroot/D/my/debug/E/the/directory/program.debug

This leads me to say that for simplicity, when we convert a Windows 
drive letter into a directory component, we should always convert it to 
the letter without the colon.  This way, whatever platform GDB itself 
runs on, the searched path will be the same and predictable.  Also, if 
you were to copy D:/my/debug/E/the/directory/program.debug over to your 
sysroot on the GNU/Linux machine, you wouldn't have to rename the "E" 
directory to "E:" for it to be found.

Now, the question remains: how do we know for sure a path is a Windows 
path, in which we need to strip the colon?  I don't know.

Given the complexity of supporting all these scenarios (if we want to 
support them, we at least want to test them, which is a lot of work) and 
the fact that they are all theoretical at this point, I would be fine to 
go with what you had in your patch (using HAS_DRIVE_SPEC).  If somebody 
ever needs to remotely debug a Windows machine from a Linux machine and 
have GDB find separate debug file in the remote debug file directory... 
then they can add the missing pieces :).

Simon
  
Eli Zaretskii April 27, 2019, 7:04 p.m. UTC | #5
> Date: Sat, 27 Apr 2019 14:56:11 -0400
> From: Simon Marchi <simark@simark.ca>
> Cc: gdb-patches@sourceware.org
> 
> Given the complexity of supporting all these scenarios (if we want to 
> support them, we at least want to test them, which is a lot of work) and 
> the fact that they are all theoretical at this point, I would be fine to 
> go with what you had in your patch (using HAS_DRIVE_SPEC).  If somebody 
> ever needs to remotely debug a Windows machine from a Linux machine and 
> have GDB find separate debug file in the remote debug file directory... 
> then they can add the missing pieces :).

OK, thanks.  I will wait for a day or two for any other comments, and
if nothing comes up, will push with a FIXME comment regarding the
remote debugging use cases.
  
Eli Zaretskii May 3, 2019, 7:35 a.m. UTC | #6
> Date: Sat, 27 Apr 2019 22:04:37 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> CC: gdb-patches@sourceware.org
> 
> OK, thanks.  I will wait for a day or two for any other comments, and
> if nothing comes up, will push with a FIXME comment regarding the
> remote debugging use cases.

Done.
  

Patch

diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index a3a5f3e..533a52c 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -19917,7 +19917,11 @@ 
 the directory of the executable file, then in a subdirectory of that
 directory named @file{.debug}, and finally under each one of the global debug
 directories, in a subdirectory whose name is identical to the leading
-directories of the executable's absolute file name.
+directories of the executable's absolute file name.  (On MS-Windows,
+the drive letter of the executable's leading directories is converted
+to a one-letter subdirectory, i.e.@: @file{d:/usr/bin/} is converted
+to @file{/d/usr/bin/}, because Windows filesystems disallow colons in
+file names.)
 
 @item
 For the ``build ID'' method, @value{GDBN} looks in the
diff --git a/gdb/symfile.c b/gdb/symfile.c
index 5736666..5749c61 100644
--- a/gdb/symfile.c
+++ b/gdb/symfile.c
@@ -1443,11 +1443,24 @@  find_separate_debug_file (const char *dir,
     = dirnames_to_char_ptr_vec (debug_file_directory);
   gdb::unique_xmalloc_ptr<char> canon_sysroot = gdb_realpath (gdb_sysroot);
 
+ /* MS-Windows/MS-DOS don't allow colons in file names; we must
+    convert the drive letter into a one-letter directory, so that the
+    file name resulting from splicing below will be valid.  */
+  std::string drive;
+  if (HAS_DRIVE_SPEC (dir_notarget))
+    {
+      drive = std::string (1, dir_notarget[0]);
+      dir_notarget = STRIP_DRIVE_SPEC (dir_notarget);
+    }
+  else
+    drive = "";
+
   for (const gdb::unique_xmalloc_ptr<char> &debugdir : debugdir_vec)
     {
       debugfile = target_prefix ? "target:" : "";
       debugfile += debugdir.get ();
       debugfile += "/";
+      debugfile += drive;
       debugfile += dir_notarget;
       debugfile += debuglink;