RFC: PR 31761: Stop the linker from overwriting source files

Message ID 87cyoeb2sm.fsf@redhat.com
State New
Headers
Series RFC: PR 31761: Stop the linker from overwriting source files |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Nick Clifton June 18, 2024, 5:04 p.m. UTC
  Hi Guys,

  Attached is a proposed patch to address one of the issues raised in
  PR 31761 - namely that the linker will happily overwrite a source file
  if it was invoked by mistake.  eg:

    ld -o foo.c foo.o

  The patch does not change the linker's behaviour of deleting the
  output file before writing to it, even if errors occur during the
  link.  I still consider this to be the correct behaviour.

  With the patch applied the linker will now behave like this:

    $ touch foo.c foo.o
    $ ld -o foo.c foo.o
    ld: fatal: output file 'foo.c' exists and is a known input file type
    You can suppress this error by linking with --noinhibit-exec
    $ echo $?
    1

  Any comments or suggestions ?

Cheers
  Nick
  

Comments

Jonathan Wakely June 18, 2024, 5:21 p.m. UTC | #1
On 18/06/24 18:04 +0100, Nick Clifton wrote:
>Hi Guys,
>
>  Attached is a proposed patch to address one of the issues raised in
>  PR 31761 - namely that the linker will happily overwrite a source file
>  if it was invoked by mistake.  eg:
>
>    ld -o foo.c foo.o
>
>  The patch does not change the linker's behaviour of deleting the
>  output file before writing to it, even if errors occur during the
>  link.  I still consider this to be the correct behaviour.
>
>  With the patch applied the linker will now behave like this:
>
>    $ touch foo.c foo.o
>    $ ld -o foo.c foo.o
>    ld: fatal: output file 'foo.c' exists and is a known input file type
>    You can suppress this error by linking with --noinhibit-exec

I see it already exists, but that option name looks odd to me. I would
expect it to be --no-inhibit-exec.

>    $ echo $?
>    1
>
>  Any comments or suggestions ?

Is .so a valid extension for both input and output files? So this
can't prevent overwriting those ... but that's probably OK.

>Cheers
>  Nick
>

>diff --git a/ld/ld.texi b/ld/ld.texi
>index cc0a37d44ad..537189dbf21 100644
>--- a/ld/ld.texi
>+++ b/ld/ld.texi
>@@ -994,6 +994,14 @@ Use @var{output} as the name for the program produced by @command{ld}; if this
> option is not specified, the name @file{a.out} is used by default.  The
> script command @code{OUTPUT} can also specify the output file name.
> 
>+Note - the linker will refuse to create an output file if its name
>+matches the name of any of the input files.
>+
>+In addition the linker will refuse to create an output file if it
>+already exists and its name has a file extension that matches a known
>+source file or input file extension.  eg .c or .o.  This behaviour can
>+be suppressed by using the @option{--noinhibit-exec} option.
>+
> @kindex --dependency-file=@var{depfile}
> @cindex dependency file
> @item --dependency-file=@var{depfile}
>@@ -2303,9 +2311,22 @@ archive files.
> @kindex --noinhibit-exec
> @item --noinhibit-exec
> Retain the executable output file whenever it is still usable.
>-Normally, the linker will not produce an output file if it encounters
>-errors during the link process; it exits without writing an output file
>-when it issues any error whatsoever.
>+Normally, the linker will only produce an output file if no errors
>+occurred during the linking process.  This option tells the linker to
>+produce the output file if at all possible, even if there were errors.
>+
>+Note - the output file is always deleted before it is written to.
>+This happens even if there are errors during the linking process.
>+This is done in order to prevent a user from running an old executable
>+thinking that it is the result of the latest invocation of the linker.

Ah, I see - that's a reasonable rationale for always removing it. But
that guarantee already doesn't hold with other linkers, and now
doesn't hold without --noinhibit-exec, right?

That is, if you are a strange person who wants to use foo.ld output
file, and you don't use --noinhibit-exec then the linker could exit,
and the output file won't have been changed so could be the old
version. So the new default behaviour seems to conflict with the
desire to always remove the output, even if an error occurs.

>+In addition, if this option is enabled the linker will not test to see
>+if the output file exists and its name includes an extension which is
>+a known sequence used by source or input files.  For example the
>+linker will normally refuse to overwrite an output file called
>+@option{foo.c}, since doing so would probably be a mistake.  But if
>+the @option{--noinhibit-exec} option is used then the overwritting

s/overwritting/overwriting/

>+will be allowed.
> 
> @kindex -nostdlib
> @item -nostdlib
>diff --git a/ld/ldlang.c b/ld/ldlang.c
>index 9e8cc224f4d..636c4b8e799 100644
>--- a/ld/ldlang.c
>+++ b/ld/ldlang.c
>@@ -3356,6 +3356,42 @@ lang_get_output_target (void)
>   return default_target;
> }
> 
>+/* Returns TRUE if NAME has an extension that is known to be used
>+   by typical input files to the linker, or input files to a build
>+   system that eventually ends up invoking the linker.  */
>+
>+static bool
>+has_known_file_extension (const char * name)
>+{
>+  if (name == NULL)
>+    return false;
>+
>+  size_t namelen = strlen (name);
>+
>+  if (namelen < 3)
>+    return false;
>+
>+  /* FIXME: We could optimize this lookup a lot.  But it is coded
>+     this way for simplicity and ease of extension.  */
>+  static const char * extensions[] =
>+    {
>+      ".c", ".h", ".s", ".C", ".S", ".o", ".t",
>+      ".ld",
>+      ".cpp", ".cxx"
>+    };

I'd add .cc for C++ files, as used in GCC.

>+
>+  size_t i;
>+  for (i = 0; i < ARRAY_SIZE (extensions); i++)
>+    {
>+      size_t len = strlen (extensions[i]);
>+
>+      if (strcmp (name + namelen - len, extensions[i]) == 0)
>+	return true;
>+    }
>+
>+  return false;
>+}
>+
> /* Open the output file.  */
> 
> static void
>@@ -3371,12 +3407,27 @@ open_output (const char *name)
>       {
> 	char *in = lrealpath (f->local_sym_name);
> 	if (filename_cmp (in, out) == 0)
>-	  einfo (_("%F%P: input file '%s' is the same as output file\n"),
>+	  einfo (_("%F%P: fatal: input file '%s' is the same as output file\n"),
> 		 f->filename);
> 	free (in);
>       }
>   free (out);
> 
>+  if (! force_make_executable)
>+    {
>+      /* PR 31761: Try to catch the case where the output file name is
>+	 probably a mistake.  eg a source file foo.c or a script file
>+	 foo.t.  We cannot catch every case of course, but we can try
>+	 to be a little bit paranoid.  */
>+      struct stat s;
>+
>+      if (lstat (name, &s) == 0
>+	  && S_ISREG (s.st_mode)
>+	  && has_known_file_extension (name))
>+	einfo (_("%F%P: fatal: output file '%s' exists and is a known input file type\n\
>+You can suppress this error by linking with --noinhibit-exec\n"), name);
>+    }
>+
>   output_target = lang_get_output_target ();
> 
>   /* Has the user requested a particular endianness on the command
  
Andreas Schwab June 18, 2024, 5:23 p.m. UTC | #2
On Jun 18 2024, Nick Clifton wrote:

> +/* Returns TRUE if NAME has an extension that is known to be used
> +   by typical input files to the linker, or input files to a build
> +   system that eventually ends up invoking the linker.  */
> +
> +static bool
> +has_known_file_extension (const char * name)
> +{
> +  if (name == NULL)
> +    return false;
> +
> +  size_t namelen = strlen (name);
> +
> +  if (namelen < 3)
> +    return false;
> +
> +  /* FIXME: We could optimize this lookup a lot.  But it is coded
> +     this way for simplicity and ease of extension.  */
> +  static const char * extensions[] =
> +    {
> +      ".c", ".h", ".s", ".C", ".S", ".o", ".t",
> +      ".ld",
> +      ".cpp", ".cxx"
> +    };
> +
> +  size_t i;
> +  for (i = 0; i < ARRAY_SIZE (extensions); i++)
> +    {
> +      size_t len = strlen (extensions[i]);
> +
> +      if (strcmp (name + namelen - len, extensions[i]) == 0)

This can be undefined if len > 3.
  
Peter0x44 June 18, 2024, 5:55 p.m. UTC | #3
On 2024-06-18 18:04, Nick Clifton wrote:
> Hi Guys,
> 
>   Attached is a proposed patch to address one of the issues raised in
>   PR 31761 - namely that the linker will happily overwrite a source 
> file
>   if it was invoked by mistake.  eg:
> 
>     ld -o foo.c foo.o
> 
>   The patch does not change the linker's behaviour of deleting the
>   output file before writing to it, even if errors occur during the
>   link.  I still consider this to be the correct behaviour.
> 
>   With the patch applied the linker will now behave like this:
> 
>     $ touch foo.c foo.o
>     $ ld -o foo.c foo.o
>     ld: fatal: output file 'foo.c' exists and is a known input file 
> type
>     You can suppress this error by linking with --noinhibit-exec
>     $ echo $?
>     1
> 
>   Any comments or suggestions ?

This patch is missing .cc at a minimum (GCC uses this extension).
This approach won't help users of other languages (Fortran, ada, etc).

Are you really certain there is a good reason to retain that previous 
behavior?
Surely the experience of the consequences of not looking at your 
compiler exit status are "useful lessons" too... and far less 
destructive ones.

Best wishes,
Peter D.

> Cheers
>   Nick
  
Georg-Johann Lay June 18, 2024, 5:56 p.m. UTC | #4
Am 18.06.24 um 19:04 schrieb Nick Clifton:
> Hi Guys,
> 
>    Attached is a proposed patch to address one of the issues raised in
>    PR 31761 - namely that the linker will happily overwrite a source file
>    if it was invoked by mistake.  eg:
> 
>      ld -o foo.c foo.o
> 
>    The patch does not change the linker's behaviour of deleting the
>    output file before writing to it, even if errors occur during the
>    link.  I still consider this to be the correct behaviour.
> 
>    With the patch applied the linker will now behave like this:
> 
>      $ touch foo.c foo.o
>      $ ld -o foo.c foo.o
>      ld: fatal: output file 'foo.c' exists and is a known input file type
>      You can suppress this error by linking with --noinhibit-exec
>      $ echo $?
>      1
> 
>    Any comments or suggestions ?
> 
> Cheers
>    Nick

> +static bool
> +has_known_file_extension (const char * name)
> +{
> +  if (name == NULL)
> +    return false;
> +
> +  size_t namelen = strlen (name);
> +
> +  if (namelen < 3)
> +    return false;
> +
> +  /* FIXME: We could optimize this lookup a lot.  But it is coded
> +     this way for simplicity and ease of extension.  */
> +  static const char * extensions[] =
> +    {
> +      ".c", ".h", ".s", ".C", ".S", ".o", ".t",
> +      ".ld",
> +      ".cpp", ".cxx"
> +    };


GCC support much more file extensions like .sx, .CPP, .cc (GCC's own C++
sources), .f, .for, .f90, ...

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=d80b604a48dfacd09007c9b19c63e6851b7b4087;hb=HEAD#l1418

And maybe sort them according to the recognized language (and
mention the language) like C++.

Johann
  
Jan Beulich June 19, 2024, 6:49 a.m. UTC | #5
On 18.06.2024 19:04, Nick Clifton wrote:
> Hi Guys,
> 
>   Attached is a proposed patch to address one of the issues raised in
>   PR 31761 - namely that the linker will happily overwrite a source file
>   if it was invoked by mistake.  eg:
> 
>     ld -o foo.c foo.o
> 
>   The patch does not change the linker's behaviour of deleting the
>   output file before writing to it, even if errors occur during the
>   link.  I still consider this to be the correct behaviour.
> 
>   With the patch applied the linker will now behave like this:
> 
>     $ touch foo.c foo.o
>     $ ld -o foo.c foo.o
>     ld: fatal: output file 'foo.c' exists and is a known input file type
>     You can suppress this error by linking with --noinhibit-exec
>     $ echo $?
>     1
> 
>   Any comments or suggestions ?

As others have indicated, the list of known file types would need extending.
Yet the wider it gets, the higher the risk of refusing something that makes
sense to someone, yet conflicts with (say) a language that this someone
doesn't even know about. IOW my view is that there better wouldn't be such
heuristics in ld. Instead (looking at the PR) we may want to see about
delaying the creation of the output file until we've recognized all inputs
(a typical source file would not pass input type checking).

Further, how does .o being in the list fit with "ld -r" wanting to write a
.o in the common case?

Jan
  
Alan Modra June 20, 2024, 12:16 a.m. UTC | #6
On Tue, Jun 18, 2024 at 06:04:25PM +0100, Nick Clifton wrote:
> Hi Guys,
> 
>   Attached is a proposed patch to address one of the issues raised in
>   PR 31761 - namely that the linker will happily overwrite a source file
>   if it was invoked by mistake.  eg:
> 
>     ld -o foo.c foo.o
> 
>   The patch does not change the linker's behaviour of deleting the
>   output file before writing to it, even if errors occur during the
>   link.  I still consider this to be the correct behaviour.
> 
>   With the patch applied the linker will now behave like this:
> 
>     $ touch foo.c foo.o
>     $ ld -o foo.c foo.o
>     ld: fatal: output file 'foo.c' exists and is a known input file type
>     You can suppress this error by linking with --noinhibit-exec
>     $ echo $?
>     1
> 
>   Any comments or suggestions ?

I think your instincts about this PR as shown by your initial comments
in the PR were correct.  I don't think ld should restrict the file
name given on its command line in any way, except for sanity checking
against other command line arguments.  Trying to make things foolproof
always fails.
  
Simon Richter June 20, 2024, 6:14 a.m. UTC | #7
Hi,

On 6/20/24 09:16, Alan Modra wrote:

> I think your instincts about this PR as shown by your initial comments
> in the PR were correct.  I don't think ld should restrict the file
> name given on its command line in any way, except for sanity checking
> against other command line arguments.  Trying to make things foolproof
> always fails.

FWIW I have accidentally overwritten source files, but always in the 
context of "gcc -o <tab> <space> <tab>" when I meant to edit the output 
name first. The "ld -o foo.c foo.o" case is a lot more difficult to 
reach, except through the gcc frontend.

What makes sense is checking that input and output files are different 
(which ld does), and maybe the gcc frontend could get a check like that 
as well.

    Simon
  
Nick Clifton June 20, 2024, 10:13 a.m. UTC | #8
Hi Guys,

 > Alan Modra wrote:
> I think your instincts about this PR as shown by your initial comments
> in the PR were correct.  I don't think ld should restrict the file
> name given on its command line in any way, except for sanity checking
> against other command line arguments.  Trying to make things foolproof
> always fails.

You know what - you are right.  It is a fool's game trying to stop people
from being fools.

Instead I think that the best thing to do would be to document the current
behaviour.  So attached is an alternative patch that updates the description
of the -o option.  With the patch applied the text now reads like this:

   '-o OUTPUT'
   '--output=OUTPUT'
      Use OUTPUT as the name for the program produced by 'ld'; if this
      option is not specified, the name 'a.out' is used by default.  The
      script command 'OUTPUT' can also specify the output file name.

      Note - the linker will delete the output file before it starts to
      write to it.  It will do this even if it turns out that the link
      cannot be completed due to errors.

      Note - the linker will check to make sure that the output file name
      does not match the name of any of the input files, but that is all.
      In particular it will not complain if the output file might
      overwrite a source file or some other important file.  Therefore in
      build systems it is recommended to use the '-o' option as the last
      option on the linker command line.  For example consider:

             ld -o $(EXE) $(OBJS)
             ld $(OBJS) -o $(EXE)

      If the 'EXE' variable is not defined for some reason, the first
      version of the linker command could end up deleting one of the
      object files (the first one in the 'OBJS' list) whereas the second
      version of the linker command will generate an error message and
      not delete anything.

   What do people think ?

Cheers
   Nick
  
Jan Beulich June 20, 2024, 10:19 a.m. UTC | #9
On 20.06.2024 12:13, Nick Clifton wrote:
> Hi Guys,
> 
>  > Alan Modra wrote:
>> I think your instincts about this PR as shown by your initial comments
>> in the PR were correct.  I don't think ld should restrict the file
>> name given on its command line in any way, except for sanity checking
>> against other command line arguments.  Trying to make things foolproof
>> always fails.
> 
> You know what - you are right.  It is a fool's game trying to stop people
> from being fools.
> 
> Instead I think that the best thing to do would be to document the current
> behaviour.  So attached is an alternative patch that updates the description
> of the -o option.  With the patch applied the text now reads like this:
> 
>    '-o OUTPUT'
>    '--output=OUTPUT'
>       Use OUTPUT as the name for the program produced by 'ld'; if this
>       option is not specified, the name 'a.out' is used by default.  The
>       script command 'OUTPUT' can also specify the output file name.
> 
>       Note - the linker will delete the output file before it starts to
>       write to it.  It will do this even if it turns out that the link
>       cannot be completed due to errors.
> 
>       Note - the linker will check to make sure that the output file name
>       does not match the name of any of the input files, but that is all.
>       In particular it will not complain if the output file might
>       overwrite a source file or some other important file.  Therefore in
>       build systems it is recommended to use the '-o' option as the last
>       option on the linker command line.  For example consider:
> 
>              ld -o $(EXE) $(OBJS)
>              ld $(OBJS) -o $(EXE)
> 
>       If the 'EXE' variable is not defined for some reason, the first
>       version of the linker command could end up deleting one of the
>       object files (the first one in the 'OBJS' list) whereas the second
>       version of the linker command will generate an error message and
>       not delete anything.
> 
>    What do people think ?

Lgtm, fwiw.

Jan
  

Patch

diff --git a/ld/ld.texi b/ld/ld.texi
index cc0a37d44ad..537189dbf21 100644
--- a/ld/ld.texi
+++ b/ld/ld.texi
@@ -994,6 +994,14 @@  Use @var{output} as the name for the program produced by @command{ld}; if this
 option is not specified, the name @file{a.out} is used by default.  The
 script command @code{OUTPUT} can also specify the output file name.
 
+Note - the linker will refuse to create an output file if its name
+matches the name of any of the input files.
+
+In addition the linker will refuse to create an output file if it
+already exists and its name has a file extension that matches a known
+source file or input file extension.  eg .c or .o.  This behaviour can
+be suppressed by using the @option{--noinhibit-exec} option.
+
 @kindex --dependency-file=@var{depfile}
 @cindex dependency file
 @item --dependency-file=@var{depfile}
@@ -2303,9 +2311,22 @@  archive files.
 @kindex --noinhibit-exec
 @item --noinhibit-exec
 Retain the executable output file whenever it is still usable.
-Normally, the linker will not produce an output file if it encounters
-errors during the link process; it exits without writing an output file
-when it issues any error whatsoever.
+Normally, the linker will only produce an output file if no errors
+occurred during the linking process.  This option tells the linker to
+produce the output file if at all possible, even if there were errors.
+
+Note - the output file is always deleted before it is written to.
+This happens even if there are errors during the linking process.
+This is done in order to prevent a user from running an old executable
+thinking that it is the result of the latest invocation of the linker.
+
+In addition, if this option is enabled the linker will not test to see
+if the output file exists and its name includes an extension which is
+a known sequence used by source or input files.  For example the
+linker will normally refuse to overwrite an output file called
+@option{foo.c}, since doing so would probably be a mistake.  But if
+the @option{--noinhibit-exec} option is used then the overwritting
+will be allowed.
 
 @kindex -nostdlib
 @item -nostdlib
diff --git a/ld/ldlang.c b/ld/ldlang.c
index 9e8cc224f4d..636c4b8e799 100644
--- a/ld/ldlang.c
+++ b/ld/ldlang.c
@@ -3356,6 +3356,42 @@  lang_get_output_target (void)
   return default_target;
 }
 
+/* Returns TRUE if NAME has an extension that is known to be used
+   by typical input files to the linker, or input files to a build
+   system that eventually ends up invoking the linker.  */
+
+static bool
+has_known_file_extension (const char * name)
+{
+  if (name == NULL)
+    return false;
+
+  size_t namelen = strlen (name);
+
+  if (namelen < 3)
+    return false;
+
+  /* FIXME: We could optimize this lookup a lot.  But it is coded
+     this way for simplicity and ease of extension.  */
+  static const char * extensions[] =
+    {
+      ".c", ".h", ".s", ".C", ".S", ".o", ".t",
+      ".ld",
+      ".cpp", ".cxx"
+    };
+
+  size_t i;
+  for (i = 0; i < ARRAY_SIZE (extensions); i++)
+    {
+      size_t len = strlen (extensions[i]);
+
+      if (strcmp (name + namelen - len, extensions[i]) == 0)
+	return true;
+    }
+
+  return false;
+}
+
 /* Open the output file.  */
 
 static void
@@ -3371,12 +3407,27 @@  open_output (const char *name)
       {
 	char *in = lrealpath (f->local_sym_name);
 	if (filename_cmp (in, out) == 0)
-	  einfo (_("%F%P: input file '%s' is the same as output file\n"),
+	  einfo (_("%F%P: fatal: input file '%s' is the same as output file\n"),
 		 f->filename);
 	free (in);
       }
   free (out);
 
+  if (! force_make_executable)
+    {
+      /* PR 31761: Try to catch the case where the output file name is
+	 probably a mistake.  eg a source file foo.c or a script file
+	 foo.t.  We cannot catch every case of course, but we can try
+	 to be a little bit paranoid.  */
+      struct stat s;
+
+      if (lstat (name, &s) == 0
+	  && S_ISREG (s.st_mode)
+	  && has_known_file_extension (name))
+	einfo (_("%F%P: fatal: output file '%s' exists and is a known input file type\n\
+You can suppress this error by linking with --noinhibit-exec\n"), name);
+    }
+
   output_target = lang_get_output_target ();
 
   /* Has the user requested a particular endianness on the command