RFC: PR 31761: Stop the linker from overwriting source files
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_binutils_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_binutils_check--master-arm |
success
|
Test passed
|
Commit Message
Hi Guys,
Attached is a proposed patch to address one of the issues raised in
PR 31761 - namely that the linker will happily overwrite a source file
if it was invoked by mistake. eg:
ld -o foo.c foo.o
The patch does not change the linker's behaviour of deleting the
output file before writing to it, even if errors occur during the
link. I still consider this to be the correct behaviour.
With the patch applied the linker will now behave like this:
$ touch foo.c foo.o
$ ld -o foo.c foo.o
ld: fatal: output file 'foo.c' exists and is a known input file type
You can suppress this error by linking with --noinhibit-exec
$ echo $?
1
Any comments or suggestions ?
Cheers
Nick
Comments
On 18/06/24 18:04 +0100, Nick Clifton wrote:
>Hi Guys,
>
> Attached is a proposed patch to address one of the issues raised in
> PR 31761 - namely that the linker will happily overwrite a source file
> if it was invoked by mistake. eg:
>
> ld -o foo.c foo.o
>
> The patch does not change the linker's behaviour of deleting the
> output file before writing to it, even if errors occur during the
> link. I still consider this to be the correct behaviour.
>
> With the patch applied the linker will now behave like this:
>
> $ touch foo.c foo.o
> $ ld -o foo.c foo.o
> ld: fatal: output file 'foo.c' exists and is a known input file type
> You can suppress this error by linking with --noinhibit-exec
I see it already exists, but that option name looks odd to me. I would
expect it to be --no-inhibit-exec.
> $ echo $?
> 1
>
> Any comments or suggestions ?
Is .so a valid extension for both input and output files? So this
can't prevent overwriting those ... but that's probably OK.
>Cheers
> Nick
>
>diff --git a/ld/ld.texi b/ld/ld.texi
>index cc0a37d44ad..537189dbf21 100644
>--- a/ld/ld.texi
>+++ b/ld/ld.texi
>@@ -994,6 +994,14 @@ Use @var{output} as the name for the program produced by @command{ld}; if this
> option is not specified, the name @file{a.out} is used by default. The
> script command @code{OUTPUT} can also specify the output file name.
>
>+Note - the linker will refuse to create an output file if its name
>+matches the name of any of the input files.
>+
>+In addition the linker will refuse to create an output file if it
>+already exists and its name has a file extension that matches a known
>+source file or input file extension. eg .c or .o. This behaviour can
>+be suppressed by using the @option{--noinhibit-exec} option.
>+
> @kindex --dependency-file=@var{depfile}
> @cindex dependency file
> @item --dependency-file=@var{depfile}
>@@ -2303,9 +2311,22 @@ archive files.
> @kindex --noinhibit-exec
> @item --noinhibit-exec
> Retain the executable output file whenever it is still usable.
>-Normally, the linker will not produce an output file if it encounters
>-errors during the link process; it exits without writing an output file
>-when it issues any error whatsoever.
>+Normally, the linker will only produce an output file if no errors
>+occurred during the linking process. This option tells the linker to
>+produce the output file if at all possible, even if there were errors.
>+
>+Note - the output file is always deleted before it is written to.
>+This happens even if there are errors during the linking process.
>+This is done in order to prevent a user from running an old executable
>+thinking that it is the result of the latest invocation of the linker.
Ah, I see - that's a reasonable rationale for always removing it. But
that guarantee already doesn't hold with other linkers, and now
doesn't hold without --noinhibit-exec, right?
That is, if you are a strange person who wants to use foo.ld output
file, and you don't use --noinhibit-exec then the linker could exit,
and the output file won't have been changed so could be the old
version. So the new default behaviour seems to conflict with the
desire to always remove the output, even if an error occurs.
>+In addition, if this option is enabled the linker will not test to see
>+if the output file exists and its name includes an extension which is
>+a known sequence used by source or input files. For example the
>+linker will normally refuse to overwrite an output file called
>+@option{foo.c}, since doing so would probably be a mistake. But if
>+the @option{--noinhibit-exec} option is used then the overwritting
s/overwritting/overwriting/
>+will be allowed.
>
> @kindex -nostdlib
> @item -nostdlib
>diff --git a/ld/ldlang.c b/ld/ldlang.c
>index 9e8cc224f4d..636c4b8e799 100644
>--- a/ld/ldlang.c
>+++ b/ld/ldlang.c
>@@ -3356,6 +3356,42 @@ lang_get_output_target (void)
> return default_target;
> }
>
>+/* Returns TRUE if NAME has an extension that is known to be used
>+ by typical input files to the linker, or input files to a build
>+ system that eventually ends up invoking the linker. */
>+
>+static bool
>+has_known_file_extension (const char * name)
>+{
>+ if (name == NULL)
>+ return false;
>+
>+ size_t namelen = strlen (name);
>+
>+ if (namelen < 3)
>+ return false;
>+
>+ /* FIXME: We could optimize this lookup a lot. But it is coded
>+ this way for simplicity and ease of extension. */
>+ static const char * extensions[] =
>+ {
>+ ".c", ".h", ".s", ".C", ".S", ".o", ".t",
>+ ".ld",
>+ ".cpp", ".cxx"
>+ };
I'd add .cc for C++ files, as used in GCC.
>+
>+ size_t i;
>+ for (i = 0; i < ARRAY_SIZE (extensions); i++)
>+ {
>+ size_t len = strlen (extensions[i]);
>+
>+ if (strcmp (name + namelen - len, extensions[i]) == 0)
>+ return true;
>+ }
>+
>+ return false;
>+}
>+
> /* Open the output file. */
>
> static void
>@@ -3371,12 +3407,27 @@ open_output (const char *name)
> {
> char *in = lrealpath (f->local_sym_name);
> if (filename_cmp (in, out) == 0)
>- einfo (_("%F%P: input file '%s' is the same as output file\n"),
>+ einfo (_("%F%P: fatal: input file '%s' is the same as output file\n"),
> f->filename);
> free (in);
> }
> free (out);
>
>+ if (! force_make_executable)
>+ {
>+ /* PR 31761: Try to catch the case where the output file name is
>+ probably a mistake. eg a source file foo.c or a script file
>+ foo.t. We cannot catch every case of course, but we can try
>+ to be a little bit paranoid. */
>+ struct stat s;
>+
>+ if (lstat (name, &s) == 0
>+ && S_ISREG (s.st_mode)
>+ && has_known_file_extension (name))
>+ einfo (_("%F%P: fatal: output file '%s' exists and is a known input file type\n\
>+You can suppress this error by linking with --noinhibit-exec\n"), name);
>+ }
>+
> output_target = lang_get_output_target ();
>
> /* Has the user requested a particular endianness on the command
On Jun 18 2024, Nick Clifton wrote:
> +/* Returns TRUE if NAME has an extension that is known to be used
> + by typical input files to the linker, or input files to a build
> + system that eventually ends up invoking the linker. */
> +
> +static bool
> +has_known_file_extension (const char * name)
> +{
> + if (name == NULL)
> + return false;
> +
> + size_t namelen = strlen (name);
> +
> + if (namelen < 3)
> + return false;
> +
> + /* FIXME: We could optimize this lookup a lot. But it is coded
> + this way for simplicity and ease of extension. */
> + static const char * extensions[] =
> + {
> + ".c", ".h", ".s", ".C", ".S", ".o", ".t",
> + ".ld",
> + ".cpp", ".cxx"
> + };
> +
> + size_t i;
> + for (i = 0; i < ARRAY_SIZE (extensions); i++)
> + {
> + size_t len = strlen (extensions[i]);
> +
> + if (strcmp (name + namelen - len, extensions[i]) == 0)
This can be undefined if len > 3.
On 2024-06-18 18:04, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a proposed patch to address one of the issues raised in
> PR 31761 - namely that the linker will happily overwrite a source
> file
> if it was invoked by mistake. eg:
>
> ld -o foo.c foo.o
>
> The patch does not change the linker's behaviour of deleting the
> output file before writing to it, even if errors occur during the
> link. I still consider this to be the correct behaviour.
>
> With the patch applied the linker will now behave like this:
>
> $ touch foo.c foo.o
> $ ld -o foo.c foo.o
> ld: fatal: output file 'foo.c' exists and is a known input file
> type
> You can suppress this error by linking with --noinhibit-exec
> $ echo $?
> 1
>
> Any comments or suggestions ?
This patch is missing .cc at a minimum (GCC uses this extension).
This approach won't help users of other languages (Fortran, ada, etc).
Are you really certain there is a good reason to retain that previous
behavior?
Surely the experience of the consequences of not looking at your
compiler exit status are "useful lessons" too... and far less
destructive ones.
Best wishes,
Peter D.
> Cheers
> Nick
Am 18.06.24 um 19:04 schrieb Nick Clifton:
> Hi Guys,
>
> Attached is a proposed patch to address one of the issues raised in
> PR 31761 - namely that the linker will happily overwrite a source file
> if it was invoked by mistake. eg:
>
> ld -o foo.c foo.o
>
> The patch does not change the linker's behaviour of deleting the
> output file before writing to it, even if errors occur during the
> link. I still consider this to be the correct behaviour.
>
> With the patch applied the linker will now behave like this:
>
> $ touch foo.c foo.o
> $ ld -o foo.c foo.o
> ld: fatal: output file 'foo.c' exists and is a known input file type
> You can suppress this error by linking with --noinhibit-exec
> $ echo $?
> 1
>
> Any comments or suggestions ?
>
> Cheers
> Nick
> +static bool
> +has_known_file_extension (const char * name)
> +{
> + if (name == NULL)
> + return false;
> +
> + size_t namelen = strlen (name);
> +
> + if (namelen < 3)
> + return false;
> +
> + /* FIXME: We could optimize this lookup a lot. But it is coded
> + this way for simplicity and ease of extension. */
> + static const char * extensions[] =
> + {
> + ".c", ".h", ".s", ".C", ".S", ".o", ".t",
> + ".ld",
> + ".cpp", ".cxx"
> + };
GCC support much more file extensions like .sx, .CPP, .cc (GCC's own C++
sources), .f, .for, .f90, ...
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gcc.cc;h=d80b604a48dfacd09007c9b19c63e6851b7b4087;hb=HEAD#l1418
And maybe sort them according to the recognized language (and
mention the language) like C++.
Johann
On 18.06.2024 19:04, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a proposed patch to address one of the issues raised in
> PR 31761 - namely that the linker will happily overwrite a source file
> if it was invoked by mistake. eg:
>
> ld -o foo.c foo.o
>
> The patch does not change the linker's behaviour of deleting the
> output file before writing to it, even if errors occur during the
> link. I still consider this to be the correct behaviour.
>
> With the patch applied the linker will now behave like this:
>
> $ touch foo.c foo.o
> $ ld -o foo.c foo.o
> ld: fatal: output file 'foo.c' exists and is a known input file type
> You can suppress this error by linking with --noinhibit-exec
> $ echo $?
> 1
>
> Any comments or suggestions ?
As others have indicated, the list of known file types would need extending.
Yet the wider it gets, the higher the risk of refusing something that makes
sense to someone, yet conflicts with (say) a language that this someone
doesn't even know about. IOW my view is that there better wouldn't be such
heuristics in ld. Instead (looking at the PR) we may want to see about
delaying the creation of the output file until we've recognized all inputs
(a typical source file would not pass input type checking).
Further, how does .o being in the list fit with "ld -r" wanting to write a
.o in the common case?
Jan
On Tue, Jun 18, 2024 at 06:04:25PM +0100, Nick Clifton wrote:
> Hi Guys,
>
> Attached is a proposed patch to address one of the issues raised in
> PR 31761 - namely that the linker will happily overwrite a source file
> if it was invoked by mistake. eg:
>
> ld -o foo.c foo.o
>
> The patch does not change the linker's behaviour of deleting the
> output file before writing to it, even if errors occur during the
> link. I still consider this to be the correct behaviour.
>
> With the patch applied the linker will now behave like this:
>
> $ touch foo.c foo.o
> $ ld -o foo.c foo.o
> ld: fatal: output file 'foo.c' exists and is a known input file type
> You can suppress this error by linking with --noinhibit-exec
> $ echo $?
> 1
>
> Any comments or suggestions ?
I think your instincts about this PR as shown by your initial comments
in the PR were correct. I don't think ld should restrict the file
name given on its command line in any way, except for sanity checking
against other command line arguments. Trying to make things foolproof
always fails.
Hi,
On 6/20/24 09:16, Alan Modra wrote:
> I think your instincts about this PR as shown by your initial comments
> in the PR were correct. I don't think ld should restrict the file
> name given on its command line in any way, except for sanity checking
> against other command line arguments. Trying to make things foolproof
> always fails.
FWIW I have accidentally overwritten source files, but always in the
context of "gcc -o <tab> <space> <tab>" when I meant to edit the output
name first. The "ld -o foo.c foo.o" case is a lot more difficult to
reach, except through the gcc frontend.
What makes sense is checking that input and output files are different
(which ld does), and maybe the gcc frontend could get a check like that
as well.
Simon
Hi Guys,
> Alan Modra wrote:
> I think your instincts about this PR as shown by your initial comments
> in the PR were correct. I don't think ld should restrict the file
> name given on its command line in any way, except for sanity checking
> against other command line arguments. Trying to make things foolproof
> always fails.
You know what - you are right. It is a fool's game trying to stop people
from being fools.
Instead I think that the best thing to do would be to document the current
behaviour. So attached is an alternative patch that updates the description
of the -o option. With the patch applied the text now reads like this:
'-o OUTPUT'
'--output=OUTPUT'
Use OUTPUT as the name for the program produced by 'ld'; if this
option is not specified, the name 'a.out' is used by default. The
script command 'OUTPUT' can also specify the output file name.
Note - the linker will delete the output file before it starts to
write to it. It will do this even if it turns out that the link
cannot be completed due to errors.
Note - the linker will check to make sure that the output file name
does not match the name of any of the input files, but that is all.
In particular it will not complain if the output file might
overwrite a source file or some other important file. Therefore in
build systems it is recommended to use the '-o' option as the last
option on the linker command line. For example consider:
ld -o $(EXE) $(OBJS)
ld $(OBJS) -o $(EXE)
If the 'EXE' variable is not defined for some reason, the first
version of the linker command could end up deleting one of the
object files (the first one in the 'OBJS' list) whereas the second
version of the linker command will generate an error message and
not delete anything.
What do people think ?
Cheers
Nick
On 20.06.2024 12:13, Nick Clifton wrote:
> Hi Guys,
>
> > Alan Modra wrote:
>> I think your instincts about this PR as shown by your initial comments
>> in the PR were correct. I don't think ld should restrict the file
>> name given on its command line in any way, except for sanity checking
>> against other command line arguments. Trying to make things foolproof
>> always fails.
>
> You know what - you are right. It is a fool's game trying to stop people
> from being fools.
>
> Instead I think that the best thing to do would be to document the current
> behaviour. So attached is an alternative patch that updates the description
> of the -o option. With the patch applied the text now reads like this:
>
> '-o OUTPUT'
> '--output=OUTPUT'
> Use OUTPUT as the name for the program produced by 'ld'; if this
> option is not specified, the name 'a.out' is used by default. The
> script command 'OUTPUT' can also specify the output file name.
>
> Note - the linker will delete the output file before it starts to
> write to it. It will do this even if it turns out that the link
> cannot be completed due to errors.
>
> Note - the linker will check to make sure that the output file name
> does not match the name of any of the input files, but that is all.
> In particular it will not complain if the output file might
> overwrite a source file or some other important file. Therefore in
> build systems it is recommended to use the '-o' option as the last
> option on the linker command line. For example consider:
>
> ld -o $(EXE) $(OBJS)
> ld $(OBJS) -o $(EXE)
>
> If the 'EXE' variable is not defined for some reason, the first
> version of the linker command could end up deleting one of the
> object files (the first one in the 'OBJS' list) whereas the second
> version of the linker command will generate an error message and
> not delete anything.
>
> What do people think ?
Lgtm, fwiw.
Jan
@@ -994,6 +994,14 @@ Use @var{output} as the name for the program produced by @command{ld}; if this
option is not specified, the name @file{a.out} is used by default. The
script command @code{OUTPUT} can also specify the output file name.
+Note - the linker will refuse to create an output file if its name
+matches the name of any of the input files.
+
+In addition the linker will refuse to create an output file if it
+already exists and its name has a file extension that matches a known
+source file or input file extension. eg .c or .o. This behaviour can
+be suppressed by using the @option{--noinhibit-exec} option.
+
@kindex --dependency-file=@var{depfile}
@cindex dependency file
@item --dependency-file=@var{depfile}
@@ -2303,9 +2311,22 @@ archive files.
@kindex --noinhibit-exec
@item --noinhibit-exec
Retain the executable output file whenever it is still usable.
-Normally, the linker will not produce an output file if it encounters
-errors during the link process; it exits without writing an output file
-when it issues any error whatsoever.
+Normally, the linker will only produce an output file if no errors
+occurred during the linking process. This option tells the linker to
+produce the output file if at all possible, even if there were errors.
+
+Note - the output file is always deleted before it is written to.
+This happens even if there are errors during the linking process.
+This is done in order to prevent a user from running an old executable
+thinking that it is the result of the latest invocation of the linker.
+
+In addition, if this option is enabled the linker will not test to see
+if the output file exists and its name includes an extension which is
+a known sequence used by source or input files. For example the
+linker will normally refuse to overwrite an output file called
+@option{foo.c}, since doing so would probably be a mistake. But if
+the @option{--noinhibit-exec} option is used then the overwritting
+will be allowed.
@kindex -nostdlib
@item -nostdlib
@@ -3356,6 +3356,42 @@ lang_get_output_target (void)
return default_target;
}
+/* Returns TRUE if NAME has an extension that is known to be used
+ by typical input files to the linker, or input files to a build
+ system that eventually ends up invoking the linker. */
+
+static bool
+has_known_file_extension (const char * name)
+{
+ if (name == NULL)
+ return false;
+
+ size_t namelen = strlen (name);
+
+ if (namelen < 3)
+ return false;
+
+ /* FIXME: We could optimize this lookup a lot. But it is coded
+ this way for simplicity and ease of extension. */
+ static const char * extensions[] =
+ {
+ ".c", ".h", ".s", ".C", ".S", ".o", ".t",
+ ".ld",
+ ".cpp", ".cxx"
+ };
+
+ size_t i;
+ for (i = 0; i < ARRAY_SIZE (extensions); i++)
+ {
+ size_t len = strlen (extensions[i]);
+
+ if (strcmp (name + namelen - len, extensions[i]) == 0)
+ return true;
+ }
+
+ return false;
+}
+
/* Open the output file. */
static void
@@ -3371,12 +3407,27 @@ open_output (const char *name)
{
char *in = lrealpath (f->local_sym_name);
if (filename_cmp (in, out) == 0)
- einfo (_("%F%P: input file '%s' is the same as output file\n"),
+ einfo (_("%F%P: fatal: input file '%s' is the same as output file\n"),
f->filename);
free (in);
}
free (out);
+ if (! force_make_executable)
+ {
+ /* PR 31761: Try to catch the case where the output file name is
+ probably a mistake. eg a source file foo.c or a script file
+ foo.t. We cannot catch every case of course, but we can try
+ to be a little bit paranoid. */
+ struct stat s;
+
+ if (lstat (name, &s) == 0
+ && S_ISREG (s.st_mode)
+ && has_known_file_extension (name))
+ einfo (_("%F%P: fatal: output file '%s' exists and is a known input file type\n\
+You can suppress this error by linking with --noinhibit-exec\n"), name);
+ }
+
output_target = lang_get_output_target ();
/* Has the user requested a particular endianness on the command