[v2,RFC] libdep plugin: fix bugs in parser and drop escaping
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_binutils_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_binutils_check--master-arm |
success
|
Test passed
|
Commit Message
Second version of this patch addresses the following:
1. fix another buffer overflow in libdep_plugin.c when the libdep argument
string was tab separated instead of ` ` white space.
2. drops handling of `\`: previously the parser simply dropped that character,
it did not escape anything, and it was impossible to specify `-Lfoo\bar`
with a literal `\` in the path. Further `\` triggered a bug where it did not
memmove the trailing null byte. So it is not breaking to make the parser
treat `\` like a literal character. There is no need for an escape character
because single and double quotes allow one to express any character as an
argument, include white space and quotes.
3. unterminated quotes are now a warning, the plugin does nothing in that case.
4. make the parser linear time. memmove made it quadratic worst case.
PR ld/31906
* libdep_plugin.c (str2vec): Fix bug where null byte was not copied on
memmove during quote handling and escaping, causing repeat of the last
character in the last argument. Fix buffer overflow in **res when
arguments were separated by `\t` instead of ` `. Remove handling of
the escape character `\`, as it made it impossible to specify paths
containing `\` -- the implementation merely dropped `\`, and was
affected by the memmove bug, so this should not be breaking; just
single and double quotes are sufficient to deal with white space and
quote characters, there is no need for escaping. Handle syntax errors
on unterminated quotes. Make the parser linear time instead of
quadratic.
Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
---
ld/libdep_plugin.c | 153 ++++++++++++++++++---------------------------
1 file changed, 61 insertions(+), 92 deletions(-)
Comments
Harmen Stoppels wrote:
> Second version of this patch addresses the following:
>
> 1. fix another buffer overflow in libdep_plugin.c when the libdep argument
> string was tab separated instead of ` ` white space.
When would such an argument be produced? The string was itself passed in as a
command line argument, and a shell would delimit all arguments with a space. Any
embedded tabs would thus be data, and not delimiters.
> 2. drops handling of `\`: previously the parser simply dropped that character,
> it did not escape anything, and it was impossible to specify `-Lfoo\bar`
> with a literal `\` in the path. Further `\` triggered a bug where it did not
> memmove the trailing null byte. So it is not breaking to make the parser
> treat `\` like a literal character.
> There is no need for an escape character
> because single and double quotes allow one to express any character as an
> argument, include white space and quotes.
It's quite awkward to do the necessary intermixing of single and double quotes when
passing arguments through a Makefile and a shell, or thru autoconf to a shell to a
Makefile, etc. etc...
> 3. unterminated quotes are now a warning, the plugin does nothing in that case.
>
> 4. make the parser linear time. memmove made it quadratic worst case.
Sure, quadratic worst case, but we're not talking about megabytes of arguments.
> PR ld/31906
> * libdep_plugin.c (str2vec): Fix bug where null byte was not copied on
> memmove during quote handling and escaping, causing repeat of the last
> character in the last argument. Fix buffer overflow in **res when
> arguments were separated by `\t` instead of ` `. Remove handling of
> the escape character `\`, as it made it impossible to specify paths
> containing `\` -- the implementation merely dropped `\`, and was
> affected by the memmove bug, so this should not be breaking; just
> single and double quotes are sufficient to deal with white space and
> quote characters, there is no need for escaping. Handle syntax errors
> on unterminated quotes. Make the parser linear time instead of
> quadratic.
Hi Howard,
On Thursday, June 20th, 2024 at 19:36, Howard Chu <hyc@symas.com> wrote:
>
>
> Harmen Stoppels wrote:
>
> > Second version of this patch addresses the following:
> >
> > 1. fix another buffer overflow in libdep_plugin.c when the libdep argument
> > string was tab separated instead of white space.
>
>
> When would such an argument be produced? The string was itself passed in as a
> command line argument, and a shell would delimit all arguments with a space. Any
> embedded tabs would thus be data, and not delimiters.
It can happen in a pattern `ar cfl "-L$dir -lexample" libf.a f.o`. It may be
uncommon for directories to contain a `\t`, but it is allowed, so it shouldn't
segfault the linker.
> > 2. drops handling of `\\`: previously the parser simply dropped that character,
> > it did not escape anything, and it was impossible to specify `-Lfoo\\bar`
> > with a literal `\\` in the path. Further `\\` triggered a bug where it did not
> > memmove the trailing null byte. So it is not breaking to make the parser
> > treat `\\` like a literal character.
>
> > There is no need for an escape character
> > because single and double quotes allow one to express any character as an
> > argument, include white space and quotes.
>
>
> It's quite awkward to do the necessary intermixing of single and double quotes when
> passing arguments through a Makefile and a shell, or thru autoconf to a shell to a
> Makefile, etc. etc...
Sure, but not everyone uses autotools and makefiles. For example meson is python based
and can trivially generate a ninja file with proper escaping using shlex.quote(...),
it's a python builtin.
If \ is a special escaping character in the linker, then you actually have to do double
escaping from the shell when calling `ar`, which is only more confusing:
$ ar crl "-La\\b"
would escape the `\` in the shell, but the linker receives `-La\b` and interprets it as
an escape character for `\b` parsing it as `-Lab`, so users would have to specify
$ ar crl "-La\\\\b"
on the command line to get `-La\b` as a command line argument...
If `\` is taken as a literal character in the linker, then
$ ar crl "-La\\b"
is sufficient to get `-La\b` which is much better.
To me it seems that the current implementation of libdep.so is flawed, it's just
not comparable to how dynamic libraries are resolved, even though it claims to
make static linking behave similar to dynamic linking w.r.t. resolving
dependencies.
The relevant -L flags should have been local search paths at highest priority
scoped to the current static library (maybe inherited like rpaths). That's not
something that can be expressed easily in a bfd plugin:
- The gold linker gets it right to some extent, except that it only supports a
single -L flag per -l library, making it hard to use. Every new -L flag
overrides the previous one.
- The bfd linker adds a *global* search path at lowest priority, which results
in two issues: the dependency is often located in system directories instead
of the provided -L path, and the -L path affects locating of other libraries
specified on the command line.
Fixing those issues would require how registration of paths is done in bfd and
gold, which would potentially break other user plugins that rely on the current
behavior.
Further, the double escaping issues (that always persist with quotes) could've
been avoided by using a better API in ar, like one of the following:
ar cr --libdep-arg=-L/foo --libdep-arg=-L/bar --libdep-arg=-lbaz
ar cr --libdep-path=/foo --libdep-path=/bar --libdep-lib=baz
Those arguments can be stored as consecutive C-strings in the __.LIBDEP member.
No escaping needed, and it mimics `cc -Wl,-rpath`. But I'm afraid that ship has
sailed.
@@ -132,83 +132,56 @@ get_libdeps (int fd)
return rc;
}
-/* Turn a string into an argvec. */
-static char **
-str2vec (char *in)
+/* Parse arguments in-place as contiguous C-strings
+ and return the number of arguments */
+int
+parse_libdep (char *str)
{
- char **res;
- char *s, *first, *end;
- char *sq, *dq;
- int i;
-
- end = in + strlen (in);
- s = in;
- while (isspace ((unsigned char) *s)) s++;
- first = s;
-
- i = 1;
- while ((s = strchr (s, ' ')))
+ char *src, *dst;
+ char quote;
+ int narg;
+ src = dst = str;
+ for (; isspace ((unsigned char) *src); ++src);
+ if (*src == '\0')
+ return 0;
+ narg = 1;
+ quote = 0;
+ while (*src)
{
- s++;
- i++;
- }
- res = (char **)malloc ((i+1) * sizeof (char *));
- if (!res)
- return res;
-
- i = 0;
- sq = NULL;
- dq = NULL;
- res[0] = first;
- for (s = first; *s; s++)
- {
- if (*s == '\\')
- {
- memmove (s, s+1, end-s-1);
- end--;
- }
- if (isspace ((unsigned char) *s))
- {
- if (sq || dq)
- continue;
- *s++ = '\0';
- while (isspace ((unsigned char) *s)) s++;
- if (*s)
- res[++i] = s;
- }
- if (*s == '\'' && !dq)
+ if (*src == '\'' || *src == '\"')
{
- if (sq)
+ if (!quote)
+ quote = *src++;
+
+ else if (*src == quote)
{
- memmove (sq, sq+1, s-sq-1);
- memmove (s-2, s+1, end-s-1);
- end -= 2;
- s--;
- sq = NULL;
+ ++src;
+ quote = 0;
}
+
else
- {
- sq = s;
- }
+ *dst++ = *src++;
}
- if (*s == '"' && !sq)
+
+ else if (!quote && isspace ((unsigned char) *src))
{
- if (dq)
- {
- memmove (dq, dq+1, s-dq-1);
- memmove (s-2, s+1, end-s-1);
- end -= 2;
- s--;
- dq = NULL;
- }
- else
- {
- dq = s;
- }
+ ++narg;
+ ++src;
+ *dst++ = '\0';
+ for (; isspace ((unsigned char) *src); ++src);
}
+
+ else
+ *dst++ = *src++;
}
- res[++i] = NULL;
- return res;
+ *dst = '\0';
+ if (quote)
+ {
+ TV_MESSAGE (LDPL_WARNING,
+ "libdep syntax error: unterminated quoted string");
+ return 0;
+ }
+ return narg;
}
static char *prevfile;
@@ -260,39 +233,35 @@ static enum ld_plugin_status
onall_symbols_read (void)
{
linerec *lr;
- char **vec;
+ int nargs;
+ char const *arg;
enum ld_plugin_status rv = LDPS_OK;
while ((lr = line_head))
{
line_head = lr->next;
- vec = str2vec (lr->line);
- if (vec)
+ nargs = parse_libdep (lr->line);
+ arg = lr->line;
+ int i;
+ for (i = 0; i < nargs; i++, arg = strchr (arg, '\0') + 1)
{
- int i;
- for (i = 0; vec[i]; i++)
+ if (arg[0] != '-')
+ {
+ TV_MESSAGE (LDPL_WARNING, "ignoring libdep argument %s", arg);
+ fflush (NULL);
+ continue;
+ }
+ if (arg[1] == 'l')
+ rv = tv_add_input_library (arg + 2);
+ else if (arg[1] == 'L')
+ rv = tv_set_extra_library_path (arg + 2);
+ else
{
- if (vec[i][0] != '-')
- {
- TV_MESSAGE (LDPL_WARNING, "ignoring libdep argument %s",
- vec[i]);
- fflush (NULL);
- continue;
- }
- if (vec[i][1] == 'l')
- rv = tv_add_input_library (vec[i]+2);
- else if (vec[i][1] == 'L')
- rv = tv_set_extra_library_path (vec[i]+2);
- else
- {
- TV_MESSAGE (LDPL_WARNING, "ignoring libdep argument %s",
- vec[i]);
- fflush (NULL);
- }
- if (rv != LDPS_OK)
- break;
+ TV_MESSAGE (LDPL_WARNING, "ignoring libdep argument %s", arg);
+ fflush (NULL);
}
- free (vec);
+ if (rv != LDPS_OK)
+ break;
}
free (lr);
}