[v3] s390: Do not use canonical PLT if pointer equality is not needed
Checks
| Context | Check | Description |
| linaro-tcwg-bot/tcwg_binutils_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 | success | Test passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-arm | success | Test passed |
Commit Message
Require pointer equality in executables for symbols with non-PLT
PC-relative relocations, which are likely in address taken context, and
direct relocations, which are likely in function reference context.
Likewise for IFUNC symbols defined in a non-shared object. Clear the
value of PLT undefined symbols if pointer equality is not needed and do
not hash them in the '.gnu.hash' section.
Linkers need to make this distinction in order to decide whether to ask
the dynamic linker to use canonical PLT entries. Normally GOT entries
in shared libraries contain addresses of the respective functions, with
one notable exception: when a PDE (Position-Dependent Executable;
no-PIE) calls the respective function and also takes its address. Such
executables assume that all addresses are known in advance, so they use
addresses of the respective PLT entries. For consistency reasons, all
respective GOT entries in the process must also use them.
When a linker sees that a PDE both calls a function and also takes its
address, it creates a PLT entry and asks the dynamic linker to consider
it canonical by setting the respective undefined symbol's value, which
is normally zero, to the address of this executable PLT entry.
As a workaround for GCC 12-14, treat a PLT32DBL relocation for the
address taking instruction "larl rX,<sym>@PLT" as if it was without the
@PLT suffix and require pointer equality.
GCC 12-14, since GCC commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for
local functions in 64-bit mode") [1], unconditionally suffix non-local
symbols with @PLT, regardless of whether they are used in function call
instructions (i.e. brasl) or address taking instructions (i.e. larl).
The assembler therefore generates a PLT32DBL instead of a PC32DBL
relocation for larl, so the linker cannot distinguish between function
call and address taking instructions solely from the relocation type,
although only the latter require pointer equality.
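Since the relocation type alone is not enough, the instruction itself
has to be inspected. A minimal stand-alone sketch of such a check
follows; insn_is_larl and plt32dbl_is_address_taking are hypothetical
helper names, not BFD functions, and the section contents are assumed
to be available as a plain byte buffer. It relies on larl and brasl
both being 6-byte instructions whose 32-bit operand field starts 2
bytes after the instruction start, with opcodes 0xc0x0 and 0xc0x5
respectively (x being the register field).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the instruction starting at INSN is larl.  The first
   halfword is read big-endian and the register field is masked out.  */
static bool
insn_is_larl (const uint8_t *insn)
{
  assert (insn != NULL);
  uint16_t first_halfword = (uint16_t) ((insn[0] << 8) | insn[1]);
  return (first_halfword & 0xff0f) == 0xc000;
}

/* RELOC_OFFSET is the offset of the relocated 32-bit field within
   SECTION; the instruction starts 2 bytes before it.  Offsets that do
   not leave room for the opcode or the field are treated as "not
   address taking".  */
static bool
plt32dbl_is_address_taking (const uint8_t *section, size_t size,
                            size_t reloc_offset)
{
  if (reloc_offset < 2 || size < 4 || reloc_offset > size - 4)
    return false;
  return insn_is_larl (section + reloc_offset - 2);
}
```

For example, the byte sequence c0 10 00 00 00 00 (larl %r1) with a
relocation at offset 2 is classified as address taking, whereas
c0 e5 00 00 00 00 (brasl %r14) is not.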
This complements GCC commit a2e0a30c52fa ("IBM zSystems: Do not use
@PLT with larl") [2], which makes GCC stop suffixing @PLT to address
taking larl instructions, so that the correct behavior with regard to
pointer equality is also achieved with the affected GCC 12-14.
Note that this workaround can be reverted once GCC 12-14 emitting
address taking larl instructions with @PLT suffix have become
irrelevant.
Note that without the workaround for GCC 12-14 suffixing @PLT to larl
the following linker tests would fail:
FAIL: shared
FAIL: visibility (hidden_normal)
FAIL: visibility (hidden_weak)
FAIL: visibility (protected)
FAIL: visibility (protected_undef_def)
FAIL: visibility (protected_weak)
FAIL: visibility (normal)
Based on x86-64, especially Jakub Jelinek's x86 commits 47a9f7b34f7a
(clearing value of PLT undefined symbols if pointer equality not needed)
and fdc90cb46b0f (omitting PLT undefined symbols from '.gnu.hash').
[1] GCC commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local
functions in 64-bit mode"),
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0990d93dd8a4
[2] GCC commit a2e0a30c52fa ("IBM zSystems: Do not use @PLT with larl"),
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a2e0a30c52fa
bfd/
PR ld/29655
* elf64-s390.c (elf_s390_check_relocs): Require pointer equality
for direct and non-PLT PC-relative relocations indicating
address taking instructions and for PLT32DBL relocations, when
used with address taking larl instruction.
(elf_s390_finish_dynamic_symbol): Do not use canonical PLT for
non-local undefined symbols if pointer equality is not needed.
Abort if pointer equality needed flag not set although required.
(elf_s390_copy_indirect_symbol): Copy pointer equality needed
flag.
(elf_s390_hash_symbol): New function. Based on x86-64.
(elf_backend_hash_symbol): Wire up elf_s390_hash_symbol.
ld/testsuite/
PR ld/29655
* ld-elf/shared.exp: Add new pr29655 test.
* ld-elf/pr29655a.c: New file. Based on Rui's sample in PR.
* ld-elf/pr29655b.c: Likewise.
* ld-s390/plt_64-1.wf: Adjust expected test output to change in
.gnu.hash due to omitted PLT undefined symbols that do not need
pointer equality.
* ld-s390/plt_64-1_eh.wf: Likewise.
Bug: https://sourceware.org/PR29655
Co-authored-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v3:
- Turn pr29655 test into a run_cc_link_tests that checks the executable
using readelf --dyn-sym looking for a zero value undefined fun_public.
(Alan)
- clear-xfail arm*-*-* aarch64-*-* pr29655 test. (Linaro-TCWG-CI)
- clear-xfail alpha-*-* hppa-*-* ia64-*-* microblaze-*-* mips-*-*
mips64-*-*. (Alan)
Changes in v2:
- Reword commit message to reflect that function pointer equality is
required for all direct and non-PLT PC-relative relocations and to
mention which tests fail without the GCC 12-14 workaround.
- Fix typo in comment on GCC 12-14 workaround. (Andreas)
- Have main() return 0 in test. (Andreas)
- Adding Nick, Alan, and Jan due to the added common test case.
Note: Splitting the GCC 12-14 workaround into a separate patch either
requires it to be the first patch or causes one shared and multiple
visibility tests to fail. Addressing the latter using a setup_fail
condition as follows (that would get removed by the workaround-patch)
seemed rather odd to me:
    # On s390 64-bit (s390x) GCC 12-14 suffix symbols in address
    # taken context with @PLT, which breaks function pointer equality.
    if { [istarget s390x-*-linux*]
         && [at_least_gcc_version 12 0]
         && ![at_least_gcc_version 15 0] } {
        setup_xfail "s390x-*-linux*"
    }
bfd/elf64-s390.c | 66 ++++++++++++++++++++++++++++-
ld/testsuite/ld-elf/pr29655.rd | 5 +++
ld/testsuite/ld-elf/pr29655a.c | 20 +++++++++
ld/testsuite/ld-elf/pr29655b.c | 15 +++++++
ld/testsuite/ld-elf/shared.exp | 30 +++++++++++++
ld/testsuite/ld-s390/plt_64-1.wf | 8 ++--
ld/testsuite/ld-s390/plt_64-1_eh.wf | 2 +-
7 files changed, 139 insertions(+), 7 deletions(-)
create mode 100644 ld/testsuite/ld-elf/pr29655.rd
create mode 100644 ld/testsuite/ld-elf/pr29655a.c
create mode 100644 ld/testsuite/ld-elf/pr29655b.c
Comments
On Tue, Mar 17, 2026 at 05:29:52PM +0100, Jens Remus wrote:
> +# PR 29655 (cont.): Check that in PIC code linked as PDE taking the address
> +# of a function defined in a DSO results in the function address (from GOT)
> +# and not the "canonical PLT" address from the PDE.
> +# This is just an optimization and both is valid, although libraries may
> +# depend on this specific behavior, so do not complain loudly.
> +setup_xfail *-*-*
> +clear_xfail aarch64-*-* alpha-*-* arm*-*-* hppa-*-* i?86-*-* ia64-*-* \
> + microblaze-*-* mips-*-* mips64-*-* powerpc*-*-* s390x-*-* x86_64-*-*
I think you should delete the preceding five lines, leaving it up to
target maintainers to add setup_xfail for their targets. No need to
repost with just that changed. The comment here is a little
misleading. This isn't just an optimisation, but rather about
generating a correct program.
I'll explain. In the early days of computing, executables were fixed
position. There was no dynamic linking and all addresses of variables
and functions were known at link time. When shared libraries were
invented this was no longer true, because shared libraries could not
reasonably be assigned fixed addresses. In order to keep the code
model in an executable the same, putting all the burden of supporting
shared libraries in the shared library (plus linker and ld.so),
schemes were invented to keep fixed addresses in the executable.
Global variables in a shared library were copied into a new linker
generated .dynbss section in the executable with dynamic copy
relocations initialising them from the shared library. Functions in a
shared library were defined to have an address on their plt call stub
code in the executable. To make this work, shared library code had to
be more than just PIC, it had to support overriding of variable and
function addresses. For example, when a shared library accesses one of
its global variables it must access the copy in the executable.
Then symbol visibility was invented to avoid the shared library
inefficiency of supporting overriding when it isn't necessary.
Protected visibility complicates things. If there are two or more
real copies of a variable or function then you expect that modifying
one variable won't affect the copy, and that the function address of a
protected function in a shared library resolves to the function in the
shared library, not another function with the same name in the
executable. A .dynbss copy or a function symbol defined on a call
stub however is a linker generated artefact. Neither appears in the
source and thus should not affect user visible program operation.
That means a protected variable should not generate a copy (*) in
.dynbss as the copy won't be updated when shared library code modifies
that variable, nor will modification of the copy by the executable be
seen by shared library code. Instead the linker should error, forcing
a user to recompile accesses to that variable in the executable as PIC
indirect or the linker should edit code in the executable, and emit a
dynamic relocation. Likewise the linker should not define the address
of a protected function in a shared library as the address on its call
stub in the executable. Instead the linker should complain or edit
code using the address to PIC indirect, and emit dynamic relocations.
There are other details to the necessary linker editing. But if
linker code editing is not implemented, then dynamic relocations are
needed in read-only sections, which should cause linker complaints.
The testcase you are fixing is even weirder. It is effectively
implementing protected visibility with hidden visibility plus an
alias, likely due to wrong handling of protected visibility. That
means the linker doesn't warn about and can't edit non-PIC code to
properly support taking the address of the function. However, if
executable code is PIC, it is wrong to needlessly define the function
symbol on the plt call stub. That isn't an optimisation, it's
avoiding wrong program behaviour.
*) HJ Lu implemented a complex scheme that did in fact copy protected
visibility variables into .dynbss. I thought the idea wrong at the
time and said so, because it negates the reason non-default visibility
was invented. I think it has largely been abandoned.
> +run_cc_link_tests [list \
> + [list \
> + "Build pr29655" \
> + "$NOPIE_LDFLAGS -Wl,--no-as-needed,-rpath,tmpdir tmpdir/pr29655.so" \
> + "-fPIC" \
> + { pr29655b.c } \
> + {{readelf {--dyn-syms --wide} pr29655.rd}} \
> + "pr29655" \
> + ] \
> +]
Hello Alan,
thanks a lot for your detailed explanation! This is much appreciated!
On 3/19/2026 6:42 AM, Alan Modra wrote:
> On Tue, Mar 17, 2026 at 05:29:52PM +0100, Jens Remus wrote:
>> +# PR 29655 (cont.): Check that in PIC code linked as PDE taking the address
>> +# of a function defined in a DSO results in the function address (from GOT)
>> +# and not the "canonical PLT" address from the PDE.
>> +# This is just an optimization and both is valid, although libraries may
>> +# depend on this specific behavior, so do not complain loudly.
>> +setup_xfail *-*-*
>> +clear_xfail aarch64-*-* alpha-*-* arm*-*-* hppa-*-* i?86-*-* ia64-*-* \
>> + microblaze-*-* mips-*-* mips64-*-* powerpc*-*-* s390x-*-* x86_64-*-*
>
> I think you should delete the preceding five lines, leaving it up to
> target maintainers to add setup_xfail for their targets. No need to
> repost with just that changed. The comment here is a little
> misleading. This isn't just an optimisation, but rather about
> generating a correct program.
Ok. Thank you for the approval (for adding the common test)!
Regards,
Jens
@@ -852,6 +852,7 @@ elf_s390_copy_indirect_symbol (struct bfd_link_info *info,
dir->ref_regular |= ind->ref_regular;
dir->ref_regular_nonweak |= ind->ref_regular_nonweak;
dir->needs_plt |= ind->needs_plt;
+ dir->pointer_equality_needed |= ind->pointer_equality_needed;
}
else
_bfd_elf_link_hash_copy_indirect (info, dir, ind);
@@ -1037,6 +1038,7 @@ elf_s390_check_relocs (bfd *abfd,
referenced. */
h->ref_regular = 1;
h->needs_plt = 1;
+ h->pointer_equality_needed = 1;
}
}
@@ -1078,6 +1080,38 @@ elf_s390_check_relocs (bfd *abfd,
{
h->needs_plt = 1;
h->plt.refcount += 1;
+
+ /* GCC 12-14 unconditionally suffix non-local symbols
+ with @PLT, regardless of whether they are used in
+ function call instructions (i.e. brasl) or address
+ taking instructions (i.e. larl). Treat PLT32DBL
+ relocation for "larl rX,<sym>@PLT" instruction as
+ address taking and require pointer equality. */
+ if (bfd_link_executable (info)
+ && r_type == R_390_PLT32DBL
+ && rel->r_offset >= 2)
+ {
+ bfd_byte *contents;
+ void *insn_start;
+ uint16_t op;
+
+ if (elf_section_data (sec)->this_hdr.contents != NULL)
+ contents = elf_section_data (sec)->this_hdr.contents;
+ else if (!_bfd_elf_mmap_section_contents (abfd, sec, &contents))
+ return false;
+
+ insn_start = contents + rel->r_offset - 2;
+ op = bfd_get_16 (abfd, insn_start) & 0xff0f;
+
+ if (op == 0xc000)
+ {
+ /* larl rX,<sym>@PLT */
+ h->pointer_equality_needed = 1;
+ }
+
+ if (elf_section_data (sec)->this_hdr.contents != contents)
+ _bfd_elf_munmap_section_contents (sec, contents);
+ }
}
break;
@@ -1227,6 +1261,12 @@ elf_s390_check_relocs (bfd *abfd,
refers to is in a shared lib. */
h->plt.refcount += 1;
}
+
+ /* Require pointer equality in PDE for above PC-relative
+ relocations, that are likely in address taken context,
+ and direct relocations, that are likely in function
+ reference context. */
+ h->pointer_equality_needed = 1;
}
/* If we are creating a shared library, and this is a reloc
@@ -3692,11 +3732,16 @@ elf_s390_finish_dynamic_symbol (bfd *output_bfd,
if (!h->def_regular)
{
/* Mark the symbol as undefined, rather than as defined in
- the .plt section. Leave the value alone. This is a clue
+ the .plt section. Leave the value if there were any
+ relocations where pointer equality matters (this is a clue
for the dynamic linker, to make function pointer
comparisons work between an application and shared
- library. */
+ library), otherwise set it to zero. If a function is only
+ called from a binary, there is no need to slow down
+ shared libraries because of that. */
sym->st_shndx = SHN_UNDEF;
+ if (!h->pointer_equality_needed)
+ sym->st_value = 0;
}
}
}
@@ -3730,6 +3775,9 @@ elf_s390_finish_dynamic_symbol (bfd *output_bfd,
}
else
{
+ if (!h->pointer_equality_needed)
+ abort ();
+
/* For non-shared objects explicit GOT slots must be
filled with the PLT slot address for pointer
equality reasons. */
@@ -4344,6 +4392,19 @@ elf_s390_create_dynamic_sections (bfd *dynobj,
return true;
}
+/* Return TRUE if symbol should be hashed in the `.gnu.hash' section. */
+
+static bool
+elf_s390_hash_symbol (struct elf_link_hash_entry *h)
+{
+ if (h->plt.offset != (bfd_vma) -1
+ && !h->def_regular
+ && !h->pointer_equality_needed)
+ return false;
+
+ return _bfd_elf_hash_symbol (h);
+}
+
/* Why was the hash table entry size definition changed from
ARCH_SIZE/8 to 4? This breaks the 64 bit dynamic linker and
this is the only reason for the s390_elf64_size_info structure. */
@@ -4424,6 +4485,7 @@ static const struct elf_size_info s390_elf64_size_info =
#define elf_backend_sort_relocs_p elf_s390_elf_sort_relocs_p
#define elf_backend_additional_program_headers elf_s390_additional_program_headers
#define elf_backend_modify_segment_map elf_s390_modify_segment_map
+#define elf_backend_hash_symbol elf_s390_hash_symbol
#define bfd_elf64_mkobject elf_s390_mkobject
#define elf_backend_object_p elf_s390_object_p
new file mode 100644
@@ -0,0 +1,5 @@
+Symbol table '\.dynsym' contains [0-9]+ entries:
+ +Num: +Value +Size Type +Bind +Vis +Ndx Name
+#...
+ +[0-9]+: +0+ +0 +FUNC +GLOBAL +DEFAULT +UND +fun_public
+#...
new file mode 100644
@@ -0,0 +1,20 @@
+#include <stdio.h>
+
+typedef void Fn();
+
+void __attribute__((visibility("hidden")))
+fun (void)
+{}
+
+extern void fun_public() __attribute__((alias("fun")));
+
+void
+call_callback (Fn *callback)
+{
+ if (callback == fun)
+ printf("PASS\n");
+ else
+ printf("FAIL\n");
+
+ callback ();
+}
new file mode 100644
@@ -0,0 +1,15 @@
+#ifndef __PIC__
+#error "this file must be compiled with -fPIC"
+#endif
+
+typedef void Fn();
+void fun_public(void);
+void call_callback(Fn *callback);
+
+int
+main ()
+{
+ fun_public ();
+ call_callback (fun_public);
+ return 0;
+}
@@ -1863,3 +1863,33 @@ run_ld_link_tests [list \
"pr23658-2" \
] \
]
+
+# PR 29655
+run_cc_link_tests [list \
+ [list \
+ "Build pr29655.so" \
+ "-shared" \
+ "-fPIC" \
+ { pr29655a.c } \
+ {} \
+ "pr29655.so" \
+ ] \
+]
+# PR 29655 (cont.): Check that in PIC code linked as PDE taking the address
+# of a function defined in a DSO results in the function address (from GOT)
+# and not the "canonical PLT" address from the PDE.
+# This is just an optimization and both is valid, although libraries may
+# depend on this specific behavior, so do not complain loudly.
+setup_xfail *-*-*
+clear_xfail aarch64-*-* alpha-*-* arm*-*-* hppa-*-* i?86-*-* ia64-*-* \
+ microblaze-*-* mips-*-* mips64-*-* powerpc*-*-* s390x-*-* x86_64-*-*
+run_cc_link_tests [list \
+ [list \
+ "Build pr29655" \
+ "$NOPIE_LDFLAGS -Wl,--no-as-needed,-rpath,tmpdir tmpdir/pr29655.so" \
+ "-fPIC" \
+ { pr29655b.c } \
+ {{readelf {--dyn-syms --wide} pr29655.rd}} \
+ "pr29655" \
+ ] \
+]
@@ -19,14 +19,14 @@ Contents of the .eh_frame section:
DW_CFA_nop
DW_CFA_nop
-00000018 000000000000001c 0000001c FDE cie=00000000 pc=00000000010002b8..00000000010002e4
+00000018 000000000000001c 0000001c FDE cie=00000000 pc=00000000010002b0..00000000010002dc
DW_CFA_remember_state
- DW_CFA_advance_loc: 6 to 00000000010002be
+ DW_CFA_advance_loc: 6 to 00000000010002b6
DW_CFA_offset: r14 at cfa-48
DW_CFA_offset: r15 at cfa-40
- DW_CFA_advance_loc: 8 to 00000000010002c6
+ DW_CFA_advance_loc: 8 to 00000000010002be
DW_CFA_def_cfa_offset: 320
- DW_CFA_advance_loc: 24 to 00000000010002de
+ DW_CFA_advance_loc: 24 to 00000000010002d6
DW_CFA_restore_state
DW_CFA_nop
DW_CFA_nop
@@ -19,7 +19,7 @@ Contents of the .eh_frame section:
DW_CFA_nop
DW_CFA_nop
-00000018 0000000000000014 0000001c FDE cie=00000000 pc=0000000001000258..00000000010002b8
+00000018 0000000000000014 0000001c FDE cie=00000000 pc=0000000001000250..00000000010002b0
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop