[0/2] x86-64: enhancements to GOTPCREL / GOTTPOFF handling

Message ID bf7adb97-b1bc-4268-a8c7-246f2d4a01e6@suse.com

Jan Beulich Jan. 14, 2025, 1:18 p.m. UTC
While investigating a more fundamental issue (see below), I came to
notice shortcomings in GOTPCREL handling when it comes to APX.
Furthermore, with the advent of MOVRS, accessing the GOT is going to
be a primary use case for this new insn (more generally: accessing
anything that is effectively never written to). Therefore the first
patch here extends support for GOTPCREL to EVEX-encoded arithmetic
insns plus MOVRS, while at the same time extending GOTTPOFF support
to MOVRS as well.

This in turn raises a further question: If arithmetic insns like
ADC are deemed okay to use with a @gotpcrel operand (I can see
possible use with ADD; I'm already having a harder time seeing use
with, say, SUB and CMP), why would other arithmetic insns not be
okay to use (and then relax to their immediate operand forms under
the right conditions)? That would primarily be IMUL. Going from
there and seeing that MOV really is the primary insn to use with
these relocation types, I'd then further wonder why PUSH wouldn't
also be eligible (which is also possible to relax).

In the course of figuring out how the existing code works I further
noticed that certain checks are overly lax. In the new code, cloned
from the original, I've adjusted right away what could easily be
adjusted. The 2nd patch then mirrors this back to the pre-existing code.

Now on to the original issue, where the investigation started: I
can't help the impression that what the linker does is too
aggressive. In particular it appears to assume that @gotpcrel can
only be used in small model or x32 code, building upon addresses
of resolved symbols being within the low 4GiB (or within ±2GiB for
PIC code). Relocation overflows can result otherwise, which the
linker then converts to the suggestion to use --no-relax.

That suggestion has two problems: For one, I think the linker ought
to work correctly and reliably without special command line options.
And then --no-relax suppresses more of the relaxation than strictly
necessary: For the specific purpose of building the Xen hypervisor
we'd want the (PIC-specific) MOV->LEA transformation (no issue with
the resulting PC-relative relocations), but we'd need to avoid the
memory operand -> immediate operand conversions, which result in
relocation overflows because the replacement relocs are absolute.
These overflows are simply a result of us linking the binary at a base
address in the upper half of the address space. And no, linking with
-shared or -pie is not an option, because the linker then refuses to
process other relocations that we use (R_X86_64_32 in particular).
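
To illustrate why the absolute forms overflow while the PC-relative
one does not, here is a minimal sketch of the three range checks
involved. This is not binutils code: the helper names are made up for
this mail, and any address fed to them is a hypothetical example rather
than Xen's actual link address.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the resolved value must zero-extend from 32 bits,
   i.e. it has to lie in the low 4GiB. */
static inline bool fits_abs32(uint64_t addr)
{
    return addr <= UINT32_MAX;
}

/* R_X86_64_32S: the resolved value must sign-extend from 32 bits,
   i.e. it has to lie in the lowest or the topmost 2GiB. The cast
   assumes the usual two's complement conversion. */
static inline bool fits_abs32s(uint64_t addr)
{
    int64_t v = (int64_t)addr;

    return v >= INT32_MIN && v <= INT32_MAX;
}

/* PC-relative 32-bit relocs (as used by the RIP-relative LEA): only
   the distance between the target and the end of the insn matters,
   not where in the address space the two are located. */
static inline bool fits_pc32(uint64_t target, uint64_t next_insn)
{
    int64_t disp = (int64_t)(target - next_insn);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With a made-up upper-half address such as 0xffff800000100000, both
fits_abs32() and fits_abs32s() yield false, whereas fits_pc32() yields
true as long as the referencing insn is within 2GiB of the symbol.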

The root of the problem, however, is an apparent compiler
limitation: While we build with -fpic and while we force all our own
symbols to be hidden (to permit respective relaxation), we can't
control visibility of compiler-declared symbols. Thus the attempt to
use -fstack-protector -mstack-protector-guard=global will result in
@gotpcrel accesses to __stack_chk_guard. Without --no-relax the
linking of that code will fail (see above), while with --no-relax
there'll be a non-empty .got, which we deliberately reject by way of
a linker script assertion, because going through the GOT is inefficient.

I've been trying, with no success, to derive a strategy to overcome
the linker issue. The main problem appears to be that at the time
the decision is taken whether to relax certain insns (and into which
alternative forms), we don't know yet whether the replacement relocs
would end up overflowing. Thus, unless the whole determination can
be moved (much?) later, the only option would seem to be to undo the
transformations (perhaps just partly, i.e. after initially converting
to MOV-from-immediate, replace that by the less restrictive RIP-
relative LEA; nothing similar is of course possible for the arithmetic
insns we convert). That would of course require there to be enough
information to restore what had been there originally.
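
The fallback order described above could be sketched roughly as
follows. This is only an illustration under the assumption that final
addresses are already known at the point of the decision (which is
exactly what isn't the case today); the enum and function names are
hypothetical and don't correspond to anything in the linker sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the insn carrying the GOT reloc. */
enum insn_kind { INSN_MOV, INSN_ARITH };

/* Hypothetical outcomes of the (late) relaxation decision. */
enum insn_form {
    FORM_IMMEDIATE,  /* memory operand replaced by immediate, absolute reloc */
    FORM_LEA_PCREL,  /* MOV replaced by RIP-relative LEA, PC-relative reloc */
    FORM_GOT_LOAD    /* original GOT access kept (or restored) */
};

static inline bool fits_s32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

static inline enum insn_form pick_form(enum insn_kind kind,
                                       uint64_t sym_addr,
                                       uint64_t next_insn)
{
    /* The immediate form needs the absolute address to fit a
       sign-extended 32-bit field. */
    if (fits_s32((int64_t)sym_addr))
        return FORM_IMMEDIATE;

    /* Only MOV has a PC-relative replacement (LEA); arithmetic insns
       have no equivalent to fall back to. */
    if (kind == INSN_MOV && fits_s32((int64_t)(sym_addr - next_insn)))
        return FORM_LEA_PCREL;

    /* Otherwise the original memory operand form is needed, which
       requires still knowing what had been there. */
    return FORM_GOT_LOAD;
}
```

For a MOV referencing a nearby symbol at a made-up upper-half address
this picks FORM_LEA_PCREL, while an arithmetic insn in the same
situation ends up with FORM_GOT_LOAD.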

1: x86/APX: widen @gotpcrel and @gottpoff support (incl to MOVRS)
2: x86-64: tighten convert-load-reloc checking

Quite likely something will also need doing about the two relocations
remaining with the code thus commented:

	      /* These relocations are added only for completeness and
		 aren't be used.  */

Simply dropping relocations on the floor without failing the linking
process can't be quite right: The diagnostic issued may go unnoticed,
while the resulting binary is almost certainly going to malfunction.

Jan