[v5,1/2] x86-64: Save APX and Tile registers in ld.so trampoline
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
Commit Message
Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
saved in ld.so trampoline. This fixes BZ #31371.
Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
---
sysdeps/x86/sysdep.h | 54 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 48 insertions(+), 6 deletions(-)
Comments
* H. J. Lu:
> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
> saved in ld.so trampoline. This fixes BZ #31371.
First APX is confusing?
What's the impact on xsave_state_size and xsave_state_full_size?
I'm worried that the loader trampoline now overflows small stacks.
We used to have such problems in the past.
Thanks,
Florian
* Florian Weimer:
> * H. J. Lu:
>
>> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
>> saved in ld.so trampoline. This fixes BZ #31371.
>
> First APX is confusing?
>
> What's the impact on xsave_state_size and xsave_state_full_size?
> I'm worried that the loader trampoline now overflows small stacks.
> We used to have such problems in the past.
It adds 8,256 bytes on one AMX-capable machine. This will impact
compatibility with existing software. Stack usage is increased even if
the application does not use AMX at all.
I suggest to use a mechanism like STO_AARCH64_VARIANT_PCS to deal with
AMX, and only add the APX flag for now.
Thanks,
Florian
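
The 8,256-byte figure quoted above is consistent with the size of the two AMX state components added together. The sketch below reproduces that arithmetic. The component sizes (64 bytes of tile configuration, eight tiles of 16 rows by 64 bytes) are assumptions taken from the commonly documented AMX layout, and the names are hypothetical, not glibc's. Real code should read the sizes from CPUID leaf 0xD rather than hard-coding them.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed AMX XSAVE component geometry (hypothetical names, not from
   the patch).  Real code should query CPUID leaf 0xD instead.  */
enum
{
  TILECFG_BYTES = 64,   /* XSAVE component 17 (TILECFG).  */
  TILE_COUNT = 8,       /* Tile registers in component 18 (TILEDATA).  */
  TILE_ROWS = 16,       /* Rows per tile.  */
  TILE_ROW_BYTES = 64   /* Bytes per tile row.  */
};

/* Size of the TILEDATA component under the assumptions above.  */
static inline size_t
tiledata_bytes (void)
{
  return (size_t) TILE_COUNT * TILE_ROWS * TILE_ROW_BYTES;
}

/* Extra XSAVE area needed once both tile components are in the mask.  */
static inline size_t
amx_xsave_bytes (void)
{
  return TILECFG_BYTES + tiledata_bytes ();
}

/* Self-check: 64 + 8 * 16 * 64 matches the figure in the thread.  */
static inline void
check_amx_xsave_bytes (void)
{
  assert (amx_xsave_bytes () == 8256);
}
```

Calling `check_amx_xsave_bytes ()` from a test program confirms the sum. Because the trampoline sizes its save area from the enabled mask, this space would be reserved on an AMX-capable machine whether or not the application ever touches a tile register, which is the concern raised above.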
On Thu, Feb 15, 2024 at 1:39 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Florian Weimer:
>
> > * H. J. Lu:
> >
> >> Add APX registers to STATE_SAVE_MASK so that APX and Tile registers are
> >> saved in ld.so trampoline. This fixes BZ #31371.
> >
> > First APX is confusing?
What do you suggest?
> >
> > What's the impact on xsave_state_size and xsave_state_full_size?
> > I'm worried that the loader trampoline now overflows small stacks.
> > We used to have such problems in the past.
>
> It adds 8,256 bytes on one AMX-capable machine. This will impact
> compatibility with existing software. Stack usage is increased even if
> the application does not use AMX at all.
Also AMX may be enabled much later.
> I suggest to use a mechanism like STO_AARCH64_VARIANT_PCS to deal with
> AMX, and only add the APX flag for now.
>
I will drop Tile registers.
@@ -21,14 +21,56 @@
#include <sysdeps/generic/sysdep.h>
+/* The extended state feature IDs in the state component bitmap. */
+#define X86_XSTATE_X87_ID 0
+#define X86_XSTATE_SSE_ID 1
+#define X86_XSTATE_AVX_ID 2
+#define X86_XSTATE_BNDREGS_ID 3
+#define X86_XSTATE_BNDCFG_ID 4
+#define X86_XSTATE_K_ID 5
+#define X86_XSTATE_ZMM_H_ID 6
+#define X86_XSTATE_ZMM_ID 7
+#define X86_XSTATE_PKRU_ID 9
+#define X86_XSTATE_TILECFG_ID 17
+#define X86_XSTATE_TILEDATA_ID 18
+#define X86_XSTATE_APX_F_ID 19
+
+#ifdef __x86_64__
/* Offset for fxsave/xsave area used by _dl_runtime_resolve. Also need
space to preserve RCX, RDX, RSI, RDI, R8, R9 and RAX. It must be
- aligned to 16 bytes for fxsave and 64 bytes for xsave. */
-#define STATE_SAVE_OFFSET (8 * 7 + 8)
-
-/* Save SSE, AVX, AVX512, mask and bound registers. */
-#define STATE_SAVE_MASK \
- ((1 << 1) | (1 << 2) | (1 << 3) | (1 << 5) | (1 << 6) | (1 << 7))
+ aligned to 16 bytes for fxsave and 64 bytes for xsave.
+
+ NB: It is non-zero because of the 128-byte red-zone. Some registers
+ are saved on stack without adjusting stack pointer first. When we
+ update stack pointer to allocate more space, we need to take the
+ red-zone into account. */
+# define STATE_SAVE_OFFSET (8 * 7 + 8)
+
+/* Save SSE, AVX, AVX512, mask, bound, Tile and APX registers. Bound
+ and APX registers are mutually exclusive. */
+# define STATE_SAVE_MASK \
+ ((1 << X86_XSTATE_SSE_ID) \
+ | (1 << X86_XSTATE_AVX_ID) \
+ | (1 << X86_XSTATE_BNDREGS_ID) \
+ | (1 << X86_XSTATE_K_ID) \
+ | (1 << X86_XSTATE_ZMM_H_ID) \
+ | (1 << X86_XSTATE_ZMM_ID) \
+ | (1 << X86_XSTATE_TILECFG_ID) \
+ | (1 << X86_XSTATE_TILEDATA_ID) \
+ | (1 << X86_XSTATE_APX_F_ID))
+#else
+/* Offset for fxsave/xsave area used by _dl_tlsdesc_dynamic. Since i386
+ doesn't have red-zone, use 0 here. */
+# define STATE_SAVE_OFFSET 0
+
+/* Save SSE, AVX, AVX512, mask and bound registers. */
+# define STATE_SAVE_MASK \
+ ((1 << X86_XSTATE_SSE_ID) \
+ | (1 << X86_XSTATE_AVX_ID) \
+ | (1 << X86_XSTATE_BNDREGS_ID) \
+ | (1 << X86_XSTATE_K_ID) \
+ | (1 << X86_XSTATE_ZMM_H_ID))
+#endif
/* Constants for bits in __x86_string_control: */
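
To make the effect of the hunk above concrete, the following sketch evaluates the masks it defines. The `X86_XSTATE_*_ID` values are copied from the patch. The helper names (`mask_before`, `mask_x86_64_posted`, `mask_x86_64_no_tile`, `mask_i386`, `state_save_offset_x86_64`) are hypothetical and exist only for illustration. The "no tile" variant is one reading of the reply "I will drop Tile registers", not the actual follow-up patch.

```c
#include <assert.h>
#include <stdint.h>

/* Feature IDs copied from the patch.  */
#define X86_XSTATE_SSE_ID 1
#define X86_XSTATE_AVX_ID 2
#define X86_XSTATE_BNDREGS_ID 3
#define X86_XSTATE_K_ID 5
#define X86_XSTATE_ZMM_H_ID 6
#define X86_XSTATE_ZMM_ID 7
#define X86_XSTATE_TILECFG_ID 17
#define X86_XSTATE_TILEDATA_ID 18
#define X86_XSTATE_APX_F_ID 19

#define BIT(id) ((uint32_t) 1 << (id))

/* Mask before the patch: SSE, AVX, bound, mask and AVX512 state.  */
static inline uint32_t
mask_before (void)
{
  return (BIT (X86_XSTATE_SSE_ID) | BIT (X86_XSTATE_AVX_ID)
          | BIT (X86_XSTATE_BNDREGS_ID) | BIT (X86_XSTATE_K_ID)
          | BIT (X86_XSTATE_ZMM_H_ID) | BIT (X86_XSTATE_ZMM_ID));
}

/* x86-64 mask as posted in this version: adds Tile and APX state.  */
static inline uint32_t
mask_x86_64_posted (void)
{
  return (mask_before () | BIT (X86_XSTATE_TILECFG_ID)
          | BIT (X86_XSTATE_TILEDATA_ID) | BIT (X86_XSTATE_APX_F_ID));
}

/* Hypothetical variant with the Tile bits dropped, APX kept.  */
static inline uint32_t
mask_x86_64_no_tile (void)
{
  return mask_before () | BIT (X86_XSTATE_APX_F_ID);
}

/* i386 mask from the patch: no ZMM_ID, Tile or APX bits.  */
static inline uint32_t
mask_i386 (void)
{
  return (BIT (X86_XSTATE_SSE_ID) | BIT (X86_XSTATE_AVX_ID)
          | BIT (X86_XSTATE_BNDREGS_ID) | BIT (X86_XSTATE_K_ID)
          | BIT (X86_XSTATE_ZMM_H_ID));
}

/* x86-64 STATE_SAVE_OFFSET from the patch: 7 saved registers plus
   8 bytes of padding.  */
static inline unsigned int
state_save_offset_x86_64 (void)
{
  return 8 * 7 + 8;
}
```

Under these definitions the pre-patch mask is 0xee, the posted x86-64 mask is 0xe00ee, the variant without Tile bits is 0x800ee, and the i386 mask is 0x6e. The x86-64 offset evaluates to 64, which satisfies both the 16-byte fxsave and the 64-byte xsave alignment requirement stated in the comment.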