[14/22] aarch64: Add GCS support to the unwinder
Commit Message
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Follows the current linux ABI that uses single signal entry token
and shared shadow stack between thread and alt stack.
Could be behind __ARM_FEATURE_GCS_DEFAULT ifdef (only do anything
special with gcs compat codegen) but there is a runtime check anyway.
Change affected tests to be compatible with -mbranch-protection=standard
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr94515-1.C (f1_no_pac_ret): Update.
(main): Update.
Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
* gcc.target/aarch64/pr104689.c (unwind): Update.
Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
(_Unwind_Frames_Increment): Define.
Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
---
gcc/testsuite/g++.target/aarch64/pr94515-1.C | 6 +-
gcc/testsuite/gcc.target/aarch64/pr104689.c | 3 +-
libgcc/config/aarch64/aarch64-unwind.h | 59 +++++++++++++++++++-
3 files changed, 63 insertions(+), 5 deletions(-)
Comments
Yury Khrustalev <yury.khrustalev@arm.com> writes:
> From: Szabolcs Nagy <szabolcs.nagy@arm.com>
>
> Follows the current linux ABI that uses single signal entry token
> and shared shadow stack between thread and alt stack.
> Could be behind __ARM_FEATURE_GCS_DEFAULT ifdef (only do anything
> special with gcs compat codegen) but there is a runtime check anyway.
>
> Change affected tests to be compatible with -mbranch-protection=standard
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/aarch64/pr94515-1.C (f1_no_pac_ret): Update.
> (main): Update.
> Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
>
> * gcc.target/aarch64/pr104689.c (unwind): Update.
> Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
Too many Co-authored-bys :) The one below is indented correctly.
> libgcc/ChangeLog:
>
> * config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
> (_Unwind_Frames_Increment): Define.
>
> Co-authored-by: Matthieu Longo <matthieu.longo@arm.com>
> ---
> gcc/testsuite/g++.target/aarch64/pr94515-1.C | 6 +-
> gcc/testsuite/gcc.target/aarch64/pr104689.c | 3 +-
> libgcc/config/aarch64/aarch64-unwind.h | 59 +++++++++++++++++++-
> 3 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/g++.target/aarch64/pr94515-1.C b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
> index 359039e1753..8175ea50c32 100644
> --- a/gcc/testsuite/g++.target/aarch64/pr94515-1.C
> +++ b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
> @@ -5,7 +5,7 @@
>
> volatile int zero = 0;
>
> -__attribute__((noinline, target("branch-protection=none")))
> +__attribute__((noinline, target("branch-protection=bti")))
> void unwind (void)
> {
> if (zero == 0)
> @@ -22,7 +22,7 @@ int test (int z)
> // autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
> return 1;
> } else {
> - // 2nd cfi_negate_ra_state because the CFI directives are processed linearily.
> + // 2nd cfi_negate_ra_state because the CFI directives are processed linearly.
> // At this point, the unwinder would believe that the address is not signed
> // due to the previous return. That's why the compiler has to emit second
> // cfi_negate_ra_state to mean that the return address is still signed.
> @@ -33,7 +33,7 @@ int test (int z)
> }
> }
>
> -__attribute__((target("branch-protection=none")))
> +__attribute__((target("branch-protection=bti")))
> int main ()
> {
> try {
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr104689.c b/gcc/testsuite/gcc.target/aarch64/pr104689.c
> index 3b7adbdfe7d..9688ecc85f9 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr104689.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr104689.c
> @@ -98,6 +98,7 @@ asm(""
> "unusual_no_pac_ret:\n"
> " .cfi_startproc\n"
> " " SET_RA_STATE_0 "\n"
> +" bti c\n"
> " stp x29, x30, [sp, -16]!\n"
> " .cfi_def_cfa_offset 16\n"
> " .cfi_offset 29, -16\n"
> @@ -121,7 +122,7 @@ static void f2_pac_ret (void)
> die ();
> }
>
> -__attribute__((target("branch-protection=none")))
> +__attribute__((target("branch-protection=bti")))
> static void f1_no_pac_ret (void)
> {
> unusual_pac_ret (f2_pac_ret);
Could you explain these testsuite changes in more detail? It seems
on the face of it that they're changing the tests to test something
other than the original intention.
Having new tests alongside the same lines would be fine though.
> diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
> index 4d36f0b26f7..cf4ec749c05 100644
> --- a/libgcc/config/aarch64/aarch64-unwind.h
> +++ b/libgcc/config/aarch64/aarch64-unwind.h
> @@ -178,6 +178,9 @@ aarch64_demangle_return_addr (struct _Unwind_Context *context,
> return addr;
> }
>
> +/* GCS enable flag for chkfeat instruction. */
> +#define CHKFEAT_GCS 1
> +
> /* SME runtime function local to libgcc, streaming compatible
> and preserves more registers than the base PCS requires, but
> we don't rely on that here. */
> @@ -185,12 +188,66 @@ __attribute__ ((visibility ("hidden")))
> void __libgcc_arm_za_disable (void);
>
> /* Disable the SME ZA state in case an unwound frame used the ZA
> - lazy saving scheme. */
> + lazy saving scheme. And unwind the GCS for EH. */
> #undef _Unwind_Frames_Extra
> #define _Unwind_Frames_Extra(x) \
> do \
> { \
> __libgcc_arm_za_disable (); \
> + if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) == 0) \
> + { \
> + for (_Unwind_Word n = (x); n != 0; n--) \
> + __builtin_aarch64_gcspopm (); \
> + } \
> + } \
> + while (0)
> +
> +/* On signal entry the OS places a token on the GCS that can be used to
> + verify the integrity of the GCS pointer on signal return. It also
> + places the signal handler return address (the restorer that calls the
> + signal return syscall) on the GCS so the handler can return.
> + Because of this token, each stack frame visited during unwinding has
> + exactly one corresponding entry on the GCS, so the frame count is
> + the number of entries that will have to be popped at EH return time.
> +
> + Note: This depends on the GCS signal ABI of the OS.
> +
> + When unwinding across a stack frame for each frame the corresponding
> + entry is checked on the GCS against the computed return address from
> + the normal stack. If they don't match then _URC_FATAL_PHASE2_ERROR
> + is returned. This check is omitted if
> +
> + 1. GCS is disabled. Note: asynchronous GCS disable is supported here
> + if GCSPR and the GCS remains readable.
> + 2. Non-catchable exception where exception_class == 0. Note: the
> + pthread cancellation implementation in glibc sets exception_class
> + to 0 when the unwinder is used for cancellation cleanup handling,
> + so this allows the GCS to get out of sync during cancellation.
> + This weakens security but avoids an ABI break in glibc.
> + 3. Zero return address which marks the outermost stack frame.
I suppose this is a question for the x86 implementation too, but:
doesn't this weaken the checks somewhat? I would imagine zero is the
easiest value to force. Would it be possible to add some extra sanity
check that this really is the outermost frame? E.g. IIUC, the entry
above the outermost frame's would be a cap, or at least would not be
a valid procedure return record, so could we check for that?
Thanks,
Richard
> + 4. Signal stack frame, the GCS entry is an OS specific token then
> + with the top bit set.
> + */
> +#undef _Unwind_Frames_Increment
> +#define _Unwind_Frames_Increment(exc, context, frames) \
> + do \
> + { \
> + frames++; \
> + if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) != 0 \
> + || exc->exception_class == 0 \
> + || _Unwind_GetIP (context) == 0) \
> + break; \
> + const _Unwind_Word *gcs = __builtin_aarch64_gcspr (); \
> + if (_Unwind_IsSignalFrame (context)) \
> + { \
> + if (gcs[frames] >> 63 == 0) \
> + return _URC_FATAL_PHASE2_ERROR; \
> + } \
> + else \
> + { \
> + if (gcs[frames] != _Unwind_GetIP (context)) \
> + return _URC_FATAL_PHASE2_ERROR; \
> + } \
> } \
> while (0)
Hi Richard,
On Thu, Oct 24, 2024 at 05:27:24PM +0100, Richard Sandiford wrote:
> Yury Khrustalev <yury.khrustalev@arm.com> writes:
> > From: Szabolcs Nagy <szabolcs.nagy@arm.com>
>
> Could you explain these testsuite changes in more detail? It seems
> on the face of it that they're changing the tests to test something
> other than the original intention.
After reviewing I agree that this change to the tests is not needed in
the context of GCS work. I will remove this from the patch.
>
> Having new tests alongside the same lines would be fine though.
Agree, but I think it's better to submit these test changes separately.
> > +/* On signal entry the OS places a token on the GCS that can be used to
> > + verify the integrity of the GCS pointer on signal return. It also
> > + places the signal handler return address (the restorer that calls the
> > + signal return syscall) on the GCS so the handler can return.
> > + Because of this token, each stack frame visited during unwinding has
> > + exactly one corresponding entry on the GCS, so the frame count is
> > + the number of entries that will have to be popped at EH return time.
> > +
> > + Note: This depends on the GCS signal ABI of the OS.
> > +
> > + When unwinding across a stack frame for each frame the corresponding
> > + entry is checked on the GCS against the computed return address from
> > + the normal stack. If they don't match then _URC_FATAL_PHASE2_ERROR
> > + is returned. This check is omitted if
> > +
> > + 1. GCS is disabled. Note: asynchronous GCS disable is supported here
> > + if GCSPR and the GCS remains readable.
> > + 2. Non-catchable exception where exception_class == 0. Note: the
> > + pthread cancellation implementation in glibc sets exception_class
> > + to 0 when the unwinder is used for cancellation cleanup handling,
> > + so this allows the GCS to get out of sync during cancellation.
> > + This weakens security but avoids an ABI break in glibc.
> > + 3. Zero return address which marks the outermost stack frame.
>
> I suppose this is a question for the x86 implementation too, but:
> doesn't this weaken the checks somewhat? I would imagine zero is the
> easiest value to force. Would it be possible to add some extra sanity
> check that this really is the outermost frame? E.g. IIUC, the entry
> above the outermost frame's would be a cap, or at least would not be
> a valid procedure return record, so could we check for that?
Zero might be indeeded easier to forge, however in practice it would mean
that execution would not be able to return anywhere, so one could speculate
this would be hard to exploit in practice.
When we are in the middle of unwinding and hit zero return address (as seen
by the undwinder) we could theoretically check if current GCS entry is non-
zero (in which case it would mean that stack has been corrupted and we need
to raise an error). However if we are really at the outermost frame and GCS
stack has been unwound properly, we would not be able to read from shadow
stack as we have already reached to top of its allocation. This is why we
need to if _Unwind_GetIP (context) == 0 before we access GCS pointer via
__builtin_aarch64_gcspr ().
So I will leave this as implemented by Szabolcs.
Kind regards,
Yury
>
> Thanks,
> Richard
>
> > + 4. Signal stack frame, the GCS entry is an OS specific token then
> > + with the top bit set.
> > + */
> > +#undef _Unwind_Frames_Increment
> > +#define _Unwind_Frames_Increment(exc, context, frames) \
> > + do \
> > + { \
> > + frames++; \
> > + if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) != 0 \
> > + || exc->exception_class == 0 \
> > + || _Unwind_GetIP (context) == 0) \
> > + break; \
> > + const _Unwind_Word *gcs = __builtin_aarch64_gcspr (); \
> > + if (_Unwind_IsSignalFrame (context)) \
> > + { \
> > + if (gcs[frames] >> 63 == 0) \
> > + return _URC_FATAL_PHASE2_ERROR; \
> > + } \
> > + else \
> > + { \
> > + if (gcs[frames] != _Unwind_GetIP (context)) \
> > + return _URC_FATAL_PHASE2_ERROR; \
> > + } \
> > } \
> > while (0)
@@ -5,7 +5,7 @@
volatile int zero = 0;
-__attribute__((noinline, target("branch-protection=none")))
+__attribute__((noinline, target("branch-protection=bti")))
void unwind (void)
{
if (zero == 0)
@@ -22,7 +22,7 @@ int test (int z)
// autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
return 1;
} else {
- // 2nd cfi_negate_ra_state because the CFI directives are processed linearily.
+ // 2nd cfi_negate_ra_state because the CFI directives are processed linearly.
// At this point, the unwinder would believe that the address is not signed
// due to the previous return. That's why the compiler has to emit second
// cfi_negate_ra_state to mean that the return address is still signed.
@@ -33,7 +33,7 @@ int test (int z)
}
}
-__attribute__((target("branch-protection=none")))
+__attribute__((target("branch-protection=bti")))
int main ()
{
try {
@@ -98,6 +98,7 @@ asm(""
"unusual_no_pac_ret:\n"
" .cfi_startproc\n"
" " SET_RA_STATE_0 "\n"
+" bti c\n"
" stp x29, x30, [sp, -16]!\n"
" .cfi_def_cfa_offset 16\n"
" .cfi_offset 29, -16\n"
@@ -121,7 +122,7 @@ static void f2_pac_ret (void)
die ();
}
-__attribute__((target("branch-protection=none")))
+__attribute__((target("branch-protection=bti")))
static void f1_no_pac_ret (void)
{
unusual_pac_ret (f2_pac_ret);
@@ -178,6 +178,9 @@ aarch64_demangle_return_addr (struct _Unwind_Context *context,
return addr;
}
+/* GCS enable flag for chkfeat instruction. */
+#define CHKFEAT_GCS 1
+
/* SME runtime function local to libgcc, streaming compatible
and preserves more registers than the base PCS requires, but
we don't rely on that here. */
@@ -185,12 +188,66 @@ __attribute__ ((visibility ("hidden")))
void __libgcc_arm_za_disable (void);
/* Disable the SME ZA state in case an unwound frame used the ZA
- lazy saving scheme. */
+ lazy saving scheme. And unwind the GCS for EH. */
#undef _Unwind_Frames_Extra
#define _Unwind_Frames_Extra(x) \
do \
{ \
__libgcc_arm_za_disable (); \
+ if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) == 0) \
+ { \
+ for (_Unwind_Word n = (x); n != 0; n--) \
+ __builtin_aarch64_gcspopm (); \
+ } \
+ } \
+ while (0)
+
+/* On signal entry the OS places a token on the GCS that can be used to
+ verify the integrity of the GCS pointer on signal return. It also
+ places the signal handler return address (the restorer that calls the
+ signal return syscall) on the GCS so the handler can return.
+ Because of this token, each stack frame visited during unwinding has
+ exactly one corresponding entry on the GCS, so the frame count is
+ the number of entries that will have to be popped at EH return time.
+
+ Note: This depends on the GCS signal ABI of the OS.
+
+ When unwinding across a stack frame for each frame the corresponding
+ entry is checked on the GCS against the computed return address from
+ the normal stack. If they don't match then _URC_FATAL_PHASE2_ERROR
+ is returned. This check is omitted if
+
+ 1. GCS is disabled. Note: asynchronous GCS disable is supported here
+ if GCSPR and the GCS remains readable.
+ 2. Non-catchable exception where exception_class == 0. Note: the
+ pthread cancellation implementation in glibc sets exception_class
+ to 0 when the unwinder is used for cancellation cleanup handling,
+ so this allows the GCS to get out of sync during cancellation.
+ This weakens security but avoids an ABI break in glibc.
+ 3. Zero return address which marks the outermost stack frame.
+ 4. Signal stack frame, the GCS entry is an OS specific token then
+ with the top bit set.
+ */
+#undef _Unwind_Frames_Increment
+#define _Unwind_Frames_Increment(exc, context, frames) \
+ do \
+ { \
+ frames++; \
+ if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) != 0 \
+ || exc->exception_class == 0 \
+ || _Unwind_GetIP (context) == 0) \
+ break; \
+ const _Unwind_Word *gcs = __builtin_aarch64_gcspr (); \
+ if (_Unwind_IsSignalFrame (context)) \
+ { \
+ if (gcs[frames] >> 63 == 0) \
+ return _URC_FATAL_PHASE2_ERROR; \
+ } \
+ else \
+ { \
+ if (gcs[frames] != _Unwind_GetIP (context)) \
+ return _URC_FATAL_PHASE2_ERROR; \
+ } \
} \
while (0)